Phen-Gen

Instructions

INPUT FILES


VARIANTS

  • File format: Variant Call Format (VCF), representing SNPs and indels. The VCF must contain exactly the samples defined in the PED file.
  • Both phased and unphased data are accepted.
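
A quick pre-submission check of the sample-list requirement can be sketched as follows. This is a minimal illustration and not part of Phen-Gen: the function names and the sample data are made up, and it relies only on the standard VCF layout (sample IDs follow the nine fixed columns of the #CHROM header line) and on the individual ID being the second PED column.

```python
def vcf_samples(vcf_lines):
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first 9 columns are fixed (CHROM..FORMAT); samples follow.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def ped_samples(ped_lines):
    """Return the individual IDs (column 2) from PED rows."""
    return [line.split()[1] for line in ped_lines if line.strip()]


def samples_match(vcf_lines, ped_lines):
    """True when both files name exactly the same set of samples."""
    return set(vcf_samples(vcf_lines)) == set(ped_samples(ped_lines))


# Made-up example data for a father-mother-child trio.
vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tfather\tmother\tchild\n",
]
ped = [
    "FAM1 father 0 0 1 1",
    "FAM1 mother 0 0 2 1",
    "FAM1 child father mother 1 2",
]
print(samples_match(vcf, ped))
```

The comparison uses sets, so the order of samples may differ between the two files; whether Phen-Gen itself requires a matching order is not stated here.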

PHENOTYPES

  • File format: Text file containing one HPO term per line, as shown in the sample file.
  • Human Phenotype Ontology (HPO): a standardized vocabulary to describe patient disease symptoms (phenotypic abnormalities).
  • Because HPO is under active development, please refer to the list of HPO terms accepted by Phen-Gen, which links every term to the HPO Browser. HPO terms not in this list are ignored.
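
Because unlisted terms are ignored rather than rejected, it can help to see in advance which terms will be used. The sketch below assumes the phenotype file holds HPO identifiers of the form HP:NNNNNNN and that the accepted list has already been loaded into a set; both the helper names and the contents of accepted_terms are hypothetical.

```python
def load_terms(lines):
    """Read one HPO term per line, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def split_terms(terms, accepted):
    """Separate terms into those in the accepted set and those outside it."""
    kept = [term for term in terms if term in accepted]
    ignored = [term for term in terms if term not in accepted]
    return kept, ignored


# Hypothetical stand-in for the list of terms accepted by Phen-Gen.
accepted_terms = {"HP:0001250", "HP:0001249"}

# Made-up phenotype file content; HP:9999999 is a placeholder ID.
phenotype_file = ["HP:0001250\n", "HP:9999999\n", "\n", "HP:0001249\n"]

kept, ignored = split_terms(load_terms(phenotype_file), accepted_terms)
print("used:", kept)
print("ignored:", ignored)
```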

PEDIGREE

  • File format: PED file. The sample IDs must match between the PED and VCF files. Only one family is accepted per VCF and PED file.
  • Pedigree file: Describes the relationships between individuals, their sex, and their disease status.
    • A space/tab delimited file with 6 mandatory columns (Family ID, Individual ID, Paternal ID, Maternal ID, Sex, and Phenotype).
    • The phenotype column indicates the disease status of the individual(s). For more details, click here.
  • An unknown disease status is not permitted in the PED file; suspected carriers of the damaging allele should be labelled as cases.
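
The pedigree rules above (six mandatory columns, a single family, no unknown disease status) can be checked with a short script before upload. This is a sketch under one stated assumption: the phenotype column uses the common PLINK-style coding in which 0 and -9 mean "unknown". That coding is not confirmed by this page, so adjust UNKNOWN_PHENOTYPES if the linked details differ. The function name is illustrative.

```python
# Assumption: PLINK-style codes for a missing/unknown phenotype.
UNKNOWN_PHENOTYPES = {"0", "-9"}


def validate_ped(lines):
    """Return a list of problems found in PED rows (empty list means OK)."""
    problems = []
    families = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        fields = line.split()  # accepts space- or tab-delimited rows
        if len(fields) < 6:
            problems.append(f"line {number}: expected 6 columns, got {len(fields)}")
            continue
        family, phenotype = fields[0], fields[5]
        families.add(family)
        if phenotype in UNKNOWN_PHENOTYPES:
            problems.append(f"line {number}: unknown disease status")
    if len(families) > 1:
        problems.append(f"more than one family: {sorted(families)}")
    return problems


# Made-up trio with the child as the only case.
good_ped = [
    "FAM1 father 0 0 1 1",
    "FAM1 mother 0 0 2 1",
    "FAM1 child father mother 1 2",
]
for problem in validate_ped(good_ped):
    print(problem)
```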


RUN PARAMETERS


DISEASE INHERITANCE PATTERN

  • The disease inheritance pattern can be either dominant or recessive.
  • Default: recessive.

TYPE OF PREDICTION

  • Coding predictor: Aims to estimate the direct deleterious impact of the variant on protein function.
    • Each called variant in the patient's genome (or exome) is evaluated if:
      • it lies within a reported transcript, or
      • it falls within the splice site definition of the intron-exon boundary.
  • Genomic predictor: Aims to estimate the deleterious regulatory impact of a variant.
    • All coding and non-coding variants are analyzed for their putative functional role.
  • Default: coding.

DISCARDING DE NOVO MUTATIONS

  • A de novo mutation is a genetic mutation that is inconsistent with the pedigree structure.
  • Default: discard de novo mutations.
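
To illustrate what "inconsistent with the pedigree structure" means at a single diploid site, the sketch below tests whether a child's genotype can be formed by taking one allele from each parent. It is a conceptual example only, not Phen-Gen's actual filter; the function name and the allele encoding (0 for reference, 1 for alternate) are assumptions made for the illustration.

```python
def is_mendelian_consistent(child, father, mother):
    """Check that one child allele can come from each parent.

    Each genotype is a pair of alleles, e.g. (0, 1) for a heterozygote.
    """
    first, second = child
    return (first in father and second in mother) or (
        second in father and first in mother
    )


# Child carries an alternate allele that neither parent has: candidate de novo.
print(is_mendelian_consistent((0, 1), (0, 0), (0, 0)))

# Child's alternate allele is inherited from the father: consistent.
print(is_mendelian_consistent((0, 1), (0, 1), (0, 0)))
```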

STRINGENCY

  • Defines the minimum number of cases that must carry the candidate variant.
  • Example: In a father-mother-child trio with the child as the only affected individual, the stringency would be set to 1.
  • Default: total number of affected individuals.
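
The relationship between the default and a lowered stringency can be sketched as below. The example assumes the PLINK-style phenotype code 2 for an affected individual (an assumption, not confirmed by this page), and the helper names and pedigree are made up for illustration.

```python
AFFECTED = "2"  # assumption: PLINK-style code for a case


def affected_ids(ped_lines):
    """Return the IDs of individuals labelled as cases in the PED rows."""
    ids = set()
    for line in ped_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[5] == AFFECTED:
            ids.add(fields[1])
    return ids


def meets_stringency(carriers, ped_lines, stringency=None):
    """True if at least `stringency` cases carry the variant.

    When stringency is None, the default (all affected individuals) is used.
    """
    cases = affected_ids(ped_lines)
    if stringency is None:
        stringency = len(cases)
    return len(set(carriers) & cases) >= stringency


# Made-up family with two affected siblings.
ped = [
    "FAM1 father 0 0 1 1",
    "FAM1 mother 0 0 2 1",
    "FAM1 sib1 father mother 1 2",
    "FAM1 sib2 father mother 2 2",
]
carriers = {"sib1", "mother"}  # individuals carrying a candidate variant

print(meets_stringency(carriers, ped))                # default: both cases required
print(meets_stringency(carriers, ped, stringency=1))  # relaxed to one case
```

Lowering the stringency below the default keeps variants that are missing from some cases, which may be useful when a case could have been mislabelled.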


OUTPUT FILES


COMBINED SCORES FILE

  • This file integrates gene scores from the phenotypic and genotypic analyses. Genes with non-zero damaging probability are ranked in descending order.
  • The file has 2 columns:
    • Column 1 gives the gene identifier (gene name, RefSeq, Consensus CDS, UCSC Known Gene, or ENSEMBL).
    • Column 2 gives the probability (0 < P(D) <= 1) that the gene is damaging.
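
A small reader for this two-column layout might look like the sketch below. It assumes the columns are whitespace-delimited and that the file has no header line; neither detail is stated above. The gene identifiers and probabilities are invented.

```python
def parse_scores(lines):
    """Parse (gene, probability) pairs, enforcing 0 < P(D) <= 1."""
    scores = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        gene, probability = line.split()[:2]
        value = float(probability)
        if not 0.0 < value <= 1.0:
            raise ValueError(f"line {number}: probability {value} out of range")
        scores.append((gene, value))
    return scores


def is_descending(scores):
    """True when probabilities never increase from one row to the next."""
    return all(a[1] >= b[1] for a, b in zip(scores, scores[1:]))


# Made-up file content.
combined = ["GENE_A\t0.93\n", "GENE_B\t0.41\n", "GENE_C\t0.05\n"]
scores = parse_scores(combined)
print(scores[0])
print(is_descending(scores))
```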

VARIANTS FOR TOP GENES

  • This file contains all damaging variants for the genes reported in the 'combined scores file'.
  • The variants are extracted from the input VCF file, with the following INFO fields added:
    • DCOD, probability of being damaging, based on the coding predictor.
    • DREG, probability of being damaging, based on the genomic predictor.
    • VART, type of coding variant (i.e. start-loss, stop-gain, stop-loss, splice site, nonsynonymous, synonymous or indel).
    • GNID, gene identifier (gene name, RefSeq, Consensus CDS, UCSC Known Gene, or ENSEMBL) annotation for each variant.
  • For non-coding variants, the tag "_neighboring" is added to the GNID for the nearest gene within 50 kb.
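
The added annotations can be read back from the output VCF with a few lines of code. The sketch below relies only on the standard VCF convention that INFO is the eighth tab-separated column and holds semicolon-separated key=value entries; the example record, its values, and the function names are invented, and the exact value formats written by Phen-Gen are not specified here.

```python
PHEN_GEN_TAGS = ("DCOD", "DREG", "VART", "GNID")


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    result = {}
    for item in info.split(";"):
        if not item:
            continue
        key, separator, value = item.partition("=")
        result[key] = value if separator else True
    return result


def phen_gen_annotations(vcf_line):
    """Return the four Phen-Gen tags from one VCF record (None if absent)."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    parsed = parse_info(info)
    return {tag: parsed.get(tag) for tag in PHEN_GEN_TAGS}


# Made-up record; values are illustrative only.
record = "1\t12345\t.\tA\tG\t50\tPASS\tDP=20;DCOD=0.91;VART=nonsynonymous;GNID=GENE_A\tGT\t0/1\n"
annotations = phen_gen_annotations(record)
print(annotations)

# Non-coding variants assigned to a nearby gene carry the "_neighboring" tag.
gnid = annotations["GNID"] or ""
print("_neighboring" in gnid)
```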