- File format: Variant Call Format (VCF), representing SNPs and indels. The VCF must contain exactly the samples listed in the PED file.
- Phased and unphased data accepted.
- File format: Text file containing 1 HPO term per line, as shown in the sample file.
- Human Phenotype Ontology (HPO): a standardized vocabulary to describe patient disease symptoms (phenotypic abnormalities).
- Since HPO is under active development, please refer to the list of HPO terms accepted by Phen-Gen, with a link to every term on the HPO Browser. HPO terms not in this list will be ignored.
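The filtering rule above (terms outside the accepted list are ignored) can be sketched as follows. This is a minimal illustration, not Phen-Gen's own code: the function name `read_hpo_terms` is hypothetical, the accepted list is assumed to be available as a Python set, and the terms are assumed to be written as HPO identifiers such as `HP:0001250`.

```python
def read_hpo_terms(lines, accepted):
    """Split the terms of a phenotype file into kept and ignored lists.

    lines    -- iterable of text lines, 1 HPO term per line
    accepted -- set of terms on the accepted list (assumed to be loaded elsewhere)
    """
    kept, ignored = [], []
    for raw in lines:
        term = raw.strip()
        if not term:
            continue  # skip blank lines
        if term in accepted:
            kept.append(term)
        else:
            ignored.append(term)  # not on the accepted list, so it is ignored
    return kept, ignored
```

Checking the `ignored` list before submitting a job makes it easy to spot misspelled or outdated terms that would otherwise be dropped silently.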
- File format: PED file. The sample IDs should match between the PED and VCF. We accept only 1 family per VCF and PED.
- Pedigree file: Describes the relationship between individuals, their sex and disease status.
- A space/tab delimited file with 6 mandatory columns (Family ID, Individual ID, Paternal ID, Maternal ID, Sex, and Phenotype).
- The phenotype column indicates the disease status of each individual.
- An unknown disease status is not permitted in the PED file; suspected carriers of the damaging allele should be labelled as cases.
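The input constraints above can be checked before submission with a short script. The sketch below is illustrative only: the function names are hypothetical, and the phenotype coding (1 = unaffected, 2 = affected) is an assumption borrowed from the common PED convention rather than something stated here. The VCF sample IDs are read from the `#CHROM` header line, where they follow the 9 fixed columns.

```python
PED_COLUMNS = ["family", "individual", "father", "mother", "sex", "phenotype"]

def parse_ped(lines):
    """Parse space/tab delimited PED lines into one dict per individual."""
    records = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if len(fields) < 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {raw!r}")
        records.append(dict(zip(PED_COLUMNS, fields[:6])))
    return records

def vcf_sample_ids(lines):
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for raw in lines:
        if raw.startswith("#CHROM"):
            return raw.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")

def validate_inputs(ped_records, vcf_samples):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    families = {r["family"] for r in ped_records}
    if len(families) != 1:
        problems.append(f"expected 1 family, found {len(families)}")
    ped_ids = {r["individual"] for r in ped_records}
    if ped_ids != set(vcf_samples):
        problems.append("sample IDs differ between PED and VCF")
    for r in ped_records:
        # Assumed coding: "1" = unaffected, "2" = affected; anything else is unknown.
        if r["phenotype"] not in ("1", "2"):
            problems.append(f"unknown disease status for {r['individual']}")
    return problems
```

An empty list from `validate_inputs` means the PED and VCF agree on the sample list, describe a single family, and contain no unknown disease status.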
DISEASE INHERITANCE PATTERN
- The disease inheritance pattern can be either dominant or recessive.
- Default: recessive.
TYPE OF PREDICTION
- Coding predictor: Aims to estimate the direct deleterious impact of the variant on protein function.
- Each called variant in the patient's genome (or exome) is evaluated if:
- it lies within a reported transcript, or
- it falls within the splice site definition of the intron-exon boundary.
- Genomic predictor: Aims to estimate the deleterious regulatory impact of a variant.
- All coding and non-coding variants are analyzed for their putative functional role.
- Default: coding.
DISCARDING DE NOVO MUTATIONS
- A de novo mutation is a genetic mutation that is inconsistent with the pedigree structure.
- Default: discard de novo mutations.
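To make "inconsistent with the pedigree structure" concrete, the sketch below tests a single site in a father-mother-child trio: the child's genotype is consistent if one allele can come from each parent. It is an illustration under simplifying assumptions (diploid autosomal genotypes, only the GT subfield passed in), and the function names are hypothetical. It is not Phen-Gen's actual procedure.

```python
def split_genotype(gt):
    """Split a VCF GT string such as '0/1' or '0|1' into a tuple of alleles."""
    return tuple(gt.replace("|", "/").split("/"))

def is_de_novo_candidate(child_gt, father_gt, mother_gt):
    """Return True if the child's genotype cannot be inherited from the parents."""
    child, father, mother = (
        split_genotype(g) for g in (child_gt, father_gt, mother_gt)
    )
    if "." in child + father + mother:
        return False  # missing call: consistency cannot be judged
    if len(child) != 2:
        raise ValueError(f"expected a diploid genotype, got {child_gt!r}")
    first, second = child
    # Consistent if one allele can come from each parent, in either order.
    consistent = (first in father and second in mother) or (
        first in mother and second in father
    )
    return not consistent
```

For example, a heterozygous child (`0/1`) with two homozygous-reference parents (`0/0`) is flagged, while the same child with a heterozygous father is not.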
- Stringency: Defines the minimum number of cases that should contain the candidate variant.
- Example: In a father-mother-child trio with the child as the only affected individual, the stringency would be set to 1.
- Default: total number of affected individuals.
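The default and the threshold test can be sketched as below. The function names are hypothetical, and affected individuals are assumed to be coded as `2` in the PED phenotype column (the common PED convention, not confirmed here).

```python
def default_stringency(ped_lines):
    """Count affected individuals in a PED file (assumed coding: phenotype == '2')."""
    affected = 0
    for raw in ped_lines:
        fields = raw.split()
        if len(fields) >= 6 and fields[5] == "2":
            affected += 1
    return affected

def passes_stringency(cases_with_variant, stringency):
    """Return True if enough cases carry the candidate variant.

    cases_with_variant -- dict mapping each case ID to True/False
    """
    return sum(bool(v) for v in cases_with_variant.values()) >= stringency
```

With the trio from the example above, `default_stringency` returns 1, so a variant carried by the affected child alone passes the threshold.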
COMBINED SCORES FILE
- This file integrates gene scores from the phenotypic and genotypic analyses. Genes with a non-zero probability of being damaging are ranked in descending order of that probability.
- The file has 2 columns:
- Column 1 indicates gene identifier (gene name, RefSeq, Consensus CDS, UCSC Known Gene, or ENSEMBL).
- Column 2 shows the probability (0 < P(D) <= 1) that the gene is damaging.
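A reader for this two-column layout might look like the sketch below. It assumes whitespace-delimited columns and no header line, neither of which is stated above, and the function names are hypothetical.

```python
def read_combined_scores(lines):
    """Parse (gene identifier, probability) pairs from the combined scores file."""
    scores = []
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        gene, probability = fields[0], float(fields[1])
        if not 0 < probability <= 1:
            raise ValueError(f"probability out of range for {gene}: {probability}")
        scores.append((gene, probability))
    return scores

def is_ranked_descending(scores):
    """Check that probabilities never increase from one row to the next."""
    return all(a[1] >= b[1] for a, b in zip(scores, scores[1:]))
```

Taking the first few entries of the returned list gives the top-ranked genes, since the file is already sorted by probability.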
VARIANTS FOR TOP GENES
- This file contains all damaging variants for the genes reported in the 'combined scores file'.
- The variants are extracted from the input VCF file, with the following INFO fields added:
- DCOD, probability that the variant is damaging, based on the coding predictor.
- DREG, probability that the variant is damaging, based on the genomic predictor.
- VART, type of coding variant (i.e. start-loss, stop-gain, stop-loss, splice site, nonsynonymous, synonymous or indel).
- GNID, gene identifier (gene name, RefSeq, Consensus CDS, UCSC Known Genes, ENSEMBL) annotation for each variant.
- For non-coding variants, the tag "_neighboring" is added to the GNID for the nearest gene within 50 kb.
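The added INFO fields can be read back from a data line of the output VCF as sketched below. The INFO column is the 8th tab-separated field, with `key=value` entries separated by semicolons. The function names are hypothetical, and the example values in the usage note are made up for illustration.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flags without '=' map to True."""
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed

def phen_gen_annotations(vcf_line):
    """Extract DCOD, DREG, VART and GNID from one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the 8th column
    gnid = info.get("GNID")
    return {
        "DCOD": float(info["DCOD"]) if "DCOD" in info else None,
        "DREG": float(info["DREG"]) if "DREG" in info else None,
        "VART": info.get("VART"),
        "GNID": gnid,
        # Non-coding variants carry the "_neighboring" tag in their GNID.
        "neighboring": isinstance(gnid, str) and "_neighboring" in gnid,
    }
```

For a line whose INFO column is `DREG=0.3;GNID=GENE_A_neighboring`, the result has `DREG` equal to 0.3, `DCOD` and `VART` set to `None`, and `neighboring` set to `True`.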