File format: Variant Call Format (VCF), representing SNPs and indels. The VCF must contain exactly the sample list defined in the PED file.
Both phased and unphased data are accepted.
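The sample-list requirement can be checked before submission. The sketch below is not part of Phen-Gen; the function names are hypothetical. It relies only on the standard VCF layout, in which the #CHROM header line carries nine fixed columns followed by one column per sample.

```python
def vcf_sample_ids(vcf_lines):
    """Return the sample IDs listed on the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM ... FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def sample_mismatches(vcf_ids, ped_ids):
    """Return (missing_from_vcf, missing_from_ped) as sorted lists."""
    vcf_set, ped_set = set(vcf_ids), set(ped_ids)
    return sorted(ped_set - vcf_set), sorted(vcf_set - ped_set)


# Illustrative header with made-up sample names.
header = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tfather\tmother\n",
]
ids = vcf_sample_ids(header)
print(ids)
print(sample_mismatches(ids, ["father", "mother", "child", "sibling"]))
```

Both lists returned by sample_mismatches should be empty before the files are uploaded.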
PHENOTYPES
File format: Text file containing one HPO term per line, as shown in the sample file.
Human Phenotype Ontology (HPO): a standardized vocabulary to describe patient disease symptoms (phenotypic abnormalities).
Since HPO is under active development, please refer to the list of HPO terms accepted by Phen-Gen, with a link to every term on the HPO Browser. HPO terms not in this list will be ignored.
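Because unlisted terms are silently ignored, it can help to screen the phenotype file locally first. The following is a minimal sketch, assuming the file lists HPO identifiers in the usual "HP:" plus seven digits form (the sample file may use a different representation) and that the accepted list has been saved locally as a set. The function name and the example terms are illustrative.

```python
import re

# HPO identifiers have the form "HP:" followed by seven digits.
HPO_ID = re.compile(r"^HP:\d{7}$")


def screen_hpo_terms(lines, accepted=None):
    """Split the terms into (kept, ignored).

    `accepted` is an optional set of identifiers taken from the
    accepted-terms list; when it is None only the format is checked.
    """
    kept, ignored = [], []
    for raw in lines:
        term = raw.strip()
        if not term:
            continue  # skip blank lines
        if HPO_ID.match(term) and (accepted is None or term in accepted):
            kept.append(term)
        else:
            ignored.append(term)
    return kept, ignored


terms = ["HP:0001250\n", "HP:0001263\n", "seizures\n", "\n"]
print(screen_hpo_terms(terms, accepted={"HP:0001250"}))
```

Any term that lands in the second list would contribute nothing to the phenotypic analysis.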
PEDIGREE
File format: PED file. The sample IDs should match between the PED and VCF. We accept only 1 family per VCF and PED.
Pedigree file: Describes the relationship between individuals, their sex and disease status.
A space- or tab-delimited file with 6 mandatory columns (Family ID, Individual ID, Paternal ID, Maternal ID, Sex, and Phenotype).
The phenotype column indicates the disease status of each individual.
Unknown disease status is not permitted in the PED file; suspected carriers of the damaging allele should be labelled as cases.
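The PED constraints above (six columns, a single family, no unknown disease status) can be checked with a short script. This sketch assumes the common PED coding in which the phenotype column is 1 for unaffected and 2 for affected, with 0 or -9 meaning unknown; Phen-Gen's own documentation should be consulted for the exact codes it expects. PedRow and parse_ped are hypothetical names.

```python
from typing import NamedTuple


class PedRow(NamedTuple):
    family: str
    individual: str
    father: str
    mother: str
    sex: str
    phenotype: str


def parse_ped(lines):
    """Parse PED lines and enforce the constraints described above."""
    rows = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split()  # accepts spaces or tabs
        if len(cols) < 6:
            raise ValueError(f"line {number}: expected 6 columns, got {len(cols)}")
        row = PedRow(*cols[:6])
        # Assumed coding: 1 = unaffected, 2 = affected; anything else is unknown.
        if row.phenotype not in ("1", "2"):
            raise ValueError(f"line {number}: unknown disease status {row.phenotype!r}")
        rows.append(row)
    if len({row.family for row in rows}) > 1:
        raise ValueError("only one family per PED file is accepted")
    return rows


trio = [
    "FAM1 father 0 0 1 1\n",
    "FAM1 mother 0 0 2 1\n",
    "FAM1 child father mother 1 2\n",
]
rows = parse_ped(trio)
print([row.individual for row in rows if row.phenotype == "2"])
```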
RUN PARAMETERS
DISEASE INHERITANCE PATTERN
The disease inheritance pattern can be either dominant or recessive.
Default: recessive.
TYPE OF PREDICTION
Coding predictor: Aims to estimate the direct deleterious impact of the variant on protein function.
Each called variant in the patient's genome (or exome) is evaluated if:
it lies within a reported transcript, or
it falls within the splice site definition of the intron-exon boundary.
Genomic predictor: Aims to estimate the deleterious regulatory impact of a variant.
All coding and non-coding variants are analyzed for their putative functional role.
Default: coding.
DISCARDING DE NOVO MUTATIONS
A de novo mutation is a variant observed in an individual but absent from both parents, which makes it inconsistent with the pedigree structure.
Default: discard de novo mutations.
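To illustrate what "inconsistent with the pedigree structure" means for a single site, the sketch below tests whether a child's genotype can be formed from one allele of each parent. It is not Phen-Gen's implementation: it assumes autosomal diploid calls with no missing genotypes, and the function names are hypothetical.

```python
import re


def alleles(gt):
    """Split a VCF GT string such as '0/1' or '0|1' into its two alleles."""
    first, second = re.split(r"[/|]", gt)
    return first, second


def is_pedigree_consistent(child_gt, father_gt, mother_gt):
    """True if the child can inherit one allele from each parent."""
    a, b = alleles(child_gt)
    father, mother = alleles(father_gt), alleles(mother_gt)
    return (a in father and b in mother) or (b in father and a in mother)


# The child carries an alternate allele that neither parent has: candidate de novo.
print(is_pedigree_consistent("0/1", "0/0", "0/0"))
# The same child genotype is consistent when the father is heterozygous.
print(is_pedigree_consistent("0/1", "0/1", "0/0"))
```

With the default setting, sites for which such a check fails would be discarded.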
STRINGENCY
Defines the minimum number of cases (affected individuals) that must carry the candidate variant.
Example: In a father-mother-child trio with the child as the only affected individual, the stringency would be set to 1.
Default: total number of affected individuals.
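The stringency rule can be sketched as a simple count. The code below is illustrative only: it assumes disease status is coded 2 for affected, as in common PED files, and that the set of samples carrying a variant has already been derived from the VCF. All names are hypothetical.

```python
def default_stringency(status_by_sample):
    """Default stringency: the total number of affected individuals."""
    return sum(1 for status in status_by_sample.values() if status == "2")


def passes_stringency(carriers, status_by_sample, stringency=None):
    """True if at least `stringency` cases carry the candidate variant."""
    cases = {s for s, status in status_by_sample.items() if status == "2"}
    if stringency is None:
        stringency = len(cases)
    return len(cases & set(carriers)) >= stringency


# Two affected siblings and unaffected parents (assumed coding: 2 = affected).
family = {"father": "1", "mother": "1", "sib1": "2", "sib2": "2"}
print(default_stringency(family))
print(passes_stringency({"sib1", "father"}, family))
print(passes_stringency({"sib1", "father"}, family, stringency=1))
```

Lowering the stringency below the default keeps variants that are missing from some cases, which may be useful when a case could carry a different causal variant.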
OUTPUT FILES
COMBINED SCORES FILE
This file integrates gene scores from the phenotypic and genotypic analyses. Genes with a non-zero probability of being damaging are ranked in descending order of that probability.
The file has 2 columns:
Column 1 gives the gene identifier (gene name, RefSeq, Consensus CDS, UCSC Known Gene, or ENSEMBL).
Column 2 gives the probability (0 < P(D) <= 1) that the gene is damaging.
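A file with this two-column layout can be loaded as follows. This is a minimal sketch, assuming whitespace-separated columns and no header line (neither is stated above); the gene names and probabilities are made up for illustration.

```python
def read_combined_scores(lines):
    """Return a list of (gene_id, probability) pairs in file order."""
    scores = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        gene, prob = line.split()[:2]
        p = float(prob)
        if not 0 < p <= 1:
            raise ValueError(f"{gene}: probability {p} is outside (0, 1]")
        scores.append((gene, p))
    return scores


def is_ranked_descending(scores):
    """True if probabilities never increase from one row to the next."""
    probs = [p for _, p in scores]
    return all(a >= b for a, b in zip(probs, probs[1:]))


example = ["GENE1\t0.91\n", "GENE2\t0.40\n", "GENE3\t0.05\n"]
scores = read_combined_scores(example)
print(scores[0])
print(is_ranked_descending(scores))
```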
VARIANTS FOR TOP GENES
This file contains all damaging variants for the genes reported in the 'combined scores file'.
The variants are extracted from the input VCF file with the following INFO fields added:
DCOD, probability of being damaging based on the coding predictor.
DREG, probability of being damaging based on the genomic predictor.
VART, type of coding variant (i.e. start-loss, stop-gain, stop-loss, splice site, nonsynonymous, synonymous or indel).
GNID, gene identifier (gene name, RefSeq, Consensus CDS, UCSC Known Genes, ENSEMBL) annotation for each variant.
For non-coding variants, the tag "_neighboring" is added to the GNID for the nearest gene within 50 kb.
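The added annotations can be read back from the INFO column, which in VCF is a semicolon-separated list of KEY=VALUE entries and flags. The sketch below assumes that "_neighboring" is appended as a suffix to the GNID value; the exact placement is not specified above. The INFO strings and gene names are made up for illustration.

```python
NEIGHBOR_TAG = "_neighboring"


def parse_info(info):
    """Parse a VCF INFO string into a dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def phen_gen_annotations(info):
    """Extract DCOD, DREG, VART and GNID, noting the neighboring-gene tag."""
    fields = parse_info(info)
    gnid = fields.get("GNID")
    # Assumption: the tag is a suffix on the gene identifier.
    neighboring = isinstance(gnid, str) and gnid.endswith(NEIGHBOR_TAG)
    if neighboring:
        gnid = gnid[: -len(NEIGHBOR_TAG)]
    return {
        "DCOD": float(fields["DCOD"]) if "DCOD" in fields else None,
        "DREG": float(fields["DREG"]) if "DREG" in fields else None,
        "VART": fields.get("VART"),
        "GNID": gnid,
        "neighboring": neighboring,
    }


coding = phen_gen_annotations("DP=20;DCOD=0.93;VART=nonsynonymous;GNID=GENE1")
noncoding = phen_gen_annotations("DB;DREG=0.41;GNID=GENE2_neighboring")
print(coding)
print(noncoding)
```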