
Genome-Phenome Analyzer: Variant table fields

The SimulConsult Genome-Phenome Analyzer imports genomic data from an annotated variant file in plain text format. The details of the file are specified on this page, and the variant file in our tiny genome demo can be examined as a guide. It is most convenient to examine the file in a spreadsheet program, so it is best to use a file extension of .txt or .tsv. To make this easy, values in the file are tab-separated, so the data can be read as columns and “cells”. Be careful when saving the file, however, because Excel converts values such as the gene symbol SEPT9 or the zygosity 1/1 to dates. Many genomics groups use the free spreadsheet LibreOffice Calc because it performs fewer such conversions than Excel does. Newline characters separate the lines of the file (any of the newline conventions LF, CR+LF or CR is allowed, if used consistently throughout the file).

The structure of the variant file is as follows:

Row 1: the first “cell” should be:
fileformat=generic
or another fileformat value assigned specifically to your group. This is followed by the newline character(s) discussed above.

Row 2: for a trio, this row contains the 43 column headers listed in the table below. The column headers are tab-separated, with newline character(s) at the end. All headers must be included, even if the column is blank (e.g., when not using a particular conservation score) or there is no data for an individual in the proband trio (e.g., for a proband with no parental data, the parental columns are blank except for their headers). If there are any individuals beyond the proband trio, their columns are added after the corresponding columns for the proband’s father. For example, if the proband, father and proband’s sister are included, there would be 4 zygosity column headers: zygProband (with data in the rows below), zygMother (header without data), zygFather (with data in the rows below) and zygSister (with data in the rows below). In addition, for each individual beyond the trio, 3 column headers (total depth, variant depth and quality) are added, with data in the rows below.
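As a minimal sketch of this column-ordering rule (illustrative only, not the software's own code), the hypothetical helper below builds the row-2 headers for the trio plus any extra individuals. The 43 names follow the field table; the totDepth/varDepth/qual names used for extra individuals are assumptions, since this page does not name those columns.

```python
# The 43 trio column headers, in the order given in the field table.
TRIO_HEADERS = [
    "hgncSymbol", "geneNameLong", "chrPos",
    "cSeqAnnotation", "cPosition", "cRef", "cAlt",
    "pSeqAnnotation", "pPosition", "pRef", "pAlt", "rsid",
    "zygProband", "zygMother", "zygFather",
    "effect", "freq1", "freq2", "homoShares", "heteroShares",
    "omimNumber", "omimDiseaseNames", "variantAccession", "variantPathogenicity",
    "polyPhen", "mutationTaster", "sift", "gerp", "grantham",
    "phat", "phast", "phyloP", "strandBias", "knownSplice",
    "totDepthP", "varDepthP", "qualP",
    "totDepthM", "varDepthM", "qualM",
    "totDepthF", "varDepthF", "qualF",
]


def build_headers(extra_individuals):
    """Return row-2 headers for the trio plus any additional individuals.

    Extra zygosity columns go right after zygFather; each extra individual's
    depth/quality triplet goes after qualF. The triplet names built here
    (totDepth<Name>, varDepth<Name>, qual<Name>) are hypothetical.
    """
    headers = list(TRIO_HEADERS)  # copy so the constant is not modified
    insert_at = headers.index("zygFather") + 1
    for offset, name in enumerate(extra_individuals):
        headers.insert(insert_at + offset, "zyg" + name)
    for name in extra_individuals:
        headers += ["totDepth" + name, "varDepth" + name, "qual" + name]
    return headers
```

For the proband/father/sister example, `"\t".join(build_headers(["Sister"]))` would give a 47-column header row. Renamed headers such as zygPaula are not handled by this sketch.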

Rows of variant data follow the headers, starting with row 3 (tab-separated, with a newline at the end of each row). Many of the fields are not required and can be left blank if you don’t plan to use the related functionality, but even if a field is blank, you need to include its tabs in the variant rows and its header in row 2. The sample variant file in our tiny genome demo illustrates how several columns can be blank while their headers are still included.
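The following Python sketch shows one way to read such a file once its contents are in a string. `read_variant_file` is a hypothetical helper, not part of the SimulConsult software. It checks that row 1 starts with `fileformat=`, takes row 2 as the headers, and requires every data row to have as many tab-separated cells as there are headers, which is why blank fields still need their tabs.

```python
def read_variant_file(text):
    """Parse variant-file text into (fileformat, headers, rows).

    str.splitlines() splits on LF, CR+LF and CR (and a few rarer separators),
    so any of the allowed newline conventions works.
    """
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected a fileformat row and a header row")
    first_cell = lines[0].split("\t")[0]
    if not first_cell.startswith("fileformat="):
        raise ValueError("row 1 must start with fileformat=")
    fileformat = first_cell[len("fileformat="):]
    headers = lines[1].split("\t")
    rows = []
    for number, line in enumerate(lines[2:], start=3):
        if not line:
            continue  # skip empty lines (an assumption, not a documented rule)
        cells = line.split("\t")
        if len(cells) != len(headers):
            raise ValueError(f"row {number}: {len(cells)} cells, expected {len(headers)}")
        rows.append(dict(zip(headers, cells)))  # blank fields stay ""
    return fileformat, headers, rows
```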

Problems? If your data has values that we don’t support, such as values for the “effect” field, let us know and we can support your values. In many fields, blank, “.”, “-”, -9 and -99 are allowed and interpreted as not used. In the fields indicated in the table, “NA” and “na” also count as blanks.
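A minimal sketch of this blank convention follows. The function name and the `na_counts_as_blank` flag are hypothetical; which fields accept “NA”/“na” is given per field in the table, so the caller decides.

```python
# Markers the page lists as "not used" in many fields.
BLANK_MARKERS = {"", ".", "-", "-9", "-99"}
# Extra markers that count as blank only in the fields that say so.
NA_MARKERS = {"NA", "na"}


def is_blank(value, na_counts_as_blank=False):
    """Return True if a cell should be treated as not used."""
    v = value.strip()
    if v in BLANK_MARKERS:
        return True
    return na_counts_as_blank and v in NA_MARKERS
```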

CNV: The variant table can also include large copy number variation (CNV) regions. We will add information here about the format when testing of that capability is ready for publication, but if you are interested in trying the CNV analysis, contact us, and we can give you the details and consider you a research collaborator.

The software uploads and analyzes variant files with ~20,000 variants in ~1 second. The variant file loading text box on the “Load or save patient” screen reports the results of file reading and any problems encountered, for example a stated gender that does not match the chromosomal gender.

| Field (column header) | Sample value | Required | Action | Comments |
|---|---|---|---|---|
| hgncSymbol | GBA | yes (all) | label | Multiple symbols separated by commas are supported. |
| geneNameLong | glucosidase, beta, acid | no | report | |
| chrPos (or specify the genome assembly in the header as HG19:chrPos, the default, or HG38:chrPos) | chr1:155206167 or 1:155206167 | yes (all) | compute | The chromosome number and position are displayed in the gene variants display, hyperlinked to the UCSC genome browser using the genome assembly indicated in the header (in the format shown at left). The chromosome number is also used to choose the inheritance model in the Gene Discovery display. In addition, unusual distributions of variants across the chromosomes are reported in the variant table processing text area. Entries may start with chr or with the number or letter of the chromosome. For chromosome designations, 23 or X, 24 or Y, and 25, M or MT are supported. Characters after the colon are used to display position information. |
| cSeqAnnotation | NM_014208:ex5 | yes, for some of the 8 fields beginning with this one | link | Text. If desired, this and the next 7 sequence annotations can be combined in this field, and multiple sequence annotations separated by commas are supported. If a DNA position and change is recognized, it is used to construct a ClinVar query. |
| cPosition | 38 | no | link | If this and the next 2 fields are included, ClinVar URLs for these variants are displayed instead of generic URLs for the gene. |
| cRef | A | no | link | If this field and the fields before and after it are included, ClinVar URLs for these variants are displayed instead of generic URLs for the gene. |
| cAlt | G | yes, if using zygosities with more than one non-wildtype form | link | If this and the previous 2 fields are included, ClinVar URLs for these variants are displayed instead of generic URLs for the gene. |
| pSeqAnnotation | NP_00100574.1 or p.E1149D or E1149D | no | link | If no DNA position and change is recognized, the protein change is used to construct a ClinVar query. |
| pPosition | 13 | no | report | |
| pRef | K | no | report | |
| pAlt | R | no | report | |
| rsid | rs12345678 | no | compute | The percentage of variants with rsID numbers is reported in the variant table processing text area. |
| zygProband (to set the identifier used within the software, use a header such as zygPaula, and Paula becomes the identifier) | Het | yes | compute | Zygosity of the proband. Accepts non-negative integers from 0-100, or case-insensitive text (het, heterozygous, hap, hom, homozygous, hemi, wt). Inputs treated as wt: unknown, ., -, none. Inputs of the form x/x (or the phased equivalents using a vertical bar) are treated as hom for nonzero x, and as wt when x is 0. Inputs of the form x/y can contribute to compound heterozygotes, even at the same locus. If the slash or vertical-bar forms are used, 1 (with no slash or vertical bar) is interpreted as hap (hemi). |
| zygMother (to set the identifier used within the software, use a header such as zygINDIV_35, and INDIV_35 becomes the identifier) | 50 | no; if no genomic data is available from the proband’s mother, leave this column blank but include the header | compute | Zygosity of the mother, as above. |
| zygFather (to set the identifier used within the software, use a header such as zygGeorge, and George becomes the identifier) | 0 | no; if no genomic data is available from the proband’s father, leave this column blank but include the header | compute | Zygosity of the father, as above. |
| (for individuals beyond the trio, the next additional zygosity column goes here) | | | | |
| effect | missense | yes (most) | compute | Recognized effect terms are listed at this link, though each group typically uses only a few of them, such as the core terms missense, frameshift and synonymous. The listing is periodically updated to include all SnpEff effect prediction terms; let us know if we are missing any. The terms are case-insensitive. If multiple terms are used (separated by a vertical bar, “&”, “,” or “/”), each effect term is considered and the one with the highest severity is chosen. |
| freq1 (to set the identifier used in the mini variant table, use a header such as freqExAC, and ExAC becomes the identifier) | 0.0015 | no | compute | Real number between 0 and 1; use the main frequency metric of your choice (e.g., 1000 Genomes). “NA” and “na” are interpreted as 0, in addition to the usual blank characters. |
| freq2 (to set the identifier used in the mini variant table, use a header such as freqAfrica, and Africa becomes the identifier) | 0.02 | no | compute | Real number between 0 and 1, as above. The frequency used can be chosen on the Set Variant Parameters screen. |
| homoShares | 0 | no | compute | Number of times this particular mutation has been seen in homozygous form in unaffected individuals (for example, at your lab); non-negative integer. |
| heteroShares | 3 | no | compute | Number of times this particular mutation has been seen in heterozygous form in unaffected individuals (for example, at your lab); non-negative integer. |
| omimNumber | 606463 | no | report | Six-digit number corresponding to the gene. |
| omimDiseaseNames | Gaucher disease | no | report | Multiple diseases can be strung together for display. |
| variantAccession | CM065215 | no | report | Variant accession number. |
| variantPathogenicity | DM or 5 | no | compute | Variant pathogenicity report. For HGMD variant pathogenicity, DM is treated as severity 5, DM? as 4, and DP as 3. ClinVar values from 2 to 5 and their verbal equivalents (capitalized or uncapitalized) are treated as follows: 2 or Benign as severity 1, 3 or Likely benign as severity 2, 4 or Likely pathogenic as severity 3, and 5 or Pathogenic as severity 5. These values override other variant severity determinations, though the other scoring is described in the mini variant table that is displayed for non-benign variants. |
| polyPhen | probably-damaging | no | compute | Either verbal (probably-damaging, possibly-damaging, benign, DP, FP, DFP, D, P, B) or numerical (real number between 0 and 1, with damaging values near 1), but not both. If two terms are present, the first one is used. |
| mutationTaster | 0.73 | no | compute | A, D, N, P (case-insensitive) or a real number between 0 and 1, with damaging values near 1. |
| sift | 0.68 | no | compute | Either verbal (D, T; case-insensitive) or numerical (real number between 0 and 1, with damaging values near 0; this can be configured so that values near 1 are treated as damaging), but not both. If two terms are present, the first one is used. |
| gerp | 5.34 | no | compute | Real number; higher numbers are more damaging. |
| grantham | 29 | no | compute | 0-215; higher numbers are more damaging. |
| phat | 0.82 | no | compute | Real number between 0 and 1; higher numbers are more damaging. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. |
| phast | 0.95 | no | compute | Real number between 0 and 1; higher numbers are more damaging. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. |
| phyloP | 0.75 | no | compute | Rankscore: real number between 0 and 1; higher numbers are more damaging. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. |
| strandBias | | no | | |
| knownSplice | | no | compute | Real number between -1 and 1 reflecting disruption of a known splice site; values near +1 are more damaging. |
| totDepthP | 105 | no | compute | Total read depth for the proband; non-negative integer. Blank or low values result in exclusion of the variant, but if this field is blank for all variants, the metric is ignored. |
| varDepthP | 74 | no | report | Variant read depth for the proband; non-negative integer. |
| qualP | 162 | no | compute | Read quality score for the proband; non-negative integer. Blank or low values result in exclusion of the variant, but if this field is blank for all variants, the metric is ignored. |
| totDepthM | 99 | no | report | Total read depth for the mother; non-negative integer. |
| varDepthM | 99 | no | report | Variant read depth for the mother; non-negative integer. |
| qualM | 99 | no | report | Read quality score for the mother; non-negative integer. |
| totDepthF | 250 | no | report | Total read depth for the father; non-negative integer. |
| varDepthF | 99 | no | report | Variant read depth for the father; non-negative integer. |
| qualF | 99 | no | report | Read quality score for the father; non-negative integer. |
| (for individuals beyond the trio, the next additional total depth column goes here) | | | | |
| (for individuals beyond the trio, the next additional variant depth column goes here) | | | | |
| (for individuals beyond the trio, the next additional quality column goes here) | | | | |
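For the chrPos field in the table, the assembly prefix in the header and the chromosome designations can be normalized as in this sketch. The function names are hypothetical, treating chromosome names case-insensitively is an assumption, and the position is returned as the raw text after the colon, matching the note that those characters are used for display.

```python
# Chromosome designations accepted for chrPos: 1-22, 23 or X, 24 or Y, 25, M or MT.
_ALIASES = {"23": "X", "24": "Y", "25": "M", "MT": "M"}
_VALID = {str(n) for n in range(1, 23)} | {"X", "Y", "M"}


def assembly_from_header(header):
    """Return the assembly prefix of the chrPos header, defaulting to HG19."""
    prefix, sep, _ = header.partition(":")
    return prefix if sep else "HG19"


def parse_chr_pos(value):
    """Split a chrPos cell such as 'chr1:155206167' into ('1', '155206167')."""
    chrom, _, position = value.strip().partition(":")
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    chrom = chrom.upper()  # case-insensitive matching is an assumption
    chrom = _ALIASES.get(chrom, chrom)
    if chrom not in _VALID:
        raise ValueError(f"unsupported chromosome in {value!r}")
    return chrom, position
```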
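The zygosity rules for zygProband (and the other zyg columns) can be sketched as a classifier. Several choices here are assumptions: the `genotype_style` flag stands in for however the software detects that a file uses x/y forms, a bare integer from 0-100 is returned unchanged because this page does not say how it is scored, and an empty cell returns None to mean "no data for this individual".

```python
import re

# Text terms documented for the zygosity fields, mapped to a canonical class.
_TEXT_TERMS = {
    "het": "het", "heterozygous": "het",
    "hom": "hom", "homozygous": "hom",
    "hap": "hap", "hemi": "hap",
    "wt": "wt", "unknown": "wt", ".": "wt", "-": "wt", "none": "wt",
}
# Unphased (x/y) or phased (x|y) genotype with integer alleles.
_GENOTYPE = re.compile(r"(\d+)[/|](\d+)")


def classify_zygosity(raw, genotype_style=False):
    """Return "het", "hom", "hap", "wt", a bare 0-100 integer, or None."""
    value = raw.strip().lower()
    if not value:
        return None  # empty cell: no data for this individual (assumption)
    if value in _TEXT_TERMS:
        return _TEXT_TERMS[value]
    match = _GENOTYPE.fullmatch(value)
    if match:
        a, b = (int(allele) for allele in match.groups())
        if a == b:
            return "hom" if a != 0 else "wt"  # x/x: hom for nonzero x, else wt
        # Different alleles (e.g. 0/1, or 1/2, which can count toward a
        # compound heterozygote even at one locus) are labeled het here.
        return "het"
    if value.isdigit():
        number = int(value)
        if genotype_style and number == 1:
            return "hap"  # bare 1 in a file using x/y forms means hap (hemi)
        if number <= 100:
            return number
    raise ValueError(f"unrecognized zygosity: {raw!r}")
```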
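For the effect field, choosing the most severe of several terms can be sketched as follows. The severity ranks are hypothetical placeholders for three of the core terms only; the software's actual ranking covers all recognized terms and is not given on this page.

```python
import re

# Hypothetical severity ranks, for illustration only.
SEVERITY = {"synonymous": 1, "missense": 3, "frameshift": 5}


def most_severe_effect(cell):
    """Return the highest-ranked recognized term in an effect cell, or None."""
    # Terms may be separated by a vertical bar, "&", "," or "/".
    terms = [t.strip().lower() for t in re.split(r"[|&,/]", cell)]
    known = [t for t in terms if t in SEVERITY]
    if not known:
        return None
    return max(known, key=lambda term: SEVERITY[term])
```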
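The variantPathogenicity mapping in the table can be written as a lookup. This sketch encodes only the values listed there and returns None for anything else; `pathogenicity_severity` is a hypothetical name.

```python
# HGMD classes and their severities, as listed for variantPathogenicity.
_HGMD = {"DM": 5, "DM?": 4, "DP": 3}
# ClinVar values 2-5 and their verbal equivalents (matched case-insensitively).
_CLINVAR = {
    "2": 1, "benign": 1,
    "3": 2, "likely benign": 2,
    "4": 3, "likely pathogenic": 3,
    "5": 5, "pathogenic": 5,
}


def pathogenicity_severity(cell):
    """Return the severity for a variantPathogenicity cell, or None."""
    value = cell.strip()
    if value in _HGMD:
        return _HGMD[value]
    return _CLINVAR.get(value.lower())
```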
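Numerical polyPhen, mutationTaster and sift scores point in different directions: damaging values are near 1 for polyPhen and mutationTaster but near 0 for sift by default. The sketch below orients a numerical score so that values near 1 are damaging and uses only the first term of a cell. The separators used to split terms are an assumption, since this page does not say how two terms are delimited, and verbal terms are returned as None rather than interpreted.

```python
import re


def first_term(cell):
    """Return the first term of a cell (splitting on ,;| or whitespace is assumed)."""
    parts = [p for p in re.split(r"[,;|\s]+", cell.strip()) if p]
    return parts[0] if parts else ""


def damaging_score(cell, damaging_near_one):
    """Return a 0-1 score oriented so values near 1 are damaging, or None if verbal."""
    term = first_term(cell)
    try:
        score = float(term)
    except ValueError:
        return None  # verbal term such as 'probably-damaging', 'D' or 'T'
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {term}")
    # sift (default configuration) uses damaging values near 0, so flip it.
    return score if damaging_near_one else 1.0 - score
```

For example, `damaging_score("0.68", damaging_near_one=False)` treats the sample sift value as roughly 0.32 on the "near 1 is damaging" scale.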
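The rule for totDepthP and qualP (blank or low values exclude a variant, but a column that is blank for every variant is ignored) can be sketched as a per-row keep/exclude flag. `keep_flags` and its `minimum` threshold are hypothetical; this page does not state the cutoffs the software uses. Rows are dicts keyed by header.

```python
# Markers treated as blank, following the page's general blank convention.
BLANKS = {"", ".", "-", "-9", "-99"}


def keep_flags(rows, field, minimum):
    """Return True for rows to keep, based on one read-depth or quality field."""
    values = [row.get(field, "").strip() for row in rows]
    if all(v in BLANKS for v in values):
        return [True] * len(rows)  # blank for every variant: metric ignored
    # Otherwise a blank or below-minimum value excludes the variant.
    return [v not in BLANKS and int(v) >= minimum for v in values]
```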