The SimulConsult Genome-Phenome Analyzer imports genomic data from an annotated variant file in plain text format. The details of the file are specified on this page, and the variant file in our tiny demo (choose trio or beyond trio version) can be examined as a guide. The file is most conveniently examined in a spreadsheet program, so it is best to use a file extension of .txt or .tsv. Values in the file are tab-separated, which lets a spreadsheet program display the data as human-readable columns and “cells”. Be careful when saving the file from a spreadsheet, however, because Excel will convert the (now deprecated) HGNC gene symbol SEPT9 or the zygosity 1/1 to a date. The free spreadsheet LibreOffice Calc is used by many genomics groups because it performs fewer such conversions than Excel does. Newline characters separate the lines of the file; any of the newline conventions (LF, CR+LF, or CR) is allowed if used consistently throughout the file.
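As a guard against the spreadsheet date conversion described above, a file can be scanned for date-like cells before upload. The following is a minimal sketch, not part of the SimulConsult software. It assumes that a converted value is saved as text of the form day-month (for example `9-Sep` for SEPT9 or `1-Jan` for 1/1), which is the usual English-locale Excel rendering; other locales and date formats would need a different pattern. The function name `find_date_like_cells` is hypothetical.

```python
import re

# Assumption: Excel (English locale) saves converted cells as "9-Sep", "1-Jan", etc.
_DATE_LIKE = re.compile(
    r"^\d{1,2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def find_date_like_cells(text):
    """Return (row, column, value) for every tab-separated cell that looks like a date.

    Row and column numbers are 1-based, matching what a spreadsheet displays.
    """
    hits = []
    for row_number, line in enumerate(text.splitlines(), start=1):
        for column_number, cell in enumerate(line.split("\t"), start=1):
            if _DATE_LIKE.match(cell.strip()):
                hits.append((row_number, column_number, cell))
    return hits
```

Any hit is only a hint that a gene symbol or zygosity may have been altered; the original value has to be restored from the source data.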
The structure of the variant file is as follows:
Row 1: the first “cell” should be:
fileformat=generic
or some other fileformat assigned specifically to your group. This is followed by newline character(s) as discussed above.
Row 2: for a trio this has the 43 column headers listed below. The column headers are tab-separated, with newline character(s) at the end. All headers must be included, even if the column is left blank (e.g., when not using a particular conservation score) or an individual in the proband trio is missing (e.g., for a proband with no parental data, the parental columns are blank except for the headers). If there are any individuals beyond the proband trio, new columns are added after those for the proband’s father. For example, if the proband, father and proband’s sister are included, there would be 4 zygosity column headers: zygProband (with data in rows below), zygMother (header without data), zygFather (with data in rows below), and zygSister (with data in rows below). Additionally, for each individual beyond the trio, 3 column headers and the data in rows below would be added (total depth, variant depth and quality).
Rows of variant data follow the headers as row 3 and so on (tab-separated, newline at end). Many of the fields are not required and can be left blank if you don’t plan to use the related functionality, but even if a field is blank, you need to include the tabs in the variant rows and the headers in row 2. Examining the sample variant file in our tiny demo (choose trio or beyond trio version) illustrates how several columns can be blank but headers are used anyway.
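The three-part layout above (format line, header row, variant rows) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the importer's actual logic. The function names `build_headers` and `parse_variant_text` are hypothetical, the depth and quality header names for extra individuals (such as `totDepthSister`) are inferred from the trio naming pattern, and the sketch assumes that each extra individual's three depth and quality columns are appended as a group after `qualFather`.

```python
# The 43 trio headers, in the order of the field table on this page.
TRIO_HEADERS = [
    "hgncSymbol", "geneNameLong", "chrPos",
    "cSeqAnnotation", "cPosition", "cRef", "cAlt",
    "pSeqAnnotation", "pPosition", "pRef", "pAlt",
    "rsid", "zygProband", "zygMother", "zygFather",
    "effect", "freq1", "freq2", "homoShares", "heteroShares",
    "omimNumber", "omimDiseaseNames", "variantAccession", "variantPathogenicity",
    "polyPhen", "mutationTaster", "sift", "gerp", "grantham",
    "phat", "phast", "phyloP", "strandBias", "knownSplice",
    "totDepthProband", "varDepthProband", "qualProband",
    "totDepthMother", "varDepthMother", "qualMother",
    "totDepthFather", "varDepthFather", "qualFather",
]


def build_headers(extra_individuals=()):
    """Return the row-2 headers, adding columns for individuals beyond the trio."""
    headers = list(TRIO_HEADERS)
    insert_at = headers.index("zygFather") + 1
    # Extra zygosity columns go directly after zygFather, in the order given.
    for offset, name in enumerate(extra_individuals):
        headers.insert(insert_at + offset, f"zyg{name}")
    # Assumed layout: three depth/quality columns per extra individual at the end.
    for name in extra_individuals:
        headers += [f"totDepth{name}", f"varDepth{name}", f"qual{name}"]
    return headers


def parse_variant_text(text):
    """Split file text into (fileformat, headers, rows), with each row as a dict."""
    lines = text.splitlines()  # accepts LF, CR+LF and CR line endings
    if len(lines) < 2:
        raise ValueError("expected a fileformat line and a header line")
    first_cell = lines[0].split("\t")[0]
    if not first_cell.startswith("fileformat="):
        raise ValueError("row 1 must start with fileformat=...")
    headers = lines[1].split("\t")
    rows = []
    for row_number, line in enumerate(lines[2:], start=3):
        if line == "":
            continue  # tolerate stray empty lines; blank cells still carry tabs
        cells = line.split("\t")
        if len(cells) != len(headers):
            raise ValueError(
                f"row {row_number} has {len(cells)} cells, expected {len(headers)}"
            )
        rows.append(dict(zip(headers, cells)))
    return first_cell.split("=", 1)[1], headers, rows
```

Calling `build_headers(["Sister"])` yields 47 headers, and a row with too few tabs is rejected because its cell count no longer matches the header count.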
CNV: The variant table can also include large copy number variation (CNV) regions. Three of the fields below are different: hgncSymbol has a comma-separated list of the genes in the region (e.g., GOLGA6A,RN7SL429P), chrPos has a chromosomal range (e.g., 15:74363000-74365200), and effect has DEL or DUP.
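To make the three CNV-specific fields concrete, here is a small sketch that recognizes a CNV row and splits it into its parts. It is an assumption-laden illustration, not the importer's own parser. The function name `parse_cnv_row` is hypothetical, and the pattern accepts both range spellings that appear on this page (`15:74363000-74365200` and `1:155200000-1:155444444`), with an optional `chr` prefix.

```python
import re

# Chromosome, start and end; the chromosome may optionally be repeated before the end.
_CNV_RANGE = re.compile(
    r"^(?:chr)?(?P<chrom>\d{1,2}|X|Y|M|MT):(?P<start>\d+)"
    r"-(?:(?:chr)?(?P=chrom):)?(?P<end>\d+)$",
    re.IGNORECASE,
)


def parse_cnv_row(hgnc_symbol, chr_pos, effect):
    """Return a dict describing a CNV row, or None if the row is not a CNV."""
    effect_term = effect.strip().upper()  # effect terms are case-insensitive
    if effect_term not in ("DEL", "DUP"):
        return None
    match = _CNV_RANGE.match(chr_pos.strip())
    if match is None:
        return None
    genes = [gene.strip() for gene in hgnc_symbol.split(",") if gene.strip()]
    return {
        "genes": genes,
        "chrom": match.group("chrom"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "effect": effect_term,
    }
```

A row with a single position or with an effect other than DEL or DUP returns `None`, so ordinary variants pass through untouched.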
Problems? If your data has values that we don’t support, such as values for the “effect” field, let us know, and we can add support for them. In many instances blank, “.”, “-”, -9 and -99 are allowed and are interpreted as not used. In the fields indicated, “NA” and “na” also count as blanks.
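When preparing a file, it can help to normalize these placeholder values in one place. The sketch below shows one way to test for them. The helper name `is_blank` and its `na_counts_as_blank` flag are hypothetical, and the set of tokens is taken only from the paragraph above. Individual fields can differ (zygosity, for example, has its own rules for “-” and “.”), so this is a starting point rather than a statement of how the software reads every column.

```python
# Placeholder tokens listed above as "not used" in many fields.
BLANK_TOKENS = {"", ".", "-", "-9", "-99"}
# Additional tokens that count as blank only in the fields that say so.
NA_TOKENS = {"NA", "na"}


def is_blank(cell, na_counts_as_blank=False):
    """Return True if a cell holds one of the 'not used' placeholder values."""
    token = cell.strip()
    if token in BLANK_TOKENS:
        return True
    return na_counts_as_blank and token in NA_TOKENS
```

For a field such as freq1, which accepts “NA”, the call would be `is_blank(cell, na_counts_as_blank=True)`.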
The software uploads and analyzes variant files with ~20,000 variants in ~1 second. The variant file loading text box on the “Load or save patient” screen of the software will report the results of file reading or problems encountered, for example stated gender not matching chromosomal gender.
Field (column header) | Sample value | Required | Action | Comments | |
---|---|---|---|---|---|
hgncSymbol | GBA or GOLGA6A,RN7SL429P | yes (all) | label | Multiple symbols separated by commas are supported. For CNV analysis, this field should contain all gene symbols in the CNV. HGNC symbols for genes with known human phenotype that have been replaced since 1 January 2010 are recognized and tied to the current HGNC symbol. | |
geneNameLong | glucosidase, beta, acid | no | report | ||
chrPos (or specify the genome assembly using HG19:chrPos (the default) or HG38:chrPos) | chr1:155206167 or 1:155206167, or for CNV, 1:155200000-1:155444444 | yes (all) | compute | The chromosome number and position are displayed in the gene variants display, hyperlinked to the UCSC genome browser, using the genome assembly indicated in the header using the format shown at left. The chromosome number is also used in choosing the inheritance model used in the Gene Discovery display. Also, unusual distributions of variants over the chromosomes are reported in the variant table processing text area. Entries may start with chr or the number or letter for the chromosome. For chromosome designations, 23 or X, 24 or Y, 25, M or MT are supported. Characters past a colon are used to display position information. | |
cSeqAnnotation | NM_014208:ex5 | some of the 8 sequence columns should be used | link | Text (if desired, can combine this + next 7 sequence annotations in this field, and multiple sequence annotations separated by commas are supported). If a DNA position and change is recognized, it is used to construct a ClinVar query. | |
cPosition | 38 | some of the 8 sequence columns should be used | link | If this and the next 2 fields are included, ClinVar URLs for these variants are displayed instead of generic URLs for the gene | |
cRef | A | some of the 8 sequence columns should be used | link | If this and the fields before and after are included, ClinVar URLs for these variants are displayed instead of generic URLs for the gene | |
cAlt | G | some of the 8 sequence columns should be used | link | If this and the previous 2 fields are included, ClinVar URLs for these variants are displayed instead of generic URLs for the gene | |
pSeqAnnotation | NP_00100574.1 or p.E1149D or E1149D | some of the 8 sequence columns should be used | link | If no DNA position and change is recognized, the protein change is used to construct a ClinVar query. | |
pPosition | 13 | some of the 8 sequence columns should be used | report | ||
pRef | K | some of the 8 sequence columns should be used | report | ||
pAlt | R | some of the 8 sequence columns should be used | report | ||
rsid | rs12345678 | no | compute | The percent of variants with rsID numbers is reported in the variant table processing text area. | |
zygProband (to use identifiers within the software, use terms such as zygPaula here and Paula will be used as the identifier within the software) | Het | yes | compute | Zygosity of proband: Accepts non-negative integers from 0-100, or case-insensitive text (het, heterozygous, hap, hom, homozygous, hemi, hemizygous, wt). Other inputs treated as wt and displayed as wt: -, ., and none. Other inputs treated as wt but displayed as “Unk” in the mini variant tables in the software: blank (empty string), unknown, unk, -9, -99, na and n/a. Inputs of the form x/x (or the phased equivalents using |) are treated as hom for nonzero x; otherwise wt. Inputs of the form x/y can contribute to compound heterozygotes, even at the same locus. If the / or | forms are used, 1 (with no pipe or slash) is interpreted as hap (hemi). | |
zygMother (to use identifiers within the software, use terms such as zygINDIV_35 here and INDIV_35 will be used as the identifier within the software) | 50 | no; if no genomic data is available from the proband’s mother, leave this column blank but keep the header | compute | Zygosity of mother, as above | |
zygFather (to use identifiers within the software, use terms such as zygGeorge here and George will be used as the identifier within the software) | 0 | no; if no genomic data is available from the proband’s father, leave this column blank but keep the header | compute | Zygosity of father, as above | |
(for beyond the trio, the next additional zygosity column goes here) | |||||
effect | missense, or for CNV, DEL or DUP | yes (most) | compute | Effect terms that are recognized are listed at this link, though each group typically uses only a few of these, such as the core terms missense, frameshift, and synonymous. The listing here is periodically updated to include all SnpEff effect prediction terms, but let us know if we are missing any. The terms are case-insensitive. If multiple terms are used (separated by “|” or “&” or “,” or “/”), each effect term is considered and the one with the highest severity is chosen. | |
freq1 (to use identifiers within the software, use terms such as freqExAC here and ExAC will be used as the identifier in the mini variant table) | .0015 | yes | compute | Real number between 0 and 1. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. | |
freq2 (to use identifiers within the software, use terms such as freqAfrica here and Africa will be used as the identifier in the mini variant table) | .02 | no | compute | Real number between 0 and 1. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. | |
homoShares | 0 | no | compute | Number of times this particular mutation has been seen in homozygous form in unaffected individuals (for example, at your lab); non-negative integer. | |
heteroShares | 3 | no | compute | Number of times this particular mutation has been seen in heterozygous form in unaffected individuals (for example, at your lab); non-negative integer. | |
omimNumber | 606463 | no | report | Six digit number corresponding to the gene | |
omimDiseaseNames | Gaucher disease | no | report | Multiple diseases can be strung together for display | |
variantAccession | CM065215 | no | report | Variant accession number | |
variantPathogenicity | DM or 5 | no | compute | Variant pathogenicity report. For HGMD variant pathogenicity, DM is treated as severity 5, DM? as 4, and DP as 3. ClinVar values from 2 to 5 and their verbal equivalents (capitalized or uncapitalized) are treated as follows: 2 or Benign are severity 1, 3 or Likely benign are severity 2, 4 or Likely pathogenic are severity 3, and 5 or Pathogenic are severity 5. These values override other variant severity determinations, though the other scoring is described in the mini variant table that is displayed for non-benign variants. | |
polyPhen | probably-damaging | no | compute | Either verbal (probably-damaging, possibly-damaging, benign, DP, FP, DFP, D, P, B), or numerical (real number between 0 and 1, with damaging values near 1) but not both. If two terms are present the first one is used. | |
mutationTaster | 0.73 | no | compute | A, D, N, P (case-insensitive) or real number between 0 and 1, with damaging values near 1. | |
sift | 0.68 | no | compute | Either verbal (D, T (case-insensitive)) or numerical (real number between 0 and 1, with damaging values near 0 (can be configured for customers to use damaging values near 1)) but not both. If two terms are present the first one is used. | |
gerp | 5.34 | no | compute | Real number, with higher numbers more damaging. | |
grantham | 29 | no | compute | 0-215, with higher numbers more damaging. | |
phat | 0.82 | no | compute | Real number between 0 and 1, with higher numbers more damaging. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. | |
phast | 0.95 | no | compute | Real number between 0 and 1, with higher numbers more damaging. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. | |
phyloP | 0.75 | no | compute | Rankscore real number between 0 and 1, with higher numbers more damaging. “NA” and “na” are interpreted as 0, in addition to the usual blank characters. | |
strandBias | | no | | | |
knownSplice | | no | compute | Real number between -1 and 1 reflecting disruption of a known splice site, with values near +1 being more damaging. | |
totDepthProband | 105 | no | compute | Total read depth for proband; non-negative integer. Blank or low values result in exclusion of the variant, but blanks for all variants result in this metric being ignored. | |
varDepthProband | 74 | no | report | Variant read depth for proband; non-negative integer. | |
qualProband | 162 | no | compute | Read quality score for proband; non-negative integer. Blank or low values result in exclusion of the variant, but blanks for all variants result in this metric being ignored. | |
totDepthMother | 99 | no | report | Total read depth for mother; non-negative integer. | |
varDepthMother | 99 | no | report | Variant read depth for mother; non-negative integer. | |
qualMother | 99 | no | report | Read quality score for mother; non-negative integer. | |
totDepthFather | 250 | no | report | Total read depth for father; non-negative integer. | |
varDepthFather | 99 | no | report | Variant read depth for father; non-negative integer. | |
qualFather | 99 | no | report | Read quality score for father; non-negative integer. | |
(for beyond the trio, the next additional total depth column goes here) | |||||
(for beyond the trio, the next additional variant depth column goes here) | |||||
(for beyond the trio, the next additional quality column goes here) |
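
The severity mapping described for the variantPathogenicity field can be written as a lookup table, which is handy for checking your own annotations before upload. This is a sketch of the mapping as stated in the table, not the software's code. The function name `pathogenicity_severity` is hypothetical, HGMD codes are assumed to be upper case as written, and ClinVar text is matched case-insensitively, which is slightly more permissive than “capitalized or uncapitalized”.

```python
# HGMD codes and their severities, as listed for variantPathogenicity.
HGMD_SEVERITY = {"DM": 5, "DM?": 4, "DP": 3}

# ClinVar numeric values and verbal equivalents (keys are lower case).
CLINVAR_SEVERITY = {
    "2": 1, "benign": 1,
    "3": 2, "likely benign": 2,
    "4": 3, "likely pathogenic": 3,
    "5": 5, "pathogenic": 5,
}


def pathogenicity_severity(cell):
    """Return the severity for a variantPathogenicity cell, or None if unrecognized."""
    token = cell.strip()
    if token in HGMD_SEVERITY:
        return HGMD_SEVERITY[token]
    return CLINVAR_SEVERITY.get(token.lower())
```

A return value of `None` means the cell did not match any listed value, in which case the other scoring columns would be the only evidence available for that variant.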