Small Variants
Small Variant Calling and Filtering
The DRAGEN TSO 500 ctDNA Analysis Software supports calling SNVs, indels, MNVs, and delins from cfDNA samples by using mapped and aligned DNA reads from a plasma sample as input.
Variants are detected through both column-wise pileup analysis and local de novo assembly of haplotypes. De novo haplotype assembly allows detection of much larger insertions and deletions than column-wise pileup analysis alone. Insertions and deletions called by the TSO 500 ctDNA analysis software have no size limit, but the extent of performance testing differs by length; see the Performance Testing page for more details.
To call variants through local de novo assembly of haplotypes in active regions, haplotypes are first generated with a de Bruijn graph. The likelihood of a read supporting a haplotype is calculated using a pair hidden Markov model (pair-HMM). The Somatic Score (SQ) is calculated as the joint posterior probability that a variant is present in the sample. For each variant candidate, background noise at the same site is taken into account using a systematic noise file: a p-value is calculated from the observed variant depth, the total depth, and the systematic noise using a binomial distribution, and is then converted to a Variant Quality Score (AQ).
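The noise-aware test described above can be illustrated with a short sketch. It is not DRAGEN's implementation: it assumes the noise file supplies a per-site background error rate (`noise_rate`), that the p-value is a one-sided (upper-tail) binomial test of the variant depth against that rate, and that AQ is Phred-scaled as -10·log10(p) and capped at a hypothetical value of 100. None of these details are specified on this page.

```python
import math


def log_upper_tail(alt_depth: int, total_depth: int, noise_rate: float) -> float:
    """Natural log of P(X >= alt_depth) for X ~ Binomial(total_depth, noise_rate).

    The tail is summed in log space so that very small p-values at high
    cfDNA depth do not underflow to zero.
    """
    if alt_depth <= 0:
        return 0.0  # P(X >= 0) = 1
    if alt_depth > total_depth or noise_rate <= 0.0:
        return -math.inf  # zero probability under the noise model
    if noise_rate >= 1.0:
        return 0.0
    log_p = math.log(noise_rate)
    log_q = math.log1p(-noise_rate)
    log_n_fact = math.lgamma(total_depth + 1)
    log_terms = [
        log_n_fact - math.lgamma(k + 1) - math.lgamma(total_depth - k + 1)
        + k * log_p + (total_depth - k) * log_q
        for k in range(alt_depth, total_depth + 1)
    ]
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(math.fsum(math.exp(t - peak) for t in log_terms))


def variant_quality(alt_depth: int, total_depth: int, noise_rate: float,
                    cap: float = 100.0) -> float:
    """ASSUMED Phred scaling: AQ = -10 * log10(p), clamped to [0, cap]."""
    log_p = log_upper_tail(alt_depth, total_depth, noise_rate)
    if log_p == -math.inf:
        return cap
    return max(0.0, min(cap, -10.0 * log_p / math.log(10)))
```

With 10 alternate reads out of 2,000, a site with an assumed background rate of 1e-3 receives a much lower AQ than a site with a rate of 1e-4. Under these assumptions, this is how the same evidence is penalized at noisier positions.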
Variants are called if SQ >= 2 and AQ >= 20 at sites present in the Catalogue of Somatic Mutations in Cancer (COSMIC) with a count > 50 (hotspots), or if SQ >= 2 and AQ >= 60 at all remaining sites (nonhotspots).
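The calling rule above reduces to a two-tier threshold. The following sketch encodes it directly. How DRAGEN looks up the COSMIC count for a site is not described here, so `cosmic_count` is a hypothetical input, with `None` standing for "not in COSMIC".

```python
from typing import Optional

MIN_SQ = 2.0
MIN_AQ_HOTSPOT = 20.0
MIN_AQ_NONHOTSPOT = 60.0
HOTSPOT_COSMIC_COUNT = 50  # hotspot = COSMIC count strictly greater than this


def is_hotspot(cosmic_count: Optional[int]) -> bool:
    """A site is a hotspot if it is in COSMIC with count > 50; None means not in COSMIC."""
    return cosmic_count is not None and cosmic_count > HOTSPOT_COSMIC_COUNT


def is_called(sq: float, aq: float, cosmic_count: Optional[int]) -> bool:
    """Apply the SQ/AQ calling thresholds, using the stricter AQ cutoff for nonhotspots."""
    min_aq = MIN_AQ_HOTSPOT if is_hotspot(cosmic_count) else MIN_AQ_NONHOTSPOT
    return sq >= MIN_SQ and aq >= min_aq
```

A candidate with AQ 25 is therefore called at a hotspot but not at a nonhotspot, even though its SQ is the same.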
In addition, DRAGEN uses the de novo assembly to detect SNVs, insertions, and deletions that are co-phased, that is, located on the same haplotype. Co-phased variants within a 15 bp window are then reassembled into complex variants (MNVs and delins).
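The following is a minimal sketch of how co-phased variants could be combined into one complex allele. It makes several assumptions that this page does not confirm: the input variants are already known to lie on one haplotype, they do not overlap, and "within a window of 15 bp" means that a group spans at most 15 bp from its first variant's position. `reference` is a hypothetical plain string holding the reference bases.

```python
from dataclasses import dataclass
from typing import List

WINDOW_BP = 15


@dataclass(frozen=True)
class Variant:
    pos: int  # 1-based position of the first REF base (VCF convention)
    ref: str
    alt: str


def group_cophased(variants: List[Variant], window: int = WINDOW_BP) -> List[List[Variant]]:
    """Group variants from ONE haplotype; a group spans at most `window` bp from its first variant."""
    groups: List[List[Variant]] = []
    for v in sorted(variants, key=lambda x: x.pos):
        if groups and v.pos - groups[-1][0].pos <= window:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def merge_group(group: List[Variant], reference: str) -> Variant:
    """Rewrite a group of non-overlapping variants as one complex REF/ALT allele.

    `reference` is a string where reference[i] is the base at 1-based position i + 1.
    """
    start = group[0].pos
    end = max(v.pos + len(v.ref) - 1 for v in group)  # last REF base covered
    alt_parts = []
    cursor = start  # next reference position not yet copied into ALT
    for v in group:
        alt_parts.append(reference[cursor - 1:v.pos - 1])  # unchanged bases between variants
        alt_parts.append(v.alt)
        cursor = v.pos + len(v.ref)
    alt_parts.append(reference[cursor - 1:end])
    return Variant(start, reference[start - 1:end], "".join(alt_parts))
```

For example, against the reference `ACGTACGTACGT`, the SNVs C>T at position 2 and T>G at position 4 merge into the MNV CGT>TGG at position 2. A deletion CG>C at position 2 combined with A>T at position 5 becomes the delins CGTA>CTT.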
The pipeline makes no ploidy assumptions, enabling detection of low-frequency alleles.
DRAGEN small variant calling includes the following steps:
Detects regions with sufficient read coverage (callable regions).
Detects regions where the reads deviate from the reference and there is a possibility of a germline or somatic call (active regions).
Assembles de novo haplotypes from reads using a de Bruijn graph (haplotype assembly).
Extracts possible somatic or germline calls (events) from column wise pileup analysis.
Calibrates read base qualities to account for sample-specific noise.
Computes read likelihoods for each read/haplotype pair.
Performs variant calling by summing the genotype probabilities across all read/haplotype pairs.
Performs additional filtering to improve variant calling accuracy (see Filter Status).
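The first step in the list, detecting callable regions, amounts to turning per-position read depth into intervals. This page does not give DRAGEN's criteria, so the sketch below assumes a simple depth cutoff. `min_depth` is a hypothetical parameter, not a documented setting.

```python
from typing import List, Optional, Tuple


def callable_regions(depths: List[int], start: int, min_depth: int) -> List[Tuple[int, int]]:
    """Return half-open [begin, end) intervals where depth >= min_depth.

    depths[i] is the read depth at position start + i; min_depth is a
    hypothetical threshold, not a documented DRAGEN setting.
    """
    regions: List[Tuple[int, int]] = []
    begin: Optional[int] = None
    for offset, depth in enumerate(depths):
        pos = start + offset
        if depth >= min_depth and begin is None:
            begin = pos  # entering a callable run
        elif depth < min_depth and begin is not None:
            regions.append((begin, pos))  # leaving a callable run
            begin = None
    if begin is not None:
        regions.append((begin, start + len(depths)))  # run reaches the end of the input
    return regions
```

Half-open intervals match the BED convention, so the output can be compared directly with target intervals such as those in a manifest.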
Systematic Noise File
The DRAGEN TSO 500 ctDNA Analysis Software uses a systematic noise file to improve variant calling accuracy. The file indicates the statistical probability of noise at specific positions in the genome. Illumina has constructed the noise file using 60 normal cfDNA libraries. Regions where noise is common (eg, difficult-to-map regions) have higher noise values. The small variant caller penalizes calls in those regions to reduce the probability of false positive calls.

Germline, Somatic and Clonal Hematopoiesis (CH) tagging
The Tumor Mutational Burden (TMB) module of the DRAGEN TSO 500 ctDNA Analysis Software predicts whether a small variant is of germline or somatic origin and whether the variant is associated with clonal hematopoiesis (CH). The results are output in the TMB Trace TSV and Small Variant VCF files.
Please review the TMB algorithm page for more details.
Variant statuses (somatic, germline, or CH) are predictions intended for TMB calculation. Use caution when using them for other purposes, because their performance has not been tested outside of the TMB algorithm.
Outputs
The DRAGEN TSO 500 ctDNA Analysis Software produces several files with small variant calling results, including:
Combined Variant Output File: {SampleID}_CombinedVariantOutput.tsv
Small Variant VCF: {SampleID}_hard-filtered.vcf
Small Variant Genome VCF: {SAMPLE_ID}_hard-filtered.gvcf.gz
Small Variant Annotated JSON: {SAMPLE_ID}_SmallVariants_Annotated.json.gz
Combined Variant Output File
File name: {SampleID}_CombinedVariantOutput.tsv
All variants with the FILTER field marked as PASS in the Small Variant Genome VCF are present in the Combined Variant Output File.
Gene and transcript information is only present for variants belonging to canonical transcripts that are within the Gene Allow List–Small Variants.
The Combined Variant Output File reports small variants with blank fields in the following situations:
The variant has been matched to a canonical RefSeq transcript on an overlapping gene not targeted by TruSight Oncology 500 ctDNA.
The variant is located in a region designated iSNP, indel, or Flanking in the TST500_Manifest.bed file in the Resources folder.
Small Variant VCF
File name: {SampleID}_hard-filtered.vcf
The Small Variant VCF file outputs all small variant calling results.
MNVs and Phased Variants
The Small Variant VCF contains both phased variants and all other small variants. This merged VCF includes the header sections from both the phased variant (complex) VCF and the small variant VCF. Variants that are detected both as part of a phased variant and as individual small variants are reported only as phased variants.
Germline Status
The Small Variant VCF file contains predicted germline, somatic, and clonal hematopoiesis (CH) variants, which can be further filtered using the GermlineStatus key in the INFO field. See Germline, Somatic and Clonal Hematopoiesis (CH) tagging for more details.
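A minimal sketch of filtering the Small Variant VCF on the GermlineStatus INFO key is shown below. It relies only on the standard VCF layout, in which INFO is the eighth tab-separated column. The allowed GermlineStatus values are not listed on this page, so the value passed as `wanted` is a placeholder that you must replace with a value from your own output.

```python
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key/value pairs; flag keys map to ''."""
    fields: Dict[str, str] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def filter_by_germline_status(lines: Iterable[str], wanted: str) -> Iterator[str]:
    """Yield header lines plus records whose INFO GermlineStatus equals `wanted`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        columns = line.rstrip("\n").split("\t")
        if parse_info(columns[7]).get("GermlineStatus") == wanted:
            yield line
```

Because header lines are passed through, the output is still a valid VCF. For example, `out.writelines(filter_by_germline_status(fh, "<status value>"))` writes a subset file when `fh` is an open {SampleID}_hard-filtered.vcf.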
Filter Status
Variants can be filtered down using different tags assigned in the field FILTER as described in the table below.
ALT | FILTER | Description
. | PASS | WT (wild type).
., A, C, G, etc.¹ | low_depth | Reference positions and non-passing variants with coverage below 1000X. For variant calls, low_depth is not applied when a position has a PASS filter.
A, C, G, etc.¹ | PASS | PASS variants.
A, C, G, etc.¹ | weak_evidence | Filtered variant candidate with low SQ score (< 2).
A, C, G, etc.¹ | excluded_regions² | Position with high background noise. Not available for variant detection.
A, C, G, etc.¹ | systematic_noise | Filtered variant candidate with low AQ score (< 20 for hotspots, < 60 for nonhotspots).
A, C, G, etc.¹ | mapping_quality | Filtered variant candidate with low median mapping quality (< 30).
A, C, G, etc.¹ | read_position | Filtered variant candidate showing positional bias, with supporting reads clustered at fragment ends.
A, C, G, etc.¹ | multiallelic | Filtered if there are two or more ALT alleles at this location.
A, C, G, etc.¹ | low_frac_info_reads | Filtered if the fraction of informative reads is low (< 0.5).
¹ "etc" refers to other variant types not mentioned in the table.
² This is a static list of regions compiled by Illumina. Email Illumina Technical Support for more information.
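The numeric filters in the table can be expressed as a single tag-assignment function. This sketch uses only the thresholds listed above. It omits excluded_regions and read_position, which depend on a region list and fragment-position statistics that are not modeled here. The tag order, and the assumption that a candidate failing several criteria carries all of the matching tags, are illustrative rather than confirmed behavior.

```python
from typing import List


def assign_filters(sq: float, aq: float, hotspot: bool, median_mapq: float,
                   frac_informative: float, n_alt_alleles: int, depth: int) -> List[str]:
    """Return FILTER tags for a variant candidate using the thresholds in the table.

    excluded_regions and read_position are omitted because they need a region
    list and fragment-position statistics that this sketch does not model.
    """
    tags: List[str] = []
    if sq < 2:
        tags.append("weak_evidence")
    if aq < (20 if hotspot else 60):
        tags.append("systematic_noise")
    if median_mapq < 30:
        tags.append("mapping_quality")
    if n_alt_alleles >= 2:
        tags.append("multiallelic")
    if frac_informative < 0.5:
        tags.append("low_frac_info_reads")
    if not tags:
        return ["PASS"]  # low_depth is never added to a PASS call
    if depth < 1000:
        tags.append("low_depth")  # only non-passing candidates get low_depth
    return tags
```

In a VCF record, multiple tags are written to the FILTER column joined by semicolons, for example `";".join(assign_filters(...))`.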
Small Variant Genome VCF
File name: {SAMPLE_ID}_hard-filtered.gvcf.gz
The small variant genome VCF file includes the variant call status for all positions in all targeted intervals.
Small Variant Annotated JSON
File name: {SAMPLE_ID}_SmallVariants_Annotated.json.gz
The small variants annotated file provides variant annotation information for all non-reference positions in the VCF, which includes non-pass variants. The variant consequence definition is available on the Sequence Ontology website.
All PASS variant calls are annotated using the Illumina Annotation Engine (IAE), also known as Nirvana, with the following information (using the RefSeq transcript):
HGNC Gene
Transcript
Exon
Consequence
c.HGVS
p.HGVS
COSMIC
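A minimal sketch for extracting the fields listed above from the annotated JSON follows. It assumes the Nirvana JSON layout: a top-level `positions` array whose entries carry `filters` and `variants`, per-variant `transcripts` and `cosmic` arrays, and transcript keys such as `hgnc`, `exons`, `consequence`, `hgvsc`, `hgvsp`, `source`, and `isCanonical`. These key names can differ between Nirvana versions, so check them against your own output before relying on this. The sketch also loads the whole file into memory, which is acceptable for a targeted panel.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def canonical_refseq_annotations(path: str) -> Iterator[Dict[str, Any]]:
    """Yield gene/transcript/HGVS/COSMIC fields for PASS positions on canonical RefSeq transcripts.

    All JSON key names used here are ASSUMED from the Nirvana output layout.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        doc = json.load(fh)
    for position in doc.get("positions", []):
        if "PASS" not in position.get("filters", []):
            continue  # the JSON also contains non-PASS positions
        for variant in position.get("variants", []):
            cosmic_ids = [c.get("id") for c in variant.get("cosmic", [])]
            for tx in variant.get("transcripts", []):
                if tx.get("source") != "RefSeq" or not tx.get("isCanonical"):
                    continue
                yield {
                    "chromosome": position.get("chromosome"),
                    "position": position.get("position"),
                    "gene": tx.get("hgnc"),
                    "transcript": tx.get("transcript"),
                    "exon": tx.get("exons"),
                    "consequence": tx.get("consequence", []),
                    "hgvsc": tx.get("hgvsc"),
                    "hgvsp": tx.get("hgvsp"),
                    "cosmic_ids": cosmic_ids,
                }
```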