TMB
Tumor mutational burden (TMB) is the number of somatic mutations present in the cancer genome, normalized to the size of the genomic region examined (mutations per megabase).
The algorithm calculates TMB in the following steps.
Small variant calling
Refer to Small Variants for details on how small variants are called.
Eligible region detection
TMB is computed over protein-coding regions with sufficient coverage, excluding low-confidence regions (blocklist regions). For the DRAGEN TSO 500 ctDNA analysis software, the total coding region with coverage ≥ 1000X is used.
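As an illustration of the eligible-region definition, the following Python sketch computes the eligible region size for one chromosome: coding intervals, intersected with intervals whose depth is ≥ 1000X, minus the blocklist. The half-open 0-based interval representation, the function names, and the assumption that high-coverage intervals have already been derived from per-base depth are all hypothetical; this is not the pipeline's implementation.

```python
from typing import List, Tuple

# Half-open [start, end) interval on a single chromosome (assumed representation).
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Bases present in both interval lists (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Bases in a that are not covered by any interval in b."""
    b = merge(b)
    out: List[Interval] = []
    for start, end in merge(a):
        cur = start
        for b_start, b_end in b:
            if b_end <= cur or b_start >= end:
                continue  # no overlap with the remaining part of [cur, end)
            if b_start > cur:
                out.append((cur, b_start))
            cur = max(cur, b_end)
        if cur < end:
            out.append((cur, end))
    return out


def eligible_region_mb(coding: List[Interval],
                       high_coverage: List[Interval],
                       blocklist: List[Interval]) -> float:
    """Eligible region = (coding ∩ depth >= 1000X) minus blocklist, in megabases."""
    eligible = subtract(intersect(coding, high_coverage), blocklist)
    return sum(end - start for start, end in eligible) / 1_000_000
```

In a whole-genome setting the same operations would run per chromosome and the results would be summed.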
Germline variant identification
To exclude germline variants from the TMB calculation, the algorithm uses two methods to predict whether a variant is of germline origin.
1. Database filter
Variants with a population allele count ≥ 10 in either the 1000 Genomes or gnomAD database are marked as germline and assigned the tag Germline_DB in the “tmb.trace.tsv” and “hard-filtered.vcf” files.
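A minimal sketch of the database filter, assuming per-database allele counts are available as integers (or None when the variant is absent from a database). The function name and argument layout are hypothetical; the threshold of 10 and the Germline_DB tag come from the text.

```python
from typing import Optional

# Threshold from the text: a population allele count >= 10 marks a variant as germline.
GERMLINE_DB_MIN_ALLELE_COUNT = 10


def database_filter_tag(ac_1000_genomes: Optional[int],
                        ac_gnomad_exome: Optional[int],
                        ac_gnomad_genome: Optional[int]) -> Optional[str]:
    """Return "Germline_DB" if any database allele count reaches the threshold.

    None means the variant is absent from that database.
    """
    counts = [c for c in (ac_1000_genomes, ac_gnomad_exome, ac_gnomad_genome) if c is not None]
    # Taking the maximum mirrors the trace file's MaxDatabaseAlleleCounts column.
    if counts and max(counts) >= GERMLINE_DB_MIN_ALLELE_COUNT:
        return "Germline_DB"
    return None
```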
2. Proxi filter
In the TSO 500 ctDNA pipeline, the proxi filter uses a probabilistic approach. For a target variant, it estimates the expected germline allele frequency from the surrounding germline variants. It then tests whether the allele frequency of the target variant is similar to this expected germline allele frequency. If it is, the tag Germline_Proxi is assigned to the target variant in the “tmb.trace.tsv” and “hard-filtered.vcf” files.
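The text does not specify the statistical test, so the sketch below is only one plausible reading of the proxi filter. It takes the median VAF of nearby germline variants as the expected germline allele frequency and runs an exact two-sided binomial test of the target variant's alternate read count against it. The median, the binomial model, alpha = 0.05, and all names are assumptions; not rejecting the null hypothesis is used as a simple stand-in for "similar to expected", which is weaker than a formal equivalence test.

```python
import math
from statistics import median
from typing import Optional, Sequence


def _log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), in log space to avoid overflow at high depth."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    log_obs = _log_binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        log_pi = _log_binom_pmf(i, n, p)
        if log_pi <= log_obs + 1e-7:  # small tolerance for floating-point ties
            total += math.exp(log_pi)
    return min(1.0, total)


def proxi_filter_tag(alt_count: int, depth: int,
                     neighbor_germline_vafs: Sequence[float],
                     alpha: float = 0.05) -> Optional[str]:
    """Hypothetical proxi-style check of a target variant against nearby germline VAFs."""
    if depth <= 0 or not neighbor_germline_vafs:
        return None  # nothing to estimate the expected germline AF from
    # Expected germline AF from surrounding germline variants (median is an assumption),
    # clamped away from 0 and 1 so the logarithms stay finite.
    expected_af = min(max(median(neighbor_germline_vafs), 1e-6), 1 - 1e-6)
    p_value = binom_two_sided_p(alt_count, depth, expected_af)
    return "Germline_Proxi" if p_value >= alpha else None
```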
Clonal hematopoiesis (CH) variant identification
Clonal hematopoiesis (CH) is characterized by the overrepresentation of blood cells derived from a single clone. CH is common, and its prevalence increases with age. For accurate TMB determination, CH variants must be excluded.
The TSO 500 ctDNA pipeline uses two methods to tag variants as CH variants.
1. CH genes whitelist
Some of the genes most commonly mutated in clonal hematopoiesis (DNMT3A, TET2, PPM1D, and ASXL1) are included in the CH genes whitelist. If a variant is in one of these genes, the tag Somatic_Putative_CH is assigned to it in the “tmb.trace.tsv” and “hard-filtered.vcf” files.
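The whitelist check reduces to a gene-name lookup. This sketch assumes the gene annotation is a semicolon-delimited string, as in the GeneName column of the trace file; the function name is hypothetical.

```python
from typing import Optional

# Genes listed in the text as the CH genes whitelist.
CH_GENES = frozenset({"DNMT3A", "TET2", "PPM1D", "ASXL1"})


def ch_whitelist_tag(gene_name: str) -> Optional[str]:
    """Return "Somatic_Putative_CH" if any annotated gene is on the whitelist.

    gene_name follows the trace file's GeneName convention: a semicolon-delimited
    list when a variant overlaps several genes, empty when no gene applies.
    """
    genes = {g.strip() for g in gene_name.split(";") if g.strip()}
    return "Somatic_Putative_CH" if not CH_GENES.isdisjoint(genes) else None
```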
2. cfDNA fragment size analysis
CH-derived cfDNA fragments are generally longer than tumor-derived cfDNA fragments. This difference is used to identify CH variants from the fragment sizes of the reads supporting each variant call. Non-germline variants supported by longer fragments are tagged as Somatic_Putative_CH in the “tmb.trace.tsv” and “hard-filtered.vcf” files.
Only variants with sufficient supporting reads, that is, a variant allele count (VAC) > 50, are tested for a fragment size difference between the reads supporting the reference allele and the reads supporting the variant allele. Non-germline variants with a lower VAC, or without enough statistical power for the size difference test, remain tagged as Somatic in the “tmb.trace.tsv” and “hard-filtered.vcf” files.
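The documentation does not name the test used for the fragment size comparison. The sketch below assumes a one-sided Mann-Whitney U test (normal approximation) asking whether fragments supporting the variant allele are longer than those supporting the reference allele, with VAC taken as the number of variant-supporting fragments. The test choice, alpha = 0.01, and all names are assumptions; only the VAC > 50 gate and the tags come from the text.

```python
import math
from typing import Sequence

# From the text: only variants with VAC > 50 are tested.
MIN_VAC = 50


def mann_whitney_greater_p(x: Sequence[float], y: Sequence[float]) -> float:
    """One-sided Mann-Whitney U p-value for "x tends to be larger than y".

    Normal approximation with average ranks for ties and no tie correction
    of the variance, which is adequate only for reasonably large samples.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the block of tied values pooled[i..j].
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


def fragment_size_tag(alt_fragment_lengths: Sequence[int],
                      ref_fragment_lengths: Sequence[int],
                      alpha: float = 0.01) -> str:
    """Tag a non-germline variant as Somatic_Putative_CH if its supporting fragments are longer."""
    vac = len(alt_fragment_lengths)  # one fragment per variant-supporting read (assumption)
    if vac <= MIN_VAC or not ref_fragment_lengths:
        return "Somatic"  # not tested: insufficient support
    p_value = mann_whitney_greater_p(alt_fragment_lengths, ref_fragment_lengths)
    return "Somatic_Putative_CH" if p_value < alpha else "Somatic"
```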

Tumor driver variant identification
Excluding tumor driver variants reduces bias in the blood TMB (bTMB) calculation that could arise from the targeted enrichment of the genes in the panel. Variants with a count ≥ 50 in the COSMIC database are treated as tumor driver variants and excluded from the calculation.
Nonsynonymous variant identification
Nonsynonymous variants are defined as described in the DRAGEN user guide. Only nonsynonymous variants are used to calculate Nonsynonymous TMB.
TMB calculation
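The TMB is calculated using the following equations:
$$\text{TMB} = \frac{\text{Number of eligible variants}}{\text{Effective panel size (Mb)}}$$
$$\text{Nonsynonymous TMB} = \frac{\text{Number of eligible nonsynonymous variants}}{\text{Effective panel size (Mb)}}$$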
The eligible variants (numerator) and the effective panel size (denominator) of the TMB calculation are summarized as follows:
Eligible variants (numerator):
Variants in the coding region (RefSeq CDS)
Variant frequency ≥ 0.2%
Coverage ≥ 1000X
SNVs and indels (MNVs excluded)
Nonsynonymous and synonymous variants (only nonsynonymous variants are used for Nonsynonymous TMB)
Variants with a count ≥ 50 in the COSMIC database are excluded
Mutations in ASXL1, DNMT3A, PPM1D, and TET2 are excluded
Fragment-size-based potential clonal hematopoiesis (CH) variants are excluded
Effective panel size (denominator):
Total coding region with coverage ≥ 1000X
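To make the numerator and denominator concrete, this Python sketch applies the eligibility criteria listed above to a list of annotated variants and divides by the effective panel size. The Variant record, its field names, and the representation of VAF as a fraction are hypothetical; the sketch also assumes germline and fragment-size CH variants are identified by their Status value, as in the trace file.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

CH_GENES = frozenset({"ASXL1", "DNMT3A", "PPM1D", "TET2"})
ELIGIBLE_TYPES = frozenset({"SNV", "insertion", "deletion"})  # MNVs excluded


@dataclass(frozen=True)
class Variant:                # hypothetical record, not a DRAGEN data structure
    coding: bool              # in the RefSeq CDS region
    vaf: float                # as a fraction, so 0.2% == 0.002 (assumption)
    depth: int
    variant_type: str         # "SNV", "insertion", "deletion" or "MNV"
    max_cosmic_count: int
    genes: Tuple[str, ...]
    status: str               # "Somatic", "Germline_DB", "Germline_Proxi", "Somatic_Putative_CH"
    nonsynonymous: bool


def is_eligible(v: Variant) -> bool:
    """Numerator criteria from the list above, plus the germline/CH status."""
    return (v.coding
            and v.vaf >= 0.002
            and v.depth >= 1000
            and v.variant_type in ELIGIBLE_TYPES
            and v.max_cosmic_count < 50          # COSMIC count >= 50: tumor driver, excluded
            and CH_GENES.isdisjoint(v.genes)     # CH whitelist genes excluded
            and v.status == "Somatic")           # germline and fragment-size CH excluded


def compute_tmb(variants: Sequence[Variant], eligible_region_mb: float) -> Tuple[float, float]:
    """Return (TMB, Nonsynonymous TMB) in mutations per megabase."""
    eligible = [v for v in variants if is_eligible(v)]
    nonsyn = sum(1 for v in eligible if v.nonsynonymous)
    return len(eligible) / eligible_region_mb, nonsyn / eligible_region_mb
```

For example, 3 eligible variants (2 of them nonsynonymous) over a 1.5 Mb eligible region give a TMB of 2.0 and a Nonsynonymous TMB of about 1.33 mutations/Mb.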
TMB Output Files
The TMB algorithm outputs results in several files:
Combined Variant Output File: {SampleID}_CombinedVariantOutput.tsv
TMB Metrics CSV file: {Sample_ID}.tmb.metrics.csv
TMB Trace TSV file: {Sample_ID}.tmb.trace.tsv
TMB Max Somatic VAF file: {Sample_ID}.tmb.msaf.csv
1. Combined Variant Output File
File name: {SampleID}_CombinedVariantOutput.tsv
The TMB results are reported in the [TMB] section and include:
The TMB value
Coding Region Size in Megabases (the denominator of the TMB formula)
Number of Passing Eligible Variants (the numerator of the TMB formula)
2. TMB Metrics CSV
File name: {Sample_ID}.tmb.metrics.csv
The TMB metrics file contains, for each DNA sample, the TMB and Nonsynonymous TMB results and the values used to calculate them.
Total Input Variant Count: Total number of variants considered by the algorithm
Total Input Variant Count in TMB region: Total number of variants considered by the algorithm in the TMB-eligible region
Filtered Variant Count: Number of variants remaining after filtering; see the TMB algorithm page for details
Filtered Nonsyn Variant Count: Number of nonsynonymous variants remaining after filtering; see the TMB algorithm page for details
Eligible Region (MB): Size, in megabases, of the eligible region that meets the minimum coverage threshold
TMB: TMB value for the sample
Nonsyn TMB: Nonsynonymous TMB value for the sample
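Because the metrics file reports both the counts and the resulting TMB values, a quick consistency check is possible. This sketch assumes a DRAGEN-style CSV layout in which each row ends with the metric name and its value (for example, a row such as "TMB SUMMARY,,TMB,10.0"). That layout, the assumption that Filtered Variant Count is the TMB numerator, the tolerance, and the function names are not confirmed by this page.

```python
import csv
import math
from typing import Dict, Iterable


def read_tmb_metrics(lines: Iterable[str]) -> Dict[str, float]:
    """Map metric name -> numeric value, taking the last two fields of each row.

    Rows whose value is not numeric (for example a header or "NA") are skipped.
    """
    metrics: Dict[str, float] = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        name, value = row[-2].strip(), row[-1].strip()
        try:
            metrics[name] = float(value)
        except ValueError:
            continue
    return metrics


def tmb_is_consistent(m: Dict[str, float], rel_tol: float = 0.01) -> bool:
    """Check TMB == Filtered Variant Count / Eligible Region (MB), and likewise for Nonsyn TMB."""
    region = m["Eligible Region (MB)"]
    checks = [("TMB", "Filtered Variant Count"),
              ("Nonsyn TMB", "Filtered Nonsyn Variant Count")]
    # abs_tol allows for values that were rounded when written to the file.
    return all(math.isclose(m[tmb], m[count] / region, rel_tol=rel_tol, abs_tol=0.01)
               for tmb, count in checks)
```

Usage: read_tmb_metrics(open(path)) yields a dictionary that tmb_is_consistent can check directly.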
3. TMB Trace File
The TMB trace file provides comprehensive information on how the TMB value is calculated for a given sample. It includes all passing small variants from the small variant filtering step. To view the variants eligible for the TMB calculation, filter the IncludedInTMBNumerator column for TRUE.
Variant statuses (somatic, germline, clonal hematopoiesis (CH) variant) are predictions intended for the TMB calculation. Use caution when using them for other purposes, because their performance has not been tested outside the TMB algorithm.
Chromosome: Chromosome
Position: Position of the variant
RefCall: Reference base
AltCall: Alternate base
VAF: Variant allele frequency
Depth: Coverage at the position
CytoBand: Cytoband of the variant
GeneName: Name of the gene, if applicable; multiple genes are given as a semicolon-delimited list
VariantType: Type of the variant (SNV, insertion, deletion, or MNV)
CosmicIDs: COSMIC IDs; multiple IDs are concatenated with “;”
MaxCosmicCount: Maximum COSMIC study count
ClinVarIDs: Reference ClinVar Variation IDs (RCV IDs)
ClinVarSignificance: Variant classification in the ClinVar database
AlleleCountsGnomadExome: Variant allele count in the gnomAD exome database
AlleleCountsGnomadGenome: Variant allele count in the gnomAD genome database
AlleleCounts1000Genomes: Variant allele count in the 1000 Genomes database
MaxDatabaseAlleleCounts: Maximum variant allele count across the three databases
GermlineFilterDatabase: TRUE if the variant was filtered by the database filter
GermlineFilterProxi: TRUE if the variant was filtered by the proxi filter
CodingVariant: TRUE if the variant is in the coding region
Nonsynonymous: TRUE if the variant has any transcript annotation with a nonsynonymous consequence
IncludedinTMBNumerator: TRUE if the variant is used in the TMB calculation
Status: Germline_DB or Germline_Proxi if the variant was filtered by the database or the proxi filter, respectively; Somatic_Putative_CH if the variant was predicted to be associated with clonal hematopoiesis (CH); Somatic if the variant was not determined to be germline or CH
ProteinChange: Protein change in HGVS notation (p.)
CDSChange: Coding sequence change in HGVS notation (c.)
Exons: Exon in which the variant is located
Consequence: Variant consequence
4. TMB Max Somatic VAF file
This file reports the variant with the maximum somatic VAF, using the same file format as the TMB Trace File.
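The trace-file filtering described above can be scripted with Python's standard csv module. This sketch reads the tab-separated trace, keeps the rows whose IncludedInTMBNumerator value is TRUE, and picks the highest-VAF row whose Status is Somatic. Treating that row as the "Max Somatic VAF" variant is an assumption, since the exact selection rule for the .tmb.msaf.csv file is not described here; the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Optional

Row = Dict[str, str]


def read_trace(lines: Iterable[str]) -> List[Row]:
    """Parse tmb.trace.tsv content (tab-separated, header row first)."""
    return list(csv.DictReader(lines, delimiter="\t"))


def numerator_rows(rows: List[Row]) -> List[Row]:
    """Rows whose IncludedInTMBNumerator column is TRUE.

    The column name is matched case-insensitively because the documentation
    spells it both IncludedInTMBNumerator and IncludedinTMBNumerator.
    """
    if not rows:
        return []
    column = next((c for c in rows[0] if c.lower() == "includedintmbnumerator"), None)
    if column is None:
        raise KeyError("IncludedInTMBNumerator column not found")
    return [r for r in rows if r[column].strip().upper() == "TRUE"]


def max_somatic_vaf_row(rows: List[Row]) -> Optional[Row]:
    """Highest-VAF row with Status == "Somatic" (assumed definition of Max Somatic VAF)."""
    somatic = [r for r in rows if r.get("Status") == "Somatic"]
    return max(somatic, key=lambda r: float(r["VAF"]), default=None)
```

Usage: with open(path, newline="") as f: rows = read_trace(f), then len(numerator_rows(rows)) gives the TMB numerator count.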