DNA Expanded Metrics

DNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.

Metric

Description

Troubleshooting

TOTAL_PF_READS (count)

Total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.

Primarily driven by the data output of the sequencer, the quality of the library, and the balancing of the library in the library pool. If TOTAL_PF_READS is in line with other samples but coverage metrics are lower, this may suggest non-specific enrichment.

Low values for all samples indicate a poor-quality run, possibly with low cluster numbers or low Q30 and PF% values.

A low value for an individual sample indicates poor pooling of this library into the final pool.

MEAN_FAMILY_SIZE (count)

A UMI family is a group of reads that all have the same UMI barcode. The family size is the number of reads in the family. MEAN_FAMILY_SIZE is the mean of the entire population of reads assembled into UMI families. In V1 chemistry only the TSO500 manifest is considered, while in V2 the TSO500 and HRD manifests are both considered.

The mean UMI family size decreases with increased unique read numbers, and more input DNA leads to more unique reads. Conversely, oversequencing a fixed population of unique DNA molecules leads to increased family size.

As a guide, for a good run with optimal cluster density, passing specs, even sample pooling, and good-quality DNA, we usually observe values <10.

UMI family size = 1 is not ideal as it is harder to correct for errors.

UMI family size of 2 to 5 enables efficient error correction without wasting sequencing capacity on high percentages of duplicate reads.
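The relationship between reads, UMI families, and mean family size can be illustrated with a minimal sketch. It assumes each read has already been reduced to a family key; the function name and the example keys are hypothetical, and the actual pipeline may group reads on more than the UMI barcode alone.

```python
from collections import Counter


def mean_family_size(read_family_keys):
    """Return the mean number of reads per UMI family.

    read_family_keys: one hypothetical family key per read (for example
    the UMI barcode string). Reads sharing a key form one family.
    """
    families = Counter(read_family_keys)
    if not families:
        return 0.0
    # Total reads divided by the number of distinct families.
    return sum(families.values()) / len(families)


# Six reads collapsing into three families (sizes 3, 2, and 1).
reads = ["AAT-GCC", "AAT-GCC", "AAT-GCC", "CTG-TTA", "CTG-TTA", "GGA-ACT"]
print(mean_family_size(reads))  # 6 reads / 3 families = 2.0
```

Adding more unique molecules at the same read count adds families and lowers the mean, while sequencing the same molecules more deeply adds reads to existing families and raises it.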

MEDIAN_TARGET_COVERAGE (count)

Median depth across all the unique loci occurring in all regions of the manifest file.

Lower median target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output.

PCT_EXON_100X (%)

Percentage of exon bases with 100X fragment coverage. Calculated against all regions in manifest containing _exon in name.

Can be used in combination with other PCT_EXON metrics to understand under or over coverage of exons.

PCT_READ_ENRICHMENT (%)

Percentage of reads that have overlapping sequence with the target regions defined in the sample manifest. In V1 chemistry only the TSO500 manifest is considered while in V2 the TSO500 and HRD manifests are both considered.

Indicative of general enrichment performance. Reduced proportions of enriched reads may indicate issues with the enrichment portion of the library preparation.

PCT_USABLE_UMI_READS (%)

Percentage of reads that have valid UMI sequences associated with them.

As UMI reads are sequenced at the start of each read, loss of valid UMI sequence may be caused by sequencing issues impacting the quality of base calling in this portion of the sequencing read.

MEAN_TARGET_COVERAGE (count)

Mean depth across all the unique loci defined in the manifest file.

Lower mean target coverage may be due to poor sample input/quality, library preparation issues, or low sequencing output. Large differences between the median and mean target coverage values may indicate a skewed distribution of target coverage.
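As a rough illustration of why a gap between mean and median points to a skewed distribution, the sketch below compares the two on a made-up list of per-locus depths. The function name and the depth values are hypothetical and are not taken from any real sample.

```python
from statistics import mean, median


def coverage_summary(depths):
    """Return the mean and median of a list of per-locus depths."""
    return {"mean": mean(depths), "median": median(depths)}


# Four loci near 100X and one strongly over-covered locus.
depths = [100, 110, 120, 130, 2000]
summary = coverage_summary(depths)
print(summary["mean"], summary["median"])  # 492 120
```

The single outlier pulls the mean far above the median, which stays close to the typical locus.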

PCT_ALIGNED_READS (%)

Proportion of aligned reads that are non-supplementary, non-secondary and pass QC versus aligned reads that are non-supplementary, non-secondary, mapped and pass QC.

PCT_CONTAMINATION_EST (%)

This metric should only be evaluated if the CONTAMINATION_SCORE metric exceeds the USL. This metric estimates the amount of contamination in a sample. The contamination level is computed by taking 2.0 times the average of the adjusted allele frequencies of all variants that were selected. The adjusted allele frequency is either the actual allele frequency of the variant if it is less than 0.5, or 1 - allele frequency if it is greater than or equal to 0.5.

If the sample does not fail the CONTAMINATION_SCORE, this metric has no intended meaning, as it will be driven by statistical noise (e.g. the few variants that naturally fall outside an expected interval around 0.5 due to random chance).

High contamination estimates may be due to any of the following:

Inter-sample contamination caused by mixing of samples during extraction or library preparation.

Intra-sample contamination, due to mixing of clonally different cell populations during extraction.

Large-scale genomic rearrangements that cause unexpected VAFs for large numbers of variants.
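The arithmetic described for PCT_CONTAMINATION_EST can be sketched as follows. This is only an illustration of the stated formula: how variants are selected is not described here, so the input list is a hypothetical set of already-selected allele frequencies, and the function names are assumptions.

```python
def adjusted_allele_frequency(af):
    """Fold an allele frequency onto the range 0 to 0.5.

    Frequencies below 0.5 are kept; frequencies at or above 0.5 are
    replaced by 1 - af, as described in the metric definition.
    """
    return af if af < 0.5 else 1.0 - af


def contamination_estimate(selected_afs):
    """Return 2.0 times the mean adjusted allele frequency (as a fraction)."""
    if not selected_afs:
        return 0.0
    adjusted = [adjusted_allele_frequency(af) for af in selected_afs]
    return 2.0 * sum(adjusted) / len(adjusted)


# Hypothetical selected variants sitting slightly away from 0 and 1.
afs = [0.02, 0.98, 0.04, 0.96]
estimate = contamination_estimate(afs)
print(f"{estimate:.2%}")
```

With these values the adjusted frequencies average about 0.03, so the estimate is about 0.06 as a fraction, or roughly 6% when expressed as a percentage.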

PCT_TARGET_0.4X_MEAN (%)

Percentage of target (all locations in manifest) reads that have a coverage depth greater than 0.4x the mean target coverage depth (see definition above).

Provides an indication of uniformity of coverage of the target regions in the manifest file. When trended over time reductions in this metric may indicate an issue with the enrichment process resulting in coverage bias.
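A minimal sketch of this uniformity calculation is shown below. It assumes the metric is evaluated over a list of per-position depths, which is an interpretation rather than a documented implementation detail; the function name and depth values are hypothetical.

```python
def pct_above_fraction_of_mean(depths, fraction=0.4):
    """Percentage of positions with depth greater than fraction * mean depth."""
    if not depths:
        return 0.0
    threshold = fraction * sum(depths) / len(depths)
    above = sum(1 for d in depths if d > threshold)
    return 100.0 * above / len(depths)


# Mean depth is 202, so the threshold is 80.8; one position falls below it.
depths = [10, 200, 210, 190, 400]
print(pct_above_fraction_of_mean(depths))  # 80.0
```

A perfectly uniform sample would score 100, and the value falls as more positions drop well below the mean.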

PCT_TARGET_50X (%)

Percentage of target bases with 50X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_TARGET_100X (%)

Percentage of target bases with 100X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_TARGET_250X (%)

Percentage of target bases with 250X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
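Reading the PCT_TARGET thresholds together can be illustrated with a short sketch. It assumes "50X fragment coverage" means a depth of at least 50, which is not stated explicitly above, and it uses a hypothetical list of per-base depths.

```python
def pct_at_or_above(depths, threshold):
    """Percentage of bases whose depth is at least the given threshold."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= threshold)
    return 100.0 * covered / len(depths)


depths = [30, 60, 120, 260, 500]
profile = {t: pct_at_or_above(depths, t) for t in (50, 100, 250)}
print(profile)  # {50: 80.0, 100: 60.0, 250: 40.0}
```

A steep drop between consecutive thresholds shows where most of the target sits in depth, which is how the three metrics complement each other.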

PCT_SOFT_CLIPPED_BASES (%)

Percentage of bases that were not used for alignment but were retained as part of the alignment file.

Soft-clipped reads are used as part of the downstream analysis for small variant calling. A higher-than-expected number could indicate a low-quality enrichment step.

PCT_Q30_BASES (%)

Average percentage of bases ≥ Q30. The Q-score is a prediction of the probability of an incorrect base call.

An indicator of sequencing run quality. Low Q30 across all samples on a run could be the result of run overclustering.
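For reference, the standard Phred relationship between a Q-score and the error probability is P = 10^(-Q/10), so Q30 corresponds to a 1 in 1000 chance of an incorrect base call. The sketch below applies that relationship and computes a Q30 percentage from a hypothetical list of per-base quality scores; the function names are illustrative only.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def pct_q30(qualities):
    """Percentage of bases with a quality score of at least 30."""
    if not qualities:
        return 0.0
    return 100.0 * sum(1 for q in qualities if q >= 30) / len(qualities)


print(phred_to_error_probability(30))
print(pct_q30([30, 35, 20, 40]))  # 75.0
```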

ALLELE_DOSAGE_RATIO (HRD samples)

Proprietary Myriad Genetics estimate of b-allele dosage based on the b-allele noise/signal ratio. B-allele noise is correlated with coverage; lower coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with a lower tumor fraction and a higher amount of noise (or lower coverage) will have a higher Allele Dosage Ratio.

The upper limit of the score is 50; therefore, any sample with an Allele Dosage Ratio of 50 can be assumed to have a tumor fraction close to zero and typically has a GIS = 0.

MEDIAN TARGET HRD (HRD samples)

Median target fragment coverage across all target positions in the HRD manifest. Coverage is the total number of non-duplicate pair alignments that overlap.
