Mapping and Alignment
DNA Alignment and Read Collapsing
The alignment step uses DRAGEN Aligner with UMI collapsing to align DNA sequences in FASTQ files to the hg19_decoy genome. This step groups reads into sets (ie, families) based on their genomic locations and UMI tags, then combines each family into a representative sequence. This process removes duplicate reads and suppresses sequencing errors without losing the signal of very low-frequency (< 1%) sequence variations.
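The grouping-and-collapsing idea can be illustrated with a minimal sketch. This is not DRAGEN's algorithm, and it makes several simplifying assumptions. Reads are already aligned. Families are keyed on an exact match of chromosome, start position, and UMI, whereas production UMI collapsing may also tolerate UMI sequencing errors and consider mates and strands. The consensus is built by a simple per-position majority vote. The `Read` type and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int  # leftmost aligned position (assumed 0-based)
    umi: str    # UMI sequence attached to the read (hypothetical field)
    seq: str    # aligned bases; zip() truncates to the shortest read


def group_families(reads):
    """Group reads that share chromosome, start position, and UMI into families."""
    families = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.start, read.umi)].append(read)
    return families


def consensus(seqs):
    """Per-position majority vote; a tie for the top base becomes 'N'."""
    bases = []
    for column in zip(*seqs):
        (top, top_n), *rest = Counter(column).most_common()
        if rest and rest[0][1] == top_n:
            bases.append("N")  # ambiguous column
        else:
            bases.append(top)
    return "".join(bases)


def collapse(reads):
    """Map each family key to (representative sequence, family size)."""
    return {
        key: (consensus([r.seq for r in members]), len(members))
        for key, members in group_families(reads).items()
    }
```

For example, three reads at `chr7:100` sharing UMI `ACGT` (one carrying a single-base error) collapse to one representative sequence with a family size of 3. A fourth read at the same position with a different UMI stays in its own family.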
This alignment step generates BAM files (*.bam) and BAM index files (*.bam.bai) that are saved to the alignment folder. A BAM file is the compressed binary version of a SAM file that is used to represent aligned sequences.
Read collapsing adds the following BAM tags:
RX/XU: UMI combination. RX is a copy of XU, included to conform to the SAM/BAM format.
XV: Number of reads in the family on one strand.
XW: Number of reads in the duplex family, or 0 if the family is not a duplex family.
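To show where these tags appear in an output record, here is a minimal sketch that reads them from one SAM text line using only the standard library. The optional `TAG:TYPE:VALUE` fields follow the SAM specification. Several details are assumptions for illustration: XV and XW are treated as integer (`i`) tags, and the record and UMI values in the example are made up. The helper names `parse_tags` and `describe_family` are hypothetical.

```python
def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of a SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are mandatory
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # A, Z, H, and B values are kept as strings
    return tags


def describe_family(tags):
    """Summarize the read-collapsing tags; XW defaults to 0 when absent."""
    duplex_reads = tags.get("XW", 0)
    return {
        "umi": tags.get("RX", tags.get("XU")),
        "strand_reads": tags.get("XV"),
        "duplex_reads": duplex_reads,
        "is_duplex": duplex_reads > 0,
    }
```

A family with a nonzero XW is reported as a duplex family. A family with XW set to 0 is not.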
Indel Realignment and Read Stitching
The Gemini software component performs local indel realignment, paired-read stitching, and read filtering. A stitched read is a single read formed by combining the two reads of a read pair. Reads near detected indels are realigned to remove alignment artifacts. The input is a single BAM file and the reference genome FASTA used to align it. The output is a corresponding single BAM file with stitched, pair-realigned reads. Read pairs with poor mapping quality, as well as supplementary and secondary alignments in the input BAM, are ignored.
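The filtering and stitching steps can be sketched as follows. This is not Gemini's implementation. The secondary (0x100) and supplementary (0x800) flag bits come from the SAM specification. The `MIN_MAPQ` cutoff of 20 is a hypothetical value, not Gemini's setting. Mates are assumed to be ungapped (no indels or clipping), and overlapping bases that disagree become `N` instead of being resolved with base qualities. The function names are hypothetical. The per-position `F`/`R`/`S` labels correspond to the forward, reverse, and stitched positions that the XD tag summarizes.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
MIN_MAPQ = 20  # hypothetical threshold for illustration only


def passes_filter(flag, mapq, min_mapq=MIN_MAPQ):
    """Reject secondary/supplementary alignments and low-MAPQ records."""
    return not (flag & (SECONDARY | SUPPLEMENTARY)) and mapq >= min_mapq


def stitch(fwd_start, fwd_seq, rev_start, rev_seq):
    """Merge two overlapping, ungapped mates into one read.

    Returns (start, sequence, labels), where labels[i] is 'F', 'R', or 'S'
    for reference position start + i covered by the forward mate only,
    the reverse mate only, or both. Returns None if the mates do not overlap.
    """
    fwd_end = fwd_start + len(fwd_seq)  # half-open intervals
    rev_end = rev_start + len(rev_seq)
    if fwd_end <= rev_start or rev_end <= fwd_start:
        return None  # no overlap, nothing to stitch
    start, end = min(fwd_start, rev_start), max(fwd_end, rev_end)
    bases, labels = [], []
    for pos in range(start, end):
        f = fwd_seq[pos - fwd_start] if fwd_start <= pos < fwd_end else None
        r = rev_seq[pos - rev_start] if rev_start <= pos < rev_end else None
        if f is not None and r is not None:
            bases.append(f if f == r else "N")  # disagreement becomes 'N'
            labels.append("S")
        elif f is not None:
            bases.append(f)
            labels.append("F")
        else:
            bases.append(r)
            labels.append("R")
    return start, "".join(bases), labels
```

For example, a forward mate `ACGTAC` at position 100 and a reverse mate `TACGGA` at position 103 overlap by three bases. They stitch into the single read `ACGTACGGA` starting at 100.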
The following BAM tags are added to the stitched reads:
XD: Directional support string indicating forward, reverse, and stitched positions.
XR: Read pair orientation, which can be either forward-reverse (FR) or reverse-forward (RF).
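The sketch below shows how values like these could be derived. It is an illustration under stated assumptions, not Gemini's code. The run-length form (for example `3F3S3R`) is an assumed encoding for a directional string, and Gemini's exact XD format may differ. The FR/RF classification is simplified: it uses each mate's leftmost position instead of its 5' end. Both function names are hypothetical.

```python
from itertools import groupby


def run_length(labels):
    """Collapse per-position F/R/S labels into a compact string (assumed format)."""
    return "".join(f"{len(list(group))}{label}" for label, group in groupby(labels))


def pair_orientation(start1, reverse1, start2, reverse2):
    """Classify a pair as 'FR' or 'RF' from leftmost positions and strands.

    Simplification: the mate with the smaller leftmost position (the first
    mate on ties) is treated as the left mate. Same-strand pairs return None.
    """
    if reverse1 == reverse2:
        return None
    left_is_reverse = reverse1 if start1 <= start2 else reverse2
    return "RF" if left_is_reverse else "FR"
```

A pair whose left mate maps to the forward strand and whose right mate maps to the reverse strand is reported as FR. The opposite arrangement is reported as RF.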