Getting the most from ATAC-seq: key QC metrics and the mitochondrial DNA problem

Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) has become the default way to look at open chromatin. It lets you ask where transcription factors can bind, which enhancers are active, and how chromatin changes across conditions or cell types, often starting from very little material. For many groups “doing ATAC” is now as routine as running RNA-seq1.

In a surprising number of experiments, though, a large slice of the sequencing reads is spent on mitochondrial DNA (mtDNA). mtDNA fragments can easily account for 20–50% or more of all reads in a library-5. Most analyses discard them, so they consume sequencing budget without adding information and can drag down key ATAC-seq quality metrics5.

If you are setting up ATAC-seq for the first time, mitochondrial QC plots can feel like just one more knob to worry about. If you run ATAC-seq every week, you may have simply accepted that losing a chunk of every lane to mtDNA is the price of doing business. In the sections that follow, we describe how ATAC-seq works, what a good library looks like, why mitochondrial reads loom so large in many datasets, and which levers you can realistically pull, from sample handling to Cas9-based depletion, to reclaim those reads2-5.

ATAC-seq in a nutshell

ATAC-seq asks a simple question: where in the genome is DNA physically accessible inside nuclei1. The method uses a hyperactive Tn5 transposase that has been pre-loaded with sequencing adapters. When you add this transposase to intact nuclei, it inserts adapters wherever it can bind and cut DNA. In open chromatin, nucleosomes are depleted or mobile, leaving the DNA backbone relatively exposed. In closed chromatin, nucleosomes are tightly packed, blocking access.

Because the reaction is performed in situ on nuclei rather than on purified genomic DNA, the transposase is constrained by the native chromatin structure at the moment of lysis. Open regions such as active promoters and enhancers accumulate many insertions, while heterochromatin is sampled much less. This simple physical bias toward accessible DNA is the entire basis of ATAC-seq.

The ATAC-seq workflow is correspondingly compact. Cells or nuclei are prepared under relatively gentle conditions and incubated with the transposase complex for a short period; the DNA is then purified and PCR-amplified to add full sequencing adapters and indexes. The resulting fragments are sequenced, aligned to the genome, and used to call “peaks” that mark accessible regions. The fragment size distribution carries additional information about nucleosomes: very short fragments (roughly 50 to 150 bp) come from nucleosome-free regions, while peaks at mono-, di-, and tri-nucleosome sizes reflect linker spacing around positioned nucleosomes1.

Over time, ATAC-seq has expanded into several variants. Bulk ATAC-seq is run on millions of cells and provides an average view of chromatin accessibility. Nuclei optimized protocols, sometimes referred to as Omni-style approaches, refine lysis and detergent conditions so that the same basic chemistry works better on primary cells and tissues, with reduced background and improved signal to noise2. Single-cell ATAC-seq encapsulates individual nuclei, tags each with a barcode, and reads out accessibility profiles per cell. Multiome formats layer ATAC-seq with transcriptomics or other readouts in the same cell.

What makes a “good” ATAC-seq library

Because ATAC-seq reads reflect both chromatin biology and the mechanics of the assay, several quality metrics have become standard. These metrics are useful whether you are setting up ATAC-seq for the first time or trying to diagnose a subtle problem in a mature pipeline5.

  1. Fraction of reads in peaks (FRiP)

Fraction of reads in peaks, commonly abbreviated FRiP, is the proportion of mapped reads that fall inside called peaks rather than in the “background” genome. Conceptually, it asks what fraction of your sequencing is being spent on reproducible, high confidence accessible regions.

In practice, FRiP is calculated after alignment and peak calling by dividing the number of reads (or read pairs) overlapping the peak set by the total number of mapped reads. Higher values indicate a better signal to noise ratio. Tutorials and pipelines often consider values above about 0.2 as acceptable for bulk ATAC-seq, with 0.3 or higher reflecting particularly clean data, although the exact thresholds depend on cell type, depth, and analysis choices2-5.

Mitochondrial reads have a direct and simple effect on FRiP. Peaks are called on the nuclear genome, so reads that align to the mitochondrial chromosome never fall inside those peaks. When the mitochondrial fraction is high, the denominator of the FRiP calculation grows while the numerator does not, and the score drops even if the nuclear signal itself is decent3-5.
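The dilution effect can be made concrete with a few lines of arithmetic. This is a minimal sketch, not part of any published pipeline: the function names are illustrative, and it assumes FRiP is computed over all mapped reads (mitochondrial reads included) with peaks called only on the nuclear genome.

```python
def frip(reads_in_peaks: int, total_mapped_reads: int) -> float:
    """Fraction of mapped reads that overlap the called peak set."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peaks / total_mapped_reads


def frip_with_mito(nuclear_frip: float, mito_fraction: float) -> float:
    """FRiP over all mapped reads when a share of them is mitochondrial.

    Assumes mitochondrial reads add to the denominator but never to the
    numerator, because no peaks are called on the mitochondrial genome.
    """
    if not 0.0 <= mito_fraction < 1.0:
        raise ValueError("mito_fraction must be in [0, 1)")
    return nuclear_frip * (1.0 - mito_fraction)


# Hypothetical library in which 30% of nuclear reads fall in peaks.
for mito in (0.0, 0.2, 0.5):
    print(f"mito fraction {mito:.0%}: FRiP = {frip_with_mito(0.30, mito):.2f}")
```

Under these assumptions, a library whose nuclear reads would score 0.30 reports only 0.15 once half of the mapped reads are mitochondrial, even though the nuclear signal is unchanged.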

  2. TSS enrichment

This parameter measures how strongly ATAC-seq signal is concentrated at transcription start sites (TSS) compared with neighboring genomic regions. To compute it, signal is aggregated over windows centered on known TSS positions across the genome. High quality libraries show a tall, narrow spike of accessibility exactly at the TSS, with lower signal in the flanking regions and visible nucleosome peaks on either side2-5.

This metric reports on both signal to noise and on proper preservation of chromatin structure. When nuclei are well prepared and the transposase is behaving as intended, open promoters are strongly enriched and the TSS profile is sharp. Poorly prepared libraries, or libraries dominated by random fragments and background, show a flatter profile with a relatively modest increase at the TSS. Single-cell pipelines often compute TSS enrichment per cell and use it as part of their quality control filters.

Mitochondrial fragments do not overlap nuclear TSS positions, so they contribute only to the background component of this calculation. A library where one third or one half of reads are mitochondrial will inevitably have fewer nuclear reads in the TSS windows, which flattens the enrichment profile and may push the TSS score below commonly used cutoffs even when the underlying promoter signal is strong3-5.
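For readers who want to see the calculation itself, here is a minimal sketch of one common way to score an aggregate TSS profile: the signal at the window center divided by the mean signal in the outer flanks. The function name, the 100-position flank, and the synthetic profile are assumptions for illustration; individual pipelines differ in window size, smoothing, and whether they take the center or the maximum.

```python
from statistics import mean


def tss_enrichment(profile: list[float], flank: int = 100) -> float:
    """Score an aggregate TSS profile.

    `profile` holds the insertion signal at each position of a window
    centered on the TSS, summed over all annotated TSSs. The mean of the
    outer `flank` positions on each side is used as background.
    """
    if len(profile) < 2 * flank + 1:
        raise ValueError("profile is shorter than the two flanks")
    background = mean(profile[:flank] + profile[-flank:])
    if background <= 0:
        raise ValueError("background signal must be positive")
    center = profile[len(profile) // 2]
    return center / background


# Synthetic 2,001-position profile: flat background with a spike at the TSS.
synthetic = [1.0] * 1000 + [8.0] + [1.0] * 1000
print(tss_enrichment(synthetic))
```

A real profile would be built from aligned fragments and a TSS annotation; the sketch only shows how the final ratio is formed once that aggregate exists.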

  3. Fragment length distribution and nucleosome pattern

The fragment size distribution provides an intuitive view of how well the assay captured nucleosome structure. In a typical ATAC-seq library, one expects a prominent peak of short fragments that correspond to nucleosome free regions, followed by peaks near the mono, di, and tri nucleosome sizes as the enzyme cuts in the linker DNA between wrapped nucleosomes1-2.

A clear ladder of these peaks is a signature of intact chromatin and precise tagmentation. Smeared distributions, loss of the nucleosome free peak, or absence of the mono nucleosome peak can indicate nuclei damage, over digestion, random shearing, or other technical issues. Experienced analysts often glance at the fragment histogram before any other metric.

Mitochondrial fragments complicate this picture in two ways. First, mtDNA tends to produce an abundance of relatively short fragments, which can inflate the smallest size bins and dilute the apparent contribution from genuine nucleosome free regions. Second, when mitochondrial reads dominate the library, there may simply not be enough nuclear fragments left to reveal a clean nucleosome ladder3-5.
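A quick way to put numbers on the fragment histogram is to bin fragment lengths into size classes and report the share in each. The sketch below is illustrative only: the bin boundaries are assumptions loosely based on the sizes mentioned above, and real pipelines choose their own cutoffs.

```python
from collections import Counter

# Illustrative size bins in bp (lower bound inclusive, upper bound exclusive).
# These boundaries are assumptions and vary between pipelines.
SIZE_CLASSES = [
    ("nucleosome_free", 0, 150),
    ("mono_nucleosome", 150, 300),
    ("di_nucleosome", 300, 500),
    ("tri_nucleosome_or_larger", 500, float("inf")),
]


def classify_fragment(length: int) -> str:
    """Return the name of the size class that contains `length`."""
    for name, low, high in SIZE_CLASSES:
        if low <= length < high:
            return name
    raise ValueError(f"invalid fragment length: {length}")


def size_class_fractions(lengths: list[int]) -> dict[str, float]:
    """Share of fragments in each size class."""
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    counts = Counter(classify_fragment(n) for n in lengths)
    return {name: counts[name] / len(lengths) for name, _, _ in SIZE_CLASSES}


print(size_class_fractions([80, 120, 200, 400]))
```

Running the same summary separately on nuclear and mitochondrial fragments is one simple way to check whether short mtDNA fragments are inflating the smallest bin.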

  4. Mitochondrial read fraction

Finally, the mitochondrial read fraction itself has become a de facto QC metric. It is usually defined as the fraction of mapped reads aligning to the mitochondrial genome.

Real datasets show a wide range. The PEPATAC pipeline paper notes that ATAC-seq libraries can have mitochondrial fractions between 15 and 50 percent in typical experiments and up to roughly 95 percent in extreme cases5. Other studies report that particular tissues or lysis conditions yield libraries where most reads are of mitochondrial origin2,3.

In bulk ATAC-seq, a high mitochondrial fraction is mostly a cost and efficiency problem. Those reads are usually removed before analysis and never contribute to peak calling, so each percentage point of mtDNA represents a percentage point of the sequencing budget that is effectively wasted. In single-cell ATAC-seq, mitochondrial fraction is also used per cell as a proxy for viability and stress. Cells with very high mitochondrial signal are often filtered out early because they tend to correspond to damaged or dying cells.
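As a sketch of how this fraction might be computed, the function below parses per-contig mapped read counts in the four-column, tab-separated layout that `samtools idxstats` prints (contig name, length, mapped, unmapped). The set of mitochondrial contig names is an assumption; adjust it to match your reference genome.

```python
# Assumption: common names for the mitochondrial contig across references.
MITO_NAMES = {"chrM", "MT", "chrMT", "M"}


def mito_fraction_from_idxstats(idxstats_text: str) -> float:
    """Fraction of mapped reads assigned to the mitochondrial contig."""
    mito = 0
    total = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # row for reads with no assigned contig
        total += int(mapped)
        if name in MITO_NAMES:
            mito += int(mapped)
    if total == 0:
        raise ValueError("no mapped reads found")
    return mito / total


# Toy input with made-up counts.
example = (
    "chr1\t248956422\t600\t0\n"
    "chr2\t242193529\t200\t0\n"
    "chrM\t16569\t200\t0\n"
    "*\t0\t0\t50\n"
)
print(f"{mito_fraction_from_idxstats(example):.1%}")
```

Whether duplicates, low-quality alignments, or unmapped reads are counted changes the number, so it helps to state the definition whenever you report this metric.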

Why mitochondrial reads dominate so many ATAC-seq libraries

In standard ATAC-seq, the assay is performed on isolated nuclei, not whole cells, to allow the Tn5 transposase direct access to chromatin while minimizing mtDNA contamination. However, contamination remains common because mitochondria frequently co-purify with nuclei during the isolation process. In addition, the detergents typically used to permeabilize cell and nuclear membranes can also disrupt mitochondrial membranes, exposing mtDNA to the transposase.

Each cell contains hundreds to thousands of mitochondria, and each mitochondrion carries several copies of a compact ~16.5 kb genome. These genomes are packaged with proteins but are generally not organized into the same nucleosome-based chromatin structure as nuclear DNA. Once mitochondrial membranes break, the Tn5 complex sees a dense cloud of very accessible DNA with repeated copies of the same sequence2,3. This makes mtDNA a perfect Tn5 substrate.

If the lysis step is too harsh, if cells are stressed, or if tissue dissociation has already damaged mitochondria, the enzyme will spend a disproportionate amount of its activity on mtDNA rather than on nuclear chromatin. The result is a library where a large share of fragments map to the mitochondrial genome, often in the form of simple coverage spikes that add little information and are removed during analysis3-5.

The extent of this problem varies by sample type and protocol. Omni-style protocols reduce mitochondrial signal relative to the earliest ATAC-seq implementations by adjusting detergents and buffer composition so that plasma membranes are permeabilized while mitochondria remain largely intact. Corces and colleagues showed that this approach improves signal to background and reduces mitochondrial contamination across a range of cell types, including tissues that were previously challenging for ATAC-seq2. Even so, mitochondrial fractions in the low double digits remain common in careful experiments.

When mitochondrial reads become a dominant component of the library, every other metric suffers. FRiP and TSS enrichment fall because a growing share of reads lands on a separate chromosome that is excluded from peak calling and promoter meta analysis. Fragment length distributions lose their characteristic ladder as mtDNA derived fragments swamp the nuclear signal. At the same time, the parts of the genome that scientists care about receive a smaller share of the sequencing budget than intended, which reduces statistical power for detecting subtle changes in accessibility3-5.

In single-cell and multiome assays, high mitochondrial signal carries an additional cost. Pipelines such as ArchR and Signac routinely flag cells with high mitochondrial fraction as low quality and exclude them from downstream analysis to avoid artefacts from dying or ruptured cells2,5. In datasets where sample handling or lysis conditions promote mitochondrial leakage, this can lead to large numbers of cells being filtered out, even when some of those cells might otherwise contain usable nuclear information.
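To illustrate how such per-cell filters behave, here is a minimal sketch in plain Python. It does not reproduce the API of ArchR, Signac, or any other package: the `CellQC` class, its field names, and all three cutoffs are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellQC:
    """Per-cell QC summary; field names are illustrative."""
    barcode: str
    nuclear_fragments: int
    mito_fragments: int
    tss_enrichment: float

    @property
    def mito_fraction(self) -> float:
        total = self.nuclear_fragments + self.mito_fragments
        return self.mito_fragments / total if total else 0.0


def passing_cells(
    cells: list[CellQC],
    max_mito_fraction: float = 0.2,     # hypothetical cutoff
    min_tss_enrichment: float = 4.0,    # hypothetical cutoff
    min_nuclear_fragments: int = 1000,  # hypothetical cutoff
) -> list[CellQC]:
    """Keep cells that satisfy all three thresholds."""
    return [
        c for c in cells
        if c.mito_fraction <= max_mito_fraction
        and c.tss_enrichment >= min_tss_enrichment
        and c.nuclear_fragments >= min_nuclear_fragments
    ]


cells = [
    CellQC("AAAC", nuclear_fragments=9000, mito_fragments=1000, tss_enrichment=8.0),
    CellQC("CCGT", nuclear_fragments=4000, mito_fragments=6000, tss_enrichment=7.5),
]
print([c.barcode for c in passing_cells(cells)])
```

In this toy example the second cell has 4,000 nuclear fragments and a strong TSS score, yet it is dropped solely because 60 percent of its fragments are mitochondrial.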

Strategies to manage the mitochondrial DNA problem

Researchers have taken several complementary approaches to this problem. These strategies can be grouped into preventing Tn5 from encountering mitochondrial DNA, handling mt reads computationally, and actively depleting mt derived fragments from libraries.

  1. Preventing the enzyme from seeing mtDNA

The first line of defense is thoughtful sample and nuclei preparation. Many improved ATAC-seq protocols focus on gentle lysis that releases nuclei while leaving mitochondria intact, combined with washes that remove organelles and cytoplasmic debris. The choice and concentration of detergents, as well as buffer ionic strength and temperature, are tuned to perforate the plasma membrane while preserving internal membranes2.

These refinements can reduce mitochondrial fraction substantially, particularly in standardized contexts such as cultured cells or certain tissues. However, they cannot prevent all mtDNA exposure. Stressed or apoptotic cells, cryopreserved material, and mitochondria rich tissues such as heart or skeletal muscle will still leak DNA even in optimized conditions.

  2. Computational handling

Once reads have been sequenced, the simplest approach is to align to both the nuclear and mitochondrial genomes and then discard any reads that map to the mitochondrial chromosome. Pipelines such as PEPATAC even advocate aligning first to the mitochondrial genome and then to the nuclear genome to improve alignment statistics and quality control5.

This practice is essential for clean nuclear peak calling but it does not recover any of the resources spent generating those reads. If 40 percent of a lane maps to the mitochondrial genome and is filtered out, you are left with only 60 percent of the depth to cover the much larger nuclear genome. In highly multiplexed runs or in studies with large numbers of samples, that loss accumulates quickly.
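The budget arithmetic is simple enough to sketch directly. The function names below are illustrative, and the calculation assumes every mitochondrial read is discarded and every other mapped read is usable.

```python
import math


def _check_fraction(mito_fraction: float) -> None:
    if not 0.0 <= mito_fraction < 1.0:
        raise ValueError("mito_fraction must be in [0, 1)")


def usable_nuclear_reads(total_reads: int, mito_fraction: float) -> int:
    """Reads left for nuclear analysis after mitochondrial reads are removed."""
    _check_fraction(mito_fraction)
    return round(total_reads * (1.0 - mito_fraction))


def reads_to_sequence(target_nuclear_reads: int, mito_fraction: float) -> int:
    """Total reads needed to end up with the target number of nuclear reads."""
    _check_fraction(mito_fraction)
    return math.ceil(target_nuclear_reads / (1.0 - mito_fraction))


# Hypothetical run: 100 million reads with a 40% mitochondrial fraction.
print(usable_nuclear_reads(100_000_000, 0.4))
# Reads required to still reach 100 million nuclear reads at that fraction.
print(reads_to_sequence(100_000_000, 0.4))
```

At a 40 percent mitochondrial fraction, reaching a given nuclear depth takes roughly 1.67 times as many reads as it would with no mitochondrial contamination.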

  3. Sequence specific depletion with Cas9

To reduce wasted sequencing rather than simply ignoring mitochondrial reads, several groups have explored sequence specific depletion of mt derived fragments at the library stage. The general concept, sometimes referred to as Depletion of Abundant Sequences by Hybridization (DASH), uses Cas9 programmed with guide RNAs against unwanted sequences to cut those fragments in a double-stranded DNA library. The cut fragments are then lost during cleanup or fail to amplify efficiently, enriching the remaining library for fragments that are not targeted4.

Montefiori and colleagues applied this idea directly to ATAC-seq3. They designed sets of guide RNAs tiling the human mitochondrial genome and treated ATAC-seq libraries with Cas9 prior to sequencing. Compared with standard protocols, libraries treated with mitochondrial Cas9 showed a marked reduction in mtDNA reads, more unique non mitochondrial reads, and an increased number of peaks at promoters and enhancers at the same overall sequencing depth. The authors also compared Cas9 based depletion with a detergent removal strategy and found that Cas9 more effectively increased the proportion of usable reads.
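To give a feel for what tiling a target with guides involves, the sketch below enumerates candidate SpCas9 sites in a DNA string, meaning 20-nt protospacers followed immediately by an NGG PAM on either strand. This is not the guide design procedure used by Montefiori and colleagues. The function names are hypothetical, the sequence is treated as linear (the circular mitochondrial genome would also need sites spanning the junction), and real designs add off-target and efficiency filters.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """List candidate (strand, start, protospacer, pam) tuples.

    `start` is the 0-based offset of the protospacer within the strand that
    was scanned; for "-" sites it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Stop early enough that a full 3-nt PAM fits after the protospacer.
        for i in range(len(s) - protospacer_len - 2):
            pam = s[i + protospacer_len : i + protospacer_len + 3]
            if pam[1:] == "GG":
                sites.append((strand, i, s[i : i + protospacer_len], pam))
    return sites


# Toy sequence with exactly one site on the forward strand.
print(find_spcas9_sites("A" * 20 + "TGG"))
```

Selecting a subset of these candidates at regular intervals along the target is one plausible way to build a tiling set, although spacing and guide count are design decisions the sketch does not address.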

Together with similar efforts to deplete ribosomal RNA and other repetitive elements in other library types, these studies show that a Cas9-based depletion step can redirect a fixed amount of sequencing toward regions of real interest3,4. For ATAC-seq, mitochondrial depletion is a natural first target because mtDNA is compact, well annotated, and clearly undesirable in most chromatin accessibility studies.

Where mitochondrial depletion fits into ATAC-seq workflows

It is helpful to be explicit about where a mitochondrial depletion module would sit in a typical workflow. A standard bulk ATAC-seq protocol can be simplified into four conceptual stages: nuclei preparation and lysis, tagmentation with Tn5, cleanup and PCR amplification, and final library QC and sequencing. The key point is that the Cas9 complex operates on double-stranded DNA libraries, which means it belongs after the initial amplification step, once adapters and indexes are in place.

In a bulk context, one approach is to perform tagmentation and PCR as usual, quantify the library, and then treat an aliquot with the mito-targeting Cas9 ribonucleoprotein (RNP). After incubation and cleanup, the depleted library can be re-amplified if needed and submitted for sequencing. The untreated fraction can serve as a paired control. Comparing the two libraries reveals the reduction in mitochondrial fraction and the effect on FRiP, TSS enrichment, and peak counts at the same sequencing depth3.
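When planning such a paired comparison, it can help to estimate the mitochondrial fraction you would expect after treatment. The sketch below is a simple proportional model, not a measured result. It assumes a fixed share of mitochondrial fragments is removed (the hypothetical `depletion_efficiency` parameter), that nuclear fragments are untouched, and that sequencing samples the remaining library proportionally.

```python
def mito_fraction_after_depletion(
    mito_fraction: float, depletion_efficiency: float
) -> float:
    """Expected mitochondrial fraction once a share of mtDNA fragments is removed."""
    for name, value in (
        ("mito_fraction", mito_fraction),
        ("depletion_efficiency", depletion_efficiency),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    remaining_mito = mito_fraction * (1.0 - depletion_efficiency)
    remaining_total = (1.0 - mito_fraction) + remaining_mito
    if remaining_total == 0.0:
        raise ValueError("no fragments remain after depletion")
    return remaining_mito / remaining_total


# Hypothetical starting fractions, each with 90% of mtDNA fragments removed.
for before in (0.2, 0.5, 0.8):
    after = mito_fraction_after_depletion(before, 0.9)
    print(f"before {before:.0%} -> after {after:.1%}")
```

Under this model, a library that starts at 50 percent mitochondrial reads would be expected to fall to about 9 percent if 90 percent of mitochondrial fragments are removed; the untreated control shows how close a real experiment comes to that estimate.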

In a salvage scenario, an already sequenced library with an unacceptably high mitochondrial fraction can be taken from storage, treated with the depletion module, and re-sequenced. This offers a way to rescue data from experiments that would otherwise be written off as failures or would require a full repeat from cells or tissue3.

Single-cell ATAC-seq and multiome workflows introduce additional considerations because barcodes and indexes are often introduced earlier and because per-cell mitochondrial fraction is used as a quality metric. Conceptually, though, the same principle applies. The depletion step can be placed after amplification of the barcoded library or cDNA pool, where each fragment already carries its cell barcode and sample index. The Cas9 complex cuts fragments derived from mtDNA and from NUMTs (nuclear insertions of mitochondrial sequence) regardless of which cell barcode they carry, enriching the remaining library for nuclear fragments while preserving single-cell identity. Analysis pipelines need to interpret per-cell mitochondrial fraction in light of the depletion step, since the dynamic range of that metric will shrink.

In all cases, the advantage of a dsDNA stage depletion approach is that it overlays onto existing ATAC workflows rather than replacing them. Tn5 transposition, buffer conditions, choice of nuclei protocol, and sequencing platform remain unchanged. Users simply add a short depletion step to the back end of the process to reclaim reads that would otherwise go to the mitochondrial genome3,4.

Caveats and special cases

While mitochondrial depletion is attractive in many settings, it is not universally appropriate. Some studies deliberately exploit mtDNA. Single-cell lineage tracing approaches, for example, use naturally occurring mitochondrial mutations as clonal markers to reconstruct cell lineages6. Other projects focus on mitochondrial copy number, heteroplasmy, or genome integrity. In these contexts, removing mitochondrial fragments would eliminate the signal of interest.

Even in more typical chromatin accessibility studies, it is worth thinking carefully about how depletion interacts with downstream quality controls, especially in single-cell analysis. Pipelines that have been tuned on untreated libraries may need adjustments when mitochondrial fraction is systematically lower or less variable across cells. Metrics such as FRiP and TSS enrichment, on the other hand, often improve because a greater proportion of the sequencing budget is directed to nuclear accessible regions3-5.

Bringing it together

ATAC-seq compresses a very complex biological question into an elegant biochemical assay. A hyperactive Tn5 transposase samples chromatin in intact nuclei and reports where DNA is accessible. Metrics such as FRiP, TSS enrichment, fragment length distribution, and mitochondrial fraction give a multidimensional view of how well that assay worked on a given sample1,2,5.

Mitochondrial DNA sits at the intersection of biology and technique. Its abundance, compactness, and relative lack of nucleosomes make it a perfect substrate for Tn5 once mitochondrial membranes are breached. The result is a hidden tax on many ATAC-seq datasets, where substantial fractions of reads map to the mitochondrial genome and are later discarded2,3,5. This affects cost, how sensitively you can detect real chromatin-accessibility changes, and sometimes whether a library passes quality thresholds at all.

Improved lysis protocols and careful handling can reduce that tax but rarely eliminate it. Cas9 based mitochondrial depletion at the double-stranded library stage provides a complementary solution. By selectively cutting mtDNA and related fragments, a mito specific RNP module shifts a fixed sequencing budget toward nuclear regulatory regions, improves key QC metrics, and offers a path to recover value from high mitochondrial libraries that would otherwise be unusable3,4.

For groups that regularly see double-digit mitochondrial fractions in ATAC-seq or that work with mitochondria-rich or fragile samples, it is worth asking a few simple questions. How much of each lane is currently going to mtDNA? How often are libraries or cells being discarded because of mitochondrial metrics? And what would it mean for experimental design if even a portion of that wasted depth could be redirected toward the nuclear peaks and cell types that actually drive the biology of interest?

A clear understanding of ATAC-seq QC metrics, and of the role mitochondrial DNA plays in them, is the foundation for answering those questions. With that understanding in place, tools such as targeted mitochondrial depletion are easier to evaluate and easier to deploy in a way that genuinely improves data quality rather than simply adding another step to an already complex workflow.
 


References:
  1. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013 Dec;10(12):1213-1218. doi:10.1038/nmeth.2688.
  2. Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Armstrong NA, Vesuna S, et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods. 2017 Oct;14(10):959-962. doi:10.1038/nmeth.4396.
  3. Montefiori L, Hernandez L, Zhang Z, Gilad Y, Ober C, Crawford GE, Nobrega MA, Sakabe NJ. Reducing mitochondrial reads in ATAC-seq using CRISPR/Cas9. Sci Rep. 2017 May 26;7:2451. doi:10.1038/s41598-017-02547-w.
  4. Gu W, Crawford ED, O’Donovan BD, Wilson MR, Chow ED, Retallack H, DeRisi JL. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biol. 2016 Mar 1;17:41. doi:10.1186/s13059-016-0904-5.
  5. Smith JP, Corces MR, Xu J, Reuter VP, Chang HY, Sheffield NC. PEPATAC: an optimized pipeline for ATAC-seq data analysis with serial alignments. NAR Genom Bioinform. 2021 Nov 23;3(4):lqab101. doi:10.1093/nargab/lqab101.
  6. Ludwig LS, Lareau CA, Ulirsch JC, Christian E, Muus C, Li LH, et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell. 2019 Mar 21;176(6):1325-1339.e22. doi:10.1016/j.cell.2019.01.022.

For research use only. Not for use in diagnostic procedures.
