DRUG-seq provides a scalable way to incorporate transcriptomic readouts into screening [1], but the resulting data are only as informative as the experimental design behind them. In practice, many uninterpretable datasets trace back to decisions made before the screen begins, particularly the choice of cell model, compound dosing strategy, and treatment timing.
Before committing to a screen, the first question is what transcriptional signal you realistically expect to observe. Although this sounds straightforward, it requires defining several experimental variables that are often only considered once a screen is underway.
What mechanism of action are you trying to detect or assign? Does the chosen cell model express the relevant biology? What treatment time and compound concentration will produce a measurable transcriptional response without overwhelming the signal with general stress or cytotoxicity? If these questions cannot be answered based on published data or a small pilot experiment, the screen may still generate expression data, but it may not yield interpretable biological insight [1,2].
This issue is particularly relevant for teams familiar with high-content screening (HCS), where assay readouts are typically direct, such as morphological changes, reporter signals, or fluorescent markers [3]. Transcriptomics is more indirect: mechanism is inferred from coordinated gene expression changes rather than a single phenotype. For researchers in functional genomics, transcriptional readouts are familiar, but compound perturbations differ from genetic ones. Unlike CRISPR knockouts, small molecules are dose-dependent and often affect multiple pathways, so treatment conditions must be chosen to capture on-target transcriptional responses without allowing secondary effects to dominate the signature.
Choosing the right cell model
The choice of cell model is one of the most consequential design decisions in a DRUG-seq screen. It determines which biology can be observed, how interpretable the resulting signatures will be, and whether the data will be actionable in a given disease context. Although there is no universally correct model, several practical principles can help guide the choice.
Biological suitability for mechanism detection
Not all cell models produce equivalent transcriptional responses to compound perturbation. Highly transformed cell lines, particularly those with heavily altered signaling landscapes, may respond to compounds in ways that are difficult to interpret biologically because pathways may be constitutively active, silenced, or rewired relative to a physiologically relevant context.
For MoA-driven screening, the model must express the genes and pathways you intend to perturb. A pathway active in primary hepatocytes, for example, may be largely silent in a commonly used cancer cell line.
Before committing to a model, it is therefore useful to verify pathway expression and responsiveness using available RNA-seq datasets such as the Cancer Cell Line Encyclopedia (CCLE) [4] or GTEx [5]. A practical check is whether known tool compounds generate a detectable transcriptional response in that model. If reference compounds fail to produce the expected signature, screening compounds are unlikely to do so either.
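The expression check described above can be sketched in a few lines of Python. This is a minimal illustration, not a validated workflow: the function name `pathway_coverage`, the 1 TPM cutoff, and all expression values are hypothetical, and a real check would load TPM values exported from a resource such as CCLE or GTEx.

```python
def pathway_coverage(expression_tpm, pathway_genes, min_tpm=1.0):
    """Return (fraction of pathway genes at or above min_tpm,
    list of genes below the cutoff or absent from the profile)."""
    if not pathway_genes:
        raise ValueError("pathway_genes must not be empty")
    below = [g for g in pathway_genes if expression_tpm.get(g, 0.0) < min_tpm]
    expressed = len(pathway_genes) - len(below)
    return expressed / len(pathway_genes), below


# Hypothetical TPM values for two candidate models (illustrative numbers only).
hepatocyte_like = {"CYP3A4": 250.0, "CYP2E1": 120.0, "ALB": 9000.0, "NR1I2": 15.0}
cancer_line = {"CYP3A4": 0.2, "CYP2E1": 0.0, "ALB": 3.5, "NR1I2": 0.4}
pathway = ["CYP3A4", "CYP2E1", "ALB", "NR1I2"]

for name, profile in [("hepatocyte_like", hepatocyte_like), ("cancer_line", cancer_line)]:
    fraction, below = pathway_coverage(profile, pathway)
    print(f"{name}: {fraction:.0%} of pathway genes expressed; below cutoff: {below}")
```

Baseline expression only shows that the pathway components are present. Responsiveness still needs to be confirmed with a tool compound, as noted above.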
This is also familiar to HCS teams: a cell line may be easy to image yet still represent a poor biological model for mechanism inference [3].
Signal robustness in screening conditions
Primary cells and iPSC-derived models offer greater physiological relevance but typically produce more variable transcriptional responses than established cell lines, particularly at the cell densities achievable in 384-well format. For large primary screens, assay reproducibility and signal consistency can therefore become as important as biological relevance.
Established cell lines with well-characterized transcriptional responses to reference compounds, such as A549, HepG2 or U2OS, often provide a more predictable signal-to-noise baseline and are therefore commonly used for first-pass screening.
Primary or iPSC-derived models are generally better suited to focused follow-up experiments, where throughput is lower and physiological relevance and disease context become the primary objective.
Compatibility with screening format
For DRUG-seq in 384-well format, cell seeding density typically needs to be optimized within roughly 2,000–10,000 cells per well, depending on the model. Cells that do not plate uniformly, adhere poorly in miniaturized format, or require extracellular matrix conditions incompatible with standard tissue culture plates can lead to variable RNA yield per well and uneven sequencing depth after demultiplexing.
Although manageable, this variability adds noise. Running a simple seeding-density optimization experiment (measuring cell number per well and checking basic transcriptional QC metrics across a density range) can substantially improve assay robustness before running the full screen.
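As a rough sketch of how such a pilot could be summarized, the snippet below computes the median UMI count, the coefficient of variation (CV) of UMIs per well, and the median number of detected genes for each seeding density. The data are invented for illustration. Choosing the density with the lowest UMI CV is only one possible criterion and is an assumption here, not a recommendation from the DRUG-seq literature.

```python
from statistics import mean, median, stdev


def summarize_density(wells):
    """wells: list of (umi_count, genes_detected) tuples for one seeding density."""
    umis = [u for u, _ in wells]
    genes = [g for _, g in wells]
    return {
        "median_umis": median(umis),
        "umi_cv": stdev(umis) / mean(umis),  # well-to-well variability
        "median_genes": median(genes),
    }


# Hypothetical pilot: cells per well -> per-well (UMIs, genes detected).
pilot = {
    2000: [(8000, 2500), (12000, 3100), (5000, 1900), (15000, 3400)],
    5000: [(30000, 5200), (33000, 5400), (29000, 5100), (31000, 5300)],
    10000: [(52000, 6100), (40000, 5800), (65000, 6300), (47000, 6000)],
}

summary = {density: summarize_density(wells) for density, wells in pilot.items()}
most_uniform = min(summary, key=lambda d: summary[d]["umi_cv"])

for density, stats in summary.items():
    print(density, {k: round(v, 3) for k, v in stats.items()})
print("Lowest UMI CV at", most_uniform, "cells per well")
```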
Controls, dose, and time
Controls
Controls define the baseline against which compound-induced signatures are interpreted.
Positive controls: two or three compounds with well-characterized, mechanistically distinct transcriptional signatures at the tested concentrations. These serve as assay quality controls and reference anchors for MoA clustering. Suitable examples include proteasome inhibitors, HDAC inhibitors, or topoisomerase inhibitors, which reliably produce strong transcriptional responses [1]. For HCS teams, they also confirm that the assay distinguishes biologically distinct states rather than simply separating “active” from “inactive” wells.
DMSO vehicle controls: multiple wells distributed across the plate to capture positional effects and define the baseline expression state. These wells are typically used for normalization and differential expression analysis across the plate. Too few DMSO controls increase variance in differential expression estimates and reduce statistical power.
Transcriptionally inactive controls (optional): when available, one or two compounds expected to be inactive in the chosen model can help estimate the biological noise floor of the assay and identify signatures arising from non-specific assay variability.
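One way to distribute vehicle controls across a plate is a staggered diagonal, sketched below for a 384-well plate (16 rows by 24 columns). The `period` and `shift` parameters and the resulting count of 32 DMSO wells are illustrative assumptions. The appropriate number and placement of controls depend on the plate map and the dispensing constraints of a given screen.

```python
import string

ROWS, COLS = 16, 24  # 384-well plate geometry


def dmso_layout(rows=ROWS, cols=COLS, period=12, shift=3):
    """Return well IDs (e.g. 'A01') for DMSO controls placed on a staggered
    diagonal, so that controls sample different rows and columns."""
    wells = []
    for r in range(rows):
        for c in range(cols):
            if (c + shift * r) % period == 0:
                wells.append(f"{string.ascii_uppercase[r]}{c + 1:02d}")
    return wells


dmso_wells = dmso_layout()
print(len(dmso_wells), "DMSO wells:", dmso_wells[:6], "...")
```

With these defaults every row receives two DMSO wells and the control columns shift from row to row, which helps separate row, column, and edge effects from the baseline expression state.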
Dose
Single-point screening is common in primary DRUG-seq screens for throughput reasons, but it limits interpretability. A compound tested at a single concentration may sit below its transcriptional EC50, producing a weak signature, or exceed a cytotoxic threshold, producing stress and cell death responses that obscure the on-target biology.
Where throughput allows, a dose-response design provides stronger context for interpretation than a single concentration [1]. In practice, however, most large primary screens are run at a single concentration, with dose–response profiling reserved for smaller focused libraries or follow-up screens.
Even a limited two-concentration design (for example one near the expected EC50 and one higher concentration) can provide useful signal redundancy when library size permits. A reproducible dose–response pattern in the transcriptional signature increases confidence that the compound drives the observed expression changes and strengthens downstream MoA interpretation.
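A simple way to quantify this kind of reproducibility is to compare the log2 fold-change signatures obtained at the two concentrations. The sketch below reports their Pearson correlation (is the direction of change preserved?) and the ratio of signature magnitudes (does the response grow with dose?). Gene names and values are hypothetical, and both metrics are illustrative choices, not an established DRUG-seq standard.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def dose_consistency(lfc_low, lfc_high):
    """Compare two signatures (gene -> log2 fold change) over shared genes.
    Returns (correlation, magnitude ratio high/low)."""
    genes = sorted(set(lfc_low) & set(lfc_high))
    low = [lfc_low[g] for g in genes]
    high = [lfc_high[g] for g in genes]
    norm_low = math.sqrt(sum(v * v for v in low))
    norm_high = math.sqrt(sum(v * v for v in high))
    ratio = norm_high / norm_low if norm_low else float("inf")
    return pearson(low, high), ratio


# Hypothetical signatures at a lower and a higher concentration.
lfc_low = {"GENE_A": 0.5, "GENE_B": -0.4, "GENE_C": 0.1, "GENE_D": 0.0}
lfc_high = {"GENE_A": 1.0, "GENE_B": -0.8, "GENE_C": 0.2, "GENE_D": 0.0}

r, ratio = dose_consistency(lfc_low, lfc_high)
print(f"correlation = {r:.2f}, magnitude ratio = {ratio:.2f}")
```

A high correlation with a magnitude ratio above 1 is consistent with a dose-dependent response. A low correlation at the higher concentration may indicate that stress or cytotoxicity programs are taking over.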
Treatment time
Treatment duration is one of the most under-optimized parameters in DRUG-seq experiments, and it has a strong impact on data interpretability. The optimal treatment time depends on the mechanism being interrogated. Direct transcriptional regulators (for example HDAC or BET inhibitors) can produce measurable signatures within a few hours, whereas compounds acting upstream of transcription, such as kinase inhibitors or receptor modulators, often require longer exposure for downstream responses to accumulate. Cytotoxic compounds progressively shift the signature toward stress and cell death programs regardless of target, eventually obscuring the on-target biology.
For this reason, there is rarely a single “correct” DRUG-seq time point. The goal is to identify the window where the biological response is strongest while secondary stress responses remain limited. A small pilot time-course experiment using a few reference compounds can help identify this window before committing to a large screen. Testing two or three time points (for example, early and late exposures) is often sufficient to determine where the on-target signal is best separated from background noise. If DRUG-seq is combined with Cell Painting on the same plate, the time-course optimization should consider both readouts, as morphological and transcriptional responses often peak at different times.
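The window selection described above can be expressed as a small scoring rule: for each pilot time point, score the on-target response and the stress response, then keep the time point with the strongest on-target score among those whose stress score stays under a cap. Everything in this sketch is an assumption made for illustration, including the gene sets, the use of mean log2 fold change as a score, the expectation that target genes are induced, and the `max_stress` cap.

```python
def mean_score(lfc, gene_set):
    """Mean log2 fold change over the genes of a set that were measured."""
    values = [lfc[g] for g in gene_set if g in lfc]
    return sum(values) / len(values) if values else 0.0


def pick_time_point(lfc_by_time, target_genes, stress_genes, max_stress=1.0):
    """Return the time point with the highest on-target score among those
    whose stress score does not exceed max_stress, or None if none qualify."""
    candidates = []
    for time_h, lfc in lfc_by_time.items():
        on_target = mean_score(lfc, target_genes)
        stress = mean_score(lfc, stress_genes)
        if stress <= max_stress:
            candidates.append((on_target, time_h))
    return max(candidates)[1] if candidates else None


# Hypothetical reference-compound signatures at three pilot time points (hours).
lfc_by_time = {
    2: {"TARGET_1": 0.3, "TARGET_2": 0.2, "STRESS_1": 0.1, "STRESS_2": 0.0},
    6: {"TARGET_1": 1.5, "TARGET_2": 1.2, "STRESS_1": 0.4, "STRESS_2": 0.3},
    24: {"TARGET_1": 1.8, "TARGET_2": 1.6, "STRESS_1": 2.5, "STRESS_2": 2.0},
}
targets = ["TARGET_1", "TARGET_2"]
stress = ["STRESS_1", "STRESS_2"]

print("Selected time point (h):", pick_time_point(lfc_by_time, targets, stress))
```

In this toy example the 24-hour point has the largest on-target score but is excluded because the stress score exceeds the cap, so the 6-hour point is selected.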
Interpreting DRUG-seq screening results
A DRUG-seq screen produces a gene-by-sample expression matrix, but extracting biological insight depends on how the data are analyzed. The steps below highlight the analyses that most often determine whether a dataset yields interpretable mechanism-of-action (MoA) signals in a screening context.
Quality control at the well level
Before interpreting biology, the data should be examined for basic assay quality. Useful metrics include total unique molecular identifiers (UMIs) per well as a proxy for RNA yield and cell number, the number of genes detected per well, and the consistency of DMSO control wells across the plate. Wells with low UMI counts or poor correlation with DMSO controls may reflect dispensing errors, cell loss, or positional artefacts rather than compound effects. Removing these wells helps prevent technical noise from distorting downstream MoA clustering.
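The checks above can be sketched as two small functions: one applies UMI and detected-gene thresholds to a single well, and the other flags DMSO wells whose log-transformed profile correlates poorly with the median DMSO profile. The thresholds (`min_umis`, `min_genes`, `min_r`) and the toy counts are hypothetical and would need to be set from the distribution observed in a given screen.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def well_qc(counts, min_umis=10_000, min_genes=2_000):
    """counts: gene -> UMI count for one well. Thresholds are placeholders."""
    total = sum(counts.values())
    detected = sum(1 for c in counts.values() if c > 0)
    return {
        "total_umis": total,
        "genes_detected": detected,
        "passed": total >= min_umis and detected >= min_genes,
    }


def dmso_outliers(dmso_wells, min_r=0.9):
    """Flag DMSO wells whose log1p profile correlates below min_r with the
    per-gene median profile of all DMSO wells."""
    genes = sorted(set().union(*dmso_wells.values()))
    logged = {w: [math.log1p(c.get(g, 0)) for g in genes] for w, c in dmso_wells.items()}
    reference = [median(column) for column in zip(*logged.values())]
    return sorted(w for w, profile in logged.items() if pearson(profile, reference) < min_r)


# Toy DMSO wells with four genes; C07 has a clearly different profile.
dmso = {
    "A01": {"g1": 100, "g2": 50, "g3": 10, "g4": 0},
    "B10": {"g1": 110, "g2": 45, "g3": 12, "g4": 1},
    "C07": {"g1": 5, "g2": 60, "g3": 90, "g4": 40},
}
print("DMSO wells to inspect:", dmso_outliers(dmso))
```

Using the median instead of the mean as the reference profile keeps a single aberrant well from pulling the reference toward itself.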
From gene signatures to pathways
Differential expression analysis relative to DMSO controls provides the transcriptional signature of each compound [6]. In practice, interpretation is often more robust when focusing on the strongest part of the signature rather than the full list of marginally changed genes.
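To make the idea of "the strongest part of the signature" concrete, the sketch below computes a naive log2 fold change from depth-normalized counts and keeps the genes with the largest absolute change. This is deliberately not a statistical model: it is a simplified stand-in with no dispersion estimation or significance testing, which dedicated differential expression tools provide. The function names, the pseudocount, and the toy counts are assumptions.

```python
import math


def cpm(counts):
    """Scale one well's counts to counts per million (assumes total > 0)."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def log2_fold_change(treated_wells, dmso_wells, pseudocount=1.0):
    """Naive signature: log2 ratio of mean CPM in treated vs DMSO wells."""
    genes = set().union(*treated_wells, *dmso_wells)

    def mean_cpm(wells):
        normalized = [cpm(w) for w in wells]
        return {g: sum(n.get(g, 0.0) for n in normalized) / len(normalized) for g in genes}

    treated, control = mean_cpm(treated_wells), mean_cpm(dmso_wells)
    return {g: math.log2((treated[g] + pseudocount) / (control[g] + pseudocount)) for g in genes}


def top_genes(lfc, n=50):
    """Genes with the largest absolute log2 fold change."""
    return sorted(lfc, key=lambda g: abs(lfc[g]), reverse=True)[:n]


# Toy example with three genes and one well per condition.
treated = [{"a": 400, "b": 100, "c": 500}]
control = [{"a": 100, "b": 100, "c": 800}]
signature = log2_fold_change(treated, control)
print(top_genes(signature, n=2))
```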
Gene set enrichment analysis against curated pathway collections such as MSigDB Hallmarks [7] converts gene signatures into pathway-level responses. For screening teams, these pathway signatures provide a clearer biological summary and facilitate comparison between compounds.
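As a minimal illustration of moving from genes to pathways, the snippet below runs an over-representation test with the hypergeometric distribution on a list of top signature genes. This is a simpler technique than rank-based gene set enrichment analysis and is swapped in here only because it fits in a few lines. The gene sets are placeholders and are not actual MSigDB content, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def over_representation(signature_genes, gene_sets, background):
    """Return (set name, overlap, p-value) tuples sorted by p-value."""
    universe = set(background)
    signature = set(signature_genes) & universe
    results = []
    for name, members in gene_sets.items():
        in_universe = set(members) & universe
        overlap = len(signature & in_universe)
        p = hypergeom_sf(overlap, len(universe), len(in_universe), len(signature))
        results.append((name, overlap, p))
    return sorted(results, key=lambda item: item[2])


# Placeholder background of ten genes and two hypothetical gene sets.
background = [f"g{i}" for i in range(10)]
gene_sets = {"SET_A": ["g0", "g1", "g2"], "SET_B": ["g7", "g8", "g9"]}
signature = ["g0", "g1", "g2"]

for name, overlap, p in over_representation(signature, gene_sets, background):
    print(f"{name}: overlap={overlap}, p={p:.4f}")
```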
MoA clustering with reference datasets
The most informative use of DRUG-seq data is often the comparison of compound signatures with reference perturbation datasets. Resources such as the Connectivity Map contain transcriptional profiles from thousands of chemical and genetic perturbations, allowing compounds to be ranked by transcriptional similarity [8]. Compounds that cluster with known reference perturbations are strong candidates for shared mechanisms, while isolated signatures may represent novel biology.
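Ranking by transcriptional similarity can be sketched with plain cosine similarity between log2 fold-change signatures. Note that this is a generic similarity measure chosen for illustration and is not the connectivity score used by the Connectivity Map. The reference names and values below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two signatures (gene -> log2 fold change),
    computed over the genes measured in both."""
    genes = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in genes)
    norm_a = math.sqrt(sum(a[g] ** 2 for g in genes))
    norm_b = math.sqrt(sum(b[g] ** 2 for g in genes))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_references(query, references):
    """Return (reference name, similarity) pairs, most similar first."""
    scores = [(name, cosine(query, signature)) for name, signature in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical query compound and reference signatures.
query = {"g1": 1.0, "g2": -1.0, "g3": 0.5}
references = {
    "reference_similar": {"g1": 2.0, "g2": -2.0, "g3": 1.0},
    "reference_unrelated": {"g1": 0.5, "g2": 0.5, "g3": 0.0},
    "reference_opposite": {"g1": -1.0, "g2": 1.0, "g3": -0.5},
}

for name, score in rank_references(query, references):
    print(f"{name}: {score:+.2f}")
```

Strongly negative scores can also be informative, since they point to perturbations that reverse the query signature.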
For functional genomics teams, DRUG-seq signatures can also be compared with CRISPR perturbation profiles to identify genetic dependencies that phenocopy compound activity.
When silence is informative
A compound that produces no detectable transcriptional signature is not necessarily inactive. It may act below the transcriptional EC50, require a longer treatment window, or operate through mechanisms that do not immediately trigger transcriptional responses. Before concluding inactivity, it is important to confirm that positive controls behave as expected and to consider orthogonal phenotypes, for example from Cell Painting. A compound that produces a morphological phenotype but minimal transcriptional change may still provide mechanistic clues about how it perturbs cellular processes.
Conclusion
DRUG-seq can produce mechanistically rich datasets when experiments are designed with the underlying biology in mind from the outset. The investment in a well-designed pilot (including cell model validation, time-course optimization, and dose-range finding with reference compounds) is typically small relative to the cost of a full screening campaign, yet it has a disproportionate impact on the interpretability of the resulting data.
Screens built on a robust experimental foundation are far more likely to generate clear, reproducible transcriptional signatures that support mechanism-of-action analysis and guide downstream follow-up. Ultimately, interpretable biology is not the result of downstream data processing alone, but of thoughtful experimental design implemented before the first plate is run.
For research use only. Not for use in diagnostic procedures.
References:
1. Ye, C., et al. (2018). DRUG-seq for miniaturized high-throughput transcriptome profiling in drug discovery. Nat Commun. 9(1):4307. doi: 10.1038/s41467-018-06500-x.
2. Li, J., et al. (2022). DRUG-seq Provides Unbiased Biological Activity Readouts for Neuroscience Drug Discovery. ACS Chem Biol. 17(6):1401-1414. doi: 10.1021/acschembio.1c00920.
3. Bray, M.A., et al. (2016). Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc. 11(9):1757-74. doi: 10.1038/nprot.2016.105.
4. Ghandi, M., et al. (2019). Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 569(7757):503-508. doi: 10.1038/s41586-019-1186-3.
5. GTEx Consortium. (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 369(6509):1318-1330. doi: 10.1126/science.aaz1776.
6. Love, M.I., et al. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12):550. doi: 10.1186/s13059-014-0550-8.
7. Liberzon, A., et al. (2015). The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1(6):417-425. doi: 10.1016/j.cels.2015.12.004.
8. Subramanian, A., et al. (2017). A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 171(6):1437-1452.e17. doi: 10.1016/j.cell.2017.10.049.