Abstract
Background: Droplet-based single-cell RNA sequence analyses assume that all acquired RNAs are endogenous to cells. However, any cell-free RNAs contained within the input solution are also captured by these assays. This sequencing of cell-free RNA constitutes a background contamination that confounds the biological interpretation of single-cell transcriptomic data.

Results: We demonstrate that contamination from this "soup" of cell-free RNAs is ubiquitous, with experiment-specific variations in composition and magnitude. We present a method, SoupX, for quantifying the extent of the contamination and estimating "background-corrected" cell expression profiles that seamlessly integrate with existing downstream analysis tools. Applying this method to several datasets using multiple droplet sequencing technologies, we demonstrate that its application improves biological interpretation of otherwise misleading data, as well as improving quality control metrics.

Conclusions: We present SoupX, a tool for removing ambient RNA contamination from droplet-based single-cell RNA sequencing experiments. This tool has broad applicability, and its application can improve the biological utility of existing and future datasets.
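The general idea of background correction described in the abstract can be illustrated with a small sketch. This is not the SoupX implementation or its API (SoupX is distributed as an R package); the function names, the toy counts, and the fixed contamination fraction `rho` below are all assumptions made for illustration. The sketch assumes the ambient profile can be approximated by pooling counts from droplets that contain no cell, and that a per-cell contamination fraction is already known, whereas the actual method estimates that fraction from the data.

```python
# Illustrative sketch only: hypothetical helper names, toy data, assumed rho.

def estimate_soup_profile(empty_droplets):
    """Pool counts over cell-free droplets and normalise to per-gene fractions."""
    n_genes = len(empty_droplets[0])
    totals = [sum(droplet[g] for droplet in empty_droplets) for g in range(n_genes)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def correct_cell(counts, soup_profile, rho):
    """Subtract the expected ambient counts from one cell's expression profile.

    The expected ambient count for each gene is rho * (total UMIs in the cell)
    * (soup fraction for that gene). Results are floored at zero.
    """
    n_umis = sum(counts)
    return [max(c - rho * n_umis * p, 0.0) for c, p in zip(counts, soup_profile)]


# Toy example: three genes, three empty droplets, one cell.
empty = [[8, 1, 1], [12, 3, 0], [20, 4, 1]]
soup = estimate_soup_profile(empty)

cell = [10, 80, 10]
corrected = correct_cell(cell, soup, rho=0.1)  # rho assumed known here
```

In this toy case the first gene dominates the ambient profile, so most of its counts in the cell are attributed to the background, while the second gene, which is rare in the soup, is left almost unchanged. Simple subtraction yields non-integer values; a real tool would need a principled way to decide which counts to remove.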
Publication Info
- Year: 2020
- Type: article
- Volume: 9
- Issue: 12
- Citations: 1413
- Access: Closed
Identifiers
- DOI: 10.1093/gigascience/giaa151