10x Genomics
Chromium Single Cell Multiome ATAC + Gene Exp.
Cell Ranger ARC4.2, printed on 11/05/2024
Multiome Data Concepts
If you've used Loupe Browser before to analyze gene expression or ATAC data, you will find exploring multiome
data familiar in some ways, and different in others. The Cell Ranger ARC algorithm documentation
covers algorithms and analysis in more detail, but in short, here are some key things to keep in mind when looking at
Multiome ATAC + Gene Expression data:
- Both ATAC and gene expression data are available for the same cells. This means that many types of
features are available for the same cell, including UMI counts per cell, cut sites per cell,
transcription factor motif z-scores, and aggregate features (see below).
- Two pairs of t-SNE and UMAP projections are computed: one pair from the gene expression data and one pair from the ATAC data.
- Peaks are genomic regions where there were significant upticks in fragment cut sites, which indicate regions of open chromatin. They are named by their location (e.g., "chr1:10244-10510").
  - Unlike genes, peaks are likely to be different between different datasets, or even the same dataset sequenced at different depths.
  - There are typically more distinct peaks than there are genes.
  - The dynamic range of gene expression per cell is typically much wider than the dynamic range of cut sites per peak per cell.
- The main new datatype in a Multiome ATAC + Gene Expression dataset is feature linkages, which are correlations between
gene expression and patterns of open chromatin across all cells in the dataset. Linked features can be gene-to-peak,
or peak-to-peak. Features must be within a megabase of each other to be linked.
- In addition to genes and peaks, there are several aggregate feature types which can also be used to differentiate
cells:
  - Nearby gene sums, which are the sums of cut sites per cell (within peaks) that are close to any transcript of that gene.
  This is computed in all datasets with ATAC data, including Single Cell ATAC. The direct readout of gene expression
  data available in Single Cell Multiome ATAC + Gene Expression datasets may be more useful, but this
  is available for comparison.
  - Promoter sums, which are the sums of cut sites per cell (within peaks) that are close to one of the
  transcription start sites for that gene. These features are named "(Gene) Sum". Not all peaks
  are associated with a gene.
  - Transcription factor motifs, which are the sums of cut sites per cell that fall within peaks
  associated with a motif by the Cell Ranger ARC pipeline. Motif features are named after the
  motifs themselves (e.g., "SPI1"). A peak is usually associated with multiple motifs.
- A Single Cell Multiome ATAC + Gene Expression dataset takes up several times as much disk space (per cell) as a gene expression dataset,
and slightly more than a Single Cell ATAC dataset.
- To see fragment locations per cluster in high resolution, you need access to the atac_fragments.tsv.gz file
for that run, generated by the Cell Ranger ARC pipeline. These files are typically several times larger than
the .cloupe file, which is why they are not bundled. You can either specify the location of this file on a locally mounted file system, or on the web via a URL.
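
To make the peak naming and feature linkage concepts above more concrete, here is a minimal Python sketch. It is not the Cell Ranger ARC implementation. The function names (parse_peak, can_link, pearson) are hypothetical, the distance is assumed to be measured from the peak midpoint to a single transcription start site, and a plain Pearson correlation stands in for whatever statistic the pipeline actually uses. The per-cell counts are made up for illustration.

```python
import math
import re

# "Features must be within a megabase of each other to be linked."
MAX_LINK_DISTANCE = 1_000_000

# Peak names follow the "chrom:start-end" pattern, e.g. "chr1:10244-10510".
PEAK_NAME = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_peak(name):
    """Split a peak name such as 'chr1:10244-10510' into (chrom, start, end)."""
    match = PEAK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a peak name: {name!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def can_link(peak_name, gene_chrom, gene_tss):
    """Assumed eligibility rule: same chromosome, and the peak midpoint
    lies within one megabase of the gene's transcription start site."""
    chrom, start, end = parse_peak(peak_name)
    midpoint = (start + end) // 2
    return chrom == gene_chrom and abs(midpoint - gene_tss) <= MAX_LINK_DISTANCE


def pearson(xs, ys):
    """Pearson correlation of two equal-length per-cell vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length vectors with at least two cells")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Toy data for five cells (invented for this example).
gene_umis = [0, 2, 5, 1, 7]   # UMI counts for one gene, per cell
peak_cuts = [0, 1, 2, 0, 3]   # cut sites in one peak, per cell

if can_link("chr1:10244-10510", "chr1", 65_000):
    score = pearson(gene_umis, peak_cuts)
    print(f"illustrative linkage score: {score:.3f}")
```

A peak-to-peak linkage could be sketched the same way by comparing two peaks' per-cell cut site vectors and applying the same distance check between the two peak locations.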