Cell Ranger 7.0, printed on 11/21/2024
Cell Ranger 7.0 introduces support for Fixed RNA Profiling (FRP) with the cellranger multi pipeline. While some steps are similar to the existing algorithm for Gene Expression, there are a few differences illustrated and described below for singleplex and multiplex Fixed RNA Profiling experiments.
Singleplex Fixed RNA Profiling steps:
Multiplex Fixed RNA Profiling steps:
In singleplex experiments, the cell barcode is composed only of the 10x GEM Barcode, and all reads with valid barcodes are considered to be part of the sample. This configuration does not use a [samples] section in the multi config file.
In multiplex experiments, the cell barcode is composed of both the 10x GEM Barcode and the Probe Barcode. Reads must have valid 10x GEM and Probe Barcodes in order to be assigned to a sample. The corrected Probe Barcode sequence is mapped to its corresponding Probe Barcode ID. If this ID has been specified within the probe_barcode_ids column of the [samples] section, the read is assigned to the sample paired with this Probe Barcode ID.
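The sample assignment rule for multiplex experiments can be sketched as a simple lookup. This is an illustration only, not Cell Ranger's implementation: the function names, the sample IDs, and the already-parsed dictionary standing in for the [samples] section are all assumptions.

```python
def build_barcode_to_sample(samples):
    """Invert {sample_id: [probe_barcode_id, ...]} into {probe_barcode_id: sample_id}."""
    barcode_to_sample = {}
    for sample_id, barcode_ids in samples.items():
        for barcode_id in barcode_ids:
            if barcode_id in barcode_to_sample:
                raise ValueError(f"{barcode_id} is listed for more than one sample")
            barcode_to_sample[barcode_id] = sample_id
    return barcode_to_sample


def assign_read(gem_barcode_valid, probe_barcode_id, barcode_to_sample):
    """Return the sample for a read, or None if the read cannot be assigned.

    probe_barcode_id is the ID of the corrected Probe Barcode, or None when
    the Probe Barcode could not be corrected to a valid sequence.
    """
    if not gem_barcode_valid or probe_barcode_id is None:
        return None
    # IDs that are not listed in the [samples] section map to no sample.
    return barcode_to_sample.get(probe_barcode_id)


# Hypothetical stand-in for a parsed [samples] section.
SAMPLES = {"sample_a": ["BC001"], "sample_b": ["BC002", "BC003"]}
LOOKUP = build_barcode_to_sample(SAMPLES)
```

With this table, a read with a valid 10x GEM Barcode and Probe Barcode ID BC003 goes to sample_b, while a read carrying an ID that is absent from the table is left unassigned.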
Note that each Probe Barcode ID (e.g., BC001) is actually a mix of four distinct barcode sequences, which ensures balanced base composition during sequencing. Raw Probe Barcode sequences are first corrected independently of the corresponding 10x GEM Barcode using the full list of all Probe Barcode sequences from probe construction. These corrected sequences are then collapsed such that the Probe Barcode sequence used within the cell barcode is always lexicographically the first barcode amongst the four barcodes from a given mixture. For example, if the four Probe Barcodes are ACTTTAGG, CGAGGGTA, GACACTAC, and TTGCACCT, all instances of these Probe Barcodes will be converted to ACTTTAGG for use within the Cell Ranger pipeline and outputs (e.g., feature-barcode matrix). The original sequence will be maintained in uncorrected form within the raw cell barcode tag (CR) included in the BAM file.
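The collapsing step can be illustrated with a short sketch, assuming the mixtures are available as a mapping from Probe Barcode ID to its four sequences. The ID BC_EXAMPLE and the function names are hypothetical; the four sequences are the ones from the example above.

```python
def build_collapse_map(mixtures):
    """Map each sequence to the lexicographically first sequence of its mixture."""
    collapse = {}
    for sequences in mixtures.values():
        representative = min(sequences)  # string comparison is lexicographic
        for seq in sequences:
            collapse[seq] = representative
    return collapse


# Hypothetical mixture table with a placeholder ID.
MIXTURES = {"BC_EXAMPLE": ["ACTTTAGG", "CGAGGGTA", "GACACTAC", "TTGCACCT"]}
COLLAPSE = build_collapse_map(MIXTURES)


def collapse_probe_barcode(corrected_seq):
    """Return the sequence used within the cell barcode for a corrected sequence."""
    return COLLAPSE[corrected_seq]
```

Here collapse_probe_barcode("GACACTAC") returns ACTTTAGG, matching the example in the text. The sketch expects a sequence that has already been corrected; an uncorrected sequence raises KeyError.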
Cell Ranger 7.0 introduces support for analyzing fixed RNA using the cellranger multi pipeline. During the fixed RNA workflow, whole transcriptome probe panels, consisting of multiple pairs of probes for each targeted gene, are added to the tissue. These probe pairs hybridize to their target transcript and are then ligated together. To analyze fixed RNA data, the probe-set parameter must be used to specify a probe set reference CSV file. When this parameter is used, Cell Ranger will count ligation events using the probe aligner algorithm (reads are also aligned to the reference transcriptome using STAR, but only to determine their alignment positions and CIGAR strings; STAR alignments are not used to assign reads to genes for fixed RNA data). Distinct ligation events are counted for Fixed RNA Profiling rather than distinct transcripts. Sequencing reads are aligned to the probe set reference and assigned to the genes they target. For genes targeted by more than one probe pair, the UMI counts of all probe pairs targeting that gene are summed up and output as the UMI count for that gene in the feature-barcode matrix (for more information, see this article).
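The per-gene summation can be sketched as follows for a single cell barcode. The probe IDs, gene names, and function name are made up for illustration, and the sketch assumes the UMI counts per probe pair have already been computed.

```python
from collections import defaultdict


def gene_umi_counts(probe_pair_counts, probe_to_gene):
    """Sum the UMI counts of all probe pairs that target the same gene."""
    totals = defaultdict(int)
    for probe_id, umi_count in probe_pair_counts.items():
        totals[probe_to_gene[probe_id]] += umi_count
    return dict(totals)


# Hypothetical probe-to-gene table and per-probe-pair UMI counts for one cell.
PROBE_TO_GENE = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
COUNTS = {"probe_1": 3, "probe_2": 5, "probe_3": 2}
GENE_COUNTS = gene_umi_counts(COUNTS, PROBE_TO_GENE)
```

In this toy input, GENE_A is targeted by two probe pairs with 3 and 5 UMIs, so its entry in the matrix would be 8, while GENE_B keeps its single count of 2.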
The probe alignment algorithm is similar to a seed-and-extend aligner, where each half of the read is a seed, as described in detail below:
The BAM tag pr:Z reports a semicolon-separated list of probe IDs. See BAM alignment tags for a detailed description.
The cell calling algorithm is the same as described on the Gene Expression algorithm page, but done for each sample individually.
Probes that are predicted to have off-target activity to homologous genes or sequences are excluded from analysis by default (all probes are present in the raw matrix file). These probes are marked with FALSE in the included column of the probe set reference CSV. Any gene that has at least one probe with predicted off-target activity will be excluded from filtered outputs. Setting filter-probes to false in the multi config file for cellranger multi causes UMI counts from all non-deprecated probes, including those with predicted off-target activity, to be used in the analysis. Probes whose ID is prefixed with DEPRECATED are always excluded from the analysis. Please see the Probe Set Overview for details on the probes.
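The filtering rules above can be summarized in a small sketch. It assumes the probe set reference CSV has already been parsed into dictionaries with the hypothetical keys probe_id, gene_id, and included; it also assumes that deprecated probes are simply skipped and do not by themselves remove their gene, which the text does not state.

```python
def genes_in_filtered_output(probes, filter_probes=True):
    """Return the sorted gene IDs that remain after probe filtering."""
    kept, dropped = set(), set()
    for probe in probes:
        if probe["probe_id"].startswith("DEPRECATED"):
            continue  # always excluded, regardless of filter_probes
        if filter_probes and not probe["included"]:
            # One probe with predicted off-target activity removes the gene.
            dropped.add(probe["gene_id"])
        else:
            kept.add(probe["gene_id"])
    return sorted(kept - dropped)


# Hypothetical parsed probe set reference.
PROBES = [
    {"probe_id": "p1", "gene_id": "GENE_A", "included": True},
    {"probe_id": "p2", "gene_id": "GENE_A", "included": False},
    {"probe_id": "p3", "gene_id": "GENE_B", "included": True},
    {"probe_id": "DEPRECATED_p4", "gene_id": "GENE_C", "included": True},
]
```

With the default setting only GENE_B survives, because GENE_A has one probe marked FALSE. With filter_probes=False, both GENE_A and GENE_B are kept, and GENE_C stays out in either case because its only probe is deprecated.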
After cell calling, the 10x GEM and Probe Barcodes are used to estimate the mean of the Poisson distribution that should describe the number of Probe Barcodes observed per GEM under optimal chip loading. The mean of this distribution increases roughly linearly with cell load. From this expected distribution, Cell Ranger estimates the 99.9th percentile of the number of Probe Barcodes observed per GEM and removes all cells belonging to GEMs that exceed this threshold in the observed distribution, which mitigates higher-than-expected barcode collision rates.
High amounts of filtering due to high occupancy GEMs may reflect partial clogs/wetting failures or significant deviations from the recommended chip loading protocol. In an optimal experiment, the percentage of cells passing high occupancy GEM filtering that can be used for analysis is usually more than 90%.
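The thresholding step of high occupancy GEM filtering can be sketched as below. How Cell Ranger estimates the Poisson mean is not described here, so the sketch takes the mean as a given input; the function names and the toy GEM labels are assumptions.

```python
import math


def poisson_quantile(mean, q=0.999):
    """Smallest k such that P(X <= k) >= q for X ~ Poisson(mean)."""
    pmf = math.exp(-mean)  # P(X = 0); underflows for very large means
    cdf = pmf
    k = 0
    while cdf < q:
        k += 1
        pmf *= mean / k  # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k


def high_occupancy_gems(barcodes_per_gem, mean, q=0.999):
    """Return the GEMs whose observed Probe Barcode count exceeds the threshold."""
    threshold = poisson_quantile(mean, q)
    return {gem for gem, n in barcodes_per_gem.items() if n > threshold}


# Toy input: observed number of Probe Barcodes in three GEMs.
OBSERVED = {"gem_1": 1, "gem_2": 6, "gem_3": 5}
```

For an assumed mean of 1.0, the 99.9th percentile is 5 Probe Barcodes per GEM, so only gem_2 would be flagged and its cells removed from the filtered outputs.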
The lefthand plot below shows an example of a multiplex Fixed RNA Profiling experiment with 16 Probe Barcodes aiming to capture 100,000 cells. Cells contained in GEMs falling in the red zone in the observed distribution will be removed from Cell Ranger's filtered outputs (1.14% removed, 98.86% used). Note that the Y-axes of the histograms are log10 scaled.
For comparison, the righthand plot illustrates an experiment with a lower targeted cell number (5,000 cells) with 16 Probe Barcodes, for which 0.12% of cells are removed and 99.88% can be used.
The secondary analyses are described on the Gene Expression algorithm page.