Cell Ranger3.1, printed on 11/21/2024
Cell Ranger processes all Feature Barcoding data through a basic counting pipeline that determines the count of each feature in each cell. This analysis is done by the cellranger count pipeline. The pipeline outputs a unified feature-barcode matrix that contains gene expression counts alongside Feature Barcode counts for each cell barcode. The feature-barcode matrix replaces the gene-barcode matrix emitted by older versions of Cell Ranger.
The pipeline first extracts and corrects the cell barcode and UMI from the feature library using the same methods as gene expression read processing. It then matches the Feature Barcoding read against the list of features declared in the Feature Barcode Reference. The counts for each feature are available in the feature-barcode matrix output files and in the Loupe Browser output file.
To enable Feature Barcoding analysis, cellranger count needs two new inputs:
Libraries CSV File: passed to cellranger count with the --libraries flag, it declares the FASTQ files and library type for each input dataset. In a typical Feature Barcoding analysis there will be two input libraries: one for the normal single-cell gene expression reads, and one for the Feature Barcoding reads. This argument replaces the --fastqs argument.
Feature Reference CSV File: passed to cellranger count with the --feature-ref flag, it declares the set of Feature Barcoding reagents in use in the experiment. For each unique Feature Barcode used, this file declares a feature name and identifier, the unique Feature Barcode sequence associated with this reagent, and a pattern indicating how to extract the Feature Barcode sequence from the read sequence. See Feature Barcode Reference for details on how to construct the feature reference.
After creating the CSV files, run cellranger count:
$ cd /home/jdoe/runs
$ cellranger count --id=sample345 \
                   --libraries=library.csv \
                   --transcriptome=/opt/refdata-cellranger-GRCh38-3.0.0 \
                   --feature-ref=feature_ref.csv \
                   --expect-cells=1000
The complete set of arguments to cellranger count is covered in Single-Sample Analysis.
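For scripted runs, the same invocation can be assembled programmatically. The following is a minimal Python sketch, not part of Cell Ranger: the helper name build_count_command is hypothetical, and only the flags shown in the command above are used.

```python
import shlex

def build_count_command(run_id, libraries_csv, transcriptome,
                        feature_ref_csv, expect_cells=None):
    """Assemble the argument list for a Feature Barcoding run of cellranger count.

    Hypothetical helper: it only mirrors the flags shown in the example command.
    """
    cmd = [
        "cellranger", "count",
        f"--id={run_id}",
        f"--libraries={libraries_csv}",    # used instead of --fastqs
        f"--transcriptome={transcriptome}",
        f"--feature-ref={feature_ref_csv}",
    ]
    if expect_cells is not None:
        cmd.append(f"--expect-cells={expect_cells}")
    return cmd

cmd = build_count_command(
    "sample345",
    "library.csv",
    "/opt/refdata-cellranger-GRCh38-3.0.0",
    "feature_ref.csv",
    expect_cells=1000,
)
# Print the command for review; it is not executed here.
print(shlex.join(cmd))
```

To launch the run from Python, the list could be passed to subprocess.run(cmd, check=True) from the desired working directory.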
When inputting Feature Barcode data to Cell Ranger via the Libraries CSV File, you must declare the library_type of each library. Specific values for library_type will enable additional downstream processing, specifically for CRISPR Guide Capture and Antibody Capture. The following table outlines the types of libraries that can be specified and what they mean for the downstream processing.
library_type | Description |
---|---|
Antibody Capture | For use with experiments measuring cell surface protein expression levels via an antibody and/or antigen-multimer staining assay. Enables a t-SNE projection of the cells using only the Antibody Capture / Cell Surface Protein feature counts. This projection is available in an output file and in Loupe Browser. See the Antibody Algorithms page for more details. |
CRISPR Guide Capture | Enables an analysis of gene expression changes caused by the presence of CRISPR perturbations, in a Perturb-Seq style assay. See the CRISPR Overview page for more details. This mode also creates a t-SNE projection using only the CRISPR guide counts. This projection is available in an output file and in Loupe Browser. |
Custom | Provides processing of the Feature Barcoding reads and a basic summary of the sequencing quality and library quality, but performs no special processing of the Feature Barcoding counts. |
The Libraries CSV File declares the input FASTQ data for the libraries that make up a Feature Barcoding experiment. This will include one library containing Single Cell Gene Expression reads, and one or more libraries containing Feature Barcoding reads. To use cellranger count in Feature Barcoding mode, you must create a Libraries CSV File and pass it with the --libraries flag.
The following table describes the columns of the Libraries CSV File.
Column Name | Description |
---|---|
fastqs | A fully qualified path to the directory containing the demultiplexed FASTQ files for this sample. Analogous to the --fastqs arg to cellranger count. This field does not accept comma-delimited paths. If you have multiple sets of fastqs for this library, add an additional row, and use the same library_type value. |
sample | Same as the --sample arg to cellranger count. Sample name assigned in the bcl2fastq sample sheet. |
library_type | The FASTQ data will be interpreted using the rows from the feature reference file that have a ‘feature_type’ that matches this library_type. This field is case-sensitive, and must match a valid library type as described in the Library / Feature Types section. Must be Gene Expression for the gene expression libraries. Must be one of Custom, Antibody Capture, or CRISPR Guide Capture for Feature Barcoding libraries. |
Note: Each unique sample ID requires a separate line in the Libraries CSV File.
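The column rules above lend themselves to a quick check before a run is started. The following Python sketch is an illustration only, not Cell Ranger's own validation; the function name check_libraries_csv is hypothetical, and it assumes the file has a header row naming the three columns.

```python
import csv
import io

# Library types named in the table above; the comparison is case-sensitive.
VALID_LIBRARY_TYPES = {
    "Gene Expression",
    "Antibody Capture",
    "CRISPR Guide Capture",
    "Custom",
}
REQUIRED_COLUMNS = ["fastqs", "sample", "library_type"]

def check_libraries_csv(text):
    """Return a list of problems found in the text of a Libraries CSV."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        fastqs = row["fastqs"] or ""
        library_type = row["library_type"] or ""
        if "," in fastqs:
            # Comma-delimited paths are not accepted; use one row per path.
            problems.append(f"line {line_no}: fastqs holds more than one path")
        if library_type not in VALID_LIBRARY_TYPES:
            problems.append(f"line {line_no}: unknown library_type {library_type!r}")
    return problems

good = (
    "fastqs,sample,library_type\n"
    "/opt/foo/,GEX_sample1,Gene Expression\n"
    "/opt/foo/,CRISPR_sample1,CRISPR Guide Capture\n"
)
bad = (
    "fastqs,sample,library_type\n"
    "/opt/foo/,GEX_sample1,gene expression\n"
)
print(check_libraries_csv(good))
print(check_libraries_csv(bad))
```

The second call reports the lowercase library type, since the field is case-sensitive.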
Gene expression + CRISPR libraries. In this example we've demultiplexed the sequencing data from two libraries named GEX_sample1 and CRISPR_sample1 on the bcl2fastq / mkfastq sample sheet. This generated FASTQ files named GEX_sample1_S0_L001_001.fastq.gz and CRISPR_sample1_S0_L001_001.fastq.gz in the path /opt/foo. We pass the FASTQ sample names and paths to Cell Ranger with the appropriate library types:
fastqs | sample | library_type |
---|---|---|
/opt/foo/ | GEX_sample1 | Gene Expression |
/opt/foo/ | CRISPR_sample1 | CRISPR Guide Capture |
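On disk, the table above is a comma-separated file with a header row. The following Python sketch writes it with the standard csv module; the helper name libraries_csv_text is hypothetical, and the header row is assumed to consist of the three column names described earlier.

```python
import csv
import io

def libraries_csv_text(rows):
    """Render (fastqs, sample, library_type) tuples as Libraries CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["fastqs", "sample", "library_type"])
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    ("/opt/foo/", "GEX_sample1", "Gene Expression"),
    ("/opt/foo/", "CRISPR_sample1", "CRISPR Guide Capture"),
]
text = libraries_csv_text(rows)
print(text)
# fastqs,sample,library_type
# /opt/foo/,GEX_sample1,Gene Expression
# /opt/foo/,CRISPR_sample1,CRISPR Guide Capture
```

Saving this text as library.csv gives a file that can be passed with the --libraries flag.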
Gene expression + Antibody libraries. In this example we've demultiplexed the sequencing data from two libraries named GEX_sample2 and Ab_sample2 on the bcl2fastq / mkfastq sample sheet. This generated FASTQ files named GEX_sample2_S0_L001_001.fastq.gz and Ab_sample2_S0_L001_001.fastq.gz in the path /opt/foo. We pass the FASTQ sample names to Cell Ranger with the appropriate library types:
fastqs | sample | library_type |
---|---|---|
/opt/foo/ | GEX_sample2 | Gene Expression |
/opt/foo/ | Ab_sample2 | Antibody Capture |
If your assay scheme creates a library containing multiple library_types, for example if you're using CRISPR Guide Capture and Antibody Capture features, you will need to select a single library_type for the library when inputting it into the Libraries CSV File. This will provide only one kind of specialized library analysis. To get multiple specialized analyses, you will need to run Cell Ranger multiple times, passing different library_type values in the Libraries CSV File. This is a limitation of Cell Ranger 3.0 that will be lifted in future releases. Regardless of the library_type specified, the feature-barcode matrix outputs will contain counts for all specified features.
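The multiple-run workaround can be scripted by generating one set of Libraries CSV rows per specialized analysis. The following Python sketch shows one way to do that; the function name and the sample names GEX_sample3 and FB_sample3 are hypothetical and do not come from any 10x dataset.

```python
def rows_per_analysis(rows, mixed_sample, library_types):
    """Build one set of Libraries CSV rows per requested library_type.

    In each variant, only the row(s) whose sample equals mixed_sample are
    relabelled; all other libraries keep their original library_type.
    """
    variants = {}
    for library_type in library_types:
        variants[library_type] = [
            (fastqs, sample, library_type if sample == mixed_sample else original)
            for fastqs, sample, original in rows
        ]
    return variants

# Hypothetical experiment: one library carries both CRISPR and antibody features.
rows = [
    ("/opt/foo/", "GEX_sample3", "Gene Expression"),
    ("/opt/foo/", "FB_sample3", "CRISPR Guide Capture"),
]
variants = rows_per_analysis(
    rows, "FB_sample3", ["CRISPR Guide Capture", "Antibody Capture"]
)
for library_type, variant_rows in variants.items():
    print(library_type, variant_rows)
```

Each variant would be written to its own Libraries CSV File and used in a separate cellranger count run with a distinct --id.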
A Feature Reference CSV File is required when processing Feature Barcoding data. It declares the molecule structure and unique Feature Barcode sequence of each feature present in your experiment. Each line of the CSV declares one unique Feature Barcode. The Feature Reference CSV File is passed to cellranger count with the --feature-ref flag. Please note that the CSV must not contain characters outside of the ASCII range.
This table describes the columns in the Feature Reference CSV File. Example files can be found below.
Column Name | Description |
---|---|
id | Unique ID for this feature. Must not contain whitespace, quote or comma characters. Each ID must be unique and must not collide with a gene identifier from the transcriptome. |
name | Human-readable name for this feature. Must not contain whitespace. This name will be displayed in Loupe Browser. |
read | Specifies which RNA sequencing read contains the Feature Barcode sequence. Must be R1 or R2. Note: in most cases R2 is the correct read. |
pattern | Specifies how to extract the Feature Barcode sequence from the read. See the Barcode Extraction Pattern section below for details. |
sequence | Nucleotide barcode sequence associated with this feature. E.g., antibody barcode or sgRNA protospacer sequence. |
feature_type | Type of the feature. See the Library/Feature Types section for details on allowed values of this field. FASTQ data specified in the Libraries CSV File with a library_type that matches the feature_type will be scanned for occurrences of this feature. Each feature type in the feature reference must match a library_type entry in the Libraries CSV File. This field is case-sensitive. |
target_gene_id | (Optional) Reference gene identifier of the target gene of a CRISPR guide RNA. A gene with this id must exist in the reference transcriptome. Providing target_gene_id and target_gene_name will enable the pipeline to perform differential expression analysis, assuming that control ("Non-Targeting") guides are also specified. Non-targeting guides must contain the value "Non-Targeting" in the "target_gene_id" and "target_gene_name" fields. See the CRISPR Overview section for more details. |
target_gene_name | (Optional) Gene name of target gene of a CRISPR guide RNA. The gene name corresponding to the gene referenced in the target_gene_id field must match the gene name given here. See the CRISPR Overview section for more details. |
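Several of the column rules in this table can be checked mechanically before running the pipeline. The following Python sketch is an illustration under stated assumptions, not Cell Ranger's own validation: the function name is hypothetical, "quote" is taken to mean both single and double quote characters, and collisions with gene identifiers in the transcriptome are not checked.

```python
import csv
import io

REQUIRED_COLUMNS = ["id", "name", "read", "pattern", "sequence", "feature_type"]
FEATURE_TYPES = {"Antibody Capture", "CRISPR Guide Capture", "Custom"}
NON_TARGETING = "Non-Targeting"

def check_feature_reference(text):
    """Return a list of problems found in the text of a Feature Reference CSV."""
    problems = []
    if not text.isascii():
        problems.append("file contains characters outside the ASCII range")
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return problems + ["missing column(s): " + ", ".join(missing)]
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):
        feature_id = row["id"] or ""
        name = row["name"] or ""
        if not feature_id or any(c.isspace() or c in "\"'," for c in feature_id):
            problems.append(f"line {line_no}: id is empty or has whitespace, quote or comma")
        if feature_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {feature_id!r}")
        seen_ids.add(feature_id)
        if not name or any(c.isspace() for c in name):
            problems.append(f"line {line_no}: name is empty or contains whitespace")
        if row["read"] not in ("R1", "R2"):
            problems.append(f"line {line_no}: read must be R1 or R2")
        if row["feature_type"] not in FEATURE_TYPES:
            problems.append(f"line {line_no}: unknown feature_type {row['feature_type']!r}")
        # The target columns are optional; control guides need the marker in both.
        target_id = row.get("target_gene_id") or ""
        target_name = row.get("target_gene_name") or ""
        if (target_id == NON_TARGETING) != (target_name == NON_TARGETING):
            problems.append(f"line {line_no}: Non-Targeting must appear in both target columns")
    return problems

# Hypothetical rows for illustration; the sequences are placeholders.
header = "id,name,read,pattern,sequence,feature_type\n"
good = header + "AB1,Ab_one,R2,5PNNNNNNNNNN(BC),AAACCCGGGTTTAAA,Antibody Capture\n"
bad = (
    header
    + "AB1,Ab one,R3,5P(BC),ACGT,Antibody Capture\n"
    + "AB1,Ab_two,R2,5P(BC),ACGT,antibody capture\n"
)
print(check_feature_reference(good))
for problem in check_feature_reference(bad):
    print(problem)
```

The second file triggers four messages: a name containing whitespace, an invalid read, a duplicate id, and a feature_type with the wrong case.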
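The pattern field of the feature reference defines how to locate the Feature Barcode within a read. The Feature Barcode may appear at a known offset with respect to the start or end of the read or may appear at a fixed position relative to a known anchor sequence. The patterns shown on this page combine the following elements: 5P, which anchors the pattern to the start of the read; N, which matches any single base; constant bases (A, C, G and T); and (BC), which marks the position of the Feature Barcode.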
Any constant sequences made up of A, C, G and T in the pattern must match exactly in the read sequence. Any N in the pattern is allowed to match a single arbitrary base. A modest number of fixed bases should be used to minimize the chance of a sequencing error disrupting the match. The fixed sequence should also be long enough to uniquely identify the position of the Feature Barcode. For feature types that require a non-N anchor, we recommend 12bp-20bp of constant sequence. The extracted Feature Barcode sequences are corrected up to a Hamming distance of 1 using the 10x barcode correction algorithm that is used to correct cell barcodes.
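To make the idea of correction within a Hamming distance of 1 concrete, the following Python sketch matches an extracted sequence against the known Feature Barcode sequences. It is a deliberately simplified stand-in and not the 10x barcode correction algorithm, which is not reproduced here: it accepts an exact match, or a single-mismatch match only when exactly one known sequence qualifies. The function names and example sequences are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def correct_barcode(observed, known_sequences):
    """Return the matching known sequence, or None if there is no clear match.

    Simplified rule (an assumption, not the 10x algorithm): exact matches win;
    otherwise the observed sequence is corrected only if exactly one known
    sequence lies at Hamming distance 1.
    """
    if observed in known_sequences:
        return observed
    candidates = [
        seq for seq in known_sequences
        if len(seq) == len(observed) and hamming(observed, seq) == 1
    ]
    return candidates[0] if len(candidates) == 1 else None

# Placeholder barcodes for illustration only.
known = {"AAACCCGGG", "TTTGGGCCC"}
print(correct_barcode("AAACCCGGG", known))  # exact match
print(correct_barcode("AAACCCGGT", known))  # one mismatch, corrected
print(correct_barcode("AAACCCTTT", known))  # three mismatches, rejected
```

Rejecting ambiguous cases, where two known sequences are both one mismatch away, avoids assigning a read to the wrong feature in this simplified scheme.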
TotalSeq™-B is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' v3 assay. The Feature Barcode sequence appears at a fixed position (base 10) in the R2 read.
read | pattern |
---|---|
R2 | 5PNNNNNNNNNN(BC) |
Dataset Example
Example TotalSeq™-B Feature Reference CSV
Please note, this is a pre-release set of TotalSeq-B antibodies. The Feature
Barcode sequences have since changed. Please refer to
https://www.biolegend.com/totalseq for the latest conjugated Feature Barcode
information.
TotalSeq™-C is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 5' assay. The Feature Barcode sequence appears at a fixed position (base 10) in the R2 read.
read | pattern |
---|---|
R2 | 5PNNNNNNNNNN(BC) |
Dataset Example
Example TotalSeq™-C Feature Reference CSV
The feature reference for Immudex's dMHC Dextramer® libraries with dCODE Dextramers has the same feature barcode pattern as TotalSeq™-C, so the same feature reference example for TotalSeq™-C can also be used for MHC Dextramer® libraries. Use "Antibody Capture" in the feature_type column for dextramer or multimer reagents.
TotalSeq™-A is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' v2 and Single Cell 3' v3 kits. The Feature Barcode sequence appears at the start of the R2 read.
Although TotalSeq™-A can be used with the CITE-Seq assay, CITE-Seq is not a 10x supported assay. Please contact New York Genome Center or BioLegend for assistance with the assay or software.
read | pattern |
---|---|
R2 | 5P(BC) |
Example TotalSeq™-A Feature Reference CSV
In CRISPR Guide Capture assays, the Feature Barcode sequence is the CRISPR protospacer sequence. The protospacer is followed by a downstream constant sequence in the guide RNA which is used as an anchor to identify the location of the protospacer. We recommend using a 12bp-20bp constant sequence that can be uniquely identified, but is short enough that it is unlikely to be disrupted by a sequencing error. In the example Feature Reference CSV file we declare six guide RNA features with six distinct barcode / protospacer sequences. We use the target_gene_id and target_gene_name columns to declare the target gene of each guide RNA, for use in downstream CRISPR perturbation analysis. Two guides are declared with target_gene_id as Non-Targeting. Cells containing Non-Targeting guides will be used as controls for CRISPR perturbation analysis. The four remaining guides target two genes.
read | pattern |
---|---|
R2 | (BC)GACCAGGATGGGCACCACCC |
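The patterns shown on this page can be read as simple matching rules. The following Python sketch translates a pattern into a regular expression and extracts the Feature Barcode from a read. It is an illustration, not Cell Ranger's implementation: it handles only the elements used in the patterns above (5P, N, constant bases and (BC)), and it assumes that (BC) stands for a barcode of known, fixed length. The function names and the reads in the usage example are hypothetical.

```python
import re

def pattern_to_regex(pattern, barcode_length):
    """Translate a feature-reference pattern into a compiled regular expression.

    Assumption: (BC) captures exactly barcode_length bases.
    """
    parts = []
    rest = pattern
    if rest.startswith("5P"):
        parts.append("^")               # 5P: anchor at the start of the read
        rest = rest[2:]
    barcode_seen = False
    i = 0
    while i < len(rest):
        if rest.startswith("(BC)", i):
            if barcode_seen:
                raise ValueError("pattern contains (BC) more than once")
            parts.append("([ACGTN]{%d})" % barcode_length)
            barcode_seen = True
            i += 4
        elif rest[i] == "N":
            parts.append("[ACGTN]")     # N: any single base
            i += 1
        elif rest[i] in "ACGT":
            parts.append(rest[i])       # constant base: must match exactly
            i += 1
        else:
            raise ValueError(f"unexpected character {rest[i]!r} in pattern")
    if not barcode_seen:
        raise ValueError("pattern must contain (BC)")
    return re.compile("".join(parts))

def extract_barcode(read, pattern, barcode_length):
    """Return the bases at the (BC) position, or None if the pattern does not match."""
    match = pattern_to_regex(pattern, barcode_length).search(read)
    return match.group(1) if match else None

# Hypothetical reads built from placeholder sequences.
antibody_read = "TTTTTTTTTT" + "AAACCCGGGTTTAAA" + "GG"
guide_read = "CC" + "ACGTACGTACGTACGTACGT" + "GACCAGGATGGGCACCACCC" + "AA"

print(extract_barcode(antibody_read, "5PNNNNNNNNNN(BC)", 15))
print(extract_barcode(guide_read, "(BC)GACCAGGATGGGCACCACCC", 20))
```

In the first call the barcode is found at a fixed offset of ten bases from the start of the read; in the second it is located by the constant anchor sequence that follows it, wherever that anchor occurs in the read.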