Cell Ranger 7.1, printed on 11/05/2024
In this tutorial, you will learn how to:
This tutorial is written with Cell Ranger v6.1.2. Commands are compatible with other versions of Cell Ranger, unless noted otherwise.
The cellranger count pipeline aligns sequencing reads in FASTQ files to a reference transcriptome and generates a .cloupe file for visualization and analysis in Loupe Browser, along with a number of other outputs compatible with other publicly available tools for further analysis.
We call our working directory the yard. Start by making a directory to run the analysis in.
mkdir ~/yard/run_cellranger_count
cd ~/yard/run_cellranger_count
Next, download FASTQ files from one of the publicly available data sets on the 10x Genomics support site. This example uses the 1,000 PBMC data set from human peripheral blood mononuclear cells (PBMCs), consisting of lymphocytes (T cells, B cells, and NK cells) and monocytes.
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_fastqs.tar
This dataset is 5.17 GB and takes a few minutes to download.
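Because the downloads in this tutorial are several gigabytes each, it can help to confirm that the working directory has enough free space before starting. The following is a minimal sketch, not part of Cell Ranger: the function name has_free_kb and the 12 GB threshold are assumptions chosen for illustration (roughly the tar file plus its extracted copy).

```shell
# Return success if the filesystem holding DIR has at least NEEDED_KB
# kilobytes available. Uses POSIX `df -Pk`, whose 4th column is "Available".
# (has_free_kb is a hypothetical helper name.)
has_free_kb() {
    dir=$1
    needed_kb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Illustrative threshold (an assumption): about 12 GB, enough for the
# 5.17 GB tar file plus a similarly sized extracted copy.
NEEDED_KB=$((12 * 1024 * 1024))

if has_free_kb . "$NEEDED_KB"; then
    echo "Enough free space to download and extract the FASTQ files."
else
    echo "Less than ${NEEDED_KB} KB free here; consider another location." >&2
fi
```

Adjust the threshold upward if you plan to keep the reference package and the pipeline outputs on the same disk.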
Since this is a tar file and not a tar.gz file, you don't need the -z argument used in previous tutorials to extract it.
tar -xvf pbmc_1k_v3_fastqs.tar
The output is similar to the following:
pbmc_1k_v3_fastqs/
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_I1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_I1_001.fastq.gz
Now you have a directory containing two sets of FASTQ files, named according to the bcl2fastq2 naming convention: Sample_S1_L00X_R1_001.fastq.gz. The file names indicate that they all come from the same sample, pbmc_1k_v3, and that the library was run on two lanes: lane 1 (L001) and lane 2 (L002).
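To see how the naming convention breaks down, the sketch below splits each file name into its sample, lane, and read fields. It assumes every file follows the Sample_S1_L00X_R1_001.fastq.gz pattern described above; the function names list_fastq_parts and list_samples are hypothetical helpers, not Cell Ranger commands.

```shell
# List each "sample lane read" combination found in a FASTQ directory,
# assuming names of the form <Sample>_S<n>_L<lane>_<read>_001.fastq.gz.
list_fastq_parts() {
    fastq_dir=$1
    for path in "$fastq_dir"/*.fastq.gz; do
        [ -e "$path" ] || continue          # skip if the glob matched nothing
        name=$(basename "$path" .fastq.gz)  # e.g. pbmc_1k_v3_S1_L001_R1_001
        # The greedy first group keeps underscores inside the sample name.
        echo "$name" |
            sed -E 's/^(.*)_S[0-9]+_L([0-9]+)_([RI][0-9])_[0-9]+$/\1 \2 \3/'
    done | sort
}

# Distinct sample names found at the start of the file names.
list_samples() {
    list_fastq_parts "$1" | awk '{ print $1 }' | sort -u
}

# Usage (directory name taken from this tutorial):
# list_fastq_parts pbmc_1k_v3_fastqs
# list_samples pbmc_1k_v3_fastqs
```

For the files in this tutorial, list_samples should report a single sample name, and list_fastq_parts should show the I1, R1, and R2 files for each of the two lanes.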
Next, you need a reference transcriptome. The download page for the FASTQ files shows that these are human cells. There are several prebuilt human reference transcriptome packages on the 10x Genomics support site. Download the latest package and decompress it.
wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz
tar -zxvf refdata-gex-GRCh38-2020-A.tar.gz
The reference transcriptome package is 10.6 GB and takes about five minutes to download.
Once you have downloaded and extracted the reference transcriptome files, you can keep them for future runs. However, if you need to delete them to save space on your server between runs, the pre-compiled reference files are publicly available and can be re-downloaded if needed.
Your FASTQ files, however, are raw data that cannot be replaced. We strongly recommend backing them up and archiving them in case something happens to the disk.
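One way to confirm that a backup copy of the FASTQ files is intact is to record checksums before copying and verify them afterward. This is a minimal sketch under stated assumptions: it relies on the GNU coreutils sha256sum tool being installed (on macOS, shasum -a 256 is the usual equivalent), and write_manifest and verify_manifest are hypothetical helper names.

```shell
# Write a SHA-256 manifest for every FASTQ file in a directory.
# Assumes GNU coreutils `sha256sum` is available.
write_manifest() {
    fastq_dir=$1
    manifest=$2
    # The redirect is opened before the cd, so MANIFEST may be a relative path.
    ( cd "$fastq_dir" && sha256sum -- *.fastq.gz ) > "$manifest"
}

# Verify a directory (for example, the backup copy) against the manifest.
# Returns non-zero if any file is missing or its checksum differs.
verify_manifest() {
    fastq_dir=$1
    manifest=$2
    ( cd "$fastq_dir" && sha256sum -c ) < "$manifest"
}

# Usage (the manifest name and backup path are illustrative):
# write_manifest pbmc_1k_v3_fastqs pbmc_1k_v3_fastqs.sha256
# verify_manifest /path/to/backup/pbmc_1k_v3_fastqs pbmc_1k_v3_fastqs.sha256
```

Store the manifest alongside the archived copy so the files can be re-verified later.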
Once you have FASTQ files and a reference transcriptome, you are ready to run cellranger count.
Print the usage statement to see what is needed to build the command.
cellranger count --help
The output is similar to the following:
cellranger-count
Count gene expression (targeted or whole-transcriptome) and/or feature barcode reads from a single sample and GEM well

USAGE:
    cellranger count [FLAGS] [OPTIONS] --id <ID> --transcriptome <PATH>

FLAGS:
        --no-bam                  Do not generate a bam file
        --nosecondary             Disable secondary analysis, e.g. clustering. Optional
        --include-introns         Include intronic reads in count
        --no-libraries            Proceed with processing using a --feature-ref but no Feature Barcode libraries specified with the 'libraries' flag
        --no-target-umi-filter    Turn off the target UMI filtering subpipeline. Only applies when --target-panel is used
        --dry                     Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
        --disable-ui              Do not serve the web UI
        --noexit                  Keep web UI running after pipestance completes or fails
        --nopreflight             Skip preflight checks
    -h, --help                    Prints help information
...
To run cellranger count, you need to specify an --id. This can be any string of alphanumeric characters, underscores, or dashes, with no spaces, that is less than 64 characters long. Cell Ranger creates an output directory named after this id. This directory is called a "pipeline instance", or pipestance for short.
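The rules for the id can be checked before launching a long run. The sketch below encodes only the constraints stated above (allowed characters, no spaces, fewer than 64 characters); is_valid_run_id is a hypothetical helper, and Cell Ranger performs its own validation regardless.

```shell
# Return success if ID is non-empty, shorter than 64 characters, and made up
# only of letters, digits, underscores, and dashes.
is_valid_run_id() {
    id=$1
    [ -n "$id" ] || return 1
    [ "${#id}" -lt 64 ] || return 1
    case $id in
        *[!A-Za-z0-9_-]*) return 1 ;;   # contains a disallowed character
    esac
    return 0
}

# Usage:
# is_valid_run_id run_count_1kpbmcs && echo "id looks valid"
# is_valid_run_id "run count 1"     || echo "id is not valid"
```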
The --fastqs argument should be a path to the directory containing the FASTQ files. If you demultiplexed your data using cellranger mkfastq, you can use the path to the fastq_path directory in the outs folder from that pipeline. If there is more than one sample in the FASTQ directory, use the --sample argument to specify which samples to use. The --sample argument matches the sample id at the beginning of the FASTQ file name. It is unnecessary for this tutorial run because all of the FASTQ files are from the same sample, but it is included as an example. The last argument needed is --transcriptome, the path to the reference transcriptome package. Be sure to edit the file paths in the command below to match your system.
cellranger count --id=run_count_1kpbmcs \
   --fastqs=/mnt/home/user.name/yard/run_cellranger_count/pbmc_1k_v3_fastqs \
   --sample=pbmc_1k_v3 \
   --transcriptome=/mnt/home/user.name/yard/run_cellranger_count/refdata-gex-GRCh38-2020-A
Since this is a full-sized dataset, it can take several hours to complete.
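Because a mistyped path only surfaces after the job has been launched, you may prefer to keep the paths in variables and check them first. The following sketch wraps the same four arguments used above. The variable names and the run_count function are hypothetical, and the default RUN_DIR assumes the ~/yard layout from this tutorial.

```shell
# Hypothetical variable names; the directory layout follows this tutorial.
RUN_DIR="${RUN_DIR:-$HOME/yard/run_cellranger_count}"
RUN_ID="run_count_1kpbmcs"
SAMPLE="pbmc_1k_v3"
FASTQ_DIR="$RUN_DIR/pbmc_1k_v3_fastqs"
REF_DIR="$RUN_DIR/refdata-gex-GRCh38-2020-A"

# Check that the inputs exist and cellranger is on PATH, then launch the
# same command shown above. Returns non-zero without running anything if
# a check fails.
run_count() {
    [ -d "$FASTQ_DIR" ] || { echo "FASTQ directory not found: $FASTQ_DIR" >&2; return 1; }
    [ -d "$REF_DIR" ] || { echo "Reference not found: $REF_DIR" >&2; return 1; }
    command -v cellranger >/dev/null 2>&1 || { echo "cellranger is not on PATH" >&2; return 1; }
    cellranger count --id="$RUN_ID" \
        --fastqs="$FASTQ_DIR" \
        --sample="$SAMPLE" \
        --transcriptome="$REF_DIR"
}

# Call run_count from the directory where the pipestance should be created.
```

Calling run_count launches the same cellranger count command as the one written out above, so the run time and output are unchanged.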
The output is similar to the following:
/mnt/yard/user.name/yard/apps/cellranger-6.1.2/bin
cellranger count (6.1.2)
Copyright (c) 2021 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------

Martian Runtime - v4.0.6
...
2021-10-15 17:12:42 [perform] Serializing pipestance performance data.
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
When the output of the cellranger count command ends with "Pipestance completed successfully!", the job is done.
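If the run was started in the background or on a cluster, you can check for this message in a saved copy of the output rather than watching the terminal. This sketch assumes you captured the output yourself, for example by piping the command through tee; the log file name and the pipestance_succeeded function are hypothetical.

```shell
# Return success if LOG_FILE contains the completion message.
pipestance_succeeded() {
    log_file=$1
    grep -qF 'Pipestance completed successfully!' "$log_file"
}

# Usage, assuming the output was captured with something like
#   cellranger count ... 2>&1 | tee run_count_1kpbmcs.log
# (run_count_1kpbmcs.log is a hypothetical file name):
# if pipestance_succeeded run_count_1kpbmcs.log; then
#     echo "Run finished; outputs are ready."
# else
#     echo "Completion message not found; the run is unfinished or failed."
# fi
```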
The cellranger count pipeline outputs are in the pipestance directory in the outs folder. List the contents of this directory with ls -1.
ls -1 run_count_1kpbmcs/outs
The output is similar to the following:
├── analysis
├── cloupe.cloupe
├── filtered_feature_bc_matrix
├── filtered_feature_bc_matrix.h5
├── metrics_summary.csv
├── molecule_info.h5
├── possorted_genome_bam.bam
├── possorted_genome_bam.bam.bai
├── raw_feature_bc_matrix
├── raw_feature_bc_matrix.h5
└── web_summary.html
Check the web_summary.html to see results of the experiment. You can also load the cloupe.cloupe file into the Loupe Browser and start an analysis. This outs/ directory also contains a number of outputs that can be used as input for software tools developed outside of 10x Genomics, such as the Seurat R package.
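Before passing the outputs to downstream tools, it can be useful to confirm that everything in the listing above is present. The sketch below checks for exactly those entries; check_outs is a hypothetical helper, and the list would need adjusting for runs that change the outputs (for example, the --no-bam flag shown in the usage statement skips the BAM file).

```shell
# Report which of the outputs listed above are missing from an outs directory.
# Returns success only if every expected entry exists.
check_outs() {
    outs_dir=$1
    missing=0
    for entry in \
        analysis cloupe.cloupe filtered_feature_bc_matrix \
        filtered_feature_bc_matrix.h5 metrics_summary.csv molecule_info.h5 \
        possorted_genome_bam.bam possorted_genome_bam.bam.bai \
        raw_feature_bc_matrix raw_feature_bc_matrix.h5 web_summary.html
    do
        if [ ! -e "$outs_dir/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage (pipestance name taken from this tutorial):
# check_outs run_count_1kpbmcs/outs && echo "All expected outputs are present."
```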