10x Genomics
Chromium Single Cell Immune Profiling

Cell Ranger 2.0, printed on 04/24/2025

Reference Support

Cell Ranger provides a pre-built human reference package for use with the pipeline. If you would like to use your own genome FASTA or gene GTF annotations, Cell Ranger also supports custom, user-generated references.

Making a Reference Package

The cellranger mkvdjref tool can be used to generate a custom reference package.

$ cellranger mkvdjref --genome=my_vdj_ref \
                      --fasta=GRCh38_ensembl.fasta \
                      --genes=GRCh38_ensembl.gtf

A Cell Ranger V(D)J reference consists of germline gene segment sequences. cellranger mkvdjref assumes that these sequences are contained within a genome reference FASTA and that a gene annotation GTF points to the relevant gene segments. Currently, the GTF must be in an Ensembl-like format. If you are using a transcriptome- or segment-based V(D)J reference rather than a genome-based reference, you can treat each transcript as its own "chromosome" and construct a GTF that annotates the transcripts accordingly.
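For the transcript- or segment-based case, the idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `write_segment_reference`, the input tuple layout, the source value `custom`, reusing the segment name as contig name, `transcript_id`, and `gene_name`, and annotating the whole record as a single `CDS` feature are all hypothetical choices, not something Cell Ranger prescribes.

```python
from typing import Iterable, Tuple

# (segment name, accepted biotype, nucleotide sequence); this layout is hypothetical.
Segment = Tuple[str, str, str]


def write_segment_reference(segments: Iterable[Segment], fasta_path: str, gtf_path: str) -> None:
    """Write a FASTA whose "chromosomes" are the segments, plus a matching GTF."""
    with open(fasta_path, "w") as fasta, open(gtf_path, "w") as gtf:
        for name, biotype, seq in segments:
            # Each segment becomes its own contig, named after the segment.
            fasta.write(f">{name}\n{seq}\n")
            attributes = (
                f'transcript_id "{name}"; gene_name "{name}"; '
                f'transcript_biotype "{biotype}";'
            )
            # One CDS feature spanning the whole contig (1-based, inclusive).
            columns = [name, "custom", "CDS", "1", str(len(seq)), ".", "+", "0", attributes]
            gtf.write("\t".join(columns) + "\n")
```

The two files written this way could then be passed to --fasta and --genes. A real segment set may need separate `five_prime_utr` and `CDS` rows, which this sketch does not attempt.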

Input FASTA file

cellranger mkvdjref expects a FASTA file containing genomic reference sequences whose names are consistent with the names used in the GTF file.
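A quick way to check this consistency before running the tool is to compare the sequence names in the FASTA with the values in GTF column 1. The sketch below is a hypothetical pre-flight check, not part of Cell Ranger. It assumes the sequence name is the first whitespace-delimited token of each FASTA header (as in Ensembl headers) and that both files are uncompressed.

```python
from typing import Set


def fasta_names(fasta_path: str) -> Set[str]:
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_contigs(gtf_path: str) -> Set[str]:
    """Collect the values of GTF column 1, skipping comments and blank lines."""
    contigs = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            contigs.add(line.split("\t", 1)[0])
    return contigs


def missing_contigs(fasta_path: str, gtf_path: str) -> Set[str]:
    """Return GTF contig names that have no matching FASTA record."""
    return gtf_contigs(gtf_path) - fasta_names(fasta_path)
```

An empty result means every contig named in the GTF exists in the FASTA. A non-empty result often points to a naming mismatch such as `chr14` versus `14`.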

Input GTF file

Cell Ranger V(D)J expects a GTF file in an Ensembl-like format that contains information about V(D)J gene segments.

GTF columns

GTF Column	Name	Description
1	Chromosome	Must refer to a chromosome/contig in the genome FASTA.
2	Source	Unused.
3	Feature	Cell Ranger only uses rows where this column is equal to either `CDS` or `five_prime_utr`.
4	Start	Start position on the reference (1-based inclusive).
5	End	End position on the reference (1-based inclusive).
6	Score	Unused.
7	Strand	Strandedness of this feature on the reference: `+` or `-`.
8	Frame	Unused.
9	Attributes	A semicolon-delimited list of key-value pairs of the form `key "value"`. The attribute keys used by Cell Ranger V(D)J are detailed below.
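The column layout above can be illustrated with a small Python sketch that splits a row into its nine tab-separated columns, keeps only `CDS` and `five_prime_utr` features, and converts the 1-based inclusive coordinates into a Python slice. The names `GtfRow`, `parse_row`, `feature_length`, and `extract` are hypothetical helpers, and this is not how Cell Ranger itself reads the file.

```python
from typing import NamedTuple, Optional

USED_FEATURES = {"CDS", "five_prime_utr"}


class GtfRow(NamedTuple):
    chromosome: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: str


def parse_row(line: str) -> Optional[GtfRow]:
    """Return a GtfRow for CDS / five_prime_utr rows, else None."""
    if line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2] not in USED_FEATURES:
        return None
    return GtfRow(columns[0], columns[2], int(columns[3]), int(columns[4]),
                  columns[6], columns[8])


def feature_length(row: GtfRow) -> int:
    """Both ends are inclusive, so the length is end - start + 1."""
    return row.end - row.start + 1


def extract(sequence: str, row: GtfRow) -> str:
    """Slice a contig string; Python slices are 0-based and end-exclusive."""
    return sequence[row.start - 1:row.end]
```

Note that `extract` returns the forward-strand sequence only. A feature on the `-` strand would still need to be reverse-complemented.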

GTF Attributes

GTF Attribute	Description
transcript_id	Becomes the `record_id` in the Cell Ranger V(D)J reference entry format.
transcript_biotype	The value is used to infer the V(D)J segment type. Either `transcript_biotype` or `gene_biotype` must be a value in the "Accepted Biotypes" list below. If `transcript_biotype` is not on the accepted list, then `gene_biotype` is used.
gene_biotype	See `transcript_biotype`.
gene_name	Must be specified. Becomes the `gene_name` in the Cell Ranger V(D)J reference entry format.
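As an illustration of the `key "value"` format, the attributes column can be parsed into a dictionary with a short regular expression. This is a minimal sketch, assuming values never contain a double quote. Unquoted values are ignored, and `parse_attributes` is a hypothetical helper name.

```python
import re
from typing import Dict

# Matches one `key "value"` pair; keys contain no whitespace or semicolons.
_PAIR = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse GTF column 9 into a dict. If a key repeats, the last value wins."""
    return {key: value for key, value in _PAIR.findall(field)}
```

Applied to a column 9 value such as `transcript_id "ENST00000542354"; gene_name "TRAV1-1"; transcript_biotype "TR_V_gene";`, this yields the three keys that Cell Ranger V(D)J reads.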

Accepted Biotypes

TR_C_gene
TR_D_gene
TR_J_gene
TR_V_gene
IG_C_gene
IG_D_gene
IG_J_gene
IG_V_gene
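The fallback rule for biotypes (use `transcript_biotype` if it is on the accepted list, otherwise `gene_biotype`) can be sketched as below. The function name `resolve_biotype` and the choice to return `None` when neither value is accepted are assumptions made for illustration. The source does not say what the tool does with such rows.

```python
from typing import Mapping, Optional

ACCEPTED_BIOTYPES = frozenset({
    "TR_C_gene", "TR_D_gene", "TR_J_gene", "TR_V_gene",
    "IG_C_gene", "IG_D_gene", "IG_J_gene", "IG_V_gene",
})


def resolve_biotype(attributes: Mapping[str, str]) -> Optional[str]:
    """Prefer transcript_biotype; fall back to gene_biotype; None if neither is accepted."""
    for key in ("transcript_biotype", "gene_biotype"):
        value = attributes.get(key)
        if value in ACCEPTED_BIOTYPES:
            return value
    return None
```

For example, a row with `transcript_biotype "protein_coding"` and `gene_biotype "IG_V_gene"` resolves to `IG_V_gene` under this rule.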

Example minimal GTF row used by Cell Ranger V(D)J

14      havana  CDS     21621904        21621946        .       +       0       transcript_id "ENST00000542354"; gene_name "TRAV1-1"; transcript_biotype "TR_V_gene";

Reference package format

cellranger mkvdjref creates a directory whose name is specified by the --genome argument.

$ tree my_vdj_ref
my_vdj_ref
├── fasta
│   └── regions.fa
└── reference.json
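A simple sanity check after running the tool is to confirm that the output directory contains the two files shown in the tree above. The helper below is a hypothetical convenience, not part of Cell Ranger, and it checks only that the files exist, not what they contain.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree shown above.
EXPECTED_FILES = ("fasta/regions.fa", "reference.json")


def missing_reference_files(reference_dir: str) -> List[str]:
    """List expected files that are absent from a mkvdjref output directory."""
    root = Path(reference_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_reference_files("my_vdj_ref")` returns an empty list when both files are present.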

Generating the Cell Ranger V(D)J reference package

The Cell Ranger V(D)J human reference package refdata-cellranger-vdj-GRCh38-alts-ensembl-2.0.0 was generated with the following steps.

Input files

The Ensembl v87 top-level genome FASTA containing patches and alternative haplotypes: Homo_sapiens.GRCh38.dna.toplevel.fa
The corresponding Ensembl v87 genes GTF: Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf
10x-specific addendum to the genes GTF: vdj_GRCh38_alts_ensembl_10x_genes-2.0.0.gtf
10x-specific list of blacklisted transcript IDs: vdj_GRCh38_alts_ensembl_10x_ignore_transcripts-2.0.0.txt

`mkvdjref` command

This reference was constructed by adding some entries to the Ensembl GTF and removing others. Entries from multiple GTFs are combined by specifying the --genes argument multiple times. Entries are removed by providing a list of transcript IDs to the --rm-transcripts argument. For details, see cellranger mkvdjref --help.

$ wget ftp://ftp.ensembl.org/pub/release-87/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
$ gunzip Homo_sapiens.GRCh38.dna.toplevel.fa.gz


$ wget ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf.gz
$ gunzip Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf.gz


$ cellranger mkvdjref --genome=vdj_GRCh38_alts_ensembl \
                      --fasta=Homo_sapiens.GRCh38.dna.toplevel.fa \
                      --genes=Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf \
                      --genes=vdj_GRCh38_alts_ensembl_10x_genes-2.0.0.gtf \
                      --rm-transcripts=vdj_GRCh38_alts_ensembl_10x_ignore_transcripts-2.0.0.txt \
                      --ref-version=2.0.0
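Conceptually, repeating --genes concatenates annotation rows from several GTFs, and --rm-transcripts drops rows for the listed transcript IDs. The Python sketch below mimics that effect on in-memory lines to show the intended outcome. It is not the tool's implementation, and it assumes the ignore file holds one transcript ID per line, which the source does not specify. The helper names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Set

# Finds the transcript_id attribute without matching longer keys that end in it.
_TRANSCRIPT_ID = re.compile(r'(?:^|[\s;])transcript_id\s+"([^"]*)"')


def read_ignore_list(lines: Iterable[str]) -> Set[str]:
    """Assumes one transcript ID per line; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def combine_gtf_rows(gtfs: Iterable[Iterable[str]], ignore: Set[str]) -> Iterator[str]:
    """Yield rows from every GTF in order, dropping rows whose transcript_id is ignored."""
    for rows in gtfs:
        for row in rows:
            match = _TRANSCRIPT_ID.search(row)
            if match and match.group(1) in ignore:
                continue
            yield row
```

Rows without a `transcript_id` attribute pass through unchanged in this sketch.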
