Cell Ranger 6.1, printed on 11/21/2024
A V(D)J transcript has the following structure:
UTR: Untranslated region; FWR: Framework region; CDR: Complementarity determining region
The cellranger vdj pipeline provides amino acid and nucleotide sequences for framework regions (FWRs) and complementarity determining regions (CDRs). The V(D)J annotations on the assembled contigs and on the clonotype consensus sequences are produced in multiple formats.
Learn more about productive contigs on the Annotation Algorithm page.
File type | Description |
---|---|
CSV | High-level annotations with one contig, consensus, or clonotype per row. |
JSON | Detailed annotations, including alignment coordinates and amino acid translations. |
BED | Germline V(D)J segments as features, for use with a tool like IGV. |
TSV | Used for the AIRR rearrangement format of VDJ contigs and consensus sequences. |
File | Description |
---|---|
clonotypes.csv | High-level descriptions of each clonotype. |
consensus_annotations.csv | High-level and detailed annotations of each clonotype consensus sequence. |
filtered_contig_annotations.csv | High-level annotations of each high-confidence, cellular contig. This is a subset of all_contig_annotations.csv. |
all_contig_annotations.{csv,bed,json} | High-level and detailed annotations of each contig. |
airr_rearrangement.tsv | Annotated contigs and consensus sequences of VDJ rearrangements in the AIRR format. |
Column | Description |
---|---|
clonotype_id | The ID of the clonotype to which this consensus sequence was assigned. |
frequency | The observed number of cell barcodes with this clonotype. |
proportion | The observed fraction of cell barcodes with this clonotype. |
cdr3s_aa | A semicolon-delimited list of chain:sequence pairs, where chain is for example TRA, TRB, IGK, IGL, or IGH and sequence is the CDR3 amino acid sequence for that chain. |
cdr3s_nt | A semicolon-delimited list of chain:sequence pairs, where chain is for example TRA, TRB, IGK, IGL, or IGH and sequence is the CDR3 nucleotide sequence for that chain. |
inkt_evidence | For T cells, the evidence, if any, that this clonotype is a group of iNKT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information. |
mait_evidence | For T cells, the evidence, if any, that this clonotype is a group of MAIT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information. |
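
The semicolon-delimited chain:sequence format used by cdr3s_aa and cdr3s_nt (and the chain:matches format of the evidence columns) can be split with a few lines of Python. This is a minimal sketch, not part of Cell Ranger: the function name `parse_chain_pairs` and the sample string are made up for illustration. Values are collected in lists on the assumption that a chain may appear more than once in the same field.

```python
from collections import defaultdict


def parse_chain_pairs(field):
    """Split a 'chain:value;chain:value' string into {chain: [values]}.

    Hypothetical helper; the name is not part of Cell Ranger.
    """
    pairs = defaultdict(list)
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty fields and trailing semicolons
        chain, _, value = item.partition(":")
        pairs[chain].append(value)
    return dict(pairs)


# Made-up example value in the cdr3s_aa format.
example = "TRA:CAVSDLEPNSSASKIIF;TRB:CASSLGQGAEQYF"
print(parse_chain_pairs(example))
```

The same helper applies unchanged to inkt_evidence and mait_evidence, where the value part is genes, junction, or genes+junction.
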
Column | Description |
---|---|
barcode | Cell barcode for this contig. |
is_cell | True or False value indicating whether the barcode was called as a cell. |
contig_id | Unique identifier for this contig. |
high_confidence | True or False value indicating whether the contig was called as high-confidence (unlikely to be a chimeric sequence or some other artifact). |
length | The contig sequence length in nucleotides. |
chain | The chain associated with this contig; for example, TRA, TRB, IGK, IGL, or IGH. A value of "Multi" indicates that segments from multiple chains were present. |
v_gene | The highest-scoring V segment, for example, TRAV1-1. |
d_gene | The highest-scoring D segment, for example, TRBD1. |
j_gene | The highest-scoring J segment, for example, TRAJ1-1. |
c_gene | The highest-scoring C segment, for example, TRAC. |
full_length | Whether the contig was declared full-length. |
productive | Whether the contig was declared productive. |
fwr1 | The predicted FWR1 amino acid sequence. |
fwr1_nt | The predicted FWR1 nucleotide sequence. |
cdr1 | The predicted CDR1 amino acid sequence. |
cdr1_nt | The predicted CDR1 nucleotide sequence. |
fwr2 | The predicted FWR2 amino acid sequence. |
fwr2_nt | The predicted FWR2 nucleotide sequence. |
cdr2 | The predicted CDR2 amino acid sequence. |
cdr2_nt | The predicted CDR2 nucleotide sequence. |
fwr3 | The predicted FWR3 amino acid sequence. |
fwr3_nt | The predicted FWR3 nucleotide sequence. |
cdr3 | The predicted CDR3 amino acid sequence. |
cdr3_nt | The predicted CDR3 nucleotide sequence. |
fwr4 | The predicted FWR4 amino acid sequence. |
fwr4_nt | The predicted FWR4 nucleotide sequence. |
reads | The number of reads aligned to this contig. |
umis | The number of distinct UMIs aligned to this contig. |
raw_clonotype_id | The ID of the clonotype to which this cell barcode was assigned. |
raw_consensus_id | The ID of the consensus sequence to which this contig was assigned. |
exact_subclonotype_id | The ID of the exact subclonotype to which this cell barcode was assigned. |
Details on how the Cell Ranger algorithm delimits CDRs (Complementarity Determining Regions) and FWRs (Framework Regions) are provided on the enclone features page.
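
The boolean columns in the contig annotation table (is_cell, high_confidence, productive) lend themselves to simple filtering. The sketch below is illustrative only: the inline sample rows are invented, the helper names are hypothetical, and the flags are compared case-insensitively on the assumption that the capitalization of True/False may differ between files.

```python
import csv
import io

# Invented rows using a subset of the documented columns.
SAMPLE = """barcode,is_cell,contig_id,high_confidence,chain,productive,umis
AAAC-1,True,AAAC-1_contig_1,True,TRA,True,5
AAAC-1,True,AAAC-1_contig_2,True,TRB,False,3
AAAG-1,False,AAAG-1_contig_1,True,TRB,True,1
"""


def is_true(value):
    """Interpret a True/False text field; anything else counts as False."""
    return value.strip().lower() == "true"


def select_contigs(handle):
    """Keep contigs that are cell-associated, high-confidence, and productive."""
    reader = csv.DictReader(handle)
    return [
        row
        for row in reader
        if is_true(row["is_cell"])
        and is_true(row["high_confidence"])
        and is_true(row["productive"])
    ]


kept = select_contigs(io.StringIO(SAMPLE))
print([row["contig_id"] for row in kept])
```

To run this against a real file, replace `io.StringIO(SAMPLE)` with an open file handle for all_contig_annotations.csv.
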
Column | Description |
---|---|
clonotype_id | The ID of the clonotype to which this consensus sequence was assigned. |
consensus_id | The ID of this consensus sequence. |
v_start | 0-based index of the V region start position on the consensus sequence. |
v_end | 0-based index of the V region end position on the consensus sequence. |
v_end_ref | 0-based index of the V gene end position on the reference. |
j_start | 0-based index of the J region start position on the consensus sequence. |
j_start_ref | 0-based index of the J gene start position on the reference. |
j_end | 0-based index of the J region end position on the consensus sequence. |
cdr3_start | 0-based index of the CDR3 region start position on the consensus sequence. |
cdr3_end | 0-based index of the CDR3 region end position on the consensus sequence. |
The remaining columns are shared with those under the Contig Annotation CSV Files section.
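
As an illustration of how the 0-based coordinates above might be used, the following sketch slices a region out of a consensus sequence. It rests on an assumption the table does not state: that each start index is inclusive and each end index is exclusive (a half-open interval). Verify this against your own data before relying on it. The sequence and coordinates are invented, and `region` is a hypothetical helper.

```python
def region(sequence, start, end):
    """Return sequence[start:end].

    Assumption (not confirmed by the column descriptions): start is
    0-based inclusive and end is 0-based exclusive.
    """
    return sequence[start:end]


# Invented consensus sequence: 6 nt upstream, 12 nt CDR3, 6 nt downstream.
consensus = "ATGGCC" + "TGTGCTGTGTTT" + "GGAACC"
cdr3_start, cdr3_end = 6, 18

cdr3_nt = region(consensus, cdr3_start, cdr3_end)
print(cdr3_nt, len(cdr3_nt))
```

Under the half-open assumption, the region length is simply `cdr3_end - cdr3_start`. One way to check the assumption is to compare the extracted slice with the cdr3_nt column of the same row.
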
Column | Description |
---|---|
cell_id | Cell barcode defining the cell for the query sequence. |
clone_id | Clonotype ID/clonotype assignment. |
rev_comp | Set to false by default (10x Genomics VDJ sequences are not reverse complemented). |
sequence_id | The name of the contig associated with the rearrangement. |
sequence | The nucleotide sequence of the rearrangement. |
sequence_aa | The amino acid sequence of the rearrangement. |
productive | Whether or not the rearrangement is productive. |
v_call | The name of the aligned V gene for the rearrangement. |
v_cigar | The CIGAR string of the V gene alignment. |
v_sequence_start | 1-based index on the contig of the V region start position. |
v_sequence_end | 1-based index on the contig of the V region end position. |
d_call | The name of the aligned D gene for the rearrangement. |
d_cigar | The CIGAR string of the D gene alignment. |
d_sequence_start | 1-based index on the contig of the D region start position. |
d_sequence_end | 1-based index on the contig of the D region end position. |
j_call | The name of the aligned J gene for the rearrangement. |
j_cigar | The CIGAR string of the J gene alignment. |
j_sequence_start | 1-based index on the contig of the J region start position. |
j_sequence_end | 1-based index on the contig of the J region end position. |
c_call | The name of the aligned C gene for the rearrangement. |
c_cigar | The CIGAR string of the C gene alignment. |
c_sequence_start | 1-based index on the contig of the C region start position. |
c_sequence_end | 1-based index on the contig of the C region end position. |
sequence_alignment | The aligned sequence of the VDJ rearrangement. |
germline_alignment | The assembled, aligned, full-length inferred germline sequence of the aligned sequence. |
junction | The nucleotide sequence of the rearrangement's junction (CDR3). |
junction_aa | The amino acid sequence of the rearrangement's junction (CDR3). |
duplicate_count | The number of unique molecular identifiers associated with this rearrangement. |
consensus_count | The number of reads associated with this rearrangement. |
junction_length | The length of the rearrangement's junction nucleotide sequence. |
junction_aa_length | The length of the rearrangement's junction amino acid sequence. |
is_cell | Whether this rearrangement is cell-associated. |
The AIRR rearrangement file includes all mandatory AIRR fields and several optional variables to enhance reproducibility and guide analyses.
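
Because the AIRR coordinates are 1-based, they need a small adjustment before being used as Python slices. The sketch below assumes the AIRR convention of closed intervals (both start and end positions are included). It reads an invented two-line TSV rather than a real airr_rearrangement.tsv, and the helper names are hypothetical.

```python
import csv
import io

# Invented record using a subset of the documented AIRR columns.
SAMPLE_TSV = (
    "sequence_id\tsequence\tv_sequence_start\tv_sequence_end\t"
    "junction\tjunction_length\n"
    "contig_1\tGGATGCATTTACGT\t3\t8\tTGTGCC\t6\n"
)


def airr_slice(sequence, start, end):
    """Extract a 1-based closed interval [start, end] from a sequence."""
    return sequence[start - 1:end]


def v_regions(handle):
    """Map sequence_id to the V region substring of each rearrangement."""
    reader = csv.DictReader(handle, delimiter="\t")
    regions = {}
    for row in reader:
        if not row["v_sequence_start"] or not row["v_sequence_end"]:
            continue  # skip rows without V coordinates
        regions[row["sequence_id"]] = airr_slice(
            row["sequence"],
            int(row["v_sequence_start"]),
            int(row["v_sequence_end"]),
        )
    return regions


print(v_regions(io.StringIO(SAMPLE_TSV)))
```

The same `airr_slice` helper applies to the d_, j_, and c_ coordinate pairs. As a sanity check on a real file, junction_length should equal the length of the junction string in each row.
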