Cell Ranger 5.0, printed on 12/26/2024
The cellranger vdj pipeline outputs several indexed FASTA and FASTQ files.
File type | Primary Use Cases |
---|---|
FASTA | Downstream tools such as the Integrative Genomics Viewer (IGV) or V(D)J annotation tools like IgBLAST. |
FASTQ | Inspecting assembly base quality scores. |
File | Records | Description |
---|---|---|
filtered_contig.fasta | Assembled contigs | High-confidence contig sequences in cell barcodes. |
filtered_contig.fastq | Assembled contigs | High-confidence contig sequences in cell barcodes. |
all_contig.fasta | Assembled contigs | All assembled contig sequences. |
all_contig.fastq | Assembled contigs | All assembled contig sequences. |
consensus.fasta | Clonotype consensus sequences | Clonotype consensus sequences. |
concat_ref.fasta | Concatenated reference segments | Concatenated V(D)J reference segments for the segments detected on each consensus sequence. These serve as an approximate reference for each consensus sequence. |
donor_regions.fa | Inferred germline V genes | Donor-specific V gene sequences, described below. |
Cell Ranger 5.0 infers the germline V genes used to rearrange T cell and B cell receptors. See Clonotype Grouping for more information. All cells with a given V gene (including cells in unrelated clonotypes) are inspected for shared mutations relative to the V(D)J reference; mutations shared across all cells are likely to be germline variants in the donor's V gene rather than somatic mutations. In Cell Ranger 5.0, these inferred V gene germline sequences are exported in the pipeline outs (vdj_reference/fasta/donor_regions.fa). D, J, and C germline genes are not inferred in Cell Ranger 5.0.
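The idea of looking for mutations shared by every cell can be illustrated with a toy sketch. This is not Cell Ranger's implementation: the function name `shared_mutations`, the toy sequences, and the assumption that all sequences are already aligned to the reference with no insertions or deletions are hypothetical simplifications made for illustration.

```python
def shared_mutations(reference, cell_sequences):
    """Return {position: base} for positions where every cell carries
    the same base and that base differs from the reference.

    Assumes all sequences are pre-aligned and equal in length
    (a simplification for illustration only).
    """
    shared = {}
    for pos, ref_base in enumerate(reference):
        # Collect the distinct bases observed at this position across cells.
        bases = {seq[pos] for seq in cell_sequences}
        if len(bases) == 1:
            (base,) = bases
            if base != ref_base:
                shared[pos] = base
    return shared


# Hypothetical toy data: three cells using the same V gene.
reference = "ACGTACGT"
cells = ["ACGAACGT", "ACGAACGA", "ACGAACGT"]

# Position 3 differs from the reference in every cell, so it is a
# candidate germline variant; position 7 differs in only one cell.
print(shared_mutations(reference, cells))
```

In this sketch, a difference seen in only one cell is ignored, which mirrors the reasoning that a change private to one cell or clonotype is more likely somatic or an error than a property of the donor's germline.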
Each donor_regions.fa file contains a list of unique records, each corresponding to a unique, donor-specific V gene that differs from the V gene in the V(D)J reference. The nucleotide sequence in each record spans the translated RNA sequence through the beginning of CDR3 (i.e., leader peptide to CDR3) and does not include the 5' UTR.
The header of each record in donor_regions.fa contains four colon-separated elements. Consider the following example header:
>454:d1:1:TRAV1-2
The four elements, in order, are:
Element | Value in the example |
---|---|
Reference record ID | 454 |
Donor name | d1 |
Allele number | 1 |
Gene name | TRAV1-2 |
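The four header fields (reference record ID, donor name, allele number, gene name) can be split apart with a short parser. The sketch below is a minimal example based only on the sample header shown above; the names `DonorRegionHeader` and `parse_donor_header` are hypothetical, and treating the record ID and allele number as integers is an assumption drawn from that single example.

```python
from typing import NamedTuple


class DonorRegionHeader(NamedTuple):
    reference_record_id: int  # assumed to be an integer, as in the example
    donor_name: str
    allele_number: int        # assumed to be an integer, as in the example
    gene_name: str


def parse_donor_header(line: str) -> DonorRegionHeader:
    """Split a donor_regions.fa header line into its four fields."""
    # Drop surrounding whitespace and the leading '>' of the FASTA header.
    fields = line.strip().lstrip(">").split(":", 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got: {line!r}")
    ref_id, donor, allele, gene = fields
    return DonorRegionHeader(int(ref_id), donor, int(allele), gene)


header = parse_donor_header(">454:d1:1:TRAV1-2")
print(header.gene_name, header.donor_name, header.allele_number)
```

Limiting the split to three separators keeps any further colons inside the final field, so the parser raises an error only when fewer than four fields are present.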
Typically, quality scores in a FASTQ file are Phred-encoded: a score Q corresponds to a probability of 10^(-Q/10) that the base is incorrect. When a FASTQ file contains records for sequencing reads, the quality scores usually indicate the confidence of the base caller at each base. Because cellranger vdj produces quality scores for assembled bases, the interpretation is slightly different.
File | Interpretation |
---|---|
filtered_contig.fastq | Probability that the base is not a sequencing, PCR, or reverse-transcription (RT) error. The quality score is computed using the per-read sequencing Q-scores and an assumed RT error rate. |
all_contig.fastq | Same as above. |
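To relate these scores to probabilities, a Phred score Q maps to an error probability of 10^(-Q/10). The following sketch decodes a FASTQ quality string into scores and error probabilities. It assumes the common Phred+33 ASCII encoding, which is not stated in the text above, and the function names and the sample quality string are hypothetical.

```python
def decode_quality(qual: str, offset: int = 33) -> list:
    """Convert a FASTQ quality string to integer Phred scores.

    The offset of 33 (Phred+33) is an assumption; adjust it if the
    file uses a different encoding.
    """
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score to the probability that the base is wrong."""
    return 10 ** (-q / 10)


# Hypothetical quality string for a three-base contig.
scores = decode_quality("I5+")
for q in scores:
    print(q, phred_to_error_prob(q))
```

For the contig FASTQ files, the resulting probability would be read as the chance that an assembled base is a sequencing, PCR, or RT error, rather than the base caller's confidence in a single read.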