Printed on 11/21/2024.
After June 30, 2023, new Cell Ranger and Space Ranger releases will no longer support targeted gene expression analysis.
In the panel selection overview, we provide links to pages containing the following files for all predesigned target gene panels:
File | Description |
---|---|
Gene Metadata File | A TSV file that lists all the genes included in the final target gene panel. |
Target Panel CSV File | This CSV file is a required input for Cell Ranger to enable analysis of targeted GEX data. It specifies the target genes and bait sequences in the target gene panel. |
Bait BED File | A BED12 file containing the bait IDs and genomic coordinates of all baits in the target gene panel. Use this file to visualize bait locations in genome browsers such as IGV (Integrative Genomics Viewer) and the UCSC Genome Browser. |
Customized designs created through the 10x Genomics Custom Panel Designer will include each of the files above, as well as the following additional files:
File | Description |
---|---|
Bait Design File | Use this Excel (.xlsx) file to order your custom or add-on panel from a compatible oligo provider. It contains the bait sequences for add-on genes and custom sequences. |
Custom Sequence GTF File | A GTF file corresponding to the custom input sequences entered for panel design. Use this file to create a custom reference compatible with Cell Ranger by appending its contents to the transcriptome GTF file. |
Custom Sequence FASTA File | A FASTA file corresponding to the custom input sequences entered for panel design. Use this file to create a custom reference compatible with Cell Ranger by appending its contents to the reference FASTA file. |
Files that describe individual baits include a bait identifier (ID) column. Each bait ID uniquely identifies one bait and can be used to match entries across different files. Bait IDs take the following format:
gene_id|gene_name|bait_number
For example, the first bait listed for the gene TSPAN6, which has the Ensembl ID ENSG00000000003 in the GRCh38-2020-A reference, would have the bait ID:
ENSG00000000003|TSPAN6|1
The first bait listed for BRCA1 would be:
ENSG00000012048|BRCA1|1
For custom sequences, the gene_id and gene_name fields within the bait ID are the same and match the name provided for each sequence.
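Because bait IDs are used to match entries across files, it can help to split them into their parts. The following Python sketch assumes every bait ID has exactly three |-separated fields; the names BaitId, parse_bait_id, and is_custom_sequence are hypothetical and not part of any 10x Genomics tool. The custom-sequence check is only a heuristic based on the rule above that custom sequences reuse one name for both fields.

```python
from typing import NamedTuple


class BaitId(NamedTuple):
    gene_id: str
    gene_name: str
    bait_number: int


def parse_bait_id(bait_id: str) -> BaitId:
    """Split a bait ID of the form gene_id|gene_name|bait_number."""
    parts = bait_id.split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}: {bait_id!r}")
    gene_id, gene_name, number = parts
    return BaitId(gene_id, gene_name, int(number))


def is_custom_sequence(bait: BaitId) -> bool:
    # Heuristic only: custom sequences use the same name for gene_id and gene_name.
    return bait.gene_id == bait.gene_name
```

For example, parse_bait_id("ENSG00000012048|BRCA1|1") returns a BaitId with gene_id "ENSG00000012048", gene_name "BRCA1", and bait_number 1.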
The target panel CSV file is a required input for Cell Ranger to enable analysis of targeted GEX data. It specifies the target genes and bait sequences that make up the target gene panel. See the description of the --target-panel argument in the cellranger count documentation.
The following is a portion of an example target panel file:
#panel_name=Human Gene Signature Panel
#panel_type=predesigned
#reference_genome=GRCh38
#reference_version=2020-A
#target_panel_file_format=1.0
gene_id,bait_seq,bait_id
ENSG00000000003,AGTTG[...]GCGTC,ENSG00000000003|TSPAN6|1
ENSG00000000003,CCCGT[...]GGCAA,ENSG00000000003|TSPAN6|2
ENSG00000000003,GGTGA[...]ACCTG,ENSG00000000003|TSPAN6|3
[ ... ]
The file contains the following columns:
Column Name | Description |
---|---|
gene_id | The Ensembl gene identifier associated with this bait. |
bait_seq | The nucleotide sequence of the bait. |
bait_id | The bait ID associated with this bait, formatted as described above. |
The file also contains a number of required metadata fields in the header:
Metadata Field | Description |
---|---|
panel_name | The name of the panel. |
panel_type | Indicates whether this is a predesigned or custom panel. One of predesigned or custom. |
reference_genome | The genome build of the Cell Ranger reference that the baits were designed against. |
reference_version | The version of the Cell Ranger reference that the baits were designed against. |
target_panel_file_format | The version of the target panel file format specification that this file conforms to. |
Each metadata field appears on its own header line in the format:
#key=value
In general, we strongly recommend using the target panel CSV files we provide for download or those generated by the 10x Genomics Custom Panel Designer. If you do need to make your own target panel file, you must provide entries for all metadata fields. The gene_id column is required, whereas the bait_seq and bait_id columns are optional but strongly recommended. The bait_id values do not need to follow the format outlined above, but each value must be unique to its bait.
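If you do build a target panel CSV by hand, a quick local check against the rules above can catch mistakes before you run Cell Ranger. This is a minimal Python sketch, not Cell Ranger's own validation: it assumes metadata lines start with # and precede the column header, and it checks only the requirements stated above (all metadata fields present, panel_type is predesigned or custom, a gene_id column exists, and bait_id values are unique when that column is present). The function name read_target_panel is hypothetical.

```python
import csv

REQUIRED_METADATA = (
    "panel_name",
    "panel_type",
    "reference_genome",
    "reference_version",
    "target_panel_file_format",
)


def read_target_panel(text: str):
    """Parse target panel CSV text and return (metadata dict, list of row dicts)."""
    metadata = {}
    body_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Header lines look like #key=value.
            key, sep, value = line[1:].partition("=")
            if not sep:
                raise ValueError(f"malformed metadata line: {line!r}")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            body_lines.append(line)

    missing = [k for k in REQUIRED_METADATA if k not in metadata]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    if metadata["panel_type"] not in ("predesigned", "custom"):
        raise ValueError(f"unexpected panel_type: {metadata['panel_type']!r}")

    reader = csv.DictReader(body_lines)
    if reader.fieldnames is None or "gene_id" not in reader.fieldnames:
        raise ValueError("the gene_id column is required")
    rows = list(reader)

    # bait_id is optional, but when present every value must be unique.
    if "bait_id" in reader.fieldnames:
        seen = set()
        for row in rows:
            bait_id = row["bait_id"]
            if bait_id in seen:
                raise ValueError(f"duplicate bait_id: {bait_id!r}")
            seen.add(bait_id)
    return metadata, rows
```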
The bait BED file is a BED12-formatted file (12 columns, detailed below) containing the bait IDs and genomic coordinates of all baits in the target gene panel. Use this file to visualize bait locations in genome browsers such as IGV (Integrative Genomics Viewer) and the UCSC Genome Browser, or to perform custom analyses.
The following is a portion of an example BED12 file:
chrX 100636685 100636805 ENSG00000000003|TSPAN6|1 0 - 100636685 100636694 0 1 120 0
chrX 100635704 100636685 ENSG00000000003|TSPAN6|2 0 - 100635704 100636685 0 2 42,78 0,903
chrX 100635584 100635704 ENSG00000000003|TSPAN6|3 0 - 100635584 100635704 0 1 120 0
The columns of BED12 files we provide are as follows (adapted from UCSC Genome Browser documentation):
Column Name | Description |
---|---|
chromosome | Chromosome of the target gene. |
chromStart | 0-based start coordinate of the targeted sequence on the chromosome. |
chromEnd | 0-based, non-inclusive end coordinate on the chromosome. |
name | Bait ID as described above. |
score | Set to 0 for all entries. |
strand | + or - to indicate the strand of the targeted gene. |
thickStart | The position at which the feature starts being drawn as a thick line in browsers (matches the display of the corresponding transcript region). |
thickEnd | The position at which the feature stops being drawn as a thick line in browsers (matches the display of the corresponding transcript region). |
itemRgb | Set to 0 for all entries. |
blockCount | The number of blocks (contiguous intervals). |
blockSizes | Comma-separated list of block sizes; contains blockCount entries. |
blockStarts | Comma-separated list of block start positions relative to chromStart; contains blockCount entries. |
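Because blockStarts are offsets from chromStart, recovering the genomic coordinates of each block takes one addition per block. The Python sketch below (the function name bed12_blocks is hypothetical) does this for a single BED12 line and also checks the BED12 rule that the last block must end at chromEnd. Output coordinates stay 0-based and half-open.

```python
from typing import List, Tuple


def bed12_blocks(line: str) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for each block of one BED12 record.

    Coordinates are 0-based and half-open, following the BED convention.
    """
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    chrom = fields[0]
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # Some BED writers leave a trailing comma on these lists.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockSizes/blockStarts do not match blockCount")

    # blockStarts are offsets from chromStart, so add chromStart back.
    blocks = [(chrom, chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    if blocks[-1][2] != chrom_end:
        raise ValueError("the last block must end at chromEnd")
    return blocks
```

Applied to the second TSPAN6 bait in the example above, it yields two blocks, chrX:100635704-100635746 and chrX:100636607-100636685, whose sizes add up to the 120-base length of the bait.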
BED12 format was chosen because it represents a bait that spans splice junctions on a single line and lets genome browsers draw links between bait regions that are discontinuous in genomic space. Browsers such as the UCSC Genome Browser and IGV render these BED12 entries much as they display transcripts in the genome.
This format is also well supported by command-line tools. For example, bedtools provides a -split flag for some subcommands so that the individual blocks within each line of a BED12 file are treated independently. This is useful when calculating intersections, for example, if you are interested in overlaps with the regions covered by the baits themselves rather than with the entire genomic interval the bait coordinates span, including intronic regions. bedtools also provides the bed12tobed6 subcommand for converting BED12 files to BED6 format; in the resulting file, a bait that spans one or more splice junctions appears on multiple lines, one per block.
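To see why -split matters, consider a query interval that falls inside the intron spanned by the second TSPAN6 bait above. The Python sketch below does not call bedtools and is not its implementation; it only imitates, for a single bait and a single query interval, the difference between counting overlap against the whole chromStart-chromEnd span and counting overlap against the blocks alone. All names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def bait_overlap(
    query: Interval, chrom_start: int, sizes: List[int], starts: List[int], split: bool
) -> int:
    """Bases of `query` covered by one BED12 bait, with or without block splitting."""
    if not split:
        # Whole span from chromStart to chromEnd, introns included.
        chrom_end = chrom_start + starts[-1] + sizes[-1]
        return overlap(query, (chrom_start, chrom_end))
    # Only bases inside the individual blocks count.
    return sum(
        overlap(query, (chrom_start + s, chrom_start + s + n))
        for s, n in zip(starts, sizes)
    )
```

For the query (100636000, 100636100), bait_overlap(query, 100635704, [42, 78], [0, 903], split=False) reports 100 overlapping bases, while split=True reports 0, because the query lies entirely in the intron between the two blocks.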
The gene metadata file is a TSV file that lists all the genes included in the final panel design, along with additional metadata.
The following is a portion of an example gene metadata file:
ensembl_id gene_name alternate_symbols synonyms description mappability_flag total_baits
ENSG00000121410 A1BG - A1B;ABG;GAB;HYST2477 alpha-1-B glycoprotein FALSE 29
ENSG00000268895 A1BG-AS1 - A1BG-AS;A1BGAS;NCRNA00181 A1BG antisense RNA 1 FALSE 27
ENSG00000148584 A1CF - ACF;ACF64;ACF65;APOBEC1CF;ASP APOBEC1 complementation factor FALSE 77
ENSG00000175899 A2M - A2MD;CPAMD5;FWP007;S863-7 alpha-2-macroglobulin FALSE 50
ENSG00000245105 A2M-AS1 - - A2M antisense RNA 1 FALSE 24
Columns contain the following information:
Column Name | Description |
---|---|
ensembl_id | Ensembl ID of the gene as used in the Cell Ranger reference. |
gene_name | Gene symbol or name as used in the Cell Ranger reference. |
alternate_symbols | Semicolon-separated list of alternate gene symbols provided by NCBI, or - if none are provided or the field is not applicable. Unlike synonyms, alternate symbols are annotated as official symbols by NCBI; synonyms largely are not. |
synonyms | Semicolon-separated list of synonyms provided by NCBI, or - if none are provided or the field is not applicable. |
description | Long-form description of the gene as provided by Ensembl, or - if not applicable. |
mappability_flag | TRUE or FALSE, indicating whether the gene has low mappability or low bait coverage near transcript ends because repetitive sequences were filtered out. |
total_baits | The number of baits designed to target this gene. |
Files downloaded from the 10x Genomics Custom Panel Designer for customized panels also include the following column:
Column Name | Description |
---|---|
user_input | Original gene name entered by the user, or - if not applicable (e.g., a gene that was already on a predesigned panel when creating an add-on design). |
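For downstream analysis, it is often convenient to turn the - placeholders and semicolon-separated lists into proper values. This is a minimal Python sketch, assuming the column names shown above and tab-delimited rows without quoting; the function names and the shape of the output dictionaries are hypothetical choices for this example. The optional user_input column is read when present and returned as None otherwise.

```python
import csv
from typing import Dict, Iterable, List, Optional


def _optional(value: Optional[str]) -> Optional[str]:
    # "-" (or a missing column) means "none provided / not applicable".
    return None if value in (None, "-") else value


def _split_list(value: str) -> List[str]:
    # Semicolon-separated lists also use "-" for "none".
    return [] if value == "-" else value.split(";")


def read_gene_metadata(lines: Iterable[str]) -> List[Dict]:
    """Parse gene metadata TSV lines (header first) into one dict per gene."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    genes = []
    for row in reader:
        flag = row["mappability_flag"]
        if flag not in ("TRUE", "FALSE"):
            raise ValueError(f"unexpected mappability_flag: {flag!r}")
        genes.append(
            {
                "ensembl_id": row["ensembl_id"],
                "gene_name": row["gene_name"],
                "alternate_symbols": _split_list(row["alternate_symbols"]),
                "synonyms": _split_list(row["synonyms"]),
                "description": _optional(row["description"]),
                "mappability_flag": flag == "TRUE",
                "total_baits": int(row["total_baits"]),
                # Only present in Custom Panel Designer downloads.
                "user_input": _optional(row.get("user_input")),
            }
        )
    return genes
```

To read a file, open it with newline="" and pass the file object directly to read_gene_metadata.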
The bait design file is an Excel (.xlsx) file that you can use to order your custom or add-on panel from a compatible oligo provider. It contains the bait sequences for add-on genes and custom sequences in a commonly accepted format.
For custom panels that include custom sequence designs, a FASTA file and a GTF file are generated automatically and available for download. You can use these files with Cell Ranger's mkref command to make custom references (see the mkref documentation).
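The custom FASTA and GTF files are meant to be appended to the reference FASTA and transcriptome GTF files before running mkref. One pitfall when concatenating text files is a base file whose last line has no trailing newline, which would fuse that line with the first custom line. The Python sketch below writes a combined copy and guards against that case; the function and file names are hypothetical, and the output path must differ from the base path.

```python
import os
import shutil
from pathlib import Path


def _ends_with_newline(path: Path) -> bool:
    with path.open("rb") as f:
        if f.seek(0, os.SEEK_END) == 0:
            return True  # empty file: nothing to fuse with
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def append_custom(base: Path, custom: Path, out: Path) -> None:
    """Write base, then custom, to out; both input files are left unchanged."""
    with out.open("wb") as dst:
        with base.open("rb") as src:
            shutil.copyfileobj(src, dst)
        # Keep the last base line and the first custom line on separate lines.
        if not _ends_with_newline(base):
            dst.write(b"\n")
        with custom.open("rb") as src:
            shutil.copyfileobj(src, dst)
```

For example, call append_custom(Path("genome.fa"), Path("custom.fa"), Path("genome_custom.fa")) and do the same for the GTF files. The combined files can then be passed to cellranger mkref through its --fasta and --genes options; check the mkref documentation for your Cell Ranger version.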