Cell Ranger ARC 2.0, printed on 11/05/2024
Peaks are mapped to genes based on the genomic location of nearby genes. The general principle is as follows:
The annotation procedure is as follows:
The output file of peak annotation is peak_annotation.tsv. It has the following format:
Column Number | Name | Description |
---|---|---|
1 | chrom | Contig that contains the peak |
2 | start | Peak start location |
3 | end | Peak end location |
4 | gene | Gene symbol based on the gene annotation in the reference. |
5 | distance | Distance of peak from TSS of gene. Positive distance means the start of the peak is downstream of the position of the TSS, whereas negative distance means the end of the peak is upstream of the TSS. Zero distance means the peak overlaps with the TSS or the peak overlaps with the transcript body of the gene. |
6 | peak_type | Can be "promoter", "distal" or "intergenic". |
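
As a rough illustration of the column layout above, the following Python sketch reads a peak_annotation.tsv into typed records. It assumes the file is tab-separated with a header row naming the six columns, and that the gene and distance fields may be empty for a peak with no assigned gene. The `PeakAnnotation` class and the `read_peak_annotation` function are hypothetical names introduced for this example and are not part of Cell Ranger ARC.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class PeakAnnotation:
    """One row of peak_annotation.tsv (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    gene: Optional[str]      # None when the field is empty (assumed possible)
    distance: Optional[int]  # None when the field is empty (assumed possible)
    peak_type: str           # "promoter", "distal" or "intergenic"


def read_peak_annotation(lines: Iterable[str]) -> Iterator[PeakAnnotation]:
    """Yield one record per annotation row; assumes a tab-separated header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        yield PeakAnnotation(
            chrom=row["chrom"],
            start=int(row["start"]),
            end=int(row["end"]),
            gene=row["gene"] or None,
            distance=int(row["distance"]) if row["distance"] else None,
            peak_type=row["peak_type"],
        )


# Two rows taken from the example in this document, as an in-memory file.
sample = io.StringIO(
    "chrom\tstart\tend\tgene\tdistance\tpeak_type\n"
    "chr14\t77786487\t77786973\tPOMT2\t254\tdistal\n"
    "chr14\t77786487\t77786973\tGSTZ1\t-254\tpromoter\n"
)
records = list(read_peak_annotation(sample))
for rec in records:
    print(rec.chrom, rec.start, rec.end, rec.gene, rec.distance, rec.peak_type)
```

To read a real file, an open file handle can be passed instead of the in-memory sample, for example `open("peak_annotation.tsv", newline="")`.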
Below is an example excerpt from a peak_annotation.tsv file. Each row represents one annotation assigned to one peak. Note that the same peak can be annotated with multiple genes, and these entries appear on successive lines. This happens when a peak is proximal to multiple genes and there is no way to disambiguate between them. For example, the peak chr14:77786487-77786973 is annotated as being a candidate promoter peak for GSTZ1 or a candidate distal regulator of POMT2.
chrom | start | end | gene | distance | peak_type |
---|---|---|---|---|---|
... | | | | | |
chr14 | 77769877 | 77770568 | POMT2 | 16659 | distal |
chr14 | 77781976 | 77782953 | POMT2 | 4274 | distal |
chr14 | 77781976 | 77782953 | GSTZ1 | -4274 | distal |
chr14 | 77786487 | 77786973 | POMT2 | 254 | distal |
chr14 | 77786487 | 77786973 | GSTZ1 | -254 | promoter |
chr14 | 77787130 | 77787963 | POMT2 | 0 | promoter |
chr14 | 77787130 | 77787963 | GSTZ1 | 0 | promoter |
chr14 | 77843033 | 77843952 | TMED8 | 0 | promoter |
... | | | | | |
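
Because one peak can span several rows, downstream analysis often needs the annotations grouped per peak. The sketch below shows one possible way to do that in Python, using the example rows above as inline data. The `chrom:start-end` key format and the variable names `by_peak` and `multi_gene` are choices made for this illustration only.

```python
from collections import defaultdict

# Rows copied from the example above:
# (chrom, start, end, gene, distance, peak_type).
rows = [
    ("chr14", 77769877, 77770568, "POMT2", 16659, "distal"),
    ("chr14", 77781976, 77782953, "POMT2", 4274, "distal"),
    ("chr14", 77781976, 77782953, "GSTZ1", -4274, "distal"),
    ("chr14", 77786487, 77786973, "POMT2", 254, "distal"),
    ("chr14", 77786487, 77786973, "GSTZ1", -254, "promoter"),
    ("chr14", 77787130, 77787963, "POMT2", 0, "promoter"),
    ("chr14", 77787130, 77787963, "GSTZ1", 0, "promoter"),
    ("chr14", 77843033, 77843952, "TMED8", 0, "promoter"),
]

# Collect every (gene, distance, peak_type) annotation under its peak.
by_peak = defaultdict(list)
for chrom, start, end, gene, distance, peak_type in rows:
    by_peak[f"{chrom}:{start}-{end}"].append((gene, distance, peak_type))

# Keep only the peaks that carry more than one gene annotation.
multi_gene = {peak: anns for peak, anns in by_peak.items() if len(anns) > 1}

for peak, anns in multi_gene.items():
    print(peak, anns)
```

In this excerpt, three of the five peaks carry annotations for both POMT2 and GSTZ1, including chr14:77786487-77786973, which is distal for POMT2 and a promoter peak for GSTZ1.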