10x Genomics
Chromium Single Cell Gene Expression

Cell Ranger 4.0, printed on 03/04/2025

Cell Ranger mkfastq

In this tutorial, you will learn how to:

Set Up the Command for cellranger mkfastq
Run cellranger mkfastq
Explore the Output of cellranger mkfastq

Set Up the Command for cellranger mkfastq

Once Cell Ranger is installed, you are ready to run the cellranger mkfastq pipeline. The cellranger mkfastq pipeline is a wrapper around Illumina’s bcl2fastq program for demultiplexing Illumina base call files (BCL). This means that bcl2fastq (version 2.20) is a dependency for this pipeline. The bcl2fastq program is not bundled with Cell Ranger. For guidance on installing bcl2fastq (version 2.20), see this Questions and Answers article.
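Because bcl2fastq is not bundled, it can help to confirm that both programs are on your PATH before going further. The following is a minimal sketch; require_tool is a hypothetical helper name introduced here for illustration and is not part of Cell Ranger.

```shell
# require_tool is a hypothetical helper, not part of Cell Ranger.
# It reports whether a program can be found on the PATH.
require_tool() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool at $(command -v "$tool")"
        return 0
    fi
    echo "missing: $tool is not on the PATH" >&2
    return 1
}

# Check the two programs this tutorial relies on.
require_tool cellranger || true
require_tool bcl2fastq || true
```

If either line reports "missing", fix the installation or your PATH before running the pipeline.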

First, make a directory in the yard in which to run the analysis.

mkdir ~/yard/run_cellranger_mkfastq
cd ~/yard/run_cellranger_mkfastq

Now that you are in the working directory, you are ready to build the command to run cellranger mkfastq. When running any pipeline for the first time, it is helpful to run the command with the --help option to print the usage statement. This shows the proper syntax of the command, which options are required and which are optional, and what other information is necessary to include.

cellranger mkfastq --help

The output looks similar to this:

/mnt/home/user.name/yard/apps/cellranger-3.1.0/cellranger-cs/3.1.0/bin
cellranger mkfastq (3.1.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------
Run Illumina demultiplexer on sample sheets that contain 10x-specific sample 
index sets, and generate 10x-specific quality metrics after the demultiplex.  
Any bcl2fastq argument works, except a few that are set by the pipeline 
to ensure proper trimming and sample indexing. The FASTQ output generated 
is the same as when running bcl2fastq directly.

...
Usage:
    cellranger mkfastq --run=PATH [options]
    cellranger mkfastq -h | --help | --version

Required:
    --run=PATH              Path of Illumina BCL run folder.

This tutorial passes three arguments to the cellranger mkfastq command: --id, --run, and --csv.

The --id value can be any name you choose. The pipeline uses it to name the output directory that Cell Ranger creates and runs in. This directory is called a pipestance, which is short for pipeline instance.

The --run argument points to the Illumina run folder that contains the BCL files.

The --csv argument points to a comma-separated values (CSV) file that describes how samples were indexed on the Illumina flow cell.

Note on CSV files

The CSV file must be a plain text file encoded as Unicode (UTF-8) without a byte-order mark (BOM) and with Unix line-feed (LF) end-of-line markers. Programs on Microsoft Windows or Mac OS sometimes add extra characters to the ends of lines or change the text encoding. To avoid parsing errors in the pipeline software, text editing programs, such as nano or TextWrangler (Mac OS), are better for creating sample sheets than Microsoft Excel.
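The two problems named in this note can be checked from the command line. The sketch below assumes a system with the standard head, od, tr, and grep utilities; check_sample_sheet is a hypothetical helper written for this illustration, not a Cell Ranger command.

```shell
# check_sample_sheet is a hypothetical helper, not part of Cell Ranger.
# It flags a UTF-8 byte-order mark and carriage returns (CRLF endings).
check_sample_sheet() {
    csv="$1"
    status=0
    # A UTF-8 BOM is the three bytes EF BB BF at the start of the file.
    first_bytes=$(head -c 3 "$csv" | od -An -tx1 | tr -d ' \n')
    if [ "$first_bytes" = "efbbbf" ]; then
        echo "BOM found in $csv" >&2
        status=1
    fi
    # A carriage return anywhere in the file means non-Unix line endings.
    cr=$(printf '\r')
    if grep -q "$cr" "$csv"; then
        echo "carriage returns found in $csv" >&2
        status=1
    fi
    return "$status"
}
```

Calling check_sample_sheet with the path to a sample sheet returns success only when neither problem is detected. It does not validate the columns or the index names.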

Download Data

For this tutorial, we use the tiny BCL data set described here. We use the wget command to download the Illumina run directory and the simple sample sheet.

wget https://cf.10xgenomics.com/supp/cell-exp/cellranger-tiny-bcl-1.2.0.tar.gz
wget https://cf.10xgenomics.com/supp/cell-exp/cellranger-tiny-bcl-simple-1.2.0.csv
tar -zxvf cellranger-tiny-bcl-1.2.0.tar.gz

Before building the cellranger mkfastq command, take a closer look at these files. Use the command cat (for "concatenate") to print the sample sheet to the screen.

cat cellranger-tiny-bcl-simple-1.2.0.csv

The output looks like this:

Lane,Sample,Index
1,test_sample,SI-P03-C9

This file tells the pipeline that the library "test_sample" was sequenced on lane 1 of the flow cell and indexed using the SI-P03-C9 index set. The Sample column is used as the prefix for naming output files. This prefix also serves as the sample id in later Cell Ranger pipelines. In this case the prefix/sample id is test_sample.
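To make the three columns concrete, the sketch below reads a simple sample sheet with awk and prints one line per sample. The helper name describe_sample_sheet is hypothetical, and the sketch assumes the three-column Lane,Sample,Index layout shown above.

```shell
# describe_sample_sheet is a hypothetical helper for illustration.
# It skips the header row and prints one summary line per sample row.
describe_sample_sheet() {
    awk -F',' 'NR > 1 && NF == 3 {
        printf "lane %s: sample %s, index set %s\n", $1, $2, $3
    }' "$1"
}
```

For the tutorial's sample sheet, describe_sample_sheet cellranger-tiny-bcl-simple-1.2.0.csv prints "lane 1: sample test_sample, index set SI-P03-C9".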

The run directory has the following directory structure. Use the tree command to print the directory structure to the screen.

tree -L 2 cellranger-tiny-bcl-1.2.0/

The output will look something like this:

cellranger-tiny-bcl-1.2.0
├── Config
│   ├── h35kcbcxy_Oct-20-16\ 15-51-48_Effective.cfg
│   ├── HiSeqControlSoftware.Options.cfg
│   ├── RTAStart.bat
│   └── Variability_HiSeq_C.bin
├── Data
│   └── Intensities
├── InterOp
│   ├── ControlMetricsOut.bin
│   ├── CorrectedIntMetricsOut.bin
│   ├── ErrorMetricsOut.bin
│   ├── ExtractionMetricsOut.bin
│   ├── ImageMetricsOut.bin
│   ├── QMetricsOut.bin
│   └── TileMetricsOut.bin
├── RTAComplete.txt
├── RunInfo.xml
└── runParameters.xml

Your data may look a little different than this depending on the sequencing instrument used. Looking at the first few levels of directories is a good way to check for completeness. If you do not see the RTAComplete.txt, RunInfo.xml, and runParameters.xml files, then it is likely that the pipeline will fail. If any of these files are missing, contact your sequencing provider and ask for the complete run directory.
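The completeness check described above can be scripted. This is a minimal sketch that looks for the three files by the exact names shown in the listing; check_run_dir is a hypothetical helper, and a run folder from a different instrument may use different file names.

```shell
# check_run_dir is a hypothetical helper, not part of Cell Ranger.
# It reports any of the three expected top-level files that are absent.
check_run_dir() {
    run_dir="$1"
    missing=0
    for f in RTAComplete.txt RunInfo.xml runParameters.xml; do
        if [ ! -f "$run_dir/$f" ]; then
            echo "missing: $run_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_run_dir cellranger-tiny-bcl-1.2.0 prints nothing and returns success when all three files are present.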

If tree is not installed on your system, you can use ls to explore the run directory.

ls -altR cellranger-tiny-bcl-1.2.0/

Run cellranger mkfastq

Next build the command line using the paths to these data and run it.

cellranger mkfastq --id=tutorial_walk_through \
--run=/mnt/home/user.name/yard/run_cellranger_mkfastq/cellranger-tiny-bcl-1.2.0 \
--csv=/mnt/home/user.name/yard/run_cellranger_mkfastq/cellranger-tiny-bcl-simple-1.2.0.csv

The output looks similar to this:

/mnt/home/user.name/opt/yard/apps/cellranger-3.1.0/cellranger-cs/3.1.0/bin
cellranger mkfastq (3.1.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Martian Runtime - '3.1.0-v3.2.3'
...
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

2019-09-12 15:05:08 Shutting down.
Saving pipestance info to "tutorial_walk_through/tutorial_walk_through.mri.tgz"
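Long absolute paths are easy to mistype. One way to reduce that risk is to assemble the command from a single working directory and confirm that both inputs exist before running anything. The sketch below only prints the command for review; build_mkfastq_cmd is a hypothetical helper, and it assumes the working directory contains the files downloaded earlier and that the paths contain no spaces.

```shell
# build_mkfastq_cmd is a hypothetical helper, not part of Cell Ranger.
# It prints the mkfastq command for a working directory, and refuses
# to do so if the run folder or the sample sheet cannot be found.
build_mkfastq_cmd() {
    work_dir="$1"
    run_dir="$work_dir/cellranger-tiny-bcl-1.2.0"
    csv="$work_dir/cellranger-tiny-bcl-simple-1.2.0.csv"
    [ -d "$run_dir" ] || { echo "no run folder: $run_dir" >&2; return 1; }
    [ -f "$csv" ] || { echo "no sample sheet: $csv" >&2; return 1; }
    printf 'cellranger mkfastq --id=tutorial_walk_through --run=%s --csv=%s\n' \
        "$run_dir" "$csv"
}
```

For example, build_mkfastq_cmd "$PWD" prints the full command when run from the directory that holds the downloaded data, so you can inspect it before pasting it into the shell.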

Explore the Output of cellranger mkfastq

Run times vary based on the system resources, but this run shouldn’t take more than a few minutes. Now go to the fastq_path directory.

cd /mnt/home/user.name/yard/run_cellranger_mkfastq/tutorial_walk_through/outs/fastq_path
ls -1

The output looks similar to this:

H35KCBCXY
Reports
Stats
Undetermined_S0_L001_I1_001.fastq.gz
Undetermined_S0_L001_R1_001.fastq.gz
Undetermined_S0_L001_R2_001.fastq.gz

The Undetermined FASTQ files at this level contain sequences that could not be assigned to a valid index.

Demultiplexed FASTQ files with valid sequencing indices are found under the directory named after the flow cell id, in this case 'H35KCBCXY'.

ls -1 H35KCBCXY/test_sample

The output looks similar to this:

test_sample_S1_L001_I1_001.fastq.gz
test_sample_S1_L001_R1_001.fastq.gz
test_sample_S1_L001_R2_001.fastq.gz

These outputs from the cellranger mkfastq pipeline serve as input to the cellranger count pipeline described in Running cellranger count.
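A quick sanity check on the demultiplexed files is to count the reads in each one. A FASTQ record occupies four lines, so the read count is the line count divided by four. The sketch below assumes gzip is available; count_reads is a hypothetical helper, not a Cell Ranger command.

```shell
# count_reads is a hypothetical helper, not part of Cell Ranger.
# It decompresses a .fastq.gz file to a pipe and divides the number
# of lines by four, since each FASTQ record spans four lines.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
    echo $((lines / 4))
}
```

For example, count_reads H35KCBCXY/test_sample/test_sample_S1_L001_R1_001.fastq.gz prints the number of reads in the R1 file. The I1, R1, and R2 files for one sample and lane would normally be expected to report the same count.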
