CITE-seq, or Cellular Indexing of Transcriptomes and Epitopes by Sequencing

CITE-seq, or Cellular Indexing of Transcriptomes and Epitopes by Sequencing [1], is one of the latest innovations for studying single-cell biology. It enables researchers to capture RNA and surface protein expression simultaneously on the same cells with next-generation sequencing technology. Scientists can correlate the two data types to identify biomarkers and better characterize cell phenotypes [2]. Normalizing and comparing the two data types, however, presents data analysis challenges. We want to make CITE-seq data analysis easier and help you discover more.

Here’s an approach that utilizes the Cytobank platform and should help you effectively analyze your multi-omic CITE-seq data.

This three-step workflow is a general approach for analyzing CITE-seq data. It integrates third-party software, a Cytobank-developed R script, and the existing Cytobank platform.

  1. Map the raw sequencing reads to a reference genome (such as GRCh38 or GRCh37) with a sequencing mapper, which quantifies gene expression based on where the reads map. We recommend using Cell Ranger to align data from 10x Genomics. For alignment of other types of single-cell RNA-seq data, you can use STAR. CITE-seq-Count can then process the antibody reads contained in the CITE-seq data and produce the raw antibody expression count matrix.
  2. Normalize and filter the raw gene and antibody expression data using an R script we developed. The script automatically applies several data QC filters to remove noise from the raw data and normalizes the filtered data to correct for sequencing-depth bias. The script also combines gene and antibody expression and outputs one merged expression file per sample.
  3. Upload the processed data into Cytobank via DROP and run Cytobank machine learning algorithms (viSNE, FlowSOM, and CITRUS) to unpack the rich information encoded in your CITE-seq data. See below for more details.
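
As a rough illustration of what step 2 involves, the sketch below filters low-depth cells, normalizes gene counts for sequencing depth, transforms antibody counts, and merges the two per cell. It is not the Cytobank R script. The function names, the 200-count threshold, the 10,000 scale factor, and the centered log-ratio style transform (with a pseudocount of 1) are all assumptions chosen for illustration.

```python
import math

def log_normalize(counts, scale=10_000):
    """Depth-normalize one cell's gene counts, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]

def clr_normalize(counts):
    """Centered log-ratio style transform for one cell's antibody counts.

    Uses log(1 + x) so that zero counts are defined (an assumption).
    """
    logs = [math.log1p(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def process_sample(rna, adt, min_rna_counts=200):
    """Filter, normalize, and merge gene and antibody data for one sample.

    rna, adt: dicts mapping a cell barcode to a list of raw counts.
    min_rna_counts is a hypothetical QC threshold on total gene counts.
    """
    merged = {}
    for cell, gene_counts in rna.items():
        # Keep only cells seen in both modalities with enough RNA depth.
        if cell not in adt or sum(gene_counts) < min_rna_counts:
            continue
        merged[cell] = log_normalize(gene_counts) + clr_normalize(adt[cell])
    return merged
```

Each merged row holds the normalized gene values followed by the transformed antibody values, which mirrors the idea of one merged expression file per sample.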

Explore Underlying Data Patterns with CITE-seq + viSNE

For a quick view of your CITE-seq data, you can run viSNE with the filtered and normalized single-cell RNA-seq gene expression data. Using an example data set downloaded from the Satija Lab, we found that viSNE nicely separates cell events based on the CITE-seq gene expression data (Figure 1).


Figure 1. CITE-seq data renders well-defined populations. Rather than using all the genes, we clustered cells with the 700 most variable genes to effectively uncover the underlying data pattern of a cord blood mononuclear cell sample.
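
One simple way to pick the most variable genes is to rank them by their variance across cells, as sketched below. This is an assumption for illustration only: the source does not say how the 700 genes were selected, and some tools rank genes by a mean-adjusted dispersion measure rather than raw variance. The function name and input layout are hypothetical.

```python
from statistics import pvariance

def top_variable_genes(expression, gene_names, n_top=700):
    """Return the names of the n_top genes with the highest variance.

    expression: list of per-cell lists, one normalized value per gene,
    in the same order as gene_names.
    """
    variances = []
    for j, name in enumerate(gene_names):
        column = [cell[j] for cell in expression]
        variances.append((pvariance(column), name))
    # Highest variance first; sorting is stable, so ties keep input order.
    variances.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in variances[:n_top]]
```

The returned gene names can then be used to subset the expression table before running a clustering or dimensionality reduction step.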

A Closer Look at Antibody Expression on Gene Clusters

To confirm the gene expression clustering result, you can overlay the antibody expression on the gene clusters (Figure 2).



Figure 2. Overlay of Gene and Protein Expression. Red and yellow dots are events with high expression of the surface protein named in the header of each tSNE plot.

On the Cytobank platform, you can also visualize surface protein expression across the identified cell populations with a heatmap (Figure 3).



Figure 3. Heatmap Visualization. The heatmap shows the expression of multiple surface proteins per cell type.
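
The values behind a heatmap like this are one summary statistic per marker per population. The sketch below computes the mean, which is an assumption: the platform may use the median or another statistic. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def mean_expression_by_population(cells):
    """Summarize marker expression per population.

    cells: iterable of (population_label, {marker: value}) pairs.
    Returns {population_label: {marker: mean value}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for label, markers in cells:
        counts[label] += 1
        for marker, value in markers.items():
            sums[label][marker] += value
    # Assumes every cell in a population reports the same set of markers.
    return {
        label: {m: total / counts[label] for m, total in sums[label].items()}
        for label in sums
    }
```

Each inner dictionary corresponds to one row (or column) of the heatmap, depending on how the populations and markers are laid out.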

You can also look at the expression of surface proteins and their corresponding protein-coding genes together (Figure 4).


Figure 4. Heatmap Correlating Gene and Protein Expression. CD3D and CD3E are protein-coding genes for CD3. FCGR3A is the gene that encodes CD16. CD8 is encoded by CD8A and CD8B. The heatmap demonstrates that gene expression corresponds with protein expression across the identified cell populations.

You can examine individual surface proteins to find out how well protein expression correlates with gene expression by creating an overlaid dot plot in the Cytobank Working Illustration (Figure 5). In this example, expression of CD3 is positively correlated with expression of the CD3E gene, although one cluster of cells shows low gene expression and high antibody expression.


Figure 5. Dot Plot Correlating Gene and Protein Expression. The x-axis represents protein expression and the y-axis represents gene expression.
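
To put a number on the relationship shown in a dot plot like Figure 5, you could compute a correlation coefficient outside the platform. The sketch below uses the Pearson coefficient, which is an assumption: the source does not name a statistic. The protein-to-gene pairs come from Figure 4, while the function names and input layout are hypothetical.

```python
import math

# Protein-to-gene pairs as described for Figure 4.
PROTEIN_TO_GENES = {
    "CD3": ["CD3D", "CD3E"],
    "CD16": ["FCGR3A"],
    "CD8": ["CD8A", "CD8B"],
}

def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sd_x * sd_y)

def protein_gene_correlations(protein_values, gene_values, mapping=PROTEIN_TO_GENES):
    """Correlate each protein with each of its coding genes.

    protein_values, gene_values: dicts mapping a name to per-cell values,
    with cells in the same order in both dicts.
    """
    results = {}
    for protein, genes in mapping.items():
        for gene in genes:
            if protein in protein_values and gene in gene_values:
                results[(protein, gene)] = pearson(
                    protein_values[protein], gene_values[gene]
                )
    return results
```

A single coefficient can hide subpopulations such as cells with low gene expression and high antibody expression, so it complements the dot plot rather than replacing it.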


CITE-seq is a powerful technology that allows you to look at intracellular gene expression and extracellular protein marker expression simultaneously at single-cell resolution. This workflow allows you to conduct an integrated analysis on the Cytobank platform. This is the first step for Cytobank in this area. We are interested in developing additional workflows for analyzing CITE-seq data and would appreciate your feedback on what would be helpful.

For Research Use Only. Not for use in diagnostic procedures.