1. Introduction

1.1. About the software

BioTuring Browser, or BBrowser, is a desktop application that performs analyses on sequencing data. The software is also connected to a database hosting sequencing data from the latest publications. Users can use BBrowser to analyze their own data or analyze the public data available.

The software allows scientists, even ones without programming experience, to quickly investigate massive amounts of sequencing data from in-house and published work and compare them together. All data submitted by users and data downloaded from the BBrowser database is stored and secured on the local computer.

The application was first released in October of 2018, running on Windows, macOS, and Ubuntu.

1.2. System requirements

Operating systems

Network

Hard Drive

Memory (RAM)

Notice

The application is only available on Ubuntu 18.04 (Bionic Beaver).

After downloading the .deb package, you can install the software through either the graphical user interface or the command-line interface, as follows:

Graphical User Interface:

Command Line Interface:

Please note that the software may open a local web socket to execute R commands for certain analyses. Internal TCP connections on port 9004 must be permitted for the software to run properly.

To install BBrowser on macOS, after downloading the installation package:
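As a quick sanity check before launching BBrowser, the sketch below tests whether another program already occupies local port 9004. It is only an illustration: the function name is hypothetical, it assumes the socket listens on 127.0.0.1, and it cannot detect firewall rules. While BBrowser itself is running, the port is expected to be reported as busy.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # The port is in use, or binding is not permitted for this user.
            return False
        return True


if __name__ == "__main__":
    state = "free" if port_is_free(9004) else "busy or blocked"
    print(f"TCP port 9004 on localhost is {state}")
```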

There are 2 options for running BBrowser on Windows: portable version or installed version.

Portable version

The portable version does not require any installation and is only available for Windows. Users who do not have the necessary privileges to modify the system registry can download this version to use instantly. There is no difference between the portable version and the installed version of the BBrowser in terms of the interface and functionality.

The portable version is provided in a zipped file. After downloading, you need to unzip the file and double-click on it to run the software.

Alternatively, to launch BBrowser, users can run the executable binary, BBrowser2.exe, located in a folder called BBrowser2-win32-x64.

If you want to move the software storage, you must move the entire BBrowser2-win32-x64 folder. Modifying or removing any files in this folder may cause BBrowser to stop working. In either case, your single-cell data is not affected as it is stored in a different location.

Installation

If you want to install BBrowser to your computer, download the installer .exe file and run it with administrator permission. An installation window will guide you through the installation.

Although the program can be installed anywhere on your computer, we highly recommend installing it in the usual Program Files folder. However, this requires administrator permission.

If your computer has more than one account using the software, each account can only access its own data.

To install BBrowser on CentOS 7, you first need to install some dependencies:

yum install libgfortran libXScrnSaver

Then, use the following command to install BioTuring Browser:

rpm -iU BBrowser-xxx.x86_64.rpm

Please replace “xxx” with the version that you downloaded. Installation of the software and its dependencies may require root access. After installation, BBrowser can be found in Applications > Accessories.

3. Program Interface

3.1. Login page

The BBrowser Login page appears when you open the software for the first time. Once the software has successfully recorded your BioTuring account, it will log in automatically the next time you start BBrowser.

Please enter your credentials and declare your academic or non-academic status to access the corresponding set of features, then press Enter or click Login.

Login credentials are encrypted and stored separately for each user when multiple users share the same computer.

If you are using a network with a proxy, please configure the Proxy settings at this point, before any connection to the BioTuring server is made. The software needs correct proxy settings to connect to our server and verify your credentials, as well as to access our public database.

3.2. Home page

The BBrowser Home page shows all data that you can download to the local computer, including public data from the BioTuring server and data from your remote repositories.

The Home page also offers a feature for looking up gene expression levels across studies in the BioTuring database.

3.3. Data page

BBrowser Data page shows you all data that you have downloaded or submitted to the local computer. You can refer to this page as your local database. You also need to go here to submit a new dataset.

3.4. Settings page

BBrowser Settings page helps you to:

3.5. Analysis dashboard

When you click on a study on Data page or click on Explore a study from Home page, that dataset will open in the Analysis dashboard.

Here is where you can visualize the data and perform all the analyses.

The main visualization is a scatter plot of dimensionality reduction, with each point representing a single cell. Cell color, size, and shape change when you run different analyses. The scatter plot is interactive, allowing you to zoom, move, rotate (in 3D mode), or select cells.

Inside the main visualization window are some function boxes:

On the right of the main visualization window are the main function tabs.

There are 4 tabs here, each in a small window that can be expanded or collapsed. These tabs either give you more insight into the data or provide additional visualizations:

At the bottom of the function tabs is information about study input/output and the visualization and analysis settings.

The other 2 interfaces, the Sub-clustering dashboard and the Differential expression dashboard, are described in their own sections.

If you need help during an analysis, press Alt (on Windows) or move your mouse to the top left of the screen (on macOS) and click Help to view our tutorials or to contact us.

4. BioTuring public database

Massive amounts of single-cell RNA sequencing data generated have opened avenues for exploration, yet also brought up new challenges to standardize data formats, systematically access transcription profiles of cell types across studies and integrate multiple datasets.

Hence, in BioTuring Browser, we have indexed published single-cell RNA sequencing data from multiple formats to our platform to remove that barrier. All data are processed and annotated to be instantly accessed and explored in a uniform visualization and analytics interface.

In addition, we have developed our own set of marker genes for over 200 cell types. We use that gene list to verify the authors’ annotations and re-label the cell types according to the BioTuring cell ontology, which systematizes the cell types available in our database.

Users can also query the expression of one or multiple genes across all datasets in the database and see how the genes are expressed in different clusters without downloading any dataset.

The section below explains how we index published data and how the gene query across the database works.

4.1. Curation method

Step 1. Data collection

Single-cell gene expression matrices or Seurat/Scanpy objects are obtained from the authors or public repositories. If Seurat or Scanpy objects are available, we preserve the analysis results and move directly to the annotation step (Step 6).

Step 2. Filtering and normalization

Cells and genes from the submitted matrices are filtered to remove drop-outs, doublets, and apoptotic cells. Data are then subjected to log normalization and highly variable gene selection. QC criteria follow the authors’ descriptions.

If details of the filtering and normalization are not available, we process the data ourselves to obtain results as close as possible to those in the publication.

Step 3. Batch effect correction

We follow the methods used in each study. If not provided, we will apply CCA correction.

Step 4. Dimensionality reduction

We use the first 30 components of PCA to calculate 2D and 3D t-SNE or UMAP, the parameters of which are taken from the authors’ descriptions.

Step 5. Clustering

The dataset goes through both graph-based clustering using the igraph package (Csardi and Nepusz, 2006) and k-means clustering (Neter et al., 1998).

Step 6. Annotation and standardization of cell type labels

Cell type annotation matrices are obtained from authors and loaded in BioTuring Browser, together with metadata of the experimental design. We then manually verify cell type annotations using known markers and unify the terminology based on our internal cell ontology.

If annotation and metadata are not available, we will extract information directly from the publications.

4.2. List of studies

Users of BBrowser can view all studies in the public database on the Home page of the software. You can also access the list of available studies on the BioTuring website: https://bioturing.com/bbrowser/datasets

We select the studies to index based on the needs of our users and community.

If you have a study of interest and would like it to be indexed by the BioTuring team, please contact us at support@bioturing.com.

If you are an author, we would be very happy to distribute your data on BBrowser for public access. Please contact us at support@bioturing.com.

4.3. Query gene expression across the database

Version 2.1.3 introduced a search engine that lets you look at the expression of one or multiple genes across every public dataset in BBrowser. Because nothing needs to be downloaded from the server, you can skim through a large amount of information quickly.

You can find the gene search engine in Home page > Search genes tab.

If you search for one gene, the result of this search engine is a series of violin plots, each of which is the gene expression in a public dataset of BBrowser. On the plot:

All violin plots are interactive. You can hover your mouse over the plot to get the statistics (e.g. quantiles, median, mean), or drag to enlarge an area of the plot. Double-clicking any part of the plot restores the original view.

On the top right of each dataset, there is a horizontal bar showing the percentage of cells that express the gene. The search results are sorted by this number in descending order.
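The ordering of the search results can be illustrated with a short sketch. It assumes that a cell "expresses" the gene when its count is greater than zero; the function names and the input layout (one list of counts per dataset) are hypothetical and do not describe how the BioTuring server stores its data.

```python
def percent_expressing(counts):
    """Percentage of cells with a non-zero count for the queried gene."""
    if not counts:
        return 0.0
    return 100.0 * sum(1 for c in counts if c > 0) / len(counts)


def rank_datasets(gene_counts_by_dataset):
    """Return (dataset, percentage) pairs, highest percentage first."""
    scored = [
        (name, percent_expressing(counts))
        for name, counts in gene_counts_by_dataset.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```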

Information about the study and option to Download are the same as in the Search studies tab.

 

If you search for multiple genes, the results will be a series of heatmaps, each of which is from one dataset. Each heatmap shows:

You can get a dataset on BBrowser by downloading it from BioTuring server or from your internal server or by importing the data from your local computer.

Currently, BBrowser supports analyzing data from human (Homo sapiens) and mouse (Mus musculus). If you input data from another species, the software can still process the data (except for the transcript quantification step). However, some features that rely on gene information will be disabled, such as gene-set enrichment analysis and the gene functional reminder.

BioTuring Browser hosts a public database of published studies that are selected, processed, verified and uniformly labeled by the BioTuring team. You can view the list of studies in this database in the BBrowser Home page.

To download a study from BBrowser public database, you need to be connected to the internet and follow these steps.

Search studies

If you want to search for studies using general terms such as title, authors, and other keywords:

Advanced filtering:


Search genes

This type of searching allows looking for studies that express your gene(s) of interest. Go to Home page > Search genes.

Download the data

Notice:

5.2. Import FASTQ file

To import a single-cell RNA sequencing study with raw data, you need to provide a folder containing all your FASTQ files.

Notice:

5.3. Import Expression matrix (MTX, TSV, CSV)

BBrowser supports importing expression matrices as MTX, TSV, and CSV files with integer counts.

The expression matrix files can be uncompressed or compressed with gzip.

5.3.1. Import MTX file(s)

To import a study by single or multiple MTX files, you need to provide a folder with exactly 3 files:

When multiple folders containing data from multiple batches are submitted, options for selecting batch correction methods will be available.

If multiple folders were submitted, the Analysis dashboard will contain an input metadata classification whose cluster names are the input folders’ names. This lets you color and shape the cells according to the batch they come from.

The three files barcodes.tsv, features.tsv (or genes.tsv), and matrix.mtx are the standard files from 10X CellRanger. Below, we describe some more details of the data format that will affect the analysis.
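As a quick pre-import check of the three-file layout, a folder listing can be validated with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that gzip-compressed files simply carry an extra .gz suffix.

```python
import os


def check_mtx_folder(filenames):
    """Return a list of problems; an empty list means the layout looks right."""
    # Treat "matrix.mtx.gz" like "matrix.mtx" (assumed naming for gzip files).
    names = [n[:-3] if n.endswith(".gz") else n for n in filenames]
    problems = []
    if len(names) != 3:
        problems.append(f"expected exactly 3 files, found {len(names)}")
    if "barcodes.tsv" not in names:
        problems.append("missing barcodes.tsv")
    if "features.tsv" not in names and "genes.tsv" not in names:
        problems.append("missing features.tsv (or genes.tsv)")
    if "matrix.mtx" not in names:
        problems.append("missing matrix.mtx")
    return problems


def check_mtx_path(folder):
    """Convenience wrapper that inspects a folder on disk."""
    return check_mtx_folder(sorted(os.listdir(folder)))
```

For example, check_mtx_path("sample_1") returns an empty list when the folder holds exactly the three expected files ("sample_1" is a placeholder folder name).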

5.3.2. Import TSV, CSV file(s)

To import a study by single or multiple CSV/ TSV files:

A .tsv or .csv file is simply a table in which values are separated by a delimiter: a tab (in .tsv) or a comma (in .csv). Table editors such as Excel, LibreOffice, or Google Sheets can export your table in either format.

BBrowser requires a strict format in order to parse the information correctly. Please make sure that the first column of the table has the gene names / Ensembl identifiers, and the first row of the table has the barcodes.

For users who want to export a matrix using R, please be careful: writing a matrix in R may drop the first cell of the first row. For example, given a matrix object with 1000 rows and 500 columns:

> str(m)
num [1:1000, 1:500] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:1000] "ENSG1111111111" "ENSG1111111112" "ENSG1111111113" "ENSG1111111114" ...
..$ : chr [1:500] "CTGGTCCGGTGTTATCAG" "TTACTGGGACGACTCGGG" "ACGAGGAGACCCGAGATA" "CTTTGCAGTAGGGGCAAC" …

Writing a .csv file in this way will lose the first cell. The first row of the file will contain only 500 values, while the other rows will contain 501:

write.table(m, 'matrix.csv', sep=",", col.names=T, row.names=T)

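Please use this command instead. It is much easier:

write.csv(m, 'matrix.csv')

For a .tsv file, the simplest way is to use the common write.table, then manually insert one tab at the beginning of the first row. Alternatively, passing col.names=NA to write.table (with row names enabled) makes R write the empty first cell for you.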
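If a file has already been exported with the short header, it can also be checked and repaired outside R. The sketch below compares the number of fields in the first two rows and prepends an empty cell when the header is exactly one field short. The function names are hypothetical, and the approach assumes a plain table without line breaks inside quoted fields.

```python
import csv


def _field_count(line, delimiter):
    """Number of fields in one line, honoring quoted values."""
    return len(next(csv.reader([line], delimiter=delimiter)))


def header_is_aligned(text, delimiter=","):
    """True if the header row has as many fields as the first data row."""
    lines = text.splitlines()
    if len(lines) < 2:
        return False
    return _field_count(lines[0], delimiter) == _field_count(lines[1], delimiter)


def pad_header(text, delimiter=","):
    """Prepend an empty first cell when the header is one field short."""
    lines = text.splitlines()
    if len(lines) >= 2:
        header = _field_count(lines[0], delimiter)
        first_row = _field_count(lines[1], delimiter)
        if header == first_row - 1:
            lines[0] = delimiter + lines[0]
    return "\n".join(lines) + "\n"
```

Calling pad_header(text, "\t") applies the same repair to a .tsv file; a table whose header is already aligned is returned unchanged.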

5.4. Import Seurat, Scanpy object

BBrowser supports importing scRNA-seq and CITE-seq data processed by Seurat (.rds) and Scanpy (.h5ad/.h5) with integer counts (raw counts or rounded counts).

Quality control parameters and the dimensionality reduction method are not needed because these steps have already been performed on the Seurat/Scanpy object.

A Seurat or Scanpy object must contain an expression matrix with information on barcodes and genes. BBrowser can also adopt some analysis results in the object. These results include, but are not limited to:

Upon receiving the Seurat or Scanpy object, BBrowser will read all available data and run analyses to fill in the missing information.

Seurat object

BBrowser is able to read a Seurat object stored in .rds format. To create an .rds file from Seurat, you can use the saveRDS function in R. We will not go into detail about the structure since the software does not require any specific modification of the original Seurat structure. The most critical information in each object is the count matrix, which should be stored in @assays$RNA@counts for gene expression data and @assays$ADT@counts for antibody-captured data.

Scanpy object

For users who analyze with Python via the Scanpy library, the final AnnData object should be stored in .h5/.h5ad format using the .write function of the object itself. Unfortunately, HDF5 is a very general format, and there are many variations in how the information can be recorded. BBrowser expects the following structure:

Notice:

6. Data analysis pipeline

We are fully aware that different datasets are generated under different experimental designs and may need to be treated individually in order to represent all biological variation in the samples and, for public studies, to reproduce the published results as faithfully as possible. The long-term plan for BioTuring Browser is therefore to maintain speed and ease of use while enhancing the flexibility of the analyses. All public datasets and imported data go through the same pipeline, the separate steps of which are discussed in this section.

6.1. Transcript quantification

Transcript quantification is only applied when you create a new study from raw sequencing files (FASTQ). The process is run by Hera-T (version 1.2.0) (Tran et al. 2018), a new algorithm developed by the BioTuring team. It is applied to data generated by the 10X protocol on Chromium v2 and v3. The processing speed is 10 to 100 times faster than CellRanger 3.0, with better accuracy (Tran et al. 2018). The output of transcript quantification is an expression matrix in MTX format, which is passed on to the processing steps below.

6.2. Quality control

The process from quality control to dimensionality reduction is applied to public and in-house datasets imported in MTX, TSV or CSV files.

Quality control filters out poor-quality cells in terms of gene expression and redundant non-expressed genes in the data.

In public datasets without a detailed processing script from the author, genes that have at least 1 UMI count in fewer than 3 cells are excluded. Then, cells with fewer than 200 detected genes (genes with at least 1 UMI count), as well as cells with more than 5% mitochondrial gene content, are excluded. The process creates a new expression matrix that may have fewer cells and genes than the original data, and BBrowser only takes the cells and genes of this filtered matrix for the next processing steps.
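The default filtering rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not BBrowser's implementation: mitochondrial genes are recognized by an assumed "MT-" name prefix (case-insensitive), the mitochondrial percentage is computed from UMI counts, a cell is dropped if it fails either cell-level criterion, and the function name and the nested-dictionary input layout are hypothetical.

```python
def qc_filter(counts, cells, min_cells=3, min_genes=200,
              max_mito_pct=5.0, mito_prefix="MT-"):
    """Filter genes, then cells.

    counts maps gene -> {cell: UMI count}; zero counts may be omitted.
    Returns (filtered_counts, kept_cells).
    """
    # Step 1: keep genes detected (count >= 1) in at least min_cells cells.
    kept_genes = {
        gene: row for gene, row in counts.items()
        if sum(1 for c in row.values() if c >= 1) >= min_cells
    }

    # Step 2: per-cell statistics computed on the remaining genes.
    detected = dict.fromkeys(cells, 0)
    total = dict.fromkeys(cells, 0)
    mito = dict.fromkeys(cells, 0)
    for gene, row in kept_genes.items():
        is_mito = gene.upper().startswith(mito_prefix)
        for cell, c in row.items():
            if c >= 1:
                detected[cell] += 1
                total[cell] += c
                if is_mito:
                    mito[cell] += c

    def passes(cell):
        if detected[cell] < min_genes:
            return False
        pct = 100.0 * mito[cell] / total[cell] if total[cell] else 0.0
        return pct <= max_mito_pct

    kept_cells = [cell for cell in cells if passes(cell)]
    keep = set(kept_cells)
    filtered = {
        gene: {cell: c for cell, c in row.items() if cell in keep}
        for gene, row in kept_genes.items()
    }
    return filtered, kept_cells
```

The thresholds are parameters so that the same sketch covers user-defined cut-offs for in-house data.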

For in-house data, BBrowser allows users to define the cut-off for quality control or to skip any filtering steps. In the data import pop-up, you can

6.3. Batch effect removal

This process is applied when multiple MTX, TSV or CSV files are submitted, usually from multiple batches of sample preparation and sequencing. The software considers each file as a batch and will try to scale all batches with the chosen method.

Currently, we provide 3 methods to remove batch effects for your preference:

6.4. Dimensionality reduction

On BBrowser, you can choose to run dimensionality reduction by t-SNE or UMAP.

t-SNE (Maaten and Hinton 2008):

The analysis is done by the Rtsne package (Krijthe 2015). The default perplexity for t-SNE is set at 30.

UMAP (McInnes and Healy 2018):

The analysis is done by the uwot package (Melville 2018). The number of neighbors is set at 30.

6.5. Clustering methods

This analysis runs on the PCA results. For every dataset, the software calculates both Louvain (graph-based) and k-means clustering.

6.6. Finding marker genes

BBrowser uses a non-parametric approach, called Venice, to detect marker genes. It is an open-source algorithm that runs efficiently on large amounts of data while outperforming other methods in accuracy (Hy et al. 2019).

We first define the marker genes of a group of cells in a dataset as the genes that can be used to distinguish those cells from the rest. Based on this idea, we use the accuracy of classification as a metric to score the significance of a marker gene.

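Considering each gene separately, we denote a cell as $c^k$, where $k$ is the label of a group of cells: $k = 1$ if the cell is in the group of interest (group 1 - the group that we want to find the marker genes for) and $k = 2$ if the cell is not in the group of interest (group 2 - the rest of the data). We denote $\bar{k}$ as the complement group of $k$.

The probability for a cell being in group $k$, given its expression level $x$, is:

$$P(k \mid x) = \frac{P(x \mid k)\,P(k)}{P(x \mid k)\,P(k) + P(x \mid \bar{k})\,P(\bar{k})}$$

In most cases, the group of interest is much smaller than the rest of the data and can generate a sampling bias. To avoid this bias of sample size, we set:

$$P(k) = P(\bar{k}) = \frac{1}{2}$$

Hence:

$$P(k \mid x) = \frac{P(x \mid k)}{P(x \mid k) + P(x \mid \bar{k})}$$

For a cell $c^k$ with expression level $x$, the accuracy of the classifier is:

$$a(c^k) = P(k \mid x)$$

The accuracy of prediction is:

$$A = \frac{1}{2} \sum_{k \in \{1, 2\}} \mathbb{E}\left[ P(k \mid x) \,\middle|\, k \right]$$

For the robustness of the calculation, we divide the expression into $m$ intervals:

$$A = \frac{1}{2} \sum_{k \in \{1, 2\}} \sum_{j=1}^{m} \frac{n_{kj}}{n_k} \cdot \frac{n_{kj} / n_k}{n_{kj} / n_k + n_{\bar{k}j} / n_{\bar{k}}}$$

where $n_{kj}$ is the number of cells of group $k$ in interval $j$, and $n_k$ is the number of cells in group $k$. For each gene, we can estimate the accuracy measure for using this gene to predict cells inside or outside the cluster and use this as a metric for ranking the marker genes.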
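To make the interval-based accuracy concrete, here is a minimal sketch for a single gene. It is a reading of the description above rather than the Venice source code: it assumes equal priors for the two groups, equal-width intervals over the observed expression range (the actual interval scheme is not specified here), and estimates each probability as the fraction of a group's cells that fall in an interval. The function name and the default number of intervals are hypothetical.

```python
def binned_accuracy(group1, group2, n_bins=10):
    """Balanced classification accuracy of one gene on equal-width intervals.

    group1 / group2 are the expression values of the cells inside / outside
    the group of interest. The result lies between 0.5 (no separation) and
    1.0 (perfect separation).
    """
    if not group1 or not group2:
        raise ValueError("both groups need at least one cell")
    lo = min(min(group1), min(group2))
    hi = max(max(group1), max(group2))
    # If all values are identical, fall back to a width of 1 (single bin).
    width = (hi - lo) / n_bins or 1.0

    def fractions(values):
        hist = [0] * n_bins
        for v in values:
            j = min(int((v - lo) / width), n_bins - 1)
            hist[j] += 1
        return [h / len(values) for h in hist]

    p1, p2 = fractions(group1), fractions(group2)
    # A = 1/2 * sum over intervals of (p1^2 + p2^2) / (p1 + p2);
    # intervals that contain no cells contribute nothing.
    return 0.5 * sum(
        (a * a + b * b) / (a + b) for a, b in zip(p1, p2) if a + b > 0
    )
```

A gene whose expression separates the two groups completely scores 1.0, while a gene with the same distribution in both groups scores 0.5, so ranking genes by this value puts the strongest markers first.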

We tested Venice on both real and simulated datasets. The benchmark considered the performance on 2 different sequencing technologies (full-length and UMI count), 4 different kinds of marker genes (including transitional genes), and 2 different kinds of null genes. Venice exhibited the best performance and accuracy in all cases. It could effectively detect different types of marker genes and avoid false-positive results while keeping a modest running time.

Venice is also incorporated in Signac, a single-cell analytics package developed by BioTuring. The package is available at https://www.github.com/bioturing/signac

6.7. Gene set enrichment analysis

This analysis is adopted from the GSEA method (Subramanian et al. 2005), a common analysis for selecting potential biological terms given a sorted list of genes. The software performs GSEA on 4 different categories of terms: biological process, molecular function, cellular component, and biological pathway. The first 3 are from the Gene Ontology (Consortium 2004), and the last one is from the Reactome database (Joshi-Tope et al. 2005).

Enrichment analysis can be found in both the Analysis dashboard and the Differential expression dashboard.

6.8. Cell-type prediction

This feature shows you the suggested cell type for a group of cells. When a user makes a selection by clicking a cluster/annotation or using the Select cell tool, the software picks genes that are expressed in at least 35% of the group. These genes are not selected from the whole transcriptome, but from a list of cell-type markers in our curated knowledge base. The software then uses that gene profile to estimate the correlation with each cell-type profile. A cut-off of 0.5 is applied to remove unlikely candidates. The remaining cell types undergo a tree search to find their common parents. Parents that have less weight (e.g. distinct from the rest) are removed. This process is repeated until only one cell type is left. The whole analysis usually takes 1-3 seconds to finish, so it is triggered automatically.

6.9. Differential expression analysis

BBrowser supports finding differentially expressed genes between two groups of cells, each of which must contain at least 3 cells. It finds differentially expressed genes using Venice, the same method used for finding marker genes. Users can switch to edgeR, a more common method that takes at least 5 times longer.

For the log2FC value of each gene, we use the same method as the Seurat package (Gribov et al. 2010). Below is the detailed formula:

Notations:


7. Adjusting data visualization

7.1. Visualization methods

Depending on the data available in your study, you can choose between several visualization methods:

By default, the main plot is calculated by gene expression. It can be t-SNE or UMAP subject to the method chosen during the pre-processing step. To check if the current visualization is t-SNE or UMAP, go to Settings > Analysis > Dimensionality reduction method > Apply and you can switch between t-SNE and UMAP.

To switch from an RNA-based plot to a protein-based plot or feature plot, go to the dropdown box next to Clonotype at the bottom of the screen.

To generate a feature plot, type in your gene/ protein of interest for the X and Y axes. Both axes must be either genes or proteins.

7.2. Interactive 2D - 3D plot

t-SNE/UMAP of gene expression can be viewed in 2D or 3D, while other plots are 2D only.

You can interact with the plot by zooming in and out, switching between 2D and 3D, moving and rotating the plot, and resetting it to the original state.

In the bottom right corner of the scatter plot, there are several buttons that control the visualization and how you define a selection.

●   Reset: this button resets the scatter plot to the original state without any selection; cells are colored by the last clustering factor/annotation used.

●   Pencil tool (lasso selection): this button activates the free selection mode.

●   Hand tool: this button activates the navigation mode: moving and rotating the plot, as well as whole-cluster selection.

●   2D / 3D: these buttons switch between the 2D and 3D scatter plots. Rotation is only enabled for the 3D plot. For a Seurat/Scanpy object that has dimensionality reduction coordinates in 2D but not in 3D, BBrowser can calculate the 3D coordinates based on the PCA results, and vice versa.

●   Zoom (plus/minus): these buttons zoom in and out. The point size of the scatter plot remains unchanged when zooming. Alternatively, you can use your mouse wheel to zoom.

●   Download: this button captures the current scatter plot and cluster labeling and exports them as an image or data.

7.3. Customize the plot

Users can customize the theme, point size, transparency and color palette of the main plot.

Options for altering the scatter plot appearance include:

7.4. Color by

The Color by tab lets you color the cells according to your preference. You decide which classification (group of clusters) to visualize, which changes how the cells are colored and how they can be filtered.

The software offers various classification methods: unbiased graph-based clustering, k-means clustering, classification by input metadata, or your own definition and annotation. You can import your annotation matrix from a file in the Color by tab.

The Color by tab is always activated.

8. Query gene or protein expression

8.1. For a single gene or protein

To see how a gene or protein is expressed in the given dataset, type the gene/protein name, its Ensembl ID, or its alias into the gene/protein query box at the top right corner of the scatter plot and press Enter.

Upon querying a gene/protein, BBrowser provides two ways to visualize its expression:

Export options:

Notice:

If the expression values are spread over a large range, you can choose to visualize only the 5th to 95th percentile of the data to eliminate outlier points.


Image exported when querying a gene expression, showing all expression values (left) or only the 5th-to-95th-percentile range of expression values (right)
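The effect of restricting the color scale to the 5th-95th percentile range can be illustrated as follows. This sketch assumes that values outside the range are clamped to its bounds before coloring, which may differ from how BBrowser treats them internally; the function name is hypothetical, and at least two values are required.

```python
from statistics import quantiles


def clamp_to_percentiles(values, lower=5, upper=95):
    """Clamp values to the [lower, upper] percentile range of the data."""
    # quantiles(..., n=100) returns 99 cut points; cuts[i - 1] is the
    # i-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower - 1], cuts[upper - 1]
    return [min(max(v, lo), hi) for v in values]
```

With this clamping, a handful of extreme cells no longer stretches the color scale, so differences among the remaining cells become easier to see.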

 

8.2. For two genes or proteins

8.3. For multiple genes or proteins

BBrowser supports viewing the expression of multiple genes or proteins, in the scatter plot or in a heatmap.

By default, the heatmap shows the Z-scores of gene expression / protein expression measurements across the clusters. You can also click on the Settings icon at the top left corner to draw the heatmap from the expression values instead.
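For readers unfamiliar with Z-scores, the sketch below standardizes one gene's mean expression across clusters. It assumes the score is computed per gene from cluster means using the population standard deviation; the exact variant used by BBrowser is not stated here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def cluster_z_scores(cluster_means):
    """Z-score one gene's mean expression across clusters.

    cluster_means maps cluster name -> mean expression of the gene.
    """
    values = list(cluster_means.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # The gene has the same mean everywhere: no cluster stands out.
        return {name: 0.0 for name in cluster_means}
    return {name: (v - mu) / sigma for name, v in cluster_means.items()}
```

A positive score means the gene is expressed above its cross-cluster average in that cluster, which makes rows of the heatmap comparable even when genes have very different expression ranges.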

8.4. Gene gallery

Gene gallery allows you to save a screenshot of your gene(s)/protein(s) of interest for reviewing later.

9. Select a cell population

For other analyses, such as adding an annotation, viewing the compositional breakdown, finding marker genes, and differential expression analysis, you first need to select the cells. Selected cells are colored white.

9.1. Hand tool & Pencil tool

The most common way to select cells is by using the Hand tool and Pencil tool.

Pencil tool (above) and Hand tool (below)

Using pencil tool to select a cell population

9.2. Color by

You can also select cells that are already clustered/ annotated from the Color by tab

To select cells in one cluster:

9.3. Select cells by gene expression

You can select cells that share the same expression level of a given gene:

9.4 Select cells based on multiple conditions (Advanced filter)

From BBrowser 2.4.36, you can filter cells by multiple conditions using the Advanced Filter tab, which combines two methods: filter by expression and filter by metadata. BBrowser will display the groups of cells that match your filter.

For this example, we will show how to select CD8+ T cells from the Responder group of Sade-Feldman et al 2018 (combining 3 conditions: CD3D positive, CD8 positive, belonging to Responders).

If you want to filter cells from an annotation group, select Filter by metadata.

If you want to add a gene expression cut-off, select Filter by expression.
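Conceptually, the combined filter keeps only the cells that satisfy every condition. The sketch below mirrors the example above under several assumptions: "positive" means an expression value above a cut-off of 0, the conditions are combined with a logical AND, and the gene symbol CD8A, the metadata field name "response", and the record layout are hypothetical stand-ins.

```python
def advanced_filter(cells, expression_rules, metadata_rules):
    """Return the barcodes of cells that satisfy every rule.

    cells: list of dicts with 'barcode', 'expression' (gene -> value)
           and 'metadata' (field -> label).
    expression_rules: gene -> cut-off; a cell passes if value > cut-off.
    metadata_rules: field -> required label.
    """
    selected = []
    for cell in cells:
        expr_ok = all(
            cell["expression"].get(gene, 0) > cutoff
            for gene, cutoff in expression_rules.items()
        )
        meta_ok = all(
            cell["metadata"].get(field) == label
            for field, label in metadata_rules.items()
        )
        if expr_ok and meta_ok:
            selected.append(cell["barcode"])
    return selected
```

For the CD8+ T cell example, the call would look like advanced_filter(cells, {"CD3D": 0, "CD8A": 0}, {"response": "Responder"}).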

10. Cell type prediction tool

BBrowser cell-type prediction tool takes a list of marker genes defined by the users as the reference and evaluates the expression of all those marker genes in the selected population to predict the cell-type. Whenever a cell population is selected, the process will automatically be done. The cell type prediction result will appear in the infobox on the top left corner of the scatter plot. It includes the cell type name and the marker genes’ information.

By default, cell type prediction is applied only to data with fewer than 50,000 cells due to the long processing time needed for a large dataset. You can enable the function for large data by increasing the cell number limit in Settings > Analysis > Cell-type prediction limit.

11. Cell search (beta version)

(Updating....)

12. Find marker genes and enriched processes

Finding marker genes and enriched processes in a group of cells shows you the genes and processes that are differentially expressed in the selected group compared to the rest of the cell population. This information is essential for determining which cell type the cluster belongs to. To run the analysis:

Details on the marker genes and enrichment analysis include:

Marker genes:

Enrichment analysis:



13. Add an annotation

13.1. Add an annotation

You can add multiple annotations to a cell, regarding cell type, subtype, expression level of a gene or set of genes or clonotype, etc. There are 2 ways to add an annotation:

For each annotation, you need to enter a Group name, which is the name of the classification (cell type, sub-type, T cell sub-type …), and a cluster name, which is the name of the cluster (macrophage, microglia, COL1A4+ fibroblast, …).

To import an annotation matrix by a file:

To manually annotate each cluster:

Fill in Group name and cluster name to create a new group and cluster or choose an existing group and cluster to add the selected cells to that cluster.

Click OK to implement.

 

13.2. Edit an annotation

After an annotation is added, you can edit it by changing its name, merging 2 clusters, or deleting a cluster or the whole group.

14. Study cellular composition

BBrowser supports cellular composition analysis for any group of cells, whether annotated or not. You define the group of cells whose composition you want to view and the type of classification. The software then calculates the percentage of each cluster from the chosen classification within the selected cells and sorts the clusters from the largest to the smallest share.

For standard function

For Normalized composition

We recommend the Normalized by total tool for reducing the bias caused by unbalanced sample sizes.

See below for an example. The total number of cells from disease is double that from non-disease. Therefore, if you look at the percentages of disease and non-disease cells in Macrophage, the disease percentage is likely to dominate simply because of the larger sample, giving a biased proportion of macrophage cells between the two groups. After using the Normalized by total tool, the result shows the cell composition without the bias coming from sample size.

The normalization formula is as follows:

Total number of cells from group 1 (Disease): A

Total number of cells from group 2 (Non-disease): B

Number of cells from group 1 (Disease) in the selected population (Macrophage): a

Number of cells from group 2 (Non-disease) in the selected population (Macrophage): b

Normalized percentage of group 1 in the selected population:

(a/A)/(a/A+b/B)
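The formula can be checked with a small worked example. The sketch below computes both the standard and the normalized composition; the function names are hypothetical, and applying the same formula to more than two groups is an assumption rather than something stated above.

```python
def composition(selected_counts):
    """Standard composition: share of each group among the selected cells."""
    total = sum(selected_counts.values())
    if total == 0:
        raise ValueError("no cells selected")
    return {group: n / total for group, n in selected_counts.items()}


def normalized_composition(selected_counts, group_totals):
    """Composition normalized by the total size of each group.

    For two groups this is (a/A) / (a/A + b/B).
    """
    fractions = {
        group: selected_counts[group] / group_totals[group]
        for group in selected_counts
    }
    denominator = sum(fractions.values())
    if denominator == 0:
        raise ValueError("no cells selected")
    return {group: f / denominator for group, f in fractions.items()}
```

With A = 2000 disease cells and B = 1000 non-disease cells in total, and a = 200 and b = 100 of them in Macrophage, the standard composition reports about 66.7% disease, whereas the normalized composition reports 50%, because both groups contribute the same fraction (10%) of their cells.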

15. Differential expression (DE) analysis

Performing differential expression analysis on any given two clusters will help you to find out the genes that cause differences between 2 clusters and processes associated with them.

15.1. Running DE in the Composition panel

You can run DE analysis on 2 clusters in the same annotation in the Composition panel.

15.2. Running DE in Differential Expression panel

You can run DE analysis on any 2 selected groups of cells.

BBrowser offers 6 methods to run DE analysis: our in-house algorithm Venice and 5 differential expression analysis algorithms from the Seurat package (Wilcoxon, Likelihood-ratio test, T-test, Poisson, and Logistic regression).

To choose a method, go to Settings > Analysis > Differential expression analysis.

15.3. The DE analysis dashboard

After you run DE analysis on two clusters of interest, the software opens the DE dashboard, which shows the differentially expressed genes in a volcano plot of all genes, a box plot of the expression of a single gene, a table of genes and enriched processes, and a scatter plot of the cells in the two clusters.

If you click on a gene in the volcano plot or the table, the scatter plot will show the selected gene’s expression. You can also query the expression of a specific gene by entering the gene name in the top right box.
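For readers unfamiliar with volcano plots, the sketch below shows how the coordinates of one gene are conventionally derived: log2 fold change on the x-axis and -log10 of the p-value on the y-axis. This is the common convention only; the exact axes, thresholds, and statistics used in the BBrowser dashboard are not described here, and the gene names, values, pseudocount, and function name are all made up for illustration.

```python
import math

def volcano_point(mean_1, mean_2, p_value, pseudocount=1e-9):
    """Return (log2 fold change, -log10 p-value) for one gene.

    The pseudocount is an assumption of this sketch; it avoids division by zero
    and log(0) when a gene is not expressed in one of the clusters.
    """
    log2_fc = math.log2((mean_1 + pseudocount) / (mean_2 + pseudocount))
    neg_log10_p = -math.log10(max(p_value, 1e-300))  # clamp p-values reported as 0
    return log2_fc, neg_log10_p

# Hypothetical per-gene results: (mean in cluster 1, mean in cluster 2, p-value)
results = {
    "GENE_A": (8.0, 2.0, 1e-6),
    "GENE_B": (1.0, 1.0, 0.5),
}
for gene, (m1, m2, p) in results.items():
    x, y = volcano_point(m1, m2, p)
    print(f"{gene}: log2FC={x:.2f}, -log10(p)={y:.2f}")
```

Genes far from zero on the x-axis and high on the y-axis (such as the made-up GENE_A) are the ones that stand out in such a plot.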

15.4. Save and view previous DE analysis results

DE analysis results are saved automatically right after each run, so you do not have to perform the analysis again in the future. To review a DE analysis result, click on the Differential Expression panel > View previous results.

You can edit the name of an analysis or delete it by clicking at the top right corner of the analysis, then clicking Save/Confirm to save the change.

16. Sub-clustering

Sub-clustering is an advanced feature that takes out a group of cells and treats them as a new set of data. The software will calculate new principal components and dimensionality reduction results to plot the selected cells in a new scatter plot. The cells will also be re-clustered using the Louvain and k-means clustering methods.

Focusing on a subset of the data with fewer cells than the original helps you identify more principal components, including components that are significant only to this group of cells. You can therefore further divide the cells into smaller clusters with distinct expression profiles. This feature is suitable for analyzing clusters with large heterogeneity.

16.1. Run sub-clustering

To run sub-clustering, first select a group of cells (refer to section 9) and click on the Sub-clustering icon. Name the sub-cluster as you like and click on Apply.

Re-calculation for the sub-cluster usually takes a few minutes. After that, the Sub-clustering dashboard opens automatically.

The Sub-clustering dashboard is similar to the Analysis dashboard and can be used to query gene expression, find marker genes and enriched processes, study cellular composition, etc., but not to run differential expression analysis. A mini map at the bottom left of the dashboard shows the main scatter plot with all cells of the sub-cluster highlighted in white.

To go back to the main Analysis dashboard, click on the name of the sub-cluster at the top left corner and choose Main cluster from the drop-down.

16.2. Annotation of sub-clusters

Adding an annotation in the Sub-clustering dashboard works the same way as in the Analysis dashboard.

First, select a group of cells, then click on Create an annotation and define the Group name and Cluster name.

An annotation created in a Sub-clustering dashboard is treated the same as one created in the main dashboard. Hence, you can view your sub-clusters in the main scatter plot, or annotate sub-clusters in different Sub-clustering dashboards under the same group name.

17. Study clonotype

Sequencing the T-cell receptor (TCR) is a powerful way to dissect the complexity and diversity of the T cell response repertoire. By associating the TCR with gene expression, BBrowser can provide an unbiased classification of a population of interest and link the transcriptional landscape of each cell with its TCR.

17.1. Getting started

In BBrowser, clicking the Clonotype button at the bottom of the main scatter plot opens the Clonotype dashboard. All cells in the main scatter plot turn gray and the spot size is decreased. A mini map pops up showing the previous coloring of the scatter plot.

Now, you can add TCR sequencing data by clicking on Upload.

If your data comes from multiple batches, the TCR sequencing data should be submitted for each batch individually. In that case, clicking the Upload data button shows a pop-up where you select the input file for each batch.

Cells with a recognized TCR sequence are now colored according to their clonotype, and the spot size returns to normal. The cells are highlighted and enlarged when you hover the mouse over the clonotype name. Details on the number of cells in each clonotype and relevant antigen information are displayed in a table.

On the left side of the dashboard, you can change the clonotype data, perform clonotype counting, and create an annotation for cells with a TCR sequence. Once clonotypes are converted to an annotation, you can run any analysis on different clonotypes, including marker gene detection, enrichment analysis, composition, and differential expression analysis.

17.2. Accepted data format

TCR sequencing results can be imported as a TSV or CSV file.

The input matrix must contain enough information for a typical V(D)J annotation. BBrowser only reads data from columns whose names appear in the list below. Columns that are not in this list are ignored.

The software only chooses clonotypes that are both full_length and productive. The CDR3 amino acid sequences are mapped against VDJdb (Shugay et al. 2017) to retrieve information on relevant epitopes.
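To preview which rows of a file would pass this filter before importing it, a minimal Python sketch such as the one below can be used. It assumes that full_length and productive are columns holding text flags such as "True" or "False"; the other column names (barcode, cdr3), the placeholder values, and the helper names are assumptions made for illustration, and the accepted column list of BBrowser remains the authoritative reference.

```python
import csv
import io

def is_true(value):
    """Interpret a text flag such as "True" or "true" (format assumed for this sketch)."""
    return str(value).strip().lower() == "true"

def keep_usable_contigs(rows):
    """Keep only rows flagged as both full_length and productive."""
    return [
        row for row in rows
        if is_true(row.get("full_length")) and is_true(row.get("productive"))
    ]

# Tiny made-up CSV with placeholder values. For a real file, use
# open(path, newline="") and pass delimiter="\t" to csv.DictReader for a TSV.
example = io.StringIO(
    "barcode,full_length,productive,cdr3\n"
    "cell_1,True,True,CDR3_PLACEHOLDER_1\n"
    "cell_2,True,False,CDR3_PLACEHOLDER_2\n"
    "cell_3,False,True,CDR3_PLACEHOLDER_3\n"
)
kept = keep_usable_contigs(csv.DictReader(example))
print([row["barcode"] for row in kept])
```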

17.3. Clonotype counting

There are two ways to perform clonotype counting:

18. Export Image and Data

Table of export formats

18.1. Export figures

BBrowser supports exporting graphs into .PNG or .SVG formats.

Export violin plot as SVG file from Search genes tab

Export box plots, violin plots and density as SVG files from Gene query expression panel

Export box plots and violin plot as SVG file from DE analysis dashboard

Export main scatter plots as PNG file

Export heatmap as PNG file when querying multiple genes

Export composition plot as PNG file

Export scatter plots of two groups as a PNG file from DE analysis dashboard

Export volcano plot showing DE genes as PNG file

18.2. Export tables and data

18.3. Export HTML report

18.4. Export figures for customization in BioVinci

BBrowser also supports exporting figures to BioVinci through the Export to BioVinci button to enable more flexible plot editing. (See the Table of export formats for more details.)

In BioVinci, you can:

19. Frequently asked questions

 

My computer has 8GB RAM, can I process large data?

We recommend using a computer with 16GB RAM for data with more than 100,000 cells or data processed from FASTQ files. However, on a computer with 8GB RAM, you can still open large Seurat objects if they are fully processed with PCA and dimensionality reduction results (tested with an object of 300,000 cells). If you want to submit count matrices, 8GB RAM can smoothly process data of 30,000 cells.

I got the message “Cannot connect to server”. What can I do?

If you are using a server with a proxy, this message might appear when you try to log in to the software, because the connection through the proxy to the BioTuring server cannot be established to verify your credentials. Please click on Proxy settings at the bottom of the login screen and configure your server.

What file formats can I import to BBrowser?

You can import FASTQ, MTX, TSV, CSV, .H5, .H5AD, and .RDS files to BBrowser.

For details about the structure of each file, please refer to section 4.

Does BBrowser support importing a dataset downloaded from GEO?

It depends on the format and structure of the file.

If the file fulfills all the requirements of the software, you can import it to BBrowser.

Otherwise, if the author of the study is willing to share their annotations, the BioTuring team would be happy to consider hosting the data on our platform and will index the data based on our standard process.

Why is the scatter plot in BBrowser different from the plot in the publication?

Since we cannot obtain all the parameters of the data processing steps from the authors, for some steps, our default parameters may be different from those of the authors.

How can I combine multiple datasets?

To combine multiple datasets, first, you need to make sure they are in the same format (MTX, TSV or CSV).

After that, open BBrowser > Data > Add new study to import all files, select your method for batch correction and name the study, then click Start to run the processing.

How can I generate an image for publication?

BBrowser supports exporting multiple graphs: scatter plots, box plots, violin plots, etc. in either SVG or PNG format with a fixed design and layout.

If you want to customize the color of the graph, go to Settings > Visualization and change the color scale there.

An alternative is to export the data of the graph to a TSV file and reconstruct the graph with your preferred tools outside BBrowser. The BioTuring team also offers a drag-and-drop data visualization tool called BioVinci.

How can I compare a gene’s expression in different groups?

To compare gene expression across different clusters, first choose the annotation with the clusters you are interested in. Then, type the gene name or Ensembl ID in the gene query box and press Enter to query the gene expression. Click on the arrow at the bottom of the color scale to expand the box, and click on the Plot button to generate a box plot of gene expression across the clusters.


Azizi, E., Carr, A. J., Plitas, G., Cornish, A. E., Konopacki, C., Prabhakaran, S., ... & Choi, K. (2018). Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell, 174(5), 1293-1308.

Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. "Integrating single-cell transcriptomic data across different conditions, technologies, and species." Nature biotechnology 36, no. 5 (2018): 411.

Consortium, Gene Ontology. 2004. “The Gene Ontology (GO) Database and Informatics Resource.” Nucleic acids research 32(suppl_1): D258–D261.

Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal, Complex Systems 1695(5): 1–9.

Gribov, Alexander et al. 2010. “SEURAT: Visual Analytics for the Integrated Analysis of Microarray Data.” BMC medical genomics 3(1): 21.

Haghverdi, Laleh, Aaron T L Lun, Michael D Morgan, and John C Marioni. 2018. “Batch Effects in Single-Cell RNA-Sequencing Data Are Corrected by Matching Mutual Nearest Neighbors.” Nature biotechnology 36(5): 421.

Joshi-Tope, G et al. 2005. “Reactome: A Knowledgebase of Biological Pathways.” Nucleic acids research 33(suppl_1): D428–D432.

Korsunsky, Ilya, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-Ru Loh, and Soumya Raychaudhuri. "Fast, sensitive, and flexible integration of single cell data with Harmony." BioRxiv (2018): 461954.

Korthauer, K. D., Chu, L. F., Newton, M. A., Li, Y., Thomson, J., Stewart, R., & Kendziorski, C. (2016). A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome biology, 17(1), 222.

Krijthe, J H. 2015. “Rtsne: T-Distributed Stochastic Neighbor Embedding Using Barnes-Hut Implementation.” R package version 0.13, URL https://github.com/jkrijthe/Rtsne.

Love, Michael I, Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome biology 15(12): 550.

Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using T-SNE.” Journal of machine learning research 9(Nov): 2579–2605.

McInnes, Leland, and John Healy. 2018. “Umap: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv preprint arXiv:1802.03426.

Melville, James. 2018. “Uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction.” https://github.com/jlmelville/uwot.

Robinson, Mark D, Davis J McCarthy, and Gordon K Smyth. 2010. “EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26(1): 139–40.

Shugay, M., Bagaev, D. V., Zvyagin, I. V., Vroomans, R. M., Crawford, J. C., Dolton, G., ... & Eliseev, A. V. (2017). VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic acids research, 46(D1), D419-D427.

Subramanian, Aravind et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102(43): 15545–50.

Tran, Thang, Thao Truong, Hy Vuong, and Son Pham. 2019. "Hera-T: An Efficient And Accurate Approach For Quantifying Gene Abundances From 10X-Chromium Data With High Rates Of Non-Exonic Reads." doi:10.1101/530501.

Wang, T., Li, B., Nelson, C. E., & Nabavi, S. (2019). Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC bioinformatics, 20(1), 40.