BioTuring Browser, or BBrowser, is a desktop application that performs analyses on sequencing data. The software is also connected to a database hosting sequencing data from the latest publications. Users can use BBrowser to analyze their own data or analyze the public data available.
The software allows scientists, even ones without programming experience, to quickly investigate massive amounts of sequencing data from in-house and published work and compare them together. All data submitted by users and data downloaded from the BBrowser database is stored and secured on the local computer.
The application was first released in October of 2018, running on Windows, macOS, and Ubuntu.
To install BBrowser on macOS, after downloading the installation package:
There are 2 options for running BBrowser on Windows: portable version or installed version.
The portable version does not require any installation and is only available for Windows. Users who do not have the necessary privileges to modify the system registry can download this version and use it immediately. There is no difference between the portable version and the installed version of BBrowser in terms of interface and functionality.
The portable version is provided in a zipped file. After downloading, you need to unzip the file and double-click on it to run the software.
Alternatively, to launch BBrowser, users can run the executable binary, BBrowser2.exe, located in a folder called BBrowser2-win32-x64.
If you want to move the software storage, you must move the entire BBrowser2-win32-x64 folder. Modifying or removing any files in this folder may cause BBrowser to stop working. In either case, your single-cell data is not affected as it is stored in a different location.
If you want to install BBrowser to your computer, download the installer .exe file and run it with administrator permission. An installation window will guide you through the installation.
Although the program can be installed anywhere on your computer, we highly recommend putting it in the usual Program Files folder. However, this requires the administrator's permission.
If your computer has more than one account using the software, each account can only access its own data.
To install BBrowser on CentOS 7, you first need to install some dependencies:
yum install libgfortran libXScrnSaver
Then, use the following command to install BioTuring Browser:
rpm -iU BBrowser-xxx.x86_64.rpm
Please replace “xxx” with the version that you downloaded. Installation of the software and its dependencies may require root access. After installation, BBrowser can be found in Applications > Accessories.
The BBrowser Login page appears when you open the software for the first time. Once the software has successfully recorded your BioTuring account, it will log in automatically the next time you start BBrowser.
Please enter your credentials and declare your academic or non-academic status to access the corresponding set of features, then press Enter or click Login.
Login credentials are encrypted and stored separately for each user if multiple users share the same computer.
If you are using a network with a proxy, please configure Proxy settings at this point, before any connection to the BioTuring server is made. The software needs to have correct proxy settings in order to connect to our server and verify your credentials, as well as to get access to our public database.
BBrowser Home page shows you all data that you can download to the local computer, including public data from BioTuring server and data from your remote repositories.
BBrowser Data page shows you all data that you have downloaded or submitted to the software. You can refer to this page as your local database. Also, you can submit a new dataset here.
BBrowser Settings page helps you to:
When you click on a study on the Data page or click on Explore a study from the Home page, you will enter the Analysis dashboard.
Here you can visualize the data and perform all the analyses.
The main visualization is a scatter plot of dimensionality reduction (t-SNE or UMAP), with each point representing a single cell. Cell color, size, and shape change when you run different analyses. The scatter plot is interactive, allowing you to zoom, move, rotate (in 3D mode), or select cells.
Inside the main visualization window are some function boxes:
On the right of the main visualization window are main function tabs
There are 4 tabs here, each in a small window that can be expanded or closed. These tabs either give you more insight into the data or provide additional visualizations. They are:
At the bottom of the function tabs is information about study input/output and the visualization and analysis settings.
The other 2 interfaces, the Sub-clustering dashboard and the Differential expression dashboard, are described in their own sections.
If you need help during an analysis, press Alt (on Windows) or move your mouse to the top left of the screen (on macOS) and click Help to view our tutorials or to contact us.
Massive amounts of single-cell RNA sequencing data generated have opened avenues for exploration, yet also brought up new challenges to standardize data formats, systematically access transcription profiles of cell types across studies and integrate multiple datasets.
Hence, in BioTuring Browser, we have indexed published single-cell RNA sequencing data from multiple formats to our platform to remove that barrier. All data are processed and annotated to be instantly accessed and explored in a uniform visualization and analytics interface.
In addition, we have developed our own set of marker genes for over 200 cell types. We use that gene list to verify the authors’ annotations and re-label the cell types according to the BioTuring cell ontology, so that cell types are systematized across our database.
Users can also query the expression of a single gene or multiple genes across all datasets in the database and see how the genes are expressed in different clusters without downloading any dataset.
The section below explains how we index published data and how the gene query across the database works.
Step 1. Data collection
Single-cell gene expression matrices or Seurat/Scanpy objects are obtained from the authors or from public repositories. If Seurat or Scanpy objects are available, we preserve the analysis results and move directly to the annotation step (Step 6).
Step 2. Filtering and normalization
Cells and genes from the submitted matrices are filtered to avoid drop-out, doublets, and apoptotic cells. Data are then subjected to log normalization and highly variable genes selection. QC criteria are subject to the authors’ descriptions.
If details of the filtering and normalization are not available, we process the data ourselves to obtain results as close as possible to those in the publication.
Step 3. Batch effect correction
We follow the methods used in each study. If not provided, we will apply CCA correction.
Step 4. Dimensionality reduction
We use the first 30 components of PCA to calculate 2D and 3D t-SNE or UMAP, with parameters taken from the authors’ descriptions.
Step 5. Clustering
The dataset will go through both graph-based clustering by the igraph package (Csardi and Nepusz, 2006) and k-means clustering (Neter et al., 1998).
Step 6. Annotation and standardization of cell type labels
Cell type annotation matrices are obtained from authors and loaded in BioTuring Browser, together with metadata of the experimental design. We then manually verify cell type annotations using known markers and unify the terminology based on our internal cell ontology.
If annotation and metadata are not available, we will extract information directly from the publications.
Users of BBrowser can view all studies in the public database on the Home page of the software. You can also access the list of available studies on the BioTuring website: https://bioturing.com/bbrowser/datasets
We select the studies to index based on the needs of our users and community.
If you have a study of interest and would like it to be indexed by the BioTuring team, please contact us at firstname.lastname@example.org
If you are an author, we are very happy to distribute your data on BBrowser for public access. Please contact us at email@example.com
Since version 2.1.3, BBrowser has included a dedicated search engine that lets you look at the expression of one or multiple genes across every public dataset. Without downloading anything from the server, the gene search engine lets you skim through a large amount of information efficiently.
You can find the gene search engine in Home page > Search genes tab
The search results are sorted in descending order by the number of cells that express the gene. The study information and the Download option are the same as in the Search studies tab. Please refer to Section 5.1_Search and download a public study for how to search studies.
You can get a dataset on BBrowser by downloading it from BioTuring server or from your internal server or by importing the data from your local computer.
Currently, BBrowser supports analyzing data from human (Homo sapiens) and mouse (Mus musculus). If you input data from any other species, the software can still process the data (except for the transcript quantification step). However, some features that rely on gene information will be disabled, such as gene-set enrichment analysis and gene functional reminder.
BioTuring Browser hosts a public database of published studies that are selected, processed, verified and uniformly labeled by the BioTuring team. You can view the list of studies in this database in the BBrowser Home page.
To download a study from BBrowser public database, you need to be connected to the internet and follow these steps.
If you want to search for studies by general terms such as title, authors, and other keywords:
This type of searching allows looking for studies that express your gene(s) of interest. Go to Home page > Search genes.
If you search one gene, you will get a violin plot of that gene’s expression. On the plot:
All violin plots are interactive. You can hover your mouse over the plot to get the statistics (e.g. quantiles, median, mean), or drag to enlarge an area of the plot. Double-clicking any part of the plot resets it to the original view.
On the top right of each dataset, a horizontal bar shows the total number of cells that express the gene. The search results are sorted in descending order by this number.
BBrowser will show all studies that express the gene of interest. You can scroll down to find the study you want to download or click the camera button to export the plot.
If you search for multiple genes, the results will be a series of heatmaps, each of which is from one dataset. Each heatmap shows:
Download the data
To import a single-cell RNA sequencing study with raw data, you need to provide a folder containing all your FASTQ files.
BBrowser supports importing expression matrices as MTX, TSV, and CSV files with integer counts.
The expression matrix files can be uncompressed or compressed with gzip.
To import a study by single or multiple MTX files, you need to provide a folder with exactly 3 files:
When multiple folders containing data from multiple batches are submitted, options for selecting batch correction methods will be available.
If multiple folders were submitted, the Analysis dashboard will contain an input metadata classification whose cluster names are the names of the input folders. This helps you visualize the cells (by color and shape) according to the batch they come from.
The three files barcodes.tsv, features.tsv (or genes.tsv), and matrix.mtx are the standard files from 10X CellRanger. Below, we describe some more details of the data format that will affect the analysis.
To import a study by single or multiple CSV/ TSV files:
A .tsv or .csv file is simply a table in which values are separated by a delimiter: a tab (in .tsv) or a comma (in .csv). Table editors such as Excel, LibreOffice, or Google Sheets can export a table to either .csv or .tsv format.
BBrowser requires a strict format in order to parse the information correctly. Please make sure that the first column of the table has the gene names / Ensembl identifiers, and the first row of the table has the barcodes.
For users who want to export a matrix using R, please be careful: writing a matrix in R may drop the first cell of the first row. For example, given a matrix object with 1000 rows and 500 columns:
 num [1:1000, 1:500] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:1000] "ENSG1111111111" "ENSG1111111112" "ENSG1111111113" "ENSG1111111114" ...
  ..$ : chr [1:500] "CTGGTCCGGTGTTATCAG" "TTACTGGGACGACTCGGG" "ACGAGGAGACCCGAGATA" "CTTTGCAGTAGGGGCAAC" ...
Writing a .csv file in this way will lose the first cell. The first row of the file will contain only 500 values, while the other rows will contain 501:
write.table(m, 'matrix.csv', sep=",", col.names=T, row.names=T)
Please use this command instead. It is much easier:
For a .tsv file, the best way is to use the common write.table, then manually insert one tab at the beginning of the first row.
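For users who prepare the table in Python, the layout described above can be sketched as follows. This is a minimal illustration using only the standard library; the function name `write_expression_csv` and the sample genes, barcodes, and counts are hypothetical. The point it shows is that the header row starts with an empty cell, so that every row has the same number of fields.

```python
import csv
import io


def write_expression_csv(handle, genes, barcodes, rows):
    """Write a gene-by-cell count table: genes in the first column,
    barcodes in the first row."""
    writer = csv.writer(handle, lineterminator="\n")
    # Leave the first header cell empty so the header has as many
    # fields as each data row (gene name + one count per barcode).
    writer.writerow([""] + list(barcodes))
    for gene, counts in zip(genes, rows):
        writer.writerow([gene] + list(counts))


# Hypothetical toy data: 2 genes x 3 cells, integer counts.
genes = ["ENSG1111111111", "ENSG1111111112"]
barcodes = ["CTGGTCCGGTGTTATCAG", "TTACTGGGACGACTCGGG", "ACGAGGAGACCCGAGATA"]
counts = [[0, 3, 1], [2, 0, 0]]

buffer = io.StringIO()
write_expression_csv(buffer, genes, barcodes, counts)
lines = buffer.getvalue().splitlines()
```

To write a .tsv file instead, the same sketch works with `csv.writer(handle, delimiter="\t", lineterminator="\n")`.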
BBrowser supports importing processed scRNA-seq and CITE-seq data by Seurat (.rds) and Scanpy objects (.h5ad/ h5) with integer counts (raw counts or rounded counts).
Quality control parameters and the dimensionality reduction method are not needed because these steps have been done on the Seurat/ scanpy object.
A Seurat or scanpy object must contain an expression matrix with information on barcodes and genes. BBrowser can also adopt some analysis results in the object. These results include, but are not limited to:
Upon receiving the Seurat or Scanpy object, BBrowser will read all available data and run analyses to obtain the missing information.
BBrowser is able to read a Seurat object stored in .rds format. To create a .rds file from Seurat, you can use the saveRDS function in R. We will not go into detail about the structure since the software does not require any specific modification of the original Seurat structure. The most critical information in each object is the count matrix, which should be stored in @assays$RNA@counts for gene expression data and @assays$ADT@counts for antibody capture data.
For users who analyze with Python via the scanpy library, the final AnnData class should be stored in .h5/.h5ad format using the .write function within the class itself. Unfortunately, hdf5 is too general and there are many variations of the structure in which the information is recorded. BBrowser expects the following structure:
We are fully aware that different datasets are generated under different experimental designs and may need to be treated individually in order to represent all biological variation in the samples and, for public studies, to reproduce the published results as faithfully as possible. The long-term plan for BioTuring Browser is therefore to maintain speed and ease of use while enhancing the flexibility of the analyses. All public datasets and imported data undergo the same pipeline, whose individual steps are discussed in this section.
Transcript quantification is only applied when you create a new study with raw sequencing files (FASTQ). The process is run by Hera-T (version 1.2.0) (Tran et al. 2018), a new algorithm developed by BioTuring team. This is applied to data generated by 10X protocol on Chromium v2 and v3. The processing speed is up to 10 – 100 times faster than CellRanger 3.0 with better accuracy (Tran et al. 2018). The output of transcript quantification is an expression matrix in MTX file format and the file will be submitted for further processing steps below.
The process from quality control to dimensionality reduction is applied to public and in-house datasets imported in MTX, TSV or CSV files.
Quality control filters out poor-quality cells in terms of gene expression and redundant non-expressed genes in the data.
In public datasets without a detailed processing script from the author, genes having at least 1 UMI count in fewer than 3 cells are excluded. Then, cells with fewer than 200 genes having at least 1 UMI count and more than 5% mitochondrial genes are excluded. The process creates a new expression matrix that may have fewer cells than the original data, and BBrowser uses only the cells and genes of this filtered matrix in the next processing steps.
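The default thresholds above can be illustrated with a short sketch. It is not BBrowser's implementation: the function name `filter_matrix` is hypothetical, the mitochondrial percentage is assumed to be the share of a cell's UMI counts that come from mitochondrial genes, and a cell is assumed to be removed when it fails either criterion.

```python
def filter_matrix(counts, is_mito, min_cells=3, min_genes=200, max_mito_pct=5.0):
    """Return (kept_gene_indices, kept_cell_indices).

    counts:  list of gene rows, each a list of UMI counts per cell.
    is_mito: one boolean per gene row (True for mitochondrial genes).
    """
    n_cells = len(counts[0]) if counts else 0
    # Keep genes detected (>= 1 UMI) in at least `min_cells` cells.
    kept_genes = [
        g for g, row in enumerate(counts)
        if sum(1 for c in row if c >= 1) >= min_cells
    ]
    kept_cells = []
    for j in range(n_cells):
        detected = sum(1 for g in kept_genes if counts[g][j] >= 1)
        total = sum(counts[g][j] for g in kept_genes)
        mito = sum(counts[g][j] for g in kept_genes if is_mito[g])
        mito_pct = 100.0 * mito / total if total else 0.0
        # Assumption: a cell must pass both checks to be kept.
        if detected >= min_genes and mito_pct <= max_mito_pct:
            kept_cells.append(j)
    return kept_genes, kept_cells


# Toy matrix (4 genes x 4 cells) with relaxed thresholds so the
# effect is visible on a tiny example.
toy_counts = [
    [1, 2, 0, 1],   # detected in 3 cells
    [0, 1, 0, 0],   # detected in 1 cell, so it is dropped
    [5, 1, 1, 9],   # detected in 4 cells
    [1, 0, 1, 30],  # mitochondrial gene, detected in 3 cells
]
toy_is_mito = [False, False, False, True]
toy_genes, toy_cells = filter_matrix(
    toy_counts, toy_is_mito, min_cells=3, min_genes=2, max_mito_pct=20.0
)
```

In the toy example, the last two cells are removed because half or more of their counts come from the mitochondrial gene.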
For in-house data, BBrowser allows users to define the cut-off for quality control or to skip any filtering steps. In the data import pop-up, you can
This process is applied when multiple MTX, TSV or CSV files are submitted, usually from multiple batches of sample preparation and sequencing. The software treats each file as a batch and scales all batches with the chosen method.
Currently, we provide 3 methods to remove batch effects for your preference:
On BBrowser, you can choose to run dimensionality reduction by t-SNE or UMAP.
t-SNE (Maaten and Hinton 2008):
The analysis is done by the Rtsne package (Krijthe 2015). The default perplexity for t-SNE is set at 30.
UMAP (McInnes and Healy 2018):
The analysis is done by the uwot package (Melville 2018). The number of neighbors is set at 30.
This analysis runs on the PCA results. For every dataset, the software will calculate both Louvain (graph-based) and k-means clustering.
BBrowser uses a non-parametric approach, called Venice, to detect marker genes. It is an open-source algorithm that runs efficiently on large amounts of data while outperforming other methods in accuracy (Hy et al. 2019).
We first defined marker genes of a group of cells in a data set as the genes that can be used to distinguish such cells from the rest. From this idea, we used the accuracy of classification as a metric to score the significance of a marker gene.
Considering each gene separately, we denote a cell as $(x, y)$, where $x$ is the expression level of the gene in the cell and $y$ is the label of a group of cells: $y = 1$ if the cell is in the group of interest (group 1, the group that we want to find the marker genes for), and $y = 2$ if the cell is not in the group of interest (group 2, the rest of the data). We denote $\bar{y}$ as the complement group of $y$.
The probability for a cell being in group , given its expression level is:
In most of the cases, the group of interest is much smaller than the rest of the data and can generate a sampling bias. To avoid this bias of sample size, we set:
Accuracy of the classifier is:
The accuracy of prediction is:
Intuitively, For the robustness of the calculation, we divide the expression into intervals:
Where is the number of cells of group in group , and is the number of cells in group . For each gene, we can estimate the accuracy measure for using this gene to predict cells inside or outside the cluster and use this as a metric for ranking the marker genes.
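As a rough illustration of the idea of scoring a gene by classification accuracy, the sketch below computes the accuracy of a simple interval-based classifier with equal priors for the two groups. It is not the Venice implementation: the function name `balanced_interval_accuracy`, the way intervals are supplied (`edges`), and the sample values are all assumptions made for this example.

```python
def balanced_interval_accuracy(expr_in, expr_out, edges):
    """Score one gene.

    expr_in:  expression values of cells inside the group of interest.
    expr_out: expression values of all other cells.
    edges:    sorted cut points that divide expression into intervals.
    """
    if not expr_in or not expr_out:
        raise ValueError("both groups must contain at least one cell")

    def histogram(values):
        bins = [0] * (len(edges) + 1)
        for v in values:
            # Interval index = number of cut points that v has reached.
            bins[sum(1 for e in edges if v >= e)] += 1
        return bins

    h_in, h_out = histogram(expr_in), histogram(expr_out)
    n_in, n_out = len(expr_in), len(expr_out)
    # With equal priors (0.5 each), the best guess in each interval is
    # the group with the larger within-group frequency, so the accuracy
    # is half the sum of those larger frequencies.
    return 0.5 * sum(max(a / n_in, b / n_out) for a, b in zip(h_in, h_out))


# A gene that separates the group well scores close to 1.
marker_score = balanced_interval_accuracy(
    [5, 6, 7, 8], [0, 0, 1, 0, 6, 0, 0, 0], edges=[1, 4]
)
# A gene with identical distributions in both groups scores 0.5.
null_score = balanced_interval_accuracy(
    [0, 1, 5, 6], [0, 1, 5, 6, 0, 1, 5, 6], edges=[1, 4]
)
```

Because both groups receive the same prior weight, the score does not depend on how many cells each group contains, which is the sampling-bias concern raised above. Scores range from 0.5 (no separation) to 1 (perfect separation).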
We tested Venice on both real and simulated datasets. The benchmark considered the performance on 2 different sequencing technologies (full-length and UMI count), 4 different kinds of marker genes (including transitional genes), and 2 different kinds of null genes. Venice exhibited the best performance and accuracy in all cases. It could effectively detect different types of marker genes and avoid false-positive results while keeping a modest running time.
Venice is also incorporated in Signac, a single-cell analytics package developed by BioTuring. The package is available at https://www.github.com/bioturing/signac
This analysis is adapted from the GSEA method (Subramanian et al. 2005), a common analysis for selecting potential biological terms given a sorted list of genes. The software performs GSEA on 4 different categories of terms: biological process, molecular function, cellular component, and biological pathway. The first 3 are from the Gene Ontology (Consortium 2004), and the last one is from the Reactome database (Joshi-Tope et al. 2005).
Enrichment analysis can be found in both the Analysis dashboard and the Differential expression dashboard.
This feature shows you the suggested cell type for a group of cells. When a user makes a selection by clicking a cluster/annotation or using the Select cell tool, the software picks genes that are expressed in at least 35% of the group. This process does not select from the whole transcriptome, but from a list of cell-type markers in our curated knowledge base. Then, it uses that gene profile to estimate the correlation with each cell-type profile. A cut-off of 0.5 is applied to remove unlikely candidates. The remaining cell types undergo a tree search to find their common parents. Parents that have less weight (e.g. distinct from the rest) are removed. This process is repeated until only one cell type is left. The whole analysis usually takes 1-3 seconds to finish, so it is triggered automatically.
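The first step of this procedure, picking curated marker genes that are expressed in at least 35% of the selected cells, can be sketched as below. The function name `expressed_markers`, the data layout, and the example genes are hypothetical, and "expressed" is assumed to mean a count greater than zero. The correlation and tree-search steps are not shown.

```python
def expressed_markers(selected_counts, marker_genes, min_fraction=0.35):
    """Return the marker genes expressed in at least `min_fraction`
    of the selected cells.

    selected_counts: dict mapping gene -> list of counts, one per
                     selected cell.
    marker_genes:    curated list of cell-type marker genes.
    """
    picked = []
    for gene in marker_genes:
        values = selected_counts.get(gene)
        if not values:
            continue  # gene not measured in this dataset
        fraction = sum(1 for v in values if v > 0) / len(values)
        if fraction >= min_fraction:
            picked.append(gene)
    return picked


# Hypothetical selection of 4 cells.
selection = {
    "CD3E": [1, 0, 2, 3],   # expressed in 75% of cells
    "MS4A1": [0, 0, 0, 1],  # expressed in 25% of cells
    "ACTB": [5, 5, 5, 5],   # expressed everywhere but not a marker
}
curated_markers = ["CD3E", "MS4A1", "CD19"]
profile = expressed_markers(selection, curated_markers)
```

Restricting the search to the curated marker list, rather than the whole transcriptome, keeps the candidate profile small, which is consistent with the short running time mentioned above.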
BBrowser supports finding differentially expressed genes between two groups of cells, each of which must contain at least 3 cells. It finds differentially expressed genes using Venice, the same method used for finding marker genes. Users can switch to edgeR, a more common method that takes at least 5 times longer.
For the log2FC value of each gene, we use the same method as the Seurat package (Gribov et al. 2010). Below is the detailed formula:
Depending on the data available in your study, you can choose between several visualization methods:
By default, the main plot is calculated by gene expression. It can be t-SNE or UMAP subject to the method chosen during the pre-processing step. To check if the current visualization is t-SNE or UMAP, go to Settings > Analysis > Dimensionality reduction method > Apply and you can switch between t-SNE and UMAP.
To switch from an RNA-based plot to a protein (ADT)-based plot or a 2D/3D feature plot, go to the dropdown box next to Clonotype at the bottom of the screen and select Feature Plot.
To generate a feature plot, type in your gene(s) or protein(s) of interest for the X, Y and Z axes > Apply. The axes must be either all genes or all proteins; you cannot view genes and proteins at the same time.
Filling in the boxes of XYZ-axis
Create 2D feature plot with 2 proteins
Create 3D feature plot with 3 proteins
Or create 3D feature plot with 3 genes
t-SNE/UMAP of gene expression can be viewed in 2D or 3D, while other plots are 2D only.
You can interact with the plot by zooming in and out, switching between 2D and 3D, moving and rotating the plot, and resetting it to the original state.
On the bottom right corner of the scatter plot, there are several buttons that control the visualization as well as how a user can define a selection.
From top to bottom:
● Reset: this button resets the scatter plot to the original state without any selection, with cells colored by the last clustering factor/annotation used.
● Lasso tool: this button activates the free selection mode, which lets you create a free-form selection by drawing. Hold the Ctrl/Cmd button and draw to add cells to the current selection, or hold Shift and draw to remove cells from it. To deselect all, hold the Ctrl/Cmd button and press D.
● Move tool: this button activates the navigation mode: moving and rotating the plot. You also can choose the cluster of current metadata.
● 2D / 3D: these buttons help you switch between 2-D and 3-D scatter plot. Rotation is only enabled for 3D plot. For Seurat/ scanpy object calculated for dimensionality reduction in 2D but not 3D coordinate, BBrowser can calculate the 3D coordinate based on PCA results and vice versa.
● Zoom (plus/minus): these buttons help you zoom in and out. The point size of the scatter plot remains unchanged when zooming. Alternatively, you can use your mouse wheel to zoom.
● Download: this button captures the current scatter plot and cluster labels, which can be exported as an image or as data.
Users can customize the theme, point size, transparency and color palette of the main plot.
Options for altering the scatter plot appearance include:
The Metadata tab lets you color the cells to your preference. You choose the classification (group of clusters) you want to visualize, which changes how the cells are colored and lets you filter them by your own conditions.
The software offers various classification methods: unbiased graph-based clustering, classification by input metadata, or by your own definition and annotation. You can also import your annotation matrix from a TSV file in Metadata tab.
Metadata tab is always activated.
To see how a gene or protein is expressed in the given dataset, type the gene/protein name, its Ensembl ID, or an alias into the gene/protein query box at the top right corner of the scatter plot and press Enter.
Upon querying a gene/protein, BBrowser provides two ways to visualize its expression:
If the expression values are spread over a large range, you can choose to visualize only the 5th to 95th percentile of the data to eliminate outlier points.
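The effect of restricting the display to the 5th to 95th percentile can be sketched as follows. This is only an illustration: it assumes that values outside the range are clamped to the range limits before coloring, and that percentiles are computed with linear interpolation. The function names and sample values are hypothetical.

```python
def percentile(sorted_values, q):
    """Percentile (q in 0..100) of a pre-sorted list, using linear
    interpolation between neighboring values."""
    if not sorted_values:
        raise ValueError("no values")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def clip_to_percentiles(values, low=5, high=95):
    """Clamp every value into the [low, high] percentile range."""
    ordered = sorted(values)
    lo = percentile(ordered, low)
    hi = percentile(ordered, high)
    return [min(max(v, lo), hi) for v in values]


# 20 ordinary values plus one extreme outlier.
expression = list(range(20)) + [1000]
clipped = clip_to_percentiles(expression)
```

Without clamping, the single value of 1000 would stretch the color scale so far that the other 20 cells become nearly indistinguishable.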
Image exported when querying a gene expression showing all expression values (top) or top 5th-to-95th-percentile of expression values (bottom)
BBrowser supports viewing the expression of multiple genes or proteins, in the scatter plot or in a heatmap.
BBrowser also lets you view the pairwise correlation of the expression of multiple genes among clusters or annotation groups. BBrowser offers 3 methods: Spearman, Pearson and Kendall correlation.
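To show how the choice of method can change the result, the sketch below computes Pearson and Spearman correlation for two genes using only the standard library. It is an illustration of the two statistics, not BBrowser's code; the function names and values are hypothetical, and Kendall correlation is omitted for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences.
    Assumes neither sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Mean expression of two hypothetical genes across 5 clusters.
gene_a = [1, 2, 3, 4, 5]
gene_b = [1, 4, 9, 16, 25]
r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
```

Here the two genes rise together but not linearly, so the Spearman correlation is 1 while the Pearson correlation is slightly lower. Rank-based methods are often preferred when expression values are skewed.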
When saving the multiple genes/ proteins query to Gene gallery, you can save your gene(s)/protein(s) of interest for reviewing later.
For other analyses: add annotation, view compositional breakdown, finding marker genes and differential expression analysis, etc., you first need to select the cells. Cells that are selected will be colored in white.
9.1.1. Lasso tool & Move tool
The most common way to select cells is by using the Move tool and Lasso tool. You can find these tools at the bottom right corner of the main scatter plot.
Lasso tool (above) and Move tool (below). Just hover on the Lasso tool to see more instructions.
Using Lasso tool to select a cell population
You can also select cells that are already clustered/ annotated from the Metadata tab.
To select cells in one cluster:
9.1.3. Select cells by gene expression
You can select cells that share the same expression level of a given gene:
For this example, we will show how to select CD8+ T cells from the Responder group of Sade-Feldman et al 2018 (combining 2 conditions: CD8 T cells, belonging to Responders).
From BBrowser 2.5.3, you can filter cells by multiple conditions using the Advanced Filter tab, which combines two methods: filter by expression and filter by metadata. BBrowser will display the groups of cells that match your filter.
For this example, we will show how to select CD8+/TCF7+ T cells from the Responder group of Sade-Feldman et al 2018 (combining 3 conditions: CD8 positive, TCF7 positive, belonging to Responders).
If you want to add a gene expression cut-off, select Filter by expression
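The logic of combining expression and metadata conditions can be sketched as follows. This only illustrates the filtering idea, not how BBrowser stores or filters cells: the function name `filter_cells`, the data layout, the gene symbols (CD8A, TCF7), the metadata field name `response`, and the cut-off values are all hypothetical.

```python
def filter_cells(cells, expression_rules, metadata_rules):
    """Return the barcodes of cells that satisfy every condition.

    cells:            list of dicts with keys 'barcode', 'expr'
                      (gene -> value) and 'meta' (field -> label).
    expression_rules: dict gene -> cut-off; a cell passes when its
                      expression is strictly above the cut-off.
    metadata_rules:   dict field -> required label.
    """
    selected = []
    for cell in cells:
        expr_ok = all(
            cell["expr"].get(gene, 0) > cutoff
            for gene, cutoff in expression_rules.items()
        )
        meta_ok = all(
            cell["meta"].get(field) == label
            for field, label in metadata_rules.items()
        )
        if expr_ok and meta_ok:
            selected.append(cell["barcode"])
    return selected


# Hypothetical cells.
cells = [
    {"barcode": "cell1", "expr": {"CD8A": 2.1, "TCF7": 1.3},
     "meta": {"response": "Responder"}},
    {"barcode": "cell2", "expr": {"CD8A": 1.8, "TCF7": 0.0},
     "meta": {"response": "Responder"}},
    {"barcode": "cell3", "expr": {"CD8A": 2.5, "TCF7": 0.9},
     "meta": {"response": "Non-responder"}},
]
matching = filter_cells(
    cells,
    expression_rules={"CD8A": 0, "TCF7": 0},
    metadata_rules={"response": "Responder"},
)
```

All conditions are combined with a logical AND, so each added condition can only narrow the selection.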
BBrowser cell-type prediction tool takes a list of marker genes defined by the users as the reference and evaluates the expression of all those marker genes in the selected population to predict the cell-type. Whenever a cell population is selected, the process will automatically be done. The cell type prediction result will appear in the infobox on the top left corner of the scatter plot. It includes the cell type name and the marker genes’ information.
By default, cell type prediction is applied only to data with fewer than 50,000 cells because of the long processing time needed for large datasets. You can enable the function for larger data by increasing the cell number limit in Settings > Analysis > Cell-type prediction limit.
The Cell Search Engine is designed to help you find cell populations in BioTuring public database which have similar transcription profiles to your selected cells – suggesting the cell type and signature genes, enrichment processes of the selected group.
The server will return results in a pop-up window, including:
All the cell search queries will be automatically saved in the Cell search tab to reopen, rename or reselect. You also can delete them.
Finding marker genes/proteins and enriched processes in a group of cells shows you the genes/proteins and processes that are differentially expressed in the selected group compared to the rest of the cell population. This information is essential for determining which cell type the cluster belongs to. To run the analysis:
Details on the markers and enrichment analysis include:
This section shows how to find markers for every group in your current metadata by a single click and create a heatmap for them.
BBrowser will ask you how many top marker genes you want to add to the gallery. Just name your gallery (for example, “Top 10 marker genes of every cluster”).
Opt to pick out just gene or protein markers. After that, select the number of top gene or protein markers for each group that you want to add to the gallery. You can use the default option, which is to create a gallery for the top 10 markers of each cluster.
You can click on each picture in the Gallery to query the top 10 markers for each group.
To see the full list of gene or protein markers for each group, select that group on the Metadata tab and scroll to the Marker Features tab to view them.
You can add multiple annotations to a cell, regarding cell type, subtype, expression level of a gene or set of genes or clonotype, etc. There are 2 ways to add an annotation:
For each annotation, you need to enter a Group name, which is the name of the classification (cell type, sub-type, T cell sub-type, …), and a cluster name, which is the name of the cluster (macrophage, microglia, COL1A4+ fibroblast, …).
To import an annotation matrix by a file:
To manually annotate each cluster:
Click Apply to implement.
After an annotation is added, you can edit it by changing its name, merging 2 clusters together, or deleting a cluster or the whole group.
By clicking the plus button next to the current metadata, you can add your own metadata from a file. The file should be in .tsv format and contain 1 column for barcodes, with the other columns holding metadata.
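A metadata file of this shape can be produced with a few lines of Python. The sketch below is an assumption-laden example: the function name `write_metadata_tsv`, the header label `barcode`, the presence of a header row, and the sample annotations are all hypothetical.

```python
import csv
import io


def write_metadata_tsv(handle, annotations, columns):
    """Write one row per barcode, with the barcode in the first
    column and one column per metadata field."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Assumed header row: a label for the barcode column, then the
    # metadata field names.
    writer.writerow(["barcode"] + list(columns))
    for barcode, fields in annotations.items():
        # Missing fields are written as empty cells.
        writer.writerow([barcode] + [fields.get(c, "") for c in columns])


# Hypothetical annotations for two cells.
annotations = {
    "CTGGTCCGGTGTTATCAG": {"cell_type": "macrophage", "condition": "disease"},
    "TTACTGGGACGACTCGGG": {"cell_type": "microglia"},
}
meta_buffer = io.StringIO()
write_metadata_tsv(meta_buffer, annotations, ["cell_type", "condition"])
meta_lines = meta_buffer.getvalue().split("\n")
```

The barcodes must match those of the cells in the study, otherwise the metadata cannot be assigned to the right cells.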
BBrowser supports cellular composition analysis for any group of cells, whether annotated or not annotated. Users will define the group of cells they want to view composition and the type of classification. The software will identify the percentage of each cluster from the chosen classification in the group of selected cells and sort the clusters by order of majority.
For standard function
For Normalized composition
We recommend the Normalized by total tool for reducing the bias of unequal distribution affected by unbalanced sample sizes.
See the example below. The total number of cells from disease samples is double that from non-disease samples. Therefore, if you look at the percentages of disease and non-disease cells among macrophages, the disease percentage is likely to dominate simply because of sample size, giving a biased picture of the macrophage proportions in the two groups. With the Normalized by total tool, the result shows the cell composition without the bias coming from sample size.
The normalizing formulation is as follows:
Total number of cells from group 1 (Disease): A;
Total number of cells from group 2 (Non-disease): B;
Number of cells from group 1 (Disease) in the selected population (Macrophage): a
Number of cells from group 2 (Non-disease) in the selected population (Macrophage): b
Normalized percentage of group 1 in the selected population:
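One natural reading of this normalization, using the symbols defined above, is to divide each group's count in the selection by that group's total and then rescale so the shares sum to 100%, i.e. (a/A) / (a/A + b/B) × 100% for group 1. The sketch below implements that reading; it is an assumption rather than BBrowser's documented formula, and the function name and numbers are hypothetical.

```python
def normalized_percentages(selected_counts, group_totals):
    """Composition of a selection, corrected for group size.

    selected_counts: dict group -> number of its cells in the selection.
    group_totals:    dict group -> total number of cells in that group.
    """
    # Fraction of each group that falls inside the selection.
    rates = {g: selected_counts[g] / group_totals[g] for g in selected_counts}
    total = sum(rates.values())
    # Rescale the fractions so that they sum to 100%.
    return {g: 100.0 * r / total for g, r in rates.items()}


# Disease has twice as many cells overall (A = 2000, B = 1000), and
# the macrophage selection contains a = 200 and b = 100 of them.
raw_disease_pct = 100.0 * 200 / (200 + 100)
normalized = normalized_percentages(
    {"Disease": 200, "Non-disease": 100},
    {"Disease": 2000, "Non-disease": 1000},
)
```

In this example the raw disease share is about 66.7%, but both groups contribute 10% of their cells to the selection, so the normalized composition is 50% each.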
Performing differential expression (DE) analysis on any two clusters helps you find the genes that differ between the 2 clusters and the processes associated with them. You can also run DE analysis in a sub-cluster, following the same steps as in the main cluster (see Section 16_Sub-Clustering).
You can run DE analysis on 2 clusters in the same annotation in the Composition panel.
You can run DE analysis on any 2 selected groups of cells.
BBrowser offers 6 methods to run DE analysis: our in-house algorithm Venice and 5 differential expression analysis algorithms from the Seurat package (Wilcoxon, Likelihood-ratio test, T-test, Poisson, and Logistic regression).
To choose a method, go to Settings > Analysis > Differential expression analysis.
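To give a sense of what such a test computes, here is a simplified pure-Python sketch of a Wilcoxon rank-sum test for one gene between two groups of cells. It uses a normal approximation with no tie or continuity correction, so it is neither the Venice algorithm nor the Seurat implementation, and the expression values are made up.

```python
import math


def rank_sum_test(group1, group2):
    """Return (U statistic of group1, two-sided p-value by normal approximation)."""
    pooled = sorted((value, idx) for idx, value in enumerate(group1 + group2))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values; tied values share the average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = average_rank
        i = j + 1

    n1, n2 = len(group1), len(group2)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Hypothetical expression values of one gene in two clusters.
cluster_a = [1.0, 2.0, 3.0]
cluster_b = [4.0, 5.0, 6.0]
u_stat, p_val = rank_sum_test(cluster_a, cluster_b)
print(f"U = {u_stat}, p = {p_val:.4f}")
```

For real analyses with many cells and many tied zero counts, a tested statistical library is a better choice than this sketch.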
After you run the DE analysis on two clusters of interest, the software opens the DE dashboard, which shows the differentially expressed genes in a volcano plot of all genes, a box plot of the expression of a single gene, a table of genes and enriched processes, and a scatter plot of the cells in the two clusters.
If you click on a gene in the volcano plot or the table, the scatter plot shows the expression of the selected gene. You can also query the expression of a specific gene by typing its name in the box at the top right.
The DE dashboard toolbar (at the bottom right of the scatter plot):
DE analysis results are automatically saved right after you run it, so you do not have to perform the analysis again in the future. To review the DE analysis result, click on the Differential Expression panel > View previous results.
You can rename or delete an analysis by clicking at its top right corner. Click on Save/Confirm to save the change.
Sub-clustering is an advanced feature that takes out a group of cells and treats them as a new set of data. The software will calculate new principal components and dimensionality reduction results to plot the selected cells in a new scatter plot. The cells will also be re-clustered using the Louvain and k-means clustering methods.
Focusing on a subset of the data with fewer cells than the original helps you identify more principal components, including components that are significant only to this group of cells. You can therefore further divide the cells into smaller clusters with distinct expression profiles. This feature is suitable for analyzing clusters with large heterogeneity.
To run sub-clustering, first select a group of cells (refer to section 9_Select a cell population) and click on the Sub-clustering icon. Name the sub-cluster as you like and click on Apply.
Re-calculation for the sub-cluster usually takes a few minutes. After that, the Sub-clustering dashboard opens automatically.
The Sub-clustering dashboard is similar to the Analysis dashboard and can be used to query gene expression, find marker genes and enriched processes, study cellular composition, etc., but not differential expression analysis. A Mini map at the bottom left of the dashboard shows the main scatter plot with all cells of the sub-cluster highlighted in white.
To go back to the main Analysis dashboard, click on the name of the sub-cluster at the top left corner and choose Main cluster from the drop-down.
Adding an annotation in the Sub-clustering dashboard works the same way as in the Analysis dashboard.
First, select a group of cells, then click on Create an annotation and define the Group name and Cluster name, following section 13_Add a annotation.
An annotation created in the Sub-clustering dashboard is treated the same as one created in the main dashboard. Hence, you can view your sub-clusters in the main scatter plot, or annotate sub-clusters from different Sub-clustering dashboards under the same group name.
In a sub-cluster, you can run DE analysis to compare any 2 groups, as in the main cluster. There are two ways to run DE analysis in a sub-cluster: from the Composition panel and from the DE tab. For more details on differential expression analysis, see Section 15_Differential expression analysis.
Sequencing the T cell receptor (TCR) is a powerful way to dissect the complexity and diversity of the T cell response repertoire. By associating the TCR with gene expression, BBrowser can provide an unbiased classification of a population of interest and link the transcriptional landscape of each cell with its TCR.
In BBrowser, clicking the Clonotype button at the bottom of the main scatter plot opens the Clonotype dashboard. All cells in the main scatter plot turn gray and their spot size decreases. A mini map pops up showing the previous coloring of the scatter plot.
Now, you can add TCR sequencing data by clicking on Upload data.
If your data comes from multiple batches, submit the TCR sequencing data for each batch separately. In that case, clicking the Upload data button shows a pop-up where you select the input file for each batch.
Cells with a recognized TCR sequence are now colored according to their clonotype and return to the normal spot size. Hovering the mouse over a clonotype name highlights and enlarges its cells. Details on the number of cells in each clonotype and relevant antigen information are displayed in a table.
On the left side of the dashboard, you can change the clonotype data, perform clonotype counting, and create an annotation for cells with a TCR sequence. Once clonotypes are converted to an annotation, you can run any analysis on different clonotypes, including marker gene detection, enrichment analysis, composition, and differential expression analysis.
TCR sequencing results can be imported as a TSV or CSV file.
The input table must contain enough information for typical V(D)J annotations. BBrowser only reads columns whose names appear in the list below. Columns that are not in this list are ignored.
The software only uses clonotypes that are both full_length and productive. The CDR3 amino acid sequence is mapped against VDJdb (Shugay et al. 2017) to find information about relevant epitopes.
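The full_length and productive filter can be pictured with the short sketch below. It assumes a CSV laid out like a typical V(D)J contig annotation table, where full_length and productive hold the text values True or False. The other column names (barcode, chain, cdr3) and all rows are hypothetical illustration data, and the code is not BBrowser's own import routine.

```python
import csv
import io

# Hypothetical contig annotation rows; only the first is both full-length and productive.
SAMPLE = """barcode,chain,full_length,productive,cdr3
AAAC-1,TRB,True,True,CASSLGTDTQYF
AAAG-1,TRB,True,False,CASSPGQGNTEAFF
AACT-1,TRA,False,True,CAVRDSNYQLIW
"""


def is_true(value):
    """Interpret a text flag such as 'True' or 'true' as a boolean."""
    return value.strip().lower() == "true"


def usable_contigs(handle):
    """Keep only rows flagged as both full_length and productive."""
    reader = csv.DictReader(handle)
    return [
        row for row in reader
        if is_true(row["full_length"]) and is_true(row["productive"])
    ]


kept = usable_contigs(io.StringIO(SAMPLE))
for row in kept:
    print(row["barcode"], row["chain"], row["cdr3"])
```

Running a check like this on your own file before uploading shows how many contigs would remain after the filter.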
There are two ways to perform clonotype count:
17.4. View shared clonotypes
In your clonotype table, the Groups column shows which groups each clonotype belongs to. In the example, with the Patient metadata selected, the first clonotype (48 cells) exists in P1211 (48 cells) and the second clonotype (39 cells) exists in P1023 (39 cells).
The Groups column will correspond to your current annotations in the metadata tab. (For example, if you want to view what Locations (Normal, Blood or Tumor) have a clonotype, change the annotation to Location.)
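The idea behind the Groups column can be sketched as a simple tally of cells per clonotype and per group of the current annotation. The clonotype names, group names, and cell assignments below are hypothetical, and the code is only an illustration, not BBrowser's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical (clonotype, group) pair for each cell with a TCR sequence.
cells = [
    ("clonotype1", "Tumor"),
    ("clonotype1", "Tumor"),
    ("clonotype1", "Blood"),
    ("clonotype2", "Normal"),
    ("clonotype2", "Normal"),
    ("clonotype3", "Blood"),
]


def groups_per_clonotype(cells):
    """Count, for every clonotype, how many of its cells fall in each group."""
    table = defaultdict(Counter)
    for clonotype, group in cells:
        table[clonotype][group] += 1
    return table


table = groups_per_clonotype(cells)

# A clonotype is shared when its cells appear in more than one group.
shared = sorted(name for name, groups in table.items() if len(groups) > 1)

for name in sorted(table):
    summary = ", ".join(f"{g} ({n} cells)" for g, n in sorted(table[name].items()))
    print(f"{name}: {summary}")
print("Shared clonotypes:", shared)
```

Switching the annotation (for example from Location to Patient) corresponds to replacing the second element of each pair with the cell's label in the other annotation.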
To view the full list of clonotypes in a bigger table, you can click on the small window button next to the export button.
Table of export formats
BBrowser supports exporting graphs into .PNG or .SVG formats.
Export violin plots as SVG files from the Search genes tab
Export box plots, violin plots, and density plots as SVG files from the Gene query expression panel
Export box plots and violin plots as SVG files from the DE analysis dashboard
Export main scatter plots as PNG files
Export the heatmap as a PNG file when querying multiple genes
Export the composition plot as a PNG file
Export scatter plots of two groups as PNG files from the DE analysis dashboard
Export the volcano plot showing DE genes as a PNG file
BBrowser also supports exporting figures to BioVinci through the Export to BioVinci button to enable more flexible plot editing. (See more at Table of export formats.)
In BioVinci, you can:
We recommend using a computer with 16 GB RAM for data with more than 100,000 cells or data processed from FASTQ files. However, on a computer with 8 GB RAM, you can still open large Seurat objects if they are fully processed with PCA and dimensionality reduction results (tested with a 300,000-cell object). If you want to submit count matrices, 8 GB RAM can smoothly process data of 30,000 cells.
If you are using a server with a proxy, an error message might come up when you try to log in to the software, because the connection to the BioTuring server needed to verify your credentials cannot be made through the proxy. Please click on Proxy settings at the bottom of the login screen and configure your server.
You can import FASTQ, MTX, TSV, CSV, .H5, .H5AD, and .RDS files to BBrowser.
For details about the structure of each file, please refer to section 4.
It depends on the format and structure of the file.
If the file fulfills all the requirements of the software, you can import it to BBrowser.
Otherwise, if the authors of the study are willing to share their annotations, the BioTuring team would be happy to consider hosting the data on our platform and will index the data based on our standard process.
Since we cannot obtain all the parameters of the data processing steps from the authors, for some steps, our default parameters may be different from those of the authors.
To combine multiple datasets, first, you need to make sure they are in the same format (MTX, TSV or CSV).
After that, open BBrowser > Data > Add new study to import all files, select your method for batch correction and name the study, then click Start to run the processing.
BBrowser supports exporting multiple graphs: scatter plots, box plots, violin plots, etc. in either SVG or PNG format with a fixed design and layout.
If you want to customize the color of the graph, go to Settings > Visualization and change the color scale there.
An alternative is to export the data behind the graph to a TSV file and rebuild the graph with your preferred tools outside BBrowser. The BioTuring team also offers a drag-and-drop data visualization tool called BioVinci.
To compare gene expression across different clusters, first choose the annotation with the clusters you are interested in. Then, type the gene name or Ensembl ID in the gene query box and press Enter to query the gene expression. Click on the arrow at the bottom of the color scale to extend the box and click on the Plot button to generate a box plot of gene expression across the clusters.
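As a rough sketch of working with exported data outside BBrowser, the code below reads a tab-separated table and computes the per-cluster statistics that a box plot displays. The column names (cluster, expression) and the values are hypothetical, and the layout of a real export may differ.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical exported table: one row per cell with its cluster and the
# expression value of the queried gene.
EXPORTED_LINES = [
    "cluster\texpression",
    "A\t1.0", "A\t2.0", "A\t3.0", "A\t4.0",
    "B\t0.0", "B\t0.5", "B\t1.5", "B\t6.0",
]


def summarize(handle):
    """Return min, quartiles, and max of expression for each cluster."""
    values = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        values[row["cluster"]].append(float(row["expression"]))
    summary = {}
    for cluster, numbers in values.items():
        q1, median, q3 = statistics.quantiles(numbers, n=4, method="inclusive")
        summary[cluster] = {
            "min": min(numbers), "q1": q1, "median": median,
            "q3": q3, "max": max(numbers),
        }
    return summary


summary = summarize(io.StringIO("\n".join(EXPORTED_LINES)))
for cluster, stats in sorted(summary.items()):
    print(cluster, stats)
```

The resulting numbers can be passed to any plotting tool to redraw the box plot with custom colors and layout.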
Azizi, E., Carr, A. J., Plitas, G., Cornish, A. E., Konopacki, C., Prabhakaran, S., … & Choi, K. (2018). Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell, 174(5), 1293-1308.
Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. “Integrating single-cell transcriptomic data across different conditions, technologies, and species.” Nature biotechnology 36, no. 5 (2018): 411.
Consortium, Gene Ontology. 2004. “The Gene Ontology (GO) Database and Informatics Resource.” Nucleic acids research 32(suppl_1): D258–D261.
Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal, Complex Systems 1695(5): 1–9.
Gribov, Alexander et al. 2010. “SEURAT: Visual Analytics for the Integrated Analysis of Microarray Data.” BMC medical genomics 3(1): 21.
Haghverdi, Laleh, Aaron T L Lun, Michael D Morgan, and John C Marioni. 2018. “Batch Effects in Single-Cell RNA-Sequencing Data Are Corrected by Matching Mutual Nearest Neighbors.” Nature biotechnology 36(5): 421.
Joshi-Tope, G et al. 2005. “Reactome: A Knowledgebase of Biological Pathways.” Nucleic acids research 33(suppl_1): D428–D432.
Korsunsky, Ilya, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-Ru Loh, and Soumya Raychaudhuri. “Fast, sensitive, and flexible integration of single cell data with Harmony.” BioRxiv (2018): 461954.
Korthauer, K. D., Chu, L. F., Newton, M. A., Li, Y., Thomson, J., Stewart, R., & Kendziorski, C. (2016). A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome biology, 17(1), 222.
Krijthe, J H. 2015. “Rtsne: T-Distributed Stochastic Neighbor Embedding Using Barnes-Hut Implementation.” R package version 0.13, URL https://github.com/jkrijthe/Rtsne.
Love, Michael I, Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome biology 15(12): 550.
Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using T-SNE.” Journal of machine learning research 9(Nov): 2579–2605.
McInnes, Leland, and John Healy. 2018. “Umap: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv preprint arXiv:1802.03426.
Melville, James. 2018. “Uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction.” https://github.com/jlmelville/uwot.
Robinson, Mark D, Davis J McCarthy, and Gordon K Smyth. 2010. “EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26(1): 139–40.
Shugay, M., Bagaev, D. V., Zvyagin, I. V., Vroomans, R. M., Crawford, J. C., Dolton, G., … & Eliseev, A. V. (2017). VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic acids research, 46(D1), D419-D427.
Subramanian, Aravind et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102(43): 15545–50.
Tran, Thang, Thao Truong, Hy Vuong, and Son Pham. 2019. “Hera-T: An Efficient And Accurate Approach For Quantifying Gene Abundances From 10X-Chromium Data With High Rates Of Non-Exonic Reads.” doi:10.1101/530501.
Wang, T., Li, B., Nelson, C. E., & Nabavi, S. (2019). Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC bioinformatics, 20(1), 40.