BioTuring Browser, or BBrowser, is a desktop application that performs analyses on sequencing data. The software is also connected to a database hosting sequencing data from the latest publications. Users can use BBrowser to analyze their own data or analyze the public data available.
The software allows scientists, even ones without programming experience, to quickly investigate massive amounts of sequencing data from in-house and published work and compare them together. All data submitted by users and data downloaded from the BBrowser database is stored and secured on the local computer.
The application was first released in October of 2018, running on Windows, macOS, and Ubuntu.
To install BBrowser on macOS, after downloading the installation package:
There are 2 options for running BBrowser on Windows: portable version or installed version.
The portable version does not require any installation and is only available for Windows. Users who do not have the necessary privileges to modify the system registry can download this version to use instantly. There is no difference between the portable version and the installed version of the BBrowser in terms of the interface and functionality.
The portable version is provided in a zipped file. After downloading, you need to unzip the file and double-click on it to run the software.
Alternatively, to launch BBrowser, users can run the executable binary, BBrowser2.exe, located in a folder called BBrowser2-win32-x64.
If you want to move the software to another location, you must move the entire BBrowser2-win32-x64 folder. Modifying or removing any files in this folder may cause BBrowser to stop working. In either case, your single-cell data is not affected because it is stored in a different location.
If you want to install BBrowser to your computer, download the installer .exe file and run it with administrator permission. An installation window will guide you through the installation.
Although the program can be installed anywhere on your computer, we highly recommend putting it in the usual Program Files folder. However, this requires administrator permission.
If your computer has more than one account using the software, each account can only access its own data.
The BBrowser Login page appears when you open the software for the first time. Once the software has successfully recorded your BioTuring account, it will log in automatically the next time you start BBrowser.
Please enter your credentials and declare your academic or non-academic status to access the corresponding set of features, then press Enter or click Login.
Log in credentials are encrypted and stored individually for each user if multiple users are using the same computer.
If you are using a network with a proxy, please configure Proxy settings at this point, before any connection to the BioTuring server is made. The software needs to have correct proxy settings in order to connect to our server and verify your credentials, as well as to get access to our public database.
BBrowser Home page shows you all data that you can download to the local computer, including public data from BioTuring server and data from your remote repositories.
BBrowser Data page shows you all data that you have downloaded or submitted to the software. You can refer to this page as your local database. Also, you can submit a new dataset here.
BBrowser Settings page helps you to:
When you click on a study on the Data page or click on Explore a study from the Home page, you will enter the Analysis dashboard.
Here you can visualize the data and perform all the analyses.
The main visualization is a scatter plot of dimensionality reduction (t-SNE or UMAP), with each point representing a single cell. Cell color, size, and shape change when you run different analyses. The scatter plot is interactive, allowing you to zoom, move, rotate (in 3D mode), or select cells.
Inside the main visualization window are some function boxes:
On the right of the main visualization window are the main function tabs.
There are 4 tabs here, each in a small window that can be expanded or closed. These tabs either give you more insight into the data or provide additional visualizations:
At the bottom of the function tabs is information about study input/output, as well as the visualization and analysis settings.
The other 2 interfaces, the Sub-clustering dashboard and the Differential expression dashboard, are described in their own sections.
If you need help while doing analysis, press Alt (on Windows) or move your mouse to the top left of the screen (on macOS) and click on Help to view our tutorials or to contact us.
The massive amount of single-cell RNA sequencing data now being generated has opened new avenues for exploration, but it has also brought new challenges: standardizing data formats, systematically accessing transcription profiles of cell types across studies, and integrating multiple datasets.
To remove that barrier, we have indexed published single-cell RNA sequencing data from multiple formats into the BioTuring Browser platform. All data are processed and annotated so that they can be instantly accessed and explored in a uniform visualization and analytics interface.
In addition, we have developed our own set of marker genes for over 200 cell types. We use this gene list to verify the authors' annotations and re-label the cell types according to the BioTuring cell ontology, so that cell types are named consistently across our database.
Users can also query the expression of one or more genes across all datasets in the database and see how the genes are expressed in different clusters without downloading any dataset.
The section below explains how we index published data and how the gene query across the database works.
Step 1. Data collection
Single-cell gene expression matrices or Seurat/Scanpy objects are obtained from the authors or from public repositories. If Seurat or Scanpy objects are available, we retain the analysis results and move directly to the annotation step (Step 6).
Step 2. Filtering and normalization
Cells and genes from the submitted matrices are filtered to limit drop-outs, doublets, and apoptotic cells. Data are then log-normalized, and highly variable genes are selected. QC criteria follow the authors' descriptions.
If details of the filtering and normalization are not available, we process the data ourselves to get results as close as possible to the publication.
Step 3. Batch effect correction
We follow the methods used in each study. If none is provided, we apply CCA correction.
Step 4. Dimensionality reduction
We use the first 30 principal components from PCA to calculate 2D and 3D t-SNE or UMAP embeddings, with parameters taken from the authors' descriptions.
Step 5. Clustering
The dataset will go through both graph-based clustering by the igraph package (Csardi and Nepusz, 2006) and k-means clustering (Neter et al., 1998).
Step 6. Annotation and standardization of cell type labels
Cell type annotation matrices are obtained from authors and loaded in BioTuring Browser, together with metadata of the experimental design. We then manually verify cell type annotations using known markers and unify the terminology based on our internal cell ontology.
If annotation and metadata are not available, we will extract information directly from the publications.
Users of BBrowser can view all studies in the public database when opening the Home page of the software. You can also access the list of available studies on the BioTuring website: https://bioturing.com/bbrowser/datasets
We select the studies to index based on the needs of our users and community.
If you have a study of interest and would like it to be indexed by the BioTuring team, please contact us at firstname.lastname@example.org
If you are an author, we are very happy to distribute your data on BBrowser for public access. Please also contact us at email@example.com
In version 2.1.3, we introduced a special search engine that lets you look at the expression of one or more genes across every public dataset in BBrowser. Without downloading anything from the server, you can skim through a large amount of information efficiently.
You can find the gene search engine in Home page > Search genes tab.
The search results are sorted in descending order of this number. Information about the study and the option to Download are the same as in the Search studies tab. Please refer to Section 5.1 (Search and download a public study) to search for studies.
You can get a dataset into BBrowser by downloading it from the BioTuring server or from your internal server, or by importing the data from your local computer.
Currently, BBrowser supports analyzing data from human (Homo sapiens) and other species, such as mouse (Mus musculus), rat (Rattus norvegicus), zebrafish (Danio rerio), and fruit fly (Drosophila melanogaster). If you input data from any other species, the software can still process the data (except for the transcript quantification step), but gene information will be disabled.
BioTuring Browser hosts a public database of published studies that are selected, processed, verified and uniformly labeled by the BioTuring team. You can view the list of studies in this database in the BBrowser Home page.
To download a study from BBrowser public database, you need to be connected to the internet and follow these steps.
If you want to search for studies using general terms such as title, authors, and other keywords:
This type of search lets you look for studies that express your gene(s) of interest. Go to Home page > Search genes.
If you search for one gene, you will get a violin plot of that gene’s expression. On the plot:
All violin plots are interactive. You can hover your mouse over the plot to get the statistics (e.g. quantiles, median, mean, etc.), or drag to enlarge an area of the plot. Double-clicking any part of the plot resets it to the original view.
On the top right of each dataset, there is a horizontal bar showing the total number of cells that express the gene. The search results are sorted in descending order of this number.
BBrowser will show all studies that express the gene of interest. You can scroll down to find the study you want to download or click the camera button to export the plot.
If you search for multiple genes, the results will be a series of heatmaps, each of which is from one dataset. Each heatmap shows:
Download the data
To import a single-cell RNA sequencing study with raw data, you need to provide a folder containing all your FASTQ files.
BBrowser supports importing expression matrices as MTX, TSV, and CSV files with integer counts.
The expression matrix files can be uncompressed or compressed with gzip.
To import a study by single or multiple MTX files, you need to provide a folder with exactly 3 files:
When multiple folders containing data from multiple batches are submitted, options for selecting batch correction methods will be available.
If multiple folders are submitted, the Analysis dashboard will show an input metadata classification in which the clusters are named after the input folders. This lets you color and shape the cells according to the batch they come from.
The three files barcodes.tsv, features.tsv (or genes.tsv), and matrix.mtx are the standard files from 10X CellRanger. Below, we describe some more details of the data format that will affect the analysis.
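Before importing, it can help to confirm that a folder really contains the three files named above. The sketch below is only an illustration: the helper name `missing_mtx_files` is hypothetical and not part of BBrowser, and accepting a `.gz` suffix on each file is an assumption based on the note that matrix files may be gzip-compressed.

```python
from pathlib import Path

# Hypothetical pre-import check; this is not a BBrowser function.
REQUIRED = ("barcodes.tsv", "matrix.mtx")
FEATURES = ("features.tsv", "genes.tsv")  # either name is accepted


def missing_mtx_files(folder):
    """Return the expected file names that are absent from `folder`.

    Assumption: a gzip-compressed copy (name + ".gz") counts as present.
    """
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}

    def has(name):
        return name in present or name + ".gz" in present

    missing = [name for name in REQUIRED if not has(name)]
    if not any(has(name) for name in FEATURES):
        missing.append("features.tsv (or genes.tsv)")
    return missing
```

An empty list means all three files were found. The check does not look inside the files or flag extra files in the folder.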
To import a study by single or multiple CSV/ TSV files:
A .tsv or .csv file is simply a table in which values are separated by a delimiter: a tab (in .tsv) or a comma (in .csv). Table editors such as Excel, LibreOffice, or Google Sheets can export your table to either format.
BBrowser requires a strict format in order to parse the information correctly. Please make sure that the first column of the table has the gene names / Ensembl identifiers, and the first row of the table has the barcodes.
For users who want to export a matrix using R, please be careful: writing a matrix from R may drop the first cell of the first row. For example, given a matrix object with 1000 rows and 500 columns:
 num [1:1000, 1:500] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:1000] "ENSG1111111111" "ENSG1111111112" "ENSG1111111113" "ENSG1111111114" ...
  ..$ : chr [1:500] "CTGGTCCGGTGTTATCAG" "TTACTGGGACGACTCGGG" "ACGAGGAGACCCGAGATA" "CTTTGCAGTAGGGGCAAC" ...
Writing a .csv file in this way will lose the first cell. The first row of the file will contain only 500 values while the other rows will contain 501:
write.table(m, 'matrix.csv', sep=",", col.names=T, row.names=T)
Please use this command instead. It is much easier:
For a .tsv file, the simplest way is to use the usual write.table and then manually insert one tab at the beginning of the first row.
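Whichever tool you use to export, you can check for the missing corner cell by comparing the number of fields in the header row with the number in the first data row. The following is a small sketch in Python; the function name and the toy tables are made up for illustration.

```python
import csv
import io


def header_and_data_width(text, delimiter=","):
    """Return (fields in header row, fields in first data row)."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)
    first_data_row = next(rows)
    return len(header), len(first_data_row)


# Header written without the corner cell: one field short of the data rows.
broken = "CELL_A,CELL_B\nENSG01,0,3\nENSG02,1,0\n"
# Header with an empty corner cell above the gene column.
fixed = ",CELL_A,CELL_B\nENSG01,0,3\nENSG02,1,0\n"

print(header_and_data_width(broken))  # (2, 3)
print(header_and_data_width(fixed))   # (3, 3)
```

If the two numbers differ, the header row is missing its first cell. For a .tsv file, pass `delimiter="\t"`.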
BBrowser supports importing processed scRNA-seq and CITE-seq data as Seurat (.rds) and Scanpy (.h5ad/.h5) objects with integer counts (raw counts or rounded counts).
Quality control parameters and the dimensionality reduction method are not needed because these steps have already been done on the Seurat/Scanpy object.
A Seurat or Scanpy object must contain an expression matrix with information on barcodes and genes. BBrowser can also adopt some analysis results in the object. These results include, but are not limited to:
Upon receiving the Seurat or Scanpy object, BBrowser will read all available data and run analyses to fill in the missing information.
BBrowser is able to read a Seurat object stored in .rds format. To create a .rds file from Seurat, you can use the saveRDS function in R. We will not go into detail about the structure since the software does not require any specific modification of the original Seurat structure. The most critical information in each object is the count matrix, which should be stored in @assays$RNA@counts for gene expression data and @assays$ADT@counts for antibody capture data.
For users who analyze with Python via the Scanpy library, the final AnnData object should be saved in .h5/.h5ad format using the object's own .write function. Unfortunately, HDF5 is a very general format and there are many variations in how the information can be laid out. BBrowser expects the following structure:
BBrowser supports importing the spatial expression matrices from 10X Genomics Visium system as MTX. You need to provide a folder with exactly 3 files for gene expression information and 1 spatial folder:
To import the spatial transcriptomics data:
We are fully aware that different datasets are generated under different experimental designs and may need to be treated individually in order to represent all biological variation in the samples and, for public studies, to reproduce the published results as faithfully as possible. Our long-term plan for BioTuring Browser is therefore to maintain speed and ease of use while enhancing the flexibility of the analyses. All public datasets and imported data go through the same pipeline, whose steps are discussed in this section.
Transcript quantification is only applied when you create a new study with raw sequencing files (FASTQ). The process is run by Hera-T (version 1.2.0) (Tran et al. 2018), a new algorithm developed by the BioTuring team. It is applied to data generated by the 10X protocol on Chromium v2 and v3. The processing speed is 10 to 100 times faster than CellRanger 3.0, with better accuracy (Tran et al. 2018). The output of transcript quantification is an expression matrix in MTX file format, which is then passed to the processing steps below.
The process from quality control to dimensionality reduction is applied to public and in-house datasets imported in MTX, TSV or CSV files.
Quality control filters out cells with poor-quality gene expression and redundant non-expressed genes.
In public datasets without a detailed processing script from the author, genes having at least 1 UMI count in fewer than 3 cells are excluded. Then, cells with fewer than 200 genes having at least 1 UMI count and more than 5% of mitochondrial genes are excluded. The process creates a new expression matrix that may have fewer cells than the original data, and BBrowser only takes the cells and genes of this filtered matrix for the next processing steps.
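The default filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BBrowser's implementation: the function name is hypothetical, mitochondrial genes are recognized by an assumed "MT-" name prefix, the mitochondrial percentage is taken as the share of a cell's UMI counts, and a cell is dropped if it fails either the gene-count or the mitochondrial criterion.

```python
def qc_filter(counts, genes, min_cells=3, min_genes=200,
              max_mito_pct=5.0, mito_prefix="MT-"):
    """Return (kept gene indices, kept cell indices).

    counts: one row per gene, each row a list of UMI counts per cell.
    genes:  gene names, in the same order as the rows of `counts`.
    """
    n_cells = len(counts[0])

    # Step 1: keep genes detected (count > 0) in at least `min_cells` cells.
    kept_genes = [
        g for g, row in enumerate(counts)
        if sum(1 for v in row if v > 0) >= min_cells
    ]

    # Step 2: keep cells with enough detected genes and low mito content.
    # Assumption: both statistics are computed on the genes kept in step 1.
    kept_cells = []
    for c in range(n_cells):
        detected = sum(1 for g in kept_genes if counts[g][c] > 0)
        total = sum(counts[g][c] for g in kept_genes)
        mito = sum(
            counts[g][c] for g in kept_genes
            if genes[g].upper().startswith(mito_prefix)
        )
        mito_pct = 100.0 * mito / total if total else 0.0
        if detected >= min_genes and mito_pct <= max_mito_pct:
            kept_cells.append(c)
    return kept_genes, kept_cells
```

The defaults mirror the thresholds quoted above (3 cells, 200 genes, 5%). For in-house data you would pass your own cut-offs instead.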
For in-house data, BBrowser allows users to define the cut-off for quality control or to skip any filtering steps. In the data import pop-up, you can
This process is applied when multiple MTX, TSV or CSV files are submitted, usually from multiple batches of sample preparation and sequencing. The software treats each file as a batch and will try to scale all batches with the chosen method.
Currently, you can choose from 3 methods to remove batch effects:
On BBrowser, you can choose to run dimensionality reduction by t-SNE or UMAP.
t-SNE (Maaten and Hinton 2008):
The analysis is done by the Rtsne package (Krijthe 2015). The default perplexity for t-SNE is set at 30.
UMAP (McInnes and Healy 2018):
The analysis is done by the uwot package (Melville 2018). The number of neighbors is set at 30.
This analysis runs on the PCA results. For every dataset, the software will calculate both Louvain (graph-based) and k-means clustering.
BBrowser uses a non-parametric approach, called Venice, to detect marker genes. It is an open-source algorithm that runs efficiently on large amounts of data while outperforming other methods in accuracy (Hy et al. 2019).
We first defined marker genes of a group of cells in a data set as the genes that can be used to distinguish such cells from the rest. From this idea, we used the accuracy of classification as a metric to score the significance of a marker gene.
Considering each gene separately, we denote a cell as where is the label of a group of cells. if the cell is in the group of interest (group 1 – the group that we want to find the marker genes for). if the cell is not in the group of interest (group 2 – the rest of the data). We denote as the complement group of .
The probability for a cell being in group , given its expression level is:
In most of the cases, the group of interest is much smaller than the rest of the data and can generate a sampling bias. To avoid this bias of sample size, we set:
Accuracy of the classifier is:
The accuracy of prediction is:
Intuitively, For the robustness of the calculation, we divide the expression into intervals:
Where is the number of cells of group in group , and is the number of cells in group . For each gene, we can estimate the accuracy measure for using this gene to predict cells inside or outside the cluster and use this as a metric for ranking the marker genes.
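To make the idea concrete, here is a rough Python sketch of a marker score built from the ingredients described above: expression divided into intervals, per-group frequencies in each interval, and equal prior weight for the two groups so that group size does not bias the result. It is one possible reading of the description, not the published Venice formula; the function name, the equal-width binning, and the default of 10 intervals are all assumptions.

```python
def binned_balanced_accuracy(expr_in, expr_out, n_bins=10):
    """Score one gene by how well binned expression separates two groups.

    expr_in:  expression values of cells in the group of interest.
    expr_out: expression values of all other cells.
    Returns a value between 0.5 (no separation) and 1.0 (perfect).
    """
    values = list(expr_in) + list(expr_out)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant genes

    def interval_frequencies(xs):
        # Fraction of the group's cells that fall in each interval.
        counts = [0] * n_bins
        for x in xs:
            k = min(int((x - lo) / width), n_bins - 1)
            counts[k] += 1
        return [c / len(xs) for c in counts]

    p_in = interval_frequencies(expr_in)
    p_out = interval_frequencies(expr_out)

    # With equal priors, the best guess in each interval is the group with
    # the higher frequency there; the accuracy is the average over groups.
    return 0.5 * sum(max(a, b) for a, b in zip(p_in, p_out))
```

A gene whose expression is distributed identically inside and outside the group scores 0.5, and a gene that separates the two groups completely scores 1.0, so sorting genes by this score gives a ranking in the spirit of the text.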
We tested Venice on both real and simulated datasets. The benchmark considered the performance on 2 different sequencing technologies (full-length and UMI count), 4 different kinds of marker genes (including transitional genes), and 2 different kinds of null genes. Venice exhibited the best performance and accuracy in all cases. It could effectively detect different types of marker genes and avoid false-positive results while keeping a modest running time.
Venice is also incorporated in Signac, a single-cell analytics package developed by BioTuring. The package is available at https://www.github.com/bioturing/signac
This analysis is adapted from the GSEA method (Subramanian et al. 2005), a common analysis for selecting potential biological terms given a sorted list of genes. The software performs GSEA on 4 different terms: biological process, molecular function, cellular component, and biological pathway. The first 3 terms are from the Gene Ontology (Consortium 2004), and the last one is from the Reactome database (Joshi-Tope et al. 2005).
Enrichment analysis can be found in both the Analysis dashboard and the Differential expression dashboard.
This feature shows you the suggested cell type for a group of cells. When you make a selection by clicking a cluster/annotation or by using the Select cell tool, the software picks the genes that are expressed in at least 35% of the group. These genes are not selected from the whole transcriptome but from a list of cell-type markers in our curated knowledge base. The software then uses that gene profile to estimate the correlation with each cell-type profile. A cut-off of 0.5 is applied to remove unlikely candidates. The remaining cell types undergo a tree search to find their common parents. Parents that have less weight (e.g. distinct from the rest) are removed. This process is repeated until only one cell type is left. The whole analysis usually takes 1-3 seconds to finish, so it is triggered automatically.
BBrowser supports finding differentially expressed genes between two groups of cells; each group must have at least 3 cells. It finds differentially expressed genes using Venice, the same method used for finding marker genes. Users can switch to edgeR, a more common method that takes at least 5 times longer.
For the log2FC value of each gene, we use the same method as the Seurat package (Gribov et al. 2010). Below is the detailed formula:
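As a point of reference, the sketch below shows how a Seurat-style log2 fold change is commonly computed for log-normalized data: each group's values are transformed back with expm1, averaged, shifted by a pseudocount of 1, and log2-transformed before taking the difference. This assumes the default behavior of recent Seurat releases and natural-log normalized input; whether BBrowser uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import math


def log2_fold_change(group1, group2, pseudocount=1.0):
    """Seurat-style log2 fold change for one gene (a sketch).

    group1, group2: log-normalized expression values, i.e. log1p of
    scaled counts, for the cells in each group.
    """
    def log2_mean(values):
        # Undo the log1p transform, average, then add the pseudocount.
        mean_counts = sum(math.expm1(v) for v in values) / len(values)
        return math.log2(mean_counts + pseudocount)

    return log2_mean(group1) - log2_mean(group2)
```

For example, if a gene averages 3 normalized counts in group 1 and 1 in group 2, the result is log2(4) - log2(2), which is a log2FC of 1.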
Depending on the data available in your study, you can choose between several visualization methods:
By default, the main plot is calculated by gene expression. It can be t-SNE or UMAP subject to the method chosen during the pre-processing step.
Filling in the boxes of XYZ-axis
Create 2D feature plot with 2 proteins
Create 3D feature plot with 3 proteins
Or create 3D feature plot with 3 genes
Watch our tutorial video here
t-SNE/ UMAP of gene expression can be viewed in 2D or 3D, while other plots are set as 2D.
You can interact with the plot by zooming in and out, switching between 2D and 3D, moving and rotating the plot, and resetting it to its original state.
On the bottom right corner of the scatter plot, there are several buttons that control the visualization as well as how a user can define a selection.
From top to bottom:
Flip: Flip the graph by the horizontal or vertical axis
Zoom (plus/minus): zoom in and out. The point size of the scatter plot remains unchanged when zooming. Alternatively, you can use your mouse wheel to zoom.
Rotate: Rotation is enabled for both 2D and 3D plots.
Dimension: Switch between 2D and 3D scatter plots. For Seurat/ scanpy objects that do not have 3D coordinates, BBrowser can re-calculate the 3D coordinates based on PCA results.
Opacity: Adjust the opacity of cells.
Change from 2D tSNE plot into 3D tSNE plot.
Users can customize the theme, point size, transparency and color palette of the main plot.
Options for altering the scatter plot appearance include:
The Metadata tab helps you color the cells to your preference. You decide which grouping of clusters to visualize, which changes how the cells are colored and lets you filter them by your own conditions.
The software offers various classification methods: unbiased graph-based clustering, classification by input metadata, or by your own definition and annotation. You can also import your annotation matrix from a TSV file in the Metadata tab.
The Metadata tab is always active.
You can query genes/proteins’ expression, and visualize it with different plot types. The plot types will be subject to the number of genes you query, and whether your current metadata is categorical or numeric.
8.1.1. For a single gene or protein
To see how a gene or protein is expressed in the given dataset, type the gene/protein name, its Ensembl ID, or an alias into the gene/protein query box at the top right corner of the scatter plot and press Enter.
Upon querying a gene/protein, BBrowser provides two ways to visualize its expression:
If the expression values are stretched over a large range, you can choose to visualize only the 5th to 95th percentile of the data to eliminate outlier points.
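The effect of restricting a color scale to the 5th to 95th percentile can be illustrated as follows. This is a sketch under assumptions: it uses a simple nearest-rank percentile and clamps outlying values to the percentile limits, which may differ from how BBrowser treats them internally. The function name is hypothetical.

```python
def clip_to_percentiles(values, lower=5, upper=95):
    """Clamp values to the [lower, upper] percentile range (nearest rank)."""
    ordered = sorted(values)

    def percentile(p):
        # Nearest-rank percentile: ceil(p * n / 100), computed with integers.
        rank = max(1, -(-p * len(ordered) // 100))
        return ordered[rank - 1]

    lo, hi = percentile(lower), percentile(upper)
    return [min(max(v, lo), hi) for v in values]
```

With the values 1 to 100, for instance, everything below 5 is shown as 5 and everything above 95 is shown as 95, so a handful of extreme cells no longer stretches the color scale.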
Image exported when querying a gene expression showing all expression values (top) or top 5th-to-95th-percentile of expression values (bottom)
8.1.2 For two genes or proteins
8.1.3. For multiple genes or proteins
BBrowser supports viewing the expression of multiple genes or proteins, in the scatter plot or in a heatmap.
8.1.4. Gene-gene correlation
BBrowser also lets you view the pairwise correlation of the expression of multiple genes among clusters or annotation groups. BBrowser offers 3 methods: Spearman, Pearson, and Kendall correlation.
If you query genes while showing the numerical metadata (e.g. mitochondrial percentages, gene counts,…), the plots will look different from those created while showing the categorical metadata.
8.2.1. For a single gene
If you show the numerical metadata and query 1 gene, you can create a trend line.
In the trend line, each dot is a cell. The y-axis shows the gene expression values. The x-axis reflects the values of the numerical metadata, such as UMI count, gene count, and mitochondrial count. A cubic regression is also calculated and shown as a red line in this plot.
8.2.2. For multiple genes
If you show the numerical metadata and query multiple genes, you will get a heatmap that shows gene expression values and numerical values of all cells.
By saving a multiple-gene/protein query to the Gene gallery, you can keep your gene(s)/protein(s) of interest for later review.
For other analyses, such as adding annotations, viewing the compositional breakdown, finding marker genes, and differential expression analysis, you first need to select the cells. Selected cells are colored white.
9.1.1. Lasso tool & Move tool
The most common way to select cells is by using the Move tool and Lasso tool. You can find these tools at the bottom right corner of the main scatter plot.
Lasso tool (above) and Move tool (below). Just hover on the Lasso tool to see more instructions.
Using Lasso tool to select a cell population
You can also select cells that are already clustered/ annotated from the Metadata tab.
To select cells in one cluster:
9.1.3. Select cells by gene expression
You can select cells that share the same expression level of a given gene:
For this example, we will show how to select CD8+ T cells from the Responder group of Sade-Feldman et al 2018 (combining 2 conditions: CD8 T cells, belonging to Responders).
From BBrowser 2.5.3, you can filter cells by multiple conditions using the Advanced Filter tab, which combines two methods: filter by expression and filter by metadata. BBrowser will display the groups of cells that match your filter.
For this example, we will show how to select CD8+/TCF7+ T cells from the Responder group of Sade-Feldman et al 2018 (combining 3 conditions: CD8 positive, TCF7 positive, belonging to Responders).
If you want to add a gene expression cut-off, select Filter by expression
-Box select: this option allows you to select cells within a range of numeric values by dragging the bidirectional arrow across the histogram. The selected cells will be highlighted in white.
-Deselect: deselect all the cells
-Zoom: this option allows you to drag the bidirectional arrow and zoom into a specific region on the histogram.
-Zoom in/ Zoom out: zoom in or zoom out the histogram
Besides Cell Search, BBrowser can predict a cell type from a custom definition. This cell-type prediction tool takes a list of marker genes defined by the user as the reference and evaluates the expression of all those marker genes in the selected population to predict the cell type. The prediction runs automatically whenever a cell population is selected. The result appears in the infobox on the top left corner of the scatter plot and includes the cell type name and the marker genes’ information.
By default, cell type prediction is applied only to data with fewer than 50,000 cells because of the long processing time needed for a large dataset. You can enable the function for large data by increasing the cell number limit in Settings > Analysis > Cell-type prediction limit.
Interface of cell-type prediction settings
Click on the gene in the information box to query it
The cell type definitions from the user are stored in a reference table, in which columns are the labels (cell types) and rows are the union of markers across all labels. This table consists of 1, -1, and 0, indicating positive, negative, and neutral markers, respectively.
Based on the selection of cells, expression values are extracted from the dataset. BBrowser then creates a percentage profile giving the percentage of selected cells that express each gene in the reference table.
Consider a column in the reference table, y = [y1, y2, …, yn], and the percentage profile, x = [x1, x2, …, xn], where n is the number of genes in the reference table.
A quality control step takes place right after the percentage profile is computed. This step filters out the labels in the reference table that are not worth predicting. A label is valid only if:
Basically, BBrowser uses 50% as the upper threshold for negative markers and as the lower threshold for positive markers.
After quality control, the prediction is made based on a likelihood function adapted from the one used for logistic regression so that it handles labels of -1 and 1.
Given the likelihoods of all labels, BBrowser returns the best-fitting label as the result.
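The flow described above (reference table, percentage profile, quality control, likelihood, best label) can be sketched as follows. The exact validity rule and likelihood function are not given in this section, so the ones used here are assumptions chosen for illustration: a label is treated as valid when every positive marker is expressed in at least 50% of the selected cells and every negative marker in at most 50%, and the score is an average log-likelihood in which positive markers contribute log(x) and negative markers log(1 - x). All names are hypothetical.

```python
import math


def predict_cell_type(reference, profile, threshold=0.5, eps=1e-6):
    """Return the best-scoring valid label, or None if no label is valid.

    reference: {label: {gene: 1, -1 or 0}}  (positive, negative, neutral)
    profile:   {gene: fraction of selected cells expressing the gene}
    """
    scores = {}
    for label, markers in reference.items():
        pos = [g for g, y in markers.items() if y == 1]
        neg = [g for g, y in markers.items() if y == -1]
        if not pos and not neg:
            continue  # nothing to score for this label

        # Assumed quality control using the 50% threshold.
        if any(profile.get(g, 0.0) < threshold for g in pos):
            continue
        if any(profile.get(g, 0.0) > threshold for g in neg):
            continue

        # Assumed likelihood: mean log-probability over informative markers.
        log_lik = sum(math.log(max(profile.get(g, 0.0), eps)) for g in pos)
        log_lik += sum(math.log(max(1.0 - profile.get(g, 0.0), eps)) for g in neg)
        scores[label] = log_lik / (len(pos) + len(neg))

    return max(scores, key=scores.get) if scores else None
```

For instance, with a "T cell" definition of CD3E positive and CD19 negative, a selection in which 90% of cells express CD3E and 5% express CD19 passes quality control and is predicted as "T cell" under these assumptions.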
The Cell Search Engine is designed to help you find cell populations in the BioTuring public database whose transcription profiles are similar to those of your selected cells. It suggests the cell type, signature genes, and enriched processes of the selected group.
The server will return results in a pop-up window, including:
This is a beta version of the function. We are looking forward to your thoughts and comments to improve it for future release. Click on “Share your thoughts on this function” to leave your feedback.
All cell search queries are automatically saved in the Cell search tab, where you can reopen, rename, reselect, or delete them.
Finding marker genes/proteins and enriched processes in a group of cells helps you see the genes/proteins and processes that are differentially expressed in the selected group compared to the rest of the cell population. This information is essential for determining which cell type the cluster belongs to. To run the analysis:
Details on the markers and enrichment analysis include:
This section shows how to find markers for every group in your current metadata with a single click and create a heatmap for them.
BBrowser will ask you how many top marker genes you want to add to the gallery. Just name your gallery (for example, “Top 10 marker genes of every cluster”).
You can choose to pick only gene markers or only protein markers. After that, select the number of top gene or protein markers for each group that you want to add to the gallery. You can use the default option, which is to create a gallery for the top 10 markers of each cluster.
You can click on each picture in the Gallery to query the top 10 markers for each group.
To see the full list of gene or protein markers for each group, select that group on the Metadata tab and scroll to the Marker Features tab to view them.
You can add multiple annotations to a cell, regarding cell type, subtype, expression level of a gene or set of genes or clonotype, etc. There are 2 ways to add an annotation:
Click OK to implement.
– All batches: import annotations for all the batches using 1 single annotation file.
– Each batch: import annotations for each batch using separate annotation files.
For numeric metadata, such as mitochondrial percentage, ages, treatment time, number of genes expressed, and pseudotime results, BBrowser visualizes them in a histogram.
After an annotation is added, you can rename clusters, merge 2 clusters, or delete a cluster or the whole group.
BBrowser supports cellular composition analysis for any group of cells, whether annotated or not. You define the group of cells whose composition you want to view and the classification to use. The software then calculates the percentage of each cluster from the chosen classification within the selected cells and sorts the clusters from the largest share to the smallest.
We recommend the Normalized by total tool to reduce the bias caused by unbalanced sample sizes.
See below for an example. The total number of cells from the disease group is double that from the non-disease group. Therefore, if you look at the percentages of disease and non-disease cells in Macrophage, the disease percentage is likely to dominate simply because of the larger sample, which gives a biased picture of the macrophage proportions in the two groups. After applying the Normalized by total tool, the result shows the cell composition without the bias coming from sample size.
The normalization is defined as follows:
Total number of cells from group 1 (Disease): A
Total number of cells from group 2 (Non-disease): B
Number of cells from group 1 (Disease) in the selected population (Macrophage): a
Number of cells from group 2 (Non-disease) in the selected population (Macrophage): b
Normalized percentage of group 1 in the selected population:
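To make the effect of normalization concrete, here is a minimal sketch using the symbols A, B, a, and b defined above. It assumes the normalized percentage of group 1 is (a/A) / (a/A + b/B) × 100, which is one natural reading of "normalized by total" and not a confirmed BBrowser formula. The function names and cell counts are hypothetical.

```python
def raw_percentages(a, b):
    """Percentages of each group in the selected population, without normalization."""
    total = a + b
    return 100.0 * a / total, 100.0 * b / total


def normalized_percentages(a, b, A, B):
    """Percentages after dividing each count by its group's total size.

    ASSUMPTION: normalized share of group 1 = (a/A) / (a/A + b/B).
    """
    if A <= 0 or B <= 0:
        raise ValueError("group totals A and B must be positive")
    rate_1 = a / A  # fraction of group 1 that falls in the selected population
    rate_2 = b / B  # fraction of group 2 that falls in the selected population
    total = rate_1 + rate_2
    if total == 0:
        return 0.0, 0.0
    return 100.0 * (rate_1 / total), 100.0 * (rate_2 / total)


# Hypothetical counts: the disease group has twice as many cells overall.
A, B = 2000, 1000   # total cells in Disease / Non-disease
a, b = 200, 100     # Macrophage cells from Disease / Non-disease

print(raw_percentages(a, b))               # disease appears to dominate
print(normalized_percentages(a, b, A, B))  # equal shares once sample size is accounted for
```

In this example 10% of each group is Macrophage, so the normalized shares are equal even though the raw counts favor the disease group.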
Performing differential expression (DE) analysis on any two clusters helps you find the genes that differ between them and the processes associated with those genes. You can also run DE analysis in a sub-cluster using the same steps as in the main cluster (refer to Section 16_Sub-Clustering).
You can run DE analysis on 2 clusters in the same annotation in the Composition panel if you don’t select any cell population in the Metadata tab.
You can run DE analysis on any 2 selected groups of cells.
BBrowser offers 6 methods to run DE analysis: our in-house algorithm Venice and 5 differential expression analysis algorithms from the Seurat package: Wilcoxon, Likelihood-ratio test, T-test, Poisson, and Logistic regression.
To choose a method, go to Settings > Analysis > Differential expression analysis.
After you run the DE analysis on two clusters of interest, the software opens the DE dashboard, which shows the differentially expressed genes in a volcano plot of all genes, a box plot of the expression of a single gene, a table of genes and enriched processes, and a scatter plot of the cells in the two clusters.
If you click on a gene in the volcano plot or the table, the scatter plot will show the selected gene’s expression. You can also query the expression of a specific gene by typing the gene name into the box at the top right.
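For readers unfamiliar with volcano plots: each gene is conventionally placed at x = log2 fold change and y = -log10(p-value), so strongly and significantly changed genes sit at the upper left and upper right. The sketch below computes these coordinates for a few genes. The pseudocount, the p-value floor, the function name, and the numbers are all assumptions for illustration; how BBrowser computes fold changes and p-values depends on the DE method selected and is not reproduced here.

```python
import math


def volcano_point(mean_a, mean_b, p_value, pseudocount=1.0, min_p=1e-300):
    """Return (log2 fold change, -log10 p-value) for one gene.

    ASSUMPTIONS: the pseudocount avoids division by zero for unexpressed genes,
    and min_p keeps -log10 finite when a p-value underflows to 0.
    """
    log2_fc = math.log2((mean_a + pseudocount) / (mean_b + pseudocount))
    neg_log10_p = -math.log10(max(p_value, min_p))
    return log2_fc, neg_log10_p


# Hypothetical per-gene summaries: (mean in cluster 1, mean in cluster 2, p-value).
genes = {
    "GeneA": (7.0, 1.0, 0.001),
    "GeneB": (1.0, 3.0, 0.01),
    "GeneC": (2.0, 2.0, 0.5),
}

for name, (m1, m2, p) in genes.items():
    x, y = volcano_point(m1, m2, p)
    print(f"{name}: log2FC={x:.2f}, -log10(p)={y:.2f}")
```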
The DE dashboard toolbar (at the bottom right of the scatter plot):
DE analysis results are saved automatically right after the analysis finishes, so you do not have to run it again in the future. To review a DE analysis result, click on the Differential Expression panel > View previous results.
To rename or delete an analysis, click at its top right corner, then click Save/Confirm to save the change.
Sub-clustering is an advanced feature that takes out a group of cells and treats them as a new set of data. The software will calculate new principal components and dimensionality reduction results to plot the selected cells in a new scatter plot. They will also be re-clustered using the Louvain and k-means clustering methods.
Focusing on a subset of data with fewer cells than the original helps you identify more principal components, including components that are significant only to this group of cells. Therefore, you can further group the cells into smaller clusters with distinct expression profiles. This feature is suitable for analyzing clusters with large heterogeneity.
To run sub-clustering, first select a group of cells (refer to section 9_Select a cell population) and click on the Sub-clustering icon. Name the sub-cluster as you like and click on Apply.
Re-calculation for the sub-cluster usually takes a few minutes. After that, the Sub-clustering dashboard will open automatically.
The Sub-clustering dashboard is similar to the Analysis dashboard and can be used to query gene expression, find marker genes and enriched processes, study cellular composition, etc. but not differential expression analysis. A Mini map at the bottom left of the dashboard shows the main scatter plot with all cells of the sub-cluster highlighted in white.
To go back to the main Analysis dashboard, click on the name of the sub-cluster at the top left corner and choose Main cluster from the drop-down.
Adding an annotation in the Sub-clustering dashboard works the same way as in the Analysis dashboard.
First, select a group of cells, then click on Create an annotation and define the Group name and Cluster name, following section 13_Add an annotation.
An annotation created in the Sub-clustering dashboard is treated the same as one created in the main dashboard. Hence, you can view your sub-clusters in the main scatter plot or annotate sub-clusters in different sub-clustering dashboards under the same group name.
In a sub-cluster, you can run DE to compare any 2 groups, as in the main cluster. There are also two ways to run DE analysis in a sub-cluster: the Composition panel and the DE tab. For more details on the differential expression analysis, please see Section 15_Differential expression analysis.
Sequencing the TCR is a powerful instrument to dissect the complexity and diversity of the T cell response repertoire. By associating the TCR with gene expression, BBrowser can provide an unbiased classification of a population of interest and the association of the transcriptional landscape of each cell with its TCR.
Expand the Clonotype panel on the right side to show the Clonotype dashboard. All cells in the main scatter plot will turn gray and their spot size will decrease. A mini map will pop up showing the previous coloring of the scatter plot.
Now, you can add TCR sequencing data by clicking on Upload data.
TCR sequencing results can be imported as TSV or CSV files.
The input matrix must have enough information for typical V(D)J annotations. BBrowser only reads data from columns whose names appear in the list below. Columns that are not in this list will be ignored.
The software only keeps clonotypes that are both full_length and productive. The CDR3 amino acid sequence is mapped against VDJdb (Shugay et al. 2017) to retrieve information on relevant epitopes.
If your data comes from multiple batches, the TCR sequencing data should be submitted for each batch individually. In that case, clicking the Upload data button will show a pop-up where you select the input file for each batch.
Cells with a recognized TCR sequence will now be colored according to their clonotype and their spot size returns to normal. The cells will be highlighted and enlarged if you hover the mouse over the clonotype name. Details on the number of cells in each clonotype and relevant antigen information are displayed in a table.
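If you want to check in advance which rows of your own file would pass this filter, a minimal sketch is shown below. It assumes a CSV with `full_length` and `productive` columns holding the text values True/False (as in typical V(D)J contig annotation files); the `barcode` and `cdr3` column names and the sample rows are hypothetical, and the function is not part of BBrowser.

```python
import csv
import io


def keep_clonotype_rows(rows):
    """Keep only rows flagged as both full_length and productive."""

    def is_true(value):
        # Treat "True"/"true" as true; anything else (False, None, empty) as false.
        return str(value).strip().lower() == "true"

    return [
        row for row in rows
        if is_true(row.get("full_length")) and is_true(row.get("productive"))
    ]


# Hypothetical TCR annotation rows; replace io.StringIO with open("your_file.csv").
sample = io.StringIO(
    "barcode,full_length,productive,cdr3\n"
    "AAAC-1,True,True,CASSLGTDTQYF\n"
    "AAAG-1,True,False,CASSPGQGNEQFF\n"
    "AAAT-1,False,True,CASSQDRGYEQYF\n"
    "AACA-1,True,None,CASSLAGGYEQYF\n"
    "AACC-1,true,true,CASSIRSSYEQYF\n"
)

rows = list(csv.DictReader(sample))
kept = keep_clonotype_rows(rows)
print([row["barcode"] for row in kept])
```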
There are two ways to count clonotypes. You can switch between them using the Group by option:
You can view shared clonotypes as stacked bar charts in the Clonotype panel. The Groups column shows which groups each clonotype belongs to. (In the example, with the Patient metadata, the first clonotype (48 cells) exists in P1211 (48 cells) and the second clonotype (39 cells) exists in P1023 (39 cells).)
The Groups column will correspond to your current annotations in the metadata tab. (For example, if you want to view what Locations (Normal, Blood or Tumor) have a clonotype, change the annotation to Location.)
To view the full list of clonotypes in a bigger table, you can click on the small window button next to the export button.
BBrowser also lets you convert the clonotype information to annotations, so that you can run any analysis on different clonotypes, including marker gene detection, enrichment analysis, composition, and differential expression analysis.
To do this, click the Add to metadata button at the top of the Clonotype panel.
Table of export formats
BBrowser supports exporting graphs into .PNG or .SVG formats.
Export violin plot as SVG file from Search genes tab
Export box plots, violin plots and density as SVG files from Gene query expression panel
Export box plots and violin plot as SVG file from DE analysis dashboard
Export main scatter plots as PNG file
Export heatmap as PNG file when querying multiple genes
Export composition plot as PNG file
Export scatter plots of two groups as a PNG file from DE analysis dashboard
Export Volcano plot as PNG file showing DE genes.
BBrowser also supports exporting figures to BioVinci through the Export to BioVinci button to enable more flexible plot editing. (See the Table of export formats for more.)
In BioVinci, you can:
To learn how to import spatial data, see section 5_Get your data/Input Spatial transcriptomics data.
After you import spatial data (refer to section 5_Get your data), BBrowser lets you view both the t-SNE/UMAP and the spatial images of the dataset at the same time in multiple interactive windows. A selection of spots in one window is reflected in the other in real time.
You can also see how a gene or multiple genes are expressed and co-expressed on the spatial coordinates. Type the gene name, its Ensembl ID, or an alias into the gene query box at the top and press Enter.
For more information, please refer to Section 8_Query gene or protein expression
Violin plots, box plots, bar plots, 2D density, or heatmap are available for easier comparison of gene expression among layers. The options can be found below the color bar when you query genes.
For more information, please refer to Section 8_Query gene or protein expression
BBrowser also lets you create a heatmap of the top marker genes for all clusters/layers.
For more information, please refer to Section 12_Find marker genes or proteins and enriched processes
You can also study enriched processes in a layer by using the Enrichment Analysis option.
For more information, please refer to Section 12_Find marker genes or proteins and enriched processes
The differential expression analysis lets you compare any two layers and find the differentially expressed genes. The example shows the differentially expressed genes between cluster 1 and cluster 2, including Mef2c, Stmn1, Arpp19, Snap25…
For more information, please refer to section 15_Differential expression (DE) analysis
You also need to include the operating system where this application was installed and used to generate the results. We highly recommend that you provide step-by-step instructions to perform the analysis in BioTuring Browser so that other people can easily reproduce your work. For parameters and technical details, please refer to our documentation.
You can use the following examples as a guide:
“Sequencing data (.fastq) was aligned and quantified by Hera-T (Tran et al.) based on the human GRCh37 reference. The count matrix was processed to filter out low-quality cells having fewer than 200 genes or a mitochondrial gene percentage higher than 10%. All processing steps were done by BioTuring Browser version x.x.x for Mac OS developed by BioTuring Inc., San Diego California USA, www.bioturing.com”
“Preprocessed data (.rds) was submitted to BioTuring Browser (version x.x.x for Mac OS, developed by BioTuring Inc., San Diego California USA) for visualization and downstream analyses. The differential expression analysis was performed by Venice (BioTuring Inc.)”
“Data analyzed by/Plot generated by BioTuring Browser, ver x.x.x”
[MLA] BioTuring Inc. BioTuring Browser: A Platform for Single-Cell Data Analysis. Version 2.6.0, https://bioturing.com/bbrowser, 2020.
[AMA] BioTuring Browser: a platform for single-cell data analysis [Computer software]. Version: 2.6.0. https://bioturing.com/bbrowser. BioTuring Inc; 2020.
[APA] BioTuring Inc. (2020). BioTuring Browser: a platform for single-cell data analysis (Version 2.6.0). [Computer Program]. Retrieved from: https://bioturing.com/bbrowser.
[Chicago] BioTuring Inc. BioTuring Browser: A Platform for Single-Cell Data Analysis. V. 2.6.0. https://bioturing.com/bbrowser. 2020.
[Harvard] BioTuring Inc (2020) BioTuring Browser (Version 2.6.0) [Computer Program]. Available at: https://bioturing.com/bbrowser.
For additional information on citing BioTuring Browser, please contact the BioTuring support team.
Human brain organoids reveal accelerated development of cortical neuron classes as a shared feature of autism risk genes
Bruna Paulsen, Silvia Velasco, Amanda J. Kedaigle, Martina Pigoni, Giorgia Quadrato, Anthony Deo, Xian Adiconis, Ana Uzquiano, Kwanho Kim, Sean K. Simmons, Kalliopi Tsafou, Alex Albanese, Rafaela Sartore, Catherine Abbate, Ashley Tucewicz, Samantha Smith,…
BioRxiv, 376509. https://doi.org/10.1101/2020.11.10.376509
Published: 12 Nov 2020
The balance of stromal BMP signaling mediated by GREM1 and ISLR drives colorectal carcinogenesis
Hiroki Kobayashi, Krystyna A. Gieniec, Josephine A. Wright, Tongtong Wang, Naoya Asai, Yasuyuki Mizutani, Tadashi Ida, Ryota Ando, Nobumi Suzuki, Tamsin RM. Lannagan, Jia Q. Ng, Akitoshi Hara, Yukihiro Shiraki, Shinji Mii, Mari Ichinose, Laura Vrbanac, Matthew J. Lawrence, Tarik Sammour, Kay Uehara,…
Gastroenterology, 011. https://doi.org/10.1053/j.gastro.2020.11.011
Accepted Date: 9 November 2020
Comparison of visualization tools for single-cell RNAseq data
Batuhan Cakir, Martin Prete, Ni Huang, Stijn van Dongen, Pinar Pir, Vladimir Yu Kiselev
NAR Genomics and Bioinformatics, Volume 2, Issue 3, September 2020, lqaa052, https://doi.org/10.1093/nargab/lqaa052
Published: 29 July 2020
Hemolysis transforms liver macrophages into antiinflammatory erythrophagocytes
Marc Pfefferlé, Giada Ingoglia, Christian A. Schaer, Ayla Yalamanoglu, Raphael Buzzi, Irina L. Dubach, Ge Tan, Emilio Y. López-Cano, Nadja Schulthess, Kerstin Hansen, Rok Humar, Dominik J. Schaer, and Florence Vallelian
J Clin Invest. 2020;130(10):5576–5590. https://doi.org/10.1172/JCI137282.
Published: September 14, 2020
The Chemical Biology of Long Noncoding RNAs
Stefan Jurga, Jan Barciszewski (Editors)
Springer Nature Switzerland AG, RNA Technologies 11, p.122, https://doi.org/10.1007/978-3-030-44743-4
Pancreatic Cancer Cells Require the Transcription Factor MYRF to Maintain ER Homeostasis
Marta Milan, Chiara Balestrieri, Gabriele Alfarano, Sara Polletti, Elena Prosperini, Paola Nicoli, Paola Spaggiari, Alessandro Zerbi, Vincenzo Cirulli, Giuseppe R. Diaferia, and Gioacchino Natoli
Developmental Cell 55, 1–15, https://doi.org/10.1016/j.devcel.2020.09.011
Published: 23 Nov 2020
In silico immune infiltration profiling combined with functional enrichment analysis reveals a potential role for naïve B cells as a trigger for severe immune responses in the lungs of COVID-19 patients
Yi-Ying Wu, Sheng-Huei Wang, Chih-Hsien Wu, Li-Chen Yen, Hsing-Fan Lai, Ching-Liang Ho, Yi-Lin Chiu
PLoS ONE 15(12): e0242900, https://doi.org/10.1371/journal.pone.0242900
Published: December 2, 2020
Identification of a subset of immunosuppressive P2RX1-negative neutrophils in pancreatic cancer liver metastasis
Xu Wang, Li-Peng Hu, Wei-Ting Qin, Qin Yang, De-Yu Chen, Qing Li, Kai-Xia Zhou, Pei-Qi Huang, Chun-Jie Xu, Jun Li, Lin-Li Yao, Ya-Hui Wang, Guang-Ang Tian, Jian-Yu Yang, Min-Wei Yang, De-Jun Liu, Yong-Wei Sun, Shu-Heng Jiang, Xue-Li Zhang & Zhi-Gang Zhang
Nature Communications, 12, Article number: 174, https://doi.org/10.1038/s41467-020-20447-y
Published: January 08, 2021
We recommend using a computer with 16 GB RAM for data with more than 100,000 cells or data processed from FASTQ files. However, on a computer with 8 GB RAM, you can still open large Seurat objects if they are fully processed with PCA and dimensionality reduction results (tested with a 300,000-cell object). If you want to submit count matrices, 8 GB RAM can smoothly process data of 30,000 cells.
If you are using a server with a proxy, this message might come up when you try to log in to the software, because the proxy connection to the BioTuring server cannot be made to verify your credentials. Please click on Proxy settings at the bottom of the login screen and configure your server.
You can import FASTQ, MTX, TSV, CSV, .H5, .H5AD, and .RDS files to BBrowser.
For details about the structure of each file, please refer to section 4.
It depends on the format and structure of the file.
If the file fulfills all the requirements of the software, you can import it to BBrowser.
Otherwise, if the author of the study is willing to share their annotations, the BioTuring team would be happy to consider hosting the data on our platform and will index the data based on our standard process.
Since we cannot obtain all the parameters of the data processing steps from the authors, for some steps, our default parameters may be different from those of the authors.
To combine multiple datasets, first, you need to make sure they are in the same format (MTX, TSV or CSV).
After that, open BBrowser > Data > Add new study to import all files, select your method for batch correction and name the study, then click Start to run the processing.
BBrowser supports exporting multiple graphs: scatter plots, box plots, violin plots, etc. in either SVG or PNG format with a fixed design and layout.
If you want to customize the color of the graph, go to Settings > Visualization and change the color scale there.
An alternative is to export the data behind the graph as a TSV file and rebuild the graph with your preferred tools outside BBrowser. The BioTuring team also offers a drag-and-drop data visualization tool called BioVinci.
To compare gene expression across different clusters, first choose the annotation with the clusters you are interested in. Then, type the gene name or Ensembl ID into the gene query box and press Enter to query the gene expression. Click on the arrow at the bottom of the color scale to extend the box and click on the Plot button to generate a box plot of gene expression across the clusters.
Azizi, E., Carr, A. J., Plitas, G., Cornish, A. E., Konopacki, C., Prabhakaran, S., … & Choi, K. (2018). Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell, 174(5), 1293-1308.
Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. “Integrating single-cell transcriptomic data across different conditions, technologies, and species.” Nature biotechnology 36, no. 5 (2018): 411.
Consortium, Gene Ontology. 2004. “The Gene Ontology (GO) Database and Informatics Resource.” Nucleic acids research 32(suppl_1): D258–D261.
Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal, Complex Systems 1695(5): 1–9.
Gribov, Alexander et al. 2010. “SEURAT: Visual Analytics for the Integrated Analysis of Microarray Data.” BMC medical genomics 3(1): 21.
Haghverdi, Laleh, Aaron T L Lun, Michael D Morgan, and John C Marioni. 2018. “Batch Effects in Single-Cell RNA-Sequencing Data Are Corrected by Matching Mutual Nearest Neighbors.” Nature biotechnology 36(5): 421.
Joshi-Tope, G et al. 2005. “Reactome: A Knowledgebase of Biological Pathways.” Nucleic acids research 33(suppl_1): D428–D432.
Korsunsky, Ilya, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-Ru Loh, and Soumya Raychaudhuri. “Fast, sensitive, and flexible integration of single cell data with Harmony.” BioRxiv (2018): 461954.
Korthauer, K. D., Chu, L. F., Newton, M. A., Li, Y., Thomson, J., Stewart, R., & Kendziorski, C. (2016). A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome biology, 17(1), 222.
Krijthe, J H. 2015. “Rtsne: T-Distributed Stochastic Neighbor Embedding Using Barnes-Hut Implementation.” R package version 0.13, URL https://github.com/jkrijthe/Rtsne.
Love, Michael I, Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome biology 15(12): 550.
Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using T-SNE.” Journal of machine learning research 9(Nov): 2579–2605.
McInnes, Leland, and John Healy. 2018. “Umap: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv preprint arXiv:1802.03426.
Melville, James. 2018. “Uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction.” https://github.com/jlmelville/uwot.
Robinson, Mark D, Davis J McCarthy, and Gordon K Smyth. 2010. “EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26(1): 139–40.
Shugay, M., Bagaev, D. V., Zvyagin, I. V., Vroomans, R. M., Crawford, J. C., Dolton, G., … & Eliseev, A. V. (2017). VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic acids research, 46(D1), D419-D427.
Subramanian, Aravind et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102(43): 15545–50.
Tran, Thang, Thao Truong, Hy Vuong, and Son Pham. 2019. “Hera-T: An Efficient And Accurate Approach For Quantifying Gene Abundances From 10X-Chromium Data With High Rates Of Non-Exonic Reads.” doi:10.1101/530501.
Wang, T., Li, B., Nelson, C. E., & Nabavi, S. (2019). Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC bioinformatics, 20(1), 40.