BioTuring Cell Search: a new tool to search for similar populations in public single-cell data sets - BioTuring's Blog

    by biomembers • January 15, 2020 January 22, 2020
    cell type prediction, single cell analysis software, single cell rna seq data analysis, single-cell data, single-cell RNA analysis

    A machine learning model for cell type classification?

    When analyzing single-cell transcriptomic data, scientists often annotate cell types by checking individual marker genes. However, marker genes are not even consistent across literature sources. Six months ago, armed with the largest curated single-cell transcriptomic data set, the BioTuring single-cell team naively thought that we could solve the cell type annotation problem simply by building a machine learning model to predict cell types. We also thought that such a model could help scientists recognize not only cell types, but also cell states and cell conditions (disease, control, etc.).

    We kickstarted this project, and our initial excitement quickly turned into nightmares…

    What were the reasons?

    • Annotations in published studies are not consistent. Different research groups can give the same cell different labels, from a general cell type to a very specific subtype, depending on their research goals or even their opinions.
    • Many rare cell populations do not have enough data points for learning. For example, the 30 new AXL+SIGLEC6+ dendritic cells identified by Villani and colleagues (Villani et al., 2017) would be very difficult to incorporate into any machine learning model built from millions of cells of other, more common types.

    We failed at building a machine learning model for predicting cell types, and six months of enormous effort by a team of four engineers could have been wasted!

    In desperation, we thought we were probably not smart enough!

    Are we dumb?

    Or is the problem we initially formulated intrinsically difficult?

    When we are desperately stuck, there are usually moments when we ask ourselves: how would some of our former teachers and advisors solve this problem? A great example came to mind: when Mike Waterman and Pavel Pevzner faced the difficult Hamiltonian path problem in genome assembly, they reformulated it as an Eulerian path problem, which can be solved efficiently (https://www.pnas.org/content/98/17/9748.short).
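
    To make that reformulation concrete, here is a minimal sketch of the Eulerian idea on toy data. It is an illustration only, not the assembler from the cited paper: the function names and the eight 3-mers are made up for this example. Each k-mer becomes an edge between its prefix and suffix, and Hierholzer's algorithm walks every edge exactly once.

```python
from collections import defaultdict


def de_bruijn_edges(kmers):
    """Turn each k-mer into an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: use every edge exactly once, in linear time."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            in_deg[node] += 1
    # Start at a node with one more outgoing than incoming edge, if there is one.
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(kmers):
    """Spell the sequence along the Eulerian path through the k-mer graph."""
    path = eulerian_path(de_bruijn_edges(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy input (hypothetical): every 3-mer of a short sequence, in arbitrary order.
reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(assemble(reads))  # one valid reconstruction that uses each 3-mer once
```

    A graph can have more than one Eulerian path, so the printed sequence is one valid reconstruction rather than the only one.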

    Reformulating the problem as a cell search problem

    Given the largest indexed single-cell data set, we imagined that when a scientist selects a group of cells, a cell search engine could find all cells in all published studies that have “similar” expression signatures, together with their cell type labels. An important difference between the prediction model and the search problem is that the former takes subjective human annotations as input, while the latter does not. The search engine returns the existing annotations alongside the matched cells, so that human annotations can be verified by humans! This helps bypass both challenges above: inconsistent labels are no longer training input, and rare populations only need to be indexed, not learned.
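
    The search formulation can be sketched in a few lines. This is a toy illustration under our own assumptions, not the method behind BioTuring Cell Search (which is not described in this post): each population is summarized by its mean expression profile and compared to the query with cosine similarity. The studies, labels, and expression values below are all hypothetical. Note that the labels are never used to compute similarity; they are only returned next to each match.

```python
import math


def centroid(cells):
    """Mean expression per gene across a group of cells (rows = cells)."""
    n = len(cells)
    return [sum(values) / n for values in zip(*cells)]


def cosine(a, b):
    """Cosine similarity between two expression profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_cells, reference, top_k=3):
    """Rank reference populations by similarity to the selected cells."""
    q = centroid(query_cells)
    scored = [
        (cosine(q, centroid(cells)), study, label)
        for (study, label), cells in reference.items()
    ]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical reference index: (study, published label) -> cells x genes.
reference = {
    ("study_A", "microglia"): [[8, 9, 0, 0], [10, 7, 1, 0]],
    ("study_B", "brain macrophage"): [[7, 7, 0, 1]],
    ("study_C", "T cell"): [[0, 1, 9, 0], [1, 0, 8, 0]],
}
query = [[9, 8, 0, 1], [7, 8, 1, 0]]  # cells selected by the scientist

for score, study, label in search(query, reference):
    print(f"{score:.3f}  {study}  {label}")
```

    In this toy index the two best matches carry different published labels, which is exactly the situation where the scientist, not a model, decides which annotation to trust.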

    After getting results from a search, scientists can download the matched cells and see all other labels attached to them. These may include age, disease, and tumor/normal status. For instance, wouldn't it be interesting to see that a group of microglia appears only in Parkinson's patients and not in healthy individuals?

    An important challenge for this cell search engine is that it has to see past technical variation (for example, cells from different biological conditions that look alike because they were sequenced with similar technologies) and return only the cells that match biologically. We successfully solved this problem (details will be described in a future manuscript).

    BioTuring Cell Search: a novel search engine for single-cell RNA-seq data

    Our team built and launched BioTuring Cell Search, a search engine that enables quick and accurate searching of similar cells across our single-cell database of 5 million cells, curated from more than 125 publications. Upon the selection of a group of cells, scientists will get:

    • A list of published studies with matched populations and their cell type labels
    • Similarity scores between the gene expression profiles of matched populations and the selected cells
    • Similarly expressed genes and enriched processes shared across all matched populations

    Based on the search results, scientists can download the data sets with matched cells, study their states, conditions, compositions, and other annotations, and finally come back to annotate their cells at their own discretion.
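
    As a rough illustration of the third output, shared genes could be found by intersecting the most highly expressed genes of each matched population. This is a simplified sketch under our own assumptions, not how BioTuring Cell Search computes its gene list. COL1A2 and COL3A1 are the genes named in the liver example later in this post; GENE_X, GENE_Y, GENE_Z, the study name, and all expression values are made up.

```python
def top_genes(profile, n):
    """Names of the n most highly expressed genes in a {gene: mean} profile."""
    return set(sorted(profile, key=profile.get, reverse=True)[:n])


def shared_top_genes(populations, n=3):
    """Genes that rank in the top n of every population."""
    gene_sets = [top_genes(profile, n) for profile in populations.values()]
    return set.intersection(*gene_sets)


# Toy mean-expression profiles (values invented for illustration).
matched = {
    "selected cells": {
        "COL1A2": 9.1, "COL3A1": 8.7, "GENE_X": 6.0, "GENE_Z": 0.2,
    },
    "fibroblasts (hypothetical study)": {
        "COL1A2": 8.5, "COL3A1": 9.0, "GENE_Y": 7.0, "GENE_Z": 0.1,
    },
}

print(sorted(shared_top_genes(matched, n=3)))
```

    A real implementation would more likely use differential expression statistics than a raw top-n cutoff, but the intersection step conveys the idea.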

    Example: Using BioTuring Cell Search to verify cell type identification results

    Dataset 1: Single-cell profiling identifies myeloid cell subsets with distinct fates during neuroinflammation (Jordão et al., 2019)

    Profiling more than 3,000 myeloid cells from the central nervous system (CNS) in mouse models of multiple sclerosis, the study provided an atlas of myeloid cells and their dynamics across various stages of neuroinflammation. Major cell types were identified, including microglia, lymphocytes, CNS-associated macrophages, dendritic cells, granulocytes, and monocyte-derived cells.

    BioTuring Cell Search results on each cluster of the data confirm the cell types recognized by the study. 

    • Single-cell profiling identifies myeloid cell subsets with distinct fates during neuroinflammation (Jordão et al., 2019), visualized in BioTuring Browser
    • Selecting the monocyte population and performing Cell Search
    • The monocytes in this study resemble 14 populations across the BioTuring database.
    • Similar populations are ordered by similarity score.
    • The results come with a list of similar genes and enriched processes shared by all populations.
    • Searching for populations with similar expression profiles to the microglia cluster

    Dataset 2: Single-cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations (MacParland et al., 2018)

    Published in 2018, the work by MacParland and colleagues is one of the very first human liver cell atlases, offering new insight into the cellular heterogeneity of the liver.

    With BioTuring Cell Search, we sought to verify the cell types in the dataset. Most labels match previous publications, including B cells, hepatocytes, endothelial cells, plasma cells, and natural killer cells.

    Other cell types, such as cholangiocytes and macrophages, match various populations in the database, but under different cell type labels. Stellate cells, meanwhile, show some similarity to fibroblasts. This population highly expresses the collagen-encoding genes COL1A2 and COL3A1.

    • Performing Cell Search on the stellate population from MacParland et al. (2018). The stellate cells show some degree of similarity to fibroblasts.

    BioTuring Cell Search can now be used with BioTuring Browser, an intuitive platform for exploring single-cell transcriptomic data. The platform is available for download at https://bioturing.com. Cell Search can also be called via API.

    —

    References:

    Jordão, Marta Joana Costa, et al. "Single-cell profiling identifies myeloid cell subsets with distinct fates during neuroinflammation." Science 363.6425 (2019): eaat7554.

    MacParland, Sonya A., et al. "Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations." Nature Communications 9.1 (2018): 4383.

    Pevzner, Pavel A., Haixu Tang, and Michael S. Waterman. "An Eulerian path approach to DNA fragment assembly." Proceedings of the National Academy of Sciences 98.17 (2001): 9748-9753.

    Villani, Alexandra-Chloé, et al. "Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors." Science 356.6335 (2017): eaah4573.
