Algorithm Expands What Scientists Can Discover From Single Cells


There are hundreds of transcription factors inside human cells, and years of experimental research by trial and error may be required to identify which are turned on or off in specific cell types. This is important information because these proteins could be used as potential drug targets.

Previous methods developed to infer the activities of transcription factors were based on the expression of the messenger RNA (mRNA) encoding those transcription factors. These strategies may not reflect true biological function, because the activity of transcription factors is often regulated at the post-translational level and key changes in transcription factors may not be detectable at the mRNA level.

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying transcriptomic heterogeneity between cells. However, elucidating the underlying biological functions and regulatory mechanisms of cells (i.e., the activities of transcription factors or gene regulatory networks) from these data is difficult because the data are hard to integrate with their biological context.

Most applications of scRNA-seq have focused on clustering, which aims to identify groups of cells based on the proximity of individual cells in a low-dimensional space. However, this analysis does not take biological function into consideration and does not help researchers discover regulatory mechanisms in subpopulations of cells.

“One of the challenges in the field is that the same genes can be turned on in a group of cells but turned off in a different group of cells within the same organ,” said Dr. Jalees Rehman, a professor at the University of Illinois at Chicago medical school. “Being able to understand the activity of transcription factors in individual cells would allow researchers to study activity profiles in all major cell types of major organs such as the heart, brain or lungs.”

Improved scRNA-seq analysis

To address the limitations of previous scRNA-seq analysis strategies, a team of researchers from the University of Illinois at Chicago led by Rehman and Yang Dai, PhD, associate professor of bioinformatics at the university, developed the Bayesian inference transcription factor activity model (BITFAM).

The system combines new gene expression profile data collected from scRNA-seq with existing biological data on the target genes of transcription factors. The researchers matched each transcription factor to its predicted target gene set, obtained from the GTRD database of chromatin immunoprecipitation sequencing (ChIP-seq) data, which contains more than 17,000 transcription factor samples.
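As a rough illustration of this matching step, the sketch below turns a mapping from transcription factors to their predicted target genes into a binary matrix with one row per gene and one column per transcription factor. The gene names, the target sets and the function name `build_target_matrix` are invented for the example; they are not taken from GTRD or from the BITFAM code.

```python
def build_target_matrix(tf_targets, genes):
    """Return (matrix, tf_names).

    matrix[g][t] is 1 if genes[g] is a predicted target of tf_names[t], else 0.
    """
    tf_names = sorted(tf_targets)
    matrix = [
        [1 if gene in tf_targets[tf] else 0 for tf in tf_names]
        for gene in genes
    ]
    return matrix, tf_names


# Hypothetical target sets (placeholders, not real ChIP-seq results).
tf_targets = {
    "TAL1": {"GeneA", "GeneB", "GeneC"},
    "GATA1": {"GeneB", "GeneD"},
}
genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]

matrix, tf_names = build_target_matrix(tf_targets, genes)
for gene, row in zip(genes, matrix):
    print(gene, dict(zip(tf_names, row)))
```

A matrix of this shape is one plausible way to hand prior knowledge about targets to a downstream model, since a zero entry can be read as "this gene is not expected to respond to this factor".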

A schematic overview of the BITFAM machine learning system developed by UIC researchers. User-supplied sequencing data (“normalized scRNA-Seq gene expression”) and existing data on transcription factor binding sites (“TF-target ChIP-seq gene array”) are analyzed to predict transcription factor activity (“inferred TF activity”) which can be used for a wide variety of assays. Image courtesy of Genome Research. Licensed CC BY 4.0.

The model integrates prior biological knowledge (ChIP-seq) with the observed data to infer the activities of transcription factors in cell subpopulations. With this information, the system runs numerous computer simulations to find the best fit and predict the activity of each transcription factor in the cell.
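The idea of combining a prior target map with observed expression can be sketched with a much simpler stand-in for the Bayesian model: a masked matrix factorization fitted by plain gradient descent. This is not BITFAM's inference procedure. It assumes expression is approximated by per-cell activities multiplied by per-gene weights, and that weights are allowed to be nonzero only where the prior marks a gene as a target. All names and the toy data are hypothetical.

```python
import random


def fit_masked_factorization(expr, mask, steps=2000, lr=0.01, seed=0):
    """Fit expr[c][g] ~ sum_t activity[c][t] * weight[g][t].

    expr: cells x genes expression values.
    mask: genes x TFs, 1 where the gene is a predicted target of the TF.
    Weights stay at zero wherever mask is 0. Returns (activity, weight, losses).
    """
    rng = random.Random(seed)
    n_cells, n_genes, n_tfs = len(expr), len(mask), len(mask[0])
    activity = [[rng.uniform(0.1, 1.0) for _ in range(n_tfs)]
                for _ in range(n_cells)]
    weight = [[rng.uniform(0.1, 1.0) * mask[g][t] for t in range(n_tfs)]
              for g in range(n_genes)]
    losses = []
    for _ in range(steps):
        # Residual between the current reconstruction and the data.
        resid = [[sum(activity[c][t] * weight[g][t] for t in range(n_tfs))
                  - expr[c][g] for g in range(n_genes)]
                 for c in range(n_cells)]
        losses.append(0.5 * sum(r * r for row in resid for r in row))
        # Gradients of half the squared error.
        grad_a = [[sum(resid[c][g] * weight[g][t] for g in range(n_genes))
                   for t in range(n_tfs)] for c in range(n_cells)]
        grad_w = [[sum(resid[c][g] * activity[c][t] for c in range(n_cells))
                   for t in range(n_tfs)] for g in range(n_genes)]
        for c in range(n_cells):
            for t in range(n_tfs):
                activity[c][t] -= lr * grad_a[c][t]
        for g in range(n_genes):
            for t in range(n_tfs):
                # The mask keeps non-target weights fixed at zero.
                weight[g][t] -= lr * grad_w[g][t] * mask[g][t]
    return activity, weight, losses


# Toy data: genes 0-1 are targets of TF 0, genes 2-3 are targets of TF 1.
mask = [[1, 0], [1, 0], [0, 1], [0, 1]]
# Cells 0-1 express the TF 0 targets, cells 2-3 express the TF 1 targets.
expr = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]

activity, weight, losses = fit_masked_factorization(expr, mask)
for c, row in enumerate(activity):
    print("cell", c, [round(a, 2) for a in row])
```

In this toy setup, the cells that express a factor's targets end up with the larger inferred activity for that factor, which is the qualitative behavior the article describes. A Bayesian treatment would additionally place priors on the weights and activities and report uncertainty.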

They applied the model to several scRNA-seq datasets, including a mouse dataset that contains information on all major organs during adult homeostasis, a blood cell development dataset, and a CRISPR interference dataset with targeted deletions of 50 transcription factors.

In these settings, BITFAM was able to infer biologically meaningful activities for selected well-established transcription factors with known biological functions. For example, it predicted strong activity of T-cell acute lymphocytic leukemia protein 1 (TAL1) in pulmonary endothelial cells, consistent with previous knowledge that TAL1 is an important factor in activating endothelial genes.

“Our approach not only identifies significant activities of transcription factors, but also provides valuable information on the regulatory mechanisms of underlying transcription factors,” said lead author Shang Gao, a doctoral student in the bioengineering department of the University of Illinois at Chicago. “For example, if 80% of the targets of a specific transcription factor are activated inside the cell, this tells us that its activity is high. By providing data like this for every transcription factor in the cell, the model can give researchers a good idea which ones to consider first when exploring new drug targets to work on this type of cell.”
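Gao's 80% example can be read as simple arithmetic: the share of a factor's predicted targets that are switched on in a given cell. The snippet below computes that share for made-up gene names. It is only an illustration of the intuition, since BITFAM infers activity with a Bayesian model rather than with this ratio, and the function name is hypothetical.

```python
def fraction_of_targets_active(targets, active_genes):
    """Share of a transcription factor's targets that are active in a cell."""
    targets = set(targets)
    if not targets:
        return 0.0
    return len(targets & set(active_genes)) / len(targets)


# Hypothetical data: 4 of the 5 predicted targets are active in this cell.
targets = ["G1", "G2", "G3", "G4", "G5"]
active = {"G1", "G2", "G3", "G4", "G9"}

frac = fraction_of_targets_active(targets, active)
print(f"{frac:.0%} of targets active")
```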

The authors also noted that BITFAM could be used to discover new heterogeneous subpopulations with subtle phenotypic differences driven by the regulation of transcription factors.

“This new approach could be used to develop key biological hypotheses regarding regulatory transcription factors in cells, relevant to a wide range of scientific questions and subjects,” Dai said. “This will allow us to better understand the biological functions of cells in many tissues.”

The researchers said the new system is publicly available and could be widely applied, because the model can easily be combined with additional analytical methods that may be best suited to a given study, such as finding new drug targets.

Copyright © 2021
