How to Accelerate Trusted and Innovative Responses with Expert-Curated Biomedical and Clinical Data

With the scientific research community publishing over two million peer-reviewed papers each year since 2012 (1) and next-generation sequencing fueling an explosion of data, the need for comprehensive yet accurate information that is reliable and ready for analysis on the path to biomedical discovery is now more urgent than ever.

Manual curation has become an essential requirement in the production of such data. Data scientists spend about 80% of their time collecting, cleaning, and processing data, leaving less than 20% of their time for analyzing data to generate insights (2,3). But manual curation is not just time-consuming. It is also expensive and difficult to scale.

At QIAGEN, we take care of the manual curation so that researchers like you can focus on the discoveries. Our human-certified data lets you focus on generating insights rather than collecting data. QIAGEN has been organizing biomedical and clinical data for more than 25 years. We have made massive investments in a biomedical and clinical knowledge base that contains millions of manually reviewed findings from the literature, as well as information from commonly used third-party databases and omics dataset repositories. Thanks to our knowledge and databases, scientists can quickly and efficiently generate new high-quality hypotheses while using innovative and advanced approaches, including artificial intelligence.

Here are seven manual curation best practices followed by QIAGEN’s 200 dedicated curation experts, which we featured at the November 2021 Pistoia Alliance Event.

  1. Efficient yet in-depth information capture: The time available for each article is limited, so efficiency is imperative. All the essentials should be captured in a single reading. But because critical information may be distributed throughout the article, curators should read it fully to provide accurate results and context.
  2. Standardization: We use an ontology of over 2 million concepts and dozens of relationship types to capture information. Where possible, data is mapped to public identifiers to improve interoperability.
  3. Sorting: The selection of documents is fundamental for effective manual curation and helps to avoid reading articles that lack useful information. We developed a way to identify relevant sources using criteria such as novelty. We use automation to prioritize articles for manual curation and delivery workflows to orchestrate the work.
  4. Training: For consistency, we use curation protocols, training materials and editorial reviews developed in-house. Interns receive ongoing feedback for several months before transitioning to our production environment.
  5. Tools: Good curation tools are essential for accuracy and efficiency. Our tools, built in-house, ensure that we enter information consistently through guided forms, drop-down menus, constraints on time slots and other features.
  6. Revisions: Knowledge is constantly evolving and needs to be updated based on new evidence. Articles may become obsolete or subject to published corrections, and drug labels and guidelines undergo revisions. Our workflows handle all of these situations.
  7. Quality control: Our metrics measure accuracy, including quality control of curation tools, editor reviews, author error reviews, and database consistency checks.
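
As an illustration of the standardization step (item 2), here is a minimal Python sketch of mapping free-text gene mentions to public identifiers. It is not QIAGEN's tooling: the `GENE_SYNONYMS` table and the `map_to_identifier` and `standardize` functions are hypothetical names invented for this example, and the three NCBI Gene IDs stand in for an ontology of millions of concepts.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym table: lower-cased surface forms -> NCBI Gene ID.
# A real ontology would be far larger; these entries are only illustrative.
GENE_SYNONYMS: Dict[str, str] = {
    "tp53": "NCBIGene:7157",
    "p53": "NCBIGene:7157",
    "egfr": "NCBIGene:1956",
    "erbb1": "NCBIGene:1956",
    "brca1": "NCBIGene:672",
}


def map_to_identifier(term: str) -> Optional[str]:
    """Return a public identifier for a free-text gene mention, or None if unmapped."""
    return GENE_SYNONYMS.get(term.strip().lower())


def standardize(terms: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split terms into a mapped dict (term -> identifier) and an unmapped list."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for term in terms:
        identifier = map_to_identifier(term)
        if identifier is None:
            unmapped.append(term)  # left for a human curator to resolve
        else:
            mapped[term] = identifier
    return mapped, unmapped
```

In this sketch, anything that cannot be mapped is returned separately rather than guessed at, so a curator can resolve it by hand.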

These principles ensure that our knowledge base and integrated ‘omics database provide timely, highly accurate, reliable and analysis-ready data. In our experience, 40% of public omics datasets include typos or other potentially critical errors in an essential element (cell lines, treatments, etc.); 5% require us to contact authors to resolve inconsistent terms, mislabeled treatments or infections, inaccurate sample groups, or errors in mapping subjects to samples. Thanks to our rigorous manual curation processes, we can correct these errors.
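
To make the kind of metadata error described above concrete, the following Python sketch flags cell-line labels that look like typos of a known name. It is an assumption-laden illustration, not QIAGEN's process: the `KNOWN_CELL_LINES` list, the `check_cell_line` function, and the 0.8 similarity cutoff are all hypothetical choices, and the fuzzy matching uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical controlled vocabulary; a real one would hold thousands of entries.
KNOWN_CELL_LINES = ["HeLa", "HEK293", "MCF-7", "A549"]


def check_cell_line(label, known=KNOWN_CELL_LINES, cutoff=0.8):
    """Classify a label as 'ok', 'suspect' (with a suggested name), or 'unknown'."""
    # Compare case-insensitively, but report the canonical spelling.
    lookup = {name.lower(): name for name in known}
    key = label.strip().lower()
    if key in lookup:
        return ("ok", lookup[key])
    # Fuzzy match: the closest known name above the similarity cutoff, if any.
    matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if matches:
        return ("suspect", lookup[matches[0]])
    return ("unknown", None)
```

A label flagged as "suspect" is only a candidate for correction. In a manual curation workflow, a person would still confirm the suggestion against the publication or, where needed, with the authors.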

Our significant investment in high-quality manual curation means that scientists like you do not need to spend 80% of their time aggregating and cleaning data. We have scaled our rigorous manual curation procedures to collect and structure accurate and reliable information from many different sources, from journal articles to drug labels to omics datasets. In short, we accelerate your journey to comprehensive yet accurate data that is reliable and ready for analysis.

Are you ready to get your hands on reliable biomedical, clinical and omics data that we hand-picked using these best practices? Learn more about QIAGEN knowledge and databases and request a consultation to find out how our accurate and reliable data will save you time and get your questions answered quickly.

References:

  1. The 2018 STM Report: An Overview of Scholarly and Scientific Publishing.
  2. H. Sarih, A. P. Tchangani, K. Medjaher and E. Pere (2019) Preparation and pre-processing of data for monitoring broadcast systems in the PHM framework. 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), 1444–1449. DOI: 10.1109/CoDIT.2019.8820370
  3. From big data to good data: Andrew Ng urges the ML community to be more data-centric and less model-centric (06/04/2021)
