The challenges of open access data

The demand for publicly funded scientific research to be freely accessible to the public and the wider research community (including beyond academia, e.g. government departments and non-governmental organizations) has increased in recent years. Consistent with the goal of open-access science, funders of epidemiological cohort studies often require study teams to make anonymized datasets available for wider use.

The general benefits of open access need not be listed in detail here (for example, it democratizes access to data and potentially maximizes the use and impact of data). An in-house review by researchers and management of The Irish Longitudinal Study on Ageing, carried out to examine the nature and assess the quality of a random sample of publications based on its publicly archived dataset, demonstrated these benefits; however, it also identified serious new challenges associated with open access. We outline some of these challenges here because they require the attention of the scientific community.

First, the review found that some publications showed insufficient understanding of the characteristics of the source data or inadequately addressed issues of sampling methods and representativeness of cohorts or subsamples in analyses. These deficiencies have particular implications for epidemiology, where groups underrepresented in panel studies can often be the most affected by a condition.

More seriously, the review found instances of data misuse, which ultimately resulted in the retraction of three articles: two with erroneous data and one with fabricated data.[1]

These are the first cases of such misconduct detected in The Irish Longitudinal Study on Ageing; however, it became clear to the review team that open-access cohort study datasets could become a new target of so-called paper mills. Researchers investigating paper mills believe that only a fraction of fraudulent papers are actually detected and retracted.[2-5] This under-detection is the most significant emerging threat to open data.

Beyond such misconduct, advances in software will make the practice of HARKing (hypothesizing after the results are known, which fuels research proliferation) worse.

These challenges suggest the need to monitor access to data and the conduct of research, especially as participants who voluntarily provide their personal data do so in good faith, with a reasonable expectation that their data will be used to support research that meets standards of scientific integrity. Such monitoring has reputational, operational, and resource implications for the funders of the studies and for the research and management teams that maintain the cohort studies.

We make three specific suggestions: first, appropriate oversight by the data controller of the users of publicly available cohort data and the uses made of it, to provide a sustainable model that retains the benefits of open data while minimizing risks to scientific rigor and to the reputation of the study; second, prospective randomization of study team member involvement with data use applicants, to test whether such involvement improves outcomes without unduly straining study resources; and third, increased awareness among funders and custodians of the emerging challenges to research integrity with publicly available datasets, so that they work proactively and collaboratively to find solutions.

We declare no competing interests.

References

  1. Pro tip: When you claim to use a dataset, make sure it collects what you say it does. Retraction Watch.

  2. The full-service paper mill and its Chinese customers. For Better Science.

  3. Digital magic, or the dark arts of the 21st century – how can journals and peer reviewers detect manuscripts and publications from paper mills? FEBS Lett. 2020; 594: 583-589

  4. The Tadpole paper mill.

  5. The fight against fake-paper factories that churn out sham science. Nature. 2021; 591: 516-519
