Scientific research is increasingly shaped by biobanks: large repositories of health data and biological samples. These biobanks are essential resources, supplying the datasets behind medical research aimed at understanding and treating complex diseases. In our work, we examined more than 2,600 biobanks and cataloged their presence across scientific literature, patents, clinical trials, grants, and policy documents to better understand their broad impact. By tracing mentions and references across almost 230,000 documents, we identified the areas of science and innovation most frequently propelled by biobank data.
To deepen this understanding, we developed a metric, the Biobank Impact Factor (BIF), which moves beyond traditional measures of influence. Citation counts alone often fail to capture the true reach of a biobank, so BIF integrates additional dimensions, such as the diversity of diseases studied, the extent of public health applications, and the intensity of collaborative efforts, to provide a fuller picture of each biobank’s influence. We found that a biobank’s openness to external collaborators, along with the depth and quality of its datasets, particularly those linked to comprehensive medical records, is strongly associated with greater scientific significance. By making these insights available through an open-access dashboard, our work offers researchers and decision-makers a tool to engage with biobank data more effectively and to strengthen the collaborative potential of biobank-supported research.
Check out the pre-print here.