Getting rid of lazy data

Sharing research results and data will power a learning health care system, but we need to ensure that privacy is protected

Doug Fridsma
Aug 26, 2022

In my recent collection of blogs, I’ve been exploring data privacy, and how important protecting a patient’s privacy is to public trust. As more data is available in electronic form, it becomes even more critical that we protect the privacy of patient data, maintain public trust that we are using health data responsibly, and share what we learn.

Get rid of lazy data

When I worked at ONC and with Todd Park (when he was the White House CTO), he used to talk about “lazy data”: data that didn’t really do anything, but just sat there. Much of the urgency to get health data off of paper records and into electronic health record systems was driven by the desire to get rid of lazy data, to make it possible to use health data at scale for understanding population health and improving the health care system. Many people talk about the idea of a learning health care system in which every data point collected as part of research and care delivery can be used to improve research and patient care. And my previous blogs have pointed out how important protecting patient privacy is both to public trust and to the ability to turn lazy data into something useful.

So this week, I want to point to an important announcement out of the Office of Science and Technology Policy that has a direct impact on research results, research data, equity, and the learning health care system. While there have been incremental steps to move toward more access and transparency to research results, often important research results are embargoed behind a firewall, or data used for that research is difficult to find and share. That has changed with this OSTP announcement.

Research results belong to the public

Now, any research that is federally funded (think, NIH) must make its results available without firewalls, payment barriers, or embargoes. Results must be published in open access journals. For anyone who has a family member with a complicated diagnosis or rare disease, it can be frustrating when important research results are locked behind firewalls or require significant costs to access the published papers. This policy now makes that data available not only to research institutions, but to anyone.

It also helps level the playing field so that organizations that may lack the resources for costly journal subscriptions now have access to the same information as their better-funded counterparts. Differential access to data gives an unfair advantage to researchers at academic institutions that can afford those subscriptions. The principle behind this policy announcement is simple: if the public pays for the research, the public should have access to the results.

Data should be shared equitably

Even more importantly, the OSTP announcement strengthens the requirements for sharing the data that was used to generate the results. In the past, there has been a lag between when the results of a study were published and when the data used for those results was made available to other researchers. Now when results are published, the data must be made available as well.

This is important in a number of ways. First, having access to the data allows other researchers to replicate the findings. This creates more transparency and trust in the science when the results of the first study can be replicated in the second. Studies of research replicability in medicine suggest that 50 to 75% of cancer study results could not be replicated when a different researcher tried the same experiment. While there are many reasons this is the case (more reasons than I can convey in this overview), having data available at the time that research is published will make it possible to more rapidly “check the findings” of a study and assure the public that the scientific results are valid.

Second, having equitable access to the data for all researchers will make it easier for under-resourced institutions and early investigators to jump-start their research. Often students and early investigators are delayed in starting their research because they don’t have access to good research data. Many academic medical centers are establishing research databases that are available to investigators to test out hypotheses. But if a student or investigator is at an institution that lacks these resources, they are at a disadvantage in competing for federal grants and funding. Having more data available for research purposes will level the playing field for young investigators or institutions that lack the resources for these large research repositories.

The learning health care system

So what does this mean for our goal of developing a learning health care system? First, it makes sure that research results and research data aren’t lazy — people can use data for secondary purposes, accelerate follow-on research studies, and confirm that the results of a study are indeed valid. In this way, every federal research dollar contributes to new insights and new learning in how to take care of patients better. It is an important step, and one in a long line of other changes that need to happen to make sure every research dollar and every patient encounter contributes new knowledge into how to take care of patients better.

Privacy, trust, and data

I am fully supportive of government efforts to get rid of lazy data and make sure that data collected as part of government research is made into a public resource. I once asked Francis Collins to estimate what percentage of research dollars are used not for analysis, but to collect data, often collecting the same or similar data again, and again, and again across multiple federally funded grants. While he didn’t have a number, he acknowledged that as data collection becomes more expensive (and research dollars remain level), we continue to collect data, use it once and then collect it again. Now, we are seeing across the NIH, the FDA and other agencies a desire to repurpose real-world evidence (evidence collected as part of patient care) to improve research, health and health care. These efforts have the potential to accelerate drug discovery and lower the cost of research across the life sciences.

But we cannot forget in all of these data sharing plans that much of the clinical research data that we use is fundamentally data about people, individuals whose data we have an obligation to protect and keep private. We must do everything we can to protect the privacy of patient data while we repurpose it for public good. We need to build privacy into the learning health care system from the ground up.

The OSTP announcement charges government agencies to develop new policies to beef up data sharing plans and create new incentives to make sure the data isn’t lazy. But we must also beef up our technology to preserve patient privacy while we link and combine and analyze the data in new ways to generate new insights. Privacy-enhancing technologies (PETs) are a focus of an ongoing White House challenge in the US and the UK to accelerate research while being responsible stewards of private patient data. Privacy-preserving linkage technology that allows patient records from different organizations to be linked without risking patient re-identification will be a key ingredient of beefed-up data sharing plans, and a foundational aspect of the learning health care system to which we all aspire.
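
To make the idea of privacy-preserving linkage a little more concrete, here is a minimal sketch in Python, and only a sketch. It assumes two organizations share a secret key and each turns the same normalized identifiers (name and date of birth) into a keyed hash, so that records can be matched on the resulting token without either side exchanging the identifiers themselves. The function names, the choice of fields, the record IDs, and the key are all illustrative assumptions, not a description of any particular product or of anything the OSTP announcement specifies.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop punctuation and spaces so that
    trivial formatting differences do not change the token."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(
        ch for ch in decomposed.lower() if ch.isascii() and ch.isalnum()
    )


def make_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Return a keyed hash (HMAC-SHA256) of the normalized identifiers."""
    message = "|".join(normalize(part) for part in (first, last, dob))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(tokens_a: dict, tokens_b: dict) -> list:
    """Pair local record IDs from two sites whose tokens are identical."""
    by_token = {token: record_id for record_id, token in tokens_b.items()}
    return sorted(
        (record_id, by_token[token])
        for record_id, token in tokens_a.items()
        if token in by_token
    )


# Hypothetical example: each site tokenizes its own records locally
# and shares only {local record ID: token}, never names or birth dates.
SHARED_KEY = b"demo-key-not-for-production"  # assumption: agreed out of band

hospital = {
    "H-001": make_token(SHARED_KEY, "José", "O'Brien", "1980-02-14"),
    "H-002": make_token(SHARED_KEY, "Ann", "Lee", "1975-07-01"),
}
registry = {
    "R-77": make_token(SHARED_KEY, "jose", "OBrien", "1980-02-14"),
}

print(link(hospital, registry))  # [('H-001', 'R-77')]
```

Exact-match tokens like these are only a starting point. A real deployment also has to cope with typos and missing fields, protect the key, and guard against guessing attacks on common names, so this should be read as an illustration of the concept rather than a recipe.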

Written by Doug Fridsma

Doug is currently the Chief Medical Informatics Officer at Health Universe and a senior advisor for Datavant Inc. He was previously the Chief Science Officer for ONC.
