Data anonymisation: balancing privacy and progress

Reuse of health data is one of the most exciting fields in medical research. As scientists study new ways to extract insights from data, there are real opportunities to foster research on novel therapies and to enhance patients’ health.  

Using and sharing datasets beyond their initial collection purpose requires us to navigate challenging questions about how to tap into the potential of data while protecting the personal privacy of patients whose information could hold the key to future progress.  

One way to enable the reuse of health data is anonymisation: the process by which controls are put in place and personal information is removed from, or transformed within, datasets. Applying controls and stripping out or transforming personal information is an effective way to lower the risk of patient re-identification so that data may safely be reused.

However, it is not possible to preserve the usefulness of the data while reducing the risk of re-identification to zero. The more information is removed from a dataset, the less value it has for research purposes. Data that retains utility also retains some level of residual re-identification risk.

When sharing data, context is key 

As a simple, hypothetical example, consider data collected from people living with diabetes. By removing their ages, genders and locations, we can better protect their identities before sharing a dataset with a trusted team of academic researchers interested in health equality. But in removing these key details, we also remove the prospect of learning about gender differences in patient outcomes. So, if scientists want to know whether gender or age shapes diabetes care, they will need to launch a new study, because the anonymised dataset may no longer contain this information.

There is a trade-off to be made between preserving the value of the dataset for secondary research and protecting the privacy of individuals – alongside the requirement to comply with regulatory rules and data protection laws.  

This raises questions that must be tackled directly, particularly as anonymisation forms part of conversations on future data policy and the European Health Data Space (EHDS). Setting a clear common standard for data anonymisation is complex because anonymisation is often highly context-specific. The answer to the question of how to balance privacy with the pursuit of new knowledge varies depending on factors such as the size of the dataset and how many people have the disease in question.

In the simplistic example above, a large dataset drawn from the millions of people living with diabetes implies a low risk that an individual might be identified. The chances of identification are higher in a small pool of people living with a rare disease.  
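One common way to make this trade-off concrete is to count how many records share the same combination of indirectly identifying attributes (the idea behind k-anonymity). The sketch below is illustrative only and is not taken from the Gradient: the records are invented, the attribute names and helper functions are hypothetical, and treating 1/k as the worst-case re-identification risk is a simplifying assumption that holds only if an attacker already knows a person is in the dataset.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for all of the given quasi-identifiers."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())


def age_band(age, width=10):
    """Generalise an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Invented records for illustration; not real patient data.
records = [
    {"age": 41, "gender": "F", "region": "North", "hba1c": 7.1},
    {"age": 44, "gender": "F", "region": "North", "hba1c": 6.8},
    {"age": 47, "gender": "F", "region": "North", "hba1c": 8.0},
    {"age": 52, "gender": "M", "region": "South", "hba1c": 7.4},
    {"age": 58, "gender": "M", "region": "South", "hba1c": 6.9},
    {"age": 55, "gender": "M", "region": "South", "hba1c": 7.7},
]

# Attributes assumed (for this sketch) to be indirectly identifying.
quasi_identifiers = ["age", "gender", "region"]

# With exact ages, every record is unique, so k is 1.
k_exact = smallest_group_size(records, quasi_identifiers)

# Replacing exact ages with 10-year bands makes records less distinctive.
generalised = [{**r, "age": age_band(r["age"])} for r in records]
k_banded = smallest_group_size(generalised, quasi_identifiers)

print(f"exact ages:  k={k_exact}, worst-case risk={1 / k_exact:.2f}")
print(f"age bands:   k={k_banded}, worst-case risk={1 / k_banded:.2f}")
```

In this toy dataset, banding the ages raises k from 1 to 3, but researchers can no longer study effects at the level of individual years of age. A larger dataset would typically reach the same k with less generalisation, which is why a small rare-disease cohort tends to need stronger transformation or tighter access controls.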

The context of sharing and the recipients must also be considered. For example, stricter data anonymisation controls should be applied when disclosing data to the public than when disclosing to a narrow group of trusted recipients.    

Introducing the Anonymisation Gradient 

EFPIA has developed a visual aid designed to help individuals outside the field of clinical trial disclosure better understand the balance between retaining data utility and protecting patient privacy. The Gradient is not intended to provide a prescriptive approach to anonymising data and documents but rather to highlight a non-exhaustive spectrum of controls and levels of redaction/transformation for various purposes. We believe the right balance can best be achieved by a context-specific approach to anonymisation, but it must be understood that the risk of re-identification can never be reduced to zero. We have developed the Anonymisation Gradient to:

(i) Present different stages of identifiability, from personal data to pseudonymised data to anonymised data;

(ii) Consider various types of controls such as contractual controls, duration of access and data security; 

(iii) Highlight the relevance of existing industry practices, including those based on EMA Policy 0070 and the ISO/IEC Privacy enhancing data de-identification framework;

(iv) Explain the impact of anonymisation on data utility.

The Gradient achieves this by providing examples of data anonymised to different degrees for different purposes, and by setting out the factors and measures that should be considered to achieve each outcome.

We hope that the Gradient can assist EU and Member State bodies when they consider anonymisation techniques, for example, in the context of the EHDS, national research laws and EU or national research programmes.  

If you want to know more about the Gradient, please contact evita.vandam@efpia.eu.