Can we solve the data privacy/utility problem?
When the Monetary Authority of Singapore (MAS)-led Veritas consortium wanted to test a new fairness assessment framework, it faced a key challenge. The most rigorous results would come from using real case studies with real data. But for obvious privacy and competitive reasons, the financial services companies involved could not share that data.
The solution? Synthetic and/or anonymized data. But this created a new problem: the data would need to be close enough to the real data to be useful, yet different enough to protect the privacy of the individuals involved. How could they find the right balance?
The need for anonymity
This speaks to a broader problem that organizations face in many different contexts. The opportunities to get value from data are exploding, but they come with more and more privacy concerns, especially when personal microdata are at stake.
One answer is to anonymize data. This is a good compliance strategy for companies, and it’s recommended by most data protection rules. Anonymization irreversibly alters a collection of personal data to prevent straightforward identification of the individuals who contributed it, while the anonymized data can still be used for larger statistical analysis. That keeps companies in compliance with data protection rules, and lets them derive value from the original data without directly “using” it.
100% anonymity = 0% value?
However, anonymization creates one very particular challenge: there’s always a trade-off between data privacy and data value. The “further away” from reality synthetic data becomes, the less useful it is for analytics or for developing AI algorithms. So what you gain in privacy, you lose in value.
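The trade-off can be made concrete with a toy experiment. The sketch below (illustrative only; the dataset, metrics, and noise-based anonymization are all invented for this example, and are not how any production tool works) perturbs a small “personal” dataset with increasing amounts of noise. Privacy is proxied by how far each anonymized record sits from its nearest real record; utility loss by how far the anonymized column means drift from the real ones. More noise buys more privacy at the cost of utility.

```python
import math
import random

random.seed(0)

# Toy "personal" dataset: 300 records, two numeric attributes (e.g. age, income).
real = [(random.gauss(40, 10), random.gauss(60000, 15000)) for _ in range(300)]

def col_stats(data):
    n = len(data)
    means = [sum(row[j] for row in data) / n for j in range(2)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n)
           for j in range(2)]
    return means, sds

REAL_MEANS, REAL_SDS = col_stats(real)

def utility_loss(synth):
    """Lower is better: drift of synthetic column means from the real ones."""
    means, _ = col_stats(synth)
    return sum(abs(means[j] - REAL_MEANS[j]) / REAL_SDS[j] for j in range(2)) / 2

def privacy_score(synth):
    """Higher is better: mean distance from each synthetic record to its
    nearest real record (records sitting on top of real ones leak identity)."""
    def std(row):
        return [(row[j] - REAL_MEANS[j]) / REAL_SDS[j] for j in range(2)]
    r = [std(row) for row in real]
    return sum(min(math.dist(std(row), p) for p in r) for row in synth) / len(synth)

def perturb(noise):
    """Naive anonymization: add Gaussian noise scaled to each attribute's spread."""
    return [tuple(row[j] + random.gauss(0, noise * REAL_SDS[j]) for j in range(2))
            for row in real]

for noise in (0.0, 0.5, 2.0):
    synth = perturb(noise)
    print(f"noise={noise}: utility_loss={utility_loss(synth):.3f}, "
          f"privacy={privacy_score(synth):.3f}")
```

At zero noise the “synthetic” data is the real data: utility loss is zero, but so is privacy. As noise grows, privacy rises and utility degrades, which is exactly the spectrum an anonymization strategy has to position itself on.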
You can’t eliminate this trade-off entirely. But where you land on the spectrum between privacy and utility will vary depending on the approach you take. It’s therefore crucial to think proportionately about anonymization strategy, considering the nature of the data and what it will be used for, as well as the privacy risks.
APAT: finding a balance between privacy and utility
For businesses, the objective is to find the most acceptable balance between privacy and value when dealing with personal data. To do that effectively, they need to fully understand the extent of the trade-offs they’re making. But until now, there haven’t been any off-the-shelf assessment methods or audit tools to do this.
This is where Accenture Labs’ new Automated Privacy & Value Assessment Tool (APAT) comes in. APAT takes a dataset and evaluates both the privacy and utility of various anonymization strategies, providing a recommendation on the best option for a particular use case. It’s fully automated and generalizable to any anonymization process.
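In spirit, the selection step amounts to: score each candidate strategy on privacy and utility, then recommend the most useful one that still clears the privacy bar the use case demands. The sketch below is a hypothetical illustration of that logic only; the strategy names and scores are invented, and APAT’s actual scoring internals are not described in this article.

```python
def recommend(scores, min_privacy):
    """scores: {strategy_name: (utility, privacy)}, both in [0, 1], higher is better.
    Returns the highest-utility strategy meeting the privacy floor, or None."""
    eligible = {name: u for name, (u, p) in scores.items() if p >= min_privacy}
    if not eligible:
        return None  # no candidate is private enough for this use case
    return max(eligible, key=eligible.get)

# Invented candidate strategies with illustrative (utility, privacy) scores.
candidates = {
    "k-anonymization (k=5)":  (0.82, 0.60),
    "k-anonymization (k=50)": (0.55, 0.90),
    "GAN-synthesized data":   (0.74, 0.78),
    "fully random surrogate": (0.05, 0.99),
}

print(recommend(candidates, min_privacy=0.7))  # → GAN-synthesized data
```

Note how the recommendation shifts with the privacy floor: a stricter floor pushes the choice toward heavily anonymized, lower-utility options, which is the proportionality judgment the tool is meant to make explicit.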
How did APAT help the Veritas consortium solve its data privacy problem?
Accenture’s Labs and Applied Intelligence groups helped the MAS-led Veritas consortium select a suitable synthetic dataset for one of its key use cases: predictive underwriting for life insurance. To do this, they synthesized different anonymized versions of the original data, then tested each with APAT, producing scores for its level of utility, privacy, and similarity to the original dataset.
A strategic tool for all sensitive data use cases
Anonymization and synthetic data generation are promising solutions to the data privacy challenge. But it’s critical to understand the trade-offs between privacy and utility. APAT offers a new audit capability that helps organizations make more informed decisions about anonymization strategies. And it can be used in any industry use case that deals with sensitive data and the need to balance privacy with data value. It can even be extended to questions of fairness and bias as well.