Google adds Python support to privacy-preserving data analysis tool
The addition of Python opens up the open-source differential privacy library to nearly half of all developers worldwide
Google has expanded its open-source differential privacy (DP) platform to support the Python programming language, widening availability to millions more developers and data analysts.
The announcement makes Python the fourth language supported by the project after initially launching in 2019 with support for C++, Java, and Google-created language Go, sometimes referred to as Golang.
The move follows what Google described as significant interest from developers who wanted to use the open-source library in their Python projects. Google worked with OpenMined for more than a year on Python support and said numerous projects have already used its DP library, including Australian developers who have accelerated scientific discoveries by analysing medical data in a privacy-preserving way.
DP is a system used by data analysts to preserve the privacy of the individuals whose data is used in an analysed data set. Work to develop strong DP dates back decades, but only in recent years have tech giants such as Google and Apple embraced the system.
One of the key areas of DP development for Google over the past year has been a tool within the library for fine-tuning 'epsilon', a mathematical measure of privacy loss. Finding the optimal epsilon typically requires a great deal of trial and error, so a built-in tool that lets developers adjust parameters to yield a lower epsilon, which indicates a more private release, means individual projects can be tuned to be as private as possible.
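To make the role of epsilon concrete, here is a minimal, self-contained sketch of the classic Laplace mechanism, the standard way differentially private counts are released. This is an illustration only, not the API of Google's library; the function names (`laplace_sample`, `private_count`) are hypothetical.

```python
import math
import random

def laplace_sample(scale: float) -> float:
    """Draw one sample from a zero-mean Laplace distribution (inverse CDF)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity/epsilon.

    A smaller epsilon means a larger noise scale and therefore a more
    private, but less accurate, release.
    """
    return true_count + laplace_sample(sensitivity / epsilon)

# Lower epsilon -> noisier, more private result for the same true count.
print(private_count(1000, epsilon=1.0))   # noise on the order of +/-1
print(private_count(1000, epsilon=0.01))  # noise on the order of +/-100
```

The trial-and-error the article describes is visible here: picking epsilon is a direct trade between how much the released statistic can be trusted and how well any one individual's contribution is hidden.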
With Python support added, Google said, the DP library is now available to nearly half of all developers worldwide, meaning more developers and researchers will be able to analyse data and make new discoveries while preserving the privacy of the users to whom the data belongs.
Python is among the most popular programming languages currently in use and won 'Language of the Year 2021' from the TIOBE index, which ranks programming languages based on their popularity. Python is useful for a wide range of programming activities but is especially well-known for its capabilities in data analysis, making it a natural progression for Google's DP library.
As part of the launch, Google has released a new web-based product, pipelinedp.io, which allows any Python developer to analyse their dataset with differential privacy. Google also said it has seen organisations experimenting with new use cases, such as showing a website's most-visited pages by country in an anonymised fashion.
The library is compatible with the leading large-scale data processing engines Apache Spark and Apache Beam, and Google will be launching an additional tool to help users "visualise and better tune the parameters used to produce differentially private information".
"We encourage developers around the world to take this opportunity to experiment with differential privacy use cases like statistical analysis and machine learning, but most importantly, provide us with feedback," said Google announcing the news. "We are excited to learn more about the applications you all can develop and the features we can provide to help along the way.
"We will continue investing in democratising access to critical privacy-enhancing technologies and hope developers join us in this journey to improve usability and coverage. As we’ve said before, we believe that every Internet user in the world deserves world-class privacy, and we’ll continue partnering with organisations to further that goal."
What is differential privacy?
Some argue differential privacy is fundamentally necessary in data analytics to preserve the privacy and hide the identity of the people whose data is being analysed. For technology companies especially, it has become central to how users expect them to handle the data they hold.
DP works by adding 'controlled noise' to datasets so that individuals cannot be identified from the data they contribute. For example, suppose residents of a neighbourhood supplied their salaries for an analysis that reported the average. If one resident then left the neighbourhood, their salary could be inferred by comparing the averages before and after the move.
Similarly, if two databases were analysed, one holding a single data point on 50 people and the other on 51, the results would have to be statistically indistinguishable from each other to avoid identifying the 51st person; only then would the analysis qualify as differentially private.
Adding controlled noise skews the statistics just enough to remove the possibility of identifying any individual, without significantly compromising the accuracy of the results.
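The salary example above can be sketched in a few lines. This is a hedged illustration of the idea, not Google's library: the values are clamped to an assumed range, and Laplace noise is scaled to how much one person can shift the average (the helper names `laplace_sample` and `private_average` and the salary figures are all hypothetical).

```python
import math
import random

def laplace_sample(scale: float) -> float:
    """Draw one sample from a zero-mean Laplace distribution (inverse CDF)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_average(salaries: list[float], epsilon: float,
                    lower: float = 0.0, upper: float = 200_000.0) -> float:
    """Differentially private mean: clamp each value to [lower, upper],
    then add Laplace noise scaled to the mean's sensitivity."""
    clamped = [min(max(s, lower), upper) for s in salaries]
    true_mean = sum(clamped) / len(clamped)
    # In the bounded model, changing one record shifts the mean by at most this:
    sensitivity = (upper - lower) / len(clamped)
    return true_mean + laplace_sample(sensitivity / epsilon)

neighbourhood = [42_000, 55_000, 61_000, 48_000, 150_000]
after_move = neighbourhood[:-1]  # the highest earner leaves

# The noise blurs the before/after difference, so the leaver's salary
# can no longer be recovered by simple subtraction.
print(round(private_average(neighbourhood, epsilon=5.0)))
print(round(private_average(after_move, epsilon=5.0)))
```

Because each released average carries independent noise, comparing the pre-move and post-move figures no longer pins down the departing resident's salary, which is exactly the attack the article describes.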
All the major Big Tech firms have embraced DP in different ways. Microsoft's AI Lab works with Harvard University on projects to facilitate DP-enabled research, Apple has used DP in its products since macOS Sierra and iOS 10, and Facebook and Amazon also have experience working with the system.