Google adds Python support to privacy-preserving data analysis tool


Google has expanded its open-source differential privacy (DP) platform to support the Python programming language, widening availability to millions more developers and data analysts.

The announcement makes Python the fourth language supported by the project after initially launching in 2019 with support for C++, Java, and Google-created language Go, sometimes referred to as Golang.

The move follows requests from a significant number of developers who contacted Google expressing interest in using the open-source library in their Python projects. Google worked with OpenMined for more than a year on Python support and said numerous projects have already used its DP library, including Australian developers who accelerated scientific discoveries by analysing medical data in a privacy-preserving way.

DP is a system used by data analysts to preserve the privacy of the individuals whose data is used in an analysed data set. Work to develop strong DP dates back decades, but only in recent years have tech giants such as Google and Apple embraced the system.

One of Google's key areas of DP development over the past year has been a tool within the library for fine-tuning 'epsilon', a mathematical measure of privacy loss. Finding the optimum epsilon takes a great deal of trial and error, so a tool that lets developers adjust parameters to yield a lower epsilon, which indicates a more private release, means individual projects can be tuned to be as private as possible.
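To illustrate how epsilon trades privacy against accuracy, the sketch below applies the classic Laplace mechanism to a simple count query. This is a hand-rolled NumPy illustration, not the API of Google's library: a smaller epsilon means a larger noise scale and therefore a more private release.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with Laplace noise of scale sensitivity / epsilon.

    A smaller epsilon gives a larger noise scale, i.e. a more private release.
    """
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(0.0, sensitivity / epsilon)

# A count of individuals has sensitivity 1: adding or removing one
# person changes the true result by at most 1.
rng = np.random.default_rng(42)
for epsilon in (10.0, 1.0, 0.1):
    print(f"epsilon={epsilon}: noisy count =",
          laplace_mechanism(50, sensitivity=1.0, epsilon=epsilon, rng=rng))
```

Running this repeatedly shows the trade-off: at epsilon = 10 the noisy count stays close to 50, while at epsilon = 0.1 individual releases can land far from the true value.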

Google said that with Python now supported, the DP library is available to nearly half of all developers worldwide, meaning more developers and researchers will be able to analyse data and make new discoveries while preserving the privacy of the users to whom the data belongs.

Python is among the most popular programming languages currently in use and won 'Language of the Year 2021' from the TIOBE index, which ranks programming languages based on their popularity. Python is useful for a wide range of programming activities but is especially well-known for its capabilities in data analysis, making it a natural progression for Google's DP library.

As part of the launch, Google has released a new web-based product which allows any Python developer to analyse their dataset with differential privacy. Google also said it has seen organisations experimenting with new use cases, such as showing a website's most visited web pages by country in an anonymised fashion.

The library is compatible with leading large-scale data processing engines, the Apache Spark and Apache Beam frameworks, and Google will be launching an additional tool to help users "visualise and better tune the parameters used to produce differentially private information".

"We encourage developers around the world to take this opportunity to experiment with differential privacy use cases like statistical analysis and machine learning, but most importantly, provide us with feedback," Google said in its announcement. "We are excited to learn more about the applications you all can develop and the features we can provide to help along the way.

"We will continue investing in democratising access to critical privacy-enhancing technologies and hope developers join us in this journey to improve usability and coverage. As we’ve said before, we believe that every Internet user in the world deserves world-class privacy, and we’ll continue partnering with organisations to further that goal."

What is differential privacy?

Differential privacy is a tool that has gained acclaim in recent years as data and identity protection have become focal points for researchers, businesses, and regulators alike.

Some argue it is fundamentally necessary in data analytics to preserve the privacy and hide the identity of people whose data is being analysed. For technology companies especially, it has become central to meeting users' expectations about how the data held on them is handled.


DP works by adding 'controlled noise' to datasets so that people cannot be individually identified by the data they provide to the dataset. For example, if residents of a neighbourhood supplied data for analysis involving their salaries which were then represented as an average, and one resident left the neighbourhood, their salary information could be tied to their identity by looking at the difference in the data pre- and post-move.

Similarly, to qualify as differentially private, the results of analysing two databases, one holding a single data point on each of 50 people and the other on 51 people, would have to be statistically indistinguishable from each other, so that the 51st person could not be identified.

Adding controlled noise to a dataset would remove the possibility of identifying an individual by skewing the statistics just enough to remove the element of identification, without significantly compromising the accuracy of the results.
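Continuing the salary example, a minimal sketch of this idea might compute a noisy neighbourhood average. This is an illustrative NumPy implementation, not Google's library API: each salary is clipped so no single resident can shift the result by more than a bounded amount, and Laplace noise calibrated to that bound masks any one individual's contribution.

```python
import numpy as np

def dp_average(values, lower, upper, epsilon, rng=None):
    """Differentially private average via the Laplace mechanism.

    Clipping each value to [lower, upper] bounds one person's influence, so
    under the replace-one-record definition of neighbouring datasets the
    sensitivity of the average is (upper - lower) / n. Laplace noise of
    scale sensitivity / epsilon then hides any individual's contribution.
    """
    rng = rng or np.random.default_rng()
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    return clipped.mean() + rng.laplace(0.0, sensitivity / epsilon)

salaries = [42_000, 55_000, 61_000, 48_000, 120_000]
noisy_mean = dp_average(salaries, lower=0, upper=150_000, epsilon=1.0)
```

With this construction, the published average for the neighbourhood before and after one resident moves away differs mostly by noise, so the difference no longer reveals that resident's salary.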

All major Big Tech firms have embraced DP in different ways. Microsoft's AI Lab works with Harvard University on projects to facilitate DP-enabled research, Apple has used DP in its products since macOS Sierra and iOS 10, and Facebook and Amazon also have experience working with the system.

Connor Jones

Connor Jones has been at the forefront of global cyber security news coverage for the past few years, breaking developments on major stories such as LockBit’s ransomware attack on Royal Mail International, and many others. He has also made sporadic appearances on the ITPro Podcast discussing topics from home desk setups all the way to hacking systems using prosthetic limbs. He has a master’s degree in Magazine Journalism from the University of Sheffield, and has previously written for the likes of Red Bull Esports and UNILAD tech during his career that started in 2015.