PETs Adoption Guide

Repository of Use Cases

Presented below is a repository of real-world use cases that leverage emerging PETs. This aims to showcase the variety of use cases and sectors that PETs are providing solutions for. By collating these examples, we hope that organisations will be better placed to learn from others about how PETs can be effectively applied in practice.

Details of the use cases given in this repository are based on public information or information provided to us, and we have not had access to the systems themselves.

Inclusion in this repository should not be interpreted as an endorsement or criticism.

If you have additional examples that you think should be included in the repository, please do contact us.

Who
Sector
Description
Stage of development
PETs used
Supporting links
Alan Turing Institute
Health and Social Care
A research paper by the Alan Turing Institute proposes using 'health tokens' to design COVID-19 immunity passports. These health-token-based certificates would be created using differential privacy methods, allowing individual test results to be randomised while still allowing aggregate-level estimates of risk to be calculated. The research suggests that health tokens could mitigate discrimination based on immunity and tackle concerns around the creation of an 'immuno-privileged' class, while still offering valuable information on the collective transmission risk posed by small groups.
Proof of concept

Differential Privacy
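
The idea of randomising individual results while keeping aggregate estimates usable can be illustrated with randomised response, a classic differentially private technique. This is a minimal sketch only: the entry above does not say which mechanism the paper uses, so the mechanism, the function names and the synthetic 30% positive rate are all assumptions made for illustration.

```python
import random


def randomise(true_result: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true result with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return true_result
    return rng.random() < 0.5


def estimate_rate(reports: list[bool], p_truth: float) -> float:
    """Invert the randomisation to estimate the true positive rate."""
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * rate + (1 - p_truth) * 0.5, solved for rate
    return (observed - (1 - p_truth) * 0.5) / p_truth


rng = random.Random(0)
true_results = [i < 3000 for i in range(10000)]  # synthetic data: 30% positive
reports = [randomise(r, 0.5, rng) for r in true_results]
estimate = estimate_rate(reports, 0.5)
print(f"estimated positive rate: {estimate:.3f}")
```

No single report reveals an individual's true result with certainty, yet the estimate over the whole group stays close to the underlying rate.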

Apple
Digital
Apple has leveraged federated learning to train the voice recognition software used by its AI assistant, Siri. A local model is trained on an individual's iPhone, and the resulting model weights are periodically communicated back to a central server, which builds a global model by aggregating the weights from the local models. This global model is pushed out to users' iPhones, and the process repeats. Noise is injected during the training of the local model to ensure it is differentially private, so as to mitigate the risk of reidentification. Using this system, Siri can learn to recognise the voice of the iPhone owner, so that it responds only to them, without Apple collecting any raw data relating to the user's voice.
Product

Federated Analytics

Differential Privacy
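
The server-side aggregation step described above can be sketched as a weighted average of the local model weights. This is an illustrative sketch rather than Apple's implementation: weighting each device by its number of training examples is an assumption, and `federated_average` is a hypothetical name.

```python
def federated_average(client_weights: list[list[float]],
                      client_sizes: list[int]) -> list[float]:
    """Average local model weights, weighting each client by its example count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Two hypothetical devices, the second with three times as much local data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Only the weight vectors reach the server; the audio used to produce them stays on each device.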

Apple/Google contact tracing API
Health and Social Care
In response to the COVID-19 pandemic, Apple and Google collaborated to develop a privacy-preserving architecture facilitating digital contact tracing based on Bluetooth proximity information. Access to the system is only made available to health authorities, who develop their own apps leveraging the Apple/Google APIs (for example the NHS app in England and Wales). Mobile phones with the app enabled exchange random identifiers (each phone's identifier changes frequently) when in close proximity. Following a positive test, a user consents to upload details of their device's identifiers from recent days to a central server managed by the health authority. All phones periodically download this list of identifiers corresponding to positive tests from the central server, and can alert those who may have been exposed to the virus to self-isolate if there is a match with the identifiers of close contacts stored on their device.
Product

Federated Analytics

De-identification techniques
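
The exchange-and-match flow described above can be sketched as follows. This is a simplified illustration, not the Apple/Google API: the `Phone` class, the 16-byte identifiers and the in-memory "server list" are assumptions made for the example.

```python
import secrets


class Phone:
    def __init__(self) -> None:
        self.sent: list[bytes] = []    # identifiers this phone has broadcast
        self.heard: set[bytes] = set() # identifiers received from nearby phones

    def new_identifier(self) -> bytes:
        """Generate a fresh random identifier, so broadcasts cannot be linked."""
        ident = secrets.token_bytes(16)
        self.sent.append(ident)
        return ident

    def meet(self, other: "Phone") -> None:
        """Two phones in proximity exchange their current identifiers."""
        self.heard.add(other.new_identifier())
        other.heard.add(self.new_identifier())

    def check_exposure(self, published: set[bytes]) -> bool:
        """Matching happens on the device, against the downloaded list."""
        return not self.heard.isdisjoint(published)


alice, bob, carol = Phone(), Phone(), Phone()
alice.meet(bob)
server_list = set(alice.sent)  # alice tests positive and consents to upload
print(bob.check_exposure(server_list), carol.check_exposure(server_list))
```

The server only ever holds random identifiers from users who tested positive; who met whom is known only to the phones involved.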

AUSTRAC
Finance
AUSTRAC is building a platform to identify financial crime across major Australian financial institutions. They have built an algorithm designed to flag suspicious links between two or more accounts and/or trace suspicious funds as they move between accounts at various financial institutions. It does this by connecting databases held by different organisations, and PETs are used to protect the privacy of customers who are innocent. The project is being delivered as part of a private-public partnership, including 28 member financial institutions, and they aim to gather insights from over 100 million accounts.
In development

Homomorphic Encryption

Multi-party Computation

Boston Women's Workforce Council (BWWC) and Boston University's Hariri Institute for Computing
Other
The BWWC began using a multi-party computation (MPC) system developed in partnership with the Hariri Institute in 2017 to enable organisations to anonymously report gender pay gap information. The data collected included gender, ethnicity, length of service, annual compensation, and performance pay. The latest report, produced in 2019, includes data for 136,437 employees from 123 organisations, representing 13% of the Greater Boston workforce. The confidentiality provided by the MPC solution has encouraged a greater number of companies to participate in the study, increasing from 69 companies in 2016 to 123 in 2019. User experience was also important in encouraging participation, with the user interface providing a familiar spreadsheet that can be filled with data manually or via copy-paste. The statistics derived through this process have shown that the gender pay gap in the Boston area is even larger than previously estimated by the U.S. Bureau of Labor Statistics.
Product

Multi-party Computation

Danisco and the Association of Sugar Beet Growers
Other
Sugar beet farmers in Denmark have contracts determining how much beet they produce. All beet produced goes to Danisco, the only sugar producer in Denmark. The EU significantly reduced beet subsidies, meaning the country needed to develop a competitive market for trading production rights. A system was developed leveraging multi-party computation to enable confidential bidding to compute a trading price based on supply and demand. This enabled the production quota to be redistributed accordingly, whilst details of individual beet farmers' bids remained confidential. 80% of farmers surveyed said that this confidentiality was important to them. The first auction took place in 2008 and is considered the first large-scale, practical application of multi-party computation.
Product

Multi-party Computation
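
To make "a trading price based on supply and demand" concrete, the sketch below computes a market clearing price from bids in the clear. In the deployed system this kind of calculation would run under multi-party computation so that no individual bid is revealed; the bid format, the price grid and the rule of taking the lowest price at which supply covers demand are assumptions made for illustration.

```python
def clearing_price(buy_bids, sell_bids, prices):
    """Return the lowest price at which offered supply covers demand, or None."""
    for price in sorted(prices):
        demand = sum(bid[price] for bid in buy_bids)
        supply = sum(bid[price] for bid in sell_bids)
        if supply >= demand:
            return price
    return None


# Each hypothetical bid maps a candidate price to the quantity wanted or offered.
prices = [10, 20, 30]
buy_bids = [{10: 5, 20: 3, 30: 1}, {10: 4, 20: 2, 30: 0}]
sell_bids = [{10: 1, 20: 4, 30: 6}, {10: 0, 20: 2, 30: 5}]
price = clearing_price(buy_bids, sell_bids, prices)
print(price)
```

Only the resulting price needs to be published; the per-farmer quantities are exactly what the confidential computation is meant to hide.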

Duality Technologies
Health and Social Care
This is a framework for genome-wide association studies (GWAS) that leverages homomorphic encryption to keep medical and genomic data secure. The framework has been applied to conduct GWAS of age-related macular degeneration on a dataset of over 25,000 individuals. The system is ~30 times faster than state-of-the-art GWAS schemes based on multi-party computation.
Proof of concept

Homomorphic Encryption

Duality Technologies
Finance
This project allows a party to query data that is owned by another party and receive the results without any sensitive parameters being disclosed to the external data owner. It aims to accelerate triage in fraud and anti-money laundering (AML) investigations. The privacy of the entity and the investigation is preserved throughout, even when complex SQL-like queries are made.
Proof of concept

Homomorphic Encryption

Enveil
Finance
Enveil has designed and developed an approach for financial institutions to identify matching customer information in external datasets without disclosing information about that customer. This allows financial institutions to investigate suspicious activity without revealing personal information, especially in the case that the customer being investigated is ultimately innocent. This proof of concept has been executed using synthetic data and also showed that information can still be made visible for audit, traceability and trust building where required.
Proof of concept

Homomorphic Encryption

Enveil & DeliverFund
Crime & Justice
Enveil, a PETs company, has partnered with DeliverFund, a counter-human trafficking intelligence organisation, to use technology based on homomorphic encryption to provide access to a large human trafficking database in the US. DeliverFund's product reduces the time it takes to identify victims and those who exploit them. The use of Enveil's technology will allow them and their partner organisations to better access data on counter-human trafficking without exposing PII or other sensitive data. Users of the platform will be able to cross-match and search on DeliverFund's database without revealing the contents of their search or compromising the security of the data they are searching on.
Product

Homomorphic Encryption

Estonian Association of Information Technology and Telecommunications (ITL)
Other
In 2011, the ITL proposed collecting key financial metrics from its member companies in order to better understand the state of the telecoms sector. Members expressed concern over the confidentiality of the metrics as they would be sharing them with competitors. ITL chose to partner with cybersecurity firm Cybernetica, who were able to deploy their Sharemind secure computing platform to enable the analysis to be done whilst protecting confidentiality. 17 companies participated, uploading their metrics to the Sharemind platform, which distributed the data across three “computing parties” (CPs). These CPs performed the desired analysis using a multi-party computation protocol to ensure confidentiality. The final results of the analysis were shared with the ITL, which disseminated them accordingly. The distributed nature of the computation meant no party, including the ITL, ever had direct access to another party’s metrics.
Product

Multi-party Computation
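
Distributing each metric across three computing parties can be illustrated with additive secret sharing, a common building block of multi-party computation. This is a sketch under stated assumptions: the modulus of 2**32, the function names and the example metrics are illustrative and not taken from the Sharemind deployment described above.

```python
import secrets

MODULUS = 2**32  # assumed ring size for this sketch


def share(value: int, parties: int = 3) -> list[int]:
    """Split a value into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine shares; any incomplete subset looks uniformly random."""
    return sum(shares) % MODULUS


def secure_sum(values: list[int], parties: int = 3) -> int:
    """Each party adds up only the shares it holds; totals are then combined."""
    per_party = [0] * parties
    for value in values:
        for i, s in enumerate(share(value, parties)):
            per_party[i] = (per_party[i] + s) % MODULUS
    return reconstruct(per_party)


metrics = [120, 75, 310]  # hypothetical figures from three companies
total = secure_sum(metrics)
print(total)
```

Each computing party sees only random-looking shares, so the sector total can be published without any party learning an individual company's figure.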

Facebook
Digital
In 2018, Facebook established an initiative to provide researchers access to data in order to study the role of social media in elections and democratic discourse. Data was shared with 60 researchers and consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users. In 2020, the size of the shared dataset was substantially increased to include approximately 38 million such links with new aggregated information to help researchers analyze how many people saw these links on Facebook and how they interacted with that content – including views, clicks, shares, likes, and other reactions. The data shared was also aggregated by age, gender, country, and month. Facebook leveraged differential privacy to provide privacy guarantees to individuals in the dataset.
Product

Differential Privacy

Google
Health and Social Care
A publicly available resource of statistics and visualisations has been built with the intention to show the changes in the population’s mobility habits in response to COVID-19 interventions. The resource is based on location data from Google users who have opted in to location history tracking. Differential privacy is used to protect two metrics: the details of the location a user visited, and the number of visits the user made to each location.
Product

Differential Privacy
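
One standard way to protect a published count is the Laplace mechanism, sketched below. It is shown only to illustrate how noise can hide an individual's contribution while leaving aggregates usable; the entry above does not specify Google's mechanism or parameters, so the epsilon value, the sensitivity of 1 and the function names are assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Add noise scaled to sensitivity / epsilon before releasing a count."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
noisy = [private_count(1000, 0.5, rng) for _ in range(20000)]
mean_noisy = sum(noisy) / len(noisy)
print(f"mean of noisy releases: {mean_noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; the noise has zero mean, so it does not bias the released statistics.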

Google
Digital
Gboard is a keyboard app for Android and iOS devices. It features next-word prediction, driven by a machine learning model. Gboard utilises federated learning, where each mobile device downloads an initial model from a central server and trains it further on the device using user data local to the device. The weights of the resulting model are periodically communicated back to the central server using a secure aggregation protocol (a form of multi-party computation), and the server aggregates the weights received from all mobile devices into a new common model. Devices download this new model, and the cycle repeats, such that the model is continuously trained without collecting user data centrally.
Product

Multi-party Computation

Federated Analytics
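
The intuition behind secure aggregation can be sketched with pairwise masks that cancel when all contributions are summed. This is a simplified illustration and not Google's protocol: real deployments derive masks from pairwise key agreement and handle devices dropping out, whereas here a single seeded generator stands in for those shared secrets and the updates are assumed to be integer-quantised.

```python
import random

MODULUS = 2**32  # assumed modulus for quantised updates


def masked_updates(updates: list[list[int]], seed: int = 0) -> list[list[int]]:
    """Add a mask to device i and subtract the same mask from device j, per pair."""
    rng = random.Random(seed)  # stand-in for pairwise agreed secrets
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS
    return masked


def aggregate(vectors: list[list[int]]) -> list[int]:
    """Sum the vectors element-wise; pairwise masks cancel in the total."""
    return [sum(column) % MODULUS for column in zip(*vectors)]


updates = [[1, 2], [3, 4], [5, 6]]  # hypothetical quantised weight updates
summed = aggregate(masked_updates(updates))
print(summed)
```

The server can recover the sum of the updates but sees each individual update only in masked form.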

Google
Digital
In order to compute accurate conversion rates from advertisements to actual purchases, Google computes the size of the intersection between the list of people shown an advertisement and the list of people actually purchasing the advertised goods. When the goods are not purchased online, and so the connection between the purchase and the advertisement shown cannot be tracked, Google and the company paying for the advertisement have to share their respective lists in order to compute the intersection size. In order to compute this without revealing anything but the size of the intersection, Google utilises a protocol for privacy-preserving set intersection. Although this protocol is far from the most efficient known today, it is simple and meets their computational requirements.
Product

Homomorphic Encryption
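
A simple way to learn only the size of an intersection is to have both parties blind hashed identifiers with secret exponents, since exponentiation commutes. The sketch below illustrates that idea; it is not Google's production protocol, and the toy 127-bit prime, the hashing step and all function names are assumptions. Both parties are simulated in one process, and a real deployment would use a standard elliptic-curve or prime-order group.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime, toy-sized for illustration only


def hash_to_group(item: str) -> int:
    """Map an identifier to a non-zero element modulo P."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1


def new_key() -> int:
    """Pick a secret exponent coprime to P - 1 so blinding is a bijection."""
    while True:
        key = secrets.randbelow(P - 2) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def blind(values, key: int) -> set[int]:
    """Raise each value to the secret key; the set discards ordering."""
    return {pow(v, key, P) for v in values}


def intersection_size(list_a: list[str], list_b: list[str]) -> int:
    key_a, key_b = new_key(), new_key()
    a_once = blind((hash_to_group(x) for x in list_a), key_a)
    b_once = blind((hash_to_group(x) for x in list_b), key_b)
    a_twice = blind(a_once, key_b)  # B applies its key to A's blinded values
    b_twice = blind(b_once, key_a)  # A applies its key to B's blinded values
    return len(a_twice & b_twice)   # equal items end up as H(x)^(a*b)


size = intersection_size(["ann", "bob", "cy"], ["bob", "cy", "dee", "eve"])
print(size)
```

Because each side only ever sees values blinded under the other's key, matching entries can be counted without either list being disclosed.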

Indian Institute of Science
Transport
In response to the Indian government increasing research into unmanned aerial vehicles (UAVs) and remotely piloted vehicles (RPVs), the Indian Institute of Science (IISc) has developed "Privaros". This is a set of enhancements to the drone software stack, designed to mitigate privacy concerns around the many sensors, cameras, microphones and GPS capabilities drones are equipped with. Privaros allows a host airspace, such as apartment complexes, university campuses and city municipalities, to determine their own privacy policies and ensure commercial delivery drones are compliant. Unlike most off-the-shelf drones, the working prototype is equipped with a hardware trusted execution environment (TEE). Evaluation shows that a drone running Privaros can robustly enforce various privacy policies specified by hosts with only marginal increases to communication latency and power consumption.
Proof of concept

Trusted Execution Environment

Inpher Inc.
Finance
This project allowed a subsidiary of a large bank to build a machine-learning sales prediction model using data from other subsidiaries of the same bank located in other countries. The product ensured that no data was disclosed during cross-border movement, enabling the subsidiary in question to exploit 300,000 more data points when training the model. The analyst doing the computation only ever saw the outputs of the model. The inputs were encrypted throughout the process. This product has been commercially deployed and can run on real customer data. There have been no data interoperability issues, enabling full use of distributed datasets.
Product

Homomorphic Encryption

Multi-party Computation

IQVIA
Health and Social Care
E360 Genomics uses a form of secure computation (tokenization of variants, multi-party is desired, and cell-size rules on statistical outputs). This is being leveraged by Genomics England.
Product

Multi-party Computation

De-identification techniques

Keyless
Digital
Keyless is a cybersecurity platform that provides privacy-first passwordless authentication and personal identity management solutions for enterprises. They combine biometrics with PETs and a distributed cloud network. Their technology means that enterprises no longer need to centrally store and manage passwords, biometric data and other personally identifiable information. The underlying technology that they use is secure multi-party computation, which enables multiple cloud servers to jointly process identity authentication requests without disclosing any data between them. As such, companies are able to comply with authentication requirements and incur minimal data protection risk.
Product

Multi-party Computation

Microsoft
Digital
The Confidential Consortium Blockchain Framework (CCBF) is a system using trusted execution environments that facilitates confidentiality within a blockchain network. Blockchains were designed to prevent malicious behaviours by recording all transactions, making them open for all to see and replicated across hundreds of decentralised nodes for integrity. Within CCBF, confidentiality is provided by trusted execution environments (TEEs) that can process transactions that have been encrypted using keys accessible only to a CCBF node of a specific CCBF service. Besides confidentiality, TEEs also provide publicly verifiable artefacts, called quotes, that certify that the TEE is running specific code. Hence, the integrity of transaction evaluation in CCBF can be verified via quotes and does not need to be replicated across mutually untrusted nodes as is done in public blockchains. It is worth pointing out that transaction data is replicated in CCBF across a small network of nodes, each executing in a TEE, but for the purpose of fault-tolerance rather than integrity. In addition, Microsoft’s test showed that the CCBF could process 50,000+ transactions per second, demonstrating the scalability of the technology. As a comparison, the public Ethereum blockchain network has an average processing rate of 20 transactions per second, whilst the Visa credit card processing system averages 2,000 transactions per second. The framework is not a standalone blockchain protocol, but rather it provides trusted foundations that can support any existing blockchain protocol.
Product

Trusted Execution Environment

Microsoft
Digital
Microsoft Viva is an Employee Experience Platform (EXP) that brings together communications, learning, resources and analytics. There are four sub-services on the platform called Insights, Topics, Learning and Connections. Viva Insights gives employees and managers personalised and actionable insights on various organisational metrics that can help drive productivity and wellbeing experience. The Insights tool uses safeguards like de-identification and differential privacy by default so that personal insights are only available at the individual level and not for managers or leaders of an organisation.
Product

Differential Privacy

De-identification techniques

Microsoft
Digital
Microsoft has rolled out Password Generator and Password Monitor features in its Edge browser, using a homomorphic encryption service. The password manager collects saved passwords in one place and the monitor alerts the user if passwords have been compromised. The use of homomorphic encryption means that Microsoft never has to decrypt the data, in other words never has access to the actual credentials, but is still able to query the data.
Product

Homomorphic Encryption

Netherlands Organisation for Applied Scientific Research (TNO)
Health and Social Care
As part of the BigMedilytics project, funded by the EU's Horizon 2020 program, TNO developed a system to identify patients at risk of heart failure by confidentially combining data of potential indicators held by different organisations, leveraging a multi-party computation (MPC) protocol. The Erasmus MC hospital holds data on the lifestyle of patients, and insurance company Zilveren Kruis holds data on attributes such as hospitalization days and health care usage. The solution consists of two phases. First, a secure inner join protocol is used to identify which patients are present in both datasets. Both parties homomorphically encrypt attributes of the dataset, which are sent to a third party that determines the intersection of the datasets (the third party cannot access the raw data directly, since it is homomorphically encrypted). Second, the encrypted intersection is split into 3 secret shares, which are distributed across the 3 parties, and an MPC protocol is used to train a regression model. Erasmus MC and Zilveren Kruis receive the coefficients of the regression, which they can then use to predict the risk of individual patients.
Proof of concept

Homomorphic Encryption

Multi-party Computation

De-identification techniques

NHS Digital/Privitar
Health and Social Care
The NHS has built a system for linking patient data held across different NHS domains. To protect patient confidentiality, identifiers (such as a patient's NHS number) are pseudonymised through tokenisation. For additional security, the tokenisation differs between different NHS domains. Linking data about a patient held in two domains would ordinarily first require removing the tokenisation, which would expose personal information. To avoid this, a partially homomorphic encryption scheme is used which enables data to be linked without revealing the underlying raw identifiers.
Product

Homomorphic Encryption

De-identification techniques
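
The effect of domain-specific tokenisation can be illustrated with a keyed hash, where each domain holds its own secret key. This sketch only shows why tokens from different domains cannot be joined directly; it does not reproduce the NHS/Privitar scheme or its homomorphic linking step, and the keys, the dummy identifier and the `tokenise` function are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-domain secrets; real keys would be managed securely.
KEY_DOMAIN_A = b"hypothetical-key-domain-a"
KEY_DOMAIN_B = b"hypothetical-key-domain-b"


def tokenise(identifier: str, domain_key: bytes) -> str:
    """Replace an identifier with a keyed, domain-specific pseudonym."""
    return hmac.new(domain_key, identifier.encode(), hashlib.sha256).hexdigest()


dummy_id = "0000000000"  # placeholder, not a real patient identifier
token_a = tokenise(dummy_id, KEY_DOMAIN_A)
token_a_again = tokenise(dummy_id, KEY_DOMAIN_A)
token_b = tokenise(dummy_id, KEY_DOMAIN_B)
print(token_a == token_a_again, token_a == token_b)
```

Within a domain the same patient always maps to the same token, so records can be joined there, but tokens from two domains do not match each other, which is why a separate privacy-preserving linking step is needed.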

NVIDIA and King's College London
Health and Social Care
In 2019, researchers from NVIDIA and King's College London collaborated to train a neural network for brain tumour segmentation using a federated learning approach. They used a dataset from the BraTS Challenge 2018, containing MRI scans from 285 patients, using 242 patients as training data, and 43 patients for testing. Training data was split into 13 shards, each representing a client in the federated setup. A data-centralised model was also trained for comparison. The data-centralised model converged in ~300 training epochs, with ~205s per epoch. The federated model converged in ~600 training epochs, with ~65s per epoch (slowest client). The model performance is comparable between the two setups, although the federated model incurs a tradeoff between privacy and performance, determined by the parameters of the differential privacy setup.
Pilot

Federated Analytics

Differential Privacy

OpenSAFELY
Health and Social Care
OpenSAFELY is a secure analytics platform developed in response to the COVID-19 pandemic, which enables researchers to conduct analysis across millions of patients' electronic health records (EHR). The platform works by leveraging federated analysis, where researchers' analytic code is uploaded to the datacenter where EHR data is kept. The code is executed in the datacenter, with the data kept in situ - data is never moved from where it was originally kept. Researchers are thus unable to download data, mitigating a key risk. The platform provides researchers with dummy data (not synthetic data) to develop their code. Once developed, the code must pass a series of automated sanity checks before it is packaged and deployed to the EHR provider's datacenter to execute the analysis. OpenSAFELY has enabled risk factors associated with COVID-19 to be identified, without exposing the personal information of individuals.
Product

Federated Analytics

De-identification techniques

Owkin
Health and Social Care
French-American startup Owkin is using federated learning to build a score that predicts the severity of a patient's COVID-19 prognosis. The AI-based scoring model is trained on CT scans of lungs (a routine procedure upon admission to hospital for COVID-19 patients). Its performance surpasses that of all other published score benchmarks. These scores support hospitals in resource management and planning at the frontline.
Product

Multi-party Computation

Privitar
Finance
With the use of partial homomorphic encryption, this pilot project allows financial institutions to learn statistics about a population from disparate private and public datasets without collecting any identifiable information. During analysis, raw data from multiple datasets is presented in tokenised form. The results of this project would enable a public authority to gather aggregate statistics about a population which would inform public policy in a privacy-preserving way. Even if a party intercepts the data at any point in the process, they would not be able to decrypt or link the various datasets.
Pilot

Homomorphic Encryption

RegulAItion
Other
The public sector gathers workforce information from the private sector (e.g. worker name, date of birth, employment start date). Insurance companies have claims information from their private sector clients (client industry, client size, client workforce size). While the public sector may retain a centralized database, each insurance company only has access to its own data. The public sector and the private sector do not share claims-related data with each other. The insurance industry knows that there is a correlation between a corporate organisation's average workforce age and the average cost of claims by industry. Using the AIR Platform, users are able to privately and securely access both the public sector data and private insurance data. The data remained in situ and under the control of each data holder. Algorithms revealed correlations that improved the participating insurance company’s risk model by 1 to 4%, achieving a profitability improvement of US$700,000 to US$2,800,000. The resulting knowledge of correlation was used by the insurance company to provide proactive risk management advice to its customers (such as recommending measures to be taken on construction sites that typically lead to a percentage reduction of workplace accidents).
Proof of concept

Federated Analytics

Differential Privacy

De-identification techniques

RegulAItion
Crime & Justice
The public sector wanted to make a new analytics algorithm available to the public as open source, for anyone wishing to develop contract review or access-to-justice solutions. Accessing a large volume of relevant data to train this new algorithm was the challenge. A group of participants from the public and private sectors made their data accessible using the RegulAItion Platform. They were able to develop and deploy a new algorithm on each of their datasets without ever sharing, moving or pooling their data. The collaborative project was completed in 12 weeks. The new algorithm contained 6 models. These models were decomposed into private local models and global shared models.
Product

Federated Analytics

Differential Privacy

De-identification techniques

Signal
Digital
Signal is an open-source, privacy-focused instant messaging app. Signal provides end-to-end encryption of messages, and beyond this aims to collect as little information about its users as possible. The only information stored is a user's phone number, as this is required to register with the service. Signal leverages novel security technologies in order to provide features expected by users without collecting data about them. One such example is their use of trusted execution environments (TEEs) - namely, Intel SGX - to allow contact information from a user's phone to be used to find their contacts who are also on Signal. A server-side contact discovery service runs inside the TEE. A user uploads their contact information to this service, which looks for matches in Signal's database of registered users and returns information about these matches to the user. Contact information is only decrypted inside the isolated TEE, meaning Signal has no visibility of it. Additionally, SGX supports remote attestation, meaning the client is able to verify that it is the expected contact discovery service code running inside the TEE before using it.
Pilot

Trusted Execution Environment

Tsinghua University and Microsoft
Health and Social Care
Medical Named Entity Recognition (NER) is an NLP task which aims to identify entities (e.g. drug names, symptoms) from unstructured medical texts (e.g. patient records, doctor's notes). Microsoft collaborated with Tsinghua University to develop a federated system named FedNER to train a machine learning model to perform NER on a corpus of data held across a number of medical platforms. The model was decomposed into a private local model and a global shared model. Different medical platforms storing information in different formats are able to train the local model without having to wrangle their data into a defined format. This lowers the barrier to participation for any individual medical platform, maximising the amount of data used to train the system, thus enhancing its performance.
Pilot

Federated Analytics

US Census Bureau
Other
The Bureau has leveraged differential privacy to minimise the risk of identification of individuals when publishing statistics from the 2020 Census. The total population in each state will be as counted, but all other levels of geography - including congressional districts down to townships and census blocks - could have some variance from the raw data as a result of noise-injection to facilitate differential privacy. Setting the value of the privacy budget has not been trivial. The value chosen by the Census Bureau’s Data Stewardship Executive Policy committee was far higher than those envisioned by the creators of differential privacy. There are further challenges, with the National Congress of Native Americans expressing concern that DP could adversely affect the quality of statistics about tribal nations.
In development

Differential Privacy