PETs Adoption Guide

Repository of Use Cases

Presented below is a repository of real-world use cases that leverage emerging PETs. It aims to showcase the variety of use cases and sectors for which PETs are providing solutions. By collating these examples, we hope that organisations will be better placed to learn from others about how PETs can be effectively applied in practice.

Details of the use cases given in this repository are based on public information or information provided to us, and we have not had access to the systems themselves.

Inclusion in this repository should not be interpreted as an endorsement or criticism.

If you have additional examples that you think should be included in the repository, please do contact us.

Each entry in the repository lists: who is involved; the date it was added to the repository; the sector; a description of the use case; the stage of development; the PETs used; and supporting links, where available.

Who: Alan Turing Institute
Date added: July 2021
Sector: Health and Social Care
Description: A research paper by the Alan Turing Institute proposes using 'health tokens' to design COVID-19 immunity passports. These health-token based certificates would be created using differential privacy methods, allowing individual test results to be randomised while still allowing aggregate-level estimates of risk to be calculated. The research suggests that health tokens could mitigate discrimination based on immunity and tackle concerns around the creation of an 'immuno-privileged' class, while still offering valuable information on the collective transmission risk posed by small groups.
Stage of development: Proof of concept
PETs used: Differential Privacy

Who: Apple
Date added: July 2021
Sector: Digital
Description: Apple has leveraged federated learning to train the voice recognition software used by its AI assistant, Siri. A local model is trained on an individual's iPhone, and the resulting model weights are periodically communicated back to a central server, which builds a global model by aggregating the weights from the local models. This global model is pushed out to users' iPhones, and the process repeats. Noise is injected during the training of the local model to ensure it is differentially private, so as to mitigate the risk of reidentification. Using this system, Siri can learn to recognise the voice of the iPhone owner, so that it only responds to them, without Apple collecting any raw data relating to the user's voice. (See the sketch below.)
Stage of development: Product
PETs used: Federated Analytics; Differential Privacy
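The pattern described in this entry (on-device training, noise added to updates before they leave the device, central averaging) can be illustrated with a minimal sketch. This is not Apple's implementation: the linear model, devices, clipping threshold and noise scale below are hypothetical placeholders.

```python
import numpy as np

def train_local_model(global_weights, local_data, lr=0.1):
    """Toy local update: one gradient step of a linear model on-device."""
    X, y = local_data
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)
    return global_weights - lr * grad

def privatise(update, clip=1.0, noise_scale=0.5, rng=np.random.default_rng()):
    """Clip the update and add Gaussian noise before it leaves the device."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_scale * clip, size=update.shape)

def federated_round(global_weights, devices):
    """Server averages the noisy updates; raw data never leaves the devices."""
    updates = []
    for data in devices:
        local = train_local_model(global_weights, data)
        updates.append(privatise(local - global_weights))
    return global_weights + np.mean(updates, axis=0)

# Hypothetical simulation: three devices, a 4-feature linear model.
rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
weights = np.zeros(4)
for _ in range(5):
    weights = federated_round(weights, devices)
print(weights)
```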

Who: Apple/Google contact tracing API
Date added: July 2021
Sector: Health and Social Care
Description: In response to the COVID-19 pandemic, Apple and Google collaborated to develop a privacy-preserving architecture facilitating digital contact tracing based on Bluetooth proximity information. Access to the system is only made available to health authorities, who develop their own apps leveraging the Apple/Google APIs (for example, the NHS app in England and Wales). Mobile phones with the app enabled exchange random identifiers (each phone's identifier changes frequently) when in close proximity. Following a positive test, a user consents to upload details of their device's identifiers from recent days to a central server managed by the health authority. All phones periodically download this list of identifiers corresponding to positive tests from the central server, and can alert those who may have been exposed to the virus to self-isolate if there is a match with the identifiers of close contacts stored on their device. (See the sketch below.)
Stage of development: Product
PETs used: Federated Analytics; De-identification techniques
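The matching flow described in this entry can be illustrated with a toy sketch. The random hex tokens below stand in for the specification's cryptographically derived rolling proximity identifiers; key schedules, Bluetooth exchange and signal attenuation data are omitted, and the class and names are hypothetical.

```python
import secrets

class Phone:
    """Toy model of a device taking part in exposure notification."""
    def __init__(self):
        self.own_ids = []          # rotating identifiers this phone has broadcast
        self.observed_ids = set()  # identifiers heard from nearby phones

    def new_rotating_id(self):
        rid = secrets.token_hex(8)  # placeholder for the spec's derived identifiers
        self.own_ids.append(rid)
        return rid

    def observe(self, rid):
        self.observed_ids.add(rid)

    def check_exposure(self, published_positive_ids):
        # Matching happens on-device against the downloaded list.
        return bool(self.observed_ids & set(published_positive_ids))

alice, bob = Phone(), Phone()
bob.observe(alice.new_rotating_id())   # phones exchange identifiers when nearby

# Alice tests positive and consents to upload her recent identifiers.
server_list = list(alice.own_ids)

print(bob.check_exposure(server_list))  # True: Bob's phone alerts him locally
```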

Who: AUSTRAC
Date added: July 2021
Sector: Finance
Description: AUSTRAC is building a platform to identify financial crime across major Australian financial institutions. They have built an algorithm designed to flag suspicious links between two or more accounts and/or trace suspicious funds as they move between accounts at various financial institutions. It does this by connecting databases held by different organisations, with PETs used to protect the privacy of innocent customers. The project is being delivered as part of a public-private partnership, including 28 member financial institutions, and aims to gather insights from over 100 million accounts.
Stage of development: In development
PETs used: Homomorphic Encryption; Multi-party Computation

Who: Boston Women's Workforce Council (BWWC) and Boston University's Hariri Institute for Computing
Date added: July 2021
Sector: Other
Description: The BWWC began using a multi-party computation (MPC) system developed in partnership with the Hariri Institute in 2017 to enable organisations to anonymously report gender pay gap information. The data collected included gender, ethnicity, length of service, annual compensation, and performance pay. The latest report, produced in 2019, includes data for 136,437 employees from 123 organisations, representing 13% of the Greater Boston workforce. The confidentiality provided by the MPC solution has encouraged a greater number of companies to participate in the study, increasing from 69 companies in 2016 to 123 in 2019. User experience was also important in encouraging participation, with the user interface providing a familiar spreadsheet that can be filled with data manually or via copy-paste. The statistics derived through this process have shown that the gender pay gap in the Boston area is even larger than previously estimated by the U.S. Bureau of Labor Statistics. (See the sketch below.)
Stage of development: Product
PETs used: Multi-party Computation
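A minimal sketch of the additive secret sharing idea underlying this kind of MPC deployment: each employer splits its figure into random shares so that no single computation server ever sees an individual submission, yet the total can still be reconstructed. This illustrates the principle only; it is not the BWWC/Hariri web-based system, and the payroll figures are hypothetical.

```python
import random

P = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value, n_parties=3):
    """Split a value into n additive shares that sum to it modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Each employer splits its (sensitive) total compensation figure into shares
# and sends one share to each computation server.
employer_payrolls = [1_200_000, 850_000, 2_400_000]
server_inboxes = [[], [], []]
for payroll in employer_payrolls:
    for inbox, s in zip(server_inboxes, share(payroll)):
        inbox.append(s)

# Each server sums the shares it holds; only the recombined total is revealed.
partial_sums = [sum(inbox) % P for inbox in server_inboxes]
total = sum(partial_sums) % P
print(total == sum(employer_payrolls))  # True
```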

Who: Danisco and the Association of Sugar Beet Growers
Date added: July 2021
Sector: Other
Description: Sugar beet farmers in Denmark have contracts determining how much beet they produce. All beet produced goes to Danisco, the only sugar producer in Denmark. The EU significantly reduced beet subsidies, meaning the country needed to develop a competitive market for trading production rights. A system was developed leveraging multi-party computation to enable confidential bidding and to compute a trading price based on supply and demand. This enabled the production quota to be redistributed accordingly, whilst details of individual beet farmers' bids remained confidential. 80% of farmers surveyed said that this confidentiality was important to them. The first auction took place in 2008 and is considered the first large-scale, practical application of multi-party computation. (See the sketch below.)
Stage of development: Product
PETs used: Multi-party Computation
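A simplified sketch of how a clearing price can be computed from secret-shared bids without revealing them. The candidate prices, bid quantities and three-server additive sharing below are hypothetical illustrations; the actual Danish auction used a dedicated three-party protocol developed for the purpose.

```python
import random

P = 2**61 - 1
PRICES = [10, 20, 30, 40]  # hypothetical candidate prices per tonne

def share(value, n=3):
    parts = [random.randrange(P) for _ in range(n - 1)]
    return parts + [(value - sum(parts)) % P]

# Each seller states how much quota they would sell at each candidate price,
# and each buyer how much they would buy; only secret shares leave the bidders.
sell_bids = [[0, 50, 80, 100], [0, 0, 40, 60]]   # per-price quantities (hypothetical)
buy_bids  = [[120, 90, 30, 10], [60, 40, 20, 0]]

servers = [{"sell": [0] * len(PRICES), "buy": [0] * len(PRICES)} for _ in range(3)]
for bids, side in ((sell_bids, "sell"), (buy_bids, "buy")):
    for bid in bids:
        for i, qty in enumerate(bid):
            for srv, s in zip(servers, share(qty)):
                srv[side][i] = (srv[side][i] + s) % P

# Only the aggregate supply and demand per candidate price are reconstructed.
supply = [sum(srv["sell"][i] for srv in servers) % P for i in range(len(PRICES))]
demand = [sum(srv["buy"][i] for srv in servers) % P for i in range(len(PRICES))]

# Clearing price: the lowest price at which aggregate supply meets demand.
clearing = next(p for p, s, d in zip(PRICES, supply, demand) if s >= d)
print(clearing)
```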

Who: Data Sharing Coalition
Date added: June 2023
Sector: Health and Social Care
Description: In the Netherlands, a public-private coalition of organisations came together to build better insights into provisions and needs in the elderly care system. To make use of sensitive health and care data, privacy tech company Linksight worked together with health insurer DSW and the local "Zorgkantoor" (municipal health and social care office) to create a data analysis platform using multi-party computation. The Dutch elderly care system is facing rising care needs, and monitoring this demand has been limited by barriers including privacy risks, the cost of accessing data, and the fragmentation of data held by different parties. Because only aggregated data is viewed by the parties involved, the platform lowers the risks and costs of sharing and access, and the resulting insights can be used by policy-makers to respond better to need. The platform is currently used in the Delft, Schieland and Westland region of the Netherlands, but the Data Sharing Coalition highlights an ambition to scale this up to the national level.
Stage of development: Product
PETs used: Multi-party Computation

Who: Duality Technologies
Date added: July 2021
Sector: Health and Social Care
Description: This is a framework for genome-wide association studies (GWAS) that leverages homomorphic encryption to keep medical and genomic data secure. The framework has been applied to conduct a GWAS of age-related macular degeneration on a dataset of over 25,000 individuals. The system is around 30 times faster than state-of-the-art GWAS schemes based on multi-party computation.
Stage of development: Proof of concept
PETs used: Homomorphic Encryption

Who: Duality Technologies
Date added: July 2021
Sector: Finance
Description: This project allows a party to query data that is owned by another party and receive the results without any sensitive parameters being disclosed to the external data owner. It aims to accelerate triage in fraud and anti-money laundering (AML) investigations. The privacy of the entity and the investigation is preserved throughout, even when complex SQL-like queries are made.
Stage of development: Proof of concept
PETs used: Homomorphic Encryption

Who: Enveil
Date added: July 2021
Sector: Finance
Description: Enveil has designed and developed an approach for financial institutions to identify matching customer information in external datasets without disclosing information about that customer. This allows financial institutions to investigate suspicious activity without revealing personal information, especially in the case that the customer being investigated is ultimately innocent. This proof of concept has been executed using synthetic data and also showed that information can still be made visible for audit, traceability and trust building where required.
Stage of development: Proof of concept
PETs used: Homomorphic Encryption

Who: Enveil & DeliverFund
Date added: July 2021
Sector: Crime & Justice
Description: Enveil, a PETs company, has partnered with DeliverFund, a counter-human trafficking intelligence organisation, to use homomorphic encryption based technology to provide access to a large human trafficking database in the US. DeliverFund's product reduces the time it takes to identify victims and those who exploit them. The use of Enveil's technology will allow them and their partner organisations to better access counter-human trafficking data without exposing PII or other sensitive data. Users of the platform will be able to cross-match and search on DeliverFund's database without revealing the contents of their search or compromising the security of the data they are searching on.
Stage of development: Product
PETs used: Homomorphic Encryption

Who: Estonian Association of Information Technology and Telecommunications (ITL)
Date added: July 2021
Sector: Other
Description: In 2011, the ITL proposed collecting key financial metrics from its member companies in order to better understand the state of the telecoms sector. Members expressed concern over the confidentiality of the metrics as they would be sharing them with competitors. ITL chose to partner with cybersecurity firm Cybernetica, who were able to deploy their Sharemind secure computing platform to enable the analysis to be done whilst protecting confidentiality. 17 companies participated, uploading their metrics to the Sharemind platform, which distributed the data across three "computing parties" (CPs). These CPs performed the desired analysis using a multi-party computation protocol to ensure confidentiality. The final results of the analysis were shared with the ITL, who disseminated them accordingly. The distributed nature of the computation meant no party, including the ITL, ever had direct access to another party's metrics.
Stage of development: Product
PETs used: Multi-party Computation

Who: Estonian Center of Applied Research (CentAR)
Date added: June 2023
Sector: Education
Description: The Estonian Center of Applied Research (CentAR) used multi-party computation to carry out a big data study on the association between students working during their university studies and whether they graduated on time. When it was completed in 2015, the project was the largest statistical study to date on real data using encryption for data privacy. 10 million tax records were linked to 600,000 education records from a Ministry of Education database using Sharemind encryption tools, operated by the data owners, which meant that the data was never unencrypted outside of where it was originally stored. CentAR led the study, which was instigated by the Estonian Association of Information Technology and Telecommunications. The Sharemind MPC system was hosted by the Estonian Information System Authority, the Information Technology Centre of the Ministry of Finance, and the private company Cybernetica. The study demonstrates the use of MPC for building policy insights with large datasets and high levels of precision, while protecting personal data.
Stage of development: Pilot
PETs used: Multi-party Computation

Who: Eurostat
Date added: June 2023
Sector: National Statistics
Description: Eurostat, a Directorate-General of the European Commission, has explored the use of Mobile Network Operator (MNO) data for the creation of official statistics on human population movement and mobility. The project analysed data in a trusted execution environment provided by Cybernetica's Sharemind technology to ensure that no individual data was shared and no individuals could be identified. The product was tested using synthetic data on a population size of up to 100 million, aimed at demonstrating the scalability of the project. In parallel, research was conducted to assess the legal dimensions of the data processing in the project. Eurostat found that the project suggested the rising potential for the use of PETs in creating official statistics.
Stage of development: Proof of Concept
PETs used: Trusted Execution Environment; Synthetic Data

Who: Facebook
Date added: July 2021
Sector: Digital
Description: In 2018, Facebook established an initiative to provide researchers access to data in order to study the role of social media in elections and democratic discourse. Data was shared with 60 researchers and consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users. In 2020, the size of the shared dataset was substantially increased to include approximately 38 million such links, with new aggregated information to help researchers analyse how many people saw these links on Facebook and how they interacted with that content, including views, clicks, shares, likes, and other reactions. The data shared was also aggregated by age, gender, country, and month. Facebook leveraged differential privacy to provide privacy guarantees to individuals in the dataset. (See the sketch below.)
Stage of development: Product
PETs used: Differential Privacy
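A minimal sketch of the kind of noisy aggregate release that differential privacy enables, using the Laplace mechanism on counts. The counts, groupings and epsilon value below are hypothetical and are not Facebook's parameters.

```python
import numpy as np

def dp_count(true_count, epsilon, rng=np.random.default_rng()):
    """Release a count with Laplace noise calibrated to a sensitivity of 1."""
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Hypothetical per-demographic view counts for one publicly shared URL.
views_by_group = {"18-24": 1520, "25-34": 2210, "35-54": 1895}
epsilon = 0.5  # illustrative privacy parameter

noisy = {group: round(dp_count(c, epsilon)) for group, c in views_by_group.items()}
print(noisy)  # each released count differs slightly from the true value
```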

Who: Frontier Development Lab / Intel
Date added: June 2023
Sector: Health and Social Care
Description: In a study on the relationship between radiation exposure in space and cancer, researchers used federated learning methods to access tightly protected human data on astronauts. This collaborative public-private project was led by the Frontier Development Lab (FDL), using Intel's Open Federated Learning (OpenFL) framework. Researchers were able to use data from institutions including NASA, the Mayo Clinic and NASA's GeneLab to train and combine models in situ, without requiring the data to be transferred. The main benefit was cost reduction, as the resources required to access and use such data would otherwise be a barrier to the research. For NASA and associated organisations, whose mission is to provide public and scientific value, this cost-saving impact is an important outcome.
Stage of development: Pilot
PETs used: Federated Learning

Who: Google
Date added: July 2021
Sector: Health and Social Care
Description: A publicly available resource of statistics and visualisations has been built with the intention of showing the changes in the population's mobility habits in response to COVID-19 interventions. The resource is based on location data from Google users who have opted in to location history tracking. Differential privacy is used to protect two metrics: the details of the location a user visited, and the number of visits the user made to each location.
Stage of development: Product
PETs used: Differential Privacy

Who: Google
Date added: July 2021
Sector: Digital
Description: GBoard is a keyboard app for Android and iOS devices. It features next-word prediction, driven by a machine learning model. GBoard utilises federated learning, where each mobile device downloads an initial model from a central server, which is further trained on the device using user data local to the device. The weights of the resulting model are periodically communicated back to the central server using a secure aggregation protocol (a form of multi-party computation), which aggregates the weights received from all mobile devices into a new common model. Devices download this new model, and the cycle repeats, such that the model is continuously trained without collecting user data centrally. (See the sketch below.)
Stage of development: Product
PETs used: Multi-party Computation; Federated Analytics
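A toy sketch of the secure aggregation idea referenced in this entry: devices add pairwise random masks to their weight updates before upload, and the masks cancel when the server sums the uploads, so only the aggregate is learned. The real protocol additionally handles dropouts and derives masks via key agreement; none of that is modelled here, and the device names and update sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy model with four weights

# Hypothetical per-device weight updates (these stay on the devices).
updates = {name: rng.normal(size=DIM) for name in ["a", "b", "c"]}

# Each pair of devices agrees on a shared random mask; the lower-named device
# adds it and the higher-named device subtracts it, so the masks cancel in sums.
names = sorted(updates)
masks = {(i, j): rng.normal(size=DIM) for i in names for j in names if i < j}

masked = {}
for name in names:
    vec = updates[name].copy()
    for (i, j), m in masks.items():
        if name == i:
            vec += m
        elif name == j:
            vec -= m
    masked[name] = vec  # this is all the server ever sees for this device

aggregate = sum(masked.values())             # pairwise masks cancel in the sum
assert np.allclose(aggregate, sum(updates.values()))
print(aggregate / len(updates))              # new global model update
```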

Who: Google
Date added: July 2021
Sector: Digital
Description: In order to compute accurate conversion rates from advertisements to actual purchases, Google computes the size of the intersection between the list of people shown an advertisement and the list of people actually purchasing the advertised goods. When the goods are not purchased online, and so the purchase cannot be connected to the advertisement shown, Google and the company paying for the advertisement have to combine their respective lists in order to compute the intersection size. To compute this without revealing anything but the size of the intersection, Google utilises a protocol for privacy-preserving set intersection. Although this protocol is far from the most efficient known today, it is simple and meets their computational requirements. (See the sketch below.)
Stage of development: Product
PETs used: Homomorphic Encryption
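A bare-bones sketch of a Diffie-Hellman-style private set intersection cardinality protocol of the kind described. Hashing directly into a prime field as below is for illustration only and is not production-grade cryptography; the party names and lists are hypothetical.

```python
import hashlib, secrets

P = 2**127 - 1  # illustrative prime modulus (not a vetted cryptographic group)

def h(item: str) -> int:
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, secret):
    return {pow(h(x), secret, P) for x in items}

def reblind(blinded, secret):
    return {pow(v, secret, P) for v in blinded}

ad_viewers = ["alice", "bob", "carol", "dave"]   # one party's list (hypothetical)
purchasers = ["carol", "dave", "erin"]           # the other party's list

a, b = secrets.randbelow(P - 2) + 1, secrets.randbelow(P - 2) + 1

# Each side blinds its own list with its secret exponent, exchanges it, and the
# other side blinds it again. Double-blinded values match exactly when the
# underlying items match, revealing only the intersection size, not the items.
double_viewers = reblind(blind(ad_viewers, a), b)
double_purchasers = reblind(blind(purchasers, b), a)

print(len(double_viewers & double_purchasers))  # 2
```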

Who: Hazy
Date added: June 2023
Sector: Finance
Description: In their Fostering Better Finance project, Accenture used synthetic data to build prototype applications aimed at the early identification of vulnerable customers based on their transactions. The project used a synthetic data generator model produced by Hazy, a privacy tech company specialising in commercial use of synthetic data. The goal of the project was to enable earlier intervention for risks faced by customers, and hence better support for vulnerable individuals with their finances. Accenture reported that the use of synthetic data allowed them to launch the project eight times faster than expected, because not handling sensitive customer data reduced the security and governance barriers to working with the banking client. This allowed prototypes to be built in advance, before being tested with real customer data, curtailing the risk of compromising potentially sensitive information about individuals. (See the sketch below.)
Stage of development: Product
PETs used: Synthetic Data
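Hazy's generators are considerably more sophisticated, but the basic idea of synthetic data can be sketched minimally: fit a generative model to real records and release samples from the model rather than the records themselves. The transaction features and distribution below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for sensitive transaction features (amount, balance, txns per month).
real = rng.multivariate_normal(
    mean=[55.0, 1200.0, 32.0],
    cov=[[400, 900, 5], [900, 90000, 40], [5, 40, 36]],
    size=5000,
)

# Fit a simple generative model (here, just the empirical mean and covariance)...
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)

# ...and sample synthetic records from it instead of sharing the real ones.
synthetic = rng.multivariate_normal(mean, cov, size=5000)

# The synthetic sample preserves the aggregate structure of the real data.
print(np.round(real.mean(axis=0), 1), np.round(synthetic.mean(axis=0), 1))
```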

Who: Indian Institute of Science
Date added: July 2021
Sector: Transport
Description: In response to the Indian government increasing research into unmanned aerial vehicles (UAVs) and remotely piloted vehicles (RPVs), the Indian Institute of Science (IISc) has developed "Privaros". This is a set of enhancements to the drone software stack, designed to mitigate privacy concerns around the many sensors, cameras, microphones and GPS capabilities drones are equipped with. Privaros allows a host airspace, such as an apartment complex, university campus or city municipality, to determine its own privacy policies and ensure commercial delivery drones are compliant. Unlike most off-the-shelf drones, the working prototype is equipped with a hardware trusted execution environment (TEE). Evaluation shows that a drone running Privaros can robustly enforce various privacy policies specified by hosts, with only marginal increases to communication latency and power consumption.
Stage of development: Proof of concept
PETs used: Trusted Execution Environment

Who: Indonesia Ministry of Tourism
Date added: June 2023
Sector: Tourism
Description: The Indonesian Ministry of Tourism has made use of statistics on mobile phone positioning data to better understand cross-border tourism activity in the country. The task required that information from multiple mobile network operators be analysed together to understand when users cross between different regions, which are covered by different operators. Mobile positioning data is generally protected, due to the sensitivity of information about individuals' mobility. The data was encrypted and aggregated using Sharemind, a privacy technology from Cybernetica. The mobile positioning data came from a system delivered for the Ministry of Tourism by Positium, an Estonian data analytics company. The Sharemind platform uses a trusted execution environment and does not allow access to unencrypted data or encryption keys at any stage. The aggregated statistics about the number of users, and the overlap in use between the two largest mobile networks, are now used by the Ministry of Tourism as a basis for the creation of tourism statistics.
Stage of development: Product
PETs used: Trusted Execution Environment

Who: Inpher Inc.
Date added: July 2021
Sector: Finance
Description: This project allowed a subsidiary of a large bank to build a machine-learning sales prediction model using data from other subsidiaries of the same bank located in other countries. The product ensured that no data was disclosed during cross-border movement, enabling the subsidiary in question to exploit 300,000 more data points when training the model. The analyst doing the computation only ever saw the outputs of the model; the inputs were encrypted throughout the process. This product has been commercially deployed and can run on real customer data. There have been no data interoperability issues, enabling full use of distributed datasets.
Stage of development: Product
PETs used: Homomorphic Encryption; Multi-party Computation

Who: IQVIA
Date added: July 2021
Sector: Health and Social Care
Description: E360 Genomics uses a form of secure computation, combining tokenisation of variants, multi-party computation where desired, and cell-size rules on statistical outputs. This is being leveraged by Genomics England.
Stage of development: Product
PETs used: Multi-party Computation; De-identification techniques

Who: Keyless
Date added: July 2021
Sector: Digital
Description: Keyless is a cybersecurity platform that provides privacy-first passwordless authentication and personal identity management solutions for enterprises. They combine biometrics with PETs and a distributed cloud network. Their technology means that enterprises no longer need to centrally store and manage passwords, biometric data and other personally identifiable information. The underlying technology they use is secure multi-party computation, which enables multiple cloud servers to jointly process identity authentication requests without disclosing any data between them. As such, companies are able to comply with authentication requirements while incurring minimal data protection risk.
Stage of development: Product
PETs used: Multi-party Computation

Who: Microsoft
Date added: July 2021
Sector: Digital
Description: The Confidential Consortium Blockchain Framework (CCBF) is a system using trusted execution environments that facilitates confidentiality within a blockchain network. Blockchains were designed to prevent malicious behaviours by recording all transactions, making them open for all to see and replicated across hundreds of decentralised nodes for integrity. Within CCBF, confidentiality is provided by trusted execution environments (TEEs) that can process transactions that have been encrypted using keys accessible only to a CCBF node of a specific CCBF service. Besides confidentiality, TEEs also provide publicly verifiable artefacts, called quotes, that certify that the TEE is running a specific code. Hence, the integrity of transaction evaluation in CCBF can be verified via quotes and need not be replicated across mutually untrusted nodes as it is in public blockchains. It is worth pointing out that transaction data is replicated in CCBF across a small network of nodes, each executing in a TEE, but for the purpose of fault tolerance rather than integrity. In addition, Microsoft's tests showed that CCBF could process 50,000+ transactions per second, demonstrating the scalability of the technology. As a comparison, the public Ethereum blockchain network has an average processing rate of 20 transactions per second, whilst the Visa credit card processing system averages 2,000 transactions per second. The framework is not a standalone blockchain protocol; rather, it provides trusted foundations that can support any existing blockchain protocol.
Stage of development: Product
PETs used: Trusted Execution Environment

Who: Microsoft
Date added: July 2021
Sector: Digital
Description: Microsoft Viva is an Employee Experience Platform (EXP) that brings together communications, learning, resources and analytics. There are four sub-services on the platform: Insights, Topics, Learning and Connections. Viva Insights gives employees and managers personalised and actionable insights on various organisational metrics that can help drive productivity and wellbeing. The Insights tool uses safeguards like de-identification and differential privacy by default, so that personal insights are only available at the individual level and not to managers or leaders of an organisation.
Stage of development: Product
PETs used: Differential Privacy; De-identification techniques

Who: Microsoft
Date added: July 2021
Sector: Digital
Description: Microsoft has rolled out Password Generator and Password Monitor features in its Edge browser, using a homomorphic encryption service. The password manager collects saved passwords in one place, and the monitor alerts the user if passwords have been compromised. The use of homomorphic encryption means that Microsoft never has to decrypt the data, in other words it never has access to the actual credentials, but is still able to query the data.
Stage of development: Product
PETs used: Homomorphic Encryption

Who: Netherlands Organisation for Applied Scientific Research (TNO)
Date added: July 2021
Sector: Health and Social Care
Description: As part of the BigMedilytics project, funded by the EU's Horizon 2020 programme, TNO developed a system to identify patients at risk of heart failure by confidentially combining data on potential indicators held by different organisations, leveraging a multi-party computation (MPC) protocol. The Erasmus MC hospital holds data on the lifestyle of patients, and insurance company Zilveren Kruis holds data on attributes such as hospitalisation days and health care usage. The solution consists of two phases. First, a secure inner join protocol is used to identify which patients are present in both datasets: both parties homomorphically encrypt attributes of their dataset, which are sent to a third party that determines the intersection of the datasets (the third party cannot access the raw data directly, since it is homomorphically encrypted). The encrypted intersection is then split into three secret shares, distributed across the three parties, and an MPC protocol is used to train a regression model. Erasmus MC and Zilveren Kruis receive the coefficients of the regression, which they can then use to predict the risk of individual patients.
Stage of development: Proof of concept
PETs used: Homomorphic Encryption; Multi-party Computation; De-identification techniques

Who: NHS Digital/Privitar
Date added: July 2021
Sector: Health and Social Care
Description: The NHS has built a system for linking patient data held across different NHS domains. To protect patient confidentiality, identifiers (such as a patient's NHS number) are pseudonymised through tokenisation. For additional security, the tokenisation differs between different NHS domains. Linking data about a patient held in two domains would ordinarily first require removing the tokenisation, which would expose personal information. To avoid this, a partially homomorphic encryption scheme is used, which enables data to be linked without revealing the underlying raw identifiers.
Stage of development: Product
PETs used: Homomorphic Encryption; De-identification techniques

Who: NVIDIA and King's College London
Date added: July 2021
Sector: Health and Social Care
Description: In 2019, researchers from NVIDIA and King's College London collaborated to train a neural network for brain tumour segmentation using a federated learning approach. They used a dataset from the BraTS Challenge 2018, containing MRI scans from 285 patients, with 242 patients used as training data and 43 patients for testing. The training data was split into 13 shards, each representing a client in the federated setup. A data-centralised model was also trained for comparison. The data-centralised model converged in around 300 training epochs, at 205 seconds per epoch; the federated model converged in around 600 training epochs, at around 65 seconds per epoch for the slowest client. Model performance is comparable between the two setups, although the federated model incurs a trade-off between privacy and performance, determined by the parameters of the differential privacy setup.
Stage of development: Pilot
PETs used: Federated Analytics; Differential Privacy

Who: OpenMined / Twitter Partnership
Date added: June 2023
Sector: Social Media Research
Description: Social media company Twitter and the open source community OpenMined are working together to pilot methods of improving transparency in social media by facilitating privacy-enabled access to algorithmic code for researchers. OpenMined is exploring and implementing the infrastructure for this access, which includes a variety of PETs. One significant example is "remote execution" environments, i.e. federated learning and analytics. Using these PETs, researchers can run code designed to train machine learning algorithms and generate data analytics in a federated way across a network, curtailing the risks that come with a central actor controlling the data throughout the machine learning and analytics process. Initial outcomes of this work have demonstrated that it is possible for researchers to run queries across vast social media algorithms without directly accessing the data themselves. There are now plans to scale these methods (using federated analytics and learning in addition to other PETs such as differential privacy) across multiple online platforms.
Stage of development: Proof of Concept
PETs used: Differential Privacy; Federated Learning; Federated Analytics

Who: OpenSAFELY
Date added: July 2021
Sector: Health and Social Care
Description: OpenSAFELY is a secure analytics platform developed in response to the COVID-19 pandemic, which enables researchers to conduct analysis across millions of patients' electronic health records (EHR). The platform works by leveraging federated analysis, where researchers' analytic code is uploaded to the datacentre where EHR data is kept. The code is executed in the datacentre, with the data kept in situ; data is never moved from where it was originally kept. Researchers are thus unable to download data, mitigating a key risk. The platform provides researchers with dummy data (NOT synthetic data) to develop their code. Once developed, the code must pass a series of automated sanity checks before it is packaged and deployed to the EHR provider's datacentre to execute the analysis. OpenSAFELY has enabled risk factors associated with COVID-19 to be identified without exposing the personal information of individuals. (See the sketch below.)
Stage of development: Product
PETs used: Federated Analytics; De-identification techniques
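A generic sketch of the "code travels to the data" pattern described in this entry, with simple small-cell suppression before release. This is not OpenSAFELY's actual API: the table, study function and cell-size threshold below are hypothetical.

```python
import pandas as pd

def run_in_datacentre(analysis_fn, min_cell_size=10):
    """Run researcher code next to the data; release only aggregate outputs."""
    # Toy stand-in for the patient-level table, which never leaves this function.
    ehr = pd.DataFrame({
        "age_band": ["18-39", "40-59", "60+", "60+", "40-59", "60+"] * 10,
        "outcome":  [0, 0, 1, 1, 0, 1] * 10,
    })
    result = analysis_fn(ehr)
    # Simple disclosure control: suppress small cells before anything is released.
    return result[result["n"] >= min_cell_size]

def my_study(ehr):
    """Researcher-written analysis code, developed against dummy data."""
    grouped = ehr.groupby("age_band")["outcome"]
    return grouped.agg(n="count", outcome_rate="mean").reset_index()

print(run_in_datacentre(my_study))
```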

Who: Owkin
Date added: July 2021
Sector: Health and Social Care
Description: French-American startup Owkin is using federated learning to build a score that predicts the severity of a patient's COVID-19 prognosis. The AI-based scoring model is trained on CT scans of lungs (a routine procedure upon admission to hospital for COVID-19 patients). Its performance surpasses that of all other published score benchmarks. These scores support hospitals in resource management and planning at the frontline.
Stage of development: Product
PETs used: Multi-party Computation

Who: Privitar
Date added: July 2021
Sector: Finance
Description: With the use of partial homomorphic encryption, this pilot project allows financial institutions to learn statistics about a population from disparate private and public datasets without collecting any identifiable information. During analysis, raw data from multiple datasets is presented in tokenised form. The results of this project would enable a public authority to gather aggregate statistics about a population, informing public policy in a privacy-preserving way. Even if a party intercepts the data at any point in the process, they would not be able to decrypt or link the various datasets. (See the sketch below.)
Stage of development: Pilot
PETs used: Homomorphic Encryption
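A minimal illustration of additively homomorphic aggregation, using the open-source python-paillier library. The pilot's actual design is not described in enough detail to reproduce here, so the contributions and workflow below are hypothetical.

```python
# Additively homomorphic aggregation sketch using python-paillier (pip install phe).
# The values and the single-aggregator workflow are illustrative assumptions.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Each data holder encrypts its contribution under the analyst's public key.
contributions = [1250, 430, 2875]
encrypted = [public_key.encrypt(v) for v in contributions]

# Anyone can add the ciphertexts without seeing the underlying values.
encrypted_total = sum(encrypted[1:], encrypted[0])

# Only the key holder can decrypt, and only the aggregate is revealed.
print(private_key.decrypt(encrypted_total))  # 4555
```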

Who: RegulAItion
Date added: July 2021
Sector: Other
Description: The public sector gathers workforce information from the private sector (e.g. worker name, date of birth, employment start date). Insurance companies have claims information from their private sector clients (client industry, client size, client workforce size). While the public sector may retain a centralised database, each insurance company only has access to its own data, and the public sector and the private sector do not share claims-related data with each other. The insurance industry knows that there is a correlation between a corporate organisation's average workforce age and the average cost of claims by industry. Using the AIR Platform, users were able to privately and securely access both the public sector data and the private insurance data. The data remained in situ and under the control of each data holder. Algorithms revealed correlations that improved the participating insurance company's risk model by 1 to 4%, achieving a profitability improvement of US$700,000 to US$2,800,000. The resulting knowledge of the correlation was used by the insurance company to provide proactive risk management advice to its customers (such as recommending measures to be taken on construction sites that typically lead to a percentage reduction in workplace accidents).
Stage of development: Proof of concept
PETs used: Federated Analytics; Differential Privacy; De-identification techniques

Who: RegulAItion
Date added: July 2021
Sector: Crime & Justice
Description: The public sector wanted to make a new analytics algorithm available as open source to anyone wishing to develop contract review or access-to-justice solutions. Accessing a large volume of relevant data to train this new algorithm was the challenge. A group of participants from the public and private sectors made their data accessible using the RegulAItion Platform. They were able to develop and deploy the new algorithm on each of their datasets without ever sharing, moving or pooling their data. The collaborative project was completed in 12 weeks. The new algorithm contained six models, which were decomposed into private local models and global shared models.
Stage of development: Product
PETs used: Federated Analytics; Differential Privacy; De-identification techniques

Who: Replica Analytics
Date added: June 2023
Sector: Health and Social Care
Description: In the Canadian province of Alberta, a collaboration of organisations worked together to create synthetic health records data for use in research. Researchers trained a model using 100,000 healthcare records from patients in the province, which they then applied to generate synthetic data. Collaborators on this project include a non-profit funder (Health Cities), researchers from the University of Alberta, pharmaceutical company Merck Canada, synthetic data specialists Replica Analytics, and advisors from Alberta Innovates and the Office of the Information and Privacy Commissioner of Alberta. The synthetic data allows students and researchers to undertake projects that would normally be limited due to privacy concerns related to the sharing of sensitive health data, supporting the application of data science to population health sciences.
Stage of development: Product
PETs used: Synthetic Data

Who: Secretarium / Danie
Date added: June 2023
Sector: Finance
Description: The DANIE consortium is made up of banks and data providers, who upload their banking data to a shared platform for analysis with a number of aims, including: 1) improving the quality of client data, 2) anti-money laundering and 3) fraud detection. The DANIE platform was launched in 2020 and uses encryption and trusted execution environments such that no humans have access to the data being processed. DANIE uses a privacy enhancing system provided by Secretarium; both Secretarium and DANIE are finance initiatives that emerged from Société Générale's incubator programme in London. Additional benefits of involvement for participant organisations include ensuring EU reporting requirements are met (avoiding fines for reporting inaccurate data), reducing the resources spent on data reviews and remediation, and improved environmental performance for data management and analysis, due to the efficiencies created by the central processing system offered by Secretarium.
Stage of development: Product
PETs used: Multi-party Computation; Trusted Execution Environment

Who: Signal
Date added: July 2021
Sector: Digital
Description: Signal is an open-source, privacy-focused instant messaging app. Signal provides end-to-end encryption of messages, and beyond this aims to collect as little information about its users as possible. The only information stored is a user's phone number, as this is required to register with the service. Signal leverages novel security technologies in order to provide features expected by users without collecting data about them. One such example is their use of trusted execution environments (TEEs), namely Intel SGX, to allow contact information from a user's phone to be used to find their contacts who are also on Signal. A server-side contact discovery service runs inside the TEE: a user uploads their contact information, the service looks for matches in Signal's database of registered users, and information about these matches is returned to the user. Contact information is only decrypted inside the isolated TEE, meaning Signal has no visibility of it. Additionally, SGX supports remote attestation, meaning the client is able to verify that the expected contact discovery service code is running inside the TEE before using it. (See the sketch below.)
Stage of development: Pilot
PETs used: Trusted Execution Environment
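A plain-Python stand-in for the enclave-hosted matching flow described in this entry: the client hashes its contacts, the service matches them against the set of registered users, and only the matches are returned. Real SGX isolation, encryption and remote attestation are not modelled here; the phone numbers, hashing scheme and class names are hypothetical.

```python
import hashlib

def hash_number(number: str) -> str:
    # Illustrative only: truncated hashes of phone numbers are NOT a strong
    # protection on their own, which is why the real service runs inside SGX.
    return hashlib.sha256(number.encode()).hexdigest()[:20]

class EnclaveContactDiscovery:
    """Stand-in for the attested contact discovery service running in the TEE."""
    def __init__(self, registered_users):
        self._registered = {hash_number(n) for n in registered_users}

    def find_matches(self, hashed_contacts):
        # Matching happens inside the enclave; only the matches are returned.
        return [h for h in hashed_contacts if h in self._registered]

service = EnclaveContactDiscovery(["+15550001", "+15550002", "+15550003"])
my_contacts = ["+15550002", "+15550009"]
matches = service.find_matches([hash_number(n) for n in my_contacts])
print(len(matches))  # 1 contact is also registered with the service
```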

Who: Statice
Date added: June 2023
Sector: Insurance
Description: In Germany, insurance services company Provinzial collaborated with data privacy firm Statice, using their synthetic data to train machine learning models which optimised Provinzial's predictive analytics (in particular a "next best offer" recommender engine). A key outcome was saving over three months that would otherwise have been spent evaluating data privacy risks, addressing both the expense of company time and data privacy in the process of optimising their systems.
Stage of development: Product
PETs used: Synthetic Data

Who: Statistics Canada
Date added: June 2023
Sector: National Statistics
Description: Statistics Canada, the national statistics office of Canada, has piloted the use of synthetic data to supply hackathon participants with data of high analytical value. The risk of using real Statistics Canada data was clear, as many relevant resources are census datasets and mortality registries which clearly relate to personally identifiable individuals. Statistics Canada held two hackathons in this way and observed that the analytical skills of the participants improved while the risk of disclosure was reduced. This suggests that synthetic data can have a significant positive impact on training scenarios, whilst preserving the privacy of data subjects.
Stage of development: Pilot
PETs used: Synthetic Data

Who: Statistics Korea
Date added: June 2023
Sector: National Statistics
Description: Statistics Korea, the national statistics office of Korea, has begun working to improve the linking of fragmented government data through a public cloud-based big data system named the "Statistical Data Hub Platform". The provenance of this data stretches across government departments, so it may have significant utility for a range of stakeholders operating in different contexts; however, certain data may be sensitive and/or relate to identifiable data subjects. Statistics Korea has piloted the use of multiple PETs for this purpose. For example, the pilot involved linking small business data encrypted using homomorphic encryption.
Stage of development: Pilot
PETs used: Homomorphic Encryption; Multi-party Computation; Differential Privacy

Who: Tsinghua University and Microsoft
Date added: July 2021
Sector: Health and Social Care
Description: Medical Named Entity Recognition (NER) is an NLP task which aims to identify entities (e.g. drug names, symptoms) from unstructured medical texts (e.g. patient records, doctor's notes). Microsoft collaborated with Tsinghua University to develop a federated system named FedNER to train a machine learning model to perform NER on a corpus of data held across a number of medical platforms. The model was decomposed into a private local model and a global shared model. Different medical platforms storing information in different formats are able to train the local model without having to wrangle their data into a defined format. This lowers the barrier to participation for any individual medical platform, maximising the amount of data used to train the system and thus enhancing its performance.
Stage of development: Pilot
PETs used: Federated Analytics

Who: United Nations PET Lab
Date added: June 2023
Sector: National Statistics
Description: A pilot programme by the newly launched UN PET Lab is exploring how to improve the understanding of international trade using privacy enhancing technologies. The programme was announced in January 2022 and pilots the use of several PETs to produce useful statistics without sharing or compromising the input data. Example applications include verification of import and export quantities between countries, by comparing the statistics that 'paired' countries hold on how much of a given commodity each has sold to and bought from the other. This insight is created using multi-party computation and differential privacy, via a peer-to-peer differential data network created by OpenMined. The initiative also used enclave technology provided by Oblivious, which hosts analysis in a secure trusted execution environment such that only query outputs are shared; the technology also ensures that outputs cannot be modified after creation. National statistics offices from the UK, the Netherlands, Italy and Canada took part in the programme, which initially used data uploaded to the UN Comtrade portal. A broader objective of the UN PET Lab project is to build an understanding of how international data sharing can be improved using privacy enhancing technologies, as the technologies used in this example can be applied to other forms of data.
Stage of development: Proof of Concept
PETs used: Differential Privacy; Multi-party Computation

Who: US Census Bureau
Date added: July 2021
Sector: Other
Description: The Bureau has leveraged differential privacy to minimise the risk of identification of individuals when publishing statistics from the 2020 Census. The total population in each state will be as counted, but all other levels of geography, including congressional districts down to townships and census blocks, could have some variance from the raw data as a result of noise injection to facilitate differential privacy. Setting the value of the privacy budget has not been trivial: the value chosen by the Census Bureau's Data Stewardship Executive Policy committee was far higher than those envisioned by the creators of differential privacy. There are further challenges, with the National Congress of American Indians expressing concern that differential privacy could adversely affect the quality of statistics about tribal nations.
Stage of development: In development
PETs used: Differential Privacy

All content is available under the Open Government Licence v3.0 except where otherwise stated.