Presented below is a repository of real-world use cases that leverage emerging PETs. This aims to showcase the variety of use cases and sectors that PETs are providing solutions for. By collating these examples, we hope that organisations will be better placed to learn from others about how PETs can be effectively applied in practice.
Details of the use cases given in this repository are based on public information or information provided to us, and we have not had access to the systems themselves.
Inclusion in this repository should not be interpreted as an endorsement or criticism.
If you have additional examples that you think should be included in the repository, please do contact us.
Who | Date added to repository | Sector | Description | Stage of development | PETs used | Supporting links |
---|---|---|---|---|---|---|
Alan Turing Institute | July 2021 | Health and Social Care | A research paper by the Alan Turing Institute proposes using 'health tokens' to design COVID-19 immunity passports. These health-token based certificates would be created using differential privacy methods, allowing individual test results to be randomised while still allowing for aggregate level estimates of risk to be calculated. The research suggests that health tokens could mitigate discrimination based on immunity and tackle concerns around the creation of an 'immuno-privileged' class. However, it would still offer valuable information on collective transmission risk posed by small groups. | Proof of concept | Differential Privacy | |
Apple | July 2021 | Digital | Apple has leveraged federated learning to train the voice recognition software used by its AI assistant, Siri. A local model is trained on an individual's iPhone, and the resulting model weights are periodically communicated back to a central server, which builds a global model by aggregating the weights from the local models. This global model is pushed out to users' iPhones, and the process repeats. Noise is injected during the training of the local model to ensure it is differentially private, so as to mitigate the risk of reidentification. Using this system, Siri can learn to recognise the voice of the iPhone owner, so that it only responds to them, without Apple collecting any raw data relating to the user's voice. | Product | Federated Analytics Differential Privacy | |
Apple/Google contact tracing API | July 2021 | Health and Social Care | In response to the COVID-19 pandemic, Apple and Google collaborated to develop a privacy-preserving architecture facilitating digital contact tracing based on Bluetooth proximity information. Access to the system is only made available to health authorities, who develop their own apps leveraging the Apple/Google APIs (for example the NHS app in England and Wales). Mobile phones with the app enabled exchange random identifiers (each phone's identifier changes frequently) when in close proximity. Following a positive test, a user consents to upload details of their device's identifiers from recent days to a central server managed by the health authority. All phones periodically download this list of identifiers corresponding to positive tests from the central server, and can alert those who may have been exposed to the virus to self-isolate if there is a match with the identifiers of close contacts stored on their device. | Product | Federated Analytics De-identification techniques | |
AUSTRAC | July 2021 | Finance | AUSTRAC is building a platform to identify financial crime across major Australian financial institutions. It has built an algorithm designed to flag suspicious links between two or more accounts and/or trace suspicious funds as they move between accounts at various financial institutions. The algorithm does this by connecting databases held by different organisations, with PETs used to protect the privacy of innocent customers. The project is being delivered as part of a public-private partnership, including 28 member financial institutions, and aims to gather insights from over 100 million accounts. | In development | Homomorphic Encryption Multi-party Computation | |
Boston Women's Workforce Council (BWWC) and Boston University's Hariri Institute for Computing | July 2021 | Other | The BWWC began using a multi-party computation (MPC) system developed in partnership with the Hariri Institute in 2017 to enable organisations to anonymously report gender pay gap information. The data collected included gender, ethnicity, length of service, annual compensation, and performance pay. The latest report, produced in 2019, includes data for 136,437 employees from 123 organisations, representing 13% of the Greater Boston workforce. The confidentiality provided by the MPC solution has encouraged a greater number of companies to participate in the study, increasing from 69 companies in 2016 to 123 in 2019. User experience was also important in encouraging participation with the user interface providing a familiar spreadsheet that can be filled with data manually or via copy-paste. The statistics derived through this process have shown that the gender pay gap in the Boston area is even larger than previously estimated by the U.S. Bureau of Labor Statistics. | Product | Multi-party Computation | |
Danisco and the Association of Sugar Beet Growers | July 2021 | Other | Sugar beet farmers in Denmark have contracts determining how much beet they produce. All beet produced goes to Danisco, the only sugar producer in Denmark. The EU significantly reduced beet subsidies, meaning the country needed to develop a competitive market for trading production rights. A system was developed leveraging multi-party computation to enable confidential bidding to compute a trading price based on supply and demand. This enabled the production quota to be redistributed accordingly, whilst details of individual beet farmers' bids remained confidential. 80% of farmers surveyed said that this confidentiality was important to them. The first auction took place in 2008 and is considered the first large-scale, practical application of multi-party computation. | Product | Multi-party Computation | |
Data Sharing Coalition | June 2023 | Health and Social Care | In the Netherlands, a public-private coalition of organisations came together to build better insights into provisions and needs in the elderly care system. In order to make use of sensitive health and care data, privacy tech company Linksight worked together with health insurer DSW and the local "Zorgkantoor" (municipal health and social care office) to create a data analysis platform using Multi-Party Computation. The Dutch elderly care system is facing rising care needs, and monitoring this demand has been limited by barriers including privacy risks, the cost of accessing data, and fragmentation of data, which is held by different parties. Because the parties involved only view aggregated data on the platform, the risks and costs of sharing and access are lower, and the resultant insights can be used by policy-makers to respond better to the need. The platform is currently used in the Delft, Schieland and Westland region of the Netherlands, but the Data Sharing Coalition highlights an ambition to scale this up to the national level. | Product | Multi-party Computation | |
Duality Technologies | July 2021 | Health and Social Care | This is a framework for genome-wide association studies (GWAS) that leverages homomorphic encryption to keep medical and genomic data secure. The framework has been applied to conduct GWAS of age-related macular degeneration on a dataset of over 25,000 individuals. The system is ~30 times faster than state of the art GWAS schemes based on multi-party computation. | Proof of concept | Homomorphic Encryption | |
Duality Technologies | July 2021 | Finance | This project allows a party to query data that is owned by another party and receive the results without any sensitive parameters being disclosed to the external data owner. It aims to accelerate triage in fraud and anti-money laundering (AML) investigations. The privacy of the entity and the investigation is preserved throughout, even when complex SQL-like queries are made. | Proof of concept | Homomorphic Encryption | |
Enveil | July 2021 | Finance | Enveil has designed and developed an approach for financial institutions to identify matching customer information in external datasets without disclosing information about that customer. This allows financial institutions to investigate suspicious activity without revealing personal information, especially in the case that the customer being investigated is ultimately innocent. This proof of concept has been executed using synthetic data and also showed that information can still be made visible for audit, traceability and trust building where required. | Proof of concept | Homomorphic Encryption | |
Enveil & DeliverFund | July 2021 | Crime & Justice | Enveil, a PETs company, has partnered with DeliverFund, a counter-human trafficking intelligence organisation, to use homomorphic encryption based technology to provide access to a large human trafficking database in the US. DeliverFund's product reduces the time it takes to identify victims and who exploits them. The use of Enveil's technology will allow them and their partner organisations to better access data on counter-human trafficking without exposing PII or other sensitive data. Users of the platform will be able to cross-match and search on DeliverFund's database without revealing the contents of their search or compromising the security of the data they are searching on. | Product | Homomorphic Encryption | |
Estonian Association of Information Technology and Telecommunications (ITL) | July 2021 | Other | In 2011, the ITL proposed collecting key financial metrics from its member companies in order to better understand the state of the telecoms sector. Members expressed concern over the confidentiality of the metrics as they would be sharing them with competitors. ITL chose to partner with cybersecurity firm Cybernetica, who were able to deploy their Sharemind secure computing platform to enable the analysis to be done whilst protecting confidentiality. 17 companies participated, uploading their metrics to the Sharemind platform, which distributed the data across three “computing parties” (CPs). These CPs performed the desired analysis using a multi-party computation protocol to ensure confidentiality. The final results of the analysis were shared with the ITL who disseminated accordingly. The distributed nature of the computation meant no party, including the ITL, ever had direct access to another party’s metrics. | Product | Multi-party Computation | |
Estonian Center of Applied Research (CentAR) | June 2023 | Education | The Estonian Center of Applied Research (CentAR) used multi-party computation to carry out a big data study on the association between students working during their university studies and whether they graduated on time. When it was completed in 2015, the project was the largest statistical study on real data to date that used encryption for data privacy. 10 million tax records were linked to 60,000 education records from a Ministry of Education database using Sharemind encryption tools, operated by the data owners, which meant that the data was never unencrypted outside of where it was originally stored. CentAR led the study, which was instigated by the Estonian Association of Information Technology and Telecommunications. The Sharemind MPC system was hosted by the Estonian Information System Authority, the Information Technology Centre of the Ministry of Finance, and the private organisation Cybernetica. This study demonstrates the use of MPC for building policy insights with large datasets and high levels of precision, while protecting personal data. | Pilot | Multi-party Computation | |
Eurostat | June 2023 | National Statistics | Eurostat, a Directorate-General of the European Commission, has explored the use of Mobile Network Operator (MNO) data for the creation of official statistics on human population movement and mobility. The project analysed data in a trusted execution environment provided by Cybernetica's Sharemind technology to ensure that no individual data was shared, and no individuals could be identified. The product was tested using synthetic data on a population size of up to 100 million, aimed at demonstrating the scalability of the project. In parallel, research was conducted to assess the legal dimensions of the data processing in the project. Eurostat found that the project suggested the rising potential for the use of PETs in creating official statistics. | Proof of Concept | Trusted Execution Environment Synthetic Data | |
Facebook | July 2021 | Digital | In 2018, Facebook established an initiative to provide researchers access to data in order to study the role of social media in elections and democratic discourse. Data was shared with 60 researchers and consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users. In 2020, the size of the shared dataset was substantially increased to include approximately 38 million such links, with new aggregated information to help researchers analyse how many people saw these links on Facebook and how they interacted with that content, including views, clicks, shares, likes, and other reactions. The data shared was also aggregated by age, gender, country, and month. Facebook leveraged differential privacy to provide privacy guarantees to individuals in the dataset. | Product | Differential Privacy | |
Frontier Development Lab / Intel | June 2023 | Health and Social Care | In a study on the relationship between radiation exposure in space and cancer, researchers used federated learning methods to access tightly protected human data on astronauts. This collaborative public-private project was run by the Frontier Development Lab (FDL), which uses Intel's Open Federated Learning (OpenFL) framework. Researchers were able to use data from institutions including NASA, Mayo Clinic and NASA's GeneLab to train and combine models in situ, without requiring the data to be transferred. The main benefit was cost reduction, as the resources required to access and use such data would otherwise be a barrier to the research. For NASA and associated organisations, whose mission is to provide public and scientific value, this cost saving is an important outcome. | Pilot | Federated Learning | |
Google | July 2021 | Health and Social Care | A publicly available resource of statistics and visualisations has been built to show the changes in the population's mobility habits in response to COVID-19 interventions. The resource is based on location data from Google users who have opted in to location history tracking. Differential privacy is used to protect two metrics: the details of the location a user visited, and the number of visits the user made to each location. | Product | Differential Privacy | |
Google | July 2021 | Digital | GBoard is a keyboard app for Android and iOS devices. It features next-word prediction, driven by a machine learning model. GBoard utilises federated learning, where each mobile device downloads an initial model from a central server and trains it further on the device using user data local to the device. The weights of the resulting model are periodically communicated back to the central server using a secure aggregation protocol (a form of multi-party computation), and the server aggregates the weights received from all mobile devices into a new common model. Devices download this new model, and the cycle repeats, such that the model is continuously trained without collecting user data centrally. | Product | Multi-party Computation Federated Analytics | |
Google | July 2021 | Digital | In order to compute accurate conversion rates from advertisements to actual purchases, Google computes the size of the intersection between the list of people shown an advertisement and the list of people actually purchasing the advertised goods. When the goods are not purchased online, so that the purchase cannot be tracked back to the advertisement shown, Google and the company paying for the advertisement have to share their respective lists in order to compute the intersection size. In order to compute this without revealing anything but the size of the intersection, Google utilises a protocol for privacy-preserving set intersection. Although this protocol is far from the most efficient known today, it is simple and meets their computational requirements. | Product | Homomorphic Encryption | |
Hazy | June 2023 | Finance | In their Fostering Better Finance project, Accenture used synthetic data to build prototype applications aimed at early identification of vulnerable customers based on their transactions. The project used a synthetic data generator model, produced by Hazy, a privacy tech company specialising in commercial use of synthetic data. The goal of the project was to offer earlier intervention for risks faced by customers, hence, providing better support to the vulnerable individuals with their finances. Accenture reported that use of the synthetic data allowed them to launch the project 8 times faster than expected. This is due to reduced risk from not using sensitive customer data, which, in turn, creates reduced security and governance barriers for working together with the banking client on building the project. This allowed prototypes to be built, in advance, before being tested with real customer data, hence, curtailing the risk of compromising potentially sensitive information about individuals. | Product | Synthetic Data | |
Indian Institute of Science | July 2021 | Transport | In response to the Indian government increasing research into unmanned aerial vehicles (UAVs) and remotely piloted vehicles (RPVs), the Indian Institute of Science (IISc) has developed "Privaros." This is a set of enhancements to the drone software stack, designed to mitigate privacy concerns around the many sensors, cameras, microphones and GPS capabilities drones are equipped with. Privaros allows a host airspace, such as an apartment complex, university campus or city municipality, to determine its own privacy policies and ensure commercial delivery drones are compliant. Unlike most off-the-shelf drones, the working prototype is equipped with a hardware trusted execution environment (TEE). Evaluation shows that a drone running Privaros can robustly enforce various privacy policies specified by hosts with only marginal increases to communication latency and power consumption. | Proof of concept | Trusted Execution Environment | |
Indonesia Ministry of Tourism | June 2023 | Tourism | The Indonesian Ministry of Tourism has made use of statistics on mobile phone positioning data to better understand cross-border tourism activity in the country. The task required information from multiple mobile network operators to be analysed together to understand when users cross between different regions, which are covered by different operators. Mobile positioning data is generally protected, due to the sensitivity of information about individuals' mobility. The data was encrypted and aggregated using Sharemind, a privacy technology from Cybernetica. The mobile positioning data came from a system delivered by Estonian data analytics company Positium for the Ministry of Tourism. The Sharemind platform uses a trusted execution environment and does not allow access to unencrypted data or encryption keys at any stage. The aggregated statistics about the number of users, and the overlap in use between the two largest mobile networks, are now used by the Ministry of Tourism as a basis for the creation of tourism statistics. | Product | Trusted Execution Environment | |
Inpher Inc. | July 2021 | Finance | This project allowed a subsidiary of a large bank to build a machine-learning sales prediction model using data from other subsidiaries of the same bank located in other countries. The product ensured that no data was disclosed during cross-border movement, enabling the subsidiary in question to exploit 300,000 more data points when training the model. The analyst doing the computation only ever saw the outputs of the model. The inputs were encrypted throughout the process. This product has been commercially deployed and can run on real customer data. There have been no data interoperability issues, enabling full use of distributed datasets. | Product | Homomorphic Encryption Multi-party Computation | |
IQVIA | July 2021 | Health and Social Care | E360 Genomics uses a form of secure computation (tokenization of variants, multi-party is desired, and cell-size rules on statistical outputs). This is being leveraged by Genomics England. | Product | Multi-party Computation De-identification techniques | |
Keyless | July 2021 | Digital | Keyless is a cybersecurity platform that provides privacy-first passwordless authentication and personal identity management solutions for enterprises. They combine biometrics with PETs and a distributed cloud network. Their technology means that enterprises no longer need to centrally store and manage passwords, biometric data and other personally identifiable information. The underlying technology that they use is secure multi-party computation, which enables multiple cloud servers to jointly process identity authentication requests without disclosing any data between them. As such, companies are able to comply with authentication requirements and incur minimal data protection risk. | Product | Multi-party Computation | |
Microsoft | July 2021 | Digital | The Confidential Consortium Blockchain Framework (CCBF) is a system using trusted execution environments that facilitates confidentiality within a blockchain network. Blockchains were designed to prevent malicious behaviours by recording all transactions, making them open for all to see and replicated across hundreds of decentralised nodes for integrity. Within CCBF, confidentiality is provided by trusted execution environments (TEEs) that can process transactions that have been encrypted using keys accessible only to a CCBF node of a specific CCBF service. Besides confidentiality, TEEs also provide publicly verifiable artefacts, called quotes, that certify that the TEE is running specific code. Hence, the integrity of transaction evaluation in CCBF can be verified via quotes, and evaluation does not need to be replicated across mutually untrusted nodes as it is in public blockchains. It is worth pointing out that transaction data is replicated in CCBF across a small network of nodes, each executing in a TEE, but for the purpose of fault-tolerance rather than integrity. In addition, Microsoft’s test showed that the CCBF could process 50,000+ transactions per second, demonstrating the scalability of the technology. As a comparison, the public blockchain Ethereum network has an average processing rate of 20 transactions per second, whilst the Visa credit card processing system averages 2,000 transactions per second. The framework is not a standalone blockchain protocol, but rather it provides trusted foundations that can support any existing blockchain protocol. | Product | Trusted Execution Environment | |
Microsoft | July 2021 | Digital | Microsoft Viva is an Employee Experience Platform (EXP) that brings together communications, learning, resources and analytics. There are four sub-services on the platform called Insights, Topics, Learning and Connections. Viva Insights gives employees and managers personalised and actionable insights on various organisational metrics that can help drive productivity and wellbeing experience. The Insights tool uses safeguards like de-identification and differential privacy by default so that personal insights are only available at the individual level and not for managers or leaders of an organisation. | Product | Differential Privacy De-identification techniques | |
Microsoft | July 2021 | Digital | Microsoft has rolled out Password Generator and Password Monitor features in its Edge browser, using a homomorphic encryption service. The password manager collects saved passwords in one place and the monitor alerts the user if passwords have been compromised. The use of homomorphic encryption means that Microsoft never has to decrypt the data, in other words never has access to the actual credentials, but is still able to query the data. | Product | Homomorphic Encryption | |
Netherlands Organisation for Applied Scientific Research (TNO) | July 2021 | Health and Social Care | As part of the BigMedilytics project, funded by the EU's Horizon 2020 program, TNO developed a system to identify patients at risk of heart failure by confidentially combining data of potential indicators held by different organisations, leveraging a multi-party computation (MPC) protocol. The Erasmus MC hospital holds data on the lifestyle of patients, and insurance company Zilveren Kruis holds data on attributes such as hospitalization days and health care usage. The solution consists of two phases. First, a secure inner join protocol is used to identify which patients are present in both datasets. Both parties homomorphically encrypt attributes of their datasets and send them to a third party, which determines the intersection of the datasets (the third party cannot access the raw data directly, since it is homomorphically encrypted). Second, the encrypted intersection is split into 3 secret shares, which are distributed across the 3 parties, and an MPC protocol is used to train a regression model. Erasmus MC and Zilveren Kruis receive the coefficients of the regression, which they can then use to predict the risk of individual patients. | Proof of concept | Homomorphic Encryption Multi-party Computation De-identification techniques | |
NHS Digital/Privitar | July 2021 | Health and Social Care | The NHS has built a system for linking patient data held across different NHS domains. To protect patient confidentiality, identifiers (such as a patient's NHS number) are pseudonymised through tokenisation. For additional security, the tokenisation differs between different NHS domains. Linking data about a patient held in two domains first requires removing the tokenisation which would expose personal information. To avoid this, a partially homomorphic encryption scheme is used which enables data to be linked without revealing the underlying raw identifiers. | Product | Homomorphic Encryption De-identification techniques | |
NVIDIA and King's College London | July 2021 | Health and Social Care | In 2019, researchers from NVIDIA and King's College London collaborated to train a neural network for brain tumour segmentation using a federated learning approach. They used a dataset from the BraTS Challenge 2018, containing MRI scans from 285 patients, using 242 patients as training data, and 43 patients for testing. Training data was split into 13 shards, each representing a client in the federated setup. A data-centralised model was also trained for comparison. The data-centralised model converged in ~300 training epochs, with 205s per epoch. The federated model converged in ~600 training epochs, with ~65s per epoch (slowest client). The model performance is comparable between the two setups, although the federated model incurs a tradeoff between privacy and performance, determined by the parameters of the differential privacy setup. | Pilot | Federated Analytics Differential Privacy | |
OpenMined / Twitter Partnership | June 2023 | Social Media Research | Social media company Twitter and open source community OpenMined are working together to pilot methods of improving transparency in social media, by facilitating privacy-enabled access to algorithmic code for researchers. In this way, OpenMined is exploring and implementing the infrastructure for this access, which includes a variety of PETs. One significant example technology used is that of "remote execution" environments, or, federated learning and analytics. Using these PETs, researchers can run code designed to train machine learning algorithms and generate data analytics in a federated way across a network, hence, curtailing the risks which come with a central actor controlling this data throughout the machine learning/analytics process. Initial outcomes of this work have demonstrated that it is possible for researchers to run queries across vast social media algorithms without directly accessing the data themselves. There are now subsequent plans to scale these methods (using federated analytics and learning in addition to other PETs like differential privacy) across multiple online platforms. | Proof of Concept | Differential Privacy Federated Learning Federated Analytics | |
OpenSAFELY | July 2021 | Health and Social Care | OpenSAFELY is a secure analytics platform developed in response to the COVID-19 pandemic, which enables researchers to conduct analysis across millions of patients' electronic health records (EHR). The platform works by leveraging federated analysis, where researchers' analytic code is uploaded to the datacenter where EHR data is kept. The code is executed in the datacenter, with the data kept in situ - data is never moved from where it was originally kept. Researchers are thus unable to download data, mitigating a key risk. The platform provides researchers with dummy data (NOT synthetic data) to develop their code. Once developed, the code must pass a series of automated sanity checks before it is packaged and deployed to the EHR provider's datacenter to execute the analysis. OpenSAFELY has enabled risk factors associated with COVID-19 to be identified, without exposing the personal information of individuals. | Product | Federated Analytics De-identification techniques | |
Owkin | July 2021 | Health and Social Care | French-American startup, Owkin, is using Federated Learning to build a score that predicts the severity of a patient's COVID-19 prognosis. The AI-based scoring model is trained on CT scans of lungs (a routine procedure upon admission to hospital for COVID-19 patients). Its performance surpasses that of all other published score benchmarks. These scores support hospitals in resource management and planning at the frontline. | Product | Multi-party Computation | |
Privitar | July 2021 | Finance | With the use of partial homomorphic encryption, this pilot project allows financial institutions to learn statistics about a population from disparate private and public datasets without collecting any identifiable information. During analysis, raw data from multiple datasets is presented in tokenised form. The results of this project would enable a public authority to gather aggregate statistics about a population which would inform public policy in a privacy-preserving way. Even if a party intercepts the data at any point in the process, they would not be able to decrypt or link the various datasets. | Pilot | Homomorphic Encryption | |
RegulAItion | July 2021 | Other | The public sector gathers workforce information from the private sector (e.g. worker name, date of birth, employment start date). Insurance companies have claims information from their private sector clients (client industry, client size, client workforce size). While the public sector may retain a centralized database, each insurance company only has access to its own data. The public sector and the private sector do not share claims-related data with each other. The insurance industry knows that there is a correlation between a corporate organisation's average workforce age and the average cost of claims by industry. Using the AIR Platform, users are able to privately and securely access both the public sector data and private insurance data. The data remained in situ and under the control of each data holder. Algorithms revealed correlations that improved the participating insurance company’s risk model by 1 to 4%, achieving a profitability improvement of US$ 700,000 to US$ 2,800,000. The resulting knowledge of correlation was used by the insurance company to provide proactive risk management advice to its customers (such as recommending measures to be taken on construction sites that typically lead to a percentage reduction of workplace accidents). | Proof of concept | Federated Analytics Differential Privacy De-identification techniques | |
RegulAItion | July 2021 | Crime & Justice | The public sector wanted to release a new analytics algorithm as open source, available to anyone wishing to develop contract review or access to justice solutions. The challenge was accessing a large volume of relevant data to train this new algorithm. A group of participants from the public and private sectors made their data accessible using the RegulAItion Platform. They were able to develop and deploy a new algorithm on each of their datasets without ever sharing, moving or pooling their data. The collaborative project was completed in 12 weeks. The new algorithm contained 6 models. These models were decomposed into private local models and global shared models. | Product | Federated Analytics Differential Privacy De-identification techniques | |
Replica Analytics | June 2023 | Health and Social Care | In the Canadian province of Alberta, a collaboration of organisations worked together to create synthetic health records data for use in research. Researchers trained a model using 100,000 healthcare records from patients in the province, which they then used to generate synthetic data. Collaborators on this project include a non-profit funder (Health Cities), researchers from the University of Alberta, pharmaceutical company Merck Canada, synthetic data specialists Replica Analytics, and advisors from Alberta Innovates and the Office of the Information and Privacy Commissioner of Alberta. The synthetic data allows students and researchers to undertake projects which would normally be limited by privacy concerns related to the sharing of sensitive health data. The project therefore supports the application of data science to (population) health sciences. | Product | Synthetic Data | |
Secretarium / DANIE | June 2023 | Finance | The DANIE consortium is made up of banks and data providers, who upload their banking data to a shared platform for analysis with a number of aims, including: 1) improving the quality of client data, 2) anti-money laundering and 3) fraud detection. The DANIE platform was launched in 2020 and uses encryption and trusted execution environments such that no humans have access to the data that is being processed. DANIE uses a privacy enhancing system provided by Secretarium, and both Secretarium and DANIE are finance initiatives that emerged from Société Générale’s incubator programme in London. Additional benefits for participant organisations include meeting EU reporting requirements and so avoiding fines for reporting inaccurate data, reducing the resources spent on data reviews and remediation, and improved environmental performance for data management and analysis due to the efficiencies of the central processing system offered by Secretarium. | Product | Multi-party Computation Trusted Execution Environment | |
Signal | July 2021 | Digital | Signal is an open-source, privacy-focused instant messaging app. Signal provides end-to-end encryption of messages, and beyond this aims to collect as little information about its users as possible. The only information stored is a user's phone number, as this is required to register with the service. Signal leverages novel security technologies in order to provide features expected by users without collecting data about them. One such example is their use of trusted execution environments (TEEs) - namely, Intel SGX - to allow contact information from a user's phone to be used to find their contacts who are also on Signal. A server-side contact discovery service runs inside the TEE. A user uploads their contact information to this service, the service looks for matches in Signal's database of registered users, and information about these matches is returned to the user. Contact information is only decrypted inside the isolated TEE, meaning Signal has no visibility of it. Additionally, SGX supports remote attestation, meaning the client is able to verify that the code running inside the TEE is the expected contact discovery service before using it. | Pilot | Trusted Execution Environment | |
Statice | June 2023 | Insurance | In Germany, insurance services company Provinzial collaborated with data privacy services firm Statice, using Statice's synthetic data to train machine learning models which optimised Provinzial's predictive analytics (in particular a "next best offer" recommender engine). A key outcome was a saving of over three months that would otherwise have been spent evaluating data privacy risks, reducing the cost in company time while protecting data privacy as the systems were optimised. | Product | Synthetic Data | |
Statistics Canada | June 2023 | National Statistics | Statistics Canada, the national statistics office of Canada, has piloted the use of synthetic data to supply hackathon participants with data of high analytical value. The risk of using real Statistics Canada data was clear, as many relevant resources are census datasets and mortality registries which clearly relate to identifiable individuals. Statistics Canada held two hackathons in this way and observed that the analytical skills of the participants improved, while the risk of disclosure was reduced. This suggests that synthetic data can have a significant positive impact on training scenarios, whilst preserving the privacy of data subjects. | Pilot | Synthetic Data | |
Statistics Korea | June 2023 | National Statistics | Statistics Korea, the national statistics office of Korea, has begun trying to improve the linking of fragmented government data through a public cloud-based big data system named the "Statistical Data Hub Platform". This data comes from across government departments, so it may have significant utility for a range of stakeholders operating in different contexts. However, for the same reason, some of the data may be sensitive and/or relate to identifiable data subjects. Statistics Korea has piloted the use of multiple PETs for this purpose. For example, the pilot involved linking small business data that is encrypted using homomorphic encryption. | Pilot | Homomorphic Encryption Multi-party Computation Differential Privacy | |
Tsinghua University and Microsoft | July 2021 | Health and Social Care | Medical Named Entity Recognition (NER) is an NLP task which aims to identify entities (e.g. drug names, symptoms) from unstructured medical texts (e.g. patient records, doctor's notes). Microsoft collaborated with Tsinghua University to develop a federated system named FedNER to train a machine learning model to perform NER on a corpus of data held across a number of medical platforms. The model was decomposed into a private local model and a global shared model. Different medical platforms storing information in different formats are able to train the local model without having to wrangle their data into a defined format. This lowers the barrier to participation for any individual medical platform, maximising the amount of data used to train the system, thus enhancing its performance. | Pilot | Federated Analytics | |
United Nations PET Lab | June 2023 | National Statistics | A pilot programme by the newly launched UN PET Lab is exploring how to improve the understanding of international trade using privacy enhancing technologies. The programme was announced in January 2022 and pilots the use of several PETs to produce useful statistics without sharing or compromising the input data. The example applications include verifying import and export quantities between countries by comparing the statistics that 'paired' countries hold on how much of a given commodity they have sold to and bought from each other. This insight is created by the use of multi-party computation and differential privacy, via a peer-to-peer differential data network created by OpenMined. The initiative also involved the creation of an enclave technology provided by Oblivious, which hosts analysis in a secure trusted execution environment, such that only query outputs are shared. The technology also ensures that the output cannot be modified after creation. National statistics offices from the UK, the Netherlands, Italy and Canada took part in the programme, which initially used data uploaded to the UN Comtrade portal. A broader objective of this UN PET Lab project is to build an understanding of how international data sharing can be improved using privacy enhancing technologies, as the technologies used in this example can be applied to other forms of data. | Proof of concept | Differential Privacy Multi-party Computation | |
US Census Bureau | July 2021 | Other | The Bureau has leveraged differential privacy to minimise the risk of identification of individuals when publishing statistics from the 2020 Census. The total population in each state will be as counted, but all other levels of geography - including congressional districts down to townships and census blocks - could have some variance from the raw data as a result of the noise injected to achieve differential privacy. Setting the value of the privacy budget has not been trivial. The value chosen by the Census Bureau’s Data Stewardship Executive Policy Committee was far higher than the values envisioned by the creators of differential privacy. There are further challenges, with the National Congress of American Indians expressing concern that differential privacy could adversely affect the quality of statistics about tribal nations. | In development | Differential Privacy | |
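
Several entries above rely on differential privacy, where calibrated random noise is added to statistics before release and a privacy budget (often written epsilon) controls the trade-off between privacy and accuracy. The following is a minimal sketch of the textbook Laplace mechanism applied to counts, included only to illustrate that trade-off. It is not a description of any system listed in this repository; the area names, counts, epsilon values and function names are all hypothetical.

```python
import random
from typing import Dict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution."""
    # The difference of two independent exponential draws, each with
    # mean `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(
    true_counts: Dict[str, int],
    epsilon: float,
    rng: random.Random,
    sensitivity: float = 1.0,
) -> Dict[str, float]:
    """Add Laplace noise with scale sensitivity / epsilon to each count.

    Assumes each person contributes to exactly one count, so adding or
    removing one person changes a single count by at most `sensitivity`.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return {
        area: count + laplace_noise(scale, rng)
        for area, count in true_counts.items()
    }


# Hypothetical small-area counts, for illustration only.
block_counts = {"block_a": 120, "block_b": 43, "block_c": 7}

rng = random.Random(2020)  # fixed seed so the sketch is repeatable
more_private = noisy_counts(block_counts, epsilon=0.1, rng=rng)
less_private = noisy_counts(block_counts, epsilon=10.0, rng=rng)
```

A smaller epsilon means a larger noise scale, so the published counts are more private but less accurate. This is why small counts (such as `block_c` here) are affected proportionally more than large ones, and why choosing the privacy budget is a policy decision as much as a technical one.
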
All content is available under the Open Government Licence v3.0 except where otherwise stated.