July 28, 2022
Meta is launching a new research project to make progress in assessing and improving our technologies to advance fairness.
Adapting a well-established privacy-preserving method called secure multiparty computation (SMPC) to safeguard information, we are asking people on Instagram to take a survey in which they can voluntarily share their race or ethnicity.
This information is important because it’s challenging to address what we can’t measure. With this data, we can begin to better determine how well our products work for people of different races and ethnicities.
Because of the personal nature of this type of demographic information, we are using secure multiparty computation for this initiative to prevent Meta from learning individual people’s survey responses.
The detailed protocol and cryptographic fundamentals provide a strong guarantee that data used in this method can only be processed as documented and cannot be revealed in the clear to any party that should not have access to it.
This project is part of Meta’s broader, long-term effort to help ensure that our products are built responsibly and that our products benefit the people who use them.
Demographic data is often essential in assessing whether a product or process treats all groups fairly. Without this information, many companies and organizations have found it impossible to fully evaluate how well systems perform across different communities, such as for people of different races or ethnicities. But gathering demographic data raises important concerns about how to protect people’s privacy.
To make progress in assessing and improving our technologies to advance fairness and inclusion with respect to race in the US, Meta’s Responsible AI, Instagram Equity, and Civil Rights teams are introducing an off-platform, voluntary, one-question survey in which people who use Instagram in the United States can share their race or ethnicity. To safeguard the survey responses, the Responsible AI team at Meta, in consultation with third-party experts, has adapted a data encryption method called secure multiparty computation (SMPC). Using SMPC as documented in this technical paper, Meta cannot at any point access encrypted data linking a specific user to their specific survey responses. As a result of this and additional technical and procedural safeguards, individual survey responses cannot and will not be used in our ads system.
Meta is committed to building socially responsible AI systems that treat individuals and communities fairly. In keeping with the recommendations of the civil rights leaders who conducted Meta’s Civil Rights Audit, this new initiative will help us better gauge whether people’s experiences with our technology differ across race.
Recognizing the personal nature of this data, we worked to keep people’s privacy at the forefront by adapting well-established privacy-preserving methods. In partnership with the privacy-focused technology company Oasis Labs, we developed an approach that draws on SMPC, a subfield of cryptography that permits analysis of encrypted data in aggregate. Data is securely distributed among multiple facilitators, who together can perform computations over the combined encrypted information without exposing their individual shares. First developed in the 1970s and 1980s, SMPC has been used for years in auctions, distributed voting, and statistical analysis. More recently, it has been used to evaluate fairness in social-impact contexts such as pay equity.
“As a technology partner, we are excited to have co-designed and built the SMPC system to assess fairness in AI models, while protecting users’ privacy,” said Professor Dawn Song, Founder of Oasis Labs. “This is an unprecedented use of such cutting-edge cryptographic techniques for a large-scale, real world use case. We hope this can help inspire the community to expand the use of privacy-preserving technologies, and together build towards responsible AI and responsible data use for a fairer and more inclusive society.”
First, an external partner securely collects and encrypts survey data. A notice at the top of Instagram’s Feed invites people to share information about their race and ethnicity. People who click on the notice are redirected to the survey provider and assigned a freshly generated random identifier (RID). To prevent demographic data from being linked to an individual social media account, the survey provider indexes responses by RID but has no User IDs or any other identifier that could be linked back to a particular individual. The survey provider does not know (and will not learn) to whom the RID-associated data belongs. Meta, meanwhile, knows who responded to the survey and holds the mapping from RIDs to User IDs (this is necessary in order to later query internal model data to be assessed using this method), but does not have access to the survey responses with demographic information.
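The split of knowledge described above can be sketched in a few lines. This is a hypothetical illustration, not Meta's implementation; the names (`meta_mapping`, `provider_responses`, etc.) are invented, and the point is only that neither table alone links a user to a response.

```python
import secrets

def new_rid():
    """A freshly generated random identifier for one survey session."""
    return secrets.token_hex(16)

meta_mapping = {}        # Meta side: user_id -> RID (no responses)
provider_responses = {}  # provider side: RID -> response (no user IDs)

def start_survey(user_id):
    """Meta assigns an RID and redirects the user to the provider."""
    rid = new_rid()
    meta_mapping[user_id] = rid
    return rid

def record_response(rid, response):
    """The survey provider stores the answer indexed by RID only."""
    provider_responses[rid] = response

rid = start_survey("user_42")
record_response(rid, "response_A")

# Neither party's table alone connects "user_42" to "response_A".
assert "user_42" not in provider_responses
```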
A method called additive secret sharing is central to the privacy-preserving nature of our approach. The survey provider encrypts and splits the responses into fragments, or shares, that sum to reveal the original values. The survey provider then distributes the shares among several third-party facilitators, including Northeastern University, Texas Southern University, a Historically Black College/University (HBCU), and the University of Central Florida, a Hispanic-Serving Institution (HSI). This secret sharing scheme guarantees that no information about the encoded vector can be inferred from any incomplete set of shares. The cryptographic treatment renders individual responses indecipherable to facilitators, so they can neither link any response back to a known individual nor decipher any of the de-identified responses. The only way to recover the private information is for all shares to be combined. Within 30 days after uploading the encrypted and split data, the survey provider deletes raw responses, retaining them only as long as necessary to ensure against accidental loss of data during the next stage.
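Additive secret sharing itself is simple to state: a value is split into random pieces that sum (modulo a fixed modulus) to the original, so any incomplete set of pieces is statistically uniform and reveals nothing. A minimal sketch, with an illustrative choice of prime modulus:

```python
import secrets

P = 2**61 - 1  # a large prime modulus (illustrative choice)

def share(value, n):
    """Split `value` into n additive shares modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    last = (value - sum(parts)) % P
    return parts + [last]

def reconstruct(shares):
    """Only the full set of shares recovers the original value."""
    return sum(shares) % P

# e.g., an encoded survey response split across 4 facilitators
shares = share(3, 4)
assert reconstruct(shares) == 3
# Any n-1 of the shares are uniformly random and reveal nothing.
```

In the deployed protocol each response is encoded as a vector before sharing, but the guarantee is the same: the value is recoverable only when all shares are combined.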
Once the data is collected, encrypted, and split, it will be available for analysis, without enabling raw survey responses to be shared with Meta. To assess fairness using this data, Meta computes a given metric, such as a classifier’s output, for all survey respondents (remember that Meta knows who responded to the survey but not their answers). Meta then encrypts these values, uploads this de-identified data, indexed by RIDs, to the facilitators, and identifies the combination of demographic attributes over which it wants to compute the sum of the given metric. The facilitators can then perform the computation over their de-identified response shares and the encrypted values, add noise to ensure differential privacy of the output, and return the combined, de-identified computations in an encrypted format to Meta. When Meta decrypts this response, it learns nothing except a single number: the sum of the classifier’s outputs over the relevant demographic attributes, plus any residual differential privacy noise.
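The aggregation step above can be sketched with the additive scheme. This is a simplified model, not the deployed protocol: membership in the demographic group of interest is reduced to a secret-shared 0/1 bit per RID, the metric is an integer, and all party names and values are invented. The key property it demonstrates is that the recombined result is only the group sum, never any individual response.

```python
import random
import secrets

P = 2**61 - 1  # shared prime modulus (illustrative)

def share(value, n):
    """Additively secret-share `value` into n pieces mod P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(value - sum(parts)) % P]

# Survey-provider side: a secret-shared membership bit per RID
# (1 if the respondent reported the demographic group of interest).
indicators = {"rid1": 1, "rid2": 0, "rid3": 1}
shared = {rid: share(bit, 3) for rid, bit in indicators.items()}

# Meta side: a metric value per RID (e.g., a classifier output,
# scaled to an integer). Meta never sees the membership bits.
metric = {"rid1": 7, "rid2": 5, "rid3": 2}

# Each facilitator i computes a partial sum using only its own shares.
partials = [sum(metric[rid] * s[i] for rid, s in shared.items()) % P
            for i in range(3)]

# Recombining yields the metric summed over the group -- and nothing else.
group_sum = sum(partials) % P  # 7 + 2 = 9

# Laplace noise for differential privacy, sampled here as the
# difference of two exponentials (scale b set by the privacy budget).
b = 1.0
noisy_sum = group_sum + random.expovariate(1 / b) - random.expovariate(1 / b)
```

No facilitator's partial sum reveals anything on its own, since each is computed from uniformly random shares; only the combined, noised total is released.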
"Meta posed an important question about how the latest secure computation techniques could be used to answer questions about the fairness and equity of ML models," said Dr. Abhi Shelat, professor of computer science at Northeastern University. "Together with Oasis and Meta, we were able to devise a new cryptographic protocol that enables privacy and measurement efficiently. We are thrilled that we were able to implement this protocol and I am proud to have been able to participate as one of the independent parties in the computation."
We have built a custom tool to run SMPC analysis. As with Bayesian Improved Surname Geocoding, Meta analysts will only be able to use SMPC analysis tools after receiving approval through an internal, structured governance process and privacy review. Requests will be reviewed by a committee that includes representatives from Meta’s Civil Rights, Responsible AI, data science, policy, and legal teams. Proposals must also pass Meta’s privacy review, in which internal teams identify and work to mitigate any privacy risks. The governance process adds additional layers of assurance that the survey responses will be used only for approved purposes.
Although its complexity may initially limit the set of operations and techniques we can apply, our method has the potential to work with many types of measurements. We could use it, for example, to determine whether content produced by people of certain races achieves disproportionate reach, or whether a model performs similarly for people from different ethnic groups. Analysis we conduct with this information might also help us better understand the experiences different communities have with how we rank content on Instagram.
Based on what we learn from the implementation of this initial survey on Instagram, we hope to expand this approach to our other platforms and explore how it might be adapted to support fairness work elsewhere. This versatility is a significant benefit of the SMPC approach, and one we intend to highlight for other companies and organizations that are facing similar challenges. We are committed to working in this space for the long run.
This is only one step in a longer journey. In the future, we hope to identify additional privacy-preserving approaches that can support more complex research, such as longitudinal and causal studies. We look forward to engaging with advocates, academic experts, policymakers, and peer companies to continue advancing equity and inclusion in our products while protecting people’s privacy.
Director, Responsible AI
Vice President and Deputy General Counsel, Civil Rights