A Practical Method to Reduce Privacy Loss when Disclosing Statistics Based on Small Samples
Raj Chetty, John Friedman
May 2019

Releasing statistics based on small samples – such as estimates of social mobility by Census tract, as in the Opportunity Atlas – is valuable for policy but can inadvertently disclose information about specific individuals, creating privacy risks. To mitigate such risks, we worked with researchers at the Harvard Privacy Tools Project and Census Bureau staff to develop practical methods of reducing privacy loss when releasing such data. This paper describes the methods that we developed, which can be applied to disclose any statistic of interest that is estimated using a sample with a small number of observations.

We focus on the case where the dataset can be broken into many groups (“cells”) and one is interested in releasing statistics for one or more of these cells. Building on ideas from the differential privacy literature, we add noise to the statistic of interest in proportion to the statistic’s maximum observed sensitivity, defined as the largest change in the statistic from adding or removing a single observation, taken across all the cells in the data. Intuitively, our approach permits the release of statistics in arbitrarily small samples by adding sufficient noise to the estimates to protect privacy. Although our method does not offer a formal privacy guarantee, it generally outperforms widely used methods of disclosure limitation, such as count-based cell suppression, in terms of both privacy loss and statistical bias.
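The core mechanics can be sketched in a few lines. This is a minimal Python illustration, not the authors' implementation (the paper itself provides Stata code): the cell statistic (here, a simple mean), the use of Laplace noise, and the `epsilon` parameter are illustrative assumptions; the one element taken from the text is scaling the noise to the maximum observed sensitivity, computed as the largest change in the statistic from removing a single observation, across all cells.

```python
import numpy as np

rng = np.random.default_rng(0)

def cell_statistic(y):
    """Statistic of interest for one cell (illustrative choice: the mean)."""
    return y.mean()

def observed_sensitivity(y):
    """Largest change in the statistic from removing any single observation."""
    full = cell_statistic(y)
    return max(abs(cell_statistic(np.delete(y, i)) - full) for i in range(len(y)))

def noisy_estimates(cells, epsilon=1.0):
    """Release one noise-infused statistic per cell.

    The noise scale is the maximum observed sensitivity (MOS): the largest
    per-cell sensitivity across ALL cells, so every released estimate is
    perturbed enough to mask any single observation in any cell.
    """
    mos = max(observed_sensitivity(y) for y in cells)
    # Laplace noise scaled to MOS/epsilon is one standard choice borrowed
    # from differential privacy; the paper does not commit to it here.
    return [cell_statistic(y) + rng.laplace(scale=mos / epsilon) for y in cells]
```

Note that small cells with extreme observations drive the MOS, so a single volatile cell inflates the noise added to every cell; this is the price of a uniform noise scale that does not itself leak information about any one cell.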

We illustrate how the method can be implemented by discussing how it was used to release estimates of social mobility by Census tract in the Opportunity Atlas. We also provide a step-by-step guide and illustrative Stata code to implement our approach.



Example Code and Implementation Guide


Replication Package


Dataverse Repository