A learning theory approach to noninteractive database privacy

Abstract

In this article, we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain, with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. In particular, this implies a mechanism for counting queries whose error guarantees grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response, we consider a relaxation of the utility guarantee and give a privacy-preserving polynomial-time algorithm that, for any halfspace query, provides an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead releases another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
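To make the first claim concrete, the following is a minimal sketch of one standard way to release a small synthetic database for counting queries over a discrete domain: enumerate every candidate database of a fixed size and pick one with the exponential mechanism, scoring each candidate by its worst-case query error. The abstract does not spell out the construction, so treating it this way is an assumption, and the names `count_query_error`, `release_synthetic`, the synthetic size `m`, and the threshold queries are all hypothetical. The enumeration is exponential in `m`, which matches the "ignoring computational constraints" caveat.

```python
import itertools
import math
import random


def count_query_error(db, synth, queries):
    """Largest gap between normalized counting-query answers on db and synth."""
    def frac(data, q):
        return sum(1 for x in data if q(x)) / len(data)
    return max(abs(frac(db, q) - frac(synth, q)) for q in queries)


def release_synthetic(db, domain, queries, m, epsilon, rng):
    """Sample a size-m synthetic database with the exponential mechanism.

    Assumption: neighboring databases differ in one row and have the same
    size n, so each normalized count (and hence the score) changes by at
    most 1/n.
    """
    n = len(db)
    # Every multiset of m domain elements is a candidate output.
    candidates = list(itertools.combinations_with_replacement(domain, m))
    scores = [-count_query_error(db, c, queries) for c in candidates]
    # Weight is exp(epsilon * score / (2 * sensitivity)) with sensitivity 1/n.
    # Subtracting the top score only rescales the weights (numerical stability).
    top = max(scores)
    weights = [math.exp(epsilon * n * (s - top) / 2) for s in scores]
    return list(rng.choices(candidates, weights=weights, k=1)[0])


# Usage: threshold (interval-style) counting queries over the domain {0,...,7}.
domain = range(8)
queries = [lambda x, t=t: x <= t for t in domain]
private_db = [1, 1, 2, 5, 5, 6, 7, 7]
synthetic = release_synthetic(private_db, domain, queries, m=3,
                              epsilon=1.0, rng=random.Random(0))
print(synthetic, count_query_error(private_db, synthetic, queries))
```

A smaller `epsilon` flattens the sampling distribution and so gives stronger privacy at the cost of a higher expected error; a larger `m` lets the synthetic database track the query answers more closely but enlarges the candidate set.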

Keywords

Differential privacy, Computer science, Class (philosophy), Dimension (graph theory), Theoretical computer science, Function (biology), Constant (computer programming), Conjunctive query, Database, Algorithm, Mathematics, Combinatorics, Relational database, Artificial intelligence

Publication Info

Year: 2013
Type: article
Volume: 60
Issue: 2
Pages: 1-25
Citations: 255
Access: Closed

Cite This

APA Style

Avrim Blum, Katrina Ligett, Aaron Roth (2013). A learning theory approach to noninteractive database privacy. Journal of the ACM, 60(2), 1-25. https://doi.org/10.1145/2450142.2450148

Identifiers

DOI: 10.1145/2450142.2450148