Star 0

Abstract

Query privacy in secure DBMS is an important feature, although rarely formally considered outside the theoretical community. Because of the high overheads of guaranteeing privacy in complex queries, almost all previous works addressing practical applications consider extremely limited queries (e.g., just keyword search), or provide an extremely weak guarantee of privacy. (In contrast, it is well known how to protect the DB data from unauthorized disclosure.) In this work, we address a major open problem in private DB: efficient sublinear search for arbitrary Boolean queries. We consider scalable DBMS with provable security for all parties, including protection of the data from both server (who stores encrypted data) and client (who searches it), as well as protection of the SQL query, and access control for the query. We design, build, and evaluate the performance of a rich DBMS system, suitable for real-world deployment on today medium-to large-scale DBs. On a modern server, we are able to query a formula over 10TB, 100M-record DB, with 70 searchable index terms per DB row, in time comparable to (insecure) MySQL (many practical queries can be privately executed with work 1.2-3 times slower than MySQL). We support a rich query set, including searching on arbitrary boolean formulas on keywords and ranges, support for stemming, and free keyword searches over text fields. We identify and permit a reasonable and controlled amount of leakage, proving that no further leakage is possible. In particular, we allow leakage of some search pattern information, but protect the query and data, provide a high level of privacy for individual terms in the executed search formula, and hide the difference between a query that returned no results and a query that returned a very small result set. We also support private and complex access policies, integrated in the search process so that a query with empty result set and a query that fails the policy are hard to tell apart. Ours is the first system that achieves this tradeoff of performance and privacy while supporting such a rich query set. Indeed, there has been no previous work achieving practical sublinear computation, arbitrary boolean formulas, and our level of provable privacy, even without implementation.