Star 0

Abstract

To allow the benefits of analysis that machine learning algorithms bring, while simultaneously providing user data privacy, we need secure computation models for a broad set of data mining algorithms. We propose to bring secure computation to programming frameworks that support data mining on massively parallel architectures. To address this challenge, we propose a solution that (i) develops a programming paradigm to allow non-cryptography experts to write secure code; (ii) brings parallelism to the secure version of these algorithms; and (iii) that meets the needs for obliviousness, thereby not leaking any private information. For example, we show how to hide the graph structure. Our technique efficiently converts many graph based algorithms (including many machine learning ones) to a distributed oblivious version with minimal communication overhead due to parallelism. Our solution scales linearly with input size and with the number of machines. Importantly, our secure version of graph based algorithms incurs a small overhead, log in the number of graph edges, in comparison to the non-secure parallel version. We demonstrate with 4 common data mining algorithms, that secure computation can be brought into the realm of practicality for big data analysis. Our secure matrix factorization implementation can process 1 million ratings in 13 hours, which is a multiple order-of-magnitude improvement over the only other attempt that needed 3 hours to process 16K ratings.

Slides