Giter Site home page Giter Site logo

blindstore's Introduction

Introduction to Blindstore

Blindstore is a private information retrieval data store. It provides an online lookup service which returns the correct answer without ever knowing what the question was.

Currently this repo is a stub.

  • For the core cryptography implementation, see libshe.
  • For a working proof of concept of the Blindstore concept, please see the blindstore-old-scarab repository. The implementation there is based on another encryption scheme.

This repo will contain an implementation of Blindstore in C++ based on the V-DGHV scheme of somewhat homomorphic encryption (libshe) presented in [1].

Disclaimer

The library presented here is beta software and should not be used for any mission critical applications. No warranty expressed or implied is given.

References

[1] Yi, Xun; Kaosar, Mohammed Golam; Paulet, Russell; Bertino, Elisa, "Single-Database Private Information Retrieval from Fully Homomorphic Encryption", Knowledge and Data Engineering, IEEE Transactions on , vol.25, no.5, pp.1125,1134, May 2013

blindstore's People

Contributors

omegak avatar bogdan-kulynych avatar blipp avatar

Stargazers

Arun S Kumar avatar Morten Dahl avatar Jake Cowton avatar C For C's Sake avatar jose nazario avatar Miroslav Svitok avatar Marcel avatar  avatar Daniel Mendler avatar Oliver Keller avatar brett avatar Benjamin Black avatar Georgios Kokosioulis avatar Romain Bazile avatar Matthew Wraith avatar  avatar Piotr Rożek avatar Wojciech Bederski avatar Sebastian Pawluś avatar  avatar Asko Anti avatar Antonios Manaras avatar Yigit avatar Nate Smith avatar  avatar Thomas Durieux avatar Igor Babuschkin avatar  avatar Jakub Žitný avatar Yoan Blanc avatar Alex Arvanitidis avatar Anastasios Andronidis avatar João Gonçalves avatar Christoph Sterz avatar Michał Jabczyński avatar Pablo de Castro avatar Davide Kirchner avatar Harry Cutts avatar  avatar  avatar Tommaso Papini avatar José Molina Colmenero avatar  avatar

Watchers

 avatar Petr Novák avatar Yigit avatar Oliver Keller avatar  avatar  avatar Anastasios Andronidis avatar Davide Kirchner avatar James Cloos avatar  avatar Alfonso De Gregorio avatar Harry Cutts avatar  avatar Romain Bazile avatar Michał Jabczyński avatar Tommaso Papini avatar Bilge avatar C For C's Sake avatar

blindstore's Issues

Random performance

Request execution can take from 0.22s to 12s. We must find out what causes this nondeterministic behaviour. Possible suspects:

  • serialization (unlikely)
  • python-C communication layer (unlikely)
  • error in Flask (very unlikely)
  • inherently unstable performance of the homomorphic operations or encryption/decryption (need to benchmark directly in C)

image

Request times (100 iterations):

9.07
0.7799999999999994
12.35
7.059999999999999
4.390000000000001
12.230000000000004
2.1299999999999955
0.21999999999999886
0.22000000000000597
1.0399999999999991
3.289999999999999
1.0300000000000011
2.0700000000000003
4.289999999999999
5.590000000000003
0.8599999999999994
0.6700000000000017
2.559999999999988
0.7400000000000091
7.670000000000002
5.359999999999999
4.239999999999995
2.3700000000000045
3.019999999999996
8.019999999999996
9.429999999999993
6.8500000000000085
2.8799999999999955
1.4500000000000028
1.5600000000000023
9.649999999999991
7.740000000000009
0.37999999999999545
0.27000000000001023
0.4099999999999966
1.2299999999999898
3.539999999999992
0.6500000000000057
0.27000000000001023
1.490000000000009
0.23999999999998067
0.28000000000000114
1.9399999999999977
5.670000000000016
7.6200000000000045
11.849999999999994
3.799999999999983
1.670000000000016
1.210000000000008
0.6599999999999966
6.189999999999998
1.3499999999999943
1.490000000000009
1.6999999999999886
3.0500000000000114
4.149999999999977
4.25
4.550000000000011
1.2299999999999898
1.990000000000009
2.490000000000009
3.7299999999999898
6.460000000000008
1.4099999999999966
8.25
2.75
9.960000000000008
1.8499999999999943
3.2199999999999704
2.2200000000000273
0.660000000000025
3.9799999999999613
0.9700000000000273
0.9599999999999795
4.1200000000000045
1.3600000000000136
0.9300000000000068
1.079999999999984
3.2799999999999727
8.260000000000048
0.9599999999999795
3.069999999999993
3.829999999999984
1.8199999999999932
2.1200000000000045
0.5299999999999727
1.2200000000000273
0.2999999999999545
3.3600000000000136
1.25
7.470000000000027
2.8700000000000045
0.5
2.3799999999999955
0.5299999999999727
2.519999999999982
1.8500000000000227
10.949999999999989
2.0300000000000296

Random number in encryption not chosen exactly like in the paper

In the encryption function of libshe, a random integer in [1, 2^s) is chosen, but following the paper, this should be (-2^s, 2^s). We made this change because decryption failed when using the interval from the paper. We have to investigate why.

 // Chooses random noise r from [1, 2^s)
mpz_class bound;
mpz_ui_pow_ui(bound.get_mpz_t(), 2, s);
auto r = _random_mpz(1, bound-1);

In fact, this issue belongs into the repository of libshe, but as it will be merged into blindstore soon, I decided to document this here.

Code parallelization

Today me, @bogdan-kulynych and @northdpole we had a long conversation about parallelization and optimizations. We broke the code into steps and we concluded on how we should proceed with the subject.

What we found:

  • The XOR operation between the clients' encrypted key and the encrypted vector of 1's, can be moved on the client side
  • We can optimize a lot the needed space for the calculation matrices
  • All intermediate data seems to be independent and thus we can parallelize in a high degree.

How we proceed:

  • Create flexible benchmarking scripts that will create test input date
  • Test each part of the algorithm with different sizes of input to find bottlenecks

`libscarab` entropy source to be reviewed

Since the entropy source for libscarab used for encryption is current time, malicious PIR server can use the obtained Public Key and time information to guess the encrypted bits:

encrypted_bit = query.cipher_index[i]
pk = query.pk
for time in possible_milliseconds:
    guess0 = pk.quasi_encrypt(0, time)
    if guess0 == encrypted_bit:
        print ('Encrypted bit i was 0')
    guess1 == pk.quasi_encrypt(1, time)
    if guess1 = encrypted_bit:
        print ('Encrypted bit i was 1')

Merge libshe into blindstore repository

The library libshe will not be useful for any other application than the Private Information Retrieval for which the algorithm was made. Thus, we decided to merge it into the blindstore repository. Please find a way to preserve the history when merging.

Benchmarks for server

Benchmarks for Blindstore server only for further analysis of performance improvement when we implement parallelization.

Decide on the granularity of the choice of privacy

If we want to expose the possibility to choose the level of privacy, we have to decide on the details.

The simplest approach I see at the moment is to just say which part of the database I want to query, given that the database will be split into n parts of equal size. Thus, the server would expect a query to consist of: [encrypted_index, number of parts to split the database into, part to query] and as an example [encrypted_index, 10, 2]. The server then would calculate the according index range for the third of ten parts and run the cryptographic algorithm just on these indices.

A more sophisticated approach would be to let the client specifiy an index range: [encrypted_index, start_index, end_index]. This provides the possibility to choose the database size more freely.

The most general approach would be to completely define the database subset that should be queried. The client could then send a list of index ranges.

Implement V-DGHV somewhat homomorphic scheme

It appears that the V-DGHV scheme proposed in Single-Database PIR from FHE allows for significant performance boost compared to using FHE as a black box. Using this scheme, we don't need to encrypt the indexes.

To achieve this, we need to marry PIR protocol to this particular HE scheme, so no abstraction on HE is needed anymore.

Decide on licensing

We need to decide on a license for Blindstore. This would need to be compatible with the licenses of the libraries which we use, and the aims and objectives of the project.

Investigate possible bug in set call

During the demos, a set call for row 1 appeared to change two rows (1 and 2) in the database, as displayed on the Web display. I attempted (in a very short and hurried period of time) to reproduce the error on master, but could not. We need to discover whether this was an issue with the demo Web display, changes on the demo branch, or an issue with master.

Collect and provide information about the impact of choosing a lower privacy level

By querying only a certain part of the database, the user leaks information to the database. Thus, the feature we provide by exposing the choice of privacy to the user, has to be used wisely. We should give some guidelines to the user of the library: How can the user decrease his privacy but not expose too much information? How can the user choose the part of the database such that the server learns as less as possible? What should be the minimum size of the part of the database? Should the index range be chosen randomly around the index the user queries?

Abstraction on FHE

If we want to benchmark different FHE libraries, first we would need to abstract the interface with them.

Don't use C-style new and free

Hey guys,
pretty awesome work you're doing. Just a quick remark.

I would strongly advise against C-style new and free as in for example https://github.com/blindstore/blindstore/blob/master/server/array.cpp where you do

blindstore_array_t* blindstore_array_make() {
    return new blindstore_array_t();
}

void blindstore_array_free(blindstore_array_t* blindstore_array) {
    delete blindstore_array;
}

Use C++ classes instead. Destructors are one of the most awesome things C++ has to offer that C doesn't. A possible case where this is important is sketched out below:

auto a = blindstore_array_make();
throw SomeException;
blindstore_array_free(a);

And a is leaked...

The only possible way this could work well that I can think of is if you always use std::unique_ptr (or std::shared_ptr respectively) with the appropriate deleter set as in e.g. (not tested)

using unique_blindstore_array_ptr = std::unique_ptr<blindstore_array_t, [&](blindstore_array_t *arr) { blindstore_array_free(arr); }>;

(Actually the default deleter should work fine but since you define a function wrapper around it that might be used to do more than just delete it might be better to use it as the custom deleter...)

Language choice and parallelization tool

Because the main issue with the current prototype is performance, we should re-consider using only python for implementig the protocol functions. Some brainstorming lead to several proposals:

Language choice

(to be matched with the parallelization choice)

Priority is on ease to write and maintain

  • Stick with Python and C
  • Rust (by Mozilla)
  • Go (by Google)
  • C++ (much less likely)
  • Scala/Java (there's fhe-core library for FHE)
  • Other solutions?

Parallelizing tool

(to be matched with the language choice)

  • Hadoop
  • Apache Storm – any language
  • Apache Spark – Python, Scala or Java
  • Other solutions?

XPIR and Blindstore

Hello all,
I am one of the developers of XPIR and I recently stumbled across your project, which seems to have a ton of good ideas !

As you have probably though a lot about making PIR usable I wondered if you would have some suggestions of ideas we could integrate to XPIR from this project or things that we got completely wrong and that we should rethink. Of course if any of you is interested in joining our project it is feasible too ! On the opposite side, if there are things you would like to get from XPIR we can of course help :)

Best,

Carlos

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.