EthereumSkipIndex

Description

This repository includes all code used to produce the results presented in [1]. The paper presents the BF skip index, a data structure based on Bloom filters [2] that can be used for answering inter-block queries on blockchains efficiently.

Overview

The BF skip index was implemented taking the Ethereum blockchain as a reference. Indeed, each Ethereum block already contains a Bloom filter within its header, in order to provide a compact representation of the events raised by the transactions within the block. The experiments proposed in the article (and implemented in this repository) serve a dual purpose.

First, we conducted an historical analysis of events within the first 15 million blocks of the Ethereum blockchain. The goal was to evaluate the feasibility of using default logsBloom filters for building a BF skip index.
Then, we tested our implementation of BF skip indexes and the corresponding algorithm for finding the first occurrence of an event using event data from the CryptoKitties Core smart contract (which can be viewed here).

Data

Due to space constraints, all data needed to reproduce the results is stored in this Zenodo repository. Before running the experiments, all downloaded files should be placed in the data folder of this GitHub repository.

The linked Zenodo repository also includes a detailed description of the data set and its structure.

Documentation

The entire project was implemented using Java and Python. Specifically, all code for constructing Bloom filters, BF skip indexes, and for the simulation of search algorithms was written in Java. Python was used for query generation and result analysis. In particular, Jupyter Notebooks were used to produce and visualize the results interactively.

Requirements

The code was tested using the following software and libraries, which are required for reproducing all experiments.

Oracle Java JDK (v. 19.0.1)
Python (v. 3.8.9) with the following libraries:
- Jupyter Notebook (v. 6.5.2)
- Pandas (v. 1.5.2)
- NumPy (v. 1.23.5)
- Matplotlib (v. 3.6.2)
- SciPy (v. 1.9.3)

The Java implementation depends on the following libraries (already included in the lib folder).

Google Guava (v. 31.1)
LevelDB JNI (v. 1.8)

Hardware

The experiments have been executed on the following hardware.

The Java code has been tested on a machine running Ubuntu Linux, with an 8-core Intel Xeon 5218 CPU @ 2.3 GHz and 256 GB of RAM.
The data analysis with Jupyter Notebooks has been carried out on an Apple MacBook Air with a dual-core Intel i7 CPU @ 1.7 GHz and 8 GB of RAM.

Important: to run the Java simulations we recommend at least 150 GB of free disk space.

How to compile

To build all Java classes with ease, you can use the supplied makefile. Just open a terminal in the current folder and type make.

If the make utility is not available in your system, typing the following command from the root directory of the repository should be sufficient.

javac -cp ".:./lib/*" src/skip/*.java -d bin/

Javadoc

To generate Javadoc for the source code, you can use the supplied makefile. Type make doc and the documentation will be placed in the doc folder of the repository. If make is not available, type the following command:

javadoc -cp ".:./lib/*" src/skip/*.java -d doc/

How to run

This section contains the instructions needed to reproduce the experiments presented in the paper. The results of our experiments are reported in Sections 8.1, 8.2 and 8.3 of the paper. The suggested order for running the experiments aligns with the order in which they are presented in the paper.

Important. First, make sure you have all the downloaded files from the Zenodo repository in the data folder. Also, make sure that you have extracted the required compressed files as described here.

Filter analysis. To obtain all results of Section 8.1, open the filters.ipynb notebook and execute all cells. This should create 4 output plots (i.e., Figure 5 in the paper) that will be placed in the pictures folder. The plots are as follows.

File	Description
`pictures/dist_ones.pdf`	Plot of Figure 5(a)
`pictures/temporal_ones.pdf`	Plot of Figure 5(b)
`pictures/dist_keys.pdf`	Plot of Figure 5(c)
`pictures/temporal_keys.pdf`	Plot of Figure 5(d)

Index analysis. To obtain all results of Section 8.2, open the skip.ipynb notebook and execute all cells. This should create 5 output plots and 2 CSV files containing queries for the simulation. Plots will be placed in the pictures, while CSV files will be saved in the data directory.

File	Description
`pictures/bf-sparse.pdf`	Plot of Figure 6(a)
`pictures/bf-normal.pdf`	Plot of Figure 6(b)
`pictures/bf-saturated.pdf`	Plot of Figure 6(c)
`pictures/cryptokitties_frequency_cumul.pdf`	Plot of Figure 7(a)
`pictures/cryptokitties_frequency_perc.pdf`	Plot of Figure 7(b)
`data/queries_birth.csv`	Data set of queries for the Birth event
`data/queries_transfer.csv`	Data set of queries for the Transfer event

Query analysis. Once the query data sets have been generated, you can execute the following commands starting from the main directory of the repository. Notice that the execution of the Bash scripts may take some time, as they perform intensive computations (e.g., index construction and multiple query simulations) on a data set of 1 million Ethereum blocks.

Run build_filters.sh to build the Bloom filters for the entire data set.
Run build_index.sh to construct the BF skip index.
Run build_storage.sh to construct the event storage database.
Run test_query.sh to launch the query simulation procedure.
Open the query.ipynb notebook and execute all cells. This will analyze the results of the previous steps.

These steps should create the following output files and directories. Note that the BF skip indexes of all blocks are stored in LevelDB key-value databases. The four plots created in the pictures constitute the content of Figure 8.

File	Description
`data/filters_8K`	Binary file containing all Bloom filters
`data/filters_8K_m`	Binary file containing all modified Bloom filters
`data/index_8K_7`	Directory of the BF skip index LevelDB database
`data/index_8K_7_m`	Directory of the BF skip index LevelDB database (with modified filters)
`data/storage`	Directory of the event storage LevelDB database
`data/queries_birth_res.csv`	Results of queries for the Birth event
`data/queries_birth_res_m.csv`	Results of queries for the Birth event (with modified filters)
`data/queries_transfer_res.csv`	Results of queries for the Transfer event
`data/queries_transfer_res_m.csv`	Results of queries for the Transfer event (with modified filters)
`pictures/query_birth.pdf`	Plot of Figure 8(a)
`pictures/query_transfer.pdf`	Plot of Figure 8(b)
`pictures/query_birth_m.pdf`	Plot of Figure 8(c)
`pictures/query_transfer_m.pdf`	Plot of Figure 8(d)

Inter-block time. The time.ipynb notebook contains the calculations for deriving the average inter-block time discussed in Section 8.2.

References

Loporchio, Matteo et al. "Skip index: supporting efficient inter-block queries and query authentication on the blockchain." (2023).
Bloom, Burton H. "Space/time trade-offs in hash coding with allowable errors." Communications of the ACM 13.7 (1970): 422-426.
Wood, Gavin. "Ethereum: A secure decentralised generalised transaction ledger." Ethereum project yellow paper 151.2014 (2014): 1-32.

mloporchio / ethereumskipindex Goto Github PK