
This project forked from cyberfund/ethdrain


Ethdrain

A Python 3 script that copies and indexes the Ethereum blockchain efficiently into ElasticSearch, PostgreSQL, or CSV by connecting to a local node that supports RPC (tested with Parity).
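
As a rough illustration of what "connecting to a local node supporting RPC" involves, here is a minimal sketch of the JSON-RPC payload for fetching one block. `eth_getBlockByNumber` is the standard Ethereum JSON-RPC method; the helper name `make_block_request` is hypothetical and is not taken from the script.

```python
import json


def make_block_request(block_number, request_id=1):
    """Build a JSON-RPC payload asking for one block (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        # The block number is passed as a hex string; True asks the node
        # for full transaction objects instead of only their hashes.
        "params": [hex(block_number), True],
        "id": request_id,
    }


payload = make_block_request(5000)
print(json.dumps(payload))
```

The payload would then be POSTed to the node's RPC URL (the `-r` option); the HTTP client is left out so the sketch runs without a node.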

I hardcoded the use of Elasticsearch but feel free to fork it to support others.

Pull requests are welcome!

As of now, this tool saves all block data as well as the related transaction data. The relation is kept as follows:

  • "ethereum-block" documents have:
    • a "transactionCount" property, with the count of their respective transactions.
    • a "transactions" property, which is an "array" of transaction hashes.
  • "ethereum-transaction" documents have:
    • a "blockNumber" property, referring to their parent block.
    • a "blockTimestamp" property, referring to their parent block's timestamp.

The value of a transaction is stored in ether, not in wei, and a sum is available at block level ("txValueSum" field).
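
The relation described above can be sketched as follows. This is an illustration only, not the script's actual code: `split_block` is a hypothetical helper, the input is assumed to have the shape returned by `eth_getBlockByNumber` with full transactions, and the `number` and `hash` keys are assumptions. The other property names come from the list above.

```python
WEI_PER_ETHER = 10**18


def split_block(raw_block):
    """Turn one raw RPC block into a block document and its transaction documents."""
    number = int(raw_block["number"], 16)
    timestamp = int(raw_block["timestamp"], 16)

    tx_docs = []
    for tx in raw_block["transactions"]:
        tx_docs.append({
            "hash": tx["hash"],
            # Back-references to the parent block.
            "blockNumber": number,
            "blockTimestamp": timestamp,
            # Value is stored in ether rather than wei.
            "value": int(tx["value"], 16) / WEI_PER_ETHER,
        })

    block_doc = {
        "number": number,
        "transactionCount": len(tx_docs),
        "transactions": [tx["hash"] for tx in tx_docs],
        "txValueSum": sum(tx["value"] for tx in tx_docs),
    }
    return block_doc, tx_docs


raw = {
    "number": "0x10",
    "timestamp": "0x5",
    "transactions": [
        {"hash": "0xaa", "value": "0xde0b6b3a7640000"},   # 1 ether in wei
        {"hash": "0xbb", "value": "0x1bc16d674ec80000"},  # 2 ether in wei
    ],
}
block_doc, tx_docs = split_block(raw)
print(block_doc)
```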

The following fields are converted from their hexadecimal string value to a number:

  • Block number
  • Gas limit
  • Gas used
  • Size
  • Transaction value
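
A minimal sketch of that conversion for the block-level fields, assuming the JSON-RPC field names `number`, `gasLimit`, `gasUsed` and `size`. The helper `convert_hex_fields` is hypothetical; the script may do this differently.

```python
# Block fields that arrive as hex strings (names assumed from the JSON-RPC response).
BLOCK_HEX_FIELDS = ("number", "gasLimit", "gasUsed", "size")


def convert_hex_fields(doc, fields=BLOCK_HEX_FIELDS):
    """Return a copy of doc with the given hex-string fields parsed as integers."""
    converted = dict(doc)
    for name in fields:
        if name in converted:
            converted[name] = int(converted[name], 16)
    return converted


raw = {"number": "0x1388", "gasLimit": "0x1388", "gasUsed": "0x0",
       "size": "0x21c", "hash": "0xabc"}
block = convert_hex_fields(raw)
print(block)
```

Fields not listed, such as the block hash, are left untouched as strings.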

I was able to download the entire blockchain (approx. 3'400'000 blocks) using this tool.

You can customize most of the useful parameters by tweaking the constants in the script:

# Elasticsearch maximum number of connections
ES_MAXSIZE = 10
# Parallel processing semaphore size
SEM_SIZE   = 256
# Chunk size, in blocks
CHUNK_SIZE = 500
# Size of multiprocessing Pool processing the chunks
POOL_SIZE  = 8
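
To illustrate what CHUNK_SIZE controls, here is a sketch of splitting a block range into chunks. `chunk_range` is a hypothetical helper, and treating the end block as inclusive is an assumption rather than something confirmed by the script.

```python
CHUNK_SIZE = 500


def chunk_range(start, end, size=CHUNK_SIZE):
    """Yield lists of consecutive block numbers covering start..end (inclusive)."""
    for first in range(start, end + 1, size):
        yield list(range(first, min(first + size, end + 1)))


chunks = list(chunk_range(0, 1200))
print([len(chunk) for chunk in chunks])
```

Per the constants above, each chunk would then be processed by a multiprocessing Pool of POOL_SIZE workers, so a larger CHUNK_SIZE means fewer but longer tasks.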

Basic examples

# Indexing blocks 0 to 5000
> ./ethdrain.py -s 0 -e 5000

# Indexing blocks 3'000'000 to the latest one
> ./ethdrain.py -s 3000000

# Starting from the latest block indexed by ES, indexing up to block 3'500'000
> ./ethdrain.py -e 3500000

# Automatic mode (could be used in a cron job).
# Starting from the latest block indexed by ES to the latest one available on the local node
> ./ethdrain.py
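
The defaults used in these examples can be summarized in a small sketch. `resolve_range` and its keyword arguments are hypothetical names; in the real script the two fallback values would come from Elasticsearch and from the local node.

```python
def resolve_range(start=None, end=None, *, latest_indexed, latest_on_node):
    """Pick the block range to index, falling back to the documented defaults."""
    # No -s given: resume from the latest block already indexed.
    first = latest_indexed if start is None else start
    # No -e given: go up to the latest block available on the node.
    last = latest_on_node if end is None else end
    return first, last


print(resolve_range(0, 5000, latest_indexed=100, latest_on_node=9000))
print(resolve_range(latest_indexed=100, latest_on_node=9000))
```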

Continuous sync

To keep the index continuously in sync with the blockchain, run the script without any parameters under the watch command:

# Indexes the missing blocks into Elasticsearch every 10 seconds
watch -n 10 ./ethdrain.py

Usage

>  ./ethdrain.py -h
usage: ethdrain.py [-h] [-s START_BLOCK] [-e END_BLOCK] [-f FILE] [-u ESURL]
                   [-m ESMAXSIZE] [-r ETHRPCURL]

optional arguments:
  -h, --help            show this help message and exit
  -s START_BLOCK, --start START_BLOCK
                        What block to start indexing. If nothing is provided,
                        the latest block indexed will be used.
  -e END_BLOCK, --end END_BLOCK
                        What block to finish indexing. If nothing is provided,
                        the latest one will be used.
  -f FILE, --file FILE  Use an input file, each block number on a new line.
  -es ESURL, --esurl ESURL
                        The elasticsearch url and port. Accepts all the same
                        parameters needed as a normal Elasticsearch client
                        expects.
  -m ESMAXSIZE, --esmaxsize ESMAXSIZE
                        The elasticsearch max chunk size.
  -pg POSTGRESURL, --postgresurl POSTGRESURL
                        The PostgreSQL url and port. Accepts all the same
                        parameters needed as a normal PostgreSQL client
                        expects.
  -r ETHRPCURL, --ethrpcurl ETHRPCURL
                        The Ethereum RPC node url and port.
  -o OUTPUT, --output OUTPUT
                        System for output data from Ethereum (may be:
                        "postgres", "elasticsearch","csv").

Benchmarks

A few benchmarks for copying blocks and all their related transactions into ElasticSearch, run on an Intel i7-6700 @ 4GHz with 32GB of RAM. The POOL_SIZE parameter was set to 12.

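| # | start block | end block | # of blocks | time taken (in minutes) |
|---|-------------|-----------|-------------|-------------------------|
| 1 | 0           | 500'000   | 500'000     | 16                      |
| 2 | 500'000     | 2'000'000 | 1'500'000   | 87                      |
| 3 | 2'000'000   | 3'000'000 | 1'000'000   | 60                      |
| 4 | 3'000'000   | 3'475'450 | 475'440     | 39                      |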
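
For comparison across runs, the figures above can be turned into a throughput in blocks per minute. This is plain arithmetic on the reported block counts and durations, not an additional measurement; `blocks_per_minute` is a helper name introduced here.

```python
# (number of blocks, minutes taken) for each benchmark run, as reported above
benchmarks = [(500_000, 16), (1_500_000, 87), (1_000_000, 60), (475_440, 39)]


def blocks_per_minute(blocks, minutes):
    """Average indexing throughput for one run."""
    return blocks / minutes


rates = [blocks_per_minute(blocks, minutes) for blocks, minutes in benchmarks]
for (blocks, minutes), rate in zip(benchmarks, rates):
    print(f"{blocks:>9} blocks in {minutes:>2} min: {rate:,.0f} blocks/min")

total_blocks = sum(blocks for blocks, _ in benchmarks)
total_minutes = sum(minutes for _, minutes in benchmarks)
print(f"overall: {total_blocks / total_minutes:,.0f} blocks/min")
```

Throughput drops in the later ranges, which would be consistent with later blocks carrying more transactions, though the benchmarks themselves do not state a cause.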

Planned features

  • Support of a graph database
  • Indexing of addresses / contracts

To-do

  • Friendly help/doc inside the script
  • Add unit tests

Contributors

vkobel, toadkicker, snedashkovsky
