
downstream-node's Issues

Expanding Test File Size

Below is a rough outline of how to approach scaling the test file size from 100-byte files to maximum disk capacity. We make use of our existing algorithm and libraries. This will allow us to stress-test our network capacity (in terms of drive space), providing meaningful data for later.

1) Generate a Master List of Test Files Using RandomIO

  • Find @EmergentBehavior's existing list.
  • We generate a large list of test files, with each test file being approximately 100 MB in size. This list should contain the parameters needed to generate each file with RandomIO, as well as the hash. We should generate at least 1 PB of data this way. This list will be made public and called the Master List. A generation sketch follows.
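
A minimal sketch of what generating the Master List could look like, assuming RandomIO's file-like seeded-stream API (RandomIO.RandomIO(seed).read(n)); the entry layout and function names here are illustrative, not final:

import hashlib
import json
import os

import RandomIO

FILE_SIZE = 100 * 1024 * 1024  # ~100 MB per test file
PIECE = 1024 * 1024            # hash the stream 1 MiB at a time

def master_list_entry(seed):
    # Stream the seeded file through a hash so we never hold 100 MB in memory.
    stream = RandomIO.RandomIO(seed)
    digest = hashlib.sha256()
    remaining = FILE_SIZE
    while remaining > 0:
        data = stream.read(min(PIECE, remaining))
        digest.update(data)
        remaining -= len(data)
    return {'seed': seed, 'size': FILE_SIZE, 'hash': digest.hexdigest()}

def build_master_list(count, path='master_list.json'):
    # 1 PB at ~100 MB per file is on the order of 10 million entries.
    entries = [master_list_entry(os.urandom(16).hex()) for _ in range(count)]
    with open(path, 'w') as f:
        json.dump({'files': entries}, f, indent=2)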

2) Generate Hash Challenges Using Heartbeat

For simplicity and scale we will use deterministic hash challenges, as described in the whitepaper, though we won't use Merkle trees. Generating a pool of hash challenges for each file on the Master List will take a very long time and some compute power. This list must remain private.
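
A hedged sketch of how a pool might be built with Storj's heartbeat library, assuming its generic scheme interface (encode / gen_challenge, the Swizzle scheme shown here as one option) and RandomIO's genfile; the pool size, serialization, and storage format are placeholders:

import heartbeat
import RandomIO

POOL_SIZE = 1000  # challenges pre-generated per Master List file

def challenge_pool(entry):
    # Regenerate the test file from its public seed, then tag it.
    path = RandomIO.RandomIO(entry['seed']).genfile(entry['size'])
    beat = heartbeat.Swizzle.Swizzle()
    with open(path, 'rb') as f:
        (tag, state) = beat.encode(f)
    # Derive a pool of deterministic challenges; everything except the tag
    # stays private on the verification node.
    challenges = [beat.gen_challenge(state) for _ in range(POOL_SIZE)]
    return (tag, state, challenges)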

3) Integrate into Downstream Node

Verification nodes must store these pools of challenges in a DB, and use excess CPU capacity to refresh the challenges file by file over time. Farmers will generate and store as much of the Master List as they can. Farmers will all have the same files, but the verification node will issue a random challenge from the pool for each file a farmer stores, so that each farmer will provide a unique response.

4) Integrate into Downstream Farmer

Farmers simply need to be able to use the updated heartbeat to respond to challenges.
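
A sketch of the farmer side, under the same heartbeat API assumptions as above; `beat` here would be the (public) heartbeat object handed out by the node:

def answer_challenge(beat, file_path, challenge, tag):
    # Produce a proof for one challenge over a locally stored test file;
    # challenge and tag arrive with the chunk contract.
    with open(file_path, 'rb') as f:
        return beat.prove(f, challenge, tag)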

Ensure Data is Statistically Random

Will we allow unencrypted content onto the Storj network, especially if it is below the smallest chunking size? I think we should consider doing a statistical randomness test of chunks.
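
As an illustration only, such a test could be a chi-squared check on byte frequencies: encrypted (or RandomIO-generated) chunks should have near-uniform byte counts, while most plaintext will not. The threshold below is the approximate 99th percentile of the chi-squared distribution with 255 degrees of freedom:

from collections import Counter

CHI2_255_P99 = 310.46  # approx. 99th percentile, chi-squared, df = 255

def looks_random(chunk):
    # Compare observed byte counts against a uniform distribution.
    if len(chunk) < 256 * 5:
        raise ValueError('chunk too small for a meaningful test')
    expected = len(chunk) / 256.0
    counts = Counter(chunk)
    chi2 = sum((counts.get(b, 0) - expected) ** 2 for b in range(256)) / expected
    return chi2 < CHI2_255_P99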

Removing Test File from Repo

You really aren't supposed to put large files in a GitHub repo (it's pretty slow, too). Perhaps instead we can have the user wget it from a web server, or have it downloaded automatically in the install/setup process.
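
For instance, the setup step could fetch it (URL hypothetical):

$ wget -O tests/thirty-two_meg.testfile https://files.example.com/thirty-two_meg.testfile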

Node Status API

This is the API spec for a node. It is required to produce the following information for each farmer:

  1. Farmer ID (token hash)
  2. SJCX address
  3. Geographic Location
  4. Percentage uptime since creation
  5. Number of heartbeats completed
  6. Whether the farmer is currently online
  7. A hash of the IP address of the farmer
  8. Total data size hosted by the farmer in bytes

The list of all farmer IDs can be retrieved with

GET /api/downstream/status/list

which returns all the farmer IDs:

{
   "farmers":
   [
      "fa1e4944e48ed7bd3739",
      "997e717ba92078118cce",
      "0f297828e2a687943fc4",
      "81b6a0d841a3184028e6",
      "49eb47ea315d53399f69",
      "b2ca01ff2113559b231d",
      "68ff46d440255ac29a3c",
      "479935fca9ce02f62788",
      "33d63de99f0aad6279ba",
      "8088d59b6adf8faf9974",
      "ddbeb08b93b1d06e9939",
      "e7a5558e62d315a54058",
      "1de67bc29901db705ea1",
      "e94902cda505de115027",
      "c9c1f91a6362af0babad",
      "e87f8117d0d10a8d6479",
      "5f652023fb6b8034fb5c",
      "b05bd26f9f28035a3006",
      "e78d8f0edd1a50bce83b",
      "dc4ce32c7c8a7d0a3cb9"
   ]
}

Optionally, one may sort in ascending order by id, address, uptime, heartbeats, iphash, contracts, size, or online by using

GET /api/downstream/status/list/by/<sortby>

or in descending order

GET /api/downstream/status/list/by/d/<sortby>

It is also possible to limit the number of responses

GET /api/downstream/status/list/by/<sortby>/<limit>

and specify a page number

GET /api/downstream/status/list/by/<sortby>/<limit>/<page>

So, some examples:

GET /api/downstream/status/list/by/d/uptime/25

will return the 25 farmers with the highest uptime percentage

GET /api/downstream/status/list/by/d/contracts/15/2

will return the third page (rows 30-44) of the farmers with the most contracts.

And then the individual farmer information can be retrieved with:

GET /api/downstream/status/show/<token_hash>
{
      "id": "45bd945fa10e3f059834",
      "address": "18d6KhnTg9dM9jtb1MWXdbibu3Pwt1QHQt",
      "location": {"name": "West Jerusalem", "country": "Israel", "lon": "35.21961", "zipcode": "", "state": "Jerusalem District", "lat": "31.78199"},
      "uptime": 0.96015,
      "heartbeats": 241,
      "iphash": "d55529c83953e218cc58",
      "contracts": 2,
      "size": 200,
      "online": true
}

Planning on making the id the first 20 characters of the hex representation of the token hash.
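
A minimal sketch of that derivation, assuming the token hash is SHA-256 (the issue doesn't pin down the hash function):

import hashlib

def farmer_id(token):
    # First 20 hex characters of the token hash.
    return hashlib.sha256(token.encode()).hexdigest()[:20]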

Will probably cache the farmer list on the server side to improve performance.

Currently planning on using geodis or GeoIP for geographic resolution.

Modifications for test file verification prototype

Node library functions

This modification will add the following methods to the node library:

create_token(sjcx_address)
    """Creates a token for the given address. For now, addresses will not be enforced, and anyone
    can acquire a token.

    :param sjcx_address: address to use for token creation.  for now, just allow any address.
    :returns: the token
    """

delete_token(token)
    """Deletes the given token.

    :param token: token to delete
    """

get_chunk_contract(token)
    """In the final version, this function should analyze currently available file chunks and
    disburse contracts for files that need higher redundancy counts.
    In this prototype, this function should generate a random file with a seed.  The seed
    can then be passed to a prototype farmer who can generate the file for themselves.
    The contract will include the next heartbeat challenge, and the current heartbeat state
    for the encoded file.

    :param token: the token to associate this contract with
    :returns: the chunk
    """

verify_proof(token, file_hash, proof)
    """This queries the DB to retrieve the heartbeat, state and challenge for the contract id, and 
    then checks the given proof.  Returns true if the proof is valid.

    :param token: the token for the farmer that this proof corresponds to
    :param file_hash: the file hash for this proof
    :param proof: a heartbeat proof object that has been returned by the farmer
    :returns: boolean true if the proof is valid, false otherwise
    """

Database Tables

To support these functions, the database models have to be modified to include contracts and tokens tables. A sketch follows.
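
A minimal sketch of the two tables, assuming the node keeps using Flask-SQLAlchemy; column names and types here are illustrative:

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Token(db.Model):
    __tablename__ = 'tokens'
    token = db.Column(db.String(32), primary_key=True)
    sjcx_address = db.Column(db.String(35), nullable=False)

class Contract(db.Model):
    __tablename__ = 'contracts'
    id = db.Column(db.Integer, primary_key=True)
    token = db.Column(db.String(32), db.ForeignKey('tokens.token'))
    file_hash = db.Column(db.String(64), nullable=False)
    seed = db.Column(db.String(32), nullable=False)  # RandomIO seed
    state = db.Column(db.LargeBinary)                # serialized heartbeat state
    challenge = db.Column(db.LargeBinary)            # serialized current challenge
    expiration = db.Column(db.DateTime)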

HTTP Routes

Additionally the following prototype routes should be exposed for the public API:

Get a new token for a given address. For now, don't check address, just return a token.

GET /api/downstream/new/<sjcx_address>

Response:

{
    "token": "ceb722d954ef9d1af3eed2bbe0aeb954",
    "heartbeat": "...heartbeat object string representation..."
}

Get a new chunk contract for a token. Only allow one contract per token for now. Returns the first challenge and expiration, the file hash, a seed for generation of the prototype file, and the file heartbeat tag.

GET /api/downstream/chunk/<token>

Response:

{
    "challenge": "...challenge object string representation...",
    "expiration": "2014-10-03 17:29:01",
    "file_hash": "012fb25d2f14bb31bcbad5b8d99703114ed970601b21142c93b50421e8ddb0d7",
    "seed": "70aacdc6a2f7ef0e7c1effde27299eda",
    "tag": "...tag object string representation..."
}

Gets the currently due challenge for this token and file hash.

GET /api/downstream/challenge/<token>/<file_hash>

Response:

{
   "challenge": "...challenge object string representation...",
   "expiration": "2014-10-03 17:29:01",
}

Posts an answer for the current challenge on token and file hash.

POST /api/downstream/answer/<token>/<file_hash>

Parameters:

{
    "proof": "...proof object string representation..."
}

Response:

{
    "status": "ok"
}
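
Putting the routes together, a hedged end-to-end sketch from the farmer's side using requests; the base URL is the dev default, and build_proof is a hypothetical helper standing in for the RandomIO/heartbeat steps above:

import requests

BASE = 'http://localhost:5000/api/downstream'
ADDRESS = '18d6KhnTg9dM9jtb1MWXdbibu3Pwt1QHQt'  # any address, unchecked for now

token = requests.get('%s/new/%s' % (BASE, ADDRESS)).json()['token']
chunk = requests.get('%s/chunk/%s' % (BASE, token)).json()

# Regenerate the file from chunk['seed'] with RandomIO, then build a proof
# for chunk['challenge'] and chunk['tag'] with heartbeat.
proof = build_proof(chunk)  # hypothetical helper

resp = requests.post('%s/answer/%s/%s' % (BASE, token, chunk['file_hash']),
                     json={'proof': proof})
print(resp.json())  # expect {"status": "ok"}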

Downstream Farmer

Downstream farmer will also need to be modified to interface with this new prototype node.

Handling Scale Issues

In terms of allowing people to farm, these are the options I was thinking about for limiting access:

  1. Crowdsale addresses with over 10k SJCX are allowed to farm
  2. All crowdsale addresses are allowed to farm
  3. All addresses with more than 10k SJCX are allowed to farm

Even with this limiting factor I think we might have some scale issues. We may or may not have DDoS attacks, but what I foresee more is someone trying to scale up a bunch of virtual nodes to farm. With limiting factors based on total currency supply we can have a maximum of around 5k farmers. I estimate around 1k farmers at the peak.

So I think some testing needs to be done by spinning up some virtual farmers locally, before we start adding limiting factors (a rough harness is sketched after this list).

  • How does the verify node handle 100 farmers?
  • How does the verify node handle 500 farmers?
  • How does the verify node handle 1000 farmers?
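
A rough local harness for those questions, spinning up virtual farmers as threads against the prototype routes; counts, addresses, and timing details are placeholders:

import threading
import time

import requests

BASE = 'http://localhost:5000/api/downstream'

def virtual_farmer(n, results):
    # One fake farmer: request a token, then a chunk contract.
    start = time.time()
    try:
        r = requests.get('%s/new/fake-address-%d' % (BASE, n), timeout=30)
        token = r.json()['token']
        requests.get('%s/chunk/%s' % (BASE, token), timeout=30)
        results[n] = time.time() - start
    except Exception:
        results[n] = None  # count as a failure

def run(farmers=100):
    results = {}
    threads = [threading.Thread(target=virtual_farmer, args=(i, results))
               for i in range(farmers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ok = [v for v in results.values() if v is not None]
    print('%d/%d succeeded, avg %.2fs' %
          (len(ok), farmers, sum(ok) / max(len(ok), 1)))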

Regardless, I think since the dashboard is a separate interface, we can spin up multiple verify nodes as long as we have some kind of pattern, and the dashboard will automatically detect and add their stats. My suggestion is that once the codebase is solid we craft a Digital Ocean image, so I can launch more nodes with a couple of clicks.

  • Build Digital Ocean Image

Missing Install Step?

On Downstream-Farmer:

> downstream --verify-ownership tests/thirty-two_meg.testfile 'http://localhost:5000'
Fetching challenges...
Received 1000 challenge(s).
Verifying ownership...
Verifying local file tests/thirty-two_meg.testfile.
Error: tests/thirty-two_meg.testfile is not a valid file

On Downstream-Node:

DEBUG in routes [/home/super3/Code/downstream-node/downstream_node/routes.py:51]:
No entry in database for file thirty-two_meg.testfile; generating challenes

then

DEBUG in routes [/home/super3/Code/downstream-node/downstream_node/routes.py:45]:
Fetching challenges for thirty-two_meg.testfile

README update

Get downstream-node:

$ git clone https://github.com/Storj/downstream-node.git
$ cd downstream-node
$ pip install -r requirements.txt .

requirements.txt not found

cp downstream_node/config.py.template downstream_node/config.py

wrong

Result:

$ python runapp.py --initdb
Traceback (most recent call last):
  File "runapp.py", line 11, in <module>
    import base58
ImportError: No module named 'base58'

Can you update the README?
