
hydra-booster's Introduction

Warning

This repo was archived because Protocol Labs no longer operates Hydra Boosters for the IPFS network.

For more information, see: https://discuss.ipfs.tech/t/dht-hydra-peers-dialling-down-non-bridging-functionality-on-2022-12-01/15567


Hydra Booster Mascot

Hydra Booster


A DHT Indexer node & Peer Router

A new type of DHT node designed to accelerate content resolution and content providing on the IPFS network. A (cute) Hydra with one belly full of records and many heads (Peer IDs) to tell other nodes about them, charged with rocket boosters to transport other nodes to their destination faster.

Read the RFC - Kanban

Install

[openssl support (lower CPU usage)]
go get -tags=openssl github.com/libp2p/hydra-booster

[standard (sub-optimal)]
go get github.com/libp2p/hydra-booster

Usage

Run a hydra booster with a single head:

go run ./main.go

Pass the -nheads=N option to run N heads at a time in the same process. It periodically prints out a status line with information about total peers, uptime, and memory usage.

go run ./main.go -nheads=5

Alternatively you can use the HYDRA_NHEADS environment var to specify the number of heads.
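
For example, the following should be equivalent to passing -nheads=5:

HYDRA_NHEADS=5 go run ./main.go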

There's also a nicer UI option, intended to be run in a tmux window (or similar) so you can see statistics about your contribution to the network. Use the -ui-theme flag to switch to it:

go run ./main.go -ui-theme=gooey # also "none" to turn off logging

Flags

Usage of hydra-booster:
  -bootstrap-conc int
        How many concurrent bootstraps to run (default 32)
  -bootstrap-peers string
        A CSV list of peer addresses to bootstrap from.
  -bucket-size int
        Specify the bucket size, note that for some protocols this must be a specific value i.e. for "/ipfs" it MUST be 20 (default 20)
  -db string
        Datastore directory (for LevelDB store) or postgresql:// connection URI (for PostgreSQL store)
  -provider-store
        A non-default provider store to use (currently only supports "dynamodb,table=<string>,ttl=<ttl-in-seconds>,queryLimit=<int>").
  -disable-db-create
        Don't create table and index in the target database (default false).
  -disable-prefetch
        Disables pre-fetching of discovered provider records (default false).
  -disable-prov-counts
        Disable counting provider records for metrics reporting (default false).
  -disable-prov-gc
        Disable provider record garbage collection (default false).
  -disable-providers
        Disable storing and retrieving provider records, note that for some protocols, like "/ipfs", it MUST be false (default false).
  -disable-values
        Disable storing and retrieving value records, note that for some protocols, like "/ipfs", it MUST be false (default false).
  -enable-relay
        Enable libp2p circuit relaying for this node (default false).
  -httpapi-addr string
        Specify an IP and port to run the HTTP API server on (default "127.0.0.1:7779")
  -idgen-addr string
        Address of an idgen HTTP API endpoint to use for generating private keys for heads
  -mem
        Use an in-memory database. This overrides the -db option
  -metrics-addr string
        Specify an IP and port to run Prometheus metrics and pprof HTTP server on (default "127.0.0.1:9758")
  -name string
        A name for the Hydra (for use in metrics)
  -nheads int
        Specify the number of Hydra heads to create. (default -1)
  -port-begin int
        If set, begin port allocation here (default -1)
  -protocol-prefix string
        Specify the DHT protocol prefix (default "/ipfs")
  -pstore string
        Peerstore directory for LevelDB store (defaults to in-memory store)
  -random-seed string
        Seed to use to generate IDs (useful if you want to have persistent IDs). Should be Base64 encoded and 256bits
  -id-offset
        What offset in the sequence of keys generated from random-seed to start from
  -stagger duration
        Duration to stagger nodes starts by
  -ui-theme string
        UI theme, "logey", "gooey" or "none" (default "logey")

Environment variables

Alternatively, some flags can be set via environment variables. Note that flags take precedence over environment variables.
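
For example, because flags win over environment variables, the following starts 5 heads, not 2:

HYDRA_NHEADS=2 go run ./main.go -nheads=5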

  HYDRA_BOOTSTRAP_PEERS string
        A CSV list of peer addresses to bootstrap from.
  HYDRA_DB string
        Datastore directory (for LevelDB store) or postgresql:// connection URI (for PostgreSQL store)
  HYDRA_PSTORE string
        Peerstore directory for LevelDB store (defaults to in-memory store)
  HYDRA_PROVIDER_STORE string
        A non-default provider store to use (currently only supports "dynamodb,table=<string>,ttl=<ttl-in-seconds>,queryLimit=<int>").
  HYDRA_DISABLE_DBCREATE
        Don't create table and index in the target database (default false).
  HYDRA_DISABLE_PREFETCH
        Disables pre-fetching of discovered provider records (default false).
  HYDRA_DISABLE_PROV_COUNTS
        Disable counting provider records for metrics reporting (default false).
  HYDRA_DISABLE_PROV_GC
        Disable provider record garbage collection (default false).
  HYDRA_IDGEN_ADDR string
        Address of an idgen HTTP API endpoint to use for generating private keys for heads
  HYDRA_NAME string
        A name for the Hydra (for use in metrics)
  HYDRA_NHEADS int
        Specify the number of Hydra heads to create. (default -1)
  HYDRA_PORT_BEGIN int
        If set, begin port allocation here (default -1)
  HYDRA_RANDOM_SEED string
        Seed to use to generate IDs (useful if you want to have persistent IDs). Should be Base64 encoded and 256bits   
  HYDRA_ID_OFFSET int
        What offset in the sequence of keys generated from random-seed to start from

Best Practices

Only run a hydra-booster on machines with public IP addresses. Having more DHT nodes behind NATs makes DHT queries slower in general, as connecting to them takes longer and sometimes doesn't work at all (resulting in a timeout).

When running with -nheads, please make sure to bump the ulimit to something fairly high. Expect ~500 connections per node you're running (so with -nheads=10, try setting ulimit -n 5000)

Running Multiple Hydras

The total number of heads a single Hydra can have depends on the resources of the machine it's running on. To get the desired number of heads you may need to run multiple Hydras on multiple machines. There are a couple of challenges with this:

  • Peer IDs of Hydra heads are balanced in the DHT. When running multiple Hydras it's necessary to designate one Hydra as the "idgen server" and the rest as "idgen clients" so that all Peer IDs in the Hydra swarm remain balanced. Point the clients at the server using the -idgen-addr flag or HYDRA_IDGEN_ADDR environment variable (see the combined example after this list).
  • A datastore is shared by all Hydra heads but not by all Hydras. Use the -db flag or HYDRA_DB environment variable to specify a PostgreSQL database connection string that can be shared by all Hydras in the swarm.
  • When sharing a datastore between multiple Hydras, ensure only one Hydra in the swarm is performing GC on provider records by using the -disable-prov-gc flag or HYDRA_DISABLE_PROV_GC environment variable, and ensure only one Hydra is counting the provider records in the datastore by using the -disable-prov-counts flag or HYDRA_DISABLE_PROV_COUNTS environment variable.
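
For example, a two-Hydra deployment sharing one PostgreSQL datastore might be wired up roughly like this (a sketch only: hostnames and the connection string are placeholders, and the exact address format expected by -idgen-addr should be verified against your version):

# Hydra 1: designated idgen server; the only Hydra doing provider record GC and counting
./hydra-booster -name=hydra-1 -nheads=100 \
    -db=postgresql://user:pass@db.example.com/hydra \
    -httpapi-addr=0.0.0.0:7779

# Hydra 2: idgen client; provider record GC and counting disabled
./hydra-booster -name=hydra-2 -nheads=100 \
    -db=postgresql://user:pass@db.example.com/hydra \
    -idgen-addr=hydra-1.example.com:7779 \
    -disable-prov-gc -disable-prov-counts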

DynamoDB Provider Store

If the "dynamodb" provider store is specified, then provider records will not be stored in the datastore, but in a DynamoDB table that must conform with the following schema:

  • key
    • type: bytes
    • primary key
  • ttl
    • type: int
    • sort key
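
A table matching this schema could be created with the AWS CLI roughly as follows (a sketch only: the table name is a placeholder, and whether DynamoDB's native TTL feature must additionally be enabled on the ttl attribute is not covered here):

aws dynamodb create-table \
    --table-name hydra-providers \
    --attribute-definitions AttributeName=key,AttributeType=B AttributeName=ttl,AttributeType=N \
    --key-schema AttributeName=key,KeyType=HASH AttributeName=ttl,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST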

The command line / environment variable requires various arguments for configuring the provider store:

  • table
    • string, required
    • the DynamoDB table name
  • ttl
    • int, required
    • the duration in seconds for the provider record TTL, after which DynamoDB will evict the entry
  • queryLimit
    • int, required
    • limit for the # records to retrieve from DynamoDB for a single GET_PROVIDERS DHT query
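
Putting the arguments together, the provider store could be configured like this (values are placeholders):

go run ./main.go -nheads=5 \
    -provider-store="dynamodb,table=hydra-providers,ttl=86400,queryLimit=100"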

A GET_PROVIDERS DHT query will result in >=1 DynamoDB queries. The provider store will follow the pagination until the query limit is reached, or no more records are available. DynamoDB will return up to 1 MB of records in a single query page. The providers are sorted by descending TTL, so the most-recently-added providers will be returned first. When the query limit is reached, the remaining providers are truncated.

The provider store uses the default AWS SDK credential store, which will search for credentials in environment variables, ~/.aws, the EC2 instance metadata service, ECS agent, etc.

Some notes and caveats:

  • This does not use consistent reads, so read-after-write is eventually consistent. Consistency is usually achieved so quickly that it's unnoticeable.
  • If the system receives two ADD_PROVIDER messages for the same multihash in the same millisecond, they will race and only one will win, since records are keyed on (multihash, ttl). This should be rare. The prov_ddb_collisions counter is incremented when this happens.

Developers

Release a new version

  1. Update version number in version.go.
  2. Create a semver tag with "v" prefix e.g. git tag v0.1.7.
  3. Publish a new image to docker hub
  4. Scale the hydras down and then back up to pick up the change

Publish a new image

# Build your container
docker build -t hydra-booster .

# Get it to run
docker run hydra-booster

# Commit new version
docker commit -m="some commit message" <CONTAINER_ID> libp2p/hydra-booster

# Push to docker hub (must be logged in, do docker login)
docker push libp2p/hydra-booster

Metrics collection with Prometheus

Install Prometheus and then start it using the provided config:

prometheus --config.file=promconfig.yaml --storage.tsdb.path=prometheus-data

Next start the Hydra Booster, specifying the IP and port to run metrics on:

go run ./main.go -nheads=5 -metrics-addr=127.0.0.1:9090

You should now be able to access metrics at http://127.0.0.1:9090.
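
To poke at the endpoint by hand, metrics are conventionally served on the /metrics path and pprof under /debug/pprof/ (an assumption based on standard Prometheus and Go conventions; verify against promconfig.yaml):

curl http://127.0.0.1:9090/metrics
curl http://127.0.0.1:9090/debug/pprof/    # pprof index (assumed path)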

API

HTTP API

By default the HTTP API is available at http://127.0.0.1:7779.

GET /heads

Returns an ndjson list of peers created by the Hydra: their IDs and multiaddrs. Example output:

{"Addrs":["/ip4/127.0.0.1/tcp/50277","/ip4/192.168.0.3/tcp/50277"],"ID":"12D3KooWHacdCMnm4YKDJHn72HPTxc6LRGNzbrbyVEnuLFA3FXCZ"}
{"Addrs":["/ip4/127.0.0.1/tcp/50280","/ip4/192.168.0.3/tcp/50280","/ip4/90.198.150.147/tcp/50280"],"ID":"12D3KooWQnUpnw6xS2VrJw3WuCP8e92fsEDnh4tbqyrXW5AVJ7oe"}
...

GET /records/list

Returns an ndjson list of provider records stored by the Hydra Booster node.

GET /records/fetch/{cid}?nProviders=1

Fetches provider record(s) available on the network by CID. Use the nProviders query string parameter to signal the number of provider records to find. Returns an ndjson list of provider peers: their IDs and multiaddrs. Will return HTTP status code 404 if no records were found.
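
For example, with the default HTTP API address (replace <cid> with a real CID):

curl "http://127.0.0.1:7779/records/fetch/<cid>?nProviders=3"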

POST /idgen/add

Generate and add a balanced Peer ID to the server's xor trie and return it for use by another Hydra Booster peer. Returns a base64 encoded JSON string. Example output:

"CAESQNcYNr0ENfml2IaiE97Kf3hGTqfB5k5W+C2/dW0o0sJ7b7zsvxWMedz64vKpS2USpXFBKKM9tWDmcc22n3FBnow="

POST /idgen/remove

Remove a balanced Peer ID from the server's xor trie. Accepts a base64 encoded JSON string.
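
For example, driving the idgen endpoints by hand with curl (the request body for remove is assumed to be the same base64 JSON string returned by add):

# Reserve a balanced Peer ID from the idgen server
curl -X POST http://127.0.0.1:7779/idgen/add

# Release it again when the head that used it is gone
curl -X POST -d '"<base64-string-returned-by-add>"' http://127.0.0.1:7779/idgen/remove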

GET /swarm/peers?head=

Returns an ndjson list of peers with open connections, optionally filtered by Hydra head. Example output:

{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb","Addr":"/ip4/147.75.83.83/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa","Addr":"/ip6/2604:1380:0:c100::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN","Addr":"/ip4/147.75.69.143/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ","Addr":"/ip4/104.131.131.82/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt","Addr":"/ip6/2604:1380:3000:1f00::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb","Addr":"/ip6/2604:1380:2000:7a00::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa","Addr":"/ip6/2604:1380:0:c100::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ","Addr":"/ip4/104.131.131.82/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN","Addr":"/ip6/2604:1380:1000:6000::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt","Addr":"/ip4/147.75.94.115/tcp/4001","Direction":2}}

License

The hydra-booster project is dual-licensed under Apache 2.0 and MIT terms.

hydra-booster's People

Contributors

alanshaw, aschmahmann, daviddias, dennis-tra, dependabot[bot], djdv, dokterbob, guseggert, hsanjuan, kubuxu, lanzafame, lemmi, libp2p-mgmt-read-write[bot], lidel, mburns, michaelavila, petar, raulk, rubenkelevra, stebalien, thattommyhall, travisperson, web-flow, web3-bot, whyrusleeping, willscott


hydra-booster's Issues

Document idgen setup once deployed

#72 adds idgen to the HTTP API so that ALL hydras can have balanced peer IDs for their sybils.

I propose we set up Alasybil as an idgen server and have the other hydras as clients.

Hydra Booster: help with some questions

Hello,

I have just discovered Hydra Booster and I want to use it for my project.

So I have a few questions about the usage and outputs of Hydra Booster:

  1. When running Hydra Booster with the command go run ./main.go, it creates 1 head and assigns a Peer ID to that head. In addition to the Hydra Booster head, I am also running an IPFS node on my local machine with the command ipfs daemon. So now both nodes should connect to each other, right? But when I run ipfs swarm peers on my local IPFS node, I cannot see the Peer ID of the head that Hydra Booster created. Did I do something wrong?
  2. Can I interact with the head that Hydra Booster has created? Like receive files or add files ?
  3. When using HTTP API, what does the command GET /records/list output mean? I know it returns a list of provider records. But the list is in this format: {"Key":"/providers/CIQA3J2WKKA57UENKRIJLB236L74NFIZSDZMMM4ZJP53CO2LLJFF3JQ/CIQFVBUR6S5EJ3GQ3TP7XXVEFDXTMA2Q6SIRPC7FKX3JEQODCRMQK7Y","Value":"wMPj0q3H7Zct","Expiration":"0001-01-01T00:00:00Z","Size":9}. What does this output mean? Provider Record maps a data identifier to a peer that has advertised that they have that content and are willing to provide. But I want to understand what the output means. Can someone please explain that to me.
  4. The command GET /records/fetch/{cid}?nProviders=1 looks for peers that can provide a specific CID, is that correct?

Thank you so much for your time and effort.
I really appreciate it.

Getting to "Be ~1 hop away from every other node in DHT"

Just came out from a standup with @alanshaw and here is what we discussed.

In order to achieve the goal of "Be ~1 hop away from every other node in DHT", we need one evenly distributed PeerID for every 20 other nodes in the network, so that we land in everyone else's first k-bucket.

The formula is quite simple:

  • Total number of DHT nodes in the network / 20 = number of sybils to spawn -> to be 1 hop away

The current network size is 20K, so applying the formula we need 1000 sybils to meet this goal.

We have a limit of 200 sybils per hydra node, but we can spawn multiple hydras. Because we want the PeerIDs to be evenly distributed, even across hydra nodes, we need to split PeerID generation out into a separate service that is shared by the hydras.

Additionally, we want to adjust the scale of the hydras to the number of nodes in the network; for that, we want to run a cronjob that adjusts it every week (or day).

One extra step: as we auto scale up and down, we don't want to lose the work already done in harvesting records. So instead of having each hydra with its own belly (record store), we want a shared record store across hydras (using a DB like Postgres).

Tasks:

  • Separate the PeerID Gen into a networked service
  • Start a CronJob to autoscale the number of sybils
  • Implement the shared datastore with Postgres

b64 encoded seed can't always be passed as ENV in k8s (from DO secrets)

It might be a digital ocean thing, or a k8s thing, but we could not set N+5Fvrq3CkwAehNdhRebMgPC6psd4DCc5BAbHMgzP5Q= as the seed in prod.

You can run

HYDRA_RANDOM_SEED="N+5Fvrq3CkwAehNdhRebMgPC6psd4DCc5BAbHMgzP5Q=" ./hydra-booster -nheads 10

so it's somewhere in the k8s secrets that the problem lies

There might be a line feed or null byte in there (note the trailing 0a below):

# echo "N+5Fvrq3CkwAehNdhRebMgPC6psd4DCc5BAbHMgzP5Q=" | xxd
00000000: 4e2b 3546 7672 7133 436b 7741 6568 4e64  N+5Fvrq3CkwAehNd
00000010: 6852 6562 4d67 5043 3670 7364 3444 4363  hRebMgPC6psd4DCc
00000020: 3542 4162 484d 677a 5035 513d 0a         5BAbHMgzP5Q=.

error was

Error: failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: setenv: invalid argument: unknown

Some googling made me just try another secret and it worked, so I don't know for sure what the problem is.

httpapi-addr parameter description same as metrics-addr

Both read "Specify an IP and port to run prometheus metrics and pprof http server on".

Usage of /var/folders/1t/pdr3f6wj4qq26th3_mft_d480000gn/T/go-build979667117/b001/exe/main:
[...]
  -httpapi-addr string
    	Specify an IP and port to run prometheus metrics and pprof http server on (default "127.0.0.1:7779")
[...]
  -metrics-addr string
    	Specify an IP and port to run prometheus metrics and pprof http server on (default "0.0.0.0:8888")
[...]
exit status 2

Verified in 0.4.3.

revise metrics to support multiple provider sources

  • Current metrics label a call as "failed" if either there was a network failure or there was a successful response with no provider records. These two cases must be distinguishable in the metrics. Fix.
  • Design metrics so they can be applied generically to any routing source, as well as to the combined source.

Build error: "cannot use makeInsecureTransport"

I tried to build hydra-booster via:

$ go get -u github.com/libp2p/hydra-booster

And Go 1.16.

I get this error:

$ go get -u github.com/libp2p/hydra-booster
# github.com/libp2p/go-libp2p/config
go/pkg/mod/github.com/libp2p/[email protected]/config/config.go:144:19: cannot use makeInsecureTransport(h.ID(), cfg.PeerKey) (type sec.SecureTransport) as type sec.SecureMuxer in assignment:
        sec.SecureTransport does not implement sec.SecureMuxer (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
go/pkg/mod/github.com/libp2p/[email protected]/config/config.go:146:24: cannot assign sec.SecureTransport to upgrader.Secure (type sec.SecureMuxer) in multiple assignment:
        sec.SecureTransport does not implement sec.SecureMuxer (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
go/pkg/mod/github.com/libp2p/[email protected]/config/security.go:56:2: cannot use secMuxer (type *csms.SSMuxer) as type sec.SecureTransport in return argument:
        *csms.SSMuxer does not implement sec.SecureTransport (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)
go/pkg/mod/github.com/libp2p/[email protected]/config/security.go:78:2: cannot use secMuxer (type *csms.SSMuxer) as type sec.SecureTransport in return argument:
        *csms.SSMuxer does not implement sec.SecureTransport (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)

Extract UI and Reporting

Status reporting and UI are tightly coupled to the code that runs hydra nodes, making them hard to test and reason about. There's also some duplication in the values being collected.

These elements need extracting and tests need to be added.

Log peer IDs to enable unique peer counts across multiple hydra nodes

From the discussion here: #36 (comment)

We should log peer IDs seen by all Hydras in the same format and to the same place as the gateways so that we can:

  1. enable the ability to get unique peer IDs across all hydras
  2. do less work deduping peer IDs on our hydra nodes
  3. get a more accurate view of total IPFS network size by combining hydra and gateway peer IDs.

Improve discarded metric


The provider record prefetch chart shows discarded requests in orange. These are currently requests for a CID that we previously failed to find and were asked for again within 1 hour, or requests dropped because the prefetch queue was full.

We should categorize these properly so that we have the following discarded categories:

  • Previously failed to find a CID and were asked again within 5 minutes
  • Previously failed to find a CID and were asked again within 1 hour
  • Prefetch queue was full

Hydra upgrade

We've identified a number of weaknesses in the hydra design and implementation, which cause ungraceful failures (worker crashes) and downtimes when utilization spikes. The problem occurred in the window 7/7/2021-7/21/2021.

Problem analysis (theory)

The backend Postgres database can become overloaded under a high volume of DHT requests to the hydras.
This causes query times to the database to increase, which in turn causes DHT requests to back up in the provider manager loop, which in turn causes the hydra nodes to crash.

Corrective steps

  • Ensure the entire fleet of hydra heads (across machines) always uses the same sequence of balanced IDs:
    #128
    Resolved by #130
  • Ensure ID/address mappings persist across restarts (design goal)
  • Fix aggregate metrics to use fast approximate Postgres queries (as opposed to slow exact queries)
    #133
  • Upgrades in DHT provider manager:
    • Use multiple threads in the provider loop (diminishes the effect of individual straggler requests to the datastore)
      libp2p/go-libp2p-kad-dht#729
    • Gracefully decline quality of service when under load
      libp2p/go-libp2p-kad-dht#730
    • Fully decline service at a configurable peak level of load
  • Monitor (via metrics) the query latency of the backing Postgres database (at the infra level)
  • Setup automatic pprof dumps near out-of-memory events, perhaps using https://github.com/ipfs-shipyard/go-dumpotron (at infra level)

Acceptance criteria

  • Verify that a sustained increased request load at the hydra level does not propagate to the Postgres backing datastore. This should be ensured by measures for graceful degradation of quality (above) at the DHT provider manager.

No provider records, no peers

I've been running 0.4.3 for about a week on a Gigabit machine with public IP (AND an IPFS node running). Somehow, I only seem to get about 5 peers and no provider records whatsoever.

I have now been running 0.5 for some time and am observing the same behaviour.

Is there perhaps something I've missed, or a default configuration option that does not make sense?

We'd like to run a booster to harvest more hashes to index for ipfs-search.com and this is currently blocking further development on our side.

Example output (it starts with 7 peers):

Hydra Booster

Head ID(s)               [<...>]
Connections              1 peers
Unique Peers Seen        7
Memory Allocated         28 MB
Stored Provider Records  0
Routing Table Size       0
Uptime                   0h 35m 37s

Thanks!

Keep Hydra head peerIDs between restarts

There doesn't seem to be a good reason for us to rotate our peerIDs (and therefore locations in the Kademlia keyspace) just because we OOM, update the version we're running, etc.

The negative effects of us rotating our keys are:

  1. If you're running a small number of heads then you're effectively making the records previously stored with you useless since no one will look for them with you
  2. If you're running many heads then you're invalidating a bunch of people's routing tables which can make clients less efficient. It'll all work itself out over time, but we might as well be nice

Solution for always reusing the same balanced IDs in the hydra deployment

The ID generator is a pseudorandom (i.e. seeded) algorithm that generates an infinite sequence of mutually-balanced IDs:
ID0, ID1, ID2, ID3, ...

Any one ID in this sequence is uniquely determined by the seed (which determines the sequence) and its index (i.e. sequence number) in the sequence.

Therefore, to ensure that a collection of Hydra heads (1) have mutually-balanced IDs and (2) always reuse the same IDs (after restart), it suffices to parameterize each of them at execution time with the same seed and an index, such that each head has a different index in the space of positive integers.

For example, heads can be parameterized as:
id_seed=xyz, id_index=1
id_seed=xyz, id_index=2
id_seed=xyz, id_index=3
...

Note that it is irrelevant which machines or processes the heads run on.
The key requirement is that each head (across the entire fleet) gets a unique index!

Therefore, heads should be parameterized at the infra/deployment level, perhaps using command-line arguments. Restarting a head then guarantees it reuses the same ID and it is unique across the fleet.
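
In terms of the existing command-line surface, this roughly maps to giving every Hydra in the fleet the same -random-seed and a distinct, non-overlapping -id-offset (a sketch; the assumption that a Hydra with -nheads=10 consumes 10 consecutive indices from the sequence should be verified against the implementation):

# Same fleet-wide seed everywhere; each machine starts at a different offset
./hydra-booster -nheads=10 -random-seed="<base64-256-bit-seed>" -id-offset=0
./hydra-booster -nheads=10 -random-seed="<base64-256-bit-seed>" -id-offset=10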

Furthermore, this methodology enables easy (auto)scaling: Just assign unused index numbers to heads that are being added. (The space of positive integers is large enough!)

There is no requirement that indices are consecutive numbers (just that they are unique). This allows ops engineers to use different blocks of integers for different scaling purposes. For example, two entirely independent hydra fleets (with no coordination between them) can be deployed: the first fleet can use only even numbers for its heads; the second fleet only odd numbers. Clearly, this example generalizes in various ways.

Note that this methodology completely alleviates the need for any kind of direct network coordination/connection between heads, making the system considerably more robust!

Progress

A first step in this direction is provided in #130.

Stage 4 - Siamese Hydras

Effort Needed: Low to Mid (1~2 weeks of developing + testing + deployment)
Prerequisite(s): Stage 1-3

Design notes & tasks

  • Rather than an in process shared record store, use a database backend (e.g. Postgres) so that multiple hydra booster instances can share the same records list

Testing mechanics & evaluation plan

  • Continue using the testing from the previous 3 stages
  • Verify the health of the records using the database backend UI

Success criteria

  • Multiple Hydra instances (i.e. multiple machines running Hydra nodes) have a shared datastore
  • We can easily query for the most popular records and see if there is any particular content that is really hot. This opens the possibility to identify areas of the address space that are more loaded (hot) and to spin up new nodes in those areas to redistribute the load (load balancing)

Sort out Docker image build and deployment

I think releases that are tagged here should be automatically built and tagged on Dockerhub, and we should deploy specific tags rather than latest to Digital Ocean. Perhaps that shouldn't even happen in this repo; maybe we shouldn't have ./k8s in this project at all, except as an example.

Fork & Refactor

  • Forked and renamed
  • Fill the README and link to the RFC (adding the logo) (@alanshaw)
  • Review and document the codebase
    • Modularize/Packagerize the repo
    • Refactor
      • error propagation (no panic(err)) (@alanshaw)
      • structs as opts (rather than a super long list of args) (@daviddias) #26
      • extract http api into module (@alanshaw)
      • extract ui from run-node.go (@alanshaw)

Build a tool to help us understand the DHT (Visually is a ++)

With this tool, we should be able to answer:

  • Where are the sybils in comparison to other nodes
  • Are we 1~3 hops away from each other non-sybil node? (simulating ideal routing tables)
  • How many more sybils would we need to generate to make the above happen?

Integrate delegated content routing client

Done criteria: A https://github.com/ipfs/go-delegated-routing client with get-p2p-provide is invoked in a performant way as part of getting provide request records.
Why important: Enabling hydras to bridge to external systems like the to-be-federated indexer network which is under development.
Notes:

  • Example usecase: enable nft.storage/web3.storage to have their content discoverable on IPFS via Indexers without needing to publish provider records themselves.
  • This depends on the ProvideManager libp2p/go-libp2p-kad-dht#749 to be implemented.
  • There is followup work to have a delegated-routing server implemented in the Indexer and to configure Hydra nodes to invoke it. For the example above, this is being done in #141
  • The calling strategy for the two paths (SQL query vs. delegated router) needs to be determined (are they done in parallel, which takes priority, when do we timeout, etc.)
  • This isn't something we're doing to help out a particular service. If one has a centralized service in need of special indexing, one needs to bridge with the indexer network.

Use fast approximate queries to Postgres for metrics collection

Currently, metrics collection uses exact counting (SELECT COUNT), which is slow and expensive and has been found to interfere with useful payload queries to the database. Consider using an approximate count query, provided by Postgres. E.g.

  SELECT
  (reltuples/relpages) * (
    pg_relation_size('records') /
    (current_setting('block_size')::integer)
  )
  FROM pg_class where relname = 'records';

How to measure the percentage of content that's available?

We know that there's a lot of content stored on IPFS, but how much of that content can actually be accessed universally right now? Knowing what content is available in the network gives us a metric we can form KRs around and a gauge of how healthy the IPFS network is.

Stage 1 - The Hydra Belly

Effort Needed: Low (1 week of developing + testing + deployment)
Prerequisite(s): None, can be shipped in a go-ipfs 0.4.X

Design notes & tasks

  • Fork ipfs/dht-node & refactor
  • Upgrade the sybils to use shared record store that stores records to disk
    • Verify the LevelDB adapter that was previously created
    • Verify that LevelDB is indeed our best bet (compare it with a networked db such as Postgres in preparation for stage 4)
  • Proactively fetch any record that you receive a request for

Testing mechanics & evaluation plan

  • Deploy the Hydra Booster node to IPFS Infrastructure
  • Monitor the number of records fetched and stored
  • Start measuring the number of hops the IPFS Gateways make when running a .FindProvs call

Success criteria

We replicate the most requested records that exist in the network

Configure delegated routing to invoke Indexers

Done criteria: provider records are returned by Hydra boosters via the to-be-federated indexer network which is under development.
Why important: enables any CIDs indexed by the Indexer network (which will have nft.storage/web3.storage CIDs) to have their content discoverable on IPFS via Indexers without needing to publish provider records themselves.
Notes:

  • This builds on #140
  • This is the configuration step of adding the Indexer as a place to delegate routing requests to. This assumes that the Indexer has implemented/deployed the server-side of delegated content routing for get-p2p-provide requests.

Stage 3 - Single routing table (rather than one per Sybil)

Effort Needed: Low (1 week of developing + testing + deployment)
Prerequisite(s): Stage 1 & Stage 2

Design notes & tasks

  • This is a perf optimization. Rather than having N routing tables, where N is the number of sybils, sharing a connection pool (and therefore blocking each other), we want to only have one of the PeerIds doing the routing, while the others are just in the routing tables of other peers
  • Use delegated routing for each of the sybils to use the router sybil to fetch the records

Testing mechanics & evaluation plan

  • Deploy the nodes. Measure the memory, cpu and bandwidth profiles. There should be a drop comparing to previous version.

Success criteria

  • The sybils do not thrash each other when sharing the connection pool
  • The sybils become less noisy (as only one node will have a large routing table, rather than N nodes having many small routing tables that need to be constantly updated)

Stage 2 - Hydra Heads with Pre-Calculated Peer IDs

Effort Needed: Low (1 week of developing + testing + deployment)
Prerequisite(s): None, can be shipped in a go-ipfs 0.4.X (Yes, stage 2 can be done in parallel)

Design notes & tasks

  • Calculate the number of heads (PeerIds) necessary for networks of multiple sizes (10K, 100K, 1000K)
  • Brute force to calculate an even distribution of those PeerIds
  • Use those selected ids as PeerIds for the Hydra Booster sybils

Testing mechanics & evaluation plan

  • We run the DHT scraper daily and visualize the PeerIds distribution and where the hydra PeerIds show up
  • We simulate generating the routing tables for such nodes and verify if indeed the hydra nodes are 1~3 hops away from each other node.

Success criteria

Our Hydra nodes are 1~3 hops away from every other node in the Network

Security: metrics publicly exposed by default

By default, the Prometheus metrics and pprof HTTP server are listening on 0.0.0.0, which is likely to cause information leaks and/or might expose attack vectors.

As this daemon is to be run on public addresses by default, and uses randomly picked ports for the heads, this default of publicly exposing non-essential services seems a bad design from a security point of view.

In addition, it is inconsistent with the other listeners which default to 127.0.0.1 (as they should).

How to measure the lifecycle of a file in the DHT?

Understanding the dynamics of what happens to a file with respect to its presence in the DHT will enable us to tune the DHT more appropriately.

Some metrics off the top of my head that might be useful.

  • Number of provider records available over time
  • Number of providers over time
  • Availability of providers
  • Frequency of re-provides

Where these are all per CID.
