
hydra-booster's Introduction

Warning

This repo was archived because Protocol Labs no longer operates Hydra Boosters for the IPFS network.

For more information, see: https://discuss.ipfs.tech/t/dht-hydra-peers-dialling-down-non-bridging-functionality-on-2022-12-01/15567


Hydra Booster Mascot

Hydra Booster


A DHT Indexer node & Peer Router

A new type of DHT node designed to accelerate content resolution and content providing on the IPFS network. A (cute) Hydra with one belly full of records and many heads (Peer IDs) to tell other nodes about them, charged with rocket boosters to transport other nodes to their destination faster.

Read the RFC - Kanban

Install

[openssl support (lower CPU usage)]
go get -tags=openssl github.com/libp2p/hydra-booster

[standard (sub-optimal)]
go get github.com/libp2p/hydra-booster

Usage

Run a hydra booster with a single head:

go run ./main.go

Pass the -nheads=N option to run N heads at a time in the same process. It periodically prints out a status line with information about total peers, uptime, and memory usage.

go run ./main.go -nheads=5

Alternatively you can use the HYDRA_NHEADS environment var to specify the number of heads.
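
For example, the following should be equivalent to passing -nheads=5:

HYDRA_NHEADS=5 go run ./main.go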

There's also a nicer UI option, intended to be run in a tmux window (or similar) so you can see statistics about your contribution to the network. Use the -ui-theme flag to switch to it:

go run ./main.go -ui-theme=gooey # also "none" to turn off logging

Flags

Usage of hydra-booster:
  -bootstrap-conc int
        How many concurrent bootstraps to run (default 32)
  -bootstrap-peers string
        A CSV list of peer addresses to bootstrap from.
  -bucket-size int
        Specify the bucket size, note that for some protocols this must be a specific value i.e. for "/ipfs" it MUST be 20 (default 20)
  -db string
        Datastore directory (for LevelDB store) or postgresql:// connection URI (for PostgreSQL store)
  -provider-store
        A non-default provider store to use (currently only supports "dynamodb,table=<string>,ttl=<ttl-in-seconds>,queryLimit=<int>").
  -disable-db-create
        Don't create table and index in the target database (default false).
  -disable-prefetch
        Disables pre-fetching of discovered provider records (default false).
  -disable-prov-counts
        Disable counting provider records for metrics reporting (default false).
  -disable-prov-gc
        Disable provider record garbage collection (default false).
  -disable-providers
        Disable storing and retrieving provider records, note that for some protocols, like "/ipfs", it MUST be false (default false).
  -disable-values
        Disable storing and retrieving value records, note that for some protocols, like "/ipfs", it MUST be false (default false).
  -enable-relay
        Enable libp2p circuit relaying for this node (default false).
  -httpapi-addr string
        Specify an IP and port to run the HTTP API server on (default "127.0.0.1:7779")
  -idgen-addr string
        Address of an idgen HTTP API endpoint to use for generating private keys for heads
  -mem
        Use an in-memory database. This overrides the -db option
  -metrics-addr string
        Specify an IP and port to run Prometheus metrics and pprof HTTP server on (default "127.0.0.1:9758")
  -name string
        A name for the Hydra (for use in metrics)
  -nheads int
        Specify the number of Hydra heads to create. (default -1)
  -port-begin int
        If set, begin port allocation here (default -1)
  -protocol-prefix string
        Specify the DHT protocol prefix (default "/ipfs")
  -pstore string
        Peerstore directory for LevelDB store (defaults to in-memory store)
  -random-seed string
        Seed to use to generate IDs (useful if you want to have persistent IDs). Should be Base64 encoded and 256bits
  -id-offset
        What offset in the sequence of keys generated from random-seed to start from
  -stagger duration
        Duration to stagger nodes starts by
  -ui-theme string
        UI theme, "logey", "gooey" or "none" (default "logey")

Environment variables

Alternatively, some flags can be set via environment variables. Note that flags take precedence over environment variables.
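
For example, because flags win over environment variables, the following starts 5 heads, not 2:

HYDRA_NHEADS=2 go run ./main.go -nheads=5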

  HYDRA_BOOTSTRAP_PEERS string
        A CSV list of peer addresses to bootstrap from.
  HYDRA_DB string
        Datastore directory (for LevelDB store) or postgresql:// connection URI (for PostgreSQL store)
  HYDRA_PSTORE string
        Peerstore directory for LevelDB store (defaults to in-memory store)
  HYDRA_PROVIDER_STORE string
        A non-default provider store to use (currently only supports "dynamodb,table=<string>,ttl=<ttl-in-seconds>,queryLimit=<int>").
  HYDRA_DISABLE_DBCREATE
        Don't create table and index in the target database (default false).
  HYDRA_DISABLE_PREFETCH
        Disables pre-fetching of discovered provider records (default false).
  HYDRA_DISABLE_PROV_COUNTS
        Disable counting provider records for metrics reporting (default false).
  HYDRA_DISABLE_PROV_GC
        Disable provider record garbage collection (default false).
  HYDRA_IDGEN_ADDR string
        Address of an idgen HTTP API endpoint to use for generating private keys for heads
  HYDRA_NAME string
        A name for the Hydra (for use in metrics)
  HYDRA_NHEADS int
        Specify the number of Hydra heads to create. (default -1)
  HYDRA_PORT_BEGIN int
        If set, begin port allocation here (default -1)
  HYDRA_RANDOM_SEED string
        Seed to use to generate IDs (useful if you want to have persistent IDs). Should be Base64 encoded and 256bits   
  HYDRA_ID_OFFSET int
        What offset in the sequence of keys generated from random-seed to start from

Best Practices

Only run a hydra-booster on machines with public IP addresses. Having more DHT nodes behind NATs makes DHT queries slower in general, as connecting to them takes longer and sometimes doesn't work at all (resulting in a timeout).

When running with -nheads, please make sure to bump the ulimit to something fairly high. Expect ~500 connections per node you're running (so with -nheads=10, try setting ulimit -n 5000)

Running Multiple Hydras

The total number of heads a single Hydra can have depends on the resources of the machine it's running on. To get the desired number of heads you may need to run multiple Hydras on multiple machines. There are a couple of challenges with this:

  • Peer IDs of Hydra heads are balanced in the DHT. When running multiple Hydras it's necessary to designate one Hydra as the "idgen server" and the rest as "idgen clients" so that all Peer IDs in the Hydra swarm remain balanced. Point the clients at the server using the -idgen-addr flag or HYDRA_IDGEN_ADDR environment variable (see the combined example after this list).
  • A datastore is shared by all Hydra heads but not by all Hydras. Use the -db flag or HYDRA_DB environment variable to specify a PostgreSQL database connection string that can be shared by all Hydras in the swarm.
  • When sharing a datastore between multiple Hydras, ensure only one Hydra in the swarm is performing GC on provider records by using the -disable-prov-gc flag or HYDRA_DISABLE_PROV_GC environment variable, and ensure only one Hydra is counting the provider records in the datastore by using the -disable-prov-counts flag or HYDRA_DISABLE_PROV_COUNTS environment variable.
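
For example, a two-Hydra deployment sharing one PostgreSQL datastore might be wired up roughly like this (a sketch only: hostnames and the connection string are placeholders, and the exact address format expected by -idgen-addr should be verified against your version):

# Hydra 1: designated idgen server; the only Hydra doing provider record GC and counting
./hydra-booster -name=hydra-1 -nheads=100 \
    -db=postgresql://user:pass@db.example.com/hydra \
    -httpapi-addr=0.0.0.0:7779

# Hydra 2: idgen client; provider record GC and counting disabled
./hydra-booster -name=hydra-2 -nheads=100 \
    -db=postgresql://user:pass@db.example.com/hydra \
    -idgen-addr=hydra-1.example.com:7779 \
    -disable-prov-gc -disable-prov-counts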

DynamoDB Provider Store

If the "dynamodb" provider store is specified, then provider records will not be stored in the datastore, but in a DynamoDB table that must conform with the following schema:

  • key
    • type: bytes
    • primary key
  • ttl
    • type: int
    • sort key
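
A table matching this schema could be created with the AWS CLI roughly as follows (a sketch only: the table name is a placeholder, and whether DynamoDB's native TTL feature must additionally be enabled on the ttl attribute is not covered here):

aws dynamodb create-table \
    --table-name hydra-providers \
    --attribute-definitions AttributeName=key,AttributeType=B AttributeName=ttl,AttributeType=N \
    --key-schema AttributeName=key,KeyType=HASH AttributeName=ttl,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST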

The command line / environment variable requires various arguments for configuring the provider store:

  • table
    • string, required
    • the DynamoDB table name
  • ttl
    • int, required
    • the duration in seconds for the provider record TTL, after which DynamoDB will evict the entry
  • queryLimit
    • int, required
    • limit for the # records to retrieve from DynamoDB for a single GET_PROVIDERS DHT query
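
Putting the arguments together, the provider store could be configured like this (values are placeholders):

go run ./main.go -nheads=5 \
    -provider-store="dynamodb,table=hydra-providers,ttl=86400,queryLimit=100"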

A GET_PROVIDERS DHT query will result in >=1 DynamoDB queries. The provider store will follow the pagination until the query limit is reached, or no more records are available. DynamoDB will return up to 1 MB of records in a single query page. The providers are sorted by descending TTL, so the most-recently-added providers will be returned first. When the query limit is reached, the remaining providers are truncated.

The provider store uses the default AWS SDK credential store, which will search for credentials in environment variables, ~/.aws, the EC2 instance metadata service, ECS agent, etc.

Some notes and caveats:

  • This does not use consistent reads, so read-after-write is eventually consistent. Consistency is usually achieved so quickly that it's unnoticeable.
  • If the system receives two ADD_PROVIDER messages for the same multihash in the same millisecond, they will race and only one will win, since records are keyed on (multihash, ttl). This should be rare. The prov_ddb_collisions counter is incremented when this happens.

Developers

Release a new version

  1. Update version number in version.go.
  2. Create a semver tag with "v" prefix e.g. git tag v0.1.7.
  3. Publish a new image to docker hub
  4. Scale the hydras down and then back up to pick up the change

Publish a new image

# Build your container
docker build -t hydra-booster .

# Get it to run
docker run hydra-booster

# Commit new version
docker commit -m="some commit message" <CONTAINER_ID> libp2p/hydra-booster

# Push to docker hub (must be logged in, do docker login)
docker push libp2p/hydra-booster

Metrics collection with Prometheus

Install Prometheus and then start it using the provided config:

prometheus --config.file=promconfig.yaml --storage.tsdb.path=prometheus-data

Next start the Hydra Booster, specifying the IP and port to run metrics on:

go run ./main.go -nheads=5 -metrics-addr=127.0.0.1:9090

You should now be able to access metrics at http://127.0.0.1:9090.
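
To poke at the endpoint by hand, metrics are conventionally served on the /metrics path and pprof under /debug/pprof/ (an assumption based on standard Prometheus and Go conventions; verify against promconfig.yaml):

curl http://127.0.0.1:9090/metrics
curl http://127.0.0.1:9090/debug/pprof/    # pprof index (assumed path)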

API

HTTP API

By default the HTTP API is available at http://127.0.0.1:7779.

GET /heads

Returns an ndjson list of peers created by the Hydra: their IDs and multiaddrs. Example output:

{"Addrs":["/ip4/127.0.0.1/tcp/50277","/ip4/192.168.0.3/tcp/50277"],"ID":"12D3KooWHacdCMnm4YKDJHn72HPTxc6LRGNzbrbyVEnuLFA3FXCZ"}
{"Addrs":["/ip4/127.0.0.1/tcp/50280","/ip4/192.168.0.3/tcp/50280","/ip4/90.198.150.147/tcp/50280"],"ID":"12D3KooWQnUpnw6xS2VrJw3WuCP8e92fsEDnh4tbqyrXW5AVJ7oe"}
...

GET /records/list

Returns an ndjson list of provider records stored by the Hydra Booster node.

GET /records/fetch/{cid}?nProviders=1

Fetches provider record(s) available on the network by CID. Use the nProviders query string parameter to signal the number of provider records to find. Returns an ndjson list of provider peers: their IDs and multiaddrs. Will return HTTP status code 404 if no records were found.
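
For example, with the default HTTP API address (replace <cid> with a real CID):

curl "http://127.0.0.1:7779/records/fetch/<cid>?nProviders=3"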

POST /idgen/add

Generate and add a balanced Peer ID to the server's xor trie and return it for use by another Hydra Booster peer. Returns a base64 encoded JSON string. Example output:

"CAESQNcYNr0ENfml2IaiE97Kf3hGTqfB5k5W+C2/dW0o0sJ7b7zsvxWMedz64vKpS2USpXFBKKM9tWDmcc22n3FBnow="

POST /idgen/remove

Remove a balanced Peer ID from the server's xor trie. Accepts a base64 encoded JSON string.
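
For example, driving the idgen endpoints by hand with curl (the request body for remove is assumed to be the same base64 JSON string returned by add):

# Reserve a balanced Peer ID from the idgen server
curl -X POST http://127.0.0.1:7779/idgen/add

# Release it again when the head that used it is gone
curl -X POST -d '"<base64-string-returned-by-add>"' http://127.0.0.1:7779/idgen/remove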

GET /swarm/peers?head=

Returns an ndjson list of peers with open connections, optionally filtered by Hydra head. Example output:

{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb","Addr":"/ip4/147.75.83.83/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa","Addr":"/ip6/2604:1380:0:c100::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN","Addr":"/ip4/147.75.69.143/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ","Addr":"/ip4/104.131.131.82/tcp/4001","Direction":2}}
{"ID":"12D3KooWKdEMLcKJWk8Swc3KbBJSjpJfNMKZUhcG8LnYPA3XH8Bh","Peer":{"ID":"QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt","Addr":"/ip6/2604:1380:3000:1f00::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb","Addr":"/ip6/2604:1380:2000:7a00::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa","Addr":"/ip6/2604:1380:0:c100::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ","Addr":"/ip4/104.131.131.82/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN","Addr":"/ip6/2604:1380:1000:6000::1/tcp/4001","Direction":2}}
{"ID":"12D3KooWA6MQcQhLAWDJFqWAUNyQf9MuFUGVf3LMo232x8cnrK3p","Peer":{"ID":"QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt","Addr":"/ip4/147.75.94.115/tcp/4001","Direction":2}}

License

The hydra-booster project is dual-licensed under Apache 2.0 and MIT terms.

hydra-booster's People

Contributors

alanshaw, aschmahmann, daviddias, dennis-tra, dependabot[bot], djdv, dokterbob, guseggert, hsanjuan, kubuxu, lanzafame, lemmi, libp2p-mgmt-read-write[bot], lidel, mburns, michaelavila, petar, raulk, rubenkelevra, stebalien, thattommyhall, travisperson, web-flow, web3-bot, whyrusleeping, willscott


hydra-booster's Issues

Document idgen setup once deployed

#72 adds idgen to the HTTP API so that ALL hydras can have balanced peer IDs for their sybils.

I propose we set up Alasybil as an idgen server and have the other hydras as clients.

Hydra Booster: help with some questions

Hello,

I have just discovered Hydra Booster and I want to use it for my project.

So I have a few questions about the usage and outputs of Hydra Booster:

  1. When running Hydra Booster with the command go run ./main.go, it creates 1 head and assigns a Peer ID to that head. In addition to the Hydra Booster head, I am also running an IPFS node on my local machine with the command ipfs daemon. So now both nodes should connect to each other, right? But when I run ipfs swarm peers on my local IPFS node, I cannot see the Peer ID of the head that Hydra Booster created. Did I do something wrong?
  2. Can I interact with the head that Hydra Booster has created? Like receive files or add files ?
  3. When using HTTP API, what does the command GET /records/list output mean? I know it returns a list of provider records. But the list is in this format: {"Key":"/providers/CIQA3J2WKKA57UENKRIJLB236L74NFIZSDZMMM4ZJP53CO2LLJFF3JQ/CIQFVBUR6S5EJ3GQ3TP7XXVEFDXTMA2Q6SIRPC7FKX3JEQODCRMQK7Y","Value":"wMPj0q3H7Zct","Expiration":"0001-01-01T00:00:00Z","Size":9}. What does this output mean? Provider Record maps a data identifier to a peer that has advertised that they have that content and are willing to provide. But I want to understand what the output means. Can someone please explain that to me.
  4. The command GET /records/fetch/{cid}?nProviders=1 looks for peers that can provide a specific CID, is that correct?

Thank you so much for your time and effort.
I really appreciate it.

Getting to "Be ~1 hop away from every other node in DHT"

Just came out from a standup with @alanshaw and here is what we discussed.

In order to achieve the goal of "Be ~1 hop away from every other node in DHT", we need one evenly distributed PeerID for every 20 other nodes in the network, so that we land in everyone else's first k-bucket.

The formula is quite simple:

  • Total number of DHT nodes in the network / 20 = number of sybils to spawn -> to be 1 hop away

The current network size is 20K, so applying the formula we need 1000 sybils to meet this goal.

We have a limit of 200 sybils per hydra node, but we can spawn multiple hydras. Because we want the PeerIDs to be evenly distributed, even across hydra nodes, we need to split PeerID generation out into a separate service that is shared by the hydras.

Additionally, we want to adjust the scale of the hydras to the number of nodes in the network; for that, we want to run a cronjob that adjusts it every week (or day).

One extra step: as we auto scale up and down, we don't want to lose the work already done in harvesting records. So instead of having each hydra with its own belly (record store), we want a shared record store across hydras (using a DB like Postgres).

Tasks:

  • Separate the PeerID Gen into a networked service
  • Start a CronJob to autoscale the number of sybils
  • Implement the shared datastore with Postgres

b64 encoded seed can't always be passed as ENV in k8s (from DO secrets)

It might be a digital ocean thing, or a k8s thing, but we could not set N+5Fvrq3CkwAehNdhRebMgPC6psd4DCc5BAbHMgzP5Q= as the seed in prod.

You can run

HYDRA_RANDOM_SEED="N+5Fvrq3CkwAehNdhRebMgPC6psd4DCc5BAbHMgzP5Q=" ./hydra-booster -nheads 10

so it's somewhere in the k8s secrets that the problem lies

There might be a line feed or null byte in there (note the trailing 0a below):

# echo "N+5Fvrq3CkwAehNdhRebMgPC6psd4DCc5BAbHMgzP5Q=" | xxd
00000000: 4e2b 3546 7672 7133 436b 7741 6568 4e64  N+5Fvrq3CkwAehNd
00000010: 6852 6562 4d67 5043 3670 7364 3444 4363  hRebMgPC6psd4DCc
00000020: 3542 4162 484d 677a 5035 513d 0a         5BAbHMgzP5Q=.

error was

Error: failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: setenv: invalid argument: unknown

Some googling made me just try another secret and it worked, so I don't know for sure what the problem is.

httpapi-addr parameter description same as metrics-addr

Both read "Specify an IP and port to run prometheus metrics and pprof http server on".

Usage of /var/folders/1t/pdr3f6wj4qq26th3_mft_d480000gn/T/go-build979667117/b001/exe/main:
[...]
  -httpapi-addr string
    	Specify an IP and port to run prometheus metrics and pprof http server on (default "127.0.0.1:7779")
[...]
  -metrics-addr string
    	Specify an IP and port to run prometheus metrics and pprof http server on (default "0.0.0.0:8888")
[...]
exit status 2

Verified in 0.4.3.

revise metrics to support multiple provider sources

  • Current metrics label a call as "failed" if either there was a network failure or there was a successful response with no provider records. These two cases must be distinguishable in the metrics. Fix.
  • Design metrics so they can be applied generically to any routing source, as well as to the combined source.

Build error: "cannot use makeInsecureTransport"

I tried to build hydra-booster via:

$ go get -u github.com/libp2p/hydra-booster

And Go 1.16.

I get this error:

$ go get -u github.com/libp2p/hydra-booster
# github.com/libp2p/go-libp2p/config
go/pkg/mod/github.com/libp2p/[email protected]/config/config.go:144:19: cannot use makeInsecureTransport(h.ID(), cfg.PeerKey) (type sec.SecureTransport) as type sec.SecureMuxer in assignment:
        sec.SecureTransport does not implement sec.SecureMuxer (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
go/pkg/mod/github.com/libp2p/[email protected]/config/config.go:146:24: cannot assign sec.SecureTransport to upgrader.Secure (type sec.SecureMuxer) in multiple assignment:
        sec.SecureTransport does not implement sec.SecureMuxer (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
go/pkg/mod/github.com/libp2p/[email protected]/config/security.go:56:2: cannot use secMuxer (type *csms.SSMuxer) as type sec.SecureTransport in return argument:
        *csms.SSMuxer does not implement sec.SecureTransport (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)
go/pkg/mod/github.com/libp2p/[email protected]/config/security.go:78:2: cannot use secMuxer (type *csms.SSMuxer) as type sec.SecureTransport in return argument:
        *csms.SSMuxer does not implement sec.SecureTransport (wrong type for SecureInbound method)
                have SecureInbound(context.Context, net.Conn) (sec.SecureConn, bool, error)
                want SecureInbound(context.Context, net.Conn) (sec.SecureConn, error)

Extract UI and Reporting

Status reporting and UI are tightly coupled to the code that runs hydra nodes, making them hard to test and reason about. There's also some duplication in the values being collected.

These elements need extracting and tests need to be added.

Log peer IDs to enable unique peer counts across multiple hydra nodes

From the discussion here: #36 (comment)

We should log peer IDs seen by all Hydras in the same format and to the same place as the gateways so that we can:

  1. enable the ability to get unique peer IDs across all hydras
  2. do less work deduping peer IDs on our hydra nodes
  3. get a more accurate view of total IPFS network size by combining hydra and gateway peer IDs.

Improve discarded metric


The provider record prefetch chart shows discarded requests in orange. These are currently requests for a CID that we previously failed to find and were asked for again within 1 hour, or requests dropped because the prefetch queue was full.

We should categorize these properly so that we have the following discarded categories:

  • Previously failed to find a CID and were asked again within 5 minutes
  • Previously failed to find a CID and were asked again within 1 hour
  • Prefetch queue was full

Hydra upgrade

We've identified a number of weaknesses in the hydra design and implementation, which cause ungraceful failures (worker crashes) and downtimes when utilization spikes. The problem occurred in the window 7/7/2021-7/21/2021.

Problem analysis (theory)

The backend Postgres database can become overloaded under a high volume of DHT requests to the hydras.
This causes query times to the database to increase, which in turn causes DHT requests to back up in the provider manager loop, which in turn causes the hydra nodes to crash.

Corrective steps

  • Ensure the entire fleet of hydra heads (across machines) always uses the same sequence of balanced IDs:
    #128
    Resolved by #130
  • Ensure ID/address mappings persist across restarts (design goal)
  • Fix aggregate metrics to use fast approximate Postgres queries (as opposed to slow exact queries)
    #133
  • Upgrades in DHT provider manager:
    • Use multiple threads in the provider loop (diminishes the effect of individual straggler requests to the datastore)
      libp2p/go-libp2p-kad-dht#729
    • Gracefully decline quality of service when under load
      libp2p/go-libp2p-kad-dht#730
    • Fully decline service at a configurable peak level of load
  • Monitor (via metrics) the query latency of the backing Postgres database (at the infra level)
  • Setup automatic pprof dumps near out-of-memory events, perhaps using https://github.com/ipfs-shipyard/go-dumpotron (at infra level)

Acceptance criteria

  • Verify that a sustained increased request load at the hydra level does not propagate to the Postgres backing datastore. This should be ensured by measures for graceful degradation of quality (above) at the DHT provider manager.

No provider records, no peers

I've been running 0.4.3 for about a week on a Gigabit machine with public IP (AND an IPFS node running). Somehow, I only seem to get about 5 peers and no provider records whatsoever.

I have now been running 0.5 for some time and am observing the same behaviour.

Is there perhaps something I've missed, or a default configuration option that does not make sense?

We'd like to run a booster to harvest more hashes to index for ipfs-search.com and this is currently blocking further development on our side.

Example output (it starts with 7 peers):

Hydra Booster

Head ID(s)               [<...>]
Connections              1 peers
Unique Peers Seen        7
Memory Allocated         28 MB
Stored Provider Records  0
Routing Table Size       0
Uptime                   0h 35m 37s

Thanks!

Keep Hydra head peerIDs between restarts

There doesn't seem to be a good reason for us to rotate our peerIDs (and therefore locations in the Kademlia keyspace) just because we OOM, update the version we're running, etc.

The negative effects of us rotating our keys are:

  1. If you're running a small number of heads then you're effectively making the records previously stored with you useless since no one will look for them with you
  2. If you're running many heads then you're invalidating a bunch of people's routing tables which can make clients less efficient. It'll all work itself out over time, but we might as well be nice

Solution for always reusing the same balanced IDs in the hydra deployment

The ID generator is a pseudorandom (i.e. seeded) algorithm that generates an infinite sequence of mutually-balanced IDs:
ID0, ID1, ID2, ID3, ...

Any one ID in this sequence is uniquely determined by the seed (which determines the sequence) and its index (i.e. sequence number) in the sequence.

Therefore, to ensure that a collection of Hydra heads (1) have mutually-balanced IDs and (2) always reuse the same IDs (after restart), it suffices to parameterize each of them at execution time with the same seed and an index, such that each head has a different index in the space of positive integers.

For example, heads can be parameterized as:
id_seed=xyz, id_index=1
id_seed=xyz, id_index=2
id_seed=xyz, id_index=3
...

Note that it is irrelevant which machines or processes the heads run on.
The key requirement is that each head (across the entire fleet) gets a unique index!

Therefore, heads should be parameterized at the infra/deployment level, perhaps using command-line arguments. Restarting a head then guarantees it reuses the same ID and it is unique across the fleet.
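
In terms of the existing command-line surface, this roughly maps to giving every Hydra in the fleet the same -random-seed and a distinct, non-overlapping -id-offset (a sketch; the assumption that a Hydra with -nheads=10 consumes 10 consecutive indices from the sequence should be verified against the implementation):

# Same fleet-wide seed everywhere; each machine starts at a different offset
./hydra-booster -nheads=10 -random-seed="<base64-256-bit-seed>" -id-offset=0
./hydra-booster -nheads=10 -random-seed="<base64-256-bit-seed>" -id-offset=10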

Furthermore, this methodology enables easy (auto)scaling: Just assign unused index numbers to heads that are being added. (The space of positive integers is large enough!)

There is no requirement that indices are consecutive numbers (just that they are unique). This allows ops engineers to use different blocks of integers for different scaling purposes. For example, two entirely independent hydra fleets (with no coordination between them) can be deployed: the first fleet can use only even numbers for its heads; the second fleet only odd numbers. Clearly, this example generalizes in various ways.

Note that this methodology completely alleviates the need for any kind of direct network coordination/connection between heads, making the system considerably more robust!

Progress

A first step in this direction is provided in #130.

Stage 4 - Siamese Hydras

Effort Needed: Low to Mid (1~2 weeks of developing + testing + deployment)
Prerequisite(s): Stage 1-3

Design notes & tasks

  • Rather than an in process shared record store, use a database backend (e.g. Postgres) so that multiple hydra booster instances can share the same records list

Testing mechanics & evaluation plan

  • Continue using the testing from the previous 3 stages
  • Verify the health of the records using the database backend UI

Success criteria

  • Multiple Hydra instances (i.e. multiple machines running Hydra nodes) have a shared datastore
  • We can easily query for the most popular records and see if there is any particular content that is really hot. This opens the possibility to identify areas of the address space that are more loaded (hot) and to spin up new nodes in those areas to redistribute the load (load balancing)

Sort out Docker image build and deployment

I think releases that are tagged here should be automatically built and tagged on Dockerhub, and we should deploy specific tags rather than latest to Digital Ocean. Perhaps that shouldn't even happen in this repo; maybe we shouldn't have ./k8s in this project at all, except as an example.

Fork & Refactor

  • Forked and renamed
  • Fill the README and link to the RFC (adding the logo) (@alanshaw)
  • Review and document the codebase
    • Modularize/Packagerize the repo
    • Refactor
      • error propagation (no panic(err)) (@alanshaw)
      • structs as opts (rather than a super long list of args) (@daviddias) #26
      • extract http api into module (@alanshaw)
      • extract ui from run-node.go (@alanshaw)

Build a tool to help us understand the DHT (Visually is a ++)

With this tool, we should be able to answer:

  • Where are the sybils in comparison to other nodes
  • Are we 1~3 hops away from each other non-sybil node? (simulating ideal routing tables)
  • How many more sybils would we need to generate to make the above happen?

Integrate delegated content routing client

Done criteria: A https://github.com/ipfs/go-delegated-routing client with get-p2p-provide is invoked in a performant way as part of getting provide request records.
Why important: Enabling hydras to bridge to external systems like the to-be-federated indexer network which is under development.
Notes:

  • Example usecase: enable nft.storage/web3.storage to have their content discoverable on IPFS via Indexers without needing to publish provider records themselves.
  • This depends on the ProvideManager libp2p/go-libp2p-kad-dht#749 to be implemented.
  • There is followup work to have a delegated-routing server implemented in the Indexer and to configure Hydra nodes to invoke it. For the example above, this is being done in #141
  • The calling strategy for the two paths (SQL query vs. delegated router) needs to be determined (are they done in parallel, which takes priority, when do we timeout, etc.)
  • This isn't something we're doing to help out a particular service. If one has a centralized service in need of special indexing, one needs to bridge with the indexer network.

Use fast approximate queries to Postgres for metrics collection

Currently, metrics collection uses exact counting (SELECT COUNT), which is slow and expensive and has been found to interfere with useful payload queries to the database. Consider using an approximate count query, provided by Postgres. E.g.

  SELECT
  (reltuples/relpages) * (
    pg_relation_size('records') /
    (current_setting('block_size')::integer)
  )
  FROM pg_class where relname = 'records';

How to measure the percentage of content that's available?

We know that there's a lot of content stored on IPFS, but how much of that content can actually be accessed universally right now? Knowing what content is available in the network gives us a metric we can form KRs around and a gauge of how healthy the IPFS network is.

Stage 1 - The Hydra Belly

Effort Needed: Low (1 week of developing + testing + deployment)
Prerequisite(s): None, can be shipped in a go-ipfs 0.4.X

Design notes & tasks

  • Fork ipfs/dht-node & refactor
  • Upgrade the sybils to use shared record store that stores records to disk
    • Verify the LevelDB adapter that was previously created
    • Verify that LevelDB is indeed our best bet (compare it with a networked db such as Postgres in preparation for stage 4)
  • Proactively fetch any record that you receive a request for

Testing mechanics & evaluation plan

  • Deploy the Hydra Booster node to IPFS Infrastructure
  • Monitor the number of records fetched and stored
  • Start measuring the number of hops the IPFS Gateways make when running a .FindProvs call

Success criteria

We replicate the most requested records that exist in the network

Configure delegated routing to invoke Indexers

Done criteria: provider records are returned by Hydra boosters via the to-be-federated indexer network which is under development.
Why important: enables any CIDs indexed by the Indexer network (which will have nft.storage/web3.storage CIDs) to have their content discoverable on IPFS via Indexers without needing to publish provider records themselves.
Notes:

  • This builds on #140
  • This is the configuration step of adding the Indexer as a place to delegate routing requests to. This assumes that the Indexer has implemented/deployed the server-side of delegated content routing for get-p2p-provide requests.

Stage 3 - Single routing table (rather than one per Sybil)

Effort Needed: Low (1 week of developing + testing + deployment)
Prerequisite(s): Stage 1 & Stage 2

Design notes & tasks

  • This is a perf optimization. Rather than having N routing tables, where N is the number of sybils, sharing a connection pool (and therefore blocking each other), we want to only have one of the PeerIds doing the routing, while the others are just in the routing tables of other peers
  • Use delegated routing for each of the sybils to use the router sybil to fetch the records

Testing mechanics & evaluation plan

  • Deploy the nodes. Measure the memory, cpu and bandwidth profiles. There should be a drop comparing to previous version.

Success criteria

  • The sybils do not thrash each other when sharing the connection pool
  • The sybils become less noisy (as only one node will have a large routing table, rather than N nodes having many small routing tables that need to be constantly updated)

Stage 2 - Hydra Heads with Pre-Calculated Peer IDs

Effort Needed: Low (1 week of developing + testing + deployment)
Prerequisite(s): None, can be shipped in a go-ipfs 0.4.X (Yes, stage 2 can be done in parallel)

Design notes & tasks

  • Calculate the number of heads (PeerIds) necessary for networks of multiple sizes (10K, 100K, 1000K)
  • Brute force to calculate an even distribution of those PeerIds
  • Use those selected ids as PeerIds for the Hydra Booster sybils

Testing mechanics & evaluation plan

  • We run the DHT scraper daily and visualize the PeerIds distribution and where the hydra PeerIds show up
  • We simulate generating the routing tables for such nodes and verify if indeed the hydra nodes are 1~3 hops away from each other node.

Success criteria

Our Hydra nodes are 1~3 hops away from every other node in the Network

Security: metrics publicly exposed by default

By default, the Prometheus metrics and pprof HTTP server are listening on 0.0.0.0, which is likely to cause information leaks and/or might expose attack vectors.

As this daemon is to be run on public addresses by default, and uses randomly picked ports for the heads, this default of publicly exposing non-essential services seems a bad design from a security point of view.

In addition, it is inconsistent with the other listeners which default to 127.0.0.1 (as they should).

How to measure the lifecycle of a file in the DHT?

Understanding the dynamics of what happens to a file with respect to its presence in the DHT will enable us to tune the DHT more appropriately.

Some metrics off the top of my head that might be useful.

  • Number of provider records available over time
  • Number of providers over time
  • Availability of providers
  • Frequency of re-provides

Where these are all per CID.
