ava-labs / hypersdk
Opinionated Framework for Building Hyper-Scalable Blockchains on Avalanche
Home Page: https://hypersdk.xyz/
License: Other
Load testing reveals that for every accepted block, we call parse on that block ~17 times.
Each transaction should have some sort of fee based on the number of state keys it requires.
We should not make this the entirety of the fee because it would not allow us to charge for interesting compute within an action.
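A minimal sketch of what such a fee rule could look like: a per-state-key charge plus a compute component so the key count is not the entirety of the fee. The constants and function names are illustrative, not from the hypersdk codebase.

```go
package main

import "fmt"

// Hypothetical fee parameters; real values would come from chain rules.
const (
	baseFee   = 100 // flat cost per tx
	feePerKey = 10  // cost per state key the tx requires
)

// txFee charges for state keys touched plus the action's own compute units,
// so interesting compute inside an action is still priced.
func txFee(numStateKeys, computeUnits uint64) uint64 {
	return baseFee + feePerKey*numStateKeys + computeUnits
}

func main() {
	fmt.Println(txFee(4, 25)) // 100 + 10*4 + 25 = 165
}
```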
go1.17 added the ability to convert from slice to array/array pointer: https://tip.golang.org/ref/spec#Conversions_from_slice_to_array_or_array_pointer
However, the UX for this was super ugly until go1.20 (just released):
We should now migrate all our codec unmarshaling to do this instead of copying from the tx byte slice (which will dramatically reduce memory allocations in block processing). Thanks to @rafael-abuawad for calling my attention to this in #21.
I wish we had more memory allocation benchmarks so we could show the performance improvement here, but I don't think it makes sense to wait for those before adding this. We can probably just add one quick one for unmarshaling a dummy tx.
We MUST be careful to only cast exactly the bytes we need so we don't keep the entire backing array of the slice alive if we don't need to (even though we usually will because we hold onto the tx bytes): https://utcc.utoronto.ca/~cks/space/blog/programming/GoInteriorPointerGC
go1.20.1 is not stable (golang/go#58798)
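A minimal sketch of the two conversion forms. Note the GC caution above applies to the pointer form: converting to an array pointer aliases the slice's backing array, while the go1.20 array-value conversion copies the bytes.

```go
package main

import "fmt"

func main() {
	txBytes := []byte{0xde, 0xad, 0xbe, 0xef, 0x01, 0x02}

	// go1.17: convert a slice to an array *pointer*. No copy is made, so
	// idPtr aliases txBytes' backing array (and keeps it alive).
	idPtr := (*[4]byte)(txBytes[:4])

	// go1.20: convert a slice directly to an array value. This copies the
	// bytes, so the result does not retain the backing array.
	id := [4]byte(txBytes[:4])

	fmt.Println(*idPtr == id) // true
}
```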
During hypersdk load testing, I found that avalanchego spends the majority of time performing compression-related tasks (zstd should help a ton here):
Showing nodes accounting for 16600ms, 71.55% of 23200ms total
Dropped 486 nodes (cum <= 116ms)
Showing top 10 nodes out of 233
flat flat% sum% cum cum%
3040ms 13.10% 13.10% 7990ms 34.44% compress/flate.(*compressor).deflate
2500ms 10.78% 23.88% 3170ms 13.66% compress/flate.(*decompressor).huffSym
2160ms 9.31% 33.19% 2160ms 9.31% runtime.memmove
1820ms 7.84% 41.03% 1820ms 7.84% crypto/sha256.block
1730ms 7.46% 48.49% 1730ms 7.46% runtime/internal/syscall.Syscall6
1550ms 6.68% 55.17% 1670ms 7.20% github.com/golang/snappy.encodeBlock
1460ms 6.29% 61.47% 2290ms 9.87% compress/flate.(*compressor).findMatch
820ms 3.53% 65.00% 820ms 3.53% compress/flate.matchLen (inline)
760ms 3.28% 68.28% 760ms 3.28% compress/flate.(*dictDecoder).writeByte
760ms 3.28% 71.55% 760ms 3.28% runtime.memclrNoHeapPointers
We should use a bytes.Pool to reduce the number of memory allocations we do during trie operations.
When evicting a cached block, failing to parse a block, or removing a tx from the mempool, we should add the raw bytes to a pool for later reuse.
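A minimal sketch of such a pool using sync.Pool (the pool variable and sizes are illustrative, not from the hypersdk codebase). Evicted block bytes and dropped tx bytes would be returned here instead of being left for the garbage collector.

```go
package main

import (
	"fmt"
	"sync"
)

// bytesPool is a hypothetical pool of reusable byte slices.
var bytesPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b // a pointer avoids boxing a slice header on every Put
	},
}

func main() {
	// Borrow a buffer and use it for some raw bytes.
	bp := bytesPool.Get().(*[]byte)
	buf := append((*bp)[:0], "raw tx bytes"...)
	fmt.Println(string(buf))

	// Done with the buffer: reset it and return it for reuse.
	*bp = buf[:0]
	bytesPool.Put(bp)
}
```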
If a validator does not sign an Avalanche Warp Message or does not send us certain data blobs (in the case of a storage-based Subnet), we should be able to penalize them by reducing their Uptime (which affects whether or not they will be rewarded).
It would also be cool if there was an option to increase the number of accounts with each new transaction instead of sending to an existing account (which keeps the trie the same size during the test).
Create a simple tutorial on how to use the SDK, maybe in written or video form. I believe this would be the most useful thing.
Topics that the tutorial should cover:
To test the throughput on local/live networks, we'll need some sort of spam script.
To optimize throughput, we should ask for a # of accounts and then generate 1 tx per recipient per second. The max throughput per second will then be # accounts ^ 2.
We should additionally disregard any failures/avoid tracking tx success to maximize throughput.
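A sketch of the spam loop described above, with the actual tx submission elided; with N funded accounts each sending one tx to every account per second, max throughput is N*N tx/s.

```go
package main

import "fmt"

// plannedTxsPerSecond counts how many txs the spam script would issue per
// second for a given account count (illustrative; the transfer itself and
// any failure tracking are intentionally omitted).
func plannedTxsPerSecond(numAccounts int) int {
	txs := 0
	for from := 0; from < numAccounts; from++ {
		for to := 0; to < numAccounts; to++ {
			txs++ // would submit one transfer from -> to here
		}
	}
	return txs
}

func main() {
	fmt.Println(plannedTxsPerSecond(10)) // 100
}
```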
Bonus points for adding an MPC-based auth provider like web3auth instead of storing private keys in the browser.
Allocated memory:
Showing nodes accounting for 19.24GB, 48.93% of 39.32GB total
Dropped 435 nodes (cum <= 0.20GB)
Showing top 10 nodes out of 188
flat flat% sum% cum cum%
2.88GB 7.33% 7.33% 2.88GB 7.33% github.com/ava-labs/hypersdk/tstate.(*TState).Insert
2.64GB 6.71% 14.04% 2.64GB 6.71% github.com/ava-labs/hypersdk/examples/tokenvm/storage.PrefixBalanceKey (inline)
2.52GB 6.40% 20.44% 2.52GB 6.40% github.com/ava-labs/avalanchego/utils/wrappers.(*Packer).expand
2.47GB 6.28% 26.72% 2.47GB 6.28% github.com/ava-labs/avalanchego/x/merkledb.(*trieView).invalidateChildrenExcept
2GB 5.09% 31.81% 2GB 5.09% golang.org/x/exp/maps.Clone[...] (inline)
1.42GB 3.60% 35.41% 1.42GB 3.60% github.com/ava-labs/avalanchego/x/merkledb.newPath (inline)
1.41GB 3.58% 39.00% 3.41GB 8.67% github.com/ava-labs/avalanchego/x/merkledb.(*node).clone
1.33GB 3.39% 42.39% 3.51GB 8.93% github.com/ava-labs/avalanchego/x/merkledb.(*trieView).calculateNodeIDsHelper
1.29GB 3.28% 45.67% 1.29GB 3.28% google.golang.org/grpc.(*parser).recvMsg
1.28GB 3.27% 48.93% 3.07GB 7.82% github.com/ava-labs/hypersdk/chain.(*Processor).Prefetch.func1
Blocked on: ava-labs/avalanche-network-runner#467
Related: #82
Will also need to remove all //nolint:gocritic comments about default T values for generics.
2023-03-21T06:07:52.3130418Z [node9-bls] [03-21|06:07:52.312] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> syncer/state_syncer.go:298 skipping state sync {"reason": "no acceptable summaries found"}
2023-03-21T06:07:52.3131808Z [node9-bls] [03-21|06:07:52.312] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> bootstrap/bootstrapper.go:123 starting bootstrapper
2023-03-21T06:07:52.3132818Z [node9-bls] [03-21|06:07:52.312] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> vm/vm.go:379 bootstrapping started
2023-03-21T06:07:52.3143658Z [node9-bls] [03-21|06:07:52.314] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> common/bootstrapper.go:244 bootstrapping started syncing {"numVerticesInFrontier": 1}
2023-03-21T06:07:52.3145168Z [node9-bls] [03-21|06:07:52.314] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> bootstrap/bootstrapper.go:553 executing blocks {"numPendingJobs": 0}
2023-03-21T06:07:52.3146378Z [node9-bls] [03-21|06:07:52.314] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> queue/jobs.go:224 executed operations {"numExecuted": 0}
2023-03-21T06:07:52.3148277Z [node9-bls] [03-21|06:07:52.314] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> snowman/transitive.go:409 consensus starting {"lastAcceptedBlock": "7cE3pDvTaEx3Gn75wU2ve6RqRbLzdm6jh3ooqpvVhRXnSpCWg"}
2023-03-21T06:07:52.3149778Z [node9-bls] [03-21|06:07:52.314] INFO <2EWQarceXeUfevv4ohDBBtgRbJjRYS54Lrrfr7w7jczbcZFT23 Chain> vm/vm.go:382 normal operation started
Reference: https://github.com/ava-labs/xsvm
Track total txs, total accounts, hourly txs, and hourly accounts for easy dashboard creation before full indexing can be done.
When we have previously synced but are past the "history window" kept by other nodes, we clear all previously synced data on-disk.
Instead, we should fetch edge proofs of the current state to see if we can avoid re-fetching anything we already have from the rest of the network.
For auth methods that don't require any DB access to verify, we should add shared modules (ex: ED25519 in TokenVM)
Not sure how to get past this. This is the same problem I'm having with indexvm. Any help?
go version go1.19.2 darwin/arm64
Consider running different devnets at a variety of different hardware specs so devs can have a clear understanding of what different tradeoffs of CPU/RAM/DISKIO allow.
The acceptor queue does not handle graceful shutdown properly (if we are processing a block when shutdown is triggered): https://github.com/ava-labs/indexvm/actions/runs/4226574719/jobs/7340155987.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x90 pc=0xbb5fbe]
goroutine 73 [running]:
github.com/ava-labs/hypersdk/chain.(*StatelessBlock).Processed(...)
/home/runner/go/pkg/mod/github.com/ava-labs/[email protected]/chain/block.go:510
github.com/ava-labs/hypersdk/vm.(*VM).processAcceptedBlocks(0xc0001103c0)
/home/runner/go/pkg/mod/github.com/ava-labs/[email protected]/vm/resolutions.go:132 +0x9e
created by github.com/ava-labs/hypersdk/vm.(*VM).Initialize
/home/runner/go/pkg/mod/github.com/ava-labs/[email protected]/vm/vm.go:272 +0x218a
The simplest fix for this would just be to wait on the acceptorQueue in VM.Shutdown.
We should wrap PebbleDB in this interface to ensure we never write to a database after receiving an unexpected error (which may lead to unexpected consequences): https://github.com/ava-labs/avalanchego/tree/master/database/corruptabledb
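A sketch of the latching behavior such a wrapper provides: after the first write error, every subsequent write is refused. The interface and type names here are illustrative stand-ins; avalanchego's corruptabledb wraps its full database interface.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// putter is a minimal stand-in for the database interface.
type putter interface {
	Put(key, value []byte) error
}

// corruptableDB refuses all writes after the first unexpected error.
type corruptableDB struct {
	db putter

	lock      sync.Mutex
	corrupted error
}

func (c *corruptableDB) Put(key, value []byte) error {
	c.lock.Lock()
	defer c.lock.Unlock()
	if c.corrupted != nil {
		return fmt.Errorf("database corrupted: %w", c.corrupted)
	}
	if err := c.db.Put(key, value); err != nil {
		c.corrupted = err // latch the failure permanently
		return err
	}
	return nil
}

// flakyDB fails every Put while fail is true (for demonstration only).
type flakyDB struct{ fail bool }

func (f *flakyDB) Put(key, value []byte) error {
	if f.fail {
		return errors.New("disk error")
	}
	return nil
}

func main() {
	inner := &flakyDB{fail: true}
	c := &corruptableDB{db: inner}
	fmt.Println(c.Put([]byte("k"), []byte("v"))) // disk error
	inner.fail = false
	fmt.Println(c.Put([]byte("k"), []byte("v"))) // database corrupted: disk error
}
```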
It would be super cool if PRs emitted how much the load test results changed (assuming the disk sampling was similar on main).
We should be able to emit coverage changes on each PR + add a badge to the README
Once x/merkledb locking is overhauled, it will be possible to concurrently fetch values from disk (already supported by pebble).
This is probably not a big deal, but thought I'd mention it.
I am getting the following warning in VS Code:
Error loading workspace: You have opened a nested module. To work on multiple modules at once,
please use a go.work file. See https://github.com/golang/tools/blob/master/gopls/doc/workspace.md
for more information on using workspaces.
Under load (lots of RPC tx submissions and gossip), our naïve locking patterns don't yield time to consensus-related processes as soon as possible.
We need to overhaul this design to ensure that no matter what is going on that consensus-related processes get time to execute (and acquire requisite locks) as soon as possible.
For now, the simplest would be to have a "job processor" where we regulate access to MerkleDB based on priority instead of relying on golang to manage lock contention. This can then be reverted once MerkleDB allows concurrent read access.
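A sketch of such a job processor, giving consensus work strict priority over RPC/gossip work instead of letting golang's scheduler arbitrate lock contention. All names are illustrative, not from the hypersdk codebase.

```go
package main

import "fmt"

type job struct {
	name string
	run  func()
}

type processor struct {
	consensus chan job // high priority
	other     chan job // low priority
}

// drain runs queued jobs, always preferring consensus jobs, until both
// queues are empty (a real processor would block instead of returning).
func (p *processor) drain() {
	for {
		// First, run any pending consensus job.
		select {
		case j := <-p.consensus:
			j.run()
			continue
		default:
		}
		// Only when no consensus job is pending, take lower-priority work.
		select {
		case j := <-p.consensus:
			j.run()
		case j := <-p.other:
			j.run()
		default:
			return
		}
	}
}

func main() {
	p := &processor{
		consensus: make(chan job, 8),
		other:     make(chan job, 8),
	}
	var order []string
	// RPC work is enqueued first, but consensus work still runs first.
	p.other <- job{"rpc-submit", func() { order = append(order, "rpc-submit") }}
	p.consensus <- job{"verify-block", func() { order = append(order, "verify-block") }}
	p.drain()
	fmt.Println(order) // [verify-block rpc-submit]
}
```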
https://go.dev/play/p/NQNOAkSxkw5
Inside of avalanchego, we pad the input (https://github.com/btcsuite/btcd/blob/902f797b0c4b3af3f7196d2f5d2343931d1b2bdf/btcutil/bech32/bech32.go#L400-L406) to ensure the bytes each encode 5 bits (required by the library we use), and because of this we receive an extra byte back when decoding. This is not explained correctly in the comments, and the variable we use to check the length against is misnamed.
If we do not do this padding, we can fail to parse bech32 strings with the following error:
unable to convert address from 8-bit to 5-bit formatting
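To illustrate why the padding matters, here is a simplified re-implementation of the 8-bit to 5-bit regrouping bech32 libraries perform (illustrative only, not the btcsuite code): 8*n bits only divide evenly into 5-bit groups for certain lengths, and without padding, leftover nonzero bits produce exactly this error.

```go
package main

import (
	"errors"
	"fmt"
)

// convertBits regroups data from fromBits-sized groups into toBits-sized
// groups, optionally zero-padding the final group.
func convertBits(data []byte, fromBits, toBits uint, pad bool) ([]byte, error) {
	var acc, bits uint
	maxv := uint(1)<<toBits - 1
	out := []byte{}
	for _, b := range data {
		acc = acc<<fromBits | uint(b)
		bits += fromBits
		for bits >= toBits {
			bits -= toBits
			out = append(out, byte((acc>>bits)&maxv))
		}
	}
	if pad {
		if bits > 0 {
			out = append(out, byte((acc<<(toBits-bits))&maxv))
		}
	} else if bits >= fromBits || (acc<<(toBits-bits))&maxv != 0 {
		return nil, errors.New("unable to convert address from 8-bit to 5-bit formatting")
	}
	return out, nil
}

func main() {
	// 1 byte = 8 bits: one 5-bit group plus 3 leftover bits.
	groups, err := convertBits([]byte{0xFF}, 8, 5, true)
	fmt.Println(groups, err) // padded: [31 28] <nil>

	_, err = convertBits([]byte{0xFF}, 8, 5, false)
	fmt.Println(err) // unpadded: the conversion fails
}
```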
This will allow us to increase the number of active accounts without increasing target TPS (better stress test of merkledb).
Workaround: #56
During load testing, I found that avalanchego uses a ton of memory storing parsed blocks (basically 2048 * 2 * <expected block size>):
Showing nodes accounting for 1.94GB, 97.98% of 1.98GB total
Dropped 288 nodes (cum <= 0.01GB)
Showing top 10 nodes out of 65
flat flat% sum% cum cum%
0.95GB 47.83% 47.83% 0.95GB 47.83% google.golang.org/protobuf/internal/impl.consumeBytesNoZero
0.82GB 41.52% 89.35% 0.82GB 41.55% github.com/ava-labs/avalanchego/vms/components/chain.(*State).ParseBlock
0.16GB 8.07% 97.42% 0.16GB 8.07% github.com/ava-labs/avalanchego/utils/wrappers.(*Packer).expand (inline)
0.01GB 0.56% 97.98% 0.01GB 0.56% github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get
0 0% 97.98% 0.16GB 8.07% github.com/ava-labs/avalanchego/codec.(*manager).Marshal
0 0% 97.98% 0.16GB 8.07% github.com/ava-labs/avalanchego/codec/reflectcodec.(*genericCodec).MarshalInto
0 0% 97.98% 0.16GB 8.07% github.com/ava-labs/avalanchego/codec/reflectcodec.(*genericCodec).marshal
0 0% 97.98% 0.79GB 40.04% github.com/ava-labs/avalanchego/message.(*inMsgBuilder).Parse
0 0% 97.98% 0.79GB 40.04% github.com/ava-labs/avalanchego/message.(*msgBuilder).parseInbound
0 0% 97.98% 0.79GB 40.04% github.com/ava-labs/avalanchego/message.(*msgBuilder).unmarshal
This is very likely related to this: https://github.com/ava-labs/avalanchego/blob/7d73b59cb4838d304387ea680b9cc4053b72620c/vms/rpcchainvm/vm_client.go#L65-L70
const (
decidedCacheSize = 2048
missingCacheSize = 2048
unverifiedCacheSize = 2048
bytesToIDCacheSize = 2048
)
Other HyperVMs (like the IndexVM) would benefit from having a shared set of testing helpers to write integration/load/e2e tests.
Devnet deployment should be fully reproducible and OSS. We should aim to use avalanche-ops if possible.
Related: #59
When building (maybe only from source) on Linux, we should allow optimized BLS assembly code to be used. We disabled this previously because this optimized code panics on some architectures.
BLS verification is client-side and does not depend on how AvalancheGo is compiled: https://github.com/ava-labs/avalanchego/blob/b3a74687d5443e4dfd6fae61802752eee4c72f9b/vms/platformvm/warp/signature.go#L115-L131
Would be VERY cool if it had some animations like:
This could be used to either build some sort of smart-contract based HyperSDK or to build a rollup-optimized subnet that can process any WASM fraud proof.
Naïve fee pricing/usage targets were added during initial development. We should revisit these and add the updated values to the README.
To send blocks larger than 2MB, allow a block to include hashes for other chunks that can be downloaded over the P2P layer before verification (and after parse).
If all chunks can't be received, then the block will simply fail verification.
We should probably do this with some form of erasure coding.
While we could simply "send more blocks", this allows us to perform fewer merkle root calculations and consensus events.
We will need to be very thoughtful about introducing protection mechanisms so a malicious producer can't force us to fetch a bunch of useless data.
To invite new contributors and to maintain some guidelines, adding a Contributors Guide section to the README could be really helpful.