What is wrong?
Currently we first sync the chain data (blocks, receipts, etc.) and then sync the state trie. This approach won't reliably work for a full sync of the chain because:
- State sync currently takes around 5-10 days
- The state root that it syncs against is static
- Geth (and probably Parity) garbage-collect old state tries (by default, Geth only keeps state data for the most recent 128 blocks)
This means that unless Trinity happens to be connected to a node running in archive mode (extremely rare), by the time it completes the state sync there will be missing trie nodes that are not available from almost any peer it connects to.
And even if Trinity happens to be connected to an archive node, we will need to run these two syncs concurrently to reach the performance Trinity needs in order to be viable.
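To put the retention problem in numbers, here is a rough back-of-the-envelope calculation. The 15-second average block time is an assumption, not a figure from this issue; the 128-block window is Geth's default mentioned above.

```python
# Compare Geth's default state retention window with the observed duration
# of a full state sync. BLOCK_TIME_SECONDS is an assumed mainnet average.
BLOCK_TIME_SECONDS = 15
GETH_RETAINED_BLOCKS = 128

retention_minutes = GETH_RETAINED_BLOCKS * BLOCK_TIME_SECONDS / 60
print(f"Geth keeps state for roughly {retention_minutes:.0f} minutes")

for days in (5, 10):
    # Number of new blocks mined while a state sync of this length runs.
    blocks_elapsed = days * 24 * 60 * 60 // BLOCK_TIME_SECONDS
    print(f"{days} days of state sync ~ {blocks_elapsed} new blocks")
```

Under that assumption the retained window covers about 32 minutes, while a 5 to 10 day state sync falls roughly 28,800 to 57,600 blocks behind its static target.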
How can it be fixed
An initial exploratory attempt can be found here: ethereum/py-evm#1231
Running the two sync processes concurrently is reasonably easy; the problems that need to be solved concern how we orchestrate these two sync processes.
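As a minimal illustration of why concurrency itself is the easy part, the sketch below runs two placeholder coroutines side by side with `asyncio.gather`. The function names are hypothetical stand-ins, not Trinity's syncers; what the sketch deliberately lacks is any shared notion of a sync target, which is the orchestration problem described in the rest of this issue.

```python
import asyncio


# Hypothetical stand-ins for the two sync processes.
async def chain_data_sync(log: list) -> str:
    for block_number in range(3):
        await asyncio.sleep(0)  # yield so the other sync can make progress
        log.append(f"chain: block {block_number}")
    return "chain done"


async def state_trie_sync(log: list) -> str:
    for batch in range(3):
        await asyncio.sleep(0)
        log.append(f"state: batch {batch}")
    return "state done"


async def run_both():
    log: list = []
    # gather() schedules both coroutines as tasks on the same event loop.
    results = await asyncio.gather(chain_data_sync(log), state_trie_sync(log))
    return results, log


results, log = asyncio.run(run_both())
print(results)  # ['chain done', 'state done']
print(log)      # entries from both syncs, interleaved
```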
HEAD tracking
See: #54
We need a reference to the HEAD chain header that we are syncing towards.
- For chain data sync, this merely acts as an anchor for when we have fully synced.
- For state trie sync, we use its state_root as our sync target.
Currently the chain data sync updates its target but the state trie sync does not. We need to address the security issues laid out in #1233 as well as implement a mechanism for the state sync to update the state_root that it is targeting.
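One possible shape for such a mechanism is a shared holder for the current HEAD that the state sync can await between request batches. The `Header` and `SyncTarget` classes below are hypothetical illustrations, not Trinity APIs. Note that the forward-only block-number check is not header validation and does not address the #1233 concerns.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    # Minimal stand-in for a block header; only the fields used here.
    block_number: int
    state_root: bytes


class SyncTarget:
    """Hypothetical holder for the HEAD that both syncs work towards."""

    def __init__(self, initial: Header) -> None:
        self._head = initial
        # Constructed inside a running event loop (see demo below).
        self._changed = asyncio.Condition()

    @property
    def head(self) -> Header:
        return self._head

    async def update(self, new_head: Header) -> None:
        # Only move forward. This ordering check is NOT header validation;
        # the security issues from #1233 still need a real answer.
        if new_head.block_number <= self._head.block_number:
            return
        async with self._changed:
            self._head = new_head
            self._changed.notify_all()

    async def wait_for_new_head(self, seen: Header) -> Header:
        # Suspend until the target differs from the header the caller last used.
        async with self._changed:
            await self._changed.wait_for(lambda: self._head != seen)
            return self._head


async def demo() -> Header:
    target = SyncTarget(Header(100, b"\x01" * 32))
    seen = target.head
    # A state sync would await this to learn about a new state_root.
    waiter = asyncio.create_task(target.wait_for_new_head(seen))
    await asyncio.sleep(0)
    await target.update(Header(228, b"\x02" * 32))
    return await waiter


new_head = asyncio.run(demo())
print(new_head.block_number)  # 228
```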
Decouple header and body syncing
This isn't strictly required, but I believe it will make the orchestration easier to manage.
Currently chain data sync syncs headers and bodies together. This should be separated such that we sync the full header chain ahead of the block bodies, effectively making it into two sync processes.
- One process which syncs only the headers (akin to light sync)
- Another process which syncs the bodies.
By moving the header sync into a dedicated process, we can implement HEAD tracking in that process and have the block body and state trie syncs take their cue from the current target HEAD.
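A toy version of the split could look like the pipeline below: a header-only producer pushes header references onto a queue and a separate body consumer works through them. Block numbers stand in for headers and the "fetch" steps are hypothetical placeholders for peer requests.

```python
import asyncio


async def header_sync(queue: asyncio.Queue, target_head: int) -> None:
    for block_number in range(target_head + 1):
        await queue.put(block_number)  # header "downloaded and validated"
    await queue.put(None)  # sentinel: header chain reached the target HEAD


async def body_sync(queue: asyncio.Queue, bodies: list) -> None:
    while True:
        block_number = await queue.get()
        if block_number is None:
            break
        bodies.append(block_number)  # stand-in for fetching the block body


async def main(target_head: int) -> list:
    # maxsize bounds how far headers can run ahead of bodies in memory.
    queue: asyncio.Queue = asyncio.Queue(maxsize=16)
    bodies: list = []
    await asyncio.gather(header_sync(queue, target_head), body_sync(queue, bodies))
    return bodies


bodies = asyncio.run(main(target_head=5))
print(bodies)  # [0, 1, 2, 3, 4, 5]
```

The bounded queue is a design choice for the sketch; since the goal is to sync the full header chain ahead of bodies, a real implementation would more likely persist headers and let the body sync read from the database.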
Updating the target state_root
A naive approach for updating the state_root is to simply abandon the previous syncer and start a new one. However, this is likely to incur significant performance overhead, since it requires re-walking the entire state trie.
Intelligently updating the state root would ideally involve:
- preserving all of the SyncRequest objects that are still valid for the new state root
- discarding all of the SyncRequest objects that are no longer valid and removing their entries from the database.
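A toy model of this split, assuming the new root node has already been fetched and that walking locally stored nodes is far cheaper than re-requesting them from peers. The trie is a plain dict of hash to child hashes and the `walk`/`migrate` helpers are hypothetical; this is not py-evm's state sync API.

```python
from typing import Dict, List, Set, Tuple

# Toy content-addressed trie: node hash -> hashes of its child nodes.
# Strings stand in for the 32-byte keccak hashes used by a real trie.
NodeDB = Dict[str, List[str]]


def walk(db: NodeDB, root: str) -> Tuple[Set[str], Set[str]]:
    """Return (present, missing): nodes under `root` that are stored locally,
    and referenced nodes that still have to be fetched from peers."""
    present: Set[str] = set()
    missing: Set[str] = set()
    stack = [root]
    while stack:
        node_hash = stack.pop()
        if node_hash in present or node_hash in missing:
            continue  # shared subtrees are only visited once
        if node_hash in db:
            present.add(node_hash)
            stack.extend(db[node_hash])
        else:
            missing.add(node_hash)
    return present, missing


def migrate(db: NodeDB, pending: Set[str], new_root: str):
    """Re-target pending requests at `new_root` using only local data."""
    present, missing = walk(db, new_root)
    kept = pending & missing          # still needed under the new root
    discarded = pending - missing     # not referenced under the new root
    new_requests = missing - pending  # newly referenced, not yet requested
    prunable = set(db) - present      # stored nodes the new root no longer uses
    return kept, discarded, new_requests, prunable


db = {
    "old_root": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "new_root": ["a", "e"],
}
kept, discarded, new_requests, prunable = migrate(db, {"c", "d"}, "new_root")
print(sorted(kept), sorted(discarded), sorted(new_requests), sorted(prunable))
# ['c'] ['d'] ['e'] ['b', 'old_root']
```

Because nodes are content-addressed, subtrees shared between the old and new roots (here `a`) are kept automatically; only what the new root no longer reaches becomes prunable.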
Because migration itself has overhead, we also cannot migrate to a new state root on every block. Since go-ethereum garbage-collects trie nodes after 128 blocks, migrating to a new target state root at roughly that interval is likely appropriate.
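A minimal migration policy might look like the following. The 128-block window mirrors Geth's default; the 16-block safety margin is an assumption added so the target moves before non-archive peers are likely to have pruned it, and `should_migrate` is a hypothetical helper.

```python
GC_WINDOW = 128     # Geth's default number of recent blocks with retained state
SAFETY_MARGIN = 16  # assumed buffer so we move before peers prune our target


def should_migrate(current_target: int, latest_head: int,
                   window: int = GC_WINDOW, margin: int = SAFETY_MARGIN) -> bool:
    """Return True once the current target root is close to leaving the
    window of state that non-archive peers still serve."""
    if not 0 <= margin < window:
        raise ValueError("margin must be in [0, window)")
    return latest_head - current_target >= window - margin


print(should_migrate(1000, 1100))  # False: 100 blocks behind
print(should_migrate(1000, 1112))  # True: 112 blocks behind
```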
Coordinating chain and state sync
Once we have a reliable way for both chain data and state trie sync to update the targets they are syncing towards, we need a mechanism to synchronize these two processes, i.e., to ensure that both sync processes are syncing towards the same target.
Since updating the state sync target incurs overhead, we likely need to update the sync target only every 128 blocks. This means that both state sync and chain data sync need to be able to pause when they catch up to the sync target, and to resume only if the other process has not also completed before the next target update.
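One reading of this pause/resume rule, sketched with an `asyncio.Condition`: a syncer that catches up waits until either the other syncer also reaches the same target (the node is synced) or the target advances (it resumes). `SyncCoordinator` and its methods are hypothetical names for illustration.

```python
import asyncio


class SyncCoordinator:
    """Hypothetical coordinator keeping chain and state sync on one target."""

    def __init__(self, target: int) -> None:
        self.target = target
        self.synced = False
        self._caught_up: set = set()
        self._cond = asyncio.Condition()  # create inside a running loop

    async def report_caught_up(self, name: str) -> bool:
        """Pause the caller at the target. Returns True if both syncs reached
        it (fully synced), False if the target moved and the caller resumes."""
        async with self._cond:
            self._caught_up.add(name)
            if self._caught_up >= {"chain", "state"}:
                self.synced = True
                self._cond.notify_all()
                return True
            reached = self.target
            await self._cond.wait_for(lambda: self.synced or self.target != reached)
            return self.synced

    async def advance_target(self, new_target: int) -> None:
        async with self._cond:
            if self.synced:
                return
            self.target = new_target
            self._caught_up.clear()  # progress toward the old target no longer counts
            self._cond.notify_all()


async def demo():
    coord = SyncCoordinator(target=128)
    # State sync catches up first and pauses.
    state_task = asyncio.create_task(coord.report_caught_up("state"))
    await asyncio.sleep(0)
    # The target moves before chain sync finishes, so state sync resumes.
    await coord.advance_target(256)
    state_resumed_synced = await state_task
    # This time both syncs reach the new target.
    chain_task = asyncio.create_task(coord.report_caught_up("chain"))
    await asyncio.sleep(0)
    state_done = await coord.report_caught_up("state")
    return state_resumed_synced, state_done, await chain_task


result = asyncio.run(demo())
print(result)  # (False, True, True)
```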
For the chain data at least, once it has caught up to the sync target we should be able to keep downloading chain data and cache it in memory, processing it only once the sync target has been updated. This would allow chain data sync to catch up faster than trie data sync.
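A minimal sketch of that cache, assuming blocks arrive in ascending order and that the buffered range stays small enough for RAM (a real implementation would need a cap). `ChainDataBuffer` is a hypothetical name and processing is reduced to recording block numbers.

```python
class ChainDataBuffer:
    """Hypothetical cache for chain data downloaded past the sync target.

    Blocks at or below the target are processed right away; later ones are
    held in memory until the target advances.
    """

    def __init__(self, target: int) -> None:
        self.target = target
        self.processed: list = []
        self._pending: dict = {}

    @property
    def buffered(self) -> list:
        return sorted(self._pending)

    def on_block(self, block_number: int, block: object) -> None:
        if block_number <= self.target:
            self._process(block_number, block)
        else:
            self._pending[block_number] = block  # hold until the target moves

    def advance_target(self, new_target: int) -> None:
        self.target = new_target
        # sorted() builds a list first, so popping from the dict is safe here.
        for number in sorted(n for n in self._pending if n <= new_target):
            self._process(number, self._pending.pop(number))

    def _process(self, block_number: int, block: object) -> None:
        # Stand-in for importing the block; here we only record its number.
        self.processed.append(block_number)


buf = ChainDataBuffer(target=128)
for n in range(126, 132):
    buf.on_block(n, f"block-{n}")
print(buf.processed, buf.buffered)  # [126, 127, 128] [129, 130, 131]
buf.advance_target(130)
print(buf.processed, buf.buffered)  # [126, 127, 128, 129, 130] [131]
```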