Comments (13)
This structure takes the form of a trie root built as the mapping of indices to storage keys. The indices are sequential and ordered by storage key.
I don't understand this. Is it a mapping of `storage_key => ??` or `i => storage_key`, where `storage_key` was the `i`th change in the block? In that case, how can it be ordered by storage key? For succinct proofs that something hasn't changed we should just be able to get a proof of non-inclusion if the mapping is keyed by storage key.
Given that I'm having trouble parsing the above, I might be completely off-base here, but it sounds like a null sentinel index of `(ChangedKeyCount, null)` is establishing an upper bound on the number of changes we track per block. That would defeat the purpose, since it would allow plausible omission of data to light clients.
If the changes trie-root mapped `key => last_changed`, where `last_changed` is the block number it was previously changed at, we could request proof of all changes up to some point in history. All of this still relies on the assumption that clients will mostly want to read specific storage entries as opposed to querying functions.
w.r.t. the footprint of deleted entries, that is also true. Given that our security model is weakly subjective, we should make the cleanup period fairly long, in accordance with the withdrawal period of a few months or so. I don't think this poses a DoS vector any more than transfers of 1 stake-unit to tons of unreachable accounts would, although we should also consider garbage collection and rent in that case. Then the question becomes "what's the minimum fee someone has to pay to impose a storage burden of 1 cleanup period?"
from substrate.
So suppose that there's a block with a single extrinsic: sender = Alice, call = `transfer(Bob, 69)`. There are three changes to storage: `twox_128(':staking:balance:<Alice>')`, `twox_128(':staking:balance:<Bob>')` and `twox_128(':sys:index:<Alice>')`. Let these three expressions evaluate to `0x1111...`, `0x2222...` and `0x3333...` respectively.
Then we build a trie with four key-value pairs:
(0, 0x1111...)
(1, 0x2222...)
(2, 0x3333...)
(3, 0x)
The keys are just formed by ordering the values and running up from 0. There's a final value to denote the end. If a light-client wished to have proof that Charlie's balance didn't change, they would determine that entry in storage, `twox_128(':staking:balance:<Charlie>')` (which, for argument's sake, evaluates to `0x1234...`), then they'd ask a proof-server for a proof of the existence or non-existence of that node in the trie whose root they know via header-sync. An existence proof is trivial. A non-existence proof is also simple: the proof-server would reply with the witness statements for the two entries either side of that value (`(0, 0x1111...)` and `(1, 0x2222...)`). Since they have consecutive index keys (i.e. 0 and 1), since the associated storage keys sandwich the storage key we're interested in, and since we know all index keys are bestowed sequentially on the ordered storage keys, we can conclude that no storage entry was changed in this block whose key is > `0x1111...` and < `0x2222...`, including the `0x1234...` we're interested in.
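To make the sandwich argument concrete, here's a minimal sketch in plain Rust (model structures only, not the actual trie code; both function names are hypothetical): `build_change_list` orders the changed keys and appends the null sentinel, and `proves_unchanged` is the consistency check a light client would run on two adjacent witnessed entries.

```rust
use std::collections::BTreeSet;

/// Build (index, key) pairs from the set of changed storage keys,
/// ordered by key, with a trailing null sentinel marking the end.
fn build_change_list(changed: &BTreeSet<Vec<u8>>) -> Vec<(u32, Option<Vec<u8>>)> {
    let mut list: Vec<(u32, Option<Vec<u8>>)> = changed
        .iter()
        .enumerate()
        .map(|(i, k)| (i as u32, Some(k.clone())))
        .collect();
    list.push((changed.len() as u32, None)); // the null sentinel
    list
}

/// A light client accepts two witnessed entries (i, a) and (i+1, b)
/// as proof that no key strictly between a and b changed in this block.
fn proves_unchanged(
    left: &(u32, Option<Vec<u8>>),
    right: &(u32, Option<Vec<u8>>),
    key: &[u8],
) -> bool {
    let consecutive = right.0 == left.0 + 1;
    let left_ok = left.1.as_deref().map_or(true, |a| a < key);
    let right_ok = right.1.as_deref().map_or(true, |b| key < b);
    consecutive && left_ok && right_ok
}
```

With the three keys from the example, the check on entries 0 and 1 covers `0x1234...`, while the same pair correctly rejects a key that did change.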
I see, that makes sense to me now.
I'm not sure that I would necessarily call what's described a compact proof of non-existence. Given an ordinary merkle mapping from `A => B`, you can prove non-existence of an element `a` with a single merkle branch (which proves that the lookup terminates before reaching a value). These proofs require two, although they will often share a long common prefix.
The main thing to consider is the complexity of producing such a proof. Doing a binary search from 0 to find the `ChangedKeyCount` will take `(log(n))^2 + log(n)` db accesses. Evaluating it is only `log(n)`, since we can specify the indices to look at. Creation of the trie is even more expensive, since we need to accumulate all the keys and then sort them before making the trie. This all might end up pretty costly when lots of keys have been changed. With a regular merkle trie, proofs of existence and non-existence will take only `log(n)` time to create and evaluate. The sorting-proof approach does allow for fairly compact proofs of non-alteration for arbitrarily large ranges of keys, but it's difficult to take advantage of that with hashed storage keys.
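For illustration, here's a sketch of where the `(log(n))^2` term comes from, assuming the prover locates the sentinel by probing index keys (the hypothetical `exists(i)` closure stands in for a db lookup, itself an `O(log n)` trie walk):

```rust
/// Find the sentinel index (== ChangedKeyCount) by probing index keys.
/// Index 0 is always present (an empty block still has its sentinel).
/// Galloping then binary searching costs O(log n) probes, and each probe
/// is itself an O(log n) trie traversal, hence (log(n))^2 overall.
fn find_changed_key_count(exists: impl Fn(u32) -> bool) -> u32 {
    let mut hi = 1u32;
    while exists(hi) {
        hi *= 2; // gallop outward until we find an absent index
    }
    let mut lo = hi / 2; // last index known to be present
    while lo + 1 < hi {
        // invariant: exists(lo) && !exists(hi)
        let mid = lo + (hi - lo) / 2;
        if exists(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    lo
}
```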
It's not light-client friendly to require fetching them every block while tracking changes. I'd suggest that the ∆-trie map `idx => (key, last_changed)`, or simply `key => last_changed`, so a light-client starting with a merkle proof of a storage value can get existence proofs only when something changes, and be confident that nothing was omitted. This is a sore point in light clients for other blockchains. We don't even have to synchronize the whole header chain if we are subscribed to clients who will push us those headers which contain relevant changes, and we are aware of a validator set persisting up to some point in the future. Then we only need relevant headers and corresponding ∆-trie proofs pushed to us.

If the `last_changed` for each key is stored in the state trie, extending over hierarchical ranges is only necessary if we assume old states to be unavailable, which is plausible. If historical state is available, we can just take hops backwards from the current `last_changed` to the point in history we are interested in.
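A rough sketch of that hop-backwards traversal, assuming each historical state records the block at which the key was previously changed (`state_at` is a hypothetical historical-state lookup, with 0 meaning "never changed before"):

```rust
/// Collect every block at which a key changed, at or after `since`,
/// by hopping backwards through `last_changed` pointers.
///
/// `last_changed` is the pointer read from the current head state;
/// `state_at(b)` reads the key's `last_changed` entry from the state
/// as of block `b` (requires historical state to be available).
fn changes_since(
    mut last_changed: u64,
    since: u64,
    state_at: impl Fn(u64) -> u64,
) -> Vec<u64> {
    let mut hops = Vec::new();
    while last_changed >= since && last_changed > 0 {
        hops.push(last_changed);
        last_changed = state_at(last_changed); // hop to the prior change
    }
    hops
}
```

Each hop is one state query, so the cost is proportional to the number of changes in the range rather than the length of the range.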
That idea is covered under the first proposal "Track last-modification-time entry in storage". As mentioned above, keys that were deleted pose a problem. Indeed there is no need for the ordered/indexed Merkle tree for a compact proof of non-existence, but the point about ranges and specification of which items in the range are of interest still stands.
Could this be solved on the higher level instead? E.g. store a second nonce value with the account which gets incremented every time the balance is changed. Do we really need tracking for all key/values?
In general, yes, since much of substrate's logic will depend on being able to track arbitrary storage items, not simply the specific account items.
You might also look at SMT:
https://eprint.iacr.org/2016/683.pdf
@rphmeier any more to say on this matter? Given the issue with finding non-existence when using "Track last-modification-time entry in storage" strategy, my preference is now firmly with the "Ordered, indexed, Merklised per-block change-trie" proposal, albeit taking into account your point that ordering isn't required for non-existence proofs.
@gavofyork what is the ordering/indexing useful for then? in a state with hashed keys the odds of finding a useful yet compact run of keys are tiny. The implementation and computational complexity of building the hierarchical modification tries will require a lot of overhead on full nodes.
Tracking last-modification time makes the most sense to me, since the approach of clearing zombie keys is actually pretty easy. We can maintain a separate set of `del:* -> #` keys for those which are deleted (`#` = Number, `*` = some other key), and wipe those every N blocks (the purpose of moving them to a common-prefixed set is to make deletion fast). We just have to check for a `del:*` entry when creating a value in a new slot.
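A minimal sketch of that scheme over a flat ordered map (the `del:` prefix and the `Db` type here are illustrative, not the real substrate storage API):

```rust
use std::collections::BTreeMap;

const DEL_PREFIX: &[u8] = b"del:";

/// Toy storage model: deleted keys move to a common-prefixed `del:*`
/// set tagged with the deletion block, so the periodic sweep only has
/// to clear one prefix range.
struct Db {
    kv: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Db {
    fn del_key(key: &[u8]) -> Vec<u8> {
        let mut k = DEL_PREFIX.to_vec();
        k.extend_from_slice(key);
        k
    }

    /// Deleting a value records a `del:<key> -> block` marker.
    fn delete(&mut self, key: &[u8], at_block: u64) {
        self.kv.remove(key);
        self.kv.insert(Self::del_key(key), at_block.to_le_bytes().to_vec());
    }

    /// Creating a value in a slot must first clear any zombie marker.
    fn insert(&mut self, key: &[u8], value: Vec<u8>) {
        self.kv.remove(&Self::del_key(key));
        self.kv.insert(key.to_vec(), value);
    }

    /// Wipe all `del:*` markers; run once every N blocks.
    fn sweep(&mut self) {
        self.kv.retain(|k, _| !k.starts_with(DEL_PREFIX));
    }
}
```

In a real ordered db the shared prefix would let `sweep` be a single range deletion; the `retain` here just models the effect.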
Quoting from paritytech/polkadot#425 (comment)
> In the current model, we only need to test existence of the key in the Merkle tree; if it's there we don't actually need to load the value to know that the proposal "exists". In this new model that's not enough, since it might be there but in a zombie "deleted" state - we'll have to load the value (at least the first 4 bytes) to check whether it really is in the trie. That may be problematic to do efficiently without incurring the full cost of the 700KB load.
By separating the deletion tracker we only have to probe for existence as well. Although this might be a moot point, because in practice in substrate, querying existence of a key in the trie actually incurs a load of the value as well, as the postfix of the key is stored within the leaf node itself, along with the value.
i really dislike zombie keys and latent cleanups. i think there's enough information on why "Ordered, indexed, Merklised per-block change-trie" extended over ranges is useful and i really don't want to rehash the same points here. i suggest we take this offline and come to consensus.
Extending over hierarchical ranges of blocks is useful, but there is no information in this discussion about a proposed use-case of a range-modification proof in a state which makes use of hashed keys, or a fast/storage-efficient algorithm for computing the hierarchical change-set.
Merged in #628