
Comments (13)

rphmeier commented on May 3, 2024

This structure takes the form of a trie root built as the mapping of indices to storage keys. The indices are sequential and ordered by storage key

I don't understand this. Is it a mapping of storage_key => ?? or i => storage_key if storage_key was the ith change in the block? In that case, how can it be ordered by storage key? For succinct proofs that something hasn't changed we should just be able to get a proof of non-inclusion if the mapping is keyed by storage key.

Given that I'm having trouble parsing the above, I might be completely off-base here, but it sounds like

A null sentinel index of (ChangedKeyCount, null)

is establishing an upper bound on the number of changes we track per block.
That would defeat the purpose since it would allow plausible omission of data to light clients.

If the changes trie-root mapped key => last_changed where last_changed is the block number it was changed at before, we could request proof of all changes up to some point in history. All of this still relies on the assumption that clients will mostly want to read specific storage entries as opposed to querying functions.

w.r.t. the footprint of deleted entries, that is also true. Given that our security model is weakly subjective we should make the cleanup period fairly long and in accordance with the withdrawal period of a few months or so. I don't think this poses a DoS vector any more so than transfers of 1 stake-unit to tons of unreachable accounts would, although we should also consider garbage collection and rent in that case. Then the question becomes "what's the minimum fee someone has to pay to impose a storage burden of 1 cleanup period"?

from substrate.

gavofyork commented on May 3, 2024

So suppose that there's a block with a single extrinsic sender = Alice, call = transfer(Bob, 69). There are three changes to storage: twox_128(':staking:balance:<Alice>'), twox_128(':staking:balance:<Bob>') and twox_128(':sys:index:<Alice>'). Let these three expressions evaluate to 0x1111..., 0x2222... and 0x3333... respectively.

Then we build a trie with four key-value pairs:

  • (0, 0x1111...)
  • (1, 0x2222...)
  • (2, 0x3333...)
  • (3, 0x)

The keys are just formed by ordering the values and running up from 0. There's a final value to denote the end. If a light-client wished to have proof that Charlie's balance didn't change, they would determine that entry in storage twox_128(':staking:balance:<Charlie>'), (which, for argument's sake evaluates to 0x1234...) then they'd ask a proof-server for a proof of the existence or non-existence of that node in the trie whose root they know via header-sync. An existence proof is trivial. A non-existence proof is also simple: the proof-server would reply with the witness statements for the two entries either side of that value ((0, 0x1111...), (1, 0x2222...)). Since they have consecutive index keys (i.e. 0 and 1), and since the associated storage keys sandwich the storage key we're interested in, and also since we know all index keys are bestowed sequentially on the ordered storage keys, then we can conclude that no storage entry was changed whose key is > 0x1111... and < 0x2222... in this block, including 0x1234... which we're interested in.
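The scheme above can be sketched in Rust, with an in-memory list standing in for the actual trie. `build_change_list` and `proves_unchanged` are hypothetical helper names; a real implementation would Merklise the pairs and verify witness statements against the trie root rather than inspecting plain entries:

```rust
use std::collections::BTreeSet;

/// Per-block change list: changed storage keys sorted, assigned sequential
/// indices, with a final (ChangedKeyCount, 0x) sentinel entry.
fn build_change_list(changed: &BTreeSet<Vec<u8>>) -> Vec<(u32, Vec<u8>)> {
    let mut list: Vec<(u32, Vec<u8>)> = changed
        .iter()
        .enumerate()
        .map(|(i, key)| (i as u32, key.clone()))
        .collect();
    list.push((list.len() as u32, Vec::new())); // sentinel denotes the end
    list
}

/// A non-existence "sandwich": two entries with consecutive indices whose
/// storage keys bracket the queried key prove the key did not change.
/// An empty upper key is the sentinel, i.e. no upper bound.
fn proves_unchanged(lo: &(u32, Vec<u8>), hi: &(u32, Vec<u8>), key: &[u8]) -> bool {
    hi.0 == lo.0 + 1
        && lo.1.as_slice() < key
        && (hi.1.is_empty() || key < hi.1.as_slice())
}
```

With the three changed keys from the example, the sandwich of entries 0 and 1 covers Charlie's key 0x1234....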


rphmeier commented on May 3, 2024

I see, that makes sense to me now.

I'm not sure that I would necessarily call what's described a compact proof of non-existence. Given an ordinary merkle mapping from A => B, you can prove non-existence of an element a with a single merkle branch (which proves that the lookup terminates before reaching a value). These proofs require two branches, although they will often share a long common prefix.

The main thing to consider is the complexity of producing such a proof. Doing a binary search from 0 to ChangedKeyCount, after first finding ChangedKeyCount itself, will take (log(n))^2 + log(n) db accesses. Evaluating it is only log(n) since we can specify the indices to look at. Creation of the trie is even more expensive, since we need to accumulate all the keys and then sort them before making the trie. This all might end up pretty costly when lots of keys have been changed. With a regular merkle trie, proofs of existence and non-existence will take only log(n) time to create and evaluate. The sorting-proof approach does allow for fairly compact proofs of non-alteration for arbitrarily large ranges of keys, but it's difficult to take advantage of that with hashed storage keys.
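As a rough illustration of the lookup cost, finding ChangedKeyCount can be sketched as a galloping-plus-binary search over index presence. The `exists` closure is a hypothetical stand-in for a trie lookup (or remote proof query); with each probe itself costing O(log n) db accesses inside the trie, the O(log n) probes match the (log(n))^2 figure:

```rust
/// Find ChangedKeyCount: the smallest index whose entry is absent (or is
/// the empty sentinel). Gallops to an absent upper bound, then binary
/// searches for the first absent index.
fn changed_key_count(exists: impl Fn(u32) -> bool) -> u32 {
    let mut hi = 1u32;
    while exists(hi) {
        hi *= 2; // galloping phase: double until we overshoot
    }
    let (mut lo, mut hi) = (hi / 2, hi);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if exists(mid) {
            lo = mid + 1; // mid is a real change, count is higher
        } else {
            hi = mid; // mid is absent, count is at or below mid
        }
    }
    lo
}
```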

It's not light-client friendly to require fetching these change-tries every block while tracking changes. I'd suggest that the ∆-trie map idx => (key, last_changed) or simply key => last_changed so a light-client starting with a merkle proof of a storage value can get existence proofs only when something changes and be confident that nothing was omitted. This is a sore point in light clients for other blockchains. We don't even have to synchronize the whole header chain if we are subscribed to clients who will push us those which contain relevant changes and we are aware of a validator set persisting up to some point in the future. Then we only need relevant headers and corresponding ∆-trie proofs pushed to us.

If the last_changed for each key is stored in the state trie, extending over hierarchical ranges is only necessary if we assume old states to be unavailable, which is plausible. If historical state is available, we can just take hops backwards from the current last_changed to the point in history we are interested in.
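The backwards-hop idea can be sketched as follows, assuming historical state is available. `last_changed_at(b)` is a hypothetical stand-in for reading the key's last_changed entry out of the state trie as of block `b`:

```rust
/// Walk backwards from `head` through the key's last_changed pointers,
/// collecting every change at or after block `since`, newest first.
fn changes_since(head: u64, since: u64, last_changed_at: impl Fn(u64) -> Option<u64>) -> Vec<u64> {
    let mut out = Vec::new();
    let mut cursor = head;
    while let Some(block) = last_changed_at(cursor) {
        if block < since {
            break; // older than the point in history we care about
        }
        out.push(block);
        if block == 0 {
            break; // cannot hop before genesis
        }
        cursor = block - 1; // hop to just before this change
    }
    out
}
```

Each hop is one historical-state read, so the cost is proportional to the number of changes in the range rather than its length.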


gavofyork commented on May 3, 2024

That idea is covered under the first proposal "Track last-modification-time entry in storage". As mentioned above, keys that were deleted pose a problem. Indeed there is no need for the ordered/indexed Merkle tree for a compact proof of non-existence, but the point about ranges and specification of which items in the range are of interest still stands.


arkpar commented on May 3, 2024

Could this be solved on the higher level instead? E.g. store a second nonce value with the account which gets incremented every time the balance is changed. Do we really need tracking for all key/values?


gavofyork commented on May 3, 2024

In general, yes, since much of substrate's logic will depend on being able to track arbitrary storage items, not simply the specific account items.


NikVolf commented on May 3, 2024

You might also look at SMTs (sparse Merkle trees):
https://eprint.iacr.org/2016/683.pdf


gavofyork commented on May 3, 2024

@rphmeier any more to say on this matter? Given the issue with finding non-existence when using the "Track last-modification-time entry in storage" strategy, my preference is now firmly with the "Ordered, indexed, Merklised per-block change-trie" proposal, albeit taking into account your point that ordering isn't required for non-existence proofs.


rphmeier commented on May 3, 2024

@gavofyork what is the ordering/indexing useful for then? in a state with hashed keys the odds of finding a useful yet compact run of keys is tiny. The implementation and computational complexity of building the hierarchical modification tries will require a lot of overhead on full nodes.

Tracking last-modification time makes the most sense to me, since the approach of clearing zombie keys is actually pretty easy. We can maintain a separate set of del:* -> # keys for those which are deleted (# = Number, * = some other key), and wipe those every N blocks (the purpose of moving them to a common-prefixed set is to make deletion fast). We just have to check for a del:* entry when creating a value in a new slot.
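A hedged sketch of that deletion tracker, with an in-memory map standing in for storage (all names hypothetical):

```rust
use std::collections::BTreeMap;

/// Deletion tracker: deleted keys are recorded under a common "del:" prefix
/// mapped to the block number of deletion, so a periodic sweep can drop all
/// expired entries in one prefix-range pass.
struct DelTracker {
    entries: BTreeMap<Vec<u8>, u64>, // "del:<key>" -> block of deletion
}

impl DelTracker {
    fn del_key(key: &[u8]) -> Vec<u8> {
        let mut k = b"del:".to_vec();
        k.extend_from_slice(key);
        k
    }

    /// Record that `key` was deleted at `block`.
    fn mark_deleted(&mut self, key: &[u8], block: u64) {
        self.entries.insert(Self::del_key(key), block);
    }

    /// When creating a value in a fresh slot, clear any lingering del:* entry.
    fn on_create(&mut self, key: &[u8]) {
        self.entries.remove(&Self::del_key(key));
    }

    /// Run every N blocks: wipe everything deleted at or before `cutoff`.
    fn sweep(&mut self, cutoff: u64) {
        self.entries.retain(|_, &mut block| block > cutoff);
    }
}
```

In a real key-value backend the common prefix would let the sweep delete a contiguous key range rather than iterating the whole set.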


rphmeier commented on May 3, 2024

Quoting from paritytech/polkadot#425 (comment)

In the current model, we only need to test existence of the key in the Merkle tree; if it's there we don't actually need to load the value to know that the proposal "exists". In this new model that's not enough since it might be there but in a zombie "deleted" state - we'll have to load the value (at least the first 4 bytes) to check whether it really is in the trie. That may be problematic to do efficiently without incurring the full cost of the 700KB load.

By separating the deletion tracker we only have to probe for existence as well. Although this might be a moot point, because in practice in substrate, querying existence of a key in the trie actually incurs a load of the value as well, as the postfix of the key is stored within the leaf node itself, along with the value.


gavofyork commented on May 3, 2024

i really dislike zombie keys and latent cleanups. i think there's enough information on why "Ordered, indexed, Merklised per-block change-trie" extended over ranges is useful and i really don't want to rehash the same points here. i suggest we take this offline and come to consensus.


rphmeier commented on May 3, 2024

Extending over hierarchical ranges of blocks is useful, but there is no information in this discussion about a proposed use-case of a range-modification proof in a state which makes use of hashed keys, or a fast/storage-efficient algorithm for computing the hierarchical change-set.


svyatonik commented on May 3, 2024

Merged in #628

