paritytech / substrate
8.4K stars, 477 watchers, 2.6K forks, 265.63 MB

Substrate: The platform for blockchain innovators

License: Apache License 2.0

Languages: Rust 98.40%, WebAssembly 1.28%, Shell 0.19%, Handlebars 0.08%, Dockerfile 0.02%, Python 0.01%, Nix 0.01%, EJS 0.01%, JavaScript 0.01%
Topics: parity, polkadot, blockchain, substrate, client, node

substrate's Introduction

Dear contributors and users,

We would like to inform you that we have recently made significant changes to our repository structure. In order to streamline our development process and foster better contributions, we have merged three separate repositories, Cumulus, Substrate, and Polkadot, into a single new repository: the Polkadot SDK. Go ahead and make sure to support us by giving a star ⭐️ to the new repo.

By consolidating our codebase, we aim to enhance collaboration and provide a more efficient platform for future development.

If you currently have an open pull request in any of the merged repositories, we kindly request that you resubmit your PR in the new repository. This will ensure that your contributions are considered within the updated context and enable us to review and merge them more effectively.

We appreciate your understanding and ongoing support throughout this transition. Should you have any questions or require further assistance, please don't hesitate to reach out to us.

Best Regards,

Parity Technologies

substrate's People

Contributors

andresilva, arkpar, athei, bkchr, cecton, cheme, davxy, dependabot[bot], expenses, gavofyork, ggwpez, gilescope, gnunicorn, kianenigma, koushiro, koute, marcio-diaz, michalkucharczyk, mxinden, nikvolf, pepyakin, rphmeier, shawntabrizi, sorpaas, svyatonik, thiolliere, tomaka, tomusdrw, tripleight, xlc


substrate's Issues

Governance: Delayed enactments

Proposals should generally suffer some delay after the vote is finalised before they get enacted in order to allow for any tokens to change hands. This delay should either be at the council's discretion, according to the level of contention it generates on the council or, according to the level of contention it generates on the network. For the most contentious motions that get passed, a sufficiently long period should be left before enactment in order that stakers are able to disengage and sell funds.

Refactor consensus code into polkadot/substrate components

The BFT code just needs a couple of things:

  • Messages in
  • Messages out
  • Sign message with local key
  • Round proposer
  • Round timeouts
  • Proposal generation function
  • Proposal evaluation function

The proposal generation and evaluation functions should encapsulate the behavior of the specific substrate chain. This can be packaged up into its own trait.

We can package this up in a substrate-bft crate.
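The trait boundary described above might be sketched as follows. This is purely illustrative: the trait and method names (`Proposer`, `propose`, `evaluate`) are assumptions, not the eventual substrate-bft API, and the toy implementation exists only to show the shape.

```rust
// Hypothetical sketch: chain-specific behaviour (proposal generation and
// evaluation) packaged behind one trait that a generic substrate-bft crate
// could drive. All names are illustrative.
pub trait Proposer {
    type Proposal;
    type Error;

    /// Generate a proposal for the current round.
    fn propose(&self) -> Result<Self::Proposal, Self::Error>;

    /// Evaluate a proposal received from the round proposer.
    fn evaluate(&self, proposal: &Self::Proposal) -> Result<bool, Self::Error>;
}

// A toy implementation: proposals are u64 values, and only even ones are valid.
pub struct EvenProposer;

impl Proposer for EvenProposer {
    type Proposal = u64;
    type Error = ();

    fn propose(&self) -> Result<u64, ()> {
        Ok(42)
    }

    fn evaluate(&self, proposal: &u64) -> Result<bool, ()> {
        Ok(proposal % 2 == 0)
    }
}
```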

polkadot-statement-table crate:
The details of the proposal generation/evaluation function are what encapsulates the current statement table logic. This needs

  • Misbehavior type
  • Import of incoming statements
  • Incoming statements to trigger fetch of candidate data for evaluation of availability
  • Signing statements
  • Creation of batches of outgoing messages (to be done on a timer or by some other heuristic)

polkadot-consensus crate:

  • service built on top of the substrate-network
  • maintains connections to authorities
  • combined router for substrate-bft and polkadot-statement-table messages
  • manages local consensus identity and signing of messages
  • collates local parachain candidate
  • fetches and evaluates other candidates as necessary
  • creates full substrate block from statement table and transaction queue
  • accumulates misbehavior to be evaluated on-chain

Protocol: Light-client friendly storage tracking

Use-cases of a node from an external application's point of view fall into two categories: inspection and notification. Light clients, which sync using only the chain of headers and do not generally validate extrinsic data within the block, must have special considerations to ensure they are able to provide both use-cases efficiently.

Inspection (of storage or the chain) is pretty easily provided following the design of Ethereum and Bitcoin before it: full nodes (or even light-nodes that already have the requisite information) may provide proofs on the value of a particular key in storage using only the storage's Merkle-trie root as a priori assumption (which is provided for through a header-sync).

However, notification is more difficult to provide as a low-trust service since proof that a given block does not have an extrinsic which causes a state-change of interest is not efficiently derivable from the storage's Merkle-trie roots. In Ethereum this was addressed through collating a large (2KB) Bloom filter in each block and embedding it in the header. This bloated the header and was ultimately fruitless as the usage of Ethereum ballooned and the Bloom became saturated.

Instead, I propose three mechanisms for addressing state-change notification on Substrate, two of which (the latter two) it makes sense to combine:

  • Track last-modification-time entry in storage;
  • Provide a Merkle trie root of all modified storage entries, ordered and indexed;
  • Provide a hierarchy of Merkle trie roots of all modified storage entries in a series of blocks, ordered and indexed.

Track last-modification-time entry in storage

At present, the storage database is a set of key-value pairs. These pairs are arranged into a Merkle trie and baked into a single "root" hash. This proposal would simply prefix the value with the block number at which the value was last modified.

Synced light-clients could easily query proof-servers on when a storage item of interest last changed. Proof-servers could prove the most recent block (compared to either the head or some block before the head that the light-client knows about) at which the change happened. Light-clients could request a change-log between some begin and end block of one or more storage keys and the proof-server would return a chain of these proofs as irrefutable evidence of all blocks in which one or a number of storage entries changed.

There is one issue with this approach: deleted storage entries would still have a footprint in the database, necessary for recording the block at which it was "last modified" (i.e. deleted) - without this light-clients would lose their ability to query for its historical change log. The (slightly inelegant) workaround to this would be to have special "garbage-collection" blocks in which these zombie entries would be purged from the database (and thus the trie). Light-clients would ensure that they always made at least one change-log request within each of these periods.

This would increase every storage entry in the database by around 32 bytes (for the block number). It wouldn't have much of an effect on disk i/o and the header size would remain the same. However, for storage with a lot of changes, building and executing these garbage-collection blocks may become a serious efficiency issue.
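The last-modification-time scheme can be modelled in a few lines. This is a sketch under assumed names and types (a plain map standing in for the Merkle trie): each value carries the block at which it last changed, deletions leave a provable "zombie" entry, and a garbage-collection pass purges the zombies.

```rust
use std::collections::HashMap;

// Illustrative model: key -> (last_modified_block, value). A deleted key keeps
// a zombie entry (value = None) so its "last modified" block remains provable
// until a garbage-collection block purges it.
#[derive(Default)]
pub struct TrackedStorage {
    entries: HashMap<Vec<u8>, (u64, Option<Vec<u8>>)>,
}

impl TrackedStorage {
    pub fn set(&mut self, block: u64, key: &[u8], value: Vec<u8>) {
        self.entries.insert(key.to_vec(), (block, Some(value)));
    }

    pub fn delete(&mut self, block: u64, key: &[u8]) {
        // Record the deletion block instead of removing the entry outright.
        self.entries.insert(key.to_vec(), (block, None));
    }

    /// The block at which `key` last changed, if the entry still exists.
    pub fn last_modified(&self, key: &[u8]) -> Option<u64> {
        self.entries.get(key).map(|(b, _)| *b)
    }

    /// The garbage-collection pass: purge zombie (deleted) entries.
    pub fn collect_garbage(&mut self) {
        self.entries.retain(|_, (_, v)| v.is_some());
    }
}
```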

Ordered, indexed, Merklised per-block change-trie

This proposal creates a new structure that encodes all changes, not dissimilar in spirit to the Bloom filter. This structure takes the form of a trie root built as the mapping of indices to storage keys. The indices are sequential and ordered by storage key. Like the Bloom filter, this gives a cryptographic digest of what has changed in the block. Unlike the Bloom filter, a proof that any given key has not changed is not only possible but also compact.

To prove that a given key didn't change, the proof-server provides the two (Index, Key) entries on either side of the key being proven unchanged. (A sentinel entry of (ChangedKeyCount, null) would denote the upper end, in order to provide a proof should the queried key be greater than the largest modified key.)

In principle, this trie could also contain a second mapping of (Key, [ ExtrinsicIndex_1, ExtrinsicIndex_2, ... ]) to denote which extrinsic data in the block actually caused the key to be changed.
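The bracketing check behind the non-membership proof can be sketched directly. This is an illustration, not the real proof format: entry representation and names are assumptions, and only the upper-end sentinel from the description is modelled.

```rust
// A proof that `key` did not change in a block consists of the two adjacent
// (index, key) entries that bracket it in the ordered change list; a sentinel
// (changed_key_count, None) stands in for the upper end.
type Entry = (u32, Option<Vec<u8>>);

/// Check that (`left`, `right`) prove `key` absent from the ordered change set.
fn proves_unchanged(left: &Entry, right: &Entry, key: &[u8]) -> bool {
    // The two entries must be adjacent by index...
    if right.0 != left.0 + 1 {
        return false;
    }
    // ...and must bracket the queried key in storage-key order.
    let below = match &left.1 {
        Some(k) => k.as_slice() < key,
        None => false, // the left side may not be the sentinel
    };
    let above = match &right.1 {
        Some(k) => key < k.as_slice(),
        None => true, // sentinel: key lies beyond the last modified key
    };
    below && above
}
```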

Extending over ranges

While this allows for efficient proofs that one or more keys were not changed (or, if they were changed, can give the specific extrinsics which caused the change) in any given block, use-cases typically want to ascertain this for a range of blocks.

This structure, however, lends itself to a hierarchical approach: every N blocks, the trie would contain an additional entry ('digest', DigestChangeTrieRoot). DigestChangeTrieRoot would be the root of a similar trie structure, except that it would contain the accumulated modified keys of the previous N blocks. Rather than containing the series of ExtrinsicIndexes that caused the change of any given key, it would contain the block numbers in which the change happened, allowing for the efficient identification of the exact extrinsic through a logarithmic number of queries/proofs.

This structure can be nested and recursed arbitrarily; N might reasonably be 16, 32, or 256 and be recursed 4, 3, or 2 times accordingly, giving a maximum block range covered by the top-level trie of 32768 or 65536.

Light clients would query proof-servers in batches, hopping over the blocks one top-level range at a time. Proof-servers would either return proofs that nothing of interest changed, or return the sub-ranges where something did change (along with the key that changed there). Light clients would then re-query within that sub-range, drilling down until they determined the exact set of extrinsics. In principle, this entire query could be prepared on the server side and a compiled proof of everything built and sent back to the light client with minimal bandwidth/latency used.
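The drill-down described above can be simulated with a toy model: a map of block number to changed keys stands in for the digest hierarchy, a digest over a range "matches" iff some block in it changed the key, and the query counter shows the logarithmic cost. All names and the counting scheme are assumptions for illustration.

```rust
use std::collections::BTreeMap;

// Toy model of the hierarchical query. Light clients recurse into the N
// sub-ranges of a digest only when it matches, so unchanged spans cost one
// query each.
fn blocks_changing(
    changes: &BTreeMap<u64, Vec<Vec<u8>>>,
    key: &[u8],
    start: u64,
    len: u64,
    n: u64,
    queries: &mut u64,
) -> Vec<u64> {
    *queries += 1; // one proof request per digest consulted
    let range_has = changes
        .range(start..start + len)
        .any(|(_, keys)| keys.iter().any(|k| k.as_slice() == key));
    if !range_has {
        return Vec::new(); // proof that nothing of interest changed here
    }
    if len == 1 {
        return vec![start];
    }
    // Recurse into the N sub-ranges covered by this digest.
    let sub = len / n;
    let mut found = Vec::new();
    for i in 0..n {
        found.extend(blocks_changing(changes, key, start + i * sub, sub, n, queries));
    }
    found
}
```

With N = 4 over 16 blocks and two changes, the model needs 13 digest queries rather than 16 per-block ones; the gap widens quickly with range size.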

Client: Structured logging

Eventual aim: a highly detailed version of ethstats.net with clients (optionally!) directly contacting a central web server to provide it with real-time information that is collated and served to web pages.

This is a two-part project; one part is fitting the appropriate logging into the client in order to connect and stream JSON information on the client's operational statistics to a server. The second part is writing such a web app; the server part of the app would receive and collate this information in real time from many polkadot clients and then distribute the resulting information to web-browsers for display.

This issue only describes the first part.

Implementation

slog can provide the structured logging API. This should be combined with a lazy_static and a simple macro in order to get a global logging macro, much like trace! from the log crate except that it accepts key/value pairs rather than a formatted string.

This macro should be used throughout the client for all key events (block arrived from network/queued/validated/imported, transaction(s) submitted/arrived/mined, peer connected/disconnected, ...).

The output of the structured log should be directed to a JSON encoder and then sent via a websockets connection to a server (address/port configurable via CLI params, e.g. polkadot --stats-server=ws://stats.polkadot.io). On opening the websockets connection, an initial dump of the node's state should be made (current chain head number/hash, peers, transactions in the pool).
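The kind of event line this would produce can be sketched with a std-only macro. In practice slog plus a JSON drain would do this properly; the macro, event names, and field layout below are illustrative only.

```rust
// Hypothetical sketch: key/value pairs serialised to a JSON line, ready to be
// pushed down a websocket. Values are stringified for simplicity; a real
// encoder would preserve JSON types.
macro_rules! stat {
    ($event:expr, $( $key:ident = $val:expr ),* ) => {{
        let mut out = format!("{{\"event\":\"{}\"", $event);
        $(
            out.push_str(&format!(",\"{}\":\"{}\"", stringify!($key), $val));
        )*
        out.push('}');
        out
    }};
}
```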

Refactor Transaction through runtime & primitives

The primitives crate is a dependency of the runtime crate. Yet the Transaction type defined in primitives semantically depends on the contents of runtime, since it expresses all callable endpoints within runtime in a strongly typed manner.

This combination has a number of problematic side effects:

  • When new endpoints are added, they are added not just to their native crate, but also to one of its dependencies, which makes no sense.
  • It's not enough to publicly export a type from a runtime module for it to be exposed through a callable endpoint. You need to actually move that type up into the primitives crate. If the type has impl logic, then that must be moved too (even if it's highly specialised and not very "primitive"), or some other acrobatics used to circumvent this.

Furthermore, requiring a strongly-typed dispatch at all implies another facepalm: some endpoints themselves proxy a further dispatchable "proposal", causing a self-reference that means a bare type in the enum cannot be used and further allocations are needed. i.e.

enum Proposal {
    ...
    StartPublicReferendum(Proposal, Format),
    ...
}

must become StartPublicReferendum(Box<Proposal>, Format),.
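The boxed form does compile, since the indirection gives the enum a known size. A minimal sketch, with `Format` stubbed and a hypothetical `SetCode` variant standing in for the elided ones:

```rust
// The self-referential variant from above, made representable by boxing the
// inner `Proposal`. `Format` and `SetCode` are illustrative stand-ins.
#[derive(Debug, PartialEq)]
pub enum Format {
    Binary,
}

#[derive(Debug, PartialEq)]
pub enum Proposal {
    SetCode(Vec<u8>),
    StartPublicReferendum(Box<Proposal>, Format),
}
```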

Aside from these specific side-effects, which cause pain right now, the general ramification of leaving this unfixed is a tendency towards spaghetti references and monolithic code.

There are three ways of going about this that I can see:

1a. Move Transaction (and all that depend on it, like Block) to runtime. This would keep the current types as they are, but leave primitives to be just the super-low-level types and runtime to be the crate to be imported if high-level typing was needed.
1b. Move Transaction (and all that depend on it, like Block) to some other module (e.g. highlevel). This would keep the current types as they are, but leave primitives to be just the super-low-level types. highlevel would depend on runtime and be the crate to be imported.
2. Avoid making Transaction typed around any runtime-dependent information. Transaction would be more like in Ethereum where the dispatch element is just a byte blob to be interpreted at (or just before) the time of dispatch, not when the transaction is being initially deserialised. This fixes all problems including the Proposal-within-a-Proposal issue.

My preference is for option 2, moving away from this attempt to bake the dispatch logic into the type system, which seems to be forcing such problems on us. Aside from the great view from the ivory tower, I see no great need to represent the dispatch data under strong types prior to the time of dispatch.

CC @rphmeier

For Polkadot: WASM-based smart-contract parachain

Can link in relevant code from https://github.com/paritytech/parity

BlockData:

  • 256 recent headers
  • parachain transactions
  • state trie proof

Validation function:

  • check header validity (mostly just timestamp)
  • apply ingress prior to transactions
  • apply transactions

Collator:

  • create valid header
  • apply ingress
  • push transactions from the queue until out of gas or out of transactions.
  • best choice of gas amount?

Genesis block

Need a genesis block. Should include initial set of validators/session keys, together with the initial code, compiled from the runtime.

Runtime: Avoid panics in apply_extrinsic

NOTE: This is specific to the Polkadot/Demo implementations, NOT Substrate.

The native runtime should be able to judge the validity of including an extrinsic based only on the information in the extrinsic and the basic balance/index information of the sender account. This makes the implementation of the tx queue more straightforward and efficient, and also helps make clear arguments against DoS vectors.

Basically, the only conditions upon which apply_extrinsic may panic are:

  • the free balance of the sender account is less than cost_xt_basic + cost_xt_byte * xt.encode().len(); or
  • the index of the sender account is not equal to xt.index.

If these "panic" conditions are not met then apply_extrinsic must never panic. To panic thereafter would cause a DoS vector for the miner at best, and will cause the miner to create invalid blocks at worst.

As soon as it is determined that apply_extrinsic will not panic, the balance should be reduced by the fee and the sender index incremented. Any further "higher-level" criteria that are not met (and would thus cause a panic in the current code) should be reworked to ensure they return instead without changing any storage items (except, of course, the balance reduction and index increment).
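The rule can be sketched as a pre-dispatch check. The account model and cost constants below are assumptions for illustration, not the Polkadot/Demo implementation: only the fee and index checks may reject the extrinsic, and once they pass, the fee is taken and the index bumped before anything else runs.

```rust
// Illustrative cost constants (not real values).
const COST_XT_BASIC: u64 = 10;
const COST_XT_BYTE: u64 = 1;

pub struct Account {
    pub free_balance: u64,
    pub index: u64,
}

/// The only two conditions under which inclusion is invalid are checked here;
/// later, "higher-level" failures must return errors without touching storage.
pub fn pre_dispatch(sender: &mut Account, xt_index: u64, xt_len: u64) -> Result<(), &'static str> {
    let fee = COST_XT_BASIC + COST_XT_BYTE * xt_len;
    if sender.free_balance < fee {
        return Err("insufficient balance for fee");
    }
    if sender.index != xt_index {
        return Err("index mismatch");
    }
    // Validity established: take the fee and bump the index immediately.
    // Later failures must not undo this.
    sender.free_balance -= fee;
    sender.index += 1;
    Ok(())
}
```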

Derivable Codec

  1. Split traits into Encodable, Decodable or something similar so we can serialize borrowed/unsized data and deserialize into borrowed data.
  2. custom-derive
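A minimal sketch of the split, under assumed names: one trait that can encode borrowed/unsized data (`[u8]`) and one that decodes into an owned value. This is not the eventual codec API; the length-prefixed encoding is purely illustrative.

```rust
use std::convert::TryInto;

pub trait Encode {
    fn encode(&self) -> Vec<u8>;
}

pub trait Decode: Sized {
    fn decode(input: &[u8]) -> Option<Self>;
}

// Encoding is implemented on the unsized slice, so borrowed data works.
impl Encode for [u8] {
    fn encode(&self) -> Vec<u8> {
        let mut out = (self.len() as u32).to_le_bytes().to_vec();
        out.extend_from_slice(self);
        out
    }
}

// Decoding produces an owned Vec<u8>.
impl Decode for Vec<u8> {
    fn decode(input: &[u8]) -> Option<Self> {
        let len_bytes: [u8; 4] = input.get(..4)?.try_into().ok()?;
        let len = u32::from_le_bytes(len_bytes) as usize;
        input.get(4..4 + len).map(|s| s.to_vec())
    }
}
```

A custom-derive would then generate these impls field-by-field for user types.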

Key management for validators

Validator nodes will need to store their master keys persistently. Session keys can be derived from the master key and session index. This will be done in the native blockchain-specific (i.e. not substrate) code.
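The derivation shape is simple: hash the master key together with the session index. The sketch below uses std's DefaultHasher as a stand-in; a real implementation would use a proper KDF over the master key, and the function name is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustration only: DefaultHasher is NOT a secure primitive; substitute a
// real KDF (e.g. HKDF) in any actual implementation.
fn derive_session_key(master: &[u8], session_index: u64) -> u64 {
    let mut h = DefaultHasher::new();
    master.hash(&mut h);
    session_index.hash(&mut h);
    h.finish()
}
```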

RPC: Storage entry change query and notification pub/sub

Websockets/pub-sub RPCs should be expanded to allow RPC clients to track specific storage items to get notifications should they change. It should also be possible to efficiently query historical changes - getting which extrinsics changed a number of storage keys over a range of blocks.

  • -> state_queryStorage(keys: [ StorageKey, ... ], from: BlockHash, to: Option<BlockHash>) -> Result<QueryIndex, Error>: Query changes of a storage entry, possibly historical and potentially tracking real-time, asynchronously reporting.
    • from: The block after which changes will be provided.
    • to: If given, the block up to which changes will be provided. If not given, then notifications will track the head of the chain as it changes.
  • <- state_notifyStorage(query: QueryIndex, until: BlockHash, changes: Changes) Notify of a change that happened in block until. This is guaranteed to be more recent than any previously reported changes. NOTE: This doesn't cover reversions - we'd probably want a separate notification type for those.

Error may be:

  • Unknown (i.e. we don't recognise from or to)

Changes takes the form of a structure:

[ {
  block: BlockHash,
  changes: [ {
    keys: [ StorageKey, ... ]
  }, ... ]
}, ... ]
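The Changes shape above, modelled as Rust types for clarity. Field names follow the JSON sketch; the hash and key types are placeholders.

```rust
pub type BlockHash = [u8; 32];
pub type StorageKey = Vec<u8>;

// One entry per block in which something of interest changed.
pub struct BlockChanges {
    pub block: BlockHash,
    pub changes: Vec<ChangeSet>,
}

pub struct ChangeSet {
    pub keys: Vec<StorageKey>,
}

pub type Changes = Vec<BlockChanges>;
```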

Macro for constructing a high-level type-safe wrapper around substrate storage

Goal: never reference or load storage items using the key string directly. It is arcane, bug-prone, and unreadable.

usage:

Using a trait Storage:

trait Storage {
    // panic if the type is wrong for the key.
    fn load<T: Decodable>(&self) -> Option<T>;
    fn store<T: Encodable>(&mut self, value: T);
}
storage_declarations! {
    Authorities: List(":auth" -> AuthorityId), // creates something like current `KeyedVec` using prefix ":auth"
    Code: ":code" -> Vec<u8>, // creates a single value. stored under that key.
    ...
}

Authorities::len_key() -> &'static [u8];
Authorities::load_len(&Storage) -> u32
Authorities::key_for(n) -> Vec<u8>;
Authorities::load_from(&Storage, n) -> Option<AuthorityId>;
// ... KeyedVec-like API

Code::load_from(&Storage) -> Option<Vec<u8>>; // assumes a `Decodable` trait where the `<[u8] as Decodable>::Decoded = Vec<u8>`
Code::store_in(&mut Storage, &[u8]); 
Code::key() -> &'static [u8]; // for low-level usage.

crate substrate_storage would define all storage values used in substrate.
crate polkadot_storage would define all storage values used in polkadot.

The "load"/"store" API is a little annoying, so under runtime-support we would provide a Storage implementation that calls out to the externalities and a trait to provide helpers that are more ergonomic: i.e. a load() and store(T) function which are usable only within the runtime.

Usage in runtime:

// assuming these declarations:
storage_declarations! {
    Authorities: List(":auth" -> AuthorityId), // creates something like current `KeyedVec` using prefix ":auth"
    Code: ":code" -> Vec<u8>, // creates a single value. stored under that key.
    ...
}

// ...

Authorities::len_key() -> &'static [u8];
Authorities::len() -> u32
Authorities::key_for(n) -> Vec<u8>;
Authorities::load(n) -> AuthorityId;
// ... KeyedVec-like API

Code::load() -> Option<Vec<u8>>; // assumes a `Decodable` trait where the `<[u8] as Decodable>::Decoded = Vec<u8>`
Code::store(&[u8]); 
Code::key() -> &'static [u8]; // for low-level usage.
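What the generated key helpers might compute can be sketched in plain functions. The encoding choices here (little-endian u32 index, a "len" suffix for the length key) are assumptions for illustration, not the macro's actual scheme.

```rust
// Key for element `index` of a list stored under `prefix` (e.g. ":auth").
fn list_key_for(prefix: &[u8], index: u32) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(&index.to_le_bytes());
    key
}

// Key under which the list's length is stored.
fn list_len_key(prefix: &[u8]) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(b"len");
    key
}
```

The point of generating these from a declaration is that no call site ever spells out the raw key string.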

Refactor into substrate and polkadot-relay

Rough gameplan:

  • Unpick native-runtime from substrate-executor: use a native impl_stubs! macro (currently a no-op) to generate code along the lines of match method { "execute_block" => safe_call(|| runtime::execute_block(&data.0)), ... }; this function, together with the wasm it corresponds to, acts as a static dispatch. (#62)
  • Refactor/pick-apart polkadot-client into a generic substrate-client and a Polkadot-specific polkadot-client. (#62)
  • Rework polkadot-rpc to depend only on the generic (perhaps trait?) substrate-client and rename to substrate-rpc (and rename polkadot-rpc-servers -> substrate-rpc-servers). (#62)
  • Rework polkadot-network to create a generic substrate module substrate-network (the relay-chain sync code plus a peer-network overlay, basically) that can be used by substrate-client and extended into a polkadot-specific module polkadot-network capable of handling polkadot network messages (parachain candidate selection &c.) and functionality (parachain peer-pre-connections).
  • Consider renaming substrate::transactions to substrate::extrinsics to reflect the fact that the data is completely generic and may not have features typical of transactions.

Full block production

Requires #7 and #8: We can then add dummy parachains, collators, and then have validators vote on proposals and seal them online.

Integrate consensus with the state

Integrate consensus with the state:

  • determination of roles taken by validators (in terms of grouping and primary selection) #55
  • collation of local parachain candidate
  • creation of a relay chain block from candidates with enough requisite votes
  • verification that the candidates in a proposed block have enough requisite votes

Slashing: on-chain evaluation of misbehavior reports from the BFT and statement table subsystems

...and automatically generate transactions with any witnessed misbehavior.

Should be simple enough:

  • Each misbehavior report must be accompanied by some security bond.
  • Each report contains a validator who misbehaved, the block hash they were building on when they misbehaved, and the proof of misbehavior
  • A report is valid iff the validator was a validator at the given block hash and the misbehavior given is true.

BFT misbehavior reports could be managed at the substrate level.
Parachain Statement Table misbehavior can only be managed in the polkadot runtime -- and will often require proofs of group membership at a block in recent history. We will need to make sure that all duty rosters within some range of history are computable.
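The validity rule above can be sketched directly: a report names a validator, the block hash they were building on, and a proof, and is valid iff the named account was in the validator set at that block and the proof checks out. The set lookup and the proof check are stubbed; all names are illustrative.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];
type ValidatorId = u64;

struct Report {
    validator: ValidatorId,
    at: Hash,
    proof_ok: bool, // stand-in for an actual misbehaviour-proof check
}

/// Valid iff the reported account was a validator at `at` AND the proof holds.
fn report_valid(sets: &HashMap<Hash, Vec<ValidatorId>>, report: &Report) -> bool {
    let was_validator = sets
        .get(&report.at)
        .map_or(false, |set| set.contains(&report.validator));
    was_validator && report.proof_ok
}
```

This is also why duty rosters within some range of history must stay computable: the set lookup has to succeed for recent blocks.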

Runtime: Recombine into Polkadot

  • Ditch the runtime in favour of something equivalent to demo/runtime.
  • Remove any Log/Header/Block types and use the specialised versions of the runtime primitives generics.
  • Reintegrate staking/slashing logic.

Runtime block validation should check state root

At present, the storage trie is not calculated during the execution of the runtime, so it's rather difficult to verify the storage root in the header. We'll have to introduce an external function (calculate_storage_root or similar) which forces the storage root to be evaluated.

Runtime: "DAO"/community funding manager

Currently the staking mechanism doesn't pay out a reward. Once it does, then there will be a counterpart reward paid into the network funding bucket. This network funding bucket may be tapped by ecosystem members, with payments made to those that are approved.

Main things to consider:

  • Are payouts made in a one-off ad-hoc fashion or batched into monthly budgets?
  • Is the capitalisation of the bucket at a fixed rate (relative to the validator payout) or adaptive?
  • How does a payment get ratified?

Record proposals for live rhododendron sessions in the DB

To prevent accidental double-propose when going offline for a short period.

DB holds a mapping equivalent to a HashMap<parent_hash, Vec<(round number, proposal)>>.

When proposing at a round k on top of a given parent hash, check whether we already proposed at this round; if so, don't create a new one. Otherwise, place the new proposal in the mapping and commit to disk.

When importing a block on top of parent_hash, clear all recorded proposals based on it as they are no longer relevant.
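The mapping and its two operations can be modelled in a few lines. Persistence is elided and the types are illustrative; the point is that `propose` returns the existing proposal for (parent, round) instead of creating a second one.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];

// In-memory model of the DB mapping: parent hash -> (round, proposal) pairs.
#[derive(Default)]
struct ProposalStore {
    by_parent: HashMap<Hash, Vec<(u32, String)>>,
}

impl ProposalStore {
    /// Record a proposal for (parent, round), unless one already exists.
    fn propose(&mut self, parent: Hash, round: u32, proposal: String) -> String {
        let rounds = self.by_parent.entry(parent).or_default();
        if let Some((_, existing)) = rounds.iter().find(|(r, _)| *r == round) {
            return existing.clone(); // already proposed this round: reuse it
        }
        rounds.push((round, proposal.clone()));
        // (a real implementation would commit to disk here)
        proposal
    }

    /// On importing a block on top of `parent`, its proposals are irrelevant.
    fn on_import(&mut self, parent: &Hash) {
        self.by_parent.remove(parent);
    }
}
```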

RPC: Extrinsic submission & inclusion notification pub/sub

Websockets/pub-sub RPCs should be expanded to allow RPC clients to submit extrinsics and get full lifetime notifications of them.

  • -> author_submitExtrinsic(xt: Vec<u8>) -> Result<ExtrinsicHash, Error>
  • <- author_extrinsicUpdate(xt: ExtrinsicHash, status: Status)

Error may be:

  • InvalidFormat (i.e. it's plain old invalid and will never become valid)
  • Dead (i.e. it was once valid but has now become invalid)
  • Immature (i.e. it's currently invalid and while it may become valid at some point, that's too far ahead to care about)
  • PoolFull (i.e. there's no room at the inn)
  • AlreadyKnown: if we already know about it and it's still valid for inclusion, then it's not an error - we carry on as before and track the pre-existing extrinsic instead.

Status may be:

  • Finalised(BlockHash) (it's finalised and all is well)
  • Usurped(ExtrinsicHash) (some state change (perhaps another extrinsic was included) rendered this extrinsic invalid)
  • Broadcast(Vec<PeerId>) (it has been broadcast to the given peers) version 2.0 only
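The error and status variants above map naturally onto Rust enums. `AlreadyKnown` is omitted from the error type since, as noted, it is not treated as an error; the hash and peer types are placeholders.

```rust
type BlockHash = [u8; 32];
type ExtrinsicHash = [u8; 32];
type PeerId = u64;

#[derive(Debug, PartialEq)]
enum SubmitError {
    InvalidFormat, // plain old invalid; will never become valid
    Dead,          // was once valid but has now become invalid
    Immature,      // may become valid, but too far ahead to care about
    PoolFull,      // no room at the inn
}

#[derive(Debug, PartialEq)]
enum Status {
    Finalised(BlockHash),
    Usurped(ExtrinsicHash),
    Broadcast(Vec<PeerId>), // version 2.0 only
}
```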

Parachains: Minimal Parachains Framework

  • Validators collate, evaluate, and ensure availability of parachain candidates.
  • Misbehavior which can be slashed
  • No messaging yet; parachains are completely isolated.
