llfourn / bdk_core_staging
Staging area for bdk_core initial development
CoinSelector is nicely decoupled from the rest of the code and could be pasted into bdk and benefit it right now. This also allows us to make a key API change to BDK that will have to be done eventually anyway. Better to do it now rather than all at the same time later on. Also gives us feedback on the API.
I was able to merge a PR with a warning. This should fail in CI.
e.g. the idiomatic name for InflateFailure is InflateError.
We have built the internals to allow us to include the position in block, and even a full merkle proof for the transaction, in ChainIndex. Let's do it and show it in an example.
cc @evanlinjin: a decent candidate for this would be creating a ChainGraph update from blocks fetched from RPC. It could be done for electrum too (see #77) because, iirc, getting the position is the same as getting the merkle root. This should be a different call though, because you don't always want this.
Another thing worth considering is that including the position in block makes the ChainIndex a total order, so we may need to extend that trait to tell us whether the index is a total order. This will allow us to make sure no two chain indexes are the same if it is a total order (i.e. no two transactions are at the same position in the same block).
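One way the trait extension could look (a minimal sketch; the real ChainIndex trait in the crate differs, and all names here are hypothetical): an associated const flags whether the index type is a total order, so SparseChain can decide whether to reject two distinct transactions claiming the same index.

```rust
// Hypothetical sketch: a `TOTAL_ORDER` flag on the index trait.
trait ChainIndex: Ord {
    /// `true` when no two transactions may share the same index
    /// (e.g. height *plus* position-in-block).
    const TOTAL_ORDER: bool;
}

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Height(u32); // height alone: many txs per block share it

impl ChainIndex for Height {
    const TOTAL_ORDER: bool = false;
}

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct HeightAndPosition(u32, u32); // height + position in block

impl ChainIndex for HeightAndPosition {
    const TOTAL_ORDER: bool = true;
}

fn main() {
    // Both are ordered; only the second guarantees uniqueness per tx.
    assert!(Height(99) < Height(100));
    assert!(HeightAndPosition(100, 0) < HeightAndPosition(100, 1));
    assert!(!Height::TOTAL_ORDER);
    assert!(HeightAndPosition::TOTAL_ORDER);
}
```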
Refer to meeting notes: https://hackmd.io/@evanlinjin/bdk_minutes_20221005
We added this to SparseChain but not to ChainGraph. I think we need it directly on ChainGraph as well, because we can't get a mutable ref to the SparseChain from the chain graph.
What should we do here? The simplest thing would be to return a bool or Result indicating whether we ignored the request because it is already under another index.
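The "return a bool" option could look like this (a toy sketch with made-up names, not the crate's API): the insert is ignored when the txid is already tracked under a different index, and the return value tells the caller which happened.

```rust
use std::collections::HashMap;

// Hypothetical sketch: `true` = inserted (or already present at the same
// index), `false` = request ignored because the txid is under another index.
struct TxIndex {
    by_txid: HashMap<&'static str, u32>,
}

impl TxIndex {
    fn new() -> Self {
        Self { by_txid: HashMap::new() }
    }

    fn insert(&mut self, txid: &'static str, index: u32) -> bool {
        match self.by_txid.get(txid) {
            Some(&existing) if existing != index => false, // ignored
            _ => {
                self.by_txid.insert(txid, index);
                true
            }
        }
    }
}

fn main() {
    let mut idx = TxIndex::new();
    assert!(idx.insert("tx_a", 100));
    assert!(idx.insert("tx_a", 100)); // same index again: fine
    assert!(!idx.insert("tx_a", 101)); // different index: ignored
}
```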
Firstly, we need tests to check that weight calculations are correct for the coin selector. I would do this with different descriptors, attempting to "sign" with the "max satisfaction weight".
This should cover the Selection result and the CoinSelector::weight* methods. It would be awesome if we could pass our raw tx to Bitcoin Core RPC (or something) to verify further.
We also need tests that ensure the "minimum drain value" is correct, so that the tx will not be rejected for having a dust output.
What other tests do we need?
As mentioned by @LLFourn in a set of messages, there are scenarios which we should handle: when TxHeight::Unconfirmed txs conflict, we should have a way to ensure consistency.
Also related:
This is why I am concerned about just allowing arbitrary graph insertions. We have to come up with an algorithm to always make txs in chain consistent with graph on every insertion.
-- @LLFourn
Something like...

```rust
/// Invalidates checkpoints from `from_height` (inclusive); the displaced
/// transactions are moved to `TxHeight::Unconfirmed`.
pub fn invalidate_checkpoints_changeset(&self, from_height: u32) -> ChangeSet {
    todo!()
}
```
With this API, we can simplify the complexity of our RPC Chain Example (#89). We will not need the must_include variable within Client::emit_blocks.
It only needs std and bincode v2.0.0. We can make bincode an optional dependency, as I think implementing the bincode traits directly would be a nice addition anyway.
The only thing I can see that we need to do is get rid of anyhow from bdk_file_persist (just return an std::io::Error) and actually implement the bincode traits on everything. Some things might have to be done without derive, since we might want to use consensus encode/decode for various bitcoin types.
We are calling this an "index" all over the place, which is confusing because of all the indexes we have in the crate. Let's call it a ChainPosition (the position is the specific type of index it is) and use the type parameter P.
Right now our APIs allow you to find unspent outputs by iterating over all outputs and checking if they're spent. This is fine for most applications but for users with hundreds of thousands of historical txos doing all this work just to get the balance is a bit much. How can we do better?
Things that I imagine would be useful:
In std, many things return an iterator without putting iter_ in front of the name. Let's only use iter when we are creating some kind of custom iterator (e.g. generating an iterator for all spks of a descriptor) and want to bring attention to that.
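The convention could look like this (illustrative only; Tracker and both method names are made up for the example): a plain accessor that happens to return an iterator gets no prefix, while a method that builds a custom generating iterator keeps iter_.

```rust
struct Tracker {
    spks: Vec<String>,
}

impl Tracker {
    /// No `iter_` prefix: this is just a view over data we already hold,
    /// like `str::chars` or `Path::components` in std.
    fn spks(&self) -> impl Iterator<Item = &str> {
        self.spks.iter().map(|s| s.as_str())
    }

    /// `iter_` prefix kept: this *generates* an unbounded custom iterator
    /// (a stand-in here for deriving all spks of a descriptor).
    fn iter_derived_spks(&self) -> impl Iterator<Item = String> {
        (0u32..).map(|i| format!("spk_{}", i))
    }
}

fn main() {
    let t = Tracker { spks: vec!["a".into(), "b".into()] };
    assert_eq!(t.spks().count(), 2);
    assert_eq!(
        t.iter_derived_spks().take(2).collect::<Vec<_>>(),
        vec!["spk_0", "spk_1"]
    );
}
```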
Right now we have a CheckpointCandidate which we can apply to a SparseChain. Given the refactoring, we can rename CheckpointCandidate to Update. When we apply an update we want the option to see the ChangeSet, which should include all the txids added and all the txids removed (and their heights and whatever metadata we add).
This is useful so you can fetch the full transactions for the actual addition to the sparse chain and add them to the transaction graph. This changeset might also be useful for applying to other structures and data stores.
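A possible shape for such a changeset (hypothetical, not the crate's real type; txids are plain strings and heights plain u32 here): record per txid the old and new height, where None on one side means added or removed.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: txid -> (from_height, to_height).
// (None, Some(h)) = added at h; (Some(h), None) = removed from h.
#[derive(Debug, Default, PartialEq)]
struct ChangeSet {
    txids: BTreeMap<String, (Option<u32>, Option<u32>)>,
}

/// Diffs two txid->height maps into a changeset.
fn determine_changeset(
    current: &BTreeMap<String, u32>,
    update: &BTreeMap<String, u32>,
) -> ChangeSet {
    let mut cs = ChangeSet::default();
    for (txid, &h) in update {
        match current.get(txid) {
            Some(&old) if old == h => {} // unchanged, not in the changeset
            old => {
                cs.txids.insert(txid.clone(), (old.copied(), Some(h)));
            }
        }
    }
    for (txid, &old) in current {
        if !update.contains_key(txid) {
            cs.txids.insert(txid.clone(), (Some(old), None));
        }
    }
    cs
}

fn main() {
    let current = BTreeMap::from([("a".to_string(), 1)]);
    let update = BTreeMap::from([("b".to_string(), 2)]);
    let cs = determine_changeset(&current, &update);
    assert_eq!(cs.txids["b"], (None, Some(2))); // added
    assert_eq!(cs.txids["a"], (Some(1), None)); // removed
}
```

Note this is a plain diff; a real SparseChain update would not treat "txid absent from the update" as a removal, but the added/removed/moved encoding is the point being illustrated.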
The following should not be possible.
bdk_core_staging/bdk_core/tests/test_sparse_chain.rs
Lines 292 to 304 in 35a3347
We can just look up the current value before applying the changeset if we want to know what from is.
Because this is impossible.
As @afilini mentioned in #23 (comment)
post-merge note: i don't have real numbers but i'm guessing the syscall to get the current time can easily take 10x more time than just running a bnb iteration (due to the context switch), at least on modern cpus.
i think it would make sense to check the time every n iterations, with n probably >= 100. worst case you overshoot a bit, but overall at least you'd spend most of the time actually doing work rather than waiting for a syscall to finish.
there's a catch, though: if you have a very very slow cpu and a lot of utxos you may end up overshooting by a lot. i'm not sure what's the solution here though, i don't think it's worth adding the complexity to measure how long a few iterations take to adjust the n parameter. currently the bnb does i think thousands of iterations and nobody ever complained, so n = 100 i think is more than reasonable imho.
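The check-every-n-iterations idea sketched above could look like this (a minimal, self-contained sketch; the loop body stands in for a BnB iteration, and the names are made up):

```rust
use std::time::{Duration, Instant};

// Only touch the clock every `CHECK_EVERY` iterations, so most of the time
// is spent doing work rather than waiting on a syscall.
const CHECK_EVERY: u64 = 100;

/// Runs `step` repeatedly until it returns `false` or `budget` elapses.
/// Returns the number of iterations performed. May overshoot the budget by
/// up to `CHECK_EVERY - 1` iterations (the trade-off discussed above).
fn run_with_deadline(mut step: impl FnMut() -> bool, budget: Duration) -> u64 {
    let start = Instant::now();
    let mut iterations = 0u64;
    loop {
        if !step() {
            return iterations; // search finished on its own
        }
        iterations += 1;
        if iterations % CHECK_EVERY == 0 && start.elapsed() >= budget {
            return iterations; // out of time
        }
    }
}

fn main() {
    // A "search" that never finishes: the deadline must stop it.
    let n = run_with_deadline(|| true, Duration::from_millis(10));
    assert!(n >= CHECK_EVERY); // at least one full batch ran before checking
    // A search that finishes quickly is not cut short by the deadline.
    let mut left = 5;
    let n = run_with_deadline(|| { left -= 1; left > 0 }, Duration::from_secs(1));
    assert_eq!(n, 4);
}
```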
Port @afilini's fuzzing work over: afilini/bdk@edf52c9
see comment bitcoindevkit/bdk#823 (comment)
One difficulty with this is that I was planning to have the coin selection stuff not have any dependencies at all.
Currently, waste metric calculations (and associated parameters) are tightly coupled with CoinSelector and CoinSelectorOpt. For example, CoinSelectorOpt::long_term_feerate and CoinSelectorOpt::spend_drain_weight are waste-specific. Could we somehow have this in a separate struct?
Additionally, @csralvall suggested that we could try to make excess strategies more flexible/extendable.
Ideally, after we decouple waste calculations, we would still want to be able to determine waste from both CoinSelector and Selection. This is because waste is considered during selection (for BnB), and is used to compare different selection results.
Personally, I think CoinSelector and Selection should be the same struct, named CoinSelection, using the "typestate pattern". This way, weight and excess calculations/methods are shared between the two states, and we can perform waste calculations on both the unfinished and finished states.
CoinSelection<Unfinished> will have mutating methods.
CoinSelection<Finished> will only have non-mutating methods.
We want to provide useful CS algorithms and make sure our CoinSelector API is composable enough to support them.
- .next — allows the user to control how much time is spent doing CS.
- needed should reflect how much more "effective value" is needed before a successful finish can be achieved.
- finish should return different choices: no drain with excess to fees, no drain with excess to receiver, with drain. finish returns a vec ordered by increasing waste. (done)
- CoinSelectOpt::target_value should be Option<u64>. This way, we can "drain all".
When merging two updates, if they disagree somewhere (or rather, you cannot prove they agree on the blockhash at that height), the update should just truncate itself to the height they agree on and remove transactions at the disputed heights.
Right now we allow transactions in SparseChain to double spend if they are in the mempool. Or rather, we cannot enforce this within the sparse chain itself, since we assume that the txs that are confirmed are from valid blocks and therefore cannot possibly spend each other. The TxGraph, however, should be able to resolve this for us. In ChainGraph we should be able to enforce this policy on every update (the mempool evictions should go into the ChangeSet), since it also has access to a TxGraph.
I'm not sure what the API should be but you need to be able to:
- remove a tx from the sparse_chain
- get a sparse_chain::ChangeSet of all removals

With chain_graph, all of the above should happen automatically behind the scenes when you update, so implementing it on ChainGraph might be the place to start to figure out what additional APIs you need on SparseChain and TxGraph (there might not be any).
This touches both TxGraph and SparseChain. Consider that we have a partial transaction type in TxGraph called TxNode. Maybe we can make this public and use it here?
We merged #74 but didn't really add any tests to properly enforce the new behaviors.
The main thing I can see is missing is testing that a transaction that was in a stale block which conflicts with an update transaction does not make it into the mempool.
I think #74 also introduced a bug where an update can evict a transaction in the chain without invalidating its block by having a new transaction that conflicts with it. We shouldn't allow this and this should mean the update is invalid. Test for this too.
```rust
pub fn fund_outputs(txouts: &[(Script, u64)], drain_output: &Script, ...)
```

This will potentially be better.
hint: it's off
Use insights from bitcoindevkit/bdk#666 to fix, since it doesn't always derive.
See #104 (comment)
Right now the only way to find out if an output is spent is to query the APIs for a script pubkey. What if I only have an outpoint that I am interested in, or I just want to check if a transaction is confirmed?
In addition to the iterator of script pubkeys we currently pass in, we should allow iterators for:
- OutPoint, which would return the tx with the outpoint on it (if it can find it) and any tx spending from it.
- Txid, which would return the transaction with this txid in the update, if it exists.

Originally I had the fields as pub because I thought it might be nice to be able to serialize/deserialize it. I think it's more important that we exclude invalid states from this thing, such as transactions being higher than the highest checkpoint, etc.
Consider doing this at the same time as #54
e.g.

```rust
spk_tracker.rescan(index_range, &tx_graph, &sparse_chain, 600_000)
```

Or maybe just scan? Where index_range is anything that can IntoIterator over the indexes of the tracker, I guess.
Now we have persistent storage we want to make sure that we apply changesets in order. One way to ensure this is that each changeset commits to the previous one. This can be checked by the persistent storage backend before applying it.
Things to consider:
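The commit-to-previous idea could be sketched like this (a toy model: DefaultHasher stands in for a real digest, and all names are hypothetical). Each changeset carries the id of its predecessor, and the backend refuses to append a changeset whose prev doesn't match its current tip:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
struct ChangeSet {
    prev: u64,       // id of the changeset this one commits to (0 = genesis)
    payload: String, // the actual changes, elided
}

impl ChangeSet {
    fn id(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        h.finish()
    }
}

struct Store {
    tip: u64, // id of the last applied changeset (0 = empty store)
    log: Vec<ChangeSet>,
}

impl Store {
    fn apply(&mut self, cs: ChangeSet) -> Result<(), &'static str> {
        if cs.prev != self.tip {
            return Err("changeset does not commit to the current tip");
        }
        self.tip = cs.id();
        self.log.push(cs);
        Ok(())
    }
}

fn main() {
    let mut store = Store { tip: 0, log: Vec::new() };
    let first = ChangeSet { prev: 0, payload: "add tx a".into() };
    let first_id = first.id();
    assert!(store.apply(first).is_ok());
    // A replayed / out-of-order changeset is rejected.
    assert!(store.apply(ChangeSet { prev: 0, payload: "add tx b".into() }).is_err());
    // One that commits to the tip is accepted.
    assert!(store.apply(ChangeSet { prev: first_id, payload: "add tx b".into() }).is_ok());
}
```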
Now that we have done #80, we always have a full tx whenever we have a txid in the SparseChain. In #74 we started enforcing that txs in ChainGraph don't double spend each other, but only in determine_changeset. We are inconsistent about this though: we don't do it in insert_tx, for example. This is an issue to discuss whether we want to do this across the whole structure. If we are not going to always enforce it, then we shouldn't just do it in determine_changeset; we should have a separate method, like determine_changeset_evict_conflicts, that does determine_changeset and evicts conflicts.
My preference is to keep it consistent across the whole structure.
Here are our current components:
We want to be able to incrementally update SpkTracker with new data from the blockchain, with complexity proportional to the changes (e.g. the number of new transactions) since SpkTracker was last updated. This currently works, but only in the case that the new transactions are all later in the blockchain than when it was last updated.
We want to be able to update the list of TXIDs in our TXID set in SparseChain without necessarily getting the full transaction data. Suggestion: split the sparse chain into (i) a monotone structure which just stores and indexes transactions (but doesn't know which are in the chain or not), and (ii) a non-monotone structure which keeps track of where transactions are in the chain but doesn't have the full transaction data. We want to be able to quickly find which txids exist in the non-monotone structure but not in the monotone one.
SpkTracker currently has to scan over all the txouts to find which are unspent. We should be able to create an index of this. Preferably, this would live outside SpkTracker; something like an UnspentIndex. UnspentIndex would be non-monotone and would require the non-monotone blockchain data to generate.
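The monotone/non-monotone split suggested above could be sketched as follows (all types drastically simplified and names hypothetical): the monotone side only ever gains full transactions, the non-monotone side tracks positions, and we can quickly find txids the chain knows about but the graph has no full tx for.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Monotone: only ever gains full transactions.
#[derive(Default)]
struct TxGraph {
    full_txs: BTreeMap<String, Vec<u8>>, // txid -> raw tx bytes
}

// Non-monotone: tracks where txids sit in the chain; can reorg.
#[derive(Default)]
struct SparseChain {
    txids_by_height: BTreeMap<u32, BTreeSet<String>>,
}

/// Txids the chain tracks but the graph has no full tx for.
fn missing_txids<'a>(chain: &'a SparseChain, graph: &'a TxGraph) -> Vec<&'a str> {
    chain
        .txids_by_height
        .values()
        .flatten()
        .filter(|txid| !graph.full_txs.contains_key(*txid))
        .map(|s| s.as_str())
        .collect()
}

fn main() {
    let mut graph = TxGraph::default();
    graph.full_txs.insert("a".into(), vec![0x01]);

    let mut chain = SparseChain::default();
    chain.txids_by_height.entry(100).or_default().insert("a".into());
    chain.txids_by_height.entry(101).or_default().insert("b".into());

    // "b" is positioned in the chain but we lack its full transaction.
    assert_eq!(missing_txids(&chain, &graph), vec!["b"]);
}
```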
- CheckPointCandidate doesn't need to be changed here, just the internal optimizing algorithms in SpkTracker and SparseChain.
- Split SparseChain into monotone and non-monotone structures. Also change CheckPointCandidate to only have txids rather than full transactions.
- Connect SpkTracker to the monotone SparseChain, and the monotone SparseChain to the non-monotone SparseChain (rename these: the monotone SparseChain could be TxGraph).
- Add an UnspentIndex.

The development approach we're taking here is to test for correctness, since we're still trying to figure out how the APIs should work.
In this issue we can track test cases we think we need but don't want to write since things are changing too quickly.
In general, when inserting txs from esplora, we should insert such that unconfirmed conflicts are resolved by choosing the one with the higher feerate. This is because esplora may return a txid that has been fully evicted from the mempool. This is probably a feature, so that you can see a tx that has been RBF'd.
We've created a few different structures that have a SparseChain inside them but don't offer a way to set the checkpoint_limit.
You need to be able to set this. Alternatively you could try and design a syncing mechanism that doesn't need a checkpoint limit.
- Ensure that for every txid in the sparse_chain there is a full tx in the graph. To do this, we'll have to make those fields private and have an error case for when the changeset is applied.
- Build a SparseChain from the first requests for txids related to the scripts. We then need to turn the txids we're missing into full transactions in a ChainGraph that can be applied. Hint: maybe we can have a missing function on ChainGraph that somehow returns what's missing, and then a method on the electrum API to take that set of missing things and turn it into a ChainGraph with a batch request.

The CoinSelectOpt should have an enum with various algorithm options to use for coin selection. The initial target is to cover what bdk already covers: largest-first, oldest-first, and branch-and-bound. This can potentially be a "good first issue" too.
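A possible shape for that enum (hypothetical names; the real bdk_core API may differ), with largest-first and oldest-first implemented as the two trivial variants and branch-and-bound left out of scope:

```rust
#[derive(Clone, Copy)]
enum SelectionAlgorithm {
    LargestFirst,
    OldestFirst,
    BranchAndBound,
}

/// Candidate = (value, height-at-confirmation).
/// Returns `None` when the candidates cannot reach `target`.
fn select(
    algorithm: SelectionAlgorithm,
    mut candidates: Vec<(u64, u32)>,
    target: u64,
) -> Option<Vec<(u64, u32)>> {
    match algorithm {
        SelectionAlgorithm::LargestFirst => {
            candidates.sort_by(|a, b| b.0.cmp(&a.0)); // biggest value first
        }
        SelectionAlgorithm::OldestFirst => {
            candidates.sort_by_key(|c| c.1); // lowest height first
        }
        SelectionAlgorithm::BranchAndBound => unimplemented!("out of scope here"),
    }
    let mut selected = Vec::new();
    let mut total = 0;
    for c in candidates {
        if total >= target {
            break;
        }
        total += c.0;
        selected.push(c);
    }
    (total >= target).then_some(selected)
}

fn main() {
    let utxos = vec![(1_000, 50), (5_000, 90), (2_000, 10)];
    let picked = select(SelectionAlgorithm::LargestFirst, utxos.clone(), 6_000).unwrap();
    assert_eq!(picked, vec![(5_000, 90), (2_000, 10)]);
    let picked = select(SelectionAlgorithm::OldestFirst, utxos, 2_500).unwrap();
    assert_eq!(picked, vec![(2_000, 10), (1_000, 50)]);
}
```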
@LLFourn is working on a module for miniscript that is currently being used by bdk_core_example
to get the satisfaction weight and to do signing.
@afilini and @danielabrozzoni have taken over this task.
Refer to comment: #89 (comment)
At some point we will need a proper error structure for the bdk_core lib. This is a tracking issue for discussing the possible Error struct.