llfourn / bdk_core_staging
Staging area for bdk_core initial development
CoinSelector is nicely decoupled from the rest of the code and could be pasted into bdk and benefit it right now. This also allows us to make a key API change to BDK that will have to be done eventually anyway. Better to do it now rather than all at the same time later on. Also gives us feedback on the API.
I was able to merge a PR with a warning. This should fail in CI.
e.g. the idiomatic name for InflateFailure is InflateError.
We have built the internals to allow us to include the position in block, and even a full merkle proof for the transaction, in ChainIndex. Let's do it and show it in an example.
cc @evanlinjin: a decent candidate for this would be creating a ChainGraph update from blocks fetched from RPC. It could be done for electrum too (see #77) because, iirc, getting the position is the same as getting the merkle root. This should be a different call though, because you don't always want this.
Another thing worth considering is that including the position in block makes the ChainIndex a total order, so we may need to extend that trait to tell us whether the index is a total order. This will allow us to make sure no two chain indexes are the same if it is a total order (i.e. no two transactions are at the same position in the same block).
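One way the trait extension could look (a minimal sketch; the real ChainIndex trait in the crate differs, and all names here are hypothetical): an associated const flags whether the index type is a total order, so SparseChain can decide whether to reject two distinct transactions claiming the same index.

```rust
// Hypothetical sketch: a `TOTAL_ORDER` flag on the index trait.
trait ChainIndex: Ord {
    /// `true` when no two transactions may share the same index
    /// (e.g. height *plus* position-in-block).
    const TOTAL_ORDER: bool;
}

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Height(u32); // height alone: many txs per block share it

impl ChainIndex for Height {
    const TOTAL_ORDER: bool = false;
}

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct HeightAndPosition(u32, u32); // height + position in block

impl ChainIndex for HeightAndPosition {
    const TOTAL_ORDER: bool = true;
}

fn main() {
    // Both are ordered; only the second guarantees uniqueness per tx.
    assert!(Height(99) < Height(100));
    assert!(HeightAndPosition(100, 0) < HeightAndPosition(100, 1));
    assert!(!Height::TOTAL_ORDER);
    assert!(HeightAndPosition::TOTAL_ORDER);
}
```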
Refer to meeting notes: https://hackmd.io/@evanlinjin/bdk_minutes_20221005
We added this to SparseChain but not to ChainGraph. I think we need it directly on ChainGraph as well, because we can't get a mutable ref to the SparseChain from the chain graph.
What should we do here? The simplest thing would be to return a bool or Result indicating whether we ignored the request because it is already under another index.
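The "return a bool" option could look like this (a toy sketch with made-up names, not the crate's API): the insert is ignored when the txid is already tracked under a different index, and the return value tells the caller which happened.

```rust
use std::collections::HashMap;

// Hypothetical sketch: `true` = inserted (or already present at the same
// index), `false` = request ignored because the txid is under another index.
struct TxIndex {
    by_txid: HashMap<&'static str, u32>,
}

impl TxIndex {
    fn new() -> Self {
        Self { by_txid: HashMap::new() }
    }

    fn insert(&mut self, txid: &'static str, index: u32) -> bool {
        match self.by_txid.get(txid) {
            Some(&existing) if existing != index => false, // ignored
            _ => {
                self.by_txid.insert(txid, index);
                true
            }
        }
    }
}

fn main() {
    let mut idx = TxIndex::new();
    assert!(idx.insert("tx_a", 100));
    assert!(idx.insert("tx_a", 100)); // same index again: fine
    assert!(!idx.insert("tx_a", 101)); // different index: ignored
}
```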
Firstly, we need tests to check that weight calculations are correct for the coin selector. I would do this with different descriptors, attempting to "sign" with the "max satisfaction weight".
This should cover the Selection result and the CoinSelector::weight* methods. It would be awesome if we could pass our raw tx to Bitcoin Core RPC (or something) to verify further.
We also need tests that ensure the "minimum drain value" is correct, so that the tx will not be rejected for having a dust output.
What other tests do we need?
As mentioned by @LLFourn in a set of messages, there are scenarios which we should handle: when TxHeight::Unconfirmed txs conflict, we should have a way to ensure consistency.
Also related:
This is why I am concerned about just allowing arbitrary graph insertions. We have to come up with an algorithm to always make txs in chain consistent with graph on every insertion.
-- @LLFourn
Something like...

```rust
/// Invalidates checkpoints from `from_height` (inclusive); the displaced
/// transactions are moved to `TxHeight::Unconfirmed`.
pub fn invalidate_checkpoints_changeset(&self, from_height: u32) -> ChangeSet {
    todo!()
}
```
With this API, we can simplify the complexity of our RPC Chain Example (#89). We will not need the must_include variable within Client::emit_blocks.
It only needs std and bincode v2.0.0. We can make bincode an optional dependency, as I think implementing the bincode traits directly would be a nice addition anyway.
The only thing I can see that we need to do is get rid of anyhow from bdk_file_persist (just return an std::io::Error) and actually implement the bincode traits on everything. Some things might have to be done without derive, since we might want to use consensus encode/decode for various bitcoin types.
We are calling this an "index" all over the place, which is confusing because of all the indexes we have in the crate. Let's call it a ChainPosition (the position is the specific type of index it is) and use the type parameter P.
Right now our APIs allow you to find unspent outputs by iterating over all outputs and checking if they're spent. This is fine for most applications but for users with hundreds of thousands of historical txos doing all this work just to get the balance is a bit much. How can we do better?
Things that I imagine would be useful:
In std, many things return an iterator without putting iter_ in front of the name. Let's only use iter when we are creating some kind of custom iterator (e.g. generating an iterator for all spks of a descriptor) and want to bring attention to that.
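The convention could look like this (illustrative only; Tracker and both method names are made up for the example): a plain accessor that happens to return an iterator gets no prefix, while a method that builds a custom generating iterator keeps iter_.

```rust
struct Tracker {
    spks: Vec<String>,
}

impl Tracker {
    /// No `iter_` prefix: this is just a view over data we already hold,
    /// like `str::chars` or `Path::components` in std.
    fn spks(&self) -> impl Iterator<Item = &str> {
        self.spks.iter().map(|s| s.as_str())
    }

    /// `iter_` prefix kept: this *generates* an unbounded custom iterator
    /// (a stand-in here for deriving all spks of a descriptor).
    fn iter_derived_spks(&self) -> impl Iterator<Item = String> {
        (0u32..).map(|i| format!("spk_{}", i))
    }
}

fn main() {
    let t = Tracker { spks: vec!["a".into(), "b".into()] };
    assert_eq!(t.spks().count(), 2);
    assert_eq!(
        t.iter_derived_spks().take(2).collect::<Vec<_>>(),
        vec!["spk_0", "spk_1"]
    );
}
```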
Right now we have a CheckpointCandidate which we can apply to a SparseChain. Given the refactoring, we can rename CheckpointCandidate to Update. When we apply an update we want the option to see the ChangeSet, which should include all the txids added and all the txids removed (and their heights and whatever metadata we add).
This is useful so you can fetch the full transactions for the actual addition to the sparse chain and add them to the transaction graph. This changeset might also be useful for applying to other structures and data stores.
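A possible shape for such a changeset (hypothetical, not the crate's real type; txids are plain strings and heights plain u32 here): record per txid the old and new height, where None on one side means added or removed.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: txid -> (from_height, to_height).
// (None, Some(h)) = added at h; (Some(h), None) = removed from h.
#[derive(Debug, Default, PartialEq)]
struct ChangeSet {
    txids: BTreeMap<String, (Option<u32>, Option<u32>)>,
}

/// Diffs two txid->height maps into a changeset.
fn determine_changeset(
    current: &BTreeMap<String, u32>,
    update: &BTreeMap<String, u32>,
) -> ChangeSet {
    let mut cs = ChangeSet::default();
    for (txid, &h) in update {
        match current.get(txid) {
            Some(&old) if old == h => {} // unchanged, not in the changeset
            old => {
                cs.txids.insert(txid.clone(), (old.copied(), Some(h)));
            }
        }
    }
    for (txid, &old) in current {
        if !update.contains_key(txid) {
            cs.txids.insert(txid.clone(), (Some(old), None));
        }
    }
    cs
}

fn main() {
    let current = BTreeMap::from([("a".to_string(), 1)]);
    let update = BTreeMap::from([("b".to_string(), 2)]);
    let cs = determine_changeset(&current, &update);
    assert_eq!(cs.txids["b"], (None, Some(2))); // added
    assert_eq!(cs.txids["a"], (Some(1), None)); // removed
}
```

Note this is a plain diff; a real SparseChain update would not treat "txid absent from the update" as a removal, but the added/removed/moved encoding is the point being illustrated.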
The following should not be possible.
bdk_core_staging/bdk_core/tests/test_sparse_chain.rs
Lines 292 to 304 in 35a3347
We can just look up the current value before applying the changeset if we want to know what from is.
Because this is impossible.
As @afilini mentioned in #23 (comment)
post-merge note: i don't have real numbers but i'm guessing the syscall to get the current time can easily take 10x more time than just running a bnb iteration (due to the context switch), at least on modern cpus.
i think it would make sense to check the time every n iterations, with n probably >= 100. worst case you overshoot a bit, but overall at least you'd spend most of the time actually doing work rather than waiting for a syscall to finish.
there's a catch, though: if you have a very very slow cpu and a lot of utxos you may end up overshooting by a lot. i'm not sure what's the solution here though, i don't think it's worth adding the complexity to measure how long a few iterations take to adjust the n parameter. currently the bnb does i think thousands of iterations and nobody ever complained, so n = 100 i think is more than reasonable imho.
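The check-every-n-iterations idea sketched above could look like this (a minimal, self-contained sketch; the loop body stands in for a BnB iteration, and the names are made up):

```rust
use std::time::{Duration, Instant};

// Only touch the clock every `CHECK_EVERY` iterations, so most of the time
// is spent doing work rather than waiting on a syscall.
const CHECK_EVERY: u64 = 100;

/// Runs `step` repeatedly until it returns `false` or `budget` elapses.
/// Returns the number of iterations performed. May overshoot the budget by
/// up to `CHECK_EVERY - 1` iterations (the trade-off discussed above).
fn run_with_deadline(mut step: impl FnMut() -> bool, budget: Duration) -> u64 {
    let start = Instant::now();
    let mut iterations = 0u64;
    loop {
        if !step() {
            return iterations; // search finished on its own
        }
        iterations += 1;
        if iterations % CHECK_EVERY == 0 && start.elapsed() >= budget {
            return iterations; // out of time
        }
    }
}

fn main() {
    // A "search" that never finishes: the deadline must stop it.
    let n = run_with_deadline(|| true, Duration::from_millis(10));
    assert!(n >= CHECK_EVERY); // at least one full batch ran before checking
    // A search that finishes quickly is not cut short by the deadline.
    let mut left = 5;
    let n = run_with_deadline(|| { left -= 1; left > 0 }, Duration::from_secs(1));
    assert_eq!(n, 4);
}
```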
Port @afilini's fuzzing work over: afilini/bdk@edf52c9
see comment bitcoindevkit/bdk#823 (comment)
One difficulty with this is that I was planning to have the coin selection stuff not have any dependencies at all.
Currently, waste metric calculations (and associated parameters) are tightly coupled with CoinSelector and CoinSelectorOpt. For example, CoinSelectorOpt::long_term_feerate and CoinSelectorOpt::spend_drain_weight are waste-specific. Could we somehow have this in a separate struct?
Additionally, @csralvall suggested that we could try to make excess strategies more flexible/extendable.
Ideally, after we decouple waste calculations, we would still want to be able to determine waste from both CoinSelector and Selection. This is because waste is considered during selection (for BnB), and is used to compare different selection results.
Personally, I think CoinSelector and Selection should be the same struct, named CoinSelection, using the "typestate pattern". This way, weight and excess calculations/methods are shared between the two states, and we can perform waste calculations on both the unfinished and finished states.
CoinSelection<Unfinished> will have mutating methods.
CoinSelection<Finished> will only have non-mutating methods.
We want to provide useful CS algorithms and make sure our CoinSelector API is composable enough to support them.
- .next — allows the user to control how much time is spent doing CS.
- needed should reflect how much more "effective value" is needed before a successful finish can be achieved.
- finish should return different choices: no drain with excess to fees, no drain with excess to receiver, with drain. finish returns a vec ordered by increasing waste. (done)
- CoinSelectOpt::target_value should be Option<u64>. This way, we can "drain all".
When merging two updates, if they disagree somewhere (or rather, you cannot prove they agree on the blockhash at that height), the update should just truncate itself to the height they agree on and remove transactions at the disputed heights.
Right now we allow transactions in SparseChain to double spend if they are in the mempool. Or rather, we cannot enforce this within the sparse chain itself, since we assume that the txs that are confirmed are from valid blocks and therefore cannot possibly spend each other. The TxGraph, however, should be able to resolve this for us. In ChainGraph we should be able to enforce this policy on every update (the mempool evictions should go into the ChangeSet), since it also has access to a TxGraph.
I'm not sure what the API should be but you need to be able to:
- remove a tx from the sparse_chain
- get a sparse_chain::ChangeSet of all removals

With chain_graph, all of the above should happen automatically behind the scenes when you update, so implementing it on ChainGraph might be the place to start to figure out what additional APIs you need on SparseChain and TxGraph (there might not be any).
This touches both TxGraph and SparseChain. Consider that we have a partial transaction type in TxGraph called TxNode. Maybe we can make this public and use it here?
We merged #74 but didn't really add any tests to properly enforce the new behaviors.
The main thing I can see is missing is testing that a transaction that was in a stale block which conflicts with an update transaction does not make it into the mempool.
I think #74 also introduced a bug where an update can evict a transaction in the chain without invalidating its block by having a new transaction that conflicts with it. We shouldn't allow this and this should mean the update is invalid. Test for this too.
```rust
pub fn fund_outputs(txouts: &[(Script, u64)], drain_output: &Script, ...)
```

This will potentially be better.
hint: it's off
Use insights from bitcoindevkit/bdk#666 to fix, since it doesn't always derive.
See #104 (comment)
Right now the only way to find out if an output is spent is to query the APIs for a script pubkey. What if I only have an outpoint that I am interested in, or I just want to check if a transaction is confirmed?
In addition to the iterator of script pubkeys we currently pass in, we should allow iterators for:
- OutPoint, which would return the tx with the outpoint on it (if it can find it) and any tx spending from it.
- Txid, which would return the transaction with this txid in the update, if it exists.

Originally I had the fields as pub because I thought it might be nice to be able to serialize/deserialize it. I think it's more important that we exclude invalid states from this thing, such as transactions being higher than the highest checkpoint, etc.
Consider doing this at the same time as #54
e.g.

```rust
spk_tracker.rescan(index_range, &tx_graph, &sparse_chain, 600_000)
```

Or maybe just scan? Where index_range is anything that can IntoIterator over the indexes of the tracker, I guess.
Now we have persistent storage we want to make sure that we apply changesets in order. One way to ensure this is that each changeset commits to the previous one. This can be checked by the persistent storage backend before applying it.
Things to consider:
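The commit-to-previous idea could be sketched like this (a toy model: DefaultHasher stands in for a real digest, and all names are hypothetical). Each changeset carries the id of its predecessor, and the backend refuses to append a changeset whose prev doesn't match its current tip:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
struct ChangeSet {
    prev: u64,       // id of the changeset this one commits to (0 = genesis)
    payload: String, // the actual changes, elided
}

impl ChangeSet {
    fn id(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        h.finish()
    }
}

struct Store {
    tip: u64, // id of the last applied changeset (0 = empty store)
    log: Vec<ChangeSet>,
}

impl Store {
    fn apply(&mut self, cs: ChangeSet) -> Result<(), &'static str> {
        if cs.prev != self.tip {
            return Err("changeset does not commit to the current tip");
        }
        self.tip = cs.id();
        self.log.push(cs);
        Ok(())
    }
}

fn main() {
    let mut store = Store { tip: 0, log: Vec::new() };
    let first = ChangeSet { prev: 0, payload: "add tx a".into() };
    let first_id = first.id();
    assert!(store.apply(first).is_ok());
    // A replayed / out-of-order changeset is rejected.
    assert!(store.apply(ChangeSet { prev: 0, payload: "add tx b".into() }).is_err());
    // One that commits to the tip is accepted.
    assert!(store.apply(ChangeSet { prev: first_id, payload: "add tx b".into() }).is_ok());
}
```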
Now that we have done #80, we always have a full tx whenever we have a txid in the SparseChain. In #74 we started enforcing that txs in ChainGraph don't double spend each other, but only in determine_changeset. We are inconsistent about this though: we don't do it in insert_tx, for example. This is an issue to discuss whether we want to do this across the whole structure. If we are not going to always enforce it, then we shouldn't just do it in determine_changeset; we should have a separate method, like determine_changeset_evict_conflicts, that does determine_changeset and evicts conflicts.
My preference is to keep it consistent across the whole structure.
Here are our current components:
We want to be able to incrementally update SpkTracker with new data from the blockchain, with complexity proportional to the changes (e.g. the number of new transactions) since SpkTracker was last updated. This currently works, but only in the case that the new transactions are all later in the blockchain than when it was last updated.
We want to be able to update the list of TXIDs in our TXID set in SparseChain without necessarily getting the full transaction data. Suggestion: split the sparse chain into (i) a monotone structure which just stores and indexes transactions (but doesn't know which are in the chain or not), and (ii) a non-monotone structure which keeps track of where transactions are in the chain but doesn't have the full transaction data. We want to be able to quickly find which txids exist in the non-monotone structure but not in the monotone one.
SpkTracker currently has to scan over all the txouts to find which are unspent. We should be able to create an index of this. Preferably, this would live outside SpkTracker; something like an UnspentIndex. UnspentIndex would be non-monotone and would require the non-monotone blockchain data to generate.
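The monotone/non-monotone split suggested above could be sketched as follows (all types drastically simplified and names hypothetical): the monotone side only ever gains full transactions, the non-monotone side tracks positions, and we can quickly find txids the chain knows about but the graph has no full tx for.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Monotone: only ever gains full transactions.
#[derive(Default)]
struct TxGraph {
    full_txs: BTreeMap<String, Vec<u8>>, // txid -> raw tx bytes
}

// Non-monotone: tracks where txids sit in the chain; can reorg.
#[derive(Default)]
struct SparseChain {
    txids_by_height: BTreeMap<u32, BTreeSet<String>>,
}

/// Txids the chain tracks but the graph has no full tx for.
fn missing_txids<'a>(chain: &'a SparseChain, graph: &'a TxGraph) -> Vec<&'a str> {
    chain
        .txids_by_height
        .values()
        .flatten()
        .filter(|txid| !graph.full_txs.contains_key(*txid))
        .map(|s| s.as_str())
        .collect()
}

fn main() {
    let mut graph = TxGraph::default();
    graph.full_txs.insert("a".into(), vec![0x01]);

    let mut chain = SparseChain::default();
    chain.txids_by_height.entry(100).or_default().insert("a".into());
    chain.txids_by_height.entry(101).or_default().insert("b".into());

    // "b" is positioned in the chain but we lack its full transaction.
    assert_eq!(missing_txids(&chain, &graph), vec!["b"]);
}
```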
- CheckPointCandidate doesn't need to be changed here, just the internal optimizing algorithms in SpkTracker and SparseChain.
- Split SparseChain into monotone and non-monotone structures. Also change CheckPointCandidate to only have txids rather than full transactions.
- Connect SpkTracker to the monotone SparseChain, and the monotone SparseChain to the non-monotone SparseChain (rename these: the monotone SparseChain could be TxGraph).
- Add an UnspentIndex.

The development approach we're taking here is to test for correctness, since we're still trying to figure out how the APIs should work.
In this issue we can track test cases we think we need but don't want to write since things are changing too quickly.
In general, when inserting txs from esplora, we should insert such that unconfirmed conflicts are resolved by choosing the one with the higher feerate. This is because esplora may return a txid that has been fully evicted from the mempool. This is probably a feature, so that you can see a tx that has been RBF'd.
We've created a few different structures that have a SparseChain inside them but don't offer a way to set the checkpoint_limit.
You need to be able to set this. Alternatively you could try and design a syncing mechanism that doesn't need a checkpoint limit.
- Ensure that for every txid in the sparse_chain there is a full tx in the graph. To do this, we'll have to make those fields private and have an error case for when the changeset is applied.
- Build a SparseChain from the first requests for txids related to the scripts. We then need to turn the txids we're missing into full transactions in a ChainGraph that can be applied. Hint: maybe we can have a missing function on ChainGraph that somehow returns what's missing, and then a method on the electrum API to take that set of missing things and turn it into a ChainGraph with a batch request.

The CoinSelectOpt should have an enum with various algorithm options to use for coin selection. The initial target is to cover what bdk already covers: largest-first, oldest-first, and branch-and-bound. This can potentially be a "good first issue" too.
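A possible shape for that enum (hypothetical names; the real bdk_core API may differ), with largest-first and oldest-first implemented as the two trivial variants and branch-and-bound left out of scope:

```rust
#[derive(Clone, Copy)]
enum SelectionAlgorithm {
    LargestFirst,
    OldestFirst,
    BranchAndBound,
}

/// Candidate = (value, height-at-confirmation).
/// Returns `None` when the candidates cannot reach `target`.
fn select(
    algorithm: SelectionAlgorithm,
    mut candidates: Vec<(u64, u32)>,
    target: u64,
) -> Option<Vec<(u64, u32)>> {
    match algorithm {
        SelectionAlgorithm::LargestFirst => {
            candidates.sort_by(|a, b| b.0.cmp(&a.0)); // biggest value first
        }
        SelectionAlgorithm::OldestFirst => {
            candidates.sort_by_key(|c| c.1); // lowest height first
        }
        SelectionAlgorithm::BranchAndBound => unimplemented!("out of scope here"),
    }
    let mut selected = Vec::new();
    let mut total = 0;
    for c in candidates {
        if total >= target {
            break;
        }
        total += c.0;
        selected.push(c);
    }
    (total >= target).then_some(selected)
}

fn main() {
    let utxos = vec![(1_000, 50), (5_000, 90), (2_000, 10)];
    let picked = select(SelectionAlgorithm::LargestFirst, utxos.clone(), 6_000).unwrap();
    assert_eq!(picked, vec![(5_000, 90), (2_000, 10)]);
    let picked = select(SelectionAlgorithm::OldestFirst, utxos, 2_500).unwrap();
    assert_eq!(picked, vec![(2_000, 10), (1_000, 50)]);
}
```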
@LLFourn is working on a module for miniscript that is currently being used by bdk_core_example
to get the satisfaction weight and to do signing.
@afilini and @danielabrozzoni have taken over this task.
Refer to comment: #89 (comment)
At some point we will need a proper error structure for the bdk_core lib. This is a tracking issue for discussing the possible Error struct.