Add bdk_core blog post

author LLFourn <lloyd.fourn@gmail.com>

Tue, 10 May 2022 05:57:32 +0000 (15:57 +1000)

committer LLFourn <lloyd.fourn@gmail.com>

Tue, 10 May 2022 05:57:32 +0000 (15:57 +1000)
diff --git a/docs/_blog/bdk_core_pt1.md b/docs/_blog/bdk_core_pt1.md

new file mode 100644 (file)

index 0000000..f0fa30e
--- /dev/null
+++ b/docs/_blog/bdk_core_pt1.md
@@ -0,0 +1,318 @@
+---
+title: "`bdk_core`: a new architecture for the Bitcoin Dev Kit"
+description: "A new architecture for the Bitcoin Dev Kit"
+authors:
+    - Lloyd Fournier
+date: "2022-05-09"
+tags: ["architecture"]
+hidden: true
+draft: false
+---
+
+The Bitcoin Dev Kit (BDK) lets you do a lot of useful things through convenient high-level
+abstractions. When these abstractions map nicely to your application components your life will be
+very easy, but when they don't it can become frustrating. My rather ambitious plan is to start
+developing a new `bdk_core` library that exposes all the useful *mechanisms* that BDK has inside it
+without them being tied to any particular usage *policy* and with very minimal dependencies.
+
+## The separation of policy and mechanism
+
+My guiding principle for `bdk_core` is going to be the *separation of policy and mechanism*. This is
+what I mean by these terms:
+
+- *mechanism*: How you do a particular thing. Mechanism code is functional and doesn't change much.
+- *policy*: What you want to do. Policy code composes mechanisms to achieve something in
+  an application.
+
+Here's a nice passage about why the designers of the [X window system] applied this principle. X has
+been around since 1984 and doesn't look like it's going anywhere so it probably has a lot to teach us.
+From *[The Art of UNIX Programming]*:
+
+> ...we observed that the designers of X made a basic decision to implement “mechanism, not policy”—to
+> make X a generic graphics engine and leave decisions about user-interface style to toolkits and
+> other levels of the system. We justified this by pointing out that policy and mechanism tend to
+> mutate on different timescales, with policy changing much faster than mechanism. Fashions in the
+> look and feel of GUI toolkits may come and go, but raster operations and compositing are forever.
+
+> Thus, hardwiring policy and mechanism together has two bad effects: It makes policy rigid and
+> harder to change in response to user requirements, and it means that trying to change policy has a
+> strong tendency to destabilize the mechanisms.
+
+> On the other hand, by separating the two we make it
+> possible to experiment with new policy without breaking mechanisms. We also make it much easier to
+> write good tests for the mechanism (policy, because it ages so quickly, often does not justify the
+> investment).
+
+> This design rule has wide application outside the GUI context. In general, it implies that we
+> should look for ways to separate interfaces from engines.
+
+You'll notice we have a similar situation in Bitcoin engineering. We have mechanism code like
+signing algorithms, key derivation, transaction construction logic, etc. that doesn't change much. But
+how these compose together in applications changes quickly over time and between applications.
+
+The main culprit of policy and mechanism conflation in `bdk` is the main [`Wallet`] type.
+Wallets do all of the following:
+
+1. Store one or two descriptors (external and optional internal).
+2. Keep track of which addresses you've given out so you only give out fresh ones from each descriptor.
+3. Keep a list of transactions associated with the addresses in the wallet.
+4. Given a source of blockchain data it can update its internal list of transactions.
+5. Given some parameters it can build a PSBT from transaction outputs.
+6. Given a PSBT it can sign it with its [`Signers`][`Signer`].
+
+All of that is very useful but it is bound together with the particular policies and opinions of `Wallet`.
+If `Wallet`'s policy is not your policy it's going to be tricky to get it to do what you want.
+Here are some examples:
+
+1. In order to control how the `Wallet` will select coins for a transaction internally you have to
+   pass in something implementing the [`CoinSelectionAlgorithm`] trait. A coin selection algorithm
+   is clearly mechanism code but the policy of `Wallet` restricts that mechanism's interface. We
+   have [very old issues](https://github.com/bitcoindevkit/bdk/issues/281) related to what the
+   interface of this trait should be and we don't have a clear way forward. In `bdk_core` I want to
+   purely provide the coin selection mechanisms for figuring out whether you need to select more
+   UTXOs or whether you need a change output etc. How you use that mechanism will be up to you.
+2. Another trait that has a similar structure is the [`Signer`] trait. You have to pass in signers
+   so your wallet can sign PSBTs but you have little control over how the wallet chooses which signers
+   to use in any given situation. Right now the wallet will just iterate through all the signers and
+   ask them to sign. This is not always appropriate. In `bdk_core` I hope we can just provide
+   ways of populating the correct PSBT fields with signatures.
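As a rough illustration of what a coin selection *mechanism* without policy could look like, here is a minimal sketch. The names `SelectionState` and `evaluate`, and the flat fee model (a fixed `fee`, a fixed `change_cost` for adding a change output, and a `dust_limit`), are assumptions made for this example and are not part of `bdk` or `bdk_core`. The function only reports where a candidate selection stands; which UTXOs to add next is left to the caller.

```rust
/// Where a candidate selection stands relative to the payment target.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum SelectionState {
    /// The selected value does not cover target + fee; `missing` more sats are needed.
    NeedMore { missing: u64 },
    /// Target and fee are covered, but the excess is too small to justify a change output.
    NoChange { excess: u64 },
    /// Target and fee are covered and a change output of `change` sats is worthwhile.
    Change { change: u64 },
}

/// Evaluate a selection. All amounts are in satoshis.
/// `fee` is the fee without a change output, `change_cost` is the extra fee
/// paid for adding one, and `dust_limit` is the smallest change worth creating.
pub fn evaluate(
    selected: &[u64],
    target: u64,
    fee: u64,
    change_cost: u64,
    dust_limit: u64,
) -> SelectionState {
    let total: u64 = selected.iter().sum();
    let needed = target + fee;
    if total < needed {
        return SelectionState::NeedMore { missing: needed - total };
    }
    let excess = total - needed;
    if excess >= change_cost + dust_limit {
        // Adding a change output costs `change_cost`, the rest goes back to us.
        SelectionState::Change { change: excess - change_cost }
    } else {
        // Too small for a change output: the excess is left to the fee.
        SelectionState::NoChange { excess }
    }
}
```

A caller with its own policy (largest first, privacy preserving, and so on) would keep adding UTXO values to `selected` until `evaluate` stops returning `NeedMore`.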
+
+## A syncing mechanism without the policy
+
+The rest of this post goes into detail about how to expose useful mechanisms for syncing without
+imposing a policy either on how blockchain data is fetched or how it is stored.
+
+### The problems with the policy
+
+Syncing in `bdk` is the place where the design of `Wallet` is most restrictive. The [`WalletSync`]
+trait forces you to sync all addresses in a wallet in one big batch. But this is not always what you
+want to do. I spoke to a developer who wanted to create a new Tor connection to an Electrum server
+for each address so the addresses couldn't be easily linked, and to run this slowly in the
+background. It would be really difficult to implement `WalletSync` with such a strategy. Another
+example where `WalletSync` isn't the right fit is the [Sensei] project which uses BDK but
+incrementally updates the database whenever new information comes in from the blockchain.
+
+Even if syncing all addresses at the same time is roughly what you want to do, `WalletSync` still
+gets in the way since it defines whether you do it synchronously or asynchronously. Applications
+can control this through `bdk`'s `async-interface` feature flag which internally changes the trait
+definition through macros. Another annoyance is that when using `async-interface` the future that
+gets returned from `WalletSync` [cannot be `Send`](https://github.com/bitcoindevkit/bdk/issues/165)
+because of how `Wallet` handles database mutability internally, meaning you can't spawn the future
+into a new thread.
+
+### A general syncing mechanism
+
+So what is the most general syncing mechanism that solves these problems? These are the things I
+think it has to do regardless of where the blockchain data comes from or how it's stored:
+
+1. Generate and store addresses.
+2. Index transaction data. e.g. transaction outputs we own, when/if they were spent etc.
+3. Keep track of which addresses have been given out and which have been used.
+4. Be able to "roll back" our view of the above data if a re-org makes some of it stale.
+5. Keep track of transactions related to our addresses in our mempool.
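To make item 3 concrete, here is a minimal sketch of tracking which derivation indices of a descriptor have been given out and which have been used. `AddressIndex` and its methods are hypothetical names chosen for this example, not an existing `bdk` API, and the sketch ignores address derivation itself.

```rust
use std::collections::BTreeSet;

/// Tracks derivation indices for a single descriptor.
/// Hypothetical type, for illustration only.
#[derive(Default)]
pub struct AddressIndex {
    /// The lowest index that has never been handed out.
    next_unrevealed: u32,
    /// Indices that have appeared in a transaction.
    used: BTreeSet<u32>,
}

impl AddressIndex {
    /// Hand out a fresh index and remember that it was given out.
    pub fn reveal_next(&mut self) -> u32 {
        let index = self.next_unrevealed;
        self.next_unrevealed += 1;
        index
    }

    /// Record that `index` was seen in a transaction. If the index is beyond
    /// anything we handed out, everything up to it counts as revealed too.
    pub fn mark_used(&mut self, index: u32) {
        self.used.insert(index);
        if index >= self.next_unrevealed {
            self.next_unrevealed = index + 1;
        }
    }

    /// Indices that have been given out but not yet used.
    pub fn revealed_unused(&self) -> Vec<u32> {
        (0..self.next_unrevealed)
            .filter(|i| !self.used.contains(i))
            .collect()
    }
}
```

Whether an application reuses the indices in `revealed_unused` or always calls `reveal_next` is a policy decision that stays outside this type.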
+
+Let's talk about how to implement a mechanism that does all that.
+
+### How to store and index transactions
+
+Different persistent storage backends have different APIs and their own indexing strategies. That's
+why the [`Database`] trait exists in BDK to make a clean API to the different storage engines. It's
+important to note that the database in BDK only holds public data that could always be retrieved
+from the chain. It's just a cache. Despite this we support different backends. Right now it is a
+lot of work to add a new index to the data since you have to add it to every backend and you might have
+to apply schema changes (we still [don't have a standard approach to
+this](https://github.com/bitcoindevkit/bdk/issues/359)).
+
+Thomas Eizinger [suggested](https://github.com/bitcoindevkit/bdk/issues/165#issuecomment-1047483895)
+doing everything in memory and only writing to persistent storage when it was convenient. It took me
+some time but I came around to this idea. It would allow us to get rid of the `Database` trait (at
+least at the `bdk_core` level) and greatly simplify what the persistent storage layer has to do.
+Whenever the data is loaded from persistent storage we can just do the indexing in memory and
+present it to the application.
+
+*But wait! Wouldn't this mean we'd use way more memory than we need to?* Yes but memory is cheap.
+Consider that if we say the average transaction size is 300 bytes then with all our indexes each
+transaction might cost 1 kB of memory (pessimistically). This means we could index one thousand
+transactions in a single megabyte! My iPhone has 4 GB of memory so it could index a million
+transactions with plenty of memory to spare. *But what if some users can't afford an iPhone?* Then
+they also couldn't have afforded to have made a million Bitcoin transactions! *But what about memory
+constrained devices like hardware wallets!?* Those devices typically don't store and retrieve
+transactions. They're usually just signing devices. Perhaps one day someone will build a memory
+constrained device that needs to do this work but until then I think this is a fine approach to
+take.
+
+For now I'm calling this thing that does the in memory indexing of transactions related to a single
+descriptor a `DescriptorTracker`. Here's a diagram that communicates how I imagine it relates to the
+other components.
+
+![](./bdk_core_pt1/descriptor-tracker.jpg)
+
+### Rolling back, rolling forward and syncing to disk
+
+State changes in blockchains are clearly delineated. They all happen in blocks! Every view of the
+blockchain whether you're getting it through compact block filters, an electrum server or something
+wacky like a utreexo bridge will have a concept of blocks and transactions in them. For a wallet we
+only need a very sparse view of the blockchain that includes at which block a set of transactions
+existed. That way, if a block disappears we know that all those transactions might disappear too.
+
+With `bdk_core` I want to introduce the concept of a *checkpoint* which is a block height and hash and
+a set of txids that were present at that height **but not present in the previous checkpoint**. In
+this way we create an append only data structure that can easily be rolled back to a previous height
+if there is a re-org. After rolling back we can then roll forward and apply the new blocks.
+
+Here's an example of how this idea works:
+
+![](./bdk_core_pt1/checkpoints.jpg)
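The checkpoint idea can also be sketched in code. The following is a minimal, assumption-laden illustration: `Checkpoint`, `CheckpointLog`, and the method names are made up for this example, and block hashes and txids are plain 32-byte arrays rather than real Bitcoin types. It shows an append-only list that can be rolled back to a height and then rolled forward again.

```rust
pub type BlockHash = [u8; 32];
pub type Txid = [u8; 32];

/// A block at which newly seen transactions were recorded.
/// Hypothetical layout, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub height: u32,
    pub hash: BlockHash,
    /// Txids present at this height but not in the previous checkpoint.
    pub new_txids: Vec<Txid>,
}

/// Append-only list of checkpoints, oldest first.
#[derive(Debug, Default)]
pub struct CheckpointLog {
    checkpoints: Vec<Checkpoint>,
}

impl CheckpointLog {
    /// Roll forward: append a checkpoint. Heights must strictly increase,
    /// otherwise the checkpoint is handed back as an error.
    pub fn push(&mut self, checkpoint: Checkpoint) -> Result<(), Checkpoint> {
        match self.checkpoints.last() {
            Some(tip) if checkpoint.height <= tip.height => Err(checkpoint),
            _ => {
                self.checkpoints.push(checkpoint);
                Ok(())
            }
        }
    }

    /// Roll back: drop every checkpoint at or above `height` (e.g. after a
    /// re-org) and return the txids that may have disappeared with them.
    pub fn rollback_to(&mut self, height: u32) -> Vec<Txid> {
        let keep = self.checkpoints.partition_point(|c| c.height < height);
        self.checkpoints
            .drain(keep..)
            .flat_map(|c| c.new_txids)
            .collect()
    }

    /// The most recent checkpoint, if any.
    pub fn tip(&self) -> Option<&Checkpoint> {
        self.checkpoints.last()
    }
}
```

After a re-org at height 102, a caller would use `rollback_to(102)` to learn which txids are now in doubt, then `push` checkpoints for the replacement blocks.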
+
+There are a few edge cases I'd like to cover:
+
+1. What if, when gathering new data from the chain to update a `DescriptorTracker`, we find an old transaction that belongs to an earlier checkpoint that we had missed from our earlier syncs?
+2. What if, when we go to write to persistent storage from a `DescriptorTracker`, we find that the storage has some transactions the tracker doesn't? Should we try to reconcile the two sets of transactions?
+
+I think the correct approach is to treat the chain data as the source of truth for the
+`DescriptorTracker` and the `DescriptorTracker` as the source of truth for persistent storage. That
+is, in the case of (1) we should just roll back the `DescriptorTracker` and insert the old but
+recently discovered transaction in the right place. In the case of (2) we should roll back the
+persistent storage to the point where it differs and apply changes from there. This implies that you
+should only keep one instance of a `DescriptorTracker` for a descriptor in your application and only
+update persistent storage by first applying the changes to the tracker.
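Case (2) can be sketched as "find where the two histories diverge, truncate storage there, then copy the rest from the tracker". In this minimal example a checkpoint is reduced to a `(height, block hash)` pair, and `CheckpointId`, `common_prefix_len`, and `sync_storage` are hypothetical names introduced only for illustration; a real storage layer would also carry the transactions.

```rust
/// A checkpoint identified by (height, block hash), oldest first in a list.
/// Hypothetical simplification, for illustration only.
pub type CheckpointId = (u32, [u8; 32]);

/// Number of leading checkpoints that tracker and storage agree on.
pub fn common_prefix_len(tracker: &[CheckpointId], storage: &[CheckpointId]) -> usize {
    tracker
        .iter()
        .zip(storage)
        .take_while(|(t, s)| t == s)
        .count()
}

/// Make `storage` match `tracker`, treating the tracker as the source of truth.
/// Returns how many stale checkpoints were removed from storage.
pub fn sync_storage(tracker: &[CheckpointId], storage: &mut Vec<CheckpointId>) -> usize {
    let agreed = common_prefix_len(tracker, storage.as_slice());
    let removed = storage.len() - agreed;
    // Roll back storage to the point where it differs from the tracker...
    storage.truncate(agreed);
    // ...then roll forward by applying the tracker's view from there.
    storage.extend_from_slice(&tracker[agreed..]);
    removed
}
```

Nothing flows from storage back into the tracker here, which matches the rule that storage is only updated by first applying changes to the tracker.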
+
+## Examples
+
+Here are some examples of what I think this may end up looking like in code. Keep in mind that if
+this looks complicated it will probably be more complicated in practice! This doesn't mean that we
+can't create simplifying abstractions and tools around these primitives to cover common policies. I hope we can implement `Wallet` with `DescriptorTracker`s internally.
+
+### Doing an initial sync of a descriptor that may already contain coins
+
+When we first sync a descriptor that may already contain coins we want to iterate over all the
+scripts of the wallet and then stop if there's a big enough gap (e.g. 20). In this example we use a
+stateless [Esplora-like API](https://mempool.space/docs/api/rest).
+
+```rust
+// create a descriptor tracker for the external addresses of a BIP86 key
+let mut tracker = DescriptorTracker::new("tr([73c5da0a/86'/0'/0']xpub6BgBgsespWvERF3LHQu6CnqdvfEvtMcQjYrcRzx53QJjSxarj2afYWcLteoGVky7D3UKDP9QyrLprQ3VCECoY49yfdDEHGCtMMj92pReUsQ/0/*)");
+
+let esplora = bdk_esplora::Client::new();
+let update = esplora.fetch_related_transactions(bdk_esplora::Params {
+   // iterate over all addresses in a descriptor
+   scripts: Some(tracker.iter_scripts()),
+   // stop if you find a gap of 20 unused addresses
+   stop_gap: Some(20),
+   ..Default::default()
+}).await?;
+
+tracker.apply_update(update)?;
+
+// now we want to persist this to disk
+let db_update = tracker.generate_update(Params {
+    start_checkpoint: None,
+});
+
+// Note that the db_update type is the same as the `update` above.
+my_db.apply_update(db_update);
+```
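The `stop_gap` behaviour used in the example above can be illustrated in isolation. This is a minimal sketch under stated assumptions: `scan_until_gap` is a hypothetical helper, and the `has_history` closure stands in for whatever lookup a real blockchain client would perform for the script at a given derivation index.

```rust
/// Scan script indices in order and stop once `stop_gap` consecutive unused
/// indices have been seen. Returns the indices that had history.
/// `has_history` is a stand-in for a query against a blockchain data source.
pub fn scan_until_gap(stop_gap: u32, mut has_history: impl FnMut(u32) -> bool) -> Vec<u32> {
    let mut used = Vec::new();
    let mut unused_run = 0;
    let mut index = 0u32;
    while unused_run < stop_gap {
        if has_history(index) {
            used.push(index);
            // Any used index resets the gap counter.
            unused_run = 0;
        } else {
            unused_run += 1;
        }
        index += 1;
    }
    used
}
```

With a gap of 20, a descriptor whose indices 0, 1, 5 and 30 have history would be scanned up to index 25 and index 30 would be missed, which is the usual trade-off of a gap limit.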
+
+### Doing a sync of a wallet after you already have sync'd
+
+Now imagine you just want to check if any UTXOs in your wallet have been spent. In this case we've
+already sync'd before so we need to load that data into the tracker from disk first (rather than
+going straight to the blockchain). Then we just ask esplora for transactions related to these
+transaction outputs.
+
+```rust
+// create a descriptor tracker for the external addresses of a BIP86 key
+let mut tracker = DescriptorTracker::new("tr([73c5da0a/86'/0'/0']xpub6BgBgsespWvERF3LHQu6CnqdvfEvtMcQjYrcRzx53QJjSxarj2afYWcLteoGVky7D3UKDP9QyrLprQ3VCECoY49yfdDEHGCtMMj92pReUsQ/0/*)");
+
+let init_update = my_db.generate_update(Params {
+    start_checkpoint: None,
+});
+
+// get up to speed with what was on disk.
+tracker.apply_update(init_update)?;
+// get the latest checkpoint
+let checkpoint = tracker.get_checkpoint(0);
+
+let esplora = bdk_esplora::Client::new();
+
+// Fetch transactions spending any UTXOs we have.
+// Keep the `Result` so a stale checkpoint can be handled in the match below.
+let update = esplora.fetch_related_transactions(bdk_esplora::Params {
+   checkpoint: Some(checkpoint),
+   tx_outs: Some(tracker.iter_unspent()),
+   ..Default::default()
+}).await;
+
+match update {
+   Ok(update) => {
+       tracker.apply_update(update)?;
+       // now we want to persist this to disk
+       let db_update = tracker.generate_update(Params {
+           // this call could fail if tracker no longer has this checkpoint.
+           // In this case we'd ask my_db for an earlier checkpoint and try again.
+           start_checkpoint: my_db.get_checkpoint(0),
+       });
+
+       my_db.apply_update(db_update);
+   }
+   Err(bdk_esplora::Error::StaleCheckpoint) => {
+      // here we should call fetch related transactions with an earlier checkpoint.
+      // In practice this logic will be called in a loop
+   }
+}
+```
+
+
+### Updating state when you get the data in real time
+
+If you have an event based view of the blockchain that feeds you block connected or block
+disconnected events then I imagine the API would look something like this.
+There's quite a bit left out here but I hope you get the idea.
+
+```rust
+// create a descriptor tracker for the external addresses of a BIP86 key
+let mut tracker = DescriptorTracker::new("tr([73c5da0a/86'/0'/0']xpub6BgBgsespWvERF3LHQu6CnqdvfEvtMcQjYrcRzx53QJjSxarj2afYWcLteoGVky7D3UKDP9QyrLprQ3VCECoY49yfdDEHGCtMMj92pReUsQ/0/*)");
+
+
+let mut blockchain_events = { /* get a Stream of blockchain block connected/disconnected events */ };
+
+
+loop {
+    while let Some(blockchain_event) = blockchain_events.next().await {
+       match blockchain_event {
+           BlockchainEvent::Connected(new_block) => {
+               match tracker.apply_block(new_block) {
+                   Ok(modified) => if modified {
+                       // update persistent storage from tracker
+                   }
+                   Err(ApplyBlockError::OutOfOrder) => {
+                       // the block event we got was not the next block we expected.
+                       // How to recover from this will depend on the application and block source
+                   }
+               }
+           }
+           BlockchainEvent::Disconnected((disconnected_height, disconnected_hash)) => {
+              // this might invalidate a checkpoint
+              tracker.disconnect_block(disconnected_height, disconnected_hash);
+              // Now apply to persistent storage
+           }
+       }
+    }
+}
+```
+
+## Feedback
+
+The best way to give feedback on this would be to contact me on the [BDK Discord server](https://discord.gg/dstn4dQ).
+Expect to see a draft release of `bdk_core` towards the end of May.
+
+
+[X window system]: https://en.wikipedia.org/wiki/X_Window_System
+[The Art of UNIX Programming]: https://en.wikipedia.org/wiki/The_Art_of_Unix_Programming
+[`Wallet`]: https://docs.rs/bdk/latest/bdk/wallet/struct.Wallet.html
+[`CoinSelectionAlgorithm`]: https://docs.rs/bdk/latest/bdk/wallet/coin_selection/trait.CoinSelectionAlgorithm.html
+[`Signer`]: https://docs.rs/bdk/latest/bdk/wallet/signer/trait.Signer.html
+[`WalletSync`]: https://docs.rs/bdk/latest/bdk/blockchain/trait.WalletSync.html
+[Sensei]: https://l2.technology/sensei
+[`Database`]: https://docs.rs/bdk/latest/bdk/database/trait.Database.html
+
+
diff --git a/docs/_blog/bdk_core_pt1/checkpoints.jpg b/docs/_blog/bdk_core_pt1/checkpoints.jpg

new file mode 100644 (file)

index 0000000..69b742b

Binary files /dev/null and b/docs/_blog/bdk_core_pt1/checkpoints.jpg differ
diff --git a/docs/_blog/bdk_core_pt1/checkpoints.png b/docs/_blog/bdk_core_pt1/checkpoints.png

new file mode 100644 (file)

index 0000000..8058c95

Binary files /dev/null and b/docs/_blog/bdk_core_pt1/checkpoints.png differ
diff --git a/docs/_blog/bdk_core_pt1/descriptor-tracker.jpg b/docs/_blog/bdk_core_pt1/descriptor-tracker.jpg

new file mode 100644 (file)

index 0000000..d6a27f2

Binary files /dev/null and b/docs/_blog/bdk_core_pt1/descriptor-tracker.jpg differ