From: Riccardo Casatta
Date: Thu, 21 Jan 2021 21:44:07 +0000 (+0100)
Subject: updates

updates
---

diff --git a/content/blog/2021/fee_estimation_for_light_clients_part_1.md b/content/blog/2021/fee_estimation_for_light_clients_part_1.md
index 029c760b7a..628348663f 100644
--- a/content/blog/2021/fee_estimation_for_light_clients_part_1.md
+++ b/content/blog/2021/fee_estimation_for_light_clients_part_1.md
@@ -23,7 +23,7 @@ Fee estimation is the process of selecting the fee rate[^fee rate] for a bitcoin

* The current congestion of the Bitcoin network.
* The urgency, or lack thereof, for the transaction confirmation, i.e., its inclusion in a block.

-A fee rate should be adequate to the above factors: a fee too high would be a waste of money, because the same result could have been achieved with a lower expense. On the other end, a fee rate too low would wait for a confirmation longer than planned or, even worse, could not be confirmed at all.
+A fee rate should be adequate to the above factors: a fee too high would be a waste of money, because the same result could have been achieved with a lower expense. On the other hand, with a fee rate too low the transaction would wait longer than planned for confirmation or, even worse, might never be confirmed at all.

## The problem

@@ -42,7 +42,7 @@ Thus, this work is an effort to build a **good fee estimator for purely peer to
for other, better, models. In the meantime, another sub-goal is pursued: attract the interest of data scientists; indeed
the initial step for this analysis consists in constructing a data set, which could also help kickstart other studies on fee
-esimation or, more broadly, the Bitcoin mempool.
+estimation or, more broadly, on the Bitcoin mempool.

#### The challenges and the solution

@@ -61,11 +61,11 @@ and there are enough examples, the black box will eventually start predicting th
To define our inputs and outputs, we need to start from the question we want to answer. For a fee estimator this is:

-*"Which fee rate should I use if I want this transaction to be confirmed in at most `n` blocks?"*
+*"Which minimum fee rate should I use if I want this transaction to be confirmed in at most `n` blocks?"*

This can be translated to a table with many rows like:

-confirms_in | other_informations | fee_rate
+confirms_in | other_information | fee_rate
-|-|-
1|...|100.34
2|...|84.33

@@ -78,7 +78,7 @@ The main thing that's missing is an indication of when the node first saw a tran
within the number of blocks it actually took to be confirmed. For instance, if we see transaction `t` when the blockchain is at height `1000` and then we notice that `t` has been included in block `1006`, we can deduce that the fee rate paid by `t` was the exact value required to get confirmed within the next `6` blocks.

-So to build our model, we first need to gather this data, and machine learning needs a *lot* of data to work well.
+So to build our model, we first need to gather these data, and machine learning needs a *lot* of data to work well.
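To make the deduction above concrete, here is a minimal Rust sketch of how `confirms_in` falls out of the two heights involved; the type and field names are illustrative assumptions, not the logger's actual code:

```rust
/// A transaction observed entering the mempool (illustrative type).
struct TxObservation {
    /// Blockchain height when the node first saw the transaction.
    first_seen_height: u32,
}

/// `confirms_in`: how many blocks after being first seen the transaction
/// took to confirm, given the height of the block that included it.
fn confirms_in(obs: &TxObservation, inclusion_height: u32) -> u32 {
    inclusion_height - obs.first_seen_height
}

fn main() {
    // The example from the post: seen at height 1000, included in block 1006.
    let t = TxObservation { first_seen_height: 1000 };
    assert_eq!(confirms_in(&t, 1006), 6);
}
```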
#### The data logger

@@ -91,7 +91,7 @@ In the final dataset this field is called `confirms_in`[^blocks target]; if `con

Another critical piece of information logged by the data logger is the `fee_rate` of the transaction, since the absolute fee value paid by a bitcoin transaction is neither available nor derivable given only the transaction itself, as the inputs don't have explicit amounts.

-All this data (apart from the time of the transaction entering in the mempool) can actually be reconstructed simply by looking at the blockchain. However, querying the bitcoin node can be fairly slow, and during the model training iterations we want to recreate the ML dataset rapidly[^fast], for example whenever we need to modify or add a new field.
+All these data (apart from the time the transaction entered the mempool) can actually be reconstructed simply by looking at the blockchain. However, querying the bitcoin node can be fairly slow, and during the model training iterations we want to recreate the ML dataset rapidly[^fast], for example whenever we need to modify or add a new field.

For these reasons, the logger is split into two parts: a process listening to the events sent by our node, which creates raw logs, and then a second process that uses these logs to create the final CSV dataset. Raw logs are self-contained: for example, they contain all the previous transaction output values for every relevant transaction. This causes some redundancy, but in this case it's better to trade some storage efficiency for speed
when recreating the dataset.

@@ -101,7 +101,7 @@
My logger instance started collecting data on the 18th of December 2020, and as of today (18th January 2021), the raw logs are about 14GB.

-I expect (or at least hope) the raw logs will be useful also for other projects as well, like monitoring the propagation of transactions or other works involving raw mempool data. We will share raw logs data through torrent soon.
+I expect (or at least hope) that the raw logs, the CSV dataset, or the data logger will also be useful for other projects, like monitoring the propagation of transactions or other work involving raw mempool data. We will share the raw logs through a torrent soon.

In the following [Part 2] we are going to talk about the dataset.

diff --git a/content/blog/2021/fee_estimation_for_light_clients_part_2.md b/content/blog/2021/fee_estimation_for_light_clients_part_2.md
index e92fd66b71..0fb372a001 100644
--- a/content/blog/2021/fee_estimation_for_light_clients_part_2.md
+++ b/content/blog/2021/fee_estimation_for_light_clients_part_2.md
@@ -65,7 +65,7 @@ The blocks are available through the p2p network, and downloading the last 6 is

Another piece of information the dataset contains is the block percentile fee rate: considering `r_i` to be the fee rate of the `i`th transaction in a block, `q_k` is the fee rate value such that the transactions with `r_i` < `q_k` are the `k%` of the transactions in the block paying the lowest fees. Percentiles are not used to feed the model but to filter out some outlier transactions.
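The percentile definition above is easier to read as code; the following Rust sketch is an editorial illustration of that definition, not the dataset builder's actual implementation:

```rust
/// Returns `q_k` for one block: a fee rate such that the transactions with
/// `r_i` < `q_k` are (roughly) the `k%` of the block paying the lowest fees.
fn percentile_fee_rate(mut rates: Vec<f64>, k: f64) -> Option<f64> {
    if rates.is_empty() || !(0.0..=100.0).contains(&k) {
        return None;
    }
    // Sort ascending so the lowest-paying transactions come first.
    rates.sort_by(|a, b| a.partial_cmp(b).expect("fee rates are never NaN"));
    // First index that is *not* among the lowest-paying k% of transactions.
    let idx = (rates.len() as f64 * k / 100.0).floor() as usize;
    rates.get(idx.min(rates.len() - 1)).copied()
}

fn main() {
    // 40% of these 5 rates pay less than q_40 = 5.0 (namely 1.0 and 2.0).
    assert_eq!(percentile_fee_rate(vec![1.0, 2.0, 5.0, 10.0, 50.0], 40.0), Some(5.0));
}
```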
-Removing this observations is controversial at best and considered cheating at worse. However, it should be considered that bitcoin core `estimatesmartfee` doesn't even bother to give estimation for the next block, we think this is due to the fact that many transactions that are confirming in the next block are huge overestimation [^overestimation], or clearly errors like [this one] we found when we started logging data.
+Removing these observations is controversial at best and considered cheating at worst. However, it should be considered that Bitcoin Core's `estimatesmartfee` doesn't even bother to give an estimation for the next block; we think this is because many transactions confirming in the next block are huge overestimations[^overestimation], or clear errors like [this one] we found when we started logging data.

These outliers are numerous for transactions confirming in the next block (`confirms_in=1`), less so for `confirms_in=2`, and they mostly disappear for `confirms_in=3` or more. It's counterintuitive that overestimations exist for `confirms_in>1`: by definition an overestimation is a fee rate way higher than needed, so how is it possible that an overestimation doesn't enter the very next block? There are a couple of reasons why a block may be discovered without containing a transaction with a high fee rate:

* network latency: my node saw the transaction, but the miner didn't see it yet,
* block building latency: the miner saw the transaction, but didn't finish rebuilding the block template, or decided it was more efficient to finish a mining cycle on the older block template.

@@ -92,7 +92,7 @@ timestamp | converted | The time when the transaction has been added in the memp
current_height | no | The blockchain height seen by the node at this moment
confirms_in | yes | This transaction confirmed at block height `current_height+confirms_in`
fee_rate | target | This transaction's fee rate, measured in `[sat/vbyte]`
-fee_rate_bytes | no | fee rate in satoshi / bytes, used to check bitcoin core `estimatesmartfee` predictions
+fee_rate_bytes | no | fee rate in satoshi/byte, used to check Bitcoin Core `estimatesmartfee` predictions
block_avg_fee | no | block average fee rate `[sat/vbyte]` of block `current_height+confirms_in`
core_econ | no | Bitcoin Core `estimatesmartfee` result for the `confirms_in` block target in economic mode. May not be available (`?`) when a block has been connected more recently than the last estimation request; estimations are requested every 10 seconds.
core_cons | no | Same as above, but in conservative mode

In the following [Part 3] we are going to talk about the model.

@@ -117,8 +117,8 @@
[^MAE]: MAE is Mean Absolute Error, which is the average of the series built from the absolute differences between the real values and the estimations.
[^drift]: Like the MAE, but without taking the absolute value
[^minimum relay fee]: Most nodes won't relay transactions with a fee rate lower than the minimum relay fee, which has a default of `1.0` sat/vbyte
-[^blocks target]: Conceptually similar to bitcoin core `estimatesmartfee` parameter called "blocks target", however, `confirms_in` is the real value not the desired target.
+[^blocks target]: Conceptually similar to the Bitcoin Core `estimatesmartfee` parameter called "blocks target"; however, `confirms_in` is the real value, not the desired target.
-[^fast]: 14GB of compressed raw logs are processed and a compressed CSV produced in about 4 minutes.
+[^fast]: 14GB of compressed raw logs are processed and a compressed CSV is produced in about 4 minutes.

[Part 1]: /blog/2021/01/fee-estimation-for-light-clients-part-1/
[Part 2]: /blog/2021/01/fee-estimation-for-light-clients-part-2/
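The MAE and drift footnotes above translate directly into code; here is a small Rust sketch (the sign convention for drift is an assumption, since the posts don't state it):

```rust
/// Mean Absolute Error: the average of |real - estimate| over the series.
fn mae(real: &[f64], est: &[f64]) -> f64 {
    assert_eq!(real.len(), est.len());
    real.iter().zip(est).map(|(r, e)| (r - e).abs()).sum::<f64>() / real.len() as f64
}

/// Drift: like the MAE but without the absolute value, so the sign shows
/// whether the estimator systematically over- or under-estimates.
fn drift(real: &[f64], est: &[f64]) -> f64 {
    assert_eq!(real.len(), est.len());
    real.iter().zip(est).map(|(r, e)| r - e).sum::<f64>() / real.len() as f64
}

fn main() {
    let real = [10.0, 20.0];
    let est = [12.0, 16.0];
    assert_eq!(mae(&real, &est), 3.0);   // (|-2| + |4|) / 2
    assert_eq!(drift(&real, &est), 1.0); // (-2 + 4) / 2
}
```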