Raw logs are self-contained: for example, they contain all the previous transaction output values for every relevant transaction. This causes some redundancy, but in this case it's better to trade some storage efficiency for faster processing
when recreating the dataset.
-
+
My logger instance started collecting data on the 18th of December 2020, and as of today (25th January 2021), the raw logs are about 16GB.
a1-a2-... | yes | Contains the number of transactions in the mempool with known fee rate in the i-th bucket.
-
+
<div align="center">My biological neural network fired this, I think it's because a lot of chapters start with "The"</div>
<br/><br/>
The code building and training the model with [tensorflow] is available in a [google colab notebook] (a Jupyter notebook); you can also download the file as plain Python and run it locally. Training the full model takes at least 1 hour, but this depends heavily on the available hardware.
-
+
<div align="center">Do you want to choose the fee without a model? In the last 5 weeks a ~50 sat/vbyte transaction never took more than a day to confirm and a ~10 sat/vbyte one never took more than a week</div><br/>
As a reference, the code also calculates the MAE[^MAE] and drift[^drift] of Bitcoin Core's `estimatesmartfee`.
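
To make the two metrics concrete, here is a minimal sketch in plain Python. It is not the notebook's code: the function names and the sample numbers are made up for illustration, and the sign convention for drift (estimate minus real value) is an assumption.

```python
def mean_absolute_error(real, estimated):
    """Average of the absolute differences between estimates and real values."""
    errors = [abs(e - r) for r, e in zip(real, estimated)]
    return sum(errors) / len(errors)


def drift(real, estimated):
    """Like MAE but keeping the sign (assumed here: estimate minus real)."""
    errors = [e - r for r, e in zip(real, estimated)]
    return sum(errors) / len(errors)


# Hypothetical fee rates in sat/vbyte, only for illustration.
real_fee_rates = [10.0, 20.0, 30.0, 40.0]
estimated_fee_rates = [15.0, 15.0, 40.0, 30.0]

mae_value = mean_absolute_error(real_fee_rates, estimated_fee_rates)
drift_value = drift(real_fee_rates, estimated_fee_rates)
print(mae_value, drift_value)
```

With these sample values the errors are +5, -5, +10 and -10, so the estimator is off by 7.5 on average while its drift is 0: overestimates and underestimates cancel out. This is why the two numbers are worth reading together.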
metrics=['mae', 'mse'])
```
-
+
The model is fed with the `encoded_features` coming from the processing phase, followed by 2 layers with 64 neurons each and a final single neuron giving the `fee_rate` as output.
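
To give a feel for the size of such a network, the following is a minimal pure-Python sketch of the same shape (inputs, 64, 64, 1). It is not the notebook's TensorFlow code: the ReLU activation on the hidden layers, the random initial weights, the 10-element input and all helper names are assumptions made for illustration.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_in x n_out) and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, relu):
    """Fully connected layer: out[j] = biases[j] + sum_i inputs[i] * weights[i][j]."""
    outputs = []
    for j in range(len(biases)):
        total = biases[j] + sum(x * weights[i][j] for i, x in enumerate(inputs))
        outputs.append(max(0.0, total) if relu else total)
    return outputs


def build_model(n_features, rng):
    """Two hidden layers of 64 neurons each, then a single output neuron."""
    sizes = [n_features, 64, 64, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict(model, features):
    """Run the features through every layer; the last layer has no activation."""
    x = features
    for index, (weights, biases) in enumerate(model):
        is_last = index == len(model) - 1
        x = dense(x, weights, biases, relu=not is_last)
    return x[0]


def count_parameters(model):
    """Total number of weights and biases in the model."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in model)


rng = random.Random(0)
model = build_model(n_features=10, rng=rng)  # 10 inputs is a hypothetical size
print(count_parameters(model))  # (10*64+64) + (64*64+64) + (64+1) = 4929
print(predict(model, [0.5] * 10))
```

Most of the parameters sit between the two hidden layers (64*64 weights), so the total grows only linearly with the number of encoded input features.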
Our model doesn't appear to suffer from overfitting, because `loss` and `val_loss` don't diverge during training.
-
+
While we told the training to run for 200 epochs, it stopped at 158 because we added an `early_stop` callback with `PATIENCE` set to `20`, meaning that training is halted after 20 epochs without improvement in `val_loss`, saving time and potentially avoiding overfitting.
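
The patience rule can be sketched in a few lines of plain Python. This only mirrors the idea and is not the callback's actual implementation; the function name and the synthetic `val_loss` series are made up for illustration.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs are run before the patience rule halts training."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:
            # New best validation loss: reset the counter.
            best = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience never ran out: all requested epochs are run.
    return len(val_losses)


# Synthetic series: val_loss improves for 138 epochs, then stays flat up to epoch 200.
synthetic_val_losses = [1.0 / e for e in range(1, 139)] + [1.0] * 62
stopped_at = epochs_run(synthetic_val_losses, patience=20)
print(stopped_at)  # 158
```

Under this rule, stopping at epoch 158 with a patience of 20 means the last improvement in `val_loss` happened at epoch 138.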
The following chart is probably the best visualization to evaluate the model: the x axis shows the real fee rate and the y axis shows the prediction. The closer the points are to the diagonal (where the prediction equals the real value), the better the model.
We can see the model is doing quite well: the MAE is 8, which is way lower than that of `estimatesmartfee`. However, there are sometimes big errors, in particular for predictions targeting fast confirmation (`confirms_in=1` or `confirms_in=2`), as shown by the orange points. Creating a model only for block targets greater than 2, instead of simply removing some observations, may be an option.
-
+
The following chart is instead a distribution of the errors, which for a good model should resemble a normal distribution centered on 0, and it looks like the model respects that.
-
+
## Conclusion and future development
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I don't understand Machine Learning(ML), but is it horrible to use ML to predict bitcoin fees? <br><br>I have heard tales of this "Deep Learning" thing where you throw a bunch of data at it and it gives you good results with high accuracy.</p>— sanket1729 (@sanket1729) <a href="https://twitter.com/sanket1729/status/1336624662365822977?ref_src=twsrc%5Etfw">December 9, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
-This is the final part of the series. In the previous [Part 1] we talked about the problem and in [Part 3] we talked about the dataset.
+This is the final part of the series. In the previous [Part 1] we talked about the problem and in [Part 2] we talked about the dataset.
[^MAE]: MAE is Mean Absolute Error, which is the average of the series built by the absolute difference between the real value and the estimation.
[^drift]: drift is like MAE, but without the absolute value, so positive and negative errors can cancel out.
--- /dev/null
+<?xml version="1.0" encoding="UTF-8" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"><!-- Generated by graphviz version 2.40.1 (20161225.0304)
+ --><!-- Title: G Pages: 1 --><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="526pt" height="638pt" viewBox="0.00 0.00 526.26 638.00">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 634)">
+<title>G</title>
+<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-634 522.2624,-634 522.2624,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster_logger</title>
+<polygon fill="#d3d3d3" stroke="#d3d3d3" points="46,-233.8 46,-622 278,-622 278,-233.8 46,-233.8"/>
+<text text-anchor="middle" x="162" y="-605.4" font-family="Times,serif" font-size="14.00" fill="#000000">bitcoin-logger</text>
+</g>
+<g id="clust2" class="cluster">
+<title>cluster_bitcoind</title>
+<polygon fill="#d3d3d3" stroke="#d3d3d3" points="286,-433.6 286,-510.4 430,-510.4 430,-433.6 286,-433.6"/>
+<text text-anchor="middle" x="358" y="-493.8" font-family="Times,serif" font-size="14.00" fill="#000000">bitcoind</text>
+</g>
+<g id="clust3" class="cluster">
+<title>cluster_csv</title>
+<polygon fill="#d3d3d3" stroke="#d3d3d3" points="8,-65 8,-141.8 296,-141.8 296,-65 8,-65"/>
+<text text-anchor="middle" x="152" y="-125.2" font-family="Times,serif" font-size="14.00" fill="#000000">bitcoin-csv</text>
+</g>
+<!-- store -->
+<g id="node1" class="node">
+<title>store</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-370.8" rx="29.6127" ry="18"/>
+<text text-anchor="middle" x="213" y="-366.6" font-family="Times,serif" font-size="14.00" fill="#000000">store</text>
+</g>
+<!-- flush -->
+<g id="node3" class="node">
+<title>flush</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-259.8" rx="30.1958" ry="18"/>
+<text text-anchor="middle" x="213" y="-255.6" font-family="Times,serif" font-size="14.00" fill="#000000">flush</text>
+</g>
+<!-- store->flush -->
+<g id="edge3" class="edge">
+<title>store->flush</title>
+<path fill="none" stroke="#000000" d="M213,-352.4706C213,-335.0373 213,-308.5482 213,-288.3489"/>
+<polygon fill="#000000" stroke="#000000" points="216.5001,-288.1566 213,-278.1566 209.5001,-288.1567 216.5001,-288.1566"/>
+</g>
+<!-- rpc -->
+<g id="node6" class="node">
+<title>rpc</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="321" cy="-459.6" rx="27" ry="18"/>
+<text text-anchor="middle" x="321" y="-455.4" font-family="Times,serif" font-size="14.00" fill="#000000">rpc</text>
+</g>
+<!-- store->rpc -->
+<g id="edge5" class="edge">
+<title>store->rpc</title>
+<path fill="none" stroke="#000000" d="M230.6678,-385.3269C248.3931,-399.901 275.8597,-422.4847 295.8541,-438.9245"/>
+<polygon fill="#000000" stroke="#000000" points="293.7008,-441.6852 303.6479,-445.3327 298.1465,-436.2782 293.7008,-441.6852"/>
+<text text-anchor="middle" x="311.1904" y="-411" font-family="Times,serif" font-size="14.00" fill="#000000">missing data</text>
+</g>
+<!-- zmq_reader -->
+<g id="node2" class="node">
+<title>zmq_reader</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-571.2" rx="56.7213" ry="18"/>
+<text text-anchor="middle" x="213" y="-567" font-family="Times,serif" font-size="14.00" fill="#000000">zmq_reader</text>
+</g>
+<!-- zmq_reader->store -->
+<g id="edge1" class="edge">
+<title>zmq_reader->store</title>
+<path fill="none" stroke="#000000" d="M213,-553.0624C213,-518.0524 213,-440.9338 213,-399.1098"/>
+<polygon fill="#000000" stroke="#000000" points="216.5001,-399.041 213,-389.041 209.5001,-399.041 216.5001,-399.041"/>
+</g>
+<!-- raw_logs -->
+<g id="node9" class="node">
+<title>raw_logs</title>
+<polygon fill="none" stroke="#000000" points="246.8177,-204.8 179.1823,-204.8 179.1823,-168.8 246.8177,-168.8 246.8177,-204.8"/>
+<text text-anchor="middle" x="213" y="-182.6" font-family="Times,serif" font-size="14.00" fill="#000000">raw_logs</text>
+</g>
+<!-- flush->raw_logs -->
+<g id="edge7" class="edge">
+<title>flush->raw_logs</title>
+<path fill="none" stroke="#000000" d="M213,-241.7551C213,-233.6828 213,-223.9764 213,-214.9817"/>
+<polygon fill="#000000" stroke="#000000" points="216.5001,-214.8903 213,-204.8904 209.5001,-214.8904 216.5001,-214.8903"/>
+</g>
+<!-- rpc_call -->
+<g id="node4" class="node">
+<title>rpc_call</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="96" cy="-571.2" rx="42.2719" ry="18"/>
+<text text-anchor="middle" x="96" y="-567" font-family="Times,serif" font-size="14.00" fill="#000000">rpc_call</text>
+</g>
+<!-- rpc_call->store -->
+<g id="edge2" class="edge">
+<title>rpc_call->store</title>
+<path fill="none" stroke="#000000" d="M106.383,-553.4158C127.1286,-517.8823 173.7774,-437.9813 197.9061,-396.6532"/>
+<polygon fill="#000000" stroke="#000000" points="200.9539,-398.3746 202.9733,-387.974 194.9087,-394.8452 200.9539,-398.3746"/>
+</g>
+<!-- rpc_call->rpc -->
+<g id="edge6" class="edge">
+<title>rpc_call->rpc</title>
+<path fill="none" stroke="#000000" d="M128.3237,-559.6236C134.5117,-557.4551 140.9473,-555.2324 147,-553.2 195.2719,-536.9911 207.5791,-533.6629 256.1584,-518.4 267.6285,-514.7963 272.1326,-517.2692 282,-510.4 291.4048,-503.8529 299.5581,-494.3438 305.9432,-485.3189"/>
+<polygon fill="#000000" stroke="#000000" points="308.8983,-487.1957 311.5065,-476.9269 303.0638,-483.3279 308.8983,-487.1957"/>
+<text text-anchor="middle" x="303.4208" y="-522.6" font-family="Times,serif" font-size="14.00" fill="#000000">estimatesmartfee</text>
+</g>
+<!-- zmq -->
+<g id="node5" class="node">
+<title>zmq</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="394" cy="-459.6" rx="27.8286" ry="18"/>
+<text text-anchor="middle" x="394" y="-455.4" font-family="Times,serif" font-size="14.00" fill="#000000">zmq</text>
+</g>
+<!-- zmq->zmq_reader -->
+<g id="edge4" class="edge">
+<title>zmq->zmq_reader</title>
+<path fill="none" stroke="#000000" d="M389.3641,-477.6447C383.977,-494.9667 373.3851,-520.5921 355,-535.2 342.1425,-545.416 306.1216,-554.486 273.579,-560.9666"/>
+<polygon fill="#000000" stroke="#000000" points="272.7739,-557.5574 263.6207,-562.8931 274.1035,-564.43 272.7739,-557.5574"/>
+<text text-anchor="middle" x="443.6312" y="-522.6" font-family="Times,serif" font-size="14.00" fill="#000000">rawtx, rawblock, sequence</text>
+</g>
+<!-- raw_logs_reader -->
+<g id="node7" class="node">
+<title>raw_logs_reader</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-91" rx="75.2616" ry="18"/>
+<text text-anchor="middle" x="213" y="-86.8" font-family="Times,serif" font-size="14.00" fill="#000000">raw_logs_reader</text>
+</g>
+<!-- csv_writer -->
+<g id="node8" class="node">
+<title>csv_writer</title>
+<ellipse fill="#ffffff" stroke="#ffffff" cx="68" cy="-91" rx="51.5823" ry="18"/>
+<text text-anchor="middle" x="68" y="-86.8" font-family="Times,serif" font-size="14.00" fill="#000000">csv_writer</text>
+</g>
+<!-- csv -->
+<g id="node10" class="node">
+<title>csv</title>
+<polygon fill="none" stroke="#000000" points="95,-36 41,-36 41,0 95,0 95,-36"/>
+<text text-anchor="middle" x="68" y="-13.8" font-family="Times,serif" font-size="14.00" fill="#000000">csv</text>
+</g>
+<!-- csv_writer->csv -->
+<g id="edge9" class="edge">
+<title>csv_writer->csv</title>
+<path fill="none" stroke="#000000" d="M68,-72.9551C68,-64.8828 68,-55.1764 68,-46.1817"/>
+<polygon fill="#000000" stroke="#000000" points="71.5001,-46.0903 68,-36.0904 64.5001,-46.0904 71.5001,-46.0903"/>
+</g>
+<!-- raw_logs->raw_logs_reader -->
+<g id="edge8" class="edge">
+<title>raw_logs->raw_logs_reader</title>
+<path fill="none" stroke="#000000" d="M213,-168.7808C213,-154.9527 213,-135.593 213,-119.6477"/>
+<polygon fill="#000000" stroke="#000000" points="216.5001,-119.2641 213,-109.2642 209.5001,-119.2642 216.5001,-119.2641"/>
+</g>
+</g>
+</svg>
\ No newline at end of file
+++ /dev/null
-<?xml version="1.0" encoding="UTF-8" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"><!-- Generated by graphviz version 2.40.1 (20161225.0304)
- --><!-- Title: G Pages: 1 --><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="526pt" height="638pt" viewBox="0.00 0.00 526.26 638.00">
-<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 634)">
-<title>G</title>
-<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-634 522.2624,-634 522.2624,4 -4,4"/>
-<g id="clust1" class="cluster">
-<title>cluster_logger</title>
-<polygon fill="#d3d3d3" stroke="#d3d3d3" points="46,-233.8 46,-622 278,-622 278,-233.8 46,-233.8"/>
-<text text-anchor="middle" x="162" y="-605.4" font-family="Times,serif" font-size="14.00" fill="#000000">bitcoin-logger</text>
-</g>
-<g id="clust2" class="cluster">
-<title>cluster_bitcoind</title>
-<polygon fill="#d3d3d3" stroke="#d3d3d3" points="286,-433.6 286,-510.4 430,-510.4 430,-433.6 286,-433.6"/>
-<text text-anchor="middle" x="358" y="-493.8" font-family="Times,serif" font-size="14.00" fill="#000000">bitcoind</text>
-</g>
-<g id="clust3" class="cluster">
-<title>cluster_csv</title>
-<polygon fill="#d3d3d3" stroke="#d3d3d3" points="8,-65 8,-141.8 296,-141.8 296,-65 8,-65"/>
-<text text-anchor="middle" x="152" y="-125.2" font-family="Times,serif" font-size="14.00" fill="#000000">bitcoin-csv</text>
-</g>
-<!-- store -->
-<g id="node1" class="node">
-<title>store</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-370.8" rx="29.6127" ry="18"/>
-<text text-anchor="middle" x="213" y="-366.6" font-family="Times,serif" font-size="14.00" fill="#000000">store</text>
-</g>
-<!-- flush -->
-<g id="node3" class="node">
-<title>flush</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-259.8" rx="30.1958" ry="18"/>
-<text text-anchor="middle" x="213" y="-255.6" font-family="Times,serif" font-size="14.00" fill="#000000">flush</text>
-</g>
-<!-- store->flush -->
-<g id="edge3" class="edge">
-<title>store->flush</title>
-<path fill="none" stroke="#000000" d="M213,-352.4706C213,-335.0373 213,-308.5482 213,-288.3489"/>
-<polygon fill="#000000" stroke="#000000" points="216.5001,-288.1566 213,-278.1566 209.5001,-288.1567 216.5001,-288.1566"/>
-</g>
-<!-- rpc -->
-<g id="node6" class="node">
-<title>rpc</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="321" cy="-459.6" rx="27" ry="18"/>
-<text text-anchor="middle" x="321" y="-455.4" font-family="Times,serif" font-size="14.00" fill="#000000">rpc</text>
-</g>
-<!-- store->rpc -->
-<g id="edge5" class="edge">
-<title>store->rpc</title>
-<path fill="none" stroke="#000000" d="M230.6678,-385.3269C248.3931,-399.901 275.8597,-422.4847 295.8541,-438.9245"/>
-<polygon fill="#000000" stroke="#000000" points="293.7008,-441.6852 303.6479,-445.3327 298.1465,-436.2782 293.7008,-441.6852"/>
-<text text-anchor="middle" x="311.1904" y="-411" font-family="Times,serif" font-size="14.00" fill="#000000">missing data</text>
-</g>
-<!-- zmq_reader -->
-<g id="node2" class="node">
-<title>zmq_reader</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-571.2" rx="56.7213" ry="18"/>
-<text text-anchor="middle" x="213" y="-567" font-family="Times,serif" font-size="14.00" fill="#000000">zmq_reader</text>
-</g>
-<!-- zmq_reader->store -->
-<g id="edge1" class="edge">
-<title>zmq_reader->store</title>
-<path fill="none" stroke="#000000" d="M213,-553.0624C213,-518.0524 213,-440.9338 213,-399.1098"/>
-<polygon fill="#000000" stroke="#000000" points="216.5001,-399.041 213,-389.041 209.5001,-399.041 216.5001,-399.041"/>
-</g>
-<!-- raw_logs -->
-<g id="node9" class="node">
-<title>raw_logs</title>
-<polygon fill="none" stroke="#000000" points="246.8177,-204.8 179.1823,-204.8 179.1823,-168.8 246.8177,-168.8 246.8177,-204.8"/>
-<text text-anchor="middle" x="213" y="-182.6" font-family="Times,serif" font-size="14.00" fill="#000000">raw_logs</text>
-</g>
-<!-- flush->raw_logs -->
-<g id="edge7" class="edge">
-<title>flush->raw_logs</title>
-<path fill="none" stroke="#000000" d="M213,-241.7551C213,-233.6828 213,-223.9764 213,-214.9817"/>
-<polygon fill="#000000" stroke="#000000" points="216.5001,-214.8903 213,-204.8904 209.5001,-214.8904 216.5001,-214.8903"/>
-</g>
-<!-- rpc_call -->
-<g id="node4" class="node">
-<title>rpc_call</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="96" cy="-571.2" rx="42.2719" ry="18"/>
-<text text-anchor="middle" x="96" y="-567" font-family="Times,serif" font-size="14.00" fill="#000000">rpc_call</text>
-</g>
-<!-- rpc_call->store -->
-<g id="edge2" class="edge">
-<title>rpc_call->store</title>
-<path fill="none" stroke="#000000" d="M106.383,-553.4158C127.1286,-517.8823 173.7774,-437.9813 197.9061,-396.6532"/>
-<polygon fill="#000000" stroke="#000000" points="200.9539,-398.3746 202.9733,-387.974 194.9087,-394.8452 200.9539,-398.3746"/>
-</g>
-<!-- rpc_call->rpc -->
-<g id="edge6" class="edge">
-<title>rpc_call->rpc</title>
-<path fill="none" stroke="#000000" d="M128.3237,-559.6236C134.5117,-557.4551 140.9473,-555.2324 147,-553.2 195.2719,-536.9911 207.5791,-533.6629 256.1584,-518.4 267.6285,-514.7963 272.1326,-517.2692 282,-510.4 291.4048,-503.8529 299.5581,-494.3438 305.9432,-485.3189"/>
-<polygon fill="#000000" stroke="#000000" points="308.8983,-487.1957 311.5065,-476.9269 303.0638,-483.3279 308.8983,-487.1957"/>
-<text text-anchor="middle" x="303.4208" y="-522.6" font-family="Times,serif" font-size="14.00" fill="#000000">estimatesmartfee</text>
-</g>
-<!-- zmq -->
-<g id="node5" class="node">
-<title>zmq</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="394" cy="-459.6" rx="27.8286" ry="18"/>
-<text text-anchor="middle" x="394" y="-455.4" font-family="Times,serif" font-size="14.00" fill="#000000">zmq</text>
-</g>
-<!-- zmq->zmq_reader -->
-<g id="edge4" class="edge">
-<title>zmq->zmq_reader</title>
-<path fill="none" stroke="#000000" d="M389.3641,-477.6447C383.977,-494.9667 373.3851,-520.5921 355,-535.2 342.1425,-545.416 306.1216,-554.486 273.579,-560.9666"/>
-<polygon fill="#000000" stroke="#000000" points="272.7739,-557.5574 263.6207,-562.8931 274.1035,-564.43 272.7739,-557.5574"/>
-<text text-anchor="middle" x="443.6312" y="-522.6" font-family="Times,serif" font-size="14.00" fill="#000000">rawtx, rawblock, sequence</text>
-</g>
-<!-- raw_logs_reader -->
-<g id="node7" class="node">
-<title>raw_logs_reader</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="213" cy="-91" rx="75.2616" ry="18"/>
-<text text-anchor="middle" x="213" y="-86.8" font-family="Times,serif" font-size="14.00" fill="#000000">raw_logs_reader</text>
-</g>
-<!-- csv_writer -->
-<g id="node8" class="node">
-<title>csv_writer</title>
-<ellipse fill="#ffffff" stroke="#ffffff" cx="68" cy="-91" rx="51.5823" ry="18"/>
-<text text-anchor="middle" x="68" y="-86.8" font-family="Times,serif" font-size="14.00" fill="#000000">csv_writer</text>
-</g>
-<!-- csv -->
-<g id="node10" class="node">
-<title>csv</title>
-<polygon fill="none" stroke="#000000" points="95,-36 41,-36 41,0 95,0 95,-36"/>
-<text text-anchor="middle" x="68" y="-13.8" font-family="Times,serif" font-size="14.00" fill="#000000">csv</text>
-</g>
-<!-- csv_writer->csv -->
-<g id="edge9" class="edge">
-<title>csv_writer->csv</title>
-<path fill="none" stroke="#000000" d="M68,-72.9551C68,-64.8828 68,-55.1764 68,-46.1817"/>
-<polygon fill="#000000" stroke="#000000" points="71.5001,-46.0903 68,-36.0904 64.5001,-46.0904 71.5001,-46.0903"/>
-</g>
-<!-- raw_logs->raw_logs_reader -->
-<g id="edge8" class="edge">
-<title>raw_logs->raw_logs_reader</title>
-<path fill="none" stroke="#000000" d="M213,-168.7808C213,-154.9527 213,-135.593 213,-119.6477"/>
-<polygon fill="#000000" stroke="#000000" points="216.5001,-119.2641 213,-109.2642 209.5001,-119.2642 216.5001,-119.2641"/>
-</g>
-</g>
-</svg>
\ No newline at end of file