web3.py Patterns: Multithreading
Is most of your runtime spent waiting on responses from remote nodes? web3.py's AsyncHTTPProvider is the ideal solution for many use cases, but this post is for those users who need a performance boost and can't use asyncio due to dependency constraints.
Like asyncio, threads can be used to improve the performance of I/O-bound processes in Python, and that includes your web3.py script or dapp. The idea behind threads and async libraries is similar: while your system is waiting for something to happen (e.g., a response from a server), be productive elsewhere (e.g., dispatch another request).
We refer to this functionality as concurrency. Note, though, that two or more tasks aren't being executed at the same time; your CPU core is just switching between tasks to minimize idle time where possible. For a good introduction to the concepts of threads, asyncio, and multiprocessing, see this explainer.
Performance
In small-scale tests (e.g., 50-200 concurrent eth_getBlock requests to a remote node provider), the performance was comparable between the AsyncHTTPProvider and a threaded approach utilizing the standard HTTPProvider.
A sample output from logging the first 50 block numbers as they're returned:
∆∆∆ SYNC VERSION
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
SYNC Time: 5.27 sec
∆∆∆ THREADED VERSION
[0, 10, 11, 1, 2, 6, 3, 12, 4, 13, 14, 15, 16, 18, 17, 19, 20, 21, 22, 23, 24, 25, 28, 27, 26, 29, 30, 31, 33, 35, 32, 34, 36, 37, 39, 40, 38, 41, 42, 43, 46, 45, 48, 47, 44, 49, 5, 8, 9, 7]
THREADED Time: 1.39 sec
∆∆∆ ASYNC VERSION
[11, 2, 42, 13, 21, 49, 38, 5, 22, 16, 34, 1, 40, 10, 8, 12, 18, 29, 30, 43, 19, 39, 17, 4, 9, 15, 44, 27, 37, 45, 0, 20, 35, 36, 14, 28, 24, 46, 41, 33, 31, 25, 23, 48, 32, 3, 47, 7, 6, 26]
ASYNC Time: 1.43 sec
Concurrent block fetching regularly offered a 3-5x performance gain at this sample size, but note that actual performance will vary widely based on your hardware, bandwidth, and remote node provider – both its technical limits and the limits of your subscription plan!
At larger sample sizes, the expectation is for asyncio to outperform threads, given that it takes fewer resources to spin up asyncio tasks than new threads.
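If you want to reproduce a comparison like the one above, a minimal timing harness might look like the following. This is a sketch, not the exact benchmark script: the provider URL is a placeholder, the block range is arbitrary, and perf_counter timing is an assumption about how the numbers were collected.
import time
import concurrent.futures
from web3 import Web3, HTTPProvider

w3 = Web3(HTTPProvider("..."))  # placeholder endpoint

def fetch_block(num):
    return w3.eth.get_block(num)

# Sequential baseline
start = time.perf_counter()
sync_blocks = [fetch_block(num) for num in range(50)]
print(f"SYNC Time: {time.perf_counter() - start:.2f} sec")

# Threaded version
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as ex:
    threaded_blocks = list(ex.map(fetch_block, range(50)))
print(f"THREADED Time: {time.perf_counter() - start:.2f} sec")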
Sample code
A "Hello, World"-style threaded example to get you off the ground:
import concurrent.futures
from web3 import Web3, HTTPProvider

w3 = Web3(HTTPProvider("..."))
blocks = []

def fetch_block(num):
    b = w3.eth.get_block(num)
    blocks.append(b)

with concurrent.futures.ThreadPoolExecutor() as ex:
    ex.map(fetch_block, range(50))
A couple quick tips:
- The ThreadPoolExecutor can be fine-tuned by including a max_workers parameter.
- Usage of the ThreadPoolExecutor within a context manager simplifies the process of starting each thread and waiting for their completion.
- For more sophisticated use cases, explore queues and producer-consumer pipelines; a minimal sketch follows this list.
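As a rough illustration of that last tip, here is one way a producer-consumer pipeline could look using the standard library's queue and threading modules. This is a sketch only; the five-consumer count and the sentinel shutdown approach are arbitrary choices, not part of the original example.
import queue
import threading

from web3 import Web3, HTTPProvider

w3 = Web3(HTTPProvider("..."))  # placeholder endpoint
work = queue.Queue()
results = []

def producer():
    # Enqueue the block numbers to fetch
    for num in range(50):
        work.put(num)

def consumer():
    while True:
        num = work.get()
        if num is None:  # sentinel value signals "no more work"
            break
        results.append(w3.eth.get_block(num))

consumers = [threading.Thread(target=consumer) for _ in range(5)]
for t in consumers:
    t.start()

producer()
for _ in consumers:
    work.put(None)  # one sentinel per consumer thread
for t in consumers:
    t.join()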
An AsyncHTTPProvider example for the sake of comparison:
import asyncio
from web3 import Web3, AsyncHTTPProvider
from web3.eth import AsyncEth

w3 = Web3(
    AsyncHTTPProvider("..."),
    modules={"eth": (AsyncEth,)},
    middlewares=[],
)
blocks = []

async def main():
    for result in asyncio.as_completed(
        [w3.eth.get_block(num) for num in range(50)]
    ):
        b = await result
        blocks.append(b)

asyncio.run(main())
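If your provider enforces rate limits, you may want to cap how many requests are in flight at once. A semaphore is one way to do that with asyncio; the sketch below assumes an arbitrary limit of 10, which you'd tune to your plan's limits.
import asyncio
from web3 import Web3, AsyncHTTPProvider
from web3.eth import AsyncEth

w3 = Web3(
    AsyncHTTPProvider("..."),
    modules={"eth": (AsyncEth,)},
    middlewares=[],
)

async def fetch_block(num, semaphore):
    async with semaphore:  # at most 10 requests in flight at a time
        return await w3.eth.get_block(num)

async def main():
    semaphore = asyncio.Semaphore(10)  # arbitrary cap; tune as needed
    return await asyncio.gather(*(fetch_block(num, semaphore) for num in range(50)))

blocks = asyncio.run(main())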
When to use threads
If you have no performance concerns or your code cannot benefit from concurrency: congratulations! You can stop here.
If your application spends much of its time I/O bound, e.g., waiting on responses from remote nodes, you've got a good candidate to explore concurrency options.
The downside of using threads is added complexity. The example code in this post may be blissfully simple, but when you need to share context between threads, the onus is on you to prevent deadlocks and race conditions.
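For example, if your worker threads update shared state beyond appending to a list, a threading.Lock is the simplest guard. The sketch below uses a hypothetical running gas total as the shared context.
import threading

lock = threading.Lock()
total_gas_used = 0  # hypothetical shared state updated by multiple threads

def record_block(block):
    global total_gas_used
    with lock:  # serialize the read-modify-write so no updates are lost
        total_gas_used += block["gasUsed"]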
Evaluating threads vs. asyncio gets a bit more subjective if your scale doesn't necessitate the use of asyncio. A regular sentiment I see is that asyncio code is generally easier to write and reason about. Anecdotally, the release notes for Python 3.11 also indicate a greater recent investment in asyncio than threading by the Python core team and community.
If you're looking for a rule of thumb: reach for an async provider first. If that isn't an option, either because a needed feature isn't available yet or because you depend on a library that is not asyncio-compatible, use threads.