web3.py Patterns: Multithreading

Is most of your runtime spent waiting on responses from remote nodes? web3.py's AsyncHTTPProvider is the ideal solution for many use cases, but this post is for those users who need a performance boost and can't use asyncio due to dependency constraints.

Like asyncio, threads can be used to improve the performance of I/O-bound processes in Python, and that includes your web3.py script or dapp. The idea behind threads and async libraries is similar: while your system is waiting for something to happen (e.g., a response from a server), be productive elsewhere (e.g., dispatch another request).

We refer to this functionality as concurrency. Note, though, that two or more tasks aren't being executed at the same time; your CPU core is just switching between tasks to minimize idle time where possible. For a good introduction to the concepts of threads, asyncio, and multiprocessing, see this explainer.

Performance

In small-scale tests (e.g., 50-200 concurrent eth_getBlock requests to a remote node provider), the performance was comparable between the AsyncHTTPProvider and a threaded approach utilizing the standard HTTPProvider.

A sample output from logging the first 50 block numbers as they're returned:

∆∆∆ SYNC VERSION
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
SYNC Time: 5.27 sec

∆∆∆ THREADED VERSION
[0, 10, 11, 1, 2, 6, 3, 12, 4, 13, 14, 15, 16, 18, 17, 19, 20, 21, 22, 23, 24, 25, 28, 27, 26, 29, 30, 31, 33, 35, 32, 34, 36, 37, 39, 40, 38, 41, 42, 43, 46, 45, 48, 47, 44, 49, 5, 8, 9, 7]
THREADED Time: 1.39 sec

∆∆∆ ASYNC VERSION
[11, 2, 42, 13, 21, 49, 38, 5, 22, 16, 34, 1, 40, 10, 8, 12, 18, 29, 30, 43, 19, 39, 17, 4, 9, 15, 44, 27, 37, 45, 0, 20, 35, 36, 14, 28, 24, 46, 41, 33, 31, 25, 23, 48, 32, 3, 47, 7, 6, 26]
ASYNC Time: 1.43 sec

Concurrent block fetching regularly offered a 3-5x performance gain at this sample size, but note that actual performance will vary widely based on your hardware, bandwidth, and remote node provider – their technical limitations, but also your subscription plan limitations!

At larger sample sizes, the expectation is for asyncio to outperform threads, given that it takes fewer resources to spin up asyncio tasks than to spawn new threads.

Sample code

A "Hello, World"-style threaded example to get you off the ground:

import concurrent.futures
from web3 import Web3, HTTPProvider

w3 = Web3(HTTPProvider("..."))

blocks = []

def fetch_block(num):
    # each worker thread fetches one block and appends it to the shared list
    b = w3.eth.get_block(num)
    blocks.append(b)

# map submits one task per block number; exiting the context manager
# waits for all of them to finish
with concurrent.futures.ThreadPoolExecutor() as ex:
    ex.map(fetch_block, range(50))

A couple quick tips:

  • The ThreadPoolExecutor can be fine-tuned by including a max_workers parameter; see the sketch after this list.
  • Using the ThreadPoolExecutor within a context manager simplifies starting each thread and waiting for their completion.
  • For more sophisticated use cases, explore queues and producer-consumer pipelines.
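
For example, a variation on the snippet above that caps the pool at ten threads and collects results from the ex.map return value instead of a shared list (a sketch; the max_workers value and the provider URL are placeholders to tune for your own setup):

import concurrent.futures
from web3 import Web3, HTTPProvider

w3 = Web3(HTTPProvider("..."))

def fetch_block(num):
    return w3.eth.get_block(num)

# map submits one task per block number, yields results in input order,
# and the context manager waits for every thread to finish
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as ex:
    blocks = list(ex.map(fetch_block, range(50)))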

An AsyncHTTPProvider example for the sake of comparison:

import asyncio
from web3 import Web3, AsyncHTTPProvider
from web3.eth import AsyncEth

# configure the async eth module and skip the default (sync-only) middlewares
w3 = Web3(
    AsyncHTTPProvider("..."),
    modules={"eth": (AsyncEth,)},
    middlewares=[],
)

blocks = []

async def main():
    # as_completed yields each request as soon as its response arrives,
    # which is why the block numbers come back out of order
    for result in asyncio.as_completed(
        [w3.eth.get_block(num) for num in range(50)]
    ):
        b = await result
        blocks.append(b)

asyncio.run(main())
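
If you don't need to handle blocks as they arrive, asyncio.gather is a slightly shorter alternative that also returns results in request order (a sketch under the same placeholder provider URL):

import asyncio
from web3 import Web3, AsyncHTTPProvider
from web3.eth import AsyncEth

w3 = Web3(
    AsyncHTTPProvider("..."),
    modules={"eth": (AsyncEth,)},
    middlewares=[],
)

async def main():
    # gather awaits every request and returns the results in the
    # same order the coroutines were created
    return await asyncio.gather(
        *(w3.eth.get_block(num) for num in range(50))
    )

blocks = asyncio.run(main())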

When to use threads

If you have no performance concerns or your code cannot benefit from concurrency: congratulations! You can stop here.

If your application spends much of its time I/O bound, e.g., waiting on responses from remote nodes, you've got a good candidate to explore concurrency options.

The downside of using threads is added complexity. The example code in this post may be blissfully simple, but when you need to share context between threads, the onus is on you to prevent deadlocks and race conditions.
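
As a small illustration, if multiple threads update a shared value (say, tallying gas used across blocks), a threading.Lock is the minimal tool for keeping those updates consistent (a sketch; the shared total here is illustrative and not part of the examples above):

import concurrent.futures
import threading

total_gas = 0
lock = threading.Lock()

def tally(gas_used):
    global total_gas
    # without the lock, the read-modify-write below could interleave
    # across threads and silently lose updates
    with lock:
        total_gas += gas_used

with concurrent.futures.ThreadPoolExecutor() as ex:
    ex.map(tally, [21000] * 100)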

If your scale doesn't necessitate asyncio, choosing between threads and asyncio for your use case gets a bit more subjective. A regular sentiment I see is that asyncio code is generally easier to write and reason about. Anecdotally, the release notes for Python 3.11 also indicate a greater recent investment in asyncio than in threading by the Python core team and community.

If you're looking for a rule of thumb: reach for an async provider first. If that isn't an option, either because a needed feature isn't available yet or because you depend on a library that is not asyncio-compatible, use threads.