Web3.py Internals: JSON-RPC Round Trips
Here be dragons! This is a deep dive into some of the internals of Web3.py. This post may be for you if you're A) interested in contributing to the Web3.py codebase, B) implementing custom modules, methods, or middleware, or C) otherwise doing some deep debugging.
In this post, we'll take a look at what a round trip from your command line to an Ethereum node and back looks like as it travels through Web3.py. For the sake of the example, we'll query the balance of an account and trace its path through the code. The sample code in this post is pulled from the Web3.py codebase, but is simplified in places to convey the relevant point. Ready?
The Web3 Class
Using Web3.py almost always starts with the instantiation of a Web3 object. There are opportunities to configure the object after it's instantiated, but you'll need to pass the relevant Provider up front. In this example, we'll be using the HTTPProvider.
```python
from web3 import Web3, HTTPProvider
w3 = Web3(HTTPProvider('https://<your-provider-url>'))
w3.isConnected()
# True
```
When you create the Web3 object, a lot's happening under the hood, but notably you're getting a request manager and a few modules.
```python
class Web3:
    def __init__(
        self,
        provider=None,
        middlewares=None,
        modules=None,
        external_modules=None,
        ens=cast(ENS, empty),
    ) -> None:
        # The request manager owns the provider and the middleware stack
        self.manager = self.RequestManager(self, provider, middlewares)
        self.codec = ABICodec(build_default_registry())

        # Fall back to the default set of modules if none are passed in
        if modules is None:
            modules = get_default_modules()
        self.attach_modules(modules)

        if external_modules is not None:
            self.attach_modules(external_modules)

        self.ens = ens
```
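Here is a minimal sketch of module attachment that makes the attach step concrete. Everything in it is hypothetical and much simpler than Web3.py's real attach_modules, including ToyWeb3, ToyModule, ToyEth, ToyNet and the default dict. It only illustrates one idea: each module class is instantiated with the Web3 instance and then exposed as an attribute such as w3.eth.

```python
class ToyModule:
    """Hypothetical stand-in for a module: it just keeps a reference to its Web3 instance."""

    def __init__(self, w3):
        self.w3 = w3


class ToyEth(ToyModule):
    pass


class ToyNet(ToyModule):
    pass


class ToyWeb3:
    """Hypothetical, heavily stripped-down Web3: a provider plus attached modules."""

    def __init__(self, provider, modules=None):
        self.provider = provider  # the real class hands this to its RequestManager
        if modules is None:
            modules = {"eth": ToyEth, "net": ToyNet}  # assumed defaults for this sketch
        self.attach_modules(modules)

    def attach_modules(self, modules):
        # Instantiate each module with this Web3 instance and expose it as an attribute.
        for name, module_class in modules.items():
            if hasattr(self, name):
                raise AttributeError(f"Cannot attach {name!r}: attribute already exists")
            setattr(self, name, module_class(self))


w3 = ToyWeb3(provider="https://<your-provider-url>")
print(type(w3.eth).__name__)  # ToyEth
```

Each module keeps a back-reference to the Web3 instance. That is what lets module methods reach shared state, such as the request manager, later on.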
For our first rabbit hole, let's start with modules.
The Module Class
Most users have their needs met by the default set of modules, which Web3.py attaches when you don't pass in a custom selection.
For this walkthrough, we're interested in querying the balance of an account. This functionality lives within the default Eth module, alongside the other eth_* methods defined in the standard Ethereum JSON-RPC API. Specifically, the JSON-RPC method we're interested in is eth_getBalance.
Note that if you want to query a balance, Ethereum clients expect that request in this JSON format:
```json
{
    "jsonrpc": "2.0",
    "method": "eth_getBalance",
    "params": ["0x3C6...", "latest"],
    "id": 10
}
```
It would be exceptionally inconvenient to craft requests like this by hand, so Web3.py provides a friendlier interface. Executing w3.eth.get_balance('0x3C6...') will generate and send the appropriate JSON-RPC request, similar to the one above. Let's dig into how that happens.
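As a rough illustration of the payload this post traces, the sketch below builds the eth_getBalance request shown above using only the standard library. build_json_rpc_request is a hypothetical helper, not a Web3.py API. In Web3.py this encoding is the provider's job. The incrementing id is just one reasonable way to keep request ids unique.

```python
import itertools
import json

# Hypothetical module-level counter so each request gets a fresh id.
_request_ids = itertools.count()


def build_json_rpc_request(method, params):
    """Return the JSON-encoded bytes of a JSON-RPC 2.0 request (illustrative helper)."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    }
    return json.dumps(payload).encode("utf-8")


raw = build_json_rpc_request("eth_getBalance", ["0x3C6...", "latest"])
print(raw.decode("utf-8"))
```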
We now know that eth_getBalance and the other standard Ethereum methods are encapsulated within Web3.py's Eth module. The definition of those methods looks something like this:
```python
class Eth(Module):
    ...
    get_balance = Method(
        RPC.eth_getBalance,
        mungers=[BaseEth.block_id_munger],
    )
    ...
```
A few things to note here:
- the Eth module inherits from a Module class,
- get_balance is defined as an instance of the Method class, and
- we're including something called a "munger."
Let's start at the top.
Each of Web3.py's modules inherits from a Module class, which has a limited but important set of responsibilities, captured in its retrieve_caller_fn attribute. TL;DR: when the get_balance method is called, inputs are formatted, the JSON-RPC payload is constructed and sent, then result formatters are applied to the response.
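```python
from eth_utils.toolz import curry


@curry  # curried: Module.__init__ supplies w3 and module now, the method comes later
def retrieve_method_call_fn(w3, module, method):
    def caller(*args, **kwargs):
        # 1) Apply input mungers (and request formatters)
        (method_str, params), response_formatters = method.process_params(module, *args, **kwargs)
        result_formatters, error_formatters, null_result_formatters = response_formatters
        ...
        # 2) Have the RequestManager build and send the request
        result = w3.manager.request(method_str,
                                    params,
                                    error_formatters,
                                    null_result_formatters)
        # 3) Format human-readable results
        return apply_result_formatters(result_formatters, result)
    return caller


class Module:
    def __init__(self, w3):
        self.retrieve_caller_fn = retrieve_method_call_fn(w3, self)
        self.w3 = w3
        self.codec: ABICodec = w3.codec


class Method:
    ...
    def __get__(self, obj=None, obj_type=None):
        # Accessing w3.eth.get_balance triggers this descriptor hook,
        # which returns the module's caller bound to this Method
        return obj.retrieve_caller_fn(self)
```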
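The interplay between Module, Method and __get__ is easier to see in a toy version. The sketch below does not use Web3.py at all. ToyMethod, ToyModule, FakeManager and default_block_munger are hypothetical names. functools.partial stands in for the curried caller factory, and the manager returns a canned hex string instead of contacting a node. The sketch mirrors the three steps in the caller above: munge the inputs, send the request, format the result.

```python
from functools import partial


def retrieve_caller(manager, module, method):
    """Hypothetical caller factory mirroring the three steps above."""

    def caller(*args):
        params = method.munge(module, *args)  # 1) apply input mungers
        raw_result = manager.request(method.rpc_name, params)  # 2) send the request
        return method.result_formatter(raw_result)  # 3) format the result

    return caller


class ToyMethod:
    def __init__(self, rpc_name, mungers=(), result_formatter=lambda result: result):
        self.rpc_name = rpc_name
        self.mungers = mungers
        self.result_formatter = result_formatter

    def munge(self, module, *args):
        # Each munger receives the module plus the current args and returns new args.
        for munger in self.mungers:
            args = munger(module, *args)
        return list(args)

    def __get__(self, obj=None, obj_type=None):
        # Descriptor hook: eth.get_balance returns a caller, not the ToyMethod itself.
        if obj is None:
            raise TypeError("Methods must be accessed through a module instance")
        return obj.retrieve_caller_fn(self)


class ToyModule:
    def __init__(self, manager):
        # partial() fixes manager and module now; the method is supplied by __get__.
        self.retrieve_caller_fn = partial(retrieve_caller, manager, self)


class FakeManager:
    """Records requests and returns a canned hex balance instead of calling a node."""

    def __init__(self):
        self.sent = []

    def request(self, method, params):
        self.sent.append((method, params))
        return "0x83a3c396d1a7b40"


def default_block_munger(module, account, block_identifier=None):
    return (account, "latest" if block_identifier is None else block_identifier)


class ToyEth(ToyModule):
    get_balance = ToyMethod(
        "eth_getBalance",
        mungers=[default_block_munger],
        result_formatter=lambda result: int(result, 16),
    )


manager = FakeManager()
eth = ToyEth(manager)
print(eth.get_balance("0x3C6..."))
print(manager.sent)  # [('eth_getBalance', ['0x3C6...', 'latest'])]
```

ToyMethod defines __get__, so it is a descriptor. Looking up eth.get_balance on an instance never returns the ToyMethod object itself. It returns a caller that is already bound to that module and its manager.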
Request and response formatters play a large role in making blockchain data more user-friendly. When you make the eth_getBalance call, the Ethereum client is going to return a hexadecimal string ("hex string"), as the Ethereum JSON-RPC spec requires. In other words, the client will return a payload that looks something like this:
```json
{
    "jsonrpc": "2.0",
    "id": 6,
    "result": "0x83a3c396d1a7b40"
}
```
That's not exactly human-readable, so Web3.py applies a response formatter to convert that hex string into an integer. Several formatters are maintained within the method_formatters.py module, including the relevant PYTHONIC_RESULT_FORMATTERS:
```python
PYTHONIC_RESULT_FORMATTERS = {
    ...
    RPC.eth_getBalance: to_integer_if_hex,
    ...
}
```
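Here is a minimal sketch of that result-formatting step, applied to the raw response above. The to_integer_if_hex below is a simplified re-implementation written for illustration. It is not the function Web3.py ships under that name, which performs more validation.

```python
import json


def to_integer_if_hex(value):
    """Simplified: 0x-prefixed strings become ints; anything else passes through."""
    if isinstance(value, str) and value[:2] in ("0x", "0X"):
        return int(value, 16)
    return value


raw_response = '{"jsonrpc": "2.0", "id": 6, "result": "0x83a3c396d1a7b40"}'
response = json.loads(raw_response)  # what the provider decodes from the wire
balance_wei = to_integer_if_hex(response["result"])
print(balance_wei)
```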
Note that Ethereum clients also expect a hexadecimal string if you're querying the balance at a specific block number. A request formatter enables users to simply pass in an integer value, e.g., w3.eth.get_balance('0x123...', 500000), instead of manually converting it to a hex string:
```python
PYTHONIC_REQUEST_FORMATTERS = {
    ...
    RPC.eth_getBalance: apply_formatter_at_index(to_hex_if_integer, 1),
    ...
}
```
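The request side can be sketched the same way. Both functions below are hypothetical, simplified re-implementations written for illustration. to_hex_if_integer converts integers with Python's built-in hex(). apply_formatter_at_index returns a function that formats only the parameter at one position, so the address is left untouched.

```python
def to_hex_if_integer(value):
    """Simplified: integers become 0x-prefixed hex strings; anything else passes through."""
    if isinstance(value, int) and not isinstance(value, bool):
        return hex(value)
    return value


def apply_formatter_at_index(formatter, index):
    """Return a function that applies `formatter` to params[index] only."""

    def apply(params):
        if index >= len(params):
            raise IndexError(f"Expected at least {index + 1} params, got {len(params)}")
        return [formatter(p) if i == index else p for i, p in enumerate(params)]

    return apply


format_get_balance_params = apply_formatter_at_index(to_hex_if_integer, 1)
print(format_get_balance_params(["0x123...", 500000]))  # ['0x123...', '0x7a120']
print(format_get_balance_params(["0x123...", "latest"]))  # ['0x123...', 'latest']
```

This sketch raises an IndexError if the block identifier is missing. In the flow this post describes, a munger fills that parameter in before request formatters run.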
These formatters are specific to each method, so it follows that they would be registered in the Method class. Smooth transition.
The Method Class
Now that we've got a high-level view of the Module class, let's zoom back in on the Method class. Recall that get_balance is an instance of the Method class.
The Method class simply offers a composable way to maintain several incoming and outgoing payload formatters per method. Here's a look at its __init__ method to give you an idea of exactly what is maintained:
```python
class Method:
    def __init__(
        self,
        json_rpc_method=None,
        mungers=None,
        request_formatters=None,
        result_formatters=None,
        null_result_formatters=None,
        method_choice_depends_on_args=None,
        is_property=False,
    ):
        self.json_rpc_method = json_rpc_method
        self.mungers = _set_mungers(mungers, is_property)
        self.request_formatters = request_formatters or get_request_formatters
        self.result_formatters = result_formatters or get_result_formatters
        self.null_result_formatters = null_result_formatters or get_null_result_formatters
        self.method_choice_depends_on_args = method_choice_depends_on_args
        self.is_property = is_property
```
For many of these values, if a formatter is not passed in, reasonable defaults are chosen. This is the case for get_balance, since we've only passed in the json_rpc_method and a munger.
The final loose thread: what are mungers? This generic term conveys that some data transformation may be occurring beyond just type formatting. The get_balance method provides a good example. The method accepts two arguments: an address and a block identifier to determine at what point in time you'd like to view the balance of that address. Accepted block identifier values include "earliest", "latest", "pending", or a specific block number.
In the get_balance method definition, we've included a block_id_munger. This particular munger simply sets a default block identifier if none is provided. By default, this value is "latest", indicating that we're interested in the current balance of an account.
```python
def block_id_munger(self, account, block_identifier=None):
    if block_identifier is None:
        block_identifier = self.default_block
    return (account, block_identifier)
```
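To see the munger in context, the sketch below runs it through a hypothetical process_params pipeline. Mungers run first and a request formatter runs second, matching the order described in this post. ToyEth, process_params and hex_block_number are illustrative names, not Web3.py APIs, and the formatter is simplified to the two-parameter eth_getBalance case.

```python
class ToyEth:
    """Hypothetical module stand-in exposing only what the munger needs."""

    def __init__(self, default_block="latest"):
        self.default_block = default_block

    def block_id_munger(self, account, block_identifier=None):
        # Same logic as above: fall back to the module's default block.
        if block_identifier is None:
            block_identifier = self.default_block
        return (account, block_identifier)


def process_params(module, mungers, request_formatter, *args):
    """Hypothetical pipeline: run every munger in order, then the request formatter."""
    for munger in mungers:
        args = munger(module, *args)
    return request_formatter(list(args))


def hex_block_number(params):
    # Simplified request formatter for the two eth_getBalance params.
    account, block = params
    return [account, hex(block) if isinstance(block, int) else block]


eth = ToyEth()
mungers = [ToyEth.block_id_munger]
print(process_params(eth, mungers, hex_block_number, "0x3C6..."))  # ['0x3C6...', 'latest']
print(process_params(eth, mungers, hex_block_number, "0x3C6...", 9999999))  # ['0x3C6...', '0x98967f']
```

The munger reads default_block from the module rather than hard-coding "latest". Changing that attribute therefore changes what an omitted block identifier becomes.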
With that, we've covered most of the important building blocks. Let's finish up with middleware, then walk through the query round trip from start to finish.
Middleware
Middleware are functions that can intercept and perform arbitrary actions on outgoing requests and incoming responses. Those actions can include logging, data formatting, rerouting a subset of requests to an entirely different endpoint, and whatever else you can dream up.
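Here is a minimal sketch of that layering. It assumes the (make_request, w3) -> middleware_fn(method, params) shape used by Web3.py's v5-era middleware. combine_middlewares, make_logging_middleware and fake_provider are hypothetical helpers written for this example. The point is only that each middleware wraps the next, so requests travel inward and responses travel back outward.

```python
from functools import reduce


def make_logging_middleware(label, log):
    """Hypothetical factory for a middleware that logs both directions of a call."""

    def middleware(make_request, w3):
        def middleware_fn(method, params):
            log.append(f"{label}: request {method}")  # outgoing side
            response = make_request(method, params)  # hand off to the next layer
            log.append(f"{label}: response {method}")  # incoming side
            return response

        return middleware_fn

    return middleware


def combine_middlewares(middlewares, w3, provider_request_fn):
    # Wrap from the inside out, so the first middleware in the list is the outermost layer.
    return reduce(
        lambda request_fn, middleware: middleware(request_fn, w3),
        reversed(middlewares),
        provider_request_fn,
    )


def fake_provider(method, params):
    """Stand-in for a provider: returns a canned JSON-RPC response, no network involved."""
    return {"jsonrpc": "2.0", "id": 0, "result": "0x83a3c396d1a7b40"}


log = []
middlewares = [make_logging_middleware("outer", log), make_logging_middleware("inner", log)]
request_fn = combine_middlewares(middlewares, None, fake_provider)
response = request_fn("eth_getBalance", ["0x3C6...", "latest"])
print(log)
```

The log shows the outer layer seeing the request first and the response last, which is exactly the onion shape described later in this section.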
You may recall that upon creation of the Web3 instance, the RequestManager is the recipient of any middlewares you pass in. If none are provided, a set of default middleware is included:
```python
@staticmethod
def default_middlewares(w3):
    return [
        (request_parameter_normalizer, 'request_param_normalizer'),
        (gas_price_strategy_middleware, 'gas_price_strategy'),
        (name_to_address_middleware(w3), 'name_to_address'),
        (attrdict_middleware, 'attrdict'),
        (pythonic_middleware, 'pythonic'),
        (validation_middleware, 'validation'),
        (abi_middleware, 'abi'),
        (buffered_gas_estimate_middleware, 'gas_estimate'),
    ]
```
Each tuple in this list pairs a middleware function with the name assigned to it; for your own middleware, you are free to choose that name. Let's zoom in on the name_to_address middleware.
The name in name_to_address refers to an Ethereum Name Service (ENS) name. Web3.py has support for ENS names, meaning that you can request the balance of a human-readable domain like shaq.eth, instead of the long-form address, 0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17. Under the hood, the name_to_address middleware intercepts eth_getBalance requests with an ENS name as a parameter, resolves the name to an Ethereum hex string address, then forwards the call on to the next middleware or executes the request.
```python
# 0) original request:
w3.eth.get_balance('shaq.eth')
# 1) after input munging:
w3.eth.get_balance('shaq.eth', 'latest')
# 2) after request middleware:
w3.eth.get_balance('0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17', 'latest')
```
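Here is a toy version of the name_to_address step, built on loudly simplified assumptions. Names are resolved from a hypothetical local dict rather than through on-chain ENS contracts. is_ens_name is a naive check. Only eth_getBalance is handled. The middleware shape follows the (make_request, w3) convention, and fake_provider simply records what would have been sent.

```python
def is_ens_name(value):
    # Naive check (assumption): a dotted string that is not a 0x-prefixed address.
    return isinstance(value, str) and "." in value and not value.startswith("0x")


def make_name_to_address_middleware(registry):
    """Hypothetical middleware: swap an ENS name in eth_getBalance params for its address."""

    def middleware(make_request, w3):
        def middleware_fn(method, params):
            if method == "eth_getBalance" and params and is_ens_name(params[0]):
                params = [registry[params[0]], *params[1:]]  # resolve the name, keep the rest
            return make_request(method, params)  # forward to the next layer

        return middleware_fn

    return middleware


sent = []


def fake_provider(method, params):
    sent.append((method, params))  # record what would have gone over the wire
    return {"jsonrpc": "2.0", "id": 0, "result": "0x0"}


# Hypothetical local registry; real resolution goes through on-chain ENS contracts.
registry = {"shaq.eth": "0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17"}
request_fn = make_name_to_address_middleware(registry)(fake_provider, None)
request_fn("eth_getBalance", ["shaq.eth", "latest"])
print(sent[-1])
```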
In Web3.py, a single middleware can act on both outgoing requests and incoming responses, so the playful name "middleware onion" was adopted to visualize how the layers are applied. In this case, the name_to_address middleware only formats outgoing requests, but if you need to, you are free to write a custom address_to_name response middleware that converts addresses to ENS names for specific calls.
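Here is a sketch of the hypothetical address_to_name response middleware mentioned above. It leaves requests untouched and only rewrites the result of selected methods. eth_coinbase is used here as an example of a call that returns an address. A local reverse-lookup dict stands in for real ENS reverse resolution, and both the function names and the canned provider replies are illustrative assumptions.

```python
def make_address_to_name_middleware(reverse_registry, methods=("eth_coinbase",)):
    """Hypothetical response middleware: replace a returned address with a known ENS name."""

    def middleware(make_request, w3):
        def middleware_fn(method, params):
            response = make_request(method, params)  # the request passes through untouched
            if method in methods and "result" in response:
                address = response["result"]
                # Build a new dict; unknown addresses are returned unchanged.
                response = {**response, "result": reverse_registry.get(address, address)}
            return response

        return middleware_fn

    return middleware


def fake_provider(method, params):
    """Canned replies: an address for eth_coinbase, a zero balance for anything else."""
    if method == "eth_coinbase":
        return {"jsonrpc": "2.0", "id": 0, "result": "0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17"}
    return {"jsonrpc": "2.0", "id": 0, "result": "0x0"}


reverse_registry = {"0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17": "shaq.eth"}
request_fn = make_address_to_name_middleware(reverse_registry)(fake_provider, None)
print(request_fn("eth_coinbase", [])["result"])  # shaq.eth
```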
Summary: The Full Round Trip
Let's bring it all home.
- When you create a new Web3 instance and pass in a provider, you're getting some namespaced modules and a RequestManager that maintains a middleware stack.
- When you execute the get_balance method on the Eth module, input mungers are applied first. In this case, if you execute w3.eth.get_balance('shaq.eth'), the block_id_munger will add the default value of 'latest' as the second parameter.
- Next, Web3.py applies any request formatters. If you happened to want Shaq's balance at a specific block, say, block number 9999999, the Pythonic request formatters would convert that to a hex string, the format expected by Ethereum clients.
- Middlewares are triggered next, performing any final manipulations prior to dispatching the request. For example, ENS names will be resolved to Ethereum account addresses via the name_to_address middleware.
- After all middleware functions are called, the provider builds the JSON-RPC request and sends it via the appropriate channel: HTTP, IPC, or WebSockets.
- The response from the Ethereum client is decoded, then passed back through the middleware onion, where any applicable response middleware are executed.
- Finally, back within the module, human-readable response formatters are applied. If eth_getBalance returns the hex string '0x819ef3b0a273233', the Pythonic response formatter will convert it to an integer (583760663573639731) and return that value, in wei, to the user.
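To tie the steps together, here is a compact, hypothetical end-to-end sketch that replays the round trip above. Every layer is a stand-in: a munger, a request formatter, a name_to_address-style middleware (its signature trimmed to just make_request), a fake provider, and a result formatter. The fake provider JSON-encodes the request and decodes a canned reply. None of these functions are Web3.py APIs. The canned reply is the '0x819ef3b0a273233' value from the last step.

```python
import json

# Hypothetical local registry standing in for ENS resolution.
REGISTRY = {"shaq.eth": "0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17"}
sent_bodies = []


def block_id_munger(account, block_identifier=None):
    # Input munger: default to the latest block.
    return (account, "latest" if block_identifier is None else block_identifier)


def request_formatter(params):
    # Request formatter: hex-encode an integer block number.
    account, block = params
    return [account, hex(block) if isinstance(block, int) else block]


def name_to_address(make_request):
    # Middleware: resolve a known ENS name before the request leaves.
    def middleware_fn(method, params):
        return make_request(method, [REGISTRY.get(params[0], params[0]), *params[1:]])

    return middleware_fn


def fake_http_provider(method, params):
    # Provider: build the JSON-RPC body, "send" it, decode a canned reply.
    sent_bodies.append(json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": 0}))
    return json.loads('{"jsonrpc": "2.0", "id": 0, "result": "0x819ef3b0a273233"}')


def get_balance(*args):
    params = request_formatter(list(block_id_munger(*args)))  # munge, then format
    make_request = name_to_address(fake_http_provider)  # wrap the provider in middleware
    response = make_request("eth_getBalance", params)  # send and decode
    return int(response["result"], 16)  # result formatter: hex string -> int (wei)


print(get_balance("shaq.eth"))  # 583760663573639731
print(sent_bodies[-1])
```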
[Figure: a visual representation of the full round trip]
That's quite enough rabbit holing for one post. To continue the journey, see the Web3.py documentation, open issues as appropriate, join the Ethereum Python Discord community, and keep building. See you on the other side.
🐰🕳