Web3.py Internals: JSON-RPC Round Trips

Here be dragons! This is a deep dive into some of the internals of Web3.py. This post may be for you if you're A) interested in contributing to the Web3.py codebase, B) implementing custom modules, methods, or middleware, or C) otherwise doing some deep debugging.

In this post, we'll take a look at what a round-trip from your command line to an Ethereum node and back looks like as it travels through Web3.py. For the sake of the example, we'll query the balance of an account and trace its path in the code. The sample code in this post is pulled from the Web3.py codebase, but is simplified in areas to convey the relevant point. Ready?

The Web3 Class

Using Web3.py almost always starts with the instantiation of a Web3 object. There are opportunities to configure the object after it's instantiated, but you'll need to pass the relevant Provider up front. In this example, we'll be using the HTTPProvider.

from web3 import Web3, HTTPProvider

w3 = Web3(HTTPProvider('https://<your-provider-url>'))

w3.isConnected()
# True

When you create the Web3 object, a lot's happening under the hood, but notably you're getting a request manager and a few modules.

class Web3:
    def __init__(
        self,
        provider = None,
        middlewares = None,
        modules = None,
        external_modules = None,
        ens = cast(ENS, empty)
    ) -> None:
        self.manager = self.RequestManager(self, provider, middlewares)

        self.codec = ABICodec(build_default_registry())

        if modules is None:
            modules = get_default_modules()
        self.attach_modules(modules)

        if external_modules is not None:
            self.attach_modules(external_modules)

        self.ens = ens

For our first rabbit hole, let's start with modules.

The Module Class

Most users have their needs met with the default set of modules. If you don't pass in a custom selection, Web3.py will start you off with the following:

def get_default_modules():
    return {
        "eth": Eth,
        "net": Net,
        "version": Version,
        "parity": (Parity, {
            "personal": ParityPersonal,
        }),
        "geth": (Geth, {
            "admin": GethAdmin,
            "miner": GethMiner,
            "personal": GethPersonal,
            "txpool": GethTxPool,
        }),
        "testing": Testing,
    }
(Some legacy support in there)

For this walkthrough, we're interested in querying the balance of an account. This functionality lives within the Eth module above, like all other methods defined in the standard Ethereum JSON-RPC API. Specifically, the JSON-RPC method we're interested in is eth_getBalance.

Note that if you want to query a balance, Ethereum clients expect that request in this JSON format:

{
    "jsonrpc": "2.0",
    "method": "eth_getBalance",
    "params": ["0x3C6...", "latest"],
    "id": 10
}
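
For a sense of what crafting that request by hand involves, here is a minimal sketch using only the standard library. The build_payload helper is hypothetical (it is not part of Web3.py), and the address is the full example address that appears later in this post.

```python
import json


def build_payload(method, params, request_id):
    """Assemble a JSON-RPC 2.0 request body as a JSON string.

    Hypothetical helper for illustration; not a Web3.py function.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })


payload = build_payload(
    "eth_getBalance",
    ["0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17", "latest"],
    10,
)
print(payload)
```

Even in this small form, the caller is responsible for the method name, the parameter order, and the request id on every call.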

It would be exceptionally inconvenient to craft requests like this by hand, so Web3.py provides a friendlier interface. Executing w3.eth.get_balance('0x3C6...') will generate and send the appropriate JSON-RPC request, similar to that above. Let's dig into how that happens.


We now know that eth_getBalance and the other standard Ethereum methods are encapsulated within Web3.py's Eth module. The definition of those methods looks something like this:

class Eth(Module):
    ...
    get_balance = Method(
        RPC.eth_getBalance,
        mungers=[BaseEth.block_id_munger],
    )
    ...

A few things to note here:

  1. the Eth module inherits from a Module class,
  2. get_balance is defined as an instance of the Method class, and
  3. we're including something called a "munger."

Let's start at the top.

Each of Web3.py's modules inherits from a Module class, which has a limited but important set of responsibilities, captured in its retrieve_caller_fn attribute. TL;DR: when the get_balance method is called, inputs are formatted, the JSON-RPC payload is constructed and sent, then result formatters are applied to the response.

from eth_utils.toolz import curry


@curry  # lets Module pre-bind (w3, module); the method is supplied later
def retrieve_method_call_fn(w3, module, method):
    def caller(*args, **kwargs):

        # 1) Apply input mungers and collect this method's formatters
        (method_str, params), response_formatters = method.process_params(module, *args, **kwargs)
        ...
        result_formatters, error_formatters, null_result_formatters = response_formatters

        # 2) Have the RequestManager build and send the request
        result = w3.manager.request(method_str,
                                    params,
                                    error_formatters,
                                    null_result_formatters)

        # 3) Format human-readable results
        return apply_result_formatters(result_formatters, result)
    return caller


class Module:
    def __init__(self, w3):
        self.retrieve_caller_fn = retrieve_method_call_fn(w3, self)
        self.w3 = w3
        self.codec: ABICodec = w3.codec


class Method:
    ...
    # Descriptor hook: accessing w3.eth.get_balance returns the caller above
    def __get__(self, obj = None, obj_type = None):
        return obj.retrieve_caller_fn(self)
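
If the hand-off between a module and its methods feels abstract, the following self-contained sketch may help. It uses toy stand-ins for Module and Method (not the real Web3.py classes), and echo_send is a hypothetical callable standing in for the RequestManager. The point is only to show how a Method stored as a class attribute can turn into a callable when accessed on a module instance.

```python
class Method:
    """Toy stand-in for Web3.py's Method: just remembers an RPC method name."""

    def __init__(self, json_rpc_method):
        self.json_rpc_method = json_rpc_method

    def __get__(self, obj=None, obj_type=None):
        # Runs on attribute access (e.g. eth.get_balance) and hands this
        # Method to the owning module, which returns a ready-to-call function.
        return obj.retrieve_caller_fn(self)


class Module:
    """Toy stand-in for Web3.py's Module."""

    def __init__(self, send):
        # `send` is a hypothetical callable standing in for the RequestManager.
        self.send = send

    def retrieve_caller_fn(self, method):
        def caller(*args):
            return self.send(method.json_rpc_method, list(args))
        return caller


class Eth(Module):
    get_balance = Method("eth_getBalance")


def echo_send(method_str, params):
    """Fake transport: returns the request instead of contacting a node."""
    return {"method": method_str, "params": params}


eth = Eth(echo_send)
print(eth.get_balance("0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17", "latest"))
```

Because the lookup goes through the module instance, each module can decide how its methods are executed without the Method definitions changing.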

Request and response formatters play a large role in making blockchain data more user-friendly. When you make the eth_getBalance call, the Ethereum client is going to return a hexadecimal string ("hex string"), as the Ethereum JSON-RPC spec requires. In other words, the client will return a payload that looks something like this:

{
    'jsonrpc': '2.0',
    'id': 6, 
    'result': '0x83a3c396d1a7b40'
}

That's not exactly human-readable, so Web3.py applies a response formatter to convert that hex string into an integer. Several formatters are maintained within the method_formatters.py module, including the relevant PYTHONIC_RESULT_FORMATTERS:

PYTHONIC_RESULT_FORMATTERS = {
    ...
    RPC.eth_getBalance: to_integer_if_hex,
    ...
}

Note that Ethereum clients also expect a hexadecimal string if you're querying for balance at a specific block number. A request formatter enables users to simply pass in an integer value, e.g., w3.eth.get_balance('0x123...', 500000), instead of manually converting it to a hex string:

PYTHONIC_REQUEST_FORMATTERS = {
    ...
    RPC.eth_getBalance: apply_formatter_at_index(to_hex_if_integer, 1),
    ...
}
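
To make the two directions concrete, here is a rough sketch of what these formatters do. The functions below are simplified re-implementations written for this post's example; they borrow the names used above but are not the library's own code, and the three-argument apply_formatter_at_index is an assumption made for readability.

```python
def to_integer_if_hex(value):
    """Convert a hex string to an int; pass anything else through unchanged."""
    if isinstance(value, str) and value.startswith("0x"):
        return int(value, 16)
    return value


def to_hex_if_integer(value):
    """Convert an int to a hex string; pass anything else through unchanged."""
    if isinstance(value, int) and not isinstance(value, bool):
        return hex(value)
    return value


def apply_formatter_at_index(formatter, index, params):
    """Return a copy of params with the formatter applied at one position."""
    return [
        formatter(item) if i == index else item
        for i, item in enumerate(params)
    ]


# Outgoing: the integer block number becomes a hex string.
print(apply_formatter_at_index(to_hex_if_integer, 1, ["0x123...", 500000]))

# Incoming: the hex balance becomes an integer number of wei.
print(to_integer_if_hex("0x819ef3b0a273233"))
```

Note that both helpers leave values like "latest" untouched, which is why the same formatter can sit in front of every call to a method.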

These formatters are unique for each method, so it follows that they would be registered in the Method class. Smooth transition.

The Method Class

Now that we've got a high-level view of the Module class, let's zoom back in to the Method class. Recall that get_balance is an instance of the Method class:

class Eth(Module):
    ...
    get_balance = Method(
        RPC.eth_getBalance,
        mungers=[BaseEth.block_id_munger],
    )
    ...
For convenience, the get_balance definition again

The Method class simply offers a composable way to maintain several incoming and outgoing payload formatters per method. Here's a look at the __init__ function to give you an idea of what exactly is maintained:

class Method:
    def __init__(
        self,
        json_rpc_method = None,
        mungers = None,
        request_formatters = None,
        result_formatters = None,
        null_result_formatters = None,
        method_choice_depends_on_args = None,
        is_property = False,
    ):
        self.json_rpc_method = json_rpc_method
        self.mungers = _set_mungers(mungers, is_property)
        self.request_formatters = request_formatters or get_request_formatters
        self.result_formatters = result_formatters or get_result_formatters
        self.null_result_formatters = null_result_formatters or get_null_result_formatters
        self.method_choice_depends_on_args = method_choice_depends_on_args
        self.is_property = is_property

For many of these values, if a formatter is not passed in, reasonable defaults are chosen. This is the case for get_balance, since we've only passed in the json_rpc_method and a munger.

The final loose thread: what are mungers? This generic term conveys that some data transformation may be occurring beyond just type formatting. The get_balance method provides a good example. The method accepts two arguments: an address and a block identifier to determine at what point in time you'd like to view the balance of that address. Accepted block identifier values include "earliest", "latest", "pending", or a specific block number.

In the get_balance method definition we've included a block_id_munger. This particular munger simply sets a default block identifier if none is provided. By default, this value is "latest", indicating that we're interested in the current balance of an account.

def block_id_munger(self, account, block_identifier = None):
    if block_identifier is None:
        block_identifier = self.default_block
    return (account, block_identifier)
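
Here is the same munger exercised on its own, as a small sketch. FakeEth is a hypothetical stand-in for the Eth module that carries nothing but a default_block attribute.

```python
class FakeEth:
    """Hypothetical stand-in for the Eth module; only holds the default block."""
    default_block = "latest"


def block_id_munger(module, account, block_identifier=None):
    # Fill in the module's default when the caller omits the block identifier.
    if block_identifier is None:
        block_identifier = module.default_block
    return (account, block_identifier)


eth = FakeEth()
print(block_id_munger(eth, "shaq.eth"))
print(block_id_munger(eth, "shaq.eth", 9999999))
```

An explicit block identifier always wins; the default only applies when the argument is left out.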

With that, we've covered most of the important building blocks. Let's finish up with middleware then walk through the query round trip from start to finish.

Middleware

Middleware are functions that can intercept and perform arbitrary actions on outgoing requests and incoming responses. Those actions can include logging, data formatting, rerouting a subset of requests to an entirely different endpoint, and whatever else you can dream up.

You may recall that upon creation of the Web3 instance, the RequestManager is the recipient of any middlewares you pass in. If none are provided, a set of default middleware is included:

@staticmethod
def default_middlewares(w3):
    return [
        (request_parameter_normalizer, 'request_param_normalizer'),
        (gas_price_strategy_middleware, 'gas_price_strategy'),
        (name_to_address_middleware(w3), 'name_to_address'),
        (attrdict_middleware, 'attrdict'),
        (pythonic_middleware, 'pythonic'),
        (validation_middleware, 'validation'),
        (abi_middleware, 'abi'),
        (buffered_gas_estimate_middleware, 'gas_estimate'),
    ]

Each tuple in this list contains a middleware function and whatever name you want to assign that middleware. Let's zoom in on the name_to_address middleware.

The name in name_to_address refers to an Ethereum Name Service (ENS) name. Web3.py has support for ENS names, meaning that you can request the balance of a human-readable domain like shaq.eth, instead of the long-form address, 0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17. Under the hood, the name_to_address middleware intercepts eth_getBalance requests with an ENS name as a parameter, resolves the name to an Ethereum hex string address, then forwards the call on to the next middleware or executes the request.

# 0) original request:
w3.eth.get_balance('shaq.eth')

# 1) after input munging:
w3.eth.get_balance('shaq.eth', 'latest')

# 2) after request middleware:
w3.eth.get_balance('0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17', 'latest')

In Web3.py, a single middleware can act on both outgoing requests and incoming responses, so the playful name "middleware onion" was adopted to visually represent their application. In this case, the name_to_address middleware only formats outgoing requests, but if you need to, you are free to write a custom address_to_name response middleware that converts addresses to ENS names for specific calls.
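
The onion idea can be sketched in a few lines of plain Python. Everything below is a simplified assumption for illustration: FAKE_ENS is a hypothetical registry rather than a real ENS lookup, fake_provider returns a canned response, and the middleware signature is trimmed down to a single make_request argument rather than mirroring Web3.py's exact interface.

```python
# Hypothetical name registry standing in for a real ENS lookup.
FAKE_ENS = {"shaq.eth": "0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17"}


def fake_provider(method, params):
    """Innermost layer: pretends to be the node and returns a canned result."""
    return {"jsonrpc": "2.0", "id": 1, "result": "0x819ef3b0a273233"}


def name_to_address(make_request):
    """Request-side middleware: swap known names for addresses."""
    def middleware(method, params):
        resolved = [FAKE_ENS.get(p, p) if isinstance(p, str) else p
                    for p in params]
        return make_request(method, resolved)
    return middleware


def logging_middleware(log):
    """Middleware that touches both directions: logs request and response."""
    def wrap(make_request):
        def middleware(method, params):
            log.append(("request", method, params))
            response = make_request(method, params)
            log.append(("response", response["result"]))
            return response
        return middleware
    return wrap


def build_onion(provider, middlewares):
    """Wrap the provider so the first middleware listed is the outermost layer."""
    handler = provider
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


log = []
request = build_onion(fake_provider, [name_to_address, logging_middleware(log)])
response = request("eth_getBalance", ["shaq.eth", "latest"])
print(log)
```

Since name_to_address is the outer layer here, the logging layer only ever sees the resolved address; swapping the order in the list would log the ENS name instead.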

Summary: The Full Round Trip

Let's bring it all home.

  • When you create a new Web3 instance and pass in a provider, you're getting some name-spaced modules and a RequestManager that maintains a middleware stack.
  • When you execute the get_balance method on the Eth module, input mungers are applied first. In this case, if you execute w3.eth.get_balance('shaq.eth'), the block_id_munger will add the default value of 'latest' as the second parameter.
  • Next, Web3.py would look to apply any request formatters. If you happened to want Shaq's balance at a specific block, say, at block number 9999999, the Pythonic request formatters would convert that to a hex string – the format expected by Ethereum clients.
  • Middlewares are triggered next, performing any final manipulations prior to dispatching the request. For example, ENS names will be resolved to Ethereum account addresses via the name_to_address middleware.
  • After all middleware functions are called, the provider builds the JSON-RPC request and sends the request via the appropriate channel – HTTP, IPC, or WebSockets.
  • The response from the Ethereum client is decoded then passed back through the middleware onion. Applicable response middleware are executed.
  • Finally, back within the module, human-readable response formatters are applied. If eth_getBalance returns the hex string '0x819ef3b0a273233', then the Pythonic response formatter will convert that to an integer (583760663573639731) and return that value, in wei, to the user.
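
The steps above can be compressed into one simulated call. This is only a sketch under stated assumptions: every stage is a toy stand-in, the registry dict is hypothetical, and a canned response takes the place of a real Ethereum client. It records each stage so the ordering is easy to inspect.

```python
def round_trip(account, block_identifier=None):
    """Trace one simulated get_balance call; every stage is a toy stand-in."""
    trace = []

    # 1) Input munger: fill in the default block identifier.
    if block_identifier is None:
        block_identifier = "latest"
    params = [account, block_identifier]
    trace.append(("munged", list(params)))

    # 2) Request formatter: integer block numbers become hex strings.
    if isinstance(params[1], int):
        params[1] = hex(params[1])
    trace.append(("formatted", list(params)))

    # 3) Request middleware: resolve a name via a hypothetical registry.
    registry = {"shaq.eth": "0x3C6aEFF92b4B35C2e1b196B57d0f8FFB56884A17"}
    params[0] = registry.get(params[0], params[0])
    trace.append(("resolved", list(params)))

    # 4) Provider: build the payload; a canned response replaces the node.
    payload = {"jsonrpc": "2.0", "method": "eth_getBalance",
               "params": params, "id": 1}
    trace.append(("sent", payload))
    response = {"jsonrpc": "2.0", "id": 1, "result": "0x819ef3b0a273233"}

    # 5) Result formatter: hex string to an integer number of wei.
    balance = int(response["result"], 16)
    trace.append(("result", balance))
    return balance, trace


balance, trace = round_trip("shaq.eth", 9999999)
for stage, value in trace:
    print(stage, value)
```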

[Figure: a visual representation of the full round trip]

That's quite enough rabbit holing for one post. To continue the journey, see the Web3.py documentation, open issues as appropriate, join the Ethereum Python Discord community, and keep building. See you on the other side.
🐰🕳