YAStack: Journey of a packet
--
Introduction
When your complete networking stack, right from L2 to L7, lives in user space, you have the luxury of using gdb to walk through every layer of it.
We'll touch upon some of the entry points, starting right from the DPDK driver functions. Next, every packet is handed over to the TCP/IP stack, and we highlight some of the functions at L2/L3 where packets are inserted into the stack. Packets traversing the stack end up on a socket. Libevent works over the socket layer to generate read/write events for Envoy, which then runs its listener callbacks.
This article will touch several different subsystems without covering any of them in depth. However, we do highlight a list of gdb breakpoints. These can easily be imported into a gdbinit file so the same walk-through can be replicated in a developer's environment.
Also, for the sake of simplicity, we'll run this on a one-core setup. This doesn't change anything from a functionality point of view. YAStack's architecture is such that an understanding of the single-process system extrapolates easily to the multi-process one. There is some code (software RSS, specifically) that brings the multi-core aspect to the project.
We assume the plumbing is already in place for packets to be processed. The plumbing aspects of the system can be traced starting from the ff_init()/ff_dpdk_init() calls.
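For reference, the upstream f-stack API that YAStack builds on reduces that plumbing to a couple of calls. The skeleton below is the standard f-stack pattern (ff_init() parses the config and initializes DPDK plus the stack, ff_run() drives the polling loop); it is a generic sketch, not YAStack's own startup code.

#include "ff_api.h"

/* One iteration of the event loop; the stack pumps packets between
 * invocations of this callback. */
static int loop(void *arg)
{
    return 0;
}

int main(int argc, char *argv[])
{
    /* Parses the f-stack config, initializes the DPDK EAL and the
     * user-space TCP/IP stack. */
    ff_init(argc, argv);

    /* Never returns: polls the NIC and invokes loop() repeatedly. */
    ff_run(loop, NULL);
    return 0;
}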
The hardware abstraction of the network device
Apart from abstracting low-level resources like hardware and memory, the DPDK library strives to build an abstraction over different network drivers. It does this by using a struct rte_eth_dev (rte_ethdev_core.h) that holds state (rte_eth_dev_data) and callbacks (eth_dev_ops) for different drivers (Intel, Mellanox, ENA, etc.).
The poll mode driver (PMD) is a layer of abstraction that works on this data structure. It provides a consistent abstraction over different types of hardware. Invocations of functions inside the PMD result in driver-specific calls being made through this data structure.
We are interested in rte_eth_rx_burst()/rte_eth_tx_burst(), which let us read packets from the hardware and send packets to the hardware respectively. These functions are geared towards multi-queue read/write semantics to support multi-core operation.
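To make that dispatch concrete, here is a heavily simplified sketch of the idea. The real definitions live in rte_ethdev_core.h/rte_ethdev.h and vary across DPDK releases; the *_sketch names below are ours.

#include <stdint.h>

struct rte_mbuf;   /* opaque packet buffer */

/* Simplified stand-ins for the DPDK burst callback types. */
typedef uint16_t (*eth_rx_burst_t)(void *rxq, struct rte_mbuf **pkts, uint16_t n);
typedef uint16_t (*eth_tx_burst_t)(void *txq, struct rte_mbuf **pkts, uint16_t n);

struct eth_dev_data_sketch {
    void **rx_queues;   /* per-queue driver state */
    void **tx_queues;
};

/* Sketch of struct rte_eth_dev: per-driver burst callbacks plus shared state. */
struct eth_dev_sketch {
    eth_rx_burst_t rx_pkt_burst;          /* driver-specific RX (ixgbe, mlx5, ena, ...) */
    eth_tx_burst_t tx_pkt_burst;          /* driver-specific TX */
    struct eth_dev_data_sketch *data;     /* per-port state (queues, MAC, link, ...) */
};

/* rte_eth_rx_burst(port, queue, pkts, n) conceptually reduces to an indirect
 * call through the driver's callback: */
static inline uint16_t
rx_burst_sketch(struct eth_dev_sketch *dev, uint16_t queue_id,
                struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
    return dev->rx_pkt_burst(dev->data->rx_queues[queue_id], rx_pkts, nb_pkts);
}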
DPDK entry point
The while(1) loop of the system uses rte_eth_rx_burst()/rte_eth_tx_burst() to read packets from and write packets to the NIC.
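A minimal version of such a loop looks like the sketch below. This is the generic DPDK polling pattern with an assumed single port/queue and the TX path simply echoing packets back; YAStack's actual loop instead dispatches the packets into the stack as described in the next sections.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Generic DPDK poll-loop sketch: port and queue 0 assumed, error handling
 * trimmed for brevity. */
static void poll_loop(uint16_t port_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Pull up to BURST_SIZE packets off RX queue 0. */
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        /* In YAStack the packets would be dispatched to a core's ring here;
         * this sketch just transmits them back out. */
        uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, bufs, nb_rx);

        /* Free whatever the TX queue did not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}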
Toeplitz hash to a core
The packets read are now run through a Toeplitz hash to determine the core they are destined for. The receiving core hands a packet off to a different core using a queue dedicated to that core.
The process_packets call invokes the packet_dispatcher, which provides us with the queue. Once the queue is known, the packet is enqueued on that queue using rte_ring_enqueue.
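The dispatch step roughly looks like the following sketch. rte_softrss() (the software Toeplitz hash from rte_thash.h) and rte_ring_enqueue() are real DPDK calls; the tuple extraction, ring names, and drop-on-full policy here are illustrative assumptions rather than YAStack's exact code.

#include <rte_mbuf.h>
#include <rte_ring.h>
#include <rte_thash.h>

/* Assumed to be set up during initialization: a 40-byte RSS key and one
 * ring per worker core. */
extern uint8_t rss_key[40];
extern struct rte_ring *core_rings[];
extern unsigned int nb_worker_cores;

static void dispatch_packet(struct rte_mbuf *pkt,
                            uint32_t src_ip, uint32_t dst_ip,
                            uint16_t src_port, uint16_t dst_port)
{
    /* Software Toeplitz hash over the 4-tuple (three 32-bit words). */
    uint32_t tuple[3] = {
        src_ip,
        dst_ip,
        ((uint32_t)src_port << 16) | dst_port
    };
    uint32_t hash = rte_softrss(tuple, 3, rss_key);

    /* Pick the destination core and hand the mbuf over through its ring. */
    unsigned int core = hash % nb_worker_cores;
    if (rte_ring_enqueue(core_rings[core], pkt) != 0)
        rte_pktmbuf_free(pkt);   /* ring full: drop the packet */
}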
The packet becomes an mbuf
The enqueued packet is read from the queue using rte_ring_dequeue_burst. The packet is now ready for its journey through the stack.
The core handling the packet invokes ff_veth_input after dequeuing the packet from the queue. This function sets up the stack's mbuf from the rte_mbuf and invokes ff_veth_process_packet.
The subsequent if_input call (a function pointer that points to ether_input) is similar to what a non-DPDK hardware device driver would invoke to hand the packet over to the network stack.
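The consumer side of that ring follows the standard dequeue-burst pattern, sketched below. stack_input() is a hypothetical stand-in for the ff_veth_input()/ff_veth_process_packet() path that converts the rte_mbuf into a stack mbuf and calls if_input/ether_input.

#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST_SIZE 32

/* Hypothetical stand-in for ff_veth_input()/ff_veth_process_packet(). */
extern void stack_input(struct rte_mbuf *pkt);

static void drain_ring(struct rte_ring *ring)
{
    void *objs[BURST_SIZE];

    /* Pull a burst of packets that other cores enqueued for this core. */
    unsigned int n = rte_ring_dequeue_burst(ring, objs, BURST_SIZE, NULL);
    for (unsigned int i = 0; i < n; i++)
        stack_input((struct rte_mbuf *)objs[i]);
}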
Libevent to the Envoy network listener
The packet then moves through the different layers of the TCP stack and is enqueued on a socket.
This results in an accept in libevent, which invokes listener_read_cb.
The libevent handler invokes the callback registered by Envoy, Envoy::Network::ListenerImpl::listenCallback. This kicks off the Envoy state machines.
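To see the shape of that hand-off, here is a minimal standalone libevent listener (over regular kernel sockets rather than the f-stack socket layer; the port is arbitrary). In YAStack, the role played by accept_cb below is taken by Envoy::Network::ListenerImpl::listenCallback.

#include <event2/event.h>
#include <event2/listener.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Invoked by libevent's listener machinery once accept() succeeds. */
static void accept_cb(struct evconnlistener *listener, evutil_socket_t fd,
                      struct sockaddr *addr, int socklen, void *ctx)
{
    printf("accepted fd %d\n", (int)fd);
    evutil_closesocket(fd);
}

int main(void)
{
    struct event_base *base = event_base_new();
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(8080);   /* arbitrary example port */

    /* libevent owns the listening socket and calls accept_cb per connection,
     * just as it calls Envoy's registered callback in YAStack. */
    struct evconnlistener *listener = evconnlistener_new_bind(
        base, accept_cb, NULL,
        LEV_OPT_CLOSE_ON_FREE | LEV_OPT_REUSEABLE, -1,
        (struct sockaddr *)&sin, sizeof(sin));
    if (listener == NULL)
        return 1;

    event_base_dispatch(base);
    return 0;
}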
Conclusion
The current walk-through is a very high-level one; we have omitted a lot of detail to keep it simple. In the next few write-ups, we'll cover some of the other aspects of this walk-through in more detail (the Envoy state machines, the SSL state machine, a bit of the TCP/IP stack, etc.).
Additionally, to achieve this, the Envoy threading model was changed. Each core runs a separate process with its own Envoy instance. This is different from the multi-threaded model vanilla Envoy has.
The breakpoints used in this walk-through are checked into GitHub here. The macros of interest in this file are break_journey_of_a_packet_in, break_journey_of_a_packet_out, and break_journey_of_a_packet_with_ssl.
define break_journey_of_a_packet_in
break ether_input
break ip_input
break tcp_input
break listener_read_cb
break evutil_accept4_
break ff_accept
break Envoy::Network::ListenerImpl::listenCallback
break Envoy::Network::ConnectionImpl::onFileEvent
break Envoy::Network::ConnectionImpl::onReadReady
end
define break_journey_of_a_packet_out
break ether_output
break ip_output
break tcp_output
break Envoy::Network::ConnectionImpl::onFileEvent
break ff_write
end
define break_journey_of_a_packet_with_ssl
break Envoy::Ssl::SslSocket::doRead
break Envoy::Ssl::SslSocket::doWrite
break Envoy::Ssl::SslSocket::doHandshake
break bssl::tls_write_buffer_flush
break BIO_write
break BIO_read
end
Invoking these macros in gdb automatically sets up the relevant breakpoints.
If you have any feedback, please feel free to reach out using the contact form here.