Yet another stack
The need for yet another stack
Having spent a couple of years working with nginx, we took a look at Envoy earlier this year. While nginx was great in terms of speed and its ecosystem of modules, it has a learning curve and takes getting used to.
Eliminating interrupts, locks, and memory copies are well-known techniques for improving the performance of a network packet processing platform. With DPDK drivers, you can poll for packets (eliminating interrupts) and avoid memory copies. However, efficiently polling packets from the NIC isn't sufficient: a network stack that can process these packets in user space is indispensable.
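As a rough illustration of this receive path, here is a minimal sketch of a DPDK busy-poll loop (the port and queue numbers are placeholders): packets are pulled from the NIC in bursts, with no interrupts and no copy into kernel buffers.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* Busy-poll one RX queue: no interrupts are taken, and packets
     * stay in DPDK-managed memory (no copy into kernel buffers). */
    static void rx_loop(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            uint16_t nb = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < nb; i++) {
                /* hand the mbuf to the user-space network stack here */
                rte_pktmbuf_free(bufs[i]); /* placeholder: just drop it */
            }
        }
    }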
Another project that caught our interest was f-stack. It combines DPDK with the FreeBSD network stack, a great combination, and runs nginx on top for L7 processing.
If you subtract nginx from that equation and add Envoy, you get a very useful platform.
Event loops
nginx implements its own event collection mechanism, while Envoy uses libevent. In both cases, the event loop polls for events and processes them.
So a sub-goal of this project was to integrate libevent with what f-stack provides; the next step would be to glue Envoy on top of it.
At a high level, we added the capability for the libevent library to poll events on the dpdk-freebsd stack.
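Conceptually, the combined loop looks something like the sketch below; this is an illustration of the idea, not YAStack's actual libevent changes. f-stack exposes epoll-compatible calls (ff_epoll_create, ff_epoll_ctl, ff_epoll_wait) for its sockets, so one loop iteration can drain events from both stacks with zero timeouts. Here host_epfd and ff_epfd are assumed to have been created and populated elsewhere.

    #include <ff_api.h>
    #include <sys/epoll.h>
    #include <stddef.h>

    #define MAX_EVENTS 64

    /* Assumed to be created elsewhere: one epoll instance on the host
     * stack (epoll_create) and one on the dpdk-freebsd stack
     * (ff_epoll_create). */
    extern int host_epfd;
    extern int ff_epfd;

    /* One iteration of the combined event loop, driven by ff_run().
     * Both stacks are polled with a zero timeout so neither blocks
     * the other. */
    static int loop(void *arg)
    {
        struct epoll_event ev[MAX_EVENTS];
        int i, n;

        n = ff_epoll_wait(ff_epfd, ev, MAX_EVENTS, 0);   /* f-stack sockets */
        for (i = 0; i < n; i++) {
            /* dispatch the callback registered for ev[i] */
        }

        n = epoll_wait(host_epfd, ev, MAX_EVENTS, 0);    /* host sockets */
        for (i = 0; i < n; i++) {
            /* dispatch */
        }
        return 0;
    }

    int main(int argc, char *argv[])
    {
        ff_init(argc, argv);  /* bring up DPDK and the FreeBSD stack */
        ff_run(loop, NULL);   /* f-stack calls loop() on every iteration */
        return 0;
    }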
Flow distribution across cores
Any multi-core system uses some mechanism to distribute packets across cores. This mechanism may be encoded in an FPGA for line-rate performance, and if you ever wrote code for Cavium's simple-executive, it had a similar paradigm. DPDK provides similar functionality using NIC RSS (Receive Side Scaling).
However, hardware RSS needs hardware support. If a developer doesn't have access to such hardware, development gets tricky. Hardware RSS also cannot be tweaked (for instance, to temporarily stop sending packets to a specific core under certain conditions).
We added a switch to perform the RSS function in software: one core picks up the packets and distributes them to the other cores. When deploying on real hardware, hardware RSS can be enabled for superior performance. Software-based flow distribution also works in non-baremetal scenarios like AWS with ENA enabled, which is our primary development environment.
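A minimal sketch of what the software path could look like, assuming DPDK 19.08+ header names and per-worker rings created elsewhere with rte_ring_create (the distributor logic below is illustrative, not YAStack's exact code): the distributor core hashes each packet's IPv4 address pair with DPDK's software Toeplitz hash, so all packets of a flow land on the same worker.

    #include <rte_ethdev.h>
    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>
    #include <rte_thash.h>

    #define NB_WORKERS 4

    /* Assumed to be set up elsewhere: one ring per worker core and a
     * standard 40-byte Toeplitz RSS key. */
    extern struct rte_ring *worker_ring[NB_WORKERS];
    extern uint8_t rss_key[40];

    /* Distributor core: spread flows across workers in software. This
     * is also the natural place for tweaks hardware RSS cannot do,
     * e.g. temporarily skipping a specific core. */
    static void distribute(uint16_t port_id)
    {
        struct rte_mbuf *bufs[32];

        for (;;) {
            uint16_t nb = rte_eth_rx_burst(port_id, 0, bufs, 32);
            for (uint16_t i = 0; i < nb; i++) {
                struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(
                    bufs[i], struct rte_ipv4_hdr *,
                    sizeof(struct rte_ether_hdr));
                uint32_t tuple[2] = { ip->src_addr, ip->dst_addr };
                uint32_t hash = rte_softrss(tuple, 2, rss_key);

                if (rte_ring_enqueue(worker_ring[hash % NB_WORKERS],
                                     bufs[i]) < 0)
                    rte_pktmbuf_free(bufs[i]); /* ring full: drop */
            }
        }
    }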
Dropping Envoy in the mix
The next step was having Envoy call into libevent to poll both stacks. The result was the ability to work with sockets on either stack, which turned out to be really beneficial: for instance, you can still poll Envoy stats over a host socket (and capture information about what the dpdk-freebsd stack is doing).
Two flavors of socket
Our particular implementation defines two flavors of socket (like f-stack). The protobuf interface was modified to accommodate an additional parameter, so during socket creation you can specify whether the socket is a host socket or an f-stack socket.
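For illustration, the flavor switch boils down to something like the sketch below; the enum and helper are hypothetical names, not YAStack's actual interface. f-stack's ff_socket mirrors the BSD socket call.

    #include <ff_api.h>
    #include <sys/socket.h>

    /* Hypothetical flavor flag; in YAStack the real parameter travels
     * through the modified protobuf interface. */
    enum sock_flavor { FLAVOR_HOST, FLAVOR_FSTACK };

    static int create_socket(enum sock_flavor flavor)
    {
        if (flavor == FLAVOR_HOST)
            return socket(AF_INET, SOCK_STREAM, 0);  /* kernel stack */
        return ff_socket(AF_INET, SOCK_STREAM, 0);   /* dpdk-freebsd stack */
    }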
SSL
The SSL library in Envoy (BoringSSL) is also aware of these two socket flavors, so it does the right thing and picks the appropriate stack.
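One way to picture this (purely illustrative; it is not necessarily how YAStack wires it up) is a custom BIO built with the OpenSSL-compatible BIO_meth_* API, routing SSL reads and writes to f-stack's ff_read/ff_write for f-stack sockets. Retry-flag handling is omitted for brevity.

    #include <openssl/bio.h>
    #include <ff_api.h>
    #include <stdint.h>

    /* Illustrative custom BIO that performs SSL I/O on an f-stack
     * socket whose fd is stashed in the BIO data pointer. */
    static int fstack_bio_read(BIO *b, char *buf, int len)
    {
        int fd = (int)(intptr_t)BIO_get_data(b);
        return (int)ff_read(fd, buf, (size_t)len);
    }

    static int fstack_bio_write(BIO *b, const char *buf, int len)
    {
        int fd = (int)(intptr_t)BIO_get_data(b);
        return (int)ff_write(fd, buf, (size_t)len);
    }

    static BIO_METHOD *fstack_bio_method(void)
    {
        BIO_METHOD *m = BIO_meth_new(BIO_TYPE_SOURCE_SINK, "f-stack");
        BIO_meth_set_read(m, fstack_bio_read);
        BIO_meth_set_write(m, fstack_bio_write);
        return m;
    }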
Ease of updating codebase
Our first integration was with Envoy 1.7, and it took us approximately a week or two to upgrade to 1.8. This was possible because the changes to Envoy are minimal and localized. Going forward, it should be easier to pull more features into this codebase.
Performance
While we have not formally benchmarked performance, in the few tests we ran, YAStack was substantially faster than vanilla Envoy.
Driving Multiple Envoys
While YAStack is one mechanism for scaling Envoy, another is to run multiple Envoys and scale horizontally. The Enroute Universal Gateway can be used for such horizontal scaling.
Conclusion
We feel this platform will help build the next generation of NFV appliances that provide superior performance and functionality on commodity hardware. We are open-sourcing this project; details can be found at yastack.io.