The way to implement the buffers that the NIC deals with is a ring buffer: each NIC queue is implemented as a ring buffer. This is not specific to DPDK; it is an efficient way of implementing this buffer so that there is no contention between the producer of a packet and the consumer of a packet. If you recall, when we discussed the Xen hypervisor, we also talked about a ring data structure used in Xen; the same idea is being used in this ring buffer of the NIC. The idea is that this buffer is organized as a circular ring data structure. There is a write pointer, which is the pointer the NIC uses to put a packet into a particular slot in the ring buffer, and there is a read pointer, which is used by the CPU in order to read a packet. Because the write pointer is local to the NIC, the NIC is the only entity that will modify that pointer; and the read pointer is local to the application, so the application is the only one that will modify it. Therefore there is only one entity that can write each pointer, whether it is the read pointer or the write pointer, and that is the reason you don't have to worry about synchronization between these two concurrent activities. You have a way of manipulating these network packets from the application without coordinating with the NIC. Each slot in the ring buffer holds a descriptor for a packet. It is important to understand what a descriptor is: a descriptor is not the place into which DMA is happening. Every one of these descriptors points to a memory buffer. The descriptor contains only a pointer to the actual packet, along with the metadata associated with that packet; the actual packet is stored in a separate buffer.
This is why I said this should be very reminiscent of the Xen ring data structure that we talked about when we discussed paravirtualization. So that is the NIC ring buffer that is used as a way of communicating between the NIC at the bottom and the application on top. Now, expanding on this ring buffer idea: upon packet arrival, the NIC populates the next vacant slot with the packet's descriptor. That is, the NIC DMAs the packet into the memory buffer associated with that descriptor and then advances the write pointer. This happens for every packet as it comes in. One CPU core is running the network function, and that is the one polling the ring for unread slots: the network function uses the read pointer to poll for packets that may have arrived. When unread descriptors are found, indicating that packets have arrived, the CPU reads the packet data for those descriptors and returns the packets to the application. There is no copying involved, because the buffers are allocated in user space, so nothing needs to be copied between the NIC and the application. There is also no need for locking, as I mentioned, because there is a single producer and a single consumer: only the producer modifies the write pointer, and only the consumer modifies the read pointer, so there is no contention and therefore no need for locking. We have decoupled the producer and the consumer. Of course, it could be that this data structure fills up completely with packets that came in.
If that happens and there are no vacant descriptors, no problem: we can simply drop the packet, because at the network level it is a datagram that you are dealing with. When a packet comes in, if you have a place to put it, put it; if not, drop it. Higher-level entities in the protocol stack will deal with retransmissions and so on, so you don't have to worry about that. That is the way the ring buffer is used to communicate between the NIC and the application. Now, the key to reducing overheads is how these buffers are managed. Instead of allocating a buffer for each incoming packet, what DPDK does is pre-allocate multiple buffers at initialization time. The receive queue in the NIC cannot hold more packets than the capacity of the ring buffer, and as I said earlier, if there are no free slots in the ring buffer, no problem: we simply drop the packet and let the higher levels of the protocol stack deal with retransmissions and so on. So the total size of the packet buffers is known up front, and that is the capacity of the ring. As I said, each ring slot points to one particular pre-allocated buffer that holds an incoming packet. When a packet comes in, it is DMA'd into the buffer associated with that particular descriptor, along with any information about the details of the packet that needs to go into the descriptor. DPDK also uses huge pages to maintain large pools of memory. Each huge page is 2 megabytes in size, compared to the traditional page size used by the virtual memory system, which is typically 4 kilobytes. What that means is that using these big pages allows a lot of packets to be contained in one page.
That means there are fewer pages holding a larger number of packets, which results in fewer TLB misses, and that improves performance. So this is another way you can exploit a capability that is there in the hardware to make packet processing more efficient. You already saw that there is no overhead for copying the packet, because the NIC is DMA-ing the packet directly into user-space buffers. TCP/IP protocol processing is done on those buffered packets in place, and of course, if the network function does not require TCP/IP at all, then it need not be performed: some network functions may need TCP/IP processing, and others may not. Essentially, what we have done is reduce the footprint of the code traversal that needs to happen, as well as the amount of copying, the interrupts, and all of that, which were plaguing the implementation of network functions on the Linux kernel, by these techniques made available by DPDK. So the upshot of using DPDK, together with intelligent NICs, for implementing a network function is that all the kernel overheads in packet processing that we alluded to earlier are either completely eliminated or at least mitigated. For example, the huge page size ensures that there will be fewer TLB misses, so you are mitigating some of the ill effects of locality loss by having the huge page sizes. What that results in is a performance-conscious implementation of a virtual network function. Now the developer of a network function can simply concentrate on the functionality of the network function, and all of the nastiness with respect to performance loss due to the Linux kernel getting in the way can be alleviated by delegating to DPDK the handling of these packet processing overheads for any network function that you want to implement.