WEBVTT

00:00.000 --> 00:11.800
So, hi everyone. Today I'm also going to talk about VPP. I'm going to take a more

00:11.800 --> 00:16.560
concrete example. We'll talk about levitation made handy: roll your own Maglev load

00:16.560 --> 00:23.200
balancer with VPP. So, first, a few words about myself. I'm Nathan. I'm a software

00:23.200 --> 00:28.680
engineer working for Cisco. And I'm doing mostly container networking, VPP, and

00:28.680 --> 00:35.020
I also have a bit of a troubled past with Python and JavaScript cloud applications.

00:35.020 --> 00:41.140
So, before we start, a small word of warning. If someone is afraid of

00:41.140 --> 00:50.320
that six-sixer, please be aware. I made the machine in that. Anyway, so let's get

00:50.320 --> 00:58.280
started. What I'm going to talk about is taking the example of a cloud application that

00:58.320 --> 01:02.440
you want to deploy in a typical cloud environment, and for which you want to have

01:02.440 --> 01:07.960
good SLAs. So, typically what you want to do is have redundancy, which typically means

01:07.960 --> 01:11.840
having a load balancer in front of it. And you want that load balancer to have nice

01:11.840 --> 01:19.280
properties like recovering from failures or seamless horizontal scaling. So, one of the issues

01:19.280 --> 01:24.760
with out-of-the-box load balancing is that it comes with limitations. Typically, your hash

01:24.760 --> 01:31.820
reshuffles when the set of backends changes. So, that's why Maglev was actually made. And

01:31.820 --> 01:36.640
Maglev is just a smart way of doing load balancing that offers the nice properties that

01:36.640 --> 01:42.280
we are searching for. Namely, it's supposed to be relatively stable when the set of

01:42.280 --> 01:48.080
backends changes, which is exactly what we want. And also, it has a few things that it's supposed

01:48.080 --> 01:53.760
to work with. It recommends working with direct server return (DSR), so that you don't

01:53.760 --> 01:58.320
have to store state in the load balancer. You just forward, and bypass it for the return.

01:58.320 --> 02:04.240
In that way, you also spare a network hop on the way, which is good for performance. And

02:04.240 --> 02:08.720
one of the things in the paper is that it recommends working with a user space stack. But

02:08.720 --> 02:15.520
I'll come back to that later. So, how does it work? In practice, what you do is you

02:15.520 --> 02:22.240
consider N buckets. And the buckets are going to be an indirection from flows to the application

02:22.240 --> 02:27.120
backends that you deploy. And typically, you want to choose N relatively big, so typically

02:27.120 --> 02:36.840
100 times the maximum number of backends that you want to have. And, sorry, pop-up. So that

02:36.840 --> 02:44.840
you keep your hash space wide enough so that the backends don't collide too much. So,

02:44.840 --> 02:49.040
what you're going to do is, for each of your application backends, you're going to assign

02:49.120 --> 02:55.360
to them a preference list, which is going to be just a permutation of the set 1 to N. And

02:55.360 --> 03:00.960
then each of the backends will receive that list as a preference list of the buckets. And

03:00.960 --> 03:07.200
they will be, with the greedy algorithm on the slide, choosing the buckets by their preference.

03:07.200 --> 03:13.600
So, the reason we do that is to ensure that the bucket-to-backend mappings

03:13.600 --> 03:19.280
will stay relatively stable when the set of backends changes, because each of them is more

03:19.280 --> 03:27.520
or less guaranteed to get its first pick, and then its second pick, and so on. And then when you

03:27.520 --> 03:32.720
have incoming flows, what you do is just regular hashing: you pick a bucket, and that

03:32.720 --> 03:37.600
bucket gives you the backend per the previous mapping. And that's it. That's all you have to

03:37.600 --> 03:46.080
do: your flows are load balanced and life is good. So, to give you a bit more color

03:46.080 --> 03:51.600
on why exactly this helps: if we take an example of two backends, so here we have one green

03:51.600 --> 03:59.280
and one blue. And we have them with their respective preference lists, and we have them choose the

03:59.280 --> 04:06.160
buckets. We load balance a few flows that go into the buckets, and if we consider the addition

04:06.160 --> 04:11.760
of a third one, the red one, what will happen is that obviously that third backend will take

04:11.760 --> 04:17.600
a third of the buckets, so probably a third of the flows. But it will leave the rest of the mapping

04:17.600 --> 04:23.840
mostly untouched because of the preference lists. So, both blue and green, they will keep their

04:23.840 --> 04:30.720
preference lists, and so they will most likely keep their set of buckets stable. And given that

04:30.720 --> 04:36.080
the number of buckets is high, you have a huge likelihood of having the same mapping in the end.
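
NOTE
Editor's sketch, not part of the talk: a minimal Python illustration of the
bucket assignment described above, assuming the offset/skip permutation scheme
from the published Maglev paper. The names stable_hash, build_table,
pick_backend and moved_between_survivors, the BLAKE2 hash and the table size
of 307 are all assumptions made for this example.
```python
import hashlib
#
def stable_hash(value, salt):
    # Stable 64-bit hash. BLAKE2 is an assumption of this sketch, not the talk's choice.
    data = f"{salt}:{value}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
#
def build_table(backends, n_buckets=307):
    # n_buckets should be prime so each (offset, skip) pair walks every bucket once.
    if not backends:
        raise ValueError("need at least one backend")
    offsets = [stable_hash(b, "offset") % n_buckets for b in backends]
    skips = [stable_hash(b, "skip") % (n_buckets - 1) + 1 for b in backends]
    cursor = [0] * len(backends)
    table = [None] * n_buckets
    filled = 0
    while True:
        # Backends take turns; each claims the first free bucket of its preference list.
        for i, name in enumerate(backends):
            while True:
                bucket = (offsets[i] + cursor[i] * skips[i]) % n_buckets
                cursor[i] += 1
                if table[bucket] is None:
                    break
            table[bucket] = name
            filled += 1
            if filled == n_buckets:
                return table
#
def pick_backend(table, flow):
    # Regular hashing of the flow picks a bucket; the bucket names the backend.
    return table[stable_hash(flow, "flow") % len(table)]
#
def moved_between_survivors(before, after, new_backend):
    # Buckets that changed owner without going to the newly added backend.
    return sum(1 for b, a in zip(before, after) if a != b and a != new_backend)
#
two = build_table(["green", "blue"])
three = build_table(["green", "blue", "red"])
print(moved_between_survivors(two, three, "red"), "of", len(two), "buckets moved between green and blue")
```
The printed count is the disruption among the surviving backends when red is added; the sketch makes no claim about its exact value.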

04:36.080 --> 04:40.560
So, that way the flow disruption stays relatively minimal, which is not the case if you

04:40.560 --> 04:48.080
consider a naive load balancer based on just the hash and a modulo of it. So, that's for the

04:48.080 --> 04:56.960
theory, but how do we make this work in practice? So, if we go back to our application and we say

04:56.960 --> 05:05.440
that we deploy it in a cloud provider in VMs and each VM has its own IP address, one thing that we

05:05.440 --> 05:10.320
have to account for is that ingress is typically constrained. So, you have to go through an infra construct

05:10.320 --> 05:16.960
that typically already does a translation of some kind. And at the same time, egress is also

05:16.960 --> 05:24.320
constrained, because as a cloud provider you typically don't want to let just any packet out. So, that will

05:24.320 --> 05:30.480
make it hard to do DSR, because basically, as an application, you cannot issue packets

05:31.520 --> 05:40.000
from an IP that is not assigned to you from the infra perspective. Also, something to note

05:40.000 --> 05:45.120
is that if we want to do direct server return, we will need additional configuration on the

05:45.120 --> 05:50.160
applications themselves, because typically you want encap, and you want to terminate packets to an

05:50.240 --> 05:56.240
IP that is not yours. So, that brings in additional complexity. So, for the sake of simplicity

05:56.240 --> 06:03.200
in this example, I'm considering that instead of DSR, all my instances will

06:03.200 --> 06:10.960
receive packets NATed to the backend IP addresses. So, ironically, one of the

06:10.960 --> 06:16.080
issues is that we now have to keep state around, but nothing forbids us from returning

06:17.040 --> 06:24.160
to the encap if we manage to relax the constraints of the deployment. So, let's consider

06:24.160 --> 06:30.000
our load balancer. What we will do is provision a VM, and we want the packets from the client IP

06:30.000 --> 06:37.200
to the VM's IP to be destination-NATed to the backend

06:37.200 --> 06:43.040
IP that we choose. So, that's the Maglev part. And then source-NATed with the VM's IP so that

06:43.040 --> 06:50.720
the traffic flows back to us and we can undo our work. So, in order to do that, we need an implementation

06:51.440 --> 06:55.680
and, honestly, unfortunately, that's not something that's natively offered by the kernel.
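
NOTE
Editor's sketch, not part of the talk: a minimal Python model of the
destination NAT plus source NAT rewrite described a few cues earlier, to show
why state has to be kept on the load balancer. The class NatSessions, its
method names, the tuple layout and the port range starting at 20000 are all
hypothetical, and this is not how VPP stores sessions.
```python
class NatSessions:
    def __init__(self, lb_ip, pick_backend):
        self.lb_ip = lb_ip
        self.pick_backend = pick_backend  # callable: flow key to backend IP (e.g. a Maglev lookup)
        self.forward = {}  # flow key to (allocated load balancer port, backend IP)
        self.reverse = {}  # (backend IP, allocated port) to flow key
        self.next_port = 20000  # naive allocator, an assumption of this sketch
    def to_backend(self, src_ip, src_port, dst_port):
        # Client packet addressed to the load balancer VM.
        key = (src_ip, src_port, dst_port)
        if key not in self.forward:
            backend = self.pick_backend(key)
            lb_port = self.next_port
            self.next_port += 1
            self.forward[key] = (lb_port, backend)
            self.reverse[(backend, lb_port)] = key
        lb_port, backend = self.forward[key]
        # Destination NAT to the chosen backend, source NAT to the VM's own IP.
        return (self.lb_ip, lb_port, backend, dst_port)
    def to_client(self, src_ip, src_port, dst_port):
        # Return packet from the backend, addressed to the allocated port: undo both rewrites.
        client_ip, client_port, service_port = self.reverse[(src_ip, dst_port)]
        return (self.lb_ip, service_port, client_ip, client_port)
#
nat = NatSessions("10.0.0.1", lambda key: "10.0.0.11")
print(nat.to_backend("198.51.100.7", 40000, 80))
print(nat.to_client("10.0.0.11", 80, 20000))
```
Each tuple reads (source IP, source port, destination IP, destination port); the reverse map is the state that direct server return would have avoided.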

06:56.400 --> 07:03.280
But also, as in the algorithm, there are many important implementation details and also parameters

07:03.280 --> 07:09.600
that you need to input. It's probably not going to be able to be leveraged out of the box.

07:09.760 --> 07:14.880
And also, a final point is that we want this to be quite performant, because we want to handle

07:14.880 --> 07:20.800
the traffic of all the packets on a single VM. So, that sounds like a great use case for user-space

07:20.800 --> 07:26.400
networking technology like VPP, and that should not really come as a surprise, it being in the name of the

07:26.400 --> 07:35.280
talk. So, yeah, let's start VPP then. So, in order to do that, one thing you can do is just

07:35.280 --> 07:41.840
issue that Docker command line. So, you pick a relatively recent release of VPP. You throw in a few

07:41.840 --> 07:47.680
stanzas, basically you pass the PCI ID of the interface, and that's it. The thing is running without

07:47.680 --> 07:56.480
anything more to do. But if you do that, one thing that may happen is that if you have a single

07:56.480 --> 08:03.840
interface, which is the case on most VMs that you spin up for starters, you will lose access

08:03.840 --> 08:09.120
because VPP has it, but not the host anymore. So, ideally, if you want to solve that, you'd add

08:09.120 --> 08:14.800
an extra management interface, but not every deployment has this lying around. So, we are

08:14.800 --> 08:20.560
a bit screwed if we go that way. And something that would be nice is to have a way to somehow

08:20.560 --> 08:28.480
duplicate eth0 inside the host. So, fortunately, that's something you can do: making VPP act as a

08:28.480 --> 08:34.320
bump in the wire, using what we call the punt feature, meaning we create

08:34.320 --> 08:38.880
a tap interface in Linux, give it the exact same configuration as in the host, and just tell

08:38.880 --> 08:46.640
VPP: pass every packet that you don't know what to do with to the host. It's quite ugly from

08:46.640 --> 08:51.040
a network perspective, because you're sharing IPs and you're not supposed to do that. But

08:51.040 --> 08:56.480
one of the benefits is that you're making it transparent for the host, and you manage to keep your

08:56.480 --> 09:04.800
SSH sessions open. So, that is something. So now, if we go back to our

09:04.800 --> 09:10.880
Maglev, sorry, now that we have a VPP instance running, what we need to do is just add the

09:10.880 --> 09:18.240
Maglev magic to it. So, we just leverage one of the nodes available, as was presented

09:18.240 --> 09:24.080
before, that's going to do the fetching of the packet data, looking up the appropriate set of

09:24.080 --> 09:29.440
backends and just rewriting the addresses as we need. So, for that, we'll leverage a plug-in

09:29.440 --> 09:36.080
called CNAT, standing for the demonic child of cloud and network address translation, for lack of a

09:36.080 --> 09:42.240
better name, that already has this kind of logic implemented, because we don't want to

09:42.240 --> 09:50.880
redo all the checksum computations ourselves. And the thing that we still have to do is

09:50.960 --> 09:58.000
make that piece of code aware of the IPs that we want to rewrite the packets with.

09:59.840 --> 10:05.280
So, for that, writing control plane logic in C is quite cumbersome. So, what we'll do is

10:05.280 --> 10:12.720
pull in an agent. So, why choose Go? Because I do that in my spare time.

10:13.360 --> 10:18.800
And that way, what this allows us is to leverage all the tooling of the moment. So,

10:18.800 --> 10:25.040
you can throw in Consul, etcd, just pull whatever from GitHub, access whatever service

10:25.040 --> 10:31.520
discovery mechanism you have available. And then, open a connection to VPP and

10:31.520 --> 10:35.680
talk to it through the binary API. So, basically, you split a bit

10:35.680 --> 10:41.760
the control plane and data plane logic. You have all the complex and event-driven logic in

10:41.840 --> 10:46.240
Golang. And then you cross the binary API and you're just in C, doing data plane.

10:47.280 --> 10:51.760
And that can be done relatively easily. You have an API library available. You just

10:51.760 --> 10:58.480
initiate a connection to VPP. You tell it that you want to Maglev NAT TCP port 80

10:59.360 --> 11:06.000
on your own IP. And then you want to load balance that through the set

11:06.560 --> 11:10.960
of backends that you have available. And that's it. You can now talk

11:10.960 --> 11:19.520
to the backends using the address. So, that said, things are

11:19.520 --> 11:24.160
working, but I think it's still worth asking: did we really gain something?

11:24.880 --> 11:30.240
So, I'd argue doing that brings in a few benefits. One of them is that

11:30.240 --> 11:34.320
we now have full control over the load balancer: how it's implemented, how it's configured,

11:34.960 --> 11:40.960
all that. It's portable, which speaks for deployment uniformity, meaning you

11:40.960 --> 11:45.280
have the same behavior regardless of where you deploy that. If you have multiple cloud providers,

11:45.280 --> 11:49.920
if you're testing locally, you have the exact same behavior. And also, the thing is

11:49.920 --> 11:53.760
relatively lightweight to deploy, meaning you have a container, you can run it pretty much everywhere.

11:53.760 --> 11:59.120
And underneath, if you have DPDK or some driver that allows you to swap the interface,

11:59.200 --> 12:03.840
it functions. And finally, we're getting performance:

12:03.840 --> 12:08.960
on a given VM, we should be mostly I/O bound on most instance sizes, so throughput shouldn't

12:08.960 --> 12:16.080
be an issue; PPS-wise, we should be way over the capacity of the instance. And that last one actually

12:16.080 --> 12:23.920
raises a question: if we are I/O bound on the VM, that means that we don't use all the

12:24.160 --> 12:29.120
VM. So we have spare CPUs that we don't know what to do with. And also, before, we were

12:29.120 --> 12:35.680
talking about encaps, which means that ideally we'd need to do more

12:36.240 --> 12:41.440
network processing on the backends themselves. So that actually means that one thing that we could

12:41.440 --> 12:48.640
do is collapse the whole infrastructure and do everything in the same place, meaning having the backends

12:48.720 --> 12:56.800
running directly in the VM where we do Maglev. And so, if you collapse

12:56.800 --> 13:02.880
the whole thing, what you actually end up doing is some kind of container networking.

13:02.880 --> 13:08.320
This actually leads us towards a solution that's more or less like what Kubernetes

13:08.320 --> 13:17.040
does. And so that's actually why this first project led us to envision something

13:17.040 --> 13:24.000
like Calico/VPP, which is doing container-to-container networking in VPP itself,

13:24.000 --> 13:29.440
leveraging the control plane that Kubernetes provides, and actually having the best

13:29.440 --> 13:37.680
of both worlds with Maglev support. And that's it. Thanks a lot for listening. I hope you enjoyed this

13:37.680 --> 13:48.240
talk, and there's just time for two questions if you have any.