enarx
Nathaniel
Okay, hearing none, let's dive into the code in the main repo. So we're now looking at the Enarx repo, and there's a lot of stuff. This is by far our largest repo; we're now up to 458 stars, yay on everybody for that, and we have some 1300-odd commits in this repo. There's quite a bit of code here. I'm going to go through the main code; I'm not going to talk about tests, because I didn't build that framework. Maybe we could get Harald to talk about that a little more at another time. But let's start with the source code. If we look in the source directory, this is all the code that runs on the host side. This includes all of our backends, so this is the code for, for example, how to set up a Keep for a specific architecture. We'll come back to what this looks like in a minute. It also contains other stuff: for SGX, we need protobuf to be able to talk to the AESM daemon in order to get an attestation report. We also have the workload stuff here, which we should probably rework a little, but that's low on the priority list. All of this is just code that runs on the host side; we'll walk through how an actual Keep gets set up in a moment. The main file itself is pretty small; it basically just parses subcommands. The most interesting subcommands are things like the enarx info command, which prints out information about your system and is defined under cli/info. Here's where we actually do this: we collect a bunch of Datum objects and print those out, or emit them as JSON. There are other commands here as well. Another interesting one, in the case of SEV for example, is the vcek command, which will download and dump out the full certificate chain for that platform. All of the different subcommands just live under the cli directory. But again, this is all host code. Now, we have a particular problem in Enarx, and that particular problem is not well solved by cargo. The problem is that we need to build composable components for the code and data that gets loaded into the Keep, and we need to be able to build these for different target platforms. For example, we want to build the exec layer as a static Linux binary linked against musl. On the other hand, we want to build the shim using x86 instructions, but not linked against any system libraries and not using any syscalls, because it's effectively a kernel. This presents unique challenges. We would ideally like to specify these as dependencies in cargo, have cargo build them for their appropriate targets, and produce binary files saved in a local directory; then we could do whatever embedding we need. But cargo doesn't support that today. However, we have contracted with somebody to actually implement this feature in cargo, and it's currently under review. Once cargo gets this feature, we will be able to undo the hackery that I'm about to explain, and I apologize if I cause your eyes to bleed. But yeah, we are working on fixing this situation. So again, the problem is that we have to build a variety of binary components for different architectures, and cargo doesn't currently support this. The way we work around it is with a build script, and that's this build.rs. Unfortunately, there's quite a bit of stuff in this build.rs.
Almost all of it goes away. We still have to generate protobuf bindings, but that's pretty standard usage of build scripts. I wish we didn't have to do protobuf at all; maybe there's even a way around that. Well, no: I wish we didn't have to do that protobuf generation at build time, and there might be another way to do it. Perhaps we generate it into a crate and upload that crate, something like that; there could be a solution in that direction. We also have to do some build script stuff for the test framework, particularly because we have a variety of tests. I said I wouldn't go into the test framework, but we have a variety of tests written in C, which are the C tests here. These exist because it was a little easier, at least at the time, to create tests for various syscalls that did C-like things in the C language. We compile and run these tests. I think we're actually going to dump this C testing infrastructure eventually and instead just test the end-user experience; that may be the way to reduce our build script stuff. But the important thing about the build script is not the tests or the protobuf. The important thing is that when you run a cargo build on the main Enarx repo, the build.rs first goes into this internal directory and manually builds each of these crates for its target platform. So for example, shim-sev is built for x86 with no linkage to system stuff, shim-sgx the same, and then wasmldr, the exec layer, is actually compiled for musl. Once we have cargo binary artifacts, these separate crates can get split out of here and perhaps live in their own repos; that's what some of the other repos were, actually. We wanted to do that before, but we couldn't make it work. Eventually we will solve this problem, but today the problem exists as it is. Let's talk about how we actually bring up a Keep.
It's because we bundle them. We also do other hackery. So let's look at the build.rs; I'm going to search for "tml". If cargo detects that you have a Cargo.toml file, it says: whoa, you're trying to sub-bundle crates, we don't do that, you need to publish those crates independently. So what we actually do is hack around that by renaming the manifest to Cargo.tml, and this is what allows us to embed these internal crates in the Enarx distribution on crates.io. It's a really horrible hack, and I really wish it didn't exist this way. But again, it exists because cargo doesn't provide the functionality we need in order to deliver this.
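As a rough sketch of what that workaround looks like (the paths, target names, and the exact Cargo.tml dance here are assumptions; the real build.rs does more than this):

```rust
// build.rs: a sketch of the workaround described above (assumed paths
// and targets; not the repo's actual script).
use std::{fs, process::Command};

fn main() {
    for (krate, target) in [
        ("internal/wasmldr", "x86_64-unknown-linux-musl"),
        ("internal/shim-sgx", "x86_64-unknown-none"),
    ] {
        // crates.io forbids nested Cargo.toml files, so the manifest is
        // shipped as Cargo.tml and restored here before building.
        fs::copy(
            format!("{}/Cargo.tml", krate),
            format!("{}/Cargo.toml", krate),
        )
        .expect("restore manifest");

        // Build the inner crate for its own target, separately from the
        // host-side build that cargo is already running.
        let status = Command::new("cargo")
            .args(["build", "--release", "--target", target])
            .current_dir(krate)
            .status()
            .expect("run cargo");
        assert!(status.success(), "building {} failed", krate);
    }
}
```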
Rust has a feature called include_bytes!. You basically specify a file on disk and say: I want to include these bytes as a variable. And then Rust just does all the magic for you. We don't even have to find them in special sections; they're just available to us as constants.
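A minimal sketch of what that embedding looks like (the paths here are hypothetical, standing in for wherever the build script put the compiled components):

```rust
// Hypothetical paths: wherever build.rs placed the compiled components.
const SHIM_SGX: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/shim-sgx"));
const WASMLDR: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/wasmldr"));

fn main() {
    // The bytes are ordinary constants; no special sections involved.
    println!("shim: {} bytes, exec layer: {} bytes", SHIM_SGX.len(), WASMLDR.len());
}
```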
Yeah, one thing you might think is: hey, wait a minute, does that mean you're embedding two or three binaries inside of the Enarx binary? The answer is yes. And then you immediately think: oh, what about all the memory that's wasted by that? You're right, there is some memory wasted. It would be better if they were on disk or something else. However, it's not that much of a waste, because the Linux kernel already deduplicates executables. When you run a binary, it loads that binary into memory by mmapping it, and when you run that same exact executable a second time, you don't get a second copy of it in physical memory; you just get an additional mapping of it. So Linux is already deduplicating. This means that in an actual deployment on a server, you're going to have all three of these memory-resident once for the whole server, and frankly, that's just not too bad. The shims themselves are under half a meg; I think they're under 300k, actually, something like that, so pretty tiny. And wasmldr is larger, I think around 10 megs. But again, it's only one copy in memory, so that's not too terribly horrible.
I mean, if you're running on bare metal, you're going to boot a normal bootloader, that's going to boot a normal kernel, then you get a normal user space, and then you do enarx run, and then it works. All of the special stuff is what's happening inside of our process. That's where all the secret sauce is, although it's not too secret, because it's all open source.
Musl libc? That's a great question. So libc is the standard library for the C language, okay. But it also has more history than just being a standard library for the C language. If you look at other standard libraries, like the standard library for C++, or the standard libraries for Rust or Java, these are typically things that were distributed by the language or by the compiler. They sort of lived on top of the operating system; they would use some operating system functions, but they were essentially independent entities. Libc is different in one very important regard, which is that Unix and the C language are very, very tightly intertwined, historically. The C language was basically built in order to create Unix. And so libc is not just the C standard library; it's also basically the system library for the whole platform. It does a whole bunch of stuff that's more than just a standard library, and it's very tightly wound with the rest of the operating system. Whereas a standard library is typically delivered with the compiler or with the Java runtime or whatever, libc is almost always delivered by the operating system rather than by the compiler. So NetBSD, FreeBSD, Linux, macOS: all of these have their own libc implementations, and they all provide similar functionality, but they're all, again, very tightly integrated into the operating system. The default libc on most Linux distributions is GNU libc, made by the GNU project, and it works well for the majority of cases. However, it's really not intended to be used as a static library, so if you attempt static compilation, a ton of things actually break. It's also designed to be integrated with a bunch of other pluggable mechanisms. For example, when you attempt to look up a hostname, GNU libc will also check things like: is there an LDAP server configured on this system? Are there entries in /etc/hosts? It tries to do all these other smart things in order to be more integrated. Well, the problem with that is we actually don't want all that functionality inside of our Keep. We also need everything to be statically linked, which GNU libc is not intended for. So there's another libc implementation, musl libc. Musl is very commonly used by embedded applications and by people who want to build static binaries, and it's not as tightly integrated with the operating system. A classic example: if people are familiar with the very tiny Linux distribution called Alpine, Alpine actually uses musl as the default libc for the entire operating system, and that's one of the things that allows them to produce very small binaries. So the reason we build against musl libc for the binary that ends up inside the Keep is that we want something very small and self-contained that doesn't have all these pluggable features trying to do fancy things. We want something that's really tight and simple and small. It does make it faster, at least in some cases.
Although speed is a very complex topic, especially when it comes to libc: one may be slightly faster at one thing in one case, and another may be slightly faster at another thing in another case. But it's faster in the sense that you don't have all of these additional features, so you're doing less, which is definitely faster.
So, let's now dig in. Basically, the way we create a Keep is: we're going to do an enarx run now, and we're going to build up this Keep. The way we compose that Keep is we take the exec layer, which under internal is called wasmldr (we maybe want to rename that to exec or something, to make it a little more straightforward). The exec layer is the bit we actually really care about; it's sort of our Enarx runtime. But we also need additional stuff inside of the Keep in order to make a standard Linux binary run: we basically emulate a Linux-like environment inside of the Keep, and that's the purpose of the shims. Each shim has a way to emulate something that looks like a Linux operating system inside of a Keep. Whether we're on SEV or on SGX, we're running the same exec layer, the same wasmldr; the only thing that differs is the shims. This means that when we do an enarx run, we need to compose a Keep using one of the shims and the wasmldr. So we need to load these binaries, parse their ELF contents, load the pages into memory in the appropriate places for the Keep, and then do whatever instructions need to happen to get the Keep launched. This is where backends come in, and we're going to go look at one of the backends right now; we'll look at the SGX backend first. But we have some types here: a backend defines a variety of types. There's a config type, and the config type basically comes from the shim. If you remember, when we looked at the notes, we can put additional metadata in the shim binary; well, we sometimes want to extract that information at runtime, and that's roughly what the config does. It returns a type called flags. These are the additional flags used for a particular binary; in the case of SGX, this includes a flag called the TCS flag, which tells us that a particular page is a TCS page. So basically, the first step is we load a shim and parse the metadata to find out any flags in there that we need to know about. The second phase, we might say, is that we pass each section in the ELF binary to the mapper, and the job of the mapper is, using the information from the configuration (you'll notice the config flags come in as the with parameter there), to take each of the sections from the binary and load it into the enclave. Each backend has an implementation of the mapper trait, and it receives the pages from the binary. This is what guarantees that the loading semantics for all the different backends are the same, because the actual parsing of the ELF binaries is done once, globally, and then handed to each backend as sections. There's a mapper and a loader, and these are both typically implemented on the same type. The loader trait, however, has a blanket implementation, and that means the semantics of loading an entire binary are the same for all the different types of mappers. And then finally, we have the backend trait, which groups all this stuff together into a coherent object for us to reason about. So when we actually search for a backend, we have the name of the backend, and we have the shim that is matched to that backend; note that we have the raw bytes for it, and that's what gets returned.
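As a rough sketch of how those pieces relate (the actual signatures in the repo differ; this only shows the shape):

```rust
// Shape sketch only; not the repo's real traits.
pub trait Config: Sized {
    type Flags;

    // Turn an ELF segment's raw flags (plus any custom note metadata,
    // like the SGX TCS flag) into backend-specific flags.
    fn flags(flags: u32) -> Self::Flags;

    // Build the config itself from the metadata in the binaries' notes.
    fn new(shim_notes: &[u8]) -> Self;
}

pub trait Mapper {
    type Config: Config;

    // Receive one pre-parsed segment and place its pages into the Keep,
    // honoring the backend flags passed in as `with`.
    fn map(&mut self, pages: &[u8], to: usize, with: <Self::Config as Config>::Flags);
}

// The loading semantics are shared: ELF parsing happens once, globally,
// and segments are handed to each backend's map(). A default method
// stands in here for the blanket implementation in the repo.
pub trait Loader: Mapper {
    fn load(&mut self, _shim: &[u8], _exec: &[u8]) {
        // parse the ELFs once, then for each segment: self.map(...)
    }
}
```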
And then we have data; this allows us to collect a set of Datum objects about the platform. This is what's displayed by enarx info: it's this data. Then we have the keep method, and the keep method creates a Keep instance. It returns, as you can see, a reference-counted dynamic Keep object. So this is how we basically start creating a Keep. When the runtime is ready to create a Keep, it hands the raw bytes of the shim and the exec layer to this keep method, which does the whole setup of the Keep and returns a Keep object. This sets up the Keep, but it doesn't start executing it yet. We start executing the Keep by calling methods on the Keep object, which are in this file: spawn creates a new thread, and enter is where we enter on that thread. Currently, we only support one thread, so if you actually try to call spawn more than once, you're going to get an error. But eventually we will support multiple threads, and then you'll be able to call spawn on the Keep multiple times to create multiple threads. We also have this hash function. You'll notice that hash has basically the same signature as the keep function. This is intentional, because this function is what allows us to calculate the hash for a particular Keep without even instantiating the Keep; we run the same hashing algorithms that are used during Keep creation. The idea is that the hash from the hash function should match the hash that comes out of the attestation message from the Keep, right.
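And the backend trait itself, again as a shape sketch with stub types (not the repo's real signatures):

```rust
// Shape sketch only; stub types so it stands alone.
use std::sync::Arc;

pub struct Datum; // one fact about the platform (detailed below)

pub trait Keep {
    fn spawn(&self) -> Result<Box<dyn KeepThread>, String>; // one thread today
}
pub trait KeepThread {}

pub trait Backend {
    fn name(&self) -> &'static str;

    // The raw bytes of the shim matched to this backend.
    fn shim(&self) -> &'static [u8];

    // The Datum objects shown by `enarx info`.
    fn data(&self) -> Vec<Datum>;

    // Is this backend usable on the current platform?
    fn have(&self) -> bool;

    // Set up (but do not start) a Keep from the shim and exec bytes.
    fn keep(&self, shim: &[u8], exec: &[u8]) -> Result<Arc<dyn Keep>, String>;

    // Same signature shape as keep(): compute the measurement a Keep
    // built from these bytes would have, without instantiating one.
    fn hash(&self, shim: &[u8], exec: &[u8]) -> Result<Vec<u8>, String>;
}
```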
Nicolas
So instead of having separate functions for keep and hash, why not just have the hash function be the one thing, and then to actually make the Keep you'd first call hash and then keep, or whatever?
Nathaniel
No, because there's not really any value in doing it that way. The code path is a little more complicated than that, and it's not exactly straightforward. You also don't always have to calculate the hash as part of Keep creation, because the hardware is doing it for you: when you're actually creating a Keep, the hardware is the thing that's doing the hash. When you're not creating a Keep and you just want to validate one, that's what the hash function is for. The last method here is have, and it just returns whether or not this particular backend is supported on the platform that you're running on. I won't go into a lot of detail about the Datum objects.
Yep, yeah. So a Datum is one particular fact about a system, and we're checking whether or not the system is configured correctly for that one particular item. Each Datum has a name that describes what the item is. We have pass, which is a boolean, and that describes whether or not the system supports that feature. So for example, one of these might be the CPU manufacturer, right? Intel SGX requires an Intel CPU. So the name for this might be "Intel CPU", and pass is true if it's an Intel CPU and false if it's not.
Next is info. This is short additional information to display to the user. If you've ever run enarx info, it shows you that additional information along with a checkbox for whether the item passed or failed. And the final thing is the message, which is a longer explanatory message on how to resolve the problem. So, one of the configuration parameters for SEV is mlock, and if mlock is not set to a high enough value, we give an error with a short description; but in the long message we tell the user: here's how you fix the problem, go change this setting, and then it will work on your system.
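Putting that together, the Datum shape as just described (field names assumed from the description):

```rust
// One checkable fact about the host platform.
pub struct Datum {
    pub name: String,          // what the item is, e.g. "Intel CPU"
    pub pass: bool,            // does the platform satisfy it?
    pub info: Option<String>,  // short detail printed next to the checkbox
    pub mesg: Option<String>,  // longer hint on how to fix a failure
}

fn main() {
    // e.g. the SGX backend checking the CPU manufacturer:
    let datum = Datum {
        name: "Intel CPU".into(),
        pass: true, // false on a non-Intel CPU
        info: Some("GenuineIntel".into()),
        mesg: None, // nothing to fix when it passes
    };
    println!("[{}] {}", if datum.pass { "ok" } else { "FAIL" }, datum.name);
}
```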
Okay, so let's talk about a Keep. Once we have a Keep object, it means we've instantiated a Keep: it's in memory, it's encrypted, but we're not actually running in that Keep yet. In order to run in that Keep, we have to spawn a thread, and that's what we do with this spawn method on line 97. Once we've spawned a thread (we may spawn multiple threads in the future; today it's only one), we call the enter method every time we want to enter the Keep. Now, when we enter a Keep on a given thread, the Keep may at times suspend execution and ask the host to do something, and it does so by returning this command object. What's actually happening inside this enter method is that we enter the Keep, the Keep does a bunch of stuff, and then it sends a message out on the sallyport. The backend takes the message off of the sallyport, converts it into this command object, and returns it. So we see here, for example, that one command is a syscall. Another one is CPUID. Another one is GDB, for debugging. And then the last one is continue. Continue is a useful primitive, because it allows us to suspend our existing execution without actually doing anything, and then just re-enter. It's essentially a no-op.
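That command set, roughly sketched (simplified; in the repo the variants carry references into the sallyport block):

```rust
// What a Keep asks the host to do when it suspends execution.
pub enum Command {
    SysCall,  // the Keep wants the host to perform a syscall for it
    CpuId,    // the Keep wants the host to run CPUID for it (SGX)
    Gdb,      // debugging traffic
    Continue, // no-op: nothing to do, just re-enter the Keep
}
```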
And so then finally, we have this backend structure. And this is just a list of all the backends that we have, hopefully, this will grow over time. And now we can look at a particular backends implementation. So we'll look at SGX. And there's a bunch of code in here. So let's start with config. If you remember, the first thing we do is we get the configuration from the object. There's a an associated type called flags, which is which can be any format. But what it does is it parses the flags from the elf binary, including any custom flags that are set in there, and then returns the data object that is related to those flags. So in the case of for example, SEV. Actually, let's look at KVM. That's a better example. In the case of config, there are no custom flags for KVM. So we literally take in the flags parameter, and we just return exactly the same thing. And we're done. And that's because there's no there's nothing custom about that. But in the case of SGX, on the other hand, when we have to check all the normal flags, which are read, write and execute. Then there's also this GDB feature here. But we also have a custom flag in the case of SGX. And that custom flag is either unmeasured, or there's a TCS page. And so we can we can set those, and instead of returning the u32, we actually return a second fo type, and this is one of those Intel SGX types. So we've now created a custom SGX type from the flags, as well as the bool. The bool is whether or not that page is supposed to be measured or unmeasured. Whether we're going to hash that particular page, the contents of the page, we always hash, whether a page is added or not. But we don't always hash the contents of that page. And this allows for like, if you wanted to allocate you know, gigabyte of memory, you could throw a gigabyte of memory without running cryptographic hashes over it. Because it's all going to just be zeros, for example. And then, the last thing we do here is we create a new config from our binaries, the shim in the Exec. And you'll notice that we do this by parsing the notes out of the binary. So we find out all of the features that we need from that binary. And we can return finally a config object. And so this config object contains all of the parameters that are used to launch the SGX enclave, as well as the number of SSA areas, or SSA pages, which is what SSAP stands for, and then the size of the enclave as well. So now that we have this config object, we can actually instantiate a Keep using the builder object. And the way that we do this is three steps. So the first thing we do is we convert a config object in through a builder object, and this is what does the initial instantiation of the Keep. So here's a use mmarinus: for somebody who asked where it was used, here's one example of it being used. What we do first, is we allocate memory that is twice the size of the expected enclave size. And the reason for this is that enclaves must be naturally aligned, meaning that the location and memory must be equally divisible by the size of the allocation. So we allocate twice the size, because that guarantees us that somewhere within in that region is a naturally aligned the block of memory. And then we split that in this naturally aligned mapping section. So we split, we hack off the front and the back until we have just a naturally aligned section left. And then finally, we open the device. So we open dev SGX enclave, and we issue the iocuddle. And here's an example of iocuddle in operation. 
So we issue the ioctl to actually create the enclave on that device node, and the end result is that we now have a builder object. If you remember, the builder object implements the mapper and loader traits, so it can receive the actual pages that are being loaded. At this point, the main code that's doing the enarx run has a builder object and can add pages to the enclave, and that's what this map function is. Every time one of the binaries has a page that needs to be loaded into the enclave, this function gets called. The first thing we do is nothing: we return early if the set of pages is empty, because adding an empty section actually perturbs the measurement, so you have to avoid it. Other than that, we just call the add-pages ioctl; there's iocuddle again. Then we update the hasher, because in this case we are both having the hardware calculate the hash and calculating the hash ourselves in parallel to the hardware. We do that because we're going to create a signature on the enclave at the end. Finally, we save the address of where we're pushing this page to. The reason is that we have to update the page mappings at the end of all this, so we're just recording it for now. We also have to keep track of all the TCS pages because, if you remember, we had that spawn method where we could create a new thread. In SGX, you need one TCS page for each thread, so the maximum number of threads you can have is the maximum number of TCS pages. The way it works internally, which we'll see in a moment, is that every time you call spawn, it pops off one of those TCS pages and uses it for the thread, and when the thread dies, the TCS page returns to the pool. So, the next thing, now that we've loaded all the pages into memory from the builder, is to convert the builder into an actual Keep object, and this is where we do the final instantiation of the Keep. The first thing we do (remember, we've been keeping this hash of all the pages we've been adding) is finalize that hash, generate a random key, and actually generate a signature with that random key. Then we initialize the enclave, because as I said before, SGX still requires all enclaves to be signed, but it no longer cares about which key actually signs them. Finally, we do this updating of permissions. I hinted at this before, that we were saving page addresses in order to update the mappings; this is where we update them. Once all the mappings are done, we finally return the Keep object. The Keep object contains only two things: the mmap region that we created for the enclave, and the TCS pages, which I believe is only one in this case, but in the future we would make this a vector or something and store more than one. That's it; that's all the Keep contains. In the future, for SGX2, we will also keep the device node open, because under SGX2 we can, after initializing the enclave, add and remove pages dynamically with the cooperation of the enclave; but we need to keep the device node open in order to do that. So that will have to get added here at some point. So now we've built a Keep, and we have a Keep actually running. The Keep type is here, I believe. Yeah, this is the actual Keep structure.
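That TCS-page bookkeeping, pop a page on spawn and push it back when the thread dies, looks roughly like this (illustrative types, not the repo's actual ones):

```rust
// Sketch of the TCS-page pool behavior described above.
use std::sync::{Arc, Mutex};

pub struct Keep {
    tcs: Mutex<Vec<usize>>, // addresses of the enclave's TCS pages
}

pub struct Thread {
    keep: Arc<Keep>,
    tcs: usize, // this thread's TCS page
}

impl Keep {
    pub fn spawn(self: &Arc<Self>) -> Option<Thread> {
        // One TCS page per thread: no page left means no more threads.
        let tcs = self.tcs.lock().unwrap().pop()?;
        Some(Thread { keep: Arc::clone(self), tcs })
    }
}

impl Drop for Thread {
    fn drop(&mut self) {
        // The thread died: return its TCS page to the pool, so spawn()
        // can succeed again later.
        self.keep.tcs.lock().unwrap().push(self.tcs);
    }
}
```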
The Keep structure, you'll notice, really has no implementation; it doesn't really have any methods, and that's just because it's holding the state for this thing. The really interesting thing is the spawn, when we spawn a thread, which we do here; this is the only real method on a Keep. You'll see we're using the vDSO lookup we talked about in the previous crate: we look up in the vDSO this function called __vdso_sgx_enter_enclave, and once we have that vDSO symbol, we keep track of it; we'll use it later for entering the enclave. We also make sure that we have the TCS page and that we've taken ownership of it inside the thread. You'll notice that in the drop implementation for the thread, so when the thread is destroyed, we actually push the TCS page back so that you can call spawn again and recreate a thread. Once that's done, we store all the state for the thread. This includes a reference to the Keep, which we'll use for pushing the TCS page back; the vDSO symbol we looked up; the TCS page location; a block, which is a set of registers (that's part of how we implement SGX, we basically preserve some registers); and the current state save area index, which starts at zero. Every time we do an asynchronous exit this gets incremented by one, and it's decremented when we do an ERESUME. Then how is what caused the last exit from the enclave, so we keep track of that state too. And then, if GDB is enabled, we have this TCP stream that we use for GDB support. So now that we have a thread for the Keep, we call the enter method on it. The goal is: we're going to enter the Keep and do as much execution as we can, until some message comes out over the sallyport and gives us a command to execute. The way we do this is we initiate this run object, we store the TCS page in it, and we store how we last exited the enclave. And then here, this big chunk of assembly, is where we actually enter the enclave. We basically preserve some register state and call the vDSO function to enter the enclave. This then runs until the enclave decides to exit and suspend execution, and when it does, we evaluate how it exited. So for example, if the last entry was an EENTER or ERESUME and we exited because of an invalid opcode, then the next time we enter, we should do an EENTER. And the same for the page fault case; that's for GDB. For a normal EEXIT, we should do an ERESUME next. This is the bit of code where we keep track of the current state save area, so that we know how deeply we're nested. And then finally, the last bit of logic is how to handle the messages in the sallyport. When the CSSA is greater than one (there's also a note here, which you should read if you really care about this logic), we basically get the number (this is the old sallyport version, by the way; the new one will have to be updated) and find out what kind of command we're going to execute. If it's a CPUID command, we just return the CPUID command to the outer main loop. If it's GDB, we do this GDB command. If it's the get_attestation call, we have to do our whole attestation logic, which is a separate thing. So that's essentially an internal command, right?
We don't actually forward this upwards to the main loop; we handle it internally. And you see that we return continue here as the no-op; that's one place where it's used. And then last, if it's none of those things, it's a syscall, and we ask the host to do that syscall. That's basically the entirety of the SGX backend. I haven't touched on attestation, which is this file here; thanks to Jarkko for pushing this over the line this week. And that's more or less it: there's the hasher, which does the hashing of stuff, and here's our iocuddle file for the SGX-related ioctls. But that's more or less it. Now that you've peeked inside SGX, let's come back to the broader module, and you'll see some shared code that we have here. This is code for parsing the binaries: making sure they're sane, that they're the right architecture, and all that kind of stuff. We have some utility functions for finding segments, finding ranges and headers, and parsing the notes that are in the binaries. The more interesting bit of code is this bit right here on the screen, which is why I actually went into this file. This is effectively the outer main loop; we actually have an outer main loop and an inner main loop. The inner main loop is what's happening in that thread enter command, and the outer main loop is this one. This is what actually gets called when you do an enarx run. The first thing it does is parse both the shim and the exec layer as binaries. Then it finds the offset for loading the code within the Keep and checks the bounds of the executable: the shim defines a region of memory that the exec layer gets injected into, and we make sure all of that is sane. We check the sallyport compatibility, to make sure there's not a mismatched version of sallyport for some reason. Then we parse the config and create a builder. With the builder, we collect an array of all the final segment locations, and we do an initial relocation here. Finally, the last sanity check is to ensure no segments overlap in memory; it's actually a violation of the ELF spec if they do overlap, but we're just being defensive here, in case someone, for some reason, hands us a binary that's malicious in some way. So: none of the segments overlap. Now that we've basically proven everything is sane, this is how we finish loading: for each segment in the binary, we map that segment into memory, and then we call the map function on the builder. And when we're finally done, that last try_into on the loader is what finishes initializing the enclave.
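Condensed, the outer main loop has roughly this shape; a self-contained sketch with minimal stand-in traits (all names approximate, bodies elided):

```rust
// Not the repo's real code: a condensed sketch of the outer main loop.
pub enum Command { SysCall, CpuId, Gdb, Continue }

pub trait Thread {
    fn enter(&self) -> Result<Command, String>; // runs the inner main loop
}

pub trait Keep {
    fn spawn(&self) -> Result<Box<dyn Thread>, String>; // one thread today
}

pub trait Backend {
    // parse the ELFs, map every segment, finalize the Keep
    fn keep(&self, shim: &[u8], exec: &[u8]) -> Result<Box<dyn Keep>, String>;
}

fn enarx_run(backend: &dyn Backend, shim: &[u8], exec: &[u8]) -> Result<(), String> {
    let keep = backend.keep(shim, exec)?;
    let thread = keep.spawn()?;
    loop {
        // Out here we only see the commands the Keep sends over the sallyport.
        match thread.enter()? {
            Command::SysCall => { /* perform the syscall for the Keep */ }
            Command::CpuId => { /* execute CPUID, write the result back */ }
            Command::Gdb => { /* service the debugger connection */ }
            Command::Continue => {} // nothing to do; just re-enter
        }
    }
}
```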
We are looking at the wasmldr and the shim binaries. The two are combined to create the Keep, and we need to make sure there's no funny business happening between them. The shim defines all of its own normal binary stuff, and then it also has a region dedicated for the exec code to sit within; that's defined in the shim binary. So when we talk about doing the relocation, what we're actually relocating is the exec binary, into the slot that was defined in memory by the shim. And then we do one final pass of validation to make sure there's no funny business before we actually create the Keep.
Yeah, because someone could have done binary manipulation and edited something out, or tried to edit something else in. We're probably being way over-paranoid here, but it's better to be a little over-paranoid than to end up in a really difficult-to-debug situation later, right? Any sort of funny business, combined with the fact that everything you're doing is encrypted and you can't actually read the bytes, makes debugging this a natural disaster of epic proportions.
Most of this would arise from a mistake, right? Like, what if we introduced some bug in the build process of the shim and did something wrong by mistake? This code would hopefully catch that and warn us. So no, it's not going to stop a dedicated attacker. What stops a dedicated attacker is the attestation report at the end, which will differ in some regard from what we expect. But this is an early way to detect those problems and see them up front, for somebody who's not truly being malicious and investing lots of resources in it.
Usually nothing. If there were a bug, either in the code that we're using during compilation or in the compiler itself, it could produce bad code for some reason. Again, I don't think we'll ever hit this check. But it's better to be safe than sorry.
I'll do the quick walkthrough on KVM; I won't take more than a few minutes. It's really the same thing. The config object: the only thing we really load for KVM is nothing; we don't have any config out of that. We do validate that the shim contains only one sallyport PT_LOAD segment, so we do one sanity check there, but there's no config. Then, during the building phase, we create a new VM using the KVM APIs, and we load all of the pages into the VM's address space. Once that's basically done, we just instantiate the VM. Spawn, which is here, literally just creates a new thread using the KVM API, and then we can enter it and go. There are two special commands in the case of KVM: balloon, to get more memory (which we can't currently do on SGX, because we don't support SGX2, but when we do, we'll have something similar there), and meminfo, which is a way to gather information about memory. That one is going to go away with sallyport version two; we've fixed this problem and no longer need it. So then you've got a main loop that's similar, right? We enter into the VM; eventually the VM will exit; when it does, there will be messages in the sallyport; we parse those messages, and then we return to the outer main loop. If there are any questions?
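Those two KVM-only requests, sketched (names approximate):

```rust
// Sketch: the KVM backend extends the common command set with two of
// its own; these are serviced by the host's KVM handling rather than
// being forwarded to the outer main loop.
pub enum KvmExtra {
    Balloon, // guest asks the host to map in more memory
    MemInfo, // guest asks about memory layout; obsolete with sallyport v2
}
```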
And then SEV is really yet again, just exactly the same stuff as KVM. But it's, it's done slightly differently because of the memory encryption. So that's pretty much it.
Do you want to walk through shims? Does anybody think that would be helpful? Okay, so I'll go through shim-sgx. Shim-sgx is divided into two components. There's a library, which contains most of the code, and the reason it's separate is that we can run unit tests on the library; we can't run unit tests on the binary, because it's cross-compiled for a different target. One thing to note about the shims, which is different from other crates, is that they have linker scripts, which is what you see here. This is because we need to lay out these binaries in a very specific way, with custom sections and things like that. We drop a bunch of sections we don't care about, and we have custom sections: for example, we set up a default stack, and here's a TCS page that we set up that's custom. Here's the state save area. So yeah, there's additional stuff in the linker script for that particular backend, and we also mark that this particular section contains TCS pages; that's how the loader can keep track of them, because we have that extra custom flag there. That's pretty much it for the linker script. The main file is the binary, and there's not much here, because this is all just basically assembly entry code. It's really all just this _start function, and all the other functions are helpers for it. This is the code that runs the moment we enter the enclave. The TCS page (and I'll go back and show that to you): you see here, in the TCS page at a particular offset, we've set up the start symbol. That's how the hardware knows where to jump into the enclave; it's whatever the address of that _start function is. The _start function is just a naked function, so it only contains assembly, and we basically just do a bunch of stuff. I won't go through the logic; it's incredibly tortured. A lot of it is really just about clearing CPU state, because when you've switched from the non-enclave mode into the enclave mode of the CPU, you basically need to reset all the flags and extended state and all the other CPU state. Otherwise, the host side could use that leakage to try to attack into the enclave.
Okay, so the goal of this _start function is to clear all the CPU state and get into Rust code as quickly as possible. One of the important things we do is relocate the binary; that's the final relocation of the shim. Once all that's done, we can jump to Rust, and we do this by calling the entry function, which we have defined here as the shim's main. And that's this main function right here. So this is analogous, yes, to no_std with _start and main. In this case, main takes a variety of parameters that we've set up from the assembly code: a reference to the sallyport block (if you remember, the sallyport block is where we store the messages we send out to the host side), as well as a reference to the state save areas, of which there are currently three, and the CSSA, which tells us our current state save area depth. And as I said in the meeting yesterday, this is really just a match statement on the current state save area. When it's zero, we do our initialization code (which is why it's called entry) and we jump into the exec layer. When the CSSA is one, it means we're handling a syscall or a CPUID instruction. And when it's anything else, it means we're finishing the handling of that. There's an interesting workaround here, which is the enabling and disabling of exceptions, and that's because there's a specific attack you can make. This was found by another researcher, which we appreciated; thank you very much if you see this. Basically, the issue is that a host could attempt to enter the enclave and then immediately generate an exception, and if it caught things at the right time, it would have the ability to inject CPU state. So we only enable exceptions once we're out of this assembly code, because that's the only place where it's safe to enable them. The most important bits here are the entry code, which is in entry, and the handler, which is in handler, so we'll go over each of those now. The entry code, as I mentioned, is what happens when we first enter the enclave. The first time, we're going to jump into the exec layer, so we have to set it up. We call this entry function, and the entry function takes a pointer to where the exec layer code is. The first thing we do is treat that as an ELF header; that's this header type from Goblin, which is an ELF parser. So we find the ELF header, and then we do a sanity check to validate that this is in fact ELF. If it's not ELF, something terrible has gone wrong, and we exit. Otherwise, everything looks right, so we prepare the crt0 stack so that we can emulate running a Linux binary; that's where we use the crt0stack crate. Then, from the header, we get the e_entry field, which tells us the starting point of that particular executable. We have to add the offset to it, because we haven't relocated the exec binary yet. The binary will relocate itself when we jump into it, so we just need to do one little trivial relocation here to get the final address. Once we jump to that address, all the rest of the relocations happen; and it's musl libc that does that relocation, by the way. So yeah, once that's all done, we just jump to the start location, and we do so by setting the stack pointer to the top of this crt0 stack.
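Stepping back, that match on the CSSA has roughly this shape (stub types standing in for the sallyport block and the SSA frames; not the shim's real signatures):

```rust
pub struct Block;         // sallyport message block (stub)
pub struct StateSaveArea; // one SGX SSA frame (stub)

fn entry() { /* build the crt0 stack and jump into the exec layer */ }
fn handler(_block: &mut Block, _ssa: &mut [StateSaveArea]) { /* emulate syscall/CPUID */ }
fn finish(_ssa: &mut [StateSaveArea]) { /* patch RIP past the trapped instruction */ }

// Dispatch on how deeply we're nested in state save areas.
fn shim_main(block: &mut Block, ssa: &mut [StateSaveArea; 3], cssa: usize) {
    match cssa {
        0 => entry(),             // first entry: initialization, then the exec layer
        1 => handler(block, ssa), // an async exit landed us here: handle it
        _ => finish(ssa),         // returning from the host: finish the handling
    }
}
```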
And so now, once we've jumped to this address, musl libc takes over. Musl libc is going to parse that crt0 stack, get all the information out, do the relocations, and then enter the executable like normal. You can see here, this is the crt0 setup; it looks pretty similar to the example we saw in the crt0stack crate. We call this binary /init, and we just hard-code that. We also hard-code a set of variables (we could make something more dynamic there, but for now, that's what we do), and we hard-code a bunch of these other values. The important thing is that most of this stuff can't come from the host; really, none of this can come from the host. Two interesting little helper functions here are the exit one and the random one. The exit one exists because we can't rely on the standard library, so we have to do this manually with some assembly; we're actually executing the exit syscall there. And in the random one, we use the RDRAND instruction on Intel CPUs to generate some random data. This random function gives us 64 bits of random data, and if it fails, we call the exit function and we exit, because we can't really recover from anything at this point; we can only bail. Once this is done, we are now in the exec layer, and the exec layer just runs like a normal Linux binary until it hits a syscall or a CPUID instruction. When it does, we come back and enter the handler function, which is here. We create a new handler by passing in the state save area, which we're going to need because we want to look up what the instructions are, and we want to patch things up when we've successfully handled them. We also need a mutable reference to the sallyport block, because we're going to send messages out over it. When we're handling requests, we come in on this handle function. The first thing we do is look at the vector in the first block of the state save area, and this tells us what exception was generated to cause the asynchronous exit. We expect to see that it was an invalid opcode. If it wasn't an invalid opcode, and it wasn't this GDB exception vector, the page fault, then we go to the attacked function. The only thing the attacked function does is call the exit syscall in a loop. And the reason it calls it in a loop is: what if the host is trying to attack you, and when you tell the host "please exit", it doesn't exit and attempts to resume instead? The only thing we can do at that point is loop and try to exit again, until the host finally complies. We don't have any other options. On the other hand, if this is an invalid opcode, then we're going to look at RIP. RIP is the instruction pointer on an x86 CPU, and we read two bytes from that address, which tells us precisely which instruction we were about to execute: the instruction whose attempted execution caused the failure. In this case, we get either a syscall, in which case we try to handle the syscall, or a CPUID, and we try to handle that. If we got any other kind of instruction, then we either do some GDB stuff, or we just report the unsupported opcode and call it a day. So handle-syscall really just puts a syscall in the sallyport.
And then suspends execution. Handle-CPUID is exactly the same thing: it puts the CPUID request in the sallyport and then suspends execution to the host. The thing that's interesting here, I think, is this bit of code right here. If you remember, I said we manually issue an instruction that doesn't work in enclave mode, again, in order to suspend execution; that's what we do here. When this exact instruction executes, we get an asynchronous exit, this code suspends operation, and it jumps out to the host. The host will re-enter, and it re-enters back to main with the count up one more, and now, instead of calling handle, we call finish instead. So if we go back to the handler, you'll see that the finish is here. The only thing we do in finish is detect whether this was an invalid opcode again, whether it was a syscall or CPUID again, by reading the instruction pointer. If it was, it means everything's good: our previous state was suspended exactly as expected. So we skip the syscall or CPUID instruction by adding to the instruction pointer, which skips those two instruction bytes, and at that point we return. When we do a return, we end up back in main. We've now returned from this finish function, and we're going to return from the main function, which means we end up back here in the assembly code. We clear our CPU state again, because we don't want to leak data to the host, and then we perform the EEXIT and we exit the enclave. At that point, the host knows that it's ready to handle what's in the sallyport; it handles what's in the sallyport, and then it does a resume. And then it looks as if the syscall or CPUID instruction had in fact successfully executed, when in fact exceptions were generated and we did this patch-up to make everything work. So the code jumps around a lot, and it's not exactly easy to follow, but it is fairly simple; as simple as it can be. I won't attempt to go through the SEV shim, because I didn't write it. That's Harald, if you want more details on that, but it'll give you a similar sort of experience. It's a kernel that handles syscalls and such. I know that there's crt0 in there somewhere that sets up the stack. But yeah, that's pretty much it.
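The two-byte opcode check that both handle and finish rely on looks roughly like this (the opcodes are the real x86 encodings; everything else is illustrative):

```rust
// The two bytes at the interrupted RIP identify which forbidden
// instruction trapped inside the enclave.
const OP_SYSCALL: [u8; 2] = [0x0f, 0x05]; // x86-64 `syscall`
const OP_CPUID: [u8; 2] = [0x0f, 0xa2];   // x86 `cpuid`

enum Trapped {
    Syscall, // put a syscall request in the sallyport
    CpuId,   // put a CPUID request in the sallyport
    Other,   // unsupported opcode: GDB handling or report and bail
}

fn classify(rip: *const u8) -> Trapped {
    // Safety: in the real shim, rip comes out of the SSA and points
    // into enclave memory; here that's the caller's responsibility.
    match unsafe { [*rip, *rip.add(1)] } {
        OP_SYSCALL => Trapped::Syscall,
        OP_CPUID => Trapped::CpuId,
        _ => Trapped::Other,
    }
}
```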