As I've mentioned in previous posts, my original idea was to have a "master" bus driver which would manage initializing VMX on each CPU as well as creating the VMs, and then a "child" function driver for controlling each VM. However, I recently realized it would make more sense (and would be much easier in the long run) to just have everything in one driver. Instead of representing VMs as device handles, user-space applications need only a simple ID integer; to operate on a VM, this ID would be passed as a parameter, along with whatever other parameters, into the master driver itself. Furthermore, this avoids potential future issues, such as "half-creation", where a VM has resources allocated, but is not in a state where applications can free them.
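To make the ID-based scheme a bit more concrete, here is a minimal user-mode C sketch of a table that hands out integer IDs and resolves them back to VM objects. The names (`vm_table`, `vm_create`, `vm_lookup`, `vm_delete`, `MAX_VMS`) and the fixed-size array are assumptions made up for illustration, not the driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VMS 16            /* hypothetical limit, chosen for the sketch */
#define VM_ID_INVALID (-1)

typedef struct vm {
    int in_use;               /* nonzero once the slot has been handed out */
    /* guest state would live here */
} vm;

typedef struct vm_table {
    vm slots[MAX_VMS];
} vm_table;

/* Returns a new VM's ID, or VM_ID_INVALID when the table is full. */
int vm_create(vm_table *t)
{
    assert(t != NULL);
    for (int id = 0; id < MAX_VMS; ++id) {
        if (!t->slots[id].in_use) {
            t->slots[id].in_use = 1;
            return id;
        }
    }
    return VM_ID_INVALID;
}

/* Maps an ID back to its VM; rejects out-of-range and unused IDs. */
vm *vm_lookup(vm_table *t, int id)
{
    assert(t != NULL);
    if (id < 0 || id >= MAX_VMS || !t->slots[id].in_use)
        return NULL;
    return &t->slots[id];
}

/* Frees the slot; returns 0 on success, -1 for an unknown ID. */
int vm_delete(vm_table *t, int id)
{
    vm *v = vm_lookup(t, id);
    if (v == NULL)
        return -1;
    v->in_use = 0;
    return 0;
}
```

Because every operation goes through a lookup by ID, a stale or bogus ID simply fails the lookup instead of leaving a dangling handle around.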
I've also optimized how I use DPCs: instead of creating a new one every time I want to execute a task on a VM, I now have a single DPC per processor. Each DPC is given the task to execute as an argument, and a main DPC entry routine calls it. This also means that the task functions no longer have to signal completion - this is done by the DPC entry itself.
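The dispatch pattern can be modeled in plain C without any kernel APIs. This is only a sketch under assumptions: `cpu_dpc_context`, `dpc_entry`, and `increment_task` are hypothetical names, and a simple `done` flag stands in for whatever completion mechanism the driver really uses.

```c
#include <assert.h>
#include <stddef.h>

/* A task is just a function plus an opaque argument (e.g. a VM pointer). */
typedef int (*task_fn)(void *arg);

typedef struct cpu_dpc_context {
    task_fn task;     /* task to run on this processor */
    void   *arg;      /* argument handed to the task */
    int     result;   /* status returned by the task */
    int     done;     /* set by the entry routine, never by the task */
} cpu_dpc_context;

/* Stand-in for the main DPC entry: runs the task, then signals completion. */
void dpc_entry(cpu_dpc_context *ctx)
{
    assert(ctx != NULL && ctx->task != NULL);
    ctx->result = ctx->task(ctx->arg);
    ctx->done = 1;    /* in a driver this might be an event being signaled */
}

/* Example task: it does its work and returns, with no completion logic. */
int increment_task(void *arg)
{
    int *counter = arg;
    ++*counter;
    return 0;
}
```

Keeping the completion signal in one place means a new task only has to match the `task_fn` signature.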
Making these changes, I'm now at a point where I can create VM objects and interact with them from user-space code. The next step is to learn how the VMCS region works, how to set up guest registers/memory, and how to actually launch the VM.
In other news, I (finally) head back to school in a couple of hours - not exactly looking forward to the cold!
SWiVL (Simple Windows Virtualization Layer)
Saturday, January 18, 2014
Wednesday, January 8, 2014
DPCs to the Rescue!
I finally fixed the CPU affinity issue, and have reason to believe I can now accurately initialize VMX on all active CPUs.
The trick was, instead of creating a thread for each CPU and trying to set the affinity, I can create a DPC object for each CPU. There's a nice little routine called KeSetTargetProcessorDpcEx which allows me to issue a DPC on a specific CPU. So, all I have to do is create a DPC for each CPU, target that CPU, and when the DPC runs, it just initializes VMX - no need to adjust the affinity mask. Perhaps one drawback is that, at least from my tests so far, they all run sequentially on the same thread; however, I don't think this'll be an issue because the initialization code is pretty fast and straightforward. Also, DPCs run at DISPATCH_LEVEL, meaning everything the DPC accesses (including the function itself!) must be non-paged.
Another difference is that I have to create a new DPC anytime I want to run code on a particular processor (in the future, this will include VM operations that are tied to a CPU, such as starting the VM), but again, this is no big deal, as DPCs are relatively straightforward.
I think I'm gonna call it quits for tonight - perhaps tomorrow, I can start working on actual VM creation.
Monday, January 6, 2014
Progress!
I've successfully implemented child device creation! Whenever the bus driver receives an IOCTL_SWIVL_CREATE_VM ioctl, it creates a new VM object and registers a new VM device with the PnP (plug-and-play) manager, which is then able to load the function driver. This is a pretty big accomplishment because it means a good portion of the bus driver is already complete - all that's missing is the IOCTL_SWIVL_DELETE_VM ioctl, plus maybe a few others. Most of the actual code will be in the VM instance driver - starting the VM, setting up guest registers/memory, etc.
Now, there's still one bug that's been quite troublesome since the beginning - every once in a while, when uninstalling the hypervisor driver, it crashes with STATUS_ILLEGAL_INSTRUCTION when trying to execute the vmxoff instruction. From what I could discern from the documentation, the main cause for this is if VMX is not enabled when the instruction is executed. I've made several little patches/updates to my code (as well as the 'VMX Basics' post), hoping to fix it; alas, I was only treating the symptoms, not the problem.
Basically, as part of the driver startup, it needs to initialize VMX on every active CPU. As far as I'm aware, the only way to do this is to create a system thread for every CPU and set each thread's affinity so that it only runs on its corresponding CPU. Based on the MSDN documentation for KeSetSystemGroupAffinityThread, the thread should be moved to the correct CPU by the time that function returns (because the thread's IRQL is at the lowest possible level, which is below APC level). Sometimes, however, the thread remains on the CPU it started on. One time, I even saw it move to the correct CPU and immediately jump back to the original CPU before initializing VMX. Therefore, I believe the root cause may be that the thread's IRQL isn't high enough, so some sort of kernel interrupt preempts it and restores the affinity mask (possibly moving it back to the original CPU). As a result, the driver later tries to clean up VMX on a CPU that isn't running VMX, resulting in the crash.
I want to test some more on the whole thread affinity issue, and I definitely need to do more testing with the child device creation; however, it's currently about 4:50 AM here, and I should probably get some sleep.
Maybe tomorrow I can figure out how to attach a few pictures showing the registered devices and logs.
Tuesday, December 31, 2013
Miscellaneous Items
Just want to get a few things out of the way.
First off, copyright license. I've added a little license gadget underneath the "about me" (may change the layout later); unless explicitly stated otherwise, everything I post is under the CC BY license (Creative Commons, with attribution). In short, feel free to use my code as you wish - all I ask in return is some sort of link to me (such as this blog).
Next - feedback. I'm still getting used to blogging in general, and would appreciate any (constructive) feedback on things such as the layout or the content. For example, how often should I make "technical" posts vs. more "laid-back" posts? How often should I post in general? Should I change the positioning of anything, or the color scheme?
Finally, the code itself. Would anyone be interested in me maintaining a git repository for the code? I could create a few branches, including a "master" or "stable" branch for the well-tested code, and a "dev"/"unstable"/"experimental"/etc. branch for newer code.
Again, if you have any questions/comments, feel free to contact me.
Have a happy new year!
Wednesday, December 25, 2013
VMX Basics
As I mentioned in my first post, I'll be posting little code snippets from my project, along with a quick explanation of how each one works. Note: this post is a bit lengthy, assumes a basic to intermediate understanding of x86 assembly (as well as MASM-style syntax), and is quite technical/low-level! For a quick x86 reference, I commonly use this one (the coder editions are easier to understand). If you're interested in learning more about general x86 assembly, there are plenty of good tutorials out there (I can't personally recommend one - I mostly learned through trial and error).
For this post, we'll be looking at the basics of VMX and how to initialize it.
- VmxEnable PROC FRAME
- ; save arg
- push rcx
- .PUSHREG rcx
- ; save nonvolatile rbx
- push rbx
- .PUSHREG rbx
- .ENDPROLOG
- ; first, ensure VMX is supported
- mov eax, 1
- cpuid
- test cl, 020h
- jnz vmxSupported
- ; not supported
- mov eax, STATUS_NOT_IMPLEMENTED
- pop rbx
- pop rcx
- ret
The first thing to do of course is to make sure the CPU actually supports the VMX extensions; this is done through the cpuid instruction, with eax=1. VMX support is indicated by ecx bit 5. If this bit is clear, VMX is not supported on this processor, and STATUS_NOT_IMPLEMENTED is returned to the driver.
By the way, the push rcx is to save the single argument, the physical pointer to a VMXON region, which I'll explain later. The push rbx is needed because rbx is a non-volatile register which needs to be restored upon function return.
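For readers more comfortable with C, the same feature test can be sketched as a pure function over the ecx value that cpuid returns. The helper name `vmx_supported` and the macro `CPUID_1_ECX_VMX` are made up for this sketch; obtaining the value (via an intrinsic or inline assembly) is left out.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_1_ECX_VMX (UINT32_C(1) << 5)   /* CPUID.1:ECX bit 5 */

/* Same mask as the `test cl, 020h` in the assembly. */
static_assert(CPUID_1_ECX_VMX == 0x20, "VMX feature bit must be bit 5");

/* Takes the ECX value returned by cpuid with eax=1. */
int vmx_supported(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID_1_ECX_VMX) != 0;
}
```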
- vmxSupported:
- ; ensure the IA32_FEATURE_CONTROL MSR is set up correctly
- mov ecx, IA32_FEATURE_CONTROL
- rdmsr
- ; rdmsr returns the value in edx:eax, so the lock bit is bit 0 of al
- test al, 001h
- jnz msrSetUp
Next, we need to make sure that the IA32_FEATURE_CONTROL MSR (index 0x3a) is properly set up; more than likely, it'll already be pre-programmed by the BIOS, but just in case, we want to set it up ourselves.
- ; program the MSR
- or al, 007h ; enable VMXON inside/outside SMX operation, set lock bit
- wrmsr
- msrSetUp:
If it's not set up, we need to set 3 bits. Bit 0 is a lock bit that prevents the MSR from being changed (and also indicates that it has been set up, which is what we checked for earlier). The other two bits, bit 1 and bit 2, enable the vmxon instruction inside and outside SMX operation, respectively. We prefer both ;)
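The decision itself is just bit arithmetic on the MSR value, which a small C sketch can show. The function `feature_control_program` and the `FC_*` macro names are assumptions for illustration; the actual rdmsr/wrmsr calls are left out.

```c
#include <assert.h>
#include <stdint.h>

#define FC_LOCK          (UINT64_C(1) << 0)  /* MSR is locked */
#define FC_VMXON_IN_SMX  (UINT64_C(1) << 1)  /* vmxon allowed inside SMX */
#define FC_VMXON_OUT_SMX (UINT64_C(1) << 2)  /* vmxon allowed outside SMX */

/* Same three bits as the `or al, 007h` in the assembly. */
static_assert((FC_LOCK | FC_VMXON_IN_SMX | FC_VMXON_OUT_SMX) == 0x7,
              "lock plus both vmxon enable bits");

/* Returns the value to write back; a locked MSR is returned unchanged. */
uint64_t feature_control_program(uint64_t msr)
{
    if (msr & FC_LOCK)
        return msr;   /* already set up (and no longer writable) */
    return msr | FC_LOCK | FC_VMXON_IN_SMX | FC_VMXON_OUT_SMX;
}
```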
- ; test required bits
- mov rbx, cr0
- mov ecx, IA32_VMX_CR0_FIXED1
- rdmsr
- ; rdmsr returns the high half in edx, so shift by 32 to build the 64-bit value
- shl rdx, 32
- or rdx, rax
- not rdx
- ; all bits set in rdx (~FIXED1) must also be clear in rbx
- test rbx, rdx
- jnz badBits
- not rbx
- mov ecx, IA32_VMX_CR0_FIXED0
- rdmsr
- shl rdx, 32
- or rdx, rax
- ; all bits set in rdx must be clear in rbx (~cr0)
- test rbx, rdx
- jnz badBits
- mov rbx, cr4
- mov ecx, IA32_VMX_CR4_FIXED1
- rdmsr
- shl rdx, 32
- or rdx, rax
- not rdx
- ; all bits set in rdx (~FIXED1) must also be clear in rbx
- test rbx, rdx
- jnz badBits
- not rbx
- mov ecx, IA32_VMX_CR4_FIXED0
- rdmsr
- shl rdx, 32
- or rdx, rax
- ; ignore VMX bit
- btr rdx, 13
- ; all bits set in rdx must be clear in rbx (~cr4)
- test rbx, rdx
- jnz badBits
- jmp goodBits
- badBits:
- mov eax, STATUS_INVALID_DEVICE_STATE
- pop rbx
- pop rcx
- ret
- goodBits:
This next chunk is a bit lengthy, but it's basically two halves that do the same thing - check certain required bits in control registers cr0 and cr4. The MSRs IA32_VMX_CR0_FIXED0, IA32_VMX_CR0_FIXED1, IA32_VMX_CR4_FIXED0, and IA32_VMX_CR4_FIXED1 (indexes 0x486, 0x487, 0x488, and 0x489) indicate which bits need to be set or clear in the corresponding control registers: any bit set in a FIXED0 MSR must be set in the register, and any bit clear in a FIXED1 MSR must be clear in the register. For now, if any bit is not in its required state, then rather than trying to modify the control registers, we simply fail out with STATUS_INVALID_DEVICE_STATE.
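The fixed-bit rule is easier to see in C than in assembly. Here is a minimal sketch, assuming the control register and MSR values have already been read; `cr_bits_valid`, `cr4_bits_valid`, and `CR4_VMXE` are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_VMXE (UINT64_C(1) << 13)   /* the VMX enable bit in cr4 */

static_assert(CR4_VMXE == 0x2000, "VMXE is cr4 bit 13");

/*
 * fixed0: bits set here must be 1 in the control register.
 * fixed1: bits clear here must be 0 in the control register.
 */
int cr_bits_valid(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    uint64_t must_be_one_but_clear = ~cr & fixed0;
    uint64_t must_be_zero_but_set  = cr & ~fixed1;
    return (must_be_one_but_clear | must_be_zero_but_set) == 0;
}

/* cr4 variant: VMXE is ignored in fixed0 because it gets set afterwards. */
int cr4_bits_valid(uint64_t cr4, uint64_t fixed0, uint64_t fixed1)
{
    return cr_bits_valid(cr4, fixed0 & ~CR4_VMXE, fixed1);
}
```

The two intermediate masks correspond to the two `test rbx, rdx` checks in each half of the assembly.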
- ; set VMX bit
- mov rax, cr4
- bts rax, 13
- ; branch on CF before writing cr4, since mov to a control register leaves the flags undefined
- jnc vmxEnabled
- ; already enabled
- mov eax, STATUS_RESOURCE_IN_USE
- pop rbx
- pop rcx
- ret
- vmxEnabled:
- mov cr4, rax
- ; enter VMX operation
- vmxon QWORD PTR [rsp + 8]
- jnc success
- ; something went wrong
- mov eax, STATUS_UNSUCCESSFUL
- pop rbx
- pop rcx
- ret
- success:
- xor eax, eax
- pop rbx
- pop rcx
- ret
- VmxEnable ENDP
If everything is set up correctly, we can go ahead and enable VMX by setting bit 13 of cr4, and actually enter VMX operation with the vmxon instruction. The QWORD PTR [rsp + 8] operand refers to the argument pushed at the start of the function (it sits just above the saved rbx), and points to the previously-mentioned VMXON region.
The STATUS_RESOURCE_IN_USE error is returned if VMX is already enabled, because that means something else (such as VMware) is probably already using VMX, in which case we just let it continue using VMX by itself and error out (I have yet to test if VMware behaves similarly if it detects a hypervisor already running on the machine).
We then make one final check to make sure vmxon actually succeeded; if it didn't, the carry flag will be set (which means we shouldn't try to disable VMX later on).
That's pretty much it! Now, the VMXON region I mentioned a few times earlier is an implementation-dependent region of host memory (at most 4KB) that VMX uses to manage VMs. The only initialization it needs is a 31-bit VMCS revision identifier at the beginning, which can be read from the IA32_VMX_BASIC MSR (index 0x480), taking care to clear bit 31 (since VMXON uses it as a "shadow region" indicator, which we don't want).
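Setting up the region amounts to writing one 32-bit value, as this C sketch shows. It assumes the caller has already allocated a 4KB-aligned buffer and read the MSR at index 0x480; the function name `vmxon_region_init` and the `VMXON_REGION_SIZE` constant are hypothetical, and zeroing the rest of the buffer is just a precaution taken by the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VMXON_REGION_SIZE 4096u   /* upper bound on the region size */

/*
 * region:    4KB-aligned buffer of VMXON_REGION_SIZE bytes for vmxon
 * vmx_basic: value read from the MSR at index 0x480
 */
void vmxon_region_init(void *region, uint64_t vmx_basic)
{
    assert(region != NULL);

    /* Bits 30:0 hold the revision identifier; bit 31 must stay clear. */
    uint32_t revision_id = (uint32_t)(vmx_basic & UINT64_C(0x7FFFFFFF));

    memset(region, 0, VMXON_REGION_SIZE);
    memcpy(region, &revision_id, sizeof revision_id);
}
```

The physical address of that buffer is what gets passed to VmxEnable as its single argument.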
Leaving VMX operation is very simple:
- VmxDisable PROC
- ; leave VMX operation
- vmxoff
- ; clear VMX bit
- mov rax, cr4
- btr rax, 13
- mov cr4, rax
- ret
- VmxDisable ENDP
Whew! That was quite a lengthy post. I can continue doing posts like these along with less-technical progress reports if people like them, or not - either way is fine with me. If you want to learn more about VMX, you can always consult the x86 programmer's manual here (volume 3, chapters 23-25 and chapter 30).
Edit: I've fixed a few bugs in the code; earlier I mentioned that bit 10 of ecx after the cpuid instruction indicated VMX support (even though the code tested bit 9), when in fact, it is bit 5 that indicates support.
Also, the required bit checking code clobbers the rbx register, which is non-volatile (i.e. it needs to be preserved across the function call). Finally, the vmxon operand was updated to reflect the newly-pushed register, and was actually tested for success (via the carry flag).
First Post!
Woo! So yeah, I decided to create a simple little blog for my adventures and discoveries as I write my Simple Windows Virtualization Layer (SWiVL) driver. Just a fair warning: this will be a more technical blog, with possible code snippets, so unless you're familiar with C/Windows driver programming, many of my posts may not make much sense.
Just as a little background about myself, I'm currently an undergraduate student at RPI, junior year, studying Computer Science. I've been programming for about 10 years now, ever since I got a Lego® Mindstorms® set for Christmas. Most of my experience is in C, C++, C♯, Java, Lua, and x86 assembly.
Anyway, onto the driver itself! The current goal of SWiVL is to provide user-mode Win32 applications a basic interface to the VMX (and possibly SVM in the future) virtualization extensions for x64-based CPUs, in much the same way KVM (kernel-based virtual machine) does for Linux. The project will actually be split into two different drivers: the basic hypervisor, which initializes VMX on all currently active CPUs and creates virtual machine objects, and the VM function driver, which manages a single VM instance. Both drivers will provide a simple IOCTL interface which will be used by a simple user-mode library. For now, the user-mode library will just provide simple wrappers around DeviceIoControl calls, but I do plan on possibly adding C++ classes for a more object-oriented approach to VM management.
If anyone's interested, I may post snippets of code as well as explanation behind them (and would certainly be open to any feedback; always looking for better approaches).
Bit of a long first post, but it was mostly to introduce myself and the project, and explain the current goals. I have no idea how often I'll be posting, or how long future posts will be, but I'll try to keep the blog updated whenever I make good progress. Stay tuned!