Saturday, January 18, 2014

Change of Plans

As I've mentioned in previous posts, my original idea was to have a "master" bus driver which would manage initializing VMX on each CPU as well as creating the VMs, and then a "child" function driver for controlling each VM. However, I recently realized it would make more sense (and would be much easier in the long run) to just have everything in one driver. Instead of representing each VM as a device handle, the driver hands user-space applications a simple integer ID; to operate on a VM, an application passes this ID, along with any other parameters, to the master driver itself. This also avoids potential future issues such as "half-creation", where a VM has resources allocated but is not yet in a state where applications can free them.
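To make the ID idea concrete, here is a minimal user-mode sketch of an ID-to-VM table. It is not SWiVL's actual code: the names (vm_t, vm_create, vm_lookup, vm_delete) and the fixed table size are made up for illustration, and a real driver would also need locking around the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VMS       16u   /* hypothetical limit, not from the driver */
#define INVALID_VM_ID 0u    /* 0 is never handed out as an ID */

typedef struct vm {
    uint32_t id;       /* ID returned to user space */
    int      in_use;   /* set only once the VM is fully created */
} vm_t;

static vm_t g_vms[MAX_VMS];

/* Create a VM and return its ID, or INVALID_VM_ID if the table is full. */
uint32_t vm_create(void)
{
    for (uint32_t i = 0; i < MAX_VMS; i++) {
        if (!g_vms[i].in_use) {
            /* Per-VM resources would be allocated here. Publishing the
               slot last means a failed creation never becomes visible. */
            g_vms[i].id = i + 1;
            g_vms[i].in_use = 1;
            return g_vms[i].id;
        }
    }
    return INVALID_VM_ID;
}

/* Translate an ID from user space into a VM object, or NULL if invalid. */
vm_t *vm_lookup(uint32_t id)
{
    if (id == INVALID_VM_ID || id > MAX_VMS)
        return NULL;
    vm_t *vm = &g_vms[id - 1];
    if (!vm->in_use)
        return NULL;
    assert(vm->id == id);   /* slot index and stored ID must agree */
    return vm;
}

/* Delete a VM by ID; returns 0 on success, -1 if the ID is not valid. */
int vm_delete(uint32_t id)
{
    vm_t *vm = vm_lookup(id);
    if (vm == NULL)
        return -1;
    vm->in_use = 0;
    return 0;
}
```

Every request handler would start with vm_lookup and fail cleanly on NULL, so a stale or bogus ID from user space never reaches VM code. Note that this simple version reuses IDs after deletion; a generation counter would be one way to avoid that.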

I've also optimized how I use DPCs: instead of creating a new one every time I want to execute a task on a VM, I now keep a single DPC per processor. Each DPC is given the task to execute as an argument, and a main DPC entry routine calls it. This also means that the task functions no longer have to signal completion - this is done by the DPC entry itself.
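The dispatch pattern described above can be modeled in plain C. This is only a user-mode sketch under made-up names (cpu_dpc_slot_t, dpc_entry, run_on_cpu, increment_task): the "queueing" is a direct call and the completion signal is a flag standing in for a kernel event. In a real driver, the two system arguments that KeInsertQueueDpc forwards to the deferred routine would be one plausible way to pass the task.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*vm_task_fn)(void *context);   /* hypothetical task signature */

typedef struct cpu_dpc_slot {
    vm_task_fn   task;      /* task to run on this processor */
    void        *context;   /* argument for the task */
    volatile int done;      /* stands in for a kernel event */
} cpu_dpc_slot_t;

/* Common entry: runs whatever task was queued, then signals completion
   so that individual tasks never have to. */
void dpc_entry(cpu_dpc_slot_t *slot)
{
    assert(slot != NULL);
    if (slot->task != NULL)
        slot->task(slot->context);
    slot->done = 1;
}

/* Hand a task to a processor's slot. A driver would queue the
   pre-allocated DPC here; this model just calls the entry directly. */
void run_on_cpu(cpu_dpc_slot_t *slot, vm_task_fn task, void *context)
{
    slot->task = task;
    slot->context = context;
    slot->done = 0;
    dpc_entry(slot);
}

/* Example task: knows nothing about completion signalling. */
void increment_task(void *context)
{
    ++*(int *)context;
}
```

The point of the design is that tasks like increment_task stay trivial: they do their work and return, and the one shared entry routine owns the bookkeeping.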

Making these changes, I'm now at a point where I can create VM objects and interact with them from user-space code. The next step is to learn how the VMCS region works, how to set up guest registers/memory, and how to actually launch the VM.

In other news, I (finally) head back to school in a couple of hours - not exactly looking forward to the cold!

Wednesday, January 8, 2014

DPCs to the Rescue!

I finally fixed the CPU affinity issue, and have reason to believe I can now accurately initialize VMX on all active CPUs.

The trick was, instead of creating a thread for each CPU and trying to set the affinity, I can create a DPC object for each CPU. There's a nice little routine called KeSetTargetProcessorDpcEx which allows me to issue a DPC on a specific CPU. So, all I have to do is create a DPC for each CPU, target that CPU, and when the DPC runs, it just initializes VMX - no need to adjust the affinity mask. Perhaps one drawback is that, at least from my tests so far, they all run sequentially on the same thread; however, I don't think this'll be an issue because the initialization code is pretty fast and straightforward. Also, DPCs run at DISPATCH_LEVEL, meaning everything the DPC accesses (including the function itself!) must be non-paged.

Another difference is that I have to create a new DPC anytime I want to run code on a particular processor (in the future, this will include VM operations that are tied to a CPU, such as starting the VM), but again, this is no big deal, as DPCs are relatively straightforward.

I think I'm gonna call it quits for tonight - perhaps tomorrow, I can start working on actual VM creation.

Monday, January 6, 2014

Progress!

I've successfully implemented child device creation! Whenever the bus driver receives an IOCTL_SWIVL_CREATE_VM ioctl, it creates a new VM object and registers a new VM device with the PnP (plug-and-play) manager, which is then able to load the function driver. This is a pretty big accomplishment because it means a good portion of the bus driver is already complete - all that's missing is the IOCTL_SWIVL_DELETE_VM ioctl, plus maybe a few others. Most of the actual code will be in the VM instance driver - starting the VM, setting up guest registers/memory, etc.

Now, there's still one bug that's been quite troublesome since the beginning - every once in a while, when uninstalling the hypervisor driver, it crashes with STATUS_ILLEGAL_INSTRUCTION when trying to execute the vmxoff instruction. From what I could discern from the documentation, the main cause for this is if VMX is not enabled when the instruction is executed. I've made several little patches/updates to my code (as well as the 'VMX Basics' post), hoping to fix it; alas, I was only treating the symptoms, not the problem.

Basically, as part of the driver startup, it needs to initialize VMX on every active CPU. As far as I'm aware, the only way to do this is to create a system thread for every CPU and set each thread's affinity so that it only runs on its corresponding CPU. Based on the MSDN documentation for KeSetSystemGroupAffinityThread, the thread should be moved to the correct CPU by the time that function returns (because the thread's IRQL is at the lowest possible level, which is below APC level). Sometimes, however, the thread remains on the CPU it started on. One time, I even saw it move to the correct CPU and immediately jump back to the original CPU before initializing VMX. My best guess at the root cause is that the thread's IRQL isn't high enough, so some sort of kernel interrupt preempts it and restores the affinity mask (possibly moving it back to the original CPU). As a result, the driver later tries to clean up VMX on a CPU that isn't actually running VMX, resulting in the crash.

I want to test some more on the whole thread affinity issue, and I definitely need to do more testing with the child device creation; however, it's currently about 4:50 AM here, and I should probably get some sleep.

Maybe tomorrow I can figure out how to attach a few pictures showing the registered devices and logs.

Tuesday, December 31, 2013

Miscellaneous Items

Just want to get a few things out of the way.

First off, copyright license. I've added a little license gadget underneath the "about me" (may change the layout later); unless explicitly stated otherwise, everything I post is under the CC BY license (Creative Commons, with attribution). Basically, feel free to use my code as you wish - all I ask in return is some sort of link to me (such as this blog).

Next - feedback. I'm still getting used to blogging in general, and would appreciate any (constructive) feedback on things such as the layout or the content. For example, how often should I make "technical" posts vs. more "laid-back" posts? How often should I post in general? Should I change the positioning of anything, or the color scheme?

Finally, the code itself. Would anyone be interested in me maintaining a git repository for the code? I could create a few branches, including a "master" or "stable" branch for the well-tested code, and a "dev"/"unstable"/"experimental"/etc. branch for newer code.

Again, if you have any questions/comments, feel free to contact me.

Have a happy new year!

Wednesday, December 25, 2013

VMX Basics

As I mentioned in my first post, I would be posting little code snippets from my project, as well as a quick explanation on how it works. Note: this post is a bit lengthy, assumes a basic to intermediate understanding of x86 assembly (as well as MASM-style syntax), and is quite technical/low-level! For a quick x86 reference, I commonly use this reference (the coder editions are easier to understand). If you're interested in learning more about general x86 assembly, there are plenty of good tutorials out there (I can't personally recommend one - I mostly learned through trial and error).

For this post, we'll be looking at the basics of VMX and how to initialize it.

VmxEnable PROC FRAME
    ; save arg
    push    rcx
.PUSHREG    rcx
    ; save nonvolatile rbx
    push    rbx
.PUSHREG    rbx

.ENDPROLOG

    ; first, ensure VMX is supported
    mov     eax, 1
    cpuid
    test    cl, 020h
    jnz     vmxSupported
    ; not supported
    mov     eax, STATUS_NOT_IMPLEMENTED
    pop     rbx
    pop     rcx
    ret

The first thing to do of course is to make sure the CPU actually supports the VMX extensions; this is done through the cpuid instruction, with eax=1. VMX support is indicated by ecx bit 5. If this bit is clear, VMX is not supported on this processor, and STATUS_NOT_IMPLEMENTED is returned to the driver.

By the way, the push rcx is to save the single argument, the physical pointer to a VMXON region, which I'll explain later. The push rbx is needed because rbx is a non-volatile register which needs to be restored upon function return.
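Since the bit number is easy to get wrong, here is the same CPUID check as a tiny C sketch. The names (CPUID_1_ECX_VMX, cpu_has_vmx) are made up, and the function takes the ecx value as a parameter rather than executing cpuid itself, so it only illustrates the mask.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_1_ECX_VMX (1u << 5)   /* CPUID.1:ECX bit 5 */

/* The assembly tests cl against 020h, which is the same mask. */
static_assert(CPUID_1_ECX_VMX == 0x20u, "bit 5 is mask 0x20");

/* Returns 1 if the ecx value from cpuid leaf 1 reports VMX support. */
int cpu_has_vmx(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID_1_ECX_VMX) != 0;
}
```

Writing the mask as a shift keeps the bit number and the constant from drifting apart.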

    ; ensure the IA32_FEATURE_CONTROL MSR is set up correctly
    mov     ecx, IA32_FEATURE_CONTROL
    rdmsr
    ; rdmsr returns the value in edx:eax, so the lock bit (bit 0) is in al
    test    al, 001h
    jnz     msrSetUp

Next, we need to make sure that the IA32_FEATURE_CONTROL MSR (index 0x3a) is properly set up; more than likely, it'll already be pre-programmed by the BIOS, but just in case, we want to set it up ourselves.


    ; program the MSR
    or      al, 007h ; enable VMXON inside/outside SMX operation, set lock bit
    wrmsr
msrSetUp:

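If it's not set up, we need to set 3 bits. Bit 0 is a lock bit that prevents the MSR from being changed until the next reset (and also indicates that it has been set up, which is what we checked for earlier). The other two bits enable the vmxon instruction: bit 1 inside SMX operation, and bit 2 outside SMX operation. We prefer both ;) One caveat: according to the Intel manual, trying to set bit 1 on a processor that doesn't support SMX raises a general-protection exception, so a more defensive version would check for SMX (cpuid with eax=1, ecx bit 6) first and otherwise set only bits 0 and 2.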
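The decision around this MSR can be written as a small pure function. This is a sketch under assumptions, not a translation of the assembly: the names (fc_action_t, feature_control_action) are made up, it only enables VMXON outside SMX (bit 2), and it adds one case the assembly doesn't handle, namely firmware having locked the MSR with VMXON disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FC_LOCK          (1ull << 0)  /* lock bit */
#define FC_VMXON_IN_SMX  (1ull << 1)  /* enable VMXON inside SMX operation */
#define FC_VMXON_OUT_SMX (1ull << 2)  /* enable VMXON outside SMX operation */

typedef enum {
    FC_READY,                 /* locked with VMXON enabled: nothing to do */
    FC_MUST_WRITE,            /* unlocked: caller should wrmsr *to_write */
    FC_DISABLED_BY_FIRMWARE   /* locked with VMXON disabled: give up */
} fc_action_t;

/* Decide what to do given the current IA32_FEATURE_CONTROL value. */
fc_action_t feature_control_action(uint64_t msr, uint64_t *to_write)
{
    assert(to_write != NULL);
    if (msr & FC_LOCK) {
        /* Locked: a write would fault, so we can only inspect the bits. */
        return (msr & FC_VMXON_OUT_SMX) ? FC_READY : FC_DISABLED_BY_FIRMWARE;
    }
    /* Unlocked: enable VMXON outside SMX and set the lock bit. */
    *to_write = msr | FC_VMXON_OUT_SMX | FC_LOCK;
    return FC_MUST_WRITE;
}
```

Keeping the decision separate from the rdmsr/wrmsr instructions makes it easy to check every combination of bits without touching real hardware.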


    ; test required bits
    mov     rbx, cr0
    mov     ecx, IA32_VMX_CR0_FIXED1
    rdmsr
    ; combine edx:eax into a single 64-bit value in rdx
    shl     rdx, 32
    or      rdx, rax
    not     rdx
    ; all bits set in rdx (~FIXED1) must also be clear in rbx
    test    rbx, rdx
    jnz     badBits

    not     rbx
    mov     ecx, IA32_VMX_CR0_FIXED0
    rdmsr
    shl     rdx, 32
    or      rdx, rax
    ; all bits set in rdx must be clear in rbx (~cr0)
    test    rbx, rdx
    jnz     badBits

    mov     rbx, cr4
    mov     ecx, IA32_VMX_CR4_FIXED1
    rdmsr
    shl     rdx, 32
    or      rdx, rax
    not     rdx
    ; all bits set in rdx (~FIXED1) must also be clear in rbx
    test    rbx, rdx
    jnz     badBits

    not     rbx
    mov     ecx, IA32_VMX_CR4_FIXED0
    rdmsr
    shl     rdx, 32
    or      rdx, rax
    ; ignore VMX bit
    btr     rdx, 13
    ; all bits set in rdx must be clear in rbx (~cr4)
    test    rbx, rdx
    jnz     badBits
    jmp     goodBits

badBits:
    mov     eax, STATUS_INVALID_DEVICE_STATE
    pop     rbx
    pop     rcx
    ret

goodBits:

This next chunk is a bit lengthy, but it's basically two halves that do the same thing - check certain required bits in control registers cr0 and cr4. The MSRs IA32_VMX_CR0_FIXED0, IA32_VMX_CR0_FIXED1, IA32_VMX_CR4_FIXED0, and IA32_VMX_CR4_FIXED1 (indexes 0x486, 0x487, 0x488, and 0x489) indicate which bits need to be set or clear in the corresponding control registers: any bit set in FIXED0 must be set in the register, and any bit clear in FIXED1 must be clear in the register. For now, if any bit is not in its required state, then rather than trying to modify the control registers, we simply fail out with STATUS_INVALID_DEVICE_STATE.
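The FIXED0/FIXED1 rule is easier to see in C than in assembly. The sketch below uses made-up names (msr_combine, cr_bits_valid, cr_bits_example) and takes the register and MSR values as parameters; the masks in the example are illustrative, not values read from any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

/* rdmsr returns the high half in edx and the low half in eax. */
uint64_t msr_combine(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Returns 1 if cr satisfies both masks:
   - every bit set in fixed0 must be set in cr
   - every bit clear in fixed1 must be clear in cr */
int cr_bits_valid(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    uint64_t missing   = fixed0 & ~cr;   /* required 1s that are 0 */
    uint64_t forbidden = cr & ~fixed1;   /* required 0s that are 1 */
    return (missing | forbidden) == 0;
}

/* Illustrative values only; real masks come from the MSRs. */
void cr_bits_example(void)
{
    uint64_t fixed0 = msr_combine(0, 0x80000021u); /* PG, NE, PE required */
    uint64_t fixed1 = msr_combine(0, 0xFFFFFFFFu); /* upper half must be 0 */

    assert(cr_bits_valid(0x80050033u, fixed0, fixed1));   /* paging on */
    assert(!cr_bits_valid(0x00050033u, fixed0, fixed1));  /* PG clear */
}
```

For cr4, the caller would clear bit 13 in fixed0 before the check, just like the btr in the assembly, since the VMX bit is set separately afterwards.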

    ; set VMX bit
    mov     rax, cr4
    bts     rax, 13
    jnc     vmxEnabled
    ; already enabled
    mov     eax, STATUS_RESOURCE_IN_USE
    pop     rbx
    pop     rcx
    ret
vmxEnabled:
    ; the bit was clear, so write the updated value back; this happens after
    ; the branch because mov to a control register leaves CF undefined
    mov     cr4, rax

    ; enter VMX operation
    vmxon   QWORD PTR [rsp + 8]
    jnc     success
    ; something went wrong
    mov     eax, STATUS_UNSUCCESSFUL
    pop     rbx
    pop     rcx
    ret

success:
    xor     eax, eax
    pop     rbx
    pop     rcx
    ret
VmxEnable ENDP

If everything is set up correctly, we can go ahead and enable VMX by setting bit 13 of cr4, and then actually enter VMX operation with the vmxon instruction. The QWORD PTR [rsp + 8] operand refers to the argument pushed at the top of the function (rbx was pushed after it, so rcx sits 8 bytes above the stack pointer), which holds the physical address of the previously-mentioned VMXON region.

The reason for the STATUS_RESOURCE_IN_USE error is that, if VMX is already enabled, something else (such as VMware) is probably already using it; in that case we error out and let it continue using VMX by itself (I have yet to test whether VMware behaves similarly if it detects a hypervisor already running on the machine).

We then make one final check to make sure vmxon actually succeeded; if it didn't, the carry flag will be set (which means we shouldn't try to disable VMX later on).

That's pretty much it! Now, the VMXON region I mentioned a few times earlier is an implementation-dependent region of host memory (at most 4KB) that VMX uses to manage VMs. The only initialization it needs is a 31-bit VMCS revision identifier at the beginning, which can be read from the low bits of the IA32_VMX_BASIC MSR (index 0x480), taking care to clear bit 31 (since VMXON uses it as a "shadow region" indicator, which we don't want).
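Here is a minimal sketch of that initialization in C. The names (VMXON_REGION_SIZE, vmxon_region_init) are made up, the MSR value is passed in rather than read with rdmsr, and zeroing the rest of the region is just a conservative choice on my part rather than a documented requirement. A real driver would also need the region to be 4KB-aligned, non-paged memory and would pass its physical address to vmxon.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VMXON_REGION_SIZE 4096u   /* upper bound mentioned above */

/* Prepare a VMXON region given the raw IA32_VMX_BASIC value. */
void vmxon_region_init(void *region, uint64_t vmx_basic)
{
    /* Bits 30:0 hold the revision identifier; bit 31 must end up clear. */
    uint32_t revision = (uint32_t)(vmx_basic & 0x7FFFFFFFu);

    assert(region != NULL);
    memset(region, 0, VMXON_REGION_SIZE);
    memcpy(region, &revision, sizeof revision);
}
```

The memcpy writes the identifier into the first 4 bytes without assuming anything about the buffer's type.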

Leaving VMX operation is very simple:

VmxDisable PROC
    ; leave VMX operation
    vmxoff

    ; clear VMX bit
    mov     rax, cr4
    btr     rax, 13
    mov     cr4, rax

    ret
VmxDisable ENDP


Whew! That was quite a lengthy post. I can continue doing posts like these along with less-technical progress reports if people like them, or not - either way is fine with me. If you want to learn more about VMX, you can always consult the x86 programmer's manual here (volume 3, chapters 23-25 and chapter 30).

Edit: I've fixed a few bugs in the code; earlier I mentioned that bit 10 of ecx after the cpuid instruction indicated VMX support (even though the code tested bit 9), when in fact, it is bit 5 that indicates support.
Also, the required bit checking code clobbers the rbx register, which is non-volatile (i.e. it needs to be preserved across the function call). Finally, the vmxon operand was updated to reflect the newly-pushed register, and was actually tested for success (via the carry flag).

First Post!

Woo! So yeah, I decided to create a simple little blog for my adventures and discoveries as I write my Simple Windows Virtualization Layer (SWiVL) driver. Just a fair warning; this will be a more technical blog, with possible code snippets, so unless you're familiar with C/Windows driver programming, many of my posts may not make much sense.

Just as a little background about myself, I'm currently an undergraduate student at RPI, junior year, studying Computer Science. I've been programming for about 10 years now, ever since I got a Lego® Mindstorms® set for Christmas. Most of my experience is in C, C++, C♯, Java, Lua, and x86 assembly.

Anyway, onto the driver itself! The current goal of SWiVL is to provide user-mode Win32 applications a basic interface to the VMX (and possibly SVM in the future) virtualization extensions for x64-based CPUs, in much the same way KVM (kernel-based virtual machine) does for Linux. The project will actually be split into two different drivers; the basic hypervisor, which initializes VMX on all currently active CPUs and creates virtual machine objects, and the VM function driver, which manages a single VM instance. Both drivers will provide a simple IOCTL interface which will be used by a simple user-mode library. For now, the user-mode library will just provide simple wrappers around DeviceIoControl calls, but I do plan on possibly adding C++ classes for a more object-oriented approach to VM management.

If anyone's interested, I may post snippets of code as well as explanation behind them (and would certainly be open to any feedback; always looking for better approaches).

Bit of a long first post, but it was mostly to introduce myself and the project, and explain the current goals. I have no idea how often I'll be posting, or how long future posts will be, but I'll try to keep the blog updated whenever I make good progress. Stay tuned!