From 71560d185475117b10994d839afe059577e7768c Mon Sep 17 00:00:00 2001
From: Michael Brown
Date: Wed, 27 Apr 2016 11:03:18 +0100
Subject: [PATCH] [librm] Preserve FPU, MMX and SSE state across calls to
 virt_call()

The IBM Tivoli Provisioning Manager for OS Deployment (also known as
TPMfOSD, Rembo-ia32, or Rembo Auto-Deploy) has a serious bug in some
older versions (observed with v5.1.1.0, apparently fixed by v7.1.1.0)
which can lead to arbitrary data corruption.

As mentioned in commit 87723a0 ("[libflat] Test A20 gate without
switching to flat real mode"), Tivoli's NBP sets up a VMM and makes
calls to the PXE stack in VM86 mode.  This appears to be some kind of
attempt to run PXE API calls inside a sandbox.  The VMM is fairly
sophisticated: for example, it handles our attempts to switch into
protected mode and patches our GDT so that our protected-mode code
runs in ring 1 instead of ring 0.

However, it neglects to apply any memory protections.  In particular,
it does not enable paging and leaves us with 4GB segment limits.  We
can therefore trivially break out of the sandbox by simply overwriting
the GDT (or by modifying any of Tivoli's VMM code or data structures).

When we attempt to execute privileged instructions (such as "lidt"),
the CPU raises an exception and control is passed to the Tivoli VMM.
This may result in a call to Tivoli's memcpy() function.  Tivoli's
memcpy() function includes optimisations which use the SSE registers
%xmm0-%xmm3 to speed up aligned memory copies.  Unfortunately, the
Tivoli VMM's exception handler does not save or restore %xmm0-%xmm3.

The net effect of this bug in the Tivoli VMM is that any privileged
instruction (such as "lidt") issued by iPXE may result in unexpected
corruption of the %xmm0-%xmm3 registers.  Even more unfortunately,
this problem affects the code path taken in response to a hardware
interrupt from the NIC, since that code path will call PXENV_UNDI_ISR.
The net effect therefore becomes that any NIC hardware interrupt
(e.g. due to a received packet) may result in unexpected corruption of
the %xmm0-%xmm3 registers.

If a packet arrives while Tivoli is in the middle of using its
memcpy() function, then the unexpected corruption of the %xmm0-%xmm3
registers will result in unexpected corruption in the destination
buffer.  The net effect therefore becomes that any received packet may
result in a 16-byte block of corruption somewhere in any data that
Tivoli copied using its memcpy() function.

We can work around this bug in the Tivoli VMM by saving and restoring
the %xmm0-%xmm3 registers across calls to virt_call().  To work around
the problem, we need to save registers before attempting to execute
any privileged instructions, and ensure that we attempt no further
privileged instructions after restoring the registers.

This is less simple than it may sound.  We can use the "movups"
instruction to save and restore individual registers, but this will
itself generate an undefined opcode exception if SSE is not currently
enabled according to the flags in %cr0 and %cr4.  We can't access %cr0
or %cr4 before attempting the "movups" instruction, because accessing
a control register is itself a privileged instruction (which may
therefore trigger corruption of the registers that we're trying to
save).

The best solution seems to be to use the "fxsave" and "fxrstor"
instructions.  If SSE is not enabled, then these instructions may fail
to save and restore the SSE register contents, but will not generate
an undefined opcode exception.  (If SSE is not enabled, then we don't
really care about preserving the SSE register contents anyway.)

The use of "fxsave" and "fxrstor" introduces an implicit assumption
that the CPU supports SSE instructions (even though we make no
assumption about whether or not SSE is currently enabled).  SSE was
introduced in 1999 with the Pentium III (and added by AMD in 2001),
and is an architectural requirement for x86_64.
Experimentation with current versions of gcc suggests that it may
generate SSE instructions even when using "-m32", unless an explicit
"-march=i386" or "-mno-sse" is used to inhibit this.  It therefore
seems reasonable to assume that SSE will be supported on any hardware
that might realistically be used with new iPXE builds.

As a side benefit of this change, the MMX register %mm0 will now be
preserved across virt_call() even in an i386 build of iPXE using a
driver that requires readq()/writeq(), and the SSE registers
%xmm0-%xmm5 will now be preserved across virt_call() even in an x86_64
build of iPXE using the Hyper-V netvsc driver.

Experimentation suggests that this change adds around 10% to the
number of cycles required for a do-nothing virt_call(), most of which
are due to the extra bytes copied using "rep movsb".  Since the number
of bytes copied is a compile-time constant local to librm.S, we could
potentially reduce this impact by ensuring that we always copy a whole
number of dwords and so can use "rep movsl" instead of "rep movsb".
Signed-off-by: Michael Brown
---
 src/arch/x86/transitions/librm.S | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/arch/x86/transitions/librm.S b/src/arch/x86/transitions/librm.S
index a082526b..e91ede37 100644
--- a/src/arch/x86/transitions/librm.S
+++ b/src/arch/x86/transitions/librm.S
@@ -207,6 +207,7 @@ VC_TMP_CR3:	.space	4
 VC_TMP_CR4:	.space	4
 VC_TMP_EMER:	.space	8
 .endif
+VC_TMP_FXSAVE:	.space	512
 VC_TMP_END:
 	.previous

@@ -1000,8 +1001,11 @@ virt_call:
 	/* Claim ownership of temporary static buffer */
 	cli

-	/* Preserve GDT and IDT in temporary static buffer */
+	/* Preserve FPU, MMX and SSE state in temporary static buffer */
 	movw	%cs:rm_ds, %ds
+	fxsave	( rm_tmpbuf + VC_TMP_FXSAVE )
+
+	/* Preserve GDT and IDT in temporary static buffer */
 	sidt	( rm_tmpbuf + VC_TMP_IDT )
 	sgdt	( rm_tmpbuf + VC_TMP_GDT )

@@ -1066,6 +1070,9 @@ vc_rmode:
 	movl	$MSR_EFER, %ecx
 	wrmsr
 .endif
+
+	/* Restore FPU, MMX and SSE state from temporary static buffer */
+	fxrstor	( rm_tmpbuf + VC_TMP_FXSAVE )
+
 	/* Restore registers and flags and return */
 	popl	%eax /* skip %cs and %ss */
 	popw	%ds