This is v3 of this series. Again, it took me a while to complete the updates as support for virtual-8086 mode required extra rework. The two previous submissions can be found here [1] and here [2].
User-Mode Instruction Prevention (UMIP) is a security feature present in new Intel processors. If enabled, it prevents the execution of certain instructions when the Current Privilege Level (CPL) is greater than 0. If these instructions were executed while in CPL > 0, user-space applications could have access to system-wide settings such as the bases of the global and interrupt descriptor tables, and the segment selectors of the current task state segment and the local descriptor table.
These are the instructions covered by UMIP:

* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
If any of these instructions is executed while CPL > 0 and UMIP is enabled, a general protection exception is issued.
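To make the effect concrete, here is a small user-space sketch (my own illustration, not part of the series) that executes SGDT at CPL 3. Without UMIP it prints the real GDT base and limit; with UMIP enabled it triggers a general protection fault, unless the kernel fixes it up as proposed below:

#include <stdio.h>
#include <stdint.h>

/*
 * Illustration only: sgdt executed at CPL 3. With UMIP enabled this
 * instruction causes a general protection fault; with the emulation
 * proposed in this series, dummy values are returned instead.
 */
struct __attribute__((packed)) table_desc {
	uint16_t limit;
	uint64_t base;
};

int main(void)
{
	struct table_desc gdt = { 0 };

	asm volatile ("sgdt %0" : "=m" (gdt));
	printf("GDT base: 0x%llx, limit: 0x%x\n",
	       (unsigned long long)gdt.base, gdt.limit);
	return 0;
}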
There is a caveat, however. Certain applications rely on some of these instructions to function. Examples of this are applications that use WineHQ [3]. For instance, these applications rely on SIDT returning a non-accessible memory location [4]. During the discussions, it was proposed that the fault could be relayed to user space and the emulation performed there. However, this would break existing applications until, for instance, they update to a new WineHQ version; to avoid such breakage, this approach would require UMIP to be disabled by default. The consensus in this forum is to always enable it.
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[5] does not support applications that use these instructions. It relies on WineHQ for this[6]. Furthermore, the applications for which the concern was raised run in protected mode [4].
This version keeps UMIP enabled at all times and by default. If a general protection fault caused by one of the instructions protected by UMIP is detected, such a fault will be fixed up by returning dummy values as follows:
* SGDT and SIDT return hard-coded dummy values as the base of the global descriptor and interrupt descriptor tables. These hard-coded values are located within a memory map hole in x86_64. For x86_32, the same values are used but truncated to 4 bytes. This is also the case for virtual-8086 mode tasks, in which the base is truncated to 3 bytes. In all cases, the limit of the table is set to 0.
* STR and SLDT return 0 as the segment selector. This looks appropriate since we are providing a dummy value as the base address of the global descriptor table.
* SMSW returns the value with which the CR0 register is programmed in head_32/64.S at boot time. That is, the following bits are enabled: CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for Extension Type (which will always be 1 in recent processors with UMIP), CR0.5 for Numeric Error, CR0.16 for Write Protect and CR0.18 for Alignment Mask. Additionally, in x86_64, CR0.31 for Paging is set. A sketch of how this value can be composed follows the list.
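As a rough sketch (mine, not necessarily the exact code in the patches), the dummy CR0 value returned by the emulated SMSW could be composed from the existing X86_CR0_* masks:

#include <asm/processor-flags.h>

/*
 * Sketch of the value returned by the emulated SMSW; the masks come from
 * arch/x86/include/uapi/asm/processor-flags.h. The actual patch may
 * construct this value differently.
 */
static unsigned long umip_dummy_cr0(void)
{
	unsigned long cr0 = X86_CR0_PE | X86_CR0_MP | X86_CR0_ET |
			    X86_CR0_NE | X86_CR0_WP | X86_CR0_AM;

#ifdef CONFIG_X86_64
	cr0 |= X86_CR0_PG;
#endif
	/*
	 * When the destination is a memory operand, SMSW stores only the
	 * lower 16 bits of this value.
	 */
	return cr0;
}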
The proposed emulation code handles faults that happen in both protected and virtual-8086 mode.
The code for Intel MPX (Memory Protection Extensions) that parses opcodes and the memory locations contained in the general purpose registers when used as operands proved very useful. I put some of this code in a separate library file that both MPX and UMIP can access, to avoid code duplication. While here, I fixed two small bugs that I found in the MPX implementation. This new library was also extended to handle 16-bit address encodings such as those found in virtual-8086 mode tasks.
A set of representative selftests for virtual-8086 mode tasks is included as part of this series. The tested cases use displacements and registers to indicate memory addresses, as well as registers as operands.
Extensive test cases were run to exercise the page fault that is emulated when the memory location to which the instruction results are written is not accessible [7], to test almost all combinations of operands (ModRM, SIB, REX prefixes and displacements) in protected mode [8], and to test almost all combinations of operands in virtual-8086 mode [9]. If there is interest, this code could also be submitted for the kernel selftests.
[1]. https://lwn.net/Articles/705877/
[2]. https://lkml.org/lkml/2016/12/23/265
[3]. https://www.winehq.org/
[4]. https://www.winehq.org/pipermail/wine-devel/2016-November/115320.html
[5]. http://www.dosemu.org/
[6]. http://marc.info/?l=linux-kernel&m=147876798717927&w=2
[7]. https://github.com/01org/luv-yocto/blob/rneri/umip/meta-luv/recipes-core/umi...
[8]. https://github.com/01org/luv-yocto/blob/rneri/umip/meta-luv/recipes-core/umi...
[9]. https://github.com/01org/luv-yocto/blob/rneri/umip/meta-luv/recipes-kernel/l...
Thanks and BR, Ricardo
Changes since V2:
* Added new utility functions to decode the memory addresses contained in registers when the 16-bit addressing encodings are used. This includes code to obtain and compute memory addresses using segment selectors for real-mode address translation.
* Added support to emulate UMIP-protected instructions for virtual-8086 tasks.
* Added selftests for virtual-8086 mode that contain representative use cases: address represented as a displacement, address in registers and registers as operands.
* Instead of maintaining a static variable for the dummy base addresses of the IDT and GDT, a hard-coded value is used.
* The emulated SMSW instruction now returns the value with which the CR0 register is programmed in head_32/64.S. This is: PE | MP | ET | NE | WP | AM. For x86_64, PG is also enabled.
* The new file arch/x86/lib/insn-utils.c is now renamed to arch/x86/lib/insn-kernel.c. It also has its own header. This helps keep the kernel and objtool instruction decoders in sync. Also, the new insn-kernel.c contains utility functions that are only relevant in a kernel context.
* Removed printed warnings for errors that occur when decoding instructions with invalid operands.
* Added more comments on the fixes in the instruction-decoding MPX functions.
* Now user_64bit_mode(regs) is used instead of test_thread_flag(TIF_IA32) to determine if the task is 32-bit or 64-bit.
* Found and fixed a bug in the insn decoder in which X86_MODRM_RM was incorrectly used to obtain the mod part of the ModRM byte.
* Added more explanatory comments to the emulation and instruction decoding code. This includes a comment noting that copy_from_user could fail if a memory protection key is in place.
* Tested code with CONFIG_X86_DECODER_SELFTEST=y and everything passes now.
* Prefixed get_reg_offset_rm with insn_ as this function is exposed via a header file. For clarity, this function was added in a separate patch.
Changes since V1:
* Virtual-8086 mode tasks are not treated in a special manner. All code for this purpose was removed.
* Instead of attempting to disable UMIP during a context switch or when entering virtual-8086 mode, UMIP remains enabled all the time. General protection faults that occur are fixed up by returning dummy values as detailed above.
* Removed the umip= kernel parameter in favor of using clearcpuid=514 to disable UMIP.
* Removed selftests designed to detect the absence of SIGSEGV signals when running in virtual-8086 mode.
* Reused code from MPX to decode instruction operands. For this purpose, code was put in a common location.
* Fixed two bugs in the MPX code that decodes operands.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Shuah Khan shuah@kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z Li liang.z.li@intel.com Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org
Ricardo Neri (10):
  x86/mpx: Do not use SIB index if index points to R/ESP
  x86/mpx: Fail decoding when SIB base is R/EBP and no displacement is used
  x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel
  x86/insn-kernel: Add a function to obtain register offset in ModRM
  x86/insn-kernel: Add support to resolve 16-bit addressing encodings
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
  x86: Add emulation code for UMIP instructions
  x86/traps: Fixup general protection faults caused by UMIP
  x86: Enable User-Mode Instruction Prevention
  selftests/x86: Add tests for User-Mode Instruction Prevention
 arch/x86/Kconfig                              |  10 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/insn-kernel.h            |  17 ++
 arch/x86/include/asm/umip.h                   |  15 ++
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  16 +-
 arch/x86/kernel/traps.c                       |   4 +
 arch/x86/kernel/umip.c                        | 251 +++++++++++++++++++
 arch/x86/lib/Makefile                         |   2 +-
 arch/x86/lib/insn-kernel.c                    | 344 ++++++++++++++++++++++++++
 arch/x86/mm/mpx.c                             | 120 +--------
 tools/testing/selftests/x86/entry_from_vm86.c |  39 ++-
 14 files changed, 708 insertions(+), 122 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-kernel.h
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c
 create mode 100644 arch/x86/lib/insn-kernel.c
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when memory addressing is used (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of the SIB byte points to the R/ESP (i.e., index = 4), the index should not be used in the computation of the memory address.
In these cases the address is simply the value present in the register pointed to by the base part of the SIB byte plus the displacement byte.
An example of such an instruction is:
insn -0x80(%rsp)
This is represented as:
[opcode] 4c 23 80
ModR/M=0x4c: mod: 0x1, reg: 0x1, r/m: 0x4 (R/ESP)
SIB=0x23: scale: 0, index: 0b100 (R/ESP), base: 0b011 (R/EBX)
Displacement: -0x80
The correct address is (base) + displacement; no index is used.
We can achieve the desired effect of not using the index by making get_reg_offset return a negative offset in this particular case. A negative offset indicates to callers that they should not use the index to calculate the address. This is equivalent to using an index of zero.
Care is taken to allow R12 to be used as index, which is a valid scenario.
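A quick user-space check (again, just an illustration with made-up values, not part of the patch) that R12, unlike R/ESP, can legitimately appear in the index field of a SIB byte; it is encoded as index = 4 together with REX.X = 1:

#include <stdio.h>

int main(void)
{
	unsigned long base = 0x1000, idx = 0x10, out;

	/* lea 0x8(base, r12, 2): SIB index field is 4 and REX.X is set */
	asm volatile ("mov %[i], %%r12\n\t"
		      "lea 0x8(%[b], %%r12, 2), %[o]"
		      : [o] "=r" (out)
		      : [b] "r" (base), [i] "r" (idx)
		      : "r12");

	printf("0x%lx\n", out);	/* 0x1000 + 2 * 0x10 + 8 = 0x1028 */
	return 0;
}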
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/mm/mpx.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index af59f80..9d15f6b 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -109,6 +109,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, regno = X86_SIB_INDEX(insn->sib.value); if (X86_REX_X(insn->rex_prefix.value)) regno += 8; + /* + * If mod !=3, register R/ESP (regno=4) is not used as index in + * the address computation. Check is done after looking at REX.X + * This is because R12 (regno=12) can be used as an index. + */ + if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3) + return -EINVAL; break;
case REG_TYPE_BASE: @@ -157,11 +164,16 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) goto out_err;
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); + /* + * A negative offset means that the register cannot be + * be used as an index. + */ if (indx_offset < 0) - goto out_err; + indx = 0; + else + indx = regs_get_register(regs, indx_offset);
base = regs_get_register(regs, base_offset); - indx = regs_get_register(regs, indx_offset); addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when a SIB byte is used and the base of the SIB byte points to R/EBP (i.e., base = 5), an explicit displacement is needed. That is, the mod part of the ModRM byte cannot be zero. If no displacement is actually needed, the mod part of the ModRM byte must be 1 and the displacement byte must be set to zero.
Make the address decoder return -EINVAL in the aforementioned case.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/mm/mpx.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 9d15f6b..c59a851 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -120,6 +120,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
case REG_TYPE_BASE: regno = X86_SIB_BASE(insn->sib.value); + /* + * If R/EBP (regno = 5) is indicated in the base part of the SIB + * byte, an explicit displacement must be specified. In other + * words, the mod part of the ModRM byte cannot be zero. + */ + if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) + return -EINVAL; + if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break;
Other kernel submodules can benefit from using the utility functions defined in mpx.c to obtain the addresses and values of operands contained in the general purpose registers. An instance of this is the emulation code used for instructions protected by the Intel User-Mode Instruction Prevention feature.
Thus, these functions are relocated to a new insn-kernel.c file. The reason for not relocating these utilities into insn.c is that the latter solely analyses instructions given by a struct insn, without any knowledge of the kernel context. The new insn-kernel.c is meant to be used within the context of the kernel. For instance, it can be used to determine memory addresses as encoded in the contents of the general purpose registers.
These utilities come with a separate header. This is to avoid taking insn.c out of sync with the instruction decoders under tools/objtool and tools/perf. It also avoids adding cumbersome #ifdef's for the #include'd files required to decode instructions in a kernel context.
Functions are simply relocated. There are no functional or indentation changes.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-kernel.h | 16 ++++ arch/x86/lib/Makefile | 2 +- arch/x86/lib/insn-kernel.c | 147 +++++++++++++++++++++++++++++++++++++ arch/x86/mm/mpx.c | 140 +---------------------------------- 4 files changed, 166 insertions(+), 139 deletions(-) create mode 100644 arch/x86/include/asm/insn-kernel.h create mode 100644 arch/x86/lib/insn-kernel.c
diff --git a/arch/x86/include/asm/insn-kernel.h b/arch/x86/include/asm/insn-kernel.h new file mode 100644 index 0000000..aef416a --- /dev/null +++ b/arch/x86/include/asm/insn-kernel.h @@ -0,0 +1,16 @@ +#ifndef _ASM_X86_INSN_KERNEL_H +#define _ASM_X86_INSN_KERNEL_H +/* + * A collection of utility functions for x86 instruction analysis to be + * used in a kernel context. Useful when, for instance, making sense + * of the registers indicated by operands. + */ + +#include <linux/compiler.h> +#include <linux/bug.h> +#include <linux/err.h> +#include <asm/ptrace.h> + +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); + +#endif /* _ASM_X86_INSN_KERNEL_H */ diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile index 34a7413..d33eff1 100644 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o lib-y += memcpy_$(BITS).o lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o -lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o +lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-kernel.o lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o diff --git a/arch/x86/lib/insn-kernel.c b/arch/x86/lib/insn-kernel.c new file mode 100644 index 0000000..8072abe --- /dev/null +++ b/arch/x86/lib/insn-kernel.c @@ -0,0 +1,147 @@ +/* + * Utility functions for x86 operand and address decoding + * + * Copyright (C) Intel Corporation 2016 + */ +#include <linux/kernel.h> +#include <linux/string.h> +#include <asm/inat.h> +#include <asm/insn.h> +#include <asm/insn-kernel.h> + +enum reg_type { + REG_TYPE_RM = 0, + REG_TYPE_INDEX, + REG_TYPE_BASE, +}; + +static int get_reg_offset(struct insn *insn, struct pt_regs *regs, + enum reg_type type) +{ + int regno = 0; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + int nr_registers = ARRAY_SIZE(regoff); + /* + * Don't possibly decode a 32-bit instructions as + * reading a 64-bit-only register. + */ + if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64) + nr_registers -= 8; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(insn->modrm.value); + if (X86_REX_B(insn->rex_prefix.value)) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(insn->sib.value); + if (X86_REX_X(insn->rex_prefix.value)) + regno += 8; + /* + * If mod !=3, register R/ESP (regno=4) is not used as index in + * the address computation. Check is done after looking at REX.X + * This is because R12 (regno=12) can be used as an index. + */ + if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3) + return -EINVAL; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(insn->sib.value); + /* + * If R/EBP (regno = 5) is indicated in the base part of the SIB + * byte, an explicit displacement must be specified. In other + * words, the mod part of the ModRM byte cannot be zero. + */ + if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) + return -EINVAL; + + if (X86_REX_B(insn->rex_prefix.value)) + regno += 8; + break; + + default: + pr_err("invalid register type"); + BUG(); + break; + } + + if (regno >= nr_registers) { + WARN_ONCE(1, "decoded an instruction with an invalid register"); + return -EINVAL; + } + return regoff[regno]; +} + +/* + * return the address being referenced be instruction + * for rm=3 returning the content of the rm reg + * for rm!=3 calculates the address using SIB and Disp + */ +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +{ + unsigned long addr, base, indx; + int addr_offset, base_offset, indx_offset; + insn_byte_t sib; + + insn_get_modrm(insn); + insn_get_sib(insn); + sib = insn->sib.value; + + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset < 0) + goto out_err; + addr = regs_get_register(regs, addr_offset); + } else { + if (insn->sib.nbytes) { + base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); + if (base_offset < 0) + goto out_err; + + indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); + /* + * A negative offset means that the register cannot be + * be used as an index. 
+ */ + if (indx_offset < 0) + indx = 0; + else + indx = regs_get_register(regs, indx_offset); + + base = regs_get_register(regs, base_offset); + addr = base + indx * (1 << X86_SIB_SCALE(sib)); + } else { + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset < 0) + goto out_err; + addr = regs_get_register(regs, addr_offset); + } + addr += insn->displacement.value; + } + return (void __user *)addr; +out_err: + return (void __user *)-1; +} diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index c59a851..ca6fe13 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -11,6 +11,7 @@ #include <linux/sched/sysctl.h>
#include <asm/insn.h> +#include <asm/insn-kernel.h> #include <asm/mman.h> #include <asm/mmu_context.h> #include <asm/mpx.h> @@ -59,143 +60,6 @@ static unsigned long mpx_mmap(unsigned long len) return addr; }
-enum reg_type { - REG_TYPE_RM = 0, - REG_TYPE_INDEX, - REG_TYPE_BASE, -}; - -static int get_reg_offset(struct insn *insn, struct pt_regs *regs, - enum reg_type type) -{ - int regno = 0; - - static const int regoff[] = { - offsetof(struct pt_regs, ax), - offsetof(struct pt_regs, cx), - offsetof(struct pt_regs, dx), - offsetof(struct pt_regs, bx), - offsetof(struct pt_regs, sp), - offsetof(struct pt_regs, bp), - offsetof(struct pt_regs, si), - offsetof(struct pt_regs, di), -#ifdef CONFIG_X86_64 - offsetof(struct pt_regs, r8), - offsetof(struct pt_regs, r9), - offsetof(struct pt_regs, r10), - offsetof(struct pt_regs, r11), - offsetof(struct pt_regs, r12), - offsetof(struct pt_regs, r13), - offsetof(struct pt_regs, r14), - offsetof(struct pt_regs, r15), -#endif - }; - int nr_registers = ARRAY_SIZE(regoff); - /* - * Don't possibly decode a 32-bit instructions as - * reading a 64-bit-only register. - */ - if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64) - nr_registers -= 8; - - switch (type) { - case REG_TYPE_RM: - regno = X86_MODRM_RM(insn->modrm.value); - if (X86_REX_B(insn->rex_prefix.value)) - regno += 8; - break; - - case REG_TYPE_INDEX: - regno = X86_SIB_INDEX(insn->sib.value); - if (X86_REX_X(insn->rex_prefix.value)) - regno += 8; - /* - * If mod !=3, register R/ESP (regno=4) is not used as index in - * the address computation. Check is done after looking at REX.X - * This is because R12 (regno=12) can be used as an index. - */ - if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3) - return -EINVAL; - break; - - case REG_TYPE_BASE: - regno = X86_SIB_BASE(insn->sib.value); - /* - * If R/EBP (regno = 5) is indicated in the base part of the SIB - * byte, an explicit displacement must be specified. In other - * words, the mod part of the ModRM byte cannot be zero. - */ - if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) - return -EINVAL; - - if (X86_REX_B(insn->rex_prefix.value)) - regno += 8; - break; - - default: - pr_err("invalid register type"); - BUG(); - break; - } - - if (regno >= nr_registers) { - WARN_ONCE(1, "decoded an instruction with an invalid register"); - return -EINVAL; - } - return regoff[regno]; -} - -/* - * return the address being referenced be instruction - * for rm=3 returning the content of the rm reg - * for rm!=3 calculates the address using SIB and Disp - */ -static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) -{ - unsigned long addr, base, indx; - int addr_offset, base_offset, indx_offset; - insn_byte_t sib; - - insn_get_modrm(insn); - insn_get_sib(insn); - sib = insn->sib.value; - - if (X86_MODRM_MOD(insn->modrm.value) == 3) { - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - addr = regs_get_register(regs, addr_offset); - } else { - if (insn->sib.nbytes) { - base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); - if (base_offset < 0) - goto out_err; - - indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); - /* - * A negative offset means that the register cannot be - * be used as an index. 
- */ - if (indx_offset < 0) - indx = 0; - else - indx = regs_get_register(regs, indx_offset); - - base = regs_get_register(regs, base_offset); - addr = base + indx * (1 << X86_SIB_SCALE(sib)); - } else { - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - addr = regs_get_register(regs, addr_offset); - } - addr += insn->displacement.value; - } - return (void __user *)addr; -out_err: - return (void __user *)-1; -} - static int mpx_insn_decode(struct insn *insn, struct pt_regs *regs) { @@ -308,7 +172,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs) info->si_signo = SIGSEGV; info->si_errno = 0; info->si_code = SEGV_BNDERR; - info->si_addr = mpx_get_addr_ref(&insn, regs); + info->si_addr = insn_get_addr_ref(&insn, regs); /* * We were not able to extract an address from the instruction, * probably because there was something invalid in it.
On Wed, 25 Jan 2017 12:23:46 -0800 Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Other kernel submodules can benefit from using the utility functions defined in mpx.c to obtain the addresses and values of operands contained in the general purpose registers. An instance of this is the emulation code used for instructions protected by the Intel User-Mode Instruction Prevention feature.
Thus, these functions are relocated to a new insn-kernel.c file. The reason to not relocate these utilities into insn.c is that the latter solely analyses instructions given by a struct insn without any knowledge of the kernel context. This new utilities insn-kernel.c aims to be used within the context of the kernel. For instance, it can be used to determine memory addresses as encoded in the contents of the general purpose registers.
What do you mean by the "kernel context" here? Extracting the register offset or an address by decoding an instruction does not seem to depend on whether the context (pt_regs) is kernel or user...
Of course, this is a kind of "evaluation" of an instruction, so it might be better to split it into another file, but I think insn-eval.c is a better name.
Thank you,
The function insn_get_reg_offset requires a type to indicate whether the returned offset is that given by the ModRM or the SIB byte. Callers of this function would need the definition of that type. This is not needed. Instead, auxiliary functions can be defined for this purpose.
When the operand is a register, the emulation code for User-Mode Instruction Prevention needs to know the offset of the register indicated in the r/m part of the ModRM byte. Thus, start by adding an auxiliary function for this purpose.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-kernel.h | 1 + arch/x86/lib/insn-kernel.c | 5 +++++ 2 files changed, 6 insertions(+)
diff --git a/arch/x86/include/asm/insn-kernel.h b/arch/x86/include/asm/insn-kernel.h index aef416a..3f34649 100644 --- a/arch/x86/include/asm/insn-kernel.h +++ b/arch/x86/include/asm/insn-kernel.h @@ -12,5 +12,6 @@ #include <asm/ptrace.h>
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_rm(struct insn *insn, struct pt_regs *regs);
#endif /* _ASM_X86_INSN_KERNEL_H */ diff --git a/arch/x86/lib/insn-kernel.c b/arch/x86/lib/insn-kernel.c index 8072abe..267cab4 100644 --- a/arch/x86/lib/insn-kernel.c +++ b/arch/x86/lib/insn-kernel.c @@ -95,6 +95,11 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, return regoff[regno]; }
+int insn_get_reg_offset_rm(struct insn *insn, struct pt_regs *regs) +{ + return get_reg_offset(insn, regs, REG_TYPE_RM); +} + /* * return the address being referenced be instruction * for rm=3 returning the content of the rm reg
On Wed, 25 Jan 2017 12:23:47 -0800 Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
The function insn_get_reg_offset requires a type to indicate whether the returned offset is that given by by the ModRM or the SIB byte. Callers of this function would need the definition of the type struct. This is not needed. Instead, auxiliary functions can be defined for this purpose.
When the operand is a register, the emulation code for User-Mode Instruction Prevention needs to know the offset of the register indicated in the r/m part of the ModRM byte. Thus, start by adding an auxiliary function for this purpose.
Hmm, why wouldn't you just rename it to insn_get_reg_offset() and export it?
Thank you,
Hi Masami,
On Thu, 2017-01-26 at 11:11 +0900, Masami Hiramatsu wrote:
On Wed, 25 Jan 2017 12:23:47 -0800 Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
The function insn_get_reg_offset requires a type to indicate whether the returned offset is that given by by the ModRM or the SIB byte. Callers of this function would need the definition of the type struct. This is not needed. Instead, auxiliary functions can be defined for this purpose.
When the operand is a register, the emulation code for User-Mode Instruction Prevention needs to know the offset of the register indicated in the r/m part of the ModRM byte. Thus, start by adding an auxiliary function for this purpose.
Hmm, why wouldn't you just rename it to insn_get_reg_offset() and export it?
Do you mean exporting the structure that I mention above? The problem that I am trying to solve is that callers sometimes want to know the offset of the register encoded in the SIB or the ModRM bytes. I could use something like
insn_get_reg_offset(insn, regs, INSN_TYPE_MODRM)
insn_get_reg_offset(insn, regs, INSN_TYPE_SIB)
Instead, I opted for
insn_get_reg_offset_rm(insn, regs)
insn_get_reg_offset_sib(insn, regs)
to avoid exposing an enum with the INSN_TYPE_MODRM, INSN_TYPE_SIB.
If you feel that the former makes more sense, I can change the implementation.
Thanks and BR, Ricardo
On Wed, 25 Jan 2017 22:07:16 -0800 Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Hi Masami,
On Thu, 2017-01-26 at 11:11 +0900, Masami Hiramatsu wrote:
On Wed, 25 Jan 2017 12:23:47 -0800 Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
The function insn_get_reg_offset requires a type to indicate whether the returned offset is that given by by the ModRM or the SIB byte. Callers of this function would need the definition of the type struct. This is not needed. Instead, auxiliary functions can be defined for this purpose.
When the operand is a register, the emulation code for User-Mode Instruction Prevention needs to know the offset of the register indicated in the r/m part of the ModRM byte. Thus, start by adding an auxiliary function for this purpose.
Hmm, why wouldn't you just rename it to insn_get_reg_offset() and export it?
Do you mean exporting the structure that I mention above? The problem that I am trying to solve is that callers sometimes want to know the offset of the register encoded in the SiB or the ModRM bytes. I could use something
insn_get_reg_offset(insn, regs, INSN_TYPE_MODRM) insn_get_reg_offset(insn, regs, INSN_TYPE_SIB)
Instead, I opted for
insn_get_reg_offset_rm(insn, regs) insn_get_reg_offset_sib(insn, regs)
to avoid exposing an enum with the INSN_TYPE_MODRM, INSN_TYPE_SIB.
OK, if so, I think you should export both of them at once, not only insn_get_reg_offset_rm().
Thank you,
On Fri, 2017-01-27 at 16:53 +0900, Masami Hiramatsu wrote:
On Wed, 25 Jan 2017 22:07:16 -0800 Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Hi Masami,
On Thu, 2017-01-26 at 11:11 +0900, Masami Hiramatsu wrote:
On Wed, 25 Jan 2017 12:23:47 -0800 Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
The function insn_get_reg_offset requires a type to indicate whether the returned offset is that given by by the ModRM or the SIB byte. Callers of this function would need the definition of the type struct. This is not needed. Instead, auxiliary functions can be defined for this purpose.
When the operand is a register, the emulation code for User-Mode Instruction Prevention needs to know the offset of the register indicated in the r/m part of the ModRM byte. Thus, start by adding an auxiliary function for this purpose.
Hmm, why wouldn't you just rename it to insn_get_reg_offset() and export it?
Do you mean exporting the structure that I mention above? The problem that I am trying to solve is that callers sometimes want to know the offset of the register encoded in the SiB or the ModRM bytes. I could use something
insn_get_reg_offset(insn, regs, INSN_TYPE_MODRM) insn_get_reg_offset(insn, regs, INSN_TYPE_SIB)
Instead, I opted for
insn_get_reg_offset_rm(insn, regs) insn_get_reg_offset_sib(insn, regs)
to avoid exposing an enum with the INSN_TYPE_MODRM, INSN_TYPE_SIB.
OK, if so, I think you should export both of them at once, not only insn_get_reg_offset_rm().
Sure, I will include both functions.
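For what it's worth, a minimal sketch of how the exported pair could look. The _rm variant matches this patch; insn_get_reg_offset_sib() does not exist in this version, so its name and whether it resolves the SIB base (as assumed here) or the index are still open:

int insn_get_reg_offset_rm(struct insn *insn, struct pt_regs *regs)
{
	return get_reg_offset(insn, regs, REG_TYPE_RM);
}

/* Hypothetical counterpart; REG_TYPE_BASE is only an assumption here. */
int insn_get_reg_offset_sib(struct insn *insn, struct pt_regs *regs)
{
	return get_reg_offset(insn, regs, REG_TYPE_BASE);
}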
Thanks and BR, Ricardo
Tasks running in virtual-8086 mode will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, addresses can be indicated by the addition of the value of two registers. Also, there is no support for SiB bytes. Thus, a separate function is needed to parse this form of addressing.
Furthermore, virtual-8086 mode tasks will use real-mode addressing. This implies that the segment selectors do not point to a segment descriptor but are used to compute logical addresses. Hence, there is a need to add support to compute addresses using the segment selectors. If segment-override prefixes are present in the instruction, they take precedence.
Lastly, it is important to note that when a task is running in virtual-8086 mode and an interrupt/exception occurs, the CPU pushes to the stack the segment selectors for ds, es, fs and gs. These are accessible via the struct kernel_vm86_regs rather than pt_regs.
Code for 16-bit addressing encodings is likely to be used only by virtual- 8086 mode tasks. Thus, this code is wrapped to be built only if the option CONFIG_VM86 is selected.
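As a self-contained illustration (made-up values, not taken from the patch) of the arithmetic the new 16-bit helper performs for a two-register form such as (%bx,%si) with a displacement and no segment-override prefix:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint16_t ds = 0x2000;	/* default segment selector for this form */
	uint16_t bx = 0x0100;	/* first register from the 16-bit r/m table */
	uint16_t si = 0x0020;	/* second register from the 16-bit r/m table */
	int32_t disp = -4;	/* displacement taken from the instruction */

	/* Real-mode translation: shift the selector and add the offset */
	uint32_t linear = ((uint32_t)ds << 4) + bx + si + disp;

	printf("linear address: 0x%05x\n", linear);	/* 0x2011c */
	return 0;
}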
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-kernel.c | 192 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+)
diff --git a/arch/x86/lib/insn-kernel.c b/arch/x86/lib/insn-kernel.c index 267cab4..10afdf3 100644 --- a/arch/x86/lib/insn-kernel.c +++ b/arch/x86/lib/insn-kernel.c @@ -8,6 +8,7 @@ #include <asm/inat.h> #include <asm/insn.h> #include <asm/insn-kernel.h> +#include <asm/vm86.h>
enum reg_type { REG_TYPE_RM = 0, @@ -95,6 +96,194 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, return regoff[regno]; }
+#ifdef CONFIG_VM86 + +/* + * Obtain the segment selector to use based on any prefixes in the instruction + * or in the offset of the register given by the r/m part of the ModRM byte. The + * register offset is as found in struct pt_regs. + */ +static unsigned short __get_segment_selector_16(struct pt_regs *regs, + struct insn *insn, int regoff) +{ + int i; + + struct kernel_vm86_regs *vm86regs = (struct kernel_vm86_regs *)regs; + + /* + * If not in virtual-8086 mode, the segment selector is not used + * to compute addresses but to select the segment descriptor. Return + * 0 to ease the computation of address. + */ + if (!v8086_mode(regs)) + return 0; + + insn_get_prefixes(insn); + + /* Check first if we have selector overrides */ + for (i = 0; i < insn->prefixes.nbytes; i++) { + switch (insn->prefixes.bytes[i]) { + /* + * Code and stack segment selector register are saved in all + * processor modes. Thus, it makes sense to take them + * from pt_regs. + */ + case 0x2e: + return (unsigned short)regs->cs; + case 0x36: + return (unsigned short)regs->ss; + /* + * The rest of the segment selector registers are only saved + * in virtual-8086 mode. Thus, we must obtain them from the + * vm86 register structure. + */ + case 0x3e: + return vm86regs->ds; + case 0x26: + return vm86regs->es; + case 0x64: + return vm86regs->fs; + case 0x65: + return vm86regs->gs; + /* + * No default action needed. We simply did not find any + * relevant prefixes. + */ + } + } + + /* + * If no overrides, use default selectors as described in the + * Intel documentationn. + */ + switch (regoff) { + case -EINVAL: /* no register involved in address computation */ + case offsetof(struct pt_regs, bx): + case offsetof(struct pt_regs, di): + case offsetof(struct pt_regs, si): + return vm86regs->ds; + case offsetof(struct pt_regs, bp): + case offsetof(struct pt_regs, sp): + return (unsigned short)regs->ss; + /* ax, cx, dx are not valid registers for 16-bit addressing*/ + default: + return -EINVAL; + } +} + +/* + * Obtain offsets from pt_regs to the two registers indicated by the + * r/m part of the ModRM byte. A negative offset indicates that the + * register should not be used. + */ +static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs, + int *offs1, int *offs2) +{ + /* 16-bit addressing can use one or two registers */ + static const int regoff1[] = { + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bx), + }; + + static const int regoff2[] = { + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + -EINVAL, + -EINVAL, + -EINVAL, + -EINVAL, + }; + + if (!offs1 || !offs2) + return -EINVAL; + + /* operand is a register, use the generic function */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + *offs1 = insn_get_reg_offset_rm(insn, regs); + *offs2 = -EINVAL; + return 0; + } + + *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)]; + *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)]; + + /* + * If no displacement is indicated in the mod part of the ModRM byte, + * (mod part is 0) and the r/m part of the same byte is 6, no register + * is used caculate the operand address. An r/m part of 6 means that + * the second register offset is already invalid. 
+ */ + if ((X86_MODRM_MOD(insn->modrm.value) == 0) && + (X86_MODRM_RM(insn->modrm.value) == 6)) + *offs1 = -EINVAL; + + return 0; +} + +static void __user *insn_get_addr_ref_16(struct insn *insn, + struct pt_regs *regs) +{ + unsigned long addr; + unsigned short addr1 = 0, addr2 = 0; + int addr_offset1, addr_offset2; + int ret; + unsigned short seg = 0; + + insn_get_displacement(insn); + + /* + * If operand is a register, the layout is the same as in + * 32-bit and 64-bit addressing. + */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset1 < 0) + goto out_err; + seg = __get_segment_selector_16(regs, insn, addr_offset1); + addr = (seg << 4) + regs_get_register(regs, addr_offset1); + } else { + ret = get_reg_offset_16(insn, regs, &addr_offset1, + &addr_offset2); + if (ret < 0) + goto out_err; + /* + * Don't fail on invalid offset values. They might be invalid + * because they are not supported. Instead, use them in the + * calculation only if they contain a valid value. + */ + if (addr_offset1 >= 0) + addr1 = regs_get_register(regs, addr_offset1); + if (addr_offset2 >= 0) + addr2 = regs_get_register(regs, addr_offset2); + seg = __get_segment_selector_16(regs, insn, addr_offset1); + if (seg < 0) + goto out_err; + addr = (seg << 4) + addr1 + addr2; + } + addr += insn->displacement.value; + + return (void __user *)addr; +out_err: + return (void __user *)-1; +} +#else + +static void __user *insn_get_addr_ref_16(struct insn *insn, + struct pt_regs *regs) +{ + return (void __user *)-1; +} + +#endif /* CONFIG_VM86 */ + int insn_get_reg_offset_rm(struct insn *insn, struct pt_regs *regs) { return get_reg_offset(insn, regs, REG_TYPE_RM); @@ -111,6 +300,9 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) int addr_offset, base_offset, indx_offset; insn_byte_t sib;
+ if (insn->addr_bytes == 2) + return insn_get_addr_ref_16(insn, regs); + insn_get_modrm(insn); insn_get_sib(insn); sib = insn->sib.value;
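For context, a rough sketch of how a caller is expected to drive this library for a virtual-8086 mode instruction (the function name is made up for illustration; instruction fetch and error handling are omitted; the UMIP #GP fixup later in this series does essentially this):

	static void __user *decode_vm86_operand(struct pt_regs *regs,
						unsigned char *buf, int nr_copied)
	{
		struct insn insn;

		/* vm86 code is never 64-bit, hence the last argument is 0 */
		insn_init(&insn, buf, nr_copied, 0);

		/* virtual-8086 mode defaults to 16-bit address and operand sizes */
		insn.addr_bytes = 2;
		insn.opnd_bytes = 2;

		/* parses prefixes, opcode, ModRM and displacement */
		insn_get_length(&insn);

		/* dispatches to insn_get_addr_ref_16() and applies (segment << 4) */
		return insn_get_addr_ref(&insn, regs);
	}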
On Wed, Jan 25, 2017 at 12:23 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Tasks running in virtual-8086 mode will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, addresses can be indicated by the addition of the value of two registers. Also, there is no support for SiB bytes. Thus, a separate function is needed to parse this form of addressing.
Furthermore, virtual-8086 mode tasks will use real-mode addressing. This implies that the segment selectors do not point to a segment descriptor but are used to compute logical addresses. Hence, there is a need to add support to compute addresses using the segment selectors. If segment-override prefixes are present in the instructions, they take precedence.
Lastly, it is important to note that when a task is running in virtual-8086 mode and an interrupt/exception occurs, the CPU pushes to the stack the segment selectors for ds, es, fs and gs. These are accessible via the struct kernel_vm86_regs rather than pt_regs.
Code for 16-bit addressing encodings is likely to be used only by virtual-8086 mode tasks. Thus, this code is wrapped to be built only if the option CONFIG_VM86 is selected.
That's not true. It's used in 16-bit protected mode, too. And there are (ugh!) six possibilities:
- Normal 32-bit protected mode. This should already work.
- Normal 64-bit protected mode. This should also already work. (I forget whether a 16-bit SS is either illegal or has no effect in this case.)
- Virtual 8086 mode
- Normal 16-bit protected mode, used by DOSEMU and Wine. (16-bit CS, 16-bit address segment)
- 16-bit CS, 32-bit address segment. IIRC this might be used by some 32-bit DOS programs to call BIOS.
- 32-bit CS, 16-bit address segment. I don't know whether anything uses this.
I don't know if anything you're doing cares about SS's, DS's, etc. size, but I suspect you'll need to handle 16-bit CS.
On January 25, 2017 1:58:27 PM PST, Andy Lutomirski luto@amacapital.net wrote:
On Wed, Jan 25, 2017 at 12:23 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Tasks running in virtual-8086 mode will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, addresses can be indicated by the addition of the value of two registers. Also, there is no support for SiB bytes. Thus, a separate function is needed to parse this form of addressing.
Furthermore, virtual-8086 mode tasks will use real-mode addressing. This implies that the segment selectors do not point to a segment descriptor but are used to compute logical addresses. Hence, there is a need to add support to compute addresses using the segment selectors. If segment-override prefixes are present in the instructions, they take precedence.
Lastly, it is important to note that when a task is running in virtual-8086 mode and an interrupt/exception occurs, the CPU pushes to the stack the segment selectors for ds, es, fs and gs. These are accessible via the struct kernel_vm86_regs rather than pt_regs.
Code for 16-bit addressing encodings is likely to be used only by virtual-8086 mode tasks. Thus, this code is wrapped to be built only if the option CONFIG_VM86 is selected.
That's not true. It's used in 16-bit protected mode, too. And there are (ugh!) six possibilities:
- Normal 32-bit protected mode. This should already work.
- Normal 64-bit protected mode. This should also already work. (I forget whether a 16-bit SS is either illegal or has no effect in this case.)
- Virtual 8086 mode
- Normal 16-bit protected mode, used by DOSEMU and Wine. (16-bit CS, 16-bit address segment)
- 16-bit CS, 32-bit address segment. IIRC this might be used by some 32-bit DOS programs to call BIOS.
- 32-bit CS, 16-bit address segment. I don't know whether anything uses this.
I don't know if anything you're doing cares about SS's, DS's, etc. size, but I suspect you'll need to handle 16-bit CS.
Only the CS bitness matters for the purpose of addressing modes; the SS bitness (which has no effect in 64-bit mode) only matters for implicit stack references unless I'm completely out to sea.
On Wed, 2017-01-25 at 13:58 -0800, Andy Lutomirski wrote:
On Wed, Jan 25, 2017 at 12:23 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Tasks running in virtual-8086 mode will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, addresses can be indicated by the addition of the value of two registers. Also, there is no support for SiB bytes. Thus, a separate function is needed to parse this form of addressing.
Furthermore, virtual-8086 mode tasks will use real-mode addressing. This implies that the segment selectors do not point to a segment descriptor but are used to compute logical addresses. Hence, there is a need to add support to compute addresses using the segment selectors. If segment-override prefixes are present in the instructions, they take precedence.
Lastly, it is important to note that when a task is running in virtual-8086 mode and an interrupt/exception occurs, the CPU pushes to the stack the segment selectors for ds, es, fs and gs. These are accessible via the struct kernel_vm86_regs rather than pt_regs.
Code for 16-bit addressing encodings is likely to be used only by virtual-8086 mode tasks. Thus, this code is wrapped to be built only if the option CONFIG_VM86 is selected.
That's not true. It's used in 16-bit protected mode, too. And there are (ugh!) six possibilities:
Thanks for the clarification. I will enable the decoding of addresses for 16-bit as well... and test the emulation code.
- Normal 32-bit protected mode. This should already work.
- Normal 64-bit protected mode. This should also already work. (I forget whether a 16-bit SS is either illegal or has no effect in this case.)
For these two cases I am just taking the effective address that the user space application provides, given that the segment selectors were set beforehand (and with a base of 0).
copy_to/from_user(kernel_buf, effective_address, sizeof(kernel_buf));
- Virtual 8086 mode
In this case I calculate the linear address as: (segment_select << 4) + effective address.
- Normal 16-bit protected mode, used by DOSEMU and Wine. (16-bit CS, 16-bit address segment)
- 16-bit CS, 32-bit address segment. IIRC this might be used by some 32-bit DOS programs to call BIOS.
- 32-bit CS, 16-bit address segment. I don't know whether anything uses this.
In all these protected modes, are you referring to the size in bits of the base address in the descriptor selected by the CS register? In such a case I would need to get the base address and add it to the effective address given in the operands of the instructions, right?
I don't know if anything you're doing cares about SS's, DS's, etc.
For 16-bit addressing encodings, I use DS for the BX, DI and SI registers, and also when no register at all is used in the address computation. I use SS for BP and SP. This can be overridden by the segment-override prefixes. For 32/64-bit, I need to implement segment selector awareness.
size, but I suspect you'll need to handle 16-bit CS.
Unless I am missing what is special with the 16-bit base address, I only would need to add that base address to whatever effective address (aka, offset) is encoded in the ModRM and displacement bytes.
At this moment my implementation is able to decode the ModRM, SiB and displacement bytes for 16-bit and 32/64-bit address encodings.
Thanks and BR, Ricardo
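As an aside, a minimal sketch of the two address computations discussed in this exchange (helper names are made up for illustration; the lookup of the segment base from the descriptor is not shown):

	/* virtual-8086/real-mode addressing: the selector is a paragraph number */
	static unsigned long vm86_linear_addr(unsigned short sel, unsigned long ea)
	{
		return ((unsigned long)sel << 4) + ea;
	}

	/* protected mode: the base comes from the selected segment descriptor */
	static unsigned long prot_linear_addr(unsigned long seg_base, unsigned long ea)
	{
		return seg_base + ea;
	}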
On Wed, Jan 25, 2017 at 9:50 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Wed, 2017-01-25 at 13:58 -0800, Andy Lutomirski wrote:
On Wed, Jan 25, 2017 at 12:23 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Tasks running in virtual-8086 mode will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, addresses can be indicated by the addition of the value of two registers. Also, there is no support for SiB bytes. Thus, a separate function is needed to parse this form of addressing.
Furthermore, virtual-8086 mode tasks will use real-mode addressing. This implies that the segment selectors do not point to a segment descriptor but are used to compute logical addresses. Hence, there is a need to add support to compute addresses using the segment selectors. If segment-override prefixes are present in the instructions, they take precedence.
Lastly, it is important to note that when a task is running in virtual-8086 mode and an interrupt/exception occurs, the CPU pushes to the stack the segment selectors for ds, es, fs and gs. These are accessible via the struct kernel_vm86_regs rather than pt_regs.
Code for 16-bit addressing encodings is likely to be used only by virtual-8086 mode tasks. Thus, this code is wrapped to be built only if the option CONFIG_VM86 is selected.
That's not true. It's used in 16-bit protected mode, too. And there are (ugh!) six possibilities:
Thanks for the clarification. I will enable the decoding of addresses for 16-bit as well... and test the emulation code.
- Normal 32-bit protected mode. This should already work.
- Normal 64-bit protected mode. This should also already work. (I forget whether a 16-bit SS is either illegal or has no effect in this case.)
For these two cases I am just taking the effective address that the user space application provides, given that the segment selectors were set beforehand (and with a base of 0).
What do you mean by the base being zero? User code can set a nonzero DS base if it wants. In 64-bit mode (user_64bit_mode(regs)), the base is ignored unless there's an fs or gs prefix, and in 32-bit mode the base is never ignored.
- Virtual 8086 mode
In this case I calculate the linear address as: (segment_select << 4) + effective address.
- Normal 16-bit protected mode, used by DOSEMU and Wine. (16-bit CS, 16-bit address segment)
- 16-bit CS, 32-bit address segment. IIRC this might be used by some 32-bit DOS programs to call BIOS.
- 32-bit CS, 16-bit address segment. I don't know whether anything uses this.
In all these protected modes, are you referring to the size in bits of the base address in the descriptor selected by the CS register? In such a case I would need to get the base address and add it to the effective address given in the operands of the instructions, right?
No, I'm referring to the D/B bit. I'm a bit fuzzy on exactly how the instruction encoding works, but I think that 16-bit x86 code is encoded just like real mode code except that the selectors are used for real.
size, but I suspect you'll need to handle 16-bit CS.
Unless I am missing what is special with the 16-bit base address, I only would need to add that base address to whatever effective address (aka, offset) is encoded in the ModRM and displacement bytes.
Exactly. (And make sure the instruction decoder can decode 16-bit instructions correctly.)
On Thu, 2017-01-26 at 09:05 -0800, Andy Lutomirski wrote:
On Wed, Jan 25, 2017 at 9:50 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Wed, 2017-01-25 at 13:58 -0800, Andy Lutomirski wrote:
On Wed, Jan 25, 2017 at 12:23 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Tasks running in virtual-8086 mode will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, addresses can be indicated by the addition of the value of two registers. Also, there is no support for SiB bytes. Thus, a separate function is needed to parse this form of addressing.
Furthermore, virtual-8086 mode tasks will use real-mode addressing. This implies that the segment selectors do not point to a segment descriptor but are used to compute logical addresses. Hence, there is a need to add support to compute addresses using the segment selectors. If segment-override prefixes are present in the instructions, they take precedence.
Lastly, it is important to note that when a task is running in virtual-8086 mode and an interrupt/exception occurs, the CPU pushes to the stack the segment selectors for ds, es, fs and gs. These are accessible via the struct kernel_vm86_regs rather than pt_regs.
Code for 16-bit addressing encodings is likely to be used only by virtual-8086 mode tasks. Thus, this code is wrapped to be built only if the option CONFIG_VM86 is selected.
That's not true. It's used in 16-bit protected mode, too. And there are (ugh!) six possibilities:
Thanks for the clarification. I will enable the decoding of addresses for 16-bit as well... and test the emulation code.
- Normal 32-bit protected mode. This should already work.
- Normal 64-bit protected mode. This should also already work. (I forget whether a 16-bit SS is either illegal or has no effect in this case.)
For these two cases I am just taking the effective address that the user space application provides, given that the segment selectors were set beforehand (and with a base of 0).
What do you mean by the base being zero? User code can set a nonzero DS base if it wants. In 64-bit mode (user_64bit_mode(regs)), the base is ignored unless there's an fs or gs prefix, and in 32-bit mode the base is never ignored.
Yes, I take this back. At the time of writing I was thinking about the __USER_CS and __USER_DS descriptors. You are right, the base is not ignored.
- Virtual 8086 mode
In this case I calculate the linear address as: (segment_select << 4) + effective address.
- Normal 16-bit protected mode, used by DOSEMU and Wine. (16-bit CS, 16-bit address segment)
- 16-bit CS, 32-bit address segment. IIRC this might be used by some 32-bit DOS programs to call BIOS.
- 32-bit CS, 16-bit address segment. I don't know whether anything uses this.
In all these protected modes, are you referring to the size in bits of the base address in the descriptor selected by the CS register? In such a case I would need to get the base address and add it to the effective address given in the operands of the instructions, right?
No, I'm referring to the D/B bit. I'm a bit fuzzy on exactly how the instruction encoding works, but I think that 16-bit x86 code is encoded just like real mode code except that the selectors are used for real.
I see. I have the logic to differentiate between 16-bit and 32-bit addresses. What I am missing is code to look at the value of this bit and any potential operand overrides. I will work on adding this.
size, but I suspect you'll need to handle 16-bit CS.
Unless I am missing what is special with the 16-bit base address, I only would need to add that base address to whatever effective address (aka, offset) is encoded in the ModRM and displacement bytes.
Exactly. (And make sure the instruction decoder can decode 16-bit instructions correctly.)
Will do. I have tested my 16-bit decoding extensively. I think it will work.
Thanks and BR, Ricardo
User-Mode Instruction Prevention is a security feature present in new Intel processors that, when enabled, prevents the execution of a subset of instructions in user mode (CPL > 0). Attempting to execute any of these instructions causes a general protection exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register
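For illustration only, this is the kind of user-space code that UMIP intercepts; a rough x86_64 sketch using GCC inline asm (not part of the patch):

	#include <stdio.h>
	#include <stdint.h>

	struct __attribute__((packed)) table_ptr {
		uint16_t limit;
		uint64_t base;
	};

	int main(void)
	{
		struct table_ptr gdt = { 0 }, idt = { 0 };
		uint16_t msw, ldt, tr;

		/* with UMIP enabled, each of these raises #GP unless fixed up */
		asm volatile ("sgdt %0" : "=m" (gdt));
		asm volatile ("sidt %0" : "=m" (idt));
		asm volatile ("smsw %0" : "=r" (msw));
		asm volatile ("sldt %0" : "=r" (ldt));
		asm volatile ("str %0"  : "=r" (tr));

		printf("GDT: base=0x%llx limit=0x%x\n",
		       (unsigned long long)gdt.base, gdt.limit);
		printf("IDT: base=0x%llx limit=0x%x\n",
		       (unsigned long long)idt.base, idt.limit);
		printf("MSW=0x%x LDT=0x%x TR=0x%x\n", msw, ldt, tr);
		return 0;
	}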
This feature is also added to the list of disabled-features to allow a cleaner handling of build-time configuration.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/include/uapi/asm/processor-flags.h | 2 ++ 3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d45ab4b..fd04219 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -286,6 +286,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 85599ad..4707445 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -16,6 +16,12 @@ # define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31)) #endif
+#ifdef CONFIG_X86_INTEL_UMIP +# define DISABLE_UMIP 0 +#else +# define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31)) +#endif + #ifdef CONFIG_X86_64 # define DISABLE_VME (1<<(X86_FEATURE_VME & 31)) # define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31)) @@ -55,7 +61,7 @@ #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 -#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_UMIP) #define DISABLED_MASK17 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index 567de50..d2c2af8 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -104,6 +104,8 @@ #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT) #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */ #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT) +#define X86_CR4_UMIP_BIT 11 /* enable UMIP support */ +#define X86_CR4_UMIP _BITUL(X86_CR4_UMIP_BIT) #define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */ #define X86_CR4_VMXE _BITUL(X86_CR4_VMXE_BIT) #define X86_CR4_SMXE_BIT 14 /* enable safer mode (TXT) */
The User-Mode Instruction Prevention feature present in recent Intel processors prevents a group of instructions from being executed with CPL > 0. If any of these instructions is attempted, a general protection fault is issued.
Rather than relaying this fault to user space (in the form of a SIGSEGV signal), the instructions protected by UMIP can be emulated to provide dummy results. This preserves the current kernel behavior while not revealing the system resources that UMIP intends to protect (the global descriptor and interrupt descriptor tables, the segment selectors of the local descriptor table and the task state, and the machine status word).
This emulation is needed because certain applications (e.g., WineHQ) rely on this subset of instructions to function.
The instructions protected by UMIP can be split in two groups: those that return a kernel memory address (sgdt and sidt) and those that return a value (sldt, str and smsw).
For the instructions that return a kernel memory address, the result is emulated as a hard-coded value that, in x86_64, lies in one of the memory holes of the kernel memory map. For i386, the same values are used but truncated to 4 bytes. This is needed because applications such as WineHQ rely on the result being located in the kernel memory space in order to function. The limits of the GDT and the IDT are set to zero.
The instructions sldt and str return a segment selector relative to the base address of the global descriptor table. Since the actual address of such table is not revealed, it makes sense to emulate the result as zero.
The instruction smsw is emulated to return the value that the register CR0 has at boot time as set in the head_32/64.S files.
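For reference, the dummy value described above can be reproduced in user space from the architectural CR0 bit positions; a small sketch (the constants mirror processor-flags.h):

	#include <stdio.h>

	#define CR0_PE 0x00000001UL	/* Protection Enable */
	#define CR0_MP 0x00000002UL	/* Monitor Coprocessor */
	#define CR0_ET 0x00000010UL	/* Extension Type */
	#define CR0_NE 0x00000020UL	/* Numeric Error */
	#define CR0_WP 0x00010000UL	/* Write Protect */
	#define CR0_AM 0x00040000UL	/* Alignment Mask */
	#define CR0_PG 0x80000000UL	/* Paging, added only on x86_64 */

	int main(void)
	{
		unsigned long flags = CR0_PE | CR0_MP | CR0_ET | CR0_NE |
				      CR0_WP | CR0_AM;

		printf("x86_32 dummy CR0: 0x%lx\n", flags);		/* 0x50033 */
		printf("x86_64 dummy CR0: 0x%lx\n", flags | CR0_PG);	/* 0x80050033 */
		return 0;
	}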
Given that these instructions are especially relevant for virtual-8086 mode tasks, care is taken to emulate the results appropriately in this case: segment selectors are used to form addresses and the contents of the general purpose registers are parsed using 16-bit addressing encodings.
Lastly, it might be possible that the memory address to which the results must be written is not accessible. In such a case a SIGSEGV signal is generated with error code equal to SEGV_MAPERR.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/umip.h | 15 +++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/umip.c | 251 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 267 insertions(+) create mode 100644 arch/x86/include/asm/umip.h create mode 100644 arch/x86/kernel/umip.c
diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h new file mode 100644 index 0000000..077b236 --- /dev/null +++ b/arch/x86/include/asm/umip.h @@ -0,0 +1,15 @@ +#ifndef _ASM_X86_UMIP_H +#define _ASM_X86_UMIP_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +#ifdef CONFIG_X86_INTEL_UMIP +bool fixup_umip_exception(struct pt_regs *regs); +#else +static inline bool fixup_umip_exception(struct pt_regs *regs) +{ + return false; +} +#endif /* CONFIG_X86_INTEL_UMIP */ +#endif /* _ASM_X86_UMIP_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 581386c..c4aec02 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -124,6 +124,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o +obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
ifdef CONFIG_FRAME_POINTER obj-y += unwind_frame.o diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c new file mode 100644 index 0000000..229f7a0 --- /dev/null +++ b/arch/x86/kernel/umip.c @@ -0,0 +1,251 @@ +/* + * umip.c Emulation for instruction protected by the Intel User-Mode + * Instruction Prevention. The instructions are: + * sgdt + * sldt + * sidt + * str + * smsw + * + * Copyright (c) 2016, Intel Corporation. + * Ricardo Neri ricardo.neri@linux.intel.com + */ + +#include <linux/uaccess.h> +#include <asm/umip.h> +#include <asm/traps.h> +#include <asm/insn.h> +#include <asm/insn-kernel.h> +#include <linux/ratelimit.h> + +/* + * == Base addresses of GDT and IDT + * Some applications to function rely finding the global descriptor table (GDT) + * and the interrupt descriptor table (IDT) in kernel memory. For x86_64 this + * matches a memory hole as detailed in Documentation/x86/x86_64/mm.txt. + * For x86_32, it does not match any particular hole, but it suffices + * to provide a memory location within kernel memory. + * + * == CRO flags for SMSW + * Use the flags given when booting, as found in head_64/32.S + */ + +#define CR0_FLAGS (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \ + X86_CR0_WP | X86_CR0_AM) +#ifdef CONFIG_X86_64 +#define UMIP_DUMMY_GDT_BASE 0xfffffffffffe0000 +#define UMIP_DUMMY_IDT_BASE 0xffffffffffff0000 +#define CR0_STATE (CR0_FLAGS | X86_CR0_PG) +#else +#define UMIP_DUMMY_GDT_BASE 0xfffe0000 +#define UMIP_DUMMY_IDT_BASE 0xffff0000 +#define CR0_STATE CR0_FLAGS +#endif + +/* + * Definitions for x86 page fault error code bits. Only a simple + * pagefault during a write in user context is supported. + */ +#define UMIP_PF_USER BIT(2) +#define UMIP_PF_WRITE BIT(1) + +enum umip_insn { + UMIP_SGDT = 0, /* opcode 0f 01 ModR/M reg 0 */ + UMIP_SIDT, /* opcode 0f 01 ModR/M reg 1 */ + UMIP_SLDT, /* opcode 0f 00 ModR/M reg 0 */ + UMIP_SMSW, /* opcode 0f 01 ModR/M reg 4 */ + UMIP_STR, /* opcode 0f 00 ModR/M reg 1 */ +}; + +static int __identify_insn(struct insn *insn) +{ + /* By getting modrm we also get the opcode. */ + insn_get_modrm(insn); + + /* All the instructions of interest start with 0x0f. */ + if (insn->opcode.bytes[0] != 0xf) + return -EINVAL; + + if (insn->opcode.bytes[1] == 0x1) { + switch (X86_MODRM_REG(insn->modrm.value)) { + case 0: + return UMIP_SGDT; + case 1: + return UMIP_SIDT; + case 4: + return UMIP_SMSW; + default: + return -EINVAL; + } + } else if (insn->opcode.bytes[1] == 0x0) { + if (X86_MODRM_REG(insn->modrm.value) == 0) + return UMIP_SLDT; + else if (X86_MODRM_REG(insn->modrm.value) == 1) + return UMIP_STR; + else + return -EINVAL; + } else { + return -EINVAL; + } +} + +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst, + unsigned char *data, int *data_size) +{ + unsigned long dummy_base_addr; + unsigned short dummy_limit = 0; + unsigned int dummy_value = 0; + + switch (umip_inst) { + /* + * These two instructions return the base address and limit of the + * global and interrupt descriptor table. The base address can be + * 32-bit or 64-bit. Limit is always 16-bit. + */ + case UMIP_SGDT: + case UMIP_SIDT: + if (umip_inst == UMIP_SGDT) + dummy_base_addr = UMIP_DUMMY_GDT_BASE; + else + dummy_base_addr = UMIP_DUMMY_IDT_BASE; + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + /* SGDT and SIDT do not take register as argument. 
*/ + return -EINVAL; + } + /* Fill most significant byte with zeros if 16-bit addressing*/ + if (insn->addr_bytes == 2) + dummy_base_addr &= 0xffffff; + memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr)); + memcpy(data, &dummy_limit, sizeof(dummy_limit)); + *data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit); + break; + case UMIP_SMSW: + dummy_value = CR0_STATE; + /* + * These two instructions return a 16-bit value. We return + * all zeros. This is equivalent to a null descriptor for + * str and sldt. + */ + case UMIP_SLDT: + case UMIP_STR: + /* if operand is a register, it is zero-extended*/ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + memset(data, 0, insn->opnd_bytes); + *data_size = insn->opnd_bytes; + /* if not, only the two least significant bytes are copied */ + } else { + *data_size = 2; + } + memcpy(data, &dummy_value, sizeof(dummy_value)); + break; + default: + return -EINVAL; + } + return 0; +} + +static void __force_sig_info_umip_fault(void __user *address, + struct pt_regs *regs) +{ + siginfo_t info; + struct task_struct *tsk = current; + + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) { + printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%lx in %lx\n", + tsk->comm, task_pid_nr(tsk), regs->ip, + regs->sp, UMIP_PF_USER | UMIP_PF_WRITE, regs->ip); + } + + tsk->thread.cr2 = (unsigned long)address; + tsk->thread.error_code = UMIP_PF_USER | UMIP_PF_WRITE; + tsk->thread.trap_nr = X86_TRAP_PF; + + info.si_signo = SIGSEGV; + info.si_errno = 0; + info.si_code = SEGV_MAPERR; + info.si_addr = address; + force_sig_info(SIGSEGV, &info, tsk); +} + +bool fixup_umip_exception(struct pt_regs *regs) +{ + struct insn insn; + unsigned char buf[MAX_INSN_SIZE]; + /* 10 bytes is the maximum size of the result of UMIP instructions */ + unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; +#ifdef CONFIG_X86_64 + int x86_64 = user_64bit_mode(regs); +#else + int x86_64 = 0; +#endif + int not_copied, nr_copied, reg_offset, dummy_data_size; + void __user *uaddr; + unsigned long *reg_addr; + enum umip_insn umip_inst; + + if (v8086_mode(regs)) + /* + * In virtual-8086 mode the segment selector cs does not point + * to a segment descriptor but is used directly to compute the + * address. This is done after shifting it 4 bytes to the left. + * The result is added to the instruction pointer. + */ + not_copied = copy_from_user(buf, + (void __user *)((regs->cs << 4) + + regs->ip), + sizeof(buf)); + else + not_copied = copy_from_user(buf, (void __user *)regs->ip, + sizeof(buf)); + nr_copied = sizeof(buf) - not_copied; + /* + * The copy_from_user above could have failed if user code is protected + * by a memory protection key. Give up on emulation in such a case. + * Should we issue a page fault? + */ + if (!nr_copied) + return false; + insn_init(&insn, buf, nr_copied, x86_64); + /* + * Set address and operand sizes to the default of 16 bits. If + * overrides are found, sizes will be fixed when parsing the instruction + * prefixes. 
+ */ + if (v8086_mode(regs)) { + insn.addr_bytes = 2; + insn.opnd_bytes = 2; + } + insn_get_length(&insn); + if (nr_copied < insn.length) + return false; + + umip_inst = __identify_insn(&insn); + /* Check if we found an instruction protected by UMIP */ + if (umip_inst < 0) + return false; + + if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size)) + return false; + + /* If operand is a register, write directly to it */ + if (X86_MODRM_MOD(insn.modrm.value) == 3) { + reg_offset = insn_get_reg_offset_rm(&insn, regs); + reg_addr = (unsigned long *)((unsigned long)regs + reg_offset); + memcpy(reg_addr, dummy_data, dummy_data_size); + } else { + uaddr = insn_get_addr_ref(&insn, regs); + nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); + if (nr_copied > 0) { + /* + * If copy fails, send a signal and tell caller that + * fault was fixed up. + */ + __force_sig_info_umip_fault(uaddr, regs); + return true; + } + } + + /* increase IP to let the program keep going */ + regs->ip += insn.length; + return true; +}
On 01/25/17 12:23, Ricardo Neri wrote:
	case UMIP_SMSW:
		dummy_value = CR0_STATE;
Unless the user space process is running in 64-bit mode this value should be & 0xffff. I'm not sure if we should even support fixing up UMIP instructions in 64-bit mode.
Also, please put an explicit /* fall through */ here.
	/*
	 * These two instructions return a 16-bit value. We return
	 * all zeros. This is equivalent to a null descriptor for
	 * str and sldt.
	 */
	case UMIP_SLDT:
	case UMIP_STR:
		/* if operand is a register, it is zero-extended */
		if (X86_MODRM_MOD(insn->modrm.value) == 3) {
			memset(data, 0, insn->opnd_bytes);
			*data_size = insn->opnd_bytes;
		/* if not, only the two least significant bytes are copied */
		} else {
			*data_size = 2;
		}
		memcpy(data, &dummy_value, sizeof(dummy_value));
		break;
	default:
		return -EINVAL;
	}
	return 0;
bool fixup_umip_exception(struct pt_regs *regs)
{
	struct insn insn;
	unsigned char buf[MAX_INSN_SIZE];
	/* 10 bytes is the maximum size of the result of UMIP instructions */
	unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
#ifdef CONFIG_X86_64
	int x86_64 = user_64bit_mode(regs);
#else
	int x86_64 = 0;
#endif
Again, could we simply do:
if (user_64bit_mode(regs)) return false;
or are there known users of these instructions *in 64-bit mode*?
-hpa
On Wed, 2017-01-25 at 12:38 -0800, H. Peter Anvin wrote:
On 01/25/17 12:23, Ricardo Neri wrote:
	case UMIP_SMSW:
		dummy_value = CR0_STATE;
Unless the user space process is running in 64-bit mode this value should be & 0xffff.
But wouldn't that prevent the bits CR0[63:16] or CR0[31:16] from being copied when a register operand is used? According to the Intel Software Developer's Manual, SMSW returns
SMSW r16: operand size 16, store CR0[15:0] in r16
SMSW r32: operand size 32, zero-extend CR0[31:0], and store in r32
SMSW r64: operand size 64, zero-extend CR0[63:0], and store in r64
The number of bytes returned by the emulated results is controlled by the data_size variable. If it finds that the result will be saved in a memory location, it will only copy CR0[15:0], which is the expected behavior of SMSW if the result is to be copied in memory.
I'm not sure if we should even support fixing up UMIP instructions in 64-bit mode.
Probably not. The whole point of the emulation was to support virtual-8086 mode and 32-bit mode.
Also, please put an explicit /* fall through */ here.
Will do.
	/*
	 * These two instructions return a 16-bit value. We return
	 * all zeros. This is equivalent to a null descriptor for
	 * str and sldt.
	 */
	case UMIP_SLDT:
	case UMIP_STR:
		/* if operand is a register, it is zero-extended */
		if (X86_MODRM_MOD(insn->modrm.value) == 3) {
			memset(data, 0, insn->opnd_bytes);
			*data_size = insn->opnd_bytes;
		/* if not, only the two least significant bytes are copied */
		} else {
			*data_size = 2;
		}
		memcpy(data, &dummy_value, sizeof(dummy_value));
		break;
The code above controls how many bytes are copied as the result of SMSW.
	default:
		return -EINVAL;
	}
	return 0;
bool fixup_umip_exception(struct pt_regs *regs)
{
	struct insn insn;
	unsigned char buf[MAX_INSN_SIZE];
	/* 10 bytes is the maximum size of the result of UMIP instructions */
	unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
#ifdef CONFIG_X86_64
	int x86_64 = user_64bit_mode(regs);
#else
	int x86_64 = 0;
#endif
Again, could we simply do:
if (user_64bit_mode(regs)) return false;
or are there known users of these instructions *in 64-bit mode*?
I am not aware of any. In that case, I will make this code simply return without attempting a fixup.
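A sketch of the early bail-out being discussed, against the fixup_umip_exception() shown earlier (hypothetical, subject to the final patch):

	bool fixup_umip_exception(struct pt_regs *regs)
	{
	#ifdef CONFIG_X86_64
		/* No known users of these instructions in 64-bit mode; do not fix up. */
		if (user_64bit_mode(regs))
			return false;
	#endif
		/* ... decode and emulate as before ... */
	}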
Thanks and BR, Ricardo
-hpa
If the User-Mode Instruction Prevention CPU feature is available and enabled, a general protection fault will be issued if the instructions sgdt, sldt, sidt, str or smsw are executed from user-mode context (CPL > 0). If the fault was caused by any of the instructions protected by UMIP, fixup_umip_exception will emulate dummy results for these instructions. If emulation is successful, the result is passed to the user space program and no SIGSEGV signal is emitted.
Please note that fixup_umip_exception also caters for the case when the fault originated while running in virtual-8086 mode.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/kernel/traps.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index bf0c6d0..1da2f8d 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -64,6 +64,7 @@ #include <asm/trace/mpx.h> #include <asm/mpx.h> #include <asm/vm86.h> +#include <asm/umip.h>
#ifdef CONFIG_X86_64 #include <asm/x86_init.h> @@ -491,6 +492,9 @@ do_general_protection(struct pt_regs *regs, long error_code) RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); cond_local_irq_enable(regs);
+ if (user_mode(regs) && (fixup_umip_exception(regs) == true)) + return; + if (v8086_mode(regs)) { local_irq_enable(); handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
User-Mode Instruction Prevention (UMIP) is enabled or disabled by setting or clearing a bit in %cr4.
It makes sense to enable UMIP at some point while booting, before user space comes up. Like SMAP and SMEP, it is not critical to have it enabled very early during boot. This is because UMIP is relevant only once there is a user space to protect against. Given this similarity, it makes sense to enable UMIP along with SMAP and SMEP.
UMIP is enabled by default. It can be disabled by adding clearcpuid=514 to the kernel parameters (514 is the bit number of X86_FEATURE_UMIP: word 16, bit 2, i.e. 16*32 + 2).
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/Kconfig | 10 ++++++++++ arch/x86/kernel/cpu/common.c | 16 +++++++++++++++- 2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7b6fd68..824c6de 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1733,6 +1733,16 @@ config X86_SMAP
If unsure, say Y.
+config X86_INTEL_UMIP + def_bool y + depends on CPU_SUP_INTEL + prompt "User Mode Instruction Prevention" if EXPERT + ---help--- + The User Mode Instruction Prevention (UMIP) is a security + feature in newer Intel processors. If enabled, a general + protection fault is issued if the instructions SGDT, SLDT, + SIDT, SMSW and STR are executed in user mode. + config X86_INTEL_MPX prompt "Intel MPX (Memory Protection Extensions)" def_bool n diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f74e84e..f5f31c2 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -307,6 +307,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c) } }
+static __always_inline void setup_umip(struct cpuinfo_x86 *c) +{ + if (cpu_feature_enabled(X86_FEATURE_UMIP) && + cpu_has(c, X86_FEATURE_UMIP)) + cr4_set_bits(X86_CR4_UMIP); + else + /* + * Make sure UMIP is disabled in case it was enabled in a + * previous boot (e.g., via kexec). + */ + cr4_clear_bits(X86_CR4_UMIP); +} + /* * Protection Keys are not available in 32-bit mode. */ @@ -1077,9 +1090,10 @@ static void identify_cpu(struct cpuinfo_x86 *c) /* Disable the PN if appropriate */ squash_the_stupid_serial_number(c);
- /* Set up SMEP/SMAP */ + /* Set up SMEP/SMAP/UMIP */ setup_smep(c); setup_smap(c); + setup_umip(c);
/* * The vendor-specific functions might have changed features.
Certain user space programs that run in virtual-8086 mode may utilize instructions protected by the User-Mode Instruction Prevention (UMIP) security feature present in new Intel processors: SGDT, SIDT and SMSW. In such a case, a general protection fault is issued if UMIP is enabled. When such a fault happens, the kernel catches it and emulates the results of these instructions with dummy values. The purpose of this new test is to verify whether the impacted instructions can be executed without causing such #GP. If no #GP exceptions occur, we expect to exit virtual-8086 mode via INT 0x80.
The instructions protected by UMIP are executed in representative use cases:
a) the memory address of the result is given in the form of a displacement from the base of the data segment;
b) the memory address of the result is given in a general purpose register;
c) the result is stored directly in a general purpose register.
Unfortunately, it is not possible to check the results against a set of expected values because no emulation will occur in systems that do not have the UMIP feature. Instead, results are printed for verification.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- tools/testing/selftests/x86/entry_from_vm86.c | 39 ++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c index d075ea0..377b773 100644 --- a/tools/testing/selftests/x86/entry_from_vm86.c +++ b/tools/testing/selftests/x86/entry_from_vm86.c @@ -95,6 +95,22 @@ asm ( "int3\n\t" "vmcode_int80:\n\t" "int $0x80\n\t" + "umip:\n\t" + /* addressing via displacements */ + "smsw (2052)\n\t" + "sidt (2054)\n\t" + "sgdt (2060)\n\t" + /* addressing via registers */ + "mov $2066, %bx\n\t" + "smsw (%bx)\n\t" + "mov $2068, %bx\n\t" + "sidt (%bx)\n\t" + "mov $2074, %bx\n\t" + "sgdt (%bx)\n\t" + /* register operands, only for smsw */ + "smsw %ax\n\t" + "mov %ax, (2080)\n\t" + "int $0x80\n\t" ".size vmcode, . - vmcode\n\t" "end_vmcode:\n\t" ".code32\n\t" @@ -103,7 +119,7 @@ asm (
extern unsigned char vmcode[], end_vmcode[]; extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[], - vmcode_sti[], vmcode_int3[], vmcode_int80[]; + vmcode_sti[], vmcode_int3[], vmcode_int80[], umip[];
/* Returns false if the test was skipped. */ static bool do_test(struct vm86plus_struct *v86, unsigned long eip, @@ -218,6 +234,27 @@ int main(void) v86.regs.eax = (unsigned int)-1; do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
+ /* UMIP -- should exit with INTx 0x80 unless UMIP was not disabled */ + do_test(&v86, umip - vmcode, VM86_INTx, 0x80, "UMIP tests"); + printf("[INFO]\tResults of UMIP-protected instructions via displacements:\n"); + printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2052)); + printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2054), + *(unsigned long *)(addr + 2056)); + printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2060), + *(unsigned long *)(addr + 2062)); + printf("[INFO]\tResults of UMIP-protected instructions via addressing in registers:\n"); + printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2066)); + printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2068), + *(unsigned long *)(addr + 2070)); + printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2074), + *(unsigned long *)(addr + 2076)); + printf("[INFO]\tResults of SMSW via register operands:\n"); + printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2080)); + /* Execute a null pointer */ v86.regs.cs = 0; v86.regs.ss = 0;
On 01/25/17 12:23, Ricardo Neri wrote:
- SMSW returns the value with which the CR0 register is programmed in head_32/64.S at boot time. This is, the following bits are enabed: CR0.0 for Protection Enable, CR.1 for Monitor Coprocessor, CR.4 for Extension Type, which will always be 1 in recent processors with UMIP; CR.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment Mask. Additionally, in x86_64, CR0.31 for Paging is set.
SMSW only returns CR0[15:0], so the reference here to CR0[31:16] seems odd.
-hpa
Hi Peter, On Wed, 2017-01-25 at 12:34 -0800, H. Peter Anvin wrote:
On 01/25/17 12:23, Ricardo Neri wrote:
- SMSW returns the value with which the CR0 register is programmed in head_32/64.S at boot time. This is, the following bits are enabed: CR0.0 for Protection Enable, CR.1 for Monitor Coprocessor, CR.4 for Extension Type, which will always be 1 in recent processors with UMIP; CR.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment Mask. Additionally, in x86_64, CR0.31 for Paging is set.
SMSW only returns CR0[15:0], so the reference here to CR0[31:16] seems odd.
I checked again the latest version (from Dec 2016) of the Intel Software Developer's Manual. The documentation for SMSW states the following:
SMSW r16: operand size 16, store CR0[15:0] in r16
SMSW r32: operand size 32, zero-extend CR0[31:0], and store in r32
SMSW r64: operand size 64, zero-extend CR0[63:0], and store in r64
When the operand is a memory location, yes, it only returns CR0[15:0].
Also, in the tests that I ran I wrote the result of SMSW to a 64-bit register. I get 0x80050033. It seems to me that it does write as many bits as the register operand can hold.
Am I missing something?
Thanks and BR, Ricardo
-hpa