This is v4 of this series. Again, it took me a while to complete the updates, as support for 16-bit address encodings in protected mode required extra rework. The three previous submissions can be found here [1], here [2] and here [3].
=== What is UMIP?
User-Mode Instruction Prevention (UMIP) is a security feature present in new Intel processors. If enabled, it prevents the execution of certain instructions when the Current Privilege Level (CPL) is greater than 0. If these instructions were executed while in CPL > 0, user space applications could have access to system-wide settings such as the global, interrupt and local descriptor tables, the segment selector of the current task state segment, and the machine status word.
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
If any of these instructions is executed with CPL > 0, a general protection exception is issued when UMIP is enabled.
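For reference, here is a minimal user-space sketch (not part of this series, and assuming a 64-bit build) that exercises the UMIP-protected instructions; with UMIP enabled and no emulation, each of them would end in a general protection fault delivered as SIGSEGV:

/*
 * Illustrative only: exercise the five UMIP-protected instructions
 * from CPL 3. Build with: gcc -O2 umip_demo.c -o umip_demo
 */
#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed)) table_ptr {
	uint16_t limit;
	uint64_t base;		/* 4 bytes instead when built for 32-bit */
};

int main(void)
{
	struct table_ptr gdtr, idtr;
	unsigned long msw;
	uint16_t tr, ldt;

	asm volatile("sgdt %0" : "=m" (gdtr));	/* Store Global Descriptor Table    */
	asm volatile("sidt %0" : "=m" (idtr));	/* Store Interrupt Descriptor Table */
	asm volatile("smsw %0" : "=r" (msw));	/* Store Machine Status Word (CR0)  */
	asm volatile("str %0"  : "=r" (tr));	/* Store Task Register              */
	asm volatile("sldt %0" : "=r" (ldt));	/* Store Local Descriptor Table     */

	printf("GDT base: %#lx, IDT base: %#lx, MSW: %#lx, TR: %#x, LDT: %#x\n",
	       (unsigned long)gdtr.base, (unsigned long)idtr.base, msw, tr, ldt);
	return 0;
}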
=== How does it impact applications?
There is a caveat, however: certain applications rely on some of these instructions to function. An example is applications that use WineHQ[4]. For instance, these applications rely on sidt returning a non-accessible memory location[5]. During the discussions, it was proposed that the fault could be relayed to user space and the emulation performed there. However, this would break existing applications until they update, for instance, to a new WineHQ version, and it would require UMIP to be disabled by default. The consensus in this forum is to always enable it.
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[6] does not support applications that use these instructions. It relies on WineHQ for this [7]. Furthermore, the applications for which the concern was raised run in protected mode [5].
=== How are UMIP-protected instructions emulated?
This version keeps UMIP enabled at all times and by default. If a general protection fault caused by one of the instructions protected by UMIP is detected, such a fault will be fixed up by returning dummy values as follows:
* SGDT and SIDT return hard-coded dummy values as the base of the global descriptor and interrupt descriptor tables. These hard-coded values correspond to memory addresses that are near the end of the kernel memory map. This is also the case for virtual-8086 mode tasks. In all my experiments on x86_32, the base of the GDT and IDT was always a 4-byte address, even for 16-bit operands. Thus, my emulation code does the same. In all cases, the limit of the table is set to 0.
* STR and SLDT return 0 as the segment selector. This looks appropriate since we are providing a dummy value as the base address of the global descriptor table.
* SMSW returns the value with which the CR0 register is programmed in head_32/64.S at boot time. That is, the following bits are enabled: CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for Extension Type (which will always be 1 in recent processors with UMIP), CR0.5 for Numeric Error, CR0.16 for Write Protect, and CR0.18 for Alignment Mask. As per the Intel 64 and IA-32 Architectures Software Developer's Manual, SMSW returns a 16-bit result for memory operands. However, when the operand is a register, the result can be up to CR0[63:0]. Since the emulation code only kicks in on x86_32, we return up to CR0[31:0] (see the sketch following this list).
* The proposed emulation code handles faults that happen in both protected and virtual-8086 mode.
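To make the above concrete, this is roughly the shape of the dummy data the fixup returns; the names and the specific base value below are illustrative only, not the constants used by the series:

/* Hypothetical names; values sketch the behavior described above. */
#define UMIP_DUMMY_GDT_IDT_BASE	0xfffe0000UL	/* illustrative: near the end of the kernel map */
#define UMIP_DUMMY_TABLE_LIMIT	0		/* the table limit is always reported as 0      */

/* CR0 bits as programmed in head_32/64.S: PE | MP | ET | NE | WP | AM */
#define UMIP_DUMMY_MSW		((1UL << 0) | (1UL << 1) | (1UL << 4) | \
				 (1UL << 5) | (1UL << 16) | (1UL << 18))

/* STR and SLDT simply report a null segment selector */
#define UMIP_DUMMY_SELECTOR	0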
=== How is this series laid out?
++ Fix bugs in MPX address evaluator

I found very useful the code used by Intel MPX (Memory Protection Extensions) to parse opcodes and the memory locations contained in the general purpose registers when used as operands. I put some of this code in a separate library file that both MPX and UMIP can access, to avoid code duplication. Before creating the new library, I fixed a couple of bugs that I found in how MPX determines the address referenced by the instruction operands.
++ Provide a new x86 instruction evaluating library

With the bugs fixed, the MPX evaluating code is relocated to a new insn-eval.c library. The basic functionality of this library is extended to obtain the segment descriptor selected by either segment override prefixes or the default segment of the registers involved in the calculation of the effective address. It was also extended to obtain the default address and operand sizes as well as the segment base address, and to process 16-bit address encodings. Armed with this arsenal, it is now possible to determine the linear address onto which the emulated results shall be copied.
This code supports normal 32-bit and 64-bit protected mode (i.e., __USER32_CS and/or __USER_CS), virtual-8086 mode, and 16-bit protected mode with a 32-bit base address.
++ Emulate UMIP instructions

A new fixup_umip_exception() function inspects the instruction at the instruction pointer. If it is a UMIP-protected instruction, it executes the emulation code. This uses all the address-computing code of the previous section.
++ Add self-tests

Lastly, self-tests are added to entry_from_vm86.c to exercise the most typical use cases of UMIP-protected instructions in virtual-8086 mode. Extensive tests were performed to cover all the combinations of ModRM, SiB and displacements for 16-bit and 32-bit encodings for the ss, ds, es and fs segments. This was done for both virtual-8086 and protected mode. The code of these tests can be found here [8].
[1]. https://lwn.net/Articles/705877/
[2]. https://lkml.org/lkml/2016/12/23/265
[3]. https://lkml.org/lkml/2017/1/25/622
[4]. https://www.winehq.org/
[5]. https://www.winehq.org/pipermail/wine-devel/2016-November/115320.html
[6]. http://www.dosemu.org/
[7]. http://marc.info/?l=linux-kernel&m=147876798717927&w=2
[8]. https://github.com/01org/luv-yocto/blob/rneri/umip/meta-luv/recipes-core/umi...
Thanks and BR, Ricardo
Changes since V3:
* Limited emulation to 32-bit and 16-bit modes. For 64-bit mode, a general protection fault is still issued when UMIP-protected instructions are executed with CPL > 0.
* Expanded the instruction-evaluating code to obtain the segment descriptor along with its attributes, such as the base address and the default address and operand sizes. Also, support for 16-bit encodings in protected mode was implemented.
* When getting a segment descriptor, this includes support to obtain those in a local descriptor table.
* Now the instruction-evaluating code returns -EDOM when the value of registers should not be used in calculating the effective address. The value -EINVAL is left for errors.
* Incorporated the value of the segment base address in the computation of linear addresses.
* Renamed the new instruction evaluation library from insn-kernel.c to insn-eval.c.
* Exported functions insn_get_reg_offset_* to obtain the register offset by ModRM r/m, SiB base and SiB index.
* Improved documentation of functions.
* Split patches further for easier review.
Changes since V2:
* Added new utility functions to decode the memory addresses contained in registers when the 16-bit addressing encodings are used. This includes code to obtain and compute memory addresses using segment selectors for real-mode address translation.
* Added support to emulate UMIP-protected instructions for virtual-8086 tasks.
* Added self-tests for virtual-8086 mode that contain representative use cases: address represented as a displacement, address in registers and registers as operands.
* Instead of maintaining a static variable for the dummy base addresses of the IDT and GDT, a hard-coded value is used.
* The emulated SMSW instructions now return the value with which the CR0 register is programmed in head_32/64.S. This is: PE | MP | ET | NE | WP | AM. For x86_64, PG is also enabled.
* The new file arch/x86/lib/insn-utils.c is now renamed to arch/x86/lib/insn-kernel.c. It also has its own header. This helps keep the kernel and objtool instruction decoders in sync. Also, the new insn-kernel.c contains utility functions that are only relevant in a kernel context.
* Removed printed warnings for errors that occur when decoding instructions with invalid operands.
* Added more comments on fixes in the instruction-decoding MPX functions.
* Now user_64bit_mode(regs) is used instead of test_thread_flag(TIF_IA32) to determine if the task is 32-bit or 64-bit.
* Found and fixed a bug in the instruction decoder in which X86_MODRM_RM was incorrectly used to obtain the mod part of the ModRM byte.
* Added more explanatory comments in the emulation and instruction-decoding code. This includes a comment noting that copy_from_user could fail if a memory protection key is in place.
* Tested code with CONFIG_X86_DECODER_SELFTEST=y and everything passes now.
* Prefixed get_reg_offset_rm with insn_ as this function is exposed via a header file. For clarity, this function was added in a separate patch.
Changes since V1:
* Virtual-8086 mode tasks are not treated in a special manner. All code for this purpose was removed.
* Instead of attempting to disable UMIP during a context switch or when entering virtual-8086 mode, UMIP remains enabled all the time. General protection faults that occur are fixed up by returning dummy values as detailed above.
* Removed the umip= kernel parameter in favor of using clearcpuid=514 to disable UMIP.
* Removed selftests designed to detect the absence of SIGSEGV signals when running in virtual-8086 mode.
* Reused code from MPX to decode instruction operands. For this purpose, code was put in a common location.
* Fixed two bugs in the MPX code that decodes operands.
Ricardo Neri (17):
  x86/mpx: Do not use SIB index if index points to R/ESP
  x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel
  x86/insn-eval: Add utility functions to get register offsets
  x86/insn-eval: Add utility function to get segment selector
  x86/insn-eval: Add utility function to get segment descriptor
  x86/insn-eval: Add utility function to get segment descriptor base address
  x86/insn-eval: Add functions to get default operand and address sizes
  x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  insn/eval: Incorporate segment base in address computation
  x86/insn-eval: Add support to resolve 16-bit addressing encodings
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
  x86: Add emulation code for UMIP instructions
  x86/umip: Force a page fault when unable to copy emulated result to user
  x86/traps: Fixup general protection faults caused by UMIP
  x86: Enable User-Mode Instruction Prevention
  selftests/x86: Add tests for User-Mode Instruction Prevention
 arch/x86/Kconfig                              |  10 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/insn-eval.h              |  23 +
 arch/x86/include/asm/umip.h                   |  15 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  16 +-
 arch/x86/kernel/traps.c                       |   4 +
 arch/x86/kernel/umip.c                        | 303 +++++++++++
 arch/x86/lib/Makefile                         |   2 +-
 arch/x86/lib/insn-eval.c                      | 742 ++++++++++++++++++++++++++
 arch/x86/mm/mpx.c                             | 120 +----
 tools/testing/selftests/x86/entry_from_vm86.c |  39 +-
 14 files changed, 1164 insertions(+), 122 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c
 create mode 100644 arch/x86/lib/insn-eval.c
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when memory addressing is used (i.e., the mod part of ModR/M is not 3), a SIB byte is used, and the index of the SIB byte points to R/ESP (i.e., index = 4), the index should not be used in the computation of the memory address.
In these cases the address is simply the value present in the register pointed by the base part of the SIB byte plus the displacement byte.
An example of such an instruction is:
insn -0x80(%rsp)
This is represented as:
[opcode] 4c 23 80
ModR/M=0x4c: mod: 0x1, reg: 0x1, r/m: 0x4 (R/ESP)
SIB=0x23: scale: 0x0, index: 0x4 (R/ESP), base: 0x3 (R/EBX)
Displacement: -0x80
The correct address is (base) + displacement; no index is used.
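For readers decoding these bytes by hand, here is a small stand-alone illustration (not kernel code) of how the example bytes break down; the bit layouts match the X86_MODRM_*/X86_SIB_* macros used by the kernel's instruction decoder:

/* Decode the example ModR/M and SIB bytes shown above. */
#include <stdio.h>

int main(void)
{
	unsigned char modrm = 0x4c, sib = 0x23;

	/* Prints mod=1 reg=1 rm=4: r/m = R/ESP, so a SIB byte follows. */
	printf("mod=%u reg=%u rm=%u\n",
	       (modrm >> 6) & 3, (modrm >> 3) & 7, modrm & 7);
	/* Prints scale=0 index=4 base=3: index = R/ESP (ignored), base = R/EBX. */
	printf("scale=%u index=%u base=%u\n",
	       (sib >> 6) & 3, (sib >> 3) & 7, sib & 7);
	return 0;
}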
We can achieve the desired effect of not using the index by making get_reg_offset() return -EDOM in this particular case. This value indicates to callers that they should not use the index to calculate the address. -EINVAL continues to indicate an error when decoding the SIB byte.
Care is taken to allow R12 to be used as index, which is a valid scenario.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/mm/mpx.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 86c2d96..6a034bc 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		regno = X86_SIB_INDEX(insn->sib.value);
 		if (X86_REX_X(insn->rex_prefix.value))
 			regno += 8;
+		/*
+		 * If mod !=3, register R/ESP (regno=4) is not used as index in
+		 * the address computation. Check is done after looking at REX.X
+		 * This is because R12 (regno=12) can be used as an index.
+		 */
+		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
+			return -EDOM;
 		break;

 	case REG_TYPE_BASE:
@@ -158,11 +165,20 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			goto out_err;

 		indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+		/*
+		 * A negative offset generally means a error, except
+		 * -EDOM, which means that the contents of the register
+		 * should not be used as index.
+		 */
 		if (indx_offset < 0)
-			goto out_err;
+			if (indx_offset == -EDOM)
+				indx = 0;
+			else
+				goto out_err;
+		else
+			indx = regs_get_register(regs, indx_offset);

 		base = regs_get_register(regs, base_offset);
-		indx = regs_get_register(regs, indx_offset);
 		addr = base + indx * (1 << X86_SIB_SCALE(sib));
 	} else {
 		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
/*
* A negative offset generally means a error, except
* -EDOM, which means that the contents of the register
* should not be used as index.
*/ if (indx_offset < 0)
goto out_err;
if (indx_offset == -EDOM)
indx = 0;
else
goto out_err;
else
indx = regs_get_register(regs, indx_offset);
Kernel coding style requires more brackets than are strictly required by C; any block longer than 1 line needs them. Also, if one leg of a conditional needs them, then they should be on both legs.
Your code has many such instances, please change them all.
On 23/02/17 07:24, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
/*
* A negative offset generally means a error, except
* -EDOM, which means that the contents of the register
* should not be used as index.
*/ if (indx_offset < 0)
goto out_err;
if (indx_offset == -EDOM)
indx = 0;
else
goto out_err;
else
indx = regs_get_register(regs, indx_offset);
Kernel coding style requires more brackets than are strictly required by C; any block longer than 1 line needs them. Also, if one leg of a conditional needs them, then they should be on both legs.
Your code has many such instances, please change them all.
This was one issue raised in the OpenSSL review, where a lack of brackets made some coding errors less obvious; in the changes it went down as applying "KNF format the whole source tree".
Probably the easiest fix is to use some tool like "indent --linux-style" though I am not completely sure that option is perfect.
Certainly if you have code with an odd mix of styles it is much harder to read, and ultimately source code is for *humans* to understand. So enforcing a consistent style, even if it is not your own style, makes it much easier to follow!
Just my 2p worth.
Regards, Paul
On Thu, 2017-02-23 at 08:24 +0100, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
/*
* A negative offset generally means a error, except
* -EDOM, which means that the contents of the register
* should not be used as index.
*/ if (indx_offset < 0)
goto out_err;
if (indx_offset == -EDOM)
indx = 0;
else
goto out_err;
else
indx = regs_get_register(regs, indx_offset);
Kernel coding style requires more brackets than are strictly required by C; any block longer than 1 line needs them. Also, if one leg of a conditional needs them, then they should be on both legs.
Your code has many such instances, please change them all.
Will do. Sorry for the noise. These instances escaped the checkpatch script.
Thanks and BR, Ricardo
On Thu, 2017-02-23 at 14:17 -0800, Ricardo Neri wrote:
On Thu, 2017-02-23 at 08:24 +0100, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
/*
* A negative offset generally means a error, except
* -EDOM, which means that the contents of the register
* should not be used as index.
*/ if (indx_offset < 0)
goto out_err;
if (indx_offset == -EDOM)
indx = 0;
else
goto out_err;
else
indx = regs_get_register(regs, indx_offset);
Kernel coding style requires more brackets than are strictly required by C; any block longer than 1 line needs them. Also, if one leg of a conditional needs them, then they should be on both legs.
Your code has many such instances, please change them all.
Will do. Sorry for the noise. These instances escaped the checkpatch script.
Also, this code would read better with the inner test reversed or done first
if (indx_offset < 0) {
	if (indx_offset != -EDOM)
		goto out_err;
	indx = 0;
} else {
	indx = regs_get_register(etc...)
}
or

if (indx_offset == -EDOM)
	indx = 0;
else if (indx_offset < 0)
	goto err;
else
	indx = regs_get_register(etc...)
The compiler should generate the same code in any case, but either could improve reader understanding.
On Thu, 2017-02-23 at 18:33 -0800, Joe Perches wrote:
On Thu, 2017-02-23 at 14:17 -0800, Ricardo Neri wrote:
On Thu, 2017-02-23 at 08:24 +0100, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
/*
* A negative offset generally means a error, except
* -EDOM, which means that the contents of the register
* should not be used as index.
*/ if (indx_offset < 0)
goto out_err;
if (indx_offset == -EDOM)
indx = 0;
else
goto out_err;
else
indx = regs_get_register(regs, indx_offset);
Kernel coding style requires more brackets than are strictly required by C; any block longer than 1 line needs them. Also, if one leg of a conditional needs them, then they should be on both legs.
Your code has many such instances, please change them all.
Will do. Sorry for the noise. These instances escaped the checkpatch script.
Also, this code would read better with the inner test reversed or done first
if (indx_offset < 0) { if (indx_offset != -EDOM) goto out_err; indx = 0; } else { indx = regs_get_register(etc...) }
or if (indx_offset == -EDOM) indx = 0; else if (indx_offset < 0) goto err; else indx = regs_get_register(etc...)
The compiler should generate the same code in any case, but either could improve reader understanding.
I agree! I will change it with your clever solution.
Thanks and BR, Ricardo
On Feb 23, 2017 9:33 PM, "Joe Perches" joe@perches.com wrote:
On Thu, 2017-02-23 at 14:17 -0800, Ricardo Neri wrote:
On Thu, 2017-02-23 at 08:24 +0100, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
/*
* A negative offset generally means a error, except
* -EDOM, which means that the contents of the
register
* should not be used as index.
*/ if (indx_offset < 0)
goto out_err;
if (indx_offset == -EDOM)
indx = 0;
else
goto out_err;
else
indx = regs_get_register(regs, indx_offset);
Kernel coding style requires more brackets than are strictly required by C; any block longer than 1 line needs them. Also, if one leg of a conditional needs them, then they should be on both legs.
Your code has many such instances, please change them all.
Will do. Sorry for the noise. These instances escaped the checkpatch script.
Also, this code would read better with the inner test reversed or done first
if (indx_offset < 0) { if (indx_offset != -EDOM) goto out_err; indx = 0; } else { indx = regs_get_register(etc...) }
or if (indx_offset == -EDOM) indx = 0; else if (indx_offset < 0) goto err;
Or goto out_err;
else indx = regs_get_register(etc...)
The compiler should generate the same code in any case, but either could improve reader understanding.
Also, it may be a tweak more efficient to handle the most likely runtime case in the conditional stack first (whichever that may be).
On Fri, 2017-02-24 at 09:47 -0500, Nathan Howard wrote:
Also, this code would read better with the inner test reversed or done first if (indx_offset < 0) { if (indx_offset != -EDOM) goto out_err; indx = 0; } else { indx = regs_get_register(etc...) } or if (indx_offset == -EDOM) indx = 0; else if (indx_offset < 0) goto err;
Or goto out_err;
else indx = regs_get_register(etc...) The compiler should generate the same code in any case, but either could improve reader understanding.
Also, it may be a tweak more efficient to handle the most likely runtime case in the conditional stack first (whichever that may be).
The most likely case will be a positive value but I need to check for negatives first :( I could wrap the first conditional in an "unlikely".
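For illustration, a sketch (not the submitted code) of Joe's first variant with the negative-offset path annotated as unlikely(), reusing the names from the patch:

	/* Sketch only: positive offsets are the common case. */
	if (unlikely(indx_offset < 0)) {
		if (indx_offset != -EDOM)
			goto out_err;
		indx = 0;
	} else {
		indx = regs_get_register(regs, indx_offset);
	}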
Thanks and BR, Ricardo
On Thu, Feb 23, 2017 at 9:33 PM, Joe Perches joe@perches.com wrote:
On Thu, 2017-02-23 at 14:17 -0800, Ricardo Neri wrote:
On Thu, 2017-02-23 at 08:24 +0100, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
/*
* A negative offset generally means a error, except
* -EDOM, which means that the contents of the register
* should not be used as index.
*/ if (indx_offset < 0)
goto out_err;
if (indx_offset == -EDOM)
indx = 0;
else
goto out_err;
else
indx = regs_get_register(regs, indx_offset);
Kernel coding style requires more brackets than are strictly required by C; any block longer than 1 line needs them. Also, if one leg of a conditional needs them, then they should be on both legs.
Your code has many such instances, please change them all.
Will do. Sorry for the noise. These instances escaped the checkpatch script.
Also, this code would read better with the inner test reversed or done first
if (indx_offset < 0) { if (indx_offset != -EDOM) goto out_err; indx = 0; } else { indx = regs_get_register(etc...) }
or if (indx_offset == -EDOM) indx = 0; else if (indx_offset < 0) goto err;
Or, goto out_err;
else indx = regs_get_register(etc...)
The compiler should generate the same code in any case, but either could improve reader understanding.
Also, it may be a tweak more efficient to handle the most likely runtime case in the conditional stack first (whichever that may be).
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when a SIB byte is used, the base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part of the ModRM byte is zero, the value of such a register will not be used as part of the address computation. To signal this, a -EDOM error is returned to indicate to callers that they should ignore the value.
Also, for this particular case, a displacement of 32-bits should follow the SIB byte if the mod part of ModRM is equal to zero. The instruction decoder ensures that this is the case.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/mm/mpx.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 6a034bc..f660ddf 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
case REG_TYPE_BASE: regno = X86_SIB_BASE(insn->sib.value); + /* + * If mod is 0 and register R/EBP (regno=5) is indicated in the + * base part of the SIB byte, the value of such register should + * not be used in the address computation. Also, a 32-bit + * displacement is expected in this case; the instruction + * decoder takes care of it. This is true for both R13 and + * R/EBP as REX.B will not be decoded. + */ + if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) + return -EDOM; + if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break; @@ -160,16 +171,22 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) addr = regs_get_register(regs, addr_offset); } else { if (insn->sib.nbytes) { + /* + * Negative values in the base and index offset means + * an error when decoding the SIB byte. Except -EDOM, + * which means that the registers should not be used + * in the address computation. + */ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); if (base_offset < 0) - goto out_err; + if (base_offset == -EDOM) + base = 0; + else + goto out_err; + else + base = regs_get_register(regs, base_offset);
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); - /* - * A negative offset generally means a error, except - * -EDOM, which means that the contents of the register - * should not be used as index. - */ if (indx_offset < 0) if (indx_offset == -EDOM) indx = 0; @@ -178,7 +195,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) else indx = regs_get_register(regs, indx_offset);
- base = regs_get_register(regs, base_offset); addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
Other kernel submodules can benefit from using the utility functions defined in mpx.c to obtain the addresses and values of operands contained in the general purpose registers. An instance of this is the emulation code used for instructions protected by the Intel User-Mode Instruction Prevention feature.
Thus, these functions are relocated to a new insn-eval.c file. The reason not to relocate these utilities into insn.c is that the latter solely analyses instructions given by a struct insn without any knowledge of the meaning of the values of instruction operands. This new utility insn-eval.c aims to be used to resolve effective and linear addresses based on the contents of the instruction operands as well as the contents of struct pt_regs.
These utilities come with a separate header. This is to avoid taking insn.c out of sync with the instruction decoders under tools/objtool and tools/perf. This also avoids adding cumbersome #ifdef's for the #include'd files required to decode instructions in a kernel context.
Functions are simply relocated. There are no functional or indentation changes.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-eval.h | 16 ++++ arch/x86/lib/Makefile | 2 +- arch/x86/lib/insn-eval.c | 160 +++++++++++++++++++++++++++++++++++++++ arch/x86/mm/mpx.c | 152 +------------------------------------ 4 files changed, 179 insertions(+), 151 deletions(-) create mode 100644 arch/x86/include/asm/insn-eval.h create mode 100644 arch/x86/lib/insn-eval.c
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h new file mode 100644 index 0000000..5cab1b1 --- /dev/null +++ b/arch/x86/include/asm/insn-eval.h @@ -0,0 +1,16 @@ +#ifndef _ASM_X86_INSN_EVAL_H +#define _ASM_X86_INSN_EVAL_H +/* + * A collection of utility functions for x86 instruction analysis to be + * used in a kernel context. Useful when, for instance, making sense + * of the registers indicated by operands. + */ + +#include <linux/compiler.h> +#include <linux/bug.h> +#include <linux/err.h> +#include <asm/ptrace.h> + +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); + +#endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile index 34a7413..675d7b0 100644 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o lib-y += memcpy_$(BITS).o lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o -lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o +lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c new file mode 100644 index 0000000..2ebfaa4 --- /dev/null +++ b/arch/x86/lib/insn-eval.c @@ -0,0 +1,160 @@ +/* + * Utility functions for x86 operand and address decoding + * + * Copyright (C) Intel Corporation 2016 + */ +#include <linux/kernel.h> +#include <linux/string.h> +#include <asm/inat.h> +#include <asm/insn.h> +#include <asm/insn-eval.h> + +enum reg_type { + REG_TYPE_RM = 0, + REG_TYPE_INDEX, + REG_TYPE_BASE, +}; + +static int get_reg_offset(struct insn *insn, struct pt_regs *regs, + enum reg_type type) +{ + int regno = 0; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + int nr_registers = ARRAY_SIZE(regoff); + /* + * Don't possibly decode a 32-bit instructions as + * reading a 64-bit-only register. + */ + if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64) + nr_registers -= 8; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(insn->modrm.value); + if (X86_REX_B(insn->rex_prefix.value)) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(insn->sib.value); + if (X86_REX_X(insn->rex_prefix.value)) + regno += 8; + /* + * If mod !=3, register R/ESP (regno=4) is not used as index in + * the address computation. Check is done after looking at REX.X + * This is because R12 (regno=12) can be used as an index. + */ + if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3) + return -EDOM; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(insn->sib.value); + /* + * If mod is 0 and register R/EBP (regno=5) is indicated in the + * base part of the SIB byte, the value of such register should + * not be used in the address computation. Also, a 32-bit + * displacement is expected in this case; the instruction + * decoder takes care of it. This is true for both R13 and + * R/EBP as REX.B will not be decoded. + */ + if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) + return -EDOM; + + if (X86_REX_B(insn->rex_prefix.value)) + regno += 8; + break; + + default: + pr_err("invalid register type"); + BUG(); + break; + } + + if (regno >= nr_registers) { + WARN_ONCE(1, "decoded an instruction with an invalid register"); + return -EINVAL; + } + return regoff[regno]; +} + +/* + * return the address being referenced be instruction + * for rm=3 returning the content of the rm reg + * for rm!=3 calculates the address using SIB and Disp + */ +static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +{ + unsigned long addr, base, indx; + int addr_offset, base_offset, indx_offset; + insn_byte_t sib; + + insn_get_modrm(insn); + insn_get_sib(insn); + sib = insn->sib.value; + + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset < 0) + goto out_err; + addr = regs_get_register(regs, addr_offset); + } else { + if (insn->sib.nbytes) { + /* + * Negative values in the base and index offset means + * an error when decoding the SIB byte. 
Except -EDOM, + * which means that the registers should not be used + * in the address computation. + */ + + base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); + if (base_offset < 0) + if (base_offset == -EDOM) + base = 0; + else + goto out_err; + else + base = regs_get_register(regs, base_offset); + + indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); + if (indx_offset < 0) + if (indx_offset == -EDOM) + indx = 0; + else + goto out_err; + else + indx = regs_get_register(regs, indx_offset); + + addr = base + indx * (1 << X86_SIB_SCALE(sib)); + } else { + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset < 0) + goto out_err; + addr = regs_get_register(regs, addr_offset); + } + addr += insn->displacement.value; + } + return (void __user *)addr; +out_err: + return (void __user *)-1; +} diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index f660ddf..55b9c47 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -12,6 +12,7 @@ #include <linux/sched/sysctl.h>
#include <asm/insn.h> +#include <asm/insn-eval.h> #include <asm/mman.h> #include <asm/mmu_context.h> #include <asm/mpx.h> @@ -60,155 +61,6 @@ static unsigned long mpx_mmap(unsigned long len) return addr; }
-enum reg_type { - REG_TYPE_RM = 0, - REG_TYPE_INDEX, - REG_TYPE_BASE, -}; - -static int get_reg_offset(struct insn *insn, struct pt_regs *regs, - enum reg_type type) -{ - int regno = 0; - - static const int regoff[] = { - offsetof(struct pt_regs, ax), - offsetof(struct pt_regs, cx), - offsetof(struct pt_regs, dx), - offsetof(struct pt_regs, bx), - offsetof(struct pt_regs, sp), - offsetof(struct pt_regs, bp), - offsetof(struct pt_regs, si), - offsetof(struct pt_regs, di), -#ifdef CONFIG_X86_64 - offsetof(struct pt_regs, r8), - offsetof(struct pt_regs, r9), - offsetof(struct pt_regs, r10), - offsetof(struct pt_regs, r11), - offsetof(struct pt_regs, r12), - offsetof(struct pt_regs, r13), - offsetof(struct pt_regs, r14), - offsetof(struct pt_regs, r15), -#endif - }; - int nr_registers = ARRAY_SIZE(regoff); - /* - * Don't possibly decode a 32-bit instructions as - * reading a 64-bit-only register. - */ - if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64) - nr_registers -= 8; - - switch (type) { - case REG_TYPE_RM: - regno = X86_MODRM_RM(insn->modrm.value); - if (X86_REX_B(insn->rex_prefix.value)) - regno += 8; - break; - - case REG_TYPE_INDEX: - regno = X86_SIB_INDEX(insn->sib.value); - if (X86_REX_X(insn->rex_prefix.value)) - regno += 8; - /* - * If mod !=3, register R/ESP (regno=4) is not used as index in - * the address computation. Check is done after looking at REX.X - * This is because R12 (regno=12) can be used as an index. - */ - if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3) - return -EDOM; - break; - - case REG_TYPE_BASE: - regno = X86_SIB_BASE(insn->sib.value); - /* - * If mod is 0 and register R/EBP (regno=5) is indicated in the - * base part of the SIB byte, the value of such register should - * not be used in the address computation. Also, a 32-bit - * displacement is expected in this case; the instruction - * decoder takes care of it. This is true for both R13 and - * R/EBP as REX.B will not be decoded. - */ - if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) - return -EDOM; - - if (X86_REX_B(insn->rex_prefix.value)) - regno += 8; - break; - - default: - pr_err("invalid register type"); - BUG(); - break; - } - - if (regno >= nr_registers) { - WARN_ONCE(1, "decoded an instruction with an invalid register"); - return -EINVAL; - } - return regoff[regno]; -} - -/* - * return the address being referenced be instruction - * for rm=3 returning the content of the rm reg - * for rm!=3 calculates the address using SIB and Disp - */ -static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) -{ - unsigned long addr, base, indx; - int addr_offset, base_offset, indx_offset; - insn_byte_t sib; - - insn_get_modrm(insn); - insn_get_sib(insn); - sib = insn->sib.value; - - if (X86_MODRM_MOD(insn->modrm.value) == 3) { - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - addr = regs_get_register(regs, addr_offset); - } else { - if (insn->sib.nbytes) { - /* - * Negative values in the base and index offset means - * an error when decoding the SIB byte. Except -EDOM, - * which means that the registers should not be used - * in the address computation. 
- */ - base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); - if (base_offset < 0) - if (base_offset == -EDOM) - base = 0; - else - goto out_err; - else - base = regs_get_register(regs, base_offset); - - indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); - if (indx_offset < 0) - if (indx_offset == -EDOM) - indx = 0; - else - goto out_err; - else - indx = regs_get_register(regs, indx_offset); - - addr = base + indx * (1 << X86_SIB_SCALE(sib)); - } else { - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - addr = regs_get_register(regs, addr_offset); - } - addr += insn->displacement.value; - } - return (void __user *)addr; -out_err: - return (void __user *)-1; -} - static int mpx_insn_decode(struct insn *insn, struct pt_regs *regs) { @@ -321,7 +173,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs) info->si_signo = SIGSEGV; info->si_errno = 0; info->si_code = SEGV_BNDERR; - info->si_addr = mpx_get_addr_ref(&insn, regs); + info->si_addr = insn_get_addr_ref(&insn, regs); /* * We were not able to extract an address from the instruction, * probably because there was something invalid in it.
Hi Ricardo,
[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.10 next-20170223]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ricardo-Neri/x86-Enable-User-Mode-I...
config: x86_64-randconfig-ne0-02231751 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64
Note: the linux-review/Ricardo-Neri/x86-Enable-User-Mode-Instruction-Prevention/20170223-145356 HEAD 34056233e7c134c8c4c6d8308592d2e0aed67f1b builds fine. It only hurts bisectibility.
All errors (new ones prefixed by >>):
arch/x86/lib/insn-eval.c:106:21: error: static declaration of 'insn_get_addr_ref' follows non-static declaration
static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) ^~~~~~~~~~~~~~~~~ In file included from arch/x86/lib/insn-eval.c:10:0: arch/x86/include/asm/insn-eval.h:14:14: note: previous declaration of 'insn_get_addr_ref' was here void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); ^~~~~~~~~~~~~~~~~ arch/x86/lib/insn-eval.c:106:21: warning: 'insn_get_addr_ref' defined but not used [-Wunused-function] static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) ^~~~~~~~~~~~~~~~~
vim +/insn_get_addr_ref +106 arch/x86/lib/insn-eval.c
100 101 /* 102 * return the address being referenced be instruction 103 * for rm=3 returning the content of the rm reg 104 * for rm!=3 calculates the address using SIB and Disp 105 */
106 static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
107 { 108 unsigned long addr, base, indx; 109 int addr_offset, base_offset, indx_offset;
--- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Thu, 2017-02-23 at 18:54 +0800, kbuild test robot wrote:
arch/x86/lib/insn-eval.c:106:21: error: static declaration of
'insn_get_addr_ref' follows non-static declaration static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) ^~~~~~~~~~~~~~~~~ In file included from arch/x86/lib/insn-eval.c:10:0: arch/x86/include/asm/insn-eval.h:14:14: note: previous declaration of 'insn_get_addr_ref' was here void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); ^~~~~~~~~~~~~~~~~ arch/x86/lib/insn-eval.c:106:21: warning: 'insn_get_addr_ref' defined but not used [-Wunused-function] static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) ^~~~~~~~~~~~~~~~~
vim +/insn_get_addr_ref +106 arch/x86/lib/insn-eval.c
Uh! This escaped my tests after many rebases. I will correct it.
Thanks and BR, Ricardo
The function insn_get_reg_offset takes as argument an enumeration that indicates the type of offset that is returned: the r/m part of the ModRM byte, the index of the SIB byte or the base of the SIB byte. Callers of this function would need the definition of such enumeration. This is not needed. Instead, helper functions can be added for this purpose. These functions are useful in cases when, for instance, the caller needs to decide whether the operand is a register or a memory location by looking at the mod part of the ModRM byte.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-eval.h | 3 +++ arch/x86/lib/insn-eval.c | 51 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h index 5cab1b1..754211b 100644 --- a/arch/x86/include/asm/insn-eval.h +++ b/arch/x86/include/asm/insn-eval.h @@ -12,5 +12,8 @@ #include <asm/ptrace.h>
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
#endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 2ebfaa4..6c62fbf 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, return regoff[regno]; }
+/** + * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte + * @insn: Instruction structure containing the ModRM byte + * @regs: Set of registers indicated by the ModRM byte + * + * Obtain the register indicated by the r/m part of the ModRM byte. The + * register is obtained as an offset from the base of pt_regs. In specific + * cases, the returned value can be -EDOM to indicate that the particular value + * of ModRM does not refer to a register. + * + * Return: Register indicated by r/m, as an offset within struct pt_regs + */ +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs) +{ + return get_reg_offset(insn, regs, REG_TYPE_RM); +} + +/** + * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte + * @insn: Instruction structure containing the SiB byte + * @regs: Set of registers indicated by the SiB byte + * + * Obtain the register indicated by the base part of the SiB byte. The + * register is obtained as an offset from the base of pt_regs. In specific + * cases, the returned value can be -EDOM to indicate that the particular value + * of SiB does not refer to a register. + * + * Return: Register indicated by SiB's base, as an offset within struct pt_regs + */ +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs) +{ + return get_reg_offset(insn, regs, REG_TYPE_BASE); +} + +/** + * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte + * @insn: Instruction structure containing the SiB byte + * @regs: Set of registers indicated by the SiB byte + * + * Obtain the register indicated by the index part of the SiB byte. The + * register is obtained as an offset from the index of pt_regs. In specific + * cases, the returned value can be -EDOM to indicate that the particular value + * of SiB does not refer to a register. + * + * Return: Register indicated by SiB's base, as an offset within struct pt_regs + */ +int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) +{ + return get_reg_offset(insn, regs, REG_TYPE_INDEX); +} + /* * return the address being referenced be instruction * for rm=3 returning the content of the rm reg
When computing a linear address and segmentation is used, we need to know the base address of the segment involved in the computation. In most cases, it will be sufficient to use USER_DS, which has a base address of 0. However, a user space program may define its own segments via a local descriptor table. In such cases, the base address of the segment is needed.
The segment selector to be used when computing a linear address is determined by either the segment override prefixes in the instruction or the registers involved in the computation of the effective address, in that order. Also, there are cases when the overrides shall be ignored.
This function can be used in both protected mode and virtual-8086 mode. When in protected mode, the segment selector is obtained from the pt_regs structure. When in virtual-8086 mode, data segments are obtained from the kernel_vm86_regs.
When CONFIG_X86_64 is enabled, selectors for data segments are absent from pt_regs. Hence, the returned selector is zero to signal that segmentation is not in use.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 163 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 6c62fbf..516902e 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -8,6 +8,7 @@ #include <asm/inat.h> #include <asm/insn.h> #include <asm/insn-eval.h> +#include <asm/vm86.h>
enum reg_type { REG_TYPE_RM = 0, @@ -15,6 +16,168 @@ enum reg_type { REG_TYPE_BASE, };
+/** + * get_segment_selector() - obtain segment selector + * @regs: Set of registers containing the segment selector + * @insn: Instruction structure with selector override prefixes + * @regoff: Operand offset, in pt_regs, of which the selector is needed + * + * The segment selector to which an effective address refers depends on + * a) segment selector overrides instruction prefixes or b) the operand + * register indicated in the ModRM or SiB byte. + * + * For case a), the function inspects any prefixes in the insn instruction; + * insn can be null to indicate that selector override prefixes shall be + * ignored. This is useful when the use of prefixes is forbidden (e.g., + * obtaining the code selector). For case b), the operand register shall be + * represented as the offset from the base address of pt_regs. Also, regoff + * can be -EINVAL for cases in which registers are not used as operands (e.g., + * when the mod and r/m parts of the ModRM byte are 0 and 5, respectively). + * + * The returned segment selector is obtained from the regs structure. Both + * protected and virtual-8086 modes are supported. In virtual-8086 mode, + * data segments are obtained from the kernel_vm86_regs structure. + * For CONFIG_X86_64, the returned segment selector is null if such selector + * refers to es, fs or gs. + * + * Return: Value of the segment selector + */ +static unsigned short get_segment_selector(struct pt_regs *regs, + struct insn *insn, int regoff) +{ + int i; + + struct kernel_vm86_regs *vm86regs = (struct kernel_vm86_regs *)regs; + + if (!insn) + goto default_seg; + + insn_get_prefixes(insn); + + if (v8086_mode(regs)) { + /* + * Check first if we have selector overrides. Having more than + * one selector override leads to undefined behavior. We + * only use the first one and return + */ + for (i = 0; i < insn->prefixes.nbytes; i++) { + switch (insn->prefixes.bytes[i]) { + /* + * Code and stack segment selector register are saved in + * all processor modes. Thus, it makes sense to take + * them from pt_regs. + */ + case 0x2e: + return (unsigned short)regs->cs; + case 0x36: + return (unsigned short)regs->ss; + /* + * The rest of the segment selector registers are only + * saved in virtual-8086 mode. Thus, we must obtain them + * from the vm86 register structure. + */ + case 0x3e: + return vm86regs->ds; + case 0x26: + return vm86regs->es; + case 0x64: + return vm86regs->fs; + case 0x65: + return vm86regs->gs; + /* + * No default action needed. We simply did not find any + * relevant prefixes. + */ + } + } + } else {/* protected mode */ + /* + * Check first if we have selector overrides. Having more than + * one selector override leads to undefined behavior. We + * only use the first one and return. + */ + for (i = 0; i < insn->prefixes.nbytes; i++) { + switch (insn->prefixes.bytes[i]) { + /* + * Code and stack segment selector register are saved in + * all processor modes. Thus, it makes sense to take + * them from pt_regs. + */ + case 0x2e: + return (unsigned short)regs->cs; + case 0x36: + return (unsigned short)regs->ss; +#ifdef CONFIG_X86_32 + case 0x3e: + return (unsigned short)regs->ds; + case 0x26: + return (unsigned short)regs->es; + case 0x64: + return (unsigned short)regs->fs; + case 0x65: + return (unsigned short)regs->gs; +#else + /* do not return any segment selector in x86_64 */ + case 0x3e: + case 0x26: + case 0x64: + case 0x65: + return 0; +#endif + /* + * No default action needed. We simply did not find any + * relevant prefixes. 
+ */ + } + } + } + +default_seg: + /* + * If no overrides, use default selectors as described in the + * Intel documentation: SS for ESP or EBP. DS for all data references, + * except when relative to stack or string destination. + * Also, AX, CX and DX are not valid register operands in 16-bit + * address encodings. + * Callers must interpret the result correctly according to the type + * of instructions (e.g., use ES for string instructions). + * Also, some values of modrm and sib might seem to indicate the use + * of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually + * they refer to cases in which only a displacement used. These cases + * should be indentified by the caller and not with this function. + */ + switch (regoff) { + case offsetof(struct pt_regs, ax): + /* fall through */ + case offsetof(struct pt_regs, cx): + /* fall through */ + case offsetof(struct pt_regs, dx): + if (insn && insn->addr_bytes == 2) + return 0; + case -EDOM: /* no register involved in address computation */ + case offsetof(struct pt_regs, bx): + /* fall through */ + case offsetof(struct pt_regs, di): + /* fall through */ + case offsetof(struct pt_regs, si): + if (v8086_mode(regs)) + return vm86regs->ds; +#ifdef CONFIG_X86_32 + return (unsigned short)regs->ds; +#else + return 0; +#endif + case offsetof(struct pt_regs, bp): + /* fall through */ + case offsetof(struct pt_regs, sp): + return (unsigned short)regs->ss; + case offsetof(struct pt_regs, ip): + return (unsigned short)regs->cs; + default: + return 0; + } +} + static int get_reg_offset(struct insn *insn, struct pt_regs *regs, enum reg_type type) {
The segment descriptor contains information that is relevant to how linear addresses need to be computed. It contains the default size of addresses as well as the base address of the segment. Thus, given a segment selector, we ought to look at the segment descriptor to correctly calculate the linear address.
In protected mode, the segment selector might indicate a segment descriptor from either the global descriptor table or a local descriptor table. Both cases are considered in this function.
This function is the initial implementation for subsequent functions that will obtain the aforementioned attributes of the segment descriptor.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 516902e..e6d5dfb 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -5,9 +5,13 @@ */ #include <linux/kernel.h> #include <linux/string.h> +#include <asm/desc_defs.h> +#include <asm/desc.h> #include <asm/inat.h> #include <asm/insn.h> #include <asm/insn-eval.h> +#include <asm/ldt.h> +#include <linux/mmu_context.h> #include <asm/vm86.h>
enum reg_type { @@ -262,6 +266,63 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, }
/** + * get_desc() - Obtain address of segment descriptor + * @seg: Segment selector + * @desc: Pointer to the selected segment descriptor + * + * Given a segment selector, obtain a memory pointer to the segment + * descriptor. Both global and local descriptor tables are supported. + * desc will contain the address of the descriptor. + * + * Return: 0 if success, -EINVAL if failure + */ +static int get_desc(unsigned short seg, struct desc_struct **desc) +{ + struct desc_ptr gdt_desc = {0, 0}; + unsigned long desc_base; + + if (!desc) + return -EINVAL; + + desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK); + +#ifdef CONFIG_MODIFY_LDT_SYSCALL + if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) { + seg >>= 3; + + mutex_lock(¤t->active_mm->context.lock); + if (unlikely(!current->active_mm->context.ldt || + seg >= current->active_mm->context.ldt->size)) { + *desc = NULL; + mutex_unlock(¤t->active_mm->context.lock); + return -EINVAL; + } + + *desc = ¤t->active_mm->context.ldt->entries[seg]; + mutex_unlock(¤t->active_mm->context.lock); + return 0; + } +#endif + native_store_gdt(&gdt_desc); + + /* + * Bits [15:3] of the segment selector contain the index. Such + * index needs to be multiplied by 8. However, as the index + * least significant bit is already in bit 3, we don't have + * to perform the multiplication. + */ + desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK); + + if (desc_base > gdt_desc.size) { + *desc = NULL; + return -EINVAL; + } + + *desc = (struct desc_struct *)(gdt_desc.address + desc_base); + return 0; +} + +/** * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte * @insn: Instruction structure containing the ModRM byte * @regs: Set of registers indicated by the ModRM byte
With segmentation, the base address of the segment descriptor is needed to compute a linear address. The segment descriptor used in the address computation depends on either any segment override prefixes in the instruction or the default segment determined by the registers involved in the address computation. Thus, both the instruction as well as the register (specified as the offset from the base of pt_regs) are given as inputs. Furthermore, if insn is null, overrides are ignored; this is useful when, for instance, obtaining the base address of the instruction pointer (the code segment is always used).
The segment selector is determined by get_seg_selector with the inputs described above. Once the selector is known, the base address is determined. In protected mode, the selector is used to obtain the segment descriptor and then its base address. In virtual-8086 mode, the base address is computed as the value of the segment selector shifted 4 positions to the left.
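To illustrate the virtual-8086 case, here is a minimal user-space sketch (not part of the patch) of the base and linear address computation described above:

#include <stdint.h>
#include <stdio.h>

/* Segment base in virtual-8086 mode: the selector shifted left by 4. */
static uint32_t v8086_seg_base(uint16_t seg)
{
	return (uint32_t)seg << 4;
}

int main(void)
{
	uint16_t seg = 0xb800, off = 0x0010;

	/* linear address = segment base + effective address */
	printf("0x%x\n", v8086_seg_base(seg) + off);	/* prints 0xb8010 */
	return 0;
}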
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-eval.h | 2 ++ arch/x86/lib/insn-eval.c | 42 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h index 754211b..0de3083 100644 --- a/arch/x86/include/asm/insn-eval.h +++ b/arch/x86/include/asm/insn-eval.h @@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs); int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs); int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs); +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn, + int regoff);
#endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index e6d5dfb..4e3f797 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -323,6 +323,48 @@ static int get_desc(unsigned short seg, struct desc_struct **desc) }
/** + * insn_get_seg_base() - Obtain base address contained in descriptor + * @regs: Set of registers containing the segment selector + * @insn: Instruction structure with selector override prefixes + * @regoff: Operand offset, in pt_regs, of which the selector is needed + * + * Obtain the base address of the segment descriptor as indicated by either any + * segment override prefixes contained in insn or the default segment applicable + * to the register indicated by regoff. regoff is specified as the offset in + * bytes from the base of pt_regs. If insn is not null and contain any segment + * override prefixes, the override is used instead of the default segment. + * + * Return: In protected mode, 0 if in CONFIG_X86_64, -1L in case of error, + * or the base address indicated in the selected segment descriptor. In + * virtual-8086, the segment selector shifted four positions to the right. + */ +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn, + int regoff) +{ + struct desc_struct *desc; + unsigned short seg; + int ret; + + seg = get_segment_selector(regs, insn, regoff); + + if (v8086_mode(regs)) + /* + * Base is simply the segment selector sifted 4 + * positions to the right. + */ + return (unsigned long)(seg << 4); + + /* 64-bit mode */ + if (!seg) + return 0; + ret = get_desc(seg, &desc); + if (ret) + return -1L; + + return get_desc_base(desc); +} + +/** * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte * @insn: Instruction structure containing the ModRM byte * @regs: Set of registers indicated by the ModRM byte
These functions read the default values of the address and operand sizes as specified in the segment descriptor. This information is determined from the D and L bits; hence, these functions can be used in both IA-32e 64-bit mode and 32-bit legacy modes. In virtual-8086 mode, the default address and operand sizes are always 2 bytes.
The D bit is only meaningful for code segments. Thus, these functions always use the code segment selector contained in regs.
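As a rough model of the decode described above (a sketch, not the patch code), the CS.L and CS.D bits select the default address size as follows:

/* Default address size, in bytes, from the CS.L and CS.D descriptor bits.
 * Returns 0 for the reserved L=1, D=1 combination. */
static int default_address_bytes(int l, int d)
{
	if (l)
		return d ? 0 : 8;	/* IA-32e 64-bit mode (L=1, D=0) */
	return d ? 4 : 2;		/* legacy/compatibility: 32-bit or 16-bit */
}

The default operand size follows the same decode except that the L=1, D=0 case yields 4 bytes instead of 8.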
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-eval.h | 2 + arch/x86/lib/insn-eval.c | 80 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 82 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h index 0de3083..cd4008251 100644 --- a/arch/x86/include/asm/insn-eval.h +++ b/arch/x86/include/asm/insn-eval.h @@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs); int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs); int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs); +unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs); +unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs); unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn, int regoff);
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 4e3f797..3fe4ddb 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -365,6 +365,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn, }
/** + * insn_get_seg_default_address_bytes - Obtain default address size of segment + * @regs: Set of registers containing the segment selector + * + * Obtain the default address size as indicated in the segment descriptor + * selected in regs' code segment selector. In protected mode, the default + * address is determined by inspecting the L and D bits of the segment + * descriptor. In virtual-8086 mode, the default is always two bytes. + * + * Return: Default address size of segment + */ +unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs) +{ + struct desc_struct *desc; + unsigned short seg; + int ret; + + if (v8086_mode(regs)) + return 2; + + seg = (unsigned short)regs->cs; + + ret = get_desc(seg, &desc); + if (ret) + return 0; + + switch ((desc->l << 1) | desc->d) { + case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */ + return 2; + case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */ + return 4; + case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */ + return 8; + case 3: /* Invalid setting. CS.L=1, CS.D=1 */ + /* fall through */ + default: + return 0; + } +} + +/** + * insn_get_seg_default_operand_bytes - Obtain default operand size of segment + * @regs: Set of registers containing the segment selector + * + * Obtain the default operand size as indicated in the segment descriptor + * selected in regs' code segment selector. In protected mode, the default + * operand size is determined by inspecting the L and D bits of the segment + * descriptor. In virtual-8086 mode, the default is always two bytes. + * + * Return: Default operand size of segment + */ +unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs) +{ + struct desc_struct *desc; + unsigned short seg; + int ret; + + if (v8086_mode(regs)) + return 2; + + seg = (unsigned short)regs->cs; + + ret = get_desc(seg, &desc); + if (ret) + return 0; + + switch ((desc->l << 1) | desc->d) { + case 0: /* Legacy mode. 16-bit or 8-bit operands CS.L=0, CS.D=0 */ + return 2; + case 1: /* Legacy mode. 32- or 8 bit operands CS.L=0, CS.D=1 */ + /* fall through */ + case 2: /* IA-32e 64-bit mode. 32- or 8-bit opnds. CS.L=1, CS.D=0 */ + return 4; + case 3: /* Invalid setting. CS.L=1, CS.D=1 */ + /* fall through */ + default: + return 0; + } +} + +/** * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte * @insn: Instruction structure containing the ModRM byte * @regs: Set of registers indicated by the ModRM byte
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when the mod part of the ModRM byte is zero and R/EBP is specified in the R/M part of that byte, the value of the aforementioned register should not be used in the address computation. Instead, a 32-bit displacement is expected. The instruction decoder takes care of setting the displacement to the expected value. Returning -EDOM signals callers that they should ignore the value of that register when computing the address encoded in the instruction operands.
Also, callers should exercise care to correctly interpret this particular case. In IA-32e 64-bit mode, the address is given by the displacement plus the value of RIP. In IA-32e compatibility mode, the value of EIP is ignored. This handling is implemented in insn_get_addr_ref.
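A sketch of this special case (illustrative only; the helper name is hypothetical): when ModRM.mod is 0 and ModRM.rm is 5, no register contributes to the address and the 32-bit displacement is relative to RIP only in 64-bit mode.

#include <stdint.h>

/* Hypothetical illustration of the mod=0, r/m=5 (R/EBP) special case. */
static uint64_t special_case_addr(uint8_t modrm, uint64_t rip,
				  int32_t disp32, int addr_bytes)
{
	uint8_t mod = (modrm >> 6) & 0x3;
	uint8_t rm  = modrm & 0x7;

	if (mod == 0 && rm == 5)
		/* No register is used; RIP only contributes in 64-bit mode. */
		return (addr_bytes == 8 ? rip : 0) + disp32;

	return 0;	/* not the special case; handled elsewhere */
}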
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 33 ++++++++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 3fe4ddb..d6525c2 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -218,6 +218,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, switch (type) { case REG_TYPE_RM: regno = X86_MODRM_RM(insn->modrm.value); + /* if mod=0, register R/EBP is not used in the address + * computation. Instead, a 32-bit displacement is expected; + * the instruction decoder takes care of reading such + * displacement. This is true for both R/EBP and R13, as the + * REX.B bit is not decoded. + */ + if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) + return -EDOM; if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break; @@ -544,10 +552,29 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else { + unsigned char addr_bytes; + + addr_bytes = insn_get_seg_default_address_bytes(regs); addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - addr = regs_get_register(regs, addr_offset); + if (addr_offset < 0) { + /* -EDOM means that we must ignore the + * address_offset. The only case in which we + * see this value is when R/M points to R/EBP. + * In such a case, the address involves using + * the instruction pointer for 64-bit mode. + */ + if (addr_offset == -EDOM) { + /* if in 64-bit mode */ + if (addr_bytes == 8) + addr = regs->ip; + else + addr = 0; + } else { + goto out_err; + } + } else { + addr = regs_get_register(regs, addr_offset); + } } addr += insn->displacement.value; }
insn_get_addr_ref returns the effective address as defined in Section 3.7.5.1 of Vol. 1 of the Intel 64 and IA-32 Architectures Software Developer's Manual. In order to obtain the linear address, the effective address must be added to the segment base address as described by the segment descriptor.
In most cases, the base address will be 0, as when the USER_DS segment is used or when segmentation is not used. However, the base address is not necessarily zero if a user program defines its own segments. This is possible by using a local descriptor table.
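For example, a user program can install a data segment with a non-zero base via the modify_ldt system call. The sketch below (user space, not part of the patch; the function name is illustrative) shows how such a segment could be created:

#include <asm/ldt.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Install LDT entry 0 as a 32-bit data segment with a non-zero base. */
static int install_ldt_data_segment(unsigned int base, unsigned int limit_pages)
{
	struct user_desc desc;

	memset(&desc, 0, sizeof(desc));
	desc.entry_number = 0;
	desc.base_addr = base;			/* non-zero segment base */
	desc.limit = limit_pages;
	desc.seg_32bit = 1;
	desc.limit_in_pages = 1;
	desc.contents = MODIFY_LDT_CONTENTS_DATA;

	return syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
}

A selector referencing such an entry has the TI bit set, which is why get_desc checks SEGMENT_TI_MASK before consulting the LDT.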
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index d6525c2..b3a2fe8 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -523,6 +523,7 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) if (addr_offset < 0) goto out_err; addr = regs_get_register(regs, addr_offset); + addr += insn_get_seg_base(regs, insn, addr_offset); } else { if (insn->sib.nbytes) { /* @@ -551,6 +552,7 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) indx = regs_get_register(regs, indx_offset);
addr = base + indx * (1 << X86_SIB_SCALE(sib)); + addr += insn_get_seg_base(regs, insn, base_offset); } else { unsigned char addr_bytes;
@@ -575,8 +577,10 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) } else { addr = regs_get_register(regs, addr_offset); } + addr += insn_get_seg_base(regs, insn, addr_offset); } addr += insn->displacement.value; + } return (void __user *)addr; out_err:
Tasks running in virtual-8086 mode, or in protected mode with code segment descriptors that specify 16-bit default address sizes via the D bit, will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A, Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, the address is computed as the sum of the values of two registers. Also, there is no support for SIB bytes. Thus, a separate function is needed to parse this form of addressing.
Two functions are introduced. get_reg_offset_16 obtains the offsets, from the base of pt_regs, of the registers indicated by the ModRM byte of the address encoding. insn_get_addr_ref_16 computes the linear address indicated by the instruction using the values of the registers given by ModRM as well as the base address of the segment.
Lastly, the original function insn_get_addr_ref is renamed insn_get_addr_ref_32_64. A new insn_get_addr_ref function decides which type of address decoding must be done based on the number of address bytes used by the instruction.
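For reference, a sketch of the 16-bit effective-address computation described above (register combinations per the Intel SDM Vol. 2A, Section 2.1.5; displacement handling simplified, not the patch code):

#include <stdint.h>

/* 16-bit effective address for ModRM.rm values 0-7 (mod != 3).
 * When mod == 0 and rm == 6, only a 16-bit displacement is used. */
static uint16_t eff_addr_16(int rm, uint16_t bx, uint16_t bp,
			    uint16_t si, uint16_t di, uint16_t disp)
{
	switch (rm) {
	case 0: return bx + si + disp;
	case 1: return bx + di + disp;
	case 2: return bp + si + disp;
	case 3: return bp + di + disp;
	case 4: return si + disp;
	case 5: return di + disp;
	case 6: return bp + disp;	/* disp16 alone when mod == 0 */
	case 7: return bx + disp;
	default: return 0;
	}
}

Note that the uint16_t return type models the 16-bit wrap-around of this addressing form.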
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 155 insertions(+), 1 deletion(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index b3a2fe8..ea5a38d 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -274,6 +274,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, }
/** + * get_reg_offset_16 - Obtain offset of register indicated by instruction + * @insn: Instruction structure containing ModRM and SiB bytes + * @regs: Set of registers referred by the instruction + * @offs1: Offset of the first operand register + * @offs2: Offset of the second opeand register, if applicable. + * + * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte + * within insn. This function is to be used with 16-bit address encodings. The + * offs1 and offs2 will be written with the offset of the two registers + * indicated by the instruction. In cases where any of the registers is not + * referenced by the instruction, the value will be set to -EDOM. + * + * Return: 0 on success, -EINVAL on failure. + */ +static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs, + int *offs1, int *offs2) +{ + /* 16-bit addressing can use one or two registers */ + static const int regoff1[] = { + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bx), + }; + + static const int regoff2[] = { + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + -EDOM, + -EDOM, + -EDOM, + -EDOM, + }; + + if (!offs1 || !offs2) + return -EINVAL; + + /* operand is a register, use the generic function */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + *offs1 = insn_get_reg_offset_modrm_rm(insn, regs); + *offs2 = -EDOM; + return 0; + } + + *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)]; + *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)]; + + /* + * If no displacement is indicated in the mod part of the ModRM byte, + * (mod part is 0) and the r/m part of the same byte is 6, no register + * is used caculate the operand address. An r/m part of 6 means that + * the second register offset is already invalid. + */ + if ((X86_MODRM_MOD(insn->modrm.value) == 0) && + (X86_MODRM_RM(insn->modrm.value) == 6)) + *offs1 = -EDOM; + + return 0; +} + +/** * get_desc() - Obtain address of segment descriptor * @seg: Segment selector * @desc: Pointer to the selected segment descriptor @@ -503,12 +570,79 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) return get_reg_offset(insn, regs, REG_TYPE_INDEX); }
+/** + * insn_get_addr_ref_16 - Obtain the 16-bit address referred by instruction + * @insn: Instruction structure containing ModRM byte and displacement + * @regs: Set of registers referred by the instruction + * + * This function is to be used with 16-bit address encodings. Obtain the memory + * address referred by the instruction's ModRM bytes and displacement. Also, the + * segment used as base is determined by either any segment override prefixes in + * insn or the default segment of the registers involved in the address + * computation. + * the ModRM byte + * + * Return: linear address referenced by instruction and registers + */ +static void __user *insn_get_addr_ref_16(struct insn *insn, + struct pt_regs *regs) +{ + unsigned long addr; + unsigned short addr1 = 0, addr2 = 0; + int addr_offset1, addr_offset2; + int ret; + + insn_get_modrm(insn); + insn_get_displacement(insn); + + /* + * If operand is a register, the layout is the same as in + * 32-bit and 64-bit addressing. + */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset1 < 0) + goto out_err; + addr = regs_get_register(regs, addr_offset1); + addr += insn_get_seg_base(regs, insn, addr_offset1); + } else { + ret = get_reg_offset_16(insn, regs, &addr_offset1, + &addr_offset2); + if (ret < 0) + goto out_err; + /* + * Don't fail on invalid offset values. They might be invalid + * because they cannot be used for this particular value of + * the ModRM. Instead, use them in the computation only if + * they contain a valid value. + */ + if (addr_offset1 != -EDOM) + addr1 = 0xffff & regs_get_register(regs, addr_offset1); + if (addr_offset2 != -EDOM) + addr2 = 0xffff & regs_get_register(regs, addr_offset2); + addr = (unsigned long)(addr1 + addr2); + /* + * The first register is in the operand implies the SS or DS + * segment selectors, the second register in the operand can + * only imply DS. Thus, use the first register to obtain + * the segment selector. + */ + addr += insn_get_seg_base(regs, insn, addr_offset1); + } + addr += insn->displacement.value; + + return (void __user *)addr; +out_err: + return (void __user *)-1; +} + /* * return the address being referenced be instruction * for rm=3 returning the content of the rm reg * for rm!=3 calculates the address using SIB and Disp */ -static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +static void __user *insn_get_addr_ref_32_64(struct insn *insn, + struct pt_regs *regs) { unsigned long addr, base, indx; int addr_offset, base_offset, indx_offset; @@ -586,3 +720,23 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) out_err: return (void __user *)-1; } + +/** + * insn_get_addr_ref - Obtain the linear address referred by instruction + * @insn: Instruction structure containing ModRM byte and displacement + * @regs: Set of registers referred by the instruction + * + * Obtain the memory address referred by the instruction's ModRM bytes and + * displacement. Also, the segment used as base is determined by either any + * segment override prefixes in insn or the default segment of the registers + * involved in the address computation. + * + * Return: linear address referenced by instruction and registers + */ +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +{ + if (insn->addr_bytes == 2) + return insn_get_addr_ref_16(insn, regs); + else + return insn_get_addr_ref_32_64(insn, regs); +}
User-Mode Instruction Prevention is a security feature present in new Intel processors that, when enabled, prevents the execution of a subset of instructions in user mode (CPL > 0). Attempting to execute any of these instructions causes a general protection exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register
The feature bit is also added to the list of disabled features to allow cleaner handling of the build-time configuration.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/include/uapi/asm/processor-flags.h | 2 ++ 3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 4e77723..0739f1e 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -286,6 +286,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 85599ad..4707445 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -16,6 +16,12 @@ # define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31)) #endif
+#ifdef CONFIG_X86_INTEL_UMIP +# define DISABLE_UMIP 0 +#else +# define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31)) +#endif + #ifdef CONFIG_X86_64 # define DISABLE_VME (1<<(X86_FEATURE_VME & 31)) # define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31)) @@ -55,7 +61,7 @@ #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 -#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_UMIP) #define DISABLED_MASK17 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index 567de50..d2c2af8 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -104,6 +104,8 @@ #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT) #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */ #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT) +#define X86_CR4_UMIP_BIT 11 /* enable UMIP support */ +#define X86_CR4_UMIP _BITUL(X86_CR4_UMIP_BIT) #define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */ #define X86_CR4_VMXE _BITUL(X86_CR4_VMXE_BIT) #define X86_CR4_SMXE_BIT 14 /* enable safer mode (TXT) */
User-Mode Instruction Prevention, a feature present in recent Intel processors, prevents a group of instructions from being executed with CPL > 0. If any of these instructions is executed, a general protection fault is issued.
Rather than relaying this fault to user space (in the form of a SIGSEGV signal), the instructions protected by UMIP can be emulated to provide dummy results. This preserves the current kernel behavior while not revealing the system resources that UMIP intends to protect (the global descriptor and interrupt descriptor tables, the segment selectors of the local descriptor table and the task state, and the machine status word).
This emulation is needed because certain applications (e.g., WineHQ) rely on this subset of instructions to function.
The instructions protected by UMIP can be split into two groups: those that return a kernel memory address (sgdt and sidt) and those that return a value (sldt, str and smsw).
For the instructions that return a kernel memory address, applications such as WineHQ rely on the result being located in the kernel memory space. The result is emulated as a hard-coded value that lies close to the top of the kernel memory map. The limit of both the GDT and the IDT is set to zero.
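For instance, a 32-bit user program could observe the emulated values with a sketch such as the following (user space, x86_32; not part of the patch):

#include <stdio.h>

struct __attribute__((packed)) table_desc {
	unsigned short limit;
	unsigned int base;
};

int main(void)
{
	struct table_desc gdt, idt;

	/* With UMIP enabled and this emulation in place, these return the
	 * dummy base addresses and a limit of zero instead of faulting. */
	asm volatile ("sgdt %0" : "=m" (gdt));
	asm volatile ("sidt %0" : "=m" (idt));
	printf("GDT base 0x%x limit 0x%x\n", gdt.base, gdt.limit);
	printf("IDT base 0x%x limit 0x%x\n", idt.base, idt.limit);
	return 0;
}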
The instructions sldt and str return a segment selector relative to the base address of the global descriptor table. Since the actual address of such table is not revealed, it makes sense to emulate the result as zero.
The instruction smsw is emulated to return the value that the register CR0 has at boot time, as set in head_32.S.
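Similarly, the emulated machine status word can be read from user space with a sketch such as (not part of the patch):

#include <stdio.h>

int main(void)
{
	unsigned int msw;

	/* With UMIP enabled, this returns the boot-time CR0 value
	 * (PE, MP, ET, NE, WP and AM set) instead of faulting. */
	asm volatile ("smsw %0" : "=r" (msw));
	printf("SMSW: 0x%x\n", msw);
	return 0;
}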
Care is taken to appropriately emulate the results when segmentation is used. That is, rather than relying on USER_DS and USER_CS, the function insn_get_addr_ref inspects the segment descriptors referenced by the selectors saved in pt_regs. This ensures that the segment base address as well as the address and operand sizes are obtained correctly even if the user space application uses a local descriptor table.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/umip.h | 15 +++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/umip.c | 262 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 278 insertions(+) create mode 100644 arch/x86/include/asm/umip.h create mode 100644 arch/x86/kernel/umip.c
diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h new file mode 100644 index 0000000..077b236 --- /dev/null +++ b/arch/x86/include/asm/umip.h @@ -0,0 +1,15 @@ +#ifndef _ASM_X86_UMIP_H +#define _ASM_X86_UMIP_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +#ifdef CONFIG_X86_INTEL_UMIP +bool fixup_umip_exception(struct pt_regs *regs); +#else +static inline bool fixup_umip_exception(struct pt_regs *regs) +{ + return false; +} +#endif /* CONFIG_X86_INTEL_UMIP */ +#endif /* _ASM_X86_UMIP_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index bdcdb3b..424b58f 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -123,6 +123,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o +obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
ifdef CONFIG_FRAME_POINTER obj-y += unwind_frame.o diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c new file mode 100644 index 0000000..b16542a --- /dev/null +++ b/arch/x86/kernel/umip.c @@ -0,0 +1,262 @@ +/* + * umip.c Emulation for instruction protected by the Intel User-Mode + * Instruction Prevention. The instructions are: + * sgdt + * sldt + * sidt + * str + * smsw + * + * Copyright (c) 2016, Intel Corporation. + * Ricardo Neri ricardo.neri@linux.intel.com + */ + +#include <linux/uaccess.h> +#include <asm/umip.h> +#include <asm/traps.h> +#include <asm/insn.h> +#include <asm/insn-eval.h> +#include <linux/ratelimit.h> + +/* + * == Base addresses of GDT and IDT + * Some applications to function rely finding the global descriptor table (GDT) + * and the interrupt descriptor table (IDT) in kernel memory. + * For x86_32, the selected values do not match any particular hole, but it + * suffices to provide a memory location within kernel memory. + * + * == CRO flags for SMSW + * Use the flags given when booting, as found in head_32.S + */ + +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \ + X86_CR0_WP | X86_CR0_AM) +#define UMIP_DUMMY_GDT_BASE 0xfffe0000 +#define UMIP_DUMMY_IDT_BASE 0xffff0000 + +/* + * Definitions for x86 page fault error code bits. Only a simple + * pagefault during a write in user context is supported. + */ +#define UMIP_PF_USER BIT(2) +#define UMIP_PF_WRITE BIT(1) + +enum umip_insn { + UMIP_SGDT = 0, /* opcode 0f 01 ModR/M reg 0 */ + UMIP_SIDT, /* opcode 0f 01 ModR/M reg 1 */ + UMIP_SLDT, /* opcode 0f 00 ModR/M reg 0 */ + UMIP_SMSW, /* opcode 0f 01 ModR/M reg 4 */ + UMIP_STR, /* opcode 0f 00 ModR/M reg 1 */ +}; + +/** + * __identify_insn - Identify a UMIP-protected instruction + * @insn: Instruction structure with opcode and ModRM byte. + * + * From the instruction opcode and the reg part of the ModRM byte, identify, + * if any, a UMIP-protected instruction. + * + * Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure. + */ +static int __identify_insn(struct insn *insn) +{ + /* By getting modrm we also get the opcode. */ + insn_get_modrm(insn); + + /* All the instructions of interest start with 0x0f. */ + if (insn->opcode.bytes[0] != 0xf) + return -EINVAL; + + if (insn->opcode.bytes[1] == 0x1) { + switch (X86_MODRM_REG(insn->modrm.value)) { + case 0: + return UMIP_SGDT; + case 1: + return UMIP_SIDT; + case 4: + return UMIP_SMSW; + default: + return -EINVAL; + } + } else if (insn->opcode.bytes[1] == 0x0) { + if (X86_MODRM_REG(insn->modrm.value) == 0) + return UMIP_SLDT; + else if (X86_MODRM_REG(insn->modrm.value) == 1) + return UMIP_STR; + else + return -EINVAL; + } else { + return -EINVAL; + } +} + +/** + * __emulate_umip_insn - Emulate UMIP instructions with dummy values + * @insn: Instruction structure with ModRM byte + * @umip_inst: Instruction to emulate + * @data: Buffer onto which the dummy values will be copied + * @data_size: Size of the emulated result + * + * Emulate an instruction protected by UMIP. The result of the emulation + * is saved in the provided buffer. The size of the results depends on both + * the instruction and type of operand (register vs memory address). Thus, + * the size of the result needs to be updated. 
+ * + * Result: 0 if success, -EINVAL on failure to emulate + */ +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst, + unsigned char *data, int *data_size) +{ + unsigned long dummy_base_addr; + unsigned short dummy_limit = 0; + unsigned int dummy_value = 0; + + switch (umip_inst) { + /* + * These two instructions return the base address and limit of the + * global and interrupt descriptor table. The base address can be + * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand + * size is 16-bit the returned value of the base address is supposed + * to be a zero-extended 24-byte number. However, it seems that a + * 32-byte number is always returned in legacy protected mode + * irrespective of the operand size. + */ + case UMIP_SGDT: + /* fall through */ + case UMIP_SIDT: + if (umip_inst == UMIP_SGDT) + dummy_base_addr = UMIP_DUMMY_GDT_BASE; + else + dummy_base_addr = UMIP_DUMMY_IDT_BASE; + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + /* SGDT and SIDT do not take register as argument. */ + return -EINVAL; + } + + memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr)); + memcpy(data, &dummy_limit, sizeof(dummy_limit)); + *data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit); + break; + case UMIP_SMSW: + /* + * Even though CR0_STATE contain 4 bytes, the number + * of bytes to be copied in the result buffer is determined + * by whether the operand is a register or a memory location. + */ + dummy_value = CR0_STATE; + /* + * These two instructions return a 16-bit value. We return + * all zeros. This is equivalent to a null descriptor for + * str and sldt. + */ + /* fall through */ + case UMIP_SLDT: + /* fall through */ + case UMIP_STR: + /* if operand is a register, it is zero-extended */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + memset(data, 0, insn->opnd_bytes); + *data_size = insn->opnd_bytes; + /* if not, only the two least significant bytes are copied */ + } else { + *data_size = 2; + } + memcpy(data, &dummy_value, sizeof(dummy_value)); + break; + default: + return -EINVAL; + } + return 0; +} + +/** + * fixup_umip_exception - Fixup #GP faults caused by UMIP + * @regs: Registers as saved when entering the #GP trap + * + * The instructions sgdt, sidt, str, smsw, sldt cause a general protection + * fault if with CPL > 0 (i.e., from user space). This function can be + * used to emulate the results of the aforementioned instructions with + * dummy values. Results are copied to user-space memory as indicated by + * the instruction pointed by EIP using the registers indicated in the + * instruction operands. This function also takes care of determining + * the address to which the results must be copied. + */ +bool fixup_umip_exception(struct pt_regs *regs) +{ + struct insn insn; + unsigned char buf[MAX_INSN_SIZE]; + /* 10 bytes is the maximum size of the result of UMIP instructions */ + unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; + unsigned long seg_base; + int not_copied, nr_copied, reg_offset, dummy_data_size; + void __user *uaddr; + unsigned long *reg_addr; + enum umip_insn umip_inst; + + /* + * Use the segment base in case user space used a different code + * segment, either in protected (e.g., from an LDT) or virtual-8086 + * modes. In most of the cases seg_base will be zero as in USER_CS. 
+ */ + seg_base = insn_get_seg_base(regs, NULL, offsetof(struct pt_regs, ip)); + not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip), + sizeof(buf)); + nr_copied = sizeof(buf) - not_copied; + /* + * The copy_from_user above could have failed if user code is protected + * by a memory protection key. Give up on emulation in such a case. + * Should we issue a page fault? + */ + if (!nr_copied) + return false; + + insn_init(&insn, buf, nr_copied, 0); + + /* + * Override the default operand and address sizes to what is specified + * in the code segment descriptor. The instruction decoder only sets + * the address size it to either 4 or 8 address bytes and does nothing + * for the operand bytes. This OK for most of the cases, but we could + * have special cases where, for instance, a 16-bit code segment + * descriptor is used. + * If there are overrides, the instruction decoder correctly updates + * these values, even for 16-bit defaults. + */ + insn.addr_bytes = insn_get_seg_default_address_bytes(regs); + insn.opnd_bytes = insn_get_seg_default_operand_bytes(regs); + + if (!insn.addr_bytes || !insn.opnd_bytes) + return false; + + /* if in 64-bit mode, do not emulate */ + if (insn.addr_bytes == 8) + return false; + + insn_get_length(&insn); + if (nr_copied < insn.length) + return false; + + umip_inst = __identify_insn(&insn); + /* Check if we found an instruction protected by UMIP */ + if (umip_inst < 0) + return false; + + if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size)) + return false; + + /* If operand is a register, write directly to it */ + if (X86_MODRM_MOD(insn.modrm.value) == 3) { + reg_offset = insn_get_reg_offset_modrm_rm(&insn, regs); + reg_addr = (unsigned long *)((unsigned long)regs + reg_offset); + memcpy(reg_addr, dummy_data, dummy_data_size); + } else { + uaddr = insn_get_addr_ref(&insn, regs); + nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); + if (nr_copied > 0) + return false; + } + + /* increase IP to let the program keep going */ + regs->ip += insn.length; + return true; +}
fixup_umip_exception will be called from do_general_protection. If the former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV. However, when emulation is successful but the emulated result cannot be copied to user space memory, it is more accurate to issue a SIGSEGV with SEGV_MAPERR and the offending address. A new function, inspired by force_sig_info_fault, is introduced to model the page fault.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c index b16542a..93bc80d 100644 --- a/arch/x86/kernel/umip.c +++ b/arch/x86/kernel/umip.c @@ -170,6 +170,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst, }
/** + * __force_sig_info_umip_fault - Force a SIGSEGV with SEGV_MAPERR + * @address: Address that caused the signal + * @regs: Register set containing the instruction pointer + * + * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is + * intended to be used to provide a segmentation fault when the result of the + * UMIP emulation could not be copied to the user space memory. + * + * Return: none + */ +static void __force_sig_info_umip_fault(void __user *address, + struct pt_regs *regs) +{ + siginfo_t info; + struct task_struct *tsk = current; + + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) { + printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%lx in %lx\n", + tsk->comm, task_pid_nr(tsk), regs->ip, + regs->sp, UMIP_PF_USER | UMIP_PF_WRITE, + regs->ip); + } + + tsk->thread.cr2 = (unsigned long)address; + tsk->thread.error_code = UMIP_PF_USER | UMIP_PF_WRITE; + tsk->thread.trap_nr = X86_TRAP_PF; + + info.si_signo = SIGSEGV; + info.si_errno = 0; + info.si_code = SEGV_MAPERR; + info.si_addr = address; + force_sig_info(SIGSEGV, &info, tsk); +} + +/** * fixup_umip_exception - Fixup #GP faults caused by UMIP * @regs: Registers as saved when entering the #GP trap * @@ -252,8 +287,14 @@ bool fixup_umip_exception(struct pt_regs *regs) } else { uaddr = insn_get_addr_ref(&insn, regs); nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); - if (nr_copied > 0) - return false; + if (nr_copied > 0) { + /* + * If copy fails, send a signal and tell caller that + * fault was fixed up + */ + __force_sig_info_umip_fault(uaddr, regs); + return true; + } }
/* increase IP to let the program keep going */
If the User-Mode Instruction Prevention CPU feature is available and enabled, a general protection fault will be issued if the instructions sgdt, sldt, sidt, str or smsw are executed from user-mode context (CPL > 0). If the fault was caused by any of the instructions protected by UMIP, fixup_umip_exception will emulate dummy results for these instructions. If emulation is successful, the result is passed to the user space program and no SIGSEGV signal is emitted.
Please note that fixup_umip_exception also caters for the case when the fault originated while running in virtual-8086 mode.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/kernel/traps.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 948443e..39614ef 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -65,6 +65,7 @@ #include <asm/trace/mpx.h> #include <asm/mpx.h> #include <asm/vm86.h> +#include <asm/umip.h>
#ifdef CONFIG_X86_64 #include <asm/x86_init.h> @@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code) RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); cond_local_irq_enable(regs);
+ if (user_mode(regs) && (fixup_umip_exception(regs) == true)) + return; + if (v8086_mode(regs)) { local_irq_enable(); handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
On Wed, Feb 22, 2017 at 10:37:04PM -0800, Ricardo Neri wrote:
@@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code) RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); cond_local_irq_enable(regs);
- if (user_mode(regs) && (fixup_umip_exception(regs) == true))
return;
I'm thinking
if (user_mode(regs) && fixup_umip_exception(regs)) return;
is actually easier to read.
On Thu, 2017-02-23 at 10:27 +0100, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:37:04PM -0800, Ricardo Neri wrote:
@@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code) RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); cond_local_irq_enable(regs);
- if (user_mode(regs) && (fixup_umip_exception(regs) == true))
return;
I'm thinking
if (user_mode(regs) && fixup_umip_exception(regs)) return;
is actually easier to read.
In a previous version Andy Lutomirski suggested that if (user_mode(regs) && (fixup_umip_exception(regs) == 0))
was easier to read :). Although at the time fixup_umip_exception returned a numeric value. Now it only returns true/false for successful/failed emulation. If with true/false not comparing to true makes it easier to read, I will make the change.
Thanks and BR, Ricardo
On Thu, Feb 23, 2017 at 2:15 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Thu, 2017-02-23 at 10:27 +0100, Peter Zijlstra wrote:
On Wed, Feb 22, 2017 at 10:37:04PM -0800, Ricardo Neri wrote:
@@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code) RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); cond_local_irq_enable(regs);
- if (user_mode(regs) && (fixup_umip_exception(regs) == true))
return;
I'm thinking
if (user_mode(regs) && fixup_umip_exception(regs)) return;
is actually easier to read.
In a previous version Andy Lutomirski suggested that if (user_mode(regs) && (fixup_umip_exception(regs) == 0))
was easier to read :). Although at the time fixup_umip_exception returned a numeric value. Now it only returns true/false for successful/failed emulation. If with true/false not comparing to true makes it easier to read, I will make the change.
I think == true is silly :)
--Andy
On Fri, 2017-02-24 at 11:11 -0800, Andy Lutomirski wrote:
In a previous version Andy Lutomirski suggested that if (user_mode(regs) && (fixup_umip_exception(regs) == 0))
was easier to read :). Although at the time fixup_umip_exception returned a numeric value. Now it only returns true/false for successful/failed emulation. If with true/false not comparing to
true
makes it easier to read, I will make the change.
I think == true is silly :)
Then I'll make the change.
Thanks and BR, Ricardo
From: hpa@zytor.com
On February 24, 2017 11:36:19 AM PST, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Fri, 2017-02-24 at 11:11 -0800, Andy Lutomirski wrote:
In a previous version Andy Lutomirski suggested that if (user_mode(regs) && (fixup_umip_exception(regs) == 0))
was easier to read :). Although at the time fixup_umip_exception returned a numeric value. Now it only returns true/false for successful/failed emulation. If with true/false not comparing to
true
makes it easier to read, I will make the change.
I think == true is silly :)
Then I'll make the change.
Thanks and BR, Ricardo
It's worse than silly, it is potentially toxic.
true is a macro which is defined as 1. Thus
foo == true
... doesn't actually mean what people *think* it does, which is roughly the same thing as
!!foo
However, if foo is not a boolean, this is *very* different; consider if foo is 2.
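A tiny example of the pitfall (not from the thread):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	int foo = 2;	/* logically "true", but not equal to 1 */

	printf("foo == true: %d\n", foo == true);	/* 0: compares 2 == 1 */
	printf("!!foo:       %d\n", !!foo);		/* 1: normalized truth value */
	return 0;
}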
User-Mode Instruction Prevention (UMIP) is enabled by setting a bit in %cr4 (CR4.UMIP, bit 11); clearing that bit disables it.
It makes sense to enable UMIP at some point while booting, before user space comes up. As with SMAP and SMEP, it is not critical to have it enabled very early during boot, because UMIP is only relevant once there is a user space to protect against. Given this similarity, it makes sense to enable UMIP along with SMAP and SMEP.
UMIP is enabled by default. It can be disabled by adding clearcpuid=514 to the kernel parameters (514 = 16 * 32 + 2, the bit number of X86_FEATURE_UMIP in word 16 of the CPUID feature words).
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/Kconfig | 10 ++++++++++ arch/x86/kernel/cpu/common.c | 16 +++++++++++++++- 2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f8fbfc5..8819fb2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1733,6 +1733,16 @@ config X86_SMAP
If unsure, say Y.
+config X86_INTEL_UMIP + def_bool y + depends on CPU_SUP_INTEL + prompt "User Mode Instruction Prevention" if EXPERT + ---help--- + The User Mode Instruction Prevention (UMIP) is a security + feature in newer Intel processors. If enabled, a general + protection fault is issued if the instructions SGDT, SLDT, + SIDT, SMSW and STR are executed in user mode. + config X86_INTEL_MPX prompt "Intel MPX (Memory Protection Extensions)" def_bool n diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index c188ae5..8668828 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -312,6 +312,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c) } }
+static __always_inline void setup_umip(struct cpuinfo_x86 *c) +{ + if (cpu_feature_enabled(X86_FEATURE_UMIP) && + cpu_has(c, X86_FEATURE_UMIP)) + cr4_set_bits(X86_CR4_UMIP); + else + /* + * Make sure UMIP is disabled in case it was enabled in a + * previous boot (e.g., via kexec). + */ + cr4_clear_bits(X86_CR4_UMIP); +} + /* * Protection Keys are not available in 32-bit mode. */ @@ -1083,9 +1096,10 @@ static void identify_cpu(struct cpuinfo_x86 *c) /* Disable the PN if appropriate */ squash_the_stupid_serial_number(c);
- /* Set up SMEP/SMAP */ + /* Set up SMEP/SMAP/UMIP */ setup_smep(c); setup_smap(c); + setup_umip(c);
/* * The vendor-specific functions might have changed features.
Certain user space programs that run in virtual-8086 mode may utilize instructions protected by the User-Mode Instruction Prevention (UMIP) security feature present in new Intel processors: SGDT, SIDT and SMSW. In such a case, a general protection fault is issued if UMIP is enabled. When such a fault happens, the kernel catches it and emulates the results of these instructions with dummy values. The purpose of this new test is to verify whether the impacted instructions can be executed without causing such a #GP. If no #GP exceptions occur, we expect to exit virtual-8086 mode via INT 0x80.
The instructions protected by UMIP are executed in representative use cases: a) the memory address of the result is given in the form of a displacement from the base of the data segment b) the memory address of the result is given in a general purpose register c) the result is stored directly in a general purpose register.
Unfortunately, it is not possible to check the results against a set of expected values because no emulation will occur in systems that do not have the UMIP feature. Instead, results are printed for verification.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- tools/testing/selftests/x86/entry_from_vm86.c | 39 ++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c index d075ea0..377b773 100644 --- a/tools/testing/selftests/x86/entry_from_vm86.c +++ b/tools/testing/selftests/x86/entry_from_vm86.c @@ -95,6 +95,22 @@ asm ( "int3\n\t" "vmcode_int80:\n\t" "int $0x80\n\t" + "umip:\n\t" + /* addressing via displacements */ + "smsw (2052)\n\t" + "sidt (2054)\n\t" + "sgdt (2060)\n\t" + /* addressing via registers */ + "mov $2066, %bx\n\t" + "smsw (%bx)\n\t" + "mov $2068, %bx\n\t" + "sidt (%bx)\n\t" + "mov $2074, %bx\n\t" + "sgdt (%bx)\n\t" + /* register operands, only for smsw */ + "smsw %ax\n\t" + "mov %ax, (2080)\n\t" + "int $0x80\n\t" ".size vmcode, . - vmcode\n\t" "end_vmcode:\n\t" ".code32\n\t" @@ -103,7 +119,7 @@ asm (
extern unsigned char vmcode[], end_vmcode[]; extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[], - vmcode_sti[], vmcode_int3[], vmcode_int80[]; + vmcode_sti[], vmcode_int3[], vmcode_int80[], umip[];
/* Returns false if the test was skipped. */ static bool do_test(struct vm86plus_struct *v86, unsigned long eip, @@ -218,6 +234,27 @@ int main(void) v86.regs.eax = (unsigned int)-1; do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
+ /* UMIP -- should exit with INTx 0x80 unless UMIP was not disabled */ + do_test(&v86, umip - vmcode, VM86_INTx, 0x80, "UMIP tests"); + printf("[INFO]\tResults of UMIP-protected instructions via displacements:\n"); + printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2052)); + printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2054), + *(unsigned long *)(addr + 2056)); + printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2060), + *(unsigned long *)(addr + 2062)); + printf("[INFO]\tResults of UMIP-protected instructions via addressing in registers:\n"); + printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2066)); + printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2068), + *(unsigned long *)(addr + 2070)); + printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n", + *(unsigned short *)(addr + 2074), + *(unsigned long *)(addr + 2076)); + printf("[INFO]\tResults of SMSW via register operands:\n"); + printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2080)); + /* Execute a null pointer */ v86.regs.cs = 0; v86.regs.ss = 0;