This is v6 of this series. The five previous submissions can be found here [1], here [2], here [3], here [4], and here [5]. This version addresses the comments received in v4 plus improvements to the handling of emulation in 64-bit builds. Please see the details in the change log.
=== What is UMIP?
User-Mode Instruction Prevention (UMIP) is a security feature present in new Intel processors. If enabled, it prevents the execution of certain instructions when the Current Privilege Level (CPL) is greater than 0. If these instructions could be executed with CPL > 0, user space applications would have access to system-wide settings such as the global and local descriptor tables, the segment selectors of the current task state, and the local descriptor table.
These are the instructions covered by UMIP:
 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR - Store Task Register
If any of these instructions is executed with CPL > 0, a general protection exception is issued when UMIP is enabled.
=== How does it impact applications?
There is a caveat, however: certain applications rely on some of these instructions to function. An example is applications that use WineHQ[6]. For instance, these applications rely on sidt returning a non-accessible memory location[7]. During the discussions, it was proposed that the fault could be relayed to user space and the emulation performed in user mode. However, this would break existing applications until, for instance, they update to a new WineHQ version; it would also require UMIP to be disabled by default. The consensus in this forum is to always enable it.
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions; it relies on WineHQ for this [9]. Furthermore, the applications for which the concern was raised run in protected mode [7].
Please note that UMIP is always enabled for both 64-bit and 32-bit Linux builds. However, emulation of the UMIP-protected instructions is not done for 64-bit processes. 64-bit user space applications will receive a SIGSEGV signal when UMIP-protected instructions cause a general protection fault.
=== How are UMIP-protected instructions emulated?
This version keeps UMIP enabled at all times and by default. If a general protection fault caused by one of the instructions protected by UMIP is detected, such a fault will be fixed up by returning dummy values as follows:
 * SGDT and SIDT return hard-coded dummy values as the base of the global descriptor and interrupt descriptor tables. These hard-coded values correspond to memory addresses that are near the end of the kernel memory map. This is also the case for virtual-8086 mode tasks. In all my experiments on x86_32, the base of the GDT and IDT was always a 4-byte address, even for 16-bit operands; thus, my emulation code does the same. In all cases, the limit of the table is set to 0.
 * STR and SLDT return 0 as the segment selector. This looks appropriate since we are providing a dummy value as the base address of the global descriptor table.
 * SMSW returns the value with which the CR0 register is programmed in head_32/64.S at boot time. That is, the following bits are enabled: CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for Extension Type (which will always be 1 in recent processors with UMIP), CR0.5 for Numeric Error, CR0.16 for Write Protect, and CR0.18 for Alignment Mask. As per the Intel 64 and IA-32 Architectures Software Developer's Manual, SMSW returns a 16-bit result for memory operands. However, when the operand is a register, the result can be up to CR0[63:0]. Since the emulation code only kicks in on x86_32, we return up to CR0[31:0].
 * The proposed emulation code handles faults that happen in both protected and virtual-8086 mode.
=== How is this series laid out?
++ Fix bugs in MPX address evaluator
The code used by Intel MPX (Memory Protection Extensions) to parse opcodes and the memory locations contained in the general purpose registers when used as operands proved very useful. I put some of this code in a separate library file that both MPX and UMIP can access, to avoid code duplication. Before creating the new library, I fixed a couple of bugs that I found in how MPX determines the address contained in the instruction and operands.
++ Provide a new x86 instruction evaluating library
With the bugs fixed, the MPX evaluating code is relocated into a new insn-eval.c library. The basic functionality of this library is extended to obtain the segment descriptor selected by either segment override prefixes or the default segment of the registers involved in the calculation of the effective address. It was also extended to obtain the default address and operand sizes as well as the segment base address, and to process 16-bit address encodings. Armed with this arsenal, it is now possible to determine the linear address onto which the emulated results shall be copied.
This code supports normal 32-bit and 64-bit (i.e., __USER32_CS and/or __USER_CS) protected mode, virtual-8086 mode, and 16-bit protected mode with a 32-bit base address.
++ Emulate UMIP instructions
A new function, fixup_umip_exception, inspects the instruction at the instruction pointer. If it is a UMIP-protected instruction, it executes the emulation code. This uses all the address-computing code of the previous section.
++ Add self-tests
Lastly, self-tests are added to entry_from_vm86.c to exercise the most typical use cases of UMIP-protected instructions in virtual-8086 mode.
++ Extensive tests
Extensive tests were performed to cover all the combinations of ModRM, SiB and displacements for 16-bit and 32-bit encodings for the ss, ds, es, fs and gs segments. Tests also include a 64-bit program that uses segmentation via fs and gs. For this purpose, I temporarily, and not as part of this patchset, enabled UMIP support for 64-bit processes with the intention of testing the computation of linear addresses in 64-bit mode, including the extra R8-R15 registers. Extensive tests are also implemented for virtual-8086 tasks. The code of these tests can be found here [10] and here [11].
[1]. https://lwn.net/Articles/705877/
[2]. https://lkml.org/lkml/2016/12/23/265
[3]. https://lkml.org/lkml/2017/1/25/622
[4]. https://lkml.org/lkml/2017/2/23/40
[5]. https://lkml.org/lkml/2017/3/3/678
[6]. https://www.winehq.org/
[7]. https://www.winehq.org/pipermail/wine-devel/2016-November/115320.html
[8]. http://www.dosemu.org/
[9]. http://marc.info/?l=linux-kernel&m=147876798717927&w=2
[10]. https://github.com/01org/luv-yocto/tree/rneri/umip/meta-luv/recipes-core/umi...
[11]. https://github.com/01org/luv-yocto/commit/a72a7fe7d68693c0f4100ad86de6ecabde...
Thanks and BR, Ricardo
Changes since V5:
 * Relocate the page fault error code enumerations to traps.h
Changes since V4:
 * Audited patches to use braces in all branches of conditional statements, except those in which the conditional action only takes one line.
 * Implemented support in 64-bit builds for both 32-bit and 64-bit tasks in the instruction evaluating library.
 * Split the segment selector function in the instruction evaluating library into two functions: one to resolve the segment type by instruction override or default, and a separate function to actually read the segment selector.
 * Fixed a bug when evaluating 32-bit effective addresses with 64-bit kernels.
 * Split patches further for easier review.
 * Use signed variables for computation of effective address.
 * Fixed issue with a spurious static modifier in function insn_get_addr_ref found by the kbuild test bot.
 * Removed comparison between true and fixup_umip_exception.
 * Reworked check logic when identifying erroneous vs invalid values of the SiB base and index.
Changes since V3:
 * Limited emulation to 32-bit and 16-bit modes. For 64-bit mode, a general protection fault is still issued when UMIP-protected instructions are executed with CPL > 0.
 * Expanded the instruction-evaluating code to obtain segment descriptors along with their attributes, such as base address and default address and operand sizes. Also, support for 16-bit encodings in protected mode was implemented.
 * When getting a segment descriptor, this includes support to obtain those of a local descriptor table.
 * Now the instruction-evaluating code returns -EDOM when the value of registers should not be used in calculating the effective address. The value -EINVAL is left for errors.
 * Incorporate the value of the segment base address in the computation of linear addresses.
 * Renamed the new instruction evaluation library from insn-kernel.c to insn-eval.c
 * Exported functions insn_get_reg_offset_* to obtain the register offset by ModRM r/m, SiB base and SiB index.
 * Improved documentation of functions.
 * Split patches further for easier review.
Changes since V2:
 * Added new utility functions to decode the memory addresses contained in registers when the 16-bit addressing encodings are used. This includes code to obtain and compute memory addresses using segment selectors for real-mode address translation.
 * Added support to emulate UMIP-protected instructions for virtual-8086 tasks.
 * Added self-tests for virtual-8086 mode that contain representative use cases: address represented as a displacement, address in registers, and registers as operands.
 * Instead of maintaining a static variable for the dummy base addresses of the IDT and GDT, a hard-coded value is used.
 * The emulated SMSW instructions now return the value with which the CR0 register is programmed in head_32/64.S. This is: PE | MP | ET | NE | WP | AM. For x86_64, PG is also enabled.
 * The new file arch/x86/lib/insn-utils.c is now renamed to arch/x86/lib/insn-kernel.c. It also has its own header. This helps keep the kernel and objtool instruction decoders in sync. Also, the new insn-kernel.c contains utility functions that are only relevant in a kernel context.
 * Removed printed warnings for errors that occur when decoding instructions with invalid operands.
 * Added more comments on fixes in the instruction-decoding MPX functions.
 * Now user_64bit_mode(regs) is used instead of test_thread_flag(TIF_IA32) to determine if the task is 32-bit or 64-bit.
 * Found and fixed a bug in the insn-decoder in which X86_MODRM_RM was incorrectly used to obtain the mod part of the ModRM byte.
 * Added more explanatory comments in emulation and instruction decoding code. This includes a comment that copy_from_user could fail if a memory protection key is in place.
 * Tested code with CONFIG_X86_DECODER_SELFTEST=y and everything passes now.
 * Prefixed get_reg_offset_rm with insn_ as this function is exposed via a header file. For clarity, this function was added in a separate patch.
Changes since V1:
 * Virtual-8086 mode tasks are not treated in a special manner. All code for this purpose was removed.
 * Instead of attempting to disable UMIP during a context switch or when entering virtual-8086 mode, UMIP remains enabled all the time. General protection faults that occur are fixed up by returning dummy values as detailed above.
 * Removed the umip= kernel parameter in favor of using clearcpuid=514 to disable UMIP.
 * Removed selftests designed to detect the absence of SIGSEGV signals when running in virtual-8086 mode.
 * Reused code from MPX to decode instruction operands. For this purpose code was put in a common location.
 * Fixed two bugs in the MPX code that decodes operands.
Ricardo Neri (21):
  x86/mpx: Use signed variables to compute effective addresses
  x86/mpx: Do not use SIB index if index points to R/ESP
  x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel
  x86/insn-eval: Add utility functions to get register offsets
  x86/insn-eval: Add utility functions to get segment selector
  x86/insn-eval: Add utility function to get segment descriptor
  x86/insn-eval: Add utility function to get segment descriptor base address
  x86/insn-eval: Add functions to get default operand and address sizes
  x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  insn/eval: Incorporate segment base in address computation
  x86/insn: Support both signed 32-bit and 64-bit effective addresses
  x86/insn-eval: Add support to resolve 16-bit addressing encodings
  x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings
  x86/mm: Relocate page fault error codes to traps.h
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
  x86: Add emulation code for UMIP instructions
  x86/umip: Force a page fault when unable to copy emulated result to user
  x86/traps: Fixup general protection faults caused by UMIP
  x86: Enable User-Mode Instruction Prevention
  selftests/x86: Add tests for User-Mode Instruction Prevention
 arch/x86/Kconfig                              |  10 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/insn-eval.h              |  23 +
 arch/x86/include/asm/traps.h                  |  18 +
 arch/x86/include/asm/umip.h                   |  15 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  16 +-
 arch/x86/kernel/traps.c                       |   4 +
 arch/x86/kernel/umip.c                        | 298 +++++++++
 arch/x86/lib/Makefile                         |   2 +-
 arch/x86/lib/insn-eval.c                      | 832 ++++++++++++++++++++++++++
 arch/x86/mm/fault.c                           |  88 ++-
 arch/x86/mm/mpx.c                             | 120 +---
 tools/testing/selftests/x86/entry_from_vm86.c |  39 +-
 16 files changed, 1301 insertions(+), 176 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c
 create mode 100644 arch/x86/lib/insn-eval.c
Even though memory addresses are unsigned. The operands used to compute the effective address do have a sign. This is true for the r/m part of the ModRM byte, the base and index parts of the SiB byte as well as the displacement. Thus, signed variables shall be used when computing the effective address from these operands. Once the signed effective address has been computed, it is casted to an unsigned long to determine the linear address.
Variables are renamed to better reflect the type of address being computed.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 5126dfd..ff112e3 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -138,7 +138,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
  */
 static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-	unsigned long addr, base, indx;
+	unsigned long linear_addr;
+	long eff_addr, base, indx;
 	int addr_offset, base_offset, indx_offset;
 	insn_byte_t sib;
@@ -150,7 +151,7 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 		if (addr_offset < 0)
 			goto out_err;
-		addr = regs_get_register(regs, addr_offset);
+		eff_addr = regs_get_register(regs, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
@@ -163,16 +164,18 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			base = regs_get_register(regs, base_offset);
 			indx = regs_get_register(regs, indx_offset);
-			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 			if (addr_offset < 0)
 				goto out_err;
-			addr = regs_get_register(regs, addr_offset);
+			eff_addr = regs_get_register(regs, addr_offset);
 		}
-		addr += insn->displacement.value;
+		eff_addr += insn->displacement.value;
 	}
-	return (void __user *)addr;
+	linear_addr = (unsigned long)eff_addr;
+
+	return (void __user *)linear_addr;
 out_err:
 	return (void __user *)-1;
 }
On Tue, Mar 07, 2017 at 04:32:34PM -0800, Ricardo Neri wrote:
Even though memory addresses are unsigned. The operands used to compute the
... unsigned, the operands ...
effective address do have a sign. This is true for the r/m part of the ModRM byte, the base and index parts of the SiB byte as well as the displacement. Thus, signed variables shall be used when computing the effective address from these operands. Once the signed effective address has been computed, it is casted to an unsigned long to determine the linear address.
Variables are renamed to better reflect the type of address being computed.
On Tue, 2017-04-11 at 23:56 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:34PM -0800, Ricardo Neri wrote:
Even though memory addresses are unsigned. The operands used to compute the
... unsigned, the operands ...
Oops! I will correct.
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when memory addressing is used (i.e., the mod part of ModR/M is not 3), a SIB byte is used, and the index of the SIB byte points to R/ESP (i.e., index = 4), the index should not be used in the computation of the memory address.
In these cases the address is simply the value present in the register pointed by the base part of the SIB byte plus the displacement byte.
An example of such instruction could be
insn -0x80(%rsp)
This is represented as:
[opcode] 4c 23 80
ModR/M=0x4c: mod: 0x1, reg: 0x1, r/m: 0x4 (R/ESP)
SIB=0x23: sc: 0, index: 0x100 (R/ESP), base: 0x11 (R/EBX)
Displacement -0x80
The correct address is (base) + displacement; no index is used.
We can achieve the desired effect of not using the index by making get_reg_offset return -EDOM in this particular case. This value indicates to callers that they should not use the index to calculate the address. EINVAL continues to indicate an error when decoding the SIB byte.
Care is taken to allow R12 to be used as index, which is a valid scenario.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index ff112e3..d9e92d6 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		regno = X86_SIB_INDEX(insn->sib.value);
 		if (X86_REX_X(insn->rex_prefix.value))
 			regno += 8;
+		/*
+		 * If mod !=3, register R/ESP (regno=4) is not used as index in
+		 * the address computation. Check is done after looking at REX.X
+		 * This is because R12 (regno=12) can be used as an index.
+		 */
+		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
+			return -EDOM;
 		break;

 	case REG_TYPE_BASE:
@@ -159,11 +166,19 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			goto out_err;

 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (indx_offset < 0)
+			/*
+			 * A negative offset generally means a error, except
+			 * -EDOM, which means that the contents of the register
+			 * should not be used as index.
+			 */
+			if (unlikely(indx_offset == -EDOM))
+				indx = 0;
+			else if (unlikely(indx_offset < 0))
 				goto out_err;
+			else
+				indx = regs_get_register(regs, indx_offset);

 			base = regs_get_register(regs, base_offset);
-			indx = regs_get_register(regs, indx_offset);
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
On Tue, Mar 07, 2017 at 04:32:35PM -0800, Ricardo Neri wrote:
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when memory addressing is used (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of the SIB byte points to the R/ESP (i.e., index = 4), the index should not be used in the computation of the memory address.
In these cases the address is simply the value present in the register pointed by the base part of the SIB byte plus the displacement byte.
An example of such instruction could be
insn -0x80(%rsp)
This is represented as:
[opcode] 4c 23 80
ModR/M=0x4c: mod: 0x1, reg: 0x1, r/m: 0x4 (R/ESP)
SIB=0x23: sc: 0, index: 0x100 (R/ESP), base: 0x11 (R/EBX)
Displacement -0x80
The correct address is (base) + displacement; no index is used.
We can achieve the desired effect of not using the index by making get_reg_offset return -EDOM in this particular case. This value indicates callers that they should not use the index to calculate the address. EINVAL continues to indicate that an error when decoding the SIB byte.
Care is taken to allow R12 to be used as index, which is a valid scenario.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
 arch/x86/mm/mpx.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index ff112e3..d9e92d6 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		regno = X86_SIB_INDEX(insn->sib.value);
 		if (X86_REX_X(insn->rex_prefix.value))
 			regno += 8;
/*
* If mod !=3, register R/ESP (regno=4) is not used as index in
* the address computation. Check is done after looking at REX.X
* This is because R12 (regno=12) can be used as an index.
*/
if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
return -EDOM;
Hmm, ok, so this is a bit confusing, to me at least. Maybe you're saying the same things but here's how I see it:
1. When ModRM.mod != 11b and ModRM.rm == 100b, all that does mean is that you have a SIB byte following. I.e., you have indexed register-indirect addressing.
Now, you still need to decode the SIB byte and it goes this way:
SIB.index == 100b means that the index register specification is null, i.e., the scale*index portion of that indexed register-indirect addressing is null, i.e., you have an offset following the SIB byte. Now, depending on ModRM.mod, that offset is:
ModRM.mod == 01b -> 1 byte offset
ModRM.mod == 10b -> 4 bytes offset
That's why for an instruction like this one (let's use your example) you have:
8b 4c 23 80 mov -0x80(%rbx,%riz,1),%ecx
That's basically a binutils hack to state that the SIB index register is null.
Another SIB index register works, of course:
8b 4c 03 80 mov -0x80(%rbx,%rax,1),%ecx
Ok, so far so good.
2. Now, the %r12 thing is part of the REX implications to those encodings: That's the REX.X bit which adds a fourth bit to the encoding of the SIB index register, i.e., if you specify a register with SIB.index, you want to be able to specify all 16 regs, thus the 4th bit. That's why it says that the SIB byte is required for %r12-based addressing.
I.e., you can still have a SIB.index == 100b addressing with an index register which is not null but that is only because SIB.index is now {REX.X=1b, 100b}, i.e.:
Prefixes: REX: 0x43 { 4 [w]: 0 [r]: 0 [x]: 1 [b]: 1 }
Opcode: 0x8b
ModRM: 0x4c [mod:1b][.R:0b,reg:1b][.B:1b,r/m:1100b]
       register-indirect mode, 1-byte offset in displ. field
SIB: 0x63 [.B:1b,base:1011b][.X:1b,idx:1100b][scale: 1]
MOV Gv,Ev; MOV reg{16,32,64} reg/mem{16,32,64}
 0: 43 8b 4c 63 80	mov -0x80(%r11,%r12,2),%ecx
So, I'm not saying your version is necessarily wrong - I'm just saying that it could explain the situation a bit more verbose.
Btw, I'd flip the if-test above:
if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
to make it just like the order the conditions are specified in the manuals.
Thanks.
Hi Boris,
I am sorry I missed your feedback earlier. Thanks for commenting!
On Tue, 2017-04-11 at 13:31 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:35PM -0800, Ricardo Neri wrote:
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when memory addressing is used (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of the SIB byte points to the R/ESP (i.e., index = 4), the index should not be used in the computation of the memory address.
In these cases the address is simply the value present in the register pointed by the base part of the SIB byte plus the displacement byte.
An example of such instruction could be
insn -0x80(%rsp)
This is represented as:
[opcode] 4c 23 80
ModR/M=0x4c: mod: 0x1, reg: 0x1, r/m: 0x4 (R/ESP)
SIB=0x23: sc: 0, index: 0x100 (R/ESP), base: 0x11 (R/EBX)
Displacement -0x80
The correct address is (base) + displacement; no index is used.
We can achieve the desired effect of not using the index by making get_reg_offset return -EDOM in this particular case. This value indicates callers that they should not use the index to calculate the address. EINVAL continues to indicate that an error when decoding the SIB byte.
Care is taken to allow R12 to be used as index, which is a valid scenario.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
 arch/x86/mm/mpx.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index ff112e3..d9e92d6 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		regno = X86_SIB_INDEX(insn->sib.value);
 		if (X86_REX_X(insn->rex_prefix.value))
 			regno += 8;
/*
* If mod !=3, register R/ESP (regno=4) is not used as index in
* the address computation. Check is done after looking at REX.X
* This is because R12 (regno=12) can be used as an index.
*/
if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
return -EDOM;
Hmm, ok, so this is a bit confusing, to me at least. Maybe you're saying the same things but here's how I see it:
- When ModRM.mod != 11b and ModRM.rm == 100b, all that does mean
is that you have a SIB byte following. I.e., you have indexed register-indirect addressing.
Yes, callers of this function already know that there is a SIB byte because they saw ModRM.mod != 11b and ModRM.rm == 100b, and struct insn.sib.nbytes is non-zero.
Now, you still need to decode the SIB byte and it goes this way:
SIB.index == 100b means that the index register specification is null, i.e., the scale*index portion of that indexed register-indirect addressing is null, i.e., you have an offset following the SIB byte. Now, depending on ModRM.mod, that offset is:
Yes, for this reason, if ModRM.mod != 11b and an index of 100b is found, the function returns -EDOM to indicate to callers that the index should not be used. We need to return -EDOM because this function returns an offset from the base of struct pt_regs for successful cases. A negative value indicates that the offset should not be used.
Perhaps a better wording is to say, as you propose, that the scale*index portion of that indexed register-indirect addressing is null. I will take your wording!
ModRM.mod == 01b -> 1 byte offset ModRM.mod == 10b -> 4 bytes offset
Callers will know the size of the offset based on struct insn.displacement.value.
That's why for an instruction like this one (let's use your example) you have:
8b 4c 23 80 mov -0x80(%rbx,%riz,1),%ecx
That's basically a binutils hack to state that the SIB index register is null.
Another SIB index register works, of course:
8b 4c 03 80 mov -0x80(%rbx,%rax,1),%ecx
Ok, so far so good.
- Now, the %r12 thing is part of the REX implications to those
encodings: That's the REX.X bit which adds a fourth bit to the encoding of the SIB base register, i.e., if you specify a register with SIB.index, you want to be able to specify all 16 regs, thus the 4th bit. That's why it says that the SIB byte is required for %r12-based addressing.
I.e., you can still have a SIB.index == 100b addressing with an index register which is not null but that is only because SIB.index is now {REX.X=1b, 100b}, i.e.:
Prefixes: REX: 0x43 { 4 [w]: 0 [r]: 0 [x]: 1 [b]: 1 } Opcode: 0x8b ModRM: 0x4c [mod:1b][.R:0b,reg:1b][.B:1b,r/m:1100b] register-indirect mode, 1-byte offset in displ. field SIB: 0x63 [.B:1b,base:1011b][.X:1b,idx:1100b][scale: 1]
MOV Gv,Ev; MOV reg{16,32,64} reg/mem{16,32,64} 0: 43 8b 4c 63 80 mov -0x80(%r11,%r12,2),%ecx
Correct, that is why we check the value of regno (the value of SIB.index) after correcting its value in case we find REX.X set. In this way we can use %r12.
So, I'm not saying your version is necessarily wrong - I'm just saying that it could explain the situation a bit more verbose.
Sure. I will be more verbose in my commit message.
Btw, I'd flip the if-test above:
if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
to make it just like the order the conditions are specified in the manuals.
Will do.
Thanks and BR, Ricardo
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when a SIB byte is used, the base of the SIB byte points to R/EBP (i.e., base = 5), and the mod part of the ModRM byte is zero, the value of such register will not be used as part of the address computation. To signal this, a -EDOM error is returned to indicate to callers that they should ignore the value.
Also, for this particular case, a 32-bit displacement should follow the SIB byte if the mod part of ModRM is equal to zero. The instruction decoder ensures that this is the case.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index d9e92d6..ef7eb67 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	case REG_TYPE_BASE:
 		regno = X86_SIB_BASE(insn->sib.value);
+		/*
+		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
+		 * base part of the SIB byte, the value of such register should
+		 * not be used in the address computation. Also, a 32-bit
+		 * displacement is expected in this case; the instruction
+		 * decoder takes care of it. This is true for both R13 and
+		 * R/EBP as REX.B will not be decoded.
+		 */
+		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+			return -EDOM;
+
 		if (X86_REX_B(insn->rex_prefix.value))
 			regno += 8;
 		break;
@@ -161,16 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		eff_addr = regs_get_register(regs, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (base_offset < 0)
+			if (unlikely(base_offset == -EDOM))
+				base = 0;
+			else if (unlikely(base_offset < 0))
 				goto out_err;
+			else
+				base = regs_get_register(regs, base_offset);

 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			/*
-			 * A negative offset generally means a error, except
-			 * -EDOM, which means that the contents of the register
-			 * should not be used as index.
-			 */
 			if (unlikely(indx_offset == -EDOM))
 				indx = 0;
 			else if (unlikely(indx_offset < 0))
@@ -178,7 +194,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			else
 				indx = regs_get_register(regs, indx_offset);

-			base = regs_get_register(regs, base_offset);
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
On Tue, Mar 07, 2017 at 04:32:36PM -0800, Ricardo Neri wrote:
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when a SIB byte is used and the base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part of the ModRM byte is zero, the value of such register will not be used as part of the address computation. To signal this, a -EDOM error is returned to indicate to callers that they should ignore the value.
Also, for this particular case, a displacement of 32-bits should follow the SIB byte if the mod part of ModRM is equal to zero. The instruction decoder ensures that this is the case.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Nathan Howard liverlint@gmail.com Cc: Adan Hawthorn adanhawthorn@gmail.com Cc: Joe Perches joe@perches.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/mm/mpx.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index d9e92d6..ef7eb67 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
case REG_TYPE_BASE: regno = X86_SIB_BASE(insn->sib.value);
/*
* If mod is 0 and register R/EBP (regno=5) is indicated in the
* base part of the SIB byte,
you can simply say here: "if SIB.base == 5, the base of the register-indirect addressing is 0."
the value of such register should
* not be used in the address computation. Also, a 32-bit
Not "Also" but "In this case, a 32-bit displacement..."
* displacement is expected in this case; the instruction
* decoder takes care of it. This is true for both R13 and
* R/EBP as REX.B will not be decoded.
You don't need that sentence as the only thing that matters is ModRM.mod being 0.
*/
if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
The 0 test we normally do with the ! (also flip parts of if-condition):
if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
return -EDOM;
- if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break;
@@ -161,16 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) eff_addr = regs_get_register(regs, addr_offset); } else { if (insn->sib.nbytes) {
/*
* Negative values in the base and index offset means
* an error when decoding the SIB byte. Except -EDOM,
* which means that the registers should not be used
* in the address computation.
*/ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
if (base_offset < 0)
if (unlikely(base_offset == -EDOM))
base = 0;
else if (unlikely(base_offset < 0))
Bah, unlikely's in something which is not really a hot path. They only encumber readability, no need for them.
On Wed, 2017-04-12 at 00:08 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:36PM -0800, Ricardo Neri wrote:
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when a SIB byte is used and the base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part of the ModRM byte is zero, the value of such register will not be used as part of the address computation. To signal this, a -EDOM error is returned to indicate to callers that they should ignore the value.
Also, for this particular case, a displacement of 32-bits should follow the SIB byte if the mod part of ModRM is equal to zero. The instruction decoder ensures that this is the case.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Nathan Howard liverlint@gmail.com Cc: Adan Hawthorn adanhawthorn@gmail.com Cc: Joe Perches joe@perches.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/mm/mpx.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index d9e92d6..ef7eb67 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
case REG_TYPE_BASE: regno = X86_SIB_BASE(insn->sib.value);
/*
* If mod is 0 and register R/EBP (regno=5) is indicated in the
* base part of the SIB byte,
you can simply say here: "if SIB.base == 5, the base of the register-indirect addressing is 0."
This is better wording. I will change it.
the value of such register should
* not be used in the address computation. Also, a 32-bit
Not "Also" but "In this case, a 32-bit displacement..."
Will change.
* displacement is expected in this case; the instruction
* decoder takes care of it. This is true for both R13 and
* R/EBP as REX.B will not be decoded.
You don't need that sentence as the only thing that matters is ModRM.mod being 0.
For the specific case of ModRM.mod being 0, I feel I need to clarify that REX.B is not decoded and if SIB.base is %r13 the base is also 0. This comment adds clarity because REX.X is decoded when determining SIB.index.
*/
if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
The 0 test we normally do with the ! (also flip parts of if-condition):
if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
Will change it.
return -EDOM;
- if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break;
@@ -161,16 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) eff_addr = regs_get_register(regs, addr_offset); } else { if (insn->sib.nbytes) {
/*
* Negative values in the base and index offset means
* an error when decoding the SIB byte. Except -EDOM,
* which means that the registers should not be used
* in the address computation.
*/ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
if (base_offset < 0)
if (unlikely(base_offset == -EDOM))
base = 0;
else if (unlikely(base_offset < 0))
Bah, unlikely's in something which is not really a hot path. They only encumber readability, no need for them.
I will remove them.
Thanks and BR, Ricardo
On Tue, Apr 25, 2017 at 07:04:20PM -0700, Ricardo Neri wrote:
For the specific case of ModRM.mod being 0, I feel I need to clarify that REX.B is not decoded and if SIB.base is %r13 the base is also 0.
Well, that all doesn't matter. The rule is this:
ModRM.mod == 00b and ModRM.r/m == 101b -> effective address: disp32
See Table 2-2. "32-Bit Addressing Forms with the ModR/M Byte" in the SDM.
So the base register is not used. How that base register is specified then doesn't matter (undecoded REX bits or not).
This comment adds clarity because REX.X is decoded when determining SIB.index.
Well, that's a different thing. The REX bits participating in the SIB fields don't matter about this particular case. We only want to say that we're returning a disp32 without a base register and the comment should keep it simple without extraneous information.
I know, you want to mention what Table 2-5. "Special Cases of REX Encodings" says but we should avoid unnecessary content in the comment. People who want details can stare at the manuals - the comment should only document what that particular case is.
Btw, you could write it even better:
if (!X86_MODRM_MOD(insn->modrm.value) && X86_MODRM_RM(insn->modrm.value) == 5)
and then it is basically a 1:1 copy of the rule from Table 2-2.
On Wed, 2017-04-26 at 10:05 +0200, Borislav Petkov wrote:
On Tue, Apr 25, 2017 at 07:04:20PM -0700, Ricardo Neri wrote:
For the specific case of ModRM.mod being 0, I feel I need to clarify that REX.B is not decoded and if SIB.base is %r13 the base is also 0.
Well, that all doesn't matter. The rule is this:
ModRM.mod == 00b and ModRM.r/m == 101b -> effective address: disp32
See Table 2-2. "32-Bit Addressing Forms with the ModR/M Byte" in the SDM.
You are right. This summarizes the rule. Then I will shorten the comment.
So the base register is not used. How that base register is specified then doesn't matter (undecoded REX bits or not).
This comment adds clarity because REX.X is decoded when determining SIB.index.
Well, that's a different thing. The REX bits participating in the SIB fields don't matter about this particular case. We only want to say that we're returning a disp32 without a base register and the comment should keep it simple without extraneous information.
I know, you want to mention what Table 2-5. "Special Cases of REX Encodings" says but we should avoid unnecessary content in the comment. People who want details can stare at the manuals - the comment should only document what that particular case is.
Btw, you could write it even better:
if (!X86_MODRM_MOD(insn->modrm.value) && X86_MODRM_RM(insn->modrm.value) == 5)
and then it is basically a 1:1 copy of the rule from Table 2-2.
It is!
Thanks and BR, Ricardo
Other kernel submodules can benefit from using the utility functions defined in mpx.c to obtain the addresses and values of operands contained in the general purpose registers. An instance of this is the emulation code used for instructions protected by the Intel User-Mode Instruction Prevention feature.
Thus, these functions are relocated to a new insn-eval.c file. The reason not to relocate these utilities into insn.c is that the latter solely analyzes instructions given by a struct insn without any knowledge of the meaning of the values of instruction operands. The new insn-eval.c aims to be used to resolve effective and userspace linear addresses based on the contents of the instruction operands as well as the contents of the pt_regs structure.
These utilities come with a separate header. This is to avoid taking insn.c out of sync with the instruction decoders under tools/obj and tools/perf. This also avoids adding cumbersome #ifdef's for the #include'd files required to decode instructions in a kernel context.
Functions are simply relocated. There are no functional or indentation changes.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-eval.h | 16 ++++ arch/x86/lib/Makefile | 2 +- arch/x86/lib/insn-eval.c | 160 +++++++++++++++++++++++++++++++++++++++ arch/x86/mm/mpx.c | 153 +------------------------------------ 4 files changed, 179 insertions(+), 152 deletions(-) create mode 100644 arch/x86/include/asm/insn-eval.h create mode 100644 arch/x86/lib/insn-eval.c
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h new file mode 100644 index 0000000..5cab1b1 --- /dev/null +++ b/arch/x86/include/asm/insn-eval.h @@ -0,0 +1,16 @@ +#ifndef _ASM_X86_INSN_EVAL_H +#define _ASM_X86_INSN_EVAL_H +/* + * A collection of utility functions for x86 instruction analysis to be + * used in a kernel context. Useful when, for instance, making sense + * of the registers indicated by operands. + */ + +#include <linux/compiler.h> +#include <linux/bug.h> +#include <linux/err.h> +#include <asm/ptrace.h> + +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); + +#endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile index 34a7413..675d7b0 100644 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o lib-y += memcpy_$(BITS).o lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o -lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o +lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c new file mode 100644 index 0000000..23cf010 --- /dev/null +++ b/arch/x86/lib/insn-eval.c @@ -0,0 +1,160 @@ +/* + * Utility functions for x86 operand and address decoding + * + * Copyright (C) Intel Corporation 2017 + */ +#include <linux/kernel.h> +#include <linux/string.h> +#include <asm/inat.h> +#include <asm/insn.h> +#include <asm/insn-eval.h> + +enum reg_type { + REG_TYPE_RM = 0, + REG_TYPE_INDEX, + REG_TYPE_BASE, +}; + +static int get_reg_offset(struct insn *insn, struct pt_regs *regs, + enum reg_type type) +{ + int regno = 0; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + int nr_registers = ARRAY_SIZE(regoff); + /* + * Don't possibly decode a 32-bit instructions as + * reading a 64-bit-only register. + */ + if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64) + nr_registers -= 8; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(insn->modrm.value); + if (X86_REX_B(insn->rex_prefix.value)) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(insn->sib.value); + if (X86_REX_X(insn->rex_prefix.value)) + regno += 8; + /* + * If mod !=3, register R/ESP (regno=4) is not used as index in + * the address computation. Check is done after looking at REX.X + * This is because R12 (regno=12) can be used as an index. 
+ */ + if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3) + return -EDOM; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(insn->sib.value); + /* + * If mod is 0 and register R/EBP (regno=5) is indicated in the + * base part of the SIB byte, the value of such register should + * not be used in the address computation. Also, a 32-bit + * displacement is expected in this case; the instruction + * decoder takes care of it. This is true for both R13 and + * R/EBP as REX.B will not be decoded. + */ + if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) + return -EDOM; + + if (X86_REX_B(insn->rex_prefix.value)) + regno += 8; + break; + + default: + pr_err("invalid register type"); + BUG(); + break; + } + + if (regno >= nr_registers) { + WARN_ONCE(1, "decoded an instruction with an invalid register"); + return -EINVAL; + } + return regoff[regno]; +} + +/* + * return the address being referenced be instruction + * for rm=3 returning the content of the rm reg + * for rm!=3 calculates the address using SIB and Disp + */ +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +{ + unsigned long linear_addr; + long eff_addr, base, indx; + int addr_offset, base_offset, indx_offset; + insn_byte_t sib; + + insn_get_modrm(insn); + insn_get_sib(insn); + sib = insn->sib.value; + + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset < 0) + goto out_err; + eff_addr = regs_get_register(regs, addr_offset); + } else { + if (insn->sib.nbytes) { + /* + * Negative values in the base and index offset means + * an error when decoding the SIB byte. Except -EDOM, + * which means that the registers should not be used + * in the address computation. 
+ */ + base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); + if (unlikely(base_offset == -EDOM)) + base = 0; + else if (unlikely(base_offset < 0)) + goto out_err; + else + base = regs_get_register(regs, base_offset); + + indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); + if (unlikely(indx_offset == -EDOM)) + indx = 0; + else if (unlikely(indx_offset < 0)) + goto out_err; + else + indx = regs_get_register(regs, indx_offset); + + eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); + } else { + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset < 0) + goto out_err; + eff_addr = regs_get_register(regs, addr_offset); + } + eff_addr += insn->displacement.value; + } + linear_addr = (unsigned long)eff_addr; + + return (void __user *)linear_addr; +out_err: + return (void __user *)-1; +} diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index ef7eb67..4c3efd6 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -12,6 +12,7 @@ #include <linux/sched/sysctl.h>
#include <asm/insn.h> +#include <asm/insn-eval.h> #include <asm/mman.h> #include <asm/mmu_context.h> #include <asm/mpx.h> @@ -60,156 +61,6 @@ static unsigned long mpx_mmap(unsigned long len) return addr; }
-enum reg_type { - REG_TYPE_RM = 0, - REG_TYPE_INDEX, - REG_TYPE_BASE, -}; - -static int get_reg_offset(struct insn *insn, struct pt_regs *regs, - enum reg_type type) -{ - int regno = 0; - - static const int regoff[] = { - offsetof(struct pt_regs, ax), - offsetof(struct pt_regs, cx), - offsetof(struct pt_regs, dx), - offsetof(struct pt_regs, bx), - offsetof(struct pt_regs, sp), - offsetof(struct pt_regs, bp), - offsetof(struct pt_regs, si), - offsetof(struct pt_regs, di), -#ifdef CONFIG_X86_64 - offsetof(struct pt_regs, r8), - offsetof(struct pt_regs, r9), - offsetof(struct pt_regs, r10), - offsetof(struct pt_regs, r11), - offsetof(struct pt_regs, r12), - offsetof(struct pt_regs, r13), - offsetof(struct pt_regs, r14), - offsetof(struct pt_regs, r15), -#endif - }; - int nr_registers = ARRAY_SIZE(regoff); - /* - * Don't possibly decode a 32-bit instructions as - * reading a 64-bit-only register. - */ - if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64) - nr_registers -= 8; - - switch (type) { - case REG_TYPE_RM: - regno = X86_MODRM_RM(insn->modrm.value); - if (X86_REX_B(insn->rex_prefix.value)) - regno += 8; - break; - - case REG_TYPE_INDEX: - regno = X86_SIB_INDEX(insn->sib.value); - if (X86_REX_X(insn->rex_prefix.value)) - regno += 8; - /* - * If mod !=3, register R/ESP (regno=4) is not used as index in - * the address computation. Check is done after looking at REX.X - * This is because R12 (regno=12) can be used as an index. - */ - if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3) - return -EDOM; - break; - - case REG_TYPE_BASE: - regno = X86_SIB_BASE(insn->sib.value); - /* - * If mod is 0 and register R/EBP (regno=5) is indicated in the - * base part of the SIB byte, the value of such register should - * not be used in the address computation. Also, a 32-bit - * displacement is expected in this case; the instruction - * decoder takes care of it. This is true for both R13 and - * R/EBP as REX.B will not be decoded. 
- */ - if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0) - return -EDOM; - - if (X86_REX_B(insn->rex_prefix.value)) - regno += 8; - break; - - default: - pr_err("invalid register type"); - BUG(); - break; - } - - if (regno >= nr_registers) { - WARN_ONCE(1, "decoded an instruction with an invalid register"); - return -EINVAL; - } - return regoff[regno]; -} - -/* - * return the address being referenced be instruction - * for rm=3 returning the content of the rm reg - * for rm!=3 calculates the address using SIB and Disp - */ -static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) -{ - unsigned long linear_addr; - long eff_addr, base, indx; - int addr_offset, base_offset, indx_offset; - insn_byte_t sib; - - insn_get_modrm(insn); - insn_get_sib(insn); - sib = insn->sib.value; - - if (X86_MODRM_MOD(insn->modrm.value) == 3) { - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - eff_addr = regs_get_register(regs, addr_offset); - } else { - if (insn->sib.nbytes) { - /* - * Negative values in the base and index offset means - * an error when decoding the SIB byte. Except -EDOM, - * which means that the registers should not be used - * in the address computation. 
- */ - base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); - if (unlikely(base_offset == -EDOM)) - base = 0; - else if (unlikely(base_offset < 0)) - goto out_err; - else - base = regs_get_register(regs, base_offset); - - indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); - if (unlikely(indx_offset == -EDOM)) - indx = 0; - else if (unlikely(indx_offset < 0)) - goto out_err; - else - indx = regs_get_register(regs, indx_offset); - - eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); - } else { - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - eff_addr = regs_get_register(regs, addr_offset); - } - eff_addr += insn->displacement.value; - } - linear_addr = (unsigned long)eff_addr; - - return (void __user *)linear_addr; -out_err: - return (void __user *)-1; -} - static int mpx_insn_decode(struct insn *insn, struct pt_regs *regs) { @@ -322,7 +173,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs) info->si_signo = SIGSEGV; info->si_errno = 0; info->si_code = SEGV_BNDERR; - info->si_addr = mpx_get_addr_ref(&insn, regs); + info->si_addr = insn_get_addr_ref(&insn, regs); /* * We were not able to extract an address from the instruction, * probably because there was something invalid in it.
On Tue, Mar 07, 2017 at 04:32:37PM -0800, Ricardo Neri wrote:
Other kernel submodules can benefit from using the utility functions defined in mpx.c to obtain the addresses and values of operands contained in the general purpose registers. An instance of this is the emulation code used for instructions protected by the Intel User-Mode Instruction Prevention feature.
Thus, these functions are relocated to a new insn-eval.c file. The reason not to relocate these utilities into insn.c is that the latter solely analyzes instructions given by a struct insn without any knowledge of the meaning of the values of instruction operands. The new insn-eval.c aims to be used to resolve effective and userspace linear addresses based on the contents of the instruction operands as well as the contents of the pt_regs structure.
These utilities come with a separate header. This is to avoid taking insn.c out of sync with the instruction decoders under tools/obj and tools/perf. This also avoids adding cumbersome #ifdef's for the #include'd files required to decode instructions in a kernel context.
Functions are simply relocated. There are no functional or indentation changes.
...
- case REG_TYPE_BASE:
regno = X86_SIB_BASE(insn->sib.value);
/*
* If mod is 0 and register R/EBP (regno=5) is indicated in the
* base part of the SIB byte, the value of such register should
* not be used in the address computation. Also, a 32-bit
* displacement is expected in this case; the instruction
* decoder takes care of it. This is true for both R13 and
* R/EBP as REX.B will not be decoded.
*/
if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
return -EDOM;
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
- default:
pr_err("invalid register type");
BUG();
WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON() #211: FILE: arch/x86/lib/insn-eval.c:90: + BUG();
And checkpatch is kinda right. We need to warn here, not explode. Oh and that function returns negative values on error...
Please change that with a patch ontop of the move.
Thanks.
On Wed, 2017-04-12 at 12:03 +0200, Borislav Petkov wrote:
* If mod is 0 and register R/EBP (regno=5) is indicated in the
* base part of the SIB byte, the value of such register should
* not be used in the address computation. Also, a 32-bit
* displacement is expected in this case; the instruction
* decoder takes care of it. This is true for both R13 and
* R/EBP as REX.B will not be decoded.
*/
if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
return -EDOM;
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
default:
pr_err("invalid register type");
BUG();
WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON() #211: FILE: arch/x86/lib/insn-eval.c:90:
BUG();
And checkpatch is kinda right. We need to warn here, not explode. Oh and that function returns negative values on error...
Please change that with a patch ontop of the move.
Sure, I will change it.
The function insn_get_reg_offset takes as argument an enumeration that indicates the type of offset that is returned: the R/M part of the ModRM byte, the index of the SIB byte or the base of the SIB byte. Callers of this function would need the definition of such enumeration. This is not needed. Instead, helper functions can be defined for this purpose can be added. These functions are useful in cases when, for instance, the caller needs to decide whether the operand is a register or a memory location by looking at the mod part of the ModRM byte.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/insn-eval.h | 3 +++ arch/x86/lib/insn-eval.c | 51 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h index 5cab1b1..754211b 100644 --- a/arch/x86/include/asm/insn-eval.h +++ b/arch/x86/include/asm/insn-eval.h @@ -12,5 +12,8 @@ #include <asm/ptrace.h>
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
#endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 23cf010..78df1c9 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, return regoff[regno]; }
+/** + * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte + * @insn: Instruction structure containing the ModRM byte + * @regs: Set of registers indicated by the ModRM byte + * + * Obtain the register indicated by the r/m part of the ModRM byte. The + * register is obtained as an offset from the base of pt_regs. In specific + * cases, the returned value can be -EDOM to indicate that the particular value + * of ModRM does not refer to a register. + * + * Return: Register indicated by r/m, as an offset within struct pt_regs + */ +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs) +{ + return get_reg_offset(insn, regs, REG_TYPE_RM); +} + +/** + * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte + * @insn: Instruction structure containing the SiB byte + * @regs: Set of registers indicated by the SiB byte + * + * Obtain the register indicated by the base part of the SiB byte. The + * register is obtained as an offset from the base of pt_regs. In specific + * cases, the returned value can be -EDOM to indicate that the particular value + * of SiB does not refer to a register. + * + * Return: Register indicated by SiB's base, as an offset within struct pt_regs + */ +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs) +{ + return get_reg_offset(insn, regs, REG_TYPE_BASE); +} + +/** + * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte + * @insn: Instruction structure containing the SiB byte + * @regs: Set of registers indicated by the SiB byte + * + * Obtain the register indicated by the index part of the SiB byte. The + * register is obtained as an offset from the index of pt_regs. In specific + * cases, the returned value can be -EDOM to indicate that the particular value + * of SiB does not refer to a register. 
+ * + * Return: Register indicated by SiB's base, as an offset within struct pt_regs + */ +int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) +{ + return get_reg_offset(insn, regs, REG_TYPE_INDEX); +} + /* * return the address being referenced be instruction * for rm=3 returning the content of the rm reg
On Tue, Mar 07, 2017 at 04:32:38PM -0800, Ricardo Neri wrote:
The function insn_get_reg_offset takes as argument an enumeration that
Please end function names with parentheses.
And do you mean get_reg_offset(), per chance?
indicates the type of offset that is returned: the R/M part of the ModRM byte, the index of the SIB byte or the base of the SIB byte.
Err, you mean, it returns the offset to the register the argument specifies.
Callers of this function would need the definition of such enumeration. This is not needed. Instead, helper functions can be defined for this purpose can be added.
"Instead, add helpers... "
These functions are useful in cases when, for instance, the caller needs to decide whether the operand is a register or a memory location by looking at the mod part of the ModRM byte.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/include/asm/insn-eval.h | 3 +++ arch/x86/lib/insn-eval.c | 51 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h index 5cab1b1..754211b 100644 --- a/arch/x86/include/asm/insn-eval.h +++ b/arch/x86/include/asm/insn-eval.h @@ -12,5 +12,8 @@ #include <asm/ptrace.h>
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
Forgotten to edit the copy-paste?
Which means, nothing really needs insn_get_reg_offset_sib_index() and you can get rid of it?
#endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 23cf010..78df1c9 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, return regoff[regno]; }
+/**
- insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
- @insn: Instruction structure containing the ModRM byte
- @regs: Set of registers indicated by the ModRM byte
That's simply struct pt_regs - not a set of registers indicated by ModRM?!?
- Obtain the register indicated by the r/m part of the ModRM byte. The
- register is obtained as an offset from the base of pt_regs. In specific
- cases, the returned value can be -EDOM to indicate that the particular value
- of ModRM does not refer to a register.
Put that sentence under the "Return: " paragraph below so that it is immediately obvious what the retvals are.
- Return: Register indicated by r/m, as an offset within struct pt_regs
- */
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)
That name is too long: insn_get_modrm_rm_off() should be enough.
+{
- return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+/**
- insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
- @insn: Instruction structure containing the SiB byte
- @regs: Set of registers indicated by the SiB byte
- Obtain the register indicated by the base part of the SiB byte. The
- register is obtained as an offset from the base of pt_regs. In specific
- cases, the returned value can be -EDOM to indicate that the particular value
- of SiB does not refer to a register.
- Return: Register indicated by SiB's base, as an offset within struct pt_regs
Let's stick to a single spelling: SIB, all caps.
- */
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)
insn_get_sib_base_off()
Ditto for the rest of the comments on insn_get_reg_offset_modrm_rm() above.
+{
- return get_reg_offset(insn, regs, REG_TYPE_BASE);
+}
+/**
- insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
- @insn: Instruction structure containing the SiB byte
- @regs: Set of registers indicated by the SiB byte
- Obtain the register indicated by the index part of the SiB byte. The
- register is obtained as an offset from the index of pt_regs. In specific
- cases, the returned value can be -EDOM to indicate that the particular value
- of SiB does not refer to a register.
- Return: Register indicated by SiB's base, as an offset within struct pt_regs
- */
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
insn_get_sib_idx_off()
And again, if this function is unused, don't add it.
Thanks.
On Wed, 2017-04-12 at 18:28 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:38PM -0800, Ricardo Neri wrote:
The function insn_get_reg_offset takes as argument an enumeration that
Please end function names with parentheses.
Will do!
And do you mean get_reg_offset(), per chance?
Yes, I meant that. This was a copy/paste error.
indicates the type of offset that is returned: the R/M part of the ModRM byte, the index of the SIB byte or the base of the SIB byte.
Err, you mean, it returns the offset to the register the argument specifies.
Yes. I will reword.
Callers of this function would need the definition of such enumeration. This is not needed. Instead, helper functions can be defined for this purpose can be added.
"Instead, add helpers... "
I will reword.
These functions are useful in cases when, for instance, the caller needs to decide whether the operand is a register or a memory location by looking at the mod part of the ModRM byte.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/include/asm/insn-eval.h | 3 +++ arch/x86/lib/insn-eval.c | 51 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h index 5cab1b1..754211b 100644 --- a/arch/x86/include/asm/insn-eval.h +++ b/arch/x86/include/asm/insn-eval.h @@ -12,5 +12,8 @@ #include <asm/ptrace.h>
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs); +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
Forgotten to edit the copy-paste?
Which means, nothing really needs insn_get_reg_offset_sib_index() and you can get rid of it?
Yes, I can get rid of it.
#endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 23cf010..78df1c9 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, return regoff[regno]; }
+/**
- insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
- @insn: Instruction structure containing the ModRM byte
- @regs: Set of registers indicated by the ModRM byte
That's simply struct pt_regs - not a set of registers indicated by ModRM?!?
I will reword it to say "A struct pt_regs containing register values indicated by the ModRM byte".
- Obtain the register indicated by the r/m part of the ModRM byte. The
- register is obtained as an offset from the base of pt_regs. In specific
- cases, the returned value can be -EDOM to indicate that the particular value
- of ModRM does not refer to a register.
Put that sentence under the "Return: " paragraph below so that it is immediately obvious what the retvals are.
Will do.
- Return: Register indicated by r/m, as an offset within struct pt_regs
- */
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)
That name is too long: insn_get_modrm_rm_off() should be enough.
+{
- return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+/**
- insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
- @insn: Instruction structure containing the SiB byte
- @regs: Set of registers indicated by the SiB byte
- Obtain the register indicated by the base part of the SiB byte. The
- register is obtained as an offset from the base of pt_regs. In specific
- cases, the returned value can be -EDOM to indicate that the particular value
- of SiB does not refer to a register.
- Return: Register indicated by SiB's base, as an offset within struct pt_regs
Will make the spelling consistent.
Let's stick to a single spelling: SIB, all caps.
- */
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)
insn_get_sib_base_off()
Ditto for the rest of the comments on insn_get_reg_offset_modrm_rm() above.
+{
- return get_reg_offset(insn, regs, REG_TYPE_BASE);
+}
+/**
- insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
- @insn: Instruction structure containing the SiB byte
- @regs: Set of registers indicated by the SiB byte
- Obtain the register indicated by the index part of the SiB byte. The
- register is obtained as an offset from the index of pt_regs. In specific
- cases, the returned value can be -EDOM to indicate that the particular value
- of SiB does not refer to a register.
- Return: Register indicated by SiB's base, as an offset within struct pt_regs
- */
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
insn_get_sib_idx_off()
And again, if this function is unused, don't add it.
Masami Hiramatsu had originally requested to add the two functions. I suppose the unneeded functions could be added if/when needed.
Thanks and BR, Ricardo
On Wed, Apr 26, 2017 at 11:13:44AM -0700, Ricardo Neri wrote:
Masami Hiramatsu had originally requested to add the two functions. I suppose the unneeded functions could be added if/when needed.
Yap, exactly.
When computing a linear address and segmentation is used, we need to know the base address of the segment involved in the computation. In most of the cases, the segment base address will be zero as in USER_DS/USER32_DS. However, it may be possible that a user space program defines its own segments via a local descriptor table. In such a case, the segment base address may not be zero. Thus, the segment base address is needed to correctly calculate the linear address.
The segment selector to be used when computing a linear address is determined either by any segment selector override prefixes in the instruction or is inferred from the registers involved in the computation of the effective address, in that order. Also, there are cases when the overrides shall be ignored.
For clarity, this process can be split into two steps: resolving the relevant segment and, once known, reading the applicable segment selector. The method to obtain the segment selector depends on several factors. In 32-bit builds, segment selectors are saved into the pt_regs structure when switching to kernel mode. The same is also true for virtual-8086 mode. In 64-bit builds, segmentation is mostly ignored, except when running a program in 32-bit legacy mode. In this case, CS and SS can be obtained from pt_regs. DS, ES, FS and GS can be read directly from registers. Lastly, segmentation is possible in 64-bit mode via FS and GS. In these two cases, base addresses are obtained from the relevant MSRs.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 195 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 78df1c9..8d45df8 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -8,6 +8,7 @@ #include <asm/inat.h> #include <asm/insn.h> #include <asm/insn-eval.h> +#include <asm/vm86.h>
enum reg_type { REG_TYPE_RM = 0, @@ -15,6 +16,200 @@ enum reg_type { REG_TYPE_BASE, };
+enum segment { + SEG_CS = 0x23, + SEG_SS = 0x36, + SEG_DS = 0x3e, + SEG_ES = 0x26, + SEG_FS = 0x64, + SEG_GS = 0x65 +}; + +/** + * resolve_seg_selector() - obtain segment selector + * @regs: Set of registers containing the segment selector + * @insn: Instruction structure with selector override prefixes + * @regoff: Operand offset, in pt_regs, of which the selector is needed + * @default: Resolve default segment selector (i.e., ignore overrides) + * + * The segment selector to which an effective address refers depends on + * a) segment selector overrides instruction prefixes or b) the operand + * register indicated in the ModRM or SiB byte. + * + * For case a), the function inspects any prefixes in the insn instruction; + * insn can be null to indicate that selector override prefixes shall be + * ignored. This is useful when the use of prefixes is forbidden (e.g., + * obtaining the code selector). For case b), the operand register shall be + * represented as the offset from the base address of pt_regs. Also, regoff + * can be -EINVAL for cases in which registers are not used as operands (e.g., + * when the mod and r/m parts of the ModRM byte are 0 and 5, respectively). + * + * This function returns the segment selector to utilize as per the conditions + * described above. Please note that this function does not return the value + * of the segment selector. The value of the segment selector needs to be + * obtained using get_segment_selector and passing the segment selector type + * resolved by this function. + * + * Return: Segment selector to use, among CS, SS, DS, ES, FS or GS. + */ +static int resolve_seg_selector(struct insn *insn, int regoff, bool get_default) +{ + int i; + + if (!insn) + return -EINVAL; + + if (get_default) + goto default_seg; + /* + * Check first if we have selector overrides. Having more than + * one selector override leads to undefined behavior.
We + * only use the first one and return + */ + for (i = 0; i < insn->prefixes.nbytes; i++) { + switch (insn->prefixes.bytes[i]) { + case SEG_CS: + return SEG_CS; + case SEG_SS: + return SEG_SS; + case SEG_DS: + return SEG_DS; + case SEG_ES: + return SEG_ES; + case SEG_FS: + return SEG_FS; + case SEG_GS: + return SEG_GS; + default: + return -EINVAL; + } + } + +default_seg: + /* + * If no overrides, use default selectors as described in the + * Intel documentation: SS for ESP or EBP. DS for all data references, + * except when relative to stack or string destination. + * Also, AX, CX and DX are not valid register operands in 16-bit + * address encodings. + * Callers must interpret the result correctly according to the type + * of instructions (e.g., use ES for string instructions). + * Also, some values of modrm and sib might seem to indicate the use + * of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually + * they refer to cases in which only a displacement is used. These cases + * should be identified by the caller and not with this function.
+ */ + switch (regoff) { + case offsetof(struct pt_regs, ax): + /* fall through */ + case offsetof(struct pt_regs, cx): + /* fall through */ + case offsetof(struct pt_regs, dx): + if (insn && insn->addr_bytes == 2) + return -EINVAL; + case -EDOM: /* no register involved in address computation */ + case offsetof(struct pt_regs, bx): + /* fall through */ + case offsetof(struct pt_regs, di): + /* fall through */ + case offsetof(struct pt_regs, si): + return SEG_DS; + case offsetof(struct pt_regs, bp): + /* fall through */ + case offsetof(struct pt_regs, sp): + return SEG_SS; + case offsetof(struct pt_regs, ip): + return SEG_CS; + default: + return -EINVAL; + } +} + +/** + * get_segment_selector() - obtain segment selector + * @regs: Set of registers containing the segment selector + * @seg_type: Type of segment selector to obtain + * @regoff: Operand offset, in pt_regs, of which the selector is needed + * + * Obtain the segment selector for any of CS, SS, DS, ES, FS, GS. In + * CONFIG_X86_32, the segment is obtained from either pt_regs or + * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained + * from pt_regs. DS, ES, FS and GS are obtained by reading the ds and es, fs + * and gs, respectively.
+ * + * Return: Value of the segment selector + */ +static unsigned short get_segment_selector(struct pt_regs *regs, + enum segment seg_type) +{ +#ifdef CONFIG_X86_64 + unsigned short seg_sel; + + switch (seg_type) { + case SEG_CS: + return (unsigned short)(regs->cs & 0xffff); + case SEG_SS: + return (unsigned short)(regs->ss & 0xffff); + case SEG_DS: + savesegment(ds, seg_sel); + return seg_sel; + case SEG_ES: + savesegment(es, seg_sel); + return seg_sel; + case SEG_FS: + savesegment(fs, seg_sel); + return seg_sel; + case SEG_GS: + savesegment(gs, seg_sel); + return seg_sel; + default: + return -1; + } +#else /* CONFIG_X86_32 */ + struct kernel_vm86_regs *vm86regs = (struct kernel_vm86_regs *)regs; + + if (v8086_mode(regs)) { + switch (seg_type) { + case SEG_CS: + return (unsigned short)(regs->cs & 0xffff); + case SEG_SS: + return (unsigned short)(regs->ss & 0xffff); + case SEG_DS: + return vm86regs->ds; + case SEG_ES: + return vm86regs->es; + case SEG_FS: + return vm86regs->fs; + case SEG_GS: + return vm86regs->gs; + default: + return -1; + } + } + + switch (seg_type) { + case SEG_CS: + return (unsigned short)(regs->cs & 0xffff); + case SEG_SS: + return (unsigned short)(regs->ss & 0xffff); + case SEG_DS: + return (unsigned short)(regs->ds & 0xffff); + case SEG_ES: + return (unsigned short)(regs->es & 0xffff); + case SEG_FS: + return (unsigned short)(regs->fs & 0xffff); + case SEG_GS: + /* + * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS. + * The macro below takes care of both cases. + */ + return get_user_gs(regs); + default: + return -1; + } +#endif /* CONFIG_X86_64 */ +} + static int get_reg_offset(struct insn *insn, struct pt_regs *regs, enum reg_type type) {
On Tue, Mar 07, 2017 at 04:32:39PM -0800, Ricardo Neri wrote:
When computing a linear address and segmentation is used, we need to know the base address of the segment involved in the computation. In most of the cases, the segment base address will be zero as in USER_DS/USER32_DS. However, it may be possible that a user space program defines its own segments via a local descriptor table. In such a case, the segment base address may not be zero .Thus, the segment base address is needed to calculate correctly the linear address.
The segment selector to be used when computing a linear address is determined by either any of segment select override prefixes in the instruction or inferred from the registers involved in the computation of the effective address; in that order. Also, there are cases when the overrides shall be ignored.
For clarity, this process can be split into two steps: resolving the relevant segment and, once known, read the applicable segment selector. The method to obtain the segment selector depends on several factors. In 32-bit builds, segment selectors are saved into the pt_regs structure when switching to kernel mode. The same is also true for virtual-8086 mode. In 64-bit builds, segmentation is mostly ignored, except when running a program in 32-bit legacy mode. In this case, CS and SS can be obtained from pt_regs. DS, ES, FS and GS can be read directly from registers.
Lastly, segmentation is possible in 64-bit mode via FS and GS.
I'd say "Lastly, the only two segment registers which are not ignored in long mode are FS and GS."
In these two cases, base addresses are obtained from the relevant MSRs.
s/relevant/respective/
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/lib/insn-eval.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 195 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 78df1c9..8d45df8 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -8,6 +8,7 @@ #include <asm/inat.h> #include <asm/insn.h> #include <asm/insn-eval.h> +#include <asm/vm86.h>
enum reg_type { REG_TYPE_RM = 0, @@ -15,6 +16,200 @@ enum reg_type { REG_TYPE_BASE, };
+enum segment {
- SEG_CS = 0x23,
- SEG_SS = 0x36,
- SEG_DS = 0x3e,
- SEG_ES = 0x26,
- SEG_FS = 0x64,
- SEG_GS = 0x65
+};
+/**
- resolve_seg_selector() - obtain segment selector
- @regs: Set of registers containing the segment selector
That arg is gone.
- @insn: Instruction structure with selector override prefixes
- @regoff: Operand offset, in pt_regs, of which the selector is needed
- @default: Resolve default segment selector (i.e., ignore overrides)
- The segment selector to which an effective address refers depends on
- a) segment selector overrides instruction prefixes or b) the operand
- register indicated in the ModRM or SiB byte.
- For case a), the function inspects any prefixes in the insn instruction;
s/insn //
- insn can be null to indicate that selector override prefixes shall be
- ignored.
This is not what the code does: it returns -EINVAL when insn is NULL.
This is useful when the use of prefixes is forbidden (e.g.,
- obtaining the code selector). For case b), the operand register shall be
- represented as the offset from the base address of pt_regs. Also, regoff
- can be -EINVAL for cases in which registers are not used as operands (e.g.,
- when the mod and r/m parts of the ModRM byte are 0 and 5, respectively).
- This function returns the segment selector to utilize as per the conditions
- described above. Please note that this functin does not return the value
- of the segment selector. The value of the segment selector needs to be
- obtained using get_segment_selector and passing the segment selector type
- resolved by this function.
- Return: Segment selector to use, among CS, SS, DS, ES, FS or GS.
: negative value when...
- */
+static int resolve_seg_selector(struct insn *insn, int regoff, bool get_default) +{
- int i;
- if (!insn)
return -EINVAL;
- if (get_default)
goto default_seg;
- /*
* Check first if we have selector overrides. Having more than
* one selector override leads to undefined behavior. We
* only use the first one and return
Well, I'd return -EINVAL to catch that undefined behavior. Note in a local var that I've already seen a seg reg and then if I see another one, return -EINVAL.
*/
- for (i = 0; i < insn->prefixes.nbytes; i++) {
switch (insn->prefixes.bytes[i]) {
case SEG_CS:
return SEG_CS;
case SEG_SS:
return SEG_SS;
case SEG_DS:
return SEG_DS;
case SEG_ES:
return SEG_ES;
case SEG_FS:
return SEG_FS;
case SEG_GS:
return SEG_GS;
So what happens if you're in 64-bit mode and you have CS, DS, ES, or SS? Or is this what @get_default is supposed to do? But it doesn't look like it, it still returns segments ignored in 64-bit mode.
default:
return -EINVAL;
}
- }
+default_seg:
- /*
* If no overrides, use default selectors as described in the
* Intel documentation: SS for ESP or EBP. DS for all data references,
* except when relative to stack or string destination.
* Also, AX, CX and DX are not valid register operands in 16-bit
* address encodings.
* Callers must interpret the result correctly according to the type
* of instructions (e.g., use ES for string instructions).
* Also, some values of modrm and sib might seem to indicate the use
* of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually
* they refer to cases in which only a displacement used. These cases
* should be indentified by the caller and not with this function.
*/
- switch (regoff) {
- case offsetof(struct pt_regs, ax):
/* fall through */
- case offsetof(struct pt_regs, cx):
/* fall through */
- case offsetof(struct pt_regs, dx):
if (insn && insn->addr_bytes == 2)
return -EINVAL;
- case -EDOM: /* no register involved in address computation */
- case offsetof(struct pt_regs, bx):
/* fall through */
- case offsetof(struct pt_regs, di):
/* fall through */
return SEG_ES;
?
It is even in the comment above. I'm looking at MOVS %es:%rdi, %ds:%rsi, for example.
- case offsetof(struct pt_regs, si):
return SEG_DS;
- case offsetof(struct pt_regs, bp):
/* fall through */
- case offsetof(struct pt_regs, sp):
return SEG_SS;
- case offsetof(struct pt_regs, ip):
return SEG_CS;
- default:
return -EINVAL;
- }
+}
+/**
- get_segment_selector() - obtain segment selector
- @regs: Set of registers containing the segment selector
- @seg_type: Type of segment selector to obtain
- @regoff: Operand offset, in pt_regs, of which the selector is needed
That's gone.
- Obtain the segment selector for any of CS, SS, DS, ES, FS, GS. In
- CONFIG_X86_32, the segment is obtained from either pt_regs or
- kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained
- from pt_regs. DS, ES, FS and GS are obtained by reading the ds and es, fs
- and gs, respectively.
... and DS and ES are ignored in long mode.
- Return: Value of the segment selector
... or negative...
- */
+static unsigned short get_segment_selector(struct pt_regs *regs,
enum segment seg_type)
+{
On Tue, 2017-04-18 at 11:42 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:39PM -0800, Ricardo Neri wrote:
When computing a linear address and segmentation is used, we need to know the base address of the segment involved in the computation. In most of the cases, the segment base address will be zero as in USER_DS/USER32_DS. However, it may be possible that a user space program defines its own segments via a local descriptor table. In such a case, the segment base address may not be zero .Thus, the segment base address is needed to calculate correctly the linear address.
The segment selector to be used when computing a linear address is determined by either any of segment select override prefixes in the instruction or inferred from the registers involved in the computation of the effective address; in that order. Also, there are cases when the overrides shall be ignored.
For clarity, this process can be split into two steps: resolving the relevant segment and, once known, read the applicable segment selector. The method to obtain the segment selector depends on several factors. In 32-bit builds, segment selectors are saved into the pt_regs structure when switching to kernel mode. The same is also true for virtual-8086 mode. In 64-bit builds, segmentation is mostly ignored, except when running a program in 32-bit legacy mode. In this case, CS and SS can be obtained from pt_regs. DS, ES, FS and GS can be read directly from registers.
Lastly, segmentation is possible in 64-bit mode via FS and GS.
I'd say "Lastly, the only two segment registers which are not ignored in long mode are FS and GS."
I will make this clarification.
In these two cases, base addresses are obtained from the relevant MSRs.
s/relevant/respective/
Will clarify.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/lib/insn-eval.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 195 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 78df1c9..8d45df8 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -8,6 +8,7 @@ #include <asm/inat.h> #include <asm/insn.h> #include <asm/insn-eval.h> +#include <asm/vm86.h>
enum reg_type { REG_TYPE_RM = 0, @@ -15,6 +16,200 @@ enum reg_type { REG_TYPE_BASE, };
+enum segment {
- SEG_CS = 0x23,
- SEG_SS = 0x36,
- SEG_DS = 0x3e,
- SEG_ES = 0x26,
- SEG_FS = 0x64,
- SEG_GS = 0x65
+};
+/**
- resolve_seg_selector() - obtain segment selector
- @regs: Set of registers containing the segment selector
That arg is gone.
This came from one of my initial implementations. I will remove it.
- @insn: Instruction structure with selector override prefixes
- @regoff: Operand offset, in pt_regs, of which the selector is needed
- @default: Resolve default segment selector (i.e., ignore overrides)
- The segment selector to which an effective address refers depends on
- a) segment selector overrides instruction prefixes or b) the operand
- register indicated in the ModRM or SiB byte.
- For case a), the function inspects any prefixes in the insn instruction;
s/insn //
In this case I meant "any prefixes in the insn structure". Perhaps that wording will make it clearer.
- insn can be null to indicate that selector override prefixes shall be
- ignored.
This is not what the code does: it returns -EINVAL when insn is NULL.
This was the behavior in a previous implementation. I will update it.
This is useful when the use of prefixes is forbidden (e.g.,
- obtaining the code selector). For case b), the operand register shall be
- represented as the offset from the base address of pt_regs. Also, regoff
- can be -EINVAL for cases in which registers are not used as operands (e.g.,
- when the mod and r/m parts of the ModRM byte are 0 and 5, respectively).
- This function returns the segment selector to utilize as per the conditions
- described above. Please note that this functin does not return the value
- of the segment selector. The value of the segment selector needs to be
- obtained using get_segment_selector and passing the segment selector type
- resolved by this function.
- Return: Segment selector to use, among CS, SS, DS, ES, FS or GS.
: negative value when...
I will document this behavior.
- */
+static int resolve_seg_selector(struct insn *insn, int regoff, bool get_default) +{
- int i;
- if (!insn)
return -EINVAL;
- if (get_default)
goto default_seg;
- /*
* Check first if we have selector overrides. Having more than
* one selector override leads to undefined behavior. We
* only use the first one and return
Well, I'd return -EINVAL to catch that undefined behavior. Note in a local var that I've already seen a seg reg and then if I see another one, return -EINVAL.
Sure. Will do.
*/
- for (i = 0; i < insn->prefixes.nbytes; i++) {
switch (insn->prefixes.bytes[i]) {
case SEG_CS:
return SEG_CS;
case SEG_SS:
return SEG_SS;
case SEG_DS:
return SEG_DS;
case SEG_ES:
return SEG_ES;
case SEG_FS:
return SEG_FS;
case SEG_GS:
return SEG_GS;
So what happens if you're in 64-bit mode and you have CS, DS, ES, or SS? Or is this what @get_default is supposed to do? But it doesn't look like it, it still returns segments ignored in 64-bit mode.
I regard the role of this function as obtaining the segment selector, either from the prefixes or inferred from the operands. It is the role of the caller to determine whether the segment selector should be ignored. So far the only caller is insn_get_seg_base() [1]. If in long mode, the segment base address is regarded as 0 unless the segment selector is FS or GS.
default:
return -EINVAL;
}
- }
+default_seg:
- /*
* If no overrides, use default selectors as described in the
* Intel documentation: SS for ESP or EBP. DS for all data references,
* except when relative to stack or string destination.
* Also, AX, CX and DX are not valid register operands in 16-bit
* address encodings.
* Callers must interpret the result correctly according to the type
* of instructions (e.g., use ES for string instructions).
* Also, some values of modrm and sib might seem to indicate the use
* of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually
* they refer to cases in which only a displacement used. These cases
* should be indentified by the caller and not with this function.
*/
- switch (regoff) {
- case offsetof(struct pt_regs, ax):
/* fall through */
- case offsetof(struct pt_regs, cx):
/* fall through */
- case offsetof(struct pt_regs, dx):
if (insn && insn->addr_bytes == 2)
return -EINVAL;
- case -EDOM: /* no register involved in address computation */
- case offsetof(struct pt_regs, bx):
/* fall through */
- case offsetof(struct pt_regs, di):
/* fall through */
return SEG_ES;
?
I double-checked the latest version of the Intel Software Developer's Manual [2]; Table 3-5 in Section 3.7.4 mentions that DS is the default segment for all data references, except string destinations. I tested this code with the UMIP-protected instructions, and whenever I use %edi the default segment is %ds.
It is even in the comment above.
This function does not decode instructions but only the segment selectors. This is the reason I added a comment about callers using the segment carefully when string instructions. Perhaps I can move the comment to the function documentation. Given that string instructions seem to be the only exception, the function could take a boolean parameter if the segment is to be obtained for a destination string operand. How does this sound?
I'm looking at MOVS %es:%rdi, %ds:%rsi, for example.
Is this example valid? The documentation of MOVS specifies that it always moves DS:(E)SI to ES:(E)DI.
- case offsetof(struct pt_regs, si):
return SEG_DS;
- case offsetof(struct pt_regs, bp):
/* fall through */
- case offsetof(struct pt_regs, sp):
return SEG_SS;
- case offsetof(struct pt_regs, ip):
return SEG_CS;
- default:
return -EINVAL;
- }
+}
+/**
- get_segment_selector() - obtain segment selector
- @regs: Set of registers containing the segment selector
- @seg_type: Type of segment selector to obtain
- @regoff: Operand offset, in pt_regs, of which the selector is needed
That's gone.
I will remove it.
- Obtain the segment selector for any of CS, SS, DS, ES, FS, GS. In
- CONFIG_X86_32, the segment is obtained from either pt_regs or
- kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained
- from pt_regs. DS, ES, FS and GS are obtained by reading the ds and es, fs
- and gs, respectively.
... and DS and ES are ignored in long mode.
I will clarify that callers need to ignore DS and ES if in long mode.
- Return: Value of the segment selector
... or negative...
I will complement documentation on this specific case.
Thanks and BR, Ricardo
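For readers following this exchange, the default-segment rules from Table 3-5 of the SDM that are being debated — SS for stack-relative references, CS for instruction fetches, ES only for the destination of string instructions, DS for everything else — can be sketched in user-space C. The register names and the `is_string_dest` flag below are illustrative stand-ins, not the kernel's actual interface:

```c
#include <assert.h>

enum seg { SEG_DS, SEG_ES, SEG_SS, SEG_CS };
enum reg { REG_AX, REG_CX, REG_DX, REG_BX, REG_SP, REG_BP,
	   REG_SI, REG_DI, REG_IP };

/*
 * Hypothetical helper mirroring SDM Table 3-5. is_string_dest models
 * the single exception under discussion: rDI as the destination of a
 * string instruction (MOVS/STOS/...) defaults to ES, while a plain
 * data reference through rDI defaults to DS.
 */
static enum seg default_seg(enum reg r, int is_string_dest)
{
	if (r == REG_SP || r == REG_BP)
		return SEG_SS;		/* stack-relative references */
	if (r == REG_IP)
		return SEG_CS;		/* instruction fetches */
	if (r == REG_DI && is_string_dest)
		return SEG_ES;		/* string destination only */
	return SEG_DS;			/* all other data references */
}
```

This matches both sides of the thread: %edi alone defaults to DS, and only as a string destination does it use ES.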
On Wed, 2017-04-26 at 13:44 -0700, Ricardo Neri wrote:
- */
- for (i = 0; i < insn->prefixes.nbytes; i++) {
switch (insn->prefixes.bytes[i]) {
case SEG_CS:
return SEG_CS;
case SEG_SS:
return SEG_SS;
case SEG_DS:
return SEG_DS;
case SEG_ES:
return SEG_ES;
case SEG_FS:
return SEG_FS;
case SEG_GS:
return SEG_GS;
So what happens if you're in 64-bit mode and you have CS, DS, ES, or SS?
Or is this what @get_default is supposed to do? But it doesn't look like it, it still returns segments ignored in 64-bit mode.
I regard that the role of this function is to obtain the segment selector either from the prefixes or inferred from the operands. It is the role of the caller to determine if the segment selector should be ignored. So far the only caller is insn_get_seg_base() [1]. If in long mode, the segment base address is regarded as 0 unless the segment selector is FS or GS.
default:
return -EINVAL;
}
- }
+default_seg:
- /*
- If no overrides, use default selectors as described in the
- Intel documentation: SS for ESP or EBP. DS for all data references,
- except when relative to stack or string destination.
- Also, AX, CX and DX are not valid register operands in 16-bit
- address encodings.
- Callers must interpret the result correctly according to the type
- of instructions (e.g., use ES for string instructions).
- Also, some values of modrm and sib might seem to indicate the use
- of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually
- they refer to cases in which only a displacement is used. These cases
- should be identified by the caller and not with this function.
- */
- switch (regoff) {
- case offsetof(struct pt_regs, ax):
/* fall through */
- case offsetof(struct pt_regs, cx):
/* fall through */
- case offsetof(struct pt_regs, dx):
if (insn && insn->addr_bytes == 2)
return -EINVAL;
- case -EDOM: /* no register involved in address computation */
- case offsetof(struct pt_regs, bx):
/* fall through */
- case offsetof(struct pt_regs, di):
/* fall through */
return SEG_ES;
?
I double-checked the latest version of the Intel Software Developer's Manual [2]; Table 3-5 in Section 3.7.4 states that DS is the default segment for all data references, except string destinations. I tested this code with the UMIP-protected instructions and whenever I use %edi the default segment is %ds.
I forgot my references:
[1]. https://lkml.org/lkml/2017/3/7/876 [2]. https://software.intel.com/en-us/articles/intel-sdm#combined
On Wed, Apr 26, 2017 at 01:44:43PM -0700, Ricardo Neri wrote:
I regard that the role of this function is to obtain the segment selector either from the prefixes or inferred from the operands. It is the role of the caller to determine if the segment selector should be ignored.
No, this is wrong. The function is called resolve_seg_selector() and it gives you the segment selector. CS, DS, ES, and SS in 64-bit mode are treated as null segments and your function should return/signal exactly that, i.e, saying that those should be ignored in that case.
I double-checked the latest version of the Intel Software Development manual [2], in the table 3-5 in section 3.7.4 mentions that DS is default segment for all data references, except string destinations. I tested this code with the UMIP-protected instructions and whenever I use %edi the default segment is %ds.
Yes, all correct. Except that we're adding a more-or-less generic x86 insn decoder so we should make it so...
Is this example valid? The documentation of MOVS specifies that it always moves DS:(E)SI to ES:(E)DI.
... that the decoder should do exactly that:
if (MOVS and rDI) return SEG_ES;
And you're handing in struct insn * so you can easily check which insn you're looking at.
Thanks.
On Sun, 2017-04-30 at 19:15 +0200, Borislav Petkov wrote:
On Wed, Apr 26, 2017 at 01:44:43PM -0700, Ricardo Neri wrote:
I regard that the role of this function is to obtain the segment selector either from the prefixes or inferred from the operands. It is the role of the caller to determine if the segment selector should be ignored.
No, this is wrong. The function is called resolve_seg_selector() and it gives you the segment selector. CS, DS, ES, and SS in 64-bit mode are treated as null segments and your function should return/signal exactly that, i.e, saying that those should be ignored in that case.
I double-checked the latest version of the Intel Software Development manual [2], in the table 3-5 in section 3.7.4 mentions that DS is default segment for all data references, except string destinations. I tested this code with the UMIP-protected instructions and whenever I use %edi the default segment is %ds.
Yes, all correct. Except that we're adding a more-or-less generic x86 insn decoder so we should make it so...
Is this example valid? The documentation of MOVS specifies that it always moves DS:(E)SI to ES:(E)DI.
... that the decoder should do exactly that:
if (MOVS and rDI) return SEG_ES;
And you're handing in struct insn * so you can easily check which insn you're looking at.
I see. I have submitted v7 of the series and I have implemented all the changes above. Now I am able to identify string instructions.
Thanks and BR, Ricardo
The segment descriptor contains information that is relevant to how the linear address needs to be computed. It contains the default size of addresses as well as the base address of the segment. Thus, given a segment selector, we ought to look at the segment descriptor to correctly calculate the linear address.
In protected mode, the segment selector might indicate a segment descriptor from either the global descriptor table or a local descriptor table. Both cases are considered in this function.
This function is the initial implementation for subsequent functions that will obtain the aforementioned attributes of the segment descriptor.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8d45df8..8608adf 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,9 +5,13 @@
  */
 #include <linux/kernel.h>
 #include <linux/string.h>
+#include <asm/desc_defs.h>
+#include <asm/desc.h>
 #include <asm/inat.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
+#include <asm/ldt.h>
+#include <linux/mmu_context.h>
 #include <asm/vm86.h>
 enum reg_type {
@@ -294,6 +298,63 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @seg: Segment selector
+ * @desc: Pointer to the selected segment descriptor
+ *
+ * Given a segment selector, obtain a memory pointer to the segment
+ * descriptor. Both global and local descriptor tables are supported.
+ * desc will contain the address of the descriptor.
+ *
+ * Return: 0 if success, -EINVAL if failure
+ */
+static int get_desc(unsigned short seg, struct desc_struct **desc)
+{
+	struct desc_ptr gdt_desc = {0, 0};
+	unsigned long desc_base;
+
+	if (!desc)
+		return -EINVAL;
+
+	desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		seg >>= 3;
+
+		mutex_lock(&current->active_mm->context.lock);
+		if (unlikely(!current->active_mm->context.ldt ||
+			     seg >= current->active_mm->context.ldt->size)) {
+			*desc = NULL;
+			mutex_unlock(&current->active_mm->context.lock);
+			return -EINVAL;
+		}
+
+		*desc = &current->active_mm->context.ldt->entries[seg];
+		mutex_unlock(&current->active_mm->context.lock);
+		return 0;
+	}
+#endif
+	native_store_gdt(&gdt_desc);
+
+	/*
+	 * Bits [15:3] of the segment selector contain the index. Such
+	 * index needs to be multiplied by 8. However, as the index
+	 * least significant bit is already in bit 3, we don't have
+	 * to perform the multiplication.
+	 */
+	desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+	if (desc_base > gdt_desc.size) {
+		*desc = NULL;
+		return -EINVAL;
+	}
+
+	*desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+	return 0;
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn: Instruction structure containing the ModRM byte
  * @regs: Set of registers indicated by the ModRM byte
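As background to the patch above, the selector-to-address relationships it relies on are simple arithmetic: in protected mode the linear address is the descriptor's base plus the effective address, and in virtual-8086 (or real) mode the base is just the selector shifted left by 4, i.e. 16-byte paragraphs. A minimal user-space sketch, with illustrative names rather than kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Virtual-8086/real-mode rule: base = selector * 16. */
static uint32_t v8086_seg_base(uint16_t sel)
{
	return (uint32_t)sel << 4;	/* 16-byte paragraphs */
}

/* Linear address = segment base + effective (offset) address. */
static uint32_t linear_addr(uint32_t seg_base, uint32_t eff_addr)
{
	return seg_base + eff_addr;
}
```

For example, the classic reset vector F000:FFF0 resolves to linear 0xFFFF0 under these rules.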
On Tue, Mar 07, 2017 at 04:32:40PM -0800, Ricardo Neri wrote:
The segment descriptor contains information that is relevant to how the linear address needs to be computed. It contains the default size of addresses as well as the base address of the segment. Thus, given a segment selector, we ought to look at the segment descriptor to correctly calculate the linear address.
In protected mode, the segment selector might indicate a segment descriptor from either the global descriptor table or a local descriptor table. Both cases are considered in this function.
This function is the initial implementation for subsequent functions that will obtain the aforementioned attributes of the segment descriptor.
/**
- get_desc() - Obtain address of segment descriptor
- @seg: Segment selector
Maybe that should be
@sel
if it is a sel-ector. :)
And using "sel" makes more sense then when you look at:
desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
for example:
- @desc: Pointer to the selected segment descriptor
- Given a segment selector, obtain a memory pointer to the segment
s/memory //
- descriptor. Both global and local descriptor tables are supported.
- desc will contain the address of the descriptor.
- Return: 0 if success, -EINVAL if failure
Why isn't this function returning the pointer or NULL on error? Maybe the later patches have an answer and I'll discover it if I continue reviewing :)
- */
+static int get_desc(unsigned short seg, struct desc_struct **desc) +{
- struct desc_ptr gdt_desc = {0, 0};
- unsigned long desc_base;
- if (!desc)
return -EINVAL;
- desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
That looks useless as you're doing it below again.
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
- if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
seg >>= 3;
mutex_lock(&current->active_mm->context.lock);
if (unlikely(!current->active_mm->context.ldt ||
Is that really a fast path to complicate the if-test with an unlikely()? If not, you don't really need it.
seg >= current->active_mm->context.ldt->size)) {
ldt->size is the size of the descriptor table but you've shifted seg by 3. That selector index is shifted by 3 (to the left) to form an offset into the descriptor table because the entries there are 8 bytes.
So I *think* you wanna use the "useless" desc_base above... :)
*desc = NULL;
mutex_unlock(&current->active_mm->context.lock);
return -EINVAL;
}
*desc = &current->active_mm->context.ldt->entries[seg];
... and seg here as it is an index into the table.
mutex_unlock(&current->active_mm->context.lock);
return 0;
- }
+#endif
- native_store_gdt(&gdt_desc);
- /*
* Bits [15:3] of the segment selector contain the index. Such
* index needs to be multiplied by 8.
... because <insert reason I typed in above>.
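The offset arithmetic described here can be checked in isolation: the index lives in bits 15:3 of the selector, so clearing the RPL bits (1:0) and the TI bit (2) yields exactly index * 8, the byte offset of an 8-byte descriptor in the table — no explicit multiplication needed. A sketch with illustrative mask names:

```c
#include <assert.h>
#include <stdint.h>

#define SEL_RPL_MASK 0x3	/* bits 1:0 - requested privilege level */
#define SEL_TI_MASK  0x4	/* bit 2   - table indicator (0=GDT, 1=LDT) */

/* Index into the descriptor table (bits 15:3 of the selector). */
static unsigned int sel_index(uint16_t sel)
{
	return sel >> 3;
}

/*
 * Byte offset into the table. Each descriptor is 8 bytes, and the
 * index already sits at bit 3, so masking off RPL and TI leaves the
 * offset (index * 8) in place.
 */
static unsigned int sel_offset(uint16_t sel)
{
	return sel & ~(SEL_RPL_MASK | SEL_TI_MASK);
}
```

E.g. selector 0x2B (index 5, GDT, RPL 3) yields byte offset 0x28 = 5 * 8.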
On Wed, 2017-04-19 at 12:26 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:40PM -0800, Ricardo Neri wrote:
The segment descriptor contains information that is relevant to how the linear address needs to be computed. It contains the default size of addresses as well as the base address of the segment. Thus, given a segment selector, we ought to look at the segment descriptor to correctly calculate the linear address.
In protected mode, the segment selector might indicate a segment descriptor from either the global descriptor table or a local descriptor table. Both cases are considered in this function.
This function is the initial implementation for subsequent functions that will obtain the aforementioned attributes of the segment descriptor.
/**
- get_desc() - Obtain address of segment descriptor
- @seg: Segment selector
Maybe that should be
@sel
if it is a sel-ector. :)
It makes sense. I will rename it.
And using "sel" makes more sense then when you look at:
desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
for example:
- @desc: Pointer to the selected segment descriptor
- Given a segment selector, obtain a memory pointer to the segment
s/memory //
Will update it.
- descriptor. Both global and local descriptor tables are supported.
- desc will contain the address of the descriptor.
- Return: 0 if success, -EINVAL if failure
Why isn't this function returning the pointer or NULL on error? Maybe the later patches have an answer and I'll discover it if I continue reviewing :)
After revisiting the code, I don't see why the function cannot return NULL.
- */
+static int get_desc(unsigned short seg, struct desc_struct **desc) +{
- struct desc_ptr gdt_desc = {0, 0};
- unsigned long desc_base;
- if (!desc)
return -EINVAL;
- desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
That looks useless as you're doing it below again.
Yes, it is useless. Please see my comment below.
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
- if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
seg >>= 3;
mutex_lock(&current->active_mm->context.lock);
if (unlikely(!current->active_mm->context.ldt ||
Is that really a fast path to complicate the if-test with an unlikely()? If not, you don't really need it.
I will remove it.
seg >= current->active_mm->context.ldt->size)) {
ldt->size is the size of the descriptor table but you've shifted seg by 3. That selector index is shifted by 3 (to the left) to form an offset into the descriptor table because the entries there are 8 bytes.
I double-checked the ldt code and it seems to me that size refers to the number of entries in the table; it is always multiplied by LDT_ENTRY_SIZE [1], [2]. Am I missing something?
So I *think* you wanna use the "useless" desc_base above... :)
*desc = NULL;
mutex_unlock(&current->active_mm->context.lock);
return -EINVAL;
}
*desc = &current->active_mm->context.ldt->entries[seg];
... and seg here as it is an index into the table.
mutex_unlock(&current->active_mm->context.lock);
return 0;
- }
+#endif
- native_store_gdt(&gdt_desc);
- /*
* Bits [15:3] of the segment selector contain the index. Such
* index needs to be multiplied by 8.
... because <insert reason I typed in above>.
I will elaborate on the reason for this.
Thanks and BR, Ricardo
[1]. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch... [2]. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch...
On Wed, Apr 26, 2017 at 02:51:56PM -0700, Ricardo Neri wrote:
seg >= current->active_mm->context.ldt->size)) {
ldt->size is the size of the descriptor table but you've shifted seg by 3. That selector index is shifted by 3 (to the left) to form an offset into the descriptor table because the entries there are 8 bytes.
I double-checked the ldt code and it seems to me that size refers to the number of entries in the table; it is always multiplied by LDT_ENTRY_SIZE [1], [2]. Am I missing something?
No, you're not. I fell into that wrongly named struct member trap.
So ldt_struct.size should actually be called ldt_struct.n_entries or similar. Because what's in there now is not a "size".
And then code like
new_ldt->size * LDT_ENTRY_SIZE
would make much more sense if written like this:
new_ldt->n_entries * LDT_ENTRY_SIZE
Would you fix that in a prepatch pls?
Thanks.
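The rename clarifies the two distinct quantities involved: a count of descriptors versus a size in bytes. A user-space sketch of the bounds check (the struct and names are illustrative stand-ins for the kernel's ldt_struct):

```c
#include <assert.h>
#include <stddef.h>

#define LDT_ENTRY_SIZE 8	/* each descriptor is 8 bytes */

/* Illustrative stand-in for the kernel's ldt_struct. */
struct ldt_table {
	unsigned int n_entries;	/* number of descriptors, not bytes */
};

/* Byte size of the table: count times per-descriptor size. */
static size_t ldt_bytes(const struct ldt_table *ldt)
{
	return (size_t)ldt->n_entries * LDT_ENTRY_SIZE;
}

/* A selector index is valid iff it addresses an existing entry. */
static int ldt_index_ok(const struct ldt_table *ldt, unsigned int idx)
{
	return idx < ldt->n_entries;
}
```

This is why comparing the already-shifted selector index against the entry count, as the patch does, is correct.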
On Thu, 2017-05-04 at 13:02 +0200, Borislav Petkov wrote:
On Wed, Apr 26, 2017 at 02:51:56PM -0700, Ricardo Neri wrote:
seg >= current->active_mm->context.ldt->size)) {
ldt->size is the size of the descriptor table but you've shifted seg by 3. That selector index is shifted by 3 (to the left) to form an offset into the descriptor table because the entries there are 8 bytes.
I double-checked the ldt code and it seems to me that size refers to the number of entries in the table; it is always multiplied by LDT_ENTRY_SIZE [1], [2]. Am I missing something?
No, you're not. I fell into that wrongly named struct member trap.
So ldt_struct.size should actually be called ldt_struct.n_entries or similar. Because what's in there now is not a "size".
And then code like
new_ldt->size * LDT_ENTRY_SIZE
would make much more sense if written like this:
new_ldt->n_entries * LDT_ENTRY_SIZE
Would you fix that in a prepatch pls?
Sure I can. Would this trigger a v8 of my series? I was hoping v7 series could be merged and then start doing incremental work on top of it. Does it make sense?
Thanks and BR, Ricardo
On Thu, May 11, 2017 at 07:13:57PM -0700, Ricardo Neri wrote:
Sure I can. Would this trigger a v8 of my series? I was hoping v7 series could be merged and then start doing incremental work on top of it. Does it make sense?
I guess that's tip guys' call.
With segmentation, the base address of the segment descriptor is needed to compute a linear address. The segment descriptor used in the address computation depends on either any segment override prefixes in the in the instruction or the default segment determined by the registers involved in the address computation. Thus, both the instruction as well as the register (specified as the offset from the base of pt_regs) are given as inputs, along with a boolean variable to select between override and default.
The segment selector is determined by get_seg_selector with the inputs described above. Once the selector is known the base address is determined. In protected mode, the selector is used to obtain the segment descriptor and then its base address. If in 64-bit user mode, the segment base address is zero except when FS or GS are used. In virtual-8086 mode, the base address is computed as the value of the segment selector shifted 4 positions to the left.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  2 ++
 arch/x86/lib/insn-eval.c         | 66 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 754211b..b201742 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff, bool use_default_seg);
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8608adf..383ca83 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -355,6 +355,72 @@ static int get_desc(unsigned short seg, struct desc_struct **desc)
 }
 /**
+ * insn_get_seg_base() - Obtain base address contained in descriptor
+ * @regs: Set of registers containing the segment selector
+ * @insn: Instruction structure with selector override prefixes
+ * @regoff: Operand offset, in pt_regs, of which the selector is needed
+ * @use_default_seg: Use the default segment instead of prefix overrides
+ *
+ * Obtain the base address of the segment descriptor as indicated by either
+ * any segment override prefixes contained in insn or the default segment
+ * applicable to the register indicated by regoff. regoff is specified as the
+ * offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, base address of the segment. It may be zero in
+ * certain cases for 64-bit builds and/or 64-bit applications. In virtual-8086
+ * mode, the segment selector shifed 4 positions to the right. -1L in case of
+ * error.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff, bool use_default_seg)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	enum segment seg_type;
+	int ret;
+
+	seg_type = resolve_seg_selector(insn, regoff, use_default_seg);
+
+	seg = get_segment_selector(regs, seg_type);
+	if (seg < 0)
+		return -1L;
+
+	if (v8086_mode(regs))
+		/*
+		 * Base is simply the segment selector shifted 4
+		 * positions to the right.
+		 */
+		return (unsigned long)(seg << 4);
+
+#ifdef CONFIG_X86_64
+	if (user_64bit_mode(regs)) {
+		/*
+		 * Only FS or GS will have a base address, the rest of
+		 * the segments' bases are forced to 0.
+		 */
+		unsigned long base;
+
+		if (seg_type == SEG_FS)
+			rdmsrl(MSR_FS_BASE, base);
+		else if (seg_type == SEG_GS)
+			/*
+			 * swapgs was called at the kernel entry point. Thus,
+			 * MSR_KERNEL_GS_BASE will have the user-space GS base.
+			 */
+			rdmsrl(MSR_KERNEL_GS_BASE, base);
+		else
+			base = 0;
+		return base;
+	}
+#endif
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return -1L;
+
+	return get_desc_base(desc);
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn: Instruction structure containing the ModRM byte
  * @regs: Set of registers indicated by the ModRM byte
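The three cases the patch handles — virtual-8086 mode, 64-bit user mode, and protected mode — can be modeled as a pure function. In the kernel the FS/GS bases come from MSR_FS_BASE / MSR_KERNEL_GS_BASE; in this sketch they are plain parameters so the selection logic can be exercised in user space (all names illustrative):

```c
#include <assert.h>
#include <stdint.h>

enum seg { SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS };

/*
 * Sketch of the base-address selection in insn_get_seg_base():
 *  - virtual-8086: base = selector << 4
 *  - 64-bit user mode: FS/GS keep their (MSR-provided) bases,
 *    everything else is forced to 0
 *  - protected mode: base comes from the segment descriptor
 */
static uint64_t seg_base(int v8086, int long_mode, enum seg s,
			 uint16_t sel, uint64_t fs_base, uint64_t gs_base,
			 uint64_t desc_base)
{
	if (v8086)
		return (uint64_t)sel << 4;	/* selector * 16 */
	if (long_mode) {
		if (s == SEG_FS)
			return fs_base;
		if (s == SEG_GS)
			return gs_base;
		return 0;			/* CS/SS/DS/ES forced to 0 */
	}
	return desc_base;			/* from the descriptor */
}
```

This also makes Boris's earlier point concrete: in long mode a CS/SS/DS/ES selector contributes nothing to the linear address.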
On Tue, Mar 07, 2017 at 04:32:41PM -0800, Ricardo Neri wrote:
With segmentation, the base address of the segment descriptor is needed to compute a linear address. The segment descriptor used in the address computation depends on either any segment override prefixes in the in the
s/in the //
instruction or the default segment determined by the registers involved in the address computation. Thus, both the instruction as well as the register (specified as the offset from the base of pt_regs) are given as inputs, along with a boolean variable to select between override and default.
The segment selector is determined by get_seg_selector with the inputs
Please end function names with parentheses: get_seg_selector().
described above. Once the selector is known the base address is
known, ...
determined. In protected mode, the selector is used to obtain the segment descriptor and then its base address. If in 64-bit user mode, the segment base address is zero except when FS or GS are used. In virtual-8086 mode, the base address is computed as the value of the segment selector shifted 4 positions to the left.
Good.
/**
- insn_get_seg_base() - Obtain base address contained in descriptor
- @regs: Set of registers containing the segment selector
- @insn: Instruction structure with selector override prefixes
- @regoff: Operand offset, in pt_regs, of which the selector is needed
- @use_default_seg: Use the default segment instead of prefix overrides
I'm wondering whether you really need that bool or you can deduce this from pt_regs... I guess I'll see...
- Obtain the base address of the segment descriptor as indicated by either
- any segment override prefixes contained in insn or the default segment
- applicable to the register indicated by regoff. regoff is specified as the
- offset in bytes from the base of pt_regs.
- Return: In protected mode, base address of the segment. It may be zero in
- certain cases for 64-bit builds and/or 64-bit applications. In virtual-8086
- mode, the segment selector shifed 4 positions to the right. -1L in case of
s/shifed/shifted/
- error.
- */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
int regoff, bool use_default_seg)
+{
- struct desc_struct *desc;
- unsigned short seg;
- enum segment seg_type;
- int ret;
- seg_type = resolve_seg_selector(insn, regoff, use_default_seg);
<--- error handling.
And that's not really a "seg_type" but simply the "sel"-ector. And that "enum segment" is not really a segment but a segment override prefixes enum. Can we please get the nomenclature right first?
- seg = get_segment_selector(regs, seg_type);
s/seg/sel/
- if (seg < 0)
return -1L;
- if (v8086_mode(regs))
/*
* Base is simply the segment selector shifted 4
* positions to the right.
*/
return (unsigned long)(seg << 4);
+#ifdef CONFIG_X86_64
- if (user_64bit_mode(regs)) {
if (IS_ENABLED(CONFIG_X86_64) && user_64bit_mode(regs)) {
/*
* Only FS or GS will have a base address, the rest of
* the segments' bases are forced to 0.
*/
unsigned long base;
if (seg_type == SEG_FS)
rdmsrl(MSR_FS_BASE, base);
else if (seg_type == SEG_GS)
/*
* swapgs was called at the kernel entry point. Thus,
* MSR_KERNEL_GS_BASE will have the user-space GS base.
*/
rdmsrl(MSR_KERNEL_GS_BASE, base);
else
base = 0;
return base;
- }
+#endif
- ret = get_desc(seg, &desc);
- if (ret)
return -1L;
- return get_desc_base(desc);
+}
+/**
- insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
- @insn: Instruction structure containing the ModRM byte
- @regs: Set of registers indicated by the ModRM byte
-- 2.9.3
On Thu, 2017-04-20 at 10:25 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:41PM -0800, Ricardo Neri wrote:
With segmentation, the base address of the segment descriptor is needed to compute a linear address. The segment descriptor used in the address computation depends on either any segment override prefixes in the in the
s/in the //
I will fix this typo.
instruction or the default segment determined by the registers involved in the address computation. Thus, both the instruction as well as the register (specified as the offset from the base of pt_regs) are given as inputs, along with a boolean variable to select between override and default.
The segment selector is determined by get_seg_selector with the inputs
Please end function names with parentheses: get_seg_selector().
I will use parentheses.
described above. Once the selector is known the base address is
known, ...
Will fix.
determined. In protected mode, the selector is used to obtain the segment descriptor and then its base address. If in 64-bit user mode, the segment base address is zero except when FS or GS are used. In virtual-8086 mode, the base address is computed as the value of the segment selector shifted 4 positions to the left.
Good.
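The per-mode base-address rules described in the commit message can be sketched in plain user-space C. This is an illustrative model only, not the kernel code: the mode enum and the fs_base/gs_base/desc_base parameters stand in for v8086_mode(), the MSR_FS_BASE/MSR_KERNEL_GS_BASE reads and get_desc_base() respectively.

```c
#include <stdint.h>

/* Simplified stand-in for the three addressing modes discussed above. */
enum cpu_mode { MODE_V8086, MODE_64BIT, MODE_PROTECTED };

static unsigned long seg_base(enum cpu_mode mode, uint16_t sel,
			      int is_fs, int is_gs,
			      unsigned long fs_base, unsigned long gs_base,
			      unsigned long desc_base)
{
	if (mode == MODE_V8086)
		/* Real/virtual-8086 addressing: base = selector * 16. */
		return (unsigned long)sel << 4;

	if (mode == MODE_64BIT) {
		/* Only FS and GS contribute a non-zero base in 64-bit mode. */
		if (is_fs)
			return fs_base;
		if (is_gs)
			return gs_base;
		return 0;
	}

	/* Protected mode: the base comes from the segment descriptor. */
	return desc_base;
}
```

The kernel function additionally handles selector-lookup errors (returning -1L); that error path is omitted here for brevity.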
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
 arch/x86/include/asm/insn-eval.h |  2 ++
 arch/x86/lib/insn-eval.c         | 66 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 754211b..b201742 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff, bool use_default_seg);

 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8608adf..383ca83 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -355,6 +355,72 @@ static int get_desc(unsigned short seg, struct desc_struct **desc)
 }
/**
- insn_get_seg_base() - Obtain base address contained in descriptor
- @regs: Set of registers containing the segment selector
- @insn: Instruction structure with selector override prefixes
- @regoff: Operand offset, in pt_regs, of which the selector is needed
- @use_default_seg: Use the default segment instead of prefix overrides
I'm wondering whether you really need that bool or you can deduce this from pt_regs... I guess I'll see...
- Obtain the base address of the segment descriptor as indicated by either
- any segment override prefixes contained in insn or the default segment
- applicable to the register indicated by regoff. regoff is specified as the
- offset in bytes from the base of pt_regs.
- Return: In protected mode, base address of the segment. It may be zero in
- certain cases for 64-bit builds and/or 64-bit applications. In virtual-8086
- mode, the segment selector shifed 4 positions to the right. -1L in case of
s/shifed/shifted/
I will correct the typo.
- error.
- */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
int regoff, bool use_default_seg)
+{
- struct desc_struct *desc;
- unsigned short seg;
- enum segment seg_type;
- int ret;
- seg_type = resolve_seg_selector(insn, regoff, use_default_seg);
<--- error handling.
I will add it.
And that's not really a "seg_type" but simply the "sel"-ector.
I will update the variable names to reflect the fact that they are segment selectors.
And that "enum segment" is not really a segment but a segment override prefixes enum. Can we please get the nomenclature right first?
I need a human-readable way of identifying what segment selector (in pt_regs, vm86regs or directly reading the segment registers) to use. Since there is a segment override prefix for all of them, I thought I could use them. Perhaps I can rename enum segment to enum segment_selector and comment that the values in the enum are those of the override prefixes. Would that be reasonable?
- seg = get_segment_selector(regs, seg_type);
s/seg/sel/
Will change.
- if (seg < 0)
return -1L;
- if (v8086_mode(regs))
/*
* Base is simply the segment selector shifted 4
* positions to the left.
*/
return (unsigned long)(seg << 4);
+#ifdef CONFIG_X86_64
- if (user_64bit_mode(regs)) {
if (IS_ENABLED(CONFIG_X86_64) && user_64bit_mode(regs)) {
I will change it.
Thanks and BR, Ricardo
On Wed, Apr 26, 2017 at 03:37:44PM -0700, Ricardo Neri wrote:
I need a human-readable way of identifying what segment selector (in pt_regs, vm86regs or directly reading the segment registers) to use. Since there is a segment override prefix for all of them, I thought I could use them.
Yes, you should...
Perhaps I can rename enum segment to enum segment_selector and comment that the values in the enum are those of the override prefixes. Would that be reasonable?
... but you should call them what they are: "enum seg_override_pfxs" or "enum seg_ovr_pfx" or...
Or somesuch. I suck at naming stuff.
On Fri, 2017-05-05 at 19:19 +0200, Borislav Petkov wrote:
On Wed, Apr 26, 2017 at 03:37:44PM -0700, Ricardo Neri wrote:
I need a human-readable way of identifying what segment selector (in pt_regs, vm86regs or directly reading the segment registers) to use. Since there is a segment override prefix for all of them, I thought I could use them.
Yes, you should...
Perhaps I can rename enum segment to enum segment_selector and comment that the values in the enum are those of the override prefixes. Would that be reasonable?
... but you should call them what they are: "enum seg_override_pfxs" or "enum seg_ovr_pfx" or...
Or somesuch. I suck at naming stuff.
In my v7, I simply named my enumeration enum segment_register, which is what they are. Some of its entries happen to have the values of the segment override prefixes, but it also has special entries such as SEG_REG_INVAL for errors and SEG_REG_IGNORE for long mode [1].
Thanks and BR, Ricardo
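A minimal sketch of the naming scheme settled on here: segment-register enum members that reuse the x86 segment-override prefix bytes as their values, plus special members for errors and for long mode. The identifiers below are illustrative approximations of the v7 names, not the exact kernel definitions.

```c
/*
 * Segment registers identified by their override-prefix byte values,
 * in the spirit of v7's enum segment_register (names are assumed).
 */
enum seg_reg {
	SEG_REG_INVAL  = -1,	/* error */
	SEG_REG_IGNORE = 0,	/* long mode: segment base mostly ignored */
	SEG_REG_ES = 0x26,	/* ES override prefix */
	SEG_REG_CS = 0x2e,	/* CS override prefix */
	SEG_REG_SS = 0x36,	/* SS override prefix */
	SEG_REG_DS = 0x3e,	/* DS override prefix */
	SEG_REG_FS = 0x64,	/* FS override prefix */
	SEG_REG_GS = 0x65,	/* GS override prefix */
};

/* Map a raw override-prefix byte to a segment register. */
static enum seg_reg seg_from_prefix(unsigned char pfx)
{
	switch (pfx) {
	case 0x26: return SEG_REG_ES;
	case 0x2e: return SEG_REG_CS;
	case 0x36: return SEG_REG_SS;
	case 0x3e: return SEG_REG_DS;
	case 0x64: return SEG_REG_FS;
	case 0x65: return SEG_REG_GS;
	default:   return SEG_REG_INVAL;
	}
}
```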
On Thu, 2017-04-20 at 10:25 +0200, Borislav Petkov wrote:
- insn_get_seg_base() - Obtain base address contained in
descriptor
- @regs: Set of registers containing the segment selector
- @insn: Instruction structure with selector override prefixes
- @regoff: Operand offset, in pt_regs, of which the selector is
needed
- @use_default_seg: Use the default segment instead of prefix
overrides
I'm wondering whether you really need that bool or you can deduce this from pt_regs... I guess I'll see...
Probably insn_get_seg_base() itself can verify if there are segment override prefixes in the struct insn. If yes, use them except for specific cases such as CS.
On an unrelated note, I still have the problem of using DS vs ES for string instructions. Perhaps instead of a use_default_seg flag, a string_instruction flag that indicates how to determine the default segment.
Thanks and BR, Ricardo
On Wed, Apr 26, 2017 at 03:52:41PM -0700, Ricardo Neri wrote:
Probably insn_get_seg_base() itself can verify if there are segment override prefixes in the struct insn. If yes, use them except for specific cases such as CS.
... and depending on whether in long mode or not.
On an unrelated note, I still have the problem of using DS vs ES for string instructions. Perhaps instead of a use_default_seg flag, a string_instruction flag that indicates how to determine the default segment.
... or you can look at the insn opcode directly. AFAICT, you need to check whether the opcode is 0xa4 or 0xa5 and that the insn is a single-byte opcode, i.e., not from the secondary map escaped with 0xf or some of the other multi-byte opcode maps.
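A sketch of that opcode check in user-space C. It assumes the opcode byte and the opcode length have already been decoded; the opcode list is the SDM's one-byte string instructions (INS, OUTS, MOVS, CMPS, STOS, LODS, SCAS), and a multi-byte opcode (escaped with 0x0f etc.) is never a string instruction.

```c
/* Return 1 if (opcode, opcode_len) denotes a string instruction. */
static int is_string_insn(unsigned char opcode, int opcode_len)
{
	/* Opcodes from escaped maps are not string instructions. */
	if (opcode_len != 1)
		return 0;

	switch (opcode) {
	case 0x6c: case 0x6d:	/* INS */
	case 0x6e: case 0x6f:	/* OUTS */
	case 0xa4: case 0xa5:	/* MOVS */
	case 0xa6: case 0xa7:	/* CMPS */
	case 0xaa: case 0xab:	/* STOS */
	case 0xac: case 0xad:	/* LODS */
	case 0xae: case 0xaf:	/* SCAS */
		return 1;
	default:
		return 0;
	}
}
```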
On Fri, 2017-05-05 at 19:28 +0200, Borislav Petkov wrote:
On Wed, Apr 26, 2017 at 03:52:41PM -0700, Ricardo Neri wrote:
Probably insn_get_seg_base() itself can verify if there are segment override prefixes in the struct insn. If yes, use them except for specific cases such as CS.
... and depending on whether in long mode or not.
Yes, in my v7 I ignore the segment register if we are in long mode [1].
On an unrelated note, I still have the problem of using DS vs ES for string instructions. Perhaps instead of a use_default_seg flag, a string_instruction flag that indicates how to determine the default segment.
... or you can look at the insn opcode directly. AFAICT, you need to check whether the opcode is 0xa4 or 0xa5 and that the insn is a single-byte opcode, i.e., not from the secondary map escaped with 0xf or some of the other multi-byte opcode maps.
In my v7, I have added a section in my function resolve_seg_register() that ignores segment overrides when it sees a string instruction and the register EDI, defaulting to ES. If the register is EIP, it defaults to CS. To determine whether an instruction is a string instruction, I check the size of the opcode and the opcodes that you mention, plus others, based on the Intel Software Development Manual [2].
[1]. https://lkml.org/lkml/2017/5/5/405 [2]. https://lkml.org/lkml/2017/5/5/410
Thanks and BR, Ricardo
-- Regards/Gruss, Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
These functions read the default values of the address and operand sizes as specified in the segment descriptor. This information is determined from the D and L bits. Hence, it can be used for both IA-32e 64-bit and 32-bit legacy modes. For virtual-8086 mode, the default address and operand sizes are always 2 bytes.
The D bit is only meaningful for code segments. Thus, these functions always use the code segment selector contained in regs.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
---
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/lib/insn-eval.c         | 80 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index b201742..a0d81fc 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 				int regoff, bool use_default_seg);

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 383ca83..cda6c71 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -421,6 +421,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 }

 /**
+ * insn_get_seg_default_address_bytes - Obtain default address size of segment
+ * @regs: Set of registers containing the segment selector
+ *
+ * Obtain the default address size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * address is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default address size of segment
+ */
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	if (v8086_mode(regs))
+		return 2;
+
+	seg = (unsigned short)regs->cs;
+
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return 0;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
+		return 2;
+	case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
+		return 4;
+	case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
+		return 8;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return 0;
+	}
+}
+
+/**
+ * insn_get_seg_default_operand_bytes - Obtain default operand size of segment
+ * @regs: Set of registers containing the segment selector
+ *
+ * Obtain the default operand size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * operand size is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default operand size of segment
+ */
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	if (v8086_mode(regs))
+		return 2;
+
+	seg = (unsigned short)regs->cs;
+
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return 0;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. 16-bit or 8-bit operands. CS.L=0, CS.D=0 */
+		return 2;
+	case 1: /* Legacy mode. 32- or 8-bit operands. CS.L=0, CS.D=1 */
+		/* fall through */
+	case 2: /* IA-32e 64-bit mode. 32- or 8-bit opnds. CS.L=1, CS.D=0 */
+		return 4;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return 0;
+	}
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn: Instruction structure containing the ModRM byte
  * @regs: Set of registers indicated by the ModRM byte
On Tue, Mar 07, 2017 at 04:32:42PM -0800, Ricardo Neri wrote:
These functions read the default values of the address and operand sizes as specified in the segment descriptor. This information is determined from the D and L bits. Hence, it can be used for both IA-32e 64-bit and 32-bit legacy modes. For virtual-8086 mode, the default address and operand sizes are always 2 bytes.
Yeah, we tend to call that customarily 16-bit :)
The D bit is only meaningful for code segments. Thus, these functions always use the code segment selector contained in regs.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/lib/insn-eval.c         | 80 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index b201742..a0d81fc 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 				int regoff, bool use_default_seg);

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 383ca83..cda6c71 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -421,6 +421,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 }
/**
- insn_get_seg_default_address_bytes - Obtain default address size of segment
- @regs: Set of registers containing the segment selector
- Obtain the default address size as indicated in the segment descriptor
- selected in regs' code segment selector. In protected mode, the default
- address is determined by inspecting the L and D bits of the segment
- descriptor. In virtual-8086 mode, the default is always two bytes.
- Return: Default address size of segment
0 on error.
- */
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs) +{
- struct desc_struct *desc;
- unsigned short seg;
- int ret;
- if (v8086_mode(regs))
return 2;
- seg = (unsigned short)regs->cs;
- ret = get_desc(seg, &desc);
- if (ret)
return 0;
- switch ((desc->l << 1) | desc->d) {
- case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
return 2;
- case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
return 4;
- case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
return 8;
- case 3: /* Invalid setting. CS.L=1, CS.D=1 */
/* fall through */
- default:
return 0;
- }
+}
+/**
- insn_get_seg_default_operand_bytes - Obtain default operand size of segment
- @regs: Set of registers containing the segment selector
- Obtain the default operand size as indicated in the segment descriptor
- selected in regs' code segment selector. In protected mode, the default
- operand size is determined by inspecting the L and D bits of the segment
- descriptor. In virtual-8086 mode, the default is always two bytes.
- Return: Default operand size of segment
- */
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
Right, so default address and operand size always go together so I don't think you need two separate functions.
So what I'd suggest - provided this pans out (I still haven't reviewed the whole thing) - is to determine the operating mode of the segment: long, legacy, etc and then return both address and operand sizes. Patch 17/21 needs them both at the same time AFAICT.
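A sketch of the combined helper suggested here, returning both sizes from one decode of CS.L and CS.D. The struct and function names are made up for illustration; the L/D decoding follows the two functions in the patch, with 0 signaling the invalid L=1/D=1 combination.

```c
/* Default sizes, in bytes, implied by a code segment's L and D bits. */
struct seg_defaults {
	unsigned char addr_bytes;
	unsigned char opnd_bytes;
};

static struct seg_defaults seg_default_sizes(int l, int d)
{
	struct seg_defaults ret = { 0, 0 };	/* invalid by default */

	switch ((l << 1) | d) {
	case 0:	/* Legacy 16-bit: CS.L=0, CS.D=0 */
		ret.addr_bytes = 2;
		ret.opnd_bytes = 2;
		break;
	case 1:	/* Legacy 32-bit: CS.L=0, CS.D=1 */
		ret.addr_bytes = 4;
		ret.opnd_bytes = 4;
		break;
	case 2:	/* IA-32e 64-bit: CS.L=1, CS.D=0 */
		ret.addr_bytes = 8;
		ret.opnd_bytes = 4;
		break;
	case 3:	/* Invalid: CS.L=1, CS.D=1 */
	default:
		break;
	}
	return ret;
}
```

A caller needing both sizes, as in patch 17/21, then reads the descriptor once instead of twice.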
On Thu, 2017-04-20 at 15:06 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:42PM -0800, Ricardo Neri wrote:
These functions read the default values of the address and operand sizes as specified in the segment descriptor. This information is determined from the D and L bits. Hence, it can be used for both IA-32e 64-bit and 32-bit legacy modes. For virtual-8086 mode, the default address and operand sizes are always 2 bytes.
Yeah, we tend to call that customarily 16-bit :)
I will call it like this.
The D bit is only meaningful for code segments. Thus, these functions always use the code segment selector contained in regs.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/lib/insn-eval.c         | 80 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index b201742..a0d81fc 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 				int regoff, bool use_default_seg);

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 383ca83..cda6c71 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -421,6 +421,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 }
/**
- insn_get_seg_default_address_bytes - Obtain default address size of segment
- @regs: Set of registers containing the segment selector
- Obtain the default address size as indicated in the segment descriptor
- selected in regs' code segment selector. In protected mode, the default
- address is determined by inspecting the L and D bits of the segment
- descriptor. In virtual-8086 mode, the default is always two bytes.
- Return: Default address size of segment
0 on error.
- */
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs) +{
- struct desc_struct *desc;
- unsigned short seg;
- int ret;
- if (v8086_mode(regs))
return 2;
- seg = (unsigned short)regs->cs;
- ret = get_desc(seg, &desc);
- if (ret)
return 0;
- switch ((desc->l << 1) | desc->d) {
- case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
return 2;
- case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
return 4;
- case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
return 8;
- case 3: /* Invalid setting. CS.L=1, CS.D=1 */
/* fall through */
- default:
return 0;
- }
+}
+/**
- insn_get_seg_default_operand_bytes - Obtain default operand size of segment
- @regs: Set of registers containing the segment selector
- Obtain the default operand size as indicated in the segment descriptor
- selected in regs' code segment selector. In protected mode, the default
- operand size is determined by inspecting the L and D bits of the segment
- descriptor. In virtual-8086 mode, the default is always two bytes.
- Return: Default operand size of segment
- */
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
Right, so default address and operand size always go together so I don't think you need two separate functions.
So what I'd suggest - provided this pans out (I still haven't reviewed the whole thing) - is to determine the operating mode of the segment: long, legacy, etc and then return both address and operand sizes. Patch 17/21 needs them both at the same time AFAICT.
It makes sense to me. So far these two functions are used in the same place.
Thanks and BR, Ricardo
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when the mod part of the ModRM byte is zero and R/EBP is specified in the R/M part of such byte, the value of the aforementioned register should not be used in the address computation. Instead, a 32-bit displacement is expected. The instruction decoder takes care of setting the displacement to the expected value. Returning -EDOM signals callers that they should ignore the value of such register when computing the address encoded in the instruction operands.
Also, callers should exercise care to correctly interpret this particular case. In IA-32e 64-bit mode, the address is given by the displacement plus the value of the RIP. In IA-32e compatibility mode, the value of EIP is ignored. This correction is done in our insn_get_addr_ref().
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
---
 arch/x86/lib/insn-eval.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index cda6c71..ea10b03 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -250,6 +250,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	switch (type) {
 	case REG_TYPE_RM:
 		regno = X86_MODRM_RM(insn->modrm.value);
+		/* if mod=0, register R/EBP is not used in the address
+		 * computation. Instead, a 32-bit displacement is expected;
+		 * the instruction decoder takes care of reading such
+		 * displacement. This is true for both R/EBP and R13, as the
+		 * REX.B bit is not decoded.
+		 */
+		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+			return -EDOM;
 		if (X86_REX_B(insn->rex_prefix.value))
 			regno += 8;
 		break;
@@ -599,9 +607,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-			if (addr_offset < 0)
+			/* -EDOM means that we must ignore the address_offset.
+			 * The only case in which we see this value is when
+			 * R/M points to R/EBP. In such a case, in 64-bit mode
+			 * the effective address is relative to tho RIP.
+			 */
+			if (addr_offset == -EDOM) {
+				eff_addr = 0;
+#ifdef CONFIG_X86_64
+				if (user_64bit_mode(regs))
+					eff_addr = (long)regs->ip;
+#endif
+			} else if (addr_offset < 0) {
 				goto out_err;
-			eff_addr = regs_get_register(regs, addr_offset);
+			} else {
+				eff_addr = regs_get_register(regs, addr_offset);
+			}
 		}
 		eff_addr += insn->displacement.value;
 	}
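The ModRM.mod == 0 / ModRM.rm == 5 special case handled by this patch can be modeled in user-space C as below. This is illustrative only: register and RIP values are passed in directly instead of being read from pt_regs, and REX.B handling is omitted.

```c
#include <stdint.h>

/*
 * Effective-address sketch for a ModRM-addressed operand without SIB.
 * When mod == 0 and rm == 5, the register does not contribute: the
 * address is disp32 alone (32-bit mode) or RIP + disp32 (64-bit mode).
 */
static uint64_t eff_addr_modrm(unsigned char modrm, uint64_t reg_val,
			       int32_t disp32, uint64_t rip, int long_mode)
{
	unsigned char mod = (modrm >> 6) & 0x3;
	unsigned char rm  = modrm & 0x7;

	if (mod == 0 && rm == 5)
		/* RIP-relative in 64-bit mode; bare disp32 otherwise. */
		return (long_mode ? rip : 0) + (uint64_t)(int64_t)disp32;

	/* Normal case: base register plus displacement. */
	return reg_val + (uint64_t)(int64_t)disp32;
}
```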
On Tue, Mar 07, 2017 at 04:32:43PM -0800, Ricardo Neri wrote:
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when the mod part of the ModRM byte is zero and R/EBP is specified in the R/M part of such bit, the value of the aforementioned register should not be used in the address computation. Instead, a 32-bit displacement is expected. The instruction decoder takes care of setting the displacement to the expected value. Returning -EDOM signals callers that they should ignore the value of such register when computing the address encoded in the instruction operands.
Also, callers should exercise care to correctly interpret this particular case. In IA-32e 64-bit mode, the address is given by the displacement plus the value of the RIP. In IA-32e compatibility mode, the value of EIP is ignored. This correction is done for our insn_get_addr_ref.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
 arch/x86/lib/insn-eval.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index cda6c71..ea10b03 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -250,6 +250,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	switch (type) {
 	case REG_TYPE_RM:
 		regno = X86_MODRM_RM(insn->modrm.value);
/* if mod=0, register R/EBP is not used in the address
* computation. Instead, a 32-bit displacement is expected;
* the instruction decoder takes care of reading such
* displacement. This is true for both R/EBP and R13, as the
* REX.B bit is not decoded.
*/
I'd simply write here: "ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement is following."
In addition, kernel comments style is:
/*
 * A sentence ending with a full-stop.
 * Another sentence. ...
 * More sentences. ...
 */
if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
return -EDOM;
if (X86_MODRM_MOD(insn->modrm.value) == 0 && X86_MODRM_RM(insn->modrm.value) == 5)
looks more understandable to me.
if (X86_REX_B(insn->rex_prefix.value))
	regno += 8;
break;
@@ -599,9 +607,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
/* -EDOM means that we must ignore the address_offset.
* The only case in which we see this value is when
* R/M points to R/EBP. In such a case, in 64-bit mode
* the effective address is relative to tho RIP.
s/tho//
*/
Kernel comments style is:
/*
 * A sentence ending with a full-stop.
 * Another sentence. ...
 * More sentences. ...
 */
if (addr_offset == -EDOM) {
eff_addr = 0;
+#ifdef CONFIG_X86_64
if (user_64bit_mode(regs))
eff_addr = (long)regs->ip;
Is regs->ip the rIP of the *following* insn?
+#endif
You can do this in a prepatch and then get rid of the ifdeffery here:
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 2b5d686ea9f3..f6239273c5f1 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -115,9 +115,9 @@ static inline int v8086_mode(struct pt_regs *regs)
 #endif
 }

-#ifdef CONFIG_X86_64
 static inline bool user_64bit_mode(struct pt_regs *regs)
 {
+#ifdef CONFIG_X86_64
 #ifndef CONFIG_PARAVIRT
 	/*
 	 * On non-paravirt systems, this is the only long mode CPL 3
@@ -128,6 +128,9 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 	/* Headers are too twisted for this to go in paravirt.h. */
 	return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
 #endif
+#else /* !CONFIG_X86_64 */
+	return false;
+#endif
 }

 #define current_user_stack_pointer() current_pt_regs()->sp
---
On Fri, 2017-04-21 at 12:52 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:43PM -0800, Ricardo Neri wrote:
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when the mod part of the ModRM byte is zero and R/EBP is specified in the R/M part of such bit, the value of the aforementioned register should not be used in the address computation. Instead, a 32-bit displacement is expected. The instruction decoder takes care of setting the displacement to the expected value. Returning -EDOM signals callers that they should ignore the value of such register when computing the address encoded in the instruction operands.
Also, callers should exercise care to correctly interpret this particular case. In IA-32e 64-bit mode, the address is given by the displacement plus the value of the RIP. In IA-32e compatibility mode, the value of EIP is ignored. This correction is done for our insn_get_addr_ref.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/lib/insn-eval.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index cda6c71..ea10b03 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -250,6 +250,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, switch (type) { case REG_TYPE_RM: regno = X86_MODRM_RM(insn->modrm.value);
/* if mod=0, register R/EBP is not used in the address
* computation. Instead, a 32-bit displacement is expected;
* the instruction decoder takes care of reading such
* displacement. This is true for both R/EBP and R13, as the
* REX.B bit is not decoded.
*/
I'd simply write here: "ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement is following."
I will shorten the comment.
In addition, kernel comments style is:
/* * A sentence ending with a full-stop. * Another sentence. ... * More sentences. ... */
... and use the correct style. I feel bad I missed this one.
if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
return -EDOM;
if (X86_MODRM_MOD(insn->modrm.value) == 0 && X86_MODRM_RM(insn->modrm.value) == 5)
looks more understandable to me.
Should I go with !(X86_MODRM_MOD(insn->modrm.value)) as you suggested in other patches?
if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break;
@@ -599,9 +607,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
/* -EDOM means that we must ignore the address_offset.
* The only case in which we see this value is when
* R/M points to R/EBP. In such a case, in 64-bit mode
* the effective address is relative to tho RIP.
s/tho//
Will correct.
*/
Kernel comments style is:
/* * A sentence ending with a full-stop. * Another sentence. ... * More sentences. ... */
Will correct.
if (addr_offset == -EDOM) {
eff_addr = 0;
+#ifdef CONFIG_X86_64
if (user_64bit_mode(regs))
eff_addr = (long)regs->ip;
Is regs->ip the rIP of the *following* insn?
No this is a bug. This should be regs->ip + insn.length.
+#endif
You can do this in a prepatch and then get rid of the ifdeffery here:
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 2b5d686ea9f3..f6239273c5f1 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -115,9 +115,9 @@ static inline int v8086_mode(struct pt_regs *regs) #endif }
-#ifdef CONFIG_X86_64 static inline bool user_64bit_mode(struct pt_regs *regs) { +#ifdef CONFIG_X86_64 #ifndef CONFIG_PARAVIRT /* * On non-paravirt systems, this is the only long mode CPL 3 @@ -128,6 +128,9 @@ static inline bool user_64bit_mode(struct pt_regs *regs) /* Headers are too twisted for this to go in paravirt.h. */ return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; #endif +#else /* !CONFIG_X86_64 */
- return false;
+#endif }
This look nice. I will add this pre-patch.
Thanks and BR, Ricardo
On Wed, Apr 26, 2017 at 06:29:59PM -0700, Ricardo Neri wrote:
if (X86_MODRM_MOD(insn->modrm.value) == 0 && X86_MODRM_RM(insn->modrm.value) == 5)
looks more understandable to me.
Should I go with !(X86_MODRM_MOD(insn->modrm.value)) as you suggested in other patches?
Ah, yes pls.
Thanks.
On Sun, 2017-05-07 at 19:20 +0200, Borislav Petkov wrote:
On Wed, Apr 26, 2017 at 06:29:59PM -0700, Ricardo Neri wrote:
if (X86_MODRM_MOD(insn->modrm.value) == 0 && X86_MODRM_RM(insn->modrm.value) == 5)
looks more understandable to me.
Should I go with !(X86_MODRM_MOD(insn->modrm.value)) as you suggested in other patches?
Ah, yes pls.
I did this in v7[1].
Thanks and BR, Ricardo
insn_get_addr_ref returns the effective address as defined by the section 3.7.5.1 Vol 1 of the Intel 64 and IA-32 Architectures Software Developer's Manual. In order to compute the linear address, we must add to the effective address the segment base address as set in the segment descriptor. Furthermore, the segment descriptor to use depends on the register that is used as the base of the effective address. The effective base address varies depending on whether the operand is a register or a memory address and on whether a SiB byte is used.
In most cases, the segment base address will be 0 if the USER_DS/USER32_DS segment is used or if segmentation is not used. However, the base address is not necessarily zero if a user program defines its own segments. This is possible by using a local descriptor table.
Since the effective address is a signed quantity, the unsigned segment base address saved in a separate variable and added to the final effective address.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index ea10b03..edb360f 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -566,7 +566,7 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) */ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) { - unsigned long linear_addr; + unsigned long linear_addr, seg_base_addr; long eff_addr, base, indx; int addr_offset, base_offset, indx_offset; insn_byte_t sib; @@ -580,6 +580,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) if (addr_offset < 0) goto out_err; eff_addr = regs_get_register(regs, addr_offset); + seg_base_addr = insn_get_seg_base(regs, insn, addr_offset, + false); } else { if (insn->sib.nbytes) { /* @@ -605,6 +607,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) indx = regs_get_register(regs, indx_offset);
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); + seg_base_addr = insn_get_seg_base(regs, insn, + base_offset, false); } else { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); /* -EDOM means that we must ignore the address_offset. @@ -623,10 +627,12 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) } else { eff_addr = regs_get_register(regs, addr_offset); } + seg_base_addr = insn_get_seg_base(regs, insn, + addr_offset, false); } eff_addr += insn->displacement.value; } - linear_addr = (unsigned long)eff_addr; + linear_addr = (unsigned long)eff_addr + seg_base_addr;
return (void __user *)linear_addr; out_err:
On Tue, Mar 07, 2017 at 04:32:44PM -0800, Ricardo Neri wrote:
insn_get_addr_ref returns the effective address as defined by the
Please end function names with parentheses.
section 3.7.5.1 Vol 1 of the Intel 64 and IA-32 Architectures Software Developer's Manual. In order to compute the linear address, we must add to the effective address the segment base address as set in the segment descriptor. Furthermore, the segment descriptor to use depends on the register that is used as the base of the effective address. The effective base address varies depending on whether the operand is a register or a memory address and on whether a SiB byte is used.
In most cases, the segment base address will be 0 if the USER_DS/USER32_DS segment is used or if segmentation is not used. However, the base address is not necessarily zero if a user program defines its own segments. This is possible by using a local descriptor table.
Since the effective address is a signed quantity, the unsigned segment base address saved in a separate variable and added to the final effective
".. is saved..."
address.
On Fri, 2017-04-21 at 16:55 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:44PM -0800, Ricardo Neri wrote:
insn_get_addr_ref returns the effective address as defined by the
Please end function names with parentheses.
Will do.
section 3.7.5.1 Vol 1 of the Intel 64 and IA-32 Architectures Software Developer's Manual. In order to compute the linear address, we must add to the effective address the segment base address as set in the segment descriptor. Furthermore, the segment descriptor to use depends on the register that is used as the base of the effective address. The effective base address varies depending on whether the operand is a register or a memory address and on whether a SiB byte is used.
In most cases, the segment base address will be 0 if the USER_DS/USER32_DS segment is used or if segmentation is not used. However, the base address is not necessarily zero if a user program defines its own segments. This is possible by using a local descriptor table.
Since the effective address is a signed quantity, the unsigned segment base address saved in a separate variable and added to the final effective
".. is saved..."
I will correct this.
Thanks and BR, Ricardo
The 32-bit and 64-bit address encodings are identical. This means that we can use the same function in both cases. In order to reuse the function for 32-bit address encodings, we must sign-extend our 32-bit signed operands to 64-bit signed variables (only for 64-bit builds). To decide on whether sign extension is needed, we rely on the address size as given by the instruction structure.
Lastly, before computing the linear address, we must truncate our signed 64-bit effective address if the address size is 32-bit.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 44 ++++++++++++++++++++++++++++++++------------ 1 file changed, 32 insertions(+), 12 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index edb360f..a9a1704 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) return get_reg_offset(insn, regs, REG_TYPE_INDEX); }
+static inline long __to_signed_long(unsigned long val, int long_bytes) +{ +#ifdef CONFIG_X86_64 + return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val; +#else + return (long)val; +#endif +} + /* * return the address being referenced be instruction * for rm=3 returning the content of the rm reg @@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) { unsigned long linear_addr, seg_base_addr; - long eff_addr, base, indx; - int addr_offset, base_offset, indx_offset; + long eff_addr, base, indx, tmp; + int addr_offset, base_offset, indx_offset, addr_bytes; insn_byte_t sib;
insn_get_modrm(insn); insn_get_sib(insn); sib = insn->sib.value; + addr_bytes = insn->addr_bytes;
if (X86_MODRM_MOD(insn->modrm.value) == 3) { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); if (addr_offset < 0) goto out_err; - eff_addr = regs_get_register(regs, addr_offset); + tmp = regs_get_register(regs, addr_offset); + eff_addr = __to_signed_long(tmp, addr_bytes); seg_base_addr = insn_get_seg_base(regs, insn, addr_offset, false); } else { @@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) * in the address computation. */ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); - if (unlikely(base_offset == -EDOM)) + if (unlikely(base_offset == -EDOM)) { base = 0; - else if (unlikely(base_offset < 0)) + } else if (unlikely(base_offset < 0)) { goto out_err; - else - base = regs_get_register(regs, base_offset); + } else { + tmp = regs_get_register(regs, base_offset); + base = __to_signed_long(tmp, addr_bytes); + }
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); - if (unlikely(indx_offset == -EDOM)) + if (unlikely(indx_offset == -EDOM)) { indx = 0; - else if (unlikely(indx_offset < 0)) + } else if (unlikely(indx_offset < 0)) { goto out_err; - else - indx = regs_get_register(regs, indx_offset); + } else { + tmp = regs_get_register(regs, indx_offset); + indx = __to_signed_long(tmp, addr_bytes); + }
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); seg_base_addr = insn_get_seg_base(regs, insn, @@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) } else if (addr_offset < 0) { goto out_err; } else { - eff_addr = regs_get_register(regs, addr_offset); + tmp = regs_get_register(regs, addr_offset); + eff_addr = __to_signed_long(tmp, addr_bytes); } seg_base_addr = insn_get_seg_base(regs, insn, addr_offset, false); } eff_addr += insn->displacement.value; } + /* truncate to 4 bytes for 32-bit effective addresses */ + if (addr_bytes == 4) + eff_addr &= 0xffffffff; + linear_addr = (unsigned long)eff_addr + seg_base_addr;
return (void __user *)linear_addr;
On Tue, Mar 07, 2017 at 04:32:45PM -0800, Ricardo Neri wrote:
The 32-bit and 64-bit address encodings are identical. This means that we can use the same function in both cases. In order to reuse the function for 32-bit address encodings, we must sign-extend our 32-bit signed operands to 64-bit signed variables (only for 64-bit builds). To decide on whether sign extension is needed, we rely on the address size as given by the instruction structure.
Lastly, before computing the linear address, we must truncate our signed 64-bit effective address if the address size is 32-bit.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/lib/insn-eval.c | 44 ++++++++++++++++++++++++++++++++------------ 1 file changed, 32 insertions(+), 12 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index edb360f..a9a1704 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) return get_reg_offset(insn, regs, REG_TYPE_INDEX); }
+static inline long __to_signed_long(unsigned long val, int long_bytes) +{ +#ifdef CONFIG_X86_64
- return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val;
I don't think this always works as expected:
--- typedef unsigned int u32; typedef unsigned long u64;
int main() { u64 v = 0x1ffffffff;
printf("v: %ld, 0x%lx, %ld\n", v, v, (long)((int)((v) & 0xffffffff)));
return 0; } -- ...
v: 8589934591, 0x1ffffffff, -1
Now, this should not happen on 32-bit because unsigned long is 32-bit there but can that happen on 64-bit?
+#else
- return (long)val;
+#endif +}
/*
- return the address being referenced be instruction
- for rm=3 returning the content of the rm reg
@@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) { unsigned long linear_addr, seg_base_addr;
- long eff_addr, base, indx;
- int addr_offset, base_offset, indx_offset;
long eff_addr, base, indx, tmp;
int addr_offset, base_offset, indx_offset, addr_bytes; insn_byte_t sib;
insn_get_modrm(insn); insn_get_sib(insn); sib = insn->sib.value;
addr_bytes = insn->addr_bytes;
if (X86_MODRM_MOD(insn->modrm.value) == 3) { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); if (addr_offset < 0) goto out_err;
eff_addr = regs_get_register(regs, addr_offset);
tmp = regs_get_register(regs, addr_offset);
eff_addr = __to_signed_long(tmp, addr_bytes);
This repeats throughout the function so it begs to be a separate:
get_mem_addr()
or so.
seg_base_addr = insn_get_seg_base(regs, insn, addr_offset, false);
} else { @@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) * in the address computation. */ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
if (unlikely(base_offset == -EDOM))
if (unlikely(base_offset == -EDOM)) { base = 0;
else if (unlikely(base_offset < 0))
} else if (unlikely(base_offset < 0)) { goto out_err;
else
base = regs_get_register(regs, base_offset);
} else {
tmp = regs_get_register(regs, base_offset);
base = __to_signed_long(tmp, addr_bytes);
} indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
if (unlikely(indx_offset == -EDOM))
if (unlikely(indx_offset == -EDOM)) { indx = 0;
else if (unlikely(indx_offset < 0))
} else if (unlikely(indx_offset < 0)) { goto out_err;
else
indx = regs_get_register(regs, indx_offset);
} else {
tmp = regs_get_register(regs, indx_offset);
indx = __to_signed_long(tmp, addr_bytes);
} eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); seg_base_addr = insn_get_seg_base(regs, insn,
@@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) } else if (addr_offset < 0) { goto out_err; } else {
eff_addr = regs_get_register(regs, addr_offset);
tmp = regs_get_register(regs, addr_offset);
eff_addr = __to_signed_long(tmp, addr_bytes); } seg_base_addr = insn_get_seg_base(regs, insn, addr_offset, false); } eff_addr += insn->displacement.value; }
- /* truncate to 4 bytes for 32-bit effective addresses */
- if (addr_bytes == 4)
eff_addr &= 0xffffffff;
Why again?
On Tue, 2017-04-25 at 15:51 +0200, Borislav Petkov wrote:
On Tue, Mar 07, 2017 at 04:32:45PM -0800, Ricardo Neri wrote:
The 32-bit and 64-bit address encodings are identical. This means that we can use the same function in both cases. In order to reuse the function for 32-bit address encodings, we must sign-extend our 32-bit signed operands to 64-bit signed variables (only for 64-bit builds). To decide on whether sign extension is needed, we rely on the address size as given by the instruction structure.
Lastly, before computing the linear address, we must truncate our signed 64-bit effective address if the address size is 32-bit.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/lib/insn-eval.c | 44 ++++++++++++++++++++++++++++++++------------ 1 file changed, 32 insertions(+), 12 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index edb360f..a9a1704 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) return get_reg_offset(insn, regs, REG_TYPE_INDEX); }
+static inline long __to_signed_long(unsigned long val, int long_bytes) +{ +#ifdef CONFIG_X86_64
- return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val;
I don't think this always works as expected:
typedef unsigned int u32; typedef unsigned long u64;
int main() { u64 v = 0x1ffffffff;
printf("v: %ld, 0x%lx, %ld\n", v, v, (long)((int)((v) & 0xffffffff))); return 0;
}
...
v: 8589934591, 0x1ffffffff, -1
Now, this should not happen on 32-bit because unsigned long is 32-bit there but can that happen on 64-bit?
This is the reason I check the value of long_bytes. If long_bytes is not 4, the only other possible value is 8 (perhaps I need to issue an error when the value is neither of these), and the cast is simply (long)val. I modified your test program with:
printf("v: %ld, 0x%lx, %ld, %ld\n", v, v, (long)((int)((v) & 0xffffffff)), (long)v);
and I get:
v: 8589934591, 0x1ffffffff, -1, 8589934591.
Am I missing something?
+#else
- return (long)val;
+#endif +}
/*
- return the address being referenced be instruction
- for rm=3 returning the content of the rm reg
@@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) { unsigned long linear_addr, seg_base_addr;
- long eff_addr, base, indx;
- int addr_offset, base_offset, indx_offset;
long eff_addr, base, indx, tmp;
int addr_offset, base_offset, indx_offset, addr_bytes; insn_byte_t sib;
insn_get_modrm(insn); insn_get_sib(insn); sib = insn->sib.value;
addr_bytes = insn->addr_bytes;
if (X86_MODRM_MOD(insn->modrm.value) == 3) { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); if (addr_offset < 0) goto out_err;
eff_addr = regs_get_register(regs, addr_offset);
tmp = regs_get_register(regs, addr_offset);
eff_addr = __to_signed_long(tmp, addr_bytes);
This repeats throughout the function so it begs to be a separate:
get_mem_addr()
or so.
Yes, the same pattern is used in all places except when using register operands (ModRM.mod == 11b). I will look into putting it in a function.
seg_base_addr = insn_get_seg_base(regs, insn, addr_offset, false);
} else { @@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) * in the address computation. */ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
if (unlikely(base_offset == -EDOM))
if (unlikely(base_offset == -EDOM)) { base = 0;
else if (unlikely(base_offset < 0))
} else if (unlikely(base_offset < 0)) { goto out_err;
else
base = regs_get_register(regs, base_offset);
} else {
tmp = regs_get_register(regs, base_offset);
base = __to_signed_long(tmp, addr_bytes);
} indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
if (unlikely(indx_offset == -EDOM))
if (unlikely(indx_offset == -EDOM)) { indx = 0;
else if (unlikely(indx_offset < 0))
} else if (unlikely(indx_offset < 0)) { goto out_err;
else
indx = regs_get_register(regs, indx_offset);
} else {
tmp = regs_get_register(regs, indx_offset);
indx = __to_signed_long(tmp, addr_bytes);
} eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); seg_base_addr = insn_get_seg_base(regs, insn,
@@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) } else if (addr_offset < 0) { goto out_err; } else {
eff_addr = regs_get_register(regs, addr_offset);
tmp = regs_get_register(regs, addr_offset);
eff_addr = __to_signed_long(tmp, addr_bytes); } seg_base_addr = insn_get_seg_base(regs, insn, addr_offset, false); } eff_addr += insn->displacement.value; }
- /* truncate to 4 bytes for 32-bit effective addresses */
- if (addr_bytes == 4)
eff_addr &= 0xffffffff;
Why again?
eff_addr is a long variable, which on x86_64 is 64 bits wide. However, in 32-bit segments the effective address is 32 bits. Thus, I discard the 32 most significant bits.
Thanks and BR, Ricardo
On Wed, Apr 26, 2017 at 08:33:46PM -0700, Ricardo Neri wrote:
This is the reason I check the value of long_bytes. If long_bytes is not 4, the only other possible value is 8 (perhaps I need to issue an error when the value is neither of these),
Well, maybe I'm a bit too paranoid. Bottom line is, we should do the address computations exactly like the hardware does them so that there are no surprises. Doing them with longs looks ok to me.
On Mon, 2017-05-08 at 13:42 +0200, Borislav Petkov wrote:
On Wed, Apr 26, 2017 at 08:33:46PM -0700, Ricardo Neri wrote:
This is the reason I check the value of long_bytes. If long_bytes is not 4, the only other possible value is 8 (perhaps I need to issue an error when the value is neither of these),
Well, maybe I'm a bit too paranoid. Bottom line is, we should do the address computations exactly like the hardware does them so that there are no surprises. Doing them with longs looks ok to me.
Using long is exactly what I intend to do. The problem that I am trying to resolve is to sign-extend signed memory offsets of 32-bit programs running on 64-bit kernels. For 64-bit programs running on 64-bit kernels I can simply use longs. I added error checking in my v7 of this series [1].
Thanks and BR, Ricardo
Tasks running in virtual-8086 mode or in protected mode with code segment descriptors that specify 16-bit default address sizes via the D bit will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A Section 2.1.5. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte points to different registers and, in some cases, addresses can be indicated by the addition of the value of two registers. Also, there is no support for SiB bytes. Thus, a separate function is needed to parse this form of addressing.
A couple of functions are introduced. get_reg_offset_16 obtains the offset from the base of pt_regs of the registers indicated by the ModRM byte of the address encoding. insn_get_addr_ref_16 computes the linear address indicated by the instructions using the value of the registers given by ModRM as well as the base address of the segment.
Lastly, the original function insn_get_addr_ref is renamed as insn_get_addr_ref_32_64. A new insn_get_addr_ref function decides what type of address decoding must be done based on the number of address bytes given by the instruction. Documentation for insn_get_addr_ref_32_64 is also improved.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 137 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index a9a1704..cb1076d 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -306,6 +306,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, }
/** + * get_reg_offset_16 - Obtain offset of register indicated by instruction + * @insn: Instruction structure containing ModRM and SiB bytes + * @regs: Set of registers referred by the instruction + * @offs1: Offset of the first operand register + * @offs2: Offset of the second operand register, if applicable. + * + * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte + * within insn. This function is to be used with 16-bit address encodings. The + * offs1 and offs2 will be written with the offset of the two registers + * indicated by the instruction. In cases where any of the registers is not + * referenced by the instruction, the value will be set to -EDOM. + * + * Return: 0 on success, -EINVAL on failure. + */ +static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs, + int *offs1, int *offs2) +{ + /* 16-bit addressing can use one or two registers */ + static const int regoff1[] = { + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bx), + }; + + static const int regoff2[] = { + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + -EDOM, + -EDOM, + -EDOM, + -EDOM, + }; + + if (!offs1 || !offs2) + return -EINVAL; + + /* operand is a register, use the generic function */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + *offs1 = insn_get_reg_offset_modrm_rm(insn, regs); + *offs2 = -EDOM; + return 0; + } + + *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)]; + *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)]; + + /* + * If no displacement is indicated in the mod part of the ModRM byte, + * (mod part is 0) and the r/m part of the same byte is 6, no register + * is used to calculate the operand address. 
An r/m part of 6 means that + * the second register offset is already invalid. + */ + if ((X86_MODRM_MOD(insn->modrm.value) == 0) && + (X86_MODRM_RM(insn->modrm.value) == 6)) + *offs1 = -EDOM; + + return 0; +} + +/** * get_desc() - Obtain address of segment descriptor * @seg: Segment selector * @desc: Pointer to the selected segment descriptor @@ -559,6 +626,76 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs) return get_reg_offset(insn, regs, REG_TYPE_INDEX); }
+/** + * insn_get_addr_ref_16 - Obtain the 16-bit address referred by instruction + * @insn: Instruction structure containing ModRM byte and displacement + * @regs: Set of registers referred by the instruction + * + * This function is to be used with 16-bit address encodings. Obtain the memory + * address referred by the instruction's ModRM bytes and displacement. Also, the + * segment used as base is determined by either any segment override prefixes in + * insn or the default segment of the registers involved in the address + * computation. + * the ModRM byte + * + * Return: linear address referenced by instruction and registers + */ +static void __user *insn_get_addr_ref_16(struct insn *insn, + struct pt_regs *regs) +{ + unsigned long linear_addr, seg_base_addr; + short eff_addr, addr1 = 0, addr2 = 0; + int addr_offset1, addr_offset2; + int ret; + + insn_get_modrm(insn); + insn_get_displacement(insn); + + /* + * If operand is a register, the layout is the same as in + * 32-bit and 64-bit addressing. + */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM); + if (addr_offset1 < 0) + goto out_err; + eff_addr = regs_get_register(regs, addr_offset1); + seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1, + false); + } else { + ret = get_reg_offset_16(insn, regs, &addr_offset1, + &addr_offset2); + if (ret < 0) + goto out_err; + /* + * Don't fail on invalid offset values. They might be invalid + * because they cannot be used for this particular value of + * the ModRM. Instead, use them in the computation only if + * they contain a valid value. + */ + if (addr_offset1 != -EDOM) + addr1 = 0xffff & regs_get_register(regs, addr_offset1); + if (addr_offset2 != -EDOM) + addr2 = 0xffff & regs_get_register(regs, addr_offset2); + eff_addr = addr1 + addr2; + /* + * The first register is in the operand implies the SS or DS + * segment selectors, the second register in the operand can + * only imply DS. 
Thus, use the first register to obtain + * the segment selector. + */ + seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1, + false); + + eff_addr += (insn->displacement.value & 0xffff); + } + linear_addr = (unsigned short)eff_addr + seg_base_addr; + + return (void __user *)linear_addr; +out_err: + return (void __user *)-1; +} + static inline long __to_signed_long(unsigned long val, int long_bytes) { #ifdef CONFIG_X86_64
Convert the function insn_get_addr_ref into a wrapper function that calls the correct static address-decoding function depending on the size of the address. In this way, callers do not need to worry about calling the correct function, and the number of functions that need to be exposed decreases.
To this end, the original 32/64-bit insn_get_addr_ref is renamed as insn_get_addr_ref_32_64 to reflect the type of address encodings that it handles.
Documentation is added to the new wrapper function and the documentation for the 32/64-bit address decoding function is improved.
Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Kees Cook keescook@chromium.org Cc: Thomas Garnier thgarnie@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Dmitry Vyukov dvyukov@google.com Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/lib/insn-eval.c | 45 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 5 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index cb1076d..e633588 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -705,12 +705,21 @@ static inline long __to_signed_long(unsigned long val, int long_bytes) #endif }
-/* - * return the address being referenced be instruction - * for rm=3 returning the content of the rm reg - * for rm!=3 calculates the address using SIB and Disp +/** + * insn_get_addr_ref_32_64 - Obtain a 32/64-bit address referred by instruction + * @insn: Instruction struct with ModRM and SiB bytes and displacement + * @regs: Set of registers referred by the instruction + * + * This function is to be used with 32-bit and 64-bit address encodings. Obtain + * the memory address referred by the instruction's ModRM bytes and + * displacement. Also, the segment used as base is determined by either any + * segment override prefixes in insn or the default segment of the registers + * involved in the linear address computation. + * + * Return: linear address referenced by instruction and registers */ -void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +static void __user *insn_get_addr_ref_32_64(struct insn *insn, + struct pt_regs *regs) { unsigned long linear_addr, seg_base_addr; long eff_addr, base, indx, tmp; @@ -795,3 +804,29 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) out_err: return (void __user *)-1; } + +/** + * insn_get_addr_ref - Obtain the linear address referred by instruction + * @insn: Instruction structure containing ModRM byte and displacement + * @regs: Set of registers referred by the instruction + * + * Obtain the memory address referred by the instruction's ModRM bytes and + * displacement. Also, the segment used as base is determined by either any + * segment override prefixes in insn or the default segment of the registers + * involved in the address computation. 
+ * + * Return: linear address referenced by instruction and registers + */ +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +{ + switch (insn->addr_bytes) { + case 2: + return insn_get_addr_ref_16(insn, regs); + case 4: + /* fall through */ + case 8: + return insn_get_addr_ref_32_64(insn, regs); + default: + return (void __user *)-1; + } +}
Up to this point, only fault.c used the definitions of the page fault error codes. Thus, it made sense to keep them within that file. However, other portions of code might be interested in those definitions too. For instance, the User-Mode Instruction Prevention emulation code will use them to emulate a page fault when it is unable to successfully copy the results of the emulated instructions to user space.
While relocating the error code enumeration, the prefix X86_ is used to make it consistent with the rest of the definitions in traps.h. Of course, code using the enumeration had to be updated as well. No functional changes were performed.
Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Andy Lutomirski luto@kernel.org Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: x86@kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/traps.h | 18 +++++++++ arch/x86/mm/fault.c | 88 +++++++++++++++++--------------------------- 2 files changed, 52 insertions(+), 54 deletions(-)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 01fd0a7..4a2e585 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -148,4 +148,22 @@ enum { X86_TRAP_IRET = 32, /* 32, IRET Exception */ };
+/* + * Page fault error code bits: + * + * bit 0 == 0: no page found 1: protection fault + * bit 1 == 0: read access 1: write access + * bit 2 == 0: kernel-mode access 1: user-mode access + * bit 3 == 1: use of reserved bit detected + * bit 4 == 1: fault was an instruction fetch + * bit 5 == 1: protection keys block access + */ +enum x86_pf_error_code { + X86_PF_PROT = 1 << 0, + X86_PF_WRITE = 1 << 1, + X86_PF_USER = 1 << 2, + X86_PF_RSVD = 1 << 3, + X86_PF_INSTR = 1 << 4, + X86_PF_PK = 1 << 5, +}; #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 428e3176..e859a9c 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -29,26 +29,6 @@ #include <asm/trace/exceptions.h>
/* - * Page fault error code bits: - * - * bit 0 == 0: no page found 1: protection fault - * bit 1 == 0: read access 1: write access - * bit 2 == 0: kernel-mode access 1: user-mode access - * bit 3 == 1: use of reserved bit detected - * bit 4 == 1: fault was an instruction fetch - * bit 5 == 1: protection keys block access - */ -enum x86_pf_error_code { - - PF_PROT = 1 << 0, - PF_WRITE = 1 << 1, - PF_USER = 1 << 2, - PF_RSVD = 1 << 3, - PF_INSTR = 1 << 4, - PF_PK = 1 << 5, -}; - -/* * Returns 0 if mmiotrace is disabled, or if the fault is not * handled by mmiotrace: */ @@ -149,7 +129,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * If it was a exec (instruction fetch) fault on NX page, then * do not ignore the fault: */ - if (error_code & PF_INSTR) + if (error_code & X86_PF_INSTR) return 0;
instr = (void *)convert_ip_to_linear(current, regs); @@ -179,7 +159,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * siginfo so userspace can discover which protection key was set * on the PTE. * - * If we get here, we know that the hardware signaled a PF_PK + * If we get here, we know that the hardware signaled a X86_PF_PK * fault and that there was a VMA once we got in the fault * handler. It does *not* guarantee that the VMA we find here * was the one that we faulted on. @@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info, /* * force_sig_info_fault() is called from a number of * contexts, some of which have a VMA and some of which - * do not. The PF_PK handing happens after we have a + * do not. The X86_PF_PK handing happens after we have a * valid VMA, so we should never reach this without a * valid VMA. */ @@ -655,7 +635,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, if (!oops_may_print()) return;
- if (error_code & PF_INSTR) { + if (error_code & X86_PF_INSTR) { unsigned int level; pgd_t *pgd; pte_t *pte; @@ -739,7 +719,7 @@ no_context(struct pt_regs *regs, unsigned long error_code, */ if (current->thread.sig_on_uaccess_err && signal) { tsk->thread.trap_nr = X86_TRAP_PF; - tsk->thread.error_code = error_code | PF_USER; + tsk->thread.error_code = error_code | X86_PF_USER; tsk->thread.cr2 = address;
/* XXX: hwpoison faults will set the wrong code. */ @@ -859,7 +839,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, struct task_struct *tsk = current;
/* User mode accesses just cause a SIGSEGV */ - if (error_code & PF_USER) { + if (error_code & X86_PF_USER) { /* * It's possible to have interrupts off here: */ @@ -880,7 +860,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, * Instruction fetch faults in the vsyscall page might need * emulation. */ - if (unlikely((error_code & PF_INSTR) && + if (unlikely((error_code & X86_PF_INSTR) && ((address & ~0xfff) == VSYSCALL_ADDR))) { if (emulate_vsyscall(regs, address)) return; @@ -893,7 +873,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, * are always protection faults. */ if (address >= TASK_SIZE_MAX) - error_code |= PF_PROT; + error_code |= X86_PF_PROT;
if (likely(show_unhandled_signals)) show_signal_msg(regs, error_code, address, tsk); @@ -949,11 +929,11 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return false; - if (error_code & PF_PK) + if (error_code & X86_PF_PK) return true; /* this checks permission keys on the VMA: */ - if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE), - (error_code & PF_INSTR), foreign)) + if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE), + (error_code & X86_PF_INSTR), foreign)) return true; return false; } @@ -981,7 +961,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, int code = BUS_ADRERR;
/* Kernel mode? Handle exceptions or die: */ - if (!(error_code & PF_USER)) { + if (!(error_code & X86_PF_USER)) { no_context(regs, error_code, address, SIGBUS, BUS_ADRERR); return; } @@ -1010,14 +990,14 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, unsigned long address, struct vm_area_struct *vma, unsigned int fault) { - if (fatal_signal_pending(current) && !(error_code & PF_USER)) { + if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) { no_context(regs, error_code, address, 0, 0); return; }
if (fault & VM_FAULT_OOM) { /* Kernel mode? Handle exceptions or die: */ - if (!(error_code & PF_USER)) { + if (!(error_code & X86_PF_USER)) { no_context(regs, error_code, address, SIGSEGV, SEGV_MAPERR); return; @@ -1042,16 +1022,16 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
static int spurious_fault_check(unsigned long error_code, pte_t *pte) { - if ((error_code & PF_WRITE) && !pte_write(*pte)) + if ((error_code & X86_PF_WRITE) && !pte_write(*pte)) return 0;
- if ((error_code & PF_INSTR) && !pte_exec(*pte)) + if ((error_code & X86_PF_INSTR) && !pte_exec(*pte)) return 0; /* * Note: We do not do lazy flushing on protection key - * changes, so no spurious fault will ever set PF_PK. + * changes, so no spurious fault will ever set X86_PF_PK. */ - if ((error_code & PF_PK)) + if ((error_code & X86_PF_PK)) return 1;
return 1; @@ -1096,8 +1076,8 @@ spurious_fault(unsigned long error_code, unsigned long address) * change, so user accesses are not expected to cause spurious * faults. */ - if (error_code != (PF_WRITE | PF_PROT) - && error_code != (PF_INSTR | PF_PROT)) + if (error_code != (X86_PF_WRITE | X86_PF_PROT) && + error_code != (X86_PF_INSTR | X86_PF_PROT)) return 0;
pgd = init_mm.pgd + pgd_index(address); @@ -1150,19 +1130,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) * always an unconditional error and can never result in * a follow-up action to resolve the fault, like a COW. */ - if (error_code & PF_PK) + if (error_code & X86_PF_PK) return 1;
/* * Make sure to check the VMA so that we do not perform - * faults just to hit a PF_PK as soon as we fill in a + * faults just to hit a X86_PF_PK as soon as we fill in a * page. */ - if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE), - (error_code & PF_INSTR), foreign)) + if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE), + (error_code & X86_PF_INSTR), foreign)) return 1;
- if (error_code & PF_WRITE) { + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; @@ -1170,7 +1150,7 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) }
/* read, present: */ - if (unlikely(error_code & PF_PROT)) + if (unlikely(error_code & X86_PF_PROT)) return 1;
/* read, not present: */ @@ -1193,7 +1173,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs) if (!static_cpu_has(X86_FEATURE_SMAP)) return false;
- if (error_code & PF_USER) + if (error_code & X86_PF_USER) return false;
if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC)) @@ -1249,7 +1229,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * protection error (error_code & 9) == 0. */ if (unlikely(fault_in_kernel_space(address))) { - if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) { + if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) { if (vmalloc_fault(address) >= 0) return;
@@ -1277,7 +1257,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, if (unlikely(kprobes_fault(regs))) return;
- if (unlikely(error_code & PF_RSVD)) + if (unlikely(error_code & X86_PF_RSVD)) pgtable_bad(regs, error_code, address);
if (unlikely(smap_violation(error_code, regs))) { @@ -1303,7 +1283,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, */ if (user_mode(regs)) { local_irq_enable(); - error_code |= PF_USER; + error_code |= X86_PF_USER; flags |= FAULT_FLAG_USER; } else { if (regs->flags & X86_EFLAGS_IF) @@ -1312,9 +1292,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
- if (error_code & PF_WRITE) + if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; - if (error_code & PF_INSTR) + if (error_code & X86_PF_INSTR) flags |= FAULT_FLAG_INSTRUCTION;
/* @@ -1334,7 +1314,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * space check, thus avoiding the deadlock: */ if (unlikely(!down_read_trylock(&mm->mmap_sem))) { - if ((error_code & PF_USER) == 0 && + if ((error_code & X86_PF_USER) == 0 && !search_exception_tables(regs->ip)) { bad_area_nosemaphore(regs, error_code, address, NULL); return; @@ -1361,7 +1341,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, bad_area(regs, error_code, address); return; } - if (error_code & PF_USER) { + if (error_code & X86_PF_USER) { /* * Accessing the stack below %sp is always a bug. * The large cushion allows instructions like enter
On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
> Up to this point, only fault.c used the definitions of the page fault error codes. Thus, it made sense to keep them within such file. Other portions of code might be interested in those definitions too. For instance, the User- Mode Instruction Prevention emulation code will use such definitions to emulate a page fault when it is unable to successfully copy the results of the emulated instructions to user space.
> While relocating the error code enumeration, the prefix X86_ is used to make it consistent with the rest of the definitions in traps.h. Of course, code using the enumeration had to be updated as well. No functional changes were performed.
Reviewed-by: Andy Lutomirski luto@kernel.org
User-Mode Instruction Prevention is a security feature present in new Intel processors that, when enabled, prevents the execution of a subset of instructions in user mode (CPL > 0). Attempting to execute any of these instructions causes a general protection exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register
This feature is also added to the list of disabled-features to allow a cleaner handling of build-time configuration.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/include/uapi/asm/processor-flags.h | 2 ++ 3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 4e77723..0739f1e 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -286,6 +286,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 85599ad..4707445 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -16,6 +16,12 @@ # define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31)) #endif
+#ifdef CONFIG_X86_INTEL_UMIP +# define DISABLE_UMIP 0 +#else +# define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31)) +#endif + #ifdef CONFIG_X86_64 # define DISABLE_VME (1<<(X86_FEATURE_VME & 31)) # define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31)) @@ -55,7 +61,7 @@ #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 -#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_UMIP) #define DISABLED_MASK17 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index 567de50..d2c2af8 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -104,6 +104,8 @@ #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT) #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */ #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT) +#define X86_CR4_UMIP_BIT 11 /* enable UMIP support */ +#define X86_CR4_UMIP _BITUL(X86_CR4_UMIP_BIT) #define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */ #define X86_CR4_VMXE _BITUL(X86_CR4_VMXE_BIT) #define X86_CR4_SMXE_BIT 14 /* enable safer mode (TXT) */
The User-Mode Instruction Prevention feature present in recent Intel processors prevents a group of instructions from being executed with CPL > 0. If any of these instructions is executed, a general protection fault is issued.
Rather than relaying this fault to the user space (in the form of a SIGSEGV signal), the instructions protected by UMIP can be emulated to provide dummy results. This preserves the current kernel behavior without revealing the system resources that UMIP intends to protect (the global and interrupt descriptor tables, the segment selectors of the local descriptor table and the task state, and the machine status word).
This emulation is needed because certain applications (e.g., WineHQ) rely on this subset of instructions to function.
The instructions protected by UMIP can be split into two groups: those that return a kernel memory address (sgdt and sidt) and those that return a value (sldt, str and smsw).
For the instructions that return a kernel memory address, applications such as WineHQ rely on the result being located in the kernel memory space. The result is emulated as a hard-coded value that lies close to the top of kernel memory. The limits of the GDT and the IDT are set to zero.
The instructions sldt and str return a segment selector relative to the base address of the global descriptor table. Since the actual address of such table is not revealed, it makes sense to emulate the result as zero.
The instruction smsw is emulated to return the value that the register CR0 has at boot time, as set in head_32.S.
Care is taken to appropriately emulate the results when segmentation is used. That is, rather than relying on USER_DS and USER_CS, the function insn_get_addr_ref inspects the segment descriptor pointed to by the registers in pt_regs. This ensures that we correctly obtain the segment base address and the address and operand sizes even if the user space application uses a local descriptor table.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- arch/x86/include/asm/umip.h | 15 +++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/umip.c | 257 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 273 insertions(+) create mode 100644 arch/x86/include/asm/umip.h create mode 100644 arch/x86/kernel/umip.c
diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h new file mode 100644 index 0000000..077b236 --- /dev/null +++ b/arch/x86/include/asm/umip.h @@ -0,0 +1,15 @@ +#ifndef _ASM_X86_UMIP_H +#define _ASM_X86_UMIP_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +#ifdef CONFIG_X86_INTEL_UMIP +bool fixup_umip_exception(struct pt_regs *regs); +#else +static inline bool fixup_umip_exception(struct pt_regs *regs) +{ + return false; +} +#endif /* CONFIG_X86_INTEL_UMIP */ +#endif /* _ASM_X86_UMIP_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 84c0059..0ded7b1 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -122,6 +122,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o +obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
ifdef CONFIG_FRAME_POINTER obj-y += unwind_frame.o diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c new file mode 100644 index 0000000..e64d8e5 --- /dev/null +++ b/arch/x86/kernel/umip.c @@ -0,0 +1,257 @@ +/* + * umip.c Emulation for instruction protected by the Intel User-Mode + * Instruction Prevention. The instructions are: + * sgdt + * sldt + * sidt + * str + * smsw + * + * Copyright (c) 2017, Intel Corporation. + * Ricardo Neri ricardo.neri@linux.intel.com + */ + +#include <linux/uaccess.h> +#include <asm/umip.h> +#include <asm/traps.h> +#include <asm/insn.h> +#include <asm/insn-eval.h> +#include <linux/ratelimit.h> + +/* + * == Base addresses of GDT and IDT + * Some applications to function rely finding the global descriptor table (GDT) + * and the interrupt descriptor table (IDT) in kernel memory. + * For x86_32, the selected values do not match any particular hole, but it + * suffices to provide a memory location within kernel memory. + * + * == CRO flags for SMSW + * Use the flags given when booting, as found in head_32.S + */ + +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \ + X86_CR0_WP | X86_CR0_AM) +#define UMIP_DUMMY_GDT_BASE 0xfffe0000 +#define UMIP_DUMMY_IDT_BASE 0xffff0000 + +enum umip_insn { + UMIP_SGDT = 0, /* opcode 0f 01 ModR/M reg 0 */ + UMIP_SIDT, /* opcode 0f 01 ModR/M reg 1 */ + UMIP_SLDT, /* opcode 0f 00 ModR/M reg 0 */ + UMIP_SMSW, /* opcode 0f 01 ModR/M reg 4 */ + UMIP_STR, /* opcode 0f 00 ModR/M reg 1 */ +}; + +/** + * __identify_insn - Identify a UMIP-protected instruction + * @insn: Instruction structure with opcode and ModRM byte. + * + * From the instruction opcode and the reg part of the ModRM byte, identify, + * if any, a UMIP-protected instruction. + * + * Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure. + */ +static int __identify_insn(struct insn *insn) +{ + /* By getting modrm we also get the opcode. 
*/ + insn_get_modrm(insn); + + /* All the instructions of interest start with 0x0f. */ + if (insn->opcode.bytes[0] != 0xf) + return -EINVAL; + + if (insn->opcode.bytes[1] == 0x1) { + switch (X86_MODRM_REG(insn->modrm.value)) { + case 0: + return UMIP_SGDT; + case 1: + return UMIP_SIDT; + case 4: + return UMIP_SMSW; + default: + return -EINVAL; + } + } else if (insn->opcode.bytes[1] == 0x0) { + if (X86_MODRM_REG(insn->modrm.value) == 0) + return UMIP_SLDT; + else if (X86_MODRM_REG(insn->modrm.value) == 1) + return UMIP_STR; + else + return -EINVAL; + } else { + return -EINVAL; + } +} + +/** + * __emulate_umip_insn - Emulate UMIP instructions with dummy values + * @insn: Instruction structure with ModRM byte + * @umip_inst: Instruction to emulate + * @data: Buffer onto which the dummy values will be copied + * @data_size: Size of the emulated result + * + * Emulate an instruction protected by UMIP. The result of the emulation + * is saved in the provided buffer. The size of the results depends on both + * the instruction and type of operand (register vs memory address). Thus, + * the size of the result needs to be updated. + * + * Result: 0 if success, -EINVAL on failure to emulate + */ +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst, + unsigned char *data, int *data_size) +{ + unsigned long dummy_base_addr; + unsigned short dummy_limit = 0; + unsigned int dummy_value = 0; + + switch (umip_inst) { + /* + * These two instructions return the base address and limit of the + * global and interrupt descriptor table. The base address can be + * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand + * size is 16-bit the returned value of the base address is supposed + * to be a zero-extended 24-byte number. However, it seems that a + * 32-byte number is always returned in legacy protected mode + * irrespective of the operand size. 
+ */ + case UMIP_SGDT: + /* fall through */ + case UMIP_SIDT: + if (umip_inst == UMIP_SGDT) + dummy_base_addr = UMIP_DUMMY_GDT_BASE; + else + dummy_base_addr = UMIP_DUMMY_IDT_BASE; + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + /* SGDT and SIDT do not take register as argument. */ + return -EINVAL; + } + + memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr)); + memcpy(data, &dummy_limit, sizeof(dummy_limit)); + *data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit); + break; + case UMIP_SMSW: + /* + * Even though CR0_STATE contain 4 bytes, the number + * of bytes to be copied in the result buffer is determined + * by whether the operand is a register or a memory location. + */ + dummy_value = CR0_STATE; + /* + * These two instructions return a 16-bit value. We return + * all zeros. This is equivalent to a null descriptor for + * str and sldt. + */ + /* fall through */ + case UMIP_SLDT: + /* fall through */ + case UMIP_STR: + /* if operand is a register, it is zero-extended */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + memset(data, 0, insn->opnd_bytes); + *data_size = insn->opnd_bytes; + /* if not, only the two least significant bytes are copied */ + } else { + *data_size = 2; + } + memcpy(data, &dummy_value, sizeof(dummy_value)); + break; + default: + return -EINVAL; + } + return 0; +} + +/** + * fixup_umip_exception - Fixup #GP faults caused by UMIP + * @regs: Registers as saved when entering the #GP trap + * + * The instructions sgdt, sidt, str, smsw, sldt cause a general protection + * fault if with CPL > 0 (i.e., from user space). This function can be + * used to emulate the results of the aforementioned instructions with + * dummy values. Results are copied to user-space memory as indicated by + * the instruction pointed by EIP using the registers indicated in the + * instruction operands. This function also takes care of determining + * the address to which the results must be copied. 
+ */ +bool fixup_umip_exception(struct pt_regs *regs) +{ + struct insn insn; + unsigned char buf[MAX_INSN_SIZE]; + /* 10 bytes is the maximum size of the result of UMIP instructions */ + unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; + unsigned long seg_base; + int not_copied, nr_copied, reg_offset, dummy_data_size; + void __user *uaddr; + unsigned long *reg_addr; + enum umip_insn umip_inst; + + /* + * Use the segment base in case user space used a different code + * segment, either in protected (e.g., from an LDT) or virtual-8086 + * modes. In most of the cases seg_base will be zero as in USER_CS. + */ + seg_base = insn_get_seg_base(regs, &insn, offsetof(struct pt_regs, ip), + true); + not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip), + sizeof(buf)); + nr_copied = sizeof(buf) - not_copied; + /* + * The copy_from_user above could have failed if user code is protected + * by a memory protection key. Give up on emulation in such a case. + * Should we issue a page fault? + */ + if (!nr_copied) + return false; + + insn_init(&insn, buf, nr_copied, 0); + + /* + * Override the default operand and address sizes to what is specified + * in the code segment descriptor. The instruction decoder only sets + * the address size it to either 4 or 8 address bytes and does nothing + * for the operand bytes. This OK for most of the cases, but we could + * have special cases where, for instance, a 16-bit code segment + * descriptor is used. + * If there are overrides, the instruction decoder correctly updates + * these values, even for 16-bit defaults. 
+ */ + insn.addr_bytes = insn_get_seg_default_address_bytes(regs); + insn.opnd_bytes = insn_get_seg_default_operand_bytes(regs); + + if (!insn.addr_bytes || !insn.opnd_bytes) + return false; + +#ifdef CONFIG_X86_64 + if (user_64bit_mode(regs)) + return false; +#endif + + insn_get_length(&insn); + if (nr_copied < insn.length) + return false; + + umip_inst = __identify_insn(&insn); + /* Check if we found an instruction protected by UMIP */ + if (umip_inst < 0) + return false; + + if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size)) + return false; + + /* If operand is a register, write directly to it */ + if (X86_MODRM_MOD(insn.modrm.value) == 3) { + reg_offset = insn_get_reg_offset_modrm_rm(&insn, regs); + reg_addr = (unsigned long *)((unsigned long)regs + reg_offset); + memcpy(reg_addr, dummy_data, dummy_data_size); + } else { + uaddr = insn_get_addr_ref(&insn, regs); + nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); + if (nr_copied > 0) + return false; + } + + /* increase IP to let the program keep going */ + regs->ip += insn.length; + return true; +}
fixup_umip_exception will be called from do_general_protection. If the former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV. However, when emulation is successful but the emulated result cannot be copied to user space memory, it is more accurate to issue a SIGSEGV with SEGV_MAPERR and the offending address. A new function, inspired by force_sig_info_fault, is introduced to model the page fault.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index e64d8e5..bd06e26 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -163,6 +163,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
 }
 /**
+ * __force_sig_info_umip_fault - Force a SIGSEGV with SEGV_MAPERR
+ * @address:	Address that caused the signal
+ * @regs:	Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void __force_sig_info_umip_fault(void __user *address,
+					struct pt_regs *regs)
+{
+	siginfo_t info;
+	struct task_struct *tsk = current;
+
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
+		printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
+				   tsk->comm, task_pid_nr(tsk), regs->ip,
+				   regs->sp, X86_PF_USER | X86_PF_WRITE,
+				   regs->ip);
+	}
+
+	tsk->thread.cr2 = (unsigned long)address;
+	tsk->thread.error_code = X86_PF_USER | X86_PF_WRITE;
+	tsk->thread.trap_nr = X86_TRAP_PF;
+
+	info.si_signo = SIGSEGV;
+	info.si_errno = 0;
+	info.si_code = SEGV_MAPERR;
+	info.si_addr = address;
+	force_sig_info(SIGSEGV, &info, tsk);
+}
+
+/**
  * fixup_umip_exception - Fixup #GP faults caused by UMIP
  * @regs:	Registers as saved when entering the #GP trap
  *
@@ -247,8 +282,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
 	} else {
 		uaddr = insn_get_addr_ref(&insn, regs);
 		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-		if (nr_copied > 0)
-			return false;
+		if (nr_copied > 0) {
+			/*
+			 * If copy fails, send a signal and tell caller that
+			 * fault was fixed up
+			 */
+			__force_sig_info_umip_fault(uaddr, regs);
+			return true;
+		}
 	}
/* increase IP to let the program keep going */
If the User-Mode Instruction Prevention CPU feature is available and enabled, a general protection fault will be issued if the instructions sgdt, sldt, sidt, str or smsw are executed from user-mode context (CPL > 0). If the fault was caused by any of the instructions protected by UMIP, fixup_umip_exception will emulate dummy results for these instructions. If emulation is successful, the result is passed to the user space program and no SIGSEGV signal is emitted.
Please note that fixup_umip_exception also caters for the case when the fault originated while running in virtual-8086 mode.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/traps.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 948443e..86efbcb 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include <asm/trace/mpx.h>
 #include <asm/mpx.h>
 #include <asm/vm86.h>
+#include <asm/umip.h>
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 	cond_local_irq_enable(regs);
+	if (user_mode(regs) && fixup_umip_exception(regs))
+		return;
+
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
If the User-Mode Instruction Prevention CPU feature is available and enabled, a general protection fault will be issued if the instructions sgdt, sldt, sidt, str or smsw are executed from user-mode context (CPL > 0). If the fault was caused by any of the instructions protected by UMIP, fixup_umip_exception will emulate dummy results for these instructions. If emulation is successful, the result is passed to the user space program and no SIGSEGV signal is emitted.
Please note that fixup_umip_exception also caters for the case when the fault originated while running in virtual-8086 mode.
Reviewed-by: Andy Lutomirski luto@kernel.org
User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a bit in %cr4.
It makes sense to enable UMIP at some point while booting, before user space comes up. Like SMAP and SMEP, it is not critical to have it enabled very early during boot, because UMIP is relevant only once there is a user space to be protected from. Given the similarities in relevance, it makes sense to enable UMIP along with SMAP and SMEP.
UMIP is enabled by default. It can be disabled by adding clearcpuid=514 to the kernel parameters.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/Kconfig             | 10 ++++++++++
 arch/x86/kernel/cpu/common.c | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a..b7f1226 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1735,6 +1735,16 @@ config X86_SMAP
If unsure, say Y.
+config X86_INTEL_UMIP
+	def_bool y
+	depends on CPU_SUP_INTEL
+	prompt "Intel User Mode Instruction Prevention" if EXPERT
+	---help---
+	  The User Mode Instruction Prevention (UMIP) is a security
+	  feature in newer Intel processors. If enabled, a general
+	  protection fault is issued if the instructions SGDT, SLDT,
+	  SIDT, SMSW and STR are executed in user mode.
+
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 58094a1..9f59eb5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -311,6 +311,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 	}
 }
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+	if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
+	    cpu_has(c, X86_FEATURE_UMIP))
+		cr4_set_bits(X86_CR4_UMIP);
+	else
+		/*
+		 * Make sure UMIP is disabled in case it was enabled in a
+		 * previous boot (e.g., via kexec).
+		 */
+		cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1080,9 +1093,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
-	/* Set up SMEP/SMAP */
+	/* Set up SMEP/SMAP/UMIP */
 	setup_smep(c);
 	setup_smap(c);
+	setup_umip(c);
 	/*
 	 * The vendor-specific functions might have changed features.
Certain user space programs that run on virtual-8086 mode may utilize instructions protected by the User-Mode Instruction Prevention (UMIP) security feature present in new Intel processors: SGDT, SIDT and SMSW. In such a case, a general protection fault is issued if UMIP is enabled. When such a fault happens, the kernel catches it and emulates the results of these instructions with dummy values. The purpose of this new test is to verify whether the impacted instructions can be executed without causing such #GP. If no #GP exceptions occur, we expect to exit virtual-8086 mode from INT 0x80.
The instructions protected by UMIP are executed in representative use cases:
a) the memory address of the result is given in the form of a displacement from the base of the data segment
b) the memory address of the result is given in a general purpose register
c) the result is stored directly in a general purpose register.
Unfortunately, it is not possible to check the results against a set of expected values because no emulation will occur in systems that do not have the UMIP feature. Instead, results are printed for verification.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 tools/testing/selftests/x86/entry_from_vm86.c | 39 ++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0..377b773 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
 	"int3\n\t"
 	"vmcode_int80:\n\t"
 	"int $0x80\n\t"
+	"umip:\n\t"
+	/* addressing via displacements */
+	"smsw (2052)\n\t"
+	"sidt (2054)\n\t"
+	"sgdt (2060)\n\t"
+	/* addressing via registers */
+	"mov $2066, %bx\n\t"
+	"smsw (%bx)\n\t"
+	"mov $2068, %bx\n\t"
+	"sidt (%bx)\n\t"
+	"mov $2074, %bx\n\t"
+	"sgdt (%bx)\n\t"
+	/* register operands, only for smsw */
+	"smsw %ax\n\t"
+	"mov %ax, (2080)\n\t"
+	"int $0x80\n\t"
 	".size vmcode, . - vmcode\n\t"
 	"end_vmcode:\n\t"
 	".code32\n\t"
@@ -103,7 +119,7 @@
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-	vmcode_sti[], vmcode_int3[], vmcode_int80[];
+	vmcode_sti[], vmcode_int3[], vmcode_int80[], umip[];
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -218,6 +234,27 @@ int main(void)
 	v86.regs.eax = (unsigned int)-1;
 	do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
+	/* UMIP -- should exit with INTx 0x80 unless UMIP was not disabled */
+	do_test(&v86, umip - vmcode, VM86_INTx, 0x80, "UMIP tests");
+	printf("[INFO]\tResults of UMIP-protected instructions via displacements:\n");
+	printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2052));
+	printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2054),
+	       *(unsigned long *)(addr + 2056));
+	printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2060),
+	       *(unsigned long *)(addr + 2062));
+	printf("[INFO]\tResults of UMIP-protected instructions via addressing in registers:\n");
+	printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2066));
+	printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2068),
+	       *(unsigned long *)(addr + 2070));
+	printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2074),
+	       *(unsigned long *)(addr + 2076));
+	printf("[INFO]\tResults of SMSW via register operands:\n");
+	printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2080));
+
 	/* Execute a null pointer */
 	v86.regs.cs = 0;
 	v86.regs.ss = 0;
On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Certain user space programs that run on virtual-8086 mode may utilize instructions protected by the User-Mode Instruction Prevention (UMIP) security feature present in new Intel processors: SGDT, SIDT and SMSW. In such a case, a general protection fault is issued if UMIP is enabled. When such a fault happens, the kernel catches it and emulates the results of these instructions with dummy values. The purpose of this new test is to verify whether the impacted instructions can be executed without causing such #GP. If no #GP exceptions occur, we expect to exit virtual-8086 mode from INT 0x80.
The instructions protected by UMIP are executed in representative use cases: a) the memory address of the result is given in the form of a displacement from the base of the data segment b) the memory address of the result is given in a general purpose register c) the result is stored directly in a general purpose register.
Unfortunately, it is not possible to check the results against a set of expected values because no emulation will occur in systems that do not have the UMIP feature. Instead, results are printed for verification.
You could pre-initialize the result buffer to a bunch of non-matching values (1, 2, 3, ...) and then check that all the invocations of the same instruction gave the same value.
If you do this, maybe make it a follow-up patch -- see other email.
On Wed, 2017-03-08 at 07:56 -0800, Andy Lutomirski wrote:
On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
Certain user space programs that run on virtual-8086 mode may utilize instructions protected by the User-Mode Instruction Prevention (UMIP) security feature present in new Intel processors: SGDT, SIDT and SMSW. In such a case, a general protection fault is issued if UMIP is enabled. When such a fault happens, the kernel catches it and emulates the results of these instructions with dummy values. The purpose of this new test is to verify whether the impacted instructions can be executed without causing such #GP. If no #GP exceptions occur, we expect to exit virtual-8086 mode from INT 0x80.
The instructions protected by UMIP are executed in representative use cases: a) the memory address of the result is given in the form of a displacement from the base of the data segment b) the memory address of the result is given in a general purpose register c) the result is stored directly in a general purpose register.
Unfortunately, it is not possible to check the results against a set of expected values because no emulation will occur in systems that do not have the UMIP feature. Instead, results are printed for verification.
You could pre-initialize the result buffer to a bunch of non-matching values (1, 2, 3, ...) and then check that all the invocations of the same instruction gave the same value.
Yes, I can do this. Alternatively, I can check in the test program if the CPU has UMIP and only run the tests in that case.
If you do this, maybe make it a follow-up patch -- see other email.
Great! Thank you!
Thanks and BR, Ricardo
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Yes, this is the case. But at least in the past there was an attempt to support SLDT as it is used by an ancient pharlap DOS extender (currently unsupported by dosemu1/2). So how difficult would it be to add an optional possibility of delivering such SIGSEGV to userspace so that the kernel's dummy emulation can be overridden? It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Yes, this is the case. But at least in the past there was an attempt to support SLDT as it is used by an ancient pharlap DOS extender (currently unsupported by dosemu1/2). So how difficult would it be to add an optional possibility of delivering such SIGSEGV to userspace so that the kernel's dummy emulation can be overridden? It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
What I'd actually like to see is a totally separate patchset that adds an inheritable (but reset on exec) per-task mask of legacy compatibility features to disable. Maybe:
sys_adjust_compat_mask(int op, int word, u32 mask);
op could indicate that we want to so SET, OR, AND, or READ. word would be 0 for now. It could be a prctl, too.
Things in the mask could include:
COMPAT_MASK0_X86_64_VSYSCALL [1] COMPAT_MASK0_X86_UMIP_FIXUP
I'm sure I could think of more along these lines.
Then DOSEMU (and future WINE versions, too) could just mask off X86_UMIP_FIXUP and do their own emulation
[1] For those of you thinking about this and realizing that VSYSCALL readability is inherently global and not per-task, I know how to fix that for essentially no cost :)
--Andy
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Yes, this is the case. But at least in the past there was an attempt to support SLDT as it is used by an ancient pharlap DOS extender (currently unsupported by dosemu1/2). So how difficult would it be to add an optional possibility of delivering such SIGSEGV to userspace so that the kernel's dummy emulation can be overridden? It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
What I'd actually like to see is a totally separate patchset that adds an inheritable (but reset on exec) per-task mask of legacy compatibility features to disable. Maybe:
sys_adjust_compat_mask(int op, int word, u32 mask);
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Not sure. Ricardo?
Yes, this is the case. But at least in the past there was an attempt to support SLDT as it is used by an ancient pharlap DOS extender (currently unsupported by dosemu1/2). So how difficult would it be to add an optional possibility of delivering such SIGSEGV to userspace so that the kernel's dummy emulation can be overridden? It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
What I'd actually like to see is a totally separate patchset that adds an inheritable (but reset on exec) per-task mask of legacy compatibility features to disable. Maybe:
sys_adjust_compat_mask(int op, int word, u32 mask);
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
--Andy
08.03.2017 19:46, Andy Lutomirski wrote:
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
But then I am confused with the word "compat" in your "COMPAT_MASK0_X86_UMIP_FIXUP" and "sys_adjust_compat_mask(int op, int word, u32 mask);"
Leaving UMIP on and only disabling a fixup doesn't sound like a compat option to me. I would expect compat to disable it completely.
On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
08.03.2017 19:46, Andy Lutomirski wrote:
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
But then I am confused with the word "compat" in your "COMPAT_MASK0_X86_UMIP_FIXUP" and "sys_adjust_compat_mask(int op, int word, u32 mask);"
Leaving UMIP on and only disabling a fixup doesn't sound like a compat option to me. I would expect compat to disable it completely.
I guess that the _UMIP_FIXUP part makes it clear that emulation, not UMIP, is disabled, allowing the SIGSEGV to be delivered to the user space program.
Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its purpose? Applications could simply use this compat mask to bypass UMIP and gain access to the instructions it protects.
Thanks and BR, Ricardo
09.03.2017 04:11, Ricardo Neri wrote:
On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
08.03.2017 19:46, Andy Lutomirski wrote:
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
But then I am confused with the word "compat" in your "COMPAT_MASK0_X86_UMIP_FIXUP" and "sys_adjust_compat_mask(int op, int word, u32 mask);"
Leaving UMIP on and only disabling a fixup doesn't sound like a compat option to me. I would expect compat to disable it completely.
I guess that the _UMIP_FIXUP part makes it clear that emulation, not UMIP is disabled, allowing the SIGSEGV be delivered to the user space program.
Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its purpose? Applications could simply use this compat mask to bypass UMIP and gain access to the instructions it protects.
I don't think someone will want to completely disable UMIP, so why do you need such functionality? My question was only what does "compat" mean in "COMPAT_MASK0_X86_UMIP_FIXUP", compat with what.
On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
08.03.2017 19:46, Andy Lutomirski wrote:
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
But then I am confused with the word "compat" in your "COMPAT_MASK0_X86_UMIP_FIXUP" and "sys_adjust_compat_mask(int op, int word, u32 mask);"
Leaving UMIP on and only disabling a fixup doesn't sound like a compat option to me. I would expect compat to disable it completely.
I guess that the _UMIP_FIXUP part makes it clear that emulation, not UMIP is disabled, allowing the SIGSEGV be delivered to the user space program.
Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its purpose? Applications could simply use this compat mask to bypass UMIP and gain access to the instructions it protects.
I was obviously extremely unclear. The point of the proposed syscall is to let programs opt out of legacy features. So there would be a bit to disable emulation of UMIP-blocked instructions (this giving the unadulterated #GP). There would not be a bit to disable UMIP itself.
There's also a flaw in my proposal. Disable-vsyscall would be per-mm and disable-umip-emulation would be per-task, so they'd need to be in separate words to make any sense. I'll ponder this a bit more.
10.03.2017 05:41, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
08.03.2017 19:46, Andy Lutomirski wrote:
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
But then I am confused with the word "compat" in your "COMPAT_MASK0_X86_UMIP_FIXUP" and "sys_adjust_compat_mask(int op, int word, u32 mask);"
Leaving UMIP on and only disabling a fixup doesn't sound like a compat option to me. I would expect compat to disable it completely.
I guess that the _UMIP_FIXUP part makes it clear that emulation, not UMIP is disabled, allowing the SIGSEGV be delivered to the user space program.
Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its purpose? Applications could simply use this compat mask to bypass UMIP and gain access to the instructions it protects.
I was obviously extremely unclear. The point of the proposed syscall is to let programs opt out of legacy features.
I guess both "compat" and "legacy" are misleading here. Maybe these are "x86-specific" or "hypervisor-specific", but a mere enabling of UMIP doesn't immediately make the use of SLDT instruction a legacy IMHO.
I'll ponder this a bit more.
So if we are to invent something new, it would be nice to also think up a clear terminology for it. Maybe something like "X86_FEATURE_xxx_MASK" or alike.
On Fri, Mar 10, 2017 at 2:30 AM, Stas Sergeev stsp@list.ru wrote:
10.03.2017 05:41, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
08.03.2017 19:46, Andy Lutomirski wrote:
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP as to allow the prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for kernel. All is needed for this, is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
But then I am confused with the word "compat" in your "COMPAT_MASK0_X86_UMIP_FIXUP" and "sys_adjust_compat_mask(int op, int word, u32 mask);"
Leaving UMIP on and only disabling a fixup doesn't sound like a compat option to me. I would expect compat to disable it completely.
I guess that the _UMIP_FIXUP part makes it clear that emulation, not UMIP is disabled, allowing the SIGSEGV be delivered to the user space program.
Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its purpose? Applications could simply use this compat mask to bypass UMIP and gain access to the instructions it protects.
I was obviously extremely unclear. The point of the proposed syscall is to let programs opt out of legacy features.
I guess both "compat" and "legacy" are misleading here. Maybe these are "x86-specific" or "hypervisor-specific", but a mere enabling of UMIP doesn't immediately make the use of the SLDT instruction legacy, IMHO.
Sure it is. :) Using SLDT from user mode is a legacy ability that just happens to still work on existing CPUs and kernels. Once UMIP goes in, it will officially be obsolete -- it will just be supported for backwards compatibility. New code should opt out and emulate in usermode if needed. (And the vast, vast majority of Linux programs don't use these instructions in the first place.)
Similarly, vsyscalls were obsolete as soon as better alternatives were fully supported and the kernel started making them slow, and the fact that new static glibc programs still used them for a little while didn't make them any less obsolete.
I'll ponder this a bit more.
So if we are to invent something new, it would be nice to also think up a clear terminology for it. Maybe something like "X86_FEATURE_xxx_MASK" or alike.
But they're misfeatures, not features.
--Andy
11.03.2017 00:04, Andy Lutomirski wrote:
On Fri, Mar 10, 2017 at 2:30 AM, Stas Sergeev stsp@list.ru wrote:
10.03.2017 05:41, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
08.03.2017 19:46, Andy Lutomirski wrote:
No no, since I meant prot mode, this is not what I need. I would never need to disable UMIP so as to allow prot mode apps to do SLDT. Instead it would be good to have an ability to provide a replacement for the dummy emulation that is currently being proposed for the kernel. All that is needed for this is just to deliver a SIGSEGV.
That's what I meant. Turning off FIXUP_UMIP would leave UMIP on but turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86 GP exit).
But then I am confused with the word "compat" in your "COMPAT_MASK0_X86_UMIP_FIXUP" and "sys_adjust_compat_mask(int op, int word, u32 mask);"
Leaving UMIP on and only disabling a fixup doesn't sound like a compat option to me. I would expect compat to disable it completely.
I guess that the _UMIP_FIXUP part makes it clear that emulation, not UMIP, is disabled, allowing the SIGSEGV to be delivered to the user space program.
Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its purpose? Applications could simply use this compat mask to bypass UMIP and gain access to the instructions it protects.
I was obviously extremely unclear. The point of the proposed syscall is to let programs opt out of legacy features.
I guess both "compat" and "legacy" are misleading here. Maybe these are "x86-specific" or "hypervisor-specific", but a mere enabling of UMIP doesn't immediately make the use of the SLDT instruction legacy, IMHO.
Sure it is. :) Using SLDT from user mode is a legacy ability that just happens to still work on existing CPUs and kernels. Once UMIP goes in, it will officially be obsolete
Yes, but the names you suggest imply that "UMIP_FIXUP" is legacy or compat, which I find misleading because it has just appeared. Maybe something like "COMPAT_X86_UMIP_INSNS_EMU"?
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for virtual-8086 tasks, without regard to protected mode (i.e., UMIP was always enabled). I didn't have emulation at the time. Then, I added emulation code that now covers protected and virtual-8086 modes. I guess it is not special anymore.
Thanks and BR, Ricardo
09.03.2017 04:15, Ricardo Neri wrote:
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for virtual-8086 tasks, without regard to protected mode (i.e., UMIP was always enabled). I didn't have emulation at the time. Then, I added emulation code that now covers protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw UD in v86? How does UMIP affect this? How does your patch affect this?
On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev stsp@list.ru wrote:
09.03.2017 04:15, Ricardo Neri wrote:
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for virtual-8086 tasks, without regard to protected mode (i.e., UMIP was always enabled). I didn't have emulation at the time. Then, I added emulation code that now covers protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw UD in v86? How does UMIP affect this? How does your patch affect this?
Er, right. Ricardo, your code may need fixing. But don't you have a test case for this? The behavior should be the same with and without your patches applied. The exception is #UD, not #GP, so maybe your code just never executes in the vm86 case.
--Andy
10.03.2017 05:39, Andy Lutomirski wrote:
On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev stsp@list.ru wrote:
09.03.2017 04:15, Ricardo Neri wrote:
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for virtual-8086 tasks, without regard to protected mode (i.e., UMIP was always enabled). I didn't have emulation at the time. Then, I added emulation code that now covers protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw UD in v86? How does UMIP affect this? How does your patch affect this?
Er, right. Ricardo, your code may need fixing. But don't you have a test case for this?
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
On Fri, Mar 10, 2017 at 3:33 AM, Stas Sergeev stsp@list.ru wrote:
10.03.2017 05:39, Andy Lutomirski wrote:
On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev stsp@list.ru wrote:
09.03.2017 04:15, Ricardo Neri wrote:
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for virtual-8086 tasks, without regard to protected mode (i.e., UMIP was always enabled). I didn't have emulation at the time. Then, I added emulation code that now covers protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw UD in v86? How does UMIP affect this? How does your patch affect this?
Er, right. Ricardo, your code may need fixing. But don't you have a test case for this?
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
What I meant was: if the patches incorrectly started making these instructions work in vm86 mode where they used to cause a vm86 exit, then that's a bug that the selftest should have caught.
On Fri, 2017-03-10 at 06:17 -0800, Andy Lutomirski wrote:
On Fri, Mar 10, 2017 at 3:33 AM, Stas Sergeev stsp@list.ru wrote:
10.03.2017 05:39, Andy Lutomirski wrote:
On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev stsp@list.ru wrote:
09.03.2017 04:15, Ricardo Neri wrote:
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for virtual-8086 tasks, without regard to protected mode (i.e., UMIP was always enabled). I didn't have emulation at the time. Then, I added emulation code that now covers protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw UD in v86? How does UMIP affect this? How does your patch affect this?
Er, right. Ricardo, your code may need fixing. But don't you have a test case for this?
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
What I meant was: if the patches incorrectly started making these instructions work in vm86 mode where they used to cause a vm86 exit, then that's a bug that the selftest should have caught.
Yes, this is the case. I will fix this behavior... and update the test cases.
On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
10.03.2017 05:39, Andy Lutomirski wrote:
On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev stsp@list.ru wrote:
09.03.2017 04:15, Ricardo Neri wrote:
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 03:32, Ricardo Neri wrote:
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for virtual-8086 tasks, without regard to protected mode (i.e., UMIP was always enabled). I didn't have emulation at the time. Then, I added emulation code that now covers protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw UD in v86? How does UMIP affect this? How does your patch affect this?
Er, right. Ricardo, your code may need fixing. But don't you have a test case for this?
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
Str and sldt can be emulated in vm86 but, as Andy mentioned, the behavior should be the same with and without emulation.
Thanks and BR, Ricardo
11.03.2017 02:59, Ricardo Neri wrote:
On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
Str and sldt can be emulated in vm86 but, as Andy mentioned, the behavior should be the same with and without emulation.
Why would you do that? I looked up the dosemu2 CPU simulator code that is used under x86-64. It says this:
---
	CODE_FLUSH();
	if (REALMODE()) goto illegal_op;
	PC += ModRMSim(PC+1, mode) + 1;
	error("SLDT not implemented\n");
	break;
case 1: /* STR */ /* Store Task Register */
	CODE_FLUSH();
	if (REALMODE()) goto illegal_op;
	PC += ModRMSim(PC+1, mode) + 1;
	error("STR not implemented\n");
	break;
...
case 0: /* SGDT */ /* Store Global Descriptor Table Register */
	PC++;
	PC += ModRM(opc, PC, mode|DATA16|MSTORE);
	error("SGDT not implemented\n");
	break;
case 1: /* SIDT */ /* Store Interrupt Descriptor Table Register */
	PC++;
	PC += ModRM(opc, PC, mode|DATA16|MSTORE);
	error("SIDT not implemented\n");
	break;
---
It only implements smsw. So maybe you can make your code much simpler and remove the unneeded emulation? The same goes for prot mode. You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
On Tue, 2017-03-14 at 00:25 +0300, Stas Sergeev wrote:
11.03.2017 02:59, Ricardo Neri wrote:
On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
Str and sldt can be emulated in vm86 but, as Andy mentioned, the behavior should be the same with and without emulation.
Why would you do that? I looked up the dosemu2 CPU simulator code that is used under x86-64. It says this:
Stas, I apologize for the delayed reply; I missed your e-mail.
	CODE_FLUSH();
	if (REALMODE()) goto illegal_op;
	PC += ModRMSim(PC+1, mode) + 1;
	error("SLDT not implemented\n");
	break;
case 1: /* STR */ /* Store Task Register */
	CODE_FLUSH();
	if (REALMODE()) goto illegal_op;
	PC += ModRMSim(PC+1, mode) + 1;
	error("STR not implemented\n");
	break;
...
case 0: /* SGDT */ /* Store Global Descriptor Table Register */
	PC++;
	PC += ModRM(opc, PC, mode|DATA16|MSTORE);
	error("SGDT not implemented\n");
	break;
case 1: /* SIDT */ /* Store Interrupt Descriptor Table Register */
	PC++;
	PC += ModRM(opc, PC, mode|DATA16|MSTORE);
	error("SIDT not implemented\n");
	break;
It only implements smsw. So maybe you can make your code much simpler and remove the unneeded emulation? The same goes for prot mode.
Do you mean the unneeded emulation for SLDT and STR?
You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
The majority of my patches deal with computing the effective address based on the instruction operands, and the linear address based on the effective address and the segment descriptor. Only two or three patches deal with identifying particular UMIP-protected instructions. Not having to worry about STR and SLDT in vm86 could simplify things a bit, though.
Thanks and BR, Ricardo
28.03.2017 02:46, Ricardo Neri wrote:
On Tue, 2017-03-14 at 00:25 +0300, Stas Sergeev wrote:
11.03.2017 02:59, Ricardo Neri wrote:
On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
Str and sldt can be emulated in vm86 but, as Andy mentioned, the behavior should be the same with and without emulation.
Why would you do that? I looked up the dosemu2 CPU simulator code that is used under x86-64. It says this:
Stas, I apologize for the delayed reply; I missed your e-mail.
It only implements smsw. So maybe you can make your code much simpler and remove the unneeded emulation? The same goes for prot mode.
Do you mean the unneeded emulation for SLDT and STR?
Not quite, I meant also sgdt and sidt in vm86. Yes, it will be a somewhat "incompatible" change, but if there is nothing to stay compatible with, then why worry? Probably you could also remove the sldt and str emulation for protected mode because, as I understand from this thread, wine does not need those.
Note that these days dosemu2 uses v86 mode set up under kvm rather than vm86(). Do your patches affect that the same way as they do the vm86() syscall, or can there be some differences? Or should UMIP be enabled under kvm by hand?
You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have. But at least dosemu implements it, so probably it is needed. Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :) So, if this can be an option, I can do the tests to estimate its usage.
On Tue, 2017-03-28 at 12:38 +0300, Stas Sergeev wrote:
28.03.2017 02:46, Ricardo Neri wrote:
On Tue, 2017-03-14 at 00:25 +0300, Stas Sergeev wrote:
11.03.2017 02:59, Ricardo Neri wrote:
On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
Why would you need one? Or do you really want to allow these instructions in v86 by means of emulation? If so, this wasn't clearly stated in the patch description, nor was it properly discussed, it seems.
Str and sldt can be emulated in vm86 but, as Andy mentioned, the behavior should be the same with and without emulation.
Why would you do that? I looked up the dosemu2 CPU simulator code that is used under x86-64. It says this:
Stas, I apologize for the delayed reply; I missed your e-mail.
It only implements smsw. So maybe you can make your code much simpler and remove the unneeded emulation? The same goes for prot mode.
Do you mean the unneeded emulation for SLDT and STR?
Not quite, I meant also sgdt and sidt in vm86. Yes, it will be a somewhat "incompatible" change, but if there is nothing to stay compatible with, then why worry?
My idea of compatibility was to have the emulation code behave exactly like a processor without UMIP :)
Probably you could also remove the sldt and str emulation for protected mode, because, as I understand from this thread, wine does not need those.
I see. I would lean toward keeping the emulation because I already implemented it :), for completeness, and because it is performed in a single switch. The bulk of the emulation code deals with operands.
Note that these days dosemu2 uses v86 mode set up under kvm rather than vm86(). Do your patches affect that the same way as they do the vm86() syscall, or can there be some differences?
My code does not touch kvm at all. I would need to assess how kvm will behave.
Or should UMIP be enabled under kvm by hand?
There was an attempt to emulate UMIP that was submitted a while ago: https://lkml.org/lkml/2016/7/12/644
You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have.
It would be great if you could do that, if you don't mind.
But at least dosemu implements it, so probably it is needed.
Right.
Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and emulating it in the kernel?
Thanks and BR, Ricardo
29.03.2017 07:38, Ricardo Neri wrote:
Probably you could also remove the sldt and str emulation for protected mode, because, as I understand from this thread, wine does not need those.
I see. I would lean toward keeping the emulation because I already implemented it :), for completeness, and because it is performed in a single switch. The bulk of the emulation code deals with operands.
But this is not for free. As Andy said, you will then need a syscall and a feature mask to be able to disable this emulation. And AFAIK you haven't implemented that yet, so there is something to consider.
You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have.
It would be great if you could do that, if you don't mind.
OK, scheduled for the weekend. I'll let you know.
But at least dosemu implements it, so probably it is needed.
Right.
Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and emulating it in the kernel?
Yes, that would be optimal if this does not severely break the current setups. If we can find out that smsw is not in real use, we can probably do exactly that. But other instructions are not in real use in v86 for sure, so I wouldn't add explicit test-cases to the kernel that would make you depend on some particular behaviour that no one may need. My objection was that we shouldn't write tests before we know exactly how we want this to work.
On Wed, 2017-03-29 at 23:55 +0300, Stas Sergeev wrote:
29.03.2017 07:38, Ricardo Neri wrote:
Probably you could also remove the sldt and str emulation for protected mode, because, as I understand from this thread, wine does not need those.
I see. I would lean toward keeping the emulation because I already implemented it :), for completeness, and because it is performed in a single switch. The bulk of the emulation code deals with operands.
But this is not for free. As Andy said, you will then need a syscall and a feature mask to be able to disable this emulation. And AFAIK you haven't implemented that yet, so there is something to consider.
Right, I see your point.
You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have.
It would be great if you could do that, if you don't mind.
OK, scheduled for the weekend. I'll let you know.
Thanks!
But at least dosemu implements it, so probably it is needed.
Right.
Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and emulating it in the kernel?
Yes, that would be optimal if this does not severely break the current setups. If we can find out that smsw is not in real use, we can probably do exactly that. But other instructions are not in real use in v86 for sure, so I wouldn't add explicit test-cases to the kernel that would make you depend on some particular behaviour that no one may need.
My objection was that we shouldn't write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.
30.03.2017 08:14, Ricardo Neri wrote:
But at least dosemu implements it, so probably it is needed.
Right.
Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and emulating it in the kernel?
Yes, that would be optimal if this does not severely break the current setups. If we can find out that smsw is not in real use, we can probably do exactly that. But other instructions are not in real use in v86 for sure, so I wouldn't add explicit test-cases to the kernel that would make you depend on some particular behaviour that no one may need. My objection was that we shouldn't write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.
In fact, smsw has an interesting property: no one will ever want to disable its in-kernel emulation to provide their own. So while I'll try to estimate its usage, emulating it in the kernel will not be that problematic in either case. As for protected mode, if wine only needs sgdt/sidt, then again, no one will want to disable their emulation. Not the case with sldt, but AFAICS wine doesn't need sldt, and so we can leave sldt without a fixup. Is my understanding correct? In this case, I suppose, we are well on the way to avoiding the extra syscalls to toggle the emulation features.
On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
30.03.2017 08:14, Ricardo Neri wrote:
But at least dosemu implements it, so probably it is needed.
Right.
Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and emulating it in the kernel?
Yes, that would be optimal if this does not severely break the current setups. If we can find out that smsw is not in real use, we can probably do exactly that. But other instructions are not in real use in v86 for sure, so I wouldn't add explicit test-cases to the kernel that would make you depend on some particular behaviour that no one may need. My objection was that we shouldn't write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.
In fact, smsw has an interesting property: no one will ever want to disable its in-kernel emulation to provide their own. So while I'll try to estimate its usage, emulating it in the kernel will not be that problematic in either case.
Ah good to know!
As for protected mode, if wine only needs sgdt/sidt, then again, no one will want to disable their emulation. Not the case with sldt, but AFAICS wine doesn't need sldt, and so we can leave sldt without a fixup. Is my understanding correct?
This is my understanding as well. I could not find any use of sldt in wine. Alexandre, would you mind confirming?
In this case, I suppose, we are well on the way to avoiding the extra syscalls to toggle the emulation features.
Great! Then I will keep the emulation for sgdt, sidt, and smsw but not for str and sldt; for both vm86 and protected mode. This seems to be the agreement.
Thanks and BR, Ricardo
Ricardo Neri ricardo.neri-calderon@linux.intel.com writes:
On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
30.03.2017 08:14, Ricardo Neri wrote:
But at least dosemu implements it, so probably it is needed.
Right.
Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and emulating it in the kernel?
Yes, that would be optimal if this does not severely break the current setups. If we can find out that smsw is not in real use, we can probably do exactly that. But other instructions are not in real use in v86 for sure, so I wouldn't add explicit test-cases to the kernel that would make you depend on some particular behaviour that no one may need. My objection was that we shouldn't write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.
In fact, smsw has an interesting property: no one will ever want to disable its in-kernel emulation to provide their own. So while I'll try to estimate its usage, emulating it in the kernel will not be that problematic in either case.
Ah good to know!
As for protected mode, if wine only needs sgdt/sidt, then again, no one will want to disable their emulation. Not the case with sldt, but AFAICS wine doesn't need sldt, and so we can leave sldt without a fixup. Is my understanding correct?
This is my understanding as well. I could not find any use of sldt in wine. Alexandre, would you mind confirming?
Some versions of the Themida software protection are known to use sldt as part of the virtual machine detection code [1]. The check currently fails because it expects the LDT to be zero, so the app is already broken, but sldt segfaulting would still cause a crash where there wasn't one before.
However, I'm only aware of one application using this, and being able to catch and emulate sldt ourselves would actually give us a chance to fix this app in newer Wine versions, so I'm not opposed to having it segfault.
In fact it would be nice to be able to make sidt/sgdt/etc. segfault too. I know a new syscall is a pain, but as far as Wine is concerned, being able to opt out from any emulation would be potentially useful.
[1] https://www.winehq.org/pipermail/wine-bugs/2008-February/094470.html
31.03.2017 17:11, Alexandre Julliard writes:
In fact it would be nice to be able to make sidt/sgdt/etc. segfault too. I know a new syscall is a pain,
Maybe arch_prctl() then?
On Fri, Mar 31, 2017 at 2:26 PM, Stas Sergeev stsp@list.ru wrote:
31.03.2017 17:11, Alexandre Julliard writes:
In fact it would be nice to be able to make sidt/sgdt/etc. segfault too. I know a new syscall is a pain,
Maybe arch_prctl() then?
I still like my idea of a generic mechanism to turn off backwards-compatibility things. After all, hardened programs should turn off UMIP fixups entirely. They should also turn off vsyscall emulation entirely, and I see no reason that these mechanisms should be different.
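Such a generic mechanism could amount to a per-task bitmask of backwards-compatibility fixups that a hardened program clears at startup. The flags and helpers below are entirely hypothetical; no such API existed at the time of this thread, and this is only a sketch of the idea:

```c
#include <stdint.h>

/* Hypothetical per-task legacy-compatibility features (illustrative only). */
#define LEGACY_UMIP_FIXUP    (1u << 0)  /* kernel emulation of sgdt/sidt/smsw */
#define LEGACY_VSYSCALL_EMU  (1u << 1)  /* vsyscall page emulation */

struct task_compat {
    uint32_t legacy_flags;  /* enabled fixups; all set by default */
};

/* New tasks start with every backwards-compatibility fixup enabled. */
static void compat_init(struct task_compat *tc)
{
    tc->legacy_flags = LEGACY_UMIP_FIXUP | LEGACY_VSYSCALL_EMU;
}

/* A hardened program would clear everything it does not need. */
static void compat_disable(struct task_compat *tc, uint32_t flags)
{
    tc->legacy_flags &= ~flags;
}

static int compat_enabled(const struct task_compat *tc, uint32_t flag)
{
    return (tc->legacy_flags & flag) != 0;
}
```

The point of keeping one mechanism is exactly what Andy notes: UMIP fixups and vsyscall emulation are both legacy accommodations, so one opt-out interface can cover both.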
--Andy
On Fri, 2017-03-31 at 16:11 +0200, Alexandre Julliard wrote:
Ricardo Neri ricardo.neri-calderon@linux.intel.com writes:
On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
30.03.2017 08:14, Ricardo Neri writes:
> But at least dosemu implements it, so probably it is needed.
Right.
> Of course if it is used by one of 100 DOS progs, then there is an option to just add its support to dosemu2 and pretend the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and emulating it in the kernel?
Yes, that would be optimal if this does not severely break the current setups. If we can find out that smsw is not in real use, we can probably do exactly that. But other instructions are not in real use in v86 for sure, so I wouldn't add explicit test cases to the kernel that make you depend on some particular behaviour that no one may need. My objection was that we shouldn't write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.
In fact, smsw has an interesting property, which is that no one will ever want to disable its in-kernel emulation to provide its own. So while I'll try to estimate its usage, emulating it in kernel will not be that problematic in either case.
Ah good to know!
As for protected mode, if wine only needs sgdt/sidt, then again, no one will want to disable its emulation. Not the case with sldt, but AFAICS wine doesn't need sldt, and so we can leave sldt without a fixup. Is my understanding correct?
This is my understanding as well. I could not find any use of sldt in wine. Alexandre, would you mind confirming?
Some versions of the Themida software protection are known to use sldt as part of the virtual machine detection code [1]. The check currently fails because it expects the LDT to be zero, so the app is already broken, but sldt segfaulting would still cause a crash where there wasn't one before.
However, I'm only aware of one application using this, and being able to catch and emulate sldt ourselves would actually give us a chance to fix this app in newer Wine versions, so I'm not opposed to having it segfault.
Great! Then this is in line with what we are aiming to do with dosemu2: not emulate str and sldt.
In fact it would be nice to be able to make sidt/sgdt/etc. segfault too. I know a new syscall is a pain, but as far as Wine is concerned, being able to opt out from any emulation would be potentially useful.
I see. I guess for now there should not be a problem with emulating sidt/sgdt/smsw, right? In this way we don't break current versions of Wine and programs using it. In a second phase we can introduce the syscall so that kernel fixups can be disabled. Does this make sense?
Thanks and BR, Ricardo
Ricardo Neri ricardo.neri-calderon@linux.intel.com writes:
On Fri, 2017-03-31 at 16:11 +0200, Alexandre Julliard wrote:
Ricardo Neri ricardo.neri-calderon@linux.intel.com writes:
On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
30.03.2017 08:14, Ricardo Neri writes:
In fact, smsw has an interesting property, which is that no one will ever want to disable its in-kernel emulation to provide its own. So while I'll try to estimate its usage, emulating it in kernel will not be that problematic in either case.
Ah good to know!
As for protected mode, if wine only needs sgdt/sidt, then again, no one will want to disable its emulation. Not the case with sldt, but AFAICS wine doesn't need sldt, and so we can leave sldt without a fixup. Is my understanding correct?
This is my understanding as well. I could not find any use of sldt in wine. Alexandre, would you mind confirming?
Some versions of the Themida software protection are known to use sldt as part of the virtual machine detection code [1]. The check currently fails because it expects the LDT to be zero, so the app is already broken, but sldt segfaulting would still cause a crash where there wasn't one before.
However, I'm only aware of one application using this, and being able to catch and emulate sldt ourselves would actually give us a chance to fix this app in newer Wine versions, so I'm not opposed to having it segfault.
Great! Then this is in line with what we are aiming to do with dosemu2: not emulate str and sldt.
In fact it would be nice to be able to make sidt/sgdt/etc. segfault too. I know a new syscall is a pain, but as far as Wine is concerned, being able to opt out from any emulation would be potentially useful.
I see. I guess for now there should not be a problem with emulating sidt/sgdt/smsw, right? In this way we don't break current versions of winehq and programs using it. In a phase two we can introduce the syscall so that kernel fixups can be disabled. Does this make sense?
Yes, that makes sense.
30.03.2017 08:14, Ricardo Neri writes:
You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have.
It would be great if you could do that, if you don't mind.
OK, scheduled to the week-end. I'll let you know.
Thanks!
OK, done the testing. It appears smsw is used in v86 by windows-3.1 and dos4gw at the very least, and these are the "major" apps. So doing without a fixup in v86 will not go unnoticed. Unfortunately this also means that KVM-vm86 should be properly tested. I have also found a weird program that does SGDT under v86. This causes "ERROR: SGDT not implemented" under dosemu, but the prog still works fine as it obviously does not care about the results. This app can easily be broken of course, if that makes any sense (likely not).
From: hpa@zytor.com
On April 1, 2017 6:08:43 AM PDT, Stas Sergeev stsp@list.ru wrote:
30.03.2017 08:14, Ricardo Neri writes:
> You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have.
It would be great if you could do that, if you don't mind.
OK, scheduled to the week-end. I'll let you know.
Thanks!
OK, done the testing. It appears smsw is used in v86 by windows-3.1 and dos4gw at the very least, and these are the "major" apps. So doing without a fixup in v86 will not go unnoticed. Unfortunately this also means that KVM-vm86 should be properly tested. I have also found a weird program that does SGDT under v86. This causes "ERROR: SGDT not implemented" under dosemu, but the prog still works fine as it obviously does not care about the results. This app can easily be broken of course, if that makes any sense (likely not).
Using SMSW to detect v86 mode is relatively common. pushf hides the VM flag, but SMSW is available, providing the v86 virtualization hole.
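The hole exists because smsw returns the low 16 bits of CR0, whose PE (Protection Enable) bit, bit 0, is set in protected mode and therefore also under v86. The bit test can be sketched in portable C; a real DOS program would obtain the value with the 16-bit instruction `smsw ax`, so the sampled word is passed in here as a parameter:

```c
#include <stdint.h>

#define MSW_PE 0x0001u  /* CR0.PE: set in protected mode, and thus in v86 */

/*
 * Decide from a sampled machine status word whether the CPU is in
 * protected/v86 mode rather than plain real mode.  This is the check
 * that pushf cannot perform, because the pushed flags image hides
 * the VM flag in v86 mode.
 */
static inline int running_protected(uint16_t msw)
{
    return (msw & MSW_PE) != 0;
}
```

This is why in-kernel emulation of smsw matters for dosemu: if the instruction faulted instead of returning a plausible MSW, such detection code in DOS programs would crash rather than report v86 mode.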
On Sat, 2017-04-01 at 16:08 +0300, Stas Sergeev wrote:
30.03.2017 08:14, Ricardo Neri writes:
> You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have.
It would be great if you could do that, if you don't mind.
OK, scheduled to the week-end. I'll let you know.
Thanks!
OK, done the testing. It appears smsw is used in v86 by windows-3.1 and dos4gw at the very least, and these are the "major" apps. So doing without a fixup in v86 will not go unnoticed. Unfortunately this also means that KVM-vm86 should be properly tested. I have also found a weird program that does SGDT under v86. This causes "ERROR: SGDT not implemented" under dosemu, but the prog still works fine as it obviously does not care about the results. This app can easily be broken of course, if that makes any sense (likely not).
Thanks for the input! Then it seems that we will need emulation for sgdt and smsw. Perhaps sidt? sldt and str will not need emulation in either protected mode or virtual-8086 mode. At a later stage I can look into working on the syscall as Andy proposes.
I will also look into the kvm-v86 path for dosemu2.
It seems we have an agreement :) Do we?
Thanks and BR, Ricardo
04.04.2017 05:05, Ricardo Neri writes:
On Sat, 2017-04-01 at 16:08 +0300, Stas Sergeev wrote:
30.03.2017 08:14, Ricardo Neri writes:
>> You know the wine's requirements now - they are very small. And dosemu doesn't need anything at all but smsw. And even smsw is very rare.
> But emulation is still needed for SMSW, right?
Likely so. If you want, I can enable the logging of this command and see if it is used by some of the DOS programs I have.
It would be great if you could do that, if you don't mind.
OK, scheduled to the week-end. I'll let you know.
Thanks!
OK, done the testing. It appears smsw is used in v86 by windows-3.1 and dos4gw at the very least, and these are the "major" apps. So doing without a fixup in v86 will not go unnoticed. Unfortunately this also means that KVM-vm86 should be properly tested. I have also found a weird program that does SGDT under v86. This causes "ERROR: SGDT not implemented" under dosemu, but the prog still works fine as it obviously does not care about the results. This app can easily be broken of course, if that makes any sense (likely not).
Thanks for the input! Then it seems that we will need emulation for sgdt and smsw.
I wouldn't claim we need an emulation of sgdt. One or two exotic apps do not count much, considering the overall small usage of dosemu and the ease of re-adding them to dosemu itself. So if it makes any sense to not add it for vm86, then please leave it omitted. However, it seems Andy wants overall completeness here, so let me just say I'll be fine with either option.
Perhaps sidt?
Only for overall completeness. If it makes any sense to omit it, please leave it omitted.
sldt and str will not need emulation in either protected mode or virtual-8086 mode. At a later stage I can look into working on the syscall as Andy proposes.
I will also look into the kvm-v86 path for dosemu2.
It seems we have an agreement :) Do we?
Yes, fine with me.
On Thu, 2017-03-09 at 18:39 -0800, Andy Lutomirski wrote:
On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev stsp@list.ru wrote:
09.03.2017 04:15, Ricardo Neri writes:
On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev stsp@list.ru wrote:
08.03.2017 19:06, Andy Lutomirski writes:
On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev stsp@list.ru wrote:
> 08.03.2017 03:32, Ricardo Neri writes:
>> These are the instructions covered by UMIP:
>> * SGDT - Store Global Descriptor Table
>> * SIDT - Store Interrupt Descriptor Table
>> * SLDT - Store Local Descriptor Table
>> * SMSW - Store Machine Status Word
>> * STR - Store Task Register
>> This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Can you remind me what was special about it? It looks like you still emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :) So I wonder what was cited to be special about v86.
Initially my patches disabled UMIP for tasks running in virtual-8086 mode, while leaving UMIP always enabled for protected mode. I didn't have emulation at the time. Then, I added emulation code that now covers both protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw #UD in v86? How does UMIP affect this? How does your patch affect this?
Er, right. Ricardo, your code may need fixing. But don't you have a test case for this? The behavior should be the same with and without your patches applied. The exception is #UD, not #GP, so maybe your code just never executes in the vm86 case.
Ouch! Yes, I am afraid my code will attempt to emulate sldt in vm86 mode. The test cases that I have for vm86 are only for the instructions that are valid in vm86: smsw, sidt and sgdt.
I will add test cases for str and sldt and make sure that a #UD is issued.
Would this trigger a v7 series?
Thanks and BR, Ricardo
--Andy
On Wed, 2017-03-08 at 17:08 +0300, Stas Sergeev wrote:
08.03.2017 03:32, Ricardo Neri writes:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Yes, this is the case. But at least in the past there was an attempt to support SLDT as it is used by the ancient Phar Lap DOS extender (currently unsupported by dosemu1/2). So how difficult would it be to add an optional possibility of delivering such a SIGSEGV to userspace so that the kernel's dummy emulation can be overridden?
I suppose a umip=noemulation kernel parameter could be added in this case.
It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
Would dosemu2 use 32-bit processes in order to keep segmentation? If it could use 64-bit processes, emulation is not used in this case and the SIGSEGV is delivered to user space.
Thanks and BR, Ricardo
09.03.2017 03:46, Ricardo Neri writes:
On Wed, 2017-03-08 at 17:08 +0300, Stas Sergeev wrote:
08.03.2017 03:32, Ricardo Neri writes:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Yes, this is the case. But at least in the past there was an attempt to support SLDT as it is used by the ancient Phar Lap DOS extender (currently unsupported by dosemu1/2). So how difficult would it be to add an optional possibility of delivering such a SIGSEGV to userspace so that the kernel's dummy emulation can be overridden?
I suppose a umip=noemulation kernel parameter could be added in this case.
Why? It doesn't need to be global: the app should be able to change that on its own. Note that no app currently requires this, so it's just for the future, and in the future the app can start using the new API for this, if you provide one.
It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
Would dosemu2 use 32-bit processes in order to keep segmentation? If it could use 64-bit processes, emulation is not used in this case and the SIGSEGV is delivered to user space.
It does use the mix: 64bit process but some segments are 32bit for DOS code.
On Fri, 2017-03-10 at 01:01 +0300, Stas Sergeev wrote:
09.03.2017 03:46, Ricardo Neri writes:
On Wed, 2017-03-08 at 17:08 +0300, Stas Sergeev wrote:
08.03.2017 03:32, Ricardo Neri writes:
These are the instructions covered by UMIP:
- SGDT - Store Global Descriptor Table
- SIDT - Store Interrupt Descriptor Table
- SLDT - Store Local Descriptor Table
- SMSW - Store Machine Status Word
- STR - Store Task Register
This patchset initially treated tasks running in virtual-8086 mode as a special case. However, I received clarification that DOSEMU[8] does not support applications that use these instructions.
Yes, this is the case. But at least in the past there was an attempt to support SLDT as it is used by the ancient Phar Lap DOS extender (currently unsupported by dosemu1/2). So how difficult would it be to add an optional possibility of delivering such a SIGSEGV to userspace so that the kernel's dummy emulation can be overridden?
I suppose a umip=noemulation kernel parameter could be added in this case.
Why? It doesn't need to be global: the app should be able to change that on its own. Note that no app currently requires this, so it's just for the future, and in the future the app can start using the new API for this, if you provide one.
Right, I missed this detail. Then yes, the API should be per-application, so that the SIGSEGV is relayed only to apps that request it.
It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
Would dosemu2 use 32-bit processes in order to keep segmentation? If it could use 64-bit processes, emulation is not used in this case and the SIGSEGV is delivered to user space.
It does use the mix: 64bit process but some segments are 32bit for DOS code.
Do you mean that dosemu2 will start as a 64-bit process and will jump to 32-bit code segments? My emulation code should work in this case as it will use segmentation in 32-bit code descriptors. Is there anything else needed?
Thanks and BR, Ricardo
11.03.2017 02:47, Ricardo Neri writes:
It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
Would dosemu2 use 32-bit processes in order to keep segmentation? If it could use 64-bit processes, emulation is not used in this case and the SIGSEGV is delivered to user space.
It does use the mix: 64bit process but some segments are 32bit for DOS code.
Do you mean that dosemu2 will start as a 64-bit process and will jump to 32-bit code segments?
Yes, so the offending insns are executed only in 32bit and 16bit segments, even if the process itself is 64bit. I guess you handle 16bit segments the same as 32bit ones.
My emulation code should work in this case as it will use segmentation in 32-bit code descriptors. Is there anything else needed?
If I understand you correctly, you are saying that SLDT executed in a 64bit code segment will inevitably segfault to userspace. If this is the case and it makes your code simpler, then it's perfectly fine with me as dosemu does not do this and 64bit DOS progs are not anticipated.
On Sat, 2017-03-11 at 02:58 +0300, Stas Sergeev wrote:
11.03.2017 02:47, Ricardo Neri пишет:
It doesn't need to be a matter of this particular patch set, i.e. this proposal should not trigger a v7 resend of all 21 patches. :) But it would be useful for the future development of dosemu2.
Would dosemu2 use 32-bit processes in order to keep segmentation? If it could use 64-bit processes, emulation is not used in this case and the SIGSEGV is delivered to user space.
It does use the mix: 64bit process but some segments are 32bit for DOS code.
Do you mean that dosemu2 will start as a 64-bit process and will jump to 32-bit code segments?
Yes, so the offending insns are executed only in 32bit and 16bit segments, even if the process itself is 64bit. I guess you handle 16bit segments the same as 32bit ones.
I have code to handle 16-bit and 32-bit address encodings differently. Segmentation is used if !user_64bit_mode(regs). In such a case, the emulation code will check the segment descriptor D flag and the address-size override prefix to determine the address size and use 16-bit or 32-bit address encodings as applicable.
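The rule described here can be modeled as follows: outside 64-bit mode, the code segment's D flag selects the default address size (D=1 gives 32-bit, D=0 gives 16-bit), and an address-size override prefix (0x67) selects the non-default size. This is an illustrative model of the logic, not the patch's actual code:

```c
/*
 * Effective address size, in bits, for an instruction executed in a
 * 16-bit or 32-bit code segment (i.e. !user_64bit_mode()).
 *
 *   cs_d_flag       - D flag of the code segment descriptor
 *   has_0x67_prefix - instruction carries an address-size override
 */
static int effective_addr_size(int cs_d_flag, int has_0x67_prefix)
{
    int default_size = cs_d_flag ? 32 : 16;

    /* the 0x67 prefix toggles to the non-default size */
    if (has_0x67_prefix)
        return default_size == 32 ? 16 : 32;
    return default_size;
}
```

This matters for the dosemu2 case above: a 64bit process jumping into 16bit DOS segments still gets the correct 16-bit address decoding, because the decision is made per code segment, not per process.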
My emulation code should work in this case as it will use segmentation in 32-bit code descriptors. Is there anything else needed?
If I understand you correctly, you are saying that SLDT executed in 64bit code segment, will inevitably segfault to userspace.
Correct.
If this is the case and it makes your code simpler, then it's perfectly fine with me as dosemu does not do this and 64bit DOS progs are not anticipated.
But if 32-bit or 16-bit code segments are used emulation will be used.
Thanks and BR, Ricardo
On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri ricardo.neri-calderon@linux.intel.com wrote:
This is v6 of this series. The five previous submissions can be found here [1], here [2], here[3], here[4], and here[5]. This version addresses the comments received in v4 plus improvements of the handling of emulation in 64-bit builds. Please see details in the change log.
Hi Ingo and Thomas-
I think this series is in good enough shape that you should consider making a topic branch (x86/umip?) for it so that it can soak in -next and further development can be done incrementally. In the unlikely event that a major problem shows up, you could skip the pull request to Linus for a cycle.
--Andy