On Fri, May 05, 2017 at 11:17:19AM -0700, Ricardo Neri wrote:
The feature User-Mode Instruction Prevention present in recent Intel processor prevents a group of instructions from being executed with CPL > 0. Otherwise, a general protection fault is issued.
This is one of the best opening paragraphs of a commit message I've read this year! This is how you open: short, succinct, to the point, no marketing bullshit. Good!
Rather than relaying this fault to the user space (in the form of a SIGSEGV signal), the instructions protected by UMIP can be emulated to provide dummy results. This allows to conserve the current kernel behavior and not reveal the system resources that UMIP intends to protect (the global descriptor and interrupt descriptor tables, the segment selectors of the local descriptor table and the task state and the machine status word).
This emulation is needed because certain applications (e.g., WineHQ and DOSEMU2) rely on this subset of instructions to function.
The instructions protected by UMIP can be split in two groups. Those who
s/who/which/
return a kernel memory address (sgdt and sidt) and those who return a
ditto.
value (sldt, str and smsw).
For the instructions that return a kernel memory address, applications such as WineHQ rely on the result being located in the kernel memory space. The result is emulated as a hard-coded value that, lies close to the top of the kernel memory. The limit for the GDT and the IDT are set to zero.
Nice.
Given that sldt and str are not used in common in programs supported by
You wanna say "in common programs" here? Or "not commonly used in programs" ?
WineHQ and DOSEMU2, they are not emulated.
The instruction smsw is emulated to return the value that the register CR0 has at boot time as set in the head_32.
Care is taken to appropriately emulate the results when segmentation is used. This is, rather than relying on USER_DS and USER_CS, the function
"That is,... "
insn_get_addr_ref() inspects the segment descriptor pointed by the registers in pt_regs. This ensures that we correctly obtain the segment base address and the address and operand sizes even if the user space application uses local descriptor table.
Btw, I could very well use all that nice explanation in umip.c too so that the high-level behavior is documented.
Cc: Andy Lutomirski luto@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Michael S. Tsirkin mst@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Tony Luck tony.luck@intel.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Liang Z. Li liang.z.li@intel.com Cc: Alexandre Julliard julliard@winehq.org Cc: Stas Sergeev stsp@list.ru Cc: x86@kernel.org Cc: linux-msdos@vger.kernel.org Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
arch/x86/include/asm/umip.h | 15 +++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/umip.c | 245 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 261 insertions(+) create mode 100644 arch/x86/include/asm/umip.h create mode 100644 arch/x86/kernel/umip.c
diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h new file mode 100644 index 0000000..077b236 --- /dev/null +++ b/arch/x86/include/asm/umip.h @@ -0,0 +1,15 @@ +#ifndef _ASM_X86_UMIP_H +#define _ASM_X86_UMIP_H
+#include <linux/types.h> +#include <asm/ptrace.h>
+#ifdef CONFIG_X86_INTEL_UMIP +bool fixup_umip_exception(struct pt_regs *regs); +#else +static inline bool fixup_umip_exception(struct pt_regs *regs) +{
- return false;
+}
Let's save some header lines:
static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; }
those trunks take too much space as it is.
+#endif /* CONFIG_X86_INTEL_UMIP */ +#endif /* _ASM_X86_UMIP_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4b99423..cc1b7cc 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -123,6 +123,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o +obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
ifdef CONFIG_FRAME_POINTER obj-y += unwind_frame.o diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c new file mode 100644 index 0000000..c7c5795 --- /dev/null +++ b/arch/x86/kernel/umip.c @@ -0,0 +1,245 @@ +/*
- umip.c Emulation for instruction protected by the Intel User-Mode
- Instruction Prevention. The instructions are:
- sgdt
- sldt
- sidt
- str
- smsw
- Copyright (c) 2017, Intel Corporation.
- Ricardo Neri ricardo.neri@linux.intel.com
- */
+#include <linux/uaccess.h> +#include <asm/umip.h> +#include <asm/traps.h> +#include <asm/insn.h> +#include <asm/insn-eval.h> +#include <linux/ratelimit.h>
+/*
- == Base addresses of GDT and IDT
- Some applications to function rely finding the global descriptor table (GDT)
That formulation reads funny.
- and the interrupt descriptor table (IDT) in kernel memory.
- For x86_32, the selected values do not match any particular hole, but it
- suffices to provide a memory location within kernel memory.
- == CRO flags for SMSW
- Use the flags given when booting, as found in head_32.S
- */
+#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
X86_CR0_WP | X86_CR0_AM)
Why not pull those up in asm/processor-flags.h or so and share the definition instead of duplicating it?
+#define UMIP_DUMMY_GDT_BASE 0xfffe0000 +#define UMIP_DUMMY_IDT_BASE 0xffff0000
+enum umip_insn {
- UMIP_SGDT = 0, /* opcode 0f 01 ModR/M reg 0 */
- UMIP_SIDT, /* opcode 0f 01 ModR/M reg 1 */
- UMIP_SLDT, /* opcode 0f 00 ModR/M reg 0 */
- UMIP_SMSW, /* opcode 0f 01 ModR/M reg 4 */
- UMIP_STR, /* opcode 0f 00 ModR/M reg 1 */
Let's stick to a single spelling: ModRM.reg=0, etc.
Better yet, use the SDM format:
UMIP_SGDT = 0, /* 0F 01 /0 */ UMIP_SIDT, /* 0F 01 /1 */ ...
+};
+/**
- __identify_insn() - Identify a UMIP-protected instruction
- @insn: Instruction structure with opcode and ModRM byte.
- From the instruction opcode and the reg part of the ModRM byte, identify,
- if any, a UMIP-protected instruction.
- Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure.
- */
+static int __identify_insn(struct insn *insn)
static enum umip_insn __identify_insn(...
But frankly, that enum looks pointless to me - it is used locally only and you can just as well use plain ints.
+{
- /* By getting modrm we also get the opcode. */
- insn_get_modrm(insn);
- /* All the instructions of interest start with 0x0f. */
- if (insn->opcode.bytes[0] != 0xf)
return -EINVAL;
- if (insn->opcode.bytes[1] == 0x1) {
switch (X86_MODRM_REG(insn->modrm.value)) {
case 0:
return UMIP_SGDT;
case 1:
return UMIP_SIDT;
case 4:
return UMIP_SMSW;
default:
return -EINVAL;
}
- }
- /* SLDT AND STR are not emulated */
- return -EINVAL;
+}
+/**
- __emulate_umip_insn() - Emulate UMIP instructions with dummy values
- @insn: Instruction structure with ModRM byte
- @umip_inst: Instruction to emulate
- @data: Buffer onto which the dummy values will be copied
- @data_size: Size of the emulated result
- Emulate an instruction protected by UMIP. The result of the emulation
- is saved in the provided buffer. The size of the results depends on both
- the instruction and type of operand (register vs memory address). Thus,
- the size of the result needs to be updated.
- Result: 0 if success, -EINVAL on failure to emulate
- */
+static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
unsigned char *data, int *data_size)
+{
- unsigned long dummy_base_addr;
- unsigned short dummy_limit = 0;
- unsigned int dummy_value = 0;
- switch (umip_inst) {
- /*
* These two instructions return the base address and limit of the
* global and interrupt descriptor table. The base address can be
* 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand
* size is 16-bit the returned value of the base address is supposed
* to be a zero-extended 24-byte number. However, it seems that a
* 32-byte number is always returned in legacy protected mode
* irrespective of the operand size.
*/
- case UMIP_SGDT:
/* fall through */
- case UMIP_SIDT:
if (umip_inst == UMIP_SGDT)
dummy_base_addr = UMIP_DUMMY_GDT_BASE;
else
dummy_base_addr = UMIP_DUMMY_IDT_BASE;
if (X86_MODRM_MOD(insn->modrm.value) == 3) {
/* SGDT and SIDT do not take register as argument. */
Comment above the if.
return -EINVAL;
}
So that check needs to go first, then the dummy_base_addr assignment.
memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
memcpy(data, &dummy_limit, sizeof(dummy_limit));
*data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);
Huh, that value will always be the same - why do you have a specific variable? It could be a define, once for 32-bit and once for 64-bit.
break;
- case UMIP_SMSW:
/*
* Even though CR0_STATE contain 4 bytes, the number
* of bytes to be copied in the result buffer is determined
* by whether the operand is a register or a memory location.
*/
dummy_value = CR0_STATE;
Something's wrong here: how does that local, write-only variable have any effect?
/*
* These two instructions return a 16-bit value. We return
* all zeros. This is equivalent to a null descriptor for
* str and sldt.
*/
/* SLDT and STR are not emulated */
/* fall through */
- case UMIP_SLDT:
/* fall through */
- case UMIP_STR:
/* fall through */
- default:
return -EINVAL;
That switch-case has a majority of fall-throughs. So make it an if-else instead.
- }
- return 0;
+}
+/**
- fixup_umip_exception() - Fixup #GP faults caused by UMIP
- @regs: Registers as saved when entering the #GP trap
- The instructions sgdt, sidt, str, smsw, sldt cause a general protection
- fault if with CPL > 0 (i.e., from user space). This function can be
- used to emulate the results of the aforementioned instructions with
- dummy values. Results are copied to user-space memory as indicated by
- the instruction pointed by EIP using the registers indicated in the
- instruction operands. This function also takes care of determining
- the address to which the results must be copied.
- */
+bool fixup_umip_exception(struct pt_regs *regs) +{
- struct insn insn;
- unsigned char buf[MAX_INSN_SIZE];
- /* 10 bytes is the maximum size of the result of UMIP instructions */
- unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
unsigned char dummy_data[10] = { 0 };
One 0 should be enough :)
- unsigned long seg_base;
- int not_copied, nr_copied, reg_offset, dummy_data_size;
- void __user *uaddr;
- unsigned long *reg_addr;
- enum umip_insn umip_inst;
- struct insn_code_seg_defaults seg_defs;
Please sort function local variables declaration in a reverse christmas tree order:
<type> longest_variable_name; <type> shorter_var_name; <type> even_shorter; <type> i;
- /*
* Use the segment base in case user space used a different code
* segment, either in protected (e.g., from an LDT) or virtual-8086
* modes. In most of the cases seg_base will be zero as in USER_CS.
*/
- seg_base = insn_get_seg_base(regs, &insn,
offsetof(struct pt_regs, ip));
Oh boy, where's the error handling?! That can return -1L.
- not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
-1L + regs->ip is then your pwnage.
sizeof(buf));
Just let them stick out.
- nr_copied = sizeof(buf) - not_copied;
<---- newline here.
- /*
* The copy_from_user above could have failed if user code is protected
()
* by a memory protection key. Give up on emulation in such a case.
* Should we issue a page fault?
Why? AFAICT, you're in the #GP handler. Simply you return unhandled.
*/
- if (!nr_copied)
return false;
- insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
- /*
* Override the default operand and address sizes to what is specified
* in the code segment descriptor. The instruction decoder only sets
* the address size it to either 4 or 8 address bytes and does nothing
* for the operand bytes. This OK for most of the cases, but we could
* have special cases where, for instance, a 16-bit code segment
* descriptor is used.
* If there are overrides, the instruction decoder correctly updates
* these values, even for 16-bit defaults.
*/
- seg_defs = insn_get_code_seg_defaults(regs);
- insn.addr_bytes = seg_defs.address_bytes;
- insn.opnd_bytes = seg_defs.operand_bytes;
- if (!insn.addr_bytes || !insn.opnd_bytes)
return false;
- if (user_64bit_mode(regs))
return false;
- insn_get_length(&insn);
- if (nr_copied < insn.length)
return false;
- umip_inst = __identify_insn(&insn);
- /* Check if we found an instruction protected by UMIP */
Put comment above the function call.
- if (umip_inst < 0)
return false;
- if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
return false;
- /* If operand is a register, write directly to it */
- if (X86_MODRM_MOD(insn.modrm.value) == 3) {
reg_offset = insn_get_modrm_rm_off(&insn, regs);
Grr, error handling!! That reg_offset can be -E<something>.
reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
memcpy(reg_addr, dummy_data, dummy_data_size);
- } else {
uaddr = insn_get_addr_ref(&insn, regs);
/* user address could not be determined, abort emulation */
That comment is kinda obvious. But yes, this has error handling.
if ((unsigned long)uaddr == -1L)
return false;
nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
if (nr_copied > 0)
return false;
- }
- /* increase IP to let the program keep going */
- regs->ip += insn.length;
- return true;
+}
2.9.3