https://bugs.winehq.org/show_bug.cgi?id=46132
Bug ID: 46132
Summary: Multiple Windows 10 ARM64 apps crash with illegal instruction fault due to access of ARMv8 PMU cycle counter via 'PMCCNTR_EL0' register in EL0 (Linux kernel disallows access by default)
Product: Wine
Version: 3.20
Hardware: aarch64
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: -unknown
Assignee: wine-bugs@winehq.org
Reporter: focht@gmx.net
Distribution: ---
Hello folks,
just for documentation ...
--- snip ---
$ WINEDEBUG=+seh,+loaddll,+process,+relay,+msvcrt wine64 ./gatherosstate.exe >log.txt 2>&1
...
0009:trace:msvcrt:DllMain finished process init
0009:Ret PE DLL (proc=0x7fae7e3b04,module=0x7fae730000 L"msvcrt.dll",reason=PROCESS_ATTACH,res=0x22fc48) retval=1
0009:Call PE DLL (proc=0x7fae5fd5d8,module=0x7fae580000 L"advapi32.dll",reason=PROCESS_ATTACH,res=0x22fc48)
0009:Ret PE DLL (proc=0x7fae5fd5d8,module=0x7fae580000 L"advapi32.dll",reason=PROCESS_ATTACH,res=0x22fc48) retval=1
0009:Starting process L"Z:\home\focht\Downloads\ARM64-win10-apps\gatherosstate.exe" (entryproc=0x14002fd80)
0009:trace:process:NtQueryInformationProcess (0xffffffffffffffff,0x00000007,0x22f988,0x00000008,(nil))
0009:trace:seh:call_stack_handlers calling handler at 0x7b4d6330 code=c000001d flags=0
wine: Unhandled illegal instruction at address 0x14002fc0c (thread 0009), starting debugger...
0009:trace:seh:start_debugger Starting debugger "winedbg --auto 8 32"
--- snip ---
strace:
--- snip ---
2752 [000000014002fc0c] --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x14002fc0c} ---
--- snip ---
Disassembly around faulting instruction:
--- snip ---
start:
000000014002FD80  F3 53 BE A9  STP X19, X20, [SP,#-0x20+var_s0]!
000000014002FD84  F5 0B 00 F9  STR X21, [SP,#var_s10]
000000014002FD88  FD 7B BF A9  STP X29, X30, [SP,#var_10]!
000000014002FD8C  FD 03 00 91  MOV X29, SP
000000014002FD90  13 00 80 52  MOV W19, #0
000000014002FD94  9B FF FF 97  BL sub_14002FC00
000000014002FD98  F5 03 00 2A  MOV W21, W0
...
sub_14002FC00:
000000014002FC00  F3 0F 1F F8  STR X19, [SP,#-0x10+var_s0]!
000000014002FC04  FD 7B BF A9  STP X29, X30, [SP,#var_10]!
000000014002FC08  FD 03 00 91  MOV X29, SP
000000014002FC0C  08 9D 3B D5  MRS X8, #3, c9, c13, #0 ; *boom*
000000014002FC10  0A 03 00 B0  ADRP X10, #qword_140090670@PAGE
000000014002FC14  53 C1 19 91  ADD X19, X10, #qword_140090670@PAGEOFF
000000014002FC18  08 79 40 92  AND X8, X8, #0x7FFFFFFF
--- snip ---
Decoding the sysreg using ARM's ARMv8 Processor Technical Reference Manual:
MRS <Xt>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)
08 9D 3B D5 -> D5 3B 9D 08
|    D5    |     3B      |    9D     |    08     |
| 11010101 | 00 1 11 011 | 1001 1101 | 000 01000 |
|  OPCode  |    L o0 op1 | CRn  CRm  | op2 Rt    |

op0 = 3
op1 = 3
op2 = 0
CRn = 9
CRm = 13
Rt = 8
0x08,0x9d,0x3b,0xd5 = 'MRS X8, PMCCNTR_EL0'
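The manual decoding above can be reproduced with a few shifts and masks. The following is only a sketch of that field extraction; the struct and function names are made up for illustration and are not taken from Wine, QEMU or the kernel.

```c
#include <stdint.h>

/* Fields of an A64 system register move (MRS/MSR, register form).
 * Hypothetical helper type, for illustration only. */
struct sysreg_fields {
    unsigned is_read;                    /* L bit: 1 = MRS (read), 0 = MSR (write) */
    unsigned op0, op1, crn, crm, op2, rt;
};

/* A64 instructions are stored little-endian: 08 9D 3B D5 -> 0xD53B9D08. */
static uint32_t insn_from_le_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Returns 1 and fills *f if insn is an MRS/MSR (register) encoding, else 0. */
static int decode_sysreg_insn(uint32_t insn, struct sysreg_fields *f)
{
    /* Bits 31..22 are fixed to 1101010100 and bit 20 is set; bit 21 is L. */
    if ((insn & 0xFFD00000u) != 0xD5100000u)
        return 0;

    f->is_read = (insn >> 21) & 0x1;
    f->op0     = (insn >> 19) & 0x3;     /* bits 20..19, as in the table above */
    f->op1     = (insn >> 16) & 0x7;
    f->crn     = (insn >> 12) & 0xF;
    f->crm     = (insn >> 8)  & 0xF;
    f->op2     = (insn >> 5)  & 0x7;
    f->rt      = insn & 0x1F;
    return 1;
}
```

Feeding it the faulting bytes 08 9D 3B D5 gives a read with op0 = 3, op1 = 3, CRn = 9, CRm = 13, op2 = 0 and Rt = 8, matching the values worked out by hand.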
Microsoft docs:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?vie...
--- quote ---
Cycle counter
All ARMv8 CPUs are required to support a cycle counter register. This is a 64-bit register that Windows configures to be readable at any exception level (including user mode). It can be accessed via the special PMCCNTR_EL0 register, using the MSR opcode in assembly code, or the _ReadStatusReg intrinsic in C/C++ code.
Note that the cycle counter here is a true cycle counter, not a wall clock, and thus the counting frequency will vary with the processor frequency. If you feel you must know the frequency of the cycle counter, you should not be using the cycle counter. Instead, you want to measure wall clock time, for which you should use QueryPerformanceCounter.
--- quote ---
Defined as 'ReadTimeStampCounter()' in Windows 10 SDK header 'winnt.h'.
Current QEMU supports it in ARM64 system emulation (i.e. when not running on real hardware):
https://github.com/qemu/qemu/blob/master/target/arm/helper.c#L1410
--- snip ---
#ifndef CONFIG_USER_ONLY
...
    { .name = "PMCCNTR_EL0", .state = ARM_CP_STATE_AA64,
      .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 13, .opc2 = 0,
      .access = PL0_RW, .accessfn = pmreg_access_ccntr,
      .type = ARM_CP_IO,
      .readfn = pmccntr_read, .writefn = pmccntr_write, },
#endif
--- snip ---
The relevant piece of Linux kernel code:
https://github.com/torvalds/linux/commit/60792ad349f3c6dc5735aafefe5dc9121c7...
--- quote ---
arm64: kernel: enforce pmuserenr_el0 initialization and restore
The pmuserenr_el0 register value is architecturally UNKNOWN on reset. Current kernel code resets that register value iff the core pmu device is correctly probed in the kernel. On platforms with missing DT pmu nodes (or disabled perf events in the kernel), the pmu is not probed, therefore the pmuserenr_el0 register is not reset in the kernel, which means that its value retains the reset value that is architecturally UNKNOWN (system may run with eg pmuserenr_el0 == 0x1, which means that PMU counters access is available at EL0, which must be disallowed).
This patch adds code that resets pmuserenr_el0 on cold boot and restores it on core resume from shutdown, so that the pmuserenr_el0 setup is always enforced in the kernel.
--- quote ---
Since the Linux kernel maintainers disallow PMU counter access in EL0 by default, one has to patch the Linux kernel or use a helper kernel module. Windows 10 ARM64 likely has PMU counter access in EL0 enabled by default.
There were some attempts in the past to enable this, but they never made it upstream.
https://patchwork.kernel.org/patch/5217341/
https://github.com/zhiyisun/enable_arm_pmu/tree/dev
Anyway, nothing to fix in Wine here. Upstream (if at all).
Regards
André H. nerv@dawncrow.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nerv@dawncrow.de
--- Comment #1 from Anastasius Focht focht@gmx.net ---
Hello folks,
another valuable resource for improving/fixing Wine on ARM64 is the Chrome browser port to the Windows 10 ARM64 platform, which is currently underway and will continue for some months.
I'm following various Chromium and LLVM/Clang pull requests related to Win10 ARM64 porting activities.
Related PR to this ticket:
https://chromium-review.googlesource.com/c/chromium/src/+/1322363
https://chromium-review.googlesource.com/c/chromium/src/+/1322363/1/base/tim...
--- snip ---
#if defined(_M_ARM64)
#define ReadCycleCounter() _ReadStatusReg(ARM64_PMCCNTR_EL0)
#else  // X86 or X64
#define ReadCycleCounter() __rdtsc()
#endif
--- snip ---
which backs my current analysis/conclusions.
Regards
--- Comment #2 from André H. nerv@dawncrow.de ---
How do you see the chances of emulating it in dlls/ntoskrnl.exe/instr.c without messing up the application?
--- Comment #3 from Anastasius Focht focht@gmx.net ---
Hello André,
I didn't check what happens if the emulated counter increment is far off from the real one. There might be cases where the app relies on an accurate cycle counter increment, which is hard to approximate with a trap-and-emulate approach.
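For reference, the core of a trap-and-emulate step could look roughly like the sketch below. It is not Wine code: 'struct cpu_state' and 'emulate_mrs_pmccntr()' are hypothetical names, and the counter value is simply passed in by the caller, since how to approximate it is exactly the open question. A real handler would take the registers and PC from the signal context instead.

```c
#include <stdint.h>

/* MRS Xt, PMCCNTR_EL0 is 0xD53B9D00 | Rt; mask out the 5-bit Rt field. */
#define MRS_PMCCNTR_EL0_MASK 0xFFFFFFE0u
#define MRS_PMCCNTR_EL0_BITS 0xD53B9D00u

/* Hypothetical minimal register state, standing in for a signal context. */
struct cpu_state {
    uint64_t x[31];   /* X0..X30 */
    uint64_t pc;
};

/* If insn is 'MRS Xt, PMCCNTR_EL0', store fake_cycles into Xt, step over
 * the instruction and return 1. Otherwise leave the state alone, return 0. */
static int emulate_mrs_pmccntr(struct cpu_state *cpu, uint32_t insn,
                               uint64_t fake_cycles)
{
    unsigned rt;

    if ((insn & MRS_PMCCNTR_EL0_MASK) != MRS_PMCCNTR_EL0_BITS)
        return 0;

    rt = insn & 0x1F;
    if (rt != 31)               /* Rt == 31 encodes XZR: the result is discarded */
        cpu->x[rt] = fake_cycles;

    cpu->pc += 4;               /* A64 instructions are always 4 bytes */
    return 1;
}
```

With the faulting instruction from this report (0xD53B9D08 at 0x14002fc0c) the value ends up in X8 and execution would resume at 0x14002fc10. Each read would then cost a full signal round trip, which is part of why the increment is hard to keep realistic.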
Instead I introduced a Linux kernel patch that reverses the default policy. The patch from my own cross-dev Yocto/Poky-Wine project (sadly didn't find the time yet to prepare/polish everything for github):
--- snip ---
From 4299a8ad4013546deca47709a8c23f47f03c7cb0 Mon Sep 17 00:00:00 2001
From: <hidden>
Date: Sat, 1 Dec 2018 01:32:55 +0100
Subject: [PATCH] arm64: kernel: Enable PMU access from EL0.

Needed for Windows 10 ARM64 apps which read the PMU cycle counter from EL0.
See Wine bug #46132 [1] for details.
The kernel default policy of denying EL0 access to PMU was introduced in [2].

[1]: https://bugs.winehq.org/show_bug.cgi?id=46132
[2]: https://github.com/torvalds/linux/commit/60792ad349f3c6dc5735aafefe5dc9121c7...

---
 arch/arm64/include/asm/assembler.h | 15 +++++++++++++++
 arch/arm64/mm/proc.S               |  4 ++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 0bcc98dbba56..23f0aba5fc8d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -438,6 +438,21 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU
 9000:
 	.endm
 
+/*
+ * enable_pmuserenr_el0 - Enable PMU access from EL0 if PMUv3 present
+ */
+	.macro	enable_pmuserenr_el0, tmpreg
+	mrs	\tmpreg, id_aa64dfr0_el1	// Check ID_AA64DFR0_EL1 PMUVer
+	sbfx	\tmpreg, \tmpreg, #8, #4
+	cmp	\tmpreg, #1			// Skip if no PMU present
+	b.lt	9100f
+	mov	\tmpreg, #1 << 0		// EL0 access enable
+	orr	\tmpreg, \tmpreg, #1 << 2	// Cycle counter read enable
+	orr	\tmpreg, \tmpreg, #1 << 3	// Event counter read enable
+	msr	pmuserenr_el0, \tmpreg		// enable PMU access from EL0
+9100:
+	.endm
+
 /*
  * copy_page - copy src to dest using temp registers t1-t8
  */
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 03646e6a2ef4..98a5a1bf3430 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -137,7 +137,7 @@ alternative_endif
 	 */
 	ubfx	x11, x11, #1, #1
 	msr	oslar_el1, x11
-	reset_pmuserenr_el0 x0			// Disable PMU access from EL0
+	enable_pmuserenr_el0 x0			// Enable PMU access from EL0
 
 alternative_if ARM64_HAS_RAS_EXTN
 	msr_s	SYS_DISR_EL1, xzr
@@ -410,7 +410,7 @@ ENTRY(__cpu_setup)
 	msr	mdscr_el1, x0			// access to the DCC from EL0
 	isb					// Unmask debug exceptions now,
 	enable_dbg				// since this is per-cpu
-	reset_pmuserenr_el0 x0			// Disable PMU access from EL0
+	enable_pmuserenr_el0 x0			// Enable PMU access from EL0
 	/*
 	 * Memory region attributes for LPAE:
 	 *
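To make the macro's logic easier to follow, here is the same decision written out in C. This is purely an illustration of what 'enable_pmuserenr_el0' computes; the constant and function names are made up, and the code only calculates the value, it does not touch any system register.

```c
#include <stdint.h>

/* PMUSERENR_EL0 bits set by the macro (names are illustrative). */
#define PMUSERENR_EN (1u << 0)   /* EL0 access enable */
#define PMUSERENR_CR (1u << 2)   /* Cycle counter read enable */
#define PMUSERENR_ER (1u << 3)   /* Event counter read enable */

/* PMUVer is ID_AA64DFR0_EL1 bits [11:8]. The macro extracts it with
 * 'sbfx #8, #4', i.e. as a signed 4-bit value, so 0xF becomes -1. */
static int pmuver_signed(uint64_t id_aa64dfr0)
{
    int v = (int)((id_aa64dfr0 >> 8) & 0xF);
    return v >= 8 ? v - 16 : v;
}

/* Returns 1 and stores the value the macro would write to PMUSERENR_EL0,
 * or returns 0 if the macro would skip the write (PMUVer < 1). */
static int pmuserenr_enable_value(uint64_t id_aa64dfr0, uint32_t *value)
{
    if (pmuver_signed(id_aa64dfr0) < 1)
        return 0;
    *value = PMUSERENR_EN | PMUSERENR_CR | PMUSERENR_ER;
    return 1;
}
```

For a core reporting PMUVer = 1 the resulting value is 0xD. A PMUVer field of 0 or 0xF is treated as "no PMUv3" and the register is left untouched.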
Anastasius Focht focht@gmx.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|-unknown                    |ntdll

--- Comment #4 from Anastasius Focht focht@gmx.net ---
Hello folks,
revisiting, the issue is obviously still present.
https://github.com/torvalds/linux/blob/42595ce90b9d4a6b9d8c5a1ea78da4eeaf7e0...
https://github.com/torvalds/linux/blob/42595ce90b9d4a6b9d8c5a1ea78da4eeaf7e0...
$ wine --version
wine-5.9
Regards