This adds Mach thread priority support (both in the application and realtime band) and recalculates thread priorities when the process priority changes.
Part 3, which is still a bit WIP, deals with implementing priority boosts (for main threads and threads that are processing window messages), effectively replacing https://gitlab.winehq.org/wine/wine/-/merge_requests/1232 in full.
The implementation in this MR already technically overrides what https://gitlab.winehq.org/wine/wine/-/merge_requests/1232 does; if it makes sense, I can also revert it here.
I added a few comments about the Mach thread priority API usage, since little documentation is available and much was inferred from the source or by testing. If this is too verbose, I can remove them.
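As a rough illustration of the application-band mapping (a standalone sketch, not part of the patches themselves), the arithmetic used by `get_mach_importance()` in this series can be checked in isolation. The function name `mach_importance_for_base_priority` is hypothetical, and the sketch assumes a Windows base priority restricted to the range 1..15 for illustration.

```c
#include <assert.h>

/* Standalone copy of the get_mach_importance() arithmetic from this series.
 * Assumption of this sketch: base_priority is limited to the application
 * band, i.e. a Windows base priority in the range 1..15. */
static int mach_importance_for_base_priority( int base_priority )
{
    const int min = -31, max = 32, range = max - min;

    assert( base_priority >= 1 && base_priority <= 15 );
    /* linear interpolation over the 14 steps between priorities 1 and 15 */
    return min + (base_priority - 1) * range / 14;
}
```

With these constants, base priority 1 maps to an importance of -31, base priority 8 maps to 0, and base priority 15 maps to 32.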
-- 
v2: server: Re-apply thread priorities after process tracing is initialized.
    server: Implement apply_thread_priority on macOS for realtime priorities.
    server: Implement apply_thread_priority on macOS for application priorities.
    kernel32/tests: Setting process priority on a terminated process should succeed.
    server: Also set thread priorities upon process priority change.
From: Marc-Aurel Zent marc_aurel@me.com
---
 server/process.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/server/process.c b/server/process.c
index e06350f7311..1e48cc43014 100644
--- a/server/process.c
+++ b/server/process.c
@@ -1127,6 +1127,17 @@ int set_process_debug_flag( struct process *process, int flag )
     return write_process_memory( process, process->peb + 2, 1, &data );
 }
 
+static void set_process_priority( struct process *process, int priority )
+{
+    struct thread *thread;
+    process->priority = priority;
+
+    LIST_FOR_EACH_ENTRY( thread, &process->thread_list, struct thread, proc_entry )
+    {
+        set_thread_priority( thread, priority, thread->priority );
+    }
+}
+
 /* create a new process */
 DECL_HANDLER(new_process)
 {
@@ -1624,7 +1635,7 @@ DECL_HANDLER(set_process_info)
 
     if ((process = get_process_from_handle( req->handle, PROCESS_SET_INFORMATION )))
     {
-        if (req->mask & SET_PROCESS_INFO_PRIORITY) process->priority = req->priority;
+        if (req->mask & SET_PROCESS_INFO_PRIORITY) set_process_priority( process, req->priority );
         if (req->mask & SET_PROCESS_INFO_AFFINITY) set_process_affinity( process, req->affinity );
         if (req->mask & SET_PROCESS_INFO_TOKEN)
         {
From: Marc-Aurel Zent mzent@codeweavers.com
---
 dlls/kernel32/tests/loader.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/dlls/kernel32/tests/loader.c b/dlls/kernel32/tests/loader.c
index 2c7cc784be4..6cf6971ba04 100644
--- a/dlls/kernel32/tests/loader.c
+++ b/dlls/kernel32/tests/loader.c
@@ -3672,6 +3672,7 @@ static void test_ExitProcess(void)
     struct PROCESS_BASIC_INFORMATION_PRIVATE pbi;
     MEMORY_BASIC_INFORMATION mbi;
     DWORD_PTR affinity;
+    PROCESS_PRIORITY_CLASS ppc;
     void *addr;
     LARGE_INTEGER offset;
     SIZE_T size;
@@ -4011,6 +4012,10 @@ static void test_ExitProcess(void)
     affinity = 1;
     ret = pNtSetInformationProcess(pi.hProcess, ProcessAffinityMask, &affinity, sizeof(affinity));
     ok(ret == STATUS_PROCESS_IS_TERMINATING, "expected STATUS_PROCESS_IS_TERMINATING, got %#lx\n", ret);
+    ppc.Foreground = FALSE;
+    ppc.PriorityClass = PROCESS_PRIOCLASS_BELOW_NORMAL;
+    ret = pNtSetInformationProcess(pi.hProcess, ProcessPriorityClass, &ppc, sizeof(ppc));
+    ok(ret == STATUS_SUCCESS, "expected STATUS_SUCCESS, got status %#lx\n", ret);
 
     SetLastError(0xdeadbeef);
     ctx.ContextFlags = CONTEXT_INTEGER;
From: Marc-Aurel Zent mzent@codeweavers.com
---
 server/thread.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/server/thread.c b/server/thread.c
index 3c7e4541a09..ac01d7e4f01 100644
--- a/server/thread.c
+++ b/server/thread.c
@@ -40,6 +40,11 @@
 #ifdef HAVE_SYS_RESOURCE_H
 #include <sys/resource.h>
 #endif
+#ifdef __APPLE__
+#include <mach/mach_init.h>
+#include <mach/mach_port.h>
+#include <mach/thread_act.h>
+#endif
 
 #include "ntstatus.h"
 #define WIN32_NO_STATUS
@@ -252,6 +257,79 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
     setpriority( PRIO_PROCESS, thread->unix_tid, niceness );
 }
 
+#elif defined(__APPLE__)
+
+void init_threading(void)
+{
+}
+
+static int get_mach_importance( int base_priority )
+{
+    int min = -31, max = 32, range = max - min;
+    return min + (base_priority - 1) * range / 14;
+}
+
+static void apply_thread_priority( struct thread *thread, int base_priority )
+{
+    kern_return_t kr;
+    mach_msg_type_name_t type;
+    int throughput_qos, latency_qos;
+    struct thread_extended_policy thread_extended_policy;
+    struct thread_precedence_policy thread_precedence_policy;
+    mach_port_t thread_port, process_port = thread->process->trace_data;
+
+    if (!process_port) return;
+    kr = mach_port_extract_right( process_port, thread->unix_tid,
+                                  MACH_MSG_TYPE_COPY_SEND, &thread_port, &type );
+    if (kr != KERN_SUCCESS) return;
+    /* base priority 15 is for time-critical threads, so not compute-bound */
+    thread_extended_policy.timeshare = base_priority > 14 ? 0 : 1;
+    thread_precedence_policy.importance = get_mach_importance( base_priority );
+    /* adapted from the QoS table at xnu/osfmk/kern/thread_policy.c */
+    switch (thread->priority)
+    {
+    case THREAD_PRIORITY_IDLE: /* THREAD_QOS_MAINTENANCE */
+    case THREAD_PRIORITY_LOWEST: /* THREAD_QOS_BACKGROUND */
+        throughput_qos = THROUGHPUT_QOS_TIER_5;
+        latency_qos = LATENCY_QOS_TIER_3;
+        break;
+    case THREAD_PRIORITY_BELOW_NORMAL: /* THREAD_QOS_UTILITY */
+        throughput_qos = THROUGHPUT_QOS_TIER_2;
+        latency_qos = LATENCY_QOS_TIER_3;
+        break;
+    case THREAD_PRIORITY_NORMAL: /* THREAD_QOS_LEGACY */
+    case THREAD_PRIORITY_ABOVE_NORMAL: /* THREAD_QOS_USER_INITIATED */
+        throughput_qos = THROUGHPUT_QOS_TIER_1;
+        latency_qos = LATENCY_QOS_TIER_1;
+        break;
+    case THREAD_PRIORITY_HIGHEST: /* THREAD_QOS_USER_INTERACTIVE */
+        throughput_qos = THROUGHPUT_QOS_TIER_0;
+        latency_qos = LATENCY_QOS_TIER_0;
+        break;
+    case THREAD_PRIORITY_TIME_CRITICAL:
+    default: /* THREAD_QOS_UNSPECIFIED */
+        throughput_qos = THROUGHPUT_QOS_TIER_UNSPECIFIED;
+        latency_qos = LATENCY_QOS_TIER_UNSPECIFIED;
+        break;
+    }
+    /* QOS_UNSPECIFIED is assigned the highest tier available, so it does not provide a limit */
+    if (base_priority > THREAD_BASE_PRIORITY_LOWRT)
+    {
+        throughput_qos = THROUGHPUT_QOS_TIER_UNSPECIFIED;
+        latency_qos = LATENCY_QOS_TIER_UNSPECIFIED;
+    }
+
+    thread_policy_set( thread_port, THREAD_LATENCY_QOS_POLICY, (thread_policy_t)&latency_qos,
+                       THREAD_LATENCY_QOS_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_THROUGHPUT_QOS_POLICY, (thread_policy_t)&throughput_qos,
+                       THREAD_THROUGHPUT_QOS_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_EXTENDED_POLICY, (thread_policy_t)&thread_extended_policy,
+                       THREAD_EXTENDED_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_PRECEDENCE_POLICY, (thread_policy_t)&thread_precedence_policy,
+                       THREAD_PRECEDENCE_POLICY_COUNT );
+    mach_port_deallocate( mach_task_self(), thread_port );
+}
+
 #else
 
 void init_threading(void)
From: Marc-Aurel Zent mzent@codeweavers.com
---
 server/thread.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/server/thread.c b/server/thread.c
index ac01d7e4f01..bb9b6f409c3 100644
--- a/server/thread.c
+++ b/server/thread.c
@@ -42,6 +42,7 @@
 #endif
 #ifdef __APPLE__
 #include <mach/mach_init.h>
+#include <mach/mach_time.h>
 #include <mach/mach_port.h>
 #include <mach/thread_act.h>
 #endif
@@ -258,9 +259,21 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
 }
 
 #elif defined(__APPLE__)
+static unsigned int mach_ticks_per_second;
 
 void init_threading(void)
 {
+    struct mach_timebase_info tb_info;
+    if (mach_timebase_info( &tb_info ) == KERN_SUCCESS)
+    {
+        mach_ticks_per_second = (tb_info.denom * 1000000000U) / tb_info.numer;
+    }
+    else
+    {
+        const unsigned int best_guess = 24000000U;
+        fprintf(stderr, "wine: mach_timebase_info failed, guessing %u mach ticks per second\n", best_guess);
+        mach_ticks_per_second = best_guess;
+    }
 }
 
 static int get_mach_importance( int base_priority )
@@ -327,6 +340,33 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
                        THREAD_EXTENDED_POLICY_COUNT );
     thread_policy_set( thread_port, THREAD_PRECEDENCE_POLICY, (thread_policy_t)&thread_precedence_policy,
                        THREAD_PRECEDENCE_POLICY_COUNT );
+    if (base_priority > THREAD_BASE_PRIORITY_LOWRT)
+    {
+        /* For realtime threads we are requesting from the scheduler to be moved
+         * into the Mach realtime band (96-127) above the kernel.
+         * The scheduler will bump us back into the application band though if we
+         * lie too much about our computation constraints...
+         * The maximum available amount of resources granted in that band is using
+         * half of the available bus cycles, and computation (nominally 1/10 of
+         * the time constraint) is a hint to the scheduler where to place our
+         * realtime threads relative to each other.
+         * If someone is violating the time constraint policy, they will be moved
+         * back where they were (non-timeshare application band with very high
+         * importance), which is on XNU equivalent to setting SCHED_RR with the
+         * pthread API. */
+        struct thread_time_constraint_policy thread_time_constraint_policy;
+        int realtime_priority = base_priority - THREAD_BASE_PRIORITY_LOWRT;
+        unsigned int max_constraint = mach_ticks_per_second / 2;
+        unsigned int max_computation = max_constraint / 10;
+        /* unfortunately we can't give a hint for the periodicity of calculations */
+        thread_time_constraint_policy.period = 0;
+        thread_time_constraint_policy.constraint = max_constraint;
+        thread_time_constraint_policy.computation = realtime_priority * max_computation / 16;
+        thread_time_constraint_policy.preemptible = thread->priority == THREAD_PRIORITY_TIME_CRITICAL ? 0 : 1;
+        thread_policy_set( thread_port, THREAD_TIME_CONSTRAINT_POLICY,
+                           (thread_policy_t)&thread_time_constraint_policy,
+                           THREAD_TIME_CONSTRAINT_POLICY_COUNT );
+    }
     mach_port_deallocate( mach_task_self(), thread_port );
 }
From: Marc-Aurel Zent mzent@codeweavers.com
This is needed for Mach based thread priorities to take effect, since before that the process port was not known.
---
 server/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/server/process.c b/server/process.c
index 1e48cc43014..eb651a34594 100644
--- a/server/process.c
+++ b/server/process.c
@@ -1440,6 +1440,8 @@ DECL_HANDLER(init_process_done)
     process->start_time = current_time;
 
     init_process_tracing( process );
+    /* Re-apply all thread priorities here, after process tracing is initialized */
+    set_process_priority( process, process->priority );
     generate_startup_debug_events( process );
     set_process_startup_state( process, STARTUP_DONE );
On Tue Feb 18 10:41:19 2025 +0000, Marc-Aurel Zent wrote:
changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/7317/diffs?diff_id=158371&start_sha=1b5fe945f76aad4c1b2ccf5a0e00289e38921395#c9d2907d0f5a89f79a28a80568c303e7f0683af1_1414_1425)
Thanks, I declared it now above all requests. The idea was originally to put it next to `set_process_affinity()`, but this also works.
On Mon Feb 17 15:58:41 2025 +0000, Rémi Bernon wrote:
I don't really know about macOS, but I don't think implementing realtime priorities is a good idea. At least on Linux I would advise against it (and it needs specific permissions anyway). IMO it puts the system at a higher risk of becoming unresponsive in case of a bogus or rogue application.
The only thing needed on macOS to get a thread into the realtime band is the thread port, and asking the scheduler for reasonable time constraints.
In my experience even putting an entire game in the realtime band works fairly well (some threads get demoted though after a while); the performance is slightly worse though than staying in the high application band. Putting taskmgr into the realtime band makes it stay there indefinitely with no issues AFAICT too.
System responsiveness was also still fine, at least in my testing. It is also soft-realtime and during very high system load threads get put back to where they originally were relatively quickly (especially when not adhering to their computation constraints), which is probably why the API is so lax with permissions.
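To make the "reasonable time constraints" concrete, here is a minimal sketch of the arithmetic the realtime patch performs, written so it can be checked without the Mach headers. `struct rt_constraint` and both function names are hypothetical stand-ins, not Mach or Wine APIs. The value 15 for `THREAD_BASE_PRIORITY_LOWRT` is taken from the Windows headers, and the 64-bit intermediate in the timebase conversion is an assumption of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct thread_time_constraint_policy, so the
 * arithmetic can be checked without the Mach headers. */
struct rt_constraint
{
    uint32_t period;
    uint32_t computation;
    uint32_t constraint;
    int      preemptible;
};

/* Convert mach_timebase_info numer/denom into ticks per second.
 * Assumption of this sketch: a 64-bit intermediate is used so that
 * denom * 10^9 cannot wrap; the result is assumed to fit in 32 bits. */
static uint32_t mach_ticks_from_timebase( uint32_t numer, uint32_t denom )
{
    assert( numer != 0 );
    return (uint32_t)(((uint64_t)denom * 1000000000u) / numer);
}

/* Mirror of the realtime branch in the patch for base priorities 16..31. */
static struct rt_constraint rt_constraint_for( int base_priority, uint32_t ticks_per_second,
                                               int time_critical )
{
    const int lowrt = 15; /* THREAD_BASE_PRIORITY_LOWRT in the Windows headers */
    struct rt_constraint c;
    uint32_t max_constraint = ticks_per_second / 2;   /* at most half a second */
    uint32_t max_computation = max_constraint / 10;   /* nominally 1/10 of the constraint */

    assert( base_priority > lowrt && base_priority <= 31 );
    c.period = 0;                                     /* no periodicity hint */
    c.constraint = max_constraint;
    c.computation = (uint32_t)(base_priority - lowrt) * max_computation / 16;
    c.preemptible = time_critical ? 0 : 1;
    return c;
}
```

For a timebase of 125/3 this gives 24000000 ticks per second, the same value the patch uses as its fallback guess. At that rate the constraint is 12000000 ticks, and the computation hint ranges from 75000 ticks (base priority 16) to 1200000 ticks (base priority 31).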
On Tue Feb 18 11:05:58 2025 +0000, Rémi Bernon wrote:
Nit: wineserver style doesn't indent cases.
Thanks, should be fixed now in v2.
This merge request was approved by Rémi Bernon.
Alexandre Julliard (@julliard) commented about server/process.c:
     process->start_time = current_time;
     init_process_tracing( process );
+    /* Re-apply all thread priorities here, after process tracing is initialized */
+    set_process_priority( process, process->priority );
This should be called from the Mach code, it's not needed on Linux.
On Tue Feb 18 11:05:21 2025 +0000, Marc-Aurel Zent wrote:
The only thing needed on macOS to get a thread into the realtime band is the thread port, and asking the scheduler for reasonable time constraints. In my experience even putting an entire game in the realtime band works fairly well (some threads get demoted though after a while); the performance is slightly worse though than staying in the high application band. Putting taskmgr into the realtime band makes it stay there indefinitely with no issues AFAICT too. System responsiveness was also still fine, at least in my testing. It is also soft-realtime and during very high system load threads get put back to where they originally were relatively quickly (especially when not adhering to their computation constraints), which is probably why the API is so lax with permissions.
For what it's worth, Apple includes a game sample with the Game Porting Toolkit that creates a high-priority render thread using `SCHED_RR` and `sched_priority = 45` (take a look at `gptk-sample/08 - MetalRendering/README.md` in `Game_Porting_Toolkit_2.0.dmg`). I can also ask our Apple contact whether they'd recommend setting priorities into the realtime band.
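For readers unfamiliar with that pthread pattern, a minimal sketch of requesting `SCHED_RR` with a fixed priority through thread attributes might look like the following. This is not the code from Apple's sample; the helper name `init_rr_thread_attr` is hypothetical, and whether `pthread_create` later accepts such attributes depends on the platform and on the privileges of the calling process.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: fill `attr` so that a thread created with it asks for
 * SCHED_RR at the given priority. Returns 0 on success or an errno-style
 * error code; on failure `attr` is destroyed again. */
static int init_rr_thread_attr( pthread_attr_t *attr, int priority )
{
    struct sched_param param = { .sched_priority = priority };
    int err;

    assert( attr != NULL );
    if ((err = pthread_attr_init( attr ))) return err;
    /* without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's policy */
    if (!(err = pthread_attr_setinheritsched( attr, PTHREAD_EXPLICIT_SCHED )) &&
        !(err = pthread_attr_setschedpolicy( attr, SCHED_RR )) &&
        !(err = pthread_attr_setschedparam( attr, &param )))
        return 0;
    pthread_attr_destroy( attr );
    return err;
}
```

The resulting attribute object would then be passed as the second argument to `pthread_create` and released with `pthread_attr_destroy` afterwards.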