This adds Mach thread priority support (both in the application and realtime band) and recalculates thread priorities when the process priority changes.
Part 3, which is still a bit WIP, deals with implementing priority boosts (for main threads and threads that are processing window messages), effectively fully replacing https://gitlab.winehq.org/wine/wine/-/merge_requests/1232.
Currently the implementation in this MR already technically overrides what https://gitlab.winehq.org/wine/wine/-/merge_requests/1232 does; if it makes sense, I can also revert it here.
I added a few comments regarding the Mach thread priority API usage, as there is limited documentation available and much was inferred from the source or by testing. If this is too verbose, I can also remove them.
-- 
v3: server: Apply Mach thread priorities after process tracing is initialized.
    server: Implement apply_thread_priority on macOS for realtime priorities.
    server: Implement apply_thread_priority on macOS for application priorities.
    kernel32/tests: Setting process priority on a terminated process should succeed.
    server: Also set thread priorities upon process priority change.
From: Marc-Aurel Zent marc_aurel@me.com
---
 server/process.c | 13 ++++++++++++-
 server/process.h |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/server/process.c b/server/process.c
index e06350f7311..a7408db1e96 100644
--- a/server/process.c
+++ b/server/process.c
@@ -1599,6 +1599,17 @@ DECL_HANDLER(get_process_vm_counters)
     release_object( process );
 }
 
+void set_process_priority( struct process *process, int priority )
+{
+    struct thread *thread;
+    process->priority = priority;
+
+    LIST_FOR_EACH_ENTRY( thread, &process->thread_list, struct thread, proc_entry )
+    {
+        set_thread_priority( thread, priority, thread->priority );
+    }
+}
+
 static void set_process_affinity( struct process *process, affinity_t affinity )
 {
     struct thread *thread;
@@ -1624,7 +1635,7 @@ DECL_HANDLER(set_process_info)
 
     if ((process = get_process_from_handle( req->handle, PROCESS_SET_INFORMATION )))
     {
-        if (req->mask & SET_PROCESS_INFO_PRIORITY) process->priority = req->priority;
+        if (req->mask & SET_PROCESS_INFO_PRIORITY) set_process_priority( process, req->priority );
         if (req->mask & SET_PROCESS_INFO_AFFINITY) set_process_affinity( process, req->affinity );
         if (req->mask & SET_PROCESS_INFO_TOKEN)
         {
diff --git a/server/process.h b/server/process.h
index 96814ab7cf8..9238d638f15 100644
--- a/server/process.h
+++ b/server/process.h
@@ -116,6 +116,7 @@ extern void kill_process( struct process *process, int violent_death );
 extern void kill_console_processes( struct thread *renderer, int exit_code );
 extern void detach_debugged_processes( struct debug_obj *debug_obj, int exit_code );
 extern void enum_processes( int (*cb)(struct process*, void*), void *user);
+extern void set_process_priority( struct process *process, int priority );
 
 /* console functions */
 extern struct thread *console_get_renderer( struct console *console );
From: Marc-Aurel Zent mzent@codeweavers.com
---
 dlls/kernel32/tests/loader.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/dlls/kernel32/tests/loader.c b/dlls/kernel32/tests/loader.c
index 2c7cc784be4..6cf6971ba04 100644
--- a/dlls/kernel32/tests/loader.c
+++ b/dlls/kernel32/tests/loader.c
@@ -3672,6 +3672,7 @@ static void test_ExitProcess(void)
     struct PROCESS_BASIC_INFORMATION_PRIVATE pbi;
     MEMORY_BASIC_INFORMATION mbi;
     DWORD_PTR affinity;
+    PROCESS_PRIORITY_CLASS ppc;
     void *addr;
     LARGE_INTEGER offset;
     SIZE_T size;
@@ -4011,6 +4012,10 @@ static void test_ExitProcess(void)
     affinity = 1;
     ret = pNtSetInformationProcess(pi.hProcess, ProcessAffinityMask, &affinity, sizeof(affinity));
     ok(ret == STATUS_PROCESS_IS_TERMINATING, "expected STATUS_PROCESS_IS_TERMINATING, got %#lx\n", ret);
+    ppc.Foreground = FALSE;
+    ppc.PriorityClass = PROCESS_PRIOCLASS_BELOW_NORMAL;
+    ret = pNtSetInformationProcess(pi.hProcess, ProcessPriorityClass, &ppc, sizeof(ppc));
+    ok(ret == STATUS_SUCCESS, "expected STATUS_SUCCESS, got status %#lx\n", ret);
 
     SetLastError(0xdeadbeef);
     ctx.ContextFlags = CONTEXT_INTEGER;
From: Marc-Aurel Zent mzent@codeweavers.com
---
 server/thread.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/server/thread.c b/server/thread.c
index 3c7e4541a09..ac01d7e4f01 100644
--- a/server/thread.c
+++ b/server/thread.c
@@ -40,6 +40,11 @@
 #ifdef HAVE_SYS_RESOURCE_H
 #include <sys/resource.h>
 #endif
+#ifdef __APPLE__
+#include <mach/mach_init.h>
+#include <mach/mach_port.h>
+#include <mach/thread_act.h>
+#endif
 
 #include "ntstatus.h"
 #define WIN32_NO_STATUS
@@ -252,6 +257,79 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
     setpriority( PRIO_PROCESS, thread->unix_tid, niceness );
 }
 
+#elif defined(__APPLE__)
+
+void init_threading(void)
+{
+}
+
+static int get_mach_importance( int base_priority )
+{
+    int min = -31, max = 32, range = max - min;
+    return min + (base_priority - 1) * range / 14;
+}
+
+static void apply_thread_priority( struct thread *thread, int base_priority )
+{
+    kern_return_t kr;
+    mach_msg_type_name_t type;
+    int throughput_qos, latency_qos;
+    struct thread_extended_policy thread_extended_policy;
+    struct thread_precedence_policy thread_precedence_policy;
+    mach_port_t thread_port, process_port = thread->process->trace_data;
+
+    if (!process_port) return;
+    kr = mach_port_extract_right( process_port, thread->unix_tid,
+                                  MACH_MSG_TYPE_COPY_SEND, &thread_port, &type );
+    if (kr != KERN_SUCCESS) return;
+    /* base priority 15 is for time-critical threads, so not compute-bound */
+    thread_extended_policy.timeshare = base_priority > 14 ? 0 : 1;
+    thread_precedence_policy.importance = get_mach_importance( base_priority );
+    /* adapted from the QoS table at xnu/osfmk/kern/thread_policy.c */
+    switch (thread->priority)
+    {
+    case THREAD_PRIORITY_IDLE: /* THREAD_QOS_MAINTENANCE */
+    case THREAD_PRIORITY_LOWEST: /* THREAD_QOS_BACKGROUND */
+        throughput_qos = THROUGHPUT_QOS_TIER_5;
+        latency_qos = LATENCY_QOS_TIER_3;
+        break;
+    case THREAD_PRIORITY_BELOW_NORMAL: /* THREAD_QOS_UTILITY */
+        throughput_qos = THROUGHPUT_QOS_TIER_2;
+        latency_qos = LATENCY_QOS_TIER_3;
+        break;
+    case THREAD_PRIORITY_NORMAL: /* THREAD_QOS_LEGACY */
+    case THREAD_PRIORITY_ABOVE_NORMAL: /* THREAD_QOS_USER_INITIATED */
+        throughput_qos = THROUGHPUT_QOS_TIER_1;
+        latency_qos = LATENCY_QOS_TIER_1;
+        break;
+    case THREAD_PRIORITY_HIGHEST: /* THREAD_QOS_USER_INTERACTIVE */
+        throughput_qos = THROUGHPUT_QOS_TIER_0;
+        latency_qos = LATENCY_QOS_TIER_0;
+        break;
+    case THREAD_PRIORITY_TIME_CRITICAL:
+    default: /* THREAD_QOS_UNSPECIFIED */
+        throughput_qos = THROUGHPUT_QOS_TIER_UNSPECIFIED;
+        latency_qos = LATENCY_QOS_TIER_UNSPECIFIED;
+        break;
+    }
+    /* QOS_UNSPECIFIED is assigned the highest tier available, so it does not provide a limit */
+    if (base_priority > THREAD_BASE_PRIORITY_LOWRT)
+    {
+        throughput_qos = THROUGHPUT_QOS_TIER_UNSPECIFIED;
+        latency_qos = LATENCY_QOS_TIER_UNSPECIFIED;
+    }
+
+    thread_policy_set( thread_port, THREAD_LATENCY_QOS_POLICY, (thread_policy_t)&latency_qos,
+                       THREAD_LATENCY_QOS_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_THROUGHPUT_QOS_POLICY, (thread_policy_t)&throughput_qos,
+                       THREAD_THROUGHPUT_QOS_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_EXTENDED_POLICY, (thread_policy_t)&thread_extended_policy,
+                       THREAD_EXTENDED_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_PRECEDENCE_POLICY, (thread_policy_t)&thread_precedence_policy,
+                       THREAD_PRECEDENCE_POLICY_COUNT );
+    mach_port_deallocate( mach_task_self(), thread_port );
+}
+
 #else
 
 void init_threading(void)
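To make the linear mapping in `get_mach_importance` easier to check, here is a minimal standalone sketch of the same arithmetic. The names `mach_importance_for` and `assumed_mach_priority_for` are hypothetical and not part of the patch. The second helper rests on an assumption the patch does not state: that the precedence importance is applied relative to a default task base priority of 31 (XNU's `BASEPRI_DEFAULT`).

```c
#include <assert.h>

/* Same arithmetic as get_mach_importance() in the patch:
 * NT base priorities 1..15 are spread linearly over importance -31..32. */
static int mach_importance_for( int base_priority )
{
    const int min = -31, max = 32, range = max - min; /* range == 63 */

    assert( base_priority >= 1 && base_priority <= 15 );
    return min + (base_priority - 1) * range / 14;    /* integer division truncates */
}

/* ASSUMPTION (not stated in the patch): the importance is added to a
 * default task base priority of 31 to give the scheduled Mach priority. */
static int assumed_mach_priority_for( int base_priority )
{
    return 31 + mach_importance_for( base_priority );
}
```

Under that assumption, NT base priority 1 lands on Mach priority 0, base priority 8 on 31, and base priority 15 on 63, so the non-realtime NT range covers priorities 0 through 63.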
From: Marc-Aurel Zent mzent@codeweavers.com
---
 server/thread.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/server/thread.c b/server/thread.c
index ac01d7e4f01..bb9b6f409c3 100644
--- a/server/thread.c
+++ b/server/thread.c
@@ -42,6 +42,7 @@
 #endif
 #ifdef __APPLE__
 #include <mach/mach_init.h>
+#include <mach/mach_time.h>
 #include <mach/mach_port.h>
 #include <mach/thread_act.h>
 #endif
@@ -258,9 +259,21 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
 }
 
 #elif defined(__APPLE__)
+static unsigned int mach_ticks_per_second;
 
 void init_threading(void)
 {
+    struct mach_timebase_info tb_info;
+    if (mach_timebase_info( &tb_info ) == KERN_SUCCESS)
+    {
+        mach_ticks_per_second = (tb_info.denom * 1000000000U) / tb_info.numer;
+    }
+    else
+    {
+        const unsigned int best_guess = 24000000U;
+        fprintf(stderr, "wine: mach_timebase_info failed, guessing %u mach ticks per second\n", best_guess);
+        mach_ticks_per_second = best_guess;
+    }
 }
 
 static int get_mach_importance( int base_priority )
@@ -327,6 +340,33 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
                        THREAD_EXTENDED_POLICY_COUNT );
     thread_policy_set( thread_port, THREAD_PRECEDENCE_POLICY, (thread_policy_t)&thread_precedence_policy,
                        THREAD_PRECEDENCE_POLICY_COUNT );
+    if (base_priority > THREAD_BASE_PRIORITY_LOWRT)
+    {
+        /* For realtime threads we are requesting from the scheduler to be moved
+         * into the Mach realtime band (96-127) above the kernel.
+         * The scheduler will bump us back into the application band though if we
+         * lie too much about our computation constraints...
+         * The maximum available amount of resources granted in that band is using
+         * half of the available bus cycles, and computation (nominally 1/10 of
+         * the time constraint) is a hint to the scheduler where to place our
+         * realtime threads relative to each other.
+         * If someone is violating the time contraint policy, they will be moved
+         * back where they were (non-timeshare application band with very high
+         * importance), which is on XNU equivalent to setting SCHED_RR with the
+         * pthread API. */
+        struct thread_time_constraint_policy thread_time_constraint_policy;
+        int realtime_priority = base_priority - THREAD_BASE_PRIORITY_LOWRT;
+        unsigned int max_constraint = mach_ticks_per_second / 2;
+        unsigned int max_computation = max_constraint / 10;
+        /* unfortunately we can't give a hint for the periodicity of calculations */
+        thread_time_constraint_policy.period = 0;
+        thread_time_constraint_policy.constraint = max_constraint;
+        thread_time_constraint_policy.computation = realtime_priority * max_computation / 16;
+        thread_time_constraint_policy.preemptible = thread->priority == THREAD_PRIORITY_TIME_CRITICAL ? 0 : 1;
+        thread_policy_set( thread_port, THREAD_TIME_CONSTRAINT_POLICY,
+                           (thread_policy_t)&thread_time_constraint_policy,
+                           THREAD_TIME_CONSTRAINT_POLICY_COUNT );
+    }
     mach_port_deallocate( mach_task_self(), thread_port );
 }
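As a worked example of the realtime arithmetic in this patch, the sketch below recomputes the time-constraint parameters without any Mach headers. The names `ticks_per_second`, `rt_params` and `rt_params_for` are hypothetical, and `LOWRT` mirrors `THREAD_BASE_PRIORITY_LOWRT` (15 in winnt.h). Unlike the patch, the tick computation here is done in 64 bits as a precaution, since a 32-bit `denom * 1000000000U` would wrap for any `denom` above 4.

```c
#include <assert.h>
#include <stdint.h>

#define LOWRT 15 /* mirrors THREAD_BASE_PRIORITY_LOWRT */

/* One Mach tick lasts numer/denom nanoseconds, so one second
 * contains 1e9 * denom / numer ticks. */
static uint64_t ticks_per_second( uint32_t numer, uint32_t denom )
{
    assert( numer != 0 );
    return (uint64_t)denom * 1000000000u / numer;
}

/* Hypothetical plain-data stand-in for thread_time_constraint_policy */
struct rt_params { uint32_t period, computation, constraint; };

/* Same formulas as the patch: the constraint is half a second worth of
 * ticks, and computation scales linearly up to 1/10 of the constraint. */
static struct rt_params rt_params_for( int base_priority, uint64_t ticks )
{
    struct rt_params p;
    int realtime_priority = base_priority - LOWRT;   /* 1..16 for NT 16..31 */
    uint32_t max_constraint = (uint32_t)(ticks / 2);
    uint32_t max_computation = max_constraint / 10;

    assert( base_priority > LOWRT && base_priority <= 31 );
    p.period = 0;                                    /* no periodicity hint */
    p.constraint = max_constraint;
    p.computation = realtime_priority * max_computation / 16;
    return p;
}
```

With the example inputs `numer = 125` and `denom = 3`, this gives 24,000,000 ticks per second, the same value as the patch's fallback guess. NT priority 16 then requests 75,000 ticks of computation and priority 31 requests 1,200,000, both against a constraint of 12,000,000 ticks.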
From: Marc-Aurel Zent mzent@codeweavers.com
---
 server/mach.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/server/mach.c b/server/mach.c
index c3d4a33bdc2..0472e7a6701 100644
--- a/server/mach.c
+++ b/server/mach.c
@@ -155,6 +155,9 @@ void init_process_tracing( struct process *process )
             mach_port_deallocate( mach_task_self(), msg.task_port.name );
         }
     }
+    /* On Mach thread priorities depend on having the process port available, so
+     * reapply all thread priorities here after process tracing is initialized */
+    set_process_priority( process, process->priority );
 }
 
 /* terminate the per-process tracing mechanism */
On Wed Feb 19 11:39:39 2025 +0000, Marc-Aurel Zent wrote:
changed this line in [version 3 of the diff](/wine/wine/-/merge_requests/7317/diffs?diff_id=158727&start_sha=961c0a18824ee5d04eacb29ffc7f1f90d597bdbe#c9d2907d0f5a89f79a28a80568c303e7f0683af1_1444_1432)
Thanks, should be fixed now...
I was wondering though if it would be better to use a `#ifdef USE_MACH` instead of `#ifdef __APPLE__` in thread.c now.
On Wed Feb 19 06:30:08 2025 +0000, Brendan Shanks wrote:
For what it's worth, Apple includes a game sample with the Game Porting Toolkit that creates a high-priority render thread using `SCHED_RR` and `sched_priority = 45` (take a look at `gptk-sample/08 - MetalRendering/README.md` in `Game_Porting_Toolkit_2.0.dmg`). I can also ask our Apple contact whether they'd recommend setting priorities into the realtime band.
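For readers unfamiliar with the pthread side of this, a minimal sketch of requesting `SCHED_RR` with `sched_priority = 45` through thread attributes might look like the following. This is not taken from Apple's sample; the helper name `init_rr_attr` is hypothetical, and only standard POSIX calls are used.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>
#include <sched.h>

/* Prepare attributes for a SCHED_RR thread with the given priority.
 * Returns 0 on success or an errno-style code on failure. */
static int init_rr_attr( pthread_attr_t *attr, int priority )
{
    struct sched_param param = { 0 };
    int err;

    assert( attr != NULL );
    if ((err = pthread_attr_init( attr ))) return err;
    /* use the attributes set below instead of inheriting the creator's policy */
    if ((err = pthread_attr_setinheritsched( attr, PTHREAD_EXPLICIT_SCHED ))) goto fail;
    /* set the policy first so the priority can be checked against its range */
    if ((err = pthread_attr_setschedpolicy( attr, SCHED_RR ))) goto fail;
    param.sched_priority = priority;
    if ((err = pthread_attr_setschedparam( attr, &param ))) goto fail;
    return 0;

fail:
    pthread_attr_destroy( attr );
    return err;
}
```

The prepared attributes would then be passed to `pthread_create` and released with `pthread_attr_destroy`. Whether thread creation succeeds depends on the platform; on Linux, for instance, `SCHED_RR` typically requires extra privileges.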
I believe the realtime band should mostly be used only by audio and video applications, so it is more of an audio or video server/driver kind of usage. (Wine currently does not implement per-thread priorities that are fully independent of the process priority class; that will come in part 3.) That is also where this API is used on macOS, for example in some VLC demux plugins or Jack2.
IIRC, on Windows at least administrative privileges are needed to use the NT realtime band, so it's not something games usually do. I tried to capture as much of the NT semantics as possible, including preemption with priority 31, as discussed [here](https://community.osr.com/t/thread-boost-and-dynamic-priority/58044/6) (I intentionally ignored the job object part for now, and I believe that isn't fully implemented anyway at the moment):
> As was mentioned your process will need REALTIME_PRIORITY_CLASS to reach priority 31 (or anything above 15). With a priority of 31, you can also receive “non-preemptive scheduling” from the dispatcher if you also create a job object for the process and set JobObjectBasicLimitInformation with a SchedulingClass of 9. Note that scheduling class is not the same as priority.
To give a thread an effective native scheduling priority of 45 with the current implementation, one could combine `ABOVE_NORMAL_PRIORITY_CLASS` with `THREAD_PRIORITY_ABOVE_NORMAL`. Alternatively, with part 3, `NORMAL_PRIORITY_CLASS` + `THREAD_PRIORITY_NORMAL` with a +3 boost (a fairly common value for the main thread on Windows) would work, as would anything else that results in an NT base priority of 11.
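The arithmetic behind those combinations can be sketched as follows. The helper `nt_priority` and the enum names are hypothetical; the class base values (8 for normal, 10 for above normal, and so on) are the documented Windows ones, and the `boost` parameter only models the part 3 boost described above. Feeding the resulting 11 into the patch's `get_mach_importance` formula gives -31 + 10 * 63 / 14 = 14, and 31 + 14 = 45 if one assumes the importance is applied on top of a default Mach base priority of 31.

```c
#include <assert.h>

/* Base priorities of the non-realtime NT priority classes */
enum nt_class_base
{
    NT_IDLE_CLASS         = 4,
    NT_BELOW_NORMAL_CLASS = 6,
    NT_NORMAL_CLASS       = 8,
    NT_ABOVE_NORMAL_CLASS = 10,
    NT_HIGH_CLASS         = 13
};

/* Combine a class base, a relative thread priority (-2..2, or +/-15 for
 * time-critical/idle) and an optional boost into an NT priority in 1..15. */
static int nt_priority( int class_base, int relative, int boost )
{
    int prio;

    assert( class_base >= 1 && class_base <= 15 );
    if (relative == 15) return 15;   /* THREAD_PRIORITY_TIME_CRITICAL */
    if (relative == -15) return 1;   /* THREAD_PRIORITY_IDLE */
    prio = class_base + relative + boost;
    if (prio > 15) prio = 15;        /* stay below the realtime band */
    if (prio < 1) prio = 1;
    return prio;
}
```

Both `nt_priority( NT_ABOVE_NORMAL_CLASS, 1, 0 )` and `nt_priority( NT_NORMAL_CLASS, 0, 3 )` evaluate to 11.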
This implementation differs from `SCHED_RR` in that it brings back thread QoS classes after setting the thread importance. I believe `SCHED_RR` also sets `thread_extended_policy.timeshare` to 0, which the current implementation only does for `THREAD_PRIORITY_TIME_CRITICAL` and the realtime band.
But it would be interesting to hear the input of an Apple contact on this as well.
On Wed Feb 19 12:30:51 2025 +0000, Marc-Aurel Zent wrote:
I also found this fairly good resource on the topic: https://youtu.be/jiuzW9IKCeE?si=rMtvCA_wgPs0gMl1&t=1272