[PATCH v3 0/5] MR7317: Server-side thread priorities implementation (Part 2)
This adds Mach thread priority support (in both the application and realtime bands) and recalculates thread priorities when the process priority changes.

Part 3, which is still a bit WIP, deals with implementing priority boosts (for main threads and threads that are processing window messages), effectively fully replacing https://gitlab.winehq.org/wine/wine/-/merge_requests/1232.

The implementation in this MR already technically overrides what https://gitlab.winehq.org/wine/wine/-/merge_requests/1232 does; if it makes sense, I can also revert it here.

I added a few comments regarding the Mach thread priority API usage, as there is limited documentation available and much was inferred from the source or by testing. If this is too verbose, I can also remove them.

--
v3: server: Apply Mach thread priorities after process tracing is initialized.
    server: Implement apply_thread_priority on macOS for realtime priorities.
    server: Implement apply_thread_priority on macOS for application priorities.
    kernel32/tests: Setting process priority on a terminated process should succeed.
    server: Also set thread priorities upon process priority change.

https://gitlab.winehq.org/wine/wine/-/merge_requests/7317
From: Marc-Aurel Zent <marc_aurel(a)me.com>

---
 server/process.c | 13 ++++++++++++-
 server/process.h |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/server/process.c b/server/process.c
index e06350f7311..a7408db1e96 100644
--- a/server/process.c
+++ b/server/process.c
@@ -1599,6 +1599,17 @@ DECL_HANDLER(get_process_vm_counters)
     release_object( process );
 }
 
+void set_process_priority( struct process *process, int priority )
+{
+    struct thread *thread;
+    process->priority = priority;
+
+    LIST_FOR_EACH_ENTRY( thread, &process->thread_list, struct thread, proc_entry )
+    {
+        set_thread_priority( thread, priority, thread->priority );
+    }
+}
+
 static void set_process_affinity( struct process *process, affinity_t affinity )
 {
     struct thread *thread;
@@ -1624,7 +1635,7 @@ DECL_HANDLER(set_process_info)
 
     if ((process = get_process_from_handle( req->handle, PROCESS_SET_INFORMATION )))
     {
-        if (req->mask & SET_PROCESS_INFO_PRIORITY) process->priority = req->priority;
+        if (req->mask & SET_PROCESS_INFO_PRIORITY) set_process_priority( process, req->priority );
         if (req->mask & SET_PROCESS_INFO_AFFINITY) set_process_affinity( process, req->affinity );
         if (req->mask & SET_PROCESS_INFO_TOKEN)
         {
diff --git a/server/process.h b/server/process.h
index 96814ab7cf8..9238d638f15 100644
--- a/server/process.h
+++ b/server/process.h
@@ -116,6 +116,7 @@ extern void kill_process( struct process *process, int violent_death );
 extern void kill_console_processes( struct thread *renderer, int exit_code );
 extern void detach_debugged_processes( struct debug_obj *debug_obj, int exit_code );
 extern void enum_processes( int (*cb)(struct process*, void*), void *user);
+extern void set_process_priority( struct process *process, int priority );
 
 /* console functions */
 extern struct thread *console_get_renderer( struct console *console );
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/7317
From: Marc-Aurel Zent <mzent(a)codeweavers.com>

---
 dlls/kernel32/tests/loader.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/dlls/kernel32/tests/loader.c b/dlls/kernel32/tests/loader.c
index 2c7cc784be4..6cf6971ba04 100644
--- a/dlls/kernel32/tests/loader.c
+++ b/dlls/kernel32/tests/loader.c
@@ -3672,6 +3672,7 @@ static void test_ExitProcess(void)
     struct PROCESS_BASIC_INFORMATION_PRIVATE pbi;
     MEMORY_BASIC_INFORMATION mbi;
     DWORD_PTR affinity;
+    PROCESS_PRIORITY_CLASS ppc;
     void *addr;
     LARGE_INTEGER offset;
     SIZE_T size;
@@ -4011,6 +4012,10 @@ static void test_ExitProcess(void)
     affinity = 1;
     ret = pNtSetInformationProcess(pi.hProcess, ProcessAffinityMask, &affinity, sizeof(affinity));
     ok(ret == STATUS_PROCESS_IS_TERMINATING, "expected STATUS_PROCESS_IS_TERMINATING, got %#lx\n", ret);
+    ppc.Foreground = FALSE;
+    ppc.PriorityClass = PROCESS_PRIOCLASS_BELOW_NORMAL;
+    ret = pNtSetInformationProcess(pi.hProcess, ProcessPriorityClass, &ppc, sizeof(ppc));
+    ok(ret == STATUS_SUCCESS, "expected STATUS_SUCCESS, got status %#lx\n", ret);
 
     SetLastError(0xdeadbeef);
     ctx.ContextFlags = CONTEXT_INTEGER;
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/7317
From: Marc-Aurel Zent <mzent(a)codeweavers.com>

---
 server/thread.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/server/thread.c b/server/thread.c
index 3c7e4541a09..ac01d7e4f01 100644
--- a/server/thread.c
+++ b/server/thread.c
@@ -40,6 +40,11 @@
 #ifdef HAVE_SYS_RESOURCE_H
 #include <sys/resource.h>
 #endif
+#ifdef __APPLE__
+#include <mach/mach_init.h>
+#include <mach/mach_port.h>
+#include <mach/thread_act.h>
+#endif
 
 #include "ntstatus.h"
 #define WIN32_NO_STATUS
@@ -252,6 +257,79 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
     setpriority( PRIO_PROCESS, thread->unix_tid, niceness );
 }
 
+#elif defined(__APPLE__)
+
+void init_threading(void)
+{
+}
+
+static int get_mach_importance( int base_priority )
+{
+    int min = -31, max = 32, range = max - min;
+    return min + (base_priority - 1) * range / 14;
+}
+
+static void apply_thread_priority( struct thread *thread, int base_priority )
+{
+    kern_return_t kr;
+    mach_msg_type_name_t type;
+    int throughput_qos, latency_qos;
+    struct thread_extended_policy thread_extended_policy;
+    struct thread_precedence_policy thread_precedence_policy;
+    mach_port_t thread_port, process_port = thread->process->trace_data;
+
+    if (!process_port) return;
+    kr = mach_port_extract_right( process_port, thread->unix_tid,
+                                  MACH_MSG_TYPE_COPY_SEND, &thread_port, &type );
+    if (kr != KERN_SUCCESS) return;
+    /* base priority 15 is for time-critical threads, so not compute-bound */
+    thread_extended_policy.timeshare = base_priority > 14 ? 0 : 1;
+    thread_precedence_policy.importance = get_mach_importance( base_priority );
+    /* adapted from the QoS table at xnu/osfmk/kern/thread_policy.c */
+    switch (thread->priority)
+    {
+    case THREAD_PRIORITY_IDLE: /* THREAD_QOS_MAINTENANCE */
+    case THREAD_PRIORITY_LOWEST: /* THREAD_QOS_BACKGROUND */
+        throughput_qos = THROUGHPUT_QOS_TIER_5;
+        latency_qos = LATENCY_QOS_TIER_3;
+        break;
+    case THREAD_PRIORITY_BELOW_NORMAL: /* THREAD_QOS_UTILITY */
+        throughput_qos = THROUGHPUT_QOS_TIER_2;
+        latency_qos = LATENCY_QOS_TIER_3;
+        break;
+    case THREAD_PRIORITY_NORMAL: /* THREAD_QOS_LEGACY */
+    case THREAD_PRIORITY_ABOVE_NORMAL: /* THREAD_QOS_USER_INITIATED */
+        throughput_qos = THROUGHPUT_QOS_TIER_1;
+        latency_qos = LATENCY_QOS_TIER_1;
+        break;
+    case THREAD_PRIORITY_HIGHEST: /* THREAD_QOS_USER_INTERACTIVE */
+        throughput_qos = THROUGHPUT_QOS_TIER_0;
+        latency_qos = LATENCY_QOS_TIER_0;
+        break;
+    case THREAD_PRIORITY_TIME_CRITICAL:
+    default: /* THREAD_QOS_UNSPECIFIED */
+        throughput_qos = THROUGHPUT_QOS_TIER_UNSPECIFIED;
+        latency_qos = LATENCY_QOS_TIER_UNSPECIFIED;
+        break;
+    }
+    /* QOS_UNSPECIFIED is assigned the highest tier available, so it does not provide a limit */
+    if (base_priority > THREAD_BASE_PRIORITY_LOWRT)
+    {
+        throughput_qos = THROUGHPUT_QOS_TIER_UNSPECIFIED;
+        latency_qos = LATENCY_QOS_TIER_UNSPECIFIED;
+    }
+
+    thread_policy_set( thread_port, THREAD_LATENCY_QOS_POLICY, (thread_policy_t)&latency_qos,
+                       THREAD_LATENCY_QOS_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_THROUGHPUT_QOS_POLICY, (thread_policy_t)&throughput_qos,
+                       THREAD_THROUGHPUT_QOS_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_EXTENDED_POLICY, (thread_policy_t)&thread_extended_policy,
+                       THREAD_EXTENDED_POLICY_COUNT );
+    thread_policy_set( thread_port, THREAD_PRECEDENCE_POLICY, (thread_policy_t)&thread_precedence_policy,
+                       THREAD_PRECEDENCE_POLICY_COUNT );
+    mach_port_deallocate( mach_task_self(), thread_port );
+}
+
 #else
 
 void init_threading(void)
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/7317
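For readers who want to see what the linear mapping in `get_mach_importance` produces, here is a minimal standalone sketch. It copies the arithmetic from the patch; the function names `mach_importance_from_base_priority` and `print_importance_table` are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Same arithmetic as get_mach_importance() in the patch: map an NT base
 * priority in the application band (1..15) linearly onto the importance
 * range -31..32. Integer division truncates, so the steps are not all equal. */
static int mach_importance_from_base_priority( int base_priority )
{
    const int min = -31, max = 32, range = max - min;
    return min + (base_priority - 1) * range / 14;
}

/* Hypothetical helper (not part of the patch): print the whole mapping. */
static void print_importance_table( void )
{
    int prio;
    for (prio = 1; prio <= 15; prio++)
        printf( "NT base priority %2d -> importance %3d\n",
                prio, mach_importance_from_base_priority( prio ) );
}
```

With this arithmetic, base priority 1 maps to -31, base priority 8 (a normal thread in a normal-class process) maps to 0, and base priority 15 maps to 32.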
From: Marc-Aurel Zent <mzent(a)codeweavers.com>

---
 server/thread.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/server/thread.c b/server/thread.c
index ac01d7e4f01..bb9b6f409c3 100644
--- a/server/thread.c
+++ b/server/thread.c
@@ -42,6 +42,7 @@
 #endif
 #ifdef __APPLE__
 #include <mach/mach_init.h>
+#include <mach/mach_time.h>
 #include <mach/mach_port.h>
 #include <mach/thread_act.h>
 #endif
@@ -258,9 +259,21 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
 }
 
 #elif defined(__APPLE__)
+static unsigned int mach_ticks_per_second;
 
 void init_threading(void)
 {
+    struct mach_timebase_info tb_info;
+    if (mach_timebase_info( &tb_info ) == KERN_SUCCESS)
+    {
+        mach_ticks_per_second = (tb_info.denom * 1000000000U) / tb_info.numer;
+    }
+    else
+    {
+        const unsigned int best_guess = 24000000U;
+        fprintf(stderr, "wine: mach_timebase_info failed, guessing %u mach ticks per second\n", best_guess);
+        mach_ticks_per_second = best_guess;
+    }
 }
 
 static int get_mach_importance( int base_priority )
@@ -327,6 +340,33 @@ static void apply_thread_priority( struct thread *thread, int base_priority )
                        THREAD_EXTENDED_POLICY_COUNT );
     thread_policy_set( thread_port, THREAD_PRECEDENCE_POLICY, (thread_policy_t)&thread_precedence_policy,
                        THREAD_PRECEDENCE_POLICY_COUNT );
+    if (base_priority > THREAD_BASE_PRIORITY_LOWRT)
+    {
+        /* For realtime threads we are requesting from the scheduler to be moved
+         * into the Mach realtime band (96-127) above the kernel.
+         * The scheduler will bump us back into the application band though if we
+         * lie too much about our computation constraints...
+         * The maximum available amount of resources granted in that band is using
+         * half of the available bus cycles, and computation (nominally 1/10 of
+         * the time constraint) is a hint to the scheduler where to place our
+         * realtime threads relative to each other.
+         * If someone is violating the time contraint policy, they will be moved
+         * back where they were (non-timeshare application band with very high
+         * importance), which is on XNU equivalent to setting SCHED_RR with the
+         * pthread API. */
+        struct thread_time_constraint_policy thread_time_constraint_policy;
+        int realtime_priority = base_priority - THREAD_BASE_PRIORITY_LOWRT;
+        unsigned int max_constraint = mach_ticks_per_second / 2;
+        unsigned int max_computation = max_constraint / 10;
+        /* unfortunately we can't give a hint for the periodicity of calculations */
+        thread_time_constraint_policy.period = 0;
+        thread_time_constraint_policy.constraint = max_constraint;
+        thread_time_constraint_policy.computation = realtime_priority * max_computation / 16;
+        thread_time_constraint_policy.preemptible = thread->priority == THREAD_PRIORITY_TIME_CRITICAL ? 0 : 1;
+        thread_policy_set( thread_port, THREAD_TIME_CONSTRAINT_POLICY,
+                           (thread_policy_t)&thread_time_constraint_policy,
+                           THREAD_TIME_CONSTRAINT_POLICY_COUNT );
+    }
     mach_port_deallocate( mach_task_self(), thread_port );
 }
 
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/7317
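The realtime arithmetic above can be checked without any Mach headers. The sketch below is not part of the patch: `ticks_per_second`, `struct rt_constraint` and `rt_constraint_for` are hypothetical names, and the value 15 for `THREAD_BASE_PRIORITY_LOWRT` is taken from the Windows headers. One deliberate difference from the patch is that the tick conversion is done in 64 bits, since `denom * 1000000000U` evaluated in 32-bit unsigned arithmetic would wrap for any denominator of 5 or more.

```c
#include <stdint.h>

/* Value of THREAD_BASE_PRIORITY_LOWRT in the Windows headers. */
#define NT_BASE_PRIORITY_LOWRT 15

/* One Mach tick lasts numer/denom nanoseconds, so one second is
 * 1e9 * denom / numer ticks. Computed in 64 bits to avoid wrap-around. */
static uint64_t ticks_per_second( uint32_t numer, uint32_t denom )
{
    if (!numer) return 0;
    return (UINT64_C(1000000000) * denom) / numer;
}

/* Hypothetical stand-in for the fields of thread_time_constraint_policy. */
struct rt_constraint
{
    uint32_t period;
    uint32_t computation;
    uint32_t constraint;
    int      preemptible;
};

/* Mirrors the patch: constraint is half a second worth of ticks, computation
 * is at most 1/10 of that, scaled by the position (1..16) in the NT realtime
 * band; only time-critical threads are marked non-preemptible. */
static struct rt_constraint rt_constraint_for( uint64_t ticks_per_sec,
                                               int base_priority, int time_critical )
{
    struct rt_constraint c;
    uint64_t realtime_priority = (uint64_t)(base_priority - NT_BASE_PRIORITY_LOWRT);
    uint64_t max_constraint = ticks_per_sec / 2;
    uint64_t max_computation = max_constraint / 10;

    c.period = 0; /* no periodicity hint, as in the patch */
    c.constraint = (uint32_t)max_constraint;
    c.computation = (uint32_t)(realtime_priority * max_computation / 16);
    c.preemptible = time_critical ? 0 : 1;
    return c;
}
```

As a worked example, a timebase of 125/3 yields 24,000,000 ticks per second, which is the same figure as the patch's fallback guess. With that tick rate, base priority 31 gives a constraint of 12,000,000 ticks and a computation of 1,200,000 ticks, while base priority 16 gives a computation of 75,000 ticks.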
From: Marc-Aurel Zent <mzent(a)codeweavers.com>

---
 server/mach.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/server/mach.c b/server/mach.c
index c3d4a33bdc2..0472e7a6701 100644
--- a/server/mach.c
+++ b/server/mach.c
@@ -155,6 +155,9 @@ void init_process_tracing( struct process *process )
             mach_port_deallocate( mach_task_self(), msg.task_port.name );
         }
     }
+    /* On Mach thread priorities depend on having the process port available, so
+     * reapply all thread priorities here after process tracing is initialized */
+    set_process_priority( process, process->priority );
 }
 
 /* terminate the per-process tracing mechanism */
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/7317
On Wed Feb 19 11:39:39 2025 +0000, Marc-Aurel Zent wrote:
> changed this line in [version 3 of the diff](/wine/wine/-/merge_requests/7317/diffs?diff_id=158727&start_sha=961c0a18824ee5d04eacb29ffc7f1f90d597bdbe#c9d2907d0f5a89f79a28a80568c303e7f0683af1_1444_1432)

Thanks, should be fixed now...
I was wondering though if it would be better to use a `#ifdef USE_MACH` instead of `#ifdef __APPLE__` in thread.c now. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/7317#note_95156
On Wed Feb 19 06:30:08 2025 +0000, Brendan Shanks wrote:
> For what it's worth, Apple includes a game sample with the Game Porting Toolkit that creates a high-priority render thread using `SCHED_RR` and `sched_priority = 45` (take a look at `gptk-sample/08 - MetalRendering/README.md` in `Game_Porting_Toolkit_2.0.dmg`). I can also ask our Apple contact whether they'd recommend setting priorities into the realtime band.

The realtime band should mostly be used only by audio and video applications, I believe (and Wine currently does not implement per-thread priorities that are completely independent of the process priority class anyway; that will come in part 3), so it is more of an audio or video server/driver kind of usage. That is also where this API is used on macOS, for example in some VLC-demux plugins or Jack2.

IIRC, on Windows at least administrative privileges are needed to use the NT realtime band, so it's not something games usually do. I tried to capture as much of the NT semantics as possible, including preemption with priority 31, as discussed [here](https://community.osr.com/t/thread-boost-and-dynamic-priority/58044/6) (I intentionally ignored the job object part for now, though, and I believe it isn't fully implemented anyway at the moment):

> As was mentioned your process will need REALTIME_PRIORITY_CLASS to reach priority 31 (or anything above 15). With a priority of 31, you can also receive “non-preemptive scheduling” from the dispatcher if you also create a job object for the process and set JobObjectBasicLimitInformation with a SchedulingClass of 9. Note that scheduling class is not the same as priority.

To get an effective native scheduling priority of 45 on a thread, with the current implementation one could combine `ABOVE_NORMAL_PRIORITY_CLASS` + `THREAD_PRIORITY_ABOVE_NORMAL`. Alternatively, with part 3, `NORMAL_PRIORITY_CLASS` + `THREAD_PRIORITY_NORMAL` with a +3 boost (a fairly common value for the main thread on Windows) would work, as would anything else that results in an NT base priority of 11.

This implementation differs from `SCHED_RR` in that it brings back thread QoS classes after setting thread importance. I also believe `SCHED_RR` sets `thread_extended_policy.timeshare` to 0, which the current implementation only does for `THREAD_PRIORITY_TIME_CRITICAL` and the realtime band. It would be interesting to hear the input of an Apple contact on this as well.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7317#note_95162
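The arithmetic behind the "priority 45" remark can be sketched as follows. This is only an illustration under stated assumptions: the class base values are the documented Windows ones, the importance formula is copied from the patch, and the default Mach priority of 31 for a user thread (`BASEPRI_DEFAULT` in the XNU sources) is an assumption that is not stated anywhere in this thread. The helper names are hypothetical.

```c
/* Base priorities of the Windows priority classes in the dynamic band. */
enum nt_class_base
{
    NT_CLASS_IDLE         = 4,
    NT_CLASS_BELOW_NORMAL = 6,
    NT_CLASS_NORMAL       = 8,
    NT_CLASS_ABOVE_NORMAL = 10,
    NT_CLASS_HIGH         = 13
};

/* Assumption: default Mach priority of a user thread (BASEPRI_DEFAULT in XNU). */
#define ASSUMED_MACH_BASEPRI_DEFAULT 31

/* Hypothetical helper: class base + relative thread priority (-2..2) + boost,
 * clamped to the application band 1..15. */
static int nt_effective_priority( int class_base, int thread_delta, int boost )
{
    int prio = class_base + thread_delta + boost;
    if (prio < 1) prio = 1;
    if (prio > 15) prio = 15;
    return prio;
}

/* Same formula as get_mach_importance() in the patch. */
static int importance_for( int base_priority )
{
    const int min = -31, max = 32, range = max - min;
    return min + (base_priority - 1) * range / 14;
}

/* Expected Mach priority if importance is added to the assumed default. */
static int expected_mach_priority( int base_priority )
{
    return ASSUMED_MACH_BASEPRI_DEFAULT + importance_for( base_priority );
}
```

Both combinations mentioned above, `nt_effective_priority( NT_CLASS_ABOVE_NORMAL, 1, 0 )` and `nt_effective_priority( NT_CLASS_NORMAL, 0, 3 )`, give 11. The importance for 11 is 14, so under the assumed default of 31 the resulting Mach priority is 45.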
On Wed Feb 19 12:30:51 2025 +0000, Marc-Aurel Zent wrote:
This is also a fairly good resource on the topic that I found: https://youtu.be/jiuzW9IKCeE?si=rMtvCA_wgPs0gMl1&t=1272
-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7317#note_95202