This merge request implements several NUMA functions previously stubbed in kernel32 and kernelbase, adds a basic NUMA node discovery/topology layer, and enriches the associated tests. It also improves the traceability of SetThreadGroupAffinity.
## Context / Motivation Some Windows applications (game engines, middleware, runtimes) query the NUMA API to adapt memory allocation or thread distribution. The lack of an implementation returned errors (ERROR_CALL_NOT_IMPLEMENTED) or unhelpful values, which could degrade the internal heuristics of these programs. This first implementation provides: - A logical topology derived from GetLogicalProcessorInformation. - A reasonable approximation of available memory per node. - Consistent processor masks for the present nodes. It prepares for future optimizations (targeted memory allocation, better scheduling strategies) without modifying the existing behavior of generic allocations.
## Main Changes - `kernel32/process.c`: - Implementation of GetNumaNodeProcessorMask, GetNumaAvailableMemoryNode / Ex, GetNumaProcessorNode / Ex, GetNumaProximityNode. - Parameter validation and consistent error propagation (ERROR_INVALID_PARAMETER). - `kernelbase/memory.c`: - New NUMA infrastructure (topology cache, lazy initialization, dedicated critical lock). - Topology reading via GetLogicalProcessorInformation. - Runtime options via environment variables: - WINE_NUMA_FORCE_SINGLE: Force a single logical node. - WINE_NUMA_CONTIG: Remap masks to produce contiguous blocks. - Implementations of GetNumaHighestNodeNumber, GetNumaNodeProcessorMaskEx, GetNumaProximityNodeEx. - Robust fallback: if no NUMA info → single node. - `kernelbase/thread.c`: - Added detailed traces in SetThreadGroupAffinity (removed the redundant DECLSPEC_HOTPATCH here). - Tests (`dlls/kernel32/tests/process.c`): - Added a new test, test_NumaBasic, covering: - GetNumaHighestNodeNumber - GetNumaNodeProcessorMaskEx (nodes 0 and 1) - GetNumaProximityNodeEx - Tolerant behavior: accepts `ERROR_INVALID_FUNCTION` / `ERROR_INVALID_PARAMETER` depending on the platform. - Added the `WINE_DEFAULT_DEBUG_CHANNEL(numa)` debug channel for the subsystem.
## Assumptions / Limitations - Support for a single processor group (Group = 0) for now. - Memory approximation: equal division of available physical memory (improvable later with internal counters per node). - Proximity = node (simplistic direct mapping). - No impact yet on VirtualAlloc / Heap allocation by node.
## Security / Concurrency - Initialization protected by dedicated critical section (numa_cs). - Thread-safe lazy read. - Table bounded to 64 nodes (historical Windows limit).
## Compatibility Impact - Improves compatibility with software probing the NUMA API. - Low risk of regression: previously failed paths now return TRUE with consistent data. - In case of topology collection failure → single-node fallback (conservative behavior).
## Validation / Tests - New test_NumaBasic added and integrated into the process suite. - Traces (numa channel) allow for detection diagnostics. - Invalid parameters tested (NULL, nodes out of range). - Works in environments without real NUMA via fallback.
## Environment Variables (quick documentation) - WINE_NUMA_FORCE_SINGLE=1: Forces a single node (mask covering all CPUs). - WINE_NUMA_CONTIG=1: Reallocates compact bit blocks per node (useful if the topology returns sparse masks).
## Potential Next Steps (not included) - Implement true memory tracking per node (via allocation hooks). - Multi-group support (PROCESSOR_GROUP_INFO). - Improved VirtualAllocExNuma / First-touch implementation. - More accurate proximity-to-node mapping on complex NUMA platforms. - Dedicated tests for environment variables.
## Potential Risks / Regressions - Applications relying on the absence of an API may slightly change their strategy (low). - Masks remapped with WINE_NUMA_CONTIG could surprise a profiling tool (opt-in option). - Memory approximation too coarse for very fine-grained heuristics (no functional regression expected).
## Request for Review - Verify logging conventions and TRACE_(numa) usage. - Verify the relevance of removing DECLSPEC_HOTPATCH on SetThreadGroupAffinity (alignment with local conventions). - Opinion on error granularity (ERROR_INVALID_PARAMETER vs. ERROR_INVALID_FUNCTION) for more accurate mimicry.
Once the kernel can handle those functions directly (in a NUMA module i.e.) we could use this implementation as a fallback when the kernel doesn't support NUMA natively (when the module cannot be loaded).
-- v2: kernelbase: Improve initialization of NUMA information to handle pathological cases
From: wasertech danny@waser.tech
--- dlls/kernel32/process.c | 134 +++++++++++++++++++++---- dlls/kernel32/tests/process.c | 63 ++++++++++++ dlls/kernelbase/memory.c | 180 +++++++++++++++++++++++++++++++--- dlls/kernelbase/thread.c | 25 ++++- 4 files changed, 373 insertions(+), 29 deletions(-)
diff --git a/dlls/kernel32/process.c b/dlls/kernel32/process.c index 23ec8ce7601..61cbc6a60f4 100644 --- a/dlls/kernel32/process.c +++ b/dlls/kernel32/process.c @@ -40,6 +40,11 @@
WINE_DEFAULT_DEBUG_CHANNEL(process);
+/* NUMA prototypes */ +extern BOOL WINAPI GetNumaHighestNodeNumber( ULONG *node ); +extern BOOL WINAPI GetNumaNodeProcessorMaskEx( USHORT node, GROUP_AFFINITY *mask ); +extern BOOL WINAPI GetNumaProximityNodeEx( ULONG proximity_id, USHORT *node ); + static const struct _KUSER_SHARED_DATA *user_shared_data = (struct _KUSER_SHARED_DATA *)0x7ffe0000;
typedef struct @@ -763,9 +768,16 @@ BOOL WINAPI GetFirmwareType(FIRMWARE_TYPE *type) */ BOOL WINAPI GetNumaNodeProcessorMask(UCHAR node, PULONGLONG mask) { - FIXME("(%c %p): stub\n", node, mask); - SetLastError(ERROR_CALL_NOT_IMPLEMENTED); - return FALSE; + GROUP_AFFINITY affinity; + TRACE("GetNumaNodeProcessorMask(node=%u, mask=%p)\n", node, mask); + if (!mask) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + if (!GetNumaNodeProcessorMaskEx(node, &affinity)) return FALSE; + *mask = affinity.Mask; + return TRUE; }
/********************************************************************** @@ -773,9 +785,25 @@ BOOL WINAPI GetNumaNodeProcessorMask(UCHAR node, PULONGLONG mask) */ BOOL WINAPI GetNumaAvailableMemoryNode(UCHAR node, PULONGLONG available_bytes) { - FIXME("(%c %p): stub\n", node, available_bytes); - SetLastError(ERROR_CALL_NOT_IMPLEMENTED); - return FALSE; + MEMORYSTATUSEX status; + ULONG highest; + TRACE("GetNumaAvailableMemoryNode(node=%u, avail=%p)\n", node, available_bytes); + if (!available_bytes) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + if (!GetNumaHighestNodeNumber(&highest)) return FALSE; + if (node > highest) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + status.dwLength = sizeof(status); + if (!GlobalMemoryStatusEx(&status)) return FALSE; + /* Approximation: distributes free memory among known nodes */ + *available_bytes = status.ullAvailPhys / (highest + 1); + return TRUE; }
/********************************************************************** @@ -783,9 +811,8 @@ BOOL WINAPI GetNumaAvailableMemoryNode(UCHAR node, PULONGLONG available_bytes) */ BOOL WINAPI GetNumaAvailableMemoryNodeEx(USHORT node, PULONGLONG available_bytes) { - FIXME("(%hu %p): stub\n", node, available_bytes); - SetLastError(ERROR_CALL_NOT_IMPLEMENTED); - return FALSE; + /* Same approximation as the 8-bit version */ + return GetNumaAvailableMemoryNode((UCHAR)node, available_bytes); }
/*********************************************************************** @@ -793,14 +820,41 @@ BOOL WINAPI GetNumaAvailableMemoryNodeEx(USHORT node, PULONGLONG available_bytes */ BOOL WINAPI GetNumaProcessorNode(UCHAR processor, PUCHAR node) { - TRACE("(%d, %p)\n", processor, node); - - if (processor < system_info.NumberOfProcessors) + ULONG highest, n; + GROUP_AFFINITY affinity; + TRACE("GetNumaProcessorNode(proc=%u, node=%p)\n", processor, node); + if (!node) { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + /* Basic check */ + if (processor >= system_info.NumberOfProcessors) + { + *node = 0xFF; + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + if (!GetNumaHighestNodeNumber(&highest)) + { + /* if failed -> consider single-node */ *node = 0; return TRUE; } - + if (highest == 0) + { + *node = 0; /* no-NUMA system (or fallback) */ + return TRUE; + } + for (n = 0; n <= highest; ++n) + { + if (GetNumaNodeProcessorMaskEx((USHORT)n, &affinity) && (affinity.Mask & ((ULONGLONG)1 << processor))) + { + *node = (UCHAR)n; + return TRUE; + } + } + /* Not found: invalid */ *node = 0xFF; SetLastError(ERROR_INVALID_PARAMETER); return FALSE; @@ -811,17 +865,63 @@ BOOL WINAPI GetNumaProcessorNode(UCHAR processor, PUCHAR node) */ BOOL WINAPI GetNumaProcessorNodeEx(PPROCESSOR_NUMBER processor, PUSHORT node_number) { - SetLastError(ERROR_CALL_NOT_IMPLEMENTED); + ULONG highest, n; + GROUP_AFFINITY affinity; + TRACE("GetNumaProcessorNodeEx(proc=%p, node_number=%p)\n", processor, node_number); + if (!processor || !node_number) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + if (processor->Group != 0) + { + /* Current implementation: single group supported */ + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + if (processor->Number >= system_info.NumberOfProcessors || processor->Number >= 8 * sizeof(affinity.Mask)) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + if (!GetNumaHighestNodeNumber(&highest)) return FALSE; + if (highest == 0) + { + *node_number = 0; /* no-NUMA system */ + return TRUE; + } + for (n = 0; n <= highest; ++n) + { + if (GetNumaNodeProcessorMaskEx((USHORT)n, &affinity) && (affinity.Mask & ((ULONGLONG)1 << processor->Number))) + { + *node_number = (USHORT)n; + return TRUE; + } + } + SetLastError(ERROR_INVALID_PARAMETER); return FALSE; }
/*********************************************************************** * GetNumaProximityNode (KERNEL32.@) */ -BOOL WINAPI GetNumaProximityNode(ULONG proximity_id, PUCHAR node_number) +BOOL WINAPI GetNumaProximityNode(ULONG proximity_id, PUCHAR node_number) { - SetLastError(ERROR_CALL_NOT_IMPLEMENTED); - return FALSE; + USHORT node16; + TRACE("GetNumaProximityNode(proximity=%lu, node_number=%p)\n", proximity_id, node_number); + if (!node_number) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + if (!GetNumaProximityNodeEx(proximity_id, &node16)) return FALSE; + if (node16 > 0xFF) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + *node_number = (UCHAR)node16; + return TRUE; }
/********************************************************************** diff --git a/dlls/kernel32/tests/process.c b/dlls/kernel32/tests/process.c index 7984f3d6fb4..4632ea309b5 100644 --- a/dlls/kernel32/tests/process.c +++ b/dlls/kernel32/tests/process.c @@ -78,7 +78,10 @@ static BOOL (WINAPI *pTerminateJobObject)(HANDLE job, UINT exit_code); static BOOL (WINAPI *pQueryInformationJobObject)(HANDLE job, JOBOBJECTINFOCLASS class, LPVOID info, DWORD len, LPDWORD ret_len); static BOOL (WINAPI *pSetInformationJobObject)(HANDLE job, JOBOBJECTINFOCLASS class, LPVOID info, DWORD len); static HANDLE (WINAPI *pCreateIoCompletionPort)(HANDLE file, HANDLE existing_port, ULONG_PTR key, DWORD threads); +static BOOL (WINAPI *pGetNumaHighestNodeNumber)(ULONG *); static BOOL (WINAPI *pGetNumaProcessorNode)(UCHAR, PUCHAR); +static BOOL (WINAPI *pGetNumaNodeProcessorMaskEx)(USHORT, GROUP_AFFINITY *); +static BOOL (WINAPI *pGetNumaProximityNodeEx)(ULONG, USHORT *); static NTSTATUS (WINAPI *pNtQueryInformationProcess)(HANDLE, PROCESSINFOCLASS, PVOID, ULONG, PULONG); static NTSTATUS (WINAPI *pNtQueryInformationThread)(HANDLE, THREADINFOCLASS, PVOID, ULONG, PULONG); static NTSTATUS (WINAPI *pNtQuerySystemInformationEx)(SYSTEM_INFORMATION_CLASS, void*, ULONG, void*, ULONG, ULONG*); @@ -268,7 +271,10 @@ static BOOL init(void) pQueryInformationJobObject = (void *)GetProcAddress(hkernel32, "QueryInformationJobObject"); pSetInformationJobObject = (void *)GetProcAddress(hkernel32, "SetInformationJobObject"); pCreateIoCompletionPort = (void *)GetProcAddress(hkernel32, "CreateIoCompletionPort"); + pGetNumaHighestNodeNumber = (void *)GetProcAddress(hkernel32, "GetNumaHighestNodeNumber"); pGetNumaProcessorNode = (void *)GetProcAddress(hkernel32, "GetNumaProcessorNode"); + pGetNumaNodeProcessorMaskEx = (void *)GetProcAddress(hkernel32, "GetNumaNodeProcessorMaskEx"); + pGetNumaProximityNodeEx = (void *)GetProcAddress(hkernel32, "GetNumaProximityNodeEx"); pWTSGetActiveConsoleSessionId = (void *)GetProcAddress(hkernel32, "WTSGetActiveConsoleSessionId"); pCreateToolhelp32Snapshot = (void *)GetProcAddress(hkernel32, "CreateToolhelp32Snapshot"); pProcess32First = (void *)GetProcAddress(hkernel32, "Process32First"); @@ -4087,6 +4093,62 @@ static void test_GetNumaProcessorNode(void) } }
+static void test_NumaBasic(void) +{ + ULONG highest_node = 0xdeadbeef; + BOOL ret; + GROUP_AFFINITY affinity; + USHORT node; + + if (!pGetNumaHighestNodeNumber && !pGetNumaNodeProcessorMaskEx && !pGetNumaProximityNodeEx) + { + win_skip("NUMA extended functions are missing\n"); + return; + } + + if (pGetNumaHighestNodeNumber) + { + SetLastError(0xdeadbeef); + ret = pGetNumaHighestNodeNumber(&highest_node); + ok(ret || GetLastError() == ERROR_INVALID_FUNCTION, + "GetNumaHighestNodeNumber failed: %lu\n", GetLastError()); + if (ret) + trace("Highest NUMA node: %lu\n", highest_node); + } + + if (pGetNumaNodeProcessorMaskEx) + { + memset(&affinity, 0, sizeof(affinity)); + SetLastError(0xdeadbeef); + ret = pGetNumaNodeProcessorMaskEx(0, &affinity); + ok(ret || GetLastError() == ERROR_INVALID_PARAMETER || GetLastError() == ERROR_INVALID_FUNCTION, + "GetNumaNodeProcessorMaskEx(0) unexpected failure %lu\n", GetLastError()); + if (ret) + trace("Node0: Group=%u Mask=%I64x\n", affinity.Group, (unsigned long long)affinity.Mask); + + memset(&affinity, 0, sizeof(affinity)); + ret = pGetNumaNodeProcessorMaskEx(1, &affinity); + if (ret) + trace("Node1: Group=%u Mask=%I64x\n", affinity.Group, (unsigned long long)affinity.Mask); + } + + if (pGetNumaProximityNodeEx) + { + node = 0xdead; + SetLastError(0xdeadbeef); + ret = pGetNumaProximityNodeEx(0, &node); + ok(ret || GetLastError() == ERROR_INVALID_PARAMETER || GetLastError() == ERROR_INVALID_FUNCTION, + "GetNumaProximityNodeEx(0) failed %lu\n", GetLastError()); + if (ret) + trace("Proximity 0 -> node %u\n", node); + + node = 0xdead; + ret = pGetNumaProximityNodeEx(1, &node); + if (ret) + trace("Proximity 1 -> node %u\n", node); + } +} + static void test_session_info(void) { DWORD session_id, active_session; @@ -5684,6 +5746,7 @@ START_TEST(process) test_DuplicateHandle(); test_StdHandleInheritance(); test_GetNumaProcessorNode(); + test_NumaBasic(); test_session_info(); test_GetLogicalProcessorInformationEx(); test_GetSystemCpuSetInformation(); diff --git a/dlls/kernelbase/memory.c b/dlls/kernelbase/memory.c index c0a3d65d341..aa83ce85b24 100644 --- a/dlls/kernelbase/memory.c +++ b/dlls/kernelbase/memory.c @@ -40,7 +40,7 @@ WINE_DEFAULT_DEBUG_CHANNEL(heap); WINE_DECLARE_DEBUG_CHANNEL(virtual); WINE_DECLARE_DEBUG_CHANNEL(globalmem); - +WINE_DECLARE_DEBUG_CHANNEL(numa);
static CRITICAL_SECTION memstatus_section; static CRITICAL_SECTION_DEBUG critsect_debug = @@ -1476,6 +1476,134 @@ BOOL WINAPI DECLSPEC_HOTPATCH MapUserPhysicalPages( void *addr, ULONG_PTR page_c * NUMA functions ***********************************************************************/
+/* NUMA support using Windows logical processor information */ +static ULONG numa_highest_node_number = 0; +static BOOL numa_initialized = FALSE; +static RTL_CRITICAL_SECTION numa_cs; +static RTL_CRITICAL_SECTION_DEBUG numa_cs_debug = +{ + 0, 0, &numa_cs, + { &numa_cs_debug.ProcessLocksList, &numa_cs_debug.ProcessLocksList }, + 0, 0, { (DWORD_PTR)(__FILE__ ": numa_cs") } +}; +static RTL_CRITICAL_SECTION numa_cs = { &numa_cs_debug, -1, 0, 0, 0, 0 }; + +/* Structure to hold CPU mask for each NUMA node */ +static struct numa_node_info { + ULONG_PTR cpu_mask; + BOOL valid; +} numa_nodes[64]; /* Windows supports up to 64 NUMA nodes */ + +/* NUMA runtime tweak flags */ +static int numa_env_checked = 0; +static BOOL numa_force_single = FALSE; /* WINE_NUMA_FORCE_SINGLE=1 -> force single node */ +static BOOL numa_contig = FALSE; /* WINE_NUMA_CONTIG=1 -> remap contiguous masks */ + +static unsigned int popcount_ulongptr( ULONG_PTR v ) +{ +#if defined(__GNUC__) + return __builtin_popcountll( (unsigned long long)v ); +#else + unsigned int c = 0; while (v) { v &= (v-1); c++; } return c; +#endif +} + +static void initialize_numa_info(void) +{ + SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info = NULL; + DWORD len = 0, i; + ULONG max_node = 0; + + if (numa_initialized) return; + + if (!numa_env_checked) + { + char buffer[256]; + numa_env_checked = 1; + if (GetEnvironmentVariableA("WINE_NUMA_FORCE_SINGLE", buffer, sizeof(buffer))) numa_force_single = TRUE; + if (GetEnvironmentVariableA("WINE_NUMA_CONTIG", buffer, sizeof(buffer))) numa_contig = TRUE; + TRACE_(numa)("NUMA env: FORCE_SINGLE=%d CONTIG=%d\n", numa_force_single, numa_contig); + } + + memset(numa_nodes, 0, sizeof(numa_nodes)); + + if (!numa_force_single) + { + /* Query logical processor information to get NUMA topology */ + if (!GetLogicalProcessorInformation(NULL, &len) && GetLastError() == ERROR_INSUFFICIENT_BUFFER) + { + info = HeapAlloc(GetProcessHeap(), 0, len); + if (info && GetLogicalProcessorInformation(info, &len)) + { + DWORD count = len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); + for (i = 0; i < count; i++) + { + if (info[i].Relationship == RelationNumaNode) + { + ULONG node_number = info[i].NumaNode.NodeNumber; + if (node_number < 64) + { + numa_nodes[node_number].cpu_mask = info[i].ProcessorMask; + numa_nodes[node_number].valid = TRUE; + if (node_number > max_node) max_node = node_number; + TRACE_(numa)("NUMA raw: node=%lu mask=0x%llx\n", (unsigned long)node_number, (unsigned long long)info[i].ProcessorMask); + } + } + } + } + HeapFree(GetProcessHeap(), 0, info); + } + } + /* Fallback single node if requested or none discovered */ + if (numa_force_single || (max_node == 0 && !numa_nodes[0].valid)) + { + SYSTEM_INFO si; GetSystemInfo(&si); + numa_nodes[0].cpu_mask = (si.dwNumberOfProcessors >= (sizeof(ULONG_PTR)*8)) ? ~(ULONG_PTR)0 : ((1ULL << si.dwNumberOfProcessors) - 1); + numa_nodes[0].valid = TRUE; max_node = 0; + TRACE_(numa)("NUMA fallback single: mask=0x%llx procs=%u\n", (unsigned long long)numa_nodes[0].cpu_mask, (unsigned)popcount_ulongptr(numa_nodes[0].cpu_mask)); + } + else if (numa_contig && max_node > 0) + { + /* Remap each node to a contiguous block of bits in ascending order */ + ULONG_PTR new_masks[64] = {0}; unsigned int bit_offset = 0; BOOL ok = TRUE; + for (i = 0; i <= max_node; i++) if (numa_nodes[i].valid) + { + unsigned int cnt = popcount_ulongptr(numa_nodes[i].cpu_mask); + if (!cnt || bit_offset + cnt > sizeof(ULONG_PTR)*8) { ok = FALSE; break; } + new_masks[i] = (((ULONG_PTR)1 << cnt) - 1) << bit_offset; + TRACE_(numa)("NUMA remap: node=%lu raw=0x%llx cnt=%u -> contig=0x%llx base=%u\n", + (unsigned long)i, (unsigned long long)numa_nodes[i].cpu_mask, cnt, + (unsigned long long)new_masks[i], bit_offset); + bit_offset += cnt; + } + if (ok) + { + for (i = 0; i <= max_node; i++) if (numa_nodes[i].valid && new_masks[i]) numa_nodes[i].cpu_mask = new_masks[i]; + } + else TRACE_(numa)("NUMA remap: aborted (ok=%d)\n", ok); + } + + numa_highest_node_number = max_node; + numa_initialized = TRUE; + TRACE_(numa)("NUMA init done: highest_node=%lu\n", (unsigned long)numa_highest_node_number); +} + +static BOOL get_numa_node_cpu_mask(UCHAR node, GROUP_AFFINITY *mask) +{ + RtlEnterCriticalSection(&numa_cs); + if (!numa_initialized) initialize_numa_info(); + if (node >= 64 || !numa_nodes[node].valid) + { + RtlLeaveCriticalSection(&numa_cs); SetLastError(ERROR_INVALID_PARAMETER); return FALSE; + } + TRACE_(numa)("get_numa_node_cpu_mask: node=%u mask=0x%llx\n", node, (unsigned long long)numa_nodes[node].cpu_mask); + RtlLeaveCriticalSection(&numa_cs); + memset(mask, 0, sizeof(*mask)); + mask->Group = 0; /* Single processor group for now */ + mask->Mask = numa_nodes[node].cpu_mask; + TRACE_(numa)("get_numa_node_cpu_mask: returning Group=%hu Mask=0x%llx\n", mask->Group, (unsigned long long)mask->Mask); + return TRUE; +}
/*********************************************************************** * AllocateUserPhysicalPagesNuma (kernelbase.@) @@ -1580,10 +1708,19 @@ BOOL WINAPI SetProcessDefaultCpuSets(HANDLE process, const ULONG *cpu_set_ids, U /********************************************************************** * GetNumaHighestNodeNumber (kernelbase.@) */ -BOOL WINAPI DECLSPEC_HOTPATCH GetNumaHighestNodeNumber( ULONG *node ) -{ - FIXME( "semi-stub: %p\n", node ); - *node = 0; +BOOL WINAPI GetNumaHighestNodeNumber( ULONG *node ) +{ + TRACE("(%p)\n", node); + if (!node) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + RtlEnterCriticalSection(&numa_cs); + if (!numa_initialized) + initialize_numa_info(); + *node = numa_highest_node_number; + RtlLeaveCriticalSection(&numa_cs); return TRUE; }
@@ -1591,20 +1728,41 @@ BOOL WINAPI DECLSPEC_HOTPATCH GetNumaHighestNodeNumber( ULONG *node ) /********************************************************************** * GetNumaNodeProcessorMaskEx (kernelbase.@) */ -BOOL WINAPI DECLSPEC_HOTPATCH GetNumaNodeProcessorMaskEx( USHORT node, GROUP_AFFINITY *mask ) +BOOL WINAPI GetNumaNodeProcessorMaskEx( USHORT node, GROUP_AFFINITY *mask ) { - FIXME( "stub: %hu %p\n", node, mask ); - SetLastError( ERROR_CALL_NOT_IMPLEMENTED ); - return FALSE; + TRACE("(%hu, %p)\n", node, mask); + if (!mask) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + return get_numa_node_cpu_mask((UCHAR)node, mask); }
/*********************************************************************** * GetNumaProximityNodeEx (kernelbase.@) */ -BOOL WINAPI DECLSPEC_HOTPATCH GetNumaProximityNodeEx( ULONG proximity_id, USHORT *node ) +BOOL WINAPI GetNumaProximityNodeEx( ULONG proximity_id, USHORT *node ) { - SetLastError( ERROR_CALL_NOT_IMPLEMENTED ); + TRACE("(%lu, %p)\n", proximity_id, node); + if (!node) + { + SetLastError(ERROR_INVALID_PARAMETER); + return FALSE; + } + RtlEnterCriticalSection(&numa_cs); + if (!numa_initialized) + initialize_numa_info(); + /* For simplicity, assume proximity_id maps directly to node number */ + if (proximity_id <= numa_highest_node_number) + { + *node = (USHORT)proximity_id; + RtlLeaveCriticalSection(&numa_cs); + return TRUE; + } + RtlLeaveCriticalSection(&numa_cs); + SetLastError(ERROR_INVALID_PARAMETER); return FALSE; }
diff --git a/dlls/kernelbase/thread.c b/dlls/kernelbase/thread.c index dbde4dcfaf3..e69ce948af1 100644 --- a/dlls/kernelbase/thread.c +++ b/dlls/kernelbase/thread.c @@ -498,10 +498,33 @@ BOOL WINAPI SetThreadErrorMode( DWORD mode, DWORD *old ) /*********************************************************************** * SetThreadGroupAffinity (kernelbase.@) */ -BOOL WINAPI DECLSPEC_HOTPATCH SetThreadGroupAffinity( HANDLE thread, const GROUP_AFFINITY *new, +BOOL WINAPI SetThreadGroupAffinity( HANDLE thread, const GROUP_AFFINITY *new, GROUP_AFFINITY *old ) { + GROUP_AFFINITY local_old; if (old && !GetThreadGroupAffinity( thread, old )) return FALSE; + /* Debug: log requested group/mask and previous affinity if available */ + if (new) + { + TRACE("SetThreadGroupAffinity: thread=%p new->Group=%hu new->Mask=0x%llx\n", + thread, new->Group, (unsigned long long)new->Mask); + } + else + { + TRACE("SetThreadGroupAffinity: thread=%p new=NULL\n", thread); + } + if (old) + { + TRACE("SetThreadGroupAffinity: previous old->Group=%hu old->Mask=0x%llx\n", + old->Group, (unsigned long long)old->Mask); + } + else + { + /* If caller didn't supply 'old', fetch it locally for logging */ + if (GetThreadGroupAffinity(thread, &local_old)) + TRACE("SetThreadGroupAffinity: fetched previous Group=%hu Mask=0x%llx\n", + local_old.Group, (unsigned long long)local_old.Mask); + } return set_ntstatus( NtSetInformationThread( thread, ThreadGroupInformation, new, sizeof(*new) )); }
From: wasertech danny@waser.tech
--- dlls/kernelbase/memory.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/dlls/kernelbase/memory.c b/dlls/kernelbase/memory.c index aa83ce85b24..a76d00df111 100644 --- a/dlls/kernelbase/memory.c +++ b/dlls/kernelbase/memory.c @@ -1557,8 +1557,16 @@ static void initialize_numa_info(void) /* Fallback single node if requested or none discovered */ if (numa_force_single || (max_node == 0 && !numa_nodes[0].valid)) { - SYSTEM_INFO si; GetSystemInfo(&si); - numa_nodes[0].cpu_mask = (si.dwNumberOfProcessors >= (sizeof(ULONG_PTR)*8)) ? ~(ULONG_PTR)0 : ((1ULL << si.dwNumberOfProcessors) - 1); + SYSTEM_INFO si; unsigned int cpu_count; + memset(&si, 0, sizeof(si)); /* explicit init to silence static analyzers */ + GetSystemInfo(&si); + cpu_count = si.dwNumberOfProcessors; + if (cpu_count >= sizeof(ULONG_PTR) * 8) + numa_nodes[0].cpu_mask = ~(ULONG_PTR)0; + else if (cpu_count) + numa_nodes[0].cpu_mask = (((ULONG_PTR)1 << cpu_count) - 1); + else /* pathological case: assume at least CPU0 */ + numa_nodes[0].cpu_mask = 1; numa_nodes[0].valid = TRUE; max_node = 0; TRACE_(numa)("NUMA fallback single: mask=0x%llx procs=%u\n", (unsigned long long)numa_nodes[0].cpu_mask, (unsigned)popcount_ulongptr(numa_nodes[0].cpu_mask)); }
On Sun Sep 14 15:50:29 2025 +0000, Danny Waser wrote:
changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/8970/diffs?diff_id=209002&start_sha=e1810ed2b6dc3e6c60462cbc8e2600c43400e5e0#5ae7449abf26aa1926b9b677633a99ed950907af_1561_1560)
Fixed with commit e1117f34e276919b9ef2efe4b0a180fdf9f3296b
# Quick Analysis: - The test waits for the `WaitCommEvent` (overlapped) to signal in less than 1500 ms on EV_TXEMPTY after an asynchronous `WriteFile`. - The Wine implementation: `WaitCommEvent` = `DeviceIoControl(IOCTL_SERIAL_WAIT_ON_MASK)` → path serial.c → `wait_on()` launches an async poll via `async_wait_proc`. - EV_TXEMPTY is only generated if (in `check_events`): `( (!old->temt || pending_write) && new->temt )`, i.e. an edge (transition to void) OR if a write is still marked pending (`pending_write`). - `pending_write` is fetched via `get_wait_mask(..., &pending_write)` only if the mask includes EV_TXEMPTY. If there is a time lag where the write completes before the first polling iteration and `old->temt` is already true, no edge is seen → no event → timeout test.
## Why my NUMA change may break: - The poll loop depends on asynchronous wakeups (`STATUS_ALERTED` then recheck). Inter-CPU latency or reordering can cause: 1. The buffer to become empty (TEMT) before the initial `old->temt` state has been captured (it is captured in `get_irq_info` just before arming the async). If already empty, no transition. 2. The `pending_write` flag may be cleared early if the write completion arrives before the first evaluation in the asynchronous thread. - With more CPUs / modified affinities (NUMA), the race window increases → more timeouts.
## Technical points: - The code only triggers EV_TXEMPTY on transition; an already empty state initially produces nothing (while Windows seems to allow an immediate return if the TX is empty at the time of the call). - The test comment already mentions a weakness: "calling SetCommMask after WriteFile leads to WaitCommEvent failures." Here, we do the opposite (SetCommMask before WriteFile) to avoid this case, but the race remains possible.
## Provisional conclusion: Probable divergence between Wine and Windows: Windows documents that if the event has already occurred, `WaitCommEvent` (overlapped) completes immediately (no edge necessarily required). Wine requires an edge; on fast machines / different latencies (NUMA), the transmission (small buffer) empties before the async is activated → no event.
## Recommendation: 1. Confirm on Windows: Calling `WaitCommEvent` with EV_TXEMPTY right after configuring the port when the buffer is already empty should immediately return with EV_TXEMPTY (verify). 2. Adapt Wine: if the initial `check_events` (in the `STATUS_ALERTED` branch) finds nothing, should you redo a `get_irq_info` before arming to capture the transition? Or, simpler: if EV_TXEMPTY is in the mask and `old->temt` is already true at the time of initial arming, report immediately. 3. Alternative test: Relax the assertion on line 859 to accept a timeout and then the recovery path (the code already provides for a fallback) — but this masks a real divergence.
## Diagnostics to try before patching: - Enable traces: export `WINEDEBUG=+serial,+comm` to see `old->temt` / `new->temt`. - Insert a `Sleep(1)` just before `WaitCommEvent` to see if it makes the race more or less frequent (timing diagnostics). - Force a longer write (increase tbuf size or further decrease baud) to create a visible transition.
## Decision: You should fix the implementation (front-only) to match Windows semantics (level). The test is legitimate: it waits for the overlapped event to be reported in <1.5s (and almost immediately). Adapting the test would weaken the coverage.
## Patch suggestion (idea): In `wait_on()` after `get_irq_info(fd, &commio->irq_info)` and before calling the server: - If `(commio->evtmask & EV_TXEMPTY)` and `commio->irq_info.temt` is true → immediately complete with EV_TXEMPTY.
Or in `async_wait_proc`, first iteration: if `(commio->evtmask & EV_TXEMPTY)` and `!pending_write` and `commio->irq_info.temt` is already true, return the event.
## Edge cases: - Don't generate in a loop: remember that EV_TXEMPTY has already been delivered until a new write is pending (can use `pending_write` or store a flag). - Windows: EV_TXEMPTY is retriggered after each fill/empty cycle. Therefore, reset the flag if `pending_write` becomes true again.
## NUMA Risk: - Very low; additional checks are local.
## Next steps: - Add a `delivered_txempty` field to `async_commio`, handle it in `wait_on`/`async_wait_proc`. - Or, more minimally: treat the initial state as an event by making a pseudo-transition: in `check`
What is this nonsense about serial ports?
Why are you using AI?
What application needs this?
What is this nonsense about serial ports?
My attempt to explain why I have 4 tests unrelated to my changes that fail suddently.
Why are you using AI?
To help me diagnose and fix complex behaviors like this. Are you biased?
What application needs this?
NUMA topology? I could think of a few examples :smile:
Can you realistically get more than one NUMA node in a regular single socket machine? Regardless of how useful it is, it's probably should be done in ntdll.
I have 2 on my 2nd gen threadreaper. I have done it where it was already. A lot of apps will crash with the current stubs so it isn't just nice to have, it is required for all apps that use MSVC to manage thread pool memory. @nsivov You can move it around, where you want afterwards.
If you are wondering, this relates to [bug#40500](https://bugs.winehq.org/show_bug.cgi?id=40500) and particularly [bug#58584](https://bugs.winehq.org/show_bug.cgi?id=58584) in my case.
Bernhard Kölbl (@besentv) commented about dlls/kernelbase/memory.c:
/**********************************************************************
GetNumaNodeProcessorMaskEx (kernelbase.@)
*/ -BOOL WINAPI DECLSPEC_HOTPATCH GetNumaNodeProcessorMaskEx( USHORT node, GROUP_AFFINITY *mask ) +BOOL WINAPI GetNumaNodeProcessorMaskEx( USHORT node, GROUP_AFFINITY *mask )
Why are you removing hot patching on this one?
Bernhard Kölbl (@besentv) commented about dlls/kernelbase/memory.c:
- return FALSE;
- TRACE("(%hu, %p)\n", node, mask);
- if (!mask)
- {
SetLastError(ERROR_INVALID_PARAMETER);
return FALSE;
- }
- return get_numa_node_cpu_mask((UCHAR)node, mask);
}
/***********************************************************************
GetNumaProximityNodeEx (kernelbase.@)
*/ -BOOL WINAPI DECLSPEC_HOTPATCH GetNumaProximityNodeEx( ULONG proximity_id, USHORT *node ) +BOOL WINAPI GetNumaProximityNodeEx( ULONG proximity_id, USHORT *node )
Same here
Bernhard Kölbl (@besentv) commented about dlls/kernelbase/thread.c:
/***********************************************************************
SetThreadGroupAffinity (kernelbase.@)
*/ -BOOL WINAPI DECLSPEC_HOTPATCH SetThreadGroupAffinity( HANDLE thread, const GROUP_AFFINITY *new, +BOOL WINAPI SetThreadGroupAffinity( HANDLE thread, const GROUP_AFFINITY *new,
Same here
Bernhard Kölbl (@besentv) commented about dlls/kernelbase/memory.c:
numa_nodes[0].cpu_mask = 1;
numa_nodes[0].valid = TRUE; max_node = 0;
TRACE_(numa)("NUMA fallback single: mask=0x%llx procs=%u\n", (unsigned long long)numa_nodes[0].cpu_mask, (unsigned)popcount_ulongptr(numa_nodes[0].cpu_mask));
- }
- else if (numa_contig && max_node > 0)
- {
/* Remap each node to a contiguous block of bits in ascending order */
ULONG_PTR new_masks[64] = {0}; unsigned int bit_offset = 0; BOOL ok = TRUE;
for (i = 0; i <= max_node; i++) if (numa_nodes[i].valid)
{
unsigned int cnt = popcount_ulongptr(numa_nodes[i].cpu_mask);
if (!cnt || bit_offset + cnt > sizeof(ULONG_PTR)*8) { ok = FALSE; break; }
new_masks[i] = (((ULONG_PTR)1 << cnt) - 1) << bit_offset;
TRACE_(numa)("NUMA remap: node=%lu raw=0x%llx cnt=%u -> contig=0x%llx base=%u\n",
(unsigned long)i, (unsigned long long)numa_nodes[i].cpu_mask, cnt,
(unsigned long long)new_masks[i], bit_offset);
Why all these casts, instead of using the correct format?
Bernhard Kölbl (@besentv) commented about dlls/kernelbase/thread.c:
-BOOL WINAPI DECLSPEC_HOTPATCH SetThreadGroupAffinity( HANDLE thread, const GROUP_AFFINITY *new, +BOOL WINAPI SetThreadGroupAffinity( HANDLE thread, const GROUP_AFFINITY *new, GROUP_AFFINITY *old ) {
- GROUP_AFFINITY local_old; if (old && !GetThreadGroupAffinity( thread, old )) return FALSE;
- /* Debug: log requested group/mask and previous affinity if available */
- if (new)
- {
TRACE("SetThreadGroupAffinity: thread=%p new->Group=%hu new->Mask=0x%llx\n",
thread, new->Group, (unsigned long long)new->Mask);
- }
- else
- {
TRACE("SetThreadGroupAffinity: thread=%p new=NULL\n", thread);
- }
This if else seems unnecessary
I'd recommend redoing this for ntdll without LLMs, as it will give you a better insight on the code.
Finally some useful actionable feedback! Thank you @besentv. I'll bring back the `DECLSPEC_HOTPATCH` where I removed it, replace the `%llx` + casts with `%Ix` (casting to `ULONG_PTR`) for pointer-sized masks, copy the cpu mask while holding the NUMA lock, and tidy the contiguous-remap finalization. I'll also clean up the thread-affinity debug traces by removing the unnecessary if/else statements to factor the common finalization/trace and avoid redundant conditions. I will take this opportunity to move those changes in ntdll since everyone is so adamant on having them there (for good reasons).
This merge request was closed by Danny Waser.
What is this nonsense about serial ports?
My attempt to explain why I have 4 tests unrelated to my changes that fail suddently.
Okay, that's not clear at all.
Why are you using AI?
To help me diagnose and fix complex behaviors like this. Are you biased?
Yes, probably. AI generated text is difficult to read and often full of statements that are blatantly or subtly wrong. This is no exception.
But mainly I ask why you're using it when things would be an awful lot easier if you didn't. You don't need 600 words to explain things that are obvious from reading the patch. You don't need another 700 words to say what should just be "the kernel32:comm tests are failing in a way unrelated to this patch". Asking anyone to read all that garbage is wasting their time.
And I suspect that using AI to generate the patch contents, if you did, is not going to be better than writing yourself. Certainly not better than consulting existing Wine developers for help, which would have prevented a lot of mistakes I can see, not just now but in the future, and is generally better for collaboration.
More pressingly, according to my understanding, AI are trained on code with a license incompatible to Wine, and I don't believe any output can be used safely in our project.
What application needs this?
NUMA topology? I could think of a few examples :smile:
What are they? We don't accept new features as a rule unless something needs them.
@nsivov You can move it around, where you want afterwards.
That's not how we do things around here I'm afraid; we don't commit patches with known insufficiencies as a general rule.
Some very great concern you raised @zfigura. Let's address them shall we?
Yes, probably. AI generated text is difficult to read and often full of statements that are blatantly or subtly wrong. This is no exception.
You might find it difficult to read but I don't. It helps me articulate what is happening. Does it makes mistakes? A lot, of course. I do too. Everyone does. That's how we grow and learn. Still better than staying in the dark.
But mainly I ask why you're using it when things would be an awful lot easier if you didn't.
I wouldn't have functional reporting of my system's topology under wine already otherwise.
You don't need 600 words to explain things that are obvious from reading the patch. You don't need another 700 words to say what should just be "the kernel32:comm tests are failing in a way unrelated to this patch". Asking anyone to read all that garbage is wasting their time.
You might not need the words because I made the changes easy to read by keeping them where there are. Some people, on the other hand might need the context to understand the changes and their implication on the codebase. Nobody asked you to read anything that you don't find interesting and nobody forced you to subscribe to the RSS feed.
And I suspect that using AI to generate the patch contents, if you did, is not going to be better than writing yourself.
Not sure about that since I don't write or read C derivatives that often. I usually prefer to hangout as far from the metal as possible. The higher the level of abstraction, the better. Talking to silicon is boring. Necessary, but boring. I can appreciate different perspectives on this but it's unlikely we agree. I don't mind.
Certainly not better than consulting existing Wine developers for help, ...
If I had to bother someone every-time I had an issue... I can understand and solve my problems myself, thank you. I learn a lot but making mistakes. That's what I do. Stupid stuff like this in an attempt to satiate my ever increasing curiosity.
which would have prevented a lot of mistakes I can see,
I am still waiting for your review btw.
not just now but in the future, and is generally better for collaboration.
I spent 2 months, and went out of my way to: - describe the issue, - marked it as dup when I found it was a very well known issue, - made the effort to understand it better and - actually proposed a functional fix for it.
Is it perfect? No! Of course not. How could it be? I don't have the hardware to implement and test it with more than 2 nodes in 1 group. Nobody rational will do serious concurrent work under wine anyway! The idea here is just to trick whatever app asks for it with a convincing enough reply for it to accept it. I have enough memory and plenty room to spear to afford such dumb approximation. I understand not everyone might but it doesn't make it less useful.
https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/kernelbase/memory.c?r...
This is just plain wrong on my system. The correct value I expect is 1 because I have 2 nodes. The're physical... I cannot change it with software. Yes I can disable the NUMA module on my kernel but that also doesn't work because that not how my CPU is architectured.
https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/kernelbase/memory.c?r...
I don't even have words to describe how wrong this behavior is. My compatibility layer is basically spitting in my face when I run an app that asks a reasonable question; "What is the layout of this node?".
I've come only but in good faith to raise awareness about this fact and was only met with mockery and unrelated queries. Talk about collaboration. I opened this submission out of courtesy because I care about this project I've been using since I'm 6 and want it to support modern CPU/memory architectures.
More pressingly, according to my understanding, AI are trained on code with a license incompatible to Wine, and I don't believe any output can be used safely in our project.
That's actually a very common misconception but I'm no lawyer myself. There is just no legal precedent. It's subject to interpretation and anyone's guess is as worthless as the next one. It up to courts to decide eventually. My take is that if I use a tool to produce something, unless that tools strictly forbids me or anyone to redistribute this thing that I made with it freely (in which case it not a useful tool), there is nothing preventing me or anyone to redistribute it with an open source licence. Again as worthless as anyone's guess so I get why anyone might be careful.
What application needs this?
NUMA topology? I could think of a few examples :smile:
What are they? We don't accept new features as a rule unless something needs them.
I don't think you understand what the [Non-Uniform Memory Access](https://en.wikipedia.org/wiki/Non-uniform_memory_access) (NUMA) architecture is. It's not about supporting a new feature... it's about fixing the stubs to report correctly when MSVC or anything else asks for them.
@nsivov You can move it around, where you want afterwards.
That's not how we do things around here I'm afraid; we don't commit patches with known insufficiencies as a general rule.
And that's exactly why, still to this day, it has not yet been addressed after so many years. Nobody is asking wine to support a fully compilent NUMA implementation... but the current one is as bad (if worse) than the one I proposed.
... I'll bring back the `DECLSPEC_HOTPATCH` where I removed it, replace the `%llx` + casts with `%Ix` (casting to `ULONG_PTR`) for pointer-sized masks, copy the cpu mask while holding the NUMA lock, and tidy the contiguous-remap finalization. I'll also clean up the thread-affinity debug traces by removing the unnecessary if/else statements to factor the common finalization/trace and avoid redundant conditions. I will take this opportunity to move those changes in ntdll since everyone is so adamant on having them there (for good reasons).
I'll do that when and if I find the time, purely to satiate my curiosity. Understand that I am in no particular rush to do so just to get laughed at again. I have better thing to do with my time than to argue about those changes and their necessity. I've dedicated enough time on this matter and since I have no issue about this on my system anymore, I don't really care if those changes become mainstream or not. Just don't say I didn't tried to solve this issue. I did and have been using it for more than a month already.
Some very great concern you raised @zfigura. Let's address them shall we?
Yes, probably. AI generated text is difficult to read and often full of statements that are blatantly or subtly wrong. This is no exception.
You might find it difficult to read but I don't. It helps me articulate what is happening.
That's great, but if you're posting explanations here that more or less implies you expect us to read them. If you want your patches accepted you should gear explanations, or lack thereof, toward the reviewer.
Does it makes mistakes? A lot, of course. I do too. Everyone does. That's how we grow and learn. Still better than staying in the dark.
Except AI doesn't learn from its mistakes. Also, quite frankly, it makes a lot more mistakes, with a lot more pretend confidence, than anyone else, apparently because it's not capable of recognizing when it doesn't know something.
But mainly I ask why you're using it when things would be an awful lot easier if you didn't.
I wouldn't have functional reporting of my system's topology under wine already otherwise.
And you still don't? This patch series is just a hack, in the AI's own words, and it's clearly not getting the data from anywhere real.
You don't need 600 words to explain things that are obvious from reading the patch. You don't need another 700 words to say what should just be "the kernel32:comm tests are failing in a way unrelated to this patch". Asking anyone to read all that garbage is wasting their time.
You might not need the words because I made the changes easy to read by keeping them where there are. Some people, on the other hand might need the context to understand the changes and their implication on the codebase. Nobody asked you to read anything that you don't find interesting and nobody forced you to subscribe to the RSS feed.
Any comments you're posting here, you are implicitly asking us to read. The assumption is that it's useful information.
And I suspect that using AI to generate the patch contents, if you did, is not going to be better than writing yourself.
Not sure about that since I don't write or read C derivatives that often. I usually prefer to hangout as far from the metal as possible. The higher the level of abstraction, the better. Talking to silicon is boring. Necessary, but boring. I can appreciate different perspectives on this but it's unlikely we agree. I don't mind.
I don't see what this has to do with anything. The same functionality is being implemented no matter how you do it. I'm just saying you will have a better time if you write it yourself. I've seen many, many first-time contributors submit many, many first-time patches, and this AI-written garbage is one of the worst ones I've ever seen. I fully believe that you can do better without using it.
Certainly not better than consulting existing Wine developers for help, ...
If I had to bother someone every-time I had an issue...
We always welcome questions and will attempt to provide advice and guidance, especially to those interested in writing their own patches.
I can understand and solve my problems myself, thank you. I learn a lot by making mistakes. That's what I do. Stupid stuff like this in an attempt to satiate my ever increasing curiosity.
which would have prevented a lot of mistakes I can see,
I am still waiting for your review btw.
I don't think there's a point reviewing LLM-generated code, not when it has intractable licensing problems. Incremental improvement can't fix that. It needs to be written from scratch by a human (or by an LLM unencumbered by licensing problems, but I'm not convinced those can feasibly exist at this point) before it can be improved.
With that said, so that you don't make the same mistakes when you do rewrite it:
* don't use environment variables to modify Wine's behaviour; * split up changes into small pieces, usually one function at a time; * implement no more behaviour than is absolutely necessary (don't stub every NUMA function, just the ones the application you care about needs); * try to follow the surrounding code style, don't change it, unless a reviewer specifically requests it.
I've come only but in good faith to raise awareness about this fact and was only met with mockery and unrelated queries. Talk about collaboration. I opened this submission out of courtesy because I care about this project I've been using since I'm 6 and want it to support modern CPU/memory architectures.
I apologize that you've felt mocked; I don't know what caused you to feel that way, but it certainly wasn't my intention. I've only attempted to ask questions relevant to development and to this patch set.
More pressingly, according to my understanding, AI are trained on code with a license incompatible to Wine, and I don't believe any output can be used safely in our project.
That's actually a very common misconception but I'm no lawyer myself. There is just no legal precedent. It's subject to interpretation and anyone's guess is as worthless as the next one. It up to courts to decide eventually. My take is that if I use a tool to produce something, unless that tools strictly forbids me or anyone to redistribute this thing that I made with it freely (in which case it not a useful tool), there is nothing preventing me or anyone to redistribute it with an open source licence. Again as worthless as anyone's guess so I get why anyone might be careful.
Which part is a misconception exactly? It's well-established that AI is trained on code with an incompatible license, and you also say that there is no legal precedent and no way to know if its output is safe, so "I don't believe it's safe" is hardly a misconception?
It's been established that AI is capable of producing what's clearly a derivative work (or even an exact reproduction) of other code when asked. It's also clear that AI is not capable of reliably determining, well, anything, so I don't think it can be trusted to judge whether its own output is a derivative work.
What application needs this?
NUMA topology? I could think of a few examples :smile:
What are they? We don't accept new features as a rule unless something needs them.
I don't think you understand what the [Non-Uniform Memory Access](https://en.wikipedia.org/wiki/Non-uniform_memory_access) (NUMA) architecture is. It's not about supporting a new feature... it's about fixing the stubs to report correctly when MSVC or anything else asks for them.
These functions were a stub, and are being turned into a non-stub. That's a new feature.
Are there applications that need this or aren't there? You seem to mention a bug report; can you please link to it?
Frankly, at this point, I'm confused at what your intention is. You say "it's about fixing the stubs to report correctly", but you are submitting a "dumb approximation" to "trick whatever app asks for it with a convincing enough reply". Neither approach is necessarily wrong—we do the latter all the time—but you seem to be simultaneously arguing for both?
@nsivov You can move it around, where you want afterwards.
That's not how we do things around here I'm afraid; we don't commit patches with known insufficiencies as a general rule.
And that's exactly why, still to this day, it has not yet been addressed after so many years. Nobody is asking wine to support a fully compilent NUMA implementation... but the current one is as bad (if worse) than the one I proposed.
I don't see where you're getting that? Patches are committed to Wine all the time. We remove their known insufficiencies through review.
There's probably been no NUMA attention because of a lack of applications that care.
That's the most help thing I've read on this topic. Thank you @zfigura, for your contributions (quite fond of NTsync) and specially for taking precious time out of your day to provide actual useful information. I really value this effort. I now see clearly the absurdity of the changes I am asking.
Your line of questioning was actually on point and designed to guide me towards the rights questions; why does the application intends to do with this information? Do I really need to implement everything just to trick the app to run? Does it even use the second node?
The answer is no. You right, 99% of my submission is garbage. I most likely only need to implement `GetNumaNodeProcessorMask` at most. I don't event need to report the second node.
Are there applications that need this or aren't there? You seem to mention a bug report; can you please link to it?
Here is the comment where I linked to the bugtracker issues I mentioned: https://gitlab.winehq.org/wine/wine/-/merge_requests/8970#note_115837
I am facing this issue when running forks of Open X-Ray that introduce optimizations to add concurrency into the S.T.A.L.K.E.R engine in an effort to reduce stutters. I can see those sort of optimization quickly becoming the norm in the stalker community so I feel it's important that our compatibility layer can support those. It only cares about the first node anyway which explains why my bogus patch still was able to apparently solve my issue: because no data is actually moved between nodes.
I'll take all your valuable feedback and try to come back with a clean version that checks all the requirements. It would actually be very cool if I can submit a production ready patch. Put it on my bucket list.
I'd like to also to present my apologies to anyone that been offended by my stupid proposition. Apparently I clearly needed that to understand why it's a bad idea and how it should be done.
Okay, again that's not at all what I was trying to say...