New subject: [PATCH v2 1/2] kernel32: Implement NUMA functions and enhance NUMA support in memory management

14 Sep 2025


This merge request implements several NUMA functions that were previously stubs in kernel32 and kernelbase, adds a basic NUMA node discovery/topology layer, and extends the associated tests. It also adds more detailed tracing to SetThreadGroupAffinity.
## Context / Motivation
Some Windows applications (game engines, middleware, runtimes) query the NUMA API to adapt memory allocation or thread distribution. Without an implementation, these calls returned errors (ERROR_CALL_NOT_IMPLEMENTED) or unhelpful values, which could degrade the internal heuristics of these programs. This first implementation provides:
- A logical topology derived from GetLogicalProcessorInformation.
- A reasonable approximation of available memory per node.
- Consistent processor masks for the present nodes.
It prepares for future optimizations (targeted memory allocation, better scheduling strategies) without modifying the existing behavior of generic allocations.
## Main Changes
- `kernel32/process.c`:
  - Implementation of GetNumaNodeProcessorMask, GetNumaAvailableMemoryNode / Ex, GetNumaProcessorNode / Ex, GetNumaProximityNode.
  - Parameter validation and consistent error propagation (ERROR_INVALID_PARAMETER).
- `kernelbase/memory.c`:
  - New NUMA infrastructure (topology cache, lazy initialization, dedicated critical section).
  - Topology reading via GetLogicalProcessorInformation.
  - Runtime options via environment variables:
     - WINE_NUMA_FORCE_SINGLE: Force a single logical node.
     - WINE_NUMA_CONTIG: Remap masks to produce contiguous blocks.
  - Implementations of GetNumaHighestNodeNumber, GetNumaNodeProcessorMaskEx, GetNumaProximityNodeEx.
  - Robust fallback: if no NUMA info → single node.
- `kernelbase/thread.c`:
  - Added detailed traces in SetThreadGroupAffinity (removed the redundant DECLSPEC_HOTPATCH here).
- Tests (`dlls/kernel32/tests/process.c`):
  - Added a new test, test_NumaBasic, covering:
    - GetNumaHighestNodeNumber
    - GetNumaNodeProcessorMaskEx (nodes 0 and 1)
    - GetNumaProximityNodeEx
  - Tolerant behavior: accepts `ERROR_INVALID_FUNCTION` / `ERROR_INVALID_PARAMETER` depending on the platform.
- Added the `WINE_DEFAULT_DEBUG_CHANNEL(numa)` debug channel for the subsystem.
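
To make the validation and fallback behavior described above concrete, here is a minimal portable sketch. It is not the code from the patch: `struct numa_topology`, `numa_topology_fallback`, `numa_lookup_mask` and the `SKETCH_*` constants are hypothetical names chosen for illustration, and the sketch assumes a single processor group with at most 64 CPUs.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES               64  /* table bound stated in this description */
#define SKETCH_ERROR_SUCCESS           0
#define SKETCH_ERROR_INVALID_PARAMETER 87  /* numeric value of Win32 ERROR_INVALID_PARAMETER */

/* Hypothetical cached topology: one CPU mask per node, group 0 only. */
struct numa_topology
{
    unsigned int node_count;
    uint64_t     masks[SKETCH_MAX_NODES];
};

/* Conservative fallback: a single node whose mask covers every CPU. */
void numa_topology_fallback(struct numa_topology *t, unsigned int cpu_count)
{
    if (!cpu_count) cpu_count = 1;  /* always report at least one CPU */
    t->node_count = 1;
    /* Shifting a 64-bit value by 64 is undefined, so handle that case apart. */
    t->masks[0] = (cpu_count >= 64) ? ~(uint64_t)0
                                    : (((uint64_t)1 << cpu_count) - 1);
}

/* Validate the arguments, then copy the node's mask out.
 * Returns an error code instead of calling SetLastError, to stay portable. */
unsigned int numa_lookup_mask(const struct numa_topology *t, unsigned int node,
                              uint64_t *mask)
{
    if (!t || !mask || node >= t->node_count)
        return SKETCH_ERROR_INVALID_PARAMETER;
    *mask = t->masks[node];
    return SKETCH_ERROR_SUCCESS;
}
```

In this sketch an out-of-range node and a NULL output pointer are both rejected with the same code; whether real Windows distinguishes them is one of the open questions listed under Request for Review.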
## Assumptions / Limitations
- Support for a single processor group (Group = 0) for now.
- Memory approximation: equal division of available physical memory (improvable later with internal counters per node).
- Proximity = node (simplistic direct mapping).
- No impact yet on VirtualAlloc / Heap allocation by node.
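
The equal-division memory approximation listed above amounts to very little code. The following sketch assumes a hypothetical helper `numa_approx_node_memory` that receives the system-wide available memory and the node count from the caller; how the patch actually obtains those figures is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: approximate the memory available on one node by
 * splitting the system-wide figure evenly across all nodes.
 * Returns 1 on success, 0 if the arguments are invalid. */
int numa_approx_node_memory(uint64_t total_avail, unsigned int node_count,
                            unsigned int node, uint64_t *out)
{
    if (!out || !node_count || node >= node_count)
        return 0;
    /* Integer division: any remainder is simply not attributed to a node. */
    *out = total_avail / node_count;
    return 1;
}
```

Every node reports the same value under this scheme, which is why the description calls it improvable once per-node counters exist.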
## Security / Concurrency
- Initialization protected by dedicated critical section (numa_cs).
- Thread-safe lazy read.
- Table bounded to 64 nodes (historical Windows limit).
## Compatibility Impact
- Improves compatibility with software probing the NUMA API.
- Low risk of regression: previously failing paths now return TRUE with consistent data.
- In case of topology collection failure → single-node fallback (conservative behavior).
## Validation / Tests
- New test_NumaBasic added and integrated into the process suite.
- Traces on the numa channel help diagnose topology detection.
- Invalid parameters tested (NULL, nodes out of range).
- Works in environments without real NUMA via fallback.
## Environment Variables (quick documentation)
- WINE_NUMA_FORCE_SINGLE=1: Forces a single node (mask covering all CPUs).
- WINE_NUMA_CONTIG=1: Remaps each node's mask to a compact, contiguous block of bits (useful if the topology returns sparse masks).
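
As an illustration of what these two options do to the per-node masks, here is a small portable sketch. The function names `numa_force_single` and `numa_make_contiguous` are hypothetical, and the sketch assumes the node masks are disjoint and fit in a single 64-bit group; the actual patch may implement the remapping differently.

```c
#include <stdint.h>

/* Count the set bits in a mask (portable, no compiler builtins). */
static unsigned int popcount64(uint64_t m)
{
    unsigned int n = 0;
    for (; m; m &= m - 1) n++;  /* clears the lowest set bit each round */
    return n;
}

/* WINE_NUMA_FORCE_SINGLE-style transform: merge all nodes into node 0,
 * whose mask becomes the union of every CPU. Returns the new node count. */
unsigned int numa_force_single(uint64_t *masks, unsigned int count)
{
    uint64_t all = 0;
    unsigned int i;

    if (!count) return 0;
    for (i = 0; i < count; i++) all |= masks[i];
    masks[0] = all;
    return 1;
}

/* WINE_NUMA_CONTIG-style transform: each node keeps its CPU count but is
 * assigned the next free run of consecutive bits. */
void numa_make_contiguous(uint64_t *masks, unsigned int count)
{
    unsigned int next = 0, i;

    for (i = 0; i < count; i++)
    {
        unsigned int n = popcount64(masks[i]);
        /* Avoid undefined 64-bit shifts by 64. */
        uint64_t run = (n >= 64) ? ~(uint64_t)0 : (((uint64_t)1 << n) - 1);
        masks[i] = (next < 64) ? (run << next) : 0;
        next += n;
    }
}
```

For example, two interleaved nodes with masks `0x05` and `0x0A` become `0x03` and `0x0C` after `numa_make_contiguous`, and a single node with mask `0x0F` after `numa_force_single`.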
## Potential Next Steps (not included)
- Implement true memory tracking per node (via allocation hooks).
- Multi-group support (PROCESSOR_GROUP_INFO).
- Improved VirtualAllocExNuma / First-touch implementation.
- More accurate proximity-to-node mapping on complex NUMA platforms.
- Dedicated tests for environment variables.
## Potential Risks / Regressions
- Applications relying on the absence of the API may slightly change their strategy (low risk).
- Masks remapped with WINE_NUMA_CONTIG could surprise a profiling tool (opt-in option).
- Memory approximation too coarse for very fine-grained heuristics (no functional regression expected).
## Request for Review
- Verify logging conventions and TRACE_(numa) usage.
- Verify the relevance of removing DECLSPEC_HOTPATCH on SetThreadGroupAffinity (alignment with local conventions).
- Opinion on error granularity (ERROR_INVALID_PARAMETER vs. ERROR_INVALID_FUNCTION) for more accurate mimicry.

Once the kernel can handle these functions directly (e.g. in a dedicated NUMA module), this implementation could serve as a fallback for when the kernel does not support NUMA natively (i.e. when the module cannot be loaded).
--
  v2: kernelbase: Improve initialization of NUMA information to handle pathological cases
https://gitlab.winehq.org/wine/wine/-/merge_requests/8970

[PATCH v2 0/2] MR8970: kernel32/kernelbase: Implemented NUMA functions and improved affinity support