This merge request implements several NUMA functions previously stubbed in kernel32 and kernelbase, adds a basic NUMA node discovery/topology layer, and enriches the associated tests. It also improves the traceability of SetThreadGroupAffinity.
## Context / Motivation

Some Windows applications (game engines, middleware, runtimes) query the NUMA API to adapt memory allocation or thread distribution. Without an implementation, these calls returned errors (ERROR_CALL_NOT_IMPLEMENTED) or unhelpful values, which could degrade the internal heuristics of these programs.

This first implementation provides:

- A logical topology derived from GetLogicalProcessorInformation.
- A reasonable approximation of available memory per node.
- Consistent processor masks for the nodes that are present.

It prepares for future optimizations (targeted memory allocation, better scheduling strategies) without modifying the existing behavior of generic allocations.
## Main Changes

- `kernel32/process.c`:
  - Implementation of GetNumaNodeProcessorMask, GetNumaAvailableMemoryNode / Ex, GetNumaProcessorNode / Ex, GetNumaProximityNode.
  - Parameter validation and consistent error propagation (ERROR_INVALID_PARAMETER).
- `kernelbase/memory.c`:
  - New NUMA infrastructure (topology cache, lazy initialization, dedicated critical section).
  - Topology reading via GetLogicalProcessorInformation.
  - Runtime options via environment variables:
    - WINE_NUMA_FORCE_SINGLE: Force a single logical node.
    - WINE_NUMA_CONTIG: Remap masks to produce contiguous blocks.
  - Implementations of GetNumaHighestNodeNumber, GetNumaNodeProcessorMaskEx, GetNumaProximityNodeEx.
  - Robust fallback: if no NUMA info → single node.
- `kernelbase/thread.c`:
  - Added detailed traces in SetThreadGroupAffinity (removed the redundant DECLSPEC_HOTPATCH here).
- Tests (`dlls/kernel32/tests/process.c`):
  - Added a new test, test_NumaBasic, covering:
    - GetNumaHighestNodeNumber
    - GetNumaNodeProcessorMaskEx (nodes 0 and 1)
    - GetNumaProximityNodeEx
  - Tolerant behavior: accepts `ERROR_INVALID_FUNCTION` / `ERROR_INVALID_PARAMETER` depending on the platform.
- Added the `WINE_DEFAULT_DEBUG_CHANNEL(numa)` debug channel for the subsystem.
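
To make the cache and fallback behavior concrete, here is a minimal, portable sketch of the idea rather than the code in this MR. The names `node_record`, `numa_topology`, `build_topology` and `node_mask` are hypothetical, and the records stand in for whatever GetLogicalProcessorInformation reports. It assumes a single processor group, so a 64-bit mask is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMA_NODES 64  /* table bound mentioned in this MR */

/* Hypothetical record: one NUMA relationship (node number + CPU mask). */
struct node_record { unsigned node; uint64_t mask; };

/* Hypothetical cache layout; the real structure in kernelbase may differ. */
struct numa_topology {
    unsigned highest_node;
    uint64_t masks[MAX_NUMA_NODES];
};

/* Fill the cache from the records. If no usable record is found, fall back
 * to a single node (node 0) that owns every CPU in all_cpus. */
void build_topology(struct numa_topology *t, const struct node_record *recs,
                    size_t count, uint64_t all_cpus)
{
    size_t i;
    int seen = 0;

    assert(t != NULL);
    t->highest_node = 0;
    for (i = 0; i < MAX_NUMA_NODES; i++) t->masks[i] = 0;

    for (i = 0; i < count; i++) {
        /* Skip records outside the bounded table or without any CPU. */
        if (recs[i].node >= MAX_NUMA_NODES || !recs[i].mask) continue;
        t->masks[recs[i].node] |= recs[i].mask;
        if (recs[i].node > t->highest_node) t->highest_node = recs[i].node;
        seen = 1;
    }
    if (!seen) t->masks[0] = all_cpus;  /* single-node fallback */
}

/* Returns 1 and fills *mask on success. Returns 0 for a NULL pointer or a
 * node above the highest known node, which is the case this MR reports as
 * ERROR_INVALID_PARAMETER. */
int node_mask(const struct numa_topology *t, unsigned node, uint64_t *mask)
{
    if (!t || !mask || node > t->highest_node) return 0;
    *mask = t->masks[node];
    return 1;
}
```

With an empty record list, `build_topology` leaves `highest_node` at 0 and gives node 0 the full CPU mask, which mirrors the "no NUMA info → single node" rule above.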
## Assumptions / Limitations

- Support for a single processor group (Group = 0) for now.
- Memory approximation: equal division of available physical memory (improvable later with internal counters per node).
- Proximity = node (simplistic direct mapping).
- No impact yet on VirtualAlloc / Heap allocation by node.
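
The memory approximation amounts to the arithmetic below. This is only an illustration: `approx_node_available` is a hypothetical helper, and it assumes the node count is the highest node number plus one.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate the memory available on one node by splitting the
 * system-wide figure evenly across nodes. The integer division drops
 * any remainder, so the per-node values can sum to slightly less than
 * the total. */
uint64_t approx_node_available(uint64_t total_available, unsigned node_count)
{
    assert(node_count > 0);  /* there is always at least node 0 */
    return total_available / node_count;
}
```

Every node reports the same value under this scheme, which is why it is too coarse for heuristics that compare nodes against each other.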
## Security / Concurrency

- Initialization protected by a dedicated critical section (numa_cs).
- Thread-safe lazy read.
- Table bounded to 64 nodes (historical Windows limit).
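
The lazy-initialization pattern can be sketched in portable C as follows. This is not the kernelbase code: a pthread mutex stands in for the numa_cs critical section, and `numa_collect`, `numa_get_node0_mask` and `numa_init_count` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Stand-in for the dedicated critical section (numa_cs in this MR). */
static pthread_mutex_t numa_lock = PTHREAD_MUTEX_INITIALIZER;
static int numa_ready;            /* set once the cache has been filled */
static uint64_t numa_node0_mask;  /* cached data, reduced to one field */
int numa_init_count;              /* counts how often collection ran */

/* Hypothetical placeholder for the real topology query. */
static void numa_collect(void)
{
    numa_node0_mask = 1;
    numa_init_count++;
}

/* Every reader takes the lock, so the first caller fills the cache and
 * later callers only copy the cached value. */
uint64_t numa_get_node0_mask(void)
{
    uint64_t mask;

    pthread_mutex_lock(&numa_lock);
    if (!numa_ready) {
        numa_collect();
        numa_ready = 1;
    }
    assert(numa_ready);
    mask = numa_node0_mask;
    pthread_mutex_unlock(&numa_lock);
    return mask;
}
```

Taking the lock on every read keeps the sketch simple and obviously correct; the topology is small and read rarely, so contention should not matter in practice.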
## Compatibility Impact

- Improves compatibility with software probing the NUMA API.
- Low risk of regression: previously failing paths now return TRUE with consistent data.
- In case of topology collection failure → single-node fallback (conservative behavior).
## Validation / Tests

- New test_NumaBasic added and integrated into the process suite.
- Traces (numa channel) help diagnose topology detection.
- Invalid parameters tested (NULL, nodes out of range).
- Works in environments without real NUMA via fallback.
## Environment Variables (quick documentation)

- WINE_NUMA_FORCE_SINGLE=1: Forces a single node (mask covering all CPUs).
- WINE_NUMA_CONTIG=1: Reassigns each node a compact, contiguous block of bits (useful if the topology returns sparse masks).
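
As an illustration of what the contiguous remapping could look like, the sketch below packs each node's CPUs into a contiguous run of bits, starting at bit 0 and preserving the CPU count per node. This is one plausible reading of WINE_NUMA_CONTIG, not necessarily what the patch does. The helper names are hypothetical, and treating any value that does not start with `0` as "enabled" is an assumption; only the `=1` form is documented here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a flag is enabled when its value is non-empty and does not
 * start with '0'. */
int numa_flag_value(const char *value)
{
    return value && *value && *value != '0';
}

/* Read a flag such as WINE_NUMA_CONTIG from the environment. */
int numa_env_enabled(const char *name)
{
    return numa_flag_value(getenv(name));
}

/* Count the set bits in a 64-bit mask. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Rewrite each node's mask as a contiguous block of bits. Nodes are packed
 * in order from bit 0 upward; each keeps its original number of CPUs. */
void remap_contiguous(uint64_t *masks, unsigned node_count)
{
    unsigned next_bit = 0, i;

    for (i = 0; i < node_count; i++) {
        unsigned n = popcount64(masks[i]);

        assert(next_bit + n <= 64);  /* single group: at most 64 CPUs */
        if (n == 0)       masks[i] = 0;
        else if (n == 64) masks[i] = ~(uint64_t)0;
        else              masks[i] = (((uint64_t)1 << n) - 1) << next_bit;
        next_bit += n;
    }
}
```

For example, two interleaved nodes with masks `0x55` and `0xAA` become `0x0F` and `0xF0`. The remapped masks no longer match the CPU numbering reported by the host.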
## Potential Next Steps (not included)

- Implement true memory tracking per node (via allocation hooks).
- Multi-group support (PROCESSOR_GROUP_INFO).
- Improved VirtualAllocExNuma / First-touch implementation.
- More accurate proximity-to-node mapping on complex NUMA platforms.
- Dedicated tests for environment variables.
## Potential Risks / Regressions

- Applications that relied on the API being absent may slightly change their strategy (low risk).
- Masks remapped with WINE_NUMA_CONTIG could surprise a profiling tool (the option is opt-in).
- The memory approximation is too coarse for very fine-grained heuristics (no functional regression expected).
## Request for Review

- Verify logging conventions and TRACE_(numa) usage.
- Verify the relevance of removing DECLSPEC_HOTPATCH on SetThreadGroupAffinity (alignment with local conventions).
- Opinions wanted on error granularity (ERROR_INVALID_PARAMETER vs. ERROR_INVALID_FUNCTION) for more accurate mimicry.
Once the kernel can handle these functions directly (for example, in a dedicated NUMA module), this implementation could serve as a fallback for cases where the kernel does not support NUMA natively (that is, when the module cannot be loaded).
-- v2: kernelbase: Improve initialization of NUMA information to handle pathological cases