http://bugs.winehq.org/show_bug.cgi?id=59857

Bug ID: 59857
Summary: ntdll: process heap grows without bound under sustained allocation load; subheaps are never returned to the OS
Product: Wine
Version: 11.10
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: ntdll
Assignee: wine-bugs@list.winehq.org
Reporter: nen24t@gmail.com
Distribution: ---

## Summary

A process running an application that issues a high, sustained rate of small/medium heap allocations with mixed lifetimes (e.g. a Direct2D GUI redrawing continuously) shows **unbounded RSS growth that never plateaus and is never returned to the OS**, even though the live working set is stable. Growth stops the instant the redraw load stops, but RSS does not shrink. Observed up to **3.9 GB RSS without crashing**.

The growth is **fully attributable to the ntdll heap's `allocate_region()` path** creating subheaps and large blocks that are never decommitted.

## Reproduction

- **Wine**: 11.0 (stable), unmodified / vanilla build. Reproduces equally without any third-party D3D layer (builtin wined3d/d3d11/d2d1, OpenGL backend).
- **Also reproduces on 11.10 (devel)**: the same unbounded growth is observed. Note that `dlls/ntdll/heap.c` is **functionally identical between 11.0 and 11.10**: the only diff is the removal of one unrelated `#define WIN32_NO_STATUS`; `allocate_region()`, `create_subheap()`, `subheap_decommit()` and the subheap/LFH logic are byte-for-byte unchanged. The mechanism is therefore the same across both releases.
- **Workload**: A DAW (REAPER) hosting a Direct2D-based VST plugin GUI (Serum 2) with a continuously animating waveform display (a steady rate of tens of thousands of D2D draw calls per second). Any application doing continuous `ID2D1RenderTarget` drawing with per-frame transient geometry should reproduce; the plugin is just a convenient high-rate driver.
- **Observe**: `ps -o rss= -p <pid>` (or `/proc/<pid>/status` `VmRSS`) over a few minutes of active GUI. RSS climbs at a roughly constant rate and does not recover.

## Observed behaviour (measured)

- RSS growth rate under active load: **~1.2–1.6 MB/s**, linear, unbounded.
- Growth is **100% in anonymous mappings**. Categorising `/proc/<pid>/smaps` over a 2-minute window: `ANON` grows +~190 MB while **every other category is flat**: NVIDIA GL driver mappings flat (~340 MB), glibc `[heap]` flat (~12 MB), file-backed/PE/SO flat.
- Per-VMA diff shows the growth concentrated in a **small number of large, growing anonymous regions** (e.g. one region growing to >120 MB), **not** VMA churn (the mapping count is stable). This is the signature of one or a few heaps/arenas committing incrementally.

## Analysis: attribution to the ntdll heap

Layered attribution (each step ruled out a candidate):

1. **Not the LFH frontend.** Walking all LFH groups and classifying by live-block count: committed LFH memory **plateaus** (empty-group caching is bounded by the per-bin group cache). Sparse LFH groups dominate over empty ones (~1:9–11 by wasted bytes), but the LFH total does not grow unbounded. LFH-level decommit/pruning has no effect (consistent with the heap immediately reusing freed interior blocks).
2. **Not Win32 `VirtualAlloc`.** Sampling all `VirtualAllocEx`/`MEM_COMMIT` calls: effectively zero committed bytes via this path even at multi-GB RSS. The large allocations bypass it.
3. **Heap-committed breakdown.** Summing committed bytes per heap (all subheaps + large blocks): the **process heap reaches ~770 MiB** (≈597 MiB across ~42 subheaps, max 15 MiB each, plus ~172 MiB across ~66 large blocks) and keeps growing. All other ntdll heaps stay < 4 MiB.
4. **Direct `NtAllocateVirtualMemory` caller.** Attributing each large committed `NtAllocateVirtualMemory` call to its PE return address (read from the syscall frame, resolved offline via `/proc/<pid>/maps` + `nm`/`addr2line`): the dominant driver is **`allocate_region()` in `dlls/ntdll/heap.c`**, with ~201 MiB across ~87 calls in a 2-minute window. `allocate_region()` is called from `create_subheap()` and `heap_allocate_large()`.

## Root cause (as understood)

The growth is **process-heap fragmentation that is never returned to the OS**, in the **non-LFH range** (mid-size blocks → subheaps) and large blocks:

- A subheap holds blocks of mixed size-classes and mixed lifetimes. `subheap_decommit()` only releases trailing free space at the end of a subheap; a subheap that retains even a single long-lived ("residue") block stays fully committed.
- Under a workload that allocates many short-lived blocks per frame plus occasional long-lived ones (geometry/shader/glyph objects), subheaps accumulate in a sparse state: low live occupancy, but pinned by residue blocks and therefore never freed. The result is monotonic committed growth with a stable live set.

The allocations filling the heap (from caller sampling) are ordinary builtin-DLL allocations: wined3d shader objects (`pixel_shader_init`/`shader_set_function`, ~17–23 KB), d2d1 transient geometry, dwrite glyph/layout structs, d3d11. Nothing here is unusual; the issue is the heap's retention behaviour, not the callers. This is not D2D1-specific in principle: D2D1/wined3d is just an intense, mixed-lifetime driver that exposes it quickly.

## Context: Windows behaviour

On Windows 10+, the Segment Heap (used by default for many processes, and selectable via `HEAP_CREATE_SEGMENT_HEAP` / image-file-execution-options `FrontEndHeapType`) groups allocations into size-class-homogeneous segments and returns empty segments to the OS, so this mixed-lifetime workload does not accumulate the same way. Wine's heap is the classic NT heap (no Segment Heap; `HEAP_CREATE_SEGMENT_HEAP` is not implemented), which never returns interior subheap pages once a subheap is pinned. This is noted only as behavioural context, not as a fix proposal.

## How the data was gathered (for independent reproduction)

All instrumentation was diagnostic-only (counters + periodic `ERR()` dumps), removed afterwards:

- VMA categorisation and per-VMA growth diff from `/proc/<pid>/smaps`.
- Per-heap committed totals (sum over `subheap_list` of `commit_end - base`, plus `large_list`).
- `NtAllocateVirtualMemory` caller attribution: aggregate committed bytes by the PE return IP taken from the syscall frame (`rsp`-saved return address, not the syscall stub IP), resolved against the loaded PE modules.

Raw per-window measurement tables can be attached on request.

## Related bugs

- **Bug 55818**: earlier report of related Direct2D-driven RSS growth. This report adds precise attribution (driven by `allocate_region()` / subheap retention in the *process heap*, not the LFH) plus a vanilla and cross-version confirmation (heap code identical 11.0 ↔ 11.10).
- **Bug 57289**: concerns the LFH frontend (`group_release`). The data here indicates the **LFH is NOT the source** of the unbounded growth: LFH committed memory plateaus; the accumulation is one level up, in non-LFH subheaps and large blocks that `allocate_region()` creates and never decommits. Cross-referenced to help refocus the investigation.

(Adjust/verify the bug numbers and titles before submitting; also add them to the "See Also" field of the new bug.)

## Severity / impact

Long-running sessions of Direct2D-heavy applications (DAWs with plugin GUIs, etc.) hit out-of-memory after a few hours despite a small, stable live working set. Closing the GUI stops growth but does not reclaim; only process exit (which `HeapDestroy`s everything at once) recovers the memory.
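For anyone wanting to repeat the `/proc/<pid>/smaps` categorisation mentioned under "How the data was gathered", the following is a minimal sketch of one way to do it. It is not the script used for the measurements in this report; the function names (`categorise`, `rss_by_category`, `growth`) and the category labels are illustrative assumptions. It sums the `Rss:` field per mapping category for two snapshots and reports the difference.

```python
import re

# Matches a VMA header line in /proc/<pid>/smaps, e.g.
# "7f00...-7f00... rw-p 00000000 00:00 0   [heap]"
# Group 3 is the (possibly empty) pathname.
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$"
)


def categorise(path):
    """Map a VMA pathname to a coarse category (labels are hypothetical)."""
    if not path:
        return "ANON"      # anonymous mapping, no backing name
    if path.startswith("["):
        return path        # kernel pseudo-names such as "[heap]" or "[stack]"
    return "FILE"          # file-backed: PE modules, .so files, etc.


def rss_by_category(smaps_text):
    """Return {category: resident kB} for one smaps snapshot."""
    totals = {}
    category = None
    for line in smaps_text.splitlines():
        m = HEADER.match(line)
        if m:
            category = categorise(m.group(3).strip())
        elif line.startswith("Rss:") and category is not None:
            kb = int(line.split()[1])
            totals[category] = totals.get(category, 0) + kb
    return totals


def growth(before, after):
    """Per-category RSS delta in kB between two snapshots."""
    keys = set(before) | set(after)
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}
```

Typical use would be to read `/proc/<pid>/smaps` twice, a couple of minutes apart, and pass both texts through `rss_by_category()` before calling `growth()`. A finer split (for example, separating GL driver mappings from other file-backed mappings) would need extra rules in `categorise()`.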
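The pinning effect described under "Root cause" can be illustrated with a deliberately simplified model. This is not Wine's code: it assumes a subheap is a row of equal page-sized blocks and that only the free space after the last live block is released, which is the trailing-only behaviour attributed to `subheap_decommit()` above. The names `committed_blocks` and `occupancy` are hypothetical.

```python
def committed_blocks(live):
    """Blocks that stay committed when only trailing free space is released.

    `live` is a list of booleans, one per block, in address order.
    Everything up to and including the last live block stays committed.
    """
    last_live = max((i for i, used in enumerate(live) if used), default=-1)
    return last_live + 1


def occupancy(live):
    """Fraction of committed blocks that are actually in use."""
    committed = committed_blocks(live)
    return sum(live) / committed if committed else 0.0


# One long-lived "residue" block at the end pins the whole subheap.
pinned = [False] * 15 + [True]
# The same single live block at the start lets the rest be released.
unpinned = [True] + [False] * 15

print(committed_blocks(pinned), occupancy(pinned))
print(committed_blocks(unpinned), occupancy(unpinned))
```

In this model the two subheaps hold the same amount of live data, but the first keeps all 16 blocks committed while the second keeps one. The real heap has variable block sizes, free lists and page granularity, so this only shows the direction of the effect, not its magnitude.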