On Fri May 31 17:14:09 2024 +0000, Paul Gofman wrote:
> What is the motivation under introducing Linux specific memfd, is there any clear indication of how that is beneficial? Maybe there is something, but backing any file with memory fd doesn't seem obviously beneficial: it can probably provoke OOMs which otherwise would be avoided with file backed memory. Huge file backed mappings may be accessed very sparsely and not impose much RAM pressure.
`create_memfd` is **only** used in `create_temp_file`, which is itself only used to create fds that back _anonymous_ mappings. Moreover, the fds returned by `create_temp_file` are _only_ backed by an on-disk file when `/tmp` (or rather `server_dir_fd`, which is usually `/tmp/.wine-UID/server-XXX`) is mounted `noexec`. Because we first try to create the temporary file under `/tmp`, the common case is already functionally equivalent to a memfd as far as memory pressure is concerned: `/tmp` is backed by `tmpfs` on most platforms, and the pages backing files created there are accounted as `shmem`, exactly like memfd pages.
Trying a memfd first for all anonymous mappings, rather than only for large-page mappings, is potentially better because:
1. When `/tmp` is `noexec`, wineserver tries to create the backing file under `config_dir_fd` instead, which is `$WINEPREFIX` (or `$HOME/.wine`). If `config_dir` is also `noexec`, `create_temp_file` fails. Using a memfd removes the strict requirement that a non-`noexec` filesystem be available, as memfd mappings have no such restriction.
2. Even though `create_temp_file` immediately `unlink`s the newly created file, reads and writes to the mapped memory still go through the backing filesystem's VFS. When the file ends up under `$WINEPREFIX` because `/tmp` is mounted `noexec`, this consumes filesystem blocks (which don't seem to be freed until the fd is closed) and has significant performance implications (at least, that seems to be the case on my btrfs + dm-crypt setup). In this case, large mappings are also limited by (and eat into) the free space on that filesystem and any configured quota groups.
3. memfds can be permanently sealed (e.g. against writes, resizing, or, on newer kernels, changes to the execute bits), which provides an additional layer of access protection enforced by the kernel (though I'm not sure how much this actually matters, since `get_mapping_info` in wineserver already enforces page protections for the underlying file object).