I hope this email is not too long, but Linux does not quite have the functionality of these functions:

AllocateUserPhysicalPages()
MapUserPhysicalPagesScatter()
MapUserPhysicalPages()
FreeUserPhysicalPages()
I noticed that they are stubs, and thus unimplemented. I have an idea for how to implement them. I am not entirely familiar with the Wine code base, but I will explain the idea anyway. Let's start with AllocateUserPhysicalPages(). It needs to do a few things:

* Let one process reserve pages for another process (the HANDLE hProcess parameter).
* Reserve memory without adding it to the virtual address space of the caller or the target process.
* Lock the memory (as with mlock()) so it won't be swapped.

First, any Windows process with permission to do so needs to be able to reserve memory for other processes, including itself. It makes sense to have what is currently a stub invoke what I describe next in the Wine server. That way, when the target process tries to map the reserved memory, it can fetch the info it needs from the Wine server. The first thing the handler in the Wine server would need to do is:

* Check that the invoking Windows process is allowed to do this (SeLockMemoryPrivilege, and PROCESS_VM_OPERATION on the handle), or just grant all processes this ability.
* Check that the target Windows process exists.
Next, we need to reserve memory that can be mapped and unmapped multiple times in the target process without losing its contents. The simplest way to do this would be a file. However, a file resides on disk, so remapping it after an unmap can be slow. It can also pollute the file system if cleanup fails. Luckily, Linux has the memfd_create() system call. It creates a RAM-backed file and returns a file descriptor that can later be passed to mmap(). This lets us create a persistent piece of memory that does not pollute either the caller's or the target's address space. I assume there is a per-process structure in the Wine server; we could store the file descriptor there. We can then use ftruncate() to set the size to the number of bytes requested.

FreeUserPhysicalPages() makes things a tad more complicated: it can, for instance, be used to free only a single page of memory. So we need to track free page-sized chunks. Luckily, we are only tracking fixed-size memory blocks, so it's not as bad as it could be. In C-ish pseudocode, AllocateUserPhysicalPages() would do the following:
    remaining_pages = NumberOfPages; // NumberOfPages is a function parameter
    proc_struct = get_process_struct(win_process_id/handle); // If there is a structure like this, get it
    page_array_index = 0; // Track what index we have written to

    if (proc_struct.memfd == -1) {
        proc_struct.memfd = memfd_create("debug name", {FLAGS});
        proc_struct.free_list_head = NULL;
    }

    // Check the free list first, since FreeUserPhysicalPages() can free,
    // for instance, a single page. Stop once the request is satisfied.
    next_ptr = proc_struct.free_list_head;
    while (next_ptr != NULL && remaining_pages > 0) {
        // PageArray is another function parameter
        PageArray[page_array_index] = {However the IDing is done for pages};

        next_ptr = next_ptr->next;
        proc_struct.free_list_head = next_ptr;

        page_array_index++;
        remaining_pages--;
    }

    if (remaining_pages > 0) {
        old_size = proc_struct.memfd_sz;
        new_size = old_size + (page_sz * remaining_pages);
        proc_struct.memfd_sz = new_size;
        ftruncate(proc_struct.memfd, new_size);
        for (int i = 0; i < remaining_pages; i++) {
            // Calculate the ID starting from old_size and incrementing upward
            PageArray[page_array_index] = {However the IDing is done for pages};
            page_array_index++;
        }
    }
    // Assuming no failures, we don't need to update the NumberOfPages value.
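To make the memfd idea concrete, here is a minimal sketch (Linux only) showing that a page written through one mapping is still there after unmapping and mapping again. The helper names backing_create() and backing_survives_remap() are hypothetical and not part of Wine. The sketch calls the raw system call because the memfd_create() wrapper only exists in newer C libraries.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif

/* Hypothetical helper: create a RAM-backed file of `len` bytes.
 * Returns the file descriptor, or -1 on failure. */
static int backing_create(size_t len)
{
    int fd = (int)syscall(SYS_memfd_create, "awe-backing", MFD_CLOEXEC);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write `msg` into page `index`, unmap it, map it again and compare.
 * Returns 1 if the contents survived, 0 if not, -1 on error. */
static int backing_survives_remap(int fd, size_t index, size_t page_sz,
                                  const char *msg)
{
    off_t off = (off_t)(index * page_sz);
    size_t len = strlen(msg) + 1;
    assert(len <= page_sz);

    char *p = mmap(NULL, page_sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, len);
    munmap(p, page_sz);

    p = mmap(NULL, page_sz, PROT_READ, MAP_SHARED, fd, off);
    if (p == MAP_FAILED)
        return -1;
    int same = memcmp(p, msg, len) == 0;
    munmap(p, page_sz);
    return same;
}
```

The MAP_SHARED flag matters here: with MAP_PRIVATE the write would go to a private copy and be lost on munmap().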
Now one needs to define what Windows refers to as the frame number for this API; these are returned, one per page, in the PageArray parameter. I propose that the upper bits be the process ID and the lower bits be the position in the memfd file, aligned to the page size and shifted. So for 4096-byte pages: [Process ID | (aligned_index >> 12)]. A Linux process ID can be configured to use at most 22 bits on a 64-bit system. The Windows process ID is likely different, but with that width, 64 - 22 leaves 42 bits to identify a given page for a process.
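The proposed layout could be packed and unpacked roughly like this. It is only a sketch: the AWE_* macros and awe_* function names are made up for illustration, and the 22-bit ID field is the assumption from the paragraph above. Note that PageArray entries are ULONG_PTR, so a 32-bit process would need a narrower layout than the 64-bit one shown here.

```c
#include <assert.h>
#include <stdint.h>

#define AWE_PAGE_SHIFT 12                    /* 4096-byte pages */
#define AWE_PID_BITS   22                    /* assumed width of the ID field */
#define AWE_INDEX_BITS (64 - AWE_PID_BITS)   /* 42 bits for the page index */
#define AWE_INDEX_MASK ((UINT64_C(1) << AWE_INDEX_BITS) - 1)

/* Build a frame number from a process ID and a page-aligned file offset. */
static uint64_t awe_make_frame(uint32_t pid, uint64_t file_offset)
{
    assert((file_offset & ((UINT64_C(1) << AWE_PAGE_SHIFT) - 1)) == 0);
    assert(pid < (UINT32_C(1) << AWE_PID_BITS));
    uint64_t index = file_offset >> AWE_PAGE_SHIFT;
    assert(index <= AWE_INDEX_MASK);
    return ((uint64_t)pid << AWE_INDEX_BITS) | index;
}

/* Recover the process ID stored in the upper bits. */
static uint32_t awe_frame_pid(uint64_t frame)
{
    return (uint32_t)(frame >> AWE_INDEX_BITS);
}

/* Recover the byte offset into the backing file. */
static uint64_t awe_frame_offset(uint64_t frame)
{
    return (frame & AWE_INDEX_MASK) << AWE_PAGE_SHIFT;
}
```

For example, process 1234 and file offset 5 * 4096 give a frame number whose upper bits hold 1234 and whose lower 42 bits hold 5.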
Lastly, although I am not sure it's necessary, AllocateUserPhysicalPages() implies the pages are locked in RAM. From my understanding, memfd files can be swapped. We could mmap the file descriptor into the Wine server's memory with the MAP_SHARED flag, and then call mlock() on this mapping in the Wine server to ensure it's never swapped out. We could also use the unused page-sized chunks to store the free list. However, this will eat up address space in the Wine server.
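A rough sketch of the locking step, under the assumption that the server keeps one shared mapping of the whole file. The names backing_create() and backing_lock() are hypothetical. mlock() is subject to RLIMIT_MEMLOCK, so the sketch reports a refused lock separately instead of treating it as fatal.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif

/* Hypothetical helper: create a RAM-backed file of `len` bytes. */
static int backing_create(size_t len)
{
    int fd = (int)syscall(SYS_memfd_create, "awe-backing", MFD_CLOEXEC);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the whole file shared and try to pin it in RAM.
 * Returns 0 if locked, 1 if mapped but mlock() was refused
 * (for example because RLIMIT_MEMLOCK is too low), -1 on error.
 * On return values 0 and 1, *out holds the mapping. */
static int backing_lock(int fd, size_t len, void **out)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    *out = p;
    return mlock(p, len) == 0 ? 0 : 1;
}
```

Because the file grows with ftruncate(), a real implementation would have to extend or redo this mapping and the lock whenever the file size changes.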
Now let's discuss MapUserPhysicalPages(), which in some regards is simpler. It can only be called from the process that is mapping the pages. It needs to do the following:

* Get the file descriptor from the Wine server.
* Start at the provided virtual address.
* Check that the page IDs in PageArray make sense (the ID is not in the free list, and the process ID matches).
* Check that each page falls in an already mapped region of memory.
* mmap() each page-sized chunk referenced in the page array sequentially, starting from the virtual address.
* Give each mmap() the same permissions the page at that address had before.
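The mapping loop in the steps above might look like the sketch below. It assumes the page IDs have already been validated and reduced to page indices in the backing file; backing_create(), reserve_window() and map_pages() are hypothetical names. MAP_FIXED silently replaces whatever is mapped at the target address, which is why the range has to be a region the caller already owns.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif

/* Hypothetical helper: create a RAM-backed file of `len` bytes. */
static int backing_create(size_t len)
{
    int fd = (int)syscall(SYS_memfd_create, "awe-backing", MFD_CLOEXEC);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Stand-in for the already mapped region the caller provides. */
static void *reserve_window(size_t count, size_t page_sz)
{
    void *p = mmap(NULL, count * page_sz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Map file page pages[i] at base + i * page_sz, for i = 0..count-1. */
static int map_pages(int fd, void *base, const size_t *pages, size_t count,
                     size_t page_sz, int prot)
{
    for (size_t i = 0; i < count; i++) {
        char *want = (char *)base + i * page_sz;
        void *got = mmap(want, page_sz, prot, MAP_SHARED | MAP_FIXED,
                         fd, (off_t)(pages[i] * page_sz));
        if (got == MAP_FAILED)
            return -1;
    }
    return 0;
}
```

The scatter variant would differ only in taking one target address per page instead of computing base + i * page_sz.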
Next, MapUserPhysicalPagesScatter() is mostly the same as MapUserPhysicalPages(), but instead it handles an array of virtual addresses, each of which gets mapped to the corresponding page in the page array.
The last function is FreeUserPhysicalPages(). Again, this one can be called by any process, since it takes a process handle.

* Don't do anything to chunks that are mmap()-ed. This implies we need to keep a reference count or somehow check that no process has this page-sized chunk mapped.
* If freeing a page would create a hole in the middle of the memory-backed file, add it to the free list.
* Zero this page-sized chunk.
* If the page-sized chunk is at the end, or all the page-sized chunks after it are also free, ftruncate() the memory-backed file down in size.
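The bookkeeping part of these rules can be sketched without any system calls. This version swaps the linked free list for a per-page state array, purely to keep the example short; struct backing, enum page_state and backing_free_page() are hypothetical names. After a successful call, the caller would zero the page if it became a hole and ftruncate() the file to b->pages pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum page_state { PAGE_IN_USE, PAGE_MAPPED, PAGE_FREE };

struct backing {
    enum page_state *state;  /* one entry per page in the file */
    size_t pages;            /* current length of the file, in pages */
};

/* Free page `index`. Refuses pages that are out of range, still mapped,
 * or already free. Free pages below b->pages are the holes available
 * for reuse; trailing free pages are dropped by shrinking b->pages. */
static bool backing_free_page(struct backing *b, size_t index)
{
    assert(b != NULL && b->state != NULL);
    if (index >= b->pages || b->state[index] != PAGE_IN_USE)
        return false;

    b->state[index] = PAGE_FREE;

    /* Trim the tail: the file can shrink past every trailing free page. */
    while (b->pages > 0 && b->state[b->pages - 1] == PAGE_FREE)
        b->pages--;
    return true;
}
```

With four pages where page 2 is mapped, freeing page 0 leaves a hole and keeps the length at four, while freeing page 3 shrinks the length to three.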
While this was mostly a broad overview, I hope it gives someone a good idea of where to start implementing these functions. I did think about it for a little while, since Linux does not quite have the same functionality.
Thanks, Keith Cancel
Hi,
I am not familiar with these functions (I just read the MSDN pages). It seems their intention is to allow 32-bit processes to handle more than 2/3/4 GB of memory. Do you have an application that actually uses these functions and isn't happy with the stubs?
I wonder what the difference is compared to CreateFileMapping(INVALID_HANDLE_VALUE) + MapViewOfFile. This supposedly creates a memory-backed file handle and the process can map/unmap it at will (and even pass it to a different process). I see a difference in wording ("backed by the system paging file" vs "physical memory"), but unless you peek into the kernel internals I don't think an application should notice the difference.
Regarding allocating and mapping memory in foreign processes, NtAllocateVirtualMemory() can do this as well. It doesn't have the functionality to create a memfd-like allocation, but you can look at how the cross-process mapping works via wineserver.
Wrt the exact semantics of those physical page allocations I guess it depends on what applications actually need. memfd sounds fine, and we can ignore the swap/mlock until an application actually bothers about it.
I found bug 36527, but it is marked as fixed with the patch that implemented the stubs. It isn't clear to me if the games and office diagnosis service are working correctly with the stub or not.
Cheers, Stefan
PS: I also like the apparent throwback to the DOS days in the AllocateUserPhysicalPages description:
"Do not attempt to modify this buffer. It contains operating system data, and corruption could be catastrophic."
On 17.04.21 at 09:50, Keith Cancel wrote: