Hi everybody,
As originally discovered by Carsten Juttner, Star Wars: The Old Republic uses the time values in KUSER_SHARED_DATA to trigger network send buffers. wine does not currently update these values, so TOR does not make it past an initial server handshake.
I'm attaching my current patch which fixes this. It updates the values every 15600000 nanoseconds as does Windows 7, and presumably all Windows versions. The words of the timers are written in the correct order for applications to detect clock tick.
It uses a separate thread which does nothing else; a previous attempt at an NtTimer based solution did not work, presumably due to irregularity in the exact callback times. Attempts at setting thread priority were also abandoned after many people were unable to run the code.
The patch does not check the return code of nanosleep; I could add code to nanosleep again for the remaining time in the case of an interrupt, but didn't think it was necessary. I could also change it to use clock_nanosleep, which would allow for more absolute timing.
Please let me know what you think.
cheers,
Joey
On Tue, Mar 20, 2012 at 06:40:37PM -0700, Joey Yandle wrote:
Hi everybody,
As originally discovered by Carsten Juttner, Star Wars: The Old Republic uses the time values in KUSER_SHARED_DATA to trigger network send buffers. wine does not currently update these values, so TOR does not make it past an initial server handshake.
I'm attaching my current patch which fixes this. It updates the values every 15600000 nanoseconds as does Windows 7, and presumably all Windows versions. The words of the timers are written in the correct order for applications to detect clock tick.
It uses a separate thread which does nothing else; a previous attempt at an NtTimer based solution did not work, presumably due to irregularity in the exact callback times. Attempts at setting thread priority were also abandoned after many people were unable to run the code.
The patch does not check the return code of nanosleep; I could add code to nanosleep again for the remaining time in the case of an interrupt, but didn't think it was necessary. I could also change it to use clock_nanosleep, which would allow for more absolute timing.
Please let me know what you think.
- pthread_create() is pretty much a no-go.
Use Win32 apis instead (CreateThread() and WaitFor*)
- The shared data structure should be modified with atomic accesses only. (Or use a criticalsection if possible.)
However, I do not know if we can have additional threads at all in a Win32 process without confusing the win32 processes, or if this needs to be solved differently (APC?).
Ciao, Marcus
- The shared data structure should be modified with atomic accesses only. (Or use a criticalsection if possible.)
According the this website, Windows updates the times too frequently to use a critical section:
http://www.dcl.hpi.uni-potsdam.de/research/WRK/2007/08/getting-os-informatio...
The values are written in the order High2, Low, High1. Applications read them in the reverse order; as long as the two High values are the same, the composite value is considered correct. This is a fairly common algorithm for detecting clock tick without atomic operations or critical sections.
I could certainly use atomic operations for each of the 32-bit word writes. But since Windows apps expect the word write order, I don't think it's necessary or advisable to do 64-bit atomic ops.
However, I do not know if we can have additional threads at all in a Win32 process without confusing the win32 processes, or if this needs to be solved differently (APC?).
A previous version of my patch used NtTimer/APC:
http://bugs.winehq.org/attachment.cgi?id=39308
The APC code is commented out but still present in that version of the patch.
It did not work, by which I mean SW:TOR did not successfully login. It was clearly the best solution, so I'd be happy to return to it if we can make the callback times more precise.
cheers,
Joey
However, I do not know if we can have additional threads at all in a Win32 process without confusing the win32 processes, or if this needs to be solved differently (APC?).
A previous version of my patch used NtTimer/APC:
I'm attaching a cleaned up version of my APC code. It does get called regularly:
000f:fixme:thread:update_shared_data_time 129768277137647820 0041:fixme:win:GetWindowPlacement not supported on other process window 0x90076 0036:fixme:d3d:query_init Unhandled query type 0xc. 000f:fixme:thread:update_shared_data_time 129768277137814980
Though the interval is not exactly 15.600 ms, as it is with a separate thread and nanosleep.
And most importantly, it does not allow SW:TOR to make it to the character selection screen.
Could I have initialized the timer with different values to make it more precise?
thanks,
Joey
On 3/21/2012 2:20 PM, Marcus Meissner wrote:
However, I do not know if we can have additional threads at all in a Win32 process without confusing the win32 processes, or if this needs to be solved differently (APC?).
If user_shared_data was written by wineserver and mapped read-only to wine processes there was no need to create separate threads in wine processes. As I know Windows is sharing this structure and is updating it in kernel mode so wine behavior was similar if was updated by wineserver.
Thanks.
Kornel
If user_shared_data was written by wineserver and mapped read-only to wine processes there was no need to create separate threads in wine processes. As I know Windows is sharing this structure and is updating it in kernel mode so wine behavior was similar if was updated by wineserver.
Agreed, this is the ideal solution. I'm just not sure how to do so while maintaining the virtual address mapping required for user_shared_data.
On 3/21/2012 9:11 PM, Joey Yandle wrote:
If user_shared_data was written by wineserver and mapped read-only to wine processes there was no need to create separate threads in wine processes. As I know Windows is sharing this structure and is updating it in kernel mode so wine behavior was similar if was updated by wineserver.
Agreed, this is the ideal solution. I'm just not sure how to do so while maintaining the virtual address mapping required for user_shared_data.
MapViewOfFile (and the underlying NtMapViewOfSection) has support for specifying the address to map it.
Thanks.
Kornel
MapViewOfFile (and the underlying NtMapViewOfSection) has support for specifying the address to map it.
So what I should do is:
1. in ntdll/thread.c:thread_init:
int fd = shm_open("/KUSER_SHARED_DATA", O_RDONLY | O_CREAT, 0600); // call MapViewOfFile to map fd to 0x7ffe0000
2. in server/main.c:
int fd = shm_open("/KUSER_SHARED_DATA", O_RDWR | O_CREAT, 0600); // call MapViewOfFile to map fd to 0x7ffe0000
// start thread to update/nanosleep
Is that roughly correct? How do I translate the fd to a HANDLE for the first arg to MapViewOfFile?
thanks,
Joey
- in server/main.c:
int fd = shm_open("/KUSER_SHARED_DATA", O_RDWR | O_CREAT, 0600); // call MapViewOfFile to map fd to 0x7ffe0000
After looking over the code, I think I should just mmap() directly in wineserver rather than using MapViewOfFile; I should however still use MapViewOfFile in ntdll/thread.c.
Mr Google tells me that I can get a handle for MapViewOfFile by calling CreateFileMapping. The first arg of that is also a HANDLE, which I should be able to get by calling wine_server_fd_to_handle (as is done elsewhere in thread.c).
So I'm going to go ahead and try this now. If anyone has a issue with this approach, please let me know.
cheers,
Joey
So I'm going to go ahead and try this now. If anyone has a issue with this approach, please let me know.
I implemented this approach, only to find that numerous places in ntdll/ write to user_shared_data. I need to move all of that code to server/, which will take a while, since much of it depends on other code (#defines and the like) in the files.
I'm attaching my current diff, which throws an exception as soon as someone tries to read one of the values which is no longer being written from ntdll. The diff is extremely ugly, but shows the approach.
cheers,
Joey
Joey Yandle dragon@dancingdragon.be wrote:
After looking over the code, I think I should just mmap() directly in wineserver rather than using MapViewOfFile; I should however still use MapViewOfFile in ntdll/thread.c.
Mr Google tells me that I can get a handle for MapViewOfFile by calling CreateFileMapping. The first arg of that is also a HANDLE, which I should be able to get by calling wine_server_fd_to_handle (as is done elsewhere in thread.c).
So I'm going to go ahead and try this now. If anyone has a issue with this approach, please let me know.
It's been explained several times already that any approach to share data between wineserver and clients is going to be rejected.
It's been explained several times already that any approach to share data between wineserver and clients is going to be rejected.
Can you point me to such an explanation? I joined wine-devel just before posting my RFC. If this has been discussed previously, I'd prefer to get the context before wasting my and others' time.
So if both a thread (pthread or windows) in the wine process and sharing memory with wineserver are verboten, then it looks like the only option is APC. Can you think of any others?
If not, can you look at the APC based patch I posted and let me know how to make the callbacks happen with more precise timing? Windows sets these values in the kernel via a hardware interrupt, so wine needs to set them without reasonable precision.
thanks,
Joey
On 03/21/2012 10:19 PM, Joey Yandle wrote:
If not, can you look at the APC based patch I posted and let me know how to make the callbacks happen with more precise timing?
Yes, that is the only possible way to do this currently in Wine. Separate thread is a no-go. It will break some applications. And no, there is no way to make it more accurate. See other posts about this topic "CreateTimerQueue".
- Vitaliy.
On 03/21/2012 10:19 PM, Joey Yandle wrote:
It's been explained several times already that any approach to share data between wineserver and clients is going to be rejected.
Can you point me to such an explanation? I joined wine-devel just before posting my RFC. If this has been discussed previously, I'd prefer to get the context before wasting my and others' time.
BTW one more thing, this change will most likely break number of copy protection systems. Such as safedisk. They use user shared date times to estimate time it took for some kernel operations. And some of those time intervals are really tight.
-Vitaliy
On 3/22/2012 2:19 PM, Vitaliy Margolen wrote:
BTW one more thing, this change will most likely break number of copy protection systems. Such as safedisk. They use user shared date times to estimate time it took for some kernel operations. And some of those time intervals are really tight.
I'm not sure that preventig measuring time is a good practice.
Since there are a plenty of ways to measure elapsed time, I don't think that this specific way should generally be prohibited.
Thank you.
Kornel
On 03/24/2012 08:56 AM, Kornél Pál wrote:
On 3/22/2012 2:19 PM, Vitaliy Margolen wrote: Since there are a plenty of ways to measure elapsed time, I don't think that this specific way should generally be prohibited.
I'm not saying it should be prohibited. I'm saying it fixes only one app and potentially breaks 100s more...
Vitaliy
BTW one more thing, this change will most likely break number of copy protection systems. Such as safedisk. They use user shared date times to estimate time it took for some kernel operations. And some of those time intervals are really tight.
I'm extremely confused by this statement. Currently, some the USER_SHARED_DATA values are written at process start, and some are not written at all. How can external code possibly rely on Wine's current broken behavior? Any code written for windows expects these values to update every 15.6 ms.
If anything, I would expect updating these values to fix copy protection schemes, not break them.
On 03/24/2012 01:09 PM, Joey Yandle wrote:
Any code written for windows expects these values to update every 15.6 ms.
Exactly. Wine's wineserver & fake kernel in form of ntoskernl can not work with native speed by definition.
For example even earliest versions of safedisk (1.5) were really picky about how long some kernel calls take. This is to protect against debugger. When numbers don't increment (as happens now) protection is happy. When they start to increment, even on fast PCs round trip user-space->wineserver->ntoskrnl will take way longer then it "should".
Of course this is all based on research I did 5 years ago. But surprisingly enough those old Safedisk versions still work, assuming you have supported gcc version, and few other requirements.
Vitaliy
When numbers don't increment (as happens now) protection is happy. When they start to increment, even on fast PCs round trip user-space->wineserver->ntoskrnl will take way longer then it "should".
Indeed, that pathway will never be fast enough. I'm surprised it works with no incrementing, but perhaps that's for backwards compatibility with previous Windows versions.
So, to summarize:
1. Can't use a separate per-process thread because it will break applications.
2. Can't used shared memory segment between wineserver and wine processes because it violates design principles of wineserver.
3. Can't use TimerQueue/APC because too much jitter in callback times.
Is this a fair appraisal of the position of the Wine devs? If so, I'll fork a TOR-capable branch so others can play. I'll start with my patch for case #1, since it is small and works well for TOR. If I can get #2 working properly, I'll switch to that, since it is more technically correct, though as a larger patch it will be more difficult to maintain.
cheers,
Joey
On Thu, Mar 22, 2012 at 5:19 AM, Joey Yandle dragon@dancingdragon.bewrote:
It's been explained several times already that any approach to share data between wineserver and clients is going to be rejected.
Can you point me to such an explanation? I joined wine-devel just before posting my RFC. If this has been discussed previously, I'd prefer to get the context before wasting my and others' time.
I know they declined an in-process wineserver due to difficulties blaming the right component on a crash: is it the app or wine that's faulty?
I found no other relevant discussion on wine-devel since January 2010, when searching for first 'share' and then 'server'.
Too bad if a fork is needed, couldn't the sharing be configurable?
I know they declined an in-process wineserver due to difficulties blaming the right component on a crash: is it the app or wine that's faulty?
I found no other relevant discussion on wine-devel since January 2010, when searching for first 'share' and then 'server'.
If anyone does know how Alexandre et alia feel about sharing data between wine and wineserver, please let me know. It seems the closest to what Windows does, in the context of wine. And as long as the mapping is read-only from wine, then it seems unlikely to cause too many difficulties.
Too bad if a fork is needed, couldn't the sharing be configurable?
It definitely could, and I would prefer to avoid a fork if at all possible. We could set a wine registry key and only start the thread/memory map if it is present.
If this is an acceptable alternative, then I can submit patches which do this.
cheers,
Joey
Joey Yandle dragon@dancingdragon.be wrote:
If anyone does know how Alexandre et alia feel about sharing data between wine and wineserver, please let me know. It seems the closest to what Windows does, in the context of wine. And as long as the mapping is read-only from wine, then it seems unlikely to cause too many difficulties.
Why do you need to update the data in wineserver and not in the client? Consider each wineserver call just an RPC, and leave it alone, server doesn't run anything in the background, and can't magically update timers behind the client's back.
Why do you need to update the data in wineserver and not in the client?
The problem is that the timer updates need to be extremely precise, or they are worse than useless. So we can either do that in every wine process, or do it once in wineserver and share the memory.
On Windows, the kernel shares this memory with all applications for this exact reason.
Consider each wineserver call just an RPC, and leave it alone, server doesn't run anything in the background, and can't magically update timers behind the client's back.
I certainly agree that wineserver doesn't do this currently, but the Windows kernel does, and there's no reason why wineserver can't.
Joey Yandle dragon@dancingdragon.be wrote:
Why do you need to update the data in wineserver and not in the client?
The problem is that the timer updates need to be extremely precise, or they are worse than useless. So we can either do that in every wine process, or do it once in wineserver and share the memory.
On Windows, the kernel shares this memory with all applications for this exact reason.
It's been continously said that any memory sharing scheme between a client and wineserver won't be accepeted, ever.
Consider each wineserver call just an RPC, and leave it alone, server doesn't run anything in the background, and can't magically update timers behind the client's back.
I certainly agree that wineserver doesn't do this currently, but the Windows kernel does, and there's no reason why wineserver can't.
You need to make wineserver multithreaded then, and that idea has been shot down as well.
It's been continously said that any memory sharing scheme between a client and wineserver won't be accepeted, ever.
You're the second person to make that assertion without pointing out the relevant discussion. Such assertions are not helpful.
You need to make wineserver multithreaded then, and that idea has been shot down as well.
Yes, either wine or wineserver will need a thread to set these values with the required precision.
The current question is whether turning on such functionality via a registry setting is acceptable, since it is clearly not okay for the general case.
On Apr 2, 2012, at 10:01 PM, Joey Yandle wrote:
You need to make wineserver multithreaded then, and that idea has been shot down as well.
Yes, either wine or wineserver will need a thread to set these values with the required precision.
Can you use a separate process -- a Win32 service -- to update this shared memory?
Regards, Ken
Can you use a separate process -- a Win32 service -- to update this shared memory?
Something like plugplay.exe? I don't see why not. I'll look over the plugplay.exe code and see if it looks promising.
cheers,
Joey
Dmitry Timoshkov dmitry@baikal.ru writes:
Joey Yandle dragon@dancingdragon.be wrote:
Why do you need to update the data in wineserver and not in the client?
The problem is that the timer updates need to be extremely precise, or they are worse than useless. So we can either do that in every wine process, or do it once in wineserver and share the memory.
On Windows, the kernel shares this memory with all applications for this exact reason.
It's been continously said that any memory sharing scheme between a client and wineserver won't be accepeted, ever.
Writable shared memory is unacceptable, but as long as it's read-only from the client there's no problem in principle.
Writable shared memory is unacceptable, but as long as it's read-only from the client there's no problem in principle.
Excellent, thanks Alexandre.
I'm attaching my current shared memory diff. It uses shm_open + mmap in wineserver to map the shared memory area read/write, and shm_open + NtCreateSection + NtMapViewOfSection in wine to map read only. A thread is run in wineserver which does a clock_nanosleep to set the times without drift.
The wineserver code appears to work, and successfully writes to shared memory. The wine code fails in NtMapViewOfSection with error C0000018, which is STATUS_CONFLICTING_ADDRESSES. Can someone who understands NtMapViewOfSection look at my code and tell me if I've done something stupid? I thought there might be code buried in NtMapViewOfSection which tries to reserve that code block, but I didn't see it when I looked.
Some notes:
I didn't use CreateFileMapping/MapViewOfFile because that lives in kernel32.dll, so I pulled the code in and called NtCreateSection/NtMapViewOfSection directly.
I also had to pull out all of the ntdll code which writes to USER_SHARED_DATA, and move that to wineserver.
I had to link ntdll and wineserver against -lrt; this should of course be done more clealy via configure.
Finally, the mmap call in wineserver only succeeds if I pass in a candidate address; if I pass in NULL, it fails with 'Operation not permitted'.
cheers,
Joey
On 3/22/2012 6:00 AM, Dmitry Timoshkov wrote:
Joey Yandledragon@dancingdragon.be wrote: It's been explained several times already that any approach to share data between wineserver and clients is going to be rejected.
I think this is the reason that timers in shared_user_data are not being updated. I also think that updating timers in every client process in every 15.6 ms is a lot of overhead.
Another approach could be no to allocate that address and provide up to date data in an signal/exception handler. This is a more complex approach and not sure whether possible in user mode. And of course would defeat the purpose using shared memory for high performance.
Thanks.
Kornel
On 3/22/2012 7:43 AM, Kornél Pál wrote:
On 3/22/2012 6:00 AM, Dmitry Timoshkov wrote:
Joey Yandledragon@dancingdragon.be wrote: It's been explained several times already that any approach to share data between wineserver and clients is going to be rejected.
I think this is the reason that timers in shared_user_data are not being updated. I also think that updating timers in every client process in every 15.6 ms is a lot of overhead.
Another approach could be no to allocate that address and provide up to date data in an signal/exception handler. This is a more complex approach and not sure whether possible in user mode. And of course would defeat the purpose using shared memory for high performance.
A bit better approach was to mark that page PAGE_GUARD. Then wine could get an indication that it needs to be updated. Frequent accesses were not impacted because PAGE_GUARD could be reset by an APC some time later.
wine in this case should not use this structure for performance reasons.
I'm not sure however that this would solve Joey's problem since he mentioned that APCs were not accurate enough.
Thanks.
Kornel
On Thu, 22 Mar 2012, Kornél Pál wrote: [...]
A bit better approach was to mark that page PAGE_GUARD. Then wine could get an indication that it needs to be updated. Frequent accesses were not impacted because PAGE_GUARD could be reset by an APC some time later.
Why reset PAGE_GUARD? Would the following work ? * By default wineserver (or a separate process) does not update the shared page. This way there's no overhead. * The page is marked as PAGE_GUARD. When a process accesses it, the user code makes an RPC to the wineserver (or the separate process) to tell it to start updating the page and removes the PAGE_GARD flag. * The application can now access the page and gets smoothly updated timestamps. * The wineserver (or the separate process) keeps track of which processes need its services. When the last of them exits, it stops updating the shared page so there's no overhead anymore.
On 4/9/2012 7:46 PM, Francois Gouget wrote:
On Thu, 22 Mar 2012, Kornél Pál wrote: [...]
A bit better approach was to mark that page PAGE_GUARD. Then wine could get an indication that it needs to be updated. Frequent accesses were not impacted because PAGE_GUARD could be reset by an APC some time later.
Why reset PAGE_GUARD?
What I've descibed is using no shared memory (no wineserver involved) and the page is updated only on demand rather than periodically (no timer needed).
PAGE_GUARD has to be reset because the flag is cleared when it is accessed. (It is better suited for the use case you have described below.) See http://msdn.microsoft.com/en-us/library/aa366549.aspx
Would the following work ?
- By default wineserver (or a separate process) does not update the shared page. This way there's no overhead.
- The page is marked as PAGE_GUARD. When a process accesses it, the user code makes an RPC to the wineserver (or the separate process) to tell it to start updating the page and removes the PAGE_GARD flag.
- The application can now access the page and gets smoothly updated timestamps.
- The wineserver (or the separate process) keeps track of which processes need its services. When the last of them exits, it stops updating the shared page so there's no overhead anymore.
I think this would work. (But don't know whether it was accepted.)
You also should map the shared memory to that predefined address in addition to updating it from wineserver.
Also note that wine is currently reading that page internally (the same page contains system32 path for example). To make this efficient wine should not use that page at all.
Kornel
On 04/09/2012 12:14 PM, Kornél Pál wrote:
I think this would work. (But don't know whether it was accepted.)
No, it won't work. Wine itself (ntdll & kernel32) accesses it.
Vitaliy.
On 4/10/2012 4:48 AM, Vitaliy Margolen wrote:
On 04/09/2012 12:14 PM, Kornél Pál wrote:
I think this would work. (But don't know whether it was accepted.)
No, it won't work. Wine itself (ntdll & kernel32) accesses it.
In my opinion it still would work, just would not be optimal.
Note that I also have poined out the following:
Also note that wine is currently reading that page internally (the same page contains system32 path for example). To make this efficient wine should not use that page at all.
Kornel