On 19.10.2016 19:05, Jacek Caban wrote:
v2: Don't store async queues in a list.
Signed-off-by: Jacek Caban jacek@codeweavers.com
 server/async.c  | 34 +++++++++++++++++++++++++++++++++
 server/device.c | 58 +++++++++++++++++++++++++--------------------------------
 server/file.h   | 14 ++++++++++++++
 3 files changed, 73 insertions(+), 33 deletions(-)
I'm not sure if Alexandre has already agreed to this approach, but from my point of view it would make sense to do some measurements. How fast is the proposed named pipe approach compared to alternatives (unpatched Wine, or the Linux-only implementation included in Wine Staging)? In general I like the idea, but I fear that adding significant optimizations afterwards will be very difficult.
Regards, Sebastian
Sebastian Lackner sebastian@fds-team.de writes:
I'm not sure if Alexandre has already agreed to this approach, but from my point of view it would make sense to do some measurements. How fast is the proposed named pipe approach compared to alternatives (unpatched Wine, or the Linux-only implementation included in Wine Staging)? In general I like the idea, but I fear that adding significant optimizations afterwards will be very difficult.
Performance is of course an interesting question, but correctness comes first. Unpatched Wine does not provide correct semantics, so it doesn't really matter how fast it would be.
A Linux-only hack could potentially be used as an optimization, but we need a working portable implementation first.
On 20.10.2016 19:17, Sebastian Lackner wrote:
On 19.10.2016 19:05, Jacek Caban wrote:
v2: Don't store async queues in a list.
Signed-off-by: Jacek Caban jacek@codeweavers.com
 server/async.c  | 34 +++++++++++++++++++++++++++++++++
 server/device.c | 58 +++++++++++++++++++++++++--------------------------------
 server/file.h   | 14 ++++++++++++++
 3 files changed, 73 insertions(+), 33 deletions(-)
I'm not sure if Alexandre has already agreed to this approach, but from my point of view it would make sense to do some measurements. How fast is the proposed named pipe approach compared to alternatives (unpatched Wine, or the Linux-only implementation included in Wine Staging)?
As Alexandre said, I think we should concentrate on correctness first, but I did some benchmarking that should give us an overview. Proper measuring depends on many things and different implementations have various weaknesses. Here are a few things to keep in mind:
- My patchset leaves room for improvements as explained in [1]. I have prototyped patches for those optimizations and tested both with and without them.
- Calls with immediate results have very different characteristics than ones that need to wait. If we need to wait, then both overlapped calls and a blocking select require server round trips, so the extra data that needs to be transferred with my patchset (as long as it's not too big) shouldn't cause a significant difference. Since my tests use a large buffer, most calls should have immediate results, so I expect them to test the worst-case scenario.
- The current Wine implementation uses blocking socket calls for non-overlapped pipes. This significantly affects results, so I tested in both overlapped and non-overlapped mode (although the calls themselves are not overlapped).
- Small buffer sizes would be interesting to test, because they cause more blocking and thus would show the differences in that case. Sadly, the staging implementation doesn't take that into account, so the results would be meaningless.
I used the attached test with parameters:
npbench.exe 1024 10000 100000 (for non-overlapped)
npbench.exe 1024 10000 100000 1 (for overlapped)
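For illustration only (this is not the attached npbench.c; the pipe name, argument handling and timing below are simplified placeholders), the core of such a test is just a synchronous ping-pong over the pipe, roughly:

/* Simplified illustration, not the attached npbench.c: client side of a named
 * pipe ping-pong, where each iteration is one WriteFile + one ReadFile round
 * trip.  The overlapped variant would open the handle with FILE_FLAG_OVERLAPPED
 * and wait on an OVERLAPPED event after each call instead. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    DWORD buf_size = argc > 1 ? atoi(argv[1]) : 1024;
    DWORD rounds   = argc > 2 ? atoi(argv[2]) : 10000;
    char *buf = calloc(buf_size, 1);
    DWORD written, got, i, start;
    HANDLE pipe;

    pipe = CreateFileA("\\\\.\\pipe\\npbench", GENERIC_READ | GENERIC_WRITE,
                       0, NULL, OPEN_EXISTING, 0, NULL);
    if (pipe == INVALID_HANDLE_VALUE) return 1;

    start = GetTickCount();
    for (i = 0; i < rounds; i++)
    {
        if (!WriteFile(pipe, buf, buf_size, &written, NULL)) break;
        if (!ReadFile(pipe, buf, buf_size, &got, NULL)) break;
    }
    printf("%u round trips in %u ms\n", (unsigned)i, (unsigned)(GetTickCount() - start));

    CloseHandle(pipe);
    free(buf);
    return 0;
}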
Results are also attached. Proper benchmarking would require a lot more care, but I think it shows what you asked for. This implementation is 3.3 times slower in the overlapped case and 7.4 times slower in the non-overlapped tests (3.1 and 2.5 times compared to Windows in a VM). I'd say that's not too bad. The overlapped case is even faster than I expected.
BTW, did you expect the staging patches to be so much faster in the overlapped case than plain Wine? I didn't look too deeply into the code; maybe there is some optimization, but that's a bit suspicious.
In general I like the idea, but I fear that adding significant optimizations afterwards will be very difficult.
FWIW, it should be fairly easy to rebase the staging patches on top of my series. It leaves the whole previous implementation mostly untouched for the byte mode case. That, however, may not be true after further changes in the future, esp. if we decide to use the server I/O implementation for the byte mode case as well.
Thanks, Jacek
On 24.10.2016 15:46, Jacek Caban wrote:
- My patchset leaves room for improvements as explained in [1].
[1] https://www.winehq.org/pipermail/wine-patches/2016-October/154803.html
On 24.10.2016 15:46, Jacek Caban wrote:
On 20.10.2016 19:17, Sebastian Lackner wrote:
On 19.10.2016 19:05, Jacek Caban wrote:
v2: Don't store async queues in a list.
Signed-off-by: Jacek Caban jacek@codeweavers.com
 server/async.c  | 34 +++++++++++++++++++++++++++++++++
 server/device.c | 58 +++++++++++++++++++++++++--------------------------------
 server/file.h   | 14 ++++++++++++++
 3 files changed, 73 insertions(+), 33 deletions(-)
I'm not sure if Alexandre has already agreed to this approach, but from my point of view it would make sense to do some measurements. How fast is the proposed named pipe approach compared to alternatives (unpatched Wine, or the Linux-only implementation included in Wine Staging)?
As Alexandre said, I think we should concentrate on correctness first, but I did some benchmarking that should give us an overview.
That is correct, but of course there isn't just one way to implement named pipes. It might also be possible to implement it in a portable way without going through wineserver or using Linux-specific functions. It would require more complexity on the ntdll side though.
In fact, having object handling in ntdll (at least as a fast path for process-local objects) is something people have been thinking about for a long time, and it probably could also be used to solve the message mode problem. Nevertheless, I have to admit that reusing the driver functions is also very elegant. ;)
Proper measuring depends on many things and different implementations have various weaknesses. Here are a few things to keep in mind:
- My patchset leaves room for improvements as explained in [1]. I have prototyped patches for those optimizations and tested both with and without them.
- Calls with immediate results have very different characteristics than ones that need to wait. If we need to wait, then both overlapped calls and a blocking select require server round trips, so the extra data that needs to be transferred with my patchset (as long as it's not too big) shouldn't cause a significant difference. Since my tests use a large buffer, most calls should have immediate results, so I expect them to test the worst-case scenario.
- The current Wine implementation uses blocking socket calls for non-overlapped pipes. This significantly affects results, so I tested in both overlapped and non-overlapped mode (although the calls themselves are not overlapped).
- Small buffer sizes would be interesting to test, because they cause more blocking and thus would show the differences in that case. Sadly, the staging implementation doesn't take that into account, so the results would be meaningless.
I used the attached test with parameters:
npbench.exe 1024 10000 100000 (for non-overlapped)
npbench.exe 1024 10000 100000 1 (for overlapped)
Results are also attached. Proper benchmarking would require a lot more care, but I think it shows what you asked for. This implementation is 3.3 times slower in the overlapped case and 7.4 times slower in the non-overlapped tests (3.1 and 2.5 times compared to Windows in a VM). I'd say that's not too bad. The overlapped case is even faster than I expected.
BTW, did you expect the staging patches to be so much faster in the overlapped case than plain Wine? I didn't look too deeply into the code; maybe there is some optimization, but that's a bit suspicious.
Thanks, these are exactly the stats I was looking for. I'm not aware of any special optimizations, so I'm not sure how Staging can be faster. If you used a prebuilt Staging version with compiler optimizations enabled, that could also be the reason. Nevertheless, the values for "plain wine+msgmode patchset+optimizations" actually look pretty good when taking the wineserver round trip into account.
In general I like the idea, but I fear that adding significant optimizations afterwards will be very difficult.
FWIW, it should be fairly easy to rebase the staging patches on top of my series. It leaves the whole previous implementation mostly untouched for the byte mode case. That, however, may not be true after further changes in the future, esp. if we decide to use the server I/O implementation for the byte mode case as well.
Well, I would probably keep it for some more time in Staging, but I don't think it would make sense to merge it into the development branch. Issues are easier to debug when it behaves the same on all platforms.
BTW, I assume you are aware of it already, but your current implementation lacks handling of PIPE_NOWAIT, which is also implemented in the Staging patchset.
Best regards, Sebastian
On 25.10.2016 06:03, Sebastian Lackner wrote:
On 24.10.2016 15:46, Jacek Caban wrote:
BTW, did you expect the staging patches to be so much faster in the overlapped case than plain Wine? I didn't look too deeply into the code; maybe there is some optimization, but that's a bit suspicious.
Thanks, these are exactly the stats I was looking for. I'm not aware of any special optimizations, so I'm not sure how Staging can be faster. If you used a prebuilt Staging version with compiler optimizations enabled, that could also be the reason.
I should be more precise, sorry. In my tests I used plain Wine with the staging named pipe patches applied (some trivial conflicts needed to be resolved AFAIR). I can try it the other way if you think it would be valuable.
BTW, I assume you are aware of it already, but your current implementation lacks handling of PIPE_NOWAIT, which is also implemented in the Staging patchset.
Yes, this patchset was intended to be as small as makes sense, and PIPE_NOWAIT is not implemented in current Wine anyway. PIPE_NOWAIT should be easy to implement on top of my patches, preferably after the patches for immediate read/write returns are in (that's one of the optimizations I mentioned). I expect it to be very simple then.
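For reference, the missing behaviour is just the standard Win32 non-blocking mode a client can request with SetNamedPipeHandleState; a minimal sketch of those semantics (plain Win32 usage, not code from my patches):

/* Sketch of the PIPE_NOWAIT semantics to be added (standard Win32 usage, not
 * code from the patchset): once the handle is in non-blocking mode, ReadFile
 * on an empty pipe fails immediately with ERROR_NO_DATA instead of waiting. */
#include <windows.h>

BOOL try_read(HANDLE pipe, void *buf, DWORD size, DWORD *got)
{
    DWORD mode = PIPE_READMODE_BYTE | PIPE_NOWAIT;

    /* switch the client end of the pipe to non-blocking mode */
    if (!SetNamedPipeHandleState(pipe, &mode, NULL, NULL))
        return FALSE;

    if (ReadFile(pipe, buf, size, got, NULL))
        return TRUE;                              /* data was available */

    if (GetLastError() == ERROR_NO_DATA)
        *got = 0;                                 /* empty pipe, returned without blocking */
    return FALSE;
}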
Thanks for the feedback.
Best regards, Jacek
2016-10-25 13:26 GMT+02:00 Jacek Caban jacek@codeweavers.com:
On 25.10.2016 06:03, Sebastian Lackner wrote:
On 24.10.2016 15:46, Jacek Caban wrote:
BTW, did you expect the staging patches to be so much faster in the overlapped case than plain Wine? I didn't look too deeply into the code; maybe there is some optimization, but that's a bit suspicious.
Thanks, these are exactly the stats I was looking for. I'm not aware of any special optimizations, so I'm not sure how Staging can be faster. If you used a prebuilt Staging version with compiler optimizations enabled, that could also be the reason.
I should be more precise, sorry. In my tests I used plain Wine with the staging named pipe patches applied (some trivial conflicts needed to be resolved AFAIR). I can try it the other way if you think it would be valuable.
Hm, I am not sure what causes it then. I will probably do some more investigations, but it shouldn't really block your patchset. It could also be a side effect of using a different socket type (SOCK_SEQPACKET vs SOCK_STREAM) for example.
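For illustration, the difference I mean is just that SOCK_SEQPACKET preserves message boundaries while SOCK_STREAM delivers a plain byte stream (plain POSIX below, unrelated to any Wine code):

/* Minimal illustration of the socket type difference (plain POSIX, not Wine
 * code): SOCK_SEQPACKET preserves message boundaries, SOCK_STREAM does not. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds)) return 1;
    write(fds[0], "first", 5);
    write(fds[0], "second", 6);

    /* each read returns exactly one record: 5 bytes, then 6 bytes */
    printf("%zd\n", read(fds[1], buf, sizeof(buf)));
    printf("%zd\n", read(fds[1], buf, sizeof(buf)));

    /* with SOCK_STREAM the first read could have returned all 11 bytes at once */
    close(fds[0]);
    close(fds[1]);
    return 0;
}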
BTW, I assume you are aware of it already, but your current implementation lacks handling of PIPE_NOWAIT, which is also implemented in the Staging patchset.
Yes, this patchset was intended to be as small as makes sense, and PIPE_NOWAIT is not implemented in current Wine anyway. PIPE_NOWAIT should be easy to implement on top of my patches, preferably after the patches for immediate read/write returns are in (that's one of the optimizations I mentioned). I expect it to be very simple then.
Thanks for the feedback.
Best regards, Jacek