Over the last two weeks I've been working on an experimental gcc optimization for Wine that out-of-lines the expensive prologues & epilogues required for an MS x64 ABI function to call a 64-bit SysV function. The theory is that the added expense of each function executing a few extra instructions to facilitate this will be offset by the reduction in instruction cache misses. As far as I can tell, gcc's ix86 back end has never supported any type of out-of-line pro/epilogue stubs, so adding this support has been somewhat tedious, as it tends to get broken by subsequent optimization passes. It currently works in many cases but produces bad code in others, so this is a work in progress.
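To make the cost concrete: the Microsoft x64 convention treats RSI, RDI and XMM6-XMM15 as callee-saved, whereas the System V convention lets a callee clobber them, so an ms_abi function that calls a sysv function has to preserve them around the call. Below is a minimal sketch of such a pair, assuming gcc or clang on x86-64. The names ms_entry() and sysv_helper() are made up for illustration and are not taken from Wine.

```c
/* ms_abi/sysv_abi only exist on x86-64; elsewhere the macros expand to
   nothing so the file still builds (without showing the effect). */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
# define MS_ABI   __attribute__((ms_abi))
# define SYSV_ABI __attribute__((sysv_abi))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* Hypothetical helper using the System V convention. noinline keeps the
   call, and the register saves around it, visible in the assembly. */
SYSV_ABI __attribute__((noinline)) int sysv_helper(int x)
{
    return x * 2 + 1;
}

/* Hypothetical Windows-style entry point. sysv_helper() is allowed to
   clobber RSI, RDI and XMM6-XMM15, which the MS convention requires this
   function to preserve, so the compiler must save and restore them here. */
MS_ABI int ms_entry(int x)
{
    return sysv_helper(x) + 1;
}
```

Compiling this with `gcc -O2 -S` and reading the code emitted for ms_entry() is one way to see the prologue and epilogue in question.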
Nonetheless, I thought it was interesting enough to post the current results in terms of .text size reduction. On gcc 5.4.0 (with the 8-byte aligned stack assumption flaw corrected), the total .text size of all DLLs in my build was 45.2MiB, while the out-of-lined build was 35.9MiB, a reduction of 20.5%, so it looks promising so far. I presume it will be a little while before all of the flaws are worked out so that it always produces good code, but once there I can start to produce some real performance numbers.
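As a quick check of the arithmetic, the reduction is (before - after) / before. The helper below is purely illustrative (the name is made up). With the rounded MiB figures above it gives roughly 20.6%, so the 20.5% quoted was presumably computed from the exact byte counts.

```c
/* Percentage by which `after` is smaller than `before`.
   Hypothetical helper, for illustration only. */
double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

For the numbers in this post the call would be percent_reduction(45.2, 35.9).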
It's probably also worth mentioning that the ideal long-term solution is to find ways to eliminate the register saves/restores through various means: static analysis, better evaluation of branch costs, and other optimizations in Wine itself. For example, there are many WINAPI functions whose only SysV call is to wine_dbg_log() via a TRACE(). Just marking this function as __attribute__((__cold__)) should greatly reduce the number of unneeded register saves/restores in the function's main path. (I haven't actually tested it yet.)
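A minimal sketch of the idea, assuming gcc or clang. Here trace_log(), trace_enabled and do_work() are hypothetical stand-ins, not Wine's actual wine_dbg_log() or TRACE() definitions.

```c
#include <stdarg.h>
#include <stdio.h>

static int trace_enabled = 0;   /* hypothetical debug-channel switch */

/* cold: tells the compiler this function is unlikely to be executed, so
   paths leading to a call are treated as unlikely. noinline keeps it
   out of line; format() enables printf-style argument checking. */
__attribute__((cold, noinline, format(printf, 1, 2)))
int trace_log(const char *fmt, ...)
{
    va_list args;
    int ret;

    va_start(args, fmt);
    ret = vfprintf(stderr, fmt, args);
    va_end(args);
    return ret;
}

/* Hypothetical API function whose only out-of-line call is the trace. */
int do_work(int value)
{
    if (trace_enabled)
        trace_log("do_work(%d)\n", value);
    return value + 1;
}
```

Whether the attribute changes the generated code depends on the compiler version and options. Comparing `gcc -O2 -S` output with and without it is one way to check.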
Daniel
On 29.08.2016 at 05:18, Daniel Santos wrote:
> For example, there are many WINAPI functions whose only SysV call is to wine_dbg_log() via a TRACE(). Just marking this function as __attribute__((__cold__)) should reduce the number of unneeded register saves/restores in the function's main path by a great amount. (I haven't actually tested it yet.)
For others searching for the meaning of this attribute:
cold
The cold attribute on functions is used to inform the compiler that the function is unlikely to be executed. The function is optimized for size rather than speed and on many targets it is placed into a special subsection of the text section so all cold functions appear close together, improving code locality of non-cold parts of program. The paths leading to calls of cold functions within code are marked as unlikely by the branch prediction mechanism. It is thus useful to mark functions used to handle unlikely conditions, such as perror, as cold to improve optimization of hot functions that do call marked functions in rare occasions.
When profile feedback is available, via -fprofile-use, cold functions are automatically detected and this attribute is ignored.
It would be interesting to see some numbers on this, and what it actually does with the assembler output. I'm leaving for a few days in a couple of hours, otherwise I'd test it.
On 08/29/2016 05:09 AM, André Hentschel wrote:
> It would be interesting to see some numbers on this, and what it actually does with the assembler output. I'm leaving for a few days in a couple of hours, otherwise I'd test it.
Tomorrow I'm leaving for a few weeks, so my project will have to wait. However, I did a very small test of marking wine_dbg_log() __cold__ on gcc 4.9.3, to no effect, as it had already optimized it correctly. I suppose that I could be mistaken about this, having stared at so many RTL dumps & assembly while working on a modified gcc, but I think it's still worth examining.
On 30.08.2016 at 00:12, Daniel Santos wrote:
> Tomorrow I'm leaving for a few weeks, so my project will have to wait. However, I did a very small test of marking wine_dbg_log() __cold__ on gcc 4.9.3, to no effect, as it had already optimized it correctly. I suppose that I could be mistaken about this, having stared at so many RTL dumps & assembly while working on a modified gcc, but I think it's still worth examining.
Quick tests here also show that gcc (5.4 here) is already doing it by itself. There's no real saving in .text size of course, we just have performant stubs ;)