On 30.08.2016 at 00:12, Daniel Santos wrote:
On 08/29/2016 05:09 AM, André Hentschel wrote:
On 29.08.2016 at 05:18, Daniel Santos wrote:
It's probably also worth mentioning that the ideal long-term solution is to find ways to eliminate the register saves/restores through various means: static analysis, better evaluation of branch costs, and other optimizations in Wine itself. For example, there are many WINAPI functions whose only sysv call is to wine_dbg_log() via a TRACE(). Just marking wine_dbg_log() as __attribute__((__cold__)) should greatly reduce the number of unneeded register saves/restores in the main path of those functions. (I haven't actually tested it yet.)
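A minimal sketch of the idea, assuming x86-64 with GCC or Clang. Under the Microsoft x64 convention, RSI, RDI and XMM6 to XMM15 are callee-saved, but under System V they are not. An ms_abi (WINAPI) function that calls a sysv function therefore has to preserve those registers around the call. Marking the sysv callee cold tells GCC that the path is unlikely. All names here (`sketch_dbg_log`, `SKETCH_TRACE`, `sketch_add`, the `SKETCH_*` macros) are hypothetical stand-ins, not Wine's real wine_dbg_log()/TRACE() definitions. Whether the saves actually move off the hot path has to be checked in the generated assembly, for example with `gcc -O2 -S`.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for Wine's WINAPI: ms_abi only exists on x86-64
 * GCC/Clang, so the macro is empty everywhere else. */
#if defined(__x86_64__) && defined(__GNUC__)
#define SKETCH_WINAPI __attribute__((ms_abi))
#else
#define SKETCH_WINAPI
#endif

/* The cold attribute marks calls to the function as unlikely and has it
 * optimized for size rather than speed. */
#if defined(__GNUC__)
#define SKETCH_COLD __attribute__((cold))
#else
#define SKETCH_COLD
#endif

static int trace_enabled = 0; /* hypothetical runtime trace switch */
static int trace_calls = 0;   /* counts logger invocations */

/* Hypothetical stand-in for wine_dbg_log(): a sysv-ABI variadic logger. */
SKETCH_COLD int sketch_dbg_log(const char *fmt, ...)
{
    va_list args;
    int ret;

    trace_calls++;
    va_start(args, fmt);
    ret = vfprintf(stderr, fmt, args);
    va_end(args);
    return ret;
}

/* Hypothetical stand-in for TRACE(): the logger is only reached when
 * tracing is enabled, which is the rare case. */
#define SKETCH_TRACE(...) \
    do { if (trace_enabled) sketch_dbg_log(__VA_ARGS__); } while (0)

/* Hypothetical WINAPI-style function whose only sysv call is the trace. */
SKETCH_WINAPI int sketch_add(int a, int b)
{
    SKETCH_TRACE("sketch_add(%d, %d)\n", a, b);
    return a + b;
}
```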
For others searching for the meaning of this attribute:
cold: The cold attribute on functions is used to inform the compiler that the function is unlikely to be executed. The function is optimized for size rather than speed and on many targets it is placed into a special subsection of the text section so all cold functions appear close together, improving code locality of non-cold parts of program. The paths leading to calls of cold functions within code are marked as unlikely by the branch prediction mechanism. It is thus useful to mark functions used to handle unlikely conditions, such as perror, as cold to improve optimization of hot functions that do call marked functions in rare occasions.
When profile feedback is available, via -fprofile-use, cold functions are automatically detected and this attribute is ignored.
It would be interesting to see some numbers on this, and what it actually does to the assembler output. I'm leaving for a few days in a couple of hours; otherwise I'd test it.
Tomorrow I'm leaving for a few weeks, so my project has to wait. However, I did a very small test of marking wine_dbg_log() __cold__ with GCC 4.9.3. It had no effect, because GCC had already optimized it correctly. I suppose I could be mistaken about this, having stared at so many RTL dumps and so much assembly while working on a modified gcc, but I think it's still worth examining.
Quick tests here also show that gcc (5.4 here) is already doing it by itself. There's no real saving in .text size, of course; we just have performant stubs ;)