Over the last two weeks I've been working on an experimental GCC optimization for Wine that out-of-lines the expensive prologues and epilogues required for an MS x64 ABI function to call a 64-bit SysV function. The theory is that the added expense of each function executing a few extra instructions to facilitate this will be offset by the reduction in instruction cache misses. As far as I can tell, GCC's ix86 back end has never supported any type of out-of-line pro/epilogue stubs, so adding this support has been somewhat tedious, as it tends to get broken by subsequent optimization passes. It currently works in many cases but produces bad code in others, so this is a work in progress.
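For anyone not familiar with where the cost comes from, here is a minimal sketch of the ABI boundary in question. The MS x64 ABI treats RSI, RDI and XMM6-XMM15 as callee-saved, while the SysV ABI lets a callee clobber them, so an ms_abi function that calls a SysV function has to save and restore all of them. The names below (DEMO_WINAPI, demo_sysv_strlen, demo_winapi_length) are made up for illustration and are not taken from Wine.

```c
#include <stddef.h>

/* Hypothetical macro for this sketch: ms_abi only means something
 * on x86-64 with GCC-compatible compilers. */
#if defined(__x86_64__) && defined(__GNUC__)
# define DEMO_WINAPI __attribute__((ms_abi))
#else
# define DEMO_WINAPI
#endif

/* An ordinary SysV function. Under the SysV ABI it is free to clobber
 * RSI, RDI and XMM6-XMM15. noinline keeps the call boundary visible. */
__attribute__((noinline))
size_t demo_sysv_strlen(const char *s)
{
    size_t n = 0;
    while (s[n])
        n++;
    return n;
}

/* An MS ABI function. Its caller expects RSI, RDI and XMM6-XMM15 to be
 * preserved, so the compiler must save them in the prologue and restore
 * them in the epilogue because of the SysV call below. */
DEMO_WINAPI size_t demo_winapi_length(const char *s)
{
    return demo_sysv_strlen(s);
}
```

Compiling this with gcc -O2 -S on x86-64 and looking at demo_winapi_length should show the register saves and restores that the out-of-line stubs are meant to replace.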
Nonetheless, I thought it was interesting enough to post the current results in terms of .text size reduction. On GCC 5.4.0 (with the 8-byte aligned stack assumption flaw corrected), the total .text size of all DLLs in my build was 45.2MiB, while the out-of-lined build was 35.9MiB, a reduction of 20.5%, so it looks promising so far. I presume it will be a little while before all of the flaws are worked out so that it always produces good code, but once that's done I can start to produce some real performance numbers.
It's probably also worth mentioning that the ideal long-term solution is to find ways to eliminate the register saves/restores through various means: static analysis, better evaluation of branch costs, and other optimizations in Wine itself. For example, there are many WINAPI functions whose only SysV call is to wine_dbg_log() via a TRACE(). Just marking wine_dbg_log() as __attribute__((__cold__)) should considerably reduce the number of unneeded register saves/restores in the main path of such functions. (I haven't actually tested it yet.)
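A rough sketch of what I mean by the cold attribute, using stand-in names (demo_dbg_log, DEMO_TRACE, demo_add) rather than Wine's real declarations. GCC documents that calls to a cold function are treated as unlikely by branch prediction. Whether that is enough to move the saves/restores off the hot path is exactly the part I have not tested.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro for this sketch, as ms_abi is x86-64 only. */
#if defined(__x86_64__) && defined(__GNUC__)
# define DEMO_WINAPI __attribute__((ms_abi))
#else
# define DEMO_WINAPI
#endif

static int demo_trace_enabled = 0;  /* stand-in for a debug channel flag */
static int demo_log_calls = 0;      /* counts how often the logger ran */

/* Stand-in for wine_dbg_log(): a SysV varargs function marked cold, so
 * the compiler treats any branch leading to a call of it as unlikely. */
__attribute__((__cold__, __format__(__printf__, 1, 2)))
int demo_dbg_log(const char *fmt, ...)
{
    va_list ap;
    int ret;

    demo_log_calls++;
    va_start(ap, fmt);
    ret = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return ret;
}

/* Stand-in for TRACE(): only calls the logger when tracing is on. */
#define DEMO_TRACE(...) \
    do { if (demo_trace_enabled) demo_dbg_log(__VA_ARGS__); } while (0)

/* An MS ABI function whose only SysV call is the trace. */
DEMO_WINAPI int demo_add(int a, int b)
{
    DEMO_TRACE("demo_add(%d, %d)\n", a, b);
    return a + b;
}
```

Comparing the assembly for demo_add with and without the __cold__ attribute would be the quickest way to see whether the saves/restores actually get confined to the trace branch.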
Daniel