Actually, we shouldn't be generating assembly code on the fly. If you have more than, say, 16 proxies in a process, then it is actually cheaper in terms of memory usage and cache locality to have a set of compiled entry points that can be shared by all proxies. It is even better if you consider that we shouldn't be allocating the memory for the code from the heap, but should be requesting an executable page of memory for each.
i totally agree with you on this one ;-), but the feasibility of this depends on how dynamic these proxies have to be, i.e., whether enough information is known at compile/link time to create them in a read-only section. for example, gcc's nested function trampolines must by their nature be generated (instantiated) at runtime, as they need to be both addressable (callable through a normal function pointer) and carry per-instance information. if these proxies have similar constraints, then it's just not possible.
[the following is off-topic, i think; if you want to discuss it further, feel free to email me privately, i'll be happy to explain everything]
Just for the record, PaX and execshield are trying to solve problems that are much better solved by other methods that don't break backwards compatibility.
and i completely disagree with you on this one ;-). what PaX and similar approaches address is not the manifestation of bugs; rather, they put a stop to exploit techniques. the set of bugs is orthogonal to the set of exploit techniques: one kind of bug (say, a strcpy() based stack overflow) can be exploited by different exploit techniques (say, shellcode injection or a forced return into system()), and a given exploit technique (say, shellcode injection) can be used to exploit different kinds of bugs (say, stack and heap overflows, or user-supplied format string bugs).
PaX & co. prevent certain exploit techniques from working, while canary-based systems prevent certain bugs from becoming exploitable (i.e., these methods attack the problem from two different dimensions and have different, not necessarily comparable, coverage of the problem space).
One of the best methods is introducing a terminator canary value between the return address and variables stored on the stack.
it's not only not the best method, but also a rather limited one in scope. for one, canary solutions can only detect simple linear overflow bugs (and in your case, only on the stack, not on the heap or elsewhere); they won't detect non-linear overflows, format string bugs, stale pointer uses, or pretty much any other kind of memory corruption bug.
and protecting the return address alone is useless if you're not also protecting other pointers (or even normal data) in the stack frames; there's 'literature' on how to exploit overwritten frame pointers, function arguments, etc. (search Phrack magazine for details).
Obviously, this requires compiler support (which GCC currently lacks, I believe)
mainline gcc does lack it, but both StackGuard and SSP are available as patches. these days quite a few systems (OpenBSD, Adamantix, Gentoo, etc.) use SSP in their base system. there's also talk about incorporating SSP into mainline gcc, but it'll be a longer process.
but it has worked wonders for Microsoft in SP2.
i don't know where you got that from, but it hasn't done much good at all. there are ways to exploit even the plain stack overflows that /GS was supposed to prevent (i think David Litchfield posted one on bugtraq or his site), not to mention other kinds of memory corruption bugs, which /GS does nothing against. hint: the MS /GS canary doesn't protect the function pointer in the SEH registration record (and the 16-bit randomness in the canary itself is not impossible to brute force either).
It even prevents exploits that PaX/execshield can't, like "return to libc" where the return address is overwritten by the address of another function so that execution jumps into that function.
/GS doesn't prevent such exploits; it only makes them succeed with a small probability (1 in 65534 or so), which PaX and others also achieve by randomizing the address space layout (and that is of course much more generic than a mere random canary: it makes all kinds of exploits harder, not only linear stack overflows). as for true protection against ret2libc style exploits, check out the doc on the future plans for PaX (http://pax.grsecurity.net/docs/pax-future.txt).
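To put "small probability" and "not impossible to brute force" into numbers, here is a rough worked calculation. It assumes that all 16 bits are uniformly random, that every attempt faces a freshly chosen value (e.g. the target is restarted after each crash), and that a wrong guess only costs one crash. With $p$ the per-attempt success probability and $n$ the number of attempts:

$$p = 2^{-16} = \frac{1}{65536}, \qquad P_{\text{success}}(n) = 1 - \left(1 - 2^{-16}\right)^{n}$$

$$\mathbb{E}[\text{attempts}] = \frac{1}{p} = 65536, \qquad n_{1/2} = \frac{\ln 2}{-\ln\left(1 - 2^{-16}\right)} \approx 2^{16}\ln 2 \approx 45{,}426$$

So a service that is automatically restarted can plausibly be hit in the tens of thousands of attempts. The same arithmetic applies to any 16 bits of address space randomization, which is why the amount of randomness matters for both approaches.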