Hi,
Thanks for the great info. I'll CC this to wine-devel as I think it's of general interest, I hope you don't mind.
For context, PaX is a set of security patches for Linux which lock down the system in a similar manner to exec-shield and SELinux. I say similar manner, because PaX seems to go further than these systems do - in fact from what I've read it seems to be the 'gold standard' in security patches.
I'll quote the whole email and reply inline. This thread started on ubuntu-devel after one Ubuntu developer said they were experimenting with PaX, and I asked what the differences were between it and exec-shield (with which the community seems to have more experience) and why it was chosen. So I was pointed towards this thread:
http://lists.debian.org/debian-devel/2003/11/msg00206.html
in which the PaX author and Ingo Molnar who did exec-shield discuss the differences.
On Wed, 2005-01-05 at 13:37 +0100, pageexec@freemail.hu wrote:
Hello,
just ran across this thread on the ubuntu-devel list and have a few observations:
- PaX cares about backwards compatibility as much as it cares about security, the best compromise we could make is that one can mark executables to be exempt from PaX enforcements (and you should have known about this as we'd talked about PaX+wine last year...).
OK. I don't remember this thread I'm afraid but I do recall that you can exempt particular programs from PaX, so if a distribution wanted to integrate that it'd have to mark Wine as exempt by default. Presumably if WineHQ/CodeWeavers were to ship binary packages we'd have to do the same to work on such a distribution. But it's just an ELF flag right?
as of the 20041201 snapshot of wine, it needs to be exempt from at least ASLR [1], because it still makes some invalid assumptions about the address space:
- the highest mapping in the address space may not be the stack, nor is the highest mapping (be that the stack or something else) supposed to extend to the end of the userland address space. the end result of this assumption is that some piece of code in the preloader enters an infinite loop requesting (but never getting) anon mappings above TASK_SIZE (0xc0000000 typically). excerpt from an strace:
mmap2(0xbffe0000, 262144, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x77ec0000
munmap(0x77ec0000, 262144) = 0
mmap2(0xbffe0000, 131072, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xbffe0000
mmap2(0xc0000000, 131072, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x77ee0000
munmap(0x77ee0000, 131072) = 0
mmap2(0xc0000000, 65536, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x77ef0000
munmap(0x77ef0000, 65536) = 0
the last lines then repeat indefinitely as the kernel would never give out the requested address, even with MAP_FIXED.
We do this because some Windows programs and DLLs cannot cope with getting pointers >2gig, so we need to ensure that the kernel does not give us mappings above this point. The only way to do this currently is to do an iterative reservation to map as much of this address space as possible which is what you're seeing here.
- the above mentioned infinite loop also highlighted another bad assumption wine makes: mmap() without MAP_FIXED but with a non-0 hint is under no obligation to observe the hint and give you a mapping at that address, under PaX it doesn't do so explicitly. excerpt from an strace:
mmap2(0x81000000, 1034813440, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x3a420000
munmap(0x3a420000, 1034813440) = 0
mmap2(0x81000000, 517406720, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x81000000
mmap2(0x9fd70000, 517406720, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x59190000
munmap(0x59190000, 517406720) = 0
mmap2(0x9fd70000, 258670592, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x9fd70000
as you can see, wine insists on an address until it gets it, without using MAP_FIXED.
We have no choice in the matter, I think we can't use MAP_FIXED as that'd risk blowing away any mappings already made above the 2gig boundary. Actually this code was originally written to support the 4G/4G VM patch that was put into Fedora for a while (it's gone now).
- there's at least /usr/lib/wine/ntdll.dll.so which is marked with an executable PT_GNU_STACK program header, suggesting that it needs an executable stack (or there's some build problem).
Last time I looked, documentation on exactly what triggers this flag was scarce or non-existent. I remember asking Ingo if inline assembly still generated it and the answer back then was no, but I have no idea why gcc has decided it's needed now. If you look at ntdll in the sources:
http://source.winehq.org/source/dlls/ntdll/
It's fairly harmless, there is some assembly in there but I don't remember seeing any code which assumed an executable stack.
this alone would make wine fail under a PaX kernel as PT_GNU_STACK is completely ignored there (because it's the wrong solution for the wrong problem), nor is it allowed to generate code at runtime (this applies to apps on which PaX is enforced of course, one can always disable these on a per-executable basis).
I'm afraid Wine cannot operate in an environment that doesn't allow us to map pages as executable and fill them with generated code. This technique is:
a) Used by some Windows programs
b) Used by the Wine DLL loader
c) Required to implement DCOM universal interface proxies
So if PaX denies this as a matter of course then it will never work. Having read the thread with Ingo I must say I agree with him that runtime code generation is a legitimate technique and not a bug.
i also have memories from about a year ago that kernel32.so had some executable code snippets (some thunking code?) in .data or some other otherwise non-executable area, that of course wouldn't (and didn't) work under PaX either. back then Alexandre Julliard suggested that this wasn't easy to rewrite (by also making the now static code text reloc free) - has this been done since then?
I don't think so, but I don't remember this thread either.
i also have strace excerpts that show how wine wanted to create writable and executable memory, suggesting that it still wants to generate code at runtime and this is how it fails under PaX (which prevents runtime code generation by default):
mmap2(NULL, 1179648, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fee0000
munmap(0x7fff0000, 65536) = 0
mprotect(0x7fee0000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EACCES (Permission denied)
munmap(0x7fee0000, 1114112) = 0
write(2, "wine: failed to create the proce"..., 40wine: failed to create the process heap
) = 40
on a sidenote, XP SP2 makes the default heap non-executable as well, so if the above is the result of wanting to be compatible with Windows, you may want to rethink it for the future.
I'm not sure why it is, but yes I expect it's because some programs rely on it. Service Pack 2 may well make the default process heap NX but it also has a huge infrastructure in place to deal with backwards compatibility concerns, including a large database of badly behaved apps, user-accessible GUIs to disable the protections and I believe it also has code to catch NX faults (on hardware that supports that) and ask the user if they wish to disable the protections for that application.
- it would be nice if wine-preloader and wine-pthread had a configurable base address, the current default makes them impossible to use under the faster non-executable method of PaX/i386 (which halves the userland address space, [2]).
They have to be fixed otherwise the kernel may place them in the middle of a reserved area which would cause initialisation to fail. This cannot be changed.
so, right now wine can't run with a randomized address space, i have yet to test if it can get away without generating code at runtime and/or having writable/executable memory.
It can't and I don't see any way to make it able to operate under such conditions in future. Is there a way to brand the binaries as excluded from PaX at build time without a special tool? If not would you be willing to submit a build system patch to detect the branding tool on PaX systems and use it on the relevant binaries automatically?
thanks -mike
Thanks for the great info. I'll CC this to wine-devel as I think it's of general interest, I hope you don't mind.
sure, no problem (let's hope i don't bounce by not being subscribed ;-).
http://lists.debian.org/debian-devel/2003/11/msg00206.html
in which the PaX author and Ingo Molnar who did exec-shield discuss the differences.
a better description of PaX is at http://pax.grsecurity.net/docs/ .
for a quick recap: PaX has techniques to prevent exploiting memory corruption bugs (i.e., the various exploit techniques, not the manifestation of bugs themselves). for practical reasons, we categorize these techniques into 3 sets, the first one being the abuse of the privilege of runtime code generation (this is what you mostly find in published exploits, "shellcode executed on the stack/heap/etc"). so PaX makes it possible to take away this privilege (and this obviously is in conflict with the apps that need it).
last year i proposed to make wine work with the strictest PaX settings (which would make running windows apps more secure than it is under windows itself, SP2 or not). this means that i don't mind if a windows app wants to generate code at runtime, but i'd like to make sure that the base wine runtime doesn't. yes, this is my itch to scratch and i don't actually expect you to do the work, but i'm definitely interested in your ideas and help. the rest of my comments are inline.
OK. I don't remember this thread I'm afraid
just for your reference, it was between Alexandre, you and me around last march, for some reason i thought it had gone to the list as well.
Presumably if WineHQ/CodeWeavers were to ship binary packages we'd have to do the same to work on such a distribution. But it's just an ELF flag right?
for historical reasons, there're 2 marks. the older one (controlled by chpax) abuses an otherwise (or yet) unused field in the ELF header, i personally don't encourage the use of it. the other one (controlled by paxctl) is based on a new ELF program header type called PT_PAX_FLAGS and requires a binutils patch (being specific to PaX and in direct conflict with PT_GNU_STACK, it has about 0 chance to end up upstream).
with that said, i don't think you should be worried about marking executables yourselves, better leave it to the various distros/users who are already aware of what changes PaX requires in some apps (and as i said, ideally wine would not require anything, more on this below).
We do this because some Windows programs and DLLs cannot cope with getting pointers >2gig, so we need to ensure that the kernel does not give us mappings above this point. The only way to do this currently is to do an iterative reservation to map as much of this address space as possible which is what you're seeing here.
my problem wasn't that you reserved all address space 'upstairs', but that the algo wine has can end up in an infinite loop (requesting kernel space memory is another thing, but given how non-existent a decent API for learning about the address space layout in linux is, i won't blame this on you ;-).
interestingly, i also have this need to have some control over the address space and have a plan to do this eventually. the idea is to introduce a new ELF program header (because this is the easiest to parse at execve() time in the kernel), that would describe the desired TASK_SIZE, stack placement and other info you might need. how something like this would fly with upstream (both binutils and the kernel), i can't tell however.
as you can see, wine insists on an address until it gets it, without using MAP_FIXED.
We have no choice in the matter, I think we can't use MAP_FIXED as that'd risk blowing away any mappings already made above the 2gig boundary. Actually this code was originally written to support the 4G/4G VM patch that was put into Fedora for a while (it's gone now).
you're right with the behaviour of MAP_FIXED, but then i still think that something is wrong with the allocation algo if it can loop forever (or i wasn't patient enough).
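pageexec's point is that a hint-only reservation loop must be able to give up on an address the kernel will not hand out. A minimal sketch of one way to do that, assuming page-aligned bounds and a power-of-two granularity: it keeps the no-MAP_FIXED approach (so existing mappings are never clobbered), halves the request whenever the kernel places it elsewhere, and skips one granule when even the smallest request is refused, so the cursor only ever moves forward. `reserve_range` and its skip policy are hypothetical, not Wine's actual preloader code.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve as much of [start, end) as possible with
 * PROT_NONE mappings, using each address only as a hint (no MAP_FIXED).
 * start, end and granularity are assumed to be page multiples, and
 * granularity a power of two. Returns the number of bytes reserved.
 */
static size_t reserve_range(uintptr_t start, uintptr_t end, size_t granularity)
{
    uintptr_t addr = start;
    size_t total = 0;

    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);

    while (addr < end) {
        size_t size = (size_t)(end - addr) & ~(granularity - 1);
        int reserved = 0;

        /* Halve the request until the kernel honours the hint exactly. */
        while (size >= granularity) {
            void *p = mmap((void *)addr, size, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
            if (p == (void *)addr) {
                addr += size;
                total += size;
                reserved = 1;
                break;
            }
            if (p != MAP_FAILED)
                munmap(p, size);    /* placed elsewhere: give it back */
            size = (size / 2) & ~(granularity - 1);
        }

        /* Not even one granule was granted here: skip it rather than
         * retrying the same address forever. addr only grows, so the
         * outer loop is guaranteed to terminate. */
        if (!reserved)
            addr += granularity;
    }
    return total;
}
```

For the case in this thread a caller would pass the 2GB boundary and the typical i386 TASK_SIZE (0x80000000 and 0xc0000000). The worst case is on the order of (range / granularity) × log2(range / granularity) mmap calls: large, but bounded.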
- there's at least /usr/lib/wine/ntdll.dll.so which is marked with an executable PT_GNU_STACK program header, suggesting that it needs an executable stack (or there's some build problem).
Last time I looked, documentation on exactly what triggers this flag was scarce or non-existent.
some time ago when related problems showed up for grsecurity/PaX users, i posted my generic thoughts on the matter, you may find it useful too:
http://forums.grsecurity.net./viewtopic.php?t=673
http://forums.grsecurity.net./viewtopic.php?t=807
if you need more info, feel free to ask. for your particular question, it's always been about the compiler detecting the use of nested function trampolines (never inline asm and other ways of runtime code generation) and the linker erring on the side of backwards compatibility.
but I have no idea why gcc has decided it's needed now. If you look at ntdll in the sources:
http://source.winehq.org/source/dlls/ntdll/
It's fairly harmless, there is some assembly in there but I don't remember seeing any code which assumed an executable stack.
after reading my posts on the grsecurity forum, you'll see that this case is probably a false positive, stemming from an unmarked or badly marked .o file that ended up in the .so in question. it's a small detective work, i'll see what caused it on my box.
I'm afraid Wine cannot operate in an environment that doesn't allow us to map pages as executable and fill them with generated code. This technique is:
ok, this is the really interesting part for me, so let's discuss it a bit more.
a) Used by some Windows programs
this is fine and can't be helped of course. my hope is that most apps, or rather, the ones i'm personally interested in don't generate code at runtime.
b) Used by the Wine DLL loader
what is this use exactly? can it be reworked to not require runtime code generation (remember that i'm willing to scratch my own itch, i just need some pointers)?
c) Required to implement DCOM universal interface proxies
required as in 'cannot be implemented any other way'? and same question as above of course. also, this feature doesn't sound like something that all windows apps need, so i wouldn't have a problem with allowing it (note that i'm talking about PaX based systems only, not wine in general, so no worries about me trying to force something on you) if i can run the rest of the apps with full PaX enabled.
So if PaX denies this as a matter of course then it will never work. Having read the thread with Ingo I must say I agree with him that runtime code generation is a legitimate technique and not a bug.
i've never said it wasn't legitimate, just that it's a privilege and if we want protection from exploits, then we'd better revoke it from apps that don't need it (and which get it by default and cause security problems).
obviously my goal is to run as many apps as possible without this privilege (especially those that process untrusted user input, be that server daemons or client apps like mail clients, web browsers, etc). if there're fundamental reasons why wine cannot work without such a privilege, then so be it, it'll go the way of java on PaX systems and be exempted.
Service Pack 2 may well make the default process heap NX but it also has a huge infrastructure in place to deal with backwards compatibility concerns, including a large database of badly behaved apps, user-accessible GUIs to disable the protections and I believe it also has code to catch NX faults (on hardware that supports that) and ask the user if they wish to disable the protections for that application.
don't worry about the infrastructure of handling exceptions, it'll be the concern for distros (that use PaX), not wine developers. and it's easy to solve with two preloaders, with one allowed and the other denied the privilege of runtime code generation.
They have to be fixed otherwise the kernel may place them in the middle of a reserved area which would cause initialisation to fail. This cannot be changed.
i meant something like a new switch to configure that would let you specify a new compile/link time base address, that can surely be arranged...
It can't and I don't see any way to make it able to operate under such conditions in future. Is there a way to brand the binaries as excluded from PaX at build time without a special tool? If not would you be willing to submit a build system patch to detect the branding tool on PaX systems and use it on the relevant binaries automatically?
there's already such support in the binutils patch i mentioned above: 'ld -z execheap/-z noexecheap' (you'd use the former to mark the preloader which in turn would allow runtime code generation). post linking, you can use 'paxctl -spmr' (or the deprecated chpax) to disable all non-exec enforcement and randomization. but as i suggested at the beginning, this should not be your concern but that of the distro guys (and pending the questions i raised, it may very well turn out to be unnecessary, at least i'll try my best to make it happen).
On Thu, 2005-01-06 at 02:28 +0100, pageexec@freemail.hu wrote:
for historical reasons, there're 2 marks. the older one (controlled by chpax) abuses an otherwise (or yet) unused field in the ELF header, i personally don't encourage the use of it. the other one (controlled by paxctl) is based on a new ELF program header type called PT_PAX_FLAGS and requires a binutils patch (being specific to PaX and in direct conflict with PT_GNU_STACK, it has about 0 chance to end up upstream).
Hmm, OK. If you think PaX may be deployed in a mainstream distribution like Ubuntu by default being able to build with this marking is something we should remember.
my problem wasn't that you reserved all address space 'upstairs', but that the algo wine has can end up in an infinite loop (requesting kernel space memory is another thing, but given how non-existent a decent API for learning about the address space layout in linux is, i won't blame this on you ;-).
Yeah, scraping /proc/self files is silly but unfortunately quite common on Linux ... I'd love a decent VirtualQuery type API for other non-Wine stuff I've done if nothing else.
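For reference, a minimal VirtualQuery-style lookup built on the /proc/self/maps scraping mentioned above. It is a Linux-only sketch: `query_address` and `struct region` are hypothetical names, and only the first fields of each maps line (address range and permissions, as documented in proc(5)) are parsed.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical VirtualQuery-like description of one mapping. */
struct region {
    uintptr_t start, end;   /* half-open range [start, end) */
    char perms[5];          /* e.g. "r-xp", NUL-terminated */
};

/*
 * Find the mapping that contains addr by scanning /proc/self/maps.
 * Returns 1 (and fills *out) if addr is mapped, 0 if it lies in a hole,
 * -1 if the maps file cannot be opened.
 */
static int query_address(uintptr_t addr, struct region *out)
{
    char line[4096];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        uintptr_t start, end;
        char perms[5];

        /* Line format: "start-end perms offset dev inode [path]". */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s", &start, &end, perms) != 3)
            continue;
        if (addr >= start && addr < end) {
            out->start = start;
            out->end = end;
            memcpy(out->perms, perms, sizeof out->perms);
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Every call rereads and reparses the whole file, which is part of why this approach is a poor substitute for a real kernel query interface.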
interestingly, i also have this need to have some control over the address space and have a plan to do this eventually. the idea is to introduce a new ELF program header (because this is the easiest to parse at execve() time in the kernel), that would describe the desired TASK_SIZE, stack placement and other info you might need. how something like this would fly with upstream (both binutils and the kernel), i can't tell however.
That would be good. My last experience with this sort of patch was trying to fix a bug in the kernel ELF loader though. John Reiser developed a patch which fixed the bug but it didn't compile on some architecture he didn't have access to and was dropped.
you're right with the behaviour of MAP_FIXED, but then i still think that something is wrong with the allocation algo if it can loop forever (or i wasn't patient enough).
That's something for Alexandre to comment on, I didn't write that code and never really looked at it.
if you need more info, feel free to ask. for your particular question, it's always been about the compiler detecting the use of nested function trampolines (never inline asm and other ways of runtime code generation) and the linker erring on the side of backwards compatibility.
OK. For your other email I think it'd be better to use the gas pseudo-op to add the .note.GNU-stack, that way we don't need configure checks for the ld flags.
ok, this is the really interesting part for me, so let's discuss it a bit more.
a) Used by some Windows programs
this is fine and can't be helped of course. my hope is that most apps, or rather, the ones i'm personally interested in don't generate code at runtime.
b) Used by the Wine DLL loader
what is this use exactly? can it be reworked to not require runtime code generation (remember that i'm willing to scratch my own itch, i just need some pointers)?
Alexandre recently added a patch to generate DLL stubs at runtime as part of the process of removing them from the compile-time generated code. I think it's also used for +relay traces (where each API call is dumped as it's called).
Obviously +relay is only used in debugging so Wine could run without that. The stub block thing is a bit trickier, that patch could be reversed and Wine would still run correctly because we still generate stubs at compile-time too (iirc), but it's part of a policy of eliminating unnecessary stubs over time. So I'm not sure Alexandre would want to remove it.
c) Required to implement DCOM universal interface proxies
required as in 'cannot be implemented any other way'?
I'm not sure. These proxies are run-time generated objects. Essentially a DCOM universal/typelib marshaller proxy is a COM interface (so an array of function pointers) whose methods, when called, marshal the arguments into a packet and dispatch it via the Windows RPC infrastructure. These proxies come in three forms:
- precompiled MIDL/C marshallers
- precompiled MOPs (these are a custom bytecode language fed to a VM which does the marshalling)
- generated at runtime from type libraries (databases which describe the types and interfaces used in a program)
I don't know if the second needs runtime code generation but I don't see any way we can avoid generating code for the third at runtime.
and same question as above of course. also, this feature doesn't sound like something that all windows apps need,
I'm afraid InstallShield needs it, and InstallShield is a "gateway" app, ie if it doesn't run lots of other programs also don't run (because you can't install them).
so i wouldn't have a problem with allowing it (note that i'm talking about PaX based systems only, not wine in general, so no worries about me trying to force something on you) if i can run the rest of the apps with full PaX enabled.
Yes the trick is, how do you know when it's needed or not? I'm not sure.
i've never said it wasn't legitimate, just that it's a privilege and if we want protection from exploits, then we'd better revoke it from apps that don't need it (and which get it by default and cause security problems).
OK this opinion is fair enough, if a little unusual.
obviously my goal is to run as many apps as possible without this privilege (especially those that process untrusted user input, be that server daemons or client apps like mail clients, web browsers, etc). if there're fundamental reasons why wine cannot work without such a privilege, then so be it, it'll go the way of java on PaX systems and be exempted.
I think some programs could run on a hacked-up Wine, but the problem is when InstallShield fails in a mysterious way how do you know it's PaX and not some other problem? I don't see how to communicate that to the user in a useful manner ...
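One hedged answer to the "how do you tell the user" question: probe once at startup whether the process may have writable and executable memory at all, and print something more specific than "failed to create the process heap" when it may not. This is a sketch under assumptions: `probe_wx_memory` and its message are hypothetical, and treating EACCES/EPERM from mprotect() as "a policy refused it" matches the strace earlier in this thread (PaX answered EACCES), but other policies may report differently.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* MAP_ANONYMOUS on glibc */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical startup probe: may this process have memory that is
 * writable and executable at the same time?
 * Returns 1 if yes, 0 if the kernel refused (EACCES/EPERM),
 * -1 on an unrelated failure.
 */
static int probe_wx_memory(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int rc = mprotect(p, len, PROT_READ | PROT_WRITE | PROT_EXEC);
    int err = errno;            /* save before munmap() can change it */
    munmap(p, len);

    if (rc == 0)
        return 1;
    if (err == EACCES || err == EPERM) {
        fprintf(stderr,
                "warning: the kernel refused writable+executable memory (%s).\n"
                "A memory-protection policy such as PaX may need to be relaxed\n"
                "for this binary before it can run.\n",
                strerror(err));
        return 0;
    }
    return -1;
}
```

The probe only says that the privilege is missing, not which application feature needs it, so it narrows the "is it PaX or something else" question rather than fully answering it.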
They have to be fixed otherwise the kernel may place them in the middle of a reserved area which would cause initialisation to fail. This cannot be changed.
i meant something like a new switch to configure that would let you specify a new compile/link time base address, that can surely be arranged...
Ah yes that could work though you'd have to test the resulting binaries to ensure there were no conflicts (the reserved regions are stored in a table).
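If base addresses became a configure-time option, the check Mike describes amounts to an interval-overlap test against the reserved-region table. A minimal sketch: `conflicts_with_reserved` is hypothetical, and the table contents are illustrative placeholders (only the area from the 2GB boundary up to 0xc0000000 comes from this thread), not the values Wine's preloader actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* A half-open address range [start, end). */
struct range { uintptr_t start, end; };

/* Illustrative reserved-region table; NOT Wine's real values. */
static const struct range reserved[] = {
    { 0x00000000u, 0x00100000u },  /* placeholder low area */
    { 0x80000000u, 0xc0000000u },  /* above-2GB area discussed in this thread */
};

/* Return 1 if [base, base + size) overlaps any range in the table. */
static int conflicts_with_reserved(uintptr_t base, size_t size,
                                   const struct range *table, size_t n)
{
    uintptr_t end = base + size;

    for (size_t i = 0; i < n; i++)
        if (base < table[i].end && table[i].start < end)
            return 1;
    return 0;
}
```

A build step could run this against the linked image's load range and reject a configured base address that lands inside a reserved area.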
there's already such support in the binutils patch i mentioned above: 'ld -z execheap/-z noexecheap' (you'd use the former to mark the preloader which in turn would allow runtime code generation). post linking, you can use 'paxctl -spmr' (or the deprecated chpax) to disable all non-exec enforcement and randomization. but as i suggested at the beginning, this should not be your concern but that of the distro guys (and pending the questions i raised, it may very well turn out to be unnecessary, at least i'll try my best to make it happen).
OK, thanks for the feedback. I have to admit I'm not keen on leaving much up to the distribution guys, in the past they've typically shown little to no concern about keeping Wine working (probably because due to our ridiculously fast release cycle we're never shipped as part of their base sets).
thanks -mike
Mike Hearn wrote:
On Thu, 2005-01-06 at 02:28 +0100, pageexec@freemail.hu wrote:
c) Required to implement DCOM universal interface proxies
required as in 'cannot be implemented any other way'?
I'm not sure. These proxies are run-time generated objects. Essentially a DCOM universal/typelib marshaller proxy is a COM interface (so an array of function pointers) whose methods, when called, marshal the arguments into a packet and dispatch it via the Windows RPC infrastructure. These proxies come in three forms:
- precompiled MIDL/C marshallers
- precompiled MOPs (these are a custom bytecode language fed to a VM which does the marshalling)
- generated at runtime from type libraries (databases which describe the types and interfaces used in a program)
I don't know if the second needs runtime code generation but I don't see any way we can avoid generating code for the third at runtime.
Actually, we shouldn't be generating assembly code on the fly. If you have more than say 16 proxies in a process then it is actually cheaper in terms of memory usage and cache locality to have a set of compiled entry points that can be shared by all proxies. It is even better if you consider the fact that we shouldn't be allocating the memory for the code from the heap, but should be requesting an executable page of memory for each.
Just for the record, PaX and execshield are trying to solve problems that are much better solved by other methods that don't break backwards compatibility. One of the best methods is introducing a terminator canary value between the return address and variables stored on the stack. Obviously, this requires compiler support (which GCC currently lacks, I believe), but it has worked wonders for Microsoft in SP2. It even prevents exploits that PaX/execshield can't, like "return to libc" where the return address is overwritten by the address of another function so that execution jumps into that function.
Rob
On Thu, 2005-01-06 at 15:47 -0600, Robert Shearman wrote:
Actually, we shouldn't be generating assembly code on the fly. If you have more than say 16 proxies in a process then it is actually cheaper in terms of memory usage and cache locality to have a set of compiled entry points that can be shared by all proxies.
Yeah maybe, I'm not sure how Windows does it. From reading the comments in rpcrt4 it looks like they do have pre-compiled proxy entrypoints. Begs the question of what happens when there are more interface entry points than pre-compiled marshaller entrypoints of course. Maybe they can do both?
It is even better if you consider the fact that we shouldn't be allocating the memory for the code from the heap, but should be requesting an executable page of memory for each.
We don't allocate these from the process heap, we use a VirtualAlloc to get executable memory. I fixed that a while ago.
At the moment that's 1 page per set of proxy thunks (IMHO we shouldn't call them "asm stubs" like the code does, that's a recipe for permanent brain lesions at the moment), but we could easily switch it back to using HeapAlloc with a typelib marshaller specific heap created like so:
HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 1024 * 4, 1024 * 64);
Just for the record, PaX and execshield are trying to solve problems that are much better solved by other methods that don't break backwards compatibility. One of the best methods is introducing a terminator canary value between the return address and variables stored on the stack.
This can't do everything. It protects against return-to-libc style attacks but there are other stack based attacks that it doesn't work for (if I remember correctly).
It also has the obvious disadvantage of not working with older binaries too, though given how much these sorts of technologies break .....
Obviously, this requires compiler support (which GCC currently lacks, I believe),
gcc4 contains the ProPolice/SSP patches from IBM which can produce stack canaries. It's a looming binary portability problem unfortunately, the location it jumps to in the case of a stack smash is a symbol in glibc, so it's pretty likely that if you compile binaries with it they won't work everywhere. At least not for a very long time.
I might be jumping to conclusions, and anyway if that really is the case I'd be interested in writing a patch to gcc to release it from the glibc dependency. It'd be nice to use it in Crossover.
It even prevents exploits that PaX/execshield can't, like "return to libc" where the return address is overwritten by the address of another function so that execution jumps into that function.
That's why the DSOs and binary load addresses are randomised, you can't do a return to libc attack if you don't know where libc is. You can guess of course but that doesn't let you write a worm that propagates at will.
thanks -mike
This can't do everything. It protects against return-to-libc style attacks but there are other stack based attacks that it doesn't work for (if I remember correctly).
I.e. for C++ apps you could instead change the implicit this argument of the caller, for example, to point to a made-up instance with a pointer to a made-up vtable with function pointers to your own stuff :)
Cheers, Kuba Ober
Actually, we shouldn't be generating assembly code on the fly. If you have more than say 16 proxies in a process then it is actually cheaper in terms of memory usage and cache locality to have a set of compiled entry points that can be shared by all proxies. It is even better if you consider the fact that we shouldn't be allocating the memory for the code from the heap, but should be requesting an executable page of memory for each.
i totally agree with you on this one ;-), but the feasibility of this depends on how dynamic these proxies have to be, i.e., is enough information known at compile/link time to create them in a read-only section or not. for example, the gcc nested function trampolines by their nature must be runtime generated (instantiated) as they need to be both addressable (callable through a normal function pointer) and contain per-instance information - if these proxies have similar constraints, then it's just not possible.
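To make the trampoline constraint concrete: a GCC nested function that reads a variable from its enclosing frame can only be passed through a plain function pointer if the compiler builds a small stub at runtime (on the stack), which is what marks the object as needing an executable stack. When the callback type can carry a context pointer, the per-instance data travels as data and no code has to be generated. A minimal sketch of that second pattern; it is the general C idiom, not code from Wine, and all names are hypothetical.

```c
#include <stddef.h>

/* A callback type with room for per-instance data. */
typedef int (*pred_fn)(int value, void *ctx);

static size_t count_if(const int *v, size_t n, pred_fn pred, void *ctx)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (pred(v[i], ctx))
            count++;
    return count;
}

/* Data a nested function would otherwise capture from its enclosing
 * frame (and reach through a runtime-built trampoline). */
struct threshold { int min; };

static int above_threshold(int value, void *ctx)
{
    const struct threshold *t = ctx;
    return value > t->min;
}

/* No runtime code generation: `min` is passed as data via ctx. */
static size_t count_above(const int *v, size_t n, int min)
{
    struct threshold t = { min };
    return count_if(v, n, above_threshold, &t);
}
```

The catch is the one described above: this only works when the callee's function-pointer type can be changed to take the context. If it must remain a plain pointer callable with a fixed signature, the per-instance information has to live in generated code.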
[the following is off-topic i think, if you want to discuss it further, feel free to email me in private, i'll be happy to explain everything]
Just for the record, PaX and execshield are trying to solve problems that are much better solved by other methods that don't break backwards compatibility.
and i completely disagree with you on this one ;-). what PaX and similar approaches solve is not the manifestation of bugs, but rather they put a stop to exploit techniques. the set of bugs is orthogonal to the set of exploit techniques: one kind of bug (say a strcpy() based stack overflow) can be exploited by different exploit techniques (say shellcode injection or a forced return into system()), and a given exploit technique (say shellcode injection) can be used to exploit different kinds of bugs (say stack and heap overflows, or user supplied format string bugs).
PaX & Co prevent certain exploit techniques from working, canary based systems prevent certain bugs from becoming exploitable (i.e., these methods attack the problem from two different dimensions and have a different, not necessarily comparable coverage of the problem space).
One of the best methods is introducing a terminator canary value between the return address and variables stored on the stack.
it's not only not the best method, but it's a rather limited one in its scope as well. for one, canary solutions can detect simple linear overflow bugs only (and for your case, only on the stack, not on the heap or elsewhere), they won't detect non-linear ones, format string bugs, stale pointer uses, or pretty much any other kind of memory corruption bugs.
and protecting the return address is useless alone if you're not protecting other pointers (or even normal data) in the stack frames, there's 'literature' on how to exploit overwritten frame pointers, function arguments, etc (search the phrack magazine for details).
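A toy model of why a canary only catches linear overflows. The frame is simulated as one byte array so the C stays well defined; the layout, the 4-byte canary modelled on StackGuard's terminator bytes (NUL, LF, CR, 0xff), and all function names are assumptions for illustration. A contiguous overwrite that reaches the saved return address must pass through the canary, while a single indexed write can skip it.

```c
#include <stddef.h>
#include <string.h>

/* Toy frame as one byte array: [ buf 16 | canary 4 | return address 8 ]. */
enum { BUF_LEN = 16, CANARY_LEN = 4, RET_LEN = 8,
       FRAME_LEN = BUF_LEN + CANARY_LEN + RET_LEN };

/* Terminator-style canary: bytes that stop string copies. */
static const unsigned char terminator[CANARY_LEN] = { 0x00, 0x0a, 0x0d, 0xff };

struct frame { unsigned char bytes[FRAME_LEN]; };

static void frame_init(struct frame *f)
{
    memset(f->bytes, 0, sizeof f->bytes);
    memcpy(f->bytes + BUF_LEN, terminator, CANARY_LEN);
}

/* Linear overflow: copy len bytes starting at buf (len <= FRAME_LEN). */
static void linear_write(struct frame *f, const unsigned char *src, size_t len)
{
    memcpy(f->bytes, src, len);
}

/* Non-linear write: one byte at an attacker-chosen index. */
static void indexed_write(struct frame *f, size_t index, unsigned char value)
{
    f->bytes[index] = value;
}

/* What an epilogue check would test before returning. */
static int canary_intact(const struct frame *f)
{
    return memcmp(f->bytes + BUF_LEN, terminator, CANARY_LEN) == 0;
}

/* Did anything change the (zero-initialised) return-address slot? */
static int ret_addr_modified(const struct frame *f)
{
    for (size_t i = BUF_LEN + CANARY_LEN; i < FRAME_LEN; i++)
        if (f->bytes[i] != 0)
            return 1;
    return 0;
}
```

In the third case, the indexed write changes the return address while the canary still checks out. That is the gap described above, and the same reasoning applies to frame pointers or function arguments that sit on the attacker's side of the canary.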
Obviously, this requires compiler support (which GCC currently lacks, I believe)
mainline gcc does lack it; however, both StackGuard and SSP are available as patches. these days quite a few systems (OpenBSD, Adamantix, Gentoo, etc) are using SSP in their base system. there's also talk about incorporating SSP into mainline gcc, but it'll be a longer process.
but it has worked wonders for Microsoft in SP2.
i don't know where you got that from, but it hasn't done much good at all, there're ways to exploit even normal stack overflows that /GS was supposed to prevent (i think David Litchfield posted one on bugtraq or his site), not to mention other kinds of memory corruption bugs which /GS does nothing against. hint: the MS /GS canary doesn't protect the function pointer in the SEH registration record (and the 16 bit randomness in the canary itself is not impossible to brute force either).
It even prevents exploits that PaX/execshield can't, like "return to libc" where the return address is overwritten by the address of another function so that execution jumps into that function.
/GS doesn't prevent such exploits, it makes them succeed with a small probability only (1 in 65534 or so), which PaX and others also achieve by virtue of randomizing the address space layout (and that is of course much more generic than a mere random canary, it helps to make all kinds of exploits harder, not only linear stack overflows). as for the true protection against ret2libc style exploits, check out the doc on the future plans for PaX (http://pax.grsecurity.net/docs/pax-future.txt).
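To put the 16-bit canary figure in numbers, assuming every attempt faces an independently chosen, uniformly random 16-bit value (for example a restarted service that re-randomizes it):

```latex
p = 2^{-16} = \tfrac{1}{65536}, \qquad
\Pr[\text{success within } n \text{ attempts}] = 1 - (1 - p)^{n}

\mathbb{E}[\text{attempts}] = \tfrac{1}{p} = 65536, \qquad
n_{1/2} = \frac{\ln 2}{-\ln(1 - p)} \approx 65536 \ln 2 \approx 4.5 \times 10^{4}
```

If instead the value stays fixed across attempts (e.g. children forked from one parent inherit it), an attacker can avoid repeating guesses and needs at most 65536 tries.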
On Fri, 2005-01-07 at 13:46 +0100, pageexec@freemail.hu wrote:
i totally agree with you on this one ;-), but the feasibility of this depends on how dynamic these proxies have to be, i.e., is enough information known at compile/link time to create them in a read-only section or not.
The thunks we're talking about push an index onto the stack and then call a function, so technically we could pre-generate lots of thunks at compile time that do this but it'd put an upper limit on the number of vtable entries you could have which I'm not keen on.
They might also push an object ptr as well, I don't remember ...
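Rob's shared-entry-point idea, combined with Mike's description of what the thunks do (pass an index, and possibly the object pointer, to a common function), can be sketched in C with a fixed table of compile-time thunks. Assumptions: every method is given the single signature `long (Proxy *, long)`, which real COM interfaces do not have (hence assembly thunks in practice); `marshal_and_send` is a stand-in for the real marshaller; all names are hypothetical. The table lives in read-only data, so nothing has to be mapped writable and executable, but `proxy_create` must refuse interfaces with more methods than the table holds.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Proxy Proxy;
/* Simplifying assumption: one signature for every proxied method. */
typedef long (*Method)(Proxy *self, long arg);

struct Proxy {
    const Method *vtbl;     /* shared, read-only thunk table */
    size_t n_methods;
};

/* Stand-in for the marshaller: would pack arg into an RPC packet. */
static long marshal_and_send(Proxy *self, unsigned index, long arg)
{
    (void)self;
    return (long)index * 1000 + arg;
}

/* Compile-time thunks: each forwards the object and its own index. */
#define THUNK(n) \
    static long thunk_##n(Proxy *self, long arg) { return marshal_and_send(self, n, arg); }
THUNK(0) THUNK(1) THUNK(2) THUNK(3) THUNK(4) THUNK(5) THUNK(6) THUNK(7)

static const Method shared_vtbl[] = {
    thunk_0, thunk_1, thunk_2, thunk_3, thunk_4, thunk_5, thunk_6, thunk_7,
};
#define MAX_METHODS (sizeof shared_vtbl / sizeof shared_vtbl[0])

/* Fails for interfaces larger than the precompiled table. */
static Proxy *proxy_create(size_t n_methods)
{
    if (n_methods > MAX_METHODS)
        return NULL;
    Proxy *p = malloc(sizeof *p);
    if (p) {
        p->vtbl = shared_vtbl;
        p->n_methods = n_methods;
    }
    return p;
}
```

The refusal in `proxy_create` is the upper limit on vtable size that Mike is wary of; the table can be made large, but it cannot be unbounded.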
http://source.winehq.org/source/dlls/ntdll/
It's fairly harmless, there is some assembly in there but I don't remember seeing any code which assumed an executable stack.
i've looked at it and as i suggested yesterday, it's a false positive. what happens here is that relay32.s doesn't emit a .note.GNU-stack section at all, which when linked together with other .o files that do, will result in an executable PT_GNU_STACK program header. the solution is to add
.section .note.GNU-stack,"",@progbits
to relay32.s and it'll be ok. alternatively, you can forcefully assemble with --noexecstack or link the .so with '-z noexecstack' (or -Wl,-z,noexecstack for gcc) which will override the .note.GNU-stack markings.
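To check whether the fix took effect, `readelf -l` shows the GNU_STACK entry and its flags. Here is the same check as a small C sketch using glibc's <elf.h>. `gnu_stack_exec` is a hypothetical helper, and it only handles ELF files in the host's byte order.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Scan program headers for PT_GNU_STACK; works for both ELF classes
 * because the relevant field names are identical. */
#define SCAN_GNU_STACK(Ehdr, Phdr, f, result)                                 \
    do {                                                                      \
        Ehdr eh;                                                              \
        Phdr ph;                                                              \
        if (fread(&eh, sizeof eh, 1, (f)) != 1)                               \
            break;                      /* truncated header: keep -2 */       \
        (result) = -1;                  /* absent until found */              \
        for (unsigned i = 0; i < eh.e_phnum; i++) {                           \
            long off = (long)(eh.e_phoff + (unsigned long)i * eh.e_phentsize); \
            if (fseek((f), off, SEEK_SET) != 0 ||                             \
                fread(&ph, sizeof ph, 1, (f)) != 1) {                         \
                (result) = -2;                                                \
                break;                                                        \
            }                                                                 \
            if (ph.p_type == PT_GNU_STACK) {                                  \
                (result) = (ph.p_flags & PF_X) != 0;                          \
                break;                                                        \
            }                                                                 \
        }                                                                     \
    } while (0)

/*
 * Returns 1 if the stack is marked executable, 0 if not, -1 if there is
 * no PT_GNU_STACK header (which kernels have traditionally treated as
 * executable), -2 on I/O errors or non-ELF input.
 */
static int gnu_stack_exec(const char *path)
{
    unsigned char ident[EI_NIDENT];
    int result = -2;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -2;
    if (fread(ident, 1, EI_NIDENT, f) == EI_NIDENT &&
        memcmp(ident, ELFMAG, SELFMAG) == 0) {
        rewind(f);  /* the full header starts with e_ident again */
        if (ident[EI_CLASS] == ELFCLASS64)
            SCAN_GNU_STACK(Elf64_Ehdr, Elf64_Phdr, f, result);
        else if (ident[EI_CLASS] == ELFCLASS32)
            SCAN_GNU_STACK(Elf32_Ehdr, Elf32_Phdr, f, result);
    }
    fclose(f);
    return result;
}
```

Running it over each built .so would catch the false-positive case described above, where one unmarked object file causes the whole library to be flagged.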