This is because every call involving a kernel object handle is done via RPC to the wineserver process.
The semantics of things like DuplicateHandle, and of all the various types of waitable kernel objects, need to be reproduced exactly. Even for a single object used by a single thread, optimizing out the wineserver call would require somehow being sure that no one had duplicated that object into another process. Alternatively, you'd have to give the wineserver enough information to duplicate it while still letting you wait on or manipulate the object without an RPC call.
So, I don't know that it's necessarily a fundamental architecture problem, but there's a lot you have to think about, and I can't recommend that a new Wine developer take on a project like this.
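
To make the duplication hazard concrete, here's a toy model in Python. Everything in it (ToyServer, NaiveClient, rpc_count, the method names) is made up for illustration and has nothing to do with Wine's real code or wire protocol. It only assumes what's described above: a central server owns the object state, and a handle can be duplicated into another process. The "optimized" client sets an event locally without telling the server, and a second process that gets a duplicate then sees stale state.

```python
class ToyServer:
    """Toy central handle table. Hypothetical; not the real wineserver protocol."""

    def __init__(self):
        self.signaled = {}   # object id -> bool
        self.handles = {}    # (pid, handle) -> object id
        self.rpc_count = 0   # number of simulated RPCs received
        self._next_id = 1

    def _alloc(self):
        value = self._next_id
        self._next_id += 1
        return value

    def create_event(self, pid):
        self.rpc_count += 1
        obj, handle = self._alloc(), self._alloc()
        self.signaled[obj] = False
        self.handles[(pid, handle)] = obj
        return handle

    def set_event(self, pid, handle):
        self.rpc_count += 1
        self.signaled[self.handles[(pid, handle)]] = True

    def is_signaled(self, pid, handle):
        self.rpc_count += 1
        return self.signaled[self.handles[(pid, handle)]]

    def duplicate_handle(self, src_pid, handle, dst_pid):
        # The new handle refers to the same underlying object.
        self.rpc_count += 1
        new_handle = self._alloc()
        self.handles[(dst_pid, new_handle)] = self.handles[(src_pid, handle)]
        return new_handle


class NaiveClient:
    """Process that 'optimizes' set_event by keeping the state locally."""

    def __init__(self, server, pid):
        self.server, self.pid = server, pid
        self.local_signaled = {}

    def create_event(self):
        handle = self.server.create_event(self.pid)
        self.local_signaled[handle] = False
        return handle

    def set_event(self, handle):
        # Fast path: no RPC, so the server never learns the event was set.
        self.local_signaled[handle] = True


server = ToyServer()
proc_a = NaiveClient(server, pid=1)
event = proc_a.create_event()
proc_a.set_event(event)
rpcs_after_set = server.rpc_count          # only the create call reached the server

# Someone duplicates the handle into process 2 behind proc_a's back.
dup = server.duplicate_handle(1, event, 2)
stale = server.is_signaled(2, dup)         # process 2 asks the server

print(rpcs_after_set, proc_a.local_signaled[event], stale)  # prints: 1 True False
```

The fast path saves an RPC, but process 2 now believes the event is unsignaled. Fixing that means either proving the handle can never be shared or keeping the server in sync some other way, and that's exactly the part that needs careful thought.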