This is slightly different from dlltool but I think it should be compatible. The transition is done by first replacing dlltool with its bugs, and fixing them in separate changes.
I based the ARM / AARCH64 implementation on the existing code around, but I have no idea if it is correct, and dlltool also doesn't include any delay load implementation for ARM.