Most likely what you want is to do a LoadLibrary in all cases and not bother to cache anything; the loader will take care of the refcount.
The reason I cache is that the number of LoadLibrary calls is potentially unknown: each function can have its own DLL registered in the registry. In my previous implementation, which you rejected, I enforced that a SIP exists entirely within one DLL, as that's what CryptSIPAddProvider allows. By doing so, I knew how many LoadLibrary calls I'd done: one for each SIP.
If I (1) allow a separate DLL per function, as the registry allows, and (2) don't cache the DLL names myself, I guess I must store the HMODULE for each function? The cache allows me to avoid that.
In this implementation, I cache each DLL, so that I only do one LoadLibrary per DLL name. I think this has less memory impact than the straw man approach you seem to suggest, and is (I guess) more readable than the one-DLL-per-SIP approach.
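To make the caching idea concrete, here is a rough, portable sketch of a cache keyed by DLL name. It is not the code from the patch: module_handle, load_fn, dll_cache, dll_cache_get, dll_cache_free and fake_load are all names made up for illustration, and fake_load stands in for LoadLibrary so the sketch builds anywhere. It also assumes that DLL names compare case-insensitively and are registered in a consistent form (no path normalization is attempted).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HMODULE so the sketch builds on any platform. */
typedef void *module_handle;

/* Hypothetical loader callback; on Windows it would wrap LoadLibrary. */
typedef module_handle (*load_fn)(const char *name);

struct dll_cache_entry {
    struct dll_cache_entry *next;
    module_handle handle;
    char name[];                /* DLL name, stored inline (C99) */
};

struct dll_cache {
    struct dll_cache_entry *head;
    load_fn load;
};

/* Case-insensitive comparison (assumption: names are plain ASCII). */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the handle for name, calling the loader only on a cache miss. */
static module_handle dll_cache_get(struct dll_cache *cache, const char *name)
{
    struct dll_cache_entry *e;
    size_t len;

    assert(cache && cache->load && name);
    for (e = cache->head; e; e = e->next)
        if (names_equal(e->name, name))
            return e->handle;

    len = strlen(name);
    e = malloc(sizeof(*e) + len + 1);
    if (!e)
        return NULL;
    e->handle = cache->load(name);
    if (!e->handle) {
        free(e);
        return NULL;
    }
    memcpy(e->name, name, len + 1);
    e->next = cache->head;
    cache->head = e;
    return e->handle;
}

/* Drop every entry; real code would also FreeLibrary each handle once. */
static void dll_cache_free(struct dll_cache *cache)
{
    while (cache->head) {
        struct dll_cache_entry *e = cache->head;
        cache->head = e->next;
        free(e);
    }
}

/* Fake loader for demonstration: counts calls, returns a non-NULL handle. */
static int fake_load_count;

static module_handle fake_load(const char *name)
{
    (void)name;
    fake_load_count++;
    return &fake_load_count;
}
```

With something along these lines, each function entry only needs to look up its DLL name in the cache when it is resolved, and the number of outstanding loads is exactly the number of cache entries, so cleanup is one release per entry.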
Am I still missing something? Thanks, --Juan