Hello Zebediah,
Yes, you'd have to iterate over the string multiple times. However, that's a fixed, small number. I wouldn't be concerned about performance until we find evidence that performance is a concern.
If you say so. I'll see what I can come up with.
I don't think you need to write it such that you switch over which weight to write, either. It's probably easier just to make them separate functions, like in my example below.
That would lead to quite a bit of code duplication, especially once we get to the characters that are more difficult to handle. I'd like to avoid that. A switch for which weight to write should be a lot simpler.
Also keep in mind that we need some temporary buffer anyway, since very small weights only get added when a bigger one follows them. Either that, or we need to backtrack and remove weights again. Because of those issues, I opted for the current approach.
Can you please give an example of this? I'm not sure I see it in any of your patches.
It's in patch 1, inside APPEND_LIST_TO_SORTKEY, see 'statement_is_ignored'. And AFAIK not only diacritics are affected. Not sure how I would best handle that when writing directly to the sortkey.
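To illustrate the deferral I mean, here is a minimal sketch. It is not the code from patch 1: the names (weight_sink, sink_add, MIN_WEIGHT) and the threshold value are made up for the example, and the fixed-size buffers are only there to keep it short. Small weights are parked in a pending buffer and only reach the output once a bigger weight follows, so trailing small weights never get written and nothing has to be removed afterwards.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_WEIGHT  0x02  /* hypothetical threshold for a "very small" weight */
#define MAX_PENDING 16    /* arbitrary limit, chosen only for this sketch */

struct weight_sink
{
    unsigned char out[64];              /* weights that are definitely part of the key */
    size_t out_len;
    unsigned char pending[MAX_PENDING]; /* small weights waiting for a bigger one */
    size_t pending_len;
};

static void sink_add(struct weight_sink *s, unsigned char weight)
{
    size_t i;

    assert(s != NULL);

    if (weight <= MIN_WEIGHT)
    {
        /* Defer: we don't know yet whether a bigger weight will follow.
         * The sketch silently drops weights if the pending buffer is full. */
        if (s->pending_len < MAX_PENDING)
            s->pending[s->pending_len++] = weight;
        return;
    }

    /* A bigger weight arrived, so the deferred ones become significant. */
    for (i = 0; i < s->pending_len && s->out_len < sizeof(s->out); i++)
        s->out[s->out_len++] = s->pending[i];
    s->pending_len = 0;

    if (s->out_len < sizeof(s->out))
        s->out[s->out_len++] = weight;
}
```

Whatever is still in pending at the end of the string is simply discarded. Doing the same while writing directly into the caller's sortkey buffer would need either this kind of side buffer or the backtracking mentioned above.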
Generally, you can continue to calculate size but only write when there's space in the buffer. When I've written functions like this I've found it to be relatively simple, though maybe there are some details here that I'm missing...
Well, there are multiple places where we'd need to check. But maybe it'll work out, it just seemed more complex to me.
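For reference, the count-always, write-if-space pattern discussed here could look roughly like the sketch below. The names (sortkey_writer, writer_put, write_toy_sortkey) are hypothetical, and the "weights" are just lowercased ASCII bytes plus an arbitrary terminator, not real collation data. The point is only that the bounds check lives in one helper, so callers never check for space themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output cursor: always counts, writes only while space remains. */
struct sortkey_writer
{
    unsigned char *buf; /* destination, may be NULL if only the size is wanted */
    size_t capacity;    /* number of bytes available in buf */
    size_t needed;      /* number of bytes the complete key requires */
};

static void writer_put(struct sortkey_writer *w, unsigned char byte)
{
    assert(w != NULL);

    /* The single place where the bounds check happens. */
    if (w->buf && w->needed < w->capacity)
        w->buf[w->needed] = byte;
    w->needed++;
}

/* Toy stand-in for a sortkey: lowercased ASCII bytes plus a terminator.
 * Returns the size the full key needs, even if dst was too small. */
static size_t write_toy_sortkey(const char *src, unsigned char *dst, size_t dst_size)
{
    struct sortkey_writer w = { dst, dst_size, 0 };

    for (; *src; src++)
    {
        unsigned char c = (unsigned char)*src;
        if (c >= 'A' && c <= 'Z')
            c = (unsigned char)(c - 'A' + 'a');
        writer_put(&w, c);
    }
    writer_put(&w, 0x01); /* terminator value chosen arbitrarily */

    return w.needed;
}
```

Calling write_toy_sortkey("Ab", NULL, 0) returns the required size without writing anything, and a second call with a large enough buffer fills it. The deferred small weights are the part that this pattern alone does not cover.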
Is there a reason to use LCMapStringEx() here rather than LCMapString()?
LCMapStringA/W completely ignore the locale, and use an LCID. We'd need to fix those functions to properly convert the locale. To start simple, I decided to use the more low-level function and, once that works, build the rest on top.
It seems a simple enough fix; just call LCIDToLocaleName().
Yeah, probably. But I'd rather get the low-level stuff working first. Then I can write tests for the higher level functions and then fix them.
Generally I believe that explicitly specifying "struct" is preferred as it makes clear what kind of object you're dealing with.
Okay then.
Regards, Fabian Maurer