Hello Alexandre,
I don't see any language support, there's just one big sortkey table. Yes, that's what the current code is doing too, but if we are rewriting it, we should get the architecture right.
Yeah, there's no language support yet. I just noted how it's done, but for the first patch it's not implemented yet.
I mean when multiple chars map to one sortkey. The COMPRESSION sections in the Microsoft table.
Well, I haven't implemented that yet, but it can be done.
- Linguistic mappings: Not sure what you mean, sorry
NORM_LINGUISTIC_CASING and the like.
I see, same answer then.
Question: How should I prove it works? I can't possibly add all of that in the first draft.
The usual way is to add a bunch of tests with todo_wine, and then send a patch series with each patch removing the corresponding todos.
I know, but for this patchset that doesn't prove that it can be done. It would only prove that once I submit the patch for that, no? Or do you want me to submit all at once?
We only have tests for a very small number of strings, that's clearly not proper coverage. Some way of systematically generating test strings should be considered.
Like, random strings from a known seed? I intentionally didn't do that, because of performance concerns.
Not necessarily random, but some interesting data. For instance the normalization tests can run the entire test suite from unicode.org, you may be able to find something similar. Or build your own somehow.
Well, I added what I consider to be interesting. A few testcases for the bits of code I implemented, to have as complete coverage as possible. Not sure what you'd consider interesting data, or where I'd find it. According to the algorithm I implemented, I already cover the corner cases. What more is needed? I could certainly add a bunch of random strings though.
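If seeded random strings end up being one of the test inputs, a minimal sketch could look like the following. It is only an illustration: next_random and make_test_string are hypothetical helper names, not part of the patch or of the existing tests, and the LCG constants are just a common textbook choice. A hand-rolled generator is used instead of rand() so that a given seed yields the same strings on every platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a small linear congruential generator, so the
 * sequence depends only on the seed and not on the C library's rand(). */
static uint32_t next_random(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Hypothetical helper: fill buf with len UTF-16 code units and a
 * terminating 0.  NUL and surrogate code units are skipped so that every
 * generated string is well-formed.  buf must have room for len + 1 units. */
static void make_test_string(uint32_t *state, uint16_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        uint16_t c;

        do
        {
            /* Use the high bits, which are the better-distributed ones. */
            c = (uint16_t)(next_random(state) >> 16);
        } while (c == 0 || (c >= 0xD800 && c <= 0xDFFF));
        buf[i] = c;
    }
    buf[len] = 0;
}
```

Because the seed fully determines the output, a failing string can be reproduced by logging just the seed and the length.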
Also testing sort keys directly, like you did in the first try (but without depending on the exact values).
I have that planned, yes. Do you want that in the first version already?
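One way to test sort keys without depending on their exact values is to check only that comparing two keys gives the same ordering as comparing the two strings directly. The sketch below shows that idea in portable C, using strxfrm and strcoll purely as stand-ins for the sort key and comparison functions under discussion; keys_agree and check_all_pairs are hypothetical names, and the 256-byte key buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Hypothetical helper: returns 1 if the sort keys of a and b order the
 * same way as a direct comparison of a and b.  strxfrm/strcoll stand in
 * for the real sort key and comparison functions. */
static int keys_agree(const char *a, const char *b)
{
    char key_a[256], key_b[256];

    /* strxfrm returns the length it needs; treat truncation as failure. */
    if (strxfrm(key_a, a, sizeof(key_a)) >= sizeof(key_a)) return 0;
    if (strxfrm(key_b, b, sizeof(key_b)) >= sizeof(key_b)) return 0;

    return sign(strcmp(key_a, key_b)) == sign(strcoll(a, b));
}

/* Hypothetical helper: check every ordered pair and count disagreements. */
static int check_all_pairs(const char *const *strings, size_t count)
{
    size_t i, j;
    int failures = 0;

    for (i = 0; i < count; i++)
        for (j = 0; j < count; j++)
            if (!keys_agree(strings[i], strings[j]))
                failures++;

    return failures;
}
```

A check of this shape keeps passing if the key encoding changes, as long as the ordering it produces stays consistent with the comparison function.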
The tests should come before the code, or at the same time.
That's how I planned it - but not in the first version. As I said, I planned to split everything into manageable sizes. I could add all possible tests in one huge patch, but there's no real benefit to that.
In short, I think the main problem here is that I want to split the implementation into multiple patches. I planned to add functionality one by one, covered with as many tests as I needed to give the code near full coverage. Is that a bad approach? If you want proof that the functionality can be added, please tell me how.
Regards,
Fabian Maurer