Fabian Maurer <dark.shadow4@web.de> writes:
> Hello Alexandre,
> thanks for your reply.
>> It's going to need more work
> No problem, if you tell me what needs improvement.
>> it's not clear how you are going to implement the remaining features with
>> your approach
> What features exactly are you referring to?
Multi-language support, Japanese, Korean, multi-char sequences, surrogates, linguistic mappings, etc.
There are a million things that need to be supported for proper sorting. You don't have to implement them all, but your approach should make it clear that they can be added. In practice, that means you need to at least prototype most of them.
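As a minimal Python sketch of one structure that leaves room for such features (not Wine's code, and not how the Windows tables are laid out), consider a collation element iterator. It first joins UTF-16 surrogate pairs into code points. Then, at each position, it looks up the longest matching multi-char sequence in a table, where a single entry may also expand to several weights. The table `TOY_TABLE`, its weights, and the helper names `utf16_units`, `code_points` and `collation_elements` are all hypothetical.

```python
# HYPOTHETICAL toy table, NOT real Windows data: maps a tuple of code points
# to the list of (primary, secondary, tertiary) weights it produces.
TOY_TABLE = {
    (0x61,): [(10, 2, 2)],              # "a"
    (0x63,): [(12, 2, 2)],              # "c"
    (0x65,): [(15, 2, 2)],              # "e"
    (0x68,): [(18, 2, 2)],              # "h"
    (0x63, 0x68): [(13, 2, 2)],         # "ch" sorted as one unit (multi-char sequence)
    (0xE6,): [(10, 2, 4), (15, 2, 4)],  # "æ" expands to two elements
    (0x1F600,): [(100, 2, 2)],          # outside the BMP: a surrogate pair in UTF-16
}
LONGEST = max(len(seq) for seq in TOY_TABLE)


def utf16_units(s):
    """Encode a Python str as a list of UTF-16 code units, like a WCHAR buffer."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_points(units):
    """Join surrogate pairs; an unpaired surrogate is passed through unchanged."""
    i = 0
    while i < len(units):
        hi = units[i]
        if 0xD800 <= hi <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            yield 0x10000 + ((hi - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield hi
            i += 1


def collation_elements(units):
    """Yield weights, trying the longest table entry first at each position."""
    cps = list(code_points(units))
    i = 0
    while i < len(cps):
        for n in range(min(LONGEST, len(cps) - i), 0, -1):
            seq = tuple(cps[i:i + n])
            if seq in TOY_TABLE:
                yield from TOY_TABLE[seq]
                i += n
                break
        else:
            # HYPOTHETICAL fallback for code points missing from the table
            yield (0x20000 + cps[i], 2, 2)
            i += 1
```

Because every feature plugs into the "next collation element" step, new cases (more contractions, expansions, per-locale tables) become table entries rather than special cases in the comparison loop. Real collation has further complications that this toy ignores.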
>> It also looks very inefficient
> Why inefficient? We can't just compare character by character, since the sort key values don't always align like that. But then again, most strings should take the "early exit", which is a lot cheaper.
For instance, you do 10 memory allocations before you even start comparing anything. That's clearly not cheap.
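A minimal Python sketch of both points. It assumes a hypothetical toy table `TOY_WEIGHTS` of (primary, secondary, tertiary) weights rather than real Windows data. `sort_key` builds a level-by-level key. This shows why a position-by-position character comparison doesn't line up: "æ" yields two elements, and all primary weights are compared before any secondary or tertiary ones. `compare_lazy` gives the same result while pulling weights on demand. It therefore never builds full keys and usually stops at the first differing primary weight. All names here are illustrative.

```python
from itertools import zip_longest

# HYPOTHETICAL toy weights (primary, secondary, tertiary), not real Windows data.
TOY_WEIGHTS = {
    "a": [(10, 2, 2)],
    "A": [(10, 2, 18)],                  # same primary as "a", differs only in case
    "e": [(15, 2, 2)],
    "\u00e6": [(10, 2, 4), (15, 2, 4)],  # "æ": one character, two elements
}
LEVELS = 3


def elements(s):
    """Yield the collation elements of s one at a time."""
    for ch in s:
        yield from TOY_WEIGHTS[ch]


def level_weights(s, level):
    """Yield only the weights of one level, on demand."""
    for e in elements(s):
        yield e[level]


def sort_key(s):
    """Materialized key: all primaries, then secondaries, then tertiaries, each ended by 0."""
    elems = list(elements(s))
    key = []
    for level in range(LEVELS):
        key.extend(e[level] for e in elems)
        key.append(0)  # level separator, lower than any real weight
    return key


def compare_keys(a, b):
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


def compare_lazy(a, b):
    """Same result as compare_keys, without building any key buffer.

    A missing weight at the end of the shorter string counts as 0, which plays
    the role of the level separator in the materialized key.
    """
    for level in range(LEVELS):
        pairs = zip_longest(level_weights(a, level), level_weights(b, level), fillvalue=0)
        for x, y in pairs:
            if x != y:
                return -1 if x < y else 1  # most pairs are decided at level 0
    return 0
```

With these toy weights, "A" sorts before "ae" even though "A" has the larger case weight, because the primary level is compared across the whole string first. In C, each string's iterator state (current position plus any pending expansion elements) could live in local variables, so the common case would not need heap allocations. The cost is re-scanning the input for each later level, which only happens when the earlier levels tie.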
>> I'd suggest concentrating on the tests first.
> Well, the tests should already cover what's necessary. What do you think is missing?
We only have tests for a very small number of strings; that's clearly not proper coverage. You should consider some way of systematically generating test strings, and also testing sort keys directly, like you did in the first try (but without depending on the exact values).
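A Python sketch of one possible shape for such a harness. It generates every string up to a given length over a small alphabet of deliberately awkward characters. It then checks properties that don't depend on exact sort key values:

* the sign of the comparison must match the ordering of the two sort keys;
* comparison must be antisymmetric.

Since sort keys are totally ordered, agreeing with them also implies transitivity. `stand_in_compare` and `stand_in_sort_key` are hypothetical placeholders that only exist so the sketch runs. A real test would wrap `CompareStringW` and `LCMapStringW` with `LCMAP_SORTKEY` instead.

```python
from itertools import product

# Deliberately awkward characters: a case pair, precomposed vs. decomposed
# accent, punctuation, and a character outside the BMP (surrogate pair).
ALPHABET = ["a", "A", "\u00e9", "e\u0301", "-", "\U0001F600"]


def sign(x):
    return (x > 0) - (x < 0)


def generate_strings(alphabet, max_len):
    """Yield every string of length 0..max_len over the alphabet."""
    for n in range(max_len + 1):
        for combo in product(alphabet, repeat=n):
            yield "".join(combo)


def check_consistency(strings, compare, sort_key):
    """Return (a, b, problem) tuples; an empty list means every check passed.

    Only relative orderings are checked, never exact key values.
    """
    strings = list(strings)
    keys = {s: sort_key(s) for s in strings}
    failures = []
    for a in strings:
        for b in strings:
            c = sign(compare(a, b))
            ka, kb = keys[a], keys[b]
            if c != (ka > kb) - (ka < kb):
                failures.append((a, b, "compare disagrees with sort key order"))
            if c != -sign(compare(b, a)):
                failures.append((a, b, "compare is not antisymmetric"))
    return failures


# HYPOTHETICAL stand-ins so the sketch runs; a real test would call
# CompareStringW and LCMapStringW(..., LCMAP_SORTKEY, ...) instead.
def stand_in_compare(a, b):
    ka, kb = (a.casefold(), a), (b.casefold(), b)
    return (ka > kb) - (ka < kb)


def stand_in_sort_key(s):
    # Weights are shifted by 1 so that 0 can act as a level separator.
    return [ord(c) + 1 for c in s.casefold()] + [0] + [ord(c) + 1 for c in s]
```

The number of generated strings grows as `len(ALPHABET) ** max_len`, so the alphabet has to stay small for exhaustive generation.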
>> Also, as already mentioned, you should be working on the latest (Win10)
>> table, not the one from Win7.
> I might have missed that; where was that? As I understood from your last mail, the code needs to be tested on all Windows versions and give the correct results there, especially on Win10. But why do we care which table is used, as long as the results are the same on all Windows versions? The table version only matters where Windows versions differ. The Win7 table was easier to handle, so I picked that one.
When there are differences between Windows versions, we want to follow the latest one, since that's the behavior that will continue to work in the future. In this case, that means using the most recent table.
Note that we most likely want to use a Windows-compatible NLS file, like we are now doing for the codepage and normalization tables. I can work on that part.