Re: [PATCH v3 1/2] kernelbase/locale: Implement comparison on top of official unicode weight tables

4 Mar 2020


      Hello Alexandre,
...
Multi-language support, Japanese, Korean, multi-char sequences,
surrogates, linguistic mappings, etc.
There are a million things that need to be supported for proper
sorting. You don't have to implement them all, but it should be clear
from your approach that they can be added. Which in practice means you
need to at least prototype most of them.
Well, they can be added, it's just that I left them out for the initial
versions...
Short breakdown:
- Multi-language: The character is looked up in the current language's table;
as a fallback the default table is used. Currently, only the default is implemented.
- Japanese: Main reason why I did all of this. Special case, but supported by
the tables.
- Korean: Handled under Jamo. Special case, but supported by the tables.
Currently not properly implemented by me because it's a lot of work.
- Multi-char sequences: You mean when a single codepoint is encoded as more
than one WCHAR? That is supported; Windows seems to treat each WCHAR separately.
- Surrogates: Windows seems to treat each WCHAR on its own.
- Linguistic mappings: Not sure what you mean, sorry
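To make the lookup order from the breakdown above concrete, here is a minimal
sketch in plain C. It is not the code from the patch: the struct layout, the
names (weight_entry, weight_table, get_weight, collect_weights) and the choice
of weight 0 for unknown code units are all assumptions for illustration, and
uint16_t stands in for WCHAR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; the real weight tables carry more fields. */
struct weight_entry { uint16_t ch; uint32_t weight; };
struct weight_table { const struct weight_entry *entries; size_t count; };

/* Linear search for clarity. Returns 1 and stores the weight on a hit. */
static int table_lookup(const struct weight_table *t, uint16_t ch, uint32_t *weight)
{
    size_t i;
    if (!t) return 0;
    for (i = 0; i < t->count; i++)
    {
        if (t->entries[i].ch == ch)
        {
            *weight = t->entries[i].weight;
            return 1;
        }
    }
    return 0;
}

/* Language-specific table first, default table as the fallback. */
static uint32_t get_weight(const struct weight_table *lang,
                           const struct weight_table *def, uint16_t ch)
{
    uint32_t w;
    if (table_lookup(lang, ch, &w)) return w;
    if (table_lookup(def, ch, &w)) return w;
    return 0; /* assumption: unknown code units get weight 0 */
}

/* Each 16-bit unit is weighted on its own, so a surrogate pair
 * simply produces two independent weights. Returns the count written. */
static size_t collect_weights(const struct weight_table *lang,
                              const struct weight_table *def,
                              const uint16_t *str, size_t len, uint32_t *out)
{
    size_t i;
    for (i = 0; i < len; i++) out[i] = get_weight(lang, def, str[i]);
    return len;
}
```

Passing NULL for the language table gives the "default only" behaviour that is
currently implemented; adding a language later would only mean supplying a
second table.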
Question: How should I prove it works? I can't possibly add all of that in the
first draft.
...
For instance you do 10 memory allocations before even starting to
compare anything. That's clearly not cheap.
I understand. But for a dynamic sized sortkey I need to have dynamic buffers.
Maybe I could put the initial buffers on the stack?
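Roughly what I have in mind with stack buffers, as a sketch only: the buffer
starts out in a fixed array embedded in the struct and moves to the heap only
when a key outgrows it. The names (key_buffer and friends) and the initial size
of 64 are made up for this example, and malloc/realloc stand in for whatever
allocator the real code would use.

```c
#include <stdlib.h>
#include <string.h>

#define INITIAL_CAPACITY 64 /* assumption: tune to typical key lengths */

struct key_buffer
{
    unsigned char *data;   /* points at 'initial' until the first growth */
    size_t len, capacity;
    unsigned char initial[INITIAL_CAPACITY]; /* on the stack if the struct is */
};

static void key_buffer_init(struct key_buffer *b)
{
    b->data = b->initial;
    b->len = 0;
    b->capacity = INITIAL_CAPACITY;
}

/* Appends one byte, doubling the capacity when full. Returns 0 on failure. */
static int key_buffer_append(struct key_buffer *b, unsigned char byte)
{
    if (b->len == b->capacity)
    {
        size_t new_cap = b->capacity * 2;
        unsigned char *p;
        if (b->data == b->initial)
        {
            /* First growth: move the contents from the stack to the heap. */
            p = malloc(new_cap);
            if (!p) return 0;
            memcpy(p, b->initial, b->len);
        }
        else
        {
            p = realloc(b->data, new_cap);
            if (!p) return 0;
        }
        b->data = p;
        b->capacity = new_cap;
    }
    b->data[b->len++] = byte;
    return 1;
}

/* Frees the heap block, if one was ever allocated. */
static void key_buffer_free(struct key_buffer *b)
{
    if (b->data != b->initial) free(b->data);
}
```

For short strings this never touches the allocator. The struct must not be
copied by value while in use, since data may point into the struct itself.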
...
We only have tests for a very small number of strings, that's clearly
not proper coverage. Some way of systematically generating test strings
should be considered.
Like, random strings from a known seed? I intentionally didn't do that,
because of performance concerns.
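If seeded random strings are what you mean, a generator could look something
like the sketch below. It uses its own small xorshift32 generator instead of
rand(), so the same seed produces the same strings on every platform. The
function names are invented for this example, uint16_t stands in for WCHAR, and
the code unit range is an assumption (everything except the terminator).

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32 step; the state must be nonzero and never becomes zero. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills buf with len code units in [0x0001, 0xFFFF] plus a terminator.
 * buf must have room for len + 1 elements. */
static void fill_test_string(uint32_t *state, uint16_t *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = (uint16_t)(next_random(state) % 0xFFFF + 1);
    buf[len] = 0;
}
```

The runtime cost can be bounded by fixing the number and length of generated
strings, and a failing case can be reproduced from the seed alone.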
...
Also testing sort keys directly, like you did in
the first try (but without depending on the exact values).
I have that planned, yes. Do you want that in the first version already?
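One way to test sort keys without depending on their exact values, sketched
under my own assumptions: only check that the byte-wise order of two keys
agrees with the sign of the string comparison. In a real test the keys would
come from LCMapStringW with LCMAP_SORTKEY and the comparison result from
CompareStringW minus 2; the helper names below are hypothetical and the code
works on plain byte arrays so it stands alone.

```c
#include <stddef.h>
#include <string.h>

/* Reduces any integer to -1, 0 or 1. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Byte-wise key comparison; on a common prefix the shorter key sorts first. */
static int compare_keys(const unsigned char *k1, size_t l1,
                        const unsigned char *k2, size_t l2)
{
    size_t n = l1 < l2 ? l1 : l2;
    int r = memcmp(k1, k2, n);
    if (r) return sign_of(r);
    return (l1 > l2) - (l1 < l2);
}

/* Returns 1 if the key order matches the sign of cmp_result
 * (negative, zero or positive), without looking at key contents. */
static int keys_agree(int cmp_result,
                      const unsigned char *k1, size_t l1,
                      const unsigned char *k2, size_t l2)
{
    return sign_of(cmp_result) == compare_keys(k1, l1, k2, l2);
}
```

A check like this keeps passing if the table version changes the key bytes, as
long as the ordering stays consistent with the comparison.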
...
When there are differences between Windows versions we want to use the
latest, since that's the one that will continue to work in the
future. In this case it means using the most recent table.
Okay then. If that's important, I can change the table.
...
Note that we most likely want to use a Windows-compatible NLS file, like
we are now using for codepage or normalization tables. I can work on
that part.
I have to admit, I don't know what you mean by that. I don't know about NLS
files.
Regards,
Fabian Maurer
