On Thu Apr 30 17:16:16 2026 +0000, Aric Stewart wrote:
OK, I loaded a string on my Windows machines and see that if I repeat the above string twice I get 'ৌৌ' -> 5 glyphs: 2 leading, 1 invalid glyph mark, and 2 trailing. That shaping may be far more complicated than what is addressed in this bug, but I will investigate.

OK, that case is very interesting. It is the `BENGALI VOWEL SIGN AU ৌ U+09CC`. It decomposes to ` ৌ <U+09C7, U+09D7>`. There is nice information here: https://util.unicode.org/UnicodeJsps/character.jsp?a=09CC
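For anyone who wants to double-check that decomposition outside of Wine, here is a minimal Python sketch using the standard `unicodedata` module. It only shows the canonical (NFD) decomposition from the Unicode data, which is an assumption about what is relevant here; it is not how uniscribe does it internally, and the variable names are purely illustrative.

```python
import unicodedata

# BENGALI VOWEL SIGN AU, the character discussed above.
AU = "\u09cc"

# Raw canonical decomposition mapping from the Unicode character database.
mapping = unicodedata.decomposition(AU)
print(mapping)  # 09C7 09D7

# NFD applies that mapping, giving the two-part vowel sign.
decomposed = unicodedata.normalize("NFD", AU)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+09C7', 'U+09D7']

# Five copies of U+09CC therefore become ten code points once decomposed.
repeated = unicodedata.normalize("NFD", AU * 5)
print(len(repeated))  # 10
```

The count of 10 is the same length that shows up for a five-character input once every U+09CC has been split in two.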
Looking at uniscribe, we are decomposing properly:

```
01a0:trace:uniscribe:ScriptGetCMap (0000000009010074,00000000000389C0,L"\09cc\09cc\09cc\09cc\09cc",5,0x0,000000000082F750)
...
01a0:trace:uniscribe:ContextualShape_Bengali New composed string L"\09c7\09d7\09c7\09d7\09c7\09d7\09c7\09d7\09c7\09d7" (10)
01a0:trace:uniscribe:debug_output_string MmMpMmMpMmMpMmMpMmMp
```

That is a string of alternating pre-matras (Mm) and post-matras (Mp). Right now we identify each Mm and Mp as an incomplete syllable, and my new code inserts the invalid character mark, resulting in the string shown in the attached image. macOS's textpad actually does the same thing. However, native Uniscribe and Chrome both group all the Mm together at the beginning with a single invalid character mark, and then put all the Mp at the end. So there is quite a bit of reordering happening there that is not happening in our code now. I think this is out of the scope of this bug and should be tracked in a new bug.

I would also be curious about the behavior of U+09CC in other situations. Is it being properly shaped in strings where it is used correctly? I do not know Bengali, so I am not sure how to find one. This string `মৌমাছি` appears to be `U+09AE U+09CC U+09AE U+09BE U+099B U+09BF`, and a quick visual inspection seems to show it being shaped correctly.

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/10704#note_138548