On Fri May 1 16:36:51 2026 +0000, Aric Stewart wrote:
Ok that case is very interesting. It is the `BENGALI VOWEL SIGN AU ৌ U+09CC`. It decomposes to ` ৌ <U+09C7, U+09D7>`. There is nice information here: https://util.unicode.org/UnicodeJsps/character.jsp?a=09CC

Looking at uniscribe, we are properly decomposing:

```
01a0:trace:uniscribe:ScriptGetCMap (0000000009010074,00000000000389C0,L"\09cc\09cc\09cc\09cc\09cc",5,0x0,000000000082F750)
...
01a0:trace:uniscribe:ContextualShape_Bengali New composed string L"\09c7\09d7\09c7\09d7\09c7\09d7\09c7\09d7\09c7\09d7" (10)
01a0:trace:uniscribe:debug_output_string MmMpMmMpMmMpMmMpMmMp
```

That is a string of pre-Matras and post-Matras. What we are doing now is identifying each Mm and Mp as an incomplete syllable, and my new code is putting in the invalid character mark, resulting in the string attached as an image. macOS's textpad does the same thing, actually.

However, native uniscribe and Chrome both group all the Mm together at the beginning, with a single invalid character mark, and then all the Mp at the end. So there is quite a bit of re-ordering happening there that is not happening now. I think this is out of the scope of this bug and should be included in a new bug.

hmm, it looks like harfbuzz also groups all Mm at the beginning and all Mp at the end with the dotted circle in the middle.
```
hb-view /usr/share/fonts/TiroIndigo-otf/TiroBangla-Regular.otf "ৌৌৌ" --output-format=png --output-file=test.png
```

This can also be viewed using `hb-shape`:

```
hb-shape /usr/share/fonts/TiroIndigo-otf/TiroBangla-Regular.otf "ৌৌৌ"
[bSignE.init=0+396|bSignE=0+405|bSignE=0+405|BASE=0+724|bAuMark=0+247|bAuMark=0+247|bAuMark.fina=0+247]
```

Where bSignE = Mm, bAuMark = Mp.

So this does not seem like a bug. The bug I was talking about was when U+09CC is preceded by a space, a dotted circle was inserted at the start of the glyph. Are you aware of this?

Looking closely at the second line, I can see that the glyph is decomposed here, evident by the fact that Mm has a top line.
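To make the grouping in the `hb-shape` output concrete, here is a toy Python model of it. It is only a sketch of the observed ordering (all Mm, one dotted circle, all Mp); it is not taken from Wine, Uniscribe, or HarfBuzz, and the function names `split_classes` and `group_orphan_matras` are hypothetical.

```python
DOTTED_CIRCLE = "\u25cc"  # U+25CC, the usual placeholder base for orphaned marks


def split_classes(debug_string):
    """Split a debug string such as 'MmMpMmMp' into two-letter class codes."""
    return [debug_string[i:i + 2] for i in range(0, len(debug_string), 2)]


def group_orphan_matras(classes):
    """Toy model: all pre-matras (Mm), one dotted circle, then all post-matras (Mp).

    Assumes the run contains only base-less Mm and Mp entries, as in the trace.
    """
    unexpected = [c for c in classes if c not in ("Mm", "Mp")]
    if unexpected:
        raise ValueError(f"unexpected classes: {unexpected}")
    pre = [c for c in classes if c == "Mm"]
    post = [c for c in classes if c == "Mp"]
    return pre + [DOTTED_CIRCLE] + post


if __name__ == "__main__":
    # Class string for five decomposed U+09CC, as in the uniscribe trace.
    print(group_orphan_matras(split_classes("MmMpMmMpMmMpMmMpMmMp")))
```

For three U+09CC this gives three Mm, one dotted circle, and three Mp, which matches the shape of the `hb-shape` glyph list if `BASE` is the dotted circle glyph.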
I would also be curious about the behavior of U+09cc in other situations. Is it being properly shaped in a string where it is being used correctly? I do not know Bengali so I am not sure how to find it.
This string `মৌমাছি` appears to be `u+09ae u+09cc u+09ae u+09be u+099b u+09bf` and a quick visual inspection seems to show it being shaped correctly.
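The code points and the decomposition discussed in this thread can be double-checked with Python's standard `unicodedata` module. This is just a quick verification sketch; the helper name `describe` is made up for illustration, and the word is written with escapes so the script does not depend on the terminal's font.

```python
import unicodedata


def describe(text):
    """Return (code point label, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


AU = "\u09cc"  # BENGALI VOWEL SIGN AU

# Canonical decomposition mapping recorded in the Unicode Character Database.
print(unicodedata.decomposition(AU))

# NFD applies that mapping, giving the two-part (pre + post) form.
decomposed = unicodedata.normalize("NFD", AU)
for label, name in describe(decomposed):
    print(label, name)

# The word quoted in the thread, spelled out by code point.
word = "\u09ae\u09cc\u09ae\u09be\u099b\u09bf"
for label, name in describe(word):
    print(label, name)
```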
Yes, I can see that it is being shaped properly.

--
https://gitlab.winehq.org/wine/wine/-/merge_requests/10704#note_138556