This does not look right:
+int _mbsbtype(const unsigned char* mbstr, size_t count) {
- const unsigned char* str;
- const unsigned char* start = mbstr;
- str = mbstr + count;
- /** from _ismbslead */
- if (MSVCRT___mb_cur_max > 1)
- {
- while (start < str) {
if (!*start) {
- return _MBC_ILLEGAL;
}
start += MSVCRT_isleadbyte(*str) ? 2 : 1;
This should probably be "start += MSVCRT_isleadbyte(*start) ? 2 : 1", BUT...
- }
- }
- if (!*str) { /** TODO: check *str validity */
- return _MBC_ILLEGAL;
- }
- if (start == str && MSVCRT_isleadbyte(*str)) {
- return _MBC_LEAD;
- }
This is not safe, because a value that is valid as a lead byte can also occur as a trail byte. Indeed, with your loop (corrected as above), "start" can never point at a trail byte, since the loop always skips over them.
- if (start == str && MSVCRT_isleadbyte(str[-1])) {
- return _MBC_TRAIL;
- }
- return _MBC_SINGLE;
Try this:
    if (MSVCRT___mb_cur_max > 1) {
        /* "<=" rather than "<", so that the byte at str itself gets classified */
        while (start <= str) {
            if (!*start) {
                return _MBC_ILLEGAL;
            }
            if (MSVCRT_isleadbyte(*start)) {
                if (str == start)
                    return _MBC_LEAD;
                else if (str == start + 1)
                    return _MBC_TRAIL;
                start += 2;
            } else {
                if (str == start)
                    return _MBC_SINGLE;
                ++start;
            }
        }
    }
    return _MBC_ILLEGAL;
The tests should probably cover this case. Of course, such a test only works if it knows which code page is set.
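
To make the test idea concrete, here is a rough standalone sketch, not the actual Wine code. It assumes the CP932 (Shift-JIS) lead-byte ranges 0x81-0x9F and 0xE0-0xFC and hard-codes them in a hypothetical is_lead_cp932(), so the result does not depend on whichever code page happens to be set. The names sketch_mbsbtype, run_checks and the MBC_* constants are made up for the sketch. The loop runs while start <= str so that the byte at the requested offset is classified, and it has one extra guard against a lead byte directly followed by the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for _MBC_SINGLE etc., for this sketch only. */
enum { MBC_ILLEGAL = -1, MBC_SINGLE = 0, MBC_LEAD = 1, MBC_TRAIL = 2 };

/* Assumption: CP932 lead-byte ranges, hard-coded instead of asking the
 * current code page. */
static int is_lead_cp932(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Classify the byte at offset count by walking from the start of the
 * string, so that a trail byte is never mistaken for a lead byte. */
static int sketch_mbsbtype(const unsigned char *mbstr, size_t count)
{
    const unsigned char *start = mbstr;
    const unsigned char *str = mbstr + count;

    while (start <= str) {
        if (!*start)
            return MBC_ILLEGAL;          /* hit the terminator first */
        if (is_lead_cp932(*start)) {
            if (str == start)
                return MBC_LEAD;
            if (!start[1])
                return MBC_ILLEGAL;      /* lead byte cut off by NUL */
            if (str == start + 1)
                return MBC_TRAIL;
            start += 2;                  /* skip the whole character */
        } else {
            if (str == start)
                return MBC_SINGLE;
            ++start;
        }
    }
    return MBC_ILLEGAL;
}

/* 0x81 is in the lead-byte range but is used as a trail byte at offset 1. */
static void run_checks(void)
{
    static const unsigned char s[] = { 0x81, 0x81, 'a', 0x82, 0x40, 0 };

    assert(sketch_mbsbtype(s, 0) == MBC_LEAD);
    assert(sketch_mbsbtype(s, 1) == MBC_TRAIL);
    assert(sketch_mbsbtype(s, 2) == MBC_SINGLE);
    assert(sketch_mbsbtype(s, 3) == MBC_LEAD);
    assert(sketch_mbsbtype(s, 4) == MBC_TRAIL);
    assert(sketch_mbsbtype(s, 5) == MBC_ILLEGAL);
}
```

The check at offset 1 is the interesting one: a version that only looks at isleadbyte(*str) would report a lead byte there, while walking from the start of the string reports a trail byte.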