This does not look right:
+int _mbsbtype(const unsigned char* mbstr, size_t count) {
+    const unsigned char* str;
+    const unsigned char* start = mbstr;
+
+    str = mbstr + count;
+
+    /** from _ismbslead */
+    if (MSVCRT___mb_cur_max > 1)
+    {
+        while (start < str) {
+            if (!*start) {
+                return _MBC_ILLEGAL;
+            }
+            start += MSVCRT_isleadbyte(*str) ? 2 : 1;
This should probably be "start += MSVCRT_isleadbyte(*start) ? 2 : 1", BUT...
+        }
+
+    }
+    if (!*str) { /** TODO: check *str validity */
+        return _MBC_ILLEGAL;
+    }
+    if (start == str && MSVCRT_isleadbyte(*str)) {
+        return _MBC_LEAD;
+    }
This is not safe, because byte values that can be lead bytes can also occur as trail bytes. Indeed, with your loop (corrected as above), it seems that "start" could never point to a trail byte, since the loop skips over trail bytes.
+    if (start == str && MSVCRT_isleadbyte(str[-1])) {
+        return _MBC_TRAIL;
+    }
+
+    return _MBC_SINGLE;
Try this:

    if (MSVCRT___mb_cur_max > 1)
    {
        while (start <= str)
        {
            if (!*start)
                return _MBC_ILLEGAL;
            if (MSVCRT_isleadbyte(*start))
            {
                if (str == start)
                    return _MBC_LEAD;
                if (!start[1])
                    return _MBC_ILLEGAL;
                if (str == start + 1)
                    return _MBC_TRAIL;
                start += 2;
            }
            else
            {
                if (str == start)
                    return _MBC_SINGLE;
                ++start;
            }
        }
        return _MBC_ILLEGAL;
    }
    return *str ? _MBC_SINGLE : _MBC_ILLEGAL;

The tests should probably cover this case; of course, a test will only work if it knows which code page is set.