https://bugs.winehq.org/show_bug.cgi?id=29685
Damjan Jovanovic <damjan.jov@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |damjan.jov@gmail.com

--- Comment #6 from Damjan Jovanovic <damjan.jov@gmail.com> ---
That xs:pattern snippet can be used to reproduce this bug with the command-line "xmllint" tool. Here's a quickly cobbled together example.
x.xml:
---snip---
<?xml version="1.0"?>
<note xmlns="https://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="file:///tmp/x.xsd">
  <strTableRef>$(string.a)</strTableRef>
</note>
---snip---
/tmp/x.xsd: (uncommented line broken, commented line below it working)
---snip---
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="https://www.w3schools.com"
xmlns="https://www.w3schools.com"
elementFormDefault="qualified">

<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="strTableRef">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))"/>
            <!-- <xs:pattern value="($\([Ss]tring\..*\))|($\([Mm][Cc]\..*\))"/> -->
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

</xs:schema>
---snip---
And to test:
---snip---
$ xmllint --schema /tmp/x.xsd x.xml --noout
regexp error : failed to compile: Wrong escape sequence, misuse of character '\'
regexp error : failed to compile: internal: no atom generated
regexp error : failed to compile: generate transition: atom == NULL
regexp error : failed to compile: xmlFAParseAtom: expecting ')'
regexp error : failed to compile: xmlFAParseRegExp: extra characters
x.xsd:13: element pattern: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))' of the facet 'pattern' is not a valid regular expression.
WXS schema x.xsd failed to compile
---snip---
Swap the commented and uncommented lines in /tmp/x.xsd around, and:
---snip---
$ xmllint --schema /tmp/x.xsd x.xml --noout
x.xml validates
---snip---
Note the problem: MSXML is apparently OK with "\$" in the regex, but to libxml2 that's an error: it never allows "$" after a "\".
As per Appendix F of https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#regexs it seems that "\$" really shouldn't be allowed, but MSXML allows it anyway; i.e. MSXML uses a non-conforming regex dialect where additional escaped characters are allowed.
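
To make the Appendix F point concrete, here is a rough Python sketch that lists the backslash escapes in a pattern that the XML Schema 1.0 grammar does not define. The helper name invalid_escapes is made up for illustration, and the check only looks at the character following each backslash; it does not validate the rest of the regex (e.g. the braces after \p).

```python
# Characters allowed after a backslash in XML Schema 1.0 regexes (Appendix F):
# SingleCharEsc, MultiCharEsc, and the \p / \P category escapes.
SINGLE_CHAR_ESC = set("nrt\\|.?*+(){}-[]^")
MULTI_CHAR_ESC = set("sSiIcCdDwW")
CATEGORY_ESC = set("pP")
ALLOWED = SINGLE_CHAR_ESC | MULTI_CHAR_ESC | CATEGORY_ESC


def invalid_escapes(pattern):
    """Hypothetical helper: return each escape that Appendix F does not define."""
    found = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            nxt = pattern[i + 1:i + 2]  # empty string if the backslash is last
            if nxt not in ALLOWED:
                found.append("\\" + nxt)
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return found


broken = r"(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))"
working = r"($\([Ss]tring\..*\))|($\([Mm][Cc]\..*\))"
print(invalid_escapes(broken))
print(invalid_escapes(working))
```

For the two patterns from /tmp/x.xsd, only the "\$" escapes in the broken one are reported; the working pattern has none.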
Bug #43581 has the same issue with various "\u####" regex sequences and could be considered a duplicate of this one.
While for this we could make a regex parser that rewrites MSXML's dialect into libxml2's, regex in XML sadly doesn't end at schema validation, e.g. XSLT 2 uses it as well (https://www.xml.com/pub/a/2003/06/04/tr.html). If MSXML uses the same regex dialect for other things, and we can't change that regex in transit between its origin and libxml2, then we may need to ship a private fork of libxml2 patched to use MSXML's dialect internally.
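
As a minimal sketch of the rewriting idea (in Python for brevity, not Wine's actual code), the function below turns "\$" into a bare "$" before the pattern would be handed to libxml2. The name rewrite_msxml_pattern is hypothetical, and it handles only the "\$" case from this bug; the "\u####" sequences from bug #43581 and anything else MSXML may accept are not covered. It relies on "$" being an ordinary character in the XML Schema regex dialect, so dropping the backslash should not change what the pattern matches.

```python
def rewrite_msxml_pattern(pattern):
    """Hypothetical rewriter: replace the MSXML-only escape \\$ with a plain $."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Consume escapes pairwise so that an escaped backslash followed
            # by "$" (i.e. \\$) is left untouched.
            out.append("$" if nxt == "$" else ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


msxml = r"(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))"
print(rewrite_msxml_pattern(msxml))
```

Applied to the broken pattern from /tmp/x.xsd, this yields the commented-out pattern that xmllint accepts.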