This is part XXIX of the rewrite of the cmd engine.
It's time to tackle the lexer...
So far, I've left it mainly untouched, as it's a &#:!\_%% piece of code.
It maintains (or actually tries to maintain) a state machine with a dozen
boolean variables, which quickly makes the code unreadable,
not to mention hard to change...
[ Exercise for the reader: think of the boolean variables as a set of ]
[ binary digits, forming the base-two representation of the state number. ]
[ Rewrite the code using a single state number in order to get rid of ]
[ all the boolean variables. ]
[ -- Good luck. -- ]
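
For readers not familiar with the trick mentioned in the exercise, here is a minimal sketch of packing booleans into one state number. The flag names are hypothetical and only illustrate the idea; they are not the variables actually used in cmd's lexer.

```c
#include <stdbool.h>

/* Hypothetical flags, one bit each; the real lexer's variables differ. */
enum lexer_flag
{
    LEX_IN_QUOTES       = 1u << 0,
    LEX_ACCEPT_COMMAND  = 1u << 1,
    LEX_LAST_WAS_ESCAPE = 1u << 2,
};

/* Combine the booleans into a single state number (its base-two digits). */
static unsigned pack_state(bool in_quotes, bool accept_command, bool last_was_escape)
{
    return (in_quotes ? LEX_IN_QUOTES : 0u)
         | (accept_command ? LEX_ACCEPT_COMMAND : 0u)
         | (last_was_escape ? LEX_LAST_WAS_ESCAPE : 0u);
}

/* Test a single bit of the state number. */
static bool state_has(unsigned state, enum lexer_flag flag)
{
    return (state & flag) != 0;
}
```

With a dozen flags this gives up to 4096 possible state numbers, which is probably a fair picture of why the boolean-based code is so hard to follow.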
There are a small number of known bugs in the lexer (some in bugzilla, some
I got from direct reports -- thanks Hans --, others from local testing).
This is the first MR (out of 3) of that lexer rewrite.
Basically, it's done by:
- reusing the already parsed token stack to get back to the lexer's state,
- reducing leaves directly (tokens whose end condition can be worked out
from their first character(s)) instead of handling every character in the
state machine,
- factoring out common code (e.g. end of line was handled in two different
places; needless to say, there were "slight" differences between the two).
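
To illustrate what "reducing leaves directly" can look like, here is a simplified sketch, not the code from this MR. The helper name `scan_leaf` is hypothetical, and the quoting rule is an assumption made for the example: real cmd quoting and escaping are more subtle.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the leaf token starting at s, or 0 if s does not
 * start one. The first character alone tells us how to find the end, so
 * no per-character state is needed. */
static size_t scan_leaf(const char *s)
{
    switch (s[0])
    {
    case '&':
    case '|':
        /* "&&" and "||" are two characters long, "&" and "|" are one. */
        return s[1] == s[0] ? 2 : 1;
    case '"':
    {
        /* Quoted run: ends at the closing quote, or at end of input. */
        const char *end = strchr(s + 1, '"');
        return end ? (size_t)(end - s) + 1 : strlen(s);
    }
    default:
        return 0;
    }
}
```

For instance, `scan_leaf("&& echo")` consumes the two characters of `&&` in one step, where a per-character state machine would need an intermediate "seen one ampersand" state.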
The good news: the lexer's LoC (after the third MR) is reduced by 30%, and
the rewrite fixes most of the bugzilla entries related to cmd's lexer.
The bad news: wait for bug reports.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/7742