This is part XXIX of the rewrite of the cmd engine.
It's time to tackle the lexer... So far, I have left it mostly untouched, as it's a &#:!_%% piece of code. It maintains (or rather tries to maintain) a state machine with a dozen boolean variables, which quickly makes the code unreadable... not to mention changing it...
[ Exercise for the reader: think of the boolean variables as a set of   ]
[ binary digits, forming the base two representation of a state number. ]
[ Rewrite the code using a single state number in order to get rid of   ]
[ all the boolean variables.                                            ]
[ -- Good luck. --                                                      ]
There are a few known bugs in the lexer: some in Bugzilla, some reported to me directly (thanks Hans), and others found in local testing.
This is the first of three MRs for the lexer rewrite.
Basically, it's done by:
- reusing the already parsed token stack to recover the lexer's state,
- reducing leaves directly (tokens whose end condition can be determined from their first character(s)) instead of handling every character in the state machine,
- factoring common code (e.g. end of line was handled in two different places, with, needless to say, "slight" differences between the two).
The good news: after the third MR, the lexer's LoC is reduced by 30%, and most of the Bugzilla entries related to cmd's lexer are fixed. The bad news: expect new bug reports.