Incorrect Result When ANTLR4 Lexer Action Invokes getText()
It seems that getText() in a lexer action cannot correctly retrieve the token being matched. Is this normal behaviour? For example, part of my grammar has these rules for parsing C++-style identifiers, which support \u sequences to embed Unicode characters as part of an identifier name:
grammar cppdefine;
cppcompilationunit: (id_token | ALL_OTHER_SYMBOL)+ EOF;
id_token: IDENTIFIER //{System.out.println($text);}
;
CRLF: '\r'? '\n' -> skip;
ALL_OTHER_SYMBOL: '\\';
IDENTIFIER: (NONDIGIT (NONDIGIT | DIGIT)*) {System.out.println(getText());} ;
fragment DIGIT: [0-9];
fragment NONDIGIT: [_a-zA-Z] | UNIVERSAL_CHARACTER_NAME ;
fragment UNIVERSAL_CHARACTER_NAME: ('\\u' HEX_QUAD | '\\U' HEX_QUAD HEX_QUAD ) ;
fragment HEX_QUAD: [0-9A-Fa-f] [0-9A-Fa-f] [0-9A-Fa-f] [0-9A-Fa-f];
I tested it with a one-line input containing an identifier with an invalid Unicode escape sequence:
dkk\uzzzz
The $text in the id_token parser rule action produces the correct result:
dkk uzzzz
That is, the input is interpreted as two identifiers separated by the symbol '\' (the symbol '\' is not printed by the parser rule).
However, getText() in the IDENTIFIER lexer rule action produces an incorrect result:
dkk\u uzzzz
Why is the getText() of the lexer rule IDENTIFIER different from the $text of the parser rule id_token? After all, the parser rule contains only the lexer rule.
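For reference, the expected split (two identifiers separated by '\') can be checked outside ANTLR. The following is only a rough sketch: it approximates the IDENTIFIER and ALL_OTHER_SYMBOL rules with java.util.regex instead of using the generated lexer, so it says nothing about how any ANTLR version runs lexer actions. The class name IdentifierSplitSketch and the method tokenize are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: approximates the grammar's
// IDENTIFIER and ALL_OTHER_SYMBOL rules with regular expressions.
class IdentifierSplitSketch {
    // UNIVERSAL_CHARACTER_NAME: backslash + 'u' + 4 hex digits,
    // or backslash + 'U' + 8 hex digits
    private static final String UCN =
        "\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}";
    // NONDIGIT: [_a-zA-Z] or a universal character name
    private static final String NONDIGIT = "(?:[_a-zA-Z]|" + UCN + ")";
    // IDENTIFIER: NONDIGIT (NONDIGIT | DIGIT)*
    private static final String IDENTIFIER =
        NONDIGIT + "(?:" + NONDIGIT + "|[0-9])*";
    // Try IDENTIFIER first, then a lone backslash (ALL_OTHER_SYMBOL)
    private static final Pattern TOKEN =
        Pattern.compile(IDENTIFIER + "|\\\\");

    // Splits the input into tokens, matching one token at a time
    // starting at the current position.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("No token at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Input from the question: the escape is followed by non-hex characters
        for (String token : tokenize("dkk\\uzzzz")) {
            System.out.println(token);
        }
    }
}
```

Under these assumptions, the input from the question splits into three tokens: dkk, a lone '\', and uzzzz. This matches the split reported by the parser rule.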
Edit:
The issue was observed in ANTLR 4.1 but not in ANTLR 4.2, so it may have been fixed already.
It's hard to tell based on your example, but my instinct is that you are using an old version of ANTLR. I am unable to reproduce this issue in ANTLR 4.2.