Incorrect Result When ANTLR4 Lexer Action Invokes getText()

May 15, 2011

It seems that getText() in a lexer action cannot correctly retrieve the text of the token being matched. Is this normal behaviour? For example, part of my grammar has these rules for parsing a C++-style identifier, with support for \u sequences that embed Unicode characters in the identifier name:

grammar CPPDefine;

cppCompilationUnit: (id_token|ALL_OTHER_SYMBOL)+ EOF;

id_token: IDENTIFIER //{System.out.println($text);}
    ;

CRLF: '\r'? '\n' -> skip;

ALL_OTHER_SYMBOL: '\\';

IDENTIFIER: (NONDIGIT (NONDIGIT | DIGIT)*)
            {System.out.println(getText());}
    ;

fragment DIGIT: [0-9];

fragment NONDIGIT: [_a-zA-Z]
    | UNIVERSAL_CHARACTER_NAME
    ;

fragment UNIVERSAL_CHARACTER_NAME: ('\\u' HEX_QUAD | '\\U' HEX_QUAD HEX_QUAD);

fragment HEX_QUAD: [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F];

I tested it with a one-line input containing an identifier with an incorrect Unicode escape sequence:

dkk\uzzzz

The $text in the action of the id_token parser rule produces the correct result:

dkk uzzzz

i.e. the input is interpreted as two identifiers separated by the symbol '\' (the '\' itself is not printed, because the parser rule only prints identifiers).
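Why dkk and uzzzz are the expected tokens can be checked by hand. The Java sketch below is a hypothetical stand-in, not ANTLR's generated lexer: it approximates the grammar's lexer rules with java.util.regex patterns (the class name CppDefineTokenizerSketch and the NAME:text output format are made up for illustration) and applies the selection rule ANTLR 4 lexers follow, namely that the longest match wins and a tie goes to the rule defined first. Because zzzz is not a HEX_QUAD, the \u cannot continue the identifier, so IDENTIFIER stops at dkk, the backslash is matched by ALL_OTHER_SYMBOL, and uzzzz becomes a second IDENTIFIER.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CppDefineTokenizerSketch {

    // Regex stand-ins for the grammar's fragment rules (a hand translation).
    private static final String HEX_QUAD = "[0-9a-fA-F]{4}";
    // Backslash + lowercase u + one HEX_QUAD, or backslash + uppercase U + two.
    private static final String UNIVERSAL_CHARACTER_NAME =
            "\\\\u" + HEX_QUAD + "|\\\\U" + HEX_QUAD + HEX_QUAD;
    private static final String NONDIGIT = "[_a-zA-Z]|" + UNIVERSAL_CHARACTER_NAME;
    private static final String DIGIT = "[0-9]";

    // Lexer rules in grammar order: CRLF, ALL_OTHER_SYMBOL, IDENTIFIER.
    private static final String[] NAMES = {"CRLF", "ALL_OTHER_SYMBOL", "IDENTIFIER"};
    private static final boolean[] SKIP = {true, false, false}; // CRLF -> skip
    private static final Pattern[] RULES = {
            Pattern.compile("\\r?\\n"),
            Pattern.compile("\\\\"), // a single backslash character
            Pattern.compile("(?:" + NONDIGIT + ")(?:" + NONDIGIT + "|" + DIGIT + ")*")
    };

    /** Longest match wins; on a tie the rule listed first keeps the match. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestRule = -1;
            int bestEnd = pos;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input);
                m.region(pos, input.length());
                // Strictly longer only, so an equal-length later rule never wins.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = i;
                    bestEnd = m.end();
                }
            }
            if (bestRule < 0) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!SKIP[bestRule]) {
                tokens.add(NAMES[bestRule] + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The 9-character input from the question (the Java literal doubles the backslash).
        System.out.println(tokenize("dkk\\uzzzz"));
        // An identifier with a well-formed backslash-u escape, for contrast.
        System.out.println(tokenize("caf\\u00e9"));
    }
}
```

Running main prints [IDENTIFIER:dkk, ALL_OTHER_SYMBOL:\, IDENTIFIER:uzzzz], which agrees with the $text output of id_token, followed by [IDENTIFIER:caf\u00e9], where the valid escape keeps the whole input as one identifier.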

However, getText() in the action of the IDENTIFIER lexer rule produces an incorrect result:

dkk\u uzzzz

Why is the getText() of the IDENTIFIER lexer rule different from the $text of the id_token parser rule? After all, the parser rule contains nothing but that lexer rule.

Edit:

The issue is observed in ANTLR 4.1 but not in ANTLR 4.2, so it appears to have been fixed already.

It's hard to tell based on your example, but my instinct is that you are using an old version of ANTLR. I was unable to reproduce the issue in ANTLR 4.2.
