Incorrect Result When ANTLR4 Lexer Action Invokes getText()


It seems that getText() in a lexer action cannot correctly retrieve the token being matched. Is this normal behaviour? For example, part of my grammar has these rules for parsing a C++-style identifier that supports a \u sequence to embed Unicode characters as part of the identifier name:

grammar cppdefine;

cppcompilationunit: (id_token | ALL_OTHER_SYMBOL)+ EOF;

id_token: IDENTIFIER //{System.out.println($text);}
    ;

CRLF: '\r'? '\n' -> skip;

ALL_OTHER_SYMBOL: '\\';

IDENTIFIER: (NONDIGIT (NONDIGIT | DIGIT)*)
    {System.out.println(getText());}
    ;

fragment DIGIT: [0-9];

fragment NONDIGIT: [_a-zA-Z]
    | UNIVERSAL_CHARACTER_NAME
    ;

fragment UNIVERSAL_CHARACTER_NAME: ('\\u' HEX_QUAD
    | '\\U' HEX_QUAD HEX_QUAD )
    ;

fragment HEX_QUAD: [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F];

I tested it with a one-line input containing an identifier with an incorrect Unicode escape sequence:

dkk\uzzzz 

The $text of the id_token parser rule action produces the correct result:

dkk
uzzzz

That is, the input is interpreted as two identifiers separated by the symbol '\' (the symbol '\' is not printed by the parser rule).

However, getText() in the IDENTIFIER lexer rule action produces an incorrect result:

dkk\u
uzzzz

Why is the getText() of the lexer rule IDENTIFIER different from the $text of the parser rule id_token? After all, doesn't the parser rule contain only that lexer rule?

Edit:

The issue was observed in ANTLR 4.1 but not in ANTLR 4.2, so it may have been fixed already.

It's hard to tell based on your example, but my instinct is that you are using an old version of ANTLR. I was unable to reproduce this issue in ANTLR 4.2.
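For reference, the tokenization that the question calls correct (two identifiers separated by '\') can be reproduced with a small hand-written longest-match scanner. This is only a sketch in plain Java, not ANTLR code and not how ANTLR works internally: the class and method names (IdentifierScanSketch, tokenize, ucnLength, nondigitLength) are hypothetical. It also assumes the second escape form is an uppercase \U followed by eight hex digits, as in C++.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the IDENTIFIER and ALL_OTHER_SYMBOL rules; not generated by ANTLR.
final class IdentifierScanSketch {

    // Mirrors one character of the HEX_QUAD fragment.
    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Length of a universal character name at pos (backslash, 'u' and 4 hex digits,
    // or backslash, 'U' and 8 hex digits), or 0 if there is none.
    static int ucnLength(String s, int pos) {
        if (pos + 1 >= s.length() || s.charAt(pos) != '\\') return 0;
        char kind = s.charAt(pos + 1);
        int digits = kind == 'u' ? 4 : kind == 'U' ? 8 : 0;
        if (digits == 0 || pos + 2 + digits > s.length()) return 0;
        for (int i = 0; i < digits; i++) {
            if (!isHex(s.charAt(pos + 2 + i))) return 0;
        }
        return 2 + digits;
    }

    // Length of a NONDIGIT at pos, or 0 if there is none.
    static int nondigitLength(String s, int pos) {
        if (pos >= s.length()) return 0;
        char c = s.charAt(pos);
        if (c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 1;
        return ucnLength(s, pos);
    }

    // Splits the input into identifier tokens and single backslash tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            int first = nondigitLength(input, pos);
            if (first > 0) {
                pos += first;
                while (pos < input.length()) {
                    char c = input.charAt(pos);
                    int step = (c >= '0' && c <= '9') ? 1 : nondigitLength(input, pos);
                    if (step == 0) break; // identifier ends at the last position that matched
                    pos += step;
                }
                tokens.add(input.substring(start, pos));
            } else if (input.charAt(pos) == '\\') {
                tokens.add("\\"); // ALL_OTHER_SYMBOL
                pos++;
            } else {
                pos++; // anything else (for example a line break) is skipped in this sketch
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal below is the 9-character input from the question.
        System.out.println(tokenize("dkk\\uzzzz")); // prints [dkk, \, uzzzz]
    }
}
```

The identifier text is cut at the last position where the rule still matched, so the failed attempt to read an escape after dkk does not leak into the token. That is the text one would expect getText() to report for the first identifier.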

