Java String.split() regex for handling escaped delimiter and escaped escape characters


    String testString = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
    String[] splitedString = testString.split(pattern_string); // pattern_string: the regex I'm looking for
    for (String string : splitedString) {
        System.out.println(string);
    }

Here I have a string that encodes a list of strings, with \ as the escape character and , as the delimiter.
Note: the backslashes in the example are doubled because it is Java code.
Backslashes and commas are escaped in the original strings, and the resulting strings are joined with commas. I need a regex to split this string back into the original list of strings.
An example of such a string:

"a\,b\\,c,d\\\,e,f\\g"
I need to get these strings:

"a\,b\\" "c" "d\\\,e" "f\\g" 

So the logic of the split is simple: split on a comma if the number of backslashes directly before it is even (0, 2, 4, ...); in that case the comma is a delimiter. If the number of backslashes before the comma is odd, it is an escaped comma and no split should occur.
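The even/odd rule can be written out directly, without a regex. A minimal sketch (the class and method names `DelimiterParity`, `isDelimiter` and `delimiterIndexes` are hypothetical, not from the question): for each comma, count the run of backslashes immediately to its left and treat the comma as a delimiter only when that count is even.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterParity {

    // Returns true if the character at index i is a comma preceded by an
    // even number (0, 2, 4, ...) of consecutive backslashes.
    public static boolean isDelimiter(String s, int i) {
        if (s.charAt(i) != ',') {
            return false;
        }
        int backslashes = 0;
        // Walk left from the comma, counting the run of backslashes.
        for (int j = i - 1; j >= 0 && s.charAt(j) == '\\'; j--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    // Collects the indexes of all delimiter commas in s.
    public static List<Integer> delimiterIndexes(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (isDelimiter(s, i)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
        System.out.println(delimiterIndexes(s));
    }
}
```

For the example string this reports the commas after `a\,b\\`, `c` and `d\\\,e`. Each check walks back over the backslash run, so very long runs make it slower than a single-pass parser.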

Can anyone help me with an appropriate regex for this case?

Edit:
I know the regex (?<!\\\\), which splits the string on commas that do not have a backslash before them. In my case I need to split only when the number of backslashes before the comma is even.
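To see why the lookbehind `(?<!\\\\),` is not enough: it only checks the single character before the comma, so the comma after `b\\` (an escaped backslash followed by a real delimiter) is wrongly treated as escaped. A small demo (the `NaiveSplitDemo` and `naiveSplit` names are made up for illustration):

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // The lookbehind from the question: split on a comma only if the single
    // character before it is not a backslash. It cannot count backslashes.
    public static String[] naiveSplit(String s) {
        return s.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String s = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
        // The comma after "b\\" is preceded by a backslash, so no split happens there.
        System.out.println(Arrays.toString(naiveSplit(s)));
    }
}
```

This yields three tokens instead of four, with `a\,b\\,c` left joined.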

I'd appreciate any help.

If you have to use split, you can try something like

    split("(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),")

I used {0,1000000000} instead of * because a look-behind in Java needs an obvious maximum length, and 1000000000 seems enough unless you can have more than 1000000000 consecutive pairs of backslashes in the text.
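The same split regex can be compiled once and reused, which avoids recompiling it on every `String.split` call. A sketch assuming a hypothetical `EscapedSplit` helper class; `Pattern.split` behaves like `String.split` with the default limit, so trailing empty strings are dropped.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapedSplit {
    // Matches a comma that is NOT preceded by an odd run of backslashes.
    // An odd run = one backslash not preceded by another, followed by N pairs.
    // The pair count is bounded because Java look-behind needs a maximum length.
    private static final Pattern DELIMITER =
            Pattern.compile("(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),");

    public static String[] split(String s) {
        return DELIMITER.split(s);
    }

    public static void main(String[] args) {
        String s = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
        System.out.println(Arrays.toString(split(s)));
    }
}
```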


If it doesn't have to be split, you can use

    Matcher m = Pattern.compile("(\\G.*?(?<!\\\\)(\\\\{2})*)(,|(?<!\\G)$)",
            Pattern.DOTALL).matcher(testString);
    while (m.find()) {
        System.out.println(m.group(1));
    }

\\G means the end of the previous match or, on the first iteration of the matcher (when there is no previous match yet), the start of the string, like ^.
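Wrapped in a method that collects the tokens into a list, the `\G` approach might look like this (the `GAnchorSplit` and `tokens` names are hypothetical). The `(?<!\G)$` alternative is what ends the loop: once the whole input has been consumed, `\G` and `$` sit at the same position, so the lookbehind fails and `find()` returns false instead of producing an extra empty token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GAnchorSplit {
    // Group 1: text from the end of the previous match (\G), lazily extended
    // to a point not preceded by a backslash, plus an even run of backslashes.
    // Group 3: the delimiter comma, or the end of input that is not also \G.
    private static final Pattern TOKEN = Pattern.compile(
            "(\\G.*?(?<!\\\\)(\\\\{2})*)(,|(?<!\\G)$)", Pattern.DOTALL);

    public static List<String> tokens(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a\\,b\\\\,c,d\\\\\\,e,f\\\\g"));
    }
}
```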


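But the fastest approach, and not hard to implement, is to write your own parser that makes a single pass over the characters and uses a flag, escaped, to signal that the currently checked character is escaped by a \.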

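    import java.util.ArrayList;
    import java.util.List;

    public static List<String> parse(String text) {
        List<String> tokens = new ArrayList<>();
        // true when the previous character was an unescaped backslash,
        // i.e. the current character is escaped
        boolean escaped = false;
        StringBuilder sb = new StringBuilder();

        for (char ch : text.toCharArray()) {
            if (ch == ',' && !escaped) {
                // unescaped comma: finish the current token
                tokens.add(sb.toString());
                sb.delete(0, sb.length());
            } else {
                if (ch == '\\')
                    escaped = !escaped; // two backslashes cancel each other out
                else
                    escaped = false;
                sb.append(ch);
            }
        }

        if (sb.length() > 0) {
            tokens.add(sb.toString());
            sb.delete(0, sb.length());
        }

        return tokens;
    }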
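`parse` drops the final token only when it is empty, so `"a,b,"` yields `[a, b]` but `"a,,"` still yields `[a, ]`, whereas `String.split` with the default limit removes every trailing empty string. If empty fields matter, one option is a variant that always emits the last token, which behaves like `split(regex, -1)` on these inputs. A sketch (the `ParseKeepEmpty` and `parseKeepEmpty` names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class ParseKeepEmpty {
    // Hypothetical variant of parse(): same escape-flag logic, but the last
    // token is always added, so trailing empty fields are kept.
    public static List<String> parseKeepEmpty(String text) {
        List<String> tokens = new ArrayList<>();
        boolean escaped = false;
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (ch == ',' && !escaped) {
                tokens.add(sb.toString());
                sb.setLength(0); // reuse the builder for the next token
            } else {
                // A backslash toggles the flag; any other character clears it.
                escaped = (ch == '\\') && !escaped;
                sb.append(ch);
            }
        }
        tokens.add(sb.toString()); // always emit the final (possibly empty) token
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parseKeepEmpty("a,,"));
        System.out.println(parseKeepEmpty("a\\,b\\\\,"));
    }
}
```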

Demo of these approaches:

    String testString = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
    String[] splitedString = testString
            .split("(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),");
    for (String string : splitedString) {
        System.out.println(string);
    }

    System.out.println("-----");
    Matcher m = Pattern.compile("(\\G.*?(?<!\\\\)(\\\\{2})*)(,|(?<!\\G)$)",
            Pattern.DOTALL).matcher(testString);
    while (m.find()) {
        System.out.println(m.group(1));
    }

    System.out.println("-----");
    for (String s : parse(testString))
        System.out.println(s);

Output:

    a\,b\\
    c
    d\\\,e
    f\\g
    -----
    a\,b\\
    c
    d\\\,e
    f\\g
    -----
    a\,b\\
    c
    d\\\,e
    f\\g
