Java String.split() regex for handling escaped delimiter and escaped escape characters
String testString = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
String[] splitedString = testString.split(PATTERN_STRING);
for (String string : splitedString) {
    System.out.println(string);
}
Here I have a string that encodes a list of strings, with \ as the escape character and , as the delimiter.
Note: the backslashes in the example are doubled because of Java string-literal escaping.
Backslashes and commas are escaped in the original strings, and the resulting strings are joined with commas. I need a regex to split this string back into the original list of strings.
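To make the format concrete, here is a minimal sketch of the encoding side as described above: escape \ and , in each string, then join with commas. The names EscapeEncoder, escape and encode are hypothetical, the original strings are inferred from the example string in the question, and List.of assumes Java 9+.

```java
import java.util.List;
import java.util.stream.Collectors;

final class EscapeEncoder {

    // Escapes one string: backslashes first, then commas, so the backslash
    // inserted in front of a comma is not escaped a second time.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace(",", "\\,");
    }

    // Joins the escaped strings using plain (unescaped) commas as delimiters.
    static String encode(List<String> items) {
        return items.stream()
                .map(EscapeEncoder::escape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical original strings, actual values: a,b\   c   d\,e   f\g
        List<String> original = List.of("a,b\\", "c", "d\\,e", "f\\g");
        System.out.println(encode(original)); // prints a\,b\\,c,d\\\,e,f\\g
    }
}
```

Escaping the backslashes before the commas is the important design choice: doing it in the other order would turn every escaped comma into \\, and break the parity rule.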
For example, given the string
"a\,b\\,c,d\\\,e,f\\g"
I need these strings:
"a\,b\\" "c" "d\\\,e" "f\\g"
So the splitting logic is simple: split at a comma if the number of backslashes directly before it is even (0, 2, 4, ...); in that case the comma is a delimiter. If the number of backslashes before the comma is odd, the comma is escaped and no split should occur.
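The even/odd rule can be checked directly, without a regex, by counting the backslashes immediately to the left of a comma. A minimal sketch; BackslashParity and isDelimiter are hypothetical names used only for illustration.

```java
final class BackslashParity {

    // Returns true if the comma at commaIndex is a real delimiter, i.e. it is
    // preceded by an even number (0, 2, 4, ...) of consecutive backslashes.
    static boolean isDelimiter(String s, int commaIndex) {
        int backslashes = 0;
        for (int i = commaIndex - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    public static void main(String[] args) {
        String s = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',') {
                System.out.println("comma at " + i + (isDelimiter(s, i) ? ": split" : ": escaped"));
            }
        }
    }
}
```

This is the same condition the regexes in the answer below have to express with look-behinds.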
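Can anyone help me with an appropriate regex for this case?
EDIT:
I know this regex: (?<!\\\\),
It splits the string at commas that do not have a backslash before them. In my case, though, I need to split only when the number of backslashes before the comma is even.
I'd appreciate any help.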
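To see why the look-behind from the question is not enough, here is a small sketch that runs it on the example string; NaiveSplitDemo and naiveSplit are hypothetical names.

```java
import java.util.Arrays;

final class NaiveSplitDemo {

    // (?<!\\), only inspects the single character before the comma, so any
    // comma preceded by a backslash is treated as escaped, even when that
    // backslash is itself escaped (as in "b\\,c").
    static String[] naiveSplit(String text) {
        return text.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String testString = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";
        System.out.println(Arrays.toString(naiveSplit(testString)));
        // prints [a\,b\\,c, d\\\,e, f\\g]
    }
}
```

The comma after b\\ is preceded by two backslashes, so it should be a delimiter, but it is not split; that is the case the even-count rule has to handle.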
If you have to use split, you can try something like
split("(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),")
I used {0,1000000000} instead of * because a look-behind in Java needs to have an obvious maximum length, and 1000000000 seems enough unless you can have more than 1000000000 consecutive \\ in your text.
If you don't have to use split, you can use
Matcher m = Pattern.compile("(\\G.*?(?<!\\\\)(\\\\{2})*)(,|(?<!\\G)$)", Pattern.DOTALL).matcher(testString);
while (m.find()) {
    System.out.println(m.group(1));
}
\G (written \\G in the Java string literal) means the end of the previous match or, on the first iteration of the matcher when there is no previous match yet, the start of the string (like ^).
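The effect of \G can be seen in isolation with a tiny pattern: each find() may only succeed exactly where the previous match ended, so matching stops at the first gap instead of skipping ahead. A minimal sketch; ContiguousMatchDemo and countLeadingRun are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContiguousMatchDemo {

    // Counts how many times "a" matches back to back from the start of text.
    // \G pins each match to the end of the previous one (or to the start of
    // the input on the first find()), so the loop ends at the first non-'a'.
    static int countLeadingRun(String text) {
        Matcher m = Pattern.compile("\\Ga").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLeadingRun("aab a")); // prints 2
        System.out.println(countLeadingRun("baa"));   // prints 0
    }
}
```

In the answer's pattern, the same anchoring is what makes consecutive matches cover the string token by token without skipping over any text.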
But the fastest option, and not hard to implement, is writing your own parser that uses a flag escaped to signal that the currently checked character is escaped with \.
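import java.util.ArrayList;
import java.util.List;

public static List<String> parse(String text) {
    List<String> tokens = new ArrayList<>();
    boolean escaped = false; // true if the previous character was an unescaped \
    StringBuilder sb = new StringBuilder();
    for (char ch : text.toCharArray()) {
        if (ch == ',' && !escaped) {
            // unescaped comma: finish the current token
            tokens.add(sb.toString());
            sb.setLength(0);
        } else {
            // a \ toggles the flag (so \\ cancels out); any other character clears it
            if (ch == '\\')
                escaped = !escaped;
            else
                escaped = false;
            sb.append(ch);
        }
    }
    // add the last token; a trailing empty token is not added
    if (sb.length() > 0) {
        tokens.add(sb.toString());
    }
    return tokens;
}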
Demo of the approaches:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String testString = "a\\,b\\\\,c,d\\\\\\,e,f\\\\g";

// 1. split with the bounded look-behind
String[] splitedString = testString
        .split("(?<!(?<!\\\\)\\\\(\\\\{2}){0,1000000000}),");
for (String string : splitedString) {
    System.out.println(string);
}
System.out.println("-----");

// 2. Matcher with \G
Matcher m = Pattern.compile("(\\G.*?(?<!\\\\)(\\\\{2})*)(,|(?<!\\G)$)", Pattern.DOTALL).matcher(testString);
while (m.find()) {
    System.out.println(m.group(1));
}
System.out.println("-----");

// 3. hand-written parser
for (String s : parse(testString))
    System.out.println(s);
Output:
a\,b\\
c
d\\\,e
f\\g
-----
a\,b\\
c
d\\\,e
f\\g
-----
a\,b\\
c
d\\\,e
f\\g
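The tokens above still contain their escape sequences. If you also need the original, unescaped strings, a minimal sketch follows; the Unescape class is hypothetical, it assumes \ escapes exactly the next character (as in the encoding described in the question), and List.of assumes Java 9+.

```java
import java.util.ArrayList;
import java.util.List;

final class Unescape {

    // Removes one level of escaping: "\x" becomes "x" for any character x,
    // so "\," becomes "," and "\\" becomes "\". A lone trailing \ is kept as-is.
    static String unescape(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch == '\\' && i + 1 < token.length()) {
                i++;                        // skip the escape character
                sb.append(token.charAt(i)); // keep the escaped character
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tokens as printed above, written as Java literals.
        List<String> tokens = List.of("a\\,b\\\\", "c", "d\\\\\\,e", "f\\\\g");
        List<String> original = new ArrayList<>();
        for (String t : tokens) {
            original.add(unescape(t));
        }
        System.out.println(original); // prints [a,b\, c, d\,e, f\g]
    }
}
```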