Is there a Java function which parses escaped characters? - java

I'm looking for a built-in Java function which, for example, can convert "\\n" into "\n".
Something like this:
assert parseFunc("\\n").equals("\n");
Or do I have to manually search-and-replace all the escaped characters?

You can use StringEscapeUtils.unescapeJava(s) from Apache Commons Lang. It works for all escape sequences, including Unicode escapes (e.g. \u1234).
https://commons.apache.org/lang/apidocs/org/apache/commons/lang3/StringEscapeUtils.html#unescapeJava-java.lang.String-

Anthony is 99% right -- since backslash is also a reserved character in regular expressions, it needs to be escaped a second time:
result = myString.replaceAll("\\\\n", "\n");

Just use the string's own replaceAll method.
result = myString.replaceAll("\\n", "\n");
However, if you want to match all escape sequences, then you could use a Matcher. See http://www.regular-expressions.info/java.html for a very basic example of using Matcher.
// The regex \\(.) matches a literal backslash followed by any one character
Pattern p = Pattern.compile("\\\\(.)");
Matcher m = p.matcher("This is tab \\t and \\n this is on a new line");
StringBuffer sb = new StringBuffer();
while (m.find()) {
    String s = m.group(1);
    // Compare string contents with equals(), not ==
    if (s.equals("n")) { s = "\n"; }
    else if (s.equals("t")) { s = "\t"; }
    // quoteReplacement keeps '\' and '$' in s from being treated specially
    m.appendReplacement(sb, Matcher.quoteReplacement(s));
}
m.appendTail(sb);
System.out.println(sb.toString());
You just need to make the assignment to s more sophisticated depending on the number and type of escapes you want to handle. (Warning: this is air code; I'm not a Java developer.)
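For comparison, the same idea can be written as a single pass over the string without a regex. This is only a minimal sketch: the names Unescaper and unescape are made up for illustration, and it assumes that only \n, \t, \r, \b, \f and \uXXXX need special handling, with any other escaped character (such as \\ or \") mapped to itself.

```java
final class Unescaper {
    private Unescaper() {}

    /** Hypothetical helper: turns backslash escapes in the input into the characters they denote. */
    public static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c != '\\' || i + 1 >= in.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = in.charAt(++i); // the character right after the backslash
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u': {
                    // Assumption: backslash-u must be followed by exactly four hex digits
                    int code = 0;
                    boolean ok = i + 4 < in.length();
                    for (int k = 1; ok && k <= 4; k++) {
                        int d = Character.digit(in.charAt(i + k), 16);
                        if (d < 0) { ok = false; } else { code = code * 16 + d; }
                    }
                    if (ok) { out.append((char) code); i += 4; }
                    else { out.append("\\u"); } // malformed escape: keep it unchanged
                    break;
                }
                default:
                    out.append(next); // \\, \", \' and unknown escapes map to the character itself
            }
        }
        return out.toString();
    }
}
```

With this sketch, Unescaper.unescape("\\n") returns a one-character string containing a newline. Octal escapes are deliberately left out to keep the example short.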

If you don't want to list all possible escaped characters, you can delegate this to the behaviour of Properties:
String escapedText="This is tab \\t and \\rthis is on a new line";
Properties prop = new Properties();
prop.load(new StringReader("x=" + escapedText + "\n"));
String decoded = prop.getProperty("x");
System.out.println(decoded);
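This handles the escapes that the Properties format understands (\t, \n, \r, \f and \uXXXX). Note that it does not recognize \b or octal escapes, and a backslash before any other character is silently dropped.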

Related

Unescaped java not matching in regex matcher.find()

I have the following code that basically matches "Match this:" and keeps the first sentence. However, Unicode characters sometimes get passed into the text, and they cause backtracking in other, more complicated regexes. Escaping seems to alleviate the backtracking index-out-of-range exceptions. However, now the regex isn't matching.
What I would like to know is why this regex isn't matching when escaped. If you comment out the escape/unescape Java lines, everything works.
String text = "Keep this\n\n"
+ "Match this:\n\nDelete 📱 this";
text = org.apache.commons.lang.StringEscapeUtils.escapeJava(text);
Pattern PATTERN = Pattern.compile("^Match this:$",
Pattern.MULTILINE);
Matcher m = PATTERN.matcher(text);
if (m.find()) {
text = text.substring(0, m.start()).replaceAll("[\\n]+$", "");
}
text = org.apache.commons.lang.StringEscapeUtils.unescapeJava(text);
System.out.println(text);
What I would like to know is why this regex isn't matching when escaped?
When you escape a string like "foo\nbar", which when printed looks like
foo
bar
you get "foo\\nbar", which when printed looks like
foo\nbar
This happens because StringEscapeUtils.escapeJava also escapes \n, replacing it with \\n. It is then no longer a line separator but two literal characters, so ^ and $ cannot match next to it.
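The effect can be reproduced with the standard library alone. The sketch below uses a made-up helper (MultilineDemo.hasMatchLine) and compares a string containing real line breaks with one where each line break has been turned into the two characters \ and n, which is what escapeJava produces.

```java
import java.util.regex.Pattern;

final class MultilineDemo {
    private MultilineDemo() {}

    // Same pattern as in the question: the whole line must be "Match this:"
    private static final Pattern LINE = Pattern.compile("^Match this:$", Pattern.MULTILINE);

    /** Hypothetical helper: true if some line of the text is exactly "Match this:". */
    public static boolean hasMatchLine(String text) {
        return LINE.matcher(text).find();
    }
}
```

Calling hasMatchLine("Keep this\n\nMatch this:\n\nDelete") finds the line, while hasMatchLine("Keep this\\n\\nMatch this:\\n\\nDelete") does not, because the second string is a single line as far as ^ and $ are concerned.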
A possible solution is to replace "\\n" back with "\n" after calling StringEscapeUtils.escapeJava. You need to be careful here not to "unescape" a real "\\n" (a literal backslash followed by n in the original text), which escapeJava turns into "\\\\n" and which prints as \\n. So maybe use
text = org.apache.commons.lang3.StringEscapeUtils.escapeJava(text);
text = text.replaceAll("(?<!\\\\)\\\\n", "\n"); // unescape `\n`
// if it is not preceded by `\`
// do your job
// and now you can unescape your text (\n will stay \n)
text = org.apache.commons.lang3.StringEscapeUtils.unescapeJava(text);
Another option is to create your own implementation similar to StringEscapeUtils.escapeJava. If you take a look at the body of this method, you will see
return ESCAPE_JAVA.translate(input);
Where ESCAPE_JAVA is
CharSequenceTranslator ESCAPE_JAVA =
new LookupTranslator(
new String[][] {
{"\"", "\\\""},
{"\\", "\\\\"},
}).with(
new LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_ESCAPE())
).with(
UnicodeEscaper.outsideOf(32, 0x7f)
);
and EntityArrays.JAVA_CTRL_CHARS_ESCAPE() returns a clone of the
String[][] JAVA_CTRL_CHARS_ESCAPE = {
{"\b", "\\b"},
{"\n", "\\n"},
{"\t", "\\t"},
{"\f", "\\f"},
{"\r", "\\r"}
};
array. So if you provide your own table here which explicitly says that \n should be left as it is (that is, replaced with itself), your code will leave it alone.
This is what your own implementation could look like:
private static CharSequenceTranslator translatorIgnoringLineSeparators =
new LookupTranslator(
new String[][] {
{ "\"", "\\\"" },
{ "\\", "\\\\" },
}).with(
new LookupTranslator(new String[][] {
{ "\b", "\\b" },
{ "\n", "\n" },//this will handle `\n` and will not change it
{ "\r", "\r" },//this will handle `\r` and will not change it
{ "\t", "\\t" },
{ "\f", "\\f" },
})).with(UnicodeEscaper.outsideOf(32, 0x7f));
public static String myJavaEscaper(CharSequence input) {
return translatorIgnoringLineSeparators.translate(input);
}
This method will prevent escaping \r and \n.

Replace non-ascii character by ascii code using java regex

I have string like this T 8.ESTƜTESTą¤¤ą„ą¤® ą¤®ą„‡ą¤°ą„€. Now using java regex i want to replace non-ascii character Ɯ, ą¤¤ą„ą¤® ą¤®ą„‡ą¤°ą„€ with its equivalent code.
How can i achieve this?
I can replace it with any other string.
String str = "T 8.ESTƜTESTą¤¤ą„ą¤® ą¤®ą„‡ą¤°ą„€";
String resultString = str.replaceAll("[^\\p{ASCII}]", "");
System.out.println(resultString);
It prints T 8.ESTTEST
Sorry, I don't know how to do this using a single regex, please check if this works for you
String str = "T 8.ESTƜTESTą¤¤ą„ą¤® ą¤®ą„‡ą¤°ą„€";
StringBuffer sb = new StringBuffer();
for(int i=0;i<str.length();i++){
if (String.valueOf(str.charAt(i)).matches("[^\\p{ASCII}]")){
sb.append("[CODE #").append((int)str.charAt(i)).append("]");
}else{
sb.append(str.charAt(i));
}
}
System.out.println(sb.toString());
prints
T 8.EST[CODE #220]TEST[CODE #2340][CODE #2369][CODE #2350] [CODE #2350][CODE #2375][CODE #2352][CODE #2368]
The problem seems to be how to tell the regex to convert what it finds into the code.
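One way to let the regex drive the conversion is Matcher.appendReplacement, which allows the replacement to be computed for each match. The following is a sketch under that assumption; the class and method names (NonAsciiCodes, replaceWithCodes) are hypothetical, and the [CODE #n] format is simply copied from the loop above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NonAsciiCodes {
    private NonAsciiCodes() {}

    // Matches any single character outside the ASCII range
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    /** Hypothetical helper: replaces every non-ASCII character with "[CODE #n]". */
    public static String replaceWithCodes(String input) {
        Matcher m = NON_ASCII.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Numeric value of the matched character
            int codePoint = m.group().codePointAt(0);
            // quoteReplacement guards against '$' and '\' in the replacement text
            m.appendReplacement(sb, Matcher.quoteReplacement("[CODE #" + codePoint + "]"));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For the ASCII part of the input nothing changes, and Ü (U+00DC) becomes [CODE #220], as in the loop-based version.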

Java replaceAll() method to escape special characters

I am using Java's replaceAll() method to escape newline characters:
String comment = "ddnfa \n \r \tdnfadsf ' \r t ";
comment = comment.replaceAll("(\\n|\\r|\\t)","\\\\$1");
System.out.println(comment);
But the above code still inserts a new line.
Is there a way to output the comment exactly as written (i.e. with \n and \r shown instead of actual line breaks)?
UPDATE:
I ended up using:
comment = comment.replaceAll("\\n","\\\\n")
.replaceAll("\\r","\\\\r")
.replaceAll("\\t","\\\\t");
You'll have to go one-by-one, since the new-line character U+000A has nothing to do with the two-character escape sequence \n:
comment = comment.replaceAll("\n","\\\\n");
comment = comment.replaceAll("\r","\\\\r");
comment = comment.replaceAll("\t","\\\\t");
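Since no regex features are needed here, the same one-by-one replacement can also be sketched with String.replace, which treats both arguments as literal text, so only the Java string-literal level of escaping is involved. The class and method names below are made up for illustration.

```java
final class ControlCharEscaper {
    private ControlCharEscaper() {}

    /** Hypothetical helper: makes newline, carriage return and tab visible as \n, \r and \t. */
    public static String escape(String s) {
        // String.replace(CharSequence, CharSequence) does no regex processing,
        // so "\\n" is simply the two characters backslash and n
        return s.replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }
}
```

Note that this sketch does not escape existing backslashes, so the result cannot always be converted back unambiguously.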
You will have to do it character by character:
comment = comment.replaceAll("\n","\\\\n");
comment = comment.replaceAll("\r","\\\\r");
comment = comment.replaceAll("\t","\\\\t");
Another solution is to escape the String as a Java string using this function:
comment = org.apache.commons.lang.StringEscapeUtils.escapeJava(comment);
This will make the String look exactly like the string literal in Java code, but it will also escape other characters (such as \\ and \").
But maybe that's exactly what you want.
Hard way: using Matcher
String comment = "ddnfa \n \r \tdnfadsf ' \r t ";
Map<String,String> sub = new HashMap<String,String>();
sub.put("\n", "\\\\n");
sub.put("\r", "\\\\r");
sub.put("\t", "\\\\t");
StringBuffer result = new StringBuffer();
Pattern regex = Pattern.compile("\\n|\\r|\\t");
Matcher matcher = regex.matcher(comment);
while (matcher.find()) {
matcher.appendReplacement(result, sub.get(matcher.group()));
}
matcher.appendTail(result);
System.out.println(result.toString());
prints
ddnfa \n \r \tdnfadsf ' \r t
Why don't you use Matcher.quoteReplacement(stringToBeReplaced)?
It is a \ problem, simplify like this :
comment = comment.replaceAll("(\n|\r|\t)", "");
output :
ddnfa dnfadsf ' t
Try this..
comment.replaceAll("(\n)|(\r)|(\t)", "\n");

Regex composition

I want to parse a line from a CSV(comma separated) file, something like this:
Bosh,Mark,mark#gmail.com,"3, Institute","83, 1, 2",1,21
I have to parse the file and replace the commas between the quotation marks with ';', like this:
Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
I use the following Java code but it doesn't parse it well:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
the output is:
Bosh,Mark,mark#gmail.com,"3; Institute";"83; 1; 2",1,21
Does anyone have any idea how to fix this?
This is my solution for replacing , inside quotes with ;. It assumes that if " appears in a quoted string, then it is escaped by another ". This property ensures that, counting from the start of the line to the current character, if the number of quotes " is odd, then that character is inside a quoted string.
// Test string, with the tricky case """", which resolves to
// a length 1 string of single quote "
String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
// System.out.println(matcher.group() + "\n " + matcher.start() + " " + matcher.end());
output
.append(line.substring(start, matcher.start())) // Append unrelated contents
.append(matcher.group().replaceAll(",", ";")); // Append replaced string
start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents
// System.out.println(output);
Although I cannot find any case in which replacing the matched group as you did in line = line.replace(matcher.group(), replacedMatch); would fail, I feel safer rebuilding the string from scratch.
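The odd/even quote count described above can also be applied directly, without a regex, by toggling a flag at every quote character. This is a sketch under the same assumption (a " inside a quoted value is escaped by doubling it); the names QuotedCommaReplacer and replaceCommasInQuotes are hypothetical.

```java
final class QuotedCommaReplacer {
    private QuotedCommaReplacer() {}

    /** Hypothetical helper: replaces ',' with ';' only inside double-quoted sections. */
    public static String replaceCommasInQuotes(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false; // true while an odd number of quotes has been seen
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes;
            }
            out.append(insideQuotes && c == ',' ? ';' : c);
        }
        return out.toString();
    }
}
```

A doubled quote toggles the flag twice, so it leaves the inside/outside state unchanged, which is why the """" case needs no special handling.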
Here's a way:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
String in = "Bosh,Mark,mark#gmail.com,\"3, \"\" Institute\",\"83, 1, 2\",1,21";
String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
Matcher matcher = Pattern.compile(regex).matcher(in);
StringBuilder out = new StringBuilder();
while(matcher.find()) {
out.append(matcher.group().replace(',', ';')).append(',');
}
out.deleteCharAt(out.length() - 1);
System.out.println(in + "\n" + out);
}
}
which will print:
Bosh,Mark,mark#gmail.com,"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,mark#gmail.com,"3; "" Institute","83; 1; 2",1,21
Tested on Ideone: http://ideone.com/fCgh7
Here is what you need:
String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
while(matcher.find()){
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
line will then have the value you need.
Have you tried to make the RegExp lazy?
Another idea: inside the [] you should use a " too. If you do that, you should have the expected output with global flag set.
Your regex is faulty. Why would you want to make sure there are no ] within the "..." expression? You'd rather make the regex reluctant (the default is greedy, which means it matches as much as it can).
"(\"[^\\]]*\")"
should be
"(\"[^\"]*\")"
But nhadtdh is right: you should use a proper CSV library to parse it and replace , with ; in the values the parser returns.
I'm sure you'll find a parser when googling "Java CSV parser".
Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Of course, this is assuming you can't have quotes in the quoted values of your input line.

Escape special characters in java

I have a text file that uses | (pipe) as the separator. If a column I am reading itself contains |, then an extra column is created when the line is split.
Example :
name|date|age
zzz|20-03-22|23
"xx|zz"|23-23-33|32
How can I escape the character within the double quotes ""?
How can I escape the regular expression used in the split, so that it works for user-specified delimiters?
I have tried:
String[] cols = line.split("\\|");
System.out.println("lets see column only=="+cols[1]);
How can I escape the character within the double quotes ""
Here's one approach:
String str = "\"xx|zz\"|23-23-33|32";
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find())
m.appendReplacement(sb, m.group().replace("|", "\\\\|"));
m.appendTail(sb);
System.out.println(sb); // prints "xx\|zz"|23-23-33|32
In order to get the columns back you'd do something like this:
String str = "\"xx\\|zz\"|23-23-33|32";
String[] cols = str.split("(?<!\\\\)\\|");
for (String col : cols)
System.out.println(col.replace("\\|", "|"));
Regarding your edit:
how to escape the regular expression used in the split, so that it works for user-specified delimiters
You should use Pattern.quote on the string you want to split on:
String[] cols = line.split(Pattern.quote(delimiter));
This will ensure that the split works as intended even if delimiter contains special regex-symbols such as . or |.
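As a small self-contained illustration of that advice, the sketch below wraps the call in a helper; DelimiterSplit and splitOn are made-up names, and the delimiter is assumed to be a non-empty string supplied by the user.

```java
import java.util.regex.Pattern;

final class DelimiterSplit {
    private DelimiterSplit() {}

    /** Hypothetical helper: splits on the delimiter taken literally, not as a regex. */
    public static String[] splitOn(String line, String delimiter) {
        // Pattern.quote turns the delimiter into a regex that matches it character for character
        return line.split(Pattern.quote(delimiter));
    }
}
```

For example, splitOn("zzz|20-03-22|23", "|") yields the three columns zzz, 20-03-22 and 23, and splitOn("a.b.c", ".") yields a, b and c, even though | and . are regex metacharacters. This does not by itself deal with delimiters inside quoted values.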
You can use a CSV parser like OpenCSV or Commons CSV:
http://opencsv.sourceforge.net
http://commons.apache.org/sandbox/csv
You can replace it with its unicode sequence (prior to delimiting with pipe)
But what you should do is adjust your parser to take that into account, rather than changing the files.
Here is one way to parse it
String str = "zzz|20-03-22|23 \"xx|zz\"|23-23-33|32";
String regex = "(?<=^|\\|)(([^\"]*?)|([^\"]+\"[^\"]+\".*?))(?=\\||$)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group());
}
Output:
zzz
20-03-22
23 "xx|zz"
23-23-33
32
