How to escape escape characters in Java

I have a large file that contains \' that I need to find. I've tried variations of the following but it's not working:
do{
line = TextFileIO.readLine(bufferedReader);
if(line != null){
TextFileIO.writeLine(bufferedWriter,line);
for (int i = 0; i < line.length() - 1; i++){
if(line.substring(i,i+1).equals("\\\'"))System.out.println("we found it " + line);
}
}
}while (line != null);

No need to escape the single quote!
Single quotes don't need escaping because all Java strings are delimited by double quotes. Single quotes delimit character literals. So in a character literal, you need to escape single quotes, e.g. '\''.
So all you need is "\\'", escaping only the backslash.
substring(i,i+1) cannot produce a two character string. If you are trying to get 2-character strings, you need to call with (i,i+2).
Also, your for loop can be replaced by a call to contains.
if(line.contains("\\'"))System.out.println("we found it " + line);
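As a quick sanity check of the points above, here is a minimal sketch. The class name, method name, and sample strings are made up for illustration. It shows that the literal "\\'" is two characters long and that contains finds it:

```java
class BackslashQuoteDemo {
    // The Java literal "\\'" denotes two characters: a backslash and an apostrophe.
    static final String NEEDLE = "\\'";

    // Returns true if the line contains a backslash immediately followed by an apostrophe.
    static boolean hasEscapedQuote(String line) {
        return line.contains(NEEDLE);
    }

    public static void main(String[] args) {
        System.out.println(NEEDLE.length());                  // length of the search string
        System.out.println(hasEscapedQuote("it\\'s here"));   // input contains \'
        System.out.println(hasEscapedQuote("it's not here")); // plain apostrophe only
    }
}
```

Only the backslash is escaped in the literal; the apostrophe is written as-is.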

To represent a single backslash followed by an apostrophe, you can use
"\\'"
But there is no way substring(i,i+1) can be equal to a two-character string.
Perhaps you mean
if (line.substring(i, i+2).equals("\\'")) ...

line.substring(i,i+1) only contains one character, and the for loop can be replaced by line.indexOf("\\'") >= 0:
if (line.indexOf("\\'") >= 0) {
System.out.println("we found it " + line);
}

\\ is an escaped \ in Java, so I think your match string should be "\\'".
P.S. I'm not exactly sure what you are trying to achieve here, but there appear to be more elegant, more "Java-like" ways to do it than what you have here...

Related

How to split a string by a string in Java, considering escaped ones

I have code that needs to split incoming strings on the character ';'. However, some of the semicolons might be escaped with \.
What I do now is iterate over the string looking for ;, skipping any occurrence whose previous character is a \, and then replace every escaped \; in the results with ;.
This all seems a bit cumbersome to me. Is there a better way to do this?
public void parse(String line, Player player) {
if (line.contains(";")) { // check split sign
String subString;
int previousIndex = 0; // location of the first letter
String search = "\\;";
String replace = ";";
// let's search for semicolons
int index = line.indexOf(';');
while (index >= 0) {
// check that the previous character is not a \ (which would mean this ; is escaped)
if (index == 0 || line.charAt(index - 1) != '\\') {
// get a substring for the current segment:
subString = line.substring(previousIndex, index);
if (subString.contains("/Command/")) { // Check if line is an actual command line
// replace escaped semicolons and execute command
// (replace, not replaceAll: as a regex, "\\;" would match a bare ';')
parseCommand(subString.replace(search, replace), player);
} else if (subString.contains("/Output/")) {
parseOutput(subString.replace(search, replace), player);
} else {
Main.logDebugInfo(Level.WARNING, "Command parsing: No command or output tag found!");
}
previousIndex = index + 1; // the next segment starts after this ;
}
index = line.indexOf(';', index + 1); // next letter
}
} else {
Main.logDebugInfo(Level.WARNING, "Command parsing: No ; found.");
}
}
An alternative that comes to mind would be to first replace all \; with a very specific substring (e.g. "%%€€"), then split into a list by ;, and finally substitute ; back for the placeholder. There is a tiny risk that this causes issues if the placeholder ever appears in the real data.
I am wondering if there is some standard routine/best practice to deal with escaped characters?
split takes a regex as its parameter, so you can use a negative lookbehind:
String[] split = foo.split("(?<!\\\\);");
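Yes, that's four backslashes in a row: the regex needs \\ to match one literal backslash, and each of those backslashes has to be escaped again inside the Java string literal.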
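To make the lookbehind approach concrete, here is a small sketch that splits and then unescapes each piece. The class name, method name, and sample input are assumptions for illustration, not part of the original code. It deliberately does not handle an escaped backslash directly before a semicolon:

```java
import java.util.ArrayList;
import java.util.List;

class EscapedSplitDemo {
    // Splits on ';' unless it is directly preceded by a backslash,
    // then turns each escaped "\;" back into a plain ';'.
    static List<String> splitUnescaped(String line) {
        List<String> parts = new ArrayList<>();
        // The regex is (?<!\\); which means: a ';' not preceded by a backslash.
        for (String part : line.split("(?<!\\\\);")) {
            // String.replace works on literal text, so no regex escaping is needed here.
            parts.add(part.replace("\\;", ";"));
        }
        return parts;
    }

    public static void main(String[] args) {
        // The Java literal below is the text a;b\;c;d
        System.out.println(splitUnescaped("a;b\\;c;d")); // prints [a, b;c, d]
    }
}
```

Note that split drops trailing empty strings by default; passing a negative limit as the second argument keeps them if that matters for your input.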
I am wondering if there is some standard routine/best practice to deal with escaped characters?
Quote your values, or use a separator that doesn't appear in actual content. Or better yet, use some well-defined format for transmitting data, such as JSON.

Split string by array of characters

I want to split a string by an array of characters,
so I have this code:
String target = "hello,any|body here?";
char[] delim = {'|',',',' '};
String regex = "(" + new String(delim).replaceAll("(.)", "\\\\$1|").replaceAll("\\|$", ")");
String[] result = target.split(regex);
Everything works fine except when I add a character like 'Q' to the delim[] array;
then it throws an exception:
java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 11
(\ |\,|\||\Q)
So how can I fix that to work with non-special characters as well?
Thanks in advance.
How can I fix that to work with non-special characters as well?
Put square brackets around your characters instead of escaping them. If ^ is included in your list of characters, make sure it's not the first character, or escape it separately if it's the only character on the list.
Dashes also need special treatment: they need to go at the beginning or at the end of the character class. A ] or \ in the list would still need to be escaped.
String delimStr = new String(delim);
String regex;
if (delimStr.equals("^")) {
regex = "\\^";
} else if (delimStr.charAt(0) == '^') {
// Put the second character first so that ^ no longer leads the class.
// This assumes that all characters are distinct.
// You may need a stricter check to make this work in general case.
regex = "[" + delimStr.charAt(1) + delimStr + "]";
} else {
regex = "[" + delimStr + "]";
}
Using Pattern.quote and putting it in square brackets seems to work:
String regex = "[" + Pattern.quote(new String(delim)) + "]";
Tested with possible problem characters.
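A variation on the same idea, sketched under the assumption that the delimiter array is non-empty: quote each delimiter character individually with Pattern.quote and join the results with |, so that no character (special or not) needs hand-written escaping. The class and method names are hypothetical.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

class DelimiterSplitDemo {
    // Builds an alternation such as \Q|\E|\Q,\E from the delimiter characters.
    // Assumes delims is non-empty; an empty array would yield an empty regex.
    static String buildRegex(char[] delims) {
        StringJoiner alternatives = new StringJoiner("|");
        for (char c : delims) {
            // Pattern.quote makes the single character a literal pattern.
            alternatives.add(Pattern.quote(String.valueOf(c)));
        }
        return alternatives.toString();
    }

    // Splits target wherever any one of the delimiter characters occurs.
    static String[] splitOnAny(String target, char[] delims) {
        return target.split(buildRegex(delims));
    }

    public static void main(String[] args) {
        char[] delims = {'|', ',', ' ', 'Q'};
        String[] result = splitOnAny("hello,any|body here?", delims);
        System.out.println(String.join("/", result)); // prints hello/any/body/here?
    }
}
```

This trades a slightly longer regex for not having to think about which characters are special inside a character class.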
Q is not a control character in a regex, so you do not have to put the \\ before it (it only serves to mark that you must interpret the following character as a literal, and not as a control character).
Example
`\\.` in a regex means "a dot"
`.` in a regex means "any character"
\\Q fails because Q is not a special character in a regex, so it does not need to be quoted. In Java's regex syntax, \Q actually starts a quoted section that runs until \E.
I would make delim a String array and add the quotes to these values that need it.
delim = {"\\|", ..... "Q"};

contains quotation mark java

How do I add an argument to check if a string contains ONE quotation mark? I tried to escape the character but it doesn't work:
words[i].contains()
EDIT: my bad, got some unclosed brackets, works fine now
words[i].matches("[^\"]*\"[^\"]*")
That is: any non-quotes, a quote, any non-quotes.
You could use something like this:
words[i].split("\"", -1).length - 1
That gives you the number of "s in your string (the limit of -1 keeps trailing empty strings, so a quote at the very end is still counted). Therefore, just use:
if (words[i].split("\"", -1).length == 2) {
//do stuff
}
You can check if the first quotation mark exists, and then check if the second one doesn't. It's much faster than using matches or split.
int index = words[i].indexOf('\"');
if (index != -1 && words[i].indexOf('\"', index + 1) == -1){
// do stuff
}
To check the number of quotes, you can also use the length of the string after removing the " characters.
int quotesNumber = words[i].length() - words[i].replace("\"", "").length();
if (quotesNumber == 1){
//do stuff
}
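The indexOf idea from the answers above can be wrapped in a small helper. This is only a sketch; the class and method names are invented for illustration.

```java
class SingleQuoteCheckDemo {
    // Counts occurrences of the double-quote character by scanning with indexOf.
    static int countQuotes(String s) {
        int count = 0;
        for (int i = s.indexOf('"'); i != -1; i = s.indexOf('"', i + 1)) {
            count++;
        }
        return count;
    }

    // True only when the string contains exactly one double-quote character.
    static boolean hasExactlyOneQuote(String s) {
        return countQuotes(s) == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasExactlyOneQuote("abc\""));     // one quote, at the end
        System.out.println(hasExactlyOneQuote("say \"hi\"")); // two quotes
        System.out.println(hasExactlyOneQuote("abc"));        // no quotes
    }
}
```

Inside a char literal the double quote needs no backslash, so '"' and '\"' denote the same character.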

String.replaceAll for multiple characters

I have a line with ^||^ as my delimiter. I am using:
int charCount = line.replaceAll("[^" + fileSeperator + "]", "").length();
if(fileSeperator.length()>1)
{
charCount=charCount/fileSeperator.length();
System.out.println(charCount+"char count between");
}
This does not work if I have a line that has a stray | or ^, as it counts these as well. How can I modify the regex? Any other suggestions?
If I understand correctly, what you're really trying to do is count the number of times that ^||^ appears in your String.
If that's the case, you can use:
Matcher m = Pattern.compile(Pattern.quote("^||^")).matcher(line);
int count = 0;
while ( m.find() )
count++;
System.out.println(count + "char count between");
But you really don't need the regex engine for this.
int startIndex = 0;
int count = 0;
while ( true ) {
int newIndex = line.indexOf(fileDelimiter, startIndex);
if ( newIndex == -1 ) {
break;
} else {
startIndex = newIndex + 1;
count++;
}
}
Certain characters have special meanings in a regular expression, such as ^ and |. These must be escaped with a backslash in order for them to be treated as normal characters and not as special characters. For example, the following regular expression matches all caret (^) and pipe (|) characters (note the backslashes): [\^\|]
The Pattern.quote() method can be used to turn a given String into a literal pattern, so that none of its characters are treated as special.
String quoted = Pattern.quote("^||^"); // returns the pattern \Q^||^\E
Also note that a character class only matches one character. Thus, the regex [^\^\|\|\^] will match all characters except ^ and |, not all characters except the sequence ^||^. If your intention is to count the number of delimiters (^||^) in a String, then a better approach might be to use the String.indexOf(String, int) method.
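The indexOf-based counting mentioned above might look like the sketch below. The names are illustrative, and it makes two assumptions that the question does not settle: matches are counted without overlap, and an empty delimiter counts as zero matches.

```java
class DelimiterCountDemo {
    // Counts non-overlapping occurrences of delimiter in line using plain indexOf.
    static int countOccurrences(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            return 0; // assumption: an empty delimiter is treated as "no matches"
        }
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(delimiter, from)) != -1) {
            count++;
            from += delimiter.length(); // skip past this match so matches cannot overlap
        }
        return count;
    }

    public static void main(String[] args) {
        // Stray | and ^ characters are not counted, only the full ^||^ sequence.
        System.out.println(countOccurrences("a^||^b^||^c|d^e", "^||^")); // prints 2
    }
}
```

Because no regex is involved, the delimiter needs no escaping at all.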
Mark Peters's answer seems better. I edited so my answer won't cause any confusion.
You should replace it like this, with proper escaping, since every character of your delimiter is a regex special character:
line.replaceAll("\\^\\|\\|\\^", "");
OR else don't use regex at all and call replace method like this:
line.replace("^||^", "");
Lazy solutions.
Depending on the end goal (the println statement is a little confusing):
int numberOfDelimiters = (line.length() - line.replace(fileSeparator,"").length())
/ fileSeparator.length();
int numberOfNonDelimiterChars = line.replace(fileSeparator,"").length();

Remove end of line characters from end of Java String

I have a string from which I'd like to remove the end-of-line characters, but only at the very end of the string, using Java:
"foo\r\nbar\r\nhello\r\nworld\r\n"
which I'd like to become
"foo\r\nbar\r\nhello\r\nworld"
(This question is similar to, but not the same as question 593671)
You can use s = s.replaceAll("[\r\n]+$", ""); to trim the \r and \n characters at the end of the string.
The regex is explained as follows:
[\r\n] is a character class containing \r and \n
+ is one-or-more repetition of
$ is the end-of-string anchor
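Putting the three pieces of the regex together, a minimal sketch (class and method names are made up for illustration) shows that only the trailing line breaks are removed while the inner ones survive:

```java
class TrailingNewlineDemo {
    // Removes every '\r' and '\n' at the very end of the string, leaving inner ones alone.
    static String stripTrailingLineBreaks(String s) {
        return s.replaceAll("[\r\n]+$", "");
    }

    public static void main(String[] args) {
        String text = "foo\r\nbar\r\nhello\r\nworld\r\n";
        String result = stripTrailingLineBreaks(text);
        // Compare lengths: only the final "\r\n" (two characters) is gone.
        System.out.println(text.length() - result.length()); // prints 2
    }
}
```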
References
regular-expressions.info/Anchors, Character Class, Repetition
Related topics
You can also use String.trim() to trim any whitespace characters from the beginning and end of the string:
s = s.trim();
If you need to check if a String contains nothing but whitespace characters, you can check if it isEmpty() after trim():
if (s.trim().isEmpty()) {
//...
}
Alternatively you can also see if it matches("\\s*"), i.e. zero-or-more of whitespace characters. Note that in Java, the regex matches tries to match the whole string. In flavors that can match a substring, you need to anchor the pattern, so it's ^\s*$.
Related questions
regex, check if a line is blank or not
how to replace 2 or more spaces with single space in string and delete leading spaces only
Wouldn't String.trim do the trick here?
That is, you'd call .trim() on your string, and it returns a copy of that string minus any leading or trailing whitespace.
The Apache Commons Lang StringUtils.stripEnd(String str, String stripChars) will do the trick; e.g.
String trimmed = StringUtils.stripEnd(someString, "\n\r");
If you want to remove all whitespace at the end of the String:
String trimmed = StringUtils.stripEnd(someString, null);
Well, everyone gave some way to do it with regex, so I'll give a faster, regex-free way instead:
public String replace(String val) {
for (int i=val.length()-1;i>=0;i--) {
char c = val.charAt(i);
if (c != '\n' && c != '\r') {
return val.substring(0, i+1);
}
}
return "";
}
Benchmark says it operates ~45 times faster than regexp solutions.
If you have Google's guava-libraries in your project (if not, you arguably should!) you'd do this with a CharMatcher:
String result = CharMatcher.anyOf("\r\n").trimTrailingFrom(input);
String text = "foo\r\nbar\r\nhello\r\nworld\r\n";
String result = text.replaceAll("[\r\n]+$", "");
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("\\s+$", "")
or
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("[\r\n]+$", "")
