What all characters can be used as String Delimiters in Java? - java

I am trying to break a String into pieces using a delimiter (":").
String sepIds[]=ids.split(":");
It is working fine. But when I replace ":" with " * " and use " * " as delimiter, it doesn't work.
String sepIds[]=ids.split("*"); //doesn't work
It just hangs up there, and doesn't execute further.
What mistake am I making here?

String#split takes a regular expression as parameter. In regex some chars have special meanings so they need to be escaped, for example:
"foo*bar".split("\\*")
the result will be as you expect:
[foo, bar]
You could also use the method Pattern#quote to simplify the task.
"foo*bar".split(Pattern.quote("*"))

String.split expects a regular expression argument, and * has a special meaning in regex. If you want to use it as a literal delimiter, you need to escape it like this:
String sepIds[]=ids.split("\\*");
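To make the two fixes mentioned in these answers concrete, here is a minimal sketch. The class and method names (SplitOnStar, splitEscaped, splitLiteral) are hypothetical and only serve the illustration; the calls themselves are the standard String.split and Pattern.quote.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnStar {
    // Escapes '*' by hand so the regex engine treats it as a literal character.
    static String[] splitEscaped(String input) {
        return input.split("\\*");
    }

    // Quotes an arbitrary delimiter so none of its characters are read as regex syntax.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("foo*bar")));      // [foo, bar]
        System.out.println(Arrays.toString(splitLiteral("foo*bar", "*"))); // [foo, bar]
        System.out.println(Arrays.toString(splitLiteral("a.b|c", ".")));   // [a, b|c]
    }
}
```

Pattern.quote is probably the more convenient choice when the delimiter comes from a variable, since you do not need to know which characters are special.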

The argument of .split() is a regular expression, not a string literal. Therefore you need to escape * since it is a special regex character. Write:
ids.split("\\*");
This is how you would split against one or more whitespace characters:
ids.split("\\s+");
Note that Guava has Splitter which is very, very fast and can split against literals:
Splitter.on('*').split(ids);

'*' and '.' are special characters, so you have to escape them with a backslash.
String sepIds[]=ids.split("\\*");
To read more about java patterns please visit that page.

That is expected behaviour. The documentation for the String split function says that the input string is treated as a regular expression (with a link explaining how that works). As Germann points out, '*' is a special character in regular expressions.

Java's String.split() uses regular expressions to split up the string (unlike similar functions in C# or python). * is a special character in regular expressions and you need to escape it with a \ (backslash). So you should use instead:
String sepIds[]=ids.split("\\*");
You can find more information on regular expressions anywhere on the internet. A fairly complete list of the special characters supported by Java is here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
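As a rough illustration of why some delimiters misbehave when passed unescaped, the sketch below probes a few of them. The class and helper names (UnescapedDelimiters, rawSplitFails, literalSplit) are made up for this example. In a quick check like this, an unescaped "*" is rejected with a PatternSyntaxException, while an unescaped "." is accepted but matches every character.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class UnescapedDelimiters {
    // Returns true if using the delimiter as a raw regex is rejected by the regex compiler.
    static boolean rawSplitFails(String delimiter) {
        try {
            "x".split(delimiter);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    // Splits on the delimiter taken literally, whatever characters it contains.
    static String[] literalSplit(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(rawSplitFails(":")); // false: ':' has no special meaning
        System.out.println(rawSplitFails("*")); // true: '*' alone is not a valid regex
        // '.' matches any character, so every piece is empty and gets dropped.
        System.out.println("a.b".split(".").length); // 0
        System.out.println(Arrays.toString(literalSplit("a.b", "."))); // [a, b]
    }
}
```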

Related

Remove everything from a string up to a certain character and optionally a string if it follows too

I am looking to write a regex that removes all characters up to the first &emsp and, if a (new section) follows the &emsp, removes that as well. But the following regex doesn't seem to work. Why? How do I correct this?
String removeEmsp =" “[<centd>[</centd>]§ 431:10A–126 (new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
Pattern removeEmspPattern1 = Pattern.compile("(.*( (\\(new section\\)))?)(.*)", Pattern.MULTILINE);
System.out.println(removeEmspPattern1.matcher(removeEmsp).replaceAll("$2"));
Have you tried String.split? This creates an array of strings from a string, based on a delimiter.
Once you have split the string, just select the elements of the array that you need for the print statement.
Read more here
Your regex is very long and I do not want to debug it. The tip, however, is that some characters have special meaning in regular expressions: square brackets define character classes, parentheses define groups, and so on. Such characters must be escaped if you want them to be interpreted as plain characters rather than regex commands. To escape a special character, write \ in front of it. But \ is also the escape character in Java string literals, so it must be doubled.
For example, to replace an ampersand with the letter A you can write str.replaceAll("\\&", "A"). (The ampersand is not special outside a character class, so "&" alone works too; the escape is harmless.)
Now you have all information you need. Try to start from simpler regex and then expand it to what you need. Good luck.
EDIT
BTW, parsing XML and/or HTML using regular expressions is possible but strongly discouraged. Use a dedicated parser for such formats.
Try this:
String removeEmsp =" “[<centd>[</centd>]§ 431:10A–126 (new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
System.out.println(removeEmsp.replaceFirst("^.*?\\ (\\(new\\ssection\\))?", ""));
System.out.println(removeEmsp.replaceAll("^.*?\\ (\\(new\\ssection\\))?", ""));
Output:
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
It will remove everything up to " " and optionally, the following "(new section)" text if any.
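The same idea can be written as a small, self-contained sketch. It assumes that the &emsp in the question stands for the em space character U+2003, which is written explicitly as \u2003 so it is visible in the source; the class and method names are hypothetical.

```java
public class EmspPrefixRemover {
    // Assumption: "&emsp" means the em space, U+2003.
    // ^.*?         lazily matches everything from the start of the input
    // \u2003       up to the first em space
    // (...)?       and optionally the literal text "(new section)" right after it
    private static final String PREFIX_REGEX = "^.*?\\u2003(\\(new\\ssection\\))?";

    static String stripPrefix(String text) {
        return text.replaceFirst(PREFIX_REGEX, "");
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("§ 431:10A\u2003(new section)Chemotherapy services."));
        System.out.println(stripPrefix("§ 431:10A\u2003Chemotherapy services."));
    }
}
```

Because the pattern is anchored with ^, replaceFirst and replaceAll behave the same here; a string without an em space is returned unchanged.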

String.replaceAll(...) of Java not working for \\ and \

I want to convert the directory path from:
C:\Users\Host\Desktop\picture.jpg
to
C:\\Users\\Host\\Desktop\\picture.jpg
I am using replaceAll() function and other replace functions but they do not work.
How can I do this?
I have printed the result, and it gives me the one I wanted, i.e.
C:\Users\Host\Desktop\picture.jpg
but now when I pass this variable to open the file, I get this exception. Why?
java.io.FileNotFoundException: C:\Users\Host\Desktop\picture.jpg
EDIT: Changed from replaceAll to replace - you don't need a regex here, so don't use one. (It was a really poor design decision on the part of the Java API team, IMO.)
My guess (as you haven't provided enough information) is that you're doing something like:
text.replace("\\", "\\\\");
Strings are immutable in Java, so you need to use the return value, e.g.
String newText = oldText.replace("\\", "\\\\");
If that doesn't answer your question, please provide more information.
(I'd also suggest that usually you shouldn't be doing this yourself anyway - if this is to include the information in something like a JSON response, I'd expect the wider library to perform escaping for you.)
Note that the doubling is required as \ is an escape character for Java string (and character) literals. Note that as replace doesn't treat the inputs as regular expression patterns, there's no need to perform further doubling, unlike replaceAll.
EDIT: You're now getting a FileNotFoundException because there isn't a filename with double backslashes in - what made you think there was? If you want it as a valid filename, why are you doubling the backslashes?
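A minimal sketch of the two points made in this answer: String.replace works on literal text, and the result must be captured because strings are immutable. The class and method names are hypothetical.

```java
public class BackslashDoubling {
    // Doubles every backslash using the non-regex replace(CharSequence, CharSequence).
    static String doubleBackslashes(String path) {
        // "\\" is one backslash at runtime, "\\\\" is two.
        return path.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String path = "C:\\Users\\Host\\Desktop\\picture.jpg"; // single backslashes at runtime
        path.replace("\\", "\\\\");      // result discarded: path is unchanged
        System.out.println(path);
        String doubled = doubleBackslashes(path);
        System.out.println(doubled);     // every backslash now appears twice
    }
}
```

As the answer notes, the doubled form is not a valid file name, so it should only be used where the consumer really expects escaped backslashes.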
You have to use :
String t2 = t1.replaceAll("\\\\", "\\\\\\\\");
or (without pattern) :
String t2 = t1.replace("\\", "\\\\");
Each "\" has to be preceded by another "\". But that is also true for the preceding "\", so you have to write four backslashes each time you want one in a regex.
In Java string literals \ is used as the escape character by default, so to represent a single \ you have to write "\\", and to represent \\ (i.e. two backslashes) you write "\\\\". This will solve your problem, and the same applies to other special symbols such as ".
Three options:
1. Replace double backslashes with one (not what you asked)
You have to escape the backslash by backslashes. Like this:
String newPath = oldPath.replaceAll("\\\\\\\\", "\\\\");
The first parameter needs to be escaped twice: once for the Java compiler and once for the regex engine. A literal backslash in a regex must itself be escaped, giving \\ at the regex level; each of those two characters must then be escaped again in the Java string literal, giving \\\\. These four backslashes represent one literal backslash in the regex. To match two literal backslashes you therefore need eight.
The second parameter of replaceAll is not a regular expression, but backslashes (and $) are still special in it, so it has to be escaped as well: once for the Java compiler and once for the replacement syntax, giving \\\\. This compiles to two backslashes, which replaceAll interprets as one.
2. Replace a single backslash with a pair of backslashes (what you asked)
String newPath = oldPath.replaceAll("\\\\", "\\\\\\\\");
Same logic as above.
3. Use replace() instead of replaceAll().
String newPath = oldPath.replace("\\", "\\\\");
The difference is that the replace() method doesn't use regular expressions, so you don't have to escape every backslash twice for the first parameter.
Hopefully, I explained well...
-- Edit: Fixed error, as pointed out by xehpuk --
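The escaping levels described above can be checked with a short sketch. The class and method names are hypothetical; Matcher.quoteReplacement is a standard helper that is not mentioned in the answer and is shown here only as one way to avoid counting backslashes in the second parameter.

```java
import java.util.regex.Matcher;

public class BackslashReplaceAll {
    // Two literal backslashes -> one. Regex needs 8, replacement needs 4 in source.
    static String collapse(String s) {
        return s.replaceAll("\\\\\\\\", "\\\\");
    }

    // One literal backslash -> two. Regex needs 4, replacement needs 8 in source.
    static String expand(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Same as expand, but the replacement is escaped by quoteReplacement.
    static String expandQuoted(String s) {
        return s.replaceAll("\\\\", Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        String single = "C:\\Users\\Host"; // one backslash between parts at runtime
        String doubled = expand(single);
        System.out.println(doubled);
        System.out.println(collapse(doubled).equals(single));     // true
        System.out.println(expandQuoted(single).equals(doubled)); // true
    }
}
```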

Java and string split

I want to split this String using the split function. Here is my code:
String data= "data^data";
String[] spli = data.split("^");
When I try to do that, spli contains only one string. It seems like Java doesn't see the "^" when splitting. Does anyone know how I can split this string on the character "^"?
EDIT
SOLVED :P
This is because String.split takes a regular expression, not a literal string. You have to escape the ^ as it has a different meaning in regex (anchor at the start of a string). So the split would actually be done before the first character, giving you the complete string back unaltered.
You escape a regular expression metacharacter with \, which has to be \\ in Java strings, so
data.split("\\^")
should work.
You need to escape it because split takes a regex:
\\^
Special characters like ^ need to be escaped with \
This does not work because .split() expects its argument to be a regex. "^" has a special meaning in regex and so does not work as you expect. To get it to work, you need to escape it. Use \\^.
The reason is that split's parameter is a regular expression, so "^" means the beginning of the input (or of a line in multiline mode). You need to escape it to match the literal ^ character: use the parameter "\\^".
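The difference between the anchor and the literal caret can be seen in a small sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class CaretSplit {
    // "^" is the start-of-input anchor, so the caret in the data is never matched.
    static String[] splitUnescaped(String data) {
        return data.split("^");
    }

    // "\\^" is a literal caret at the regex level.
    static String[] splitOnCaret(String data) {
        return data.split("\\^");
    }

    public static void main(String[] args) {
        // The caret stays inside the text because it was not treated as a delimiter.
        System.out.println(Arrays.toString(splitUnescaped("data^data")));
        System.out.println(Arrays.toString(splitOnCaret("data^data"))); // [data, data]
    }
}
```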

using replace() or replaceall()

I know of using this:
public String RemoveTag(String html){
html = html.replaceAll("\\<.*?>","");
html = html.replaceAll(" ","");
html = html.replaceAll("&","");
return html;
}
This removes all tags within an HTML string. However, my question is how the wildcard part in between, <.*?>, works. Could someone give me a more detailed explanation of how wildcards are matched in a String?
The main reason I ask is that I still have some text that has "a # at the start and a } at the end", and I want to get rid of everything in between the "#" and the "}".
The first parameter to replaceAll(...) is a regex string. The .*? in your example is the part that matches anything. So, if you want a regular expression that will get rid of everything between "#" and "}" you would use something like:
String exampleText = "Start #some text} finish.";
exampleText = exampleText.replaceAll("#(.*?)\\}", "#}"); // strings are immutable, so keep the result
System.out.println(exampleText); // prints "Start #} finish."
Notice the same pattern: .*?. The parentheses, which are optional here, are just used for grouping. Also notice the } is escaped with backslashes since it can have special meaning within regular expressions.
For more info on Java's regex support see the Pattern class.
Regular expressions can be implemented by building a finite automaton, since every regular expression has an equivalent deterministic finite automaton and vice versa.
The regex for what you are seeking is #.*?}. If you want to keep these two characters, replace the match with "#}" instead of with "". It will be something like: s.replaceAll("#.*?}", "#}") [s is your String].
It seems you might need the regex #.*?\} though the special } character should be treated literally by the pattern compiler when it is not preceded by an opening {. To be on the safe side, "#.*?\\}" should work either way, as @WayneBaylor posted.
You might want to read more on regular expressions
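Putting the suggestions from these answers side by side, a minimal sketch could look like this. The class and method names are hypothetical; the pattern is the one discussed above, with the } escaped to be on the safe side.

```java
public class HashBraceStripper {
    // '#', then as few characters as possible (lazy .*?), then a literal '}'.
    private static final String HASH_TO_BRACE = "#.*?\\}";

    // Removes each "#...}" run, including the '#' and '}' themselves.
    static String stripCompletely(String text) {
        return text.replaceAll(HASH_TO_BRACE, "");
    }

    // Removes only what lies between '#' and '}', keeping the two markers.
    static String keepMarkers(String text) {
        return text.replaceAll(HASH_TO_BRACE, "#}");
    }

    public static void main(String[] args) {
        String example = "Start #some text} finish.";
        System.out.println(keepMarkers(example));     // Start #} finish.
        System.out.println(stripCompletely(example)); // Start  finish.
    }
}
```

The lazy quantifier matters when a line contains several "#...}" runs: a greedy .* would swallow everything from the first # to the last }.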

Escaping a String from getting regex parsed in Java

In Java, suppose I have a String variable S, and I want to search for it inside of another String T, like so:
if (T.matches(S)) ...
(note: the above line was T.contains() until a few posts pointed out that that method does not use regexes. My bad.)
But now suppose S may have unsavory characters in it. For instance, let S = "[hi". The left square bracket is going to cause the regex to fail. Is there a function I can call to escape S so that this doesn't happen? In this particular case, I would like it to be transformed to "\[hi".
String.contains does not use regex, so there isn't a problem in this case.
Where a regex is required, rather than rejecting strings with regex special characters, use java.util.regex.Pattern.quote to escape them.
As Tom Hawtin said, you need to quote the pattern. You can do this in two ways (edit: actually three ways, as pointed out by @diastrophism):
Surround the string with "\Q" and "\E", like:
if (T.matches("\\Q" + S + "\\E"))
Use Pattern instead. The code would be something like this:
Pattern sPattern = Pattern.compile(S, Pattern.LITERAL);
if (sPattern.matcher(T).matches()) { /* do something */ }
This way, you can cache the compiled Pattern and reuse it. If you are using the same regex more than once, you almost certainly want to do it this way.
Note that if you are using regular expressions to test whether a string is inside a larger string, you should put .* at the start and end of the expression. But this will not work if you are quoting the pattern, since it will then be looking for actual dots. So, are you absolutely certain you want to be using regular expressions?
Try Pattern.quote(String). It will fix up anything that has special meaning in the string.
Any particular reason not to use String.indexOf() instead? That way it will always be interpreted as a regular string rather than a regex.
Regex uses the backslash character '\' to escape a literal. Given that Java also uses the backslash character, you would need to use a double backslash like:
String S = "\\[hi";
That will become the String:
\[hi
which will be passed to the regex.
Or if you only care about a literal String and don't need a regex you could do the following:
if (T.indexOf("[hi") != -1) { /* found */ }
T.contains() (according to javadoc : http://java.sun.com/javase/6/docs/api/java/lang/String.html) does not use regexes. contains() delegates to indexOf() only.
So, there are NO regexes used here. Were you thinking of some other String method ?
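The options raised in this thread can be compared in one short sketch. The class and method names are hypothetical; the library calls (String.contains, Pattern.quote, Pattern.LITERAL, Matcher.find) are the standard ones mentioned above. It uses find() rather than matches() because the goal is to locate S inside T, not to match all of T.

```java
import java.util.regex.Pattern;

public class LiteralSearch {
    // No regex involved at all.
    static boolean containsPlain(String t, String s) {
        return t.contains(s);
    }

    // Regex search with the needle quoted so it is taken literally.
    static boolean containsQuoted(String t, String s) {
        return Pattern.compile(Pattern.quote(s)).matcher(t).find();
    }

    // Regex search with the LITERAL flag; the compiled Pattern could be cached and reused.
    static boolean containsLiteralFlag(String t, String s) {
        return Pattern.compile(s, Pattern.LITERAL).matcher(t).find();
    }

    public static void main(String[] args) {
        String t = "say [hi there";
        String s = "[hi";
        System.out.println(containsPlain(t, s));       // true
        System.out.println(containsQuoted(t, s));      // true
        System.out.println(containsLiteralFlag(t, s)); // true
    }
}
```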
