String.split(".") is not splitting my long String - java

I'm doing the following:
String test = "this is a. example";
String[] test2 = test.split(".");
The problem: test2 has no items, even though the test String contains a . character.
Any idea what the problem is?

Note that public String[] split(String regex) takes a regex.
You need to escape the special char ..
Use String[] test2 = test.split("\\.");
Now you're telling Java:
"Don't take . as the special char ., take it as the regular char .".
Note that escaping in a regex is done with \, but in a Java string literal, \ is written as \\.
As suggested in the comments by @OldCurmudgeon (+1), you can use public static String quote(String s) that "Returns a literal pattern String for the specified String":
String[] test2 = test.split(Pattern.quote("."));

The dot . is a special regex character: it matches any character. You need to escape it with a backslash, which is written \\ in a Java string literal.
Once it is escaped, it won't be treated as special and will be matched just like any other character.
So String[] test2 = test.split("\\."); should do the trick nicely!
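To make the difference concrete, here is a minimal sketch comparing the three variants discussed above. The class and method names are made up for illustration; only String.split and Pattern.quote come from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnDotDemo {
    // Unescaped: "." is the regex wildcard, so every character is a delimiter.
    // All resulting tokens are empty, and split() drops trailing empty tokens.
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    // Escaped: "\\." in Java source is the regex \. which matches a literal dot.
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Pattern.quote builds a literal pattern, so no manual escaping is needed.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String test = "this is a. example";
        System.out.println(splitUnescaped(test).length);          // 0
        System.out.println(Arrays.toString(splitEscaped(test)));  // [this is a,  example]
        System.out.println(Arrays.toString(splitQuoted(test)));   // [this is a,  example]
    }
}
```

Note that the second token keeps its leading space, since only the dot itself is consumed as the delimiter.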


regular expressions to determine if a string starts with ;

The requirement is simple: if the given string matches:
starts with ';'
starts with some char or chars among '\r','\n','\t',' ', and then followed with ';'.
For example ";", "\r;","\r\n;", " \r\n \t;" should all be ok.
Here is my code and it does not work:
private static String regex = "[\\r|\\n| |\\t]+;";
private static boolean startsWithSemicolon(String str) {
return str.matches(regex);
}
Thanks for any help.
You have 2 choices:
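Use matches(), in which case the regex must match the entire input, so you'd have to add matching of characters following the ;. Note that . does not match line terminators by default, so if a line break can follow the ;, prefix the regex with (?s).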
Regex: str.matches("[\\r\\n\\t ]*;.*")
or: Pattern.compile("[\\r\\n\\t ]*;.*").matcher(str).matches()
Use find(), in which case the regex must be anchored to the beginning of the input:
Regex: Pattern.compile("^[\\r\\n\\t ]*;").matcher(str).find()
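Both choices can be sketched side by side as follows. The class, method, and constant names are hypothetical. One deliberate deviation from the snippets above: the matches() variant uses the (?s) flag so that .* also covers line breaks after the semicolon, which is an assumption about what the asker wants.

```java
import java.util.regex.Pattern;

class SemicolonPrefixDemo {
    // Compiled once; ^ anchors the match to the start of the input.
    private static final Pattern LEADING_SEMICOLON =
            Pattern.compile("^[\\r\\n\\t ]*;");

    // Choice 1: matches() must consume the whole input, hence the trailing .*
    // (?s) lets . match line terminators too.
    static boolean startsWithSemicolonMatches(String str) {
        return str.matches("(?s)[\\r\\n\\t ]*;.*");
    }

    // Choice 2: find() only needs the anchored prefix to match.
    static boolean startsWithSemicolonFind(String str) {
        return LEADING_SEMICOLON.matcher(str).find();
    }

    public static void main(String[] args) {
        String[] samples = {";", "\r;", "\r\n;", " \r\n \t;", "a;", ";x\ny"};
        for (String s : samples) {
            System.out.println(startsWithSemicolonMatches(s) + " " + startsWithSemicolonFind(s));
        }
    }
}
```

The character class needs no | separators: inside [...] every listed character is already an alternative, so a | there would be matched as a literal pipe.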

Split string containing newline characters Java

Say I have a following string str:
GTM =0.2
Test =100
[DLM]
ABCDEF =5
(yes, it contains newline characters) that I am trying to split on the [DLM] delimiter substring like this:
String[] strArr = str.split("[DLM]");
Why is it that when I do:
System.out.print(strArr[0]);
I get this output: GT
and when I do
System.out.print(strArr[1]);
I get =0.2
Does this make any sense at all?
str.split("[DLM]"); should be str.split("\\[DLM\\]");
Why?
[ and ] are special characters and String#split accepts regex.
A solution that I like more is using Pattern#quote:
str.split(Pattern.quote("[DLM]"));
quote returns a literal pattern String for the given String, so every character in it is matched literally.
Yes, you're giving a regex which says "split with either D, or L, or M".
You should escape those boys like this: str.split("\\[DLM\\]");
It's being split at the first M.
Escape the brackets
("\\[DLM\\]")
When you put brackets inside the pattern string, the regex engine treats each character between them as a separate delimiter. So in your case, M was the first delimiter it found.
Use
String[] strArr = str.split("\\[DLM\\]");
instead of
String[] strArr = str.split("[DLM]");
Otherwise it will split on either D, or L, or M.
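A small sketch that reproduces the surprising "GT" result and then the intended split. The class and method names are invented for the example, and the input string is the one from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnBracketedTokenDemo {
    // "[DLM]" is a character class: split on any single D, L or M.
    static String[] splitOnCharClass(String s) {
        return s.split("[DLM]");
    }

    // Pattern.quote treats "[DLM]" as five literal characters.
    static String[] splitOnLiteralToken(String s) {
        return s.split(Pattern.quote("[DLM]"));
    }

    public static void main(String[] args) {
        String str = "GTM =0.2\nTest =100\n[DLM]\nABCDEF =5";
        System.out.println(splitOnCharClass(str)[0]);                  // GT
        System.out.println(Arrays.toString(splitOnLiteralToken(str)));
    }
}
```

With the literal delimiter, the line breaks surrounding [DLM] stay attached to the two parts; call trim() on each part if they are not wanted.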

How to split a string according to "\\" or "\"?

I want to split a string "ABC\DEF" ?
I have tried
String str = "ABC\DEF";
String[] values1 = str.split("\\");
String[] values2 = str.split("\");
But none seems to be working. Please help.
String.split() expects a regular expression. You need to escape each \ once because it is inside a Java string literal (by the way, the same applies to String str = "ABC\DEF";, which should be "ABC\\DEF"), and once more for the regex. You end up with this line:
String[] values = str.split("\\\\");
The literal "\\\\" is the two-character string \\, which the regex engine interprets as a single literal \.
Note that String.split splits a string by regex.
One correct way [1] to specify \ as the delimiter, in raw regex, is:
\\
Since \ is special character in regex, you need to escape it to specify the literal \.
Putting the regex in string literal, you need to escape again, since \ is also escape character in string literal. Therefore, you end up with:
"\\\\"
So your code should be:
str.split("\\\\")
Note that this splits on every single instance of \ in the string.
Footnote
[1] Other ways (in raw regex) are:
\x5C
\0134
\u005C
In string literal (even worse than the quadruple escaping):
"\\x5C"
"\\0134"
"\\u005C"
Use it:
String str = "ABC\\DEF";
String[] values1 = str.split("\\\\");
final String HAY = "_0_";
String str = "ABC\\DEF".replace("\\", HAY);
System.out.println(Arrays.asList(str.split(HAY)));
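As a quick check of the two levels of escaping, here is a minimal sketch with invented class and method names. It shows the quadruple-backslash form next to Pattern.quote, which avoids the regex-level escaping.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnBackslashDemo {
    // "\\\\" in source -> the string \\ -> the regex for one literal backslash.
    static String[] splitEscaped(String s) {
        return s.split("\\\\");
    }

    // "\\" in source is a single backslash; Pattern.quote makes it literal.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("\\"));
    }

    public static void main(String[] args) {
        String str = "ABC\\DEF"; // the actual characters are ABC\DEF
        System.out.println(Arrays.toString(splitEscaped(str))); // [ABC, DEF]
        System.out.println(Arrays.toString(splitQuoted(str)));  // [ABC, DEF]
    }
}
```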

Java - split string with special delimiter

I have a String, which I want to split into parts using the delimiter }},{". I have tried using:
String delims="['}},{\"']+";
String field[]=new String[50];
field=subResult.split(delims);
But it is not working :-( do you know, what expression in delims should I use?
Thanks for your replies
A { is a regex meta-character which marks the beginning of a repetition quantifier such as {2,5}. To match a literal { you need to escape it, which in a Java string literal means preceding it with \\:
String delims="}},\\{";
String field[] = subResult.split(delims);
You need not escape the } in your regex, as the regex engine infers that it is a literal } because it is not preceded by an opening {. That said, there is no harm in escaping it.
If the delimiter is simply }},{ then subResult.split("\\}\\},\\{") should work
String fooo = "asdf}},{bar}},{baz";
System.out.println(Arrays.toString(fooo.split("\\}\\},\\{")));
You should be escaping it.
subResult.split("\\}\\},\\{");
You could be making it more complex than you need.
String text = "{{aaa}},{\"hello\"}";
String[] field=text.split("\\}\\},\\{\"");
System.out.println(Arrays.toString(field));
Use:
Pattern p = Pattern.compile(Pattern.quote("}},{\""));
// Split input with the pattern (a character class like [}},{\"] would split on each single character instead)
String[] result = p.split(MyTextString);
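If the whole sequence }},{" is meant as one delimiter, the manual escaping can be avoided altogether. The following is only a sketch with made-up names and sample input; Pattern.quote is the one JDK call it relies on.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnBraceDelimiterDemo {
    // The five-character delimiter }},{" compiled once as a literal pattern.
    private static final Pattern DELIMITER = Pattern.compile(Pattern.quote("}},{\""));

    static String[] splitFields(String s) {
        return DELIMITER.split(s);
    }

    public static void main(String[] args) {
        String subResult = "first}},{\"second}},{\"third";
        System.out.println(Arrays.toString(splitFields(subResult))); // [first, second, third]
    }
}
```

There is also no need to preallocate the array with new String[50]: split returns a new array sized to the number of tokens.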

How to replace special characters in a string?

I have a string with lots of special characters. I want to remove all those, but keep alphabetical characters.
How can I do this?
That depends on what you mean. If you just want to get rid of them, do this:
(Update: Apparently you want to keep digits as well; use the second line of each pair in that case.)
String alphaOnly = input.replaceAll("[^a-zA-Z]+","");
String alphaAndDigits = input.replaceAll("[^a-zA-Z0-9]+","");
or the equivalent:
String alphaOnly = input.replaceAll("[^\\p{Alpha}]+","");
String alphaAndDigits = input.replaceAll("[^\\p{Alpha}\\p{Digit}]+","");
(All of these can be significantly improved by precompiling the regex pattern and storing it in a constant)
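The precompiling remark could look roughly like this. The class, constant, and method names are placeholders, and the patterns are the same ones as in the replaceAll calls above.

```java
import java.util.regex.Pattern;

class AlphaFilterDemo {
    // Compiled once and reused, instead of on every replaceAll call.
    private static final Pattern NON_ALPHA = Pattern.compile("[^a-zA-Z]+");
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]+");

    static String alphaOnly(String input) {
        return NON_ALPHA.matcher(input).replaceAll("");
    }

    static String alphaAndDigits(String input) {
        return NON_ALNUM.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "He!!o, W0rld_42";
        System.out.println(alphaOnly(input));      // HeoWrld
        System.out.println(alphaAndDigits(input)); // HeoW0rld42
    }
}
```

String.replaceAll compiles its regex on each call, so this mostly pays off when the method runs often.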
Or, with Guava:
private static final CharMatcher ALNUM =
CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'))
.or(CharMatcher.inRange('0', '9')).precomputed();
// ...
String alphaAndDigits = ALNUM.retainFrom(input);
But if you want to turn accented characters into something sensible that's still ascii, look at these questions:
Converting Java String to ASCII
Java change áéőűú to aeouu
ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars
I am using this.
s = s.replaceAll("\\W", "");
It replaces all special characters in the string. Note that the underscore counts as a word character, so \W leaves underscores in place.
Here
\w : A word character, short for [a-zA-Z_0-9]
\W : A non-word character
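A short sketch of what \W does and does not remove, compared with an explicit character class. The names are illustrative only. By default \w in Java covers only ASCII letters, digits, and the underscore, so accented letters are removed as well.

```java
class NonWordCharDemo {
    // Removes everything except [a-zA-Z_0-9]; the underscore survives.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // Explicit class: the underscore is removed too.
    static String stripNonAlnum(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonWord("a_b-c d!1"));   // a_bcd1
        System.out.println(stripNonAlnum("a_b-c d!1"));  // abcd1
        System.out.println(stripNonWord("caf\u00e9"));   // caf
    }
}
```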
You can use the following method to keep alphanumeric characters.
replaceAll("[^a-zA-Z0-9]", "");
And if you want to keep only alphabetical characters use this
replaceAll("[^a-zA-Z]", "");
To replace one specific special character, escape it:
replaceAll("\\your special character","new character");
For example, to replace all occurrences of * with a space:
replaceAll("\\*"," ");
Note: this statement can only replace one type of special character at a time.
Following the example of Andrzej Doyle's answer, I think a better solution is to use org.apache.commons.lang3.StringUtils.stripAccents():
package bla.bla.utility;
import org.apache.commons.lang3.StringUtils;
public class UriUtility {
public static String normalizeUri(String s) {
String r = StringUtils.stripAccents(s);
r = r.replace(" ", "_");
r = r.replaceAll("[^\\.A-Za-z0-9_]", "");
return r;
}
}
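If pulling in Commons Lang is not an option, a JDK-only alternative is to decompose the string with java.text.Normalizer and drop the combining marks. This is a different technique from stripAccents() and not a drop-in replacement. The class and method names below are hypothetical.

```java
import java.text.Normalizer;

class StripAccentsJdkDemo {
    // NFD splits an accented letter into base letter + combining mark(s);
    // the regex then removes the marks in the U+0300..U+036F block.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // \u00e1\u00e9\u0151\u0171\u00fa is the accented sequence from "Java change ... to aeouu"
        System.out.println(stripAccents("\u00e1\u00e9\u0151\u0171\u00fa")); // aeouu
    }
}
```

Letters that have no decomposition, such as ł or ø, pass through unchanged with this approach.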
In C#:
string Output = Regex.Replace(Input, @"([ a-zA-Z0-9&, _]|^\s)", "");
Here all the special characters except space, comma, and ampersand are replaced. You can also omit space, comma, and ampersand with the following regular expression:
string Output = Regex.Replace(Input, @"([ a-zA-Z0-9_]|^\s)", "");
where Input is the string in which we need to replace the characters.
Here is a JavaScript one-liner I used to remove all possible special characters from the string:
const cleanName = name.replace(/[&\/\\#,+()$~%!.„'":*‚^_¤?<>|#ª{«»§}©®™ ]/g, '').toLowerCase();
You can use basic regular expressions on strings to find all special characters or use pattern and matcher classes to search/modify/delete user defined strings. This link has some simple and easy to understand examples for regular expressions: http://www.vogella.de/articles/JavaRegularExpressions/article.html
You can look up the Unicode code point of the junk character with the Character Map tool on a Windows PC and write it as a \u escape, e.g. \u00a9 for the copyright symbol.
You can then use that escape to target that particular junk character: rather than removing it, replace it with the proper Unicode character.
To keep spaces as well, use the pattern "[^a-z A-Z 0-9]".
