Split a string in Java containing ( - java

I am trying to split a string in Java.
For example
Hello (1234)
The part after ( must not be included in the string.
I am expecting the following:
Hello
How would you do it?

Just split on zero or more whitespace characters followed by a ( character, then take the value at index 0 of the resulting parts.
"Hello (1234)".split("\\s*\\(")[0];
or
"Hello (1234)".split("\\s+")[0];

You can replace the parentheses and everything between them with an empty string.
String str = "Hello(1234)";
String result = str.replaceAll("\\(.*\\)", "");
System.out.println(result);
You mention a split operation in the question, but you also say "The part after ( must not be included in the string", so I'm assuming you want to discard the (1234). If you need to keep it, consider the other answer (using split).
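To make the difference between discarding and keeping the bracketed part concrete, here is a minimal sketch. The class and method names are hypothetical, and the leading \\s* in the replaceAll pattern is an addition (an assumption) so that the space before the parenthesis is removed as well.

```java
import java.util.Arrays;

class KeepOrDiscardDemo {
    // Discards the parenthesised part, including any whitespace in front of it.
    // Assumption: the extra \\s* is not in the answer above; it trims "Hello (1234)" to "Hello".
    static String discard(String s) {
        return s.replaceAll("\\s*\\(.*\\)", "");
    }

    // Keeps both parts by splitting on "optional whitespace + (" or on ")".
    static String[] keepBoth(String s) {
        return s.split("\\s*\\(|\\)");
    }

    public static void main(String[] args) {
        System.out.println(discard("Hello (1234)"));                   // Hello
        System.out.println(Arrays.toString(keepBoth("Hello (1234)"))); // [Hello, 1234]
    }
}
```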

You may try the following regex
String[] split = "Hello (1234)".split("\\s*\\([\\d]*\\)*");
System.out.println(split[0]);
\\s* matches zero or more whitespace characters, so the space is optional
\\( matches a literal (; the left parenthesis has a special meaning in regex, so it must be escaped
The same applies to the right parenthesis: \\) matches a literal )
[\\d]* matches zero or more digits
Use + instead of * when the preceding element must be present at least once
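The effect of choosing * or + can be checked with a small sketch like the one below. The class name, the helper method and the two simplified patterns are assumptions made for illustration. They are not the exact pattern from the answer.

```java
class QuantifierDemo {
    // Hypothetical helper: removes whatever the given regex matches.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String star = "\\s*\\(\\d*\\)"; // digits are optional
        String plus = "\\s*\\(\\d+\\)"; // at least one digit is required

        System.out.println(strip("Hello (1234)", star)); // Hello
        System.out.println(strip("Hello ()", star));     // Hello
        System.out.println(strip("Hello ()", plus));     // Hello ()  (no match, unchanged)
    }
}
```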

Related

java - regex within string literal

I'm trying to remove something from a string that looks like:
"name" : "12345"
It will always look like that, but 12345 can be any number. Is there a way to do this with something like:
string.replace("\"name\":\"[0-9]\",", "")
that doesn't work, and i've tried several things but nothing works.
thank you!
Add a + behind the number part in order for the regex to match numbers of any length. [0-9] alone will only match exactly 1 digit.
Furthermore, what about spaces? In your example there are spaces, in your code, there are none. You can add \\s* to match any (including none) white-space.
string.replaceAll("\"name\"\\s*:\\s*\"[0-9]+\",", "")
You can play around with it on Regex101.
Andy Turner's comment: You need to use replaceAll instead of replace. replace does not interpret the first parameter as a regex, but tries to find that exact string in your string.
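The difference between replace and replaceAll can be seen in a short sketch. The class name, method names and the sample input are made up for illustration.

```java
class ReplaceVsReplaceAll {
    static final String REGEX = "\"name\"\\s*:\\s*\"[0-9]+\",";

    // replace() looks for the pattern text itself, character by character.
    static String literal(String s) {
        return s.replace(REGEX, "");
    }

    // replaceAll() compiles the first argument as a regular expression.
    static String regex(String s) {
        return s.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        String input = "{\"name\" : \"12345\",\"id\":1}"; // hypothetical sample input
        System.out.println(literal(input)); // unchanged: {"name" : "12345","id":1}
        System.out.println(regex(input));   // {"id":1}
    }
}
```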
this will do it for you
string.replaceAll( "\"name\"\\s*:\\s*\"\\d+\"", "" )
example:
final String string = "Some\"name\" : \"12345\"String";
System.out.println(string.replaceAll("\"name\"\\s*:\\s*\"\\d+\"", ""));
will print the output:
SomeString
And it will work for any number
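Change the pattern from "\"name\":\"[0-9]\"" to "\"name\" : \"[0-9]*\"" so that it allows the spaces around the colon and any number of digits, and pass it to replaceAll, since replace does not interpret its argument as a regex.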
I tried a regex like this
String regex="\\w+:\\d+";
String data = "name:12345";
System.out.println(data.matches(regex));
and the output is true, so you can experiment from there. \w+ matches one or more word characters and \d+ matches one or more digits. Note that matches only succeeds if the whole string matches the pattern.

String.split() not working as intended

I'm trying to split a string, however, I'm not getting the expected output.
String one = "hello 0xA0xAgoodbye";
String two[] = one.split(" |0xA");
System.out.println(Arrays.toString(two));
Expected output: [hello, goodbye]
What I got: [hello, , , goodbye]
Why is this happening and how can I fix it?
Thanks in advance! ^-^
If you'd like to treat consecutive delimiters as one, you could modify your regex as follows:
"( |0xA)+"
This means "a space or the string "0xA", repeated one or more times".
(\\s|0xA)+ matches one or more consecutive whitespace characters or 0xA sequences, so each run of delimiters is treated as a single split point.
This result is caused by multiple consecutive matches in the string. You may wrap the pattern with a grouping construct and apply a + quantifier to it to match multiple matches:
String one = "hello 0xA0xAgoodbye";
String two[] = one.split("(?:\\s|0xA)+");
System.out.println(Arrays.toString(two));
The (?:\s|0xA)+ regex matches one or more whitespace characters or literal 0xA character sequences.
See the Java online demo.
However, you will still get an empty value as the first item in the resulting array if the 0xA or whitespaces appear at the start of the string. Then, you will have to remove them first:
String two[] = one.replaceFirst("^(?:\\s|0xA)+", "").split("(?:\\s+|0xA)+");
See another Java demo.
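The leading-delimiter case can be reproduced with a sketch like this one. The class name, method names and the sample input (with a space and 0xA added in front) are assumptions for illustration.

```java
import java.util.Arrays;

class LeadingDelimiterDemo {
    static final String DELIMS = "(?:\\s|0xA)+";

    // Plain split: a delimiter run at the start yields an empty first element.
    static String[] splitKeepLeading(String s) {
        return s.split(DELIMS);
    }

    // Strip the leading delimiter run first, then split.
    static String[] splitTrimmed(String s) {
        return s.replaceFirst("^" + DELIMS, "").split(DELIMS);
    }

    public static void main(String[] args) {
        String s = " 0xAhello 0xA0xAgoodbye";
        System.out.println(Arrays.toString(splitKeepLeading(s))); // [, hello, goodbye]
        System.out.println(Arrays.toString(splitTrimmed(s)));     // [hello, goodbye]
    }
}
```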

how to convert one line containing several sentences into lines according to dot(.) [duplicate]

I am wondering if I am going about splitting a string on a . the right way? My code is:
String[] fn = filename.split(".");
return fn[0];
I only need the first part of the string, that's why I return the first item. I ask because I noticed in the API that . means any character, so now I'm stuck.
split() accepts a regular expression, so you need to escape . to not consider it as a regex meta character. Here's an example :
String[] fn = filename.split("\\.");
return fn[0];
I see only solutions here but no full explanation of the problem so I decided to post this answer
Problem
You need to know a few things about text.split(delim). The split method:
accepts as its argument a regular expression (regex) that describes the delimiter on which we want to split,
if delim exists at the end of text, as in a,b,c,, (where the delimiter is ,), split will at first create an array like ["a" "b" "c" "" ""], but since in most cases we don't need these trailing empty strings, it removes them automatically. So it creates another array without these trailing empty strings and returns it.
You also need to know that the dot . is a special character in regex. It represents any character (except line separators, but this can be changed with the Pattern.DOTALL flag).
So for a string like "abc", if we split on ".", the split method will
create an array like ["" "" "" ""],
but since this array contains only empty strings and they are all trailing, they will be removed (as described in the second point above),
which means we get an empty array [] as the result (with no elements, not even an empty string), so we can't use fn[0] because there is no index 0.
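The behaviour described above is easy to verify. The following is a minimal sketch with a hypothetical class and helper name. It also uses the two-argument split(regex, limit) overload, where a negative limit keeps trailing empty strings.

```java
class DotSplitDemo {
    // Hypothetical helper: number of elements split() returns for the given regex.
    static int pieces(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(pieces("abc", "."));           // 0 (every character is a delimiter)
        System.out.println("abc".split(".", -1).length);  // 4 (trailing empty strings kept)
        System.out.println(pieces("abc", "\\."));         // 1 (no literal dot, whole string returned)
        System.out.println(pieces("a.b.c", "\\."));       // 3
    }
}
```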
Solution
To solve this problem you simply need to create a regex that represents a literal dot. To do so we need to escape the dot. There are a few ways to do it, but the simplest is probably to use \ (which in a String literal needs to be written as "\\" because \ is also special there and requires another \ to escape it).
So the solution to your problem may look like
String[] fn = filename.split("\\.");
Bonus
You can also use other ways to escape that dot like
using character class split("[.]")
wrapping it in quote split("\\Q.\\E")
using proper Pattern instance with Pattern.LITERAL flag
or simply use split(Pattern.quote(".")) and let regex do escaping for you.
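As a quick check that these alternatives behave the same, here is a sketch that runs all of them on one input. The class name, method name and sample file name are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapeDotDemo {
    // Splits the same input with each way of matching a literal dot.
    static String[][] allWays(String s) {
        return new String[][] {
            s.split("\\."),                               // backslash escape
            s.split("[.]"),                               // character class
            s.split("\\Q.\\E"),                           // \Q...\E quoting
            s.split(Pattern.quote(".")),                  // Pattern.quote builds the quoting
            Pattern.compile(".", Pattern.LITERAL).split(s) // LITERAL flag
        };
    }

    public static void main(String[] args) {
        for (String[] parts : allWays("a.b.txt")) {
            System.out.println(Arrays.toString(parts)); // [a, b, txt] each time
        }
    }
}
```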
Split uses regular expressions, where '.' is a special character meaning anything. You need to escape it if you actually want it to match the '.' character:
String[] fn = filename.split("\\.");
(one '\' to escape the '.' in the regular expression, and the other to escape the first one in the Java string)
Also I wouldn't suggest returning fn[0] since if you have a file named something.blabla.txt, which is a valid name you won't be returning the actual file name. Instead I think it's better if you use:
int idx = filename.lastIndexOf('.');
return filename.substring(0, idx);
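Note that lastIndexOf returns -1 when there is no dot at all, and substring(0, -1) throws an exception. A guarded variant could look like the sketch below. The class and method names are hypothetical, and treating a leading dot (as in ".bashrc") as part of the name is an assumption, not something the answer specifies.

```java
class BaseNameDemo {
    // Returns the file name without its last extension.
    static String stripExtension(String filename) {
        int idx = filename.lastIndexOf('.');
        // idx == -1: no dot at all; idx == 0: leading dot only (assumed to mean "no extension")
        return idx > 0 ? filename.substring(0, idx) : filename;
    }

    public static void main(String[] args) {
        System.out.println(stripExtension("something.blabla.txt")); // something.blabla
        System.out.println(stripExtension("README"));               // README
        System.out.println(stripExtension(".bashrc"));              // .bashrc
    }
}
```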
the String#split(String) method uses regular expressions.
In regular expressions, the "." character means "any character".
You can avoid this behavior by either escaping the "."
filename.split("\\.");
or telling the split method to split at a character class:
filename.split("[.]");
Character classes are collections of characters. You could write
filename.split("[-.;ld7]");
and filename would be split at every "-", ".", ";", "l", "d" or "7". Inside character classes, the "." is not a special character ("metacharacter").
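A short sketch of that character class in action follows. The class name, method name and sample inputs are made up for illustration.

```java
import java.util.Arrays;

class CharClassSplitDemo {
    // Splits at every "-", ".", ";", "l", "d" or "7".
    static String[] split(String s) {
        return s.split("[-.;ld7]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("x-y.z;w"))); // [x, y, z, w]
        // Two adjacent delimiters ("ll") leave an empty string between them.
        System.out.println(Arrays.toString(split("hello")));   // [he, , o]
    }
}
```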
As DOT( . ) is considered as a special character and split method of String expects a regular expression you need to do like this -
String[] fn = filename.split("\\.");
return fn[0];
In java the special characters need to be escaped with a "\" but since "\" is also a special character in Java, you need to escape it again with another "\" !
String str="1.2.3";
String[] cats = str.split(Pattern.quote("."));
Wouldn't it be more efficient to use
filename.substring(0, filename.indexOf("."))
if you only want what's up to the first dot?
Usually it's NOT a good idea to escape it by hand. There is a method in the Pattern class for this task:
java.util.regex.Pattern
static String quote(String s)
split takes a regex as its argument... Simply change "." to "\\."
The solution that worked for me is the following
String[] fn = filename.split("[.]");
Note: Further care should be taken with this snippet, even after the dot is escaped!
If filename is just the string ".", then fn will still end up to be of 0 length and fn[0] will still throw an exception!
This is, because if the pattern matches at least once, then split will discard all trailing empty strings (thus also the one before the dot!) from the array, leaving an empty array to be returned.
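One way to sidestep the empty-array case is the two-argument split(regex, limit) overload: a negative limit keeps trailing empty strings, so index 0 always exists. This is only a sketch, and the class and method names are hypothetical.

```java
class SingleDotDemo {
    // With limit -1 the result always has at least one element.
    static String firstPart(String filename) {
        return filename.split("\\.", -1)[0];
    }

    public static void main(String[] args) {
        System.out.println(".".split("\\.").length);        // 0 (fn[0] would throw here)
        System.out.println(firstPart(".").isEmpty());       // true (empty string, no exception)
        System.out.println(firstPart("report.txt"));        // report
    }
}
```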
Using ApacheCommons it's simplest:
File file = ...
FilenameUtils.getBaseName(file.getName());
Note, it also extracts a filename from full path.
split takes a regex as its argument, so you should pass "\\." instead of "." because "." is a metacharacter in regex.

how to ignore newlines for split function

I am splitting the string using ^ char. The String which I am reading, is coming from some external source. This string contains some \n characters.
The string may look like:
Hi hello^There\nhow are\nyou doing^9987678867abc^popup
when I am splitting like below, why the array length is coming as 2 instead of 4:
String[] st = msg[0].split("^");
st.length //giving "2" instead of "4"
It look like, split is ignoring after \n.
How can I fix it without replacing \n to some other character.
the string parameter for split is interpreted as regular expression.
So you have to escape the char and use:
msg[0].split("\\^")
see this answer for more details
Escape the ^ character. Use msg[0].split("\\^") instead.
String.split considers its argument as regular expression. And as ^ has a special meaning when it comes to regular expressions, you need to escape it to use its literal representation.
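A minimal sketch of the escaped version, assuming the input contains real newline characters as in the question's sample. The class and method names are hypothetical.

```java
class CaretSplitDemo {
    // Splits on a literal ^; newlines inside a field are left untouched.
    static String[] fields(String msg) {
        return msg.split("\\^");
    }

    public static void main(String[] args) {
        String msg = "Hi hello^There\nhow are\nyou doing^9987678867abc^popup";
        String[] st = fields(msg);
        System.out.println(st.length); // 4
        System.out.println(st[3]);     // popup
    }
}
```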
If you want to split by ^ only, then
String[] st = msg[0].split("\\^");
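If I read your question correctly, you want to split by both ^ and \n, so this would suffice:
String[] st = msg[0].split("\\^|\\\\n");
This assumes that \n literally exists as two characters (a backslash followed by n) in the string. If the string contains real newline characters instead, use "\\^|\n".
The JDK treats "^" as a regular expression, where it is an anchor that matches the start of the input rather than a literal caret.
To avoid this confusion, modify the code as below: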
old code = msg[0].split("^")
new code = msg[0].split("\\^")
