Splitting a string in Java using multiple delimiters - java

I have a string like
String myString = "hello world~~hello~~world";
I am using the split method like this
String[] temp = myString.split("~|~~|~~~");
I want the array temp to contain only the strings separated by ~, ~~ or ~~~.
However, the temp array thus created has length 5, the 2 additional 'strings' being empty strings.
I want it to contain ONLY my non-empty strings. Please help. Thank you!

You should use a quantifier on the character:
String[] temp = myString.split("~+");
String#split() takes a regex. ~+ will match 1 or more ~, so it will split on ~, or ~~, or ~~~, and so on.
Also, if you just want to split on ~, ~~, or ~~~, then you can limit the repetition by using the {m,n} quantifier, which matches a pattern from m to n times:
String[] temp = myString.split("~{1,3}");
When you split it the way you are doing, the ~ alternative is tried first, so a~~b is split twice on ~, and the middle element is an empty string.
You could also have solved the problem by reversing the order of your alternatives like this:
String[] temp = myString.split("~~~|~~|~");
That will first try to split on ~~~, then on ~~, and only then on ~, and will work fine. But you should use the first approach.
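A minimal sketch comparing the original pattern with the quantifier version, using the string from the question. The class name TildeSplitDemo and the helper splitOnTildes are made up for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original question.
final class TildeSplitDemo {
    // Splits on runs of one or more '~' characters.
    static String[] splitOnTildes(String s) {
        return s.split("~+");
    }

    public static void main(String[] args) {
        String myString = "hello world~~hello~~world";

        // Alternation tries its branches left to right, so "~" always wins and
        // each "~~" is consumed as two separate one-character delimiters.
        System.out.println(Arrays.toString(myString.split("~|~~|~~~")));
        // [hello world, , hello, , world]

        // The quantifier treats each run of '~' as a single delimiter.
        System.out.println(Arrays.toString(splitOnTildes(myString)));
        // [hello world, hello, world]
    }
}
```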

Just turn the pattern around:
String myString = "hello world~~hello~~world";
String[] temp = myString.split("~~~|~~|~");

Try this:
myString.split("~~~|~~|~");
It will definitely work. In your code, the ~ alternative is tried first, so the first ~ that occurs is taken as a separator and the string is split at that point. The ~~ and ~~~ alternatives therefore never get a chance to match, even though those sequences are in your string. Like this:
[hello world]~[]~[hello]~[]~[world]
The square brackets mark the 5 separate string values produced by the split.

Related

How to remove white-space at the front of a string using string split in java?

I was solving a problem on string.split(), but I couldn't find the solution to remove white-space at the start. Actually the problem is in the regular expression. It needs to be changed. I tried to solve by changing the expression several times but didn't work out. What will be the regular expression? Here is the code given below:
String s = " YES leading spaces are valid, problemsetters are evillllll";
String delims = "[\\s._,?!'#\\t]+";
String[] words = s.split(delims);
System.out.println(words.length - 1);
for(String w:words) {
System.out.println(w);
}
System.out.println(Arrays.toString(words));
output:
8
YES
leading
spaces
are
valid
problemsetters
are
evillllll
[, YES, leading, spaces, are, valid, problemsetters, are, evillllll]
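You can use the String class's trim() method to remove the leading space before calling split. Note that trim() only strips whitespace, so a string that starts with one of the other delimiters would still produce an empty first element.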
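As a rough sketch of the trim-then-split idea, assuming the same delimiter set as in the question (the redundant \\t is dropped because \\s already covers tabs). The class LeadingSpaceDemo and the method words are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical demo class illustrating trim() before split().
final class LeadingSpaceDemo {
    // Delimiter class from the question; "\\s" already matches tabs.
    private static final String DELIMS = "[\\s._,?!'#]+";

    static String[] words(String s) {
        String trimmed = s.trim();
        // Splitting an empty string yields one empty element, so handle it here.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split(DELIMS);
    }

    public static void main(String[] args) {
        String s = " YES leading spaces are valid, problemsetters are evillllll";
        String[] words = words(s);
        System.out.println(words.length);
        // 8
        System.out.println(Arrays.toString(words));
        // [YES, leading, spaces, are, valid, problemsetters, are, evillllll]
    }
}
```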

Splitting string in java produces empty element first

I'm trying to split a string that contains single or multiple occurrences of "O", where all other characters are dots. I'm wondering why this produces an empty string first.
String row = ".....O.O.O";
String[] arr = row.split("\\.+");
This produces:
["", "O", "O", "O"]
You just need to make sure that any trailing or leading dots are removed.
So one solution is:
row.replaceAll("^\\.+|\\.+$", "").split("\\.+");
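For this pattern you can use the replaceFirst() method to drop the leading dots and then split on dots. Anchoring the pattern with ^ makes sure only leading dots are removed:
String[] arr = row.replaceFirst("^\\.+","").split("\\.+");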
Output will be
["O","O","O"]
The "+" quantifier collapses each run of separators into a single delimiter, so what your split is essentially doing is splitting the following string on ".":
.O.O.O
The string starts with a separator, which means that your first field is empty. Hence the result you get.
To avoid this, strip all leading separators from the string before splitting it. Rather than type some examples on how to do this, here's a thread with a few suggestions.
Java - Trim leading or trailing characters from a string?
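One way to strip the leading separators before splitting could look like the sketch below. The class StripDotsDemo and the method splitOnDots are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: remove leading dots, then split on runs of dots.
final class StripDotsDemo {
    static String[] splitOnDots(String row) {
        // "^\\.+" only matches dots at the very start of the string.
        String stripped = row.replaceFirst("^\\.+", "");
        // Trailing empty strings are dropped by split() by default.
        return stripped.split("\\.+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(".....O.O.O".split("\\.+")));
        // [, O, O, O]
        System.out.println(Arrays.toString(splitOnDots(".....O.O.O")));
        // [O, O, O]
    }
}
```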

How to select string until a certain character in java

I have a string where only a certain part should be selected: everything up to a certain character.
E.g. for 5000 - 10000 I want only 5000, i.e. everything before the - or the whitespace.
input.replace("","");
What regular expression should I be using?
Something like this:
final String beforeDash = input.split("-")[0].trim();
This should solve your problem:
String[] parts = input.split("-");
The string you are looking for is then in parts[0].
If you want to split on the whitespace instead of the dash, use input.split(" ").
You could try the code below, which matches from the first space or - up to the last character. Replacing the matched characters with an empty string will give you the desired output.
input.replaceAll("[\\s-].*","");
You could also use string.split function.
String[] parts = input.split("[\\s-]");
System.out.println(parts[0]);
The split call above splits the input on a space or a hyphen. Printing index 0 of the resulting parts will give you the desired output.
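The split-based approach can be wrapped in a small helper, sketched below. The class PrefixDemo and the method beforeDelimiter are hypothetical names; passing a limit of 2 is an addition here so that the input is cut at most once.

```java
// Hypothetical demo class: take everything before the first space or hyphen.
final class PrefixDemo {
    static String beforeDelimiter(String input) {
        // With a limit of 2 the string is split at the first delimiter only.
        return input.split("[\\s-]", 2)[0];
    }

    public static void main(String[] args) {
        System.out.println(beforeDelimiter("5000 - 10000")); // 5000
        System.out.println(beforeDelimiter("5000-10000"));   // 5000
    }
}
```

If the input contains no space or hyphen, the whole string is returned unchanged.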

Java (Regex) - Get all words in a sentence

I need to split a java string into an array of words. Let's say the string is:
"Hi!! I need to split this string, into a serie's of words?!"
At the moment I have tried using String[] strs = str.split("(?!\\w)"), however it keeps symbols such as ! in the array, and it also keeps strings like "Hi!". The string I am splitting will always be lowercase. What I would like is for an array to be produced that looks like:
{"hi", "i", "need", "to", "split", "this", "string", "into", "a", "serie's", "of", "words"} - Note the apostrophe is kept.
How could I change my regex to not include the symbols in the array?
Apologies, I would define a word as a sequence of alphanumeric characters only but with the ' character inclusive if it is in the above context such as "it's", not if it is used to a quote a word such as "'its'". Also, in this context "hi," or "hi-person" are not words but "hi" and "person" are. I hope that clarifies the question.
You can remove all the !, ? and , symbols and then split into words:
str = str.replaceAll("[!?,]", "");
String[] words = str.split("\\s+");
Result:
Hi, I, need, to, split, this, string, into, a, serie's, of, words
Should work for what you want.
String line = "Hi!! I need to split this string, into a serie's of words?! but not '' or ''' word";
String regex = "([^a-zA-Z']+)'*\\1*";
String[] split = line.split(regex);
System.out.println(Arrays.asList(split));
Gives
[Hi, I, need, to, split, this, string, into, a, serie's, of, words, but, not, or, word]
If you define a word as a sequence of non-whitespace characters (whitespace character as defined by \s), then you can split along space characters:
str.split("\\s+")
Note that ";.';.##$>?>#4", "very,bad,punctuation", and "'goodbye'" are words under the definition above.
Then the other approach is to define a word as a sequence of characters from a set of allowed characters. If you want to allow a-z, A-Z, and ' as part of a word, you can split along everything else:
str.split("[^a-zA-Z']+")
This will still allow "''''''" to be defined as a word, though.
So what you want is to split on anything that is not a word character [a-zA-Z] and is not a '.
This regex will do that: "[^a-zA-Z']+"
There will be a problem if the string contains a word that is quoted with ', because the quote marks stay attached to the word.
I usually use this page for testing my regexes:
http://www.regexplanet.com/advanced/java/index.html
I would use str.split("[\\s,?!]+"). You can add whatever character you want to split with inside the brackets [].
You could filter out the characters you deem as "non-word" characters:
String[] strs = str.split("[,!? ]+");
myString.replaceAll("[^a-zA-Z'\\s]","").toLowerCase().split("\\s+");
The replaceAll("[^a-zA-Z'\\s]","") call replaces every character that is not a-z, A-Z, ' or whitespace with nothing (""), and then toLowerCase makes all the characters returned by replaceAll lower case. Finally, the string is split on whitespace. A more readable version:
myString = myString.replaceAll("[^a-zA-Z'\\s]","");
myString = myString.toLowerCase();
String[] strArr = myString.split("\\s+");
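An alternative to splitting, sketched here, is to match the words directly with Pattern and Matcher. This is a different technique from the answers above, and it assumes the asker's definition of a word: alphanumeric characters, with an apostrophe allowed only between them. The class WordExtractor and its pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects words by matching instead of splitting.
final class WordExtractor {
    // One or more alphanumerics, optionally followed by groups of
    // an apostrophe plus more alphanumerics (e.g. "serie's", "it's").
    private static final Pattern WORD = Pattern.compile("[a-z0-9]+(?:'[a-z0-9]+)*");

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hi!! I need to split this string, into a serie's of words?!"));
        // [hi, i, need, to, split, this, string, into, a, serie's, of, words]
    }
}
```

Because matching never produces empty strings and leaves out apostrophes used as surrounding quotes, "'its'" yields its and "hi-person" yields hi and person.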

Java Split on Regex

I can't seem to split on a simple regex.
If I have a string [data, data2] and I attempt to split it like so (I tried to escape the brackets):
String regex = "\\[,\\]";
String[] notifySplit = notifyWho.split(regex);
The output of looping through notifySplit shows that this regex is not working:
notify: [Everyone, Teachers only]
Any help on what the proper regex is? I am expecting an array like so:
data, data2
where I could possibly ignore these two characters [ ,
First, you don't want to split on the brackets. You just want to exclude them from your end result. So first thing you'll probably want to do is strip those out:
notifyWho = notifyWho.replace("[", "").replace("]", "");
Then you can do a basic split on the comma:
String[] notifySplit = notifyWho.split(",");
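I would do it in one line, first removing the square brackets, then splitting. Both brackets are escaped inside the character class, because Java treats an unescaped [ there as the start of a nested class:
String[] notifySplit = notifyWho.replaceAll("[\\[\\]]", "").split(",");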
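A small sketch of the strip-then-split approach on the sample value from the question. Splitting on a comma with optional surrounding whitespace is an addition here, so that the elements come back without a leading space; the class BracketListDemo and the method parseList are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical demo class: drop the brackets, then split on commas.
final class BracketListDemo {
    static String[] parseList(String s) {
        // String.replace works on literal text, so no regex escaping is needed.
        String inner = s.replace("[", "").replace("]", "").trim();
        // Allow optional whitespace around each comma.
        return inner.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseList("[Everyone, Teachers only]")));
        // [Everyone, Teachers only]
        System.out.println(parseList("[Everyone, Teachers only]").length);
        // 2
    }
}
```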
