I have a string that should be split on "." (dot) and " " (space). I have tried:
s.split("[\\s\\.]")
but it doesn't work: a string such as "123 456 . 11323 1" is not split the way I expect.
How should I change my regular expression?
I think what you want is this:
s.split("[\\s\\.]+");
Note the +. You don't want to split on every single occurrence of a whitespace character or a dot; you want to treat any run of whitespace and dots, of any length, as one separator. That's why you have to match greedily, taking as many of those characters as possible.
Simply use "[\\s.]+" as the regex (inside a character class the dot does not need to be escaped).
You will get a lot of empty strings if you only split on a single character.
s.split("[\\s\\.]+")
will produce "123", "456", "11323", "1".
The + causes it to treat any run of spaces and dots as a single break instead of returning a string between adjacent spaces and dots.
You might still get an empty string at the start of your results, since given " 123" it will split between the start of the string and "123". Trailing empty strings are not a problem, because split drops them by default.
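As a rough sketch of the difference the + makes, here is the string from the question split both ways. The class and method names are made up for illustration only.

```java
import java.util.Arrays;

class DotSpaceSplitDemo {
    // Hypothetical helper: splits on every single whitespace character or dot.
    static String[] splitOnEach(String s) {
        return s.split("[\\s.]");
    }

    // Hypothetical helper: treats any run of whitespace and dots as one separator.
    static String[] splitOnRuns(String s) {
        return s.split("[\\s.]+");
    }

    public static void main(String[] args) {
        String s = "123 456 . 11323 1";
        System.out.println(Arrays.toString(splitOnEach(s)));      // [123, 456, , , 11323, 1]
        System.out.println(Arrays.toString(splitOnRuns(s)));      // [123, 456, 11323, 1]
        // A separator at the very start still yields one leading empty string.
        System.out.println(Arrays.toString(splitOnRuns(" 123"))); // [, 123]
    }
}
```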
Here is what the program is expected to output:
if originalString = "CATCATICATAMCATCATGREATCATCAT";
Output should be "I AM GREAT".
The code must find the sequence of characters (CAT in this case), and remove them. Plus, the resulting String must have spaces in between words.
String origString = remixString.replace("CAT", "");
I figured out I have to use String.replace, but what would be the logic for finding the parts that are not CAT and producing the resulting string with spaces between the words?
First off, you want to introduce spaces, so instead of an empty String, replace "CAT" with " " (space). Use the replaceAll method for this, because it accepts a regular expression.
As pointed out in the comments, replacing each "CAT" separately can leave multiple spaces between words, so we use a regular expression to replace each run of consecutive "CAT"s with a single space. The '+' symbol means "one or more".
Finally, trim the String to get rid of leading and trailing white space.
remixString.replaceAll("(CAT)+", " ").trim()
You can use replaceAll which accepts a regular expression:
String remixString = "CATCATICATAMCATCATGREATCATCAT";
String origString = remixString.replaceAll("(CAT)+", " ").trim();
Note: the naming of replace and replaceAll is very confusing. They both replace all instances of the matching string; the difference is that replace takes a literal text as an argument, while replaceAll takes a regular expression.
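To make the difference concrete, here is a small sketch comparing the two methods on the string from the question. The class name and the decode helper are hypothetical, chosen only for this example.

```java
class RemixDemo {
    // Hypothetical helper: collapses each run of "CAT" into one space, then trims the ends.
    static String decode(String remix) {
        return remix.replaceAll("(CAT)+", " ").trim();
    }

    public static void main(String[] args) {
        String remix = "CATCATICATAMCATCATGREATCATCAT";
        // replace takes literal text and still replaces every occurrence,
        // so each CAT becomes its own space.
        System.out.println("[" + remix.replace("CAT", " ") + "]"); // [  I AM  GREAT  ]
        // replaceAll takes a regex, so a whole run of CATs can become one space.
        System.out.println("[" + decode(remix) + "]");             // [I AM GREAT]
    }
}
```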
Maybe this will help
String result = remixString.replaceAll("(CAT){1,}", " ");
I'm trying to split a string, however, I'm not getting the expected output.
String one = "hello 0xA0xAgoodbye";
String two[] = one.split(" |0xA");
System.out.println(Arrays.toString(two));
Expected output: [hello, goodbye]
What I got: [hello, , , goodbye]
Why is this happening and how can I fix it?
Thanks in advance! ^-^
If you'd like to treat consecutive delimiters as one, you could modify your regex as follows:
"( |0xA)+"
This means "a space or the string "0xA", repeated one or more times".
(\\s|0xA)+ matches one or more whitespace characters or occurrences of 0xA, so the string is split once on each such run.
This result is caused by multiple consecutive matches in the string. You may wrap the pattern in a grouping construct and apply a + quantifier to it so that consecutive delimiters are matched as one:
String one = "hello 0xA0xAgoodbye";
String two[] = one.split("(?:\\s|0xA)+");
System.out.println(Arrays.toString(two));
The (?:\s|0xA)+ regex matches one or more whitespace characters or literal 0xA character sequences.
However, you will still get an empty string as the first item in the resulting array if 0xA or whitespace appears at the start of the string. In that case, remove the leading delimiters first:
String two[] = one.replaceFirst("^(?:\\s|0xA)+", "").split("(?:\\s|0xA)+");
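A minimal sketch of the leading-delimiter case, assuming the same delimiter pattern as above. The class name, the DELIMS constant and the splitWords helper are invented for this example.

```java
import java.util.Arrays;

class HexDelimiterSplitDemo {
    // One or more whitespace characters or literal "0xA" sequences.
    static final String DELIMS = "(?:\\s|0xA)+";

    // Hypothetical helper: strips leading delimiters, then splits on delimiter runs.
    static String[] splitWords(String s) {
        return s.replaceFirst("^" + DELIMS, "").split(DELIMS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("hello 0xA0xAgoodbye".split(DELIMS)));  // [hello, goodbye]
        // A delimiter at the start produces a leading empty string...
        System.out.println(Arrays.toString("0xA hello 0xAgoodbye".split(DELIMS))); // [, hello, goodbye]
        // ...unless the leading delimiters are removed before splitting.
        System.out.println(Arrays.toString(splitWords("0xA hello 0xAgoodbye")));   // [hello, goodbye]
    }
}
```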
I have written a regex to omit everything from the first occurrence of certain characters ("," and "#") onwards:
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", ""); //This is the 1st regex
Then a second regex to keep only the digits (removing spaces and other non-numeric characters):
number = number.replaceAll("[^0-9]+", ""); //This is the 2nd regex
Output: 1234567890
How can I merge the two regexes into one, like piping the output of the first regex into the second?
You can chain the two calls in the following way.
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
So you need to remove every character that is not a digit, plus the whole rest of the string starting at the first hash or comma.
You cannot just concatenate the two patterns with the | operator, because one of them is implicitly anchored at the end of the string.
The character class must exclude hashes and commas as well as digits: the regex engine processes the string from left to right, and a plain [^0-9]+ would consume the hash or comma before the other alternative could match it together with the text after it. Then you can add the alternative that matches a comma or hash with any text after it. Use the DOTALL modifier, (?s), in case you have newline characters in your input.
Use
(?s)[,#].*$|[^#,0-9]+
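As a quick check, this pattern can be dropped into a single replaceAll call. The sketch below uses the input from the question; the class and method names are hypothetical.

```java
class PhoneDigitsDemo {
    // Hypothetical helper: drops everything from the first ',' or '#' onwards,
    // and every other non-digit character before that point.
    static String digitsBeforeExtension(String s) {
        return s.replaceAll("(?s)[,#].*$|[^#,0-9]+", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsBeforeExtension("(123) (456) (7890)#123")); // 1234567890
        System.out.println(digitsBeforeExtension("12,34"));                  // 12
    }
}
```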
I have a string here:
javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello
I want to remove everything before the ":", including the ":" itself. This would leave only "Hello". I read about regex, but no combination I tried worked. Can someone tell me how to do it. Thanks in advance!
You need to use the replaceAll or replaceFirst method.
string.replaceFirst(".*:\\s*", "");
or
string.replaceAll(".*:\\s*", "");
This would give you only Hello. If you remove the \\s* part, it would give you <space>Hello instead.
.* Matches any character zero or more times, greedily.
: Matches the colon. Because .* is greedy, this is the last colon on the line.
\\s* Matches zero or more whitespace characters.
You could also just split the string on : and take the second element, like this:
String sample = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
System.out.println(sample.split(":", -1)[1]);
This will output
<space>Hello
If you want to get rid of that leading space, just trim it off like this:
System.out.println(sample.split(":", -1)[1].trim());
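The two approaches agree on the sample string, but they differ when the input contains more than one colon. A small sketch, with made-up class and method names, that shows both cases:

```java
class AfterColonDemo {
    // Hypothetical helper: the greedy .* runs up to the LAST colon on the line.
    static String afterLastColon(String s) {
        return s.replaceFirst(".*:\\s*", "");
    }

    // Hypothetical helper: takes the text between the first and second colon.
    static String secondField(String s) {
        return s.split(":", -1)[1].trim();
    }

    public static void main(String[] args) {
        String sample = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
        System.out.println(afterLastColon(sample)); // Hello
        System.out.println(secondField(sample));    // Hello

        // With several colons the results differ.
        System.out.println(afterLastColon("a: b: c")); // c
        System.out.println(secondField("a: b: c"));    // b
    }
}
```

Note that secondField assumes the input contains at least one colon; without one, indexing [1] throws an ArrayIndexOutOfBoundsException.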
I want to split a string, treating every non-alphabetic character as a separator.
String[] word_list = line.split("[^a-zA-Z]");
But with the following input
11:11 Hello World
word_list contains many empty strings before "Hello" and "World".
Please tell me why. Thank you.
Because your regular expression matches each individual non-alpha character. It would be like separating
",,,,,,Hello,World"
on commas.
You will want an expression that matches an entire sequence of non-alpha characters at once such as:
line.split("[^a-zA-Z][^a-zA-Z]*")
I still think you will get one leading empty string with your example since it would be like separating ",Hello,World" if comma were your separator.
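A short sketch confirming that guess, using the equivalent pattern [^a-zA-Z]+ (the class and method names are hypothetical):

```java
import java.util.Arrays;

class NonLetterRunSplitDemo {
    // Hypothetical helper: splits on runs of non-letter characters.
    // "[^a-zA-Z]+" is equivalent to "[^a-zA-Z][^a-zA-Z]*".
    static String[] splitOnNonLetterRuns(String line) {
        return line.split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        // The leading "11:11 " is one separator, so exactly one empty string comes first.
        System.out.println(Arrays.toString(splitOnNonLetterRuns("11:11 Hello World"))); // [, Hello, World]
    }
}
```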
Here's your string, where each ^ character shows a match for [^a-zA-Z]:
11:11 Hello World
^^^^^^     ^
The split method finds each of these matches and basically returns all substrings around the ^ characters: the one before the first match and the one between each adjacent pair. Since there are six matches before any useful data, you end up with six empty substrings before you get the string "Hello".
To prevent this, you can manually filter the result to ignore any empty strings.
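One possible way to do that filtering, sketched with Java 8 streams. The class and method names are invented for illustration, and a plain loop that skips empty strings would work just as well.

```java
import java.util.Arrays;

class EmptyTokenFilterDemo {
    // Hypothetical helper: splits on every non-letter, then drops the empty strings.
    static String[] nonEmptyTokens(String line) {
        return Arrays.stream(line.split("[^a-zA-Z]"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nonEmptyTokens("11:11 Hello World"))); // [Hello, World]
    }
}
```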
Will the following do?
String[] word_list = line.replaceAll("[^a-zA-Z ]","").replaceAll(" +", " ").trim().split("[^a-zA-Z]");
What I am doing here is removing every character that is neither a letter nor a space, then collapsing multiple spaces into a single space and trimming the ends, all before doing the split.