String.split() not working as intended - java

I'm trying to split a string, however, I'm not getting the expected output.
String one = "hello 0xA0xAgoodbye";
String two[] = one.split(" |0xA");
System.out.println(Arrays.toString(two));
Expected output: [hello, goodbye]
What I got: [hello, , , goodbye]
Why is this happening and how can I fix it?
Thanks in advance! ^-^

If you'd like to treat consecutive delimiters as one, you could modify your regex as follows:
"( |0xA)+"
This means "a space or the string "0xA", repeated one or more times".

(\\s|0xA)+ matches one or more consecutive whitespace characters or occurrences of 0xA, so the string is split once per run of delimiters.

This result is caused by multiple consecutive matches in the string. You may wrap the pattern with a grouping construct and apply a + quantifier to it to match multiple matches:
String one = "hello 0xA0xAgoodbye";
String two[] = one.split("(?:\\s|0xA)+");
System.out.println(Arrays.toString(two));
The (?:\s|0xA)+ regex matches one or more whitespace characters or literal 0xA character sequences.
However, you will still get an empty value as the first item in the resulting array if the 0xA or whitespaces appear at the start of the string. Then, you will have to remove them first:
String two[] = one.replaceFirst("^(?:\\s|0xA)+", "").split("(?:\\s+|0xA)+");

Related

Replace a nth character using regex in Java

I'm trying to learn regex in Java.
So far, I've been trying some little mini challenges and I'm wondering if there is a way to define a nth character.
For instance, let's say I have this string: todayiwasnotagoodday
If I want to replace the third (or fourth, or seventh) character, how can I define a regex that changes a specific "index"? In this example, I want to replace the 'd' with an empty space "".
I've been searching about it, but so far my implementations match from the first element to the third: ^[a-z]{3}
Is it possible to define this regex?
Thanks in advance.
If you want to replace the third character with a space via regex, you could try a regex replace all:
String input = "todayiwasnotagoodday";
String output = input.replaceAll("^(.{2}).(.*)$", "$1 $2");
System.out.println(output); // to ayiwasnotagoodday
Note that you could also avoid regex here, and just use substring operations:
String output = input.substring(0, 2) + " " + input.substring(3);
System.out.println(output); // to ayiwasnotagoodday
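Both approaches generalize to any position. Here is a minimal sketch, assuming a 1-based position n; the class name NthCharDemo and the helpers replaceNth and replaceNthRegex are made up for illustration:

```java
import java.util.regex.Matcher;

class NthCharDemo {
    // Hypothetical helper: replaces the n-th character (1-based) without regex.
    static String replaceNth(String input, int n, char replacement) {
        if (n < 1 || n > input.length()) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        StringBuilder sb = new StringBuilder(input);
        sb.setCharAt(n - 1, replacement);
        return sb.toString();
    }

    // Regex variant: capture the first n-1 characters, then match the n-th one.
    // Note that '.' does not match line terminators by default.
    static String replaceNthRegex(String input, int n, char replacement) {
        String regex = "^(.{" + (n - 1) + "}).";
        String quoted = Matcher.quoteReplacement(String.valueOf(replacement));
        return input.replaceFirst(regex, "$1" + quoted);
    }

    public static void main(String[] args) {
        String input = "todayiwasnotagoodday";
        System.out.println(replaceNth(input, 3, ' '));      // to ayiwasnotagoodday
        System.out.println(replaceNthRegex(input, 3, ' ')); // to ayiwasnotagoodday
    }
}
```

The StringBuilder version is usually easier to read; the regex version is mainly useful if you are practicing regex.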

How to check and replace a sequence of characters in a String?

Here is what the program expects as output:
if originalString = "CATCATICATAMCATCATGREATCATCAT";
Output should be "I AM GREAT".
The code must find the sequence of characters (CAT in this case), and remove them. Plus, the resulting String must have spaces in between words.
String origString = remixString.replace("CAT", "");
I figured out that I have to use String.replace, but what is the logic for finding the parts that are not CAT and producing the resulting string with spaces in between the words?
First off, note that replace already replaces every occurrence of "CAT" within the String, so nothing is missed. You want to introduce spaces, though, so instead of an empty String, replace "CAT" with " " (space).
As pointed out by the comment below, there might be multiple spaces between words - so we use replaceAll with a regular expression to replace each run of "CAT" with a single space. The '+' symbol means "one or more".
Finally, trim the String to get rid of leading and trailing white space.
remixString.replaceAll("(CAT)+", " ").trim()
You can use replaceAll which accepts a regular expression:
String remixString = "CATCATICATAMCATCATGREATCATCAT";
String origString = remixString.replaceAll("(CAT)+", " ").trim();
Note: the naming of replace and replaceAll is very confusing. They both replace all instances of the matching string; the difference is that replace takes a literal text as an argument, while replaceAll takes a regular expression.
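To make the difference concrete, here is a small sketch using the string from the question. The class name ReplaceVsReplaceAll and the helper names literal and regex are just for illustration:

```java
class ReplaceVsReplaceAll {
    // Literal replacement: every "CAT" becomes one space,
    // so a run of CATs leaves a run of spaces.
    static String literal(String s) {
        return s.replace("CAT", " ");
    }

    // Regex replacement: each run of one or more "CAT" collapses to one space.
    static String regex(String s) {
        return s.replaceAll("(CAT)+", " ");
    }

    public static void main(String[] args) {
        String remixString = "CATCATICATAMCATCATGREATCATCAT";
        System.out.println("[" + literal(remixString) + "]");      // [  I AM  GREAT  ]
        System.out.println("[" + regex(remixString) + "]");        // [ I AM GREAT ]
        System.out.println("[" + regex(remixString).trim() + "]"); // [I AM GREAT]
        // replace does not interpret regex syntax: the literal text "(CAT)+"
        // never occurs in the input, so nothing changes.
        System.out.println(remixString.replace("(CAT)+", " ").equals(remixString)); // true
    }
}
```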
Maybe this will help
String result = remixString.replaceAll("(CAT){1,}", " ");

Split a string in Java containing (

I am trying to split a string in Java.
For example
Hello (1234)
The part after ( must not be included in the string.
I am expecting the following:
Hello
How would you do it?
Just split on zero or more spaces followed by a ( character, then take the value at index 0 of the split parts.
"Hello (1234)".split("\\s*\\(")[0];
or
"Hello (1234)".split("\\s+")[0];
You can replace the contents in the parenthesis by nothing.
String str = "Hello(1234)";
String result = str.replaceAll("\\(.*\\)", "");
System.out.println(result);
You mention split operation in the question, but you say
The part after ( must not be included in the string. I am expecting
the following:
So I'm assuming you are discarding the (1234) ? If you need to save it, consider the other answer (using split)
You may try the following regex
String[] split = "Hello (1234)".split("\\s*\\([\\d]*\\)*");
System.out.println(split[0]);
\\s* matches optional whitespace before the parenthesis
\\( ( or left parenthesis has a special meaning in regex, so \\( matches only a literal (
The same applies to ) or right parenthesis: \\)
\\d* matches any digits that may be present
You may use + instead of * if that character must be present at least once

How to concatenate several strings with different format and then split them

Hi all.
I want to concatenate some strings that have no specified format in Java. For example, I want to concatenate multiple objects, such as a signature, a BigInteger, and a string, all of which are converted to strings. I cannot use a fixed delimiter, because any delimiter may already exist in these strings. How can I concatenate these strings and then split them again?
thanks all.
Use a well-defined format, like XML or JSON. Or choose a delimiter and escape every instance of this delimiter in each of the Strings. Or prepend the length of each part in the message. For example:
10/7/14-<10 chars of signature><7 chars of BigInteger><14 chars of string>
or
10-<10 chars of signature>7-<7 chars of BigInteger>14-<14 chars of string>
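The second layout (each part prefixed with its own length) could be sketched as follows. This is only an illustration under some assumptions: lengths are counted with String.length() (UTF-16 code units), '-' ends each length prefix, and the names LengthPrefixed, join and split are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class LengthPrefixed {
    // Encodes each part as "<length>-<part>" and concatenates the results.
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p.length()).append('-').append(p);
        }
        return sb.toString();
    }

    // Reverses join: read the digits up to '-', then take exactly that many characters.
    // The payload may contain '-' or digits, because it is never scanned for delimiters.
    static List<String> split(String encoded) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < encoded.length()) {
            int dash = encoded.indexOf('-', pos);
            if (dash < 0) {
                throw new IllegalArgumentException("missing length prefix at " + pos);
            }
            int len = Integer.parseInt(encoded.substring(pos, dash));
            int start = dash + 1;
            if (start + len > encoded.length()) {
                throw new IllegalArgumentException("truncated part at " + start);
            }
            parts.add(encoded.substring(start, start + len));
            pos = start + len;
        }
        return parts;
    }
}
```

For example, joining "abc;def" and "12345:" gives 7-abc;def6-12345: and splitting that string returns the two original parts, whatever characters they contain.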
You can escape the delimiter in your string. For example, let's say you have the following strings:
String a = "abc;def";
String b = "12345:";
String c = "99;red:balloons";
You want to be able to do something like this
String concat = a + delim + b + delim + c;
String[] tokens = concat.split(delim);
But if our delim is ";" then quite clearly this will not suffice, as we will have 5 tokens, and not 3. We could use a set of possible delimiters, search the strings for those delimiters, and then use the first one that isn't in the target strings, but this has two problems. First, how do we know which delimiter was used? Second, what if all delimiters exist in the strings? That's not a valid solution, and it's certainly not robust.
We can get around this by using an escape delimiter. Let us use ":" as our escape delimiter. We can use it to say "The next character is just a regular old character, it doesn't mean anything important."
So if we did this:
String aEscaped = a.replace(";",":;");
String bEscaped = b.replace(";",":;");
String cEscaped = c.replace(";",":;");
Then, we can split the concat'd string like
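String[] tokens = concat.split("(?<!:);"); // split on ";" not preceded by ":" (a lookbehind, so the preceding character is not consumed)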
But there is one problem: What if our text actually contains ":;" or ends with ":"? Either way, these will produce false positives. In this case, we must also escape our escape character. It basically says the same thing as before: "The next character does nothing special."
So now our escaped strings become:
// note we escape our escape token first, otherwise we'll escape
// real usages of the token
String aEscaped = a.replace(":","::").replace(";",":;");
String bEscaped = b.replace(":","::").replace(";",":;");
String cEscaped = c.replace(":","::").replace(";",":;");
And now, we must account for this in the regex. If someone knows a regex that works for this, they can feel free to edit it in. What occurs to me is something like concat.split("(::;|[^:];)") but it doesn't seem to get the job done. The job of parsing it would be pretty easy. I threw together a small test driver for it, and it seems to work just fine.
Code found at http://ideone.com/wUlyz
Result:
abc;def becomes abc:;def
ja:3fr becomes ja::3fr
; becomes :;
becomes
: becomes ::
83;:;:;;;; becomes 83:;:::;:::;:;:;:;
:; becomes :::;
Final product:
abc:;def;ja::3fr;:;;;::;83:;:::;:::;:;:;:;;:::;
Expected 'abc;def', Actual 'abc;def', Matches true
Expected 'ja:3fr', Actual 'ja:3fr', Matches true
Expected ';', Actual ';', Matches true
Expected '', Actual '', Matches true
Expected ':', Actual ':', Matches true
Expected '83;:;:;;;;', Actual '83;:;:;;;;', Matches true
Expected ':;', Actual ':;', Matches true
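Since the linked code is not reproduced here, the following is a rough sketch of what such a parser could look like. It is not the code from the link: the class and method names (EscapedJoin, escape, join, split) are made up, and it assumes ";" as the delimiter, ":" as the escape character, and parts joined with a delimiter between them (not after the last one).

```java
import java.util.ArrayList;
import java.util.List;

class EscapedJoin {
    static final char DELIM = ';';
    static final char ESC = ':';

    // Escape the escape character first, then the delimiter.
    static String escape(String s) {
        return s.replace(":", "::").replace(";", ":;");
    }

    // Joins the escaped parts with a delimiter between them.
    static String join(List<String> parts) {
        List<String> escaped = new ArrayList<>();
        for (String p : parts) {
            escaped.add(escape(p));
        }
        return String.join(String.valueOf(DELIM), escaped);
    }

    // Walks the string once: a character after ESC is taken literally,
    // an unescaped DELIM ends the current part.
    // Note: an empty input yields one empty part, so an empty list cannot be represented.
    static List<String> split(String joined) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < joined.length(); i++) {
            char ch = joined.charAt(i);
            if (ch == ESC) {
                if (++i >= joined.length()) {
                    throw new IllegalArgumentException("dangling escape character");
                }
                current.append(joined.charAt(i));
            } else if (ch == DELIM) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        parts.add(current.toString()); // the last part has no trailing delimiter
        return parts;
    }
}
```

A hand-written loop like this sidesteps the regex problem, because it always knows whether the current character was escaped.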
You concatenate using the concatenation operator(+) as below:
String str1 = "str1";
String str2 = "str2";
int inte = 2;
String result = str1+str2+inte;
But to split them back again you need some special character as a delimiter, because the split function in String works on a delimiter.

java split string with regex

I want to split string by setting all non-alphabet as separator.
String[] word_list = line.split("[^a-zA-Z]");
But with the following input
11:11 Hello World
word_list contains many empty strings before "Hello" and "World".
Please kindly tell me why. Thank You.
Because your regular expression matches each individual non-alpha character. It would be like separating
",,,,,,Hello,World"
on commas.
You will want an expression that matches an entire sequence of non-alpha characters at once such as:
line.split("[^a-zA-Z][^a-zA-Z]*")
I still think you will get one leading empty string with your example since it would be like separating ",Hello,World" if comma were your separator.
Here's your string, where each ^ character shows a match for [^a-zA-Z]:
11:11 Hello World
^^^^^^     ^
The split method finds each of these matches, and basically returns all substrings between the ^ characters. Since there are six matches before any useful data, you end up with six empty substrings (one before the first match and one between each pair of adjacent matches) before you get the string "Hello".
To prevent this, you can manually filter the result to ignore any empty strings.
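One way to do that filtering, as a sketch (the class name NonEmptyTokens and the helper words are just for illustration; it needs Java 8 or later for streams):

```java
import java.util.Arrays;

class NonEmptyTokens {
    // Splits on every non-letter and drops the empty strings that result
    // from adjacent or leading separators.
    static String[] words(String line) {
        return Arrays.stream(line.split("[^a-zA-Z]"))
                     .filter(s -> !s.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] word_list = words("11:11 Hello World");
        System.out.println(Arrays.toString(word_list)); // [Hello, World]
    }
}
```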
Will the following do?
String[] word_list = line.replaceAll("[^a-zA-Z ]","").replaceAll(" +", " ").trim().split("[^a-zA-Z]");
What I am doing here is removing all non-alphabet characters before doing the split and then replacing multiple spaces by a single space.
