I am working on software that needs to read text files with some features that won't be explained here. While testing my code, I found an anomaly which seems to come from the implementation of str.split("\r\n"), where str is a substring of the file's content.
When my substring ends with a succession of "\r\n" (several line breaks), the method completely ignores this part. For example, if I work with the following string:
"\r\nLine 1\r\n\r\nLine 2\r\n\r\n"
I would like to get the following array:
["", "Line 1", "", "Line 2", "", ""]
but it returns:
["", "Line 1", "", "Line 2"]
The String.split() Javadoc only mentions this, without explaining it:
... Trailing empty strings are therefore not included in the resulting array.
I cannot understand this asymmetry; why did they drop empty strings at the end, but not at the beginning?
The Javadocs explain why it works the way it does; you'd have to ask them why they chose this default implementation. Why not just call split(regex, n) as per the docs? Using -1 does what you say you want, just like the docs imply.
class Main {
    public static void main(String[] args) {
        String s = "\r\nLine 1\r\n\r\nLine 2\r\n\r\n";
        String[] r = s.split("\\r\\n", -1);
        for (int i = 0; i < r.length; i++) {
            System.out.println("i: " + i + " = \"" + r[i] + "\"");
        }
    }
}
Produces:
i: 0 = ""
i: 1 = "Line 1"
i: 2 = ""
i: 3 = "Line 2"
i: 4 = ""
i: 5 = ""
You missed the part of the doc that explains the therefore, which states:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero.
Looking at the referenced two-arg doc shows
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
So this is just not the special case you want. Call with a negative integer instead:
str.split("\r\n", -1)
It's unclear why the authors thought 0 would be a more popular use-case than -1, but it doesn't really matter since the option you want exists.
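To see the effect of the limit argument side by side, here is a minimal sketch. The class name SplitLimitDemo and the helper countParts are made up for illustration; only String.split(regex, limit) comes from the JDK.

```java
class SplitLimitDemo {
    // Hypothetical helper: splits on CRLF with the given limit and
    // returns how many elements the resulting array has.
    static int countParts(String s, int limit) {
        return s.split("\r\n", limit).length;
    }

    public static void main(String[] args) {
        String s = "\r\nLine 1\r\n\r\nLine 2\r\n\r\n";
        // limit 0 behaves like the one-argument split: trailing empties are dropped
        System.out.println("limit  0: " + countParts(s, 0));
        // a negative limit keeps every trailing empty string
        System.out.println("limit -1: " + countParts(s, -1));
        // a positive limit caps the array length; the last element holds the rest
        System.out.println("limit  3: " + countParts(s, 3));
    }
}
```

The leading empty string survives in all three calls; only the trailing ones depend on the limit.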
I have the following problem which states
Replace all characters in a string with + symbol except instances of the given string in the method
so for example if the string given was abc123efg and they want me to replace every character except every instance of 123 then it would become +++123+++.
I figured a regular expression is probably the best for this and I came up with this.
str.replaceAll("[^str]","+")
where str is a variable, but it's not letting me use the method without putting it in quotation marks. If I just want to use the variable string str, how can I do that? I ran it with the string typed in manually and it worked, but can I just pass in a variable?
As of right now I believe it's looking for the string "str" and not the variable string.
Here is the output; it's right for all cases except two :(
List of open test cases:
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
plusOut("abXYabcXYZ", "ab") → "ab++ab++++"
plusOut("abXYabcXYZ", "abc") → "++++abc+++"
plusOut("abXYabcXYZ", "XY") → "++XY+++XY+"
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
plusOut("--++ab", "++") → "++++++"
plusOut("aaxxxxbb", "xx") → "++xxxx++"
plusOut("123123", "3") → "++3++3"
Looks like this is the plusOut problem on CodingBat.
I had 3 solutions to this problem, and wrote a new streaming solution just for fun.
Solution 1: Loop and check
Create a StringBuilder out of the input string, and check for the word at every position. Replace the character if it doesn't match, and skip the length of the word if it is found.
public String plusOut(String str, String word) {
    StringBuilder out = new StringBuilder(str);
    for (int i = 0; i < out.length(); ) {
        if (!str.startsWith(word, i))
            out.setCharAt(i++, '+');
        else
            i += word.length();
    }
    return out.toString();
}
This is probably the expected answer for a beginner programmer, though there is an assumption that the string doesn't contain any astral plane character, which would be represented by 2 char instead of 1.
Solution 2: Replace the word with a marker, replace the rest, then restore the word
public String plusOut(String str, String word) {
    return str.replaceAll(java.util.regex.Pattern.quote(word), "#")
              .replaceAll("[^#]", "+")
              .replaceAll("#", word);
}
Not a proper solution, since it assumes that a certain character or sequence of characters doesn't appear in the string.
Note the use of Pattern.quote to prevent the word being interpreted as regex syntax by replaceAll method.
Solution 3: Regex with \G
public String plusOut(String str, String word) {
    word = java.util.regex.Pattern.quote(word);
    return str.replaceAll("\\G((?:" + word + ")*+).", "$1+");
}
Construct the regex \G((?:word)*+)., which does more or less what solution 1 is doing:
\G makes sure the match starts where the previous match left off.
((?:word)*+) picks out 0 or more instances of word, if any, so that we can keep them in the replacement with $1. The key here is the possessive quantifier *+, which forces the regex to keep any instance of the word it finds. Otherwise, the regex will not work correctly when the word appears at the end of the string, because the regex backtracks so that . can match.
. will not be part of any word, since the previous part already picks out all consecutive appearances of word and disallows backtracking. We replace this character with +.
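The role of the possessive quantifier can be checked with a small experiment. This is only a sketch: the class PossessiveDemo and the extra quantifier parameter are hypothetical, added so the same regex can be run with *+ and with a plain greedy *.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Same regex as Solution 3, except that the quantifier after the
    // word group is passed in ("*+" for possessive, "*" for greedy).
    static String plusOut(String str, String word, String quantifier) {
        String quoted = Pattern.quote(word);
        return str.replaceAll("\\G((?:" + quoted + ")" + quantifier + ").", "$1+");
    }

    public static void main(String[] args) {
        // Word at the very end: the possessive version keeps it...
        System.out.println(plusOut("12xy", "xy", "*+")); // ++xy
        // ...while the greedy version backtracks and overwrites it.
        System.out.println(plusOut("12xy", "xy", "*"));  // ++++
    }
}
```

With the greedy quantifier, the group gives back the trailing "xy" so that . can match 'x', which is exactly the failure described above.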
Solution 4: Streaming
public String plusOut(String str, String word) {
    return String.join(word,
        Arrays.stream(str.split(java.util.regex.Pattern.quote(word), -1))
            .map((String s) -> s.replaceAll("(?s:.)", "+"))
            .collect(Collectors.toList()));
}
The idea is to split the string by word, do the replacement on the rest, and join them back with word using String.join method.
Same as above, we need Pattern.quote to avoid split interpreting the word as regex. Since split by default removes empty string at the end of the array, we need to use -1 in the second parameter to make split leave those empty strings alone.
Then we create a stream out of the array and replace each remaining piece with a string of +. In Java 11, we can use s -> "+".repeat(s.length()) instead.
The rest is just converting the Stream to an Iterable (a List in this case) and joining the pieces for the result.
This is a bit trickier than you might initially think, because you don't just need to match characters but the absence of a specific phrase; a negated character set is not enough. If the phrase is 123, you would need:
(?<=^|123)(?!123).*?(?=123|$)
https://regex101.com/r/EZWMqM/1/
That is - lookbehind for the start of the string or "123", make sure the current position is not followed by 123, then lazy-repeat any character until lookahead matches "123" or the end of the string. This will match all characters which are not in a "123" substring. Then, you need to replace each character with a +, after which you can use appendReplacement and a StringBuffer to create the result string:
String inputPhrase = "123";
String inputStr = "abc123efg123123hij";
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("(?<=^|" + inputPhrase + ")(?!" + inputPhrase + ").*?(?=" + inputPhrase + "|$)");
Matcher m = regex.matcher(inputStr);
while (m.find()) {
    String replacement = m.group(0).replaceAll(".", "+");
    m.appendReplacement(resultString, replacement);
}
m.appendTail(resultString);
System.out.println(resultString.toString());
Output:
+++123+++123123+++
Note that if the inputPhrase can contain characters with a special meaning in a regular expression, you'll have to escape them (for example with Pattern.quote) before concatenating them into the pattern.
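One way to handle the escaping is Pattern.quote, sketched below. The class and method names (QuotedLookaroundDemo, plusOut) are illustrative only; the pattern itself is the one from this answer with the phrase quoted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedLookaroundDemo {
    static String plusOut(String input, String phrase) {
        // Quote the phrase so characters such as + or . are taken literally.
        String q = Pattern.quote(phrase);
        Pattern regex = Pattern.compile("(?<=^|" + q + ")(?!" + q + ").*?(?=" + q + "|$)");
        Matcher m = regex.matcher(input);
        StringBuffer result = new StringBuffer();
        while (m.find()) {
            // Each match is a run of characters outside the phrase.
            m.appendReplacement(result, m.group().replaceAll(".", "+"));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abc123efg123123hij", "123"));
        System.out.println(plusOut("--++ab", "++"));
    }
}
```

Without the quoting, a phrase like "++" would not even compile as a regular expression.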
You can do it in one line:
input = input.replaceAll("((?:" + str + ")+)?(?!" + str + ").((?:" + str + ")+)?", "$1+$2");
This optionally captures "123" on either side of each character and puts them back (as a blank if there's no "123").
So instead of coming up with a regular expression that matches the absence of a string, we might as well just match the selected phrase and append one + for each character skipped since the previous match.
StringBuilder sb = new StringBuilder();
Matcher m = Pattern.compile(Pattern.quote(str)).matcher(input);
int last = 0; // index just past the previous match
while (m.find()) {
    // pad only the gap between the previous match and this one
    for (int i = last; i < m.start(); i++) sb.append('+');
    sb.append(str);
    last = m.end();
}
// pad whatever follows the last match
for (int i = last; i < input.length(); i++) {
    sb.append('+');
}
Absolutely just for the fun of it, a solution using CharBuffer (unexpectedly, it took a lot more than I initially hoped for):
private static String plusOutCharBuffer(String input, String match) {
    int size = match.length();
    CharBuffer cb = CharBuffer.wrap(input.toCharArray());
    CharBuffer word = CharBuffer.wrap(match);
    int x = 0;
    for (; cb.remaining() > 0;) {
        if (!cb.subSequence(0, size < cb.remaining() ? size : cb.remaining()).equals(word)) {
            cb.put(x, '+');
            cb.clear().position(++x);
        } else {
            cb.clear().position(x = x + size);
        }
    }
    return cb.clear().toString();
}
To make this work you need a beast of a pattern. Let's say you are operating on the following test case as an example:
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
What you need to do is build a series of clauses in your pattern to match a single character at a time:
Any character that is NOT "X", "Y" or "Z" -- [^XYZ]
Any "X" not followed by "YZ" -- X(?!YZ)
Any "Y" not preceded by "X" -- (?<!X)Y
Any "Y" not followed by "Z" -- Y(?!Z)
Any "Z" not preceded by "XY" -- (?<!XY)Z
An example of this replacement can be found here: https://regex101.com/r/jK5wU3/4
Here is an example of how this might work (most certainly not optimized, but it works):
import java.util.regex.Pattern;
public class Test {
    public static void plusOut(String text, String exclude) {
        StringBuilder pattern = new StringBuilder("");
        for (int i=0; i<exclude.length(); i++) {
            Character target = exclude.charAt(i);
            String prefix = (i > 0) ? exclude.substring(0, i) : "";
            String postfix = (i < exclude.length() - 1) ? exclude.substring(i+1) : "";
            // add the look-behind (?<!X)Y
            if (!prefix.isEmpty()) {
                pattern.append("(?<!").append(Pattern.quote(prefix)).append(")")
                    .append(Pattern.quote(target.toString())).append("|");
            }
            // add the look-ahead X(?!YZ)
            if (!postfix.isEmpty()) {
                pattern.append(Pattern.quote(target.toString()))
                    .append("(?!").append(Pattern.quote(postfix)).append(")|");
            }
        }
        // add in the other character exclusion
        pattern.append("[^" + Pattern.quote(exclude) + "]");
        System.out.println(text.replaceAll(pattern.toString(), "+"));
    }
    public static void main(String [] args) {
        plusOut("12xy34", "xy");
        plusOut("12xy34", "1");
        plusOut("12xy34xyabcxy", "xy");
        plusOut("abXYabcXYZ", "ab");
        plusOut("abXYabcXYZ", "abc");
        plusOut("abXYabcXYZ", "XY");
        plusOut("abXYxyzXYZ", "XYZ");
        plusOut("--++ab", "++");
        plusOut("aaxxxxbb", "xx");
        plusOut("123123", "3");
    }
}
UPDATE: Even this doesn't quite work because it can't deal with exclusions that are just repeated characters, like "xx". Regular expressions are most definitely not the right tool for this, but I thought it might be possible. After poking around, I'm not so sure a pattern even exists that might make this work.
The problem with your solution is that str.replaceAll("[^str]","+") uses a negated character set: it excludes every individual character listed in the brackets, not the sequence held in the variable str, so it will not solve your problem.
For example, str.replaceAll("[^XYZ]","+") leaves every X, Y and Z untouched in any combination, so for "abXYxyzXYZ" you get "++XY+++XYZ".
What you actually need is to exclude a sequence of characters in str.replaceAll.
You can do that with a group such as (XYZ) and a negative lookahead that matches a string not containing the character sequence: ^((?!XYZ).)*$
Check this solution for more info about this problem, but be aware that it may be complicated to find a regular expression that does this directly.
I have found two simple solutions for this problem :
Solution 1:
You can implement a method to replace all characters with '+' except the instance of given string:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
for (int i = 0; i < str.length(); i++) {
    // exclude any instance string of exWord from replacing process in str
    if (str.substring(i, str.length()).indexOf(exWord) + i == i) {
        i = i + exWord.length() - 1;
    } else {
        str = str.substring(0, i) + "+" + str.substring(i + 1); // replace each character with '+' symbol
    }
}
Note: the condition str.substring(i, str.length()).indexOf(exWord) + i == i is true only when an instance of exWord starts at index i, so that instance is skipped rather than replaced.
Output:
+++++++XYZ
Solution 2:
You can try this approach using the replaceAll method; it doesn't need any complex regular expression:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
str = str.replaceAll(exWord,"*"); // replace instance string with * symbol
str = str.replaceAll("[^*]","+"); // replace all characters with + symbol except *
str = str.replaceAll("\\*",exWord); // replace * symbol with instance string
Note: This solution works only if your input string str doesn't contain any * symbol.
You should also escape any character with a special meaning in a regular expression if it appears in exWord, for example when exWord = "++".
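One way to sidestep the escaping issue is to use String.replace, which treats its arguments literally, for the first and last steps. The sketch below is a variation on Solution 2, not the original code: the class name MarkerDemo and the private-use placeholder character are assumptions, and it still requires that the placeholder never occurs in the input and that exWord is not empty.

```java
class MarkerDemo {
    // Assumed placeholder: a private-use character presumed absent from the input.
    private static final String MARKER = "\uE000";

    static String plusOut(String str, String exWord) {
        return str.replace(exWord, MARKER)                 // literal replacement, no regex
                  .replaceAll("[^" + MARKER + "]", "+")   // everything except the marker
                  .replace(MARKER, exWord);                // literal again, so "++" or "$" are safe
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYxyzXYZ", "XYZ"));
        System.out.println(plusOut("--++ab", "++"));
    }
}
```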
I have a string of comma-separated user IDs and I want to remove a specific user ID from the string.
I have the following possible strings and expected result:
int elimiateUserId = 11;
String css1 = "11,22,33,44,55";
String css2 = "22,33,11,44,55";
String css3 = "22,33,44,55,11";
// The expected result in all cases, after replacement, should be:
// "22,33,44,55"
I tried the following:
String result = css#.replaceAll("," + elimiateUserId, ""); // # = 1 or 2 or 3
result = css#.replaceAll(elimiateUserId + "," , "");
This logic fails in the case of css3. Please suggest a proper solution for this issue.
Note: I'm working with Java 7
I checked around the following posts, but could not find any solution:
Java String.replaceAll regex
java String.replaceAll regex question
Java 1.3 String.replaceAll() , replacement
You can use the Stream API in Java 8:
int elimiateUserId = 11;
String css1 = "11,22,33,44,55";
String css1Result = Stream.of(css1.split(","))
.filter(value -> !String.valueOf(elimiateUserId).equals(value))
.collect(Collectors.joining(","));
// css1Result = 22,33,44,55
If you want to use a regex, you may use the following (remember to escape it properly as a Java string literal):
,\b11\b|\b11\b,
This will ensure that 11 won't be matched as part of another number due to the word boundaries and only one comma (if two are present) is matched and removed.
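Written out as Java, the pattern might look like the sketch below. The class RemoveIdDemo and the method removeId are illustrative names, not part of the question.

```java
class RemoveIdDemo {
    static String removeId(String csv, int id) {
        // Inside a Java string literal the regex \b has to be written as \\b;
        // a bare "\b" would be a backspace character instead.
        String regex = ",\\b" + id + "\\b|\\b" + id + "\\b,";
        return csv.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        System.out.println(removeId("11,22,33,44,55", 11));
        System.out.println(removeId("22,33,11,44,55", 11));
        System.out.println(removeId("22,33,44,55,11", 11));
        System.out.println(removeId("111,211,11", 11)); // 111 and 211 are left alone
    }
}
```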
You may build a regex like
^11,|,11\b
that will match 11, at the start of a string (^11,) or (|) ,11 not followed with any other word char (,11\b).
See the regex demo.
int elimiate_user_id = 11;
String pattern = "^" + elimiate_user_id + ",|," + elimiate_user_id + "\\b";
System.out.println("11,22,33,44,55,111".replaceAll(pattern, "")); // => 22,33,44,55,111
System.out.println("22,33,11,44,55,111".replaceAll(pattern, "")); // => 22,33,44,55,111
System.out.println("22,33,44,55,111,11".replaceAll(pattern, "")); // => 22,33,44,55,111
See the Java demo
Try the expression (^(11)(?:,))|((?<=,)(11)(?:,))|(,11$) with replaceAll:
// String.valueOf keeps MessageFormat from adding grouping separators to large ids
final String regexp = MessageFormat.format("(^({0})(?:,))|((?<=,)({0})(?:,))|(,{0}$)", String.valueOf(elimiateUserId));
String result = css#.replaceAll(regexp, ""); // for all cases
Here is an example:
https://regex101.com/r/LwJgRu/3
try this:
String result = css#.replaceAll("," + elimiateUserId, "")
.replaceAll(elimiateUserId + "," , "");
You can chain two replace calls, like this:
int elimiateUserId = 11;
String result = css#.replace("," + elimiateUserId , "").replace(elimiateUserId + ",", "");
If the string contains ,11, the first replace removes it.
If the string contains 11, (for example at the start), the second replace removes it.
Result:
11,22,33,44,55 -> 22,33,44,55
22,33,11,44,55 -> 22,33,44,55
22,33,44,55,11 -> 22,33,44,55
ideone demo
String result = css#.replaceAll("," + eliminate_user_id + "\\b|\\b" + eliminate_user_id + ",", "");
The regular expression here is:
, A leading comma.
eliminate_user_id I assumed the missing 'n' here was a typo.
\b Word boundary: word/number characters end here.
| OR
\b Word boundary: word/number characters begin here.
eliminate_user_id again.
, A trailing comma.
The word boundary marker, matching the beginning or end of a "word", is the magic here. It means that the 11 will match in these strings:
11,22,33,44,55
22,33,11,44,55
22,33,44,55,11
But not these strings:
111,112,113,114
411,311,211,111
There's a cleaner way, though:
String result = css#.replaceAll("(,?)\\b" + eliminate_user_id + "\\b(?(1)|,)", "");
The regular expression here is:
( A capturing group - what's in here, is in group 1.
,? An optional leading comma.
) End the capturing group.
\b Word boundary: word/number characters begin here.
eliminate_user_id I assumed the missing 'n' here was a typo.
\b Word boundary: word/number characters end here.
(?(1) If there's something in group 1, then require...
| ...nothing, but if there was nothing, then require...
, A trailing comma.
) end the if.
The "if" part here is a little unusual - you can find a little more information on regex conditionals here: http://www.regular-expressions.info/conditional.html
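Unfortunately, Java's java.util.regex does not support regex conditionals (see also "Conditional Regular Expression in Java?"), so in Java this pattern fails to compile with a PatternSyntaxException and the alternation version above is the one to use :(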
Side note on performance: if the list is VERY long and there are VERY many removals to perform, the most obvious option is to run the above line once for each number to be removed:
String css = "11,22,33,44,55,66,77,88,99,1010,1111,1212,...";
String[] removals = {"11", "33", "55", "77", "99", "1212"};
for (int i = 0; i < removals.length; i++) {
    // use the current removal on both sides of the alternation
    css = css.replaceAll("," + removals[i] + "\\b|\\b" + removals[i] + ",", "");
}
(code not tested: don't have access to a Java compiler here)
This will be fast enough (worst case scales with about O(m*n) for m removals from a string of n ids), but we can maybe do better.
One option is to build the regex as \b(11|42|18|13|123|...etc)\b, that is, make the regex search for all ids to be removed at the same time. In theory this scales a little worse, with O(m*n) in every case rather than just the worst case, but in practice it should be considerably faster.
String css = "11,22,33,44,55,66,77,88,99,1010,1111,1212,...";
String[] removals = {"11", "33", "55", "77", "99", "1212"};
// group the alternatives so the comma and \b apply to every id, not just the first and last
String removalsStr = "(?:" + String.join("|", removals) + ")";
css = css.replaceAll("," + removalsStr + "\\b|\\b" + removalsStr + ",", "");
But another approach might be to build a hashtable of the ids in the long string, then remove all the ids from the hashtable, then concatenate the remaining hashtable keys back into a string. Since hashtable lookups are effectively O(1) for sparse hashtables, that makes this scale with O(n). The tradeoff here is the extra memory for that hashtable, though.
(I don't think I can do this version without a java compiler handy. I would not recommend this approach unless you have a VAST (many thousands) list of IDs to remove, anyway, as it will be much uglier and more complex code).
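For what it's worth, a hash-based version does not have to be very long. The sketch below is a variation on the idea rather than exactly what is described above: it hashes the ids to remove (not the ids in the long string) and then walks the list once, which keeps the original order and any duplicates. SetRemovalDemo and removeAll are made-up names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetRemovalDemo {
    static String removeAll(String css, String[] removals) {
        // Lookups in the set are effectively O(1), so the whole pass is O(n + m).
        Set<String> toRemove = new HashSet<String>(Arrays.asList(removals));
        StringBuilder out = new StringBuilder();
        for (String id : css.split(",")) {
            if (toRemove.contains(id)) {
                continue; // drop this id
            }
            if (out.length() > 0) {
                out.append(','); // separator only between kept ids
            }
            out.append(id);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAll("11,22,33,44,55,66", new String[] {"11", "33", "55"}));
    }
}
```

Because ids are compared as whole strings, 11 can never match inside 111 or 211, so no word boundaries are needed.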
I think it's safer to maintain a whitelist and then use it as a reference to make further changes.
List<String> whitelist = Arrays.asList("22", "33", "44", "55");
String s = "22,33,44,55,11";
String[] sArr = s.split(",");
StringBuilder ids = new StringBuilder();
for (String id : sArr) {
    if (whitelist.contains(id)) {
        if (ids.length() > 0) {
            ids.append(","); // separator only between ids, matching the input format
        }
        ids.append(id);
    }
}
String r = ids.toString();
System.out.println(r);
If you need a regex solution, the following works for the three inputs shown.
int elimiate_user_id = 11;
String css1 = "11,22,33,44,55";
String css2 = "22,33,11,44,55";
String css3 = "22,33,44,55,11";
String resultCss=css1.replaceAll(elimiate_user_id+"[,]*", "").replaceAll(",$", "");
It works with all three inputs above, but note that it would also remove 11 from inside a longer ID such as 111.
This should work:
replaceAll("(11,|,11)", "")
At least as long as you can guarantee that there is no 311, or ,113 or similar in the string.