String.replaceAll Strange Behaviour - java

String s = "hi hello";
s = s.replaceAll("\\s*", " ");
System.out.println(s);
I have the code above, but I can't work out why it produces
h i h e l l o
rather than
hi hello
Many thanks

Use the + quantifier to match 1 or more spaces instead of *:
s = s.replaceAll("\\s+", " ");
\\s* means match 0 or more whitespace characters, so it also matches the empty string before every character (and at the end of the input), and each of those empty matches is replaced by a space.

The * matches 0 or more spaces, I think you want to change it to + to match 1 or more spaces.
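To see the two quantifiers side by side, here is a minimal sketch. The class and method names (CollapseWhitespace, collapse) are made up for illustration and are not from the question.

```java
public class CollapseWhitespace {

    // Replaces every run of one or more whitespace characters with a single space.
    static String collapse(String input) {
        return input.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String s = "hi   hello";

        // \s* can match the empty string, so a space is also inserted
        // at positions where there was no whitespace at all.
        System.out.println("[" + s.replaceAll("\\s*", " ") + "]");

        // \s+ needs at least one whitespace character, so only real gaps change.
        System.out.println("[" + collapse(s) + "]"); // prints "[hi hello]"
    }
}
```

The brackets in the output are only there to make leading and trailing spaces visible.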

Related

Checking if there is whitespace between two elements in a String

I am working with Strings where I need to separate two chars/elements if there is whitespace between them. I have seen an earlier post on SO about the same problem, but it has not worked for me as intended. As you would assume, I could just check whether the String contains(" ") and then substring around the space. However, my strings may contain any amount of trailing whitespace even when there is no whitespace between the characters. Hence my question: how do I detect whitespace between two chars (digits included)?
//Example with numbers in a String
String test = "2 2";
final Pattern P = Pattern.compile("^(\\d [\\d\\d] )*\\d$");
final Matcher m = P.matcher(test);
if (m.matches()) {
    System.out.println("There is between space!");
}
You can use String.strip() (available since Java 11) to remove any leading or trailing whitespace, followed by String.split(). If the string contains a space, the array will be of length 2 or greater. If it does not, it will be of length 1.
Example:
String test = " 2 2 ";
test = test.strip(); // Removes leading and trailing whitespace, test is now "2 2"
String[] testSplit = test.split(" "); // Splits the string, testSplit is ["2", "2"]
if (testSplit.length >= 2) {
    System.out.println("There is whitespace!");
} else {
    System.out.println("There is no whitespace");
}
If you need an array of a specified length, you can also specify a limit to split. For example:
"a b c".split(" ", 2); // Returns ["a", "b c"]
If you want a solution that only uses regex, the following regex matches any two groups of characters separated by a single space, with any amount of leading or trailing whitespace:
\s*(\S+\s\S+)\s*
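As a rough sketch of how that regex could be used from Java (the class name SingleGapCheck and the method hasSingleGap are hypothetical), note that matches() has to consume the whole input, so the pattern only accepts exactly two groups:

```java
import java.util.regex.Pattern;

public class SingleGapCheck {

    // Two runs of non-whitespace separated by exactly one whitespace character,
    // with optional whitespace on either side.
    private static final Pattern TWO_GROUPS = Pattern.compile("\\s*(\\S+\\s\\S+)\\s*");

    static boolean hasSingleGap(String input) {
        // matches() requires the entire input to fit the pattern.
        return TWO_GROUPS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasSingleGap(" 2 2 ")); // true
        System.out.println(hasSingleGap(" 245 ")); // false
    }
}
```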
Positive lookahead and lookbehind also work, using the regex (?<=\\w)\\s(?=\\w):
\\w : a word character [a-zA-Z_0-9]
\\s : a whitespace character
(?<=\\w)\\s : positive lookbehind, matches a whitespace character preceded by a \\w
\\s(?=\\w) : positive lookahead, matches a whitespace character followed by a \\w
List<String> testList = Arrays.asList("2 2", " 245 ");
Pattern p = Pattern.compile("(?<=\\w)\\s(?=\\w)");
for (String str : testList) {
    Matcher m = p.matcher(str);
    if (m.find()) {
        System.out.println(str + "\t: There is a space!");
    } else {
        System.out.println(str + "\t: There is not a space!");
    }
}
Output:
2 2 : There is a space!
245 : There is not a space!
The reason your pattern does not work as expected is that ^(\\d [\\d\\d] )*\\d$, which can be simplified to ^(\\d \\d )*\\d$ because [\\d\\d] is just the character class \\d, starts by repeating 0 or more times what is between the parentheses.
Then it matches a digit at the end of the string. As the repetition is 0 or more times, it is optional, so the pattern also matches just a single digit, while "2 2" fails because each repetition needs a trailing space.
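A small sketch that runs the pattern from the question against a few inputs makes this visible. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

public class DigitPatternCheck {

    // The pattern from the question; [\d\d] behaves exactly like \d.
    static final Pattern ORIGINAL = Pattern.compile("^(\\d [\\d\\d] )*\\d$");

    static boolean originalMatches(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(originalMatches("2"));     // true: zero repetitions, then one digit
        System.out.println(originalMatches("2 2"));   // false: a repetition needs "digit space digit space"
        System.out.println(originalMatches("2 2 2")); // true: one repetition "2 2 ", then "2"
    }
}
```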
If you want to check if there is a single space between 2 non whitespace chars:
\\S \\S
final Pattern P = Pattern.compile("\\S \\S");
final Matcher m = P.matcher(test);
if (m.find()) {
    System.out.println("There is between space!");
}
Here is the simplest way you can do it:
String testString = " Find if there is a space. ";
testString = testString.trim(); // Strings are immutable, so keep the trimmed result
boolean hasSpace = testString.contains(" "); // true if a space remains inside the text
You can also do this in one line by chaining the two methods:
String testString = " Find if there is a space. ";
boolean hasSpace = testString.trim().contains(" ");
Use
String text = "2 2";
Matcher m = Pattern.compile("\\S\\s+\\S").matcher(text.trim());
if (m.find()) {
    System.out.println("Space detected.");
}
text.trim() removes leading and trailing whitespace. The \S\s+\S pattern matches a non-whitespace character, then one or more whitespace characters, and then a non-whitespace character again.

In java replace the regex with string

In Java I am using the regex \".*?\".
I use it to replace every double-quoted string with the term String.
Ex:
INPUT: Functions.unescapeJson("test")
Result : Functions.unescapeJson("String")
But now I want to leave a double quote alone when it is escaped inside a string. I am using / as the escape character. How can I achieve this?
Ex:
INPUT: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.con"),"payloads_ul.dataFrameOutput"),"[/"Dimming Value/"]")
RESULT: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(String), String),String),String)
But the result I am getting if I use the previous regex is:
Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(input.mIntegerm/:sgn.nev.rep), String),String),StringDimming ValueString)
How can I achieve this with a regex, so that a quote preceded by / is skipped and does not end the string being replaced?
The code that I am using
public static void main(String[] args) {
    String STRINGVALIDATIONREGEX = "\".*?\"";
    String formula = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(input.m2m/:sgn.nev.rep), \"m2m:cin.con\"),\"payloads_ul.dataFrameOutput\"),\"[\"Dimming Value\"]\")";
    // replaceAll treats its first argument as a regex; replace() would treat it as literal text
    System.out.println(formula.replaceAll(STRINGVALIDATIONREGEX, "String"));
}
You can use this regex:
\"(\/?.)*?\"
Use [^/] to match anything that is not a slash.
For example, [^/]?\".*[^/]?\" would catch quotes not preceded by /
"((?:[^"]|(?<=\/)")*)"
" match a "
[^"] match a non-quote character
| or
(?<=\/)" a quote character that is preceded by a /
* match sub-expressions 2 - 4 zero or more times.
" match a "
If you believe that a string such as "abc/" is invalid, then you should use the stricter regex:
"((?:[^"\/]|\/")*)"
" match "
[^"\/] match any character that isn't a quote or /
| or
\/" match a /" combination
* match sub-expressions 2 - 4 zero or more times.
" match a "
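For what it's worth, here is a minimal sketch of the stricter regex applied in Java. The class name QuotedStringReplacer and the shortened formula are assumptions made for the example. The / needs no escaping in a Java regex, and the capturing group is dropped because the replacement does not use it.

```java
public class QuotedStringReplacer {

    // A quoted string whose body is any mix of ordinary characters and /" pairs.
    private static final String QUOTED = "\"(?:[^\"/]|/\")*\"";

    static String replaceQuoted(String formula) {
        // replaceAll interprets its first argument as a regex; String.replace would not.
        return formula.replaceAll(QUOTED, "String");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the formula from the question.
        String formula =
            "Functions.getJsonPath(Functions.unescapeJson(\"test\"),\"[/\"Dimming Value/\"]\")";
        System.out.println(replaceQuoted(formula));
        // prints: Functions.getJsonPath(Functions.unescapeJson(String),String)
    }
}
```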

Turn zeroes in a string to spaces

Suppose I have a string:
String s = "34205478200044520042";
I want to change this string so that a 0 becomes a space, but consecutive 0s are treated as only 1 space still.
Meaning the new string would look like "342 54782 4452 42"
How do you treat any number of 0s to equal one space?
The regular expression 0+ matches any number of zeroes, because + in a regular expression means "one or more of the preceding subexpression". So you can write
String newString = myString.replaceAll("0+", " ");
You can use the replaceAll method to replace each run of one or more 0 characters with a single space:
String result = s.replaceAll("0+", " ");
You can use regular expressions to achieve this:
System.out.println(s.replaceAll("[0]+", " "));
// + indicates that the character set (in this case "0") must match at least once.
// Displays 342 54782 4452 42
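Put together as a small runnable sketch (the class and method names are just for illustration):

```java
public class ZerosToSpaces {

    // Replaces each run of one or more '0' characters with a single space.
    static String zerosToSpaces(String digits) {
        return digits.replaceAll("0+", " ");
    }

    public static void main(String[] args) {
        String s = "34205478200044520042";
        System.out.println(zerosToSpaces(s)); // prints "342 54782 4452 42"
    }
}
```

Leading or trailing zeros would also become a space, so call trim() on the result if that is not wanted.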

Replacing a special character (along with space is considered same)

Have a look at this string: String str = "first,second, there";. It contains "," and ", " (there's a space after the second comma). How do I express this as a regex for .replaceAll() so that the
output would be:
"first second there" <-- with the same amount of space between each word.
I have tried some combinations but they all fail. One of them is:
String temp2 = str.replaceAll("[\\,\\, ]", " "); which leaves two spaces between second and there.
Thanks in advance.
Simply use , * to match a comma followed by zero or more spaces and replace it with a single space.
String str = "first,second, there";
System.out.println(str.replaceAll(", *", " "));
output:
first second there
Read more about Java Pattern
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
String temp2 = str.replaceAll(", ?", " ");
The ? means optional (i.e. zero or one space). Alternatively:
String temp2 = str.replaceAll(", *", " ");
where the * means zero or more spaces.
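The two variants only differ when a comma is followed by more than one space. A quick sketch, with hypothetical class and method names:

```java
public class CommaToSpace {

    // A comma plus at most one following space becomes a single space.
    static String withOptionalSpace(String text) {
        return text.replaceAll(", ?", " ");
    }

    // A comma plus any number of following spaces becomes a single space.
    static String withAnySpaces(String text) {
        return text.replaceAll(", *", " ");
    }

    public static void main(String[] args) {
        System.out.println(withAnySpaces("first,second, there")); // prints "first second there"

        // With two spaces after the comma, ", ?" leaves one extra space behind.
        System.out.println("[" + withOptionalSpace("a,  b") + "]"); // prints "[a  b]"
        System.out.println("[" + withAnySpaces("a,  b") + "]");     // prints "[a b]"
    }
}
```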

Why are my character and word counts off?

Given the following string:
String text = "The woods are\nlovely,\t\tdark and deep.";
I want all whitespace treated as a single character. So for instance, the \n is 1 char. The \t\t should also be 1 char. With that logic, I count 36 characters and 7 words. But when I run this through the following code:
String text = "The woods are\nlovely,\t\tdark and deep.";
int numNewCharacters = 0;
for(int i=0; i < text.length(); i++)
    if(!Character.isWhitespace(text.charAt(i)))
        numNewCharacters++;
int numNewWords = text.split("\\s").length;
// Prints "30"
System.out.println("Chars:" + numNewCharacters);
// Prints "8"
System.out.println("Words:" + numNewWords);
It's telling me that there are 30 characters and 8 words. Any ideas as to why? Thanks in advance.
You are matching on individual whitespaces. Instead you could match on one or more:
text.split("\\s+")
You are counting only non-whitespace characters in the first loop, so spaces, tabs and newlines are not counted at all, and 30 is the right answer for that code. As for the second: split("\\s") treats each whitespace character as a separate delimiter, so there is an empty-string "word" between the two tabs.
Reimueus has already solved your word count problem:
text.split("\\s+")
And your character count is correct. Newlines \n and tabs \t are considered whitespace. If you don't want them to be, you can implement your own isWhitespace function.
Here is the complete solution to counting words and characters:
System.out.println("Characters: " + text.replaceAll("\\s+", " ").length());
Matcher m = Pattern.compile("[^\\s]+", Pattern.MULTILINE).matcher(text);
int wordCount = 0;
while (m.find()) {
    wordCount++;
}
System.out.println("Words: "+ wordCount);
The character count is obtained by replacing each group of whitespace with a single space and taking the resulting string's length.
For the word count we create a pattern that matches any group of characters that contains no whitespace. You could use the \\w+ pattern here, but it only matches alphanumeric characters and underscores. Note that Pattern.MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern and can be omitted.
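The same idea can be written with split("\\s+") instead of a Matcher loop. This is only a sketch, and the class and method names (TextStats, countCharacters, countWords) are made up for the example:

```java
public class TextStats {

    // Counts characters with every run of whitespace treated as one character.
    static int countCharacters(String text) {
        return text.replaceAll("\\s+", " ").length();
    }

    // Counts words separated by one or more whitespace characters.
    static int countWords(String text) {
        // trim() first so leading whitespace does not produce an empty first element.
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String text = "The woods are\nlovely,\t\tdark and deep.";
        System.out.println("Characters: " + countCharacters(text)); // prints "Characters: 36"
        System.out.println("Words: " + countWords(text));           // prints "Words: 7"
    }
}
```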
