I want to replaceAll strings like:
"aaaa"
"zzzzzzz"
"----------"
"TTTTTT"
"...."
Each string consists of a single character repeated more than 3 times.
I use Java. I can replace a specific char (like "a") repeated more than 3 times, but I don't know how to do this for any char:
str = str.replaceAll("^[a]{4,}$", "");
Any idea? If this can't be done in regex, how would you do it?
Any char can be matched with . and the Pattern.DOTALL modifier.
To check that every character is the same, we can capture the first character and use a backreference to match the same text again, with the limiting quantifier {3,} requiring at least 3 further occurrences of the captured character (so at least 4 characters in total).
Demo:
List<String> strs = Arrays.asList("aaaa", "zzzzzzz", "----------", "TTTTTT", "....");
for (String str : strs)
System.out.println("\"" + str.replaceAll("(?s)^(.)\\1{3,}$", "") + "\"");
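To make the boundary of the {3,} quantifier concrete, here is a minimal sketch that wraps the same regex in a helper. The class name RepeatedCharDemo and the method name stripRepeated are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

public class RepeatedCharDemo {
    // Hypothetical helper: empties a string that is one character repeated 4+ times.
    static String stripRepeated(String s) {
        // (.) captures the first character; \1{3,} requires 3 or more further copies.
        // (?s) turns on DOTALL so that . also matches line terminators.
        return s.replaceAll("(?s)^(.)\\1{3,}$", "");
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("aaaa", "aaa", "aaab", "....");
        for (String s : samples) {
            // Only "aaaa" and "...." are emptied; "aaa" is too short and "aaab" is mixed.
            System.out.println("[" + stripRepeated(s) + "]");
        }
    }
}
```

Strings with only three repetitions, or with a different character anywhere, are returned unchanged.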
I am working with Strings where I need to separate two chars/elements if there is a whitespace between them. I have seen an earlier post on SO about this, but it has not worked for me as intended. As you would assume, I could just check whether the String contains(" ") and then substring around the space. However, my strings may contain any number of whitespace characters at the end even when there is no whitespace between characters. Hence my question: how do I detect a whitespace between two chars (digits included)?
//Example with numbers in a String
String test = "2 2";
final Pattern P = Pattern.compile("^(\\d [\\d\\d] )*\\d$");
final Matcher m = P.matcher(test);
if (m.matches()) {
System.out.println("There is between space!");
}
You would use String.strip() to remove any leading or trailing whitespace, followed by String.split(). If there is a whitespace, the array will be of length 2 or greater. If there is not, it will be of length 1.
Example:
String test = " 2 2 ";
test = test.strip(); // Removes whitespace, test is now "2 2"
String[] testSplit = test.split(" "); // Splits the string, testSplit is ["2", "2"]
if (testSplit.length >= 2) {
System.out.println("There is whitespace!");
} else {
System.out.println("There is no whitespace");
}
If you need an array of a specified length, you can also specify a limit to split. For example:
"a b c".split(" ", 2); // Returns ["a", "b c"]
If you want a solution that only uses regex, the following regex matches any two groups of characters separated by a single space, with any amount of leading or trailing whitespace:
\s*(\S+\s\S+)\s*
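As a rough sketch of how that regex-only variant could be used from Java, the following applies it with Matcher.matches(). The class and method names (SingleGapDemo, hasSingleGap) are assumptions for illustration only.

```java
import java.util.regex.Pattern;

public class SingleGapDemo {
    // Two runs of non-whitespace separated by exactly one whitespace character,
    // with optional leading and trailing whitespace.
    private static final Pattern TWO_TOKENS = Pattern.compile("\\s*(\\S+\\s\\S+)\\s*");

    // Hypothetical helper: true only when the whole input has that shape.
    static boolean hasSingleGap(String s) {
        return TWO_TOKENS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasSingleGap(" 2 2 "));  // true
        System.out.println(hasSingleGap("245  "));  // false: no inner whitespace
        System.out.println(hasSingleGap("2  2"));   // false: two whitespace characters in a row
        System.out.println(hasSingleGap("a b c"));  // false: three tokens
    }
}
```

Because matches() must consume the whole input, this version accepts exactly two tokens; use find() with a looser pattern if any inner whitespace should count.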
Positive lookahead and lookbehind may also work if you use the regex (?<=\\w)\\s(?=\\w)
\\w : a word character [a-zA-Z_0-9]
\\s : whitespace
(?<=\\w)\\s : positive lookbehind, matches a whitespace preceded by a \\w
\\s(?=\\w) : positive lookahead, matches a whitespace followed by a \\w
List<String> testList = Arrays.asList("2 2", " 245 ");
Pattern p = Pattern.compile("(?<=\\w)\\s(?=\\w)");
for (String str : testList) {
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(str + "\t: There is a space!");
} else {
System.out.println(str + "\t: There is not a space!");
}
}
Output:
2 2 : There is a space!
245 : There is not a space!
The reason your pattern does not work as expected is that ^(\\d [\\d\\d] )*\\d$, which can be simplified to ^(\\d \\d )*\\d$, starts by repeating 0 or more times what is between the parentheses.
Then it matches a digit at the end of the string. As the repetition is 0 or more times, it is optional and it would also match just a single digit.
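The behaviour described above can be checked with a small sketch that runs the question's original pattern against a few inputs. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The pattern from the question, unchanged.
    private static final Pattern ORIGINAL = Pattern.compile("^(\\d [\\d\\d] )*\\d$");

    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The group consumes "digit space digit space", so inputs must have 1, 3, 5, ... digits.
        System.out.println(matchesOriginal("2"));      // true: zero repetitions plus the final digit
        System.out.println(matchesOriginal("2 2"));    // false: the group needs a trailing space
        System.out.println(matchesOriginal("2 2 2"));  // true: one repetition plus the final digit
    }
}
```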
If you want to check if there is a single space between 2 non whitespace chars:
\\S \\S
final Pattern P = Pattern.compile("\\S \\S");
final Matcher m = P.matcher(test);
if (m.find()) {
System.out.println("There is between space!");
}
Here is the simplest way you can do it:
String testString = " Find if there is a space. ";
testString = testString.trim(); // trim() returns a new string without leading and trailing whitespace, so reassign it
boolean hasSpace = testString.contains(" "); // true if a space still remains inside the string
You can also do it in one line by chaining the two methods:
String testString = " Find if there is a space. ";
boolean hasSpace = testString.trim().contains(" ");
Use
String text = "2 2";
Matcher m = Pattern.compile("\\S\\s+\\S").matcher(text.trim());
if (m.find()) {
System.out.println("Space detected.");
}
text.trim() will remove leading and trailing whitespaces, \S\s+\S pattern matches a non-whitespace, then one or more whitespace characters, and then a non-whitespace character again.
Sorry if this question has already been asked, but I could only find results of c#.
So I have this StringBuilder:
StringBuilder sb = new StringBuilder(" 111 11 ");
and I want to split it into an array using this method:
String[] ar = sb.toString().split(" ");
As expected the result array has some empty entries. My question is if I can remove these empty spaces directly when I split the StringBuilder or I have to do it afterwards.
split takes a regex. So:
String[] ar = sb.toString().split("\\s+");
The string \\s is regexp-ese for 'any whitespace', and the + is: 1 or more of it. If you want to split on spaces only (and not on newlines, tabs, etc), try: String[] ar = sb.toString().split(" +"); which is literally: "split on one or more spaces".
This trick works for just about any separator. For example, split on commas? Try: .split("\\s*,\\s*"), which is: 0 or more whitespace, a comma, followed by 0 or more whitespace (and regexes take as much as they can).
Note that this trick does NOT get rid of leading and trailing whitespace. But to do that, use trim. Putting it all together:
String[] ar = sb.toString().trim().split("\\s+");
and for commas:
String[] ar = sb.toString().trim().split("\\s*,\\s*");
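A short sketch of the difference trim makes, using the input from the question. The class and method names are invented for the example.

```java
import java.util.Arrays;

public class SplitWhitespaceDemo {
    // Splits on runs of whitespace without trimming first.
    static String[] splitRaw(String s) {
        return s.split("\\s+");
    }

    // Trims first, so no empty leading element is produced.
    static String[] splitTrimmed(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = new StringBuilder(" 111 11 ").toString();
        System.out.println(Arrays.toString(splitRaw(input)));      // [, 111, 11]
        System.out.println(Arrays.toString(splitTrimmed(input)));  // [111, 11]
        System.out.println(Arrays.toString(" a , b,c ".trim().split("\\s*,\\s*")));  // [a, b, c]
    }
}
```

The leading whitespace acts as a delimiter at position 0, which is why the untrimmed call returns an empty first element; trailing empty strings are dropped by split itself.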
I would use guava for this:
String t = " 111 11 ";
Splitter.on(Pattern.compile("\\s+"))
.omitEmptyStrings()
.split(t)
.forEach(System.out::println);
If you do not want to depend on any third-party libraries and do not want to filter with a regex,
you can do it in one line with the Java 8 Streams API:
Arrays.stream(sb.toString().trim().split(" ")).filter(s-> !s.equals("")).map(s -> s.trim()).toArray();
Here is a multi-line version of the same statement, with comments:
Arrays.stream(sb.toString()
.trim() // Trim leading and trailing whitespace from the string
.split(" ")) // Split on single spaces
.filter(s-> !s.equals("")) // Keep only the non-empty elements
.map(s -> s.trim()) // Trim leading and trailing whitespace from each element
.toArray(); // Collect the elements into an Object array
Here is the working code for demonstration:
StringBuilder sb = new StringBuilder(" 111 11 ");
Object[] array = Arrays.stream(sb.toString().trim().split(" ")).filter(s-> !s.equals("")).map(s -> s.trim()).toArray();
System.out.println("(" + array[0] + ")");
System.out.println("(" + array[1] + ")");
There are a couple of regexes to deal with it. I would also prefer @rzwitserloot's method,
but if you would like to see more,
check this question: How do I split a string with any whitespace chars as delimiters?
glenatron has explained it:
In most regex dialects there are a set of convenient character summaries you can use for this kind of thing - these are good ones to remember:
\w - Matches any word character.
\W - Matches any nonword character.
\s - Matches any white-space character.
\S - Matches anything but white-space characters.
\d - Matches any digit.
\D - Matches anything except digits.
A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.
Thanks to glenatron
You can use a turnkey solution from Apache Commons Lang.
Here is an example:
StringBuilder sb = new StringBuilder(" 111 11 ");
String trimmedString = StringUtils.normalizeSpace(sb.toString());
String[] trimmedAr = trimmedString.split(" ");
System.out.println(Arrays.toString(trimmedAr));
Output: [111, 11].
I would like to split a String, but I am struggling with the regex pattern.
I need to split a string like this: Hi I want "to split" this (String) to a String array like this:
String [] array = {"Hi", "I", "want", "\"", "to", "split", "\"", "this", "(", "String", ")"};
This is what I have tried, but it deletes the delimiter.
public static void main(String[] args) {
String string = "Hi \"why should\" (this work)";
String[] array;
array = string.split("\\s"
+ "|\\s(?=\")"
+ "|\\w(?=\")"
+ "|\"(?=\\w)"
+ "|\\s(?=\\()"
+ "|\\w(?=\\))"
+ "|\\((?=\\w)");
for (String str : array) {
System.out.println(str);
}
}
Result:
Hi
why
shoul
"
this
wor
)
You can match the tokens with the regex \w+|[^\w\s], assuming that you want the punctuation characters to end up in different tokens:
String input = "Hi I want \"to split\" this (String).";
Matcher matcher = Pattern.compile("\\w+|[^\\w\\s]").matcher(input);
List<String> out = new ArrayList<>();
while (matcher.find()) {
out.add(matcher.group());
}
The output ArrayList contains:
[Hi, I, want, ", to, split, ", this, (, String, ), .]
You might want to use the (?U) flag to make \w and \s follow the Unicode definition of word and whitespace characters. By default, \w and \s only recognize word and whitespace characters in the ASCII range.
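The effect of the (?U) flag can be seen on input containing a non-ASCII letter. This is only a sketch; the class UnicodeTokenDemo and the tokenize method with its boolean parameter are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeTokenDemo {
    // Hypothetical helper: tokenizes with or without the (?U) embedded flag.
    static List<String> tokenize(String input, boolean unicode) {
        String regex = (unicode ? "(?U)" : "") + "\\w+|[^\\w\\s]";
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (matcher.find()) {
            out.add(matcher.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "caf\u00e9 (ok)";  // "café (ok)", written with an escape to avoid encoding issues
        // ASCII mode: the accented letter is not a word character, so it becomes its own token.
        System.out.println(tokenize(input, false));
        // Unicode mode: the accented letter is part of the word.
        System.out.println(tokenize(input, true));
    }
}
```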
For the sake of completeness, here is the solution in split(), which works on Java 8 and above. There will be an extra empty string at the beginning in Java 7.
String tokens[] = input.split("\\s+|(?<![\\w\\s])(?=\\w)|(?<=\\w)(?![\\w\\s])|(?<=[^\\w\\s])(?=[^\\w\\s])");
The regex is rather complex, since the empty string splits between punctuation character and word character need to avoid the cases already split by \s+.
Since the regex in the split solution is quite a mess, please use the match solution instead.
What language are you trying to write this in?
You could write regex groups something like: (.+)(\s)
This would match any quantity of characters followed by a space
I am writing a parser for a file containing the following string pattern:
Key : value
Key : value
Key : value
etc...
I am able to retrieve those lines one by one into a list. What I would like to do is to separate the key from the value for each one of those strings. I know there is the split() method that can take a Regex and do this for me, but I am very unfamiliar with them so I don't know what Regex to give as a parameter to the split() function.
Also, while not in the specifications of the file I am parsing, I would like for that Regex to be able to recognize the following patterns as well (if possible):
Key: value
Key :value
Key:value
etc...
So basically, whether there's a space or not after/before/after AND before the : character, I would like for that Regex to be able to detect it. What is the Regex that can achieve this?
In other words split method should look for : and zero or more whitespaces before or after it.
Key: value
^^
Key :value
^^
Key:value
^
Key : value
^^^
In that case split("\\s*:\\s*") should do the trick.
Explanation:
\\s represents any whitespace
* means zero or more occurrences of the element described before it
\\s* means zero or more whitespaces.
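As a quick check of that split expression, the sketch below runs it over the four spacing variants from the question. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class KeyValueSplitDemo {
    // Splits on a colon together with any whitespace around it.
    static String[] splitPair(String line) {
        return line.split("\\s*:\\s*");
    }

    public static void main(String[] args) {
        String[] lines = {"Key : value", "Key: value", "Key :value", "Key:value"};
        for (String line : lines) {
            // Every variant prints [Key, value]
            System.out.println(Arrays.toString(splitPair(line)));
        }
    }
}
```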
On the other hand you may want also to find entire key:value pair and place parts matching key and value in separate groups (you can even name groups as you like using (?<groupName>regex)). In that case you may use
Pattern p = Pattern.compile("(?<key>\\w+)\\s*:\\s*(?<value>\\w+)");
Matcher m = p.matcher(yourData);
while(m.find()){
System.out.println("key = " + m.group("key"));
System.out.println("value = " + m.group("value"));
System.out.println("--------");
}
If you want to use String.split(), you could use this:
String input = "key : value";
String[] s = input.split("\\s*:\\s*");
String key = s[0];
String value = s[1];
This will split the String at the ":" and treat any whitespace around the ":" as part of the delimiter, so that you receive trimmed strings.
Explanation:
\\s* will match any amount of whitespace (zero or more characters); by default \\s is equal to [ \\t\\n\\x0B\\f\\r]
The : in between the two \\s* means that your : needs to be there
Note that this solution will cause an ArrayIndexOutOfBoundsException if your input line does not contain the key-value-format as you defined it.
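One way to stay with split() and still avoid that exception is to check the array length before indexing. The following is a sketch under that assumption; the Optional-returning helper splitKeyValue is invented for illustration, and the limit of 2 is a design choice so that colons inside the value are kept.

```java
import java.util.Optional;

public class SafeSplitDemo {
    // Hypothetical helper: returns key and value, or empty if the line has no ':'.
    static Optional<String[]> splitKeyValue(String line) {
        // A limit of 2 stops after the first delimiter, so "a : b : c" gives ["a", "b : c"].
        String[] parts = line.trim().split("\\s*:\\s*", 2);
        return parts.length == 2 ? Optional.of(parts) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(splitKeyValue("key : value").isPresent());  // true
        System.out.println(splitKeyValue("").isPresent());             // false: empty line
        System.out.println(splitKeyValue("no delimiter").isPresent()); // false
        System.out.println(splitKeyValue("a : b : c").get()[1]);       // b : c
    }
}
```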
If you are not sure whether the line really contains a key-value string, for example because your file ends with an empty line as files normally do, you could do it like this:
String input = "key : value";
Matcher m = Pattern.compile("(\\S+)\\s*:\\s*(.+)").matcher(input);
if (m.matches())
{
String key = m.group(1); // note that the count starts by 1 here
String value = m.group(2);
}
Explanation:
\\S+ matches any non-whitespace string; it cannot contain whitespace, so any whitespace before the : is left for the next part of the regex. Note that the () around it mark a capturing group, so that you can get its value with m.group().
\\s* will match any amount of whitespace (zero or more characters); by default \\s is equal to [ \\t\\n\\x0B\\f\\r]
The : in between the two \\s* means that your : needs to be there
The last group, .+, will match any string, containing whitespaces and so on.
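Applied to a whole file, the same pattern could be used to collect the pairs into a map while skipping lines that do not match. This is a sketch only; the class KeyValueParseDemo, the parse method and the sample lines are all made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParseDemo {
    private static final Pattern KEY_VALUE = Pattern.compile("(\\S+)\\s*:\\s*(.+)");

    // Hypothetical helper: keeps insertion order and ignores lines without a key-value pair.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = KEY_VALUE.matcher(line.trim());
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Key1 : a", "Key2:b", "", "Key3 :c d");
        System.out.println(parse(lines));  // {Key1=a, Key2=b, Key3=c d}
    }
}
```

Note that \\S+ is greedy, so a line with several colons and no spaces, such as a:b:c, is split at the last colon.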
you can use the split method but can pass delimiter as ":"
This splits the string when it sees ':', then you can trim the values to get the key and value.
String s = " keys : value ";
String keyValuePairs[] = s.split(":");
String key = keyValuePairs[0].trim();
String value = keyValuePairs[1].trim();
You can also make use of regex to simplify it.
String keyValuePairs[] = s.trim().split("[ ]*:[ ]*");
s.trim() will remove the spaces before and after the string (if you have any in your case), so the string will become "keys : value", and
[ ]*:[ ]*
splits the string with a regular expression that uses spaces (zero or more), then :, then spaces (zero or more) as the delimiter.
For a pure regex solution, you can use the following pattern, shown here in quotes because it starts with a space:
" ?: ?"
See http://regexr.com/39evh
String[] tokensVal = str.split(":");
String key = tokensVal[0].trim();
String value = tokensVal[1].trim();
I am trying to use a simple split to break up the following string: 00-00000
My expression is: ^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])
And my usage is:
String s = "00-00000";
String pattern = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])";
String[] parts = s.split(pattern);
If I play around with the Pattern and Matcher classes I can see that my pattern does match and the matcher tells me my groupCount is 7 which is correct. But when I try and split them I have no luck.
String.split does not use capturing groups as its result. It finds whatever matches and uses that as the delimiter. So the resulting String[] are substrings in between what the regex matches. As it is the regex matches the whole string, and with the whole string as a delimiter there is nothing else left so it returns an empty array.
If you want to use regex capturing groups you will have to use Matcher.group(), String.split() will not do.
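The difference between the two approaches can be sketched with the pattern from the question. The class and method names below are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitVsGroupsDemo {
    static final String PATTERN = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])";

    // The whole input is consumed as the delimiter, so nothing is left to return.
    static int splitLength(String s) {
        return s.split(PATTERN).length;
    }

    // Collects capturing groups 1..groupCount(); group 0 would be the whole match.
    static List<String> groups(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(PATTERN).matcher(s);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitLength("00-00000"));  // 0
        System.out.println(groups("12-34567"));       // [12, -, 3, 4, 5, 6, 7]
    }
}
```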
For your example, you could simply do this:
String s = "00-00000";
String pattern = "-";
String[] parts = s.split(pattern);
I can not be sure, but I think what you are trying to do is to get each matched group into an array.
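Matcher matcher = Pattern.compile(pattern).matcher(s); // s and pattern as defined in the question
if (matcher.matches()) {
String[] groups = new String[matcher.groupCount()];
for (int i = 0; i < matcher.groupCount(); i++) {
groups[i] = matcher.group(i + 1); // group 0 is the whole match, so capturing groups start at 1
}
}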
From the documentation:
String[] split(String regex) -- Returns: the array of strings computed by splitting this string around matches of the given regular expression
Essentially the regular expression is used to define delimiters in the input string. You can use capturing groups and backreferences in your pattern (e.g. for lookarounds), but ultimately what matters is what and where the pattern matches, because that defines what goes into the returned array.
If you want to split your original string into 7 parts using regular expression, then you can do something like this:
String s = "12-3456";
String[] parts = s.split("(?!^)");
System.out.println(parts.length); // prints "7"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[1] [2] [-] [3] [4] [5] [6] "
This splits on the zero-length assertion (?!^), which matches anywhere except before the first character in the string. This prevents the empty string from being the first element in the array, and the trailing empty string is already discarded because we use the default limit parameter of split.
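The role of the limit parameter is easier to see with an ordinary delimiter. The sketch below compares the default limit with a negative one; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Default limit (0): trailing empty strings are removed from the result.
    static String[] splitDefault(String s, String regex) {
        return s.split(regex);
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepTrailing(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("a,b,,", ",")));      // [a, b]
        System.out.println(Arrays.toString(splitKeepTrailing("a,b,,", ","))); // [a, b, , ]
        System.out.println(splitDefault("12-3456", "(?!^)").length);          // 7
    }
}
```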
Using a regular expression to get the individual characters of a string like this is overkill, though. If you have only a few characters, then the most concise option is to use foreach on the toCharArray():
for (char ch : "12-3456".toCharArray()) {
System.out.print("[" + ch + "] ");
}
This is not the most efficient option if you have a longer string.
Splitting on -
This may also be what you're looking for:
String s = "12-3456";
String[] parts = s.split("-");
System.out.println(parts.length); // prints "2"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[12] [3456] "