What is this Java regex code doing?

What is this Java regex code doing? - java

I just found this method inside a "Utils"-type class in our codebase. It was written a long time ago by a developer who no longer works for us. What in tarnation is it doing? What is it returning?!? Of course, there's no JavaDocs or comments.
public static String stripChars(String toChar, String ptn){
String stripped = "";
stripped = toChar.replaceAll(ptn, "$1");
return stripped.trim();
}
Thanks in advance!

It's a very short alias, essentially. This:
stripChars(a, b)
Is equivalent to:
a.replaceAll(b, "$1").trim()

It seems to replace everything in "toChar" which matches the regular expression "ptn" with the first group to match in "toChar"
Regular expressions have a concept of groups, for example matching "year 2012" and replacing it with "year 1012", or "year 2006" with "year 1007" (changing the first 20 to 10) can be accomplished by replacing
"year 20([0-9][9-9])" with "year 20$1" -- That is, match the entire string, and then replace it "year 20" followed by the first group ($1). The group is the first thing in parenthesis.
Anyway, your method then replaces everything that matches "ptn" in "toChar" with the first group in the regular expression "ptn". So given
stripChars("year 2012", "year 20([0-9][9-9]"); You would receive back only "12" because the entire text would match and be replaced by only the first group.
It then trips any leading or trailing whitespace.

The pattern string that is passed as argument method seems to contain a matching group and the call to replace all is going to replace the entire match to the paatern with the portion that matched the first group. You should look for the call hierarchy of this method to find some regexes passed to the method along with the strings that are being worked upon,

It's just replacing a string with its own subset of matched characters and then trimming the spaces from both end.
Fo example
So if you want a word to be replaced by a series of digits of that word
Use the regex \b.*?(\d*).*?\b
and then boom,your replaceAll method will give these results
hey123wow->123
what666->666
how888->888
$0 refers to the whole matched string i.e hey123wow,what666,how888 in this example
$1 refers to the group.i.e.(\d*) in this example i.e.123,666,888
$2 would refer to the second group which does not exist in this example.

toChar.replaceAll(ptn, "$1");
Its replacing all the occurences of ptn in toChar with the captured group $1 which we don't know where it is.
Capture groups are patterns inside brackets (): -
For E.G in the below Regex : -
"(\\d+)(cd)"
$0 denotes the complete match
$1 denotes the first capture group (\\d+)
$2 denotes the second capture group (cd)
String str1 = "xyz12cd";
// This will replace `12cd` with the first capture group `12`
str1 = str1.replaceAll("(\\d+)(cd)", "$1");
System.out.println(str1);
For learning more about Regular Expression, you can refer to the following links: -
http://www.vogella.com/articles/JavaRegularExpressions/article.html
http://docs.oracle.com/javase/tutorial/essential/regex/

Related

Regex to check if String is one word in Java

I need regex to check if String has only one word (e.g. "This", "Country", "Boston ", " Programming ").
So far I used an alternative way of doing it which is to check if String contains spaces. However, I am sure that this can be done using regex.
One possible way in my opinion is "^\w{2,}\s". Does this work properly? Are there any other possible answers?

The pattern ^\w{2,}\s matches 2 or more word characters from the start of the string, followed by a mandatory whitespace char (that can also match a newline)
As the pattern is also unanchored, it can also match Boston in Boston test
If you want to match a single word with as least 2 characters surrounded by optional horizontal whitespace characters using \h* and add an anchor $ to assert the end of the string.
^\h*\w{2,}\h*$
Regex demo
In Java
String regex = "^\\h*\\w{2,}\\h*$";

Regex pattern matching with multiple strings

Forgive me. I am not familiarized much with Regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern is failing with 2 different set of strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
Not able to figure out why? As far I can understand it may be because of the multiple spaces involved when more than 2 strings are present with separated by comma.
Not sure what should be correct pattern I should write for more than 2 strings.
Any help will be appreciated.

If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | matching a mandatory comma as well on the left as on the right side.
Using matches(), should match the whole string and that is the case for 207e/160, 149/80 so that is a match.
Only for this string 207e/160, 149/80, 11 there are 2 comma's, so you do get a partial match for the first part of the string, but you don't match the whole string so matches() returns false.
See the matches in this regex demo.
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-90+/-]+)*$
^ Start of string
[NnoeOf0-9\\+/-]+
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-90-9\\+/-]+ Match 1+ any of the listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Regex demo
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-90+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end not having to escape it, and the + and / also does not have to be escaped.

You can use regex101 to test your RegEx. it has a description of everything that's going on to help with debugging. They have a quick reference section bottom right that you can use to figure out what you can do with examples and stuff.
A few things, you can add literals with \, so \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.

^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
Regex101 Test

remove part of matcher after the match in regex pattern

I need to help in writing regex pattern to remove only part of the matcher from original string.
Original String: 2017-02-15T12:00:00.268+00:00
Expected String: 2017-02-15T12:00:00+00:00
Expected String removes everything in milliseconds.
My regex pattern looks like this: (:[0-5][0-9])\.[0-9]{1,3}
i need this regex to make sure i am removing only the milliseconds from some time field, not everything that comes after dot. But using above regex, I am also removing the minute part. Please suggest and help.

You have defined a capturing group with (...) in your pattern, and you want to have that part of string to be present after the replacement is performed. All you need is to use a backreference to the value stored in this capture. It can be done with $1:
String s = "2017-02-15T12:00:00.268+00:00";
String res = s.replaceFirst("(:[0-5][0-9])\\.[0-9]{1,3}", "$1");
System.out.println(res); // => 2017-02-15T12:00:00+00:00
See the Java demo and a regex demo.
The $1 in the replacement pattern tells the regex engine it should look up the captured group with ID 1 in the match object data. Since you only have one pair of unescaped parentheses (1 capturing group) the ID of the group is 1.

Change your pattern to (?::[0-5][0-9])(\.[0-9]{1,3}), run the find in the matcher and remove all it finds in the group(1).
The backslash will force the match with the '.' char, instead of any char, which is what the dot represents in a regex.
The (?: defines a non-capturing group, so it will not be considered in the group(...) on the matcher.
And adding a parenthesis around what you want will make it show up as group in the matcher, and in this case, the first group.
A good reference is the Pattern javadoc: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Use $1 and $2 variable for replace
string.replaceAll("(.*)\\.\\d{1,3}(.*)","$1$2");

Remove repeating set of characters in a string

I want to remove the sequesnce "-~-~-" if it repeats in a string, but only if they are together.
I have tried to create a regex based on the removing of multiple white spaces regex:
test.replaceAll("\\s+", " ");
Unfortunately I was unsuccessful. Can someone please help me write the correct regex? thanks.
Example:
string test = "hello-~-~--~-~--~-~-"
output:
hello-~-~-
Another example
string test = "-~-~--~-~--~-~-hello-~-~--~-~--~-~-"
output:
-~-~-hello-~-~-

The regex is:
test.replaceAll("(-~-~-){2,}", "-~-~-")
replaceAll replaces all occurrences matched by the regex (the first parameter) with the second parameter.
the () groups the expression -~-~- together, {2,} means two or more occurrences.
EDIT
Like #anubhava said, instead of using -~-~- for the replacement string, you could also use $1 which backreferences the first capturing group (i.e. the expression in the regex surrounded by ()).

test.replaceAll("(-~-~-)+", "-~-~-");

This is the regex you need:
(-~-~-){2}

What is the responsibility of (.*) in the Java String?

What is the responsibility of (.*) in the third line and how it works?
String Str = new String("Welcome to Tutorialspoint.com");
System.out.print("Return Value :" );
System.out.println(Str.matches("(.*)Tutorials(.*)"));

.matches() is a call to parse Str using the regex provided.
Regex, or Regular Expressions, are a way of parsing strings into groups. In the example provided, this matches any string which contains the word "Tutorials". (.*) simply means "a group of zero or more of any character".
This page is a good regex reference (for very basic syntax and examples).

Your expression matches any word prefixed and suffixed by any character of word Tutorial. .* means occurrence of any character any number of times including zero times.
The . represents regular expression meta-character which means any character.
The * is a regular expression quantifier, which means 0 or more occurrences of the expression character it was associated with.

matches takes regular expression string as parameter and (.*) means capture any character zero or more times greedily

.* means a group of zero or more of any character

In Regex:
.
Wildcard: Matches any single character except \n
for example pattern a.e matches ave in nave and ate in water
*
Matches the previous element zero or more times
for example pattern \d*\.\d matches .0, 19.9, 219.9

There is no reason to put parentheses around the .*, nor is there a reason to instantiate a String if you've already got a literal String. But worse is the fact that the matches() method is out of place here.
What it does is greedily matching any character from the start to the end of a String. Then it backtracks until it finds "Tutorials", after which it will again match any characters (except newlines).
It's better and more clear to use the find method. The find method simply finds the first "Tutorials" within the String, and you can remove the "(.*)" parts from the pattern.
As a one liner for convenience:
System.out.printf("Return value : %b%n", Pattern.compile("Tutorials").matcher("Welcome to Tutorialspoint.com").find());

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

What is this Java regex code doing? - java

It's a very short alias, essentially. This: stripChars(a, b) Is equivalent to: a.replaceAll(b, "$1").trim()

Related

Regex to check if String is one word in Java

Regex pattern matching with multiple strings

remove part of matcher after the match in regex pattern

Remove repeating set of characters in a string

What is the responsibility of (.*) in the Java String?

Categories

Resources