Java regex is not giving the expected output - java

networks[0]/site[9785d8e8-9b1f-3fc0-8271-6e32f58fb725]/equipment/location[144ae20e-be33-32e2-8b52-798e968e88b9]
The objective is to get 9785d8e8-9b1f-3fc0-8271-6e32f58fb725 from the string above. I have written the regex below, but it gives "location" as the output.
.*\\/([^\\/]+)\\[.*\\]$
Could anyone suggest the proper regex to get 9785d8e8-9b1f-3fc0-8271-6e32f58fb725 from the string above?

You can search using this regex:
^[^/]+/[^\[/]*\[|\].*
and replace with empty string.
RegEx Explanation:
^[^/]+/[^\[/]*\[: Matches the text before the first /, then the / itself, then everything up to and including the next [
\].*: Matches ] and everything afterwards
Code:
String s = "networks[0]/site[9785d8e8-9b1f-3fc0-8271-6e32f58fb725]/equipment/location[144ae20e-be33-32e2-8b52-798e968e88b9]";
String r = s.replaceAll("^[^/]+/[^\\[/]*\\[|\\].*", "");
//=> "9785d8e8-9b1f-3fc0-8271-6e32f58fb725"

You can just use site\[(.+?)\].
P.S. Your current expression is actually doing the following:
Pass whatever .*
Unless you encounter /
then capture any sequence after / not containing: \, /
which in turn is followed by [] with whatever content straight away and residing at the very end of the string.
So the only matching part is location

This should do the trick:
^networks\[\d\]\/site\[([^]]+)\].*$
It will match
the literal string networks[, a single digit, then ]/site[
followed by your id
followed by ] and arbitrary stuff
You can then extract your ID from the first capturing group.
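A minimal sketch of the capturing-group approach in Java, assuming the ID always sits inside the brackets of a path segment named site. The class and method names (SiteIdExtractor, extractSiteId) are made up for illustration and are not part of the original answers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiteIdExtractor {
    // Captures everything between "/site[" and the next "]" in group 1.
    private static final Pattern SITE_ID = Pattern.compile("/site\\[([^\\]]+)\\]");

    // Hypothetical helper: returns the captured ID, or null if there is no site segment.
    static String extractSiteId(String path) {
        Matcher m = SITE_ID.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "networks[0]/site[9785d8e8-9b1f-3fc0-8271-6e32f58fb725]"
                 + "/equipment/location[144ae20e-be33-32e2-8b52-798e968e88b9]";
        System.out.println(extractSiteId(s)); // 9785d8e8-9b1f-3fc0-8271-6e32f58fb725
    }
}
```

Using find() with a group avoids having to describe the rest of the string, which is what made the original expression latch onto the last segment.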

Related

Java Regex to replace only part of string (url)

I want to replace only numeric section of a string. Most of the cases it's either full URL or part of URL, but it can be just a normal string as well.
/users/12345 becomes /users/XXXXX
/users/234567/summary becomes /users/XXXXXX/summary
/api/v1/summary/5678 becomes /api/v1/summary/XXXX
http://example.com/api/v1/summary/5678/single becomes http://example.com/api/v1/summary/XXXX/single
Notice that I am not replacing 1 from /api/v1
So far, I have only following which seem to work in most of the cases:
input.replaceAll("/[\\d]+$", "/XXXXX").replaceAll("/[\\d]+/", "/XXXXX/");
But this has 2 problems:
The replacement size doesn't match with the original string length.
The replacement character is hardcoded.
Is there a better way to do this?
In Java you can use:
str = str.replaceAll("(/|(?!^)\\G)\\d(?=\\d*(?:/|$))", "$1X");
RegEx Details:
\G asserts position at the end of the previous match or the start of the string for the first match.
(/|(?!^)\\G): Match / or end of the previous match (but not at start) in capture group #1
\\d: Match a digit
(?=\\d*(?:/|$)): Ensure that digits are followed by a / or end.
Replacement: $1X: replace it with capture group #1 followed by X
Not a Java guy here, but the idea should be transferable. Just capture a /, the digits, and optionally a trailing /, count the length of the second group, and put it back again.
So
(/)(\d+)(/?)
becomes
$1XYZ$3
See a demo on regex101.com and this answer for a lambda equivalent to e.g. Python or PHP.
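In Java, the "count the digits and put the same number of mask characters back" idea can be sketched with Matcher.appendReplacement. This is only an illustration under the assumption that a numeric section is a run of digits between a / and either the next / or the end of the string; the names DigitMasker and mask are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitMasker {
    // A slash, then digits (group 1), followed by another slash or the end of input.
    private static final Pattern SEGMENT = Pattern.compile("/(\\d+)(?=/|$)");

    // Hypothetical helper: replaces each all-digit segment with maskChar repeated to the same length.
    static String mask(String input, char maskChar) {
        Matcher m = SEGMENT.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char[] fill = new char[m.group(1).length()];
            Arrays.fill(fill, maskChar);
            // quoteReplacement keeps '$' or '\' mask characters from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement("/" + new String(fill)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("/users/12345", 'X'));          // /users/XXXXX
        System.out.println(mask("/api/v1/summary/5678", 'X'));  // /api/v1/summary/XXXX
    }
}
```

Because the mask character is a parameter and the length comes from the match, this addresses both problems listed in the question.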
First of all you need something like this :
String new_s1 = s3.replaceAll("(\\/)(\\d)+(\\/)?", "$1XXXXX$3");

Escape symbol while splitting a string using regex in Java

I have a string that I received while parsing an XML document:
"ListOfItems/Item[Name='Model/Id']/Price"
And I need to split it by the delimiter "/":
String[] nodes = path.split("/"), but with one condition:
"If a slash is present in the name of an item, as in the example above, I must skip that block and not split it."
i.e. after splitting I must get the following array of nodes:
ListOfItems, Item[Name='Model/Id'], Price
How can I do this using a regular expression?
Thanks for help!
You can split using this regex:
/(?=(?:(?:[^']*'){2})*[^']*$)
This regex splits only on forward slashes / that are followed by an even number of single quotes, which in other words means that a / inside single quotes is not matched for splitting.
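As a quick check of the lookahead-based split, here is a small Java sketch. It assumes single quotes are always balanced and never escaped; the class and method names (QuotedPathSplitter, splitPath) are invented for the example.

```java
import java.util.Arrays;

class QuotedPathSplitter {
    // Split on "/" only when an even number of single quotes remains to the right.
    private static final String SLASH_OUTSIDE_QUOTES = "/(?=(?:(?:[^']*'){2})*[^']*$)";

    // Hypothetical helper wrapping String.split with the regex above.
    static String[] splitPath(String path) {
        return path.split(SLASH_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String path = "ListOfItems/Item[Name='Model/Id']/Price";
        // Prints: [ListOfItems, Item[Name='Model/Id'], Price]
        System.out.println(Arrays.toString(splitPath(path)));
    }
}
```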
Another way is to use this pattern with the find method and check whether the last match is empty. The advantage is that you don't need an additional lookahead that scans to the end of the string at each possible position. The items you need are in capture group 1:
\\G/?((?>[^/']+|'[^']*')*)|$
The \G is an anchor that matches either the start of the string or the position after the previous match. Using it forces all the matches to be contiguous.
(?>[^/']+|'[^']*')* defines the possible content of an item: all that is not a / or a ', or a string between quotes.
Note that the description of a string between quotes can be improved to deal with escaped quotes: '(?>[^'\\]+|\\.)*' (with the s modifier)
The alternation with $ is only there to ensure that you have parsed the whole string to the end. Capture group 1 of the last match must be empty. If it is null, the global search stopped before the end (for example, in case of unbalanced quotes).

Java - Regex: Several matches in same string

I have a string: String s = "The input must be of format: '$var1$'-'$var1$'-'$var1$'".
I want to replace the text between the $ with another text, so the outcome may look on the console like:
"The input must be of format: '$REPLACED$'-'$REPLACED$'-'$REPLACED$'"
I got as far as s.replaceAll("\\$.+\\$", "\\$REPLACED\\$"), but that results in
"The input must be of format: '$REPLACED$'" (the first and the last $ are taken as borders).
How can I tell the regex engine that there are several occurrences and that each needs to be processed (= replaced)?
Thanks for your help!
Edit: Thanks for your help. Greediness was the issue. Adding a ? to the regex fixed it. The solution now looks like this (for those with a similar problem):
s.replaceAll("\\$.+?\\$", "\\$REPLACED\\$");
The effect you're experiencing is called greediness: An expression like .+ will match as many characters as possible.
Use .+? instead to make the expression ungreedy and match as few characters as possible.
+ is greedy, so it tries to find the longest possible match. This means that [$].+[$] will match
a$b$c$e
 ^^^^^
If you want .+ to find the shortest possible match, you can either
add ? after the + quantifier (.+?), making + reluctant, or
instead of accepting every character (.) between the $ signs, accept only characters that are not $, with [^$].
So try to change your regex to
s.replaceAll("\\$.+?\\$", "\\$REPLACED\\$");
or
s.replaceAll("\\$[^$]+?\\$", "\\$REPLACED\\$");
This should work:
String s = "The input must be of format: '$var1$'-'$var1$'-'$var1$'";
System.out.println( s.replaceAll("\\$[^$]*\\$", "\\$REPLACED\\$") );
//=> The input must be of format: '$REPLACED$'-'$REPLACED$'-'$REPLACED$'
The regex \\$[^$]*\\$ matches a literal $, then any characters up to the next $, then that closing literal $.
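To see the three variants discussed above side by side, here is a small Java sketch run against the string from the question. The class and method names are illustrative only.

```java
class GreedyVsReluctant {
    // Greedy: .+ runs to the last '$', so everything in between is swallowed.
    static String replaceGreedy(String s) {
        return s.replaceAll("\\$.+\\$", "\\$REPLACED\\$");
    }

    // Reluctant: .+? stops at the nearest closing '$'.
    static String replaceReluctant(String s) {
        return s.replaceAll("\\$.+?\\$", "\\$REPLACED\\$");
    }

    // Negated class: [^$]+ cannot cross a '$' at all.
    static String replaceNegated(String s) {
        return s.replaceAll("\\$[^$]+\\$", "\\$REPLACED\\$");
    }

    public static void main(String[] args) {
        String s = "The input must be of format: '$var1$'-'$var1$'-'$var1$'";
        System.out.println(replaceGreedy(s));    // The input must be of format: '$REPLACED$'
        System.out.println(replaceReluctant(s)); // ... '$REPLACED$'-'$REPLACED$'-'$REPLACED$'
        System.out.println(replaceNegated(s));   // same as the reluctant version
    }
}
```

The replacement string escapes $ as \\$ because an unescaped $ in a Java replacement string introduces a group reference.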

What is this Java regex code doing?

I just found this method inside a "Utils"-type class in our codebase. It was written a long time ago by a developer who no longer works for us. What in tarnation is it doing? What is it returning?!? Of course, there's no JavaDocs or comments.
public static String stripChars(String toChar, String ptn){
String stripped = "";
stripped = toChar.replaceAll(ptn, "$1");
return stripped.trim();
}
Thanks in advance!
It's a very short alias, essentially. This:
stripChars(a, b)
Is equivalent to:
a.replaceAll(b, "$1").trim()
It seems to replace everything in "toChar" that matches the regular expression "ptn" with the text captured by the first group of that match.
Regular expressions have a concept of groups. For example, changing "year 2012" to "year 1012", or "year 2006" to "year 1006" (changing the leading 20 to 10), can be accomplished by replacing
"year 20([0-9][0-9])" with "year 10$1". That is, match the entire string, then replace it with "year 10" followed by the first group ($1). The group is the first thing in parentheses.
Anyway, your method replaces everything that matches "ptn" in "toChar" with the first group in the regular expression "ptn". So given
stripChars("year 2012", "year 20([0-9][0-9])"); you would receive back only "12", because the entire text would match and be replaced by only the first group.
It then trims any leading or trailing whitespace.
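A small runnable sketch of how the method behaves, using the method from the question unchanged in behavior. The wrapper class name and the second input ("id=42;") are made up for the demonstration.

```java
class StripCharsDemo {
    // Same behavior as the method in the question: replace each match of ptn
    // with its first capture group, then trim surrounding whitespace.
    static String stripChars(String toChar, String ptn) {
        return toChar.replaceAll(ptn, "$1").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripChars("year 2012", "year 20([0-9][0-9])")); // 12
        System.out.println(stripChars("  id=42;  ", "id=(\\d+);"));         // 42
    }
}
```

Note that the pattern passed in has to contain at least one capture group. If it matches but defines no group 1, replaceAll throws an IndexOutOfBoundsException.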
The pattern string that is passed as an argument to the method seems to contain a matching group, and the call to replaceAll is going to replace the entire match of the pattern with the portion that matched the first group. You should look at the call hierarchy of this method to find some of the regexes passed to it, along with the strings that are being worked upon.
It's just replacing a string with its own subset of matched characters and then trimming the spaces from both ends.
For example, if you want a word to be replaced by the series of digits it contains,
use the regex \b\D*(\d+)\D*\b
and your replaceAll call will give these results:
hey123wow->123
what666->666
how888->888
$0 refers to the whole matched string, i.e. hey123wow, what666, how888 in this example
$1 refers to the first group, i.e. (\d+) in this example, which captures 123, 666, 888
$2 would refer to the second group, which does not exist in this example.
toChar.replaceAll(ptn, "$1");
It's replacing all the occurrences of ptn in toChar with the captured group $1, which we can't identify without seeing the pattern.
Capture groups are patterns inside parentheses ():
For example, in the regex below:
"(\\d+)(cd)"
$0 denotes the complete match
$1 denotes the first capture group (\\d+)
$2 denotes the second capture group (cd)
String str1 = "xyz12cd";
// This will replace `12cd` with the first capture group `12`
str1 = str1.replaceAll("(\\d+)(cd)", "$1");
System.out.println(str1); // prints "xyz12"
To learn more about regular expressions, you can refer to the following links:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
http://docs.oracle.com/javase/tutorial/essential/regex/

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?
Your existing regex works fine; you are just missing a \ before \d in the Java string literal.
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is the result of ^(\d+)(1+1).* replacement, as you miss some backslash in the string (e.g. "^(\\d+)(1\+1).*").
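To confirm that the regex behaves as intended once every backslash is doubled in the Java string literal, here is a short sketch run against the three sample inputs. The class and method names are hypothetical.

```java
class OnePlusOneStripper {
    // Regex ^(\d+)(1\+1).* written as a Java string literal: each backslash is doubled.
    static String stripSuffix(String s) {
        return s.replaceAll("^(\\d+)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripSuffix("12341+1"));    // 1234
        System.out.println(stripSuffix("12241+1R1"));  // 1224
        System.out.println(stripSuffix("100001+1R2")); // 10000
    }
}
```

The greedy \d+ first takes all the leading digits, then backtracks by one digit so that the literal 1+1 can match, which is why group 1 ends just before it.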
Your regex looks fine to me - I don't have access to java but in JavaScript the code..
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
Returns 1234 as you'd expect. This works on a string with many 'codes' in too e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.
Personally, I wouldn't use a Regex at all (it'd be like using a hammer on a thumbtack), I'd just create a substring from (Pseudocode)
stringName.substring(0, stringName.indexOf("1+1"))
But it looks like other posters have already mentioned the non-greedy operator.
In most Regex Syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. (Thus: ^(\d+?)(1+1) matches any number of digits until it finds "1+1" and then, NOT INCLUDING the "1+1" it continues matching, whereas your original would see the 1 and match it as well).
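The substring suggestion above can be sketched in Java as follows. The guard for indexOf returning -1 is an addition not present in the pseudocode, and the names PrefixBeforeMarker and before are invented for the example.

```java
class PrefixBeforeMarker {
    // Returns the part of s before the first occurrence of marker,
    // or s unchanged when the marker is absent (indexOf returns -1).
    static String before(String s, String marker) {
        int idx = s.indexOf(marker);
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(before("12341+1", "1+1"));    // 1234
        System.out.println(before("100001+1R2", "1+1")); // 10000
    }
}
```

Unlike the regex, this version does not check that the text before "1+1" consists only of digits, so it is only equivalent when the input is already known to have that shape.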
