Java Regex to replace only part of string (url)

I want to replace only the numeric sections of a string. In most cases it is either a full URL or part of a URL, but it can be just a normal string as well.
/users/12345 becomes /users/XXXXX
/users/234567/summary becomes /users/XXXXXX/summary
/api/v1/summary/5678 becomes /api/v1/summary/XXXX
http://example.com/api/v1/summary/5678/single becomes http://example.com/api/v1/summary/XXXX/single
Notice that I am not replacing the 1 in /api/v1.
So far, I only have the following, which seems to work in most cases:
input.replaceAll("/[\\d]+$", "/XXXXX").replaceAll("/[\\d]+/", "/XXXXX/");
But this has 2 problems:
The replacement size doesn't match with the original string length.
The replacement character is hardcoded.
Is there a better way to do this?

In Java you can use:
str = str.replaceAll("(/|(?!^)\\G)\\d(?=\\d*(?:/|$))", "$1X");
RegEx Details:
\G asserts position at the end of the previous match or the start of the string for the first match.
(/|(?!^)\\G): Match / or end of the previous match (but not at start) in capture group #1
\\d: Match a digit
(?=\\d*(?:/|$)): Ensure that digits are followed by a / or end.
Replacement: $1X: replace it with capture group #1 followed by X
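As a quick check, the pattern above can be run against the four inputs from the question. This is only a sketch: the class name SegmentMasker and the helper mask are made up for illustration and are not part of the original answer.

```java
class SegmentMasker {
    // Regex from the answer: a digit that directly follows "/" or the previous match,
    // and whose remaining digit run ends at "/" or at the end of the input.
    static final String DIGIT_IN_NUMERIC_SEGMENT = "(/|(?!^)\\G)\\d(?=\\d*(?:/|$))";

    // Replaces every digit of a purely numeric path segment with X, one digit per match.
    static String mask(String input) {
        return input.replaceAll(DIGIT_IN_NUMERIC_SEGMENT, "$1X");
    }

    public static void main(String[] args) {
        String[] samples = {
            "/users/12345",
            "/users/234567/summary",
            "/api/v1/summary/5678",
            "http://example.com/api/v1/summary/5678/single"
        };
        for (String s : samples) {
            System.out.println(s + " becomes " + mask(s));
        }
    }
}
```

Because each digit is replaced individually, the replacement keeps the original length, and /api/v1 is left alone because the 1 does not directly follow a / or a previous match.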

Not a Java guy here, but the idea should be transferable. Just capture a /, the digits, and optionally a trailing /, count the length of the second group, and put it back again.
So
(/)(\d+)(/?)
becomes
$1XYZ$3
See a demo on regex101.com and this answer for a lambda equivalent to e.g. Python or PHP.
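In Java, the "count the length of the second group" idea could be sketched with Matcher.replaceAll and a lambda (available since Java 9; String.repeat needs Java 11). The class name LengthPreservingMasker and the method mask are hypothetical, and the trailing optional / from the pattern above is swapped for a lookahead here so that two adjacent numeric segments such as /12/34 are both handled. That swap is an assumption of this sketch, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LengthPreservingMasker {
    // "/" followed by a run of digits that ends at "/" or at the end of the input.
    private static final Pattern NUMERIC_SEGMENT = Pattern.compile("(/)(\\d+)(?=/|$)");

    // Replaces each numeric segment with the same number of replacement characters.
    static String mask(String input, char replacement) {
        return NUMERIC_SEGMENT.matcher(input).replaceAll(m -> {
            String filler = String.valueOf(replacement).repeat(m.group(2).length());
            // quoteReplacement keeps characters such as $ or \ literal in the result.
            return Matcher.quoteReplacement(m.group(1) + filler);
        });
    }

    public static void main(String[] args) {
        System.out.println(mask("/users/234567/summary", 'X'));
        System.out.println(mask("/api/v1/summary/5678", '#'));
    }
}
```

This addresses both problems from the question: the replacement length follows the matched digits, and the replacement character is a parameter.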

First of all, you need something like this:
String new_s1 = s3.replaceAll("(\\/)(\\d)+(\\/)?", "$1XXXXX$3");

Related

Regex match optional string greedy inbetween two random strings

I am looking for a way to match an optional ABC in the following strings.
Both strings should be matched either way, if ABC is there or not:
precedingstringwithundefinedlenghtABCsubsequentstringwithundefinedlength
precedingstringwithundefinedlenghtsubsequentstringwithundefinedlength
I've tried
.*(ABC).*
which doesn't work for an optional ABC, but making ABC optional doesn't work either, as the leading greedy .* consumes the whole string:
.*(ABC)?.*
This is NOT a duplicate of e.g. Regex Match all characters between two strings, as I am looking for a certain string in between two random strings, kind of the other way around.

You can use
.*(ABC).*|.*
This works like this:
.*(ABC).* pattern is searched for first, since it is the leftmost part of an alternation (see "Remember That The Regex Engine Is Eager"), it looks for any zero or more chars other than line break chars as many as possible, then captures ABC into Group 1 and then matches the rest of the line with the right-hand .*
| - or
.* - is searched for if the first alternation part does not match.
Another solution without the need to use alternation:
^(?:.*(ABC))?.*
See this regex demo. Details:
^ - start of string
(?:.*(ABC))? - an optional non-capturing group that matches zero or more chars other than line break chars as many as possible and then captures into Group 1 an ABC char sequence
.* - zero or more chars other than line break chars as many as possible.
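A small Java sketch of the second pattern, assuming the goal is to read Group 1 when ABC is present. The class name OptionalAbc and the method capture are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalAbc {
    private static final Pattern OPTIONAL_ABC = Pattern.compile("^(?:.*(ABC))?.*");

    // Returns the captured "ABC", or null when the input does not contain it
    // (Group 1 does not participate in the match in that case).
    static String capture(String input) {
        Matcher m = OPTIONAL_ABC.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("precedingABCsubsequent"));
        System.out.println(capture("precedingsubsequent"));
    }
}
```

Both inputs match as a whole; only the value of Group 1 differs.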

I’ve come up with an answer myself:
Using the OR operator seems to work:
(?:(?:.*(ABC))|.*).*
If there’s a better way, feel free to answer and I will accept it.

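You could use this regex: .*(ABC){0,1}.*. It means any, optional{min,max}, any. It is easier to read. I can't say whether your solution or mine is faster. Note that {0,1} is equivalent to ?, so this behaves like the .*(ABC)?.* attempt from the question: the leading greedy .* consumes the whole string and Group 1 stays unset.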
Options:
{value} = n-times
{min,} = min to infinity
{min,max} = min to max

.+([ABC])?.+ should do the job. Note that [ABC] is a character class that matches a single A, B, or C, not the sequence ABC.

Java Regex is not giving the output as expected

networks[0]/site[9785d8e8-9b1f-3fc0-8271-6e32f58fb725]/equipment/location[144ae20e-be33-32e2-8b52-798e968e88b9]
The objective is to get 9785d8e8-9b1f-3fc0-8271-6e32f58fb725 from the above string. I have written the regex below, but it returns "location" instead.
.*\\/([^\\/]+)\\[.*\\]$
Could anyone suggest the proper regex to get 9785d8e8-9b1f-3fc0-8271-6e32f58fb725 from the above string?

You can search using this regex:
^[^/]+/[^\[/]*\[|\].*
and replace with empty string.
RegEx Explanation:
^[^/]+/[^\[/]*\[: This pattern matches text before first / then / followed by text till it gets next [
\].*: Matches ] and everything afterwards
Code:
String s = "networks[0]/site[9785d8e8-9b1f-3fc0-8271-6e32f58fb725]/equipment/location[144ae20e-be33-32e2-8b52-798e968e88b9]";
String r = s.replaceAll("^[^/]+/[^\\[/]*\\[|\\].*", "");
//=> "9785d8e8-9b1f-3fc0-8271-6e32f58fb725"

You can just use site\[(.+?)\]. See the test.
P.S. Your current expression is actually doing the following:
Pass whatever .*
Unless you encounter /
then capture any sequence after / not containing: \, /
which in turn is followed by [] with whatever content straight away and residing at the very end of the string.
So the only matching part is location
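In Java, the site\[(.+?)\] pattern could be applied with Matcher.find() and read from Group 1. This is a minimal sketch; the class name SiteIdExtractor and the method extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiteIdExtractor {
    // Lazily captures everything between "site[" and the next "]".
    private static final Pattern SITE_ID = Pattern.compile("site\\[(.+?)\\]");

    // Returns the id inside site[...], or null when the path has no site segment.
    static String extract(String path) {
        Matcher m = SITE_ID.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "networks[0]/site[9785d8e8-9b1f-3fc0-8271-6e32f58fb725]/equipment/location[144ae20e-be33-32e2-8b52-798e968e88b9]";
        System.out.println(extract(s));
    }
}
```

find() is used instead of matches() because the pattern describes only a part of the input.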

This should do the trick:
^networks\[\d\]\/site\[([^]]+)\].*$
It will match
the literal string networks[, a single digit, and ]/site[
followed by your id
followed by ] and arbitrary stuff
You can then extract your ID from the first capturing group.

Regex pattern matching with multiple strings

Forgive me. I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for 2 sets of strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I am not able to figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than 2 comma-separated strings are present.
I am not sure what the correct pattern should be for more than 2 strings.
Any help will be appreciated.

If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | where each side contains exactly one mandatory comma.
matches() must match the whole string, which is the case for 207e/160, 149/80, so that is a match.
The string 207e/160, 149/80, 11 has 2 commas, so you do get a partial match for the first part of the string, but the whole string does not match and matches() returns false.
See the matches in this regex demo.
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end without having to escape it, and the + and / do not have to be escaped either.
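The difference between a whole-string match and a partial match described in this answer can be reproduced with the pattern from the question. This is only a sketch: the class name FullVersusPartialMatch and its methods are made up for illustration.

```java
import java.util.regex.Pattern;

class FullVersusPartialMatch {
    // Builds the pattern from the question for a given value.
    static String buildRegex(String value) {
        return Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
                + Pattern.quote(value);
    }

    // String.matches() succeeds only if the pattern covers the entire input.
    static boolean fullMatch(String input, String value) {
        return input.matches(buildRegex(value));
    }

    // Matcher.find() succeeds if any part of the input fits the pattern.
    static boolean partialMatch(String input, String value) {
        return Pattern.compile(buildRegex(value)).matcher(input).find();
    }

    public static void main(String[] args) {
        String value = "207e/160";
        System.out.println(fullMatch("207e/160, 149/80", value));
        System.out.println(fullMatch("207e/160, 149/80, 11", value));
        System.out.println(partialMatch("207e/160, 149/80, 11", value));
    }
}
```

With three comma-separated items, only the partial match succeeds, which is why matches() returned false in Use Case 2.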

You can use regex101 to test your RegEx. It has a description of everything that's going on to help with debugging. There is a quick reference section at the bottom right that you can use to look up what is available, with examples.
A few things, you can add literals with \, so \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.

^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with a space followed by *. This group can appear one or more times up to the end of the string, as specified by the + quantifier after the group's closing parenthesis.

A regex to match the smallest nested part first

I am quite new to RegEx. I have not much experience. I already searched the internet and tried many things on regex101.com. Nothing seems to work.
This is the pattern:
\\((.*?)\\)
I use it in combination with Java's replaceAll to add a ?: to each (...) provided in a string (the user input).
The user input is used as regular expression as well. But currently I am treating it as a normal String.
Imagine this user input: (Welcome, (StackOverflow|World)|Hello, Dad)
What I want as the result is: (?:Welcome, (?:StackOverflow|World)|Hello, Dad)
But I only get the first ?: : (?:Welcome, (StackOverflow|World)|Hello, Dad)
I think I understand the problem. I guess RegEx scans from left to right and tries to get the smallest match (see .*? ). It searches for ( till the next ) . And this is (Welcome, (StackOverflow|World) .
What could I do to match these nested matches first? I cannot let the user modify their input. I have to find a better regex pattern to match from the smallest possible match to the greatest possible match, and not from the left to the right.

I suggest searching for any unescaped ( (so as not to add ?: after an escaped, literal parenthesis) that is not followed by ? (to avoid matching lookarounds, non-capturing groups, etc.):
(?<!\\)((?:\\{2})*)\((?!\?)
and replace with $1(?:. See the regex demo.
Java declaration:
String pat = "(?<!\\\\)((?:\\\\{2})*)\\((?!\\?)";
Details:
(?<!\\) - no backslash immediately to the left of the current location
((?:\\{2})*) - Group 1: zero or more pairs of backslashes (an even number of them)
\( - a literal ( ...
(?!\?) - that is not immediately followed with a literal ?.
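Put together, the pattern and the $1(?: replacement could be used as follows. This is a sketch only; the class name NonCapturingConverter and the method convert are hypothetical.

```java
class NonCapturingConverter {
    // Pattern from the answer: an unescaped "(" that is not followed by "?".
    static final String UNESCAPED_GROUP_START = "(?<!\\\\)((?:\\\\{2})*)\\((?!\\?)";

    // Turns every plain capturing group into a non-capturing one.
    // Group 1 holds any preceding pairs of backslashes and is put back unchanged.
    static String convert(String userRegex) {
        return userRegex.replaceAll(UNESCAPED_GROUP_START, "$1(?:");
    }

    public static void main(String[] args) {
        System.out.println(convert("(Welcome, (StackOverflow|World)|Hello, Dad)"));
        // Escaped parentheses and existing (?...) constructs are left alone.
        System.out.println(convert("\\(a\\)(?=b)(c)"));
    }
}
```

Since only the opening parenthesis is matched, nesting depth does not matter and no "smallest match first" ordering is needed.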

Regexp: Specific characters in the text

My goal is to check whether some text contains any of the specific characters (*,^,+,?,$,[],[^]), like:
?test.test => true
test.test => false
test^test => true
test:test => false
test-test$ => true
test-test => false
I've already created a regex for the requirement above, but I am not sure about it.
^(.*)([\[\]\^\$\?\*\+])(.*)$
It would be good to know whether it can be optimized in some way.

Your regex is already fairly optimized, as it is very simple. You can only make it simpler or more readable.
Also, if you use the matches() method of Java's String class, then you do not need the ^ and $ at both ends.
.*([\\[\\]^$?*+]).*
Use double backslashes (\\) in a Java string, otherwise use a single backslash (\).
Note that I have removed the capturing groups () around .* along with the escape character \ for the characters ^$?*+, as they are inside the character class [].

TL;DR
The quickest regex to do the job is
# ^[^\]\[^$?*+]*+([\]\[^$?*+])
^ #start of the string
[^ #any character BUT...
\]\[^$?*+ #...these ones (^$?*+ aren't special inside a character class)
]*+ #zero or more times (possessive quantifier)
([ #capture any of...
\]\[^$?*+ #...these characters
])
Be careful: in a Java string, you need to escape the \ as well, so you should transform every \ into \\.
Discussion
At first, two regexes come to mind:
[\]\[^$?*+], which will match only the character you want inside the string.
^.*[\]\[^$?*+], which will match your string up to the desired character.
It's actually important performance-wise to understand the difference between the case with .* at the beginning and the one with no wildcard at all.
When searching for the pattern, the first .* will make the regex engine eat all the string, then backtrack character by character to see if it's a match for your character range [...]. So the regex will actually search from the end of the string.
This is an advantage when your wanted sign is near the end, a disadvantage when it is at the beginning.
On the other case, the regex engine will try every character, beginning from the left, until it matches what you want.
You can see what I mean with these two examples from the excellent regex101.com:
with the .*, the match is found in 26 steps when the character is near the beginning and 8 when it is near the end: http://regex101.com/r/oI3pS1/#debugger
without it, it is found in 5 steps when near the beginning and 23 when near the end
Now, if you want to combine these two approaches you can use the tl;dr answer: you eat everything that isn't your character, then you match your character (or fail if there isn't one).
On our example, it takes 7 steps wherever your character is in the string (and 7 steps even if there is no character, thanks to the possessive quantifier).
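The tl;dr pattern could be used from Java as sketched below, with every \ doubled for the string literal. The class name SpecialCharDetector and the method containsSpecial are made up for illustration, and find() is used because the pattern only describes the text up to the first special character.

```java
import java.util.regex.Pattern;

class SpecialCharDetector {
    // Possessively skip everything that is not one of ] [ ^ $ ? * +,
    // then require one of those characters.
    private static final Pattern FIRST_SPECIAL =
            Pattern.compile("^[^\\]\\[^$?*+]*+[\\]\\[^$?*+]");

    static boolean containsSpecial(String text) {
        return FIRST_SPECIAL.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "?test.test", "test.test", "test^test",
            "test:test", "test-test$", "test-test"
        };
        for (String s : samples) {
            System.out.println(s + " => " + containsSpecial(s));
        }
    }
}
```

The possessive *+ prevents backtracking into the skipped prefix, so an input without any special character fails right after a single pass.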

That should also work:
String regex = ".*[\\[\\]^$?*+].*";
String test1 = "?test.test";
String test2 = "test.test";
String test3 = "test^test";
String test4 = "test:test";
String test5 = "test-test$";
String test6 = "test-test";
System.out.println(test1.matches(regex));
System.out.println(test2.matches(regex));
System.out.println(test3.matches(regex));
System.out.println(test4.matches(regex));
System.out.println(test5.matches(regex));
System.out.println(test6.matches(regex));
