Regular expression replace characters by a given match between strings - java

I am trying to replace a given character by a regular expression match.
For example, given the following string:
If you look at what you have in life, you'll always have more. If you look at what you don't have in life, you'll never have enough
I would like to replace every 't' with '!', but only where it occurs between the character sequences 'ok' and 'fe'.
I get the match between 'ok' and 'fe' with this regular expression:
(?<=ok).*?(?=fe)
And I can only match one character with the following regex:
(?<=ok).*?(t).*?(?=fe)
I tried to transform that regex in the following way but it does not work:
(?<=ok).*?((t).*?)*?(?=fe)
How can I match all 't' between 'ok' and 'fe'?
https://regex101.com/r/ORgseA/1

You can use
String result = text.replaceAll("(?s)(\\G(?!\\A)|ok)((?:(?!ok|fe|t).)*)t(?=(?:(?!ok|fe).)*fe)", "$1$2!");
See the regex demo and the Java demo:
String text = "If you look at what you have in life, you'll always have more. If you look at what you don't have in life, you'll never have enough";
String result = text.replaceAll("(?s)(\\G(?!\\A)|ok)((?:(?!ok|fe|t).)*)t(?=(?:(?!ok|fe).)*fe)", "$1$2!");
System.out.println(result);
// => If you look a! wha! you have in life, you'll always have more. If you look a! wha! you don'! have in life, you'll never have enough
Details:
(?s) - Pattern.DOTALL embedded flag option (to make . match line break chars)
(\G(?!\A)|ok) - Group 1 ($1): ok or the end of the previous successful match
((?:(?!ok|fe|t).)*) - Group 2 ($2): any char, zero or more occurrences, as many as possible, that does not start an ok, fe or t char sequence
t - a t char
(?=(?:(?!ok|fe).)*fe) - immediately to the right, there must be zero or more chars, as many as possible, none of which starts an ok or fe char sequence, followed by a fe substring.
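For comparison, here is a two-step sketch that is not the answer's \G technique: it finds each ok...fe span whose content contains neither ok nor fe (the same tempered token as above) and swaps the characters only inside that span with Matcher.replaceAll(Function), which assumes Java 9 or later. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceBetween {

    // Hypothetical helper: replaces every `from` char with `to`, but only inside
    // spans that start with `start` and end with the nearest `end`, where the
    // span content contains neither delimiter.
    static String replaceBetween(String text, String start, String end, char from, char to) {
        String s = Pattern.quote(start);
        String e = Pattern.quote(end);
        Pattern span = Pattern.compile("(?s)" + s + "((?:(?!" + s + "|" + e + ").)*)" + e);
        Matcher m = span.matcher(text);
        // Rebuild each matched span; quoteReplacement keeps '$' and '\' literal.
        return m.replaceAll(r -> Matcher.quoteReplacement(
                start + r.group(1).replace(from, to) + end));
    }

    public static void main(String[] args) {
        String text = "If you look at what you have in life, you'll always have more. "
                + "If you look at what you don't have in life, you'll never have enough";
        System.out.println(replaceBetween(text, "ok", "fe", 't', '!'));
    }
}
```

The single-regex answer avoids the callback, while this version keeps the "where" (the span) and the "what" (the character swap) separate, which may be easier to adapt.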

Related

Regex match optional string greedy inbetween two random strings

I am looking for a way to match an optional ABC in the following strings.
Both strings should be matched either way, if ABC is there or not:
precedingstringwithundefinedlenghtABCsubsequentstringwithundefinedlength
precedingstringwithundefinedlenghtsubsequentstringwithundefinedlength
I've tried
.*(ABC).*
which doesn't work when ABC is missing, but making the ABC group optional doesn't work either, because the greedy leading .* takes everything:
.*(ABC)?.*
This is NOT a duplicate of e.g. Regex Match all characters between two strings, as I am looking for a certain string in between two random strings, kind of the other way around.
You can use
.*(ABC).*|.*
This works like this:
.*(ABC).* pattern is searched for first, since it is the leftmost part of an alternation (see "Remember That The Regex Engine Is Eager"), it looks for any zero or more chars other than line break chars as many as possible, then captures ABC into Group 1 and then matches the rest of the line with the right-hand .*
| - or
.* - is searched for if the first alternation part does not match.
Another solution without the need to use alternation:
^(?:.*(ABC))?.*
See this regex demo. Details:
^ - start of string
(?:.*(ABC))? - an optional non-capturing group that matches zero or more chars other than line break chars as many as possible and then captures into Group 1 an ABC char sequence
.* - zero or more chars other than line break chars as many as possible.
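A minimal Java sketch of the alternation-free pattern, assuming single-line input: after matches() succeeds, group(1) is "ABC" when the optional group took part and null when it did not. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalAbc {

    // Pattern from the answer: an optional group that greedily runs up to the
    // last "ABC" on the line and captures it, followed by the rest of the line.
    static final Pattern OPTIONAL_ABC = Pattern.compile("^(?:.*(ABC))?.*");

    // Returns the captured "ABC", or null when the optional group did not participate.
    static String captured(String input) {
        Matcher m = OPTIONAL_ABC.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(captured("precedingABCsubsequent")); // ABC
        System.out.println(captured("precedingsubsequent"));    // null
    }
}
```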
I’ve come up with an answer myself:
Using the OR operator seems to work:
(?:(?:.*(ABC))|.*).*
If there’s a better way, feel free to answer and I will accept it.
You could use this regex: .*(ABC){0,1}.*. It means: anything, then ABC optionally ({min,max}), then anything. It is easier to read. I can't say whether your solution or mine is faster.
Options:
{n} = exactly n times
{min,} = min to infinity
{min,max} = min to max
.+([ABC])?.+ should do the job

Java Regex to replace only part of string (url)

I want to replace only numeric section of a string. Most of the cases it's either full URL or part of URL, but it can be just a normal string as well.
/users/12345 becomes /users/XXXXX
/users/234567/summary becomes /users/XXXXXX/summary
/api/v1/summary/5678 becomes /api/v1/summary/XXXX
http://example.com/api/v1/summary/5678/single becomes http://example.com/api/v1/summary/XXXX/single
Notice that I am not replacing 1 from /api/v1
So far, I only have the following, which seems to work in most cases:
input.replaceAll("/[\\d]+$", "/XXXXX").replaceAll("/[\\d]+/", "/XXXXX/");
But this has 2 problems:
The length of the replacement doesn't match the length of the original digits.
The replacement character is hardcoded.
Is there a better way to do this?
In Java you can use:
str = str.replaceAll("(/|(?!^)\\G)\\d(?=\\d*(?:/|$))", "$1X");
RegEx Demo
RegEx Details:
\G asserts position at the end of the previous match or the start of the string for the first match.
(/|(?!^)\\G): Match / or end of the previous match (but not at start) in capture group #1
\\d: Match a digit
(?=\\d*(?:/|$)): Ensure that the digit is followed only by more digits and then a / or the end of the string
Replacement: $1X: replace it with capture group #1 followed by X
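Since the question also complains that the replacement character is hardcoded, here is a small sketch that wraps the answer's regex in a helper taking the mask character as a parameter. The helper name is hypothetical; Matcher.quoteReplacement is used so that a mask such as '$' or '\' is inserted literally instead of being read as a group reference.

```java
import java.util.regex.Matcher;

public class MaskDigits {

    // Hypothetical helper around the answer's regex: each digit in a run that
    // follows a '/' and ends at '/' or end of input becomes `mask`, so the
    // masked run keeps the original length.
    static String maskPathNumbers(String input, char mask) {
        // "$1" re-inserts the '/' (or nothing, for \G continuations).
        String replacement = "$1" + Matcher.quoteReplacement(String.valueOf(mask));
        return input.replaceAll("(/|(?!^)\\G)\\d(?=\\d*(?:/|$))", replacement);
    }

    public static void main(String[] args) {
        System.out.println(maskPathNumbers("/users/234567/summary", 'X'));
        System.out.println(maskPathNumbers("http://example.com/api/v1/summary/5678/single", '*'));
    }
}
```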
Not a Java guy here, but the idea should be transferable. Just capture a /, the digits and an optional /, count the length of the second group, and put everything back together with that many replacement characters.
So
(/)(\d+)(/?)
becomes
$1XYZ$3
See a demo on regex101.com and this answer for a lambda equivalent to e.g. Python or PHP.
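In Java the lambda equivalent could look like the sketch below, assuming Java 9+ for Matcher.replaceAll(Function) and Java 11+ for String.repeat. The class and method names are hypothetical. Note that, unlike the \G answer, the (/)(\d+)(/?) pattern does not check what follows the digits, so a segment such as /12abc would also be partially masked.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaskWithLambda {

    // The pattern from this answer: a '/', a run of digits (group 2), an optional '/'.
    private static final Pattern SEGMENT = Pattern.compile("(/)(\\d+)(/?)");

    // Replaces each digit run with as many copies of `mask` as it had digits.
    static String mask(String input, String mask) {
        return SEGMENT.matcher(input).replaceAll(m -> Matcher.quoteReplacement(
                m.group(1) + mask.repeat(m.group(2).length()) + m.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(mask("/users/234567/summary", "X"));
        System.out.println(mask("http://example.com/api/v1/summary/5678/single", "X"));
    }
}
```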
First of all, you need something like this:
String new_s1 = s3.replaceAll("(\\/)(\\d)+(\\/)?", "$1XXXXX$3");

Regular expression for both length with whitespaces

I am trying to write a regular expression with the following conditions:
Allow empty space at any position in the string.
First come 1 to 3 letters.
Next come exactly six digits (required).
Next, optionally, 1 to 3 letters.
After that, optionally, 0 to 2 digits.
I have tried a lot of things, but nothing works.
^[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2}
This expression matches all the criteria, but it does not allow empty strings. Thanks in advance.
I just want to validate strings like "AB 123456 ADF 12".
As I mentioned in the first point, the string can contain empty space at any position, like "AB 123 456 ADF 12".
You have to wrap your pattern in a non-capturing group and make that group optional with a trailing ? quantifier:
^(?:[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2})?$
^ Assert beginning of string
(?: Start of non-capturing group
[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2} Your pattern
)? End of NCG, optional
$ Assert end of string
If you want to match strings consisting only of whitespace characters, add \s* as an alternative (written \\s* in a Java string literal) and remove the trailing ? quantifier:
^(?:[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2}|\s*)$
Live demo
Update
Based on comment:
^(?:[a-zA-Z](?:\s*[a-zA-Z]){0,2}\s*\d(?:\s*\d){5}(?:\s*[a-zA-Z](?:\s*[a-zA-Z]){0,2})?\s*(?:\d\s*\d?)?)$
Live demo
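A minimal Java check of the updated pattern, assuming the examples from the question are the intended valid inputs. The class and method names are hypothetical. With this pattern a leading space is not accepted, because the first character must be a letter.

```java
import java.util.regex.Pattern;

public class CodeValidator {

    // The updated pattern from the answer: 1-3 letters, exactly 6 digits,
    // optionally 1-3 letters, optionally up to 2 digits, with \s* allowed
    // between the characters.
    private static final Pattern CODE = Pattern.compile(
            "^(?:[a-zA-Z](?:\\s*[a-zA-Z]){0,2}\\s*\\d(?:\\s*\\d){5}"
            + "(?:\\s*[a-zA-Z](?:\\s*[a-zA-Z]){0,2})?\\s*(?:\\d\\s*\\d?)?)$");

    static boolean isValid(String s) {
        return CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AB 123456 ADF 12", "AB 123 456 ADF 12", "AB12345", "ABCD123456"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```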

Java regex line.split("\\s*//")

I came across the following string split, line.split("\\s*//")[0], but can't seem to find documentation on the use of the '/' character in regular expressions.
Here is my code:
String line = "type=rule.composition id=ruleComp";
line = line.split("\\s*//")[0];
Console console = System.console();
System.out.println("This is the line: " + line);
Here is the output:
This is the line: type=rule.composition id=ruleComp
I am wondering what exactly '/' does in the regular expression. Could anybody point me to some documentation and/or an answer explaining what it does?
I also noticed that when I remove the '//' from the regex, the output changes to just the first character, which I suppose makes sense given that \s* means the expression splits on zero or more whitespace characters:
This is the line: t
This, however, raises the question: what does the '//' add to the regular expression that makes the split occur at the end of the line?
Any advice would be highly appreciated.
Consider your input text (type=rule.composition id=ruleComp), and your two regexes:
regex 1: \s*//;
regex 2: \s*.
When you call .split() with a regular expression, the regex engine tries to match that regular expression (compiled from the string literal passed as the argument), and one of two things can happen:
the regex cannot match anything (this is what happens with regex 1: / has no special meaning in a regex, so // only matches two literal slashes, and the input contains none): the split has nothing to split on and the 0th element is the whole input text;
the regex can match an empty string (this is what happens with regex 2): in this case, the regex engine notices this and cannot let the situation continue, since otherwise it would result in an endless loop. Therefore it forcefully advances by one character before proceeding.
Hence your results:
with the first regex, nothing is matched;
with the second regex, an empty string is matched; the regex engine shifts by one character, and the text it skipped over (the previous character) becomes the 0th element.
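A small sketch of both cases, assuming Java 8 or later (earlier versions keep a leading empty string for a zero-width match at the start, so element 0 of the second split would be ""). The helper name is hypothetical, and the idea that the idiom is meant to cut off a trailing // comment is an assumption, not something stated in the question.

```java
import java.util.Arrays;

public class SplitDemo {

    // Hypothetical helper: keeps the text before the first "//", dropping the
    // whitespace right before it. '/' is an ordinary character in Java regex,
    // so "\\s*//" means "optional whitespace followed by two literal slashes".
    static String beforeSlashes(String line) {
        return line.split("\\s*//")[0];
    }

    public static void main(String[] args) {
        String line = "type=rule.composition id=ruleComp";

        // No "//" in the input: no match, so split() returns the input as the only element.
        System.out.println(Arrays.toString(line.split("\\s*//")));

        // With a trailing "// ..." only the part before it is kept.
        System.out.println(beforeSlashes("x = 1;   // set x"));

        // "\\s*" also matches the empty string, so the input is split between
        // characters; since Java 8 no leading empty string is kept, so [0] is "t".
        System.out.println(line.split("\\s*")[0]);
    }
}
```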

Regexp: Specific characters in the text

My goal is to validate whether some text contains specific characters (*,^,+,?,$,[],[^]), like:
?test.test => true
test.test => false
test^test => true
test:test => false
test-test$ => true
test-test => false
I've already created a regex for the requirement above, but I am not sure about it:
^(.*)([\[\]\^\$\?\*\+])(.*)$
It would be good to know whether it can be optimized.
Your regex is already about as optimized as it gets, since it's very simple. You can only make it simpler or more readable.
Also, if you use the matches() method of Java's String class, you won't need the ^ and $ at both ends:
.*([\\[\\]^$?*+]).*
Use double backslashes (\\) in a Java string literal; otherwise use a single backslash (\).
Note that I have removed the capturing parentheses () as well as the escape character \ before ^$?*+, since they are inside the character class [].
TL;DR
The quickest regex to do the job is
# ^[^\]\[^$?*+]*+([\]\[^$?*+])
^ #start of the string
[^ #any character BUT...
\]\[^$?*+ #...these ones (^$?*+ aren't special inside a character class)
]*+ #zero or more times (possessive quantifier)
([ #capture any of...
\]\[^$?*+ #...these characters
])
Be careful: in a Java string literal you need to escape the \ as well, so every \ becomes \\.
Discussion
At first, two regexes come to mind:
[\]\[^$?*+], which will match only the character you want inside the string.
^.*[\]\[^$?*+], which will match your string up to the desired character.
It's actually important performance-wise to understand the difference between the case with .* at the beginning and the one with no wildcard at all.
When searching for the pattern, the leading .* makes the regex engine consume the whole string, then backtrack character by character to see whether the current character matches your character class [...]. So the regex effectively searches from the end of the string.
This is an advantage when the character you are looking for is near the end, and a disadvantage when it is near the beginning.
In the other case, the regex engine tries every position, starting from the left, until it finds what you want.
You can see what I mean with these two examples from the excellent regex101.com:
with the .*, a match is found in 26 steps when the character is near the beginning, and in 8 when it is near the end: http://regex101.com/r/oI3pS1/#debugger
without it, it is found in 5 steps when near the beginning and 23 when near the end
Now, if you want to combine these two approaches you can use the tl;dr answer: you eat everything that isn't your character, then you match your character (or fail if there isn't one).
In our example, it takes 7 steps wherever your character is in the string (and 7 steps even if there is no such character, thanks to the possessive quantifier).
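A minimal Java sketch of the TL;DR pattern used with find(), where the backslashes are doubled for the string literal. The class and method names are hypothetical; returning the captured character (or null) is just one way to expose the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecialCharFinder {

    // The TL;DR pattern: possessively skip every character that is not one of
    // ] [ ^ $ ? * +, then capture the first one that is.
    private static final Pattern FIRST_SPECIAL =
            Pattern.compile("^[^\\]\\[^$?*+]*+([\\]\\[^$?*+])");

    // Returns the first special character as a String, or null if there is none.
    static String firstSpecial(String text) {
        Matcher m = FIRST_SPECIAL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {"?test.test", "test.test", "test^test", "test:test", "test-test$", "test-test"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstSpecial(s));
        }
    }
}
```

Because of the possessive quantifier, a string without any of these characters fails without backtracking into the skipped run.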
That should also work:
String regex = ".*[\\[\\]^$?*+].*";
String test1 = "?test.test";
String test2 = "test.test";
String test3 = "test^test";
String test4 = "test:test";
String test5 = "test-test$";
String test6 = "test-test";
System.out.println(test1.matches(regex)); // true
System.out.println(test2.matches(regex)); // false
System.out.println(test3.matches(regex)); // true
System.out.println(test4.matches(regex)); // false
System.out.println(test5.matches(regex)); // true
System.out.println(test6.matches(regex)); // false
