Regex lazy solution for java?

I have a string "hooRayNexTcapItaLnextcapitall"
I want to capture the first instance of "next" (NexT - in this case)
My solution:
(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)
My solution captures "next" (the last occurrence) instead of "NexT" (the first occurrence).
How can I correct my regex so that it captures the first "next" instead of the last one?
Edit 1:
Let me put my question properly,
If the string contains any combination of upper and lower case letters that spell "NextCapital", reverse the characters of the word "Next". Case should be preserved. If "NextCapital" occurs multiple times, only update the first occurrence.
So, I am using group to capture. But my group is capturing the last occurrence of "nextCapital" instead of first occurrence.
Ex:
Input: hooRayNexTcapItaLnextcapitall
output: hooRayTxeNcapItaLnextcapitall
Edit 2:
Please correct my code.
My java code:
Pattern ptn = Pattern.compile("(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)");
//sb = hooRayNexTcapItaLnextcapitall
Matcher mtc = ptn.matcher(sb);
StringBuilder c = new StringBuilder();
if(mtc.find()){
StringBuilder d = new StringBuilder();
StringBuilder e = new StringBuilder();
d.append(mtc.group(1));
e.append(mtc.group(2));
e.reverse();
d.append(e);
d.append(mtc.group(3));
d.append(mtc.group(4));
sb = d;
}

Your regex actually works if you get group 2. Your regex does not need to be that complicated.
Your regex can just be this:
next
If you use Matcher.find and turn on CASE_INSENSITIVE option, you can find the first substring of the string that matches the pattern. Then, use group() to get the actual string:
Matcher matcher = Pattern.compile("next", Pattern.CASE_INSENSITIVE).matcher("hooRayNexTcapItaLnextcapitall");
if (matcher.find()) {
System.out.println(matcher.group());
}
EDIT:
After seeing your requirements, I wrote this code:
String input = "hooRayNexTcapItaLnextcapitall";
Matcher m = Pattern.compile("next(?=capital)", Pattern.CASE_INSENSITIVE).matcher(input);
if (m.find()) {
StringBuilder outputBuilder = new StringBuilder(input);
StringBuilder reverseBuilder = new StringBuilder(input.substring(m.start(), m.end()));
outputBuilder.replace(m.start(), m.end(), reverseBuilder.reverse().toString());
System.out.println(outputBuilder);
}
I used a lookahead to match next only if there is capital after it. After a match is found, I created a string builder with the input, and another string builder with the matched portion of the input. Then, I replaced the matched range with the reverse of the second string builder.

String target = "next";
int index = line.toLowerCase().indexOf(target);
if (index != -1) {
line = line.substring(index, index + target.length());
System.out.println(line);
} else {
System.out.println("Not Found");
}
This would be my first attempt, which leaves room for adjusting the String to locate.
Otherwise, you may use this regex solution to achieve the same effect:
Pattern pattern = Pattern.compile("(?i)next");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(matcher.group());
}
The pattern "(?i)next" finds the substring matching "next" ignoring case.
Edit : This would reverse the order of the first occurrence of next.
String input = "hooRayNexTcapItaLnextcapitall";
String target = "nextcapital";
int index = input.toLowerCase().indexOf(target);
if (index != -1) {
String first = input.substring(index, index + target.length());
first = new StringBuilder(first.substring(0, 4)).reverse().toString() + first.substring(4, first.length());
input = input.substring(0, index) + first + input.substring(index + target.length(), input.length());
}
Edit Again : Here is a "fixed" form of your code.
String input = "hooRayNexTcapItaLnextcapitall";
Pattern ptn = Pattern.compile("([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])");
Matcher mtc = ptn.matcher(input);
if(mtc.find()){
StringBuilder d = new StringBuilder(mtc.group(1));
StringBuilder e = new StringBuilder(mtc.group(2));
input = input.replaceFirst(d.toString() + e.toString(), d.reverse().toString() + e.toString());
System.out.println(input);
}

Your regex is grabbing the second potential match for your group due to the default greedy nature of regex. Effectively, the first (.*) is grabbing as much as it can while still satisfying the rest of your regex.
To get what you intend, you can add a question mark to the first group, making it (.*?). This will make it non-greedy, grabbing the smallest string possible while still satisfying the rest of your regex.
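As a minimal sketch of that change (the class and method names here are made up for illustration, and CASE_INSENSITIVE is used in place of the [nN][eE]... character classes), the lazy (.*?) makes group 1 stop at the first "nextcapital":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyNextDemo {
    // Reverses "next" in the first case-insensitive "nextcapital";
    // returns the input unchanged when there is no such occurrence.
    static String reverseFirstNext(String input) {
        // (.*?) is lazy, so it takes as little as possible before "nextcapital".
        // DOTALL lets '.' also match line terminators.
        Pattern p = Pattern.compile("(.*?)(next)(capital)(.*)",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return input;
        }
        String reversed = new StringBuilder(m.group(2)).reverse().toString();
        return m.group(1) + reversed + m.group(3) + m.group(4);
    }

    public static void main(String[] args) {
        // prints hooRayTxeNcapItaLnextcapitall
        System.out.println(reverseFirstNext("hooRayNexTcapItaLnextcapitall"));
    }
}
```

With the greedy (.*) the same code would reverse the second "next" instead, because group 1 would swallow everything up to the last possible match.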

Related

Java regex to extract and replace by value

Input String
${abc.xzy}/demo/${ttt.bbb}
test${kkk.mmm}
RESULT
World/demo/Hello
testSystem
The text inside the curly brackets are keys to my properties. I want to replace those properties with run time values.
I can do the following to get the regex match but what should i put in the replace logic to change the ${..} matched with the respective run time value in the input string.
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(s);
while (m.find()) {
// replace logic comes here
}

An alternative may be to use a third-party library such as Apache Commons Text.
Its StringSubstitutor class looks very promising.
Map<String, String> valuesMap = new HashMap<>();
valuesMap.put("abc.xzy", "World");
valuesMap.put("ttt.bbb", "Hello");
valuesMap.put("kkk.mmm", "System");
String templateString = "${abc.xzy}/demo/${ttt.bbb} test${kkk.mmm}";
StringSubstitutor sub = new StringSubstitutor(valuesMap);
String resolvedString = sub.replace(templateString);
For more info check out Javadoc https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringSubstitutor.html

You may use the following solution:
String s = "${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}";
Map<String, String> map = new HashMap<String, String>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "Hello");
map.put("kkk.mmm", "System");
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
while (m.find()) {
String value = map.get(m.group(1));
m.appendReplacement(result, value != null ? value : m.group());
}
m.appendTail(result);
System.out.println(result.toString());
Output:
World/demo/Hello
testSystem
The regex is
\$\{([^{}]+)\}
It matches a ${ string, then captures one or more chars other than { and } into Group 1, and then matches }. If the Group 1 value is present in the Map as a key, the replacement is the mapped value; otherwise, the matched text is pasted back where it was in the input string.

Your regex needs to include the dollar sign. Also, making the inner group lazy is sufficient to keep any } out of the resulting key String.
String regex = "\\$\\{(.+?)\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while (m.find()) {
String key = m.group(1); // This is your matching group (the one in braces).
String value = someMap.get(key);
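// Strings are immutable, so the result must be assigned back; quoteReplacement keeps '$' in values literal.
s = s.replaceFirst(regex, Matcher.quoteReplacement(value != null ? value : "missingKey"));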
m = p.matcher(s); // you could alternatively reset the existing Matcher, but just create a new one, for simplicity's sake.
}
You could streamline this, by extracting the cursor position, and doing the replacement yourself, for the string. But either way, you need to reset your matcher, because otherwise it will parse on the old String.
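One possible way to streamline this, assuming Java 9 or later is available, is Matcher.replaceAll with a replacer function, which walks the matches once and needs no matcher reset. The class and method names below are hypothetical, and unlike the loop above this sketch leaves unknown keys as their original ${...} text rather than inserting "missingKey":

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderResolver {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(.+?)\\}");

    // Replaces every ${key} whose key is present in values.
    static String resolve(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String value = values.get(match.group(1));
            // Unknown keys keep their original text. quoteReplacement stops
            // '$' and '\' in the result from being read as group references.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of(
                "abc.xzy", "World",
                "ttt.bbb", "Hello",
                "kkk.mmm", "System");
        // prints World/demo/Hello
        System.out.println(resolve("${abc.xzy}/demo/${ttt.bbb}", values));
    }
}
```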

The_Cute_Hedgehog's answer is good, but it adds a dependency.
Wiktor Stribiżew's answer misses a special case.
My answer aims to use only Java's built-in regex support and to improve on Wiktor Stribiżew's answer. (The improvements are in the Java code only; the regex is fine.)
Improvements:
Use StringBuilder, which is faster than StringBuffer.
Initialize the StringBuilder with a capacity of (int)(s.length()*1.2) to avoid reallocating memory many times when the input template s is large.
Avoid wrong results caused by special characters in the replacement passed to appendReplacement (such as "cost: $100"). You can fix this problem in Wiktor Stribiżew's code by escaping the $ character in the replacement String, like this: value.replaceAll("\\$", "\\\\\\$")
Here is the improved code:
String s = "khj${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}{kkk.missing}string";
Map<String, String> map = new HashMap<>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "cost: $100");
map.put("kkk.mmm", "System");
StringBuilder result = new StringBuilder((int)(s.length()*1.2));
Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(s);
int nonCaptureIndex = 0;
while (m.find()) {
String value = map.get(m.group(1));
if (value != null) {
int index = m.start();
if (index > nonCaptureIndex) {
result.append(s.substring(nonCaptureIndex, index));
}
result.append(value);
nonCaptureIndex = m.end();
}
}
result.append(s.substring(nonCaptureIndex, s.length()));
System.out.println(result.toString());

N-th indexOf in String?

I need to extract a sub-string of a URL.
URLs
/service1/api/v1.0/foo -> foo
/service1/api/v1.0/foo/{fooId} -> foo/{fooId}
/service1/api/v1.0/foo/{fooId}/boo -> foo/{fooId}/boo
And some of those URLs may have request parameters.
Code
String str = request.getRequestURI();
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1, str.indexOf("?"));
Is there a better way to extract the sub-string instead of recurrent usage of indexOf method?

There are many alternative ways:
Use the Java Stream API on the String split with the / delimiter:
String str = "/service1/api/v1.0/foo/{fooId}/boo";
String[] split = str.split("\\/");
String url = Arrays.stream(split).skip(4).collect(Collectors.joining("/"));
System.out.println(url);
To also strip the request parameters, the Stream would look like this:
String url = Arrays.stream(split)
.skip(4)
.map(i -> i.replaceAll("\\?.+", ""))
.collect(Collectors.joining("/"));
This is also where Regex takes its place! Use the classes Pattern and Matcher.
String str = "/service1/api/v1.0/foo/{fooId}/boo";
Pattern pattern = Pattern.compile("\\/.*?\\/api\\/v\\d+\\.\\d+\\/(.+)");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
If you rely on the indexOf(..) usage, you might want to use the while-loop.
String str = "/service1/api/v1.0/foo/{fooId}/boo?parameter=value";
String string = str;
while(!string.startsWith("v1.0")) {
string = string.substring(string.indexOf("/") + 1);
}
System.out.println(string.substring(string.indexOf("/") + 1, string.indexOf("?")));
Other answers point out that if the prefix is fixed, a single call to the indexOf(..) method is enough (@JB Nizet):
string.substring("/service1/api/v1.0/".length(), string.indexOf("?"));
All these solutions rely on your input and on the pattern being known, or at least on the number of preceding sections delimited by / or the version v1.0 serving as a checkpoint. The best solution might not appear here, since there are unlimited combinations of URLs. You have to know all the possible combinations of the input URL to find the best way to handle it.

Path is quite useful for that :
public static void main(String[] args) {
Path root = Paths.get("/service1/api/v1.0/foo");
Path relativize = root.relativize(Paths.get("/service1/api/v1.0/foo/{fooId}/boo"));
System.out.println(relativize);
}
Output :
{fooId}/boo

How about this:
String s = "/service1/api/v1.0/foo/{fooId}/boo";
String[] sArray = s.split("/");
StringBuilder sb = new StringBuilder();
for (int i = 4; i < sArray.length; i++) {
sb.append(sArray[i]).append("/");
}
sb.deleteCharAt(sb.length() - 1);
System.out.println(sb.toString());
Output:
foo/{fooId}/boo
If the url prefix is always /service1/api/v1.0/, you just need to do s.substring("/service1/api/v1.0/".length()).

There are a few good options here.
1) If you know "foo" will always be the 4th token, then you have the right idea already. The only issue with your way is that you have the information you need to be efficient, but you aren't using it. Instead of copying the String multiple times and looping anew from the beginning of the new String, you could just continue from where you left off, 4 times, to find the starting point of what you want.
String str = "/service1/api/v1.0/foo/{fooId}/boo";
// start at the beginning
int start = 0;
// get the 4th index of '/' in the string
for (int i = 0; i != 4; i++) {
// get the next index of '/' after the index 'start'
start = str.indexOf('/',start);
// increase the pointer to the next character after this slash
start++;
}
// get the substring
str = str.substring(start);
This will be far, far more efficient than any regex pattern.
2) Regex: (java.util.regex.*). This will work if what you want is always preceded by "/service1/api/v1.0/". There may be other directories before it, e.g. "one/two/three/service1/api/v1.0/".
// the '.' in "v1.0" is escaped as \\. so that it only matches a literal dot
// (.+) will capture the matched text at that position
// $ marks the end of the string (technically it matches just before '\n')
Pattern pattern = Pattern.compile("/service1/api/v1\\.0/(.+)$");
// get a matcher for it
Matcher matcher = pattern.matcher(str);
// if there is a match
if (matcher.find()) {
// get the captured text
str = matcher.group(1);
}
If your path can vary somewhat, you can use regex to account for it. E.g., "/service/api/v3/foo/{bar}/baz/" (note the varying number formats and the trailing '/') could be matched as well by changing the regex to "/service\\d*/api/v\\d+(?:\\.\\d+)?/(.+?)/?$". The group is lazy here so that the optional trailing '/' stays out of the capture.
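The index-based approach from option 1 can also be wrapped in a small helper. This is only a sketch: nthIndexOf and tail are hypothetical names, not JDK methods, and it assumes the part you want starts after a fixed number of path segments and that any request parameters start at the first '?':

```java
class UrlTail {
    // Returns the index of the n-th occurrence of ch in s (n starts at 1),
    // or -1 if s contains fewer than n occurrences.
    static int nthIndexOf(String s, char ch, int n) {
        int index = -1;
        for (int i = 0; i < n; i++) {
            index = s.indexOf(ch, index + 1);
            if (index == -1) {
                return -1;
            }
        }
        return index;
    }

    // Drops the first `segments` path segments and any query string.
    static String tail(String uri, int segments) {
        int query = uri.indexOf('?');
        String path = (query == -1) ? uri : uri.substring(0, query);
        // A path with a leading '/' has segments + 1 slashes before the tail.
        int slash = nthIndexOf(path, '/', segments + 1);
        return (slash == -1) ? "" : path.substring(slash + 1);
    }

    public static void main(String[] args) {
        // prints foo/{fooId}/boo
        System.out.println(tail("/service1/api/v1.0/foo/{fooId}/boo?parameter=value", 3));
    }
}
```

Like the loop in option 1, this scans the string once and does not create intermediate substrings.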

JAVA Get text from String

Hi, I get this String from the server:
id_not="autoincrement"; id_obj="-"; id_tr="-"; id_pgo="-"; typ_not=""; tresc="Nie wystawił"; datetime="-"; lon="-"; lat="-";
I need to create a new String, e.g. String word, and assign it the value I get from tresc="Nie wystawił".

As @Jan suggested in a comment, you can use a regex, for example:
String str = "id_not=\"autoincrement\"; id_obj=\"-\"; id_tr=\"-\"; id_pgo=\"-\"; typ_not=\"\"; tresc=\"Nie wystawił\"; datetime=\"-\"; lon=\"-\"; lat=\"-\";";
Pattern p = Pattern.compile("tresc(.*?);");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
tresc="Nie wystawił";
If you want to get only the value of tresc you can use :
Pattern p = Pattern.compile("tresc=\"(.*?)\";");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Output
Nie wystawił

Something along the lines of
Pattern p = Pattern.compile("tresc=\"([^\"]+)\"");
Matcher m = p.matcher(stringFromServer);
if(m.find()) {
String whatYouWereLookingfor = m.group(1);
}
should do the trick. JSON parsing might be much better in the long run if you need additional values.
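If you do need the additional values but the server keeps sending this key="value"; format, one option short of a full parser is to collect every pair into a map with the same kind of regex. This is a rough sketch with hypothetical names, and it assumes keys consist of word characters and values never contain a double quote:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // key="value" where the value may be empty but contains no double quote
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Collects all pairs in the order they appear in the line.
    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // \u0142 is the letter ł, escaped to keep the source file ASCII-only
        String fromServer = "id_not=\"autoincrement\"; typ_not=\"\"; "
                + "tresc=\"Nie wystawi\u0142\"; lon=\"-\"; lat=\"-\";";
        String word = parse(fromServer).get("tresc");
        System.out.println(word);
    }
}
```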

Your question is unclear, but I think you get a string from the server and want the value of tresc from it. You can first search for tresc in the string you get, like this:
serverString.substring(serverString.indexOf("tresc") + x, serverString.length());
Here, replace x with the number of characters you want to skip past the start of tresc.
Read up on substring and delimiters.
Since the values are separated by semicolons, another solution could be:
int start = serverString.indexOf("tresc");
// index of the first ";" at or after tresc
int delimiter = serverString.indexOf(";", start);
// indexOf returns -1 when nothing is found, e.g. if the string has no ";" at all,
// so check and account for it.
if (start != -1 && delimiter != -1) {
String subString = serverString.substring(start, delimiter);
}
Here start is the position of tresc in the string, so subString holds the tresc part.
You can then use it any way you want.

Replace group 1 of Java regex with out replacing the entire regex

I have a regex pattern that has only one group. I need to find text in the input string that follows the pattern and replace ONLY match group 1. For example, I have the regex pattern and the string it is applied to as shown below. The replacement string is "<--->".
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher("plan plans lander planitia");
The expected result is
plan p<--->s <--->der p<--->itia
I tried the following approaches:
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll(m.group(1),"<--->");
}
System.out.print(result);
This gives result as
p<---> p<--->s <--->der p<--->itia
Another approach
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll("\\w*(lan)\\w+","<--->");
}
System.out.print(result);
Result is
plan <---> <---> <--->
I have gone through this link. There, the part of the string before the match is always constant ("foo"), but in my case it varies. I have also looked at this and this, but I am unable to apply any of the solutions given to my present scenario.
Any help is appreciated.

You need to use the following pattern with capturing groups:
(\w*)lan(\w+)
^-1-^ ^-2-^
and replace with $1<--->$2
The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.
Java demo:
String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia
If you need to be able to replace the Group 1 and keep the rest, you may use the replace callback method emulation with Matcher#appendReplacement:
String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia
Here, since we process a match by match, we should only replace the Group 1 contents once with replaceFirst, and since we replace the substring as a literal, we should Pattern.quote it.

To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().
That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.
     start(1)
     ↓  end(1)
     ↓  ↓
\\w*(lan)\\w+
↑            ↑
start()      end()
You can then extract the values to keep.
String input = "plan plans lander planitia";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
"<--->" +
input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();
System.out.println(output);
Output
plan p<--->s <--->der p<--->itia
If you don't like that it uses the original string, you can use the matched substring instead.
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
String match = m.group();
int start = m.start();
m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
"<--->" +
match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();

While Wiktor's explanation of the use of capturing groups is completely correct, you could avoid using them at all. The \\w* at the start of your pattern seems irrelevant, as you want to keep it anyway, so we can simply leave it out of the pattern. The check for a word character after lan can be done using a lookahead, like (?=\w), so we actually only match lan in a pattern like "lan(?=\\w)" and can do a simple replace with "<--->" (or whatever you like).
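A minimal sketch of the lookahead variant, using the input from the question (the class and method names are only illustrative):

```java
class LookaheadReplace {
    static String maskLan(String input) {
        // (?=\\w) asserts that a word character follows "lan" without consuming it,
        // so only "lan" itself is replaced.
        return input.replaceAll("lan(?=\\w)", "<--->");
    }

    public static void main(String[] args) {
        // prints plan p<--->s <--->der p<--->itia
        System.out.println(maskLan("plan plans lander planitia"));
    }
}
```

Note that this is not strictly equivalent to the original pattern: in a word that contains "lan" more than once, the lookahead version replaces every occurrence that is followed by a word character, while \\w*(lan)\\w+ matches each word only once.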

I like the other solutions. This is a slightly optimized, bulletproof version:
public static void main (String [] args) {
int groupPosition = 1;
String replacement = "foo";
Pattern r = Pattern.compile("foo(bar)");
Matcher m = r.matcher("bar1234foobar1234bar");
StringBuffer sb = new StringBuffer();
while (m.find()) {
StringBuffer buf = new StringBuffer(m.group());
buf.replace(m.start(groupPosition)-m.start(), m.end(groupPosition)-m.start(), replacement);
m.appendReplacement(sb, buf.toString());
}
m.appendTail(sb);
System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
}

replace substring using regex

I have a string which contains many <xxx> values.
I want to retrieve the value inside <>, do some manipulation, and re-insert the new value into the string.
What I did is
input = This is <abc_d> a sample <ea1_j> input <lmk_02> string
while(input.matches(".*<.+[\S][^<]>.*"))
{
value = input.substring(input.indexOf("<") + 1, input.indexOf(">"));
//calculate manipulatedValue from value
input = input.replaceFirst("<.+>", manipulatedValue);
}
but after the first iteration, value contains abc_d> a sample <ea1_j> input <lmk_02. I believe indexOf(">") will give the first index of ">". Where did I go wrong?

This is a slightly easier way of accomplishing what you are trying to do:
String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
Matcher matcher = Pattern.compile("<([^>]*)>").matcher(input);
StringBuffer sb = new StringBuffer();
while(matcher.find()) {
matcher.appendReplacement(sb, manipulateValue(matcher.group(1)));
}
matcher.appendTail(sb);
System.out.println(sb.toString());

This is a good use case for the appendReplacement and appendTail idiom:
Pattern p = Pattern.compile("<([^>]+)>");
Matcher m = p.matcher(input);
StringBuffer out = new StringBuffer();
while(m.find()) {
String value = m.group(1);
// calculate manipulatedValue
m.appendReplacement(out, Matcher.quoteReplacement(manipulatedValue));
}
m.appendTail(out);
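To make the idiom runnable end to end, here is a sketch with a stand-in manipulation. Uppercasing is only a placeholder assumption, since the question does not say how manipulatedValue is calculated, and the class and method names are hypothetical:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AngleBracketRewriter {
    // Hypothetical manipulation, used only for illustration.
    static String manipulate(String value) {
        return value.toUpperCase(Locale.ROOT);
    }

    static String rewrite(String input) {
        Matcher m = Pattern.compile("<([^>]+)>").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the new value literal
            m.appendReplacement(out, Matcher.quoteReplacement(manipulate(m.group(1))));
        }
        // copy whatever follows the last match
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints This is ABC_D a sample EA1_J input LMK_02 string
        System.out.println(rewrite("This is <abc_d> a sample <ea1_j> input <lmk_02> string"));
    }
}
```

Because [^>]+ cannot run past the first closing bracket, each <...> is matched on its own, which avoids the greedy <.+> problem from the question.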

Try using an escape character \\ to the regex.
