I need to extract a substring of a URL.
URLs
/service1/api/v1.0/foo -> foo
/service1/api/v1.0/foo/{fooId} -> foo/{fooId}
/service1/api/v1.0/foo/{fooId}/boo -> foo/{fooId}/boo
And some of those URLs may have request parameters.
Code
String str = request.getRequestURI();
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1, str.indexOf("?"));
Is there a better way to extract the substring than calling the indexOf method repeatedly?
There are many alternative ways:
Use the Java Stream API on the String split by the / delimiter:
String str = "/service1/api/v1.0/foo/{fooId}/boo";
String[] split = str.split("\\/");
String url = Arrays.stream(split).skip(4).collect(Collectors.joining("/"));
System.out.println(url);
To also remove the request parameters, the Stream would look like this:
String url = Arrays.stream(split)
.skip(4)
.map(i -> i.replaceAll("\\?.+", ""))
.collect(Collectors.joining("/"));
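A minimal sketch that strips the query string once, before splitting, instead of running replaceAll on every segment. The class and method names (ApiPathExtractor, extract) are hypothetical, and it assumes the URI starts with / and that the number of leading segments to drop (service, api, version) is known in advance:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class ApiPathExtractor {
    // Drops the first `prefixSegments` path segments and any query string.
    static String extract(String uri, int prefixSegments) {
        int query = uri.indexOf('?');
        String path = query >= 0 ? uri.substring(0, query) : uri;
        // The leading "/" yields an empty first element, hence the extra skip.
        return Arrays.stream(path.split("/"))
                .skip(prefixSegments + 1)
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        System.out.println(extract("/service1/api/v1.0/foo/{fooId}/boo?parameter=value", 3));
    }
}
```

With prefixSegments = 3, the main method prints foo/{fooId}/boo.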
This is also where regex comes in: use the Pattern and Matcher classes.
String str = "/service1/api/v1.0/foo/{fooId}/boo";
Pattern pattern = Pattern.compile("\\/.*?\\/api\\/v\\d+\\.\\d+\\/(.+)");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
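If the request parameters should go as well, the capture group can simply stop at the first ?. A sketch, assuming the shape /<service>/api/v<major>.<minor>/<rest>[?query]; the class name VersionedPathRegex is hypothetical, and it returns null when the URI does not have that shape:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionedPathRegex {
    // Compiled once and reused; [^?]+ stops the capture at the query string.
    private static final Pattern API_PATH =
            Pattern.compile("^/[^/]+/api/v\\d+\\.\\d+/([^?]+)");

    static String extract(String uri) {
        Matcher matcher = API_PATH.matcher(uri);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("/service1/api/v1.0/foo/{fooId}/boo?parameter=value"));
    }
}
```

Keeping the Pattern in a static field avoids recompiling it for every request.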
If you prefer to stick with indexOf(..), you can use a while loop (note that it never terminates if "v1.0" does not appear in the URL):
String str = "/service1/api/v1.0/foo/{fooId}/boo?parameter=value";
String string = str;
while(!string.startsWith("v1.0")) {
string = string.substring(string.indexOf("/") + 1);
}
System.out.println(string.substring(string.indexOf("/") + 1, string.indexOf("?")));
Another answer (by JB Nizet) points out that if the prefix never changes, a single call to indexOf(..) is enough:
string.substring("/service1/api/v1.0/".length(), string.indexOf("?"));
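A slightly more defensive version of the one-call approach: it checks the assumed constant prefix and also handles URIs without a query string, where indexOf("?") returns -1 and the substring call above would throw. The PREFIX value and the FixedPrefix class are assumptions for illustration:

```java
class FixedPrefix {
    // Assumed constant prefix; adjust to your deployment.
    private static final String PREFIX = "/service1/api/v1.0/";

    static String extract(String uri) {
        if (!uri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Unexpected URI: " + uri);
        }
        int query = uri.indexOf('?');
        // No query string: take everything after the prefix.
        int end = query >= 0 ? query : uri.length();
        return uri.substring(PREFIX.length(), end);
    }

    public static void main(String[] args) {
        System.out.println(extract("/service1/api/v1.0/foo/{fooId}/boo?parameter=value"));
    }
}
```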
All of these solutions depend on your input and on the fact that the pattern is known: at least the number of leading segments delimited by /, or the version v1.0 as a checkpoint. The best solution might not appear here, since URLs can take unlimited forms; you have to know all the possible shapes of the input URL to find the best way to handle them.
Path is quite useful for that:
public static void main(String[] args) {
Path root = Paths.get("/service1/api/v1.0");
Path relativize = root.relativize(Paths.get("/service1/api/v1.0/foo/{fooId}/boo"));
System.out.println(relativize);
}
Output:
foo/{fooId}/boo
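One caveat with Path: its toString() uses the platform's separator, so on Windows the relativized path prints with backslashes, and Windows will typically reject a ? from a query string as an invalid path character. A sketch that joins the name elements with / explicitly (the PathRelativize class is hypothetical, and the query string is assumed to be stripped beforehand):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathRelativize {
    static String relativeTo(String root, String full) {
        Path relative = Paths.get(root).relativize(Paths.get(full));
        // Path is Iterable<Path> over its name elements; join them with "/"
        // so the result does not depend on the platform separator.
        StringBuilder sb = new StringBuilder();
        for (Path name : relative) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(relativeTo("/service1/api/v1.0", "/service1/api/v1.0/foo/{fooId}/boo"));
    }
}
```

If the full path is not under the root, relativize produces .. segments, so it is worth checking the prefix first.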
How about this:
String s = "/service1/api/v1.0/foo/{fooId}/boo";
String[] sArray = s.split("/");
StringBuilder sb = new StringBuilder();
for (int i = 4; i < sArray.length; i++) {
sb.append(sArray[i]).append("/");
}
sb.deleteCharAt(sb.length() - 1);
System.out.println(sb.toString());
Output:
foo/{fooId}/boo
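The loop above throws a StringIndexOutOfBoundsException from deleteCharAt when there are no segments after index 4, because the StringBuilder stays empty. String.join avoids the trailing-slash bookkeeping altogether; a sketch with a hypothetical joinFrom helper:

```java
import java.util.Arrays;

class JoinTail {
    // Joins the "/"-separated segments of `path` starting at index `from`.
    static String joinFrom(String path, int from) {
        String[] parts = path.split("/");
        if (from >= parts.length) {
            return ""; // nothing after the prefix
        }
        return String.join("/", Arrays.copyOfRange(parts, from, parts.length));
    }

    public static void main(String[] args) {
        System.out.println(joinFrom("/service1/api/v1.0/foo/{fooId}/boo", 4));
    }
}
```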
If the URL prefix is always /service1/api/v1.0/, you just need s.substring("/service1/api/v1.0/".length()).
There are a few good options here.
1) If you know "foo" will always be the 4th token, then you already have the right idea. The only issue with your approach is that you have the information you need to be efficient, but you aren't using it. Instead of copying the String multiple times and searching again from the beginning of each new String, you can continue from where you left off, 4 times, to find the starting point of what you want.
String str = "/service1/api/v1.0/foo/{fooId}/boo";
// start at the beginning
int start = 0;
// get the 4th index of '/' in the string
for (int i = 0; i != 4; i++) {
// get the next index of '/' after the index 'start'
start = str.indexOf('/',start);
// increase the pointer to the next character after this slash
start++;
}
// get the substring
str = str.substring(start);
This avoids creating intermediate Strings and will typically be faster than a regex pattern.
2) Regex (java.util.regex.*): This will work if what you want is always preceded by "service1/api/v1.0/". There may be other directories before it, e.g. "one/two/three/service1/api/v1.0/".
// the '.' in v1.0 is escaped (\\.) so it matches only a literal dot
// (.+) will capture the matched text at that position
// $ marks the end of the input (it can also match just before a final line terminator)
Pattern pattern = Pattern.compile("/service1/api/v1\\.0/(.+)$");
// get a matcher for it
Matcher matcher = pattern.matcher(str);
// if there is a match
if (matcher.find()) {
// get the captured text
str = matcher.group(1);
}
If your path can vary somewhat, you can use regex to account for it. For example, "/service/api/v3/foo/{bar}/baz/" (note the different number format and the trailing '/') could be matched as well, without the trailing '/' in the captured text, by changing the regex to "/service\\d*/api/v\\d+(?:\\.\\d+)?/(.+?)/?$".
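About option 1: if the string has fewer than four slashes, indexOf returns -1, start becomes 0, and the loop silently starts over from the beginning of the string. A guarded sketch of the same idea (the afterNthSlash name and the null return value are choices made for illustration):

```java
class NthSlash {
    // Returns the text after the n-th '/', or null if there are fewer than n slashes.
    static String afterNthSlash(String s, int n) {
        int start = 0;
        for (int i = 0; i < n; i++) {
            int slash = s.indexOf('/', start);
            if (slash < 0) {
                return null;
            }
            start = slash + 1; // continue searching right after this slash
        }
        return s.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(afterNthSlash("/service1/api/v1.0/foo/{fooId}/boo", 4));
    }
}
```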
Input String
${abc.xzy}/demo/${ttt.bbb}
test${kkk.mmm}
RESULT
World/demo/Hello
testSystem
The text inside the curly brackets is a key to my properties. I want to replace those placeholders with run-time values.
I can do the following to get the regex matches, but what should I put in the replace logic to replace each matched ${..} with the respective run-time value in the input string?
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(s);
while (m.find()) {
// replace logic comes here
}
An alternative is to use a third-party library such as Apache Commons Text.
Its StringSubstitutor class looks very promising.
Map<String, String> valuesMap = new HashMap<>();
valuesMap.put("abc.xzy", "World");
valuesMap.put("ttt.bbb", "Hello");
valuesMap.put("kkk.mmm", "System");
String templateString = "${abc.xzy}/demo/${ttt.bbb} test${kkk.mmm}";
StringSubstitutor sub = new StringSubstitutor(valuesMap);
String resolvedString = sub.replace(templateString);
For more info, check out the Javadoc: https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringSubstitutor.html
You may use the following solution:
String s = "${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}";
Map<String, String> map = new HashMap<String, String>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "Hello");
map.put("kkk.mmm", "System");
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
while (m.find()) {
String value = map.get(m.group(1));
m.appendReplacement(result, value != null ? value : m.group());
}
m.appendTail(result);
System.out.println(result.toString());
Output:
World/demo/Hello
testSystem
The regex is
\$\{([^{}]+)\}
It matches ${, then captures one or more characters other than { and } into Group 1, and then matches }. If the Group 1 value is present in the Map as a key, the replacement is that key's value; otherwise, the matched text is put back where it was in the input string.
Your regex needs to include the dollar sign. Also, making the inner group lazy is enough to keep any } out of the resulting key String.
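On Java 9 and later, Matcher.replaceAll also accepts a Function<MatchResult, String>, which removes the appendReplacement/appendTail bookkeeping. The returned text is still interpreted as a replacement string, so a value such as "cost: $100" has to go through Matcher.quoteReplacement. A sketch (PlaceholderResolver and resolve are hypothetical names):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderResolver {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^{}]+)\\}");

    // Replaces ${key} with values.get(key); unknown keys are left as they are.
    static String resolve(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String value = values.get(match.group(1));
            // quoteReplacement escapes '$' and '\' so they are copied literally.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of(
                "abc.xzy", "World", "ttt.bbb", "Hello", "kkk.mmm", "System");
        System.out.println(resolve("${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}", values));
    }
}
```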
String regex = "\\$\\{(.+?)\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while (m.find()) {
String key = m.group(1); // This is your matching group (the one in braces).
String value = someMap.get(key);
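s = s.replaceFirst(regex, Matcher.quoteReplacement(value != null ? value : "missingKey")); // Strings are immutable, so reassign s; quoteReplacement escapes $ and \ in the value.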
m = p.matcher(s); // you could alternatively reset the existing Matcher, but just create a new one, for simplicity's sake.
}
You could streamline this by tracking the match position and doing the replacement yourself. Either way, you need to reset your matcher, because otherwise it will keep parsing the old String.
The_Cute_Hedgehog's answer is good, but it adds a dependency.
Wiktor Stribiżew's answer is missing a special case.
My answer uses Java's built-in regex and tries to improve on Wiktor Stribiżew's answer (in the Java code only; the regex is OK).
Improvements:
StringBuilder is faster than StringBuffer, because it is not synchronized.
The StringBuilder is created with an initial capacity of (int)(s.length()*1.2), which avoids reallocating memory many times for a large template s.
It avoids wrong results when a value contains characters that appendReplacement treats specially (like "cost: $100"). You can fix this problem in Wiktor Stribiżew's code by escaping the $ character in the replacement String, like this: value.replaceAll("\\$", "\\\\\\$"), or more completely with Matcher.quoteReplacement(value), which also escapes \.
Here is the improved code:
String s = "khj${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}{kkk.missing}string";
Map<String, String> map = new HashMap<>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "cost: $100");
map.put("kkk.mmm", "System");
StringBuilder result = new StringBuilder((int)(s.length()*1.2));
Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(s);
int nonCaptureIndex = 0;
while (m.find()) {
String value = map.get(m.group(1));
if (value != null) {
int index = m.start();
if (index > nonCaptureIndex) {
result.append(s.substring(nonCaptureIndex, index));
}
result.append(value);
nonCaptureIndex = m.end();
}
}
result.append(s.substring(nonCaptureIndex, s.length()));
System.out.println(result.toString());
I have a string "hooRayNexTcapItaLnextcapitall"
I want to capture the first instance of "next" (NexT, in this case).
My solution:
(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)
With my solution, the group captures "next" instead of "NexT".
How can I correct my regex to capture the first next instead of capturing the last next?
Edit 1:
Let me state my question more precisely:
If the string contains any combination of upper and lower case letters that spell "NextCapital", reverse the characters of the word "Next". Case should be preserved. If "NextCapital" occurs multiple times, only update the first occurrence.
So I am using a group to capture it, but my group captures the last occurrence of "nextCapital" instead of the first.
Example:
Input: hooRayNexTcapItaLnextcapitall
output: hooRayTxeNcapItaLnextcapitall
Edit 2:
Please correct my code.
My java code:
Pattern ptn = Pattern.compile("(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)");
//sb = hooRayNexTcapItaLnextcapitall
Matcher mtc = ptn.matcher(sb);
StringBuilder c = new StringBuilder();
if(mtc.find()){
StringBuilder d = new StringBuilder();
StringBuilder e = new StringBuilder();
d.append(mtc.group(1));
e.append(mtc.group(2));
e.reverse();
d.append(e);
d.append(mtc.group(3));
d.append(mtc.group(4));
sb = d;
}
Your regex does not need to be that complicated. (As written, the greedy leading (.*) makes group 2 capture the last "next", not the first.)
Your regex can just be this:
next
If you use Matcher.find and turn on the CASE_INSENSITIVE option, you can find the first substring of the string that matches the pattern. Then, use group() to get the actual string:
Matcher matcher = Pattern.compile("next", Pattern.CASE_INSENSITIVE).matcher("hooRayNexTcapItaLnextcapitall");
if (matcher.find()) {
System.out.println(matcher.group());
}
EDIT:
After seeing your requirements, I wrote this code:
String input = "hooRayNexTcapItaLnextcapitall";
Matcher m = Pattern.compile("next(?=capital)", Pattern.CASE_INSENSITIVE).matcher(input);
if (m.find()) {
StringBuilder outputBuilder = new StringBuilder(input);
StringBuilder reverseBuilder = new StringBuilder(input.substring(m.start(), m.end()));
outputBuilder.replace(m.start(), m.end(), reverseBuilder.reverse().toString());
System.out.println(outputBuilder);
}
I used a lookahead to match next only if there is capital after it. After a match is found, I created a string builder with the input, and another string builder with the matched portion of the input. Then, I replaced the matched range with the reverse of the second string builder.
String target = "next";
int index = line.toLowerCase().indexOf(target);
if (index != -1) {
line = line.substring(index, index + target.length());
System.out.println(line);
} else {
System.out.println("Not Found");
}
This would be my first attempt; it leaves room for changing the String you want to locate.
Otherwise, you can use this regex solution to achieve the same effect:
Pattern pattern = Pattern.compile("(?i)next");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(matcher.group());
}
The pattern "(?i)next" finds the substring matching "next" ignoring case.
Edit: This reverses the first occurrence of "next" that is followed by "capital".
String input = "hooRayNexTcapItaLnextcapitall";
String target = "nextcapital";
int index = input.toLowerCase().indexOf(target);
if (index != -1) {
String first = input.substring(index, index + target.length());
first = new StringBuilder(first.substring(0, 4)).reverse().toString() + first.substring(4, first.length());
input = input.substring(0, index) + first + input.substring(index + target.length(), input.length());
}
Edit again: Here is a "fixed" version of your code.
String input = "hooRayNexTcapItaLnextcapitall";
Pattern ptn = Pattern.compile("([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])");
Matcher mtc = ptn.matcher(input);
if(mtc.find()){
StringBuilder d = new StringBuilder(mtc.group(1));
StringBuilder e = new StringBuilder(mtc.group(2));
input = input.replaceFirst(d.toString() + e.toString(), d.reverse().toString() + e.toString());
System.out.println(input);
}
Your regex is grabbing the second potential match for your group due to the default greedy nature of regex. Effectively, the first (.*) is grabbing as much as it can while still satisfying the rest of your regex.
To get what you intend, you can add a question mark to the first group, making it (.*?). This will make it non-greedy, grabbing the smallest string possible while still satisfying the rest of your regex.
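Applied to the code from the question, a sketch of that change: a reluctant (.*?) in front, with Pattern.CASE_INSENSITIVE standing in for the [nN][eE]... character classes (a substitution, not part of the original regex). The class name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReverseFirstNext {
    // (.*?) is reluctant, so matches() settles on the FIRST "next" followed by "capital".
    private static final Pattern NEXT_CAPITAL =
            Pattern.compile("(.*?)(next)(capital)(.*)", Pattern.CASE_INSENSITIVE);

    static String reverseFirstNext(String s) {
        Matcher m = NEXT_CAPITAL.matcher(s);
        if (!m.matches()) {
            return s; // no "nextcapital" in any letter case
        }
        String reversed = new StringBuilder(m.group(2)).reverse().toString();
        return m.group(1) + reversed + m.group(3) + m.group(4);
    }

    public static void main(String[] args) {
        System.out.println(reverseFirstNext("hooRayNexTcapItaLnextcapitall"));
    }
}
```

This prints hooRayTxeNcapItaLnextcapitall. Without Pattern.DOTALL, . does not match line terminators, so add that flag if the input can span several lines.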
I have a string (which is a URL) in this pattern: https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg
now I want to clip it to this
media/93939393hhs8.jpeg
I want to remove all the characters before the second-to-last slash (/).
I'm a newbie in Java, but in Swift (iOS) this is how we do it:
if let url = NSURL(string:"https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg"), pathComponents = url.pathComponents {
let trimmedString = pathComponents.suffix(2).joinWithSeparator("/")
print(trimmedString) // "output = media/93939393hhs8.jpeg"
}
Basically, I'm removing everything from this URL except the last 2 items, and then joining those 2 items with /.
String ret = url.substring(url.indexOf("media"));
Are you familiar with regex? Try this regex, which captures the last 2 items separated by /:
.*?\/([^\/]+?\/[^\/]+?$)
Here is the example in Java (don't forget to escape with \\):
Pattern p = Pattern.compile("^.*?\\/([^\\/]+?\\/[^\\/]+?$)");
Matcher m = p.matcher(string);
if (m.find()) {
System.out.println(m.group(1));
}
Alternatively, there is the split(..) method, although I recommend the approach above. Then concatenate the last two parts with a StringBuilder:
String part[] = string.split("/");
int l = part.length;
StringBuilder sb = new StringBuilder();
String result = sb.append(part[l-2]).append("/").append(part[l-1]).toString();
Both give the same result: media/93939393hhs8.jpeg
String result = url.substring(url.substring(0, url.lastIndexOf('/')).lastIndexOf('/') + 1);
or
use split and join the last 2 items:
String[] arr = url.split("/");
String result = arr[arr.length - 2] + "/" + arr[arr.length - 1];
public static String parseUrl(String str) {
return (str.lastIndexOf("/") > 0) ? str.substring(1+(str.substring(0,str.lastIndexOf("/")).lastIndexOf("/"))) : str;
}
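Closer in spirit to the Swift pathComponents version in the question, java.net.URI can extract the path first, which also drops any query string or fragment. A sketch under those assumptions (lastSegments is a hypothetical helper; it assumes a hierarchical URL such as http/https, and getPath() returns the decoded path, so escapes like %20 come back decoded):

```java
import java.net.URI;
import java.util.Arrays;

class LastPathSegments {
    // Returns the last `count` non-empty path segments joined with "/".
    static String lastSegments(String url, int count) {
        String path = URI.create(url).getPath(); // scheme, host, query and fragment removed
        String[] parts = Arrays.stream(path.split("/"))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
        int from = Math.max(0, parts.length - count);
        return String.join("/", Arrays.copyOfRange(parts, from, parts.length));
    }

    public static void main(String[] args) {
        System.out.println(lastSegments(
                "https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg", 2));
    }
}
```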
This is the string that I have:
KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007
This is a weather report. I need to extract the following numbers from the report: 10/M13. They are the temperature and dew point, where M means minus. Their position in the String may differ, and they may be presented as M10/M13, 10/13, or M10/13.
I have done the following code:
public String getTemperature (String metarIn){
Pattern regex = Pattern.compile(".*(\\d+)\\D+(\\d+)");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 1) {
temperature = matcher.group(1);
System.out.println(temperature);
}
return temperature;
}
Obviously, the regex is wrong, since the method always returns null. I have tried tens of variations but to no avail. Thanks a lot if someone can help!
This will extract the String you seek, and it's only one line of code:
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
Here's some test code:
public static void main(String[] args) throws Exception {
String input = "KLAS 282356Z 32010KT 10SM FEW090 M01/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
System.out.println(tempAndDP);
}
Output:
M01/M13
The regex should look like:
M?\d+/M?\d+
For Java this will look like:
"M?\\d+/M?\\d+"
You might want to add a check for white space on the front and end:
"\\sM?\\d+/M?\\d+\\s"
But whether that works depends on where you expect to find the pattern: with the whitespace checks, it will not be matched at the start or end of the string, so instead we should use:
"(^|\\s)M?\\d+/M?\\d+($|\\s)"
This specifies that where there is no whitespace at the front or end, we must match the start or end of the string instead.
Example code used to test:
Pattern p = Pattern.compile("(^|\\s)M?\\d+/M?\\d+($|\\s)");
String test = "gibberish M130/13 here";
Matcher m = p.matcher(test);
if (m.find())
System.out.println(m.group().trim());
This returns: M130/13
Try:
Pattern regex = Pattern.compile(".*\\sM?(\\d+)/M?(\\d+)\\s.*");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 2) {
temperature = matcher.group(1);
System.out.println(temperature);
}
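The pattern above drops the M prefix, so a temperature of M10 would come back as 10. If you need the numbers themselves, one option is to capture the sign markers as separate groups and convert them to negative integers. A sketch (MetarTemperature and parse are hypothetical names; the lookahead assumes the group is followed by whitespace or the end of the report):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetarTemperature {
    // Optional M (minus), digits, '/', optional M, digits, bounded by
    // whitespace or the start/end of the report.
    private static final Pattern TEMP_DEW =
            Pattern.compile("(?:^|\\s)(M?)(\\d+)/(M?)(\\d+)(?=\\s|$)");

    // Returns {temperature, dewPoint} in degrees, or null when the group is absent.
    static int[] parse(String metar) {
        Matcher m = TEMP_DEW.matcher(metar);
        if (!m.find()) {
            return null;
        }
        int temp = Integer.parseInt(m.group(2));
        int dew = Integer.parseInt(m.group(4));
        return new int[] {
            m.group(1).isEmpty() ? temp : -temp,
            m.group(3).isEmpty() ? dew : -dew
        };
    }

    public static void main(String[] args) {
        int[] td = parse("KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2");
        System.out.println(td[0] + " " + td[1]);
    }
}
```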
An alternative to regex:
Sometimes a regex is not the only solution. It seems that in your case you need the 6th block of text. The blocks are separated by a space character, so all you need to do is count the blocks.
If each block of text does NOT have a fixed length:
Example:
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
int spaces = 5;
int begin = 0;
while(spaces-- > 0){
begin = s.indexOf(' ', begin)+1;
}
int end = s.indexOf(' ', begin+1);
String result = s.substring(begin, end);
System.out.println(result);
If each block of text DOES have a fixed length:
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String result = s.substring(33, s.indexOf(' ', 33));
System.out.println(result);
A prettier alternative, as pointed out by Adrian:
String result = rawString.split(" ")[5];
Note that split actually takes a regex pattern as its parameter.
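Since the question says the position of the group may differ, a middle ground between a fixed index and a single regex is to scan the whitespace-separated tokens and test each one. A sketch (findTempDew is a hypothetical name); splitting on \\s+ also tolerates repeated spaces, which would shift the indexes produced by split(" "):

```java
class MetarTokens {
    // Returns the first token shaped like 10/M13, M10/13, ..., or null if none.
    static String findTempDew(String metar) {
        for (String token : metar.trim().split("\\s+")) {
            // String.matches requires the whole token to match.
            if (token.matches("M?\\d+/M?\\d+")) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findTempDew("KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2"));
    }
}
```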
I want to parse a line from a CSV (comma-separated) file, something like this:
Bosh,Mark,mark#gmail.com,"3, Institute","83, 1, 2",1,21
I have to parse the file, and instead of the commas inside the double quotes I want ';', like this:
Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
I use the following Java code, but it doesn't parse the line correctly:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
The output is:
Bosh,Mark,mark#gmail.com,"3; Institute";"83; 1; 2",1,21
Does anyone have an idea how to fix this?
This is my solution for replacing , with ; inside quotes. It assumes that if " appears inside a quoted string, it is escaped by another ". This property ensures that, counting from the start to the current character, if the number of quotes " is odd, then that character is inside a quoted string.
// Test string, with the tricky case """", which resolves to
// a length 1 string of single quote "
String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
// System.out.println(m.group() + "\n " + m.start() + " " + m.end());
output
.append(line.substring(start, matcher.start())) // Append unrelated contents
.append(matcher.group().replaceAll(",", ";")); // Append replaced string
start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents
// System.out.println(output);
Although I cannot find a case where replacing the matched group as you did in line = line.replace(matcher.group(), replacedMatch); fails, I feel it is safer to rebuild the string from scratch.
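The odd-number-of-quotes observation can also be applied without regex: walk the line once, flip a flag on every ", and replace a comma only while the flag is set. An escaped quote written as "" flips the flag twice and leaves it unchanged. A sketch under the same escaping assumption (the class and method names are hypothetical):

```java
class QuotedCommaReplacer {
    // Replaces ',' with ';' only inside double-quoted fields.
    static String replaceQuotedCommas(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // "" toggles twice, so the state is preserved
            }
            out.append(c == ',' && inQuotes ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceQuotedCommas(
                "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21"));
    }
}
```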
Here's a way:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
String in = "Bosh,Mark,mark#gmail.com,\"3, \"\" Institute\",\"83, 1, 2\",1,21";
String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
Matcher matcher = Pattern.compile(regex).matcher(in);
StringBuilder out = new StringBuilder();
while(matcher.find()) {
out.append(matcher.group().replace(',', ';')).append(',');
}
out.deleteCharAt(out.length() - 1);
System.out.println(in + "\n" + out);
}
}
which will print:
Bosh,Mark,mark#gmail.com,"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,mark#gmail.com,"3; "" Institute","83; 1; 2",1,21
Tested on Ideone: http://ideone.com/fCgh7
Here is what you need:
String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
while(matcher.find()){
String replacedMatch = matcher.group();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
line will then have the value you need.
Have you tried making the regex lazy?
Another idea: inside the [] you should exclude " as well. If you do that and keep calling find() in a loop (Java's equivalent of a global flag), you should get the expected output.
Your regex is faulty. Why would you want to make sure there are no ] within the "..." expression? You'd rather make the regex reluctant (the default is greedy, which means it captures as much as it can).
"(\"[^\\]]*\")"
should be
"(\"[^\"]*\")"
But nhadtdh is right, you should use a proper CSV library to parse it and replace , to ; in the values the parser returns.
I'm sure you'll find a parser when googling "Java CSV parser".
Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Of course, this is assuming you can't have quotes in the quoted values of your input line.