Java Regex to extract substring with optional trailing slash

Regex:
\/test\/(.*|\/?)
Input
/something/test/{abc}/listed
/something/test/{abc}
Expected
{abc} for both the inputs

You need to capture all characters other than / after /test/:
String s = "/something/test/{abc}/listed";
Pattern pattern = Pattern.compile("/test/([^/]+)"); // or "/test/\\{([^/}]+)"
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Details:
/test/ - matches /test/
([^/]+) - matches and captures into Group 1 one or more (+) (but as many as possible, since + is greedy) characters other than / (due to the negated character class [^/]).
Note that in Java regex patterns you do not need to escape / since it is not a special character and one needs no regex delimiters.
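As a quick check of the explanation above, here is a minimal sketch that runs the same pattern against both inputs from the question. The class name TestSegmentDemo and the helper segmentAfterTest are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TestSegmentDemo {
    // Matches "/test/" and captures the path segment that follows it
    // (one or more characters other than '/').
    private static final Pattern SEGMENT = Pattern.compile("/test/([^/]+)");

    // Hypothetical helper: returns the segment after "/test/",
    // or null when the input contains no such segment.
    static String segmentAfterTest(String path) {
        Matcher m = SEGMENT.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(segmentAfterTest("/something/test/{abc}/listed")); // {abc}
        System.out.println(segmentAfterTest("/something/test/{abc}"));        // {abc}
    }
}
```

Because [^/]+ stops at the next slash or at the end of the input, the trailing /listed needs no special handling.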

This should work for you :
public static void main(String[] args) {
String s1 = "/something/test/{abc}/listed";
String s2 = "/something/test/{abc}";
System.out.println(s1.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
System.out.println(s2.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
}
Output:
{abc}
{abc}

Regex (as Java string, that is with doubled backslashes):
".*\\/test\\/([^/]*).*"

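The whole-string regex above can be used with Matcher.matches(), which requires the pattern to cover the entire input. The sketch below assumes that usage; the class name WholeStringMatchDemo and the method segment are hypothetical, and the slashes are left unescaped, which is equivalent in Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeStringMatchDemo {
    // Same pattern as above, with '/' unescaped (it is not special in Java regex).
    private static final Pattern WHOLE = Pattern.compile(".*/test/([^/]*).*");

    // Hypothetical helper: returns the segment after "/test/",
    // or null when the whole input does not match the pattern.
    static String segment(String path) {
        Matcher m = WHOLE.matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(segment("/something/test/{abc}/listed")); // {abc}
        System.out.println(segment("/something/test/{abc}"));        // {abc}
    }
}
```

Note that [^/]* also accepts an empty segment, so an input ending in /test/ yields an empty string rather than no match.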
Related

Regex for extracting digits in version format

I want to extract numbers from a string. The numbers represent a version.
That is, I want to match numbers that appear between:
_ and /
/ and /
I have prepared the following regex, but it doesn't work as expected:
.*[\/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})\/.*
For the following example, the regex should match twice:
Input: name_1.1.1/9.10.0/abc. Expected result: 1.1.1 and 9.10.0
However, my regex returns only 9.10.0; 1.1.1 is omitted. Do you have any idea what is wrong?
You could just split the string on _ or /, and then retain components which appear to be versions:
List<String> versions = new ArrayList<>();
String input = "name_1.1.1/9.10.0/abc";
String[] parts = input.split("[_/]");
for (String part : parts) {
if (part.matches("\\d+(?:\\.\\d+)*")) {
versions.add(part);
}
}
System.out.println(versions); // [1.1.1, 9.10.0]
You can assert the / at the end instead of matching it, and omit the .*
Note that you don't have to escape the /
[/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})(?=/)
Example code
String regex = "[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})(?=/)";
String string = "name_1.1.1/9.10.0/abc";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
1.1.1
9.10.0
Another option could be using a positive lookbehind to assert either a / or _ to the left, and get a match only.
(?<=[/_])\d{1,2}[.]\d{1,2}[.]\d{1,2}(?=/)
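The lookbehind variant above has no capturing group, so the whole match is the version. A minimal sketch of how it might be used follows; the class name VersionLookaroundDemo and the method versions are illustrative only. It also shows why the original regex found a single version: its leading and trailing .* consumed the whole input in one match, and the greedy leading .* left only the last version for the group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionLookaroundDemo {
    // Lookbehind asserts '/' or '_' on the left, lookahead asserts '/' on the right.
    // Neither assertion consumes characters, so adjacent versions can both match.
    private static final Pattern VERSION =
            Pattern.compile("(?<=[/_])\\d{1,2}[.]\\d{1,2}[.]\\d{1,2}(?=/)");

    // Hypothetical helper: collects every version found in the input.
    static List<String> versions(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = VERSION.matcher(input);
        while (m.find()) {
            found.add(m.group()); // group() is the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(versions("name_1.1.1/9.10.0/abc")); // [1.1.1, 9.10.0]
    }
}
```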
Code Demo
String regex = "(\\d+\\.\\d+\\.\\d+)"; // dots escaped so they match a literal '.' only
String string = "name_1.1.1/9.10.0/abc";
String string2 = "randomversion4.5.6/09.7.8_9.88.9";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
Matcher matcher2 = pattern.matcher(string2);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
while (matcher2.find()) {
System.out.println(matcher2.group(1));
}
Out:
1.1.1
9.10.0
4.5.6
09.7.8
9.88.9
Just write a regex for what you want to match. In this case, that is only the version number.
A regex can be used either to match a whole string or to find a substring within a string.
When you use a regex to find a substring, you cannot anticipate every file name or surrounding string, so match only the part you want to find.
This way you can find the versions no matter what string they appear in.

Parse string using Java Regex Pattern?

I have the following Java string:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
Using the Matcher and Pattern classes from the java.util.regex package, I have to get the output string in the following format:
Output: [NYK:1100][CLT:2300][KTY:3540]
Can you suggest a RegEx pattern which can help me get the above output format?
You can use this regex \[name:([A-Z]+)\]\[distance:(\d+)\] with Pattern like this :
String regex = "\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append("[");
result.append(matcher.group(1));
result.append(":");
result.append(matcher.group(2));
result.append("]");
}
System.out.println(result.toString());
Output
[NYK:1100][CLT:2300][KTY:3540]
The regex \[name:([A-Z]+)\]\[distance:(\d+)\] captures two groups: the first holds the uppercase letters after [name: and the second holds the digits after [distance:.
Another solution, suggested by @tradeJmark, uses named groups:
String regex = "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]";
So you can easily get the results of each group by the name of group instead of the index like this :
while (matcher.find()) {
result.append("[");
result.append(matcher.group("name"));
//----------------------------^^
result.append(":");
result.append(matcher.group("distance"));
//------------------------------^^
result.append("]");
}
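Putting the named-group fragments above together, a self-contained sketch might look like this. The class name NamedGroupDemo and the method reformat are made up for illustration; the pattern and the input string come from the question and answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Named groups "name" and "distance" replace the numeric indices 1 and 2.
    private static final Pattern ENTRY = Pattern.compile(
            "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]");

    // Hypothetical helper: rewrites every [name:X][distance:N] pair as [X:N].
    static String reformat(String s) {
        Matcher m = ENTRY.matcher(s);
        StringBuilder result = new StringBuilder();
        while (m.find()) {
            result.append('[')
                  .append(m.group("name"))
                  .append(':')
                  .append(m.group("distance"))
                  .append(']');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] "
                 + "[name:KTY][distance:3540] Price:";
        System.out.println(reformat(s)); // [NYK:1100][CLT:2300][KTY:3540]
    }
}
```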
If the format of the string is fixed, and you always have just 3 [...] groups inside to deal with, you may define a block that matches [name:...] and captures the 2 parts into separate groups and use a quite simple code with .replaceAll:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
String matchingBlock = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
String res = s.replaceAll(String.format(".*%1$s%1$s%1$s.*", matchingBlock),
"[$1:$2][$3:$4][$5:$6]");
System.out.println(res); // [NYK:1100][CLT:2300][KTY:3540]
The block pattern matches:
\\s* - 0+ whitespaces
\\[name: - a literal [name: substring
([A-Z]+) - Group n capturing 1 or more uppercase ASCII chars (\\w+ can also be used)
]\\[distance: - a literal ][distance: substring
(\\d+) - Group m capturing 1 or more digits
] - a ] symbol.
In the .*%1$s%1$s%1$s.* pattern, the groups will have 1 to 6 IDs (referred to with $1 - $6 backreferences from the replacement pattern) and the leading and final .* will remove start and end of the string (add (?s) at the start of the pattern if the string can contain line breaks).

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How should I modify the regex expression in order to match these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
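To see the pattern described above at work on the three inputs from the question, here is a small sketch. The class name SearchCriterionDemo and the method parse are hypothetical; only the regex itself is taken from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchCriterionDemo {
    // Group 1: the key (word characters). Group 2: the value (anything but a comma).
    private static final Pattern CRITERION = Pattern.compile("(\\w+)[:<>]([^,]+)");

    // Hypothetical helper: returns {key, value} for the first criterion found,
    // or null when the input contains none.
    static String[] parse(String search) {
        Matcher m = CRITERION.matcher(search);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "firstName:Giorgio",
            "email:giorgio.rossi#libero.it",
            "dataregistrazione:27/10/2016"
        };
        for (String input : inputs) {
            String[] kv = parse(input);
            System.out.println(kv[0] + " = " + kv[1]);
        }
    }
}
```

Since the value group excludes only the comma, characters such as ., # and / no longer break the match.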
You can use regex like this:
public static void main(String[] args) {
String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
String pattern = "(\\w+[:<>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})";
for(String str : arr){
if(str.matches(pattern))
System.out.println(str);
}
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But remember that this regex works only for your data format. For a general-purpose pattern, consult the RFC documents and articles that describe the email address format.
Hope it helps.
The character class \w matches only [A-Za-z0-9_], so it cannot match characters such as ., # or /. Change the regex to (\\w+?)(:|<|>)(.*), to capture every character between the operator and the trailing ,.
Or list all the characters you expect in the value, e.g. (\\w+?)(:|<|>)([#.\\w/]*), .

Make regex for url in java

Given a string of type :
https://www.abcd.efg/try-till-you-succedd.html
I want a regex that gives me everything after the second-to-last '-', that is, you-succedd.html in this case.
public static void main(String[] args)
{
Pattern p = Pattern.compile(".*-\\s*(.*)");
Matcher m = p.matcher("https://www.abcd.efg/try-till-you-succedd.html");
if (m.find())
System.out.println(m.group(1));
}
But it gives only succedd.html. Please help.
Here is a regex you can use
Pattern p = Pattern.compile("-([^-]*-[^-]*$)");
Matcher m = p.matcher("https://www.abcd.efg/try-till-you-succedd.html");
if (m.find())
System.out.println(m.group(1));
Output: you-succedd.html
Regex means...:
- - a literal hyphen
([^-]*-[^-]*$) - a capturing group that will hold the value we need that matches...
[^-]* - 0 or more characters other than a hyphen
- - a hyphen
[^-]*$ - 0 or more characters other than a hyphen until the end of string ($).
Note that you can add \.html before $ if you want to restrict the matches to strings that end with .html.
UPDATE
To obtain only you-succedd, you can use
String pattern = "-([^-]*-[^-]*)\\.[^.\\s-]+$";
Or
String pattern = "-([^-]*-[^-]*)\\.\\w+$";
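The variants discussed above (restricting matches to .html and dropping the extension) can be compared side by side. This is a sketch under the assumption that a null result is acceptable when nothing matches; the class name UrlTailDemo and its method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlTailDemo {
    // Last two hyphen-separated parts, only for strings that end with ".html".
    private static final Pattern WITH_HTML =
            Pattern.compile("-([^-]*-[^-]*\\.html$)");
    // Last two hyphen-separated parts, with the trailing extension left out of the group.
    private static final Pattern WITHOUT_EXTENSION =
            Pattern.compile("-([^-]*-[^-]*)\\.\\w+$");

    // Returns group 1 of the first match, or null when the pattern does not match.
    private static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    static String tailWithHtml(String url) {
        return firstGroup(WITH_HTML, url);
    }

    static String tailWithoutExtension(String url) {
        return firstGroup(WITHOUT_EXTENSION, url);
    }

    public static void main(String[] args) {
        String url = "https://www.abcd.efg/try-till-you-succedd.html";
        System.out.println(tailWithHtml(url));         // you-succedd.html
        System.out.println(tailWithoutExtension(url)); // you-succedd
    }
}
```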
You can simply use this:
.*-(.*-.*\.html)$

Java - Regular Expressions matching one to another

I am trying to retrieve bits of data using RE. Problem is I'm not very fluent with RE. Consider the code.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class HTTP{
private static String getServer(String httpresp){
Pattern p = Pattern.compile("(\bServer)(.*[Server:-\r\n]"); //What RE syntax do I use here?
Matcher m = p.matcher(httpresp);
if (m.find()){
return m.group(2);
}
return null; // no Server header found
}
public static void main(String[] args){
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n"; //Test data
System.out.println(getServer(testdata));
}
}
How would I get "Server:" to the next "\r\n" out which would output "Apache"? I googled around and tried myself, but have failed.
It's a one liner:
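private static String getServer(String httpresp) {
return httpresp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
}
The trick here has three parts:
(?s) turns on DOTALL mode so that . also matches the \r and \n characters; without it, .* cannot span the header lines
.*? is a reluctant match (it consumes as little as possible while still allowing the overall match)
the regex matches the whole input, and the desired target is captured and returned using a back reference ($1)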
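A runnable check of the replaceAll approach is sketched below. It assumes the (?s) flag is present so that . can cross the \r\n line breaks in the response; the class name ServerHeaderDemo is made up for the example.

```java
class ServerHeaderDemo {
    // (?s) lets '.' match line terminators, so the regex can cover the whole response.
    // Group 1 is the reluctant capture between "Server: " and the next CRLF.
    static String getServer(String httpresp) {
        return httpresp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
    }

    public static void main(String[] args) {
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
        System.out.println(getServer(testdata)); // Apache
    }
}
```

One caveat of this design: when the response has no Server header, replaceAll finds nothing to replace and returns the input unchanged, so callers that need to detect a missing header may prefer a Matcher with find().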
You could use a capturing group together with a positive lookahead.
Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");
Then print the group index 1.
Example:
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
Matcher matcher = Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
OR
Matcher matcher = Pattern.compile("(?:\\bServer\\b\\S*\\s+)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
Output:
Apache
Explanation:
(?:\\bServer:\\s*) - a non-capturing group, written (?:...), which matches without capturing. \b is a word boundary, which matches between a word character and a non-word character. Server: matches the literal string Server:, and \s* matches the zero or more whitespace characters that follow.
(.*?) - a capturing group, written (...), which captures the characters matched by the pattern inside it. Here (.*?) captures characters non-greedily up to the point where the lookahead below succeeds.
(?=[\r\n]+) - a positive lookahead, written (?=...), which asserts that the match is followed by one or more line break characters without consuming them.
