java regex add trailing slash - java

I am trying to redirect URLs so that they get a trailing slash:
/news -> /news/
/news?param1=value1 -> /news/?param1=value1
/news#anchor?param1=value1 -> /news/#anchor?param1=value1
I need to do it with a regex that identifies only the path and adds a /. When there are no parameters, there is no problem:
^(/[a-z0–9/_\-]*[^/])$ -> $1/
But when there are parameters, I cannot come up with a regular expression that separates the path from the parameters.
Any ideas? Thanks.

You may just need to extend the match past the parameters to the end of the string.
To cover URLs both with and without parameters, something like this might work:
^(/[a-z0-9/_-]*(?<!/))([^/]*)$ -> $1/$2
see https://regex101.com/r/Iwl23o/2

You shouldn't anchor the match to the end of the string with $, and there is no need for [^/] at the end either.
^(/[a-z0-9/_\-]*)
const regex = new RegExp("^(/[a-z0-9/_\-]*)");
console.log("/news".replace(regex, "$1/"));
console.log("/news?param1=value1".replace(regex, "$1/"));
console.log("/news#anchor?param1=value1".replace(regex, "$1/"));

You can use a very simple regex like this:
^([/\w]+)
With this replacement string: $1/

The pattern you tried matches only /news because the anchor $ asserts the end of the string.
If you omit the anchor, it would also match the ? and #, because [^/] matches any character except a forward slash.
Instead, you could repeat, one or more times, a forward slash followed by one or more characters from the character class; this prevents matching ///.
In the replacement, use the full match and add a forward slash.
^(?:/[a-z0-9_-]+)+
String regex = "^(?:/[a-z0-9_-]+)+";
String string = "/news\n"
+ "/news?param1=value1\n"
+ "/news#anchor?param1=value1";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = matcher.replaceAll("$0/");
System.out.println(result);
Output
/news/
/news/?param1=value1
/news/#anchor?param1=value1
Note that in your regex, the dash in 0–9 is an en dash (https://www.compart.com/en/unicode/U+2013) instead of a hyphen-minus (https://www.compart.com/en/unicode/U+002D).
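The approaches above can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the class name TrailingSlash and the method addTrailingSlash are hypothetical, and the lookahead (?=[?#]|$) is an addition that is not part of the answer above, included so that a path that already ends in a slash is left unchanged. It assumes lowercase paths, as in the question.

```java
import java.util.regex.Pattern;

class TrailingSlash {
    // A path of lowercase letters, digits, '/', '_' and '-' whose last
    // character is not '/', followed by '?', '#' or the end of the input.
    private static final Pattern PATH_WITHOUT_SLASH =
            Pattern.compile("^(/[a-z0-9/_-]*[a-z0-9_-])(?=[?#]|$)");

    // Hypothetical helper: appends '/' to the path part only.
    static String addTrailingSlash(String url) {
        return PATH_WITHOUT_SLASH.matcher(url).replaceFirst("$1/");
    }

    public static void main(String[] args) {
        String[] samples = {
            "/news", "/news?param1=value1", "/news#anchor?param1=value1", "/news/"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + addTrailingSlash(s));
        }
    }
}
```

Inputs that do not match (for example a path that already ends in /) are returned unchanged, because replaceFirst only rewrites a successful match.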

You can do it as follows:
public class Main {
    public static void main(final String[] args) {
        String[] arr = { "/news", "/news?param1=value1", "/news#anchor?param1=value1" };
        for (String s : arr) {
            System.out.println(s.replaceFirst("([^\\/\\p{Punct}]+)", "$1/"));
        }
    }
}
Output:
/news/
/news/?param1=value1
/news/#anchor?param1=value1
Explanation of the regex:
(: Start of capturing group#1
[: Start of character classes
^: None of
\/: A / character
\p{Punct}: A punctuation character.
]: End of character classes
+: One or more times
): End of capturing group#1
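One caveat worth checking before relying on this pattern: in Java, \p{Punct} also covers - and _ (and /, so the explicit \/ is redundant), which means a path segment containing either character is cut at that point. A minimal sketch to illustrate this; the class and method names are hypothetical:

```java
class PunctCaveat {
    // Same replacement as in the answer above.
    static String addSlash(String s) {
        return s.replaceFirst("([^\\/\\p{Punct}]+)", "$1/");
    }

    public static void main(String[] args) {
        // Path without punctuation: the slash lands after "news".
        System.out.println(addSlash("/news?param1=value1"));
        // The '-' ends the first match early, so the slash lands after "my".
        System.out.println(addSlash("/my-news"));
    }
}
```

If such paths can occur, an explicit character class such as [a-z0-9_-] is probably the safer choice.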

Related

Regular expression to handle two different file extensions

I am trying to create a regular expression that takes a file of name
"abcd_04-04-2020.txt" or "abcd_04-04-2020.txt.gz"
How can I handle the "OR" condition for the extension. This is what I have so far
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3})")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
This handles only the .txt. How can I handle ".txt.gz"
Thanks
Why not just use endsWith instead of a complex regex?
if(fileName.endsWith(".txt") || fileName.endsWith(".txt.gz")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
You can use the below regex to achieve your purpose:
^[\w-]+\d{2}-\d{2}-\d{4}\.txt(?:\.gz)?$
Explanation of the above regex:
^,$ - Match the start and end of the test string, respectively.
[\w-]+ - Matches word characters and hyphens, one or more times.
\d{n} - Matches exactly as many digits as the number given in the curly braces.
(?:\.gz)? - A non-capturing group matching .gz zero or one time because of the ? quantifier. You could have used | alternation (the OR you were expecting), but this is more legible and more efficient too.
IMPLEMENTATION IN JAVA:
import java.util.regex.*;
public class Main
{
    private static final Pattern pattern = Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?$", Pattern.MULTILINE);

    public static void main(String[] args) {
        String testString = "abcd_04-04-2020.txt\nabcd_04-04-2020.txt.gz\nsomethibsnfkns_05-06-2020.txt\n.txt.gz";
        Matcher matcher = pattern.matcher(testString);
        while(matcher.find()){
            System.out.println(matcher.group(0));
        }
    }
}
NOTE: This regex checks only the shape of the date (two digits, two digits, four digits), not whether it is a valid date.
You can replace .[a-zA-Z]{3} with \.txt(\.gz)?
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4})\\.txt(\\.gz)?")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
The ? quantifier covers the OR you are after. Try adding
(\.[a-zA-Z]{2})?
to your original regex (with the dots escaped so that they match a literal dot):
([\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\.[a-zA-Z]{3}(\.[a-zA-Z]{2})?)
A possible way of doing it:
Pattern pattern = Pattern.compile("^[\\w._-]+_\\d{2}-\\d{2}-\\d{4}(\\.txt(\\.gz)?)$");
Then you can run the following test:
String[] fileNames = {
"abcd_04-04-2020.txt",
"abcd_04-04-2020.tar",
"abcd_04-04-2020.txt.gz",
"abcd_04-04-2020.png",
".txt",
".txt.gz",
"04-04-2020.txt"
};
Arrays.stream(fileNames)
.filter(fileName -> pattern.matcher(fileName).find())
.forEach(System.out::println);
// output
// abcd_04-04-2020.txt
// abcd_04-04-2020.txt.gz
I think what you want (following from the direction you were going) is this:
[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)
At the end, I have an alternation. It has to either match the end of the string ($) OR match a literal dot followed by 2 letters and then the end of the string (\\.[a-zA-Z]{2}$). Remember to escape the ., because in regex . means "match any character".
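A minimal sketch of how this pattern behaves with Matcher.matches(), which requires the whole file name to match. The class name FileNameCheck and the method isAccepted are hypothetical, and the sample names beyond the two from the question are made up for illustration:

```java
import java.util.regex.Pattern;

class FileNameCheck {
    // The pattern from the answer above, as a Java string literal.
    private static final Pattern FILE_NAME = Pattern.compile(
            "[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)");

    // Hypothetical helper: true if the whole name matches the pattern.
    static boolean isAccepted(String fileName) {
        return FILE_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "abcd_04-04-2020.txt", "abcd_04-04-2020.txt.gz", "abcd_04-04-2020.txt.gzip"
        };
        for (String name : names) {
            System.out.println(name + " -> " + isAccepted(name));
        }
    }
}
```

Note that this pattern accepts any three-letter extension (and any two-letter second extension), not only .txt and .txt.gz; use a literal \\.txt(?:\\.gz)? if only those two should pass.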

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How should I modify the regex so that it matches these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
See the regex demo
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
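The details above can be sketched as a small parser. This is an illustration under assumptions, not the answer's own code: the class name SearchCriteriaParser and the method parse are hypothetical, and the operator is put back into its own capturing group (as in the question's original regex), so the value becomes group 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchCriteriaParser {
    // Variation of the pattern above: key, operator and value each get a group.
    private static final Pattern CRITERION = Pattern.compile("(\\w+)([:<>])([^,]+)");

    // Returns one "key|operator|value" string per criterion found.
    static List<String> parse(String search) {
        List<String> result = new ArrayList<>();
        Matcher matcher = CRITERION.matcher(search);
        while (matcher.find()) {
            result.add(matcher.group(1) + "|" + matcher.group(2) + "|" + matcher.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String search = "firstName:Giorgio,email:giorgio.rossi#libero.it,dataregistrazione:27/10/2016";
        for (String criterion : parse(search)) {
            System.out.println(criterion);
        }
    }
}
```

Because the value is matched with [^,]+, there is no need to append a trailing comma to the search string before matching.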
You can use regex like this:
public static void main(String[] args) {
String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
String pattern = "(\\w+[:<>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})";
for(String str : arr){
if(str.matches(pattern))
System.out.println(str);
}
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But remember that this regex works only for your data format. To build a general-purpose regex, consult the RFC documents and articles about the email format.
Hope it helps.
The character class \w matches [A-Za-z0-9_], so it cannot match characters such as ., # or /. Change the regex to (\\w+?)(:|<|>)(.*), to match any characters between the operator and the trailing ,.
Or list all the characters you expect in the value, i.e. (\\w+?)(:|<|>)([#.\\w/]*),

Java Regex to extract substring with optional trailing slash

Regex:
\/test\/(.*|\/?)
Input
/something/test/{abc}/listed
/something/test/{abc}
Expected
{abc} for both the inputs
You need to capture all characters other than / after /test/:
String s = "/something/test/{abc}/listed";
Pattern pattern = Pattern.compile("/test/([^/]+)"); // or "/test/\\{([^/}]+)"
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Details:
/test/ - matches /test/
([^/]+) - matches and captures into Group 1 one or more (+) (but as many as possible, since + is greedy) characters other than / (due to the negated character class [^/]).
Note that in Java regex patterns you do not need to escape / since it is not a special character and one needs no regex delimiters.
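The code comment above mentions a second pattern that captures the name without the braces. A minimal sketch comparing the two, with a guard for inputs that do not contain /test/; the class name TestSegment and its method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TestSegment {
    // Captures the whole segment after /test/, braces included.
    private static final Pattern WITH_BRACES = Pattern.compile("/test/([^/]+)");
    // Captures only what follows the opening brace, up to '/' or '}'.
    private static final Pattern WITHOUT_BRACES = Pattern.compile("/test/\\{([^/}]+)");

    // Returns group 1 of the first match, or null when nothing matches.
    private static String firstGroup(Pattern pattern, String path) {
        Matcher matcher = pattern.matcher(path);
        return matcher.find() ? matcher.group(1) : null;
    }

    static String withBraces(String path) {
        return firstGroup(WITH_BRACES, path);
    }

    static String withoutBraces(String path) {
        return firstGroup(WITHOUT_BRACES, path);
    }

    public static void main(String[] args) {
        String[] inputs = { "/something/test/{abc}/listed", "/something/test/{abc}" };
        for (String input : inputs) {
            System.out.println(withBraces(input) + " / " + withoutBraces(input));
        }
    }
}
```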
This should work for you :
public static void main(String[] args) {
String s1 = "/something/test/{abc}/listed";
String s2 = "/something/test/{abc}";
System.out.println(s1.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
System.out.println(s2.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
}
Output:
{abc}
{abc}
Regex (as Java string, that is with doubled backslashes):
".*\\/test\\/([^/]*).*"

Regex matching up to a character if it occurs

I need to match string as below:
match everything up to ;
If - occurs, match only up to -, excluding the - itself
For example:
abc; should return abc
abc-xyz; should return abc
Pattern.compile("^(?<string>.*?);$");
Using the above I can achieve half of this, but I don't know how to change the pattern to meet the second requirement. How do I change .*? so that it stops at the first occurrence of -?
I am not good with regex. Any help would be great.
EDIT
I need to capture it as a group. I can't change that, since there are many other patterns to match and capture; what I have posted is only part of the code.
Code looks something like below.
public static final Pattern findString = Pattern.compile("^(?<string>.*?);$");
if(findString.find())
{
return findString.group("string"); //cant change anything here.
}
Just use a negated char class.
^[^-;]*
ie.
Pattern p = Pattern.compile("^[^-;]*");
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group());
}
This matches zero or more characters from the start of the string, stopping before the first - or ;.
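Since the question asks for a named group, the negated character class can also be wrapped in one. A minimal sketch under that assumption; the class name PrefixBeforeDelimiter and the method extract are hypothetical, while the group name string is taken from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixBeforeDelimiter {
    // Named group "string": everything from the start up to the first '-' or ';'.
    private static final Pattern FIND_STRING = Pattern.compile("^(?<string>[^-;]*)");

    // Hypothetical helper: returns the captured prefix, or "" if nothing matches.
    static String extract(String input) {
        Matcher matcher = FIND_STRING.matcher(input);
        return matcher.find() ? matcher.group("string") : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("abc;"));
        System.out.println(extract("abc-xyz;"));
    }
}
```

Unlike the pattern in the question, this one does not require the input to end with ;, so it also returns a prefix for strings without any delimiter.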
This should do what you are looking for:
[^-;]*
It matches characters that are not - or ;.
Tip: If you are unsure about regular expressions, there are great online tools for testing your input, e.g. https://regex101.com/
UPDATE
I see you have an issue in the code since you try to access .group in the Pattern object, while you need to use the .group method of the Matcher object:
public static String GetTheGroup(String str) {
Pattern findString = Pattern.compile("(?s)^(?<string>.*?)[;-]");
Matcher matcher = findString.matcher(str);
if (matcher.find())
{
return matcher.group("string"); //you have to change something here.
}
else
return "";
}
And call it as
System.out.println(GetTheGroup("abc-xyz;"));
OLD ANSWER
Your ^(?<string>.*?);$ regex only matches 0 or more characters other than a newline from the beginning up to the first ; that is the last character in the string. I guess it is not what you expect.
You should learn more about using character classes in regex, as you can match 1 symbol from a specified character set that is defined with [...].
You can achieve this with a String.split taking the first element only and a [;-] regex that matches a ; or - literally:
String res = "abc-xyz;".split("[;-]")[0];
System.out.println(res);
Or with replaceAll and the (?s)[;-].*$ regex (which matches the first ; or - and then anything up to the end of the string):
res = "abc-xyz;".replaceAll("(?s)[;-].*$", "");
System.out.println(res);
I have found a solution that keeps the named group.
(?<string>.*?) lazily matches everything up to the next part of the pattern
(?:-.*?)? is an optional non-capturing group that starts with - and occurs zero or one time
; is the end character
So putting it all together:
public static final Pattern findString = Pattern.compile("^(?<string>.*?)(?:-.*?)?;$");
Matcher matcher = findString.matcher(str); // str is the input string; find() and group() belong to Matcher, not Pattern
if(matcher.find())
{
return matcher.group("string");
}

How to extract a substring using regex for this pattern

I have to extract the string between the last / and the ?, i.e. exampleproduct, from this URL:
https://local.host.com/order/faces/Home/myorder/exampleproduct?_adf.ctrl-state=mfun9p14r_19
How do I write a regular expression for this?
I am using the logic below, but it does not work:
private static String extractPageNameFromURL(String urlFull) {
if (urlFull != null) {
Pattern pattern = Pattern.compile("/(.*?).jspx?");
Matcher matcher = pattern.matcher(urlFull);
while (matcher.find()) {
String str1 = matcher.group(1);
String[] dataRows = str1.split("/");
urlFull = dataRows[dataRows.length - 1];
}
}
return urlFull;
}
public static void main(String[] args) {
System.out.println(DtmUtils.extractPageNameFromURL("https://local.host.com/order/faces/Home/myorder/exampleproduct?_adf.ctrl-state=mfun9p14r_19"));
}
Thanks
Raj
If I'm following what you're asking, then you're attempting to pull exampleproduct from the URL.
Here's the regex to use to accomplish this. Group 1 should have the name after the last / and before the first ? after that slash.
^.*\/([^?]+)\?.*$
^ -- Beginning of line anchor
.* -- find 0 or more characters. Note the * is greedy.
\/ -- Find a literal /.
([^?]+) -- Capture everything that is not a question mark
\? -- Find a literal question mark
.* -- now match everything after the question mark
$ -- end of line anchor
And here's a quick example of using it in Java. It will need to be modified before real use, for example to check the result of find() before calling group().
String urlFull = "https://local.host.com/order/faces/Home/myorder/exampleproduct?_adf.ctrl-state=mfun9p14r_19";
Pattern pattern = Pattern.compile("^.*\\/([^?]+)\\?.*$");
Matcher matcher = pattern.matcher(urlFull);
matcher.find();
String p = matcher.group(1);
System.out.println(p);
I didn't follow why the original regex you wrote had the .jspx?, but if there's more to the problem you'll need to update the question to explain.
To match exampleproduct in your input this lookahead based regex will work for you:
[^/]+(?=\?)
In Java code:
Pattern pat = Pattern.compile("[^/]+(?=\\?)");
RegEx Demo
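A minimal sketch of how the lookahead pattern could be used in a method like the one in the question. The class name PageNameExtractor is hypothetical, and returning an empty string for null input or for a URL without a ? is an assumption made for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageNameExtractor {
    // One or more non-slash characters that are directly followed by '?'.
    private static final Pattern BEFORE_QUERY = Pattern.compile("[^/]+(?=\\?)");

    // Hypothetical helper: returns the last path segment before the query string.
    static String extractPageName(String url) {
        if (url == null) {
            return "";
        }
        Matcher matcher = BEFORE_QUERY.matcher(url);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(extractPageName(
                "https://local.host.com/order/faces/Home/myorder/exampleproduct?_adf.ctrl-state=mfun9p14r_19"));
    }
}
```

The lookahead only asserts that a ? follows, so the ? itself is not part of the returned match.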
This pattern might work for you \?(.*).
This regex finds a question mark and selects everything after it.
I tried this pattern with your path and it worked fine: ([\w_-]*)(\.)*([\w_-]*)?(?=\?)
It would also match if the file name had an extension.
