Regular expression to handle two different file extensions

Regular expression to handle two different file extensions - java

I am trying to create a regular expression that takes a file of name
"abcd_04-04-2020.txt" or "abcd_04-04-2020.txt.gz"
How can I handle the "OR" condition for the extension. This is what I have so far
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3})")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
This handles only the .txt. How can I handle ".txt.gz"
Thanks

Why not just use endsWith instead complex regex
if(fileName.endsWith(".txt") || fileName.endsWith(".txt.gz")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}

You can use the below regex to achieve your purpose:
^[\w-]+\d{2}-\d{2}-\d{4}\.txt(?:\.gz)?$
Explanation of the above regex:]
^,$ - Matches start and end of the test string resp.
[\w-]+ - Matches word character along with hyphen one or more times.
\d{} - Matches digits as many numbers as mentioned in the curly braces.
(?:\.gz)? - Represents non-capturing group matching .gz zero or one time because of ? quantifier. You could have used | alternation( or as you were expecting OR) but this is legible and more efficient too.
You can find the demo of the above regex here.
IMPLEMENTATION IN JAVA:
import java.util.regex.*;
public class Main
{
private static final Pattern pattern = Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?$", Pattern.MULTILINE);
public static void main(String[] args) {
String testString = "abcd_04-04-2020.txt\nabcd_04-04-2020.txt.gz\nsomethibsnfkns_05-06-2020.txt\n.txt.gz";
Matcher matcher = pattern.matcher(testString);
while(matcher.find()){
System.out.println(matcher.group(0));
}
}
}
You can find the implementation of the above regex in java in here.
NOTE: If you want to match for valid dates also; please visit this.

You can replace .[a-zA-Z]{3} with .txt(\.gz)
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}).txt(\.gz)?")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}

? will work for your required | . Try adding
(.[a-zA-Z]{2})?
to your original regex
([\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3}(.[a-zA-Z]{2})?)

A possible way of doing it:
Pattern pattern = Pattern.compile("^[\\w._-]+_\\d{2}-\\d{2}-\\d{4}(\\.txt(\\.gz)?)$");
Then you can run the following test:
String[] fileNames = {
"abcd_04-04-2020.txt",
"abcd_04-04-2020.tar",
"abcd_04-04-2020.txt.gz",
"abcd_04-04-2020.png",
".txt",
".txt.gz",
"04-04-2020.txt"
};
Arrays.stream(fileNames)
.filter(fileName -> pattern.matcher(fileName).find())
.forEach(System.out::println);
// output
// abcd_04-04-2020.txt
// abcd_04-04-2020.txt.gz

I think what you want (following from the direction you were going) is this:
[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)
At the end, I have a conditional statement. It has to either match the end of the string ($) OR it has to match a literal dot followed by 2 letters (\\.[a-zA-Z]{2}). Remember to escape the ., because in regex . means "match any character".

Related

java regex add trailing slash

I am trying to redirect the urls to add trailing slash
/news -> /news/
/news?param1=value1 -> /news/?param1=value
/news#anchor?param1=value1 -> /news/#anchor?param1=value1
I need to do it through a regex that identifies only the path and add /. When there are no parameters there is no problem.
^(/[a-z0–9/_\-]*[^/])$ -> $1/
But when there are parameters I am not able to create the regular expression that separates the path from the parameters.
Any ideas?, thanks

Might be just need to extend the end of string past the parameters.
To cover both with and without parameters might be:
^(/[a-z0–9/_-]*(?<!/))([^/]*)$ -> $1/$2
see https://regex101.com/r/Iwl23o/2

You shouldn't match the end of the string with $ and there is no need for [^/] at the end either.
^(/[a-z0–9/_\-]*)
const regex = new RegExp("^(/[a-z0–9/_\-]*)");
console.log("/news".replace(regex, "$1/"));
console.log("/news?param1=value1".replace(regex, "$1/"));
console.log("/news#anchor?param1=value1".replace(regex, "$1/"));

You can use a very simple regex like this:
^([/\w]+)
With this replacement string: $1/
Working demo

The pattern you tried matches only /news because the anchor $ asserts the end of the string.
If you omit the anchor, it would also match the ? and # as you use [^/] which matches any char except a forward slash.
You could repeat 1 or more times matching a forward slash followed by 1 or more times any char listed in the character class to prevent matching ///
In the replacement use the full match and add a a forward slash.
^(?:/[a-z0-9_-]+)+
Regex demo | Java demo
String regex = "^(?:/[a-z0-9_-]+)+";
String string = "/news\n"
+ "/news?param1=value1\n"
+ "/news#anchor?param1=value1";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = matcher.replaceAll("$0/");
System.out.println(result);
Output
/news/
/news/?param1=value1
/news/#anchor?param1=value1
Note that in your regex, the hyphen in this part 0–9 is
https://www.compart.com/en/unicode/U+2013 instead of https://www.compart.com/en/unicode/U+002D

You can do it as follows:
public class Main {
public static void main(final String[] args) {
String[] arr = { "/news", "/news?param1=value1", "/news#anchor?param1=value1" };
for (String s : arr) {
System.out.println(s.replaceFirst("([^\\/\\p{Punct}]+)", "$1/"));
}
}
}
Output:
/news/
/news/?param1=value1
/news/#anchor?param1=value1
Explanation of the regex:
(: Start of capturing group#1
[: Start of character classes
^: None of
\/: A / character
\p{Punct}: A punctuation character.
]: End of character classes
+: One or more times
): End of capturing group#1

Matching three or more identical characters - Java program [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!

Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find())
// match
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().

[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.

String.matches returns whether the whole string matches the regex, not just any substring.

java's implementation of regexes try to match the whole string
that's different from perl regexes, which try to find a matching part
if you want to find a string with nothing but lower case characters, use the pattern [a-z]+
if you want to find a string containing at least one lower case character, use the pattern .*[a-z].*

Used
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}

I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).

Your regular expression [a-z] doesn't match dkoe since it only matches Strings of lenght 1. Use something like [a-z]+.

you must put at least a capture () in the pattern to match, and correct pattern like this:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}

You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

java regular expression matching with arithmetic operation [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!

Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find())
// match
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().

[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.

String.matches returns whether the whole string matches the regex, not just any substring.

java's implementation of regexes try to match the whole string
that's different from perl regexes, which try to find a matching part
if you want to find a string with nothing but lower case characters, use the pattern [a-z]+
if you want to find a string containing at least one lower case character, use the pattern .*[a-z].*

Used
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}

I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).

Your regular expression [a-z] doesn't match dkoe since it only matches Strings of lenght 1. Use something like [a-z]+.

you must put at least a capture () in the pattern to match, and correct pattern like this:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}

You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

Regex matching up to a character if it occurs

I need to match string as below:
match everything upto ;
If - occurs, match only upto - excluding -
For e.g. :
abc; should return abc
abc-xyz; should return abc
Pattern.compile("^(?<string>.*?);$");
Using above i can achieve half. but dont know how to change this pattern to achieve the second requirement. How do i change .*? so that it stops at forst occurance of -
I am not good with regex. Any help would be great.
EDIT
I need to capture it as group. i cant change it since there many other patterns to match and capture. Its only part of it that i have posted.
Code looks something like below.
public static final Pattern findString = Pattern.compile("^(?<string>.*?);$");
if(findString.find())
{
return findString.group("string"); //cant change anything here.
}

Just use a negated char class.
^[^-;]*
ie.
Pattern p = Pattern.compile("^[^-;]*");
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group());
}
This would match any character at the start but not of - or ;, zero or more times.

This should do what you are looking for:
[^-;]*
It matches characters that are not - or ;.
Tipp: If you don't feel sure with regular expressions there are great online solutions to test your input, e.g. https://regex101.com/

UPDATE
I see you have an issue in the code since you try to access .group in the Pattern object, while you need to use the .group method of the Matcher object:
public static String GetTheGroup(String str) {
Pattern findString = Pattern.compile("(?s)^(?<string>.*?)[;-]");
Matcher matcher = findString.matcher(str);
if (matcher.find())
{
return matcher.group("string"); //you have to change something here.
}
else
return "";
}
And call it as
System.out.println(GetTheGroup("abc-xyz;"));
See IDEONE demo
OLD ANSWER
Your ^(?<string>.*?);$ regex only matches 0 or more characters other than a newline from the beginning up to the first ; that is the last character in the string. I guess it is not what you expect.
You should learn more about using character classes in regex, as you can match 1 symbol from a specified character set that is defined with [...].
You can achieve this with a String.split taking the first element only and a [;-] regex that matches a ; or - literally:
String res = "abc-xyz;".split("[;-]")[0];
System.out.println(res);
Or with replaceAll with (?s)[;-].*$ regex (that matches the first ; or - and then anything up to the end of string:
res = "abc-xyz;".replaceAll("(?s)[;-].*$", "");
System.out.println(res);
See IDEONE demo

I have found the solution without removing groupings.
(?<string>.*?) matches everything upto next grouping pattern
(?:-.*?)? followed by a non grouping pattern starts with - and comes zero or once.
; end character.
So putting all together:
public static final Pattern findString = Pattern.compile("^(?<string>.*?)(?:-.*?)?;$");
if(findString.find())
{
return findString.group("string"); //cant change anything here.
}

Get substring between two characters

How do you build a regex to return for the characters between < and # of a string?
For example <1001#10.2.2.1> would return 1001.
Would something using <.?> work?

Would something using "<.?>" work?
A slightly modified version of it would work: <.*?# (you need an # at the end, and you need a reluctant quantifier *? in place of an optional mark ?). However it could be inefficient because of backtracking. Something like this would be better:
<([^#]*)#
This expression starts by finding <, taking as many non-# characters as it could, and capturing the # before stopping.
Parentheses denote a capturing group. Use regex API to extract it:
Pattern p = Pattern.compile("<([^#]*)#");
Matcher m = p.matcher("<1001#10.2.2.1>");
if (m.find()) {
System.out.println(m.group(1));
}
This prints 1001 (demo).

What about the next:
(?<=<)[^#]*
e.g.:
private static final Pattern REGEX_PATTERN =
Pattern.compile("(?<=<)[^#]*");
public static void main(String[] args) {
String input = "<1001#10.2.2.1>";
Matcher matcher = REGEX_PATTERN.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
Output:
1001

Um.
<([0-9]*?)#
I'm assuming it's numbers only.
if all characters use this..
<(.*?)#
tested here..
Maybe i'm lacking knowledge but my understanding of regex is that you need () to get the capture groups... otherwise if you don't you'll just be selecting characters without actually "capturing" them.
so this..
<.?>
won't do anything .

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Regular expression to handle two different file extensions - java

Why not just use endsWith instead complex regex if(fileName.endsWith(".txt") || fileName.endsWith(".txt.gz")){ Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\."); }

You can replace .[a-zA-Z]{3} with .txt(\.gz) if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}).txt(\.gz)?")){ Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\."); }

? will work for your required | . Try adding (.[a-zA-Z]{2})? to your original regex ([\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3}(.[a-zA-Z]{2})?)

Related

java regex add trailing slash

Matching three or more identical characters - Java program [duplicate]

java regular expression matching with arithmetic operation [duplicate]

Regex matching up to a character if it occurs

Get substring between two characters

Categories

Resources