Ensuring a string is of a certain pattern - java

I am trying to work out whether there is a way to check that a string follows a given structure.
e.g. String s = "abcd, afsfsfs, abcdef, 90> 20, abeds"
I need to confirm that there is a ', ' followed by a ', ' followed by a ', ' followed by a '> ' and finally a ', '. The letters and numbers between the separators can vary in length.
I am a bit stuck on this. Any help would be appreciated.

If you want any number of word characters (letters, digits, underscore) between the separators, you can use this regex:
public static void main(String[] args) {
    String s = "abcd, afsfsfs, abcdef, 90> 20, abeds";
    // \w+ is one or more word characters, \d+ is one or more digits
    boolean matches = s.matches("\\w+, \\w+, \\w+, \\d+> \\d+, \\w+");
    System.out.println(matches);
}
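If the check runs often, the same pattern can be compiled once and reused. This is only a sketch, assuming the separators are exactly as in the question; the names StructureCheck and isWellFormed are made up for illustration:

```java
import java.util.regex.Pattern;

class StructureCheck {
    // Same pattern as above, compiled once: three ", " separators, then "> ", then ", ".
    private static final Pattern STRUCTURE =
            Pattern.compile("\\w+, \\w+, \\w+, \\d+> \\d+, \\w+");

    // matches() succeeds only if the whole input fits the pattern.
    static boolean isWellFormed(String s) {
        return STRUCTURE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("abcd, afsfsfs, abcdef, 90> 20, abeds")); // true
        System.out.println(isWellFormed("abcd, afsfsfs, abcdef, 90 20, abeds"));  // false: no "> "
        System.out.println(isWellFormed("abcd, afsfsfs, 90> 20, abeds"));         // false: one field short
    }
}
```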

You can use the following regex pattern in conjunction with String#matches():
.*, .*, .*, .*>.*, .*
Code sample:
public static void main(String[] args) {
    String s = "abcd, afsfsfs, abcdef, 90> 20, abeds";
    if (s.matches(".*, .*, .*, .*>.*, .*")) {
        System.out.println("match");
    } else {
        System.out.println("no match");
    }
}
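One thing to keep in mind: .* also matches commas, '>' and the empty string, so this pattern only guarantees that the separators appear in order. It does not restrict what sits between them. A small sketch of the difference, compared against the \w+ pattern from the other answer (the class name LoosePatternDemo is hypothetical):

```java
class LoosePatternDemo {
    static final String LOOSE = ".*, .*, .*, .*>.*, .*";
    static final String STRICT = "\\w+, \\w+, \\w+, \\d+> \\d+, \\w+";

    public static void main(String[] args) {
        // Nothing but separators: every .* matches the empty string.
        String onlySeparators = ", , , >, ";
        System.out.println(onlySeparators.matches(LOOSE));  // true
        System.out.println(onlySeparators.matches(STRICT)); // false

        // One field too many: .* happily swallows the extra ", ".
        String extraField = "a, b, c, d, 90> 20, e";
        System.out.println(extraField.matches(LOOSE));  // true
        System.out.println(extraField.matches(STRICT)); // false
    }
}
```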

Try something like the pattern below (FYI, not tested yet). Explanation: [^,]+ followed by , matches anything but a comma and then the comma itself. Likewise, [^>]+ followed by > matches any characters except > and then the >.
[ ] denotes a character class (character set).
^ as the first character inside [ ] negates the class.
^(?![\s]*$) [^,]+ , [^,]+ , [^,]+ , [^>]+ > [^,]+ $
start no empty 1st 2nd 3rd 4th end
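As written, the pattern above stops after the '>' field, while the sample string in the question has one more ', ' separated field after it. Here is a sketch of the same negated-class idea with that fifth separator added. It assumes the spaces in the diagram are layout only and that fields may contain anything except ',' and '>'; NegatedClassCheck and isWellFormed are made-up names:

```java
import java.util.regex.Pattern;

class NegatedClassCheck {
    // A field is one or more characters that are neither ',' nor '>'.
    private static final String FIELD = "[^,>]+";

    // field, field, field, field> field, field
    private static final Pattern STRUCTURE = Pattern.compile(
            FIELD + ", " + FIELD + ", " + FIELD + ", " + FIELD + "> " + FIELD + ", " + FIELD);

    static boolean isWellFormed(String s) {
        return STRUCTURE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("abcd, afsfsfs, abcdef, 90> 20, abeds")); // true
        System.out.println(isWellFormed("ab-cd, x y, z!, 90> 20, e"));            // true: fields are not limited to \w
        System.out.println(isWellFormed("abcd, afsfsfs, abcdef, 90> 20"));        // false: last field missing
    }
}
```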

Try this:
^\s*(?:\s*\w+\s*,\s*){3}\w+\s*>\s*\w+,(?!.*[,>]).*$
This makes sure that the format is exactly what you asked for and that there is no further , or > in the rest of the string. If you want to allow more , or > characters once the required format has been found, remove the negative lookahead (?!.*[,>]) from the regex, so that it becomes:
^\s*(?:\s*\w+\s*,\s*){3}\w+\s*>\s*\w+,.*$
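To see the two variants side by side in Java, here is a minimal sketch. The class name TrailingSeparators and the second test string are made up for illustration:

```java
import java.util.regex.Pattern;

class TrailingSeparators {
    // Variant with the negative lookahead: no ',' or '>' may follow the required part.
    static final Pattern EXACT = Pattern.compile(
            "^\\s*(?:\\s*\\w+\\s*,\\s*){3}\\w+\\s*>\\s*\\w+,(?!.*[,>]).*$");
    // Variant without it: anything may follow the required part.
    static final Pattern LENIENT = Pattern.compile(
            "^\\s*(?:\\s*\\w+\\s*,\\s*){3}\\w+\\s*>\\s*\\w+,.*$");

    public static void main(String[] args) {
        String sample = "abcd, afsfsfs, abcdef, 90> 20, abeds";
        String extra = sample + ", more> stuff";
        System.out.println(EXACT.matcher(sample).matches());   // true
        System.out.println(LENIENT.matcher(sample).matches()); // true
        System.out.println(EXACT.matcher(extra).matches());    // false: extra ',' and '>' at the end
        System.out.println(LENIENT.matcher(extra).matches());  // true
    }
}
```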

Related

regular expressions to determine if a string starts with ;

The requirement is simple. The given string should match if it either:
starts with ';'
starts with one or more characters among '\r','\n','\t',' ', followed by ';'.
For example ";", "\r;","\r\n;", " \r\n \t;" should all be ok.
Here is my code, which does not work:
private static String regex = "[\\r|\\n| |\\t]+;";
private static boolean startsWithSemicolon(String str) {
    return str.matches(regex);
}
Thanks for any help.
You have 2 choices:
Use matches(), in which case the regex must match the entire input, so you have to add matching of the characters following the ;. Since . does not match line terminators by default, add the (?s) flag if the rest of the input may contain line breaks.
Regex: str.matches("(?s)[\\r\\n\\t ]*;.*")
or: Pattern.compile("(?s)[\\r\\n\\t ]*;.*").matcher(str).matches()
Use find(), in which case the regex must be anchored to the beginning of the input:
Regex: Pattern.compile("^[\\r\\n\\t ]*;").matcher(str).find()
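A third option, not mentioned above, is Matcher.lookingAt(). It anchors the match at the start of the input like matches(), but does not require the pattern to consume the whole input, so neither a trailing .* nor a leading ^ is needed. A minimal sketch (the class name SemicolonStart is made up):

```java
import java.util.regex.Pattern;

class SemicolonStart {
    // Optional leading \r, \n, \t or space characters, then a semicolon.
    private static final Pattern LEADING_SEMICOLON = Pattern.compile("[\\r\\n\\t ]*;");

    // lookingAt() must match at the very start but may stop before the end.
    static boolean startsWithSemicolon(String str) {
        return LEADING_SEMICOLON.matcher(str).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithSemicolon(";"));              // true
        System.out.println(startsWithSemicolon(" \r\n \t;"));      // true
        System.out.println(startsWithSemicolon("; rest\nmore"));   // true: text after ';' is ignored
        System.out.println(startsWithSemicolon("a;"));             // false
    }
}
```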

java regex add trailing slash

I am trying to redirect URLs so that they get a trailing slash:
/news -> /news/
/news?param1=value1 -> /news/?param1=value1
/news#anchor?param1=value1 -> /news/#anchor?param1=value1
I need to do it through a regex that identifies only the path and adds /. When there are no parameters there is no problem.
^(/[a-z0–9/_\-]*[^/])$ -> $1/
But when there are parameters I am not able to create a regular expression that separates the path from the parameters.
Any ideas? Thanks.
You may just need to extend the match past the parameters to the end of the string.
To cover both the cases with and without parameters, this might work:
^(/[a-z0-9/_-]*(?<!/))([^/]*)$ -> $1/$2
see https://regex101.com/r/Iwl23o/2
You shouldn't match the end of the string with $, and there is no need for [^/] at the end either.
^(/[a-z0-9/_-]*)
const regex = /^(\/[a-z0-9\/_-]*)/;
console.log("/news".replace(regex, "$1/"));
console.log("/news?param1=value1".replace(regex, "$1/"));
console.log("/news#anchor?param1=value1".replace(regex, "$1/"));
You can use a very simple regex like this:
^([/\w]+)
With this replacement string: $1/
The pattern you tried matches only /news because the anchor $ asserts the end of the string.
If you omit the anchor, it would also match the ? and # as you use [^/] which matches any char except a forward slash.
You could match, one or more times, a forward slash followed by one or more of the characters listed in the character class. This prevents matching ///.
In the replacement, use the full match and add a forward slash.
^(?:/[a-z0-9_-]+)+
String regex = "^(?:/[a-z0-9_-]+)+";
String string = "/news\n"
+ "/news?param1=value1\n"
+ "/news#anchor?param1=value1";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = matcher.replaceAll("$0/");
System.out.println(result);
Output
/news/
/news/?param1=value1
/news/#anchor?param1=value1
Note that in your regex, the hyphen in 0–9 is an en dash (https://www.compart.com/en/unicode/U+2013) instead of a hyphen-minus (https://www.compart.com/en/unicode/U+002D).
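The effect of that character is easy to check. In the sketch below the en dash is written as a Unicode escape so that it is visible in the source; the class name DashDemo is made up:

```java
class DashDemo {
    // With U+2013 EN DASH the class holds three literals: '0', the dash itself and '9'.
    static final String WITH_EN_DASH = "[0\u20139]";
    // With U+002D HYPHEN-MINUS the class is the range 0 through 9.
    static final String WITH_HYPHEN = "[0-9]";

    public static void main(String[] args) {
        System.out.println("5".matches(WITH_EN_DASH));      // false
        System.out.println("5".matches(WITH_HYPHEN));       // true
        System.out.println("\u2013".matches(WITH_EN_DASH)); // true: the dash is matched literally
    }
}
```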
You can do it as follows:
public class Main {
    public static void main(final String[] args) {
        String[] arr = { "/news", "/news?param1=value1", "/news#anchor?param1=value1" };
        for (String s : arr) {
            System.out.println(s.replaceFirst("([^\\/\\p{Punct}]+)", "$1/"));
        }
    }
}
Output:
/news/
/news/?param1=value1
/news/#anchor?param1=value1
Explanation of the regex:
(: Start of capturing group#1
[: Start of character classes
^: None of
\/: A / character
\p{Punct}: A punctuation character.
]: End of character classes
+: One or more times
): End of capturing group#1
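The patterns in the answers above add a slash even when the path already ends with one. If that matters (the [^/] in the question's original regex suggests it might), one possible sketch is to treat everything before the first '?' or '#' as the path and only match when it does not already end in '/'. TrailingSlash and addTrailingSlash are made-up names, and this is not a full URL parser:

```java
class TrailingSlash {
    // Group 1 is the path: everything before the first '?' or '#', required to
    // end in a character other than '/'. The lookahead checks that the path is
    // followed by '?', '#' or the end of the input without consuming it.
    private static final String PATH_WITHOUT_SLASH = "^([^?#]*[^/?#])(?=[?#]|$)";

    static String addTrailingSlash(String url) {
        // If the pattern does not match, replaceFirst returns the input unchanged.
        return url.replaceFirst(PATH_WITHOUT_SLASH, "$1/");
    }

    public static void main(String[] args) {
        System.out.println(addTrailingSlash("/news"));                      // /news/
        System.out.println(addTrailingSlash("/news?param1=value1"));        // /news/?param1=value1
        System.out.println(addTrailingSlash("/news#anchor?param1=value1")); // /news/#anchor?param1=value1
        System.out.println(addTrailingSlash("/news/?param1=value1"));       // /news/?param1=value1
    }
}
```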

Regular expression to handle two different file extensions

I am trying to create a regular expression that accepts a file name like
"abcd_04-04-2020.txt" or "abcd_04-04-2020.txt.gz"
How can I handle the "OR" condition for the extension? This is what I have so far:
if (fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3})")) {
    Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
This handles only .txt. How can I handle ".txt.gz"?
Thanks
Why not just use endsWith instead of a complex regex?
if (fileName.endsWith(".txt") || fileName.endsWith(".txt.gz")) {
    Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
You can use the below regex to achieve your purpose:
^[\w-]+\d{2}-\d{2}-\d{4}\.txt(?:\.gz)?$
Explanation of the above regex:
^,$ - Match the start and end of the test string, respectively.
[\w-]+ - Matches word characters and hyphens, one or more times.
\d{n} - Matches exactly as many digits as the number in the curly braces.
(?:\.gz)? - A non-capturing group matching .gz zero or one time because of the ? quantifier. You could have used | alternation (the OR you were expecting), but this is more legible and more efficient too.
IMPLEMENTATION IN JAVA:
import java.util.regex.*;

public class Main {
    private static final Pattern pattern = Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?$", Pattern.MULTILINE);

    public static void main(String[] args) {
        String testString = "abcd_04-04-2020.txt\nabcd_04-04-2020.txt.gz\nsomethibsnfkns_05-06-2020.txt\n.txt.gz";
        Matcher matcher = pattern.matcher(testString);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
NOTE: This regex only checks the shape of the date (two digits, two digits, four digits). It does not check that the date is a valid calendar date.
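If valid dates matter, one way is to capture the date with the regex and hand it to java.time for the actual check. This is a sketch under an assumption the question does not confirm: that the date is day-month-year (04-04-2020 would read the same as month-day-year). DatedFileName and isValid are made-up names:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatedFileName {
    // Same shape as the regex above, with the date captured in group 1.
    private static final Pattern FILE_NAME =
            Pattern.compile("[\\w-]+(\\d{2}-\\d{2}-\\d{4})\\.txt(?:\\.gz)?");

    // Assumption: day-month-year. STRICT rejects impossible dates such as 31-02-2020.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("dd-MM-uuuu").withResolverStyle(ResolverStyle.STRICT);

    static boolean isValid(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            return false; // wrong shape or wrong extension
        }
        try {
            LocalDate.parse(m.group(1), DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false; // right shape, but not a real date
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcd_04-04-2020.txt"));    // true
        System.out.println(isValid("abcd_04-04-2020.txt.gz")); // true
        System.out.println(isValid("abcd_31-02-2020.txt"));    // false
    }
}
```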
You can replace .[a-zA-Z]{3} with \\.txt(\\.gz)? (the dots must be escaped, and in a Java string literal each backslash is doubled):
if (fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4})\\.txt(\\.gz)?")) {
    Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
The ? quantifier gives you the OR you need. Try adding
(\.[a-zA-Z]{2})?
to your original regex (with the dots escaped, so that they only match a literal dot):
([\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\.[a-zA-Z]{3}(\.[a-zA-Z]{2})?)
A possible way of doing it:
Pattern pattern = Pattern.compile("^[\\w._-]+_\\d{2}-\\d{2}-\\d{4}(\\.txt(\\.gz)?)$");
Then you can run the following test:
String[] fileNames = {
    "abcd_04-04-2020.txt",
    "abcd_04-04-2020.tar",
    "abcd_04-04-2020.txt.gz",
    "abcd_04-04-2020.png",
    ".txt",
    ".txt.gz",
    "04-04-2020.txt"
};
Arrays.stream(fileNames)
    .filter(fileName -> pattern.matcher(fileName).find())
    .forEach(System.out::println);
// output
// abcd_04-04-2020.txt
// abcd_04-04-2020.txt.gz
I think what you want (following the direction you were going in) is this:
[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)
At the end, I use an alternation. The pattern has to match either the end of the string ($) or a literal dot followed by 2 letters and then the end of the string (\\.[a-zA-Z]{2}$). Remember to escape the ., because in a regex an unescaped . means "match any character".
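For comparison, the "OR" can be spelled either as an optional group or as an explicit alternation. The sketch below pins the extension to .txt and .txt.gz as in the question, rather than any three letters; the class and constant names are made up:

```java
import java.util.regex.Pattern;

class ExtensionAlternatives {
    // Name part and date, as in the question's regex.
    private static final String STEM = "[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}";

    // Optional group: ".txt", optionally followed by ".gz".
    static final Pattern OPTIONAL = Pattern.compile(STEM + "\\.txt(?:\\.gz)?");
    // Explicit alternation: ".txt.gz" or ".txt".
    static final Pattern ALTERNATION = Pattern.compile(STEM + "\\.(?:txt\\.gz|txt)");

    public static void main(String[] args) {
        String[] names = {
            "abcd_04-04-2020.txt", "abcd_04-04-2020.txt.gz",
            "abcd_04-04-2020.tar", "abcd_04-04-2020.txt.zip"
        };
        for (String name : names) {
            // Both patterns agree: true, true, false, false.
            System.out.println(name + " -> "
                    + OPTIONAL.matcher(name).matches() + " / "
                    + ALTERNATION.matcher(name).matches());
        }
    }
}
```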

Regex matching word that is in the middle of any character except a letter

I'd like to know how to detect a word that is surrounded by any characters except letters of the alphabet. I need this because I'm working on a custom import organizer for Java. This is what I have already tried:
The regular expression:
[^(a-zA-Z)]InitializationEvent[^(a-zA-Z)]
I'm searching for the word "InitializationEvent".
The code snippet I've been testing on:
public void load(InitializationEvent event) {
It looks like adding a space before the word helps... are the parentheses treated as part of the character class?
I tested this in my program and it didn't work. I also checked it on regexr.com, which showed the same result: class name not recognized.
Am I doing something wrong? I'm new to regex, so it might be a really basic mistake, or not. Let me know!
Lose the parentheses:
[^a-zA-Z]InitializationEvent[^a-zA-Z]
Inside [], parentheses are taken literally, and because the class is negated (^), it cannot match here: a ( precedes InitializationEvent in your string.
Note, however, that the above regex only matches if InitializationEvent is neither at the beginning nor at the end of the tested string. To allow that, you can use:
(^|[^a-zA-Z])InitializationEvent([^a-zA-Z]|$)
Or, without creating any capturing groups (which is supposed to be cleaner and to perform better):
(?:^|[^a-zA-Z])InitializationEvent(?:[^a-zA-Z]|$)
how to detect word that is between any characters except a letter from alphabet
This is a case where lookarounds come in handy. You can use:
(?<![a-zA-Z])InitializationEvent(?![a-zA-Z])
(?<![a-zA-Z]) is a negative lookbehind asserting that there is no letter at the previous position
(?![a-zA-Z]) is a negative lookahead asserting that there is no letter at the next position
The parentheses are causing the problem, just skip them:
"[^a-zA-Z]InitializationEvent[^a-zA-Z]"
or use the predefined non-word character class \W, which is slightly different because it also excludes digits and the underscore:
"\\WInitializationEvent\\W"
But since it seems you want to match a class name, this might be OK, because letters, digits and the underscore are all characters that can appear in a class name.
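One caveat: \W has to consume a character on each side, so it cannot match when the word sits at the very start or end of the input. The word-boundary assertion \b, which is not used in the answers here, tests the same letters/digits/underscore distinction without consuming anything. A small sketch of the difference (BoundaryDemo and found are made-up names):

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Needs one non-word character on each side of the name.
    static final Pattern NON_WORD = Pattern.compile("\\WInitializationEvent\\W");
    // Only asserts that the name is not glued to other word characters.
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\bInitializationEvent\\b");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String inCode = "public void load(InitializationEvent event) {";
        String alone = "InitializationEvent";
        String longer = "MyInitializationEvent2 event";

        System.out.println(found(NON_WORD, inCode));      // true
        System.out.println(found(WORD_BOUNDARY, inCode)); // true
        System.out.println(found(NON_WORD, alone));       // false: nothing to consume on either side
        System.out.println(found(WORD_BOUNDARY, alone));  // true
        System.out.println(found(NON_WORD, longer));      // false
        System.out.println(found(WORD_BOUNDARY, longer)); // false: part of a longer identifier
    }
}
```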
I'm not sure about your application, but from a regex perspective you can use negative lookaheads and negative lookbehinds to define what must not surround the string for it to count as a match.
I replaced your original [^(a-zA-Z)] with the negative lookbehind (?<![a-zA-Z]) and the negative lookahead (?![a-zA-Z]), giving: (?<![a-zA-Z])InitializationEvent(?![a-zA-Z])
A quick test I created:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {
    public static void main(String[] args) {
        String pattern = "(?<![a-zA-Z])InitializationEvent(?![a-zA-Z])";
        String sourceString = "public void load(InitializationEvent event) {";
        String sourceString2 = "public void load(BInitializationEventA event) {";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(sourceString);
        if (m.find()) {
            System.out.println("Found value of pattern in sourceString: " + m.group(0));
        } else {
            System.out.println("NO MATCH in sourceString");
        }
        Matcher m2 = r.matcher(sourceString2);
        if (m2.find()) {
            System.out.println("Found value of pattern in sourceString2: " + m2.group(0));
        } else {
            System.out.println("NO MATCH in sourceString2");
        }
    }
}
output:
sh-4.3$ java -Xmx128M -Xms16M HelloWorld
Found value of pattern in sourceString: InitializationEvent
NO MATCH in sourceString2
You seem really close:
[^(a-zA-Z)]*(InitializationEvent)[^(a-zA-Z)]*
I think this is what you are looking for. The asterisk matches zero or more of the character or group before it.
EDIT/UPDATE
My apologies for the initial response.
[^a-zA-Z]+(InitializationEvent)[^a-zA-Z]+
My regex is a little rusty, but this will match one or more non-letter characters before and after InitializationEvent.

Regex match for a character without previous character

I have the following String:
"location-string:location-string:location-C?:\string"
which I would like to split into the following three Strings:
location-string location-string location-C?:\string
What should the regex expression be when using String.split(regex)?
Basically, I want to split on colon ':' characters except those that are preceded by a '?' character!
Thanks in advance,
PM.
You could use a negative lookbehind. It matches a colon that is not preceded by ?
(?<!\?):
As a Java string literal, that is:
"(?<!\\?):"
You could use split() with a limit. This works here because the colon you want to keep is in the third and last part:
public static void main(String[] args) {
    String s = "location-string:location-string:location-C?:\\string";
    System.out.println(Arrays.toString(s.split(":", 3)));
}
Output:
[location-string, location-string, location-C?:\string]
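The two approaches give the same result for the string in the question but differ as soon as there are more parts. A short sketch to compare them; the four-part test string and the names ColonSplit and splitOnPlainColons are made up for illustration:

```java
import java.util.Arrays;

class ColonSplit {
    // Split on every ':' that is not directly preceded by '?'.
    static String[] splitOnPlainColons(String s) {
        return s.split("(?<!\\?):");
    }

    public static void main(String[] args) {
        String three = "location-string:location-string:location-C?:\\string";
        String four = "a:b:C?:\\x:d";

        // [location-string, location-string, location-C?:\string]
        System.out.println(Arrays.toString(splitOnPlainColons(three)));
        // [a, b, C?:\x, d]
        System.out.println(Arrays.toString(splitOnPlainColons(four)));
        // [a, b, C?:\x:d] because the limit stops splitting after the second colon
        System.out.println(Arrays.toString(four.split(":", 3)));
    }
}
```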
