I have a regex script that validates a field for certain extensions (pdf, doc, jpeg, jpg, and png). Sometimes, though, this field can be empty. I have read in some topics that "^$" can solve my problem. I tried a lot of combinations (because I do not know regex), but none of them work. Here is my current code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(field_Fichier1.getFileName());
return matcher.matches();
Thanks for your help
// Mine = doesn't work for empty field
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
// Anubhava = doesn't work for empty field
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))?";
// or
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))";
// Bohemian = can't be run = error: "Groovy:illegal string body character after dollar sign;"
String REGEX = "^$|([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Why do you have \$ in your regex? You can just make your whole regex optional to allow for an empty-string match:
String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))?";
The ? at the end makes the whole regex optional, allowing it to match "" as well.
Just add ^$| to the front of your regex:
String REGEX = "^$|([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Note that I haven't checked your existing regex - I'm assuming it works for non-blank input.
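A minimal Java sketch of the alternation approach, assuming the check is done with Matcher.matches() as in the question. The class and method names (OptionalExtensionCheck, isAcceptable) are hypothetical, and the [^##]+ part is copied from the question unchanged. Because matches() already requires the whole input to match, the trailing $ of the original pattern is left out. The Groovy error quoted in the question most likely comes from the unescaped $ inside a double-quoted Groovy string, which Groovy reads as the start of an interpolation; there it would have to be written \$, or the string single-quoted.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not code from the thread.
final class OptionalExtensionCheck {
    // Either the empty string, or a name ending in one of the allowed
    // extensions (the inline (?i) makes only the extension case-insensitive).
    private static final Pattern ALLOWED = Pattern.compile(
            "^$|([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))");

    static boolean isAcceptable(String fileName) {
        // matches() succeeds only if the whole input matches the pattern.
        return ALLOWED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable(""));           // true
        System.out.println(isAcceptable("report.PDF")); // true
        System.out.println(isAcceptable("notes.txt"));  // false
    }
}
```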
I was trying to replace the concatenation symbol '+' with '||' in a given multi-line script; however, it seems that Java regex replaces just one occurrence instead of all of them.
String ss="A+B+C+D";
Matcher mm=Pattern.compile("(?imc)(.+)\\s*\\+\\s*(.+)").matcher(ss);
while(mm.find())
{
System.out.println(mm.group(1));
System.out.println(mm.group(2));
ss=mm.replaceAll("$1 \\|\\| $2");
}
System.out.println(ss); // Output: A+B+C||D, Expected: A||B||C||D
The reason you only replace one element is that you match the entire line. The regular expression you use, "(?imc)(.+)\\s*\\+\\s*(.+)", first matches everything up to the end of the line with (.+), then backtracks so that it can match the rest, \\s*\\+.... So your group 1 is almost everything: the whole line up to the last +. Therefore replaceAll can only match once, and finishes after that one replacement.
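To make the backtracking visible, here is a small sketch that prints what the two groups capture on the first match. The class and method names are hypothetical; the pattern is the one from the question with the flags dropped for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not code from the thread.
final class GreedyGroupDemo {
    // Returns groups 1 and 2 of the first match, or null if nothing matches.
    static String[] firstMatchGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] g = firstMatchGroups("(.+)\\s*\\+\\s*(.+)", "A+B+C+D");
        // The greedy first group swallows everything up to the last '+'.
        System.out.println(g[0] + " | " + g[1]); // A+B+C | D
    }
}
```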
What you need is a replacement that finds + optionally wrapped in spaces:
Pattern.compile("(?imc)\\s*\\+\\s*");
This should match all you want to match, and does not match the entire line, but only your replacement character.
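A short sketch of that replacement applied with String.replaceAll, assuming surrounding spaces should be dropped along with the +. The class and method names are hypothetical. Note that \\s also matches line breaks, so a + at the very start or end of a line would pull the neighbouring line in.

```java
// Hypothetical demo class, not code from the thread.
final class PlusToPipes {
    // Replaces every '+' (with any whitespace around it) by "||".
    static String toPipes(String expr) {
        return expr.replaceAll("\\s*\\+\\s*", "||");
    }

    public static void main(String[] args) {
        System.out.println(toPipes("A+B+C+D"));    // A||B||C||D
        System.out.println(toPipes("A + B+C +D")); // A||B||C||D
    }
}
```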
You could just use:
ss = ss.replaceAll("\\+", "||");
as @ernest_k has pointed out. If you really want to continue using a matcher with iteration, then use Matcher#appendReplacement with a StringBuffer:
String ss = "A+B+C+D";
Matcher mm = Pattern.compile("\\+").matcher(ss);
StringBuffer sb = new StringBuffer();
while (mm.find()) {
mm.appendReplacement(sb, "||");
}
mm.appendTail(sb);
System.out.println(sb);
I think a simple string replace is all we need here:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "\\+";
final String string = "A+B+C+D";
final String subst = "||";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
In your original expression, the first capturing group (.+) is greedy: it matches one or more characters, as many as possible, so it does not work here. Changing the groups to (.+?) would have partially worked, but they are still unnecessary.
I've been trying to split Strings using RegEx with no success. The idea is to split a given music file metadata from its file name in a way so that:
"01. Kodaline - Autopilot.mp3"
.. would result in..
metadata[0] = "01"
metadata[1] = "Kodaline"
metadata[2] = "Autopilot"
This is the RegEx I've been trying to use in its original form:
^(.*)\.(.*)\-(.*)\.(mp3|flac)
From what I've read, I need to format the RegEx for String.split(String regex) to work. So here's my formatted RegEx:
^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)
..and this is what my code looks like:
String filename = "01. Kodaline - Autopilot.mp3";
String regex = "^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)";
String[] metadata = filename.split(regex);
But I'm not receiving the result I expected. Can you help me on this?
Your regex is fine for matching the input string. Your problem is that you used split(), which expects a regex with a totally different purpose. For split(), the regex you give it matches the delimiters (separators) that separate parts of the input; they don't match the entire input. Thus, in a different situation (not your situation), you could say
String[] parts = s.split("[\\- ]");
The regex matches one character that is either a dash or a space. So this will look for dashes and spaces in your string and return the parts separated by the dashes and spaces.
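As a quick illustration of split() treating its regex as the delimiter, a sketch with a made-up input string (the class name and the sample value are assumptions):

```java
import java.util.Arrays;

// Hypothetical demo class, not code from the thread.
final class SplitDelimiterDemo {
    // Splits on every single dash or space; the regex describes the separators only.
    static String[] splitOnDashOrSpace(String s) {
        return s.split("[\\- ]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDashOrSpace("2024-01-15 report")));
        // [2024, 01, 15, report]
    }
}
```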
To use your regex to match the input string, you need something like this:
String filename = "01. Kodaline - Autopilot.mp3";
String regex = "^(.*)\\.(.*)\\-(.*)\\.(mp3|flac)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(filename);
String[] metadata = new String[4];
if (matcher.find()) {
metadata[0] = matcher.group(1); // in real life I'd use a loop
metadata[1] = matcher.group(2);
metadata[2] = matcher.group(3);
metadata[3] = matcher.group(4);
// the rest of your code
}
which sets metadata to the strings "01", " Kodaline ", " Autopilot", "mp3", which is close to what you want except maybe for extra spaces (which you can look for in your regex). Unfortunately, I don't think there's a built-in Matcher function that returns all the groups in one array.
(By the way, in your regex, you don't need the backslashes in front of -, but they're harmless, so I left them in. The - doesn't normally have a special meaning, so it doesn't need to be escaped. Inside square brackets, however, a hyphen is special, so you should use backslashes if you want to match a set of characters and a hyphen is one of those characters. That's why I used backslashes in my split example above.)
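One possible way to get rid of the extra spaces and to collect the groups with a loop is sketched below. The tightened pattern (lazy groups plus \\s* around the separators) and the TrackNameParser name are assumptions of this sketch, not part of the original question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not code from the thread.
final class TrackNameParser {
    // number . artist - title . extension, with whitespace around the
    // separators kept out of the captured groups.
    private static final Pattern TRACK =
            Pattern.compile("^(.*?)\\.\\s*(.*?)\\s*-\\s*(.*)\\.(mp3|flac)$");

    // Returns the captured groups in order, or an empty array if the name does not match.
    static String[] parse(String filename) {
        Matcher m = TRACK.matcher(filename);
        if (!m.matches()) {
            return new String[0];
        }
        String[] parts = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            parts[i - 1] = m.group(i); // group 0 is the whole match, so start at 1
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("01. Kodaline - Autopilot.mp3")));
        // [01, Kodaline, Autopilot, mp3]
    }
}
```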
This worked for me:
str.split("\\.\\s+|\\s+-\\s+|\\.(mp3|flac)");
Try something like:
import java.util.Arrays;
String filename = "01. Kodaline - Autopilot.mp3";
String fileWithoutExtension = filename.substring(0, filename.lastIndexOf('.'));
System.out.println(Arrays.toString(fileWithoutExtension.replaceAll("[^\\w\\s]", "").split("\\s+")));
Output:
[01, Kodaline, Autopilot]
Objective: for a given term, I want to check whether that term exists at the start of a word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting the following exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing the leading and trailing slashes and adding flags as (?flags), for example (?i), at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround a regex with /, so instead of "/regex/flags" we just write the regex. If you want to add flags, you can use the (?flags) syntax and place it in the regex at the position from which the flags should apply; for instance, a(?i)a will find aa and aA but not Aa, because the flag was added after the first a.
You can also compile your regex into a Pattern like this
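The effect of the flag position can be checked with a few lines; this is only a sketch, and the InlineFlagDemo class name is made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not code from the thread.
final class InlineFlagDemo {
    // Whole-string match, so the inputs below are compared in full.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // (?i) only affects what comes after it in the pattern.
        System.out.println(matches("a(?i)a", "aa")); // true
        System.out.println(matches("a(?i)a", "aA")); // true
        System.out.println(matches("a(?i)a", "Aa")); // false
        System.out.println(matches("(?i)aa", "Aa")); // true
    }
}
```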
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again, not enclosed in /) and flags is an integer built from the constants in Pattern, such as Pattern.DOTALL; when you need more than one flag, combine them, as in Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
The next thing that may confuse you is the matches method. Its name misleads most people: they assume it checks whether the string contains something the regex can match, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a mechanism to test whether the regex can be found at least once in the string. In that case you may either
add .* at the start and end of your regex so that the characters that are not part of the element you want to find can also be matched, though this way matches must iterate over the entire string, or
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which iterates until it finds a match for the regex or reaches the end of the string. I prefer this approach because it does not need to iterate over the entire string; it stops as soon as a match is found.
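The difference between the two options, and the reason the plain matches call fails, can be seen side by side in this sketch. The class and method names are hypothetical; the sentence is the one from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not code from the thread.
final class MatchesVsFind {
    // String.matches requires the regex to cover the whole string.
    static boolean wholeStringMatches(String regex, String s) {
        return s.matches(regex);
    }

    // Matcher.find looks for the regex anywhere in the string.
    static boolean foundAnywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(wholeStringMatches("(?i)\\bt", str));     // false
        System.out.println(wholeStringMatches("(?i).*\\bt.*", str)); // true
        System.out.println(foundAnywhere("(?i)\\bt", str));          // true
    }
}
```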
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
In case your term could contain regex special characters that you want the engine to treat as normal characters, you need to make sure they are escaped. To do this you can use the Pattern.quote method, which adds all the necessary escapes for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
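A small sketch of what Pattern.quote changes, using a made-up term that contains a dot (the term "a.b", the sample inputs, and the QuotedTermDemo name are assumptions for illustration):

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not code from the thread.
final class QuotedTermDemo {
    // Searches for a word starting with term; quote decides whether term is taken literally.
    static boolean startsSomeWord(String term, String text, boolean quote) {
        String body = quote ? Pattern.quote(term) : term;
        return Pattern.compile("\\b" + body, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted, the dot in "a.b" matches any character, so "axb" is found.
        System.out.println(startsSomeWord("a.b", "axb", false));         // true
        // Quoted, only a literal "a.b" is found.
        System.out.println(startsSomeWord("a.b", "axb", true));          // false
        System.out.println(startsSomeWord("a.b", "see a.b here", true)); // true
    }
}
```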
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
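Calling find repeatedly, as described above, might look like the following sketch. The WordsByPrefix class and its method are hypothetical, and Pattern.quote is added on the assumption that the term should be taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not code from the thread.
final class WordsByPrefix {
    // Collects every word that starts with term, ignoring case.
    static List<String> wordsStartingWith(String term, String text) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\w*",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {          // each call resumes after the previous match
            words.add(m.group());   // m.start() would give the offset instead
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsStartingWith("t", "This is the difficult one Thats it"));
        // [This, the, Thats]
    }
}
```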
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("^" + Pattern.quote(term) + ".*", Pattern.CASE_INSENSITIVE); // term is a literal prefix, not a character class
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}
This seems like a well known title, but I am really facing a problem in this.
Here is what I have and what I've done so far.
I have to validate an input string; these characters are not allowed:
&%$###!~
So I coded it like this:
String REGEX = "^[&%$###!~]";
String username= "jhgjhgjh.#";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(username);
if (matcher.matches()) {
System.out.println("matched");
}
Change your first line of code like this
String REGEX = "[^&%$##!~]*";
And it should work fine. ^ outside the character class denotes start of line. ^ inside a character class [] means a negation of the characters inside the character class. And, if you don't want to match empty usernames, then use this regex
String REGEX = "[^&%$##!~]+";
I think you want this:
[^&%$###!~]*
To match a valid input:
String REGEX = "[^&%$##!~]*";
To match an invalid input:
String REGEX = ".*[&%$##!~]+.*";
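Both variants can be wrapped in small helpers, sketched below. The class and method names are hypothetical, the character set is copied from the answers as-is, and the invalid check uses find() with just the character class, which should be equivalent to surrounding it with .* and calling matches().

```java
import java.util.regex.Pattern;

// Hypothetical helper, not code from the thread.
final class UsernameCheck {
    // One or more characters, none of which is in the forbidden set.
    private static final Pattern VALID = Pattern.compile("[^&%$##!~]+");
    // A single forbidden character anywhere in the input.
    private static final Pattern FORBIDDEN = Pattern.compile("[&%$##!~]");

    static boolean isValid(String username) {
        return VALID.matcher(username).matches();
    }

    static boolean hasForbiddenChar(String username) {
        return FORBIDDEN.matcher(username).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("jhgjhgjh.#"));          // false
        System.out.println(hasForbiddenChar("jhgjhgjh.#")); // true
        System.out.println(isValid("john.doe"));            // true
    }
}
```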
Ok I have a String that I'm parsing and I need to use toUpperCase() on that string. After that I'm using Java RegExp. Problem is that for some reason the Java's String toUpperCase() is modifying the white spaces and my RegExp will not work.
Is there a way to tell toUpperCase() to ignore white space? Or maybe it's possible to handle this in the RegExp?
Below is the code I'm using to figure this out. If I uncomment the toUpperCase() line below, my RegExp will not work!!
String regExp = "([t][o][k][e][n][\\s]*[=][\\s]*)";
String content = "The token ='testing'" ;
//content = content.toUpperCase(); //uncomment this and RegExp will break!!!
Pattern pattern = Pattern.compile(regExp);
Matcher matcher = pattern.matcher(content);
if(matcher.find()){
int startIndex= matcher.start(1);
int endIndex = matcher.end(1);
String posStartExpression = content.substring(startIndex,endIndex);
System.out.println(posStartExpression);
}
You are encountering this behaviour because your regex is case sensitive.
Try this:
Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);
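A sketch of the fix in context, assuming the goal is to find the "token =" prefix whatever the case of the content. The TokenFinder name is hypothetical, and the pattern is a simplified form of the one in the question (plain letters instead of one-character classes). toUpperCase() leaves white space untouched; the search fails only because the lower-case letters in the pattern no longer match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not code from the thread.
final class TokenFinder {
    // CASE_INSENSITIVE lets "token" match "TOKEN" after toUpperCase().
    private static final Pattern TOKEN =
            Pattern.compile("(token\\s*=\\s*)", Pattern.CASE_INSENSITIVE);

    // Returns the matched prefix, or null if it is not present.
    static String findToken(String content) {
        Matcher m = TOKEN.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "The token ='testing'";
        System.out.println(findToken(content));               // token =
        System.out.println(findToken(content.toUpperCase())); // TOKEN =
    }
}
```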