Java toUpperCase() and RegExp problem - java

OK, I have a String that I'm parsing, and I need to call toUpperCase() on it. After that I apply a Java regular expression. The problem is that, for some reason, String.toUpperCase() seems to modify the whitespace, and my regex no longer matches.
Is there a way to tell toUpperCase() to ignore whitespace? Or is it possible to handle this in the regex?
Below is the code I'm using to figure this out. If I uncomment the toUpperCase() line, my regex stops matching!
String regExp = "([t][o][k][e][n][\\s]*[=][\\s]*)";
String content = "The token ='testing'" ;
//content = content.toUpperCase(); //uncomment this and RegExp will break!!!
Pattern pattern = Pattern.compile(regExp);
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
    int startIndex = matcher.start(1);
    int endIndex = matcher.end(1);
    String posStartExpression = content.substring(startIndex, endIndex);
    System.out.println(posStartExpression);
}

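You are seeing this behaviour because your regex is case sensitive. toUpperCase() does not touch the whitespace; it turns "token" into "TOKEN", which [t][o][k][e][n] no longer matches.
Try this:
Pattern pattern = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);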
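To make the point concrete, here is a minimal sketch that runs the same search on the original and the upper-cased string. The class name TokenMatchDemo and the helper findTokenPrefix are made up for this illustration, and the pattern is a simplified equivalent of the one in the question (the single-character classes are dropped).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenMatchDemo {
    // Case-insensitive version of the question's pattern.
    private static final Pattern TOKEN =
            Pattern.compile("(token\\s*=\\s*)", Pattern.CASE_INSENSITIVE);

    // Returns the matched "token =" prefix, or null when nothing matches.
    static String findTokenPrefix(String content) {
        Matcher matcher = TOKEN.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "The token ='testing'";
        System.out.println(findTokenPrefix(content));                           // token =
        System.out.println(findTokenPrefix(content.toUpperCase(Locale.ROOT)));  // TOKEN =
    }
}
```

The whitespace is untouched in both cases; only the letter case differs, and the flag makes the pattern indifferent to it.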

Related

Java RegEx doesn't replaceAll

I was trying to replace the concatenation symbol '+' with '||' in a given multi-line script; however, it seems that Java regex replaces just one occurrence instead of all of them.
String ss="A+B+C+D";
Matcher mm=Pattern.compile("(?imc)(.+)\\s*\\+\\s*(.+)").matcher(ss);
while (mm.find())
{
    System.out.println(mm.group(1));
    System.out.println(mm.group(2));
    ss = mm.replaceAll("$1 \\|\\| $2");
}
System.out.println(ss); // Output: A+B+C||D, Expected: A||B||C||D
The reason you only replace one element is that your pattern matches the entire line. The regular expression "(?imc)(.+)\\s*\\+\\s*(.+)" first lets (.+) consume everything to the end of the line, then backtracks just far enough for the rest, \\s*\\+..., to match. So group 1 is almost everything: all of the text up to, but not including, the last +. replaceAll can therefore match only once, and finishes after that one replacement.
What you need is a replacement that finds + optionally wrapped in spaces:
Pattern.compile("(?imc)\\s*\\+\\s*");
This should match all you want to match, and does not match the entire line, but only your replacement character.
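As a rough sketch of that suggestion (the inline flags are left out here, and the class and method names are invented for the example), the whole replacement can be a single replaceAll call:

```java
class ConcatReplaceDemo {
    // Replaces every '+' (together with any surrounding whitespace) by "||".
    // '|' is not special in a replacement string (only '$' and '\' are),
    // so it needs no escaping there.
    static String toDoublePipe(String script) {
        return script.replaceAll("\\s*\\+\\s*", "||");
    }

    public static void main(String[] args) {
        System.out.println(toDoublePipe("A+B + C+D")); // A||B||C||D
    }
}
```

One thing to keep in mind: \s also matches line breaks, so in a multi-line script a '+' at the end of a line would pull the next line up. If that matters, a class such as [ \t]* may be a better fit.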
You could just use:
ss = ss.replaceAll("\\+", "||");
as @ernest_k has pointed out. If you really want to keep using a matcher with iteration, use Matcher#appendReplacement with a StringBuffer:
String ss = "A+B+C+D";
Matcher mm = Pattern.compile("\\+").matcher(ss);
StringBuffer sb = new StringBuffer();
while (mm.find()) {
    mm.appendReplacement(sb, "||");
}
mm.appendTail(sb);
System.out.println(sb);
I think we just need a simple string replace:
Demo
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "\\+";
final String string = "A+B+C+D";
final String subst = "||";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
The right panel of the demo linked above explains your original expression. The first capturing group, (.+), is greedy: it matches between one and unlimited times, as many times as possible, so it would not work here. Changing the groups to (.+?) would have partially worked, but is still unnecessary.
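To see what the greedy and lazy groups actually capture, here is a small sketch. GreedyGroupDemo and its groups helper are hypothetical names, and the (?imc) flags are omitted because they do not affect this input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGroupDemo {
    // Returns groups 1 and 2 of the first match, or an empty array if there is none.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String input = "A+B+C+D";
        // Greedy: group 1 swallows everything up to the last '+'.
        System.out.println(Arrays.toString(groups("(.+)\\s*\\+\\s*(.+)", input)));  // [A+B+C, D]
        // Lazy: group 1 stops at the first '+', but group 2 still takes the rest.
        System.out.println(Arrays.toString(groups("(.+?)\\s*\\+\\s*(.+)", input))); // [A, B+C+D]
    }
}
```

Either way the first match covers the whole string, so there is nothing left for a second match, which is why matching only the '+' itself is the simpler fix.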

validate empty variable regex

I have a regex script which validates a field variable against some extensions (pdf, doc, jpeg, jpg, and png). But sometimes this field can be empty. I have seen in some topics that "^$" could solve my problem. I have tried a lot of combinations (because I do not know regex), but none of them work. Here is my current code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(field_Fichier1.getFileName());
return matcher.matches();
Thanks for your help
// Mine = doesn't work for empty field
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
// Anubhava = doesn't work for empty field
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))?";
// or
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))";
// Bohemian = can't be run = error: "Groovy:illegal string body character after dollar sign;"
String REGEX = "^$|([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Why do you have \$ in your regex? You can just make your whole regex optional to allow an empty-string match:
String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))?";
The ? at the end makes the whole regex optional, thus allowing it to match "" as well.
Just add ^$| to the front of your regex:
String REGEX = "^$|([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Note that I haven't checked your existing regex - I'm assuming it works for non-blank input.
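For reference, a minimal plain-Java sketch of the "empty or allowed extension" idea. Several things here are assumptions: the names UploadNameDemo and isAcceptable are invented, and the garbled [^##] class from the question is replaced by \S (any non-whitespace character), which may not be what the original pattern used. In plain Java the end anchor is written as a bare $; the \$ form and the "illegal string body character after dollar sign" error quoted above come from Groovy's double-quoted strings.

```java
import java.util.regex.Pattern;

class UploadNameDemo {
    // Either the empty string, or a name without whitespace that ends in an
    // allowed extension. (?i:...) makes only the extension case-insensitive.
    private static final Pattern ALLOWED =
            Pattern.compile("^$|\\S+\\.(?i:pdf|doc|docx|jpeg|mp3|jpg|png)$");

    static boolean isAcceptable(String fileName) {
        // matches() requires the whole input to match the pattern.
        return ALLOWED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable(""));            // true
        System.out.println(isAcceptable("report.PDF"));  // true
        System.out.println(isAcceptable("archive.zip")); // false
    }
}
```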

Regex - to accept latin/ucs2 characters

I am trying to write a regex that accepts Latin/UCS2 characters, but I am getting an error while doing that. In the following code, 'text1' should pass for the pattern. I am still working on this. Can anyone please help me fix it?
String text1 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz !\"#$%&'()*+,-./:;<=>?#"
+ "{|}~¡ ";
String pattern = "^[a-zA-Z0-9\\*\\?\\$\\[\\]\\(\\)\\|\\{\\}\\/\\'\\#\\~\\.,;\"\\<=\\>-#%&!+:~¡ ]+$";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text1);
if (m.find()) {
    System.out.println("true");
}
What is not working? Is the pattern not matching, or is there an error message?
The first thing I see is that you have escaped many characters that don't need escaping, while an important one is not escaped.
In a character class, only a few characters have a special meaning: [, ], \, -, and ^ when it is in the first position. You haven't escaped the -, so it is read as a range operator, which can cause an error. Try:
String pattern = "^[a-zA-Z0-9*?$\\[\\]()|{}/'#~.,;\"<=>\\-#%&!+:~¡ £¤¥ §¿ ÄÅÆÇÉÑÖØÜßàäåæ èéìñòöøùü ]+$";
The next thing: have a look at Unicode properties/scripts. For example, you can use \\p{L} to match a letter in any language.
String pattern = "^[\\p{L}\\p{M}0-9*?$\\[\\]()|{}/'#~.,;\"<=>\\-#%&!+:~¡ £¤¥ §¿]+$";
This would match all the letters you had in your class, and more!
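Both points can be checked with a short sketch. The class CharClassDemo, its helpers, and the exact set of punctuation in ALLOWED are illustrative choices, not the asker's final pattern; non-ASCII letters are written as \u escapes so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassDemo {
    // Letters and combining marks from any script, digits, space, and some punctuation.
    // Only '[', ']' and '-' are escaped inside the class.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{L}\\p{M}0-9 *?$\\[\\]()|{}/'#~.,;\"<=>\\-%&!+:]+");

    static boolean isAllowed(String text) {
        return ALLOWED.matcher(text).matches();
    }

    // True if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An unescaped '-' between '>' and '#' is read as a (reversed) range.
        System.out.println(compiles("[>-#]"));    // false
        System.out.println(compiles("[>\\-#]"));  // true
        System.out.println(isAllowed("\u00C4\u00F1 abc-123 {x}")); // true
        System.out.println(isAllowed("tab\there"));                // false, tab is not in the class
    }
}
```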

Regex Format a hexadecimal

I would like to know how to create a regex pattern in format of a hexadecimal.
The format should be: (0-9A-F)_16
I tried [0-9A-F]_[0-9], but I am getting errors. Also, I do not believe the first part before the underscore works for multiple digits.
Example:
FEDCBA987654321_16
[0-9A-Fa-f]+_16
should work for this (+ after a regex token means "match one or more repetitions of this token").
If you want to check whether a given string matches this pattern exactly, use
boolean foundMatch = subjectString.matches("[0-9A-Fa-f]+_16");
If you want to find the part of a longer string that matches your regex, you should add word boundaries around your regex:
String ResultString = null;
Pattern regex = Pattern.compile("\\b[0-9A-Fa-f]+_16\\b");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    ResultString = regexMatcher.group();
}
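If the longer string can contain several such values, the same pattern can be applied in a loop. This is only a sketch; HexLiteralDemo and findAll are names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexLiteralDemo {
    private static final Pattern HEX = Pattern.compile("\\b[0-9A-Fa-f]+_16\\b");

    // Collects every substring of the form <hex digits>_16.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = HEX.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "G1_16" is skipped: 'G' is not a hex digit, and there is no word
        // boundary between 'G' and '1' where a match could start instead.
        System.out.println(findAll("x = FF_16, y = 1a2b_16, z = G1_16")); // [FF_16, 1a2b_16]
    }
}
```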

Java regex validating special chars

This seems like a well known title, but I am really facing a problem in this.
Here is what I have and what I've done so far.
I have to validate an input string; these characters are not allowed:
&%$###!~
So I coded it like this:
String REGEX = "^[&%$###!~]";
String username= "jhgjhgjh.#";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(username);
if (matcher.matches()) {
    System.out.println("matched");
}
Change your first line of code like this
String REGEX = "[^&%$##!~]*";
And it should work fine. ^ outside the character class denotes start of line. ^ inside a character class [] means a negation of the characters inside the character class. And, if you don't want to match empty usernames, then use this regex
String REGEX = "[^&%$##!~]+";
I think you want this:
[^&%$###!~]*
To match a valid input:
String REGEX = "[^&%$##!~]*";
To match an invalid input:
String REGEX = ".*[&%$##!~]+.*";
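An alternative sketch of the "invalid input" check: instead of wrapping the class in .* and calling matches(), search for any forbidden character with find(). The forbidden set below is an assumption, since the list in the question is partly garbled, and the names UsernameCheckDemo and isValid are invented for the example.

```java
import java.util.regex.Pattern;

class UsernameCheckDemo {
    // Assumed forbidden characters; adjust to the real list.
    private static final Pattern FORBIDDEN = Pattern.compile("[&%$#@!~]");

    // A username is valid when no forbidden character occurs anywhere in it.
    static boolean isValid(String username) {
        // find() looks for a match anywhere; matches() would need the whole
        // string to match, which is why "^[...]" plus matches() never fit.
        return !FORBIDDEN.matcher(username).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("jhgjhgjh.#")); // false
        System.out.println(isValid("john.doe"));   // true
    }
}
```

Note that an empty username passes this check; add a length test if that should be rejected.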
