Regular Expression Syntax - java

I need to find two types of instances when there is a "[" character using regular expressions:
When the "[" character is followed by a number.
When the "[" character is followed by letters.
In Java I have tried:
Pattern firstinstance = Pattern.compile("\\[abcdefgABCDEFG");
Pattern secondinstance = Pattern.compile("\\[[0-9]");
These however, don't really seem to work. Do you guys have any possible suggestions?

The first instance is when the "[" character is followed by a number.
Any decimal digit in any script:
"\\[\\p{Nd}"
Any digit in 0-9 only:
"\\[\\d"
"\\[[0-9]"
The second instance is when the "[" character is followed by letters.
Any letter in any script:
"\\[\\p{L}"
Only letters in A-Z or a-z:
"\\[[A-Za-z]"

Pattern firstinstance = Pattern.compile("\\[[a-zA-Z]+");
Pattern secondinstance = Pattern.compile("\\[[0-9]+");
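A minimal sketch of how the patterns suggested above might be exercised. The class name, field names, and sample inputs are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

class BracketMatchDemo {
    // "[" followed by an ASCII digit
    static final Pattern BRACKET_DIGIT = Pattern.compile("\\[[0-9]");
    // "[" followed by an ASCII letter
    static final Pattern BRACKET_LETTER = Pattern.compile("\\[[A-Za-z]");
    // "[" followed by a letter in any script
    static final Pattern BRACKET_ANY_LETTER = Pattern.compile("\\[\\p{L}");

    // find() looks for the pattern anywhere in the input, unlike matches()
    static boolean containsMatch(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch(BRACKET_DIGIT, "a[1]"));   // prints true
        System.out.println(containsMatch(BRACKET_LETTER, "a[1]"));  // prints false
        System.out.println(containsMatch(BRACKET_LETTER, "a[b]"));  // prints true
    }
}
```

Note that find() is used here rather than matches(), since matches() would require the whole input to be just the bracket and one character.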

Pattern first = Pattern.compile("[\\[][0-9]");
Pattern second = Pattern.compile("[\\[][A-Za-z]+");
Regular expressions are very simple to understand. Have a look at Basic Concepts

In Java, you need to escape your escape characters (this is a consequence of the pattern being defined as a string literal). So you would use the code
Pattern firstinstance = Pattern.compile("\\[[0-9]");
Pattern secondinstance = Pattern.compile("\\[[a-zA-Z]");
Those strings are read as
\[[0-9]
and
\[[a-zA-Z]
which are the regular expressions you want.
Note, to get a literal backslash in the regex you need to use 4 backslashes \\\\.
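To see the double escaping in action, here is a small sketch that prints what the regex engine actually receives. The class and member names are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Java source literal; the regex engine receives \[[0-9]
    static final String BRACKET_DIGIT = "\\[[0-9]";
    // Four backslashes in source become two in the string,
    // which the regex engine reads as one literal backslash
    static final String LITERAL_BACKSLASH = "\\\\";

    static boolean isSingleBackslash(String input) {
        return Pattern.matches(LITERAL_BACKSLASH, input);
    }

    public static void main(String[] args) {
        System.out.println(BRACKET_DIGIT);            // prints \[[0-9]
        System.out.println(LITERAL_BACKSLASH);        // prints \\
        System.out.println(isSingleBackslash("\\"));  // prints true
    }
}
```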

Related

java regular expression and replace all occurrences

I want to replace one string in a big string, but my regular expression is not proper I guess. So it's not working.
Main string is
Some sql part which is to be replaced
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
String to find and replace is
Based on some condition sql part to be replaced
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this with
Pattern and Matcher class is used and
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and start with an opening slash. Java is not one of those languages, and it has nothing to do with regexps itself. So, this looks for a literal slash in that SQL, which isn't there, thus, failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters - this matches any one of them, exactly one though). That only works as long as the SQL has exactly one whitespace character in each of those places. Your intent, surely, was \\s* which matches 0 to many of them, i.e.: \\s* is: "Whitespace is allowed here". \\s is: There must be exactly one whitespace character here. Make all the \\s in your regexp an \\s*.
\\w is exactly one 'word' character (which is more or less a letter or digit), you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
* - so you want 0 to many 'and' or 'or', which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
The code itself:
replaceFirst, itself, also does regexps. You don't want to double apply this stuff. That's not how you replace a found result.
This is what you wanted:
Matcher mat = pat1.matcher(cond);
cond = mat.replaceFirst("replacement goes here");
where replacement can include references to groups in the match if you want to take parts of what you matched (i.e. don't use mat.group(), use those references).
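Putting the points above together, a corrected version could look roughly like this. It is only a sketch: the class name, the method name, the capturing groups, and the choice to wrap the condition in parentheses are assumptions for illustration, not what the asker necessarily needs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceConditionDemo {
    // Group 1 captures the condition, group 2 the trailing "and"/"or".
    // No slashes, no ^/$ anchors, an escaped dot, \s* and \w* as discussed.
    static final Pattern CONDITION = Pattern.compile(
            "(hemp\\.EMPLOYEE_NAME\\s*=\\s*'\\w*')\\s*(and|or)",
            Pattern.CASE_INSENSITIVE);

    static String wrapCondition(String sql) {
        Matcher matcher = CONDITION.matcher(sql);
        // $1 and $2 in the replacement refer to the captured groups;
        // the result must be assigned because strings are immutable
        return matcher.replaceFirst("($1) $2");
    }

    public static void main(String[] args) {
        String cond = "emp.X = 1 AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        System.out.println(wrapCondition(cond));
        // prints emp.X = 1 AND (hemp.EMPLOYEE_NAME = 'xxx') and is_active='Y'
    }
}
```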
More generally did you read any regexp tutorial, did any testing, or did any reading of the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but, reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match horizontal whitespace char, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class that matches a single one of the listed chars; use a group such as (?:and|or) instead. Also escape the dot to match it literally, or else . matches any char except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
See a regex demo or a Java demo
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'

Regular expression for unicode in java Dash version

Is it possible to improve the following with a regular expression? The code works, but I want to know whether there is a way to match any of the dash characters that exist in Unicode, so that I can standardize my dashes.
Words:
48553−FS002
48553-FS002
48553 FS002
48553-FS002-ESD12
Java
String reference = "48553−FS002";
String separator = reference.replaceFirst ( "\\w+(\\W)?\\w+", "$1" );
if(!separator.equals ( " " )) {
reference = reference.replaceAll ( separator, "-" );
}
Or I could search by Unicode code point. I was reading about the Unicode dash characters (see also: Java Regex Unicode), but I haven't managed to make it work.
If you need to match any non-word but space, you may use
reference = reference.replaceAll("[^\\w ]", "-");
Or, with character class subtraction:
reference = reference.replaceAll("[\\W&&[^ ]]", "-");
You can use the following pattern to match your hyphen or dash like patterns:
[\p{Pd}\u00AD\u2212]
Here,
\p{Pd} - matches any Punctuation, Dash symbols
\u00AD - matches a soft hyphen
\u2212 - matches a minus symbol.
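As a rough sketch, the character class above could be wrapped in a helper like this. The class and method names are hypothetical; the sample strings are the ones from the question.

```java
class DashNormalizer {
    // \p{Pd} is the Unicode "Punctuation, Dash" category; U+00AD is the
    // soft hyphen and U+2212 the minus sign, neither of which is in Pd.
    private static final String DASH_LIKE = "[\\p{Pd}\\u00AD\\u2212]";

    static String normalizeDashes(String reference) {
        // Every dash-like character becomes a plain ASCII hyphen-minus
        return reference.replaceAll(DASH_LIKE, "-");
    }

    public static void main(String[] args) {
        System.out.println(normalizeDashes("48553\u2212FS002"));   // prints 48553-FS002
        System.out.println(normalizeDashes("48553 FS002"));        // prints 48553 FS002
        System.out.println(normalizeDashes("48553-FS002-ESD12"));  // prints 48553-FS002-ESD12
    }
}
```

Unlike the [^\\w ] approach, this leaves spaces and any other non-dash punctuation untouched.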
If you know your strings only contain word characters and separators, as seems to be the case, then you can just use
reference = reference.replaceAll("[^ \\w]", "-");

How to properly replace a character in a string using java regex?

In my Java app, I have the following character sequence: b"2 (any single character, followed by a double quote, followed by a single-digit number)
I need to replace the double quote with a single quote character.
I'm trying this:
Pattern p = Pattern.compile(".\"d");
Matcher m = p.matcher(initialOutput);
String replacement = m.replaceAll(".'d");
This does not seem to do anything.
What is the right way of doing this?
First off, d represents a literal character. You're looking for \d, which represents a numeric digit.
The other issue is that you're replacing variable characters with the string literal ".'d". One solution is to capture the variable portions and reference them in the replacement:
String replacement = initialOutput.replaceAll("(.)\"(\\d)", "$1'$2");
Another approach is to use lookarounds to check the surrounding characters without actually matching them for replacement:
String replacement = initialOutput.replaceAll("(?<=.)\"(?=\\d)", "'");
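The two approaches can be compared side by side with a small sketch like the one below (class and method names are made up for the example). They agree on the input from the question but can differ when matches sit right next to each other, because the capture-group version consumes the surrounding characters.

```java
class QuoteReplaceDemo {
    // Captures the characters around the quote and puts them back
    static String withGroups(String input) {
        return input.replaceAll("(.)\"(\\d)", "$1'$2");
    }

    // Only the quote itself is matched; neighbours are checked, not consumed
    static String withLookarounds(String input) {
        return input.replaceAll("(?<=.)\"(?=\\d)", "'");
    }

    public static void main(String[] args) {
        System.out.println(withGroups("b\"2"));          // prints b'2
        System.out.println(withLookarounds("b\"2"));     // prints b'2
        System.out.println(withGroups("a\"1\"2"));       // prints a'1"2
        System.out.println(withLookarounds("a\"1\"2"));  // prints a'1'2
    }
}
```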

Java Regex for username

I'm looking for a regex in Java, java.util.regex, to accept only letters, ’, -, and ., including a range of Unicode characters such as umlauts, eszett, diacritics and other valid letters from European languages.
What I don't want is numbers, spaces like “ ” or “ Tom”, or special characters like !”£$% etc.
So far I'm finding it very confusing.
I started with this
[A-Za-z.\\s\\-\\.\\W]+$
And ended up with this:
[A-Za-z.\\s\\-\\.\\D[^!\"£$%\\^&*:;##~,/?]]+$
Using the caret to exclude everything inside the inner square brackets, according to the documentation.
Anyone have any suggestions for a new regex or reasons why the above isn't working?
For my answer, I want to use a simpler regex similar to yours: [A-Z[^!]]+, which means "At least once: (a character from A to Z) or (a character that is not '!')".
Note that "not '!'" already includes A to Z. So everything in the outer character group([A-Z...) is pointless.
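Try [\p{Alpha}'.-]+ and compile the regex with the Pattern.UNICODE_CHARACTER_CLASS flag. Keep the hyphen last in the class: written as '-. it would be read as a range from ' to . and would also let through characters such as ( ) * + and ,.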
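A minimal sketch of that suggestion, assuming the whole username must consist of allowed characters. The class and method names are hypothetical, and only the ASCII apostrophe is included; the typographic ’ from the question would have to be added to the class separately. The hyphen is placed last in the class so that it is a literal rather than part of a range.

```java
import java.util.regex.Pattern;

class UsernameValidator {
    // With UNICODE_CHARACTER_CLASS, \p{Alpha} follows the Unicode
    // Alphabetic property instead of matching only A-Z and a-z.
    private static final Pattern USERNAME =
            Pattern.compile("[\\p{Alpha}'.-]+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isValid(String name) {
        // matches() requires the entire input to fit the pattern
        return USERNAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("M\u00fcller"));    // prints true
        System.out.println(isValid("O'Neil-Smith"));   // prints true
        System.out.println(isValid(" Tom"));           // prints false
        System.out.println(isValid("Tom1"));           // prints false
    }
}
```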
Use (?=.*[##$%&\s]) with find(): it returns true when the username contains at least one special character from the set, or any whitespace.
You can add more special characters as required. For example:
String str = "k$shor";
String regex = "(?=.*[##$%&\\s])";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find()); // prints true

Java-based regular expression to allow alphanumeric chars and ', and

I'm new to regular expressions in Java and I need to validate if a string has alphanumeric chars, commas, apostrophes and full stops (periods) only. Anything else should equate to false.
Can anyone give any pointers?
I have this at the moment which I believe does alphanumerics for each char in the string:
Pattern p = Pattern.compile("^[a-zA-Z0-9_\\s]{1," + s.length() + "}");
Thanks
Mr Albany Caxton
I'm new to regular expressions in Java and I need to validate if a string has alphanumeric chars, commas, apostrophes and full stops (periods) only.
I suggest you use the \p{Alnum} class to match alpha-numeric characters:
Pattern p = Pattern.compile("[\\p{Alnum},.']*");
(I noticed that you included \s in your current pattern. If you want to allow white-space too, just add \s in the character class.)
From documentation of Pattern:
[...]
\p{Alnum} An alphanumeric character:[\p{Alpha}\p{Digit}]
[...]
You don't need to include ^ and {1, ...}. Just use methods like Matcher.matches or String.matches to match the full pattern.
Also, note that you don't need to escape . within a character class ([...]).
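A short sketch of the advice above, using Matcher.matches so that no anchors or length quantifier are needed. The class and method names are assumptions for illustration.

```java
import java.util.regex.Pattern;

class AllowedCharsValidator {
    // ASCII letters and digits plus comma, full stop and apostrophe;
    // the dot needs no escaping inside a character class
    private static final Pattern ALLOWED = Pattern.compile("[\\p{Alnum},.']*");

    static boolean isAllowed(String s) {
        // matches() succeeds only if the entire string fits the pattern
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("It's,fine."));   // prints true
        System.out.println(isAllowed("Hello World"));  // prints false
        System.out.println(isAllowed("a_b"));          // prints false
    }
}
```

Because the quantifier is *, an empty string is accepted as well; use + instead if at least one character is required.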
Pattern p = Pattern.compile("^[a-zA-Z0-9_\\s\\.,]{1," + s.length() + "}$");
Keep it simple:
String x = "some string";
boolean matches = x.matches("^[\\w.,']*$");
