Java-based regular expression to allow alphanumeric chars and ', and - java

I'm new to regular expressions in Java and I need to validate if a string has alphanumeric chars, commas, apostrophes and full stops (periods) only. Anything else should equate to false.
Can anyone give any pointers?
I have this at the moment which I believe does alphanumerics for each char in the string:
Pattern p = Pattern.compile("^[a-zA-Z0-9_\\s]{1," + s.length() + "}");
Thanks
Mr Albany Caxton

I'm new to regular expressions in Java and I need to validate if a string has alphanumeric chars, commas, apostrophes and full stops (periods) only.
I suggest you use the \p{Alnum} class to match alpha-numeric characters:
Pattern p = Pattern.compile("[\\p{Alnum},.']*");
(I noticed that you included \s in your current pattern. If you want to allow white-space too, just add \s in the character class.)
From documentation of Pattern:
[...]
\p{Alnum} An alphanumeric character:[\p{Alpha}\p{Digit}]
[...]
You don't need to include ^ and {1, ...}. Just use methods like Matcher.matches or String.matches to match the full pattern.
Also, note that you don't need to escape . within a character class ([...]).
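As a minimal sketch of the advice above (the class and method names are illustrative and not from the original post), the pattern can be compiled once and applied with Matcher.matches, which succeeds only when the whole input fits the pattern:

```java
import java.util.regex.Pattern;

class AllowedCharsSketch {
    // Letters, digits, comma, full stop and apostrophe; matches() makes anchors unnecessary.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{Alnum},.']*");

    static boolean isValid(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("O'Neil,Jr."));    // true
        System.out.println(isValid("Albany Caxton")); // false: a space is not in the class
        System.out.println(isValid("a_b"));           // false: underscore is not alphanumeric
    }
}
```

Because of the * quantifier, the empty string is accepted as well; use + instead if at least one character is required.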

Pattern p = Pattern.compile("^[a-zA-Z0-9_\\s\\.,']{1," + s.length() + "}$");

Keep it simple:
String x = "some string";
boolean matches = x.matches("^[\\w.,']*$");

Related

java regular expression and replace all occurrences

I want to replace one string in a big string, but my regular expression is not proper I guess. So it's not working.
Main string is
Some sql part which is to be replaced
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
String to find and replace is
Based on some condition sql part to be replaced
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this with
Pattern and Matcher class is used and
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and start with an opening slash. Java is not one of those languages, and it has nothing to do with regexps itself. So, this looks for a literal slash in that SQL, which isn't there, thus, failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters - this matches any one of them, exactly one though). That is brittle: the match fails as soon as the SQL has zero or several spaces at that position. Your intent, surely, was \\s* which matches 0 to many of them, i.e.: \\s* is: "Whitespace is allowed here". \\s is: There must be exactly one whitespace character here. Make all the \\s in your regexp an \\s*.
\\w is exactly one 'word' character (which is more or less a letter or digit), you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
* - so you want 0 to many 'and' or 'or', which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
The code itself:
replaceFirst, itself, also does regexps. You don't want to double apply this stuff. That's not how you replace a found result.
This is what you wanted:
Matcher mat = pat1.matcher(cond);
cond = mat.replaceFirst("replacement goes here");
where replacement can include references to groups in the match if you want to take parts of what you matched (i.e. don't use mat.group(), use those references).
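Putting those corrections together, a hedged sketch could look like the following. The class and method names are made up for illustration, the trailing and/or part is left out for brevity, and the replacement "xx $0x" simply mirrors what the question tried to build with mat.group(); $0 refers to the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceFirstSketch {
    // No slashes or anchors; \\s* tolerates any amount of whitespace around '='.
    private static final Pattern NAME_COND = Pattern.compile(
            "hemp\\.EMPLOYEE_NAME\\s*=\\s*'\\w*'", Pattern.CASE_INSENSITIVE);

    static String rewrite(String cond) {
        Matcher mat = NAME_COND.matcher(cond);
        // replaceFirst returns a new string; $0 in the replacement is the matched text.
        return mat.replaceFirst("xx $0x");
    }

    public static void main(String[] args) {
        String cond = "emp.EMAIL_ID = 'xx#xx.com' AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        System.out.println(rewrite(cond));
    }
}
```

No loop is needed: a single replaceFirst call handles the first occurrence, and replaceAll would handle every occurrence.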
More generally did you read any regexp tutorial, did any testing, or did any reading of the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but, reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match horizontal whitespace char, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class that matches a single one of the listed characters; use a group such as (?:and|or) for alternation. Also escape the dot to match it literally, since an unescaped . matches any char except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'

Regular expression for unicode in java Dash version

Is it possible to improve the following with a regular expression? The code works, but I want to know whether there is a way to match every dash-like character that exists in Unicode, so that I can normalize them all to a plain hyphen.
Words:
48553−FS002
48553-FS002
48553 FS002
48553-FS002-ESD12
Java
String reference = "48553−FS002";
String separator = reference.replaceFirst("\\w+(\\W)?\\w+", "$1");
if (!separator.equals(" ")) {
reference = reference.replaceAll(separator, "-");
}
Alternatively, I could match on the Unicode code points. I was reading about the dash characters and Java's Unicode regex support, but I haven't managed to make it work.
If you need to match any non-word but space, you may use
reference = reference.replaceAll("[^\\w ]", "-");
Or, with character class subtraction:
reference = reference.replaceAll("[\\W&&[^ ]]", "-");
You can use the following pattern to match your hyphen or dash like patterns:
[\p{Pd}\u00AD\u2212]
Here,
\p{Pd} - matches any Punctuation, Dash symbols
\u00AD - matches a soft hyphen
\u2212 - matches a minus symbol.
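A small sketch of how that character class might be used to normalize references (the class and method names are illustrative only). Inside a Java string literal the backslashes are doubled so that the regex engine, not the compiler, interprets \p{Pd} and the \uXXXX escapes:

```java
class DashNormalizerSketch {
    // Replaces any dash punctuation, soft hyphen or minus sign with an ASCII hyphen.
    static String normalizeDashes(String reference) {
        return reference.replaceAll("[\\p{Pd}\\u00AD\\u2212]", "-");
    }

    public static void main(String[] args) {
        System.out.println(normalizeDashes("48553\u2212FS002"));  // 48553-FS002
        System.out.println(normalizeDashes("48553-FS002-ESD12")); // unchanged
        System.out.println(normalizeDashes("48553 FS002"));       // unchanged: space is kept
    }
}
```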
If you know your strings only contain word characters and separators, as seems to be the case, then you can just use
reference = reference.replaceAll("[^ \\w]", "-");

Java String Split using Regex with Escape Character

I have a string which needs to be split based on a delimiter(:). This delimiter can be escaped by a character (say '?'). Basically the delimiter can be preceded by any number of escape character. Consider below example string:
a:b?:c??:d???????:e
Here, after the split, it should give the below list of string:
a
b?:c??
d???????:e
Basically, if the delimiter (:) is preceded by even number of escape characters, it should split. If it is preceded by odd number of escape characters, it should not split. Is there a solution to this with regex?
Any help would be greatly appreciated.
Similar question has been asked earlier here, But the answers are not working for this use case.
Update:
The regex (?:\?.|[^:?])* splits the string correctly, but it also produces a few empty strings. If + is used instead of *, the genuinely empty fields are dropped as well (e.g. a::b gives only a and b).
Scenario 1: No empty matches
You may use
(?:\?.|[^:?])+
Or, following the pattern in the linked answer
(?:\?.|[^:?]++)+
Details
(?: - start of a non-capturing group
\?. - a ? (the escape char) followed with any char
| - or
[^:?] - any char but the : (your delimiter char) and ? (the escape char)
)+ - 1 or more repetitions.
In Java:
String regex = "(?:\\?.|[^:?]++)+";
In case the input contains line breaks, prepend the pattern with (?s) (like (?s)(?:\\?.|[^:?])+) or compile the pattern with Pattern.DOTALL flag.
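For Scenario 1, the tokens can be collected by matching rather than by calling String.split. The following is only a sketch under that assumption; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSplitSketch {
    // A token is a run of escaped pairs ("?" plus any char) or chars other than ':' and '?'.
    private static final Pattern TOKEN = Pattern.compile("(?s)(?:\\?.|[^:?]++)+");

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a:b?:c??:d???????:e")); // [a, b?:c??, d???????:e]
    }
}
```

Since the pattern requires at least one character per token, empty fields such as the one in a::b are skipped.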
Scenario 2: Empty matches included
You may add (?<=:)(?=:) alternative to the above pattern to match empty strings between : chars, see this regex demo:
String s = "::a:b?:c??::d???????:e::";
Pattern pattern = Pattern.compile("(?>\\?.|[^:?])+|(?<=:)(?=:)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println("'" + matcher.group() + "'");
}
Output of the Java demo:
''
'a'
'b?:c??'
''
'd???????:e'
''
Note that if you want to also match empty strings at the start/end of the string, use (?<![^:])(?![^:]) rather than (?<=:)(?=:).

Regular Expression Syntax

I need to find two types of instances when there is a "[" character using regular expressions:
When the "[" character is followed by a number.
When the "[" character is followed by letters.
In Java I have tried:
Pattern firstinstance = Pattern.compile("\\[abcdefgABCDEFG");
Pattern secondinstance = Pattern.compile("\\[[0-9]");
These however, don't really seem to work. Do you guys have any possible suggestions?
The first instance is when the "[" character is followed by a number.
Any decimal digit in any script:
"\\[\\p{Nd}"
Any digit in 0-9 only:
"\\[\\d"
"\\[[0-9]"
The second instance is when the "[" character is followed by letters.
Any letter in any script:
"\\[\\p{L}"
Only letters in A-Z or a-z:
"\\[[A-Za-z]"
Pattern firstinstance = Pattern.compile("\\[[a-zA-Z]+");
Pattern secondinstance = Pattern.compile("\\[[0-9]+");
Pattern first = Pattern.compile("[\\[][0-9]");
Pattern second = Pattern.compile("[\\[][A-Za-z]+");
Regular expressions are very simple to understand. Have a look at Basic Concepts
In Java, you need to escape your escape characters (a consequence of the pattern being defined as a string literal). So you would use the code
Pattern firstinstance = Pattern.compile("\\[[0-9]");
Pattern secondinstance = Pattern.compile("\\[[a-zA-Z]");
Those strings are read as
\[[0-9]
and
\[[a-zA-Z]
which are the regular expression you want.
Note, to get a literal backslash in the regex you need to use 4 backslashes \\\\.
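A short sketch showing these patterns in use with Matcher.find, which looks for the pattern anywhere in the input (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

class BracketPatternsSketch {
    private static final Pattern BRACKET_DIGIT = Pattern.compile("\\[[0-9]");
    private static final Pattern BRACKET_LETTER = Pattern.compile("\\[[a-zA-Z]");
    // Four backslashes in source become the regex \\ which matches one literal backslash.
    private static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static boolean hasBracketDigit(String s) { return BRACKET_DIGIT.matcher(s).find(); }
    static boolean hasBracketLetter(String s) { return BRACKET_LETTER.matcher(s).find(); }
    static boolean hasBackslash(String s) { return BACKSLASH.matcher(s).find(); }

    public static void main(String[] args) {
        System.out.println(hasBracketDigit("values[3]"));  // true
        System.out.println(hasBracketLetter("values[i]")); // true
        System.out.println(hasBracketDigit("values[i]"));  // false
        System.out.println(hasBackslash("C:\\temp"));      // true
    }
}
```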

In a java regex, how can I get a character class e.g. [a-z] to match a - minus sign?

Pattern pattern = Pattern.compile("^[a-z]+$");
String string = "abc-def";
assertTrue( pattern.matcher(string).matches() ); // obviously fails
Is it possible to have the character class match a "-" ?
Don't put the minus sign between characters.
"[a-z-]"
Escape the minus sign
[a-z\\-]
Inside a character class [...] a - is treated specially (as a range operator) if it's surrounded by characters on both sides. That means if you include the - at the beginning or at the end of the character class it will be treated literally (non-special).
So you can use the regex:
^[a-z-]+$
or
^[-a-z]+$
Since the - that we added is being treated literally there is no need to escape it. Although it's not an error if you do it.
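The three spellings can be compared side by side. This is just a sketch with an illustrative helper name; String.matches already requires the whole input to match, so the anchors are omitted:

```java
class HyphenInClassSketch {
    // Returns true when the entire input matches the given regex.
    static boolean check(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String s = "abc-def";
        System.out.println(check("[a-z-]+", s));   // true: hyphen last in the class
        System.out.println(check("[-a-z]+", s));   // true: hyphen first in the class
        System.out.println(check("[a-z\\-]+", s)); // true: hyphen escaped
        System.out.println(check("[a-z]+", s));    // false: hyphen not allowed
    }
}
```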
Another (less recommended) way is to not include the - in the character class:
^(?:[a-z]|-)+$
Note that the parentheses are not optional in this case, as | has very low precedence. Without the parentheses:
^[a-z]|-+$
matches either a single lowercase letter at the beginning of the string, or one or more - at the end.
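The difference is easiest to see with Matcher.find, which accepts a match anywhere in the input. A brief sketch, with illustrative class and method names:

```java
import java.util.regex.Pattern;

class AlternationPrecedenceSketch {
    private static final Pattern GROUPED = Pattern.compile("^(?:[a-z]|-)+$");
    // Parsed as (^[a-z]) | (-+$): each anchor belongs to only one alternative.
    private static final Pattern UNGROUPED = Pattern.compile("^[a-z]|-+$");

    static boolean groupedFinds(String s) { return GROUPED.matcher(s).find(); }
    static boolean ungroupedFinds(String s) { return UNGROUPED.matcher(s).find(); }

    public static void main(String[] args) {
        System.out.println(groupedFinds("abc-def")); // true
        System.out.println(groupedFinds("a!!!"));    // false: '!' is not allowed
        System.out.println(ungroupedFinds("a!!!"));  // true: only the leading 'a' is checked
        System.out.println(ungroupedFinds("!!!--")); // true: only the trailing hyphens are checked
    }
}
```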
I'd rephrase the "don't put it between characters" a little more concretely.
Make the dash the first or last character in the character class. For example "[-a-z0-9]" matches lower-case letters, digits or a dash.
This works for me
Pattern p = Pattern.compile("^[a-z\\-]+$");
String line = "abc-def";
Matcher matcher = p.matcher(line);
System.out.println(matcher.matches()); // true
