Java using regex to verify an input string - java

E.g.:
String string="Marc Louie, Garduque Bautista";
I want to check if a string contains only words, a comma and spaces. I have tried to use regex and the closest I got is this:
String pattern = "[a-zA-Z]+(\\s[a-zA-Z]+)+";
but it doesn't check whether there is a comma in there or not. Any suggestions?

You need to use the pattern
^[A-Za-z, ]++$
For example
public static void main(String[] args) {
    final String input = "Marc Louie, Garduque Bautista";
    final Pattern pattern = Pattern.compile("^[A-Za-z, ]++$");
    // matches() requires the whole input to match the pattern
    if (!pattern.matcher(input).matches()) {
        throw new IllegalArgumentException("Invalid String");
    }
}
EDIT
As per Michael's astute comment the OP might mean a single comma, in which case
^[A-Za-z ]++,[A-Za-z ]++$
Ought to work.

Why not just simply:
"[a-zA-Z\\s,]+"

This should work best:
"(?i)[a-z,\\s]+"

If you mean "some words, any spaces and one single comma, wherever it occurs", then I would suggest this approach:
"^[^,]* *, *[^,]*$"
This means "Start with zero or more characters which are NOT (^) a comma, then you could find zero or more spaces, then a comma, then again zero or more spaces, then finally again zero or more characters which are NOT (^) a comma".
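To compare the patterns suggested so far, here is a small sketch that runs three of them over a few sample inputs. The class name, the constant names and the sample strings are made up for illustration; only the patterns come from the answers above.

```java
import java.util.regex.Pattern;

final class CommaNameCheck {
    // Letters, commas and spaces in any order (any number of commas, including none).
    static final Pattern ANY_COMMAS = Pattern.compile("^[A-Za-z, ]++$");
    // Letters and spaces around exactly one comma.
    static final Pattern ONE_COMMA = Pattern.compile("^[A-Za-z ]++,[A-Za-z ]++$");
    // Exactly one comma, with any non-comma characters around it.
    static final Pattern ONE_COMMA_LOOSE = Pattern.compile("^[^,]* *, *[^,]*$");

    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Marc Louie, Garduque Bautista",
            "Marc Louie Garduque",
            "A, B, C",
            "R2, D2!"
        };
        for (String s : samples) {
            System.out.printf("%-32s any=%b one=%b loose=%b%n", s,
                    matches(ANY_COMMAS, s),
                    matches(ONE_COMMA, s),
                    matches(ONE_COMMA_LOOSE, s));
        }
    }
}
```

Note that the last pattern only counts commas: it accepts digits and punctuation such as "R2, D2!", so it is not a drop-in replacement if the input must consist of letters only.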

To validate a String in Java that has no special characters at the beginning or end, but may have some special characters in between:
String strRGEX = "^[a-zA-Z0-9]+([a-zA-Z0-9-/?:.,\'+_\\s])+([a-zA-Z0-9])$";
String toBeTested= "TesADAD2-3t?S+s/fs:fds'f.324,ffs";
boolean testResult= Pattern.matches(strRGEX, toBeTested);
System.out.println("Test="+testResult);

Related

Regex pattern for ^A

I need to remove ^A from an incoming string, and I'm looking for a regex pattern for it.
I don't want to use \\p{Cntrl}, because I don't want to delete the other control characters in the string.
You should use escaping for '^A':
public static void main(String[] args) {
String value = "remove special char ^A A and B";
System.out.println(value.replaceAll("\\^A", ""));
}
Output:
remove special char A and B
I suggest you avoid using regex and instead use basic string manipulation :
String toRemove = "^A";
yourString = yourString.replace(toRemove, ""); // Strings are immutable, so keep the result
Be careful not to use String methods that work on regexes by accident, especially since the method names aren't very informative (replace replaces all occurrences of a fixed string, replaceAll replaces all occurrences that match a regex pattern).
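A short sketch of the two readings of the question. The answers above treat ^A as the two literal characters caret and A. Since the question mentions control characters, it may instead mean the single control character U+0001 (Ctrl-A, often displayed as ^A); that reading is an assumption, and the class and method names are made up for illustration.

```java
final class RemoveCtrlA {
    // Removes the two-character literal text "^A" (a caret followed by A).
    // String.replace takes plain text, so the caret needs no escaping.
    static String removeLiteralCaretA(String s) {
        return s.replace("^A", "");
    }

    // Removes only the control character U+0001 and leaves all other
    // control characters (tab, newline, ...) untouched.
    static String removeControlA(String s) {
        return s.replace("\u0001", "");
    }

    public static void main(String[] args) {
        System.out.println(removeLiteralCaretA("x^Ay"));          // xy
        System.out.println(removeControlA("a\u0001b\tc"));        // a, b, a tab, then c
    }
}
```

If a regex is required anyway, the same control character can be written as \x01 inside the pattern, for example s.replaceAll("\\x01", "").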

Removing string before open parenthesis using regular expression

I have an expression Nvl(cost,Sum(cost1)).
I need to remove the strings before the parentheses, i.e. Nvl and Sum in this case.
String functions=externalFormat.replaceAll("\\([^\\(]*\\)", "");
Input Nvl(cost,Sum(Cost1)
Output cost,cost1
If you can use capture groups, (\w*?)\( will capture the text you need to replace in the first group.
If you can't, you could use a positive look-ahead to match only word characters (letters, digits or underscores) that appear before an open bracket:
\w*(?=\()
You could even allow optional whitespace, to handle things like Nvl1 (cost,Sum (cost1)), by including it before the look-ahead:
\w*\s*(?=\()
Hope this helps solve your problem.
Check below code :
public static void main(String[] args) {
String externalFormat="Nvl(cost,Sum(Cost1)";
String functions=externalFormat.replaceAll("\\w+\\(|\\)", "");
System.out.println(functions.toLowerCase());
}
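The difference between the look-ahead approach and the last answer can be seen side by side in the sketch below. The class and method names are hypothetical; the patterns are the ones from the answers, with \w+ instead of \w* so that the pattern never matches an empty string.

```java
final class StripFunctionNames {
    // Drops each name (plus optional spaces) that precedes "(",
    // but keeps the parentheses themselves.
    static String stripNames(String expr) {
        return expr.replaceAll("\\w+\\s*(?=\\()", "");
    }

    // Drops the names together with every opening and closing parenthesis.
    static String stripNamesAndParens(String expr) {
        return expr.replaceAll("\\w+\\s*\\(|\\)", "");
    }

    public static void main(String[] args) {
        String expr = "Nvl(cost,Sum(cost1))";
        System.out.println(stripNames(expr));          // (cost,(cost1))
        System.out.println(stripNamesAndParens(expr)); // cost,cost1
    }
}
```

The look-ahead version leaves the parentheses in place, so it only produces the output asked for in the question if they are removed in a second step.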

Java regular expression for French names

I need to modify regular expression to allow all standard characters, French characters, spaces AND dash (hyphen) but only one at a time.
What I have right now is:
import java.util.regex.Pattern;
public class FrenchRegEx {
static final String NAME_PATTERN = "[\u00C0-\u017Fa-zA-Z-' ]+";
public static void main(String[] args) {
String name;
//name = "Jean Luc"; // allowed
//name = "Jean-Luc"; // allowed
//name = "Jean-Luc-Marie"; // allowed
name = "Jean--Luc"; // NOT allowed
if (!Pattern.matches(NAME_PATTERN, name)) {
System.out.println("ERROR!");
} else System.out.println("OK!");
}
}
and it allows 'Jean--Luc' as a name and that is not allowed.
Any help with this?
Thanks.
So, you want a pattern with zero or more hyphens, each separated from the next by one or more other characters. It's just a matter of writing the pattern that way:
"[\u00C0-\u017Fa-zA-Z']+([- ][\u00C0-\u017Fa-zA-Z']+)*"
This also assumes that you don't want names to start or end with a hyphen or space, that you don't want more than one space in a row, and that a space may not follow or precede a hyphen.
You need to disallow consecutive hyphens. You may do it with a negative lookahead, (?!.*--), placed at the start of the pattern:
static final String NAME_PATTERN = "(?!.*--)[\u00C0-\u017Fa-zA-Z-' ]+";
To disallow any of the special chars to be consecutive, use
static final String NAME_PATTERN = "(?!.*([-' ])\\1)[\u00C0-\u017Fa-zA-Z-' ]+";
Another way is to unroll the pattern a bit to match strings where the special char(s) can appear in between letters, but cannot appear consecutively (i.e. if you need to match Abc-def'here like strings):
static final String NAME_PATTERN = "[\u00C0-\u017Fa-zA-Z]+(?:[-' ][\u00C0-\u017Fa-zA-Z]+)*";
or to only allow 1 special char that can only appear in between letters (i.e. if you need to only allow strings like abc-def, or abc'def):
static final String NAME_PATTERN = "[\u00C0-\u017Fa-zA-Z]+(?:[-' ][\u00C0-\u017Fa-zA-Z]+)?";
Note that you do not need anchors here because you are using the pattern inside a .matches() method that requires a full string match.
NOTE: you may further tune the patterns by moving special chars that may appear anywhere in the string from the [-' ] character class to the [\u00C0-\u017Fa-zA-Z] character classes, for example [\u00C0-\u017Fa-zA-Z'], but watch out for -. It should be placed at the end, right before ].
Try using ([\u00C0-\u017Fa-zA-Z']+[- ]?)+. This would match one or more names separated by exactly one dash or space. Note that it also accepts a single trailing dash or space.
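A quick way to compare the stricter "unrolled" pattern with the lookahead pattern is to run both over the names from the question. This is only a sketch: the class name, method names and the extra sample names are made up, while the two patterns are taken from the answers above.

```java
import java.util.regex.Pattern;

final class FrenchNameCheck {
    private static final String LETTERS = "[\u00C0-\u017Fa-zA-Z']+";

    // Runs of letters separated by exactly one hyphen or one space.
    static final Pattern UNROLLED =
            Pattern.compile(LETTERS + "(?:[- ]" + LETTERS + ")*");

    // Any mix of the allowed characters, as long as "--" never occurs.
    static final Pattern NO_DOUBLE_HYPHEN =
            Pattern.compile("(?!.*--)[\u00C0-\u017Fa-zA-Z-' ]+");

    static boolean isValid(String name) {
        return UNROLLED.matcher(name).matches();
    }

    static boolean hasNoDoubleHyphen(String name) {
        return NO_DOUBLE_HYPHEN.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"Jean Luc", "Jean-Luc", "Jean-Luc-Marie", "Jean--Luc", "Jean-"};
        for (String name : names) {
            System.out.printf("%-16s unrolled=%b lookahead=%b%n",
                    name, isValid(name), hasNoDoubleHyphen(name));
        }
    }
}
```

The two patterns disagree on inputs such as "Jean-": the lookahead version only forbids "--", whereas the unrolled version also rejects a leading or trailing separator.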

Java split regex non-greedy match not working

Why is non-greedy match not working for me? Take following example:
public String nonGreedy(){
String str2 = "abc|s:0:\"gef\";s:2:\"ced\"";
return str2.split(":.*?ced")[0];
}
In my eyes the result should be: abc|s:0:\"gef\";s:2 but it is: abc|s
The .*? in your regex matches any character except \n (0 or more times, matching the least amount possible).
You can try the regular expression:
:[^:]*?ced
On another note, you should use a constant Pattern to avoid recompiling the expression every time, something like:
private static final Pattern REGEX_PATTERN =
Pattern.compile(":[^:]*?ced");
public static void main(String[] args) {
String input = "abc|s:0:\"gef\";s:2:\"ced\"";
System.out.println(java.util.Arrays.toString(
REGEX_PATTERN.split(input)
)); // prints "[abc|s:0:"gef";s:2, "]"
}
It is behaving as expected. The non-greedy match will match as little as it has to, and with your input, the minimum characters to match is the first colon to the next ced.
You could try limiting the number of characters consumed. For example, to limit the term to "up to 3 characters":
:.{0,3}ced
To make it split as close to ced as possible, use a negative look-ahead, with this regex:
:(?!.*:.*ced).*ced
This makes sure there isn't a closer colon to ced.
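The behaviour of the four regexes discussed in this thread can be checked with a few lines of code. The class and method names below are hypothetical; the input string and the patterns are the ones from the question and answers.

```java
final class SplitBeforeCed {
    static final String INPUT = "abc|s:0:\"gef\";s:2:\"ced\"";

    // Returns the part of INPUT that precedes the first match of the regex.
    static String firstPart(String regex) {
        return INPUT.split(regex)[0];
    }

    public static void main(String[] args) {
        // The match starts at the leftmost colon; laziness only limits how far it extends.
        System.out.println(firstPart(":.*?ced"));            // abc|s
        // The match may not cross another colon.
        System.out.println(firstPart(":[^:]*?ced"));         // abc|s:0:"gef";s:2
        // At most three characters between the colon and "ced".
        System.out.println(firstPart(":.{0,3}ced"));         // abc|s:0:"gef";s:2
        // No later colon may be followed by "ced".
        System.out.println(firstPart(":(?!.*:.*ced).*ced")); // abc|s:0:"gef";s:2
    }
}
```

The first line shows why the original attempt fails: the regex engine always tries the leftmost starting position first, and a lazy quantifier only affects where the match ends.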

How to remove special characters from a string?

I want to remove special characters like:
- + ^ . : ,
from a String using Java.
That depends on what you define as special characters, but try replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".
Another note: the - character needs to be the first or last one in the list, otherwise you'd have to escape it or it would define a range (e.g. :-, would mean "all characters in the range : to ,").
So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape backslashes: "[\\p{P}\\p{S}]").
A third way could be something like this, if you can exactly define what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.
Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.
Here's less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which almost matches everything, since letters are not whitespace and vice versa.
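The approaches from this answer behave quite differently on the same input, which the following sketch tries to make visible. The class name, method names and the sample string are invented for illustration; the character classes are the ones discussed above, with the punctuation and symbol properties wrapped in a character class.

```java
final class RemoveSpecialChars {
    // Removes only the explicitly listed characters.
    static String removeListed(String s) {
        return s.replaceAll("[-+.^:,]", "");
    }

    // Keeps ASCII word characters (a-z, A-Z, 0-9, _) and whitespace.
    static String keepWordAndSpace(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    // Keeps letters of any language and separators such as spaces.
    static String keepLettersAndSeparators(String s) {
        return s.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Removes all Unicode punctuation and symbol characters.
    static String removePunctAndSymbols(String s) {
        return s.replaceAll("[\\p{P}\\p{S}]", "");
    }

    public static void main(String[] args) {
        String input = "caf\u00E9+th\u00E9:2.50, ok!";
        System.out.println(removeListed(input));
        System.out.println(keepWordAndSpace(input));
        System.out.println(keepLettersAndSeparators(input));
        System.out.println(removePunctAndSymbols(input));
    }
}
```

By default \w covers only ASCII, so the second method also strips accented letters such as é, while the \p{L} version keeps them but drops the digits.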
Additional information on Unicode
Some unicode characters seem to cause problems due to different possible ways to encode them (as a single code point or a combination of code points). Please refer to regular-expressions.info for more information.
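One way to reduce the encoding problem described above is to normalize the input to a composed form before filtering. This is a sketch under that assumption, using java.text.Normalizer from the standard library; the class and method names are hypothetical.

```java
import java.text.Normalizer;

final class NormalizeBeforeFilter {
    // Composes base letter + combining mark into a single code point (NFC)
    // before removing everything that is not a letter or separator.
    static String lettersOnly(String s) {
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        return composed.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 (combining acute accent), i.e. a decomposed e-acute.
        String decomposed = "cafe\u0301!";
        // Without normalization the combining mark is not a letter and gets removed.
        System.out.println(decomposed.replaceAll("[^\\p{L}\\p{Z}]", ""));
        // With normalization the accent survives as part of a single letter.
        System.out.println(lettersOnly(decomposed));
    }
}
```

Normalization only helps for sequences that have a composed equivalent; other combining sequences would still lose their marks with this filter.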
This will replace all the characters except alphanumeric
replaceAll("[^A-Za-z0-9]","");
As described here
http://developer.android.com/reference/java/util/regex/Pattern.html
Patterns are compiled regular expressions. In many cases, convenience methods such as String.matches, String.replaceAll and String.split will be preferable, but if you need to do a lot of work with the same regular expression, it may be more efficient to compile it once and reuse it. The Pattern class and its companion, Matcher, also offer more functionality than the small amount exposed by String.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpressionTest {
    // Compiled once and reused, as the documentation quoted above suggests
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9]");
    private static final Pattern NON_LETTERS_OR_SPACES = Pattern.compile("[^a-z A-Z]");

    public static void main(String[] args) {
        System.out.println("String is = " + getOnlyStrings("!&(*^*(^(+one(&(^()(*)(*&^%$##!#$%^&*()("));
        System.out.println("Number is = " + getOnlyDigits("&(*^*(^(+91-&*9hi-639-0097(&(^("));
    }

    // Removes every character that is not a digit
    public static String getOnlyDigits(String s) {
        Matcher matcher = NON_DIGITS.matcher(s);
        return matcher.replaceAll("");
    }

    // Removes every character that is not an ASCII letter or a space
    public static String getOnlyStrings(String s) {
        Matcher matcher = NON_LETTERS_OR_SPACES.matcher(s);
        return matcher.replaceAll("");
    }
}
Result
String is = one
Number is = 9196390097
Try replaceAll() method of the String class.
BTW here is the method, return type and parameters.
public String replaceAll(String regex,
String replacement)
Example:
String str = "Hello +-^ my + - friends ^ ^^-- ^^^ +!";
str = str.replaceAll("[-+^]+", "");
It should remove all the {'^', '+', '-'} chars that you wanted to remove!
To remove special characters:
String t2 = "!##$%^&*()-';,./?><+abdd";
t2 = t2.replaceAll("\\W+","");
Output will be : abdd.
This works perfectly.
Use the String.replaceAll() method in Java.
replaceAll should be good enough for your problem.
You can remove a single character as follows:
String str="+919595354336";
String result = str.replaceAll("\\+","");
System.out.println(result);
OUTPUT:
919595354336
If you just want to do a literal replace in java, use Pattern.quote(string) to escape any string to a literal.
myString.replaceAll(Pattern.quote(matchingStr), replacementStr)
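As a minimal sketch of the literal-replace idea: Pattern.quote protects the search text, and Matcher.quoteReplacement does the same for the replacement, where $ and \ otherwise have a special meaning. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplace {
    // Replaces every occurrence of target with replacement, treating both as plain text.
    static String replaceLiteral(String input, String target, String replacement) {
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("1+1=2", "+", "-"));    // 1-1=2
        System.out.println(replaceLiteral("cost: X", "X", "$5")); // cost: $5
    }
}
```

For plain text on both sides, String.replace(target, replacement) gives the same result without involving regex syntax at all.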
