Check String whether it contains only Latin characters? - java

Greetings,
I am developing GWT application where user can enter his details in Japanese.
But the 'userid' and 'password' should only contain English characters(Latin Alphabet).
How to validate Strings for this?

You can use String#matches() with a bit of regex for this. The unaccented Latin letters are covered by \w.
So this should do:
boolean valid = input.matches("\\w+");
By the way, this also matches digits and the underscore (_); I'm not sure whether that is a problem. If it is, just use [A-Za-z]+ instead.
If you want to cover diacritical characters as well (ä, é, ò, and so on, those are per definition also Latin characters), then you need to normalize them first and get rid of the diacritical marks before matching, simply because there's no (documented) regex which covers diacriticals.
String clean = Normalizer.normalize(input, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = clean.matches("\\w+");
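Update: Java regex also has a class that covers letters with diacriticals directly, \p{L}. Be aware that it matches letters of any script, Japanese included, so on its own it does not restrict input to the Latin alphabet.
boolean valid = input.matches("\\p{L}+");
The above works on Java 1.6.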
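The variants discussed above can be compared side by side. This is only a sketch: the class and method names are made up for illustration, and the \p{IsLatin} script class is an extra option that, as far as I know, needs Java 7 or later.

```java
import java.text.Normalizer;

final class LatinRegexDemo {
    // Strictest reading of "Latin alphabet": unaccented ASCII letters only.
    static boolean isAsciiLetters(String input) {
        return input.matches("[A-Za-z]+");
    }

    // Decompose accented letters (NFD), drop the combining marks, then check ASCII letters.
    static boolean isLatinIgnoringDiacritics(String input) {
        String clean = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return clean.matches("[A-Za-z]+");
    }

    // \p{L} accepts a letter from any script, so this is NOT a Latin-only check.
    static boolean isAnyLetters(String input) {
        return input.matches("\\p{L}+");
    }

    // Script-based check: letters (accented or not) that belong to the Latin script.
    static boolean isLatinScript(String input) {
        return input.matches("\\p{IsLatin}+");
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9";            // "cafe" with an acute accent on the e
        String japanese = "\u65e5\u672c\u8a9e";   // the word "nihongo" in kanji
        System.out.println(isAsciiLetters("userid"));
        System.out.println(isAsciiLetters(accented));
        System.out.println(isLatinIgnoringDiacritics(accented));
        System.out.println(isAnyLetters(japanese));
        System.out.println(isLatinScript(accented));
        System.out.println(isLatinScript(japanese));
    }
}
```

For a userid or password field, the first method is probably the one that matches the requirement; the others matter only if accented letters should be accepted.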

public static boolean isValidISOLatin1 (String s) {
return Charset.forName("US-ASCII").newEncoder().canEncode(s);
} // or "ISO-8859-1" for ISO Latin 1
For reference, see the documentation on Charset.
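A small sketch of how the encoder check behaves for the two charsets mentioned above. The class name and the helper signature are assumptions for illustration; a new encoder is created per call because CharsetEncoder instances are not safe to share between threads.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncoderCheck {
    // True if every character of s can be represented in the given charset.
    static boolean canEncode(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9";            // e with acute accent, part of ISO-8859-1
        String japanese = "\u65e5\u672c\u8a9e";   // kanji, in neither charset
        System.out.println(canEncode("user_01", StandardCharsets.US_ASCII));
        System.out.println(canEncode(accented, StandardCharsets.US_ASCII));
        System.out.println(canEncode(accented, StandardCharsets.ISO_8859_1));
        System.out.println(canEncode(japanese, StandardCharsets.ISO_8859_1));
    }
}
```

Note that this accepts digits, punctuation and control characters too, since they are all encodable; it checks the character repertoire, not that the input consists of letters.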

Here is my solution; it works well for me:
public static boolean isStringContainsLatinCharactersOnly(final String iStringToCheck)
{
return iStringToCheck.matches("^[a-zA-Z0-9.]+$");
}

There might be a better approach, but you could load a collection with whatever you deem to be acceptable characters, and then check each character in the username/password field against that collection.
Pseudo:
foreach (character in username)
{
if !allowedCharacters.contains(character)
{
throw exception
}
}
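In Java, the pseudocode above might look roughly like this. The allowed set (ASCII letters and digits) and all names are assumptions chosen for the example; adjust the set to whatever you deem acceptable.

```java
import java.util.HashSet;
import java.util.Set;

final class AllowedCharacters {
    // Hypothetical whitelist: ASCII letters and digits.
    private static final Set<Character> ALLOWED = new HashSet<>();
    static {
        for (char c = 'a'; c <= 'z'; c++) ALLOWED.add(c);
        for (char c = 'A'; c <= 'Z'; c++) ALLOWED.add(c);
        for (char c = '0'; c <= '9'; c++) ALLOWED.add(c);
    }

    // Throws if any character of the value is outside the whitelist.
    static void validate(String value) {
        for (char c : value.toCharArray()) {
            if (!ALLOWED.contains(c)) {
                throw new IllegalArgumentException("Character not allowed: " + c);
            }
        }
    }

    public static void main(String[] args) {
        validate("user01"); // passes silently
        try {
            validate("\u592a\u90ce"); // a Japanese given name in kanji
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```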

For something this simple, I'd use a regular expression.
private static final Pattern p = Pattern.compile("\\p{Alpha}+");
static boolean isValid(String input) {
Matcher m = p.matcher(input);
return m.matches();
}
There are other pre-defined classes like \w that might work better.

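I successfully used a combination of the answers of user232624, Joachim Sauer and Tvaroh:
static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder(); // or "ISO-8859-1" for ISO Latin 1
boolean isValid(String input) {
    // Every character must be a letter that the encoder can represent.
    for (char ch : input.toCharArray()) {
        if (!Character.isLetter(ch) || !asciiEncoder.canEncode(ch)) return false;
    }
    return !input.isEmpty();
}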

Related

Pattern matching for Japanese string have issues in java

I have a strange issue when pattern matching Japanese characters in Java.
Let me explain with code.
private static final Pattern ADDRESS_STRING_PATTERN =
Pattern.compile("^[\\p{L}\\d\\s\\p{Punct}]{1,200}$");
private static boolean isValidInput(final String input, Pattern pattern) {
return pattern.matcher(input).matches();
}
System.out.println(isValidInput("こんにちは、元気ですか", ADDRESS_STRING_PATTERN));
Here I am matching any letter, space, digit or punctuation character, 1 to 200 times.
This always returns false. After some debugging I found that the issue is with one character, "、". If I add that character to the regular expression, it works fine.
Has anyone come across this issue? Or is this a bug in Java?
The thing is that 、 (U+3001 IDEOGRAPHIC COMMA) belongs to "Punctuation, other" Unicode category and \\p{Punct} only matches ASCII punctuation by default. If you use a Pattern.UNICODE_CHARACTER_CLASS option or (?U) embedded flag option, it will match (i.e. the pattern might look like "(?U)^[\\p{L}\\d\\s\\p{Punct}]{1,200}$"). However, this may impact \d and \s, and I am not sure you want to match all Unicode digits and whitespace.
An alternative is to use \p{P}\p{S} (to match Unicode punctuation and symbols) instead of \p{Punct} (the POSIX character class matches both punctuation and symbols).
See a Java demo printing true:
private static final Pattern ADDRESS_STRING_PATTERN = Pattern.compile("^[\\p{L}\\d\\s\\p{P}\\p{S}]{1,200}$");
private static boolean isValidInput(final String input, Pattern pattern) {
return pattern.matcher(input).matches();
}
public static void main (String[] args) throws java.lang.Exception
{
System.out.println(isValidInput("こんにちは、元気ですか",ADDRESS_STRING_PATTERN));
}
// => true
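To see the effect of the flag described above, here is a small comparison sketch. The class and field names are invented for the example, and the Japanese text is written with Unicode escapes only to keep the source ASCII-safe.

```java
import java.util.regex.Pattern;

final class PunctDemo {
    static final String REGEX = "^[\\p{L}\\d\\s\\p{Punct}]{1,200}$";

    // Default mode: \p{Punct}, \d and \s are ASCII-only.
    static final Pattern ASCII_PUNCT = Pattern.compile(REGEX);
    // Unicode mode: the same classes follow the Unicode definitions.
    static final Pattern UNICODE_PUNCT = Pattern.compile(REGEX, Pattern.UNICODE_CHARACTER_CLASS);

    static final Pattern DIGIT_ASCII = Pattern.compile("\\d");
    static final Pattern DIGIT_UNICODE = Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hiragana greeting, then U+3001 IDEOGRAPHIC COMMA, then two kanji.
        String input = "\u3053\u3093\u306b\u3061\u306f\u3001\u5143\u6c17";
        System.out.println(matches(ASCII_PUNCT, input));    // false: the comma is not ASCII punctuation
        System.out.println(matches(UNICODE_PUNCT, input));  // true
        // Side effect of the flag: \d now also accepts U+FF11 FULLWIDTH DIGIT ONE.
        System.out.println(matches(DIGIT_ASCII, "\uff11"));   // false
        System.out.println(matches(DIGIT_UNICODE, "\uff11")); // true
    }
}
```

If full-width digits or Unicode whitespace must stay rejected, the \p{P}\p{S} alternative is likely the safer choice because it leaves \d and \s untouched.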

Check special arrangement of specific signs in a string in Java

I need to check whether a string follows a specific arrangement of letters and numbers.
Valid arrangements are for example:
X
X-Y
A-H-K-L-J-Y
A-H-J-Y
123
12?
12*
12-17
Invalid are for example:
-X-Y
-XY
*12
?12
I have written this method in Java to solve this problem (but I don't have much experience with regular expressions):
public boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(Pattern.quote(searchPattern),
Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.find();
}
return patternFounded;
}
How can I implement this requirement with regular expressions?
By the way: is it a good solution to check whether a string has numeric content by using the isNumeric method of the StringUtils class?
//EDIT
The link that the admins added is not about specific arrangements of characters, only about matching characters with regular expressions in general!
After a good while trying to help, answering to constantly changing questions, just found out that the same was asked yesterday, and that the OP doesn't accept answers to his questions...all I have left to say is good night sir, good luck
n-th answer follows:
First pattern: [a-z](-[a-z])* : a letter, possibly followed by more letters, separated by -.
Second pattern: \d+(-\d+)*[?*]* : a number, possibly followed by more numbers, separated by -, and possibly ending with ? or *.
So join them together: ^(?:[a-z](-[a-z])*|\d+(-\d+)*[?*]*)$. ^ and $ mark the beginning and the end of the string; the group around the alternation makes both anchors apply to both alternatives.
A few more comments on the code: you don't need to use Pattern.quote, and you should use matches() instead of find(), because find() returns true if any part of the string matches the pattern, and you want the whole string to match:
public static boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.matches();
}
return patternFounded;
}
Called like this: checkPatternMatching(s, "^(?:[a-z](-[a-z])*|\\d+(-\\d+)*[?*]*)$")
About the second question, this is the current implementation of StringUtils.isNumeric:
public static boolean isNumeric(final CharSequence cs) {
if (isEmpty(cs)) {
return false;
}
final int sz = cs.length();
for (int i = 0; i < sz; i++) {
if (Character.isDigit(cs.charAt(i)) == false) {
return false;
}
}
return true;
}
So no, there is nothing wrong about it, that is as simple as it gets. But you need to include an external JAR in your program, which I find unnecessary if you just want to use such a simple method.
I believe that you should first remove the Pattern.quote() call, because it turns the input pattern into a literal string, which is not useful in your context.
To match the valid arrangements with letters, something like this should work:
^[a-z](?:-[a-z])*$
For the numbers (if I understood the rules correctly):
^\\d+(?:[?*]|-\\d+)*$
And if you want to combine them:
^(?:[a-z](?:-[a-z])*|\\d+(?:[?*]|-\\d+)*)$
I'm not familiar with Java itself, nor the isNumeric method, sorry.
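For what it's worth, the combined regex above can be checked against the examples from the question with a sketch like the following. The class and method names are made up, and CASE_INSENSITIVE is used because the sample inputs are upper case.

```java
import java.util.regex.Pattern;

final class ArrangementCheck {
    // Letters separated by '-', or numbers separated by '-' with optional ? or * wildcards.
    private static final Pattern ARRANGEMENT = Pattern.compile(
            "^(?:[a-z](?:-[a-z])*|\\d+(?:[?*]|-\\d+)*)$",
            Pattern.CASE_INSENSITIVE);

    static boolean isValid(String source) {
        return source != null && ARRANGEMENT.matcher(source).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "X", "X-Y", "A-H-K-L-J-Y", "A-H-J-Y", "123", "12?", "12*", "12-17",
            "-X-Y", "-XY", "*12", "?12"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

The first eight samples are reported as valid and the last four as invalid, which agrees with the lists in the question.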
As per your comment, if you want to accept *12 or 1?2 or 12*456, you can use:
^\\*?\\d+(?:[?*]\\d*|-\\d+)*$
Then add it to the previous regex like so:
^(?:[a-z](?:-[a-z])*|\\*?\\d+(?:[?*]\\d*|-\\d+)*)$

A method that checks if a String consists of certain characters

I need to make a method that checks if a given String only consists of lower- and uppercase letters, numbers, dots (.), hyphens (-) and underscores (_).
public boolean isValidString(String name) {
}
I just don't know how to get started :(
Thanks in advance
Use regular expressions:
String s = "Your_string-123.";
Pattern p = Pattern.compile("([a-zA-Z]|[0-9]|\\.|\\-|_)+");
Matcher m = p.matcher(s);
if (m.matches()) {
System.out.println(true);
}
Check out Regular Expressions. Quick tutorial here as well.
Use the standard library methods of the Character class:
Character.isLetter()
Character.isDigit()
Character.isWhitespace()
etc.
Using this Regex,
str.matches("([A-Za-z0-9.\\-_])+")
Example:
public boolean isValidString(String name) {
return name.matches("([A-Za-z0-9.\\-_])+");
}
Convert the String with toCharArray(), and then check each char in a for-each loop to see whether it belongs to the ASCII set you need.
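A loop-based version without regular expressions could look like this. It is only a sketch: the class name is invented, and rejecting null and empty input is an assumption, since the question does not say what should happen in those cases.

```java
final class CharLoopCheck {
    // Accepts ASCII letters, digits, dot, hyphen and underscore only.
    static boolean isValidString(String name) {
        if (name == null || name.isEmpty()) {
            return false; // assumption: empty input counts as invalid
        }
        for (char c : name.toCharArray()) {
            boolean allowed = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '_';
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidString("Your_string-123.")); // true
        System.out.println(isValidString("bad name!"));        // false
    }
}
```

Explicit ranges are used instead of Character.isLetter() because that method also returns true for letters outside ASCII.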

How to remove special characters from a string?

I want to remove special characters like:
- + ^ . : ,
from a String using Java.
That depends on what you define as special characters, but try replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".
Another note: the - character needs to be the first or last one in the list, otherwise you'd have to escape it or it would define a range (e.g. :-, would be read as "all characters in the range : to ,", which Java rejects with a PatternSyntaxException because : sorts after ,).
So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape backslashes: "[\\p{P}\\p{S}]").
A third way could be something like this, if you can exactly define what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.
Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.
Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which almost matches everything, since letters are not whitespace and vice versa.
Additional information on Unicode
Some unicode characters seem to cause problems due to different possible ways to encode them (as a single code point or a combination of code points). Please refer to regular-expressions.info for more information.
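One way this shows up in practice: an accented letter written as a base letter plus a combining mark loses its accent under the [^\p{L}\p{Z}] filter, because the combining mark is not a letter. Normalizing to NFC first avoids that for characters that have a precomposed form. A minimal sketch, with made-up names:

```java
import java.text.Normalizer;

final class NormalizeBeforeFilter {
    // Compose base letter + combining mark into one code point where possible, then filter.
    static String keepLettersAndSeparators(String s) {
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        return composed.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    public static void main(String[] args) {
        String decomposed = "cafe\u0301!"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
        // Without normalization the combining mark is stripped along with the '!'.
        System.out.println(decomposed.replaceAll("[^\\p{L}\\p{Z}]", "")); // cafe
        // With normalization the accented letter survives as a single code point.
        System.out.println(keepLettersAndSeparators(decomposed));
    }
}
```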
This will replace all the characters except alphanumeric
replaceAll("[^A-Za-z0-9]","");
As described here
http://developer.android.com/reference/java/util/regex/Pattern.html
Patterns are compiled regular expressions. In many cases, convenience methods such as String.matches, String.replaceAll and String.split will be preferable, but if you need to do a lot of work with the same regular expression, it may be more efficient to compile it once and reuse it. The Pattern class and its companion, Matcher, also offer more functionality than the small amount exposed by String.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpressionTest {
public static void main(String[] args) {
System.out.println("String is = "+getOnlyStrings("!&(*^*(^(+one(&(^()(*)(*&^%$##!#$%^&*()("));
System.out.println("Number is = "+getOnlyDigits("&(*^*(^(+91-&*9hi-639-0097(&(^("));
}
public static String getOnlyDigits(String s) {
Pattern pattern = Pattern.compile("[^0-9]");
Matcher matcher = pattern.matcher(s);
String number = matcher.replaceAll("");
return number;
}
public static String getOnlyStrings(String s) {
Pattern pattern = Pattern.compile("[^a-z A-Z]");
Matcher matcher = pattern.matcher(s);
String number = matcher.replaceAll("");
return number;
}
}
Result
String is = one
Number is = 9196390097
Try replaceAll() method of the String class.
BTW here is the method, return type and parameters.
public String replaceAll(String regex,
String replacement)
Example:
String str = "Hello +-^ my + - friends ^ ^^-- ^^^ +!";
str = str.replaceAll("[-+^]+", "");
It should remove all the {'^', '+', '-'} chars that you wanted to remove!
To Remove Special character
String t2 = "!##$%^&*()-';,./?><+abdd";
t2 = t2.replaceAll("\\W+","");
Output will be : abdd.
This works perfectly.
Use the String.replaceAll() method in Java.
replaceAll should be good enough for your problem.
You can remove single char as follows:
String str="+919595354336";
String result = str.replaceAll("\\+","");
System.out.println(result);
OUTPUT:
919595354336
If you just want to do a literal replace in java, use Pattern.quote(string) to escape any string to a literal.
myString.replaceAll(Pattern.quote(matchingStr), replacementStr)
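A short sketch of the literal-replace idea above. The helper name is hypothetical; Matcher.quoteReplacement is added on the assumption that the replacement text should be taken literally as well, since $ and \ are special in replacement strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplace {
    // Replaces every literal occurrence of target; neither argument is interpreted as regex syntax.
    static String replaceLiteral(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println("a.b.c".replaceAll(".", "-"));           // ----- ('.' matches any character)
        System.out.println(replaceLiteral("a.b.c", ".", "-"));      // a-b-c
        System.out.println(replaceLiteral("+919595354336", "+", "")); // 919595354336
    }
}
```

For plain literal replacement, String.replace(CharSequence, CharSequence) does the same job without any regex handling.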

How to validate string using regex in java

I want a string to pass validation only if it contains no numeric characters.
If my string is "javaABC", it must pass validation.
If my string is "java1", it must fail validation.
I want to reject all digits.
Try this:
String Text = ...;
boolean HasNoNumber = Text.matches("^[^0-9]*$");
'^[^0-9]*$' means: from start (^) to end ($), there are only ([...]) non-(^) digit (0-9) characters. You can use '\D' as others suggest too, but this is easy to understand.
See more info here.
You can use this:
\D
"\D" matches a single non-digit character, so to validate a whole string combine it with a quantifier, e.g. input.matches("\\D*").
Here is one way that you can search for a digit in a String:
public boolean isValid(String stringToValidate) {
if(Pattern.compile("[0-9]").matcher(stringToValidate).find()) {
// The string is not valid.
return false;
}
// The string is valid.
return true;
}
More detail is here:
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
The easiest to understand is probably matching for a single digit and if found fail, instead of creating a regexp that makes sure that all characters in the string are non-digits.
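The two styles discussed in this thread, shown next to each other as a sketch (class and method names are invented for the example):

```java
import java.util.regex.Pattern;

final class NoDigits {
    private static final Pattern DIGIT = Pattern.compile("[0-9]");

    // Whole-string match: every character must be a non-digit.
    static boolean hasNoDigitByMatch(String input) {
        return input.matches("\\D*");
    }

    // Search style: the string is valid unless a digit is found anywhere.
    static boolean hasNoDigitByFind(String input) {
        return !DIGIT.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNoDigitByMatch("javaABC")); // true
        System.out.println(hasNoDigitByMatch("java1"));   // false
        System.out.println(hasNoDigitByFind("javaABC"));  // true
        System.out.println(hasNoDigitByFind("java1"));    // false
    }
}
```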
