Check that string contains non-latin letters - java

I have the following method to check that a string contains only Latin symbols:
private boolean containsNonLatin(String val) {
return val.matches("\\w+");
}
But it returns false if I pass the string "my string", because it contains a space.
I need a method that returns false if the string contains letters outside the Latin alphabet, and true in all other cases.
Please help me improve my method.
Examples of valid strings:
w123.
w, 12
w#123
dsf%&#

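You can use the \p{IsLatin} script class:
return !(val.matches("[\\p{Punct}\\p{Space}\\p{IsLatin}]+$"));
Note that digits belong to none of these classes, so a string such as w123. is also reported as non-Latin; add \\d inside the brackets if digits should be allowed.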
Java Regex Reference

I need something like not p{IsLatin}
If you need to match all letters but Latin ASCII letters, you can use
"[\\p{L}\\p{M}&&[^\\p{Alpha}]]+"
The \p{Alpha} POSIX class matches [A-Za-z], \p{L} matches any Unicode letter, and \p{M} matches combining marks (diacritics). Adding &&[^\p{Alpha}] intersects the class with "not an ASCII letter", which subtracts [A-Za-z] from all the Unicode letters.
The whole expression means: match one or more Unicode letters other than ASCII letters.
To allow whitespace as well, just add \s:
"[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+"
See IDEONE demo:
List<String> strs = Arrays.asList("w123.", "w, 12", "w#123", "dsf%&#", "Двв");
for (String str : strs)
System.out.println(!str.matches("[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+")); // => 4 true, 1 false
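Because String.matches() has to cover the whole input, the demo above only flags strings made up entirely of non-ASCII letters and whitespace; a mixed string with both ASCII and Cyrillic letters would still print true. A minimal sketch that searches for a single offending letter instead, assuming the goal is to return false as soon as any letter outside A-Z/a-z appears (the class and method names are hypothetical, and the Cyrillic sample is written with Unicode escapes to avoid source-encoding issues):

```java
import java.util.regex.Pattern;

public class LatinCheck {
    // Any Unicode letter or combining mark that is not an ASCII letter.
    private static final Pattern NON_ASCII_LETTER =
            Pattern.compile("[\\p{L}\\p{M}&&[^\\p{Alpha}]]");

    // Returns true when the string contains no letters outside A-Z / a-z.
    public static boolean hasOnlyAsciiLetters(String val) {
        // find() looks for one match anywhere, unlike matches().
        return !NON_ASCII_LETTER.matcher(val).find();
    }

    public static void main(String[] args) {
        String cyrillic = "\u0414\u0432\u0432"; // the Cyrillic sample from the demo
        String[] samples = {"w123.", "w, 12", "w#123", "dsf%&#", cyrillic, "w " + cyrillic};
        for (String s : samples) {
            System.out.println(hasOnlyAsciiLetters(s));
        }
    }
}
```

With this approach the first four samples give true and the last two give false, including the mixed one.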

Just add a space to your matcher:
private boolean isLatin(String val) {
return val.matches("[ \\w]+");
}

Use this:
public static boolean isNoAlphaNumeric(String s) {
return s.matches("[\\p{L}\\s]+");
}
\p{L} matches any Unicode letter.
\s matches a whitespace character.

Related

Why isn't my regex matching uppercase characters and underscores?

I have the following Java code:
public static void main(String[] args) {
String var = "ROOT_CONTEXT_MATCHER";
boolean matches = var.matches("/[A-Z][a-zA-Z0-9_]*/");
System.out.println("The value of 'matches' is: " + matches);
}
This prints: The value of 'matches' is: false
Why doesn't my var match the regex? If I am reading my regex correctly, it matches any String:
Beginning with an upper-case char, A-Z; then
Consisting of zero or more:
Lower-case chars a-z; or
Upper-case chars A-Z; or
Digits 0-9; or
An underscore
The String "ROOT_CONTEXT_MATCHER":
Starts with an A-Z char; and
Consists of 19 subsequent characters that are all upper-case A-Z or are an underscore
What's going on here?!?

The issue is with the forward slash characters at the beginning and at the end of the regex. They don't have any special meaning here and are treated as literals. Simply remove them to get it fixed:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
If you intended to use metacharacters for boundary matching, the correct characters are ^ for the beginning of the line, and $ for the end of the line:
boolean matches = var.matches("^[A-Z][a-zA-Z0-9_]*$");
although these are not needed here because String#matches would match the entire string.
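A short sketch of that last point, with hypothetical class and method names: String.matches() behaves as if the pattern were anchored at both ends, whereas Matcher.find() accepts a match anywhere in the input.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // String.matches() succeeds only if the regex covers the entire input.
    public static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // Matcher.find() succeeds if the regex matches any part of the input.
    public static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "ROOT_CONTEXT_MATCHER";
        // The slashes are literal characters, so this cannot match.
        System.out.println(wholeMatch("/[A-Z][a-zA-Z0-9_]*/", input)); // false
        System.out.println(wholeMatch("[A-Z][a-zA-Z0-9_]*", input));   // true
        // A single letter does not cover the whole input...
        System.out.println(wholeMatch("[A-Z]", input));                // false
        // ...but it is found somewhere inside it.
        System.out.println(partialMatch("[A-Z]", input));              // true
    }
}
```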

You need to remove the regex delimiters, i.e. /, from the Java regex:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
That can be further shortened to:
boolean matches = var.matches("[A-Z]\\w*");
since \\w is equivalent to [a-zA-Z0-9_] (a word character).

Why is this Java regex not working?

I'm trying to match any string consisting of:
Any alphanumeric string of 1+ chars; then
Two periods (".."); then
Any alphanumeric string of 1+ chars
For example:
mydatabase..mytable
anotherDatabase23..table28
etc.
Given the following function:
public boolean isValidDBTableName(String candidate) {
if(candidate.matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"))
return true;
else
return false;
}
Passing this function the value "mydb..tablename" causes it to return false. Why? Thanks in advance!

As NeplatnyUdaj has pointed out in comment, your current regex should return true for the input "mydb..tablename".
However, your regex has the problem of over-matching, where it returns true for invalid names such as nodotname.
You need to escape ., since in Java regex, it will match any character except for line separators:
"[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+"
In regex, you can escape meta-characters (character with special meaning) with \. To specify \ in string literal, you need to escape it again.
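The over-matching can be seen in a minimal sketch like the one below. The class name and constants are hypothetical; Pattern.quote() is shown as an alternative way to get a literal "..", not something the original code used.

```java
import java.util.regex.Pattern;

public class DotEscapeDemo {
    // Unescaped dots: each "." matches any character.
    public static final String UNESCAPED = "[a-zA-Z0-9]+..[a-zA-Z0-9]+";
    // Escaped dots: each "\\." matches a literal period.
    public static final String ESCAPED = "[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+";
    // Pattern.quote() turns ".." into a literal sequence without manual escaping.
    public static final String QUOTED =
            "[a-zA-Z0-9]+" + Pattern.quote("..") + "[a-zA-Z0-9]+";

    public static boolean isValidDBTableName(String candidate) {
        return candidate.matches(ESCAPED);
    }

    public static void main(String[] args) {
        System.out.println("nodotname".matches(UNESCAPED));        // true (over-match)
        System.out.println("nodotname".matches(ESCAPED));          // false
        System.out.println("mydb..tablename".matches(QUOTED));     // true
        System.out.println(isValidDBTableName("mydb..tablename")); // true
    }
}
```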

You must escape the period in regexes. As a \ must also be escaped, this gives
"[a-zA-Z0-9]+\\.\\.[a-zA-Z0-9]+"

I just tried your regex in Eclipse and it worked. Or at least did not fail. Try stripping whitespace characters.
@Test
public void test()
{
String testString = "mydb..tablename";
Assert.assertTrue("no match", testString.matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"));
Assert.assertFalse("falsematch", "a.b".matches("[a-zA-Z0-9]+..[a-zA-Z0-9]+"));
}

Regular Expression - inserting space after comma only if succeeded by a letter or number

In Java I want to insert a space after a comma, but only if the comma is followed by a digit or letter. I am hoping to use the replaceAll method, which takes a regular expression as a parameter. So far I have the following:
String s1="428.0,chf";
s1 = s1.replaceAll(",(\\d|\\w)",", ");
This code successfully distinguishes between the String above and one where there is already a space after the comma. My problem is that I can't figure out how to write the expression so that the space is inserted. The code above replaces the c in the String shown above with a space, which is not what I want.
s1 should look like this after executing the replaceAll: "428.0 chf"

s1 = s1.replaceAll(",(?=[\\da-zA-Z])", " ");
(?=[\\da-zA-Z]) is a positive lookahead that checks for a digit or a letter after the comma. The lookahead is only a check: the character it sees is not consumed, so it is not replaced.
NOTE
\w includes digits, letters, and _, so there is no need for \d alongside it.
It is better to use [\da-zA-Z] instead of \w, since \w also includes _, which you do not need to match.

Try this, and note that $1 refers to your matched grouping:
s1.replaceAll(",(\\d|\\w)"," $1");
Note that String.replaceAll() works in the same way as a Matcher.replaceAll(). From the doc:
The replacement string may contain references to captured subsequences
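The lookahead and the captured-group approaches can be compared side by side. This is a minimal sketch with hypothetical class and method names; both variants use [\\da-zA-Z] so that an underscore after the comma is left alone.

```java
public class CommaSpaceDemo {
    // Lookahead: only the comma is consumed, so only the comma is replaced.
    public static String withLookahead(String s) {
        return s.replaceAll(",(?=[\\da-zA-Z])", " ");
    }

    // Captured group: the following character is consumed too,
    // so it has to be put back with $1.
    public static String withGroup(String s) {
        return s.replaceAll(",([\\da-zA-Z])", " $1");
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("428.0,chf")); // 428.0 chf
        System.out.println(withGroup("428.0,chf"));     // 428.0 chf
        System.out.println(withLookahead("428.0, chf")); // unchanged: 428.0, chf
        System.out.println(withGroup("1,2,3"));          // 1 2 3
    }
}
```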

String s1 = "428.0,chf";
s1 = s1.replaceAll(",([^\\W_])", " $1"); // match an alphanumeric (no '_') after ','
System.out.println(s1);
Output:
428.0 chf
\w matches digits, letters, and an underscore, so [^\W_] matches any word character except the underscore.
$1 represents the captured group. Here the c after the comma is captured, so ",c" is replaced with a space followed by $1, giving " c".

Try this. Note that it replaces every comma, regardless of what follows it:
public class Tes {
    public static void main(String[] args) {
        String s1 = "428.0,chf";
        // Split on commas and rejoin the parts with single spaces.
        String finalStr = String.join(" ", s1.split(","));
        System.out.println(finalStr); // 428.0 chf
    }
}

How can i get know that my String contains diacritics?

For Example -
text = Československá obchodní banka;
text string contains diacritics like Č , á etc.
I want to write a function to which I pass the string "Československá obchodní banka" and which returns true if the string contains diacritics, and false otherwise.
I have to handle diacritics separately from strings containing characters that fall outside the A-Z or a-z range.
1) If String contains diacritics then I have to do some XXXXXX on it.
2) If String contains character other than A-Z or a-z and not contains diacritics then do some other operations YYYYY.
I have no idea how to do it.

Some background: Unicode has a single precomposed code point for á, but the same character can also be written as a plain a followed by a combining acute accent.
You can use java.text.Normalizer (import java.text.Normalizer) as follows:
public static boolean hasDiacritics(String s) {
// Decompose any á into a and combining-'.
String s2 = Normalizer.normalize(s, Normalizer.Form.NFD);
return s2.matches("(?s).*\\p{InCombiningDiacriticalMarks}.*");
//return !s2.equals(s);
}
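Wrapped into a runnable class, the method above might be exercised as in this sketch. The class name is hypothetical, and the accented sample is written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;

public class DiacriticsCheck {
    public static boolean hasDiacritics(String s) {
        // NFD splits precomposed letters into base letter + combining mark.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // (?s) lets "." match line terminators as well.
        return decomposed.matches("(?s).*\\p{InCombiningDiacriticalMarks}.*");
    }

    public static void main(String[] args) {
        // The bank name from the question, with its three accented letters escaped.
        String accented = "\u010Ceskoslovensk\u00E1 obchodn\u00ED banka";
        String plain = "Ceskoslovenska obchodni banka";
        System.out.println(hasDiacritics(accented)); // true
        System.out.println(hasDiacritics(plain));    // false
    }
}
```

Input that is already decomposed (a base letter followed by a combining mark) is detected as well, since normalizing it to NFD leaves the combining mark in place.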

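The Normalizer class seems to be able to accomplish this. Some limited testing indicates that
Normalizer.isNormalized(text, Normalizer.Form.NFD)
might be what you need: it returns false when the text contains precomposed characters such as á, which NFD would decompose.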

Java - How to test if a String contains both letters and numbers

I need a regex which will satisfy both conditions.
It should give me true only when a String contains both A-Z and 0-9.
Here's what I've tried:
if (PNo[0].matches("^[A-Z0-9]+$"))
It does not work.

I suspect that the regex below is slowed down by the look-around, but it should work regardless:
.matches("^(?=.*[A-Z])(?=.*[0-9])[A-Z0-9]+$")
The regex asserts that there is an uppercase alphabetical character (?=.*[A-Z]) somewhere in the string, and asserts that there is a digit (?=.*[0-9]) somewhere in the string, and then it checks whether everything is either alphabetical character or digit.
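A minimal sketch of the lookahead regex next to the pattern from the question (the class and method names are hypothetical): the original pattern accepts strings that contain only letters or only digits, while the lookaheads require at least one of each.

```java
public class LettersAndDigits {
    // Pattern from the question: letters and digits allowed, neither required.
    public static boolean onlyAllowedChars(String s) {
        return s.matches("^[A-Z0-9]+$");
    }

    // Lookahead version: at least one letter AND at least one digit required.
    public static boolean hasBoth(String s) {
        return s.matches("^(?=.*[A-Z])(?=.*[0-9])[A-Z0-9]+$");
    }

    public static void main(String[] args) {
        System.out.println(onlyAllowedChars("ABC")); // true
        System.out.println(hasBoth("ABC"));          // false
        System.out.println(hasBoth("123"));          // false
        System.out.println(hasBoth("ABC123"));       // true
        System.out.println(hasBoth("AB-12"));        // false
    }
}
```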

It is easier to write and read if you use two separate regular expressions:
String s = "blah-FOO-test-1-2-3";
String numRegex = ".*[0-9].*";
String alphaRegex = ".*[A-Z].*";
if (s.matches(numRegex) && s.matches(alphaRegex)) {
System.out.println("Valid: " + s);
}
Better yet, write a method:
public boolean isValid(String s) {
String n = ".*[0-9].*";
String a = ".*[A-Z].*";
return s.matches(n) && s.matches(a);
}

A letter may be either before or after the digit, so this expression should work:
(([A-Z].*[0-9])|([0-9].*[A-Z]))
Here is a code example that uses this expression:
Pattern p = Pattern.compile("(([A-Z].*[0-9])|([0-9].*[A-Z]))");
Matcher m = p.matcher("AXD123");
boolean b = m.find();
System.out.println(b);

Here is the regex for you.
Basics:
Match any single character except a line terminator: .
Match zero or more of the preceding element: *
Match anything in the current line: .*
Match any character in a set (range) of characters: [start-end]
Match one of several alternatives in a group: (regex1|regex2|regex3)
Note that start and end follow ASCII order, and start must come before end. For example, you can write [0-Z], but not [Z-0]. An ASCII chart is a handy reference here.
Check the string against regex
Simply call yourString.matches(theRegexAsString)
Check if string contains letters:
Check if there is a letter: yourString.matches(".*[a-zA-Z].*")
Check if there is a lower cased letter: yourString.matches(".*[a-z].*")
Check if there is a upper cased letter: yourString.matches(".*[A-Z].*")
Check if string contains numbers:
yourString.matches(".*[0-9].*")
Check if string contains both number and letter:
The simplest way is to match twice with letters and numbers
yourString.matches(".*[a-zA-Z].*") && yourString.matches(".*[0-9].*")
If you prefer to match everything all together, the regex will be something like: Match a string which at someplace has a character and then there is a number afterwards in any position, or the other way around. So your regex will be:
yourString.matches(".*([a-zA-Z].*[0-9]|[0-9].*[a-zA-Z]).*")
Extra regex for your reference:
Check if the string starts with a letter
yourString.matches("[a-zA-Z].*")
Check if the string ends with number
yourString.matches(".*[0-9]")

This should solve your problem:
^(([A-Z]+[0-9][A-Z0-9]*)|([0-9]+[A-Z][A-Z0-9]*))$
But it's unreadable. I would suggest first checking the input with "^[A-Z0-9]+$", then searching for "[A-Z]" to ensure it contains at least one letter, then searching for "[0-9]" to ensure it contains at least one digit (use Matcher.find() for the last two, since String.matches() must match the whole string). This way you can add new restrictions easily and the code will remain readable.
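The stepwise check suggested above could look like this minimal sketch. The class, method, and constant names are hypothetical; matches() is used for the whole-string check and find() for the two "contains" checks.

```java
import java.util.regex.Pattern;

public class StepwiseCheck {
    private static final Pattern ALLOWED = Pattern.compile("[A-Z0-9]+");
    private static final Pattern LETTER = Pattern.compile("[A-Z]");
    private static final Pattern DIGIT = Pattern.compile("[0-9]");

    public static boolean isValid(String s) {
        // Step 1: every character must be an upper-case letter or a digit.
        if (!ALLOWED.matcher(s).matches()) {
            return false;
        }
        // Step 2: at least one letter somewhere in the string.
        if (!LETTER.matcher(s).find()) {
            return false;
        }
        // Step 3: at least one digit somewhere in the string.
        return DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("AXD123")); // true
        System.out.println(isValid("AXD"));    // false
        System.out.println(isValid("123"));    // false
        System.out.println(isValid("ax1"));    // false (lower case not allowed)
    }
}
```

Each restriction is a separate step, so a new rule can be added as one more check without touching the existing patterns.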

What about ([A-Z].*[0-9]+)|([0-9].*[A-Z]+) ?

Try using (([A-Z]+[0-9])|([0-9]+[A-Z])). It should solve the problem.

Use this method:
private boolean isValid(String str)
{
String Regex_combination_of_letters_and_numbers = "^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$";
String Regex_just_letters = "^(?=.*[a-zA-Z])[a-zA-Z]+$";
String Regex_just_numbers = "^(?=.*[0-9])[0-9]+$";
String Regex_just_specialcharachters = "^(?=.*[@#$%^&+=])[@#$%^&+=]+$";
String Regex_combination_of_letters_and_specialcharachters = "^(?=.*[a-zA-Z])(?=.*[@#$%^&+=])[a-zA-Z@#$%^&+=]+$";
String Regex_combination_of_numbers_and_specialcharachters = "^(?=.*[0-9])(?=.*[@#$%^&+=])[0-9@#$%^&+=]+$";
String Regex_combination_of_letters_and_numbers_and_specialcharachters = "^(?=.*[a-zA-Z])(?=.*[0-9])(?=.*[@#$%^&+=])[a-zA-Z0-9@#$%^&+=]+$";
if(str.matches(Regex_combination_of_letters_and_numbers))
return true;
if(str.matches(Regex_just_letters))
return true;
if(str.matches(Regex_just_numbers))
return true;
if(str.matches(Regex_just_specialcharachters))
return true;
if(str.matches(Regex_combination_of_letters_and_specialcharachters))
return true;
if(str.matches(Regex_combination_of_numbers_and_specialcharachters))
return true;
if(str.matches(Regex_combination_of_letters_and_numbers_and_specialcharachters))
return true;
return false;
}
You can delete the conditions you do not need.
