In Java (Android), I'm trying to determine whether each character in a String has an equivalent in upper or lower case.
My goal is not to change the case, but to know whether it's possible.
For example, this function would return true for: 'e' 'é' 'i' 'l' 'L' 'O' 'P'
and false for emojis or Chinese characters.
Is there any function that can do this?
EDIT: To be clearer, the function is supposed to take a character as its argument, not a String, and return false if the character has no uppercase or lowercase version.
You can try this:
boolean validate(char c){
return Character.isUpperCase(c) || Character.isLowerCase(c);
}
This returns true if and only if the character is an uppercase or a lowercase letter. Otherwise it returns false.
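To see what this check does on a few inputs, here is a minimal sketch. The class name CaseCheckDemo and the sample characters are only illustrative. Note that a titlecase letter such as U+01C8 is neither uppercase nor lowercase as far as Character is concerned, so this check rejects it.

```java
class CaseCheckDemo {
    // Same check as validate() above: true only for characters that Java
    // classifies as uppercase or lowercase.
    static boolean validate(char c) {
        return Character.isUpperCase(c) || Character.isLowerCase(c);
    }

    public static void main(String[] args) {
        // e, e-acute, L, a Chinese character, '!', and the titlecase letter U+01C8
        char[] samples = {'e', '\u00e9', 'L', '\u76f4', '!', '\u01c8'};
        for (char c : samples) {
            System.out.format("U+%04x: validate=%b, titlecase=%b%n",
                    (int) c, validate(c), Character.isTitleCase(c));
        }
    }
}
```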
The requirements are still not entirely specified (do you care whether the upper/lowercase equivalent is a different character from the original?), but my most straightforward interpretation of the question is:
For each character ch in a given string, is it true that either toUpperCase(ch) yields an uppercase character, or that toLowerCase(ch) yields a lowercase character?
I phrase it that way because Character.toUpperCase() returns "the uppercase equivalent of the character, if any; otherwise, the character itself".
The doc for String.toUpperCase() doesn't mention what happens if there is no uppercase equivalent for some characters, but I think we can assume it returns those characters unchanged, as does Character.toUpperCase().
So a straightforward implementation of that condition would be to test
Character.isUpperCase(s.toUpperCase().charAt(0)) ||
Character.isLowerCase(s.toLowerCase().charAt(0));
for each character as a String.
I'm using the String rather than Character case conversion functions here, in order to take advantage of locale-sensitive mapping. Not only that, but regardless of locale, there are characters that cannot be converted to uppercase by Character.toUpperCase() because their uppercase equivalent is more than one character! For example, we would get incorrect results for \u00df 'ß' (see docs for details).
public class TestUpper {
public static void main(String[] args) {
final String test = "\u0633\u0644\u0627\u0645 World \u00df\u01c8eéilLOP\u76f4!";
for (Character ch : test.toCharArray()) {
System.out.format("'%c' (U+%04x): hasCase()=%b%n", ch, (int)ch, hasCase(ch));
}
}
static boolean hasCase(Character ch) {
String s = ch.toString();
// Does the character s have an uppercase or a lowercase equivalent?
return Character.isUpperCase(s.toUpperCase().charAt(0)) ||
Character.isLowerCase(s.toLowerCase().charAt(0));
}
}
And the results:
'س' (U+0633): hasCase()=false
'ل' (U+0644): hasCase()=false
'ا' (U+0627): hasCase()=false
'م' (U+0645): hasCase()=false
' ' (U+0020): hasCase()=false
'W' (U+0057): hasCase()=true
'o' (U+006f): hasCase()=true
'r' (U+0072): hasCase()=true
'l' (U+006c): hasCase()=true
'd' (U+0064): hasCase()=true
' ' (U+0020): hasCase()=false
'ß' (U+00df): hasCase()=true
'Lj' (U+01c8): hasCase()=true
'e' (U+0065): hasCase()=true
'é' (U+00e9): hasCase()=true
'i' (U+0069): hasCase()=true
'l' (U+006c): hasCase()=true
'L' (U+004c): hasCase()=true
'O' (U+004f): hasCase()=true
'P' (U+0050): hasCase()=true
'直' (U+76f4): hasCase()=false
'!' (U+0021): hasCase()=false
These test cases include Arabic letters and a Chinese character (which are isLetter(), but have no upper/lowercase equivalents), the requested test letters, space and punctuation, and a titlecase letter.
The results are correct according to the criteria currently stated in the question. However, the OP has said in comments that he wants the function to return false for titlecase characters, such as U+01c8, whereas the above code returns true because they have uppercase and lowercase equivalents (U+01c7 and U+01c9). But the OP's statement seems to be based on the mistaken impression that titlecase letters do not have uppercase and lowercase equivalents. Ongoing discussion has not yet resolved the confusion.
Disclaimer: This answer doesn't attempt to take into account supplementary or surrogate code points.
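If supplementary code points matter, the same idea can be applied per code point rather than per char. The following is only a sketch under assumptions: the class name CodePointCase is made up, it uses Locale.ROOT instead of the default locale (so locale-specific mappings are deliberately ignored), and String.codePoints() requires Java 8 (Android API 24 or later).

```java
import java.util.Locale;

class CodePointCase {
    // Same test as hasCase(Character) above, but on a full code point,
    // so characters outside the Basic Multilingual Plane are handled too.
    static boolean hasCase(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Character.isUpperCase(s.toUpperCase(Locale.ROOT).codePointAt(0))
            || Character.isLowerCase(s.toLowerCase(Locale.ROOT).codePointAt(0));
    }

    // True if every code point in the text has an upper- or lowercase form.
    static boolean allHaveCase(String text) {
        return text.codePoints().allMatch(CodePointCase::hasCase);
    }

    public static void main(String[] args) {
        System.out.println(hasCase(0x10400)); // DESERET CAPITAL LETTER LONG I
        System.out.println(hasCase(0x1F600)); // an emoji
    }
}
```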
For a simple method, there's Character.isLowerCase. But you need to be careful: case mapping depends on the language. Some languages may have a lowercase 'é' but no uppercase. And the Turkish "I" has a different lowercase version than in other languages.
To work around that, I'd use something like Character.isLetter(myChar) && String.valueOf(myChar).toLowerCase().equals(String.valueOf(myChar)). Remember to use the version of toLowerCase that takes a Locale as a parameter if you are not comparing in the default Locale.
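The locale dependence can be illustrated with the Turkish "I". This is a small sketch; the class name TurkishCaseDemo and its helper methods are made up for the example, and the results are printed as code points to avoid console encoding issues.

```java
import java.util.Locale;

class TurkishCaseDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Lowercases "I" using the given locale's case-mapping rules.
    static String lowerI(Locale locale) {
        return "I".toLowerCase(locale);
    }

    // Uppercases "i" using the given locale's case-mapping rules.
    static String upperI(Locale locale) {
        return "i".toUpperCase(locale);
    }

    public static void main(String[] args) {
        // Locale.ROOT gives U+0069 (i); Turkish gives U+0131 (dotless i).
        System.out.format("ROOT: U+%04x%n", (int) lowerI(Locale.ROOT).charAt(0));
        System.out.format("tr:   U+%04x%n", (int) lowerI(TURKISH).charAt(0));
    }
}
```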
Check if the character is either a lowercase letter or an uppercase letter:
Character.isLowerCase(ch) != Character.isUpperCase(ch)
Alternatively, you can check whether the lowercased and uppercased forms of the character differ:
Character.toLowerCase(ch) != Character.toUpperCase(ch)
However, you need to be careful about locale (there is one letter in Turkish where I think the lower and uppercase forms are the same).
Two strings uppercase and lowercase not matching does not necessarily mean the string is valid. true1 will not equal TRUE1 but fails the test case. You need to check each individual character. This is a rough cut, you'll probably have to do something fancy for emojis and Chinese characters.
public static boolean isAllCase(String value) {
String upper = value.toUpperCase();
String lower = value.toLowerCase();
if(upper.length() != lower.length())
return false;
for(int i = 0; i < upper.length(); i++) {
if(upper.charAt(i) == lower.charAt(i))
return false;
}
return true;
}
public boolean hasEquivalentCase(char ch) {
return Character.isLowerCase(ch) || Character.isUpperCase(ch);
}
public boolean validate(char value) {
    // ASCII letters only: a-z and A-Z
    if ((value >= 'a' && value <= 'z') || (value >= 'A' && value <= 'Z'))
        return true;
    return false;
}
Apply this to each character of your String:
public boolean All(String cad) {
    for (int i = 0; i < cad.length(); i++) {
        if (!validate(cad.charAt(i))) {
            // this character has no upper or lower case
            return false;
        }
    }
    return true;
}
Related
I'm very new to Java so please bear with me.
My assignment:
Ask the user to input a password and write a message stating whether or not it is acceptable. The password requirements:
the password is at least 8 characters long
it has upper case and lower case letters
at least one letter is followed by a number
it has one of the special characters $#?!_-=%
I really don't know what to do for numbers 3 and 4. I've read something about regex, but we haven't covered that in class. Are there any other possible methods?
For number 3 you can use a loop. Inside it, check whether each element is a letter with the isLetter() method, and then check whether the following element of your array is a digit with the isDigit() method:
// array is the password as a char[], e.g. password.toCharArray()
boolean isLetterFollowedByNumber = false;
for (int i = 0; i < array.length - 1; i++) {
    if (Character.isLetter(array[i]) && Character.isDigit(array[i + 1])) {
        isLetterFollowedByNumber = true;
    }
}
For number 4 you can just compare every element of your char array with the special characters:
boolean hasCharacter = false;
for (char a : array) {
    if (a == '$' || a == '#' || a == '?' || a == '!' || a == '_' || a == '-' || a == '=' || a == '%') {
        hasCharacter = true;
    }
}
The first example uses an indexed for loop because it has to look at the next element; the second uses a for-each loop, but you can use a regular for loop there as well. Good luck with your task!
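Putting the four requirements together without regex might look like the sketch below. The class name PasswordCheck, the method isAcceptable and the constant SPECIALS are hypothetical names chosen for this example, and "upper case and lower case letters" is read here as at least one of each.

```java
class PasswordCheck {
    // The special characters listed in the assignment.
    static final String SPECIALS = "$#?!_-=%";

    static boolean isAcceptable(String password) {
        if (password.length() < 8) {
            return false; // requirement 1: at least 8 characters
        }
        boolean hasUpper = false;
        boolean hasLower = false;
        boolean letterThenDigit = false;
        boolean hasSpecial = false;
        for (int i = 0; i < password.length(); i++) {
            char c = password.charAt(i);
            if (Character.isUpperCase(c)) hasUpper = true;      // requirement 2
            if (Character.isLowerCase(c)) hasLower = true;      // requirement 2
            if (SPECIALS.indexOf(c) >= 0) hasSpecial = true;    // requirement 4
            // requirement 3: a letter immediately followed by a digit
            if (Character.isLetter(c) && i + 1 < password.length()
                    && Character.isDigit(password.charAt(i + 1))) {
                letterThenDigit = true;
            }
        }
        return hasUpper && hasLower && letterThenDigit && hasSpecial;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("Passw0rd!"));
        System.out.println(isAcceptable("password1!"));
    }
}
```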
I want to create a regex in Java to match at least 1 alphabet and 1 non-alphabet (could be anything except A-Za-z) and no white space.
Below Regex is working partially correct:
^([A-Za-z]{1,}[^A-Za-z]{1,})+$
It matches aaaa7777
but doesn't match 777aaaaa.
Any Help would be appreciated.
Your regex implicitly assumes the order of the characters you want to match. The regex is saying that a letter must come before a non-letter. However, you want the letter and the non-letter to come in either order, so you need to account for both cases. Also note that it should be [^\sa-zA-Z] instead of [^a-zA-Z], as you don't allow spaces.
(?:[a-zA-Z][^\sa-zA-Z]|[^\sa-zA-Z][a-zA-Z])
At the start and end, any non-space character is allowed, so:
^\S*(?:[a-zA-Z][^\sa-zA-Z]|[^\sa-zA-Z][a-zA-Z])\S*$
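A quick way to try this pattern from Java is sketched below. The class name LetterAndNonLetter and the method isValid are made up for the example; remember that backslashes must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class LetterAndNonLetter {
    // The pattern from above, with backslashes escaped for a Java string.
    static final Pattern PATTERN = Pattern.compile(
            "^\\S*(?:[a-zA-Z][^\\sa-zA-Z]|[^\\sa-zA-Z][a-zA-Z])\\S*$");

    static boolean isValid(String s) {
        return PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaaa7777", "777aaaaa", "aaaa", "7777", "aa 77"}) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```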
You may use
s.matches("(?=\\P{Alpha}*\\p{Alpha})(?=\\p{Alpha}*\\P{Alpha})\\S*")
This is how the pattern works.
Details
The pattern will match a whole string since ^ and \z anchors are implicit in matches
(?=\P{Alpha}*\p{Alpha}) - a lookahead that requires at least one ASCII letter after any 0+ chars other than an ASCII letter
(?=\p{Alpha}*\P{Alpha}) - a lookahead that requires a char other than an ASCII letter after 0 or more ASCII letters
\S* - zero or more non-whitespace chars.
To make the regex Unicode aware replace \p{Alpha} with \p{L} and \P{Alpha} with \P{L}.
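The difference between the two variants can be checked with a small sketch. The class and method names here are invented for the example, and the behaviour shown assumes the standard JDK regex engine, where \p{Alpha} is ASCII-only by default; other implementations (for instance Android's) may treat \p{Alpha} differently.

```java
class LookaheadCheck {
    // ASCII-only variant from the answer above.
    static final String ASCII =
            "(?=\\P{Alpha}*\\p{Alpha})(?=\\p{Alpha}*\\P{Alpha})\\S*";
    // Unicode-aware variant: \p{L} matches any Unicode letter.
    static final String UNICODE =
            "(?=\\P{L}*\\p{L})(?=\\p{L}*\\P{L})\\S*";

    static boolean asciiMatch(String s) {
        return s.matches(ASCII);
    }

    static boolean unicodeMatch(String s) {
        return s.matches(UNICODE);
    }

    public static void main(String[] args) {
        String sample = "\u00e97"; // e-acute followed by the digit 7
        System.out.println(asciiMatch(sample));
        System.out.println(unicodeMatch(sample));
    }
}
```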
Regular expressions aren't the right tool for this type of validation. Just write out the plain logic. For your specific example:
public class Main {
public static void main(String[] args) {
System.out.println("'foo' ? " + doesMatch("foo"));
System.out.println("'bar7' ? " + doesMatch("bar7"));
System.out.println("'55baz' ? " + doesMatch("55baz"));
}
public static boolean doesMatch(String input) {
boolean hasAlpha = false,
hasNonAlpha = false;
for(char ch : input.toCharArray()) {
if(ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z') {
hasAlpha = true;
} else {
hasNonAlpha = true;
}
if(hasAlpha && hasNonAlpha) {
return true;
}
}
return false;
}
}
Anyone can understand what inputs do match and which inputs don't. If you use regular expressions this wouldn't be so simple.
I'm working on a project in Java in which I need to read input from a stream and parse the text char-by-char into other types, one of which is "ValueNumber".
For that I'm using a switch statement.
Now, because a number can start with '-', I need to check whether the current char is a digit between 0 and 9, or '-', or something else.
My question is: how can I make a single variable that holds all ten digits?
A String will hold it (or a StringBuilder for better performance while accumulating), and
then you can check whether the string matches the regex:
return str.matches("[-]?[0-9]+");
If this returns true, the string is a number with or without a negation sign; if false, it is not a number of the kind you described. The number can be as long as a String allows.
I think you're imagining something like:
switch(aChar) {
case '+':
handlePlus();
break;
case ' ':
handleSpace();
break;
case anyOf("-0123456789"):
handlePartOfNumber(aChar);
break;
}
Unfortunately in Java, switch is not this sophisticated. switch deals with exact matches only.
You will need to use a series of if/else blocks instead:
if(aChar == '+') {
handlePlus();
} else if(aChar == ' ') {
handleSpace();
} else if(isMinusOrDigit(aChar)) {
handlePartOfNumber(aChar);
}
Now, how do we implement isMinusOrDigit(char c)?
You've asked about "some variable that holds all the digits". Maybe you mean an array, or a List or a Set. I'll choose Set because it's the purest "bag of items, you don't care the order".
// requires: import java.util.HashSet; import java.util.Set;
private static final Set<Character> MINUS_AND_DIGITS = minusAndDigits();
private static Set<Character> minusAndDigits() {
    Set<Character> set = new HashSet<>();
    for (char c = '0'; c <= '9'; c++) {
        set.add(c);
    }
    set.add('-');
    return set;
}
private static boolean isMinusOrDigit(char c) {
return MINUS_AND_DIGITS.contains(c);
}
You could also use
a List<Character> (again .contains(c))
an array -- char[] (see How can I test if an array contains a certain value?)
or even a String -- return "0123456789-".indexOf(c) != -1;
But in this case, you don't need a "multi value variable" to work out whether a character is a minus or a digit - because the number characters are next to each other in ASCII:
private static boolean isMinusOrDigit(char c) {
    return c == '-' || (c >= '0' && c <= '9');
}
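As a rough sketch of how such a character test could drive char-by-char parsing, the example below collects runs of minus/digit characters into tokens. The class name NumberScanner and the method numberTokens are hypothetical, and the grouping is deliberately naive: it does not check that '-' appears only at the start of a number.

```java
import java.util.ArrayList;
import java.util.List;

class NumberScanner {
    static boolean isMinusOrDigit(char c) {
        // Digits '0'..'9' are contiguous in ASCII, so a range check is enough.
        return c == '-' || (c >= '0' && c <= '9');
    }

    // Collects maximal runs of '-' and digit characters from the input.
    static List<String> numberTokens(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isMinusOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0); // start a new token
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(numberTokens("12 + -7 x 305"));
    }
}
```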
I am trying to write a regular expression for a string. Say there is a string RBY_YBR, where _ represents an empty slot, so we can repeatedly swap letters with _ until the result is RRBBYY_. Two or more pairs of letters can be formed this way, or a longer run such as RRR.
Conditions
1). The letter to the left or right of each letter should be the same letter.
2). If there is no _, then the letters must already be grouped, like RRBBYY, not RBRBYY or RBYRBY etc.
3). There can be more than one underscore _.
With the regular expression I am trying to find out whether a given string can be turned into a pattern of consecutive identical letters by swapping letters with _.
The regular expression which I wrote is
String regEx = "[A-ZA-Z_]";
But this regular expression fails for RBRB, since there is no empty slot to move the letters into and RBRB is not already grouped.
How could I write an effective regular expression to solve this?
Ok, as I understand it, a matching string shall either consist only of same characters being grouped together, or must contain at least one underscore.
So, RRRBBR would be invalid, while RRRRBB, RRRBBR_, and RRRBB_R_ would all be valid.
After comment of question creator, additional condition: Every character must occur 0 or 2 or more times.
As far as I know, this is not possible with Regular Expressions, as Regular Expressions are finite-state machines without "storage". You would have to "store" each character found in the string to check that it won't appear later again.
I would suggest a very simple method for verifying such strings:
public static boolean matchesMyPattern(String s) {
boolean withUnderscore = s.contains("_");
int[] found = new int[26];
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
if (ch != '_' && (ch < 'A' || ch > 'Z')) {
return false;
}
if (ch != '_' && i > 0 && s.charAt(i - 1) != ch && found[ch - 'A'] > 0
&& !withUnderscore) {
return false;
}
if (ch != '_') {
found[ch - 'A']++;
}
}
for (int i = 0; i < found.length; i++) {
if (found[i] == 1) {
return false;
}
}
return true;
}
Please take my answer with a grain of salt, since it's a bit of a "Fastest gun in the West" post.
It follows the same assumptions as Florian Albrecht's answer. (thanks)
I believe that this will solve your problem:
(([A-Za-z])(\2|_)+)+
https://regex101.com/r/7TfSVc/1
It works by using the second capturing group and ensuring that more of it follow, or there are underscores.
Known bug: it does not work if an underscore starts a string.
EDIT
This one is better, though I forgot what I was doing by the end of it.
(([A-Za-z_])(\2|_)+|_+[A-Za-z]_*)+
https://regex101.com/r/7TfSVc/4
Hi I want the opposite function of getNumericValue
int i = Character.getNumericValue('A');
if('A' == Character.someFunction(i)){
System.out.println("hooray");
}
I have tried "Character.forDigit" but this seems to be completely wrong.
I'm new to Java, so please help.
The opposite is Character.forDigit
if(Character.forDigit(Character.getNumericValue('b'), Character.MAX_RADIX) == 'b') {
// true!
}
if(Character.forDigit(Character.getNumericValue('B'), Character.MAX_RADIX) == 'b') {
// true!
}
if(Character.getNumericValue('B') == Character.getNumericValue('b')) {
// true!
}
if((int)('B') == (int)'b') {
// false
}
Although, given your question, I think you're looking for the actual ASCII char code for the letter.
Read the post "Java Character literals value with getNumericValue()" for more information about Character.getNumericValue.
To convert between char and int you can use typecasting. For example:
char myChar = (char) 65;
System.out.println(myChar);
will result in A. Hope that helps!
Character.getNumericValue('A') returns the int value that the character represents as a digit (10 for 'A'), not its Unicode code point. You probably don't want to use that. From the Javadoc:
The letters A-Z in their uppercase ('\u0041' through '\u005A'),
lowercase ('\u0061' through '\u007A'), and full width variant
('\uFF21' through '\uFF3A' and '\uFF41' through '\uFF5A') forms have
numeric values from 10 through 35. This is independent of the Unicode
specification, which does not assign numeric values to these char
values.
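The difference between the numeric value, the char code and Character.forDigit can be seen side by side in the sketch below. The class name NumericValueDemo and the helper digitToLetter are made up for this example.

```java
class NumericValueDemo {
    // Inverse of getNumericValue for values 0..35. forDigit returns lowercase
    // letters for 10..35, so the result is converted to uppercase here.
    static char digitToLetter(int value) {
        return Character.toUpperCase(Character.forDigit(value, Character.MAX_RADIX));
    }

    public static void main(String[] args) {
        System.out.println(Character.getNumericValue('A')); // numeric value as a digit
        System.out.println((int) 'A');                      // char code
        System.out.println((char) 65);                      // back from char code
        System.out.println(digitToLetter(Character.getNumericValue('A')));
    }
}
```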