How to determine a string is english or arabic? - java

Is there a way to determine a string is English or Arabic?

Here is a simple logic that I just tried:
public static boolean isProbablyArabic(String s) {
for (int i = 0; i < s.length();) {
int c = s.codePointAt(i);
if (c >= 0x0600 && c <= 0x06E0)
return true;
i += Character.charCount(c);
}
return false;
}
It declares the text Arabic if and only if at least one Arabic Unicode code point is found in the text. You can extend this logic to suit your needs.
The range 0x0600-0x06E0 covers most of the Arabic letters and symbols; the full Arabic block runs from 0x0600 to 0x06FF (see the Unicode code charts).

Java itself supports checking characters by Unicode block, and Arabic is covered. A shorter and simpler way to do the same check is with Character.UnicodeBlock:
public static boolean textContainsArabic(String text) {
for (char charac : text.toCharArray()) {
if (Character.UnicodeBlock.of(charac) == Character.UnicodeBlock.ARABIC) {
return true;
}
}
return false;
}
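As a variation on the block check above, here is a minimal sketch that tests the script of each code point with Character.UnicodeScript (available since Java 7) instead of the block. The class and method names are made up for illustration. Scripts group characters by writing system rather than by code range, so punctuation shared between scripts may not be reported as Arabic.

```java
final class ArabicScriptCheck {
    // Returns true if at least one code point in the text belongs to the Arabic script.
    static boolean containsArabicScript(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC);
    }

    public static void main(String[] args) {
        System.out.println(containsArabicScript("hello"));
        System.out.println(containsArabicScript("hello مرحبا"));
    }
}
```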

A minor change to cover the full range of Arabic characters and symbols. Note that it returns true only if every non-space character is Arabic:
private boolean isArabic(String text){
String textWithoutSpace = text.trim().replaceAll(" ",""); // ignore spaces
for (int i = 0; i < textWithoutSpace.length();) {
int c = textWithoutSpace.codePointAt(i);
// the Arabic block runs from 0x0600 to 0x06FF
// presentation forms such as the ligature 'لا' are a special case in the range 0xFE70 to 0xFEFF
if ((c >= 0x0600 && c <= 0x06FF) || (c >= 0xFE70 && c <= 0xFEFF))
i += Character.charCount(c);
else
return false;
}
return true;
}

You can usually tell by the code points within the string itself. Arabic occupies certain blocks in the Unicode code space.
It's a fairly safe bet that, if a substantial proportion of the characters exist in those blocks (such as بلدي الحوامات مليء الثعابينة), it's Arabic text.
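The "substantial proportion" idea could be sketched roughly as follows. This is only an illustration: the class and method names are invented, only the main Arabic block is counted, and the 0.5 threshold is an arbitrary assumption that you would tune for your data.

```java
final class ArabicRatio {
    // Fraction of letter code points that fall in the main Arabic block (U+0600..U+06FF).
    static double arabicLetterRatio(String text) {
        long letters = text.codePoints().filter(Character::isLetter).count();
        if (letters == 0) {
            return 0.0; // no letters at all, so nothing to classify
        }
        long arabic = text.codePoints()
                .filter(Character::isLetter)
                .filter(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC)
                .count();
        return (double) arabic / letters;
    }

    // Assumption: "more than half of the letters" counts as a substantial proportion.
    static boolean isMostlyArabic(String text) {
        return arabicLetterRatio(text) > 0.5;
    }
}
```

Counting only letters keeps spaces, digits and punctuation from diluting the ratio in mixed text.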

The isProbablyArabic answer is mostly correct, but when a string mixes Farsi and English letters it still returns true, which is wrong.
Here is the same method, modified so that it returns true only when every character is in the Arabic range:
public static boolean isProbablyArabic(String s) {
for (int i = 0; i < s.length();) {
int c = s.codePointAt(i);
if (!(c >= 0x0600 && c <= 0x06E0))
return false;
i += Character.charCount(c);
}
return true;
}

You could use N-gram-based text categorization (google for that phrase) but it is not a fail-proof technique, and it may require a not too short string.
You might also decide that a string with only ASCII letters is not Arabic.
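The ASCII shortcut mentioned above might look like this minimal sketch (the names are hypothetical). It can only rule Arabic out; a string that fails the check still needs one of the other tests.

```java
final class AsciiCheck {
    // True when every char is in the 7-bit ASCII range, so the text cannot contain Arabic.
    static boolean isAsciiOnly(String text) {
        return text.chars().allMatch(c -> c < 128);
    }
}
```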

English characters tend to be in these 4 Unicode blocks:
BASIC_LATIN
LATIN_1_SUPPLEMENT
LATIN_EXTENDED_A
GENERAL_PUNCTUATION
public static boolean isEnglish(String text) {
for (char character : text.toCharArray()) {
Character.UnicodeBlock block = Character.UnicodeBlock.of(character);
if (block != Character.UnicodeBlock.BASIC_LATIN
&& block != Character.UnicodeBlock.LATIN_1_SUPPLEMENT
&& block != Character.UnicodeBlock.LATIN_EXTENDED_A
&& block != Character.UnicodeBlock.GENERAL_PUNCTUATION) {
// one character outside the four blocks is enough to reject the text
return false;
}
}
// an empty string contains no English characters at all
return !text.isEmpty();
}

Just an adaptation of existing answer to Kotlin:
fun String.textContainsArabic(): Boolean =
any { Character.UnicodeBlock.of(it) == Character.UnicodeBlock.ARABIC }

I tried this in my code and it works fine.
It uses codePointAt, a method that returns the Unicode code point of the character at the specified index in a string.
public static boolean isItArabic(String someText)
{
for(int i = 0; i<someText.length(); i++)
{
int point = someText.codePointAt(i);
if(!(point >= 1536 && point <= 1791)) {
return false;
}
}
return true;
}

Try this (C#):
internal static bool ContainsArabicLetters(string text)
{
foreach (char character in text.ToCharArray())
{
if (character >= 0x600 && character <= 0x6ff)
return true;
if (character >= 0x750 && character <= 0x77f)
return true;
if (character >= 0xfb50 && character <= 0xfc3f)
return true;
if (character >= 0xfe70 && character <= 0xfefc)
return true;
}
return false;
}

Related

Registering an invalid piece of data from a pattern

I need to use a six letter word and if it has a sequence of letter number letter number letter number, then it will be a valid piece of data. Otherwise, it will be considered invalid. The problem with my code is, it always runs it as valid. Here is my code:
vstatus=false;
char a=pcode.charAt(0);
char b=pcode.charAt(1);
char c=pcode.charAt(2);
char d=pcode.charAt(3);
char e=pcode.charAt(4);
char f=pcode.charAt(5);
if(!Character.isLetter(a)) vstatus=true;
if(!Character.isDigit(b)) vstatus=true;
if(!Character.isLetter(c)) vstatus=true;
if(!Character.isDigit(d)) vstatus=true;
if(!Character.isLetter(e)) vstatus=true;
if(!Character.isDigit(f)) vstatus=true;
if (vstatus=true)
{
System.out.println(convertUpperCase(pcode)+" is a valid postal code");
}
if (vstatus=false)
{
System.out.println(convertUpperCase(pcode)+" is not a valid postal code");
}
I guess this would be shorter code for your problem:
String pcode = "aza2a3";
String regex = "[A-Za-z]{1}[\\d]{1}[A-Za-z]{1}[\\d]{1}[A-Za-z]{1}[\\d]{1}";
boolean matches = pcode.matches(regex);
System.out.println(matches);
matches is true if your string has the required form and false if it does not.
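If the check runs many times, the same idea can be written with a precompiled java.util.regex.Pattern. This is a sketch under that assumption; the class, constant and method names are invented for illustration.

```java
import java.util.regex.Pattern;

final class PostalCodeRegex {
    // letter, digit, letter, digit, letter, digit; compiled once and reused
    private static final Pattern POSTAL =
            Pattern.compile("[A-Za-z]\\d[A-Za-z]\\d[A-Za-z]\\d");

    static boolean isValid(String pcode) {
        // matches() requires the whole input to fit the pattern, so the length is checked implicitly
        return pcode != null && POSTAL.matcher(pcode).matches();
    }
}
```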
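In your code, vstatus is set to true as soon as any single character fails its check, so the flag is inverted. In addition, if (vstatus=true) assigns rather than compares (the comparison operator is ==), so the "valid" branch always runs. You need to confirm that all the characters are correct. Use &&.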
String pcode = "i8i8i8";
char a=pcode.charAt(0);
char b=pcode.charAt(1);
char c=pcode.charAt(2);
char d=pcode.charAt(3);
char e=pcode.charAt(4);
char f=pcode.charAt(5);
if (Character.isLetter(a) && Character.isDigit(b) && Character.isLetter(c) && Character.isDigit(d) && Character.isLetter(e) && Character.isDigit(f)) {
System.out.println("valid");
} else {
System.out.println("not a valid postal code");
}
Or, in a loop:
String pcode = "i8i8i8";
boolean flag = pcode.length() == 6;
for (int i = 0; i < pcode.length(); i++) {
char ch = pcode.charAt(i);
// even positions must hold a letter, odd positions a digit
if (!(i % 2 == 0 ? Character.isLetter(ch) : Character.isDigit(ch))) {
flag = false;
}
}
if (flag) { System.out.println("valid"); }
else { System.out.println("not valid"); }
Ideally, extract the code into a method:
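static boolean isPcodeValid(String s) {
if (s.length() != 6) {
return false;
}
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
// even positions must hold a letter, odd positions a digit
if (!(i % 2 == 0 ? Character.isLetter(ch) : Character.isDigit(ch))) {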
return false;
}
}
return true;
}
Or, use some Java8+ features:
// requires: import java.util.stream.IntStream;
static boolean isPcodeValid(String s) {
return s.length() == 6 && IntStream.range(0, s.length())
.allMatch(i -> i % 2 == 0 ? Character.isLetter(s.charAt(i))
: Character.isDigit(s.charAt(i)));
}
Finally, you may use this regex:
String r = "[A-Za-z]\\d[A-Za-z]\\d[A-Za-z]\\d";
if (pcode.matches(r)) {
// valid
} else {
// invalid
}
Five different ways; choose the one that suits you best.

Method to Check Password in Java Not Working

I'm trying to write a method that returns if the string is or isn't a valid password in CodeHS.
It needs to be at least eight characters long and can only have letters and digits.
In the grader, it passes every test except for passwordCheck("codingisawesome") and passwordCheck("QWERTYUIOP").
Here's what I have so far:
public boolean passwordCheck(String password)
{
if (password.length() < 8)
{
return false;
}
else
{
char c;
int count = 0;
for (int i = 0; i < password.length(); i++)
{
c = password.charAt(i);
if (!Character.isLetterOrDigit(c))
{
return false;
} else if (Character.isDigit(c))
{
count++;
}
}
if (count < 2)
{
return false;
}
}
return true;
}
If anyone can help, I'd appreciate it. Thanks.
Try an approach using patterns (this is simpler than looping):
public boolean passwordCheck(String password)
{
return password!=null && password.length()>=8 && password.matches("[A-Za-z0-9]*");
}
Decent tutorial on regular expressions (that's where the A-Z magic comes from): http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Assuming your requirement is as stated
It needs to be at least eight characters long and can only have letters and digits
Then there is no need to count digits. Simply check that the password is the minimum length, then loop over every character returning false if any are not a letter or digit. Like,
public boolean passwordCheck(String password) {
if (password != null && password.length() >= 8) {
for (char ch : password.toCharArray()) {
if (!Character.isLetterOrDigit(ch)) {
return false;
}
}
return true;
}
return false;
}
It's failing those tests because your code requires the password to contain at least 2 digits:
if (count < 2)
{
return false;
}
And your test strings don't have any. Remove this piece of code and it should work. For a better way of doing it, see other answers.

Trying to return true if all the letters in a string are the same

What I have so far:
public boolean allSameLetter(String str)
{
for (int i = 1; i < str.length(); i++)
{
int charb4 = i--;
if ( str.charAt(i) != str.charAt(charb4))
{
return false;
}
if ( i == str.length())
{
return true;
}
}
}
Please excuse any inefficiencies if any; still relatively new to coding in general. Am I lacking some knowledge in terms of using operators and .charAt() together? Is it illogical? Or is my error elsewhere?
Using regex:
return str.matches("^(.)\\1*$");
Using streams:
return str.chars().allMatch(c -> c == str.charAt(0));
Other:
return str.replace(String.valueOf(str.charAt(0)), "").length() == 0;
You can follow the below steps:
(1) Get the first character (i.e., 0th index)
(2) Check the first character is the same with subsequent characters, if not return false (and comes out from method)
(3) If all chars match i.e., processing goes till the end of the method and returns true
public boolean allSameLetter(String str) {
char c1 = str.charAt(0);
for(int i=1;i<str.length();i++) {
char temp = str.charAt(i);
if(c1 != temp) {
//if chars does NOT match,
//just return false from here itself,
//there is no need to verify other chars
return false;
}
}
//As it did NOT return from above if (inside for)
//it means, all chars matched, so return true
return true;
}
As Andrew said, you are decreasing i within your for loop. You can fix this by changing it to int charb4 = i - 1;. As for making your code more efficient you could condense it down to this.
public boolean allSameLetter(String str) {
for(char c : str.toCharArray())
if(c != str.charAt(0)) return false;
return true;
}
Comment if you don't understand a part of it :)
public boolean allSameLetter(String str)
{
for (int i = 0; i < str.length() - 1; i++)
{
if ( str.charAt(i) != str.charAt(i+1))
{
return false;
}
}
return true;
}
The -1 is there because each iteration compares the current character with the next one, so the loop has to stop one position early.
If the if statement inside the loop is never entered, execution reaches the end of the method and returns true.
You have to create a for loop that runs up to the length of the String minus 1. That way the program will not crash on a 3-letter word by trying to read a 4th letter. This is what works for me:
public boolean allSameLetter(String str)
{
for(int i = 0; i< str.length()-1; i++){
if (str.charAt(i) != str.charAt(i+1)){
return false;
}
}
return true;
}
// collect the distinct characters; exactly one means all letters are the same
return s.chars().boxed().collect(Collectors.toSet()).size() == 1;
This should be enough (it needs import java.util.stream.Collectors).
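The bug is caused by
int charb4 = i--;
This line is equivalent to
int charb4 = i;
i=i-1;
Because of this, i is decremented in every iteration and then incremented again by the loop, so the loop never advances past the first two characters and, if they are equal, never stops.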
The easiest way to fix this
public boolean allSameLetter(String str)
{
for (int i = 1; i < str.length(); i++)
{
if ( str.charAt(i) != str.charAt(i-1))
{
return false;
}
}
return true;
}
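To see the difference between i-- and i - 1 in isolation, here is a small sketch (the class and method names are invented for illustration). Each helper returns the value stored in the second variable together with the value of i afterwards.

```java
final class PostDecrementDemo {
    // Mirrors "int charb4 = i--;": charb4 receives the old value, then i is decremented.
    static int[] postDecrement(int i) {
        int charb4 = i--;
        return new int[] {charb4, i};
    }

    // Mirrors "int charb4 = i - 1;": i itself is left unchanged.
    static int[] minusOne(int i) {
        int charb4 = i - 1;
        return new int[] {charb4, i};
    }

    public static void main(String[] args) {
        int[] a = postDecrement(1);
        int[] b = minusOne(1);
        System.out.println("i--   : charb4=" + a[0] + ", i=" + a[1]);
        System.out.println("i - 1 : charb4=" + b[0] + ", i=" + b[1]);
    }
}
```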

Check a string for consonants

I want to write a method to check a string for consonants using either .contains or .indexOf.
I guess I could do it the long way and check for every consonant in the alphabet but I know there is a better way. This is what I have so far but like I said this is sort of the long way, I think.
public boolean containsConsonant(String searchString) {
if(searchString.contains("b") || searchString.contains("c")){
return true;
}
I think a simple for loop is most readable here, you can test that a character is within the desired range with a boolean and. And you can use an or test to skip vowels. Something like,
public boolean containsConsonant(String searchString) {
if (searchString == null) {
return false;
}
for (char ch : searchString.toCharArray()) {
char lower = Character.toLowerCase(ch);
if (lower >= 'a' && lower <= 'z') {
if (lower == 'a' || lower == 'e' || lower == 'i' ||
lower == 'o' || lower == 'u') continue;
return true;
}
}
return false;
}
Optimization
You could then optimize the above (and directly to your question) by using contains on an extracted constant String of vowels. Something like,
private static final String vowels = "aeiou";
public static boolean containsConsonant(final String searchString) {
if (searchString == null) {
return false;
}
for (char ch : searchString.toCharArray()) {
char lower = Character.toLowerCase(ch);
if (lower >= 'a' && lower <= 'z' && !vowels.contains(String.valueOf(lower))) {
return true;
}
}
return false;
}
I see that you explicitly asked for contains or indexOf.
If you can use matches instead, it is very easy to implement:
public boolean containsConsonant(String searchString){
String consonants = ".*[bcdfghj].*"; //list the characters to be checked
return searchString.matches(consonants);
}
You can create, say, an array containing all the consonants and then check each one in a loop, e.g.
String[] consonants = {"b", "c", /* ... */};
boolean containsConsonants(String searchString, String[] arr) {
for (String consonant : arr) {
if (searchString.contains(consonant)) return true;
}
return false;
}
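Since the question asks for indexOf, one possible sketch keeps the consonants in a single constant String and looks each character up in it. The names are hypothetical, and treating 'y' as a consonant is an assumption made here.

```java
import java.util.Locale;

final class ConsonantCheck {
    // Assumption: 'y' counts as a consonant.
    private static final String CONSONANTS = "bcdfghjklmnpqrstvwxyz";

    static boolean containsConsonant(String searchString) {
        if (searchString == null) {
            return false;
        }
        for (char ch : searchString.toLowerCase(Locale.ROOT).toCharArray()) {
            // indexOf returns -1 when the character is not in the consonant list
            if (CONSONANTS.indexOf(ch) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```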

Writing a method to remove vowels in a Java String [duplicate]

I am a beginner at programming and am writing a Java method to remove vowels from Strings, but I do not know how to fix this error: ";" expected:
public String disemvowel(String s) {
boolean isVowel(char c);
if (c == 'a') {
return true;
} else if if (c == 'e') {
return true;
} else if if (c == 'i') {
return true;
} else if if (c == 'o') {
return true;
} else if if (c == 'u') {
return true;
}
String notVowel = "";
int l = s.length();
for (int z = 0; z <= l; z++) {
if (isVowel == "false") {
char x = s.charAt(z);
notVowel = notVowel + x;
}
}
return notVowel;
}
String str= "Your String";
str= str.replaceAll("[AEIOUaeiou]", "");
System.out.println(str);
A much simpler approach would be to do the following:
String string = "A really COOL string";
string = string.replaceAll("[AaEeIiOoUu]", "");
System.out.println(string);
This will apply the regular expression, [AaEeIiOoUu] to string. This expression will match all vowels in the character group [AaEeIiOoUu] and replace them with "" empty string.
You've got a lot of syntax errors.
boolean isVowel(char c); - not sure what you're doing with this. If you want it as a separate method, separate it out (and don't place a semicolon after it, which would be invalid syntax).
else if if is invalid syntax. If you're doing an else if, then you only need the one if.
Even if the code would compile, for (int z = 0; z <= l; z++) will cause you to step off of the String. Remove the <= in favor of <.
isVowel == "false" is never going to work. You're comparing a String to a boolean. You want !isVowel instead.
Putting the syntax errors aside, think of it like this.
You have a string that contains vowels. You wish to have a string that doesn't contain vowels.
The most straightforward approach is to iterate over the String, placing all non-vowel characters into a separate String, which you then return.
Interestingly enough, the half-method you have there can accomplish the logic of determining whether something is or isn't a vowel. Extract that to its own method. Then, call it in your other method. Do take into account capital letters though.
I leave the rest as an exercise to the reader.
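For readers who want to check their own attempt, the approach described above could be sketched like this. It is one possible shape, not the only one; the class name is invented, and the helper handles capital letters by listing them explicitly.

```java
final class Disemvowel {
    // The vowel test extracted into its own method, covering both cases.
    static boolean isVowel(char c) {
        return "aeiouAEIOU".indexOf(c) >= 0;
    }

    static String disemvowel(String s) {
        StringBuilder notVowel = new StringBuilder();
        // z < s.length(), not <=, so the loop never steps past the end of the String
        for (int z = 0; z < s.length(); z++) {
            char x = s.charAt(z);
            if (!isVowel(x)) {
                notVowel.append(x);
            }
        }
        return notVowel.toString();
    }
}
```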
Here is your code, without changing any logic, but unscrambling the isVowel method:
public String disemvowel(String s) {
// Removed the "isVowel" method from here and moved it below
String notVowel = "";
int l = s.length();
for (int z = 0; z <= l; z++) {
// Note that the "isVowel" method has not been called.
// And note that, when called, isVowel returns a boolean, not a String.
// (And note that, as a general rule, you should not compare strings with "==".)
// So this area needs a lot of work, but we'll start with this
boolean itIsAVowel = isVowel(s.charAt(z));
// (I made the variable name "itIsAVowel" to emphasize that its name has nothing to do with the method name.
// You can make it "isVowel" -- the same as the method -- but that does not in any way change the function.)
// Now take it from there...
if (isVowel == "false") {
char x = s.charAt(z);
notVowel = notVowel + x;
}
}
return notVowel;
}
// You had this line ending with ";"
boolean isVowel(char c) {
if (c == 'a') {
return true;
// Note that you coded "if if" on the lines below -- there should be only one "if" per line, not two
} else if if (c == 'e') {
return true;
} else if if (c == 'i') {
return true;
} else if if (c == 'o') {
return true;
} else if if (c == 'u') {
return true;
}
// You were missing this final return
return false;
}
(Yes, I know this should be a comment, but you can't put formatted code in a comment.)
You could try something like this:
public static String removeVowels(final String string){
final String vowels = "AaEeIiOoUu";
final StringBuilder builder = new StringBuilder();
for(final char c : string.toCharArray())
if(vowels.indexOf(c) < 0)
builder.append(c);
return builder.toString();
}
