What does "charAt(i) - 'a'" mean in a Trie structure? [duplicate]

I am reading about the search function that looks up a key in the Trie data structure, but I don't understand why the code subtracts the character 'a' to get the index. Can anyone help? Thanks in advance!
// Returns true if key is present in the trie, else false
static boolean search(String key)
{
    int level;
    int length = key.length();
    int index;
    TrieNode pCrawl = root;
    for (level = 0; level < length; level++)
    {
        index = key.charAt(level) - 'a';
        if (pCrawl.children[index] == null)
            return false;
        pCrawl = pCrawl.children[index];
    }
    return (pCrawl != null && pCrawl.isEndOfWord);
}

Assuming key contains only lower case English letters, key.charAt(i) - 'a' maps each lower case letter to an index between 0 (for 'a') and 25 (for 'z').
The children array probably has a length of 26, and each element of that array corresponds to a letter between 'a' and 'z'.
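The mapping described above can be sketched as a small, self-contained trie. This is only an illustration under the same assumption (keys contain nothing but 'a' to 'z'); the class name TrieSketch, the constant ALPHABET_SIZE, and the insert method are not from the question and are made up for this example.

```java
// Minimal trie sketch. Assumes every key contains only 'a'..'z';
// any other character would produce an index outside 0..25.
class TrieSketch {
    static final int ALPHABET_SIZE = 26; // hypothetical name: one slot per letter

    static class TrieNode {
        TrieNode[] children = new TrieNode[ALPHABET_SIZE];
        boolean isEndOfWord;
    }

    private final TrieNode root = new TrieNode();

    // Hypothetical helper so that search has something to find.
    void insert(String key) {
        TrieNode node = root;
        for (int level = 0; level < key.length(); level++) {
            int index = key.charAt(level) - 'a'; // 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
            if (node.children[index] == null) {
                node.children[index] = new TrieNode();
            }
            node = node.children[index];
        }
        node.isEndOfWord = true;
    }

    // Same walk as the search function in the question.
    boolean search(String key) {
        TrieNode node = root;
        for (int level = 0; level < key.length(); level++) {
            int index = key.charAt(level) - 'a';
            if (node.children[index] == null) {
                return false;
            }
            node = node.children[index];
        }
        return node.isEndOfWord;
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        trie.insert("the");
        trie.insert("there");
        System.out.println(trie.search("the")); // true
        System.out.println(trie.search("th"));  // false: a prefix, not a stored word
    }
}
```

Each node spends 26 references on its children, which is the price paid for finding the child of a given letter with a single array access.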

In Java, whenever we subtract one character from another, both characters are promoted to int using their character codes, and the result is the difference of those codes. For example, the code of 'a' is 97 and the code of 'b' is 98, so 'b' - 'a' evaluates to 1.
In your code, 'a' is subtracted from each character of the string passed to the method, which gives that character's index.

char variables are actually integral, reflecting the Unicode value of the corresponding char. 'a' is thus in fact 97; 'b' is 98 etc. Subtracting 97 from a character will translate characters between 'a' and 'z' to numbers between 0 and 25.
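A short demonstration of that arithmetic, as a sketch only: the class CharArithmeticDemo and the helper indexOf are names invented for this example.

```java
// Shows that char values take part in arithmetic as plain integers.
class CharArithmeticDemo {
    // Hypothetical helper: index of a lowercase ASCII letter in the alphabet.
    static int indexOf(char c) {
        return c - 'a'; // both operands are promoted to int before subtracting
    }

    public static void main(String[] args) {
        int code = 'a';                   // implicit widening from char to int
        System.out.println(code);         // 97
        System.out.println('b' - 'a');    // 1
        System.out.println(indexOf('z')); // 25
    }
}
```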

Related

counting alphabetic characters between 'a' and 'z'

I came across a code which checks whether a character is between 'a' and 'z' case insensitive. However, I don't understand what the line after that is doing which is:
alphabets[c - 'a']++;
Could someone please explain this code to me?
alphabets = new int[26];
for (int i = 0; i < str.length(); i++)
{
    char c = str.charAt(i);
    if ('a' <= c && c <= 'z')
    {
        alphabets[c - 'a']++; // what does this do?
    }
}

This code counts the number of times each lower-case letter appears in the string. alphabets is an array where the first element (i.e., index 0) holds the number of as, the second the number of bs, etc.
Subtracting 'a' from the character produces the relative index, and then ++ increments the counter for that letter.

A char in Java is just a small integer, 16 bits wide. Generally speaking, the values it holds are the values that Unicode [aside: Java does not represent characters as "ASCII"] assigns to characters, but fundamentally, chars are just integers. Thus 'a' is the integer 0x0061, which can also be written as 97.
So, if you have a value in the range 'a' to 'z', you have a value in the range 97 to 122. Subtracting 'a' (subtracting 97) puts it in the range 0 to 25, which is suitable for indexing the 26-element array alphabets.
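Wrapped in a method, the counting loop from the question might look like the sketch below. The class name LetterCounter and the method name countLowercase are assumptions made for this example.

```java
// Counts how often each lowercase ASCII letter occurs in a string.
class LetterCounter {
    static int[] countLowercase(String str) {
        int[] alphabets = new int[26]; // alphabets[0] counts 'a', alphabets[25] counts 'z'
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if ('a' <= c && c <= 'z') {
                alphabets[c - 'a']++; // c - 'a' is in 0..25 thanks to the range check
            }
        }
        return alphabets;
    }

    public static void main(String[] args) {
        int[] counts = countLowercase("banana");
        System.out.println(counts['a' - 'a']); // 3
        System.out.println(counts['b' - 'a']); // 1
        System.out.println(counts['n' - 'a']); // 2
    }
}
```

The range check matters: without it an uppercase letter or a digit would give a negative index and throw an ArrayIndexOutOfBoundsException.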

How do I get the numerical value/position of a character in the alphabet (1-26) in constant time (O(1)) without using any built in method or function?

How do I get the numerical value/position of a character in the alphabet (1-26) in constant time (O(1)) without using any built in method or function and without caring about the case of the character?

If your compiler supports binary literals you can use
int value = 0b00011111 & character;
If it does not, you can use 31 instead of 0b00011111 since they are equivalent.
int value = 31 & character;
or if you want to use hex
int value = 0x1F & character;
or in octal
int value = 037 & character;
You can use any way to represent the value 31.
This works because in ASCII, lowercase letters are prefixed with 011 and uppercase letters with 010, followed by the binary equivalent of 1-26.
By using the bitmask 00011111 and the AND operator, we clear the 3 most significant bits. This leaves us with 00001 to 11010, i.e. 1 to 26.
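The same mask works unchanged in Java. The sketch below assumes the input is an ASCII letter; the class name AlphabetPositionMask and the method name position are made up for this example.

```java
// Case-insensitive 1-based alphabet position using the 0x1F bitmask.
class AlphabetPositionMask {
    static int position(char letter) {
        // Keep only the low five bits: 'a' (0110 0001) and 'A' (0100 0001) both give 00001.
        return letter & 0x1F;
    }

    public static void main(String[] args) {
        System.out.println(position('a')); // 1
        System.out.println(position('A')); // 1
        System.out.println(position('z')); // 26
        System.out.println(position('Z')); // 26
    }
}
```

The mask does no validation: a non-letter such as '1' (0x31) simply yields 17, so callers that cannot guarantee letters need a separate range check.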

Adding to the very good (self) answer of Charles Staal.
Assuming ASCII encoding, the following will work (updated following the kind comment of Yves Daoust):
int Get1BasedIndex(char ch) {
    return ( ch | ('a' ^ 'A') ) - 'a' + 1;
}
This converts the character to lowercase (by setting bit 0x20) before computing the index.
However a more readable solution (O(1)) is:
int Get1BasedIndex(char ch) {
    return ('a' <= ch && ch <= 'z') ? ch - 'a' + 1 : ch - 'A' + 1;
}
One more solution that is constant time but requires some extra memory is:
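#include <algorithm> // std::fill_n

static int cha[256];
static void init() {
    // Mark every entry as "not a letter" first.
    std::fill_n(cha, 256, -1);
    int code = 1;
    for (char s = 'a', l = 'A'; s <= 'z'; ++s, ++l) {
        cha[(unsigned char)s] = cha[(unsigned char)l] = code++;
    }
}
int Get1BasedIndex(char ch) {
    // Cast so that a signed char never produces a negative index.
    return cha[(unsigned char)ch];
}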

We can get their ASCII values and then subtract the ASCII value of the starting character (a is 97, A is 65):
char ch = 'a';
if (ch >= 65 && ch <= 90) // capital letter
    System.out.println((int) ch - 65);
else if (ch >= 97 && ch <= 122) // small letter
    System.out.println((int) ch - 97);

Strictly speaking it is not possible to do it portably in C/C++ because there is no guarantee on the ordering of the characters.
This said, with a contiguous sequence, Char - 'a' and Char - 'A' obviously give you the position of a lowercase or uppercase letter, and you could write
Ord= 'a' <= Char && Char <= 'z' ? Char - 'a' :
('A' <= Char && Char <= 'Z' ? Char - 'A' : -1);
If you want to favor efficiency over safety, exploit the binary representation of ASCII codes and use the branchless
#define ToLower(Char) ((Char) | 0x20)
Ord= ToLower(Char) - 'a';
(the output for non-letter characters is considered unspecified).
Contrary to the specs, these snippets return the position in range [0, 25], more natural with zero-based indexing languages.
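For comparison, here are both variants written in Java, where char is always a UTF-16 code unit and the ordering of 'a' to 'z' is therefore guaranteed. This is a sketch; the names LetterOrdinal, checkedOrdinal and branchlessOrdinal are invented for the example.

```java
// Zero-based letter position: a checked version and a branchless one.
class LetterOrdinal {
    // Returns 0..25 for ASCII letters of either case, -1 for anything else.
    static int checkedOrdinal(char c) {
        if ('a' <= c && c <= 'z') return c - 'a';
        if ('A' <= c && c <= 'Z') return c - 'A';
        return -1;
    }

    // Sets bit 0x20 (which lowercases ASCII letters), then subtracts 'a'.
    // The result is only meaningful when c is an ASCII letter.
    static int branchlessOrdinal(char c) {
        return (c | 0x20) - 'a';
    }

    public static void main(String[] args) {
        System.out.println(checkedOrdinal('Q'));    // 16
        System.out.println(branchlessOrdinal('Q')); // 16
        System.out.println(checkedOrdinal('5'));    // -1
        System.out.println(branchlessOrdinal('5')); // -44, not a valid position
    }
}
```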

Why does this lead to an ArrayIndexOutOfBoundsException?

There is something that doesn't quite make sense to me. Why does this:
public static int[] countNumbers(String n){
    int[] counts = new int[10];
    for (int i = 0; i < n.length(); i++){
        if (Character.isDigit(n.charAt(i)))
            counts[n.charAt(i)]++;
    }
    return counts;
}
throw an ArrayIndexOutOfBoundsException while this:
public static int[] countNumbers(String n){
    int[] counts = new int[10];
    for (int i = 0; i < n.length(); i++){
        if (Character.isDigit(n.charAt(i)))
            counts[n.charAt(i) - '0']++;
    }
    return counts;
}
does not? The only difference between the two examples is that '0' is subtracted from the index for counts in the second example. If I'm not mistaken, shouldn't the first example work too, since the same value is being checked?
Here are the values being passed to the two methods:
System.out.print("Enter a string: ");
String phone = input.nextLine();
//Array that invokes the count letter method
int[] letters = countLetters(phone.toLowerCase());
//Array that invokes the count number method
int[] numbers = countNumbers(phone);

This is the problem:
counts[n.charAt(i)]++;
n.charAt(i) is a character, which will be converted to an integer. So '0' is actually 48, for example... but your array only has 10 elements.
Note that the working version isn't subtracting 0 - it's subtracting '0', or 48 when converted to an int.
So basically:
Character   UTF-16 code unit   UTF-16 code unit - '0'
'0'         48                 0
'1'         49                 1
'2'         50                 2
'3'         51                 3
'4'         52                 4
'5'         53                 5
'6'         54                 6
'7'         55                 7
'8'         56                 8
'9'         57                 9
The code is still broken for non-ASCII digits though. As it can only handle ASCII digits, it would be better to make that explicit:
for (int i = 0; i < n.length(); i++){
    char c = n.charAt(i);
    if (c >= '0' && c <= '9') {
        counts[c - '0']++;
    }
}
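To make the non-ASCII point concrete, here is a sketch with two variants: one that counts ASCII digits only, and one that also accepts other Unicode decimal digits by using Character.digit(c, 10), which returns -1 for characters that are not digits. The class and method names are assumptions made for this example.

```java
import java.util.Arrays;

// Two ways to count digits without indexing outside a 10-element array.
class DigitCounter {
    // Counts only the ASCII digits '0'..'9'.
    static int[] countAsciiDigits(String n) {
        int[] counts = new int[10];
        for (int i = 0; i < n.length(); i++) {
            char c = n.charAt(i);
            if (c >= '0' && c <= '9') {
                counts[c - '0']++;
            }
        }
        return counts;
    }

    // Also counts non-ASCII decimal digits such as '\u0663' (Arabic-Indic three).
    static int[] countUnicodeDigits(String n) {
        int[] counts = new int[10];
        for (int i = 0; i < n.length(); i++) {
            int d = Character.digit(n.charAt(i), 10); // -1 if not a decimal digit
            if (d >= 0) {
                counts[d]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countAsciiDigits("555-0123")));
        // [1, 1, 1, 1, 0, 3, 0, 0, 0, 0]
        System.out.println(countUnicodeDigits("3\u0663")[3]); // 2
    }
}
```

With Character.isDigit as the guard and c - '0' as the index, the character '\u0663' would pass the check and then index far beyond the end of the array.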

'0' is quite different from 0. '0' is the code of the "zero" character.

The problem is in the line counts[n.charAt(i)]. Here n.charAt(i) may return values larger than 9.

The confusion here is that you're thinking '0' == 0. This is not true. When treated as a number, '0' has the ASCII value for the character 0, which is 48.

Because n.charAt(i) returns a char, which is widened to an int (not boxed) when it is used as an array index. In this case the character '0' has the ASCII value 48.
By subtracting the character '0', you are subtracting the value 48, which brings the index into the range 0-9, provided you've checked that the character is an ASCII digit.

Why doesn't my compare work between char and int in Java?

char c = '0';
int i = 0;
System.out.println(c == i);
Why does this always return false?

Although this question is very unclear, I am pretty sure the poster wants to know why this prints false:
char c = '0';
int i = 0;
System.out.println(c == i);
The answer is because every printable character is assigned a unique code number, and that's the value that a char has when treated as an int. The code number for the character 0 is decimal 48, and obviously 48 is not equal to 0.
Why aren't the character codes for the digits equal to the digits themselves? Mostly because the first few codes, especially 0, are too special to be used for such a mundane purpose.

The char c = '0' has the ascii code 48. This number is compared to s, not '0'. If you want to compare c with s you can either do:
if(c == s) // compare ascii code of c with s
This will be true if c = '0' and s = 48.
or
if(c == s + '0') // compare the digit represented by c
// with the digit represented by s
This will be true if c = '0' and s = 0.
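The different comparisons can be put side by side in a small sketch. The class name CharIntComparison and the helper sameDigit are invented for this example; Character.getNumericValue is a standard library method.

```java
// Comparing a digit character with an int, by code and by digit value.
class CharIntComparison {
    // Hypothetical helper: true if c is the ASCII digit that displays i.
    static boolean sameDigit(char c, int i) {
        return c - '0' == i;
    }

    public static void main(String[] args) {
        char c = '0';
        int i = 0;
        System.out.println(c == i);                              // false: compares 48 with 0
        System.out.println((int) c);                             // 48
        System.out.println(c == i + '0');                        // true: 48 == 0 + 48
        System.out.println(sameDigit(c, i));                     // true
        System.out.println(Character.getNumericValue(c) == i);   // true
    }
}
```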

A char and an int can be compared directly, but the comparison uses the char's numeric code (48 for '0'), not the digit it displays. To compare the digit instead, you can convert the char to a String and then parse that String into an integer:
char c = '0';
int i = 0;
String s = String.valueOf(c);
System.out.println(Integer.parseInt(s) == i);
This prints true.
Hope this helps.

You're saying that s is an Integer and c (from what I see) is a Char, so there you go, that's the problem: an Integer vs. Char comparison.

Conversion from ASCII values to Char

String source = "WEDGEZ"
char letter = source.charAt(i);
shift=5;
for (int i=0;i<source.length();i++){
if (source.charAt(i) >=65 && source.charAt(i) <=90 )
letterMix =(char)(('D' + (letter - 'D' + shift) % 26));
}
OK, what I'm trying to do is take the string WEDGEZ and shift each letter by 5, so W becomes B, E becomes J, etc. However, I feel like there is some inconsistency with the numbers I'm using.
For the if statement, I'm using ASCII values, and for the letterMix= statement, I'm using the numbers from 1-26 (I think). Well, actually, the question is about that too:
What does (char)(('D' + (letter - 'D' + shift) % 26)); return anyway? It returns a char, right, but converted from an int. I found that statement online somewhere and didn't compose it entirely myself, so what exactly does that statement return?
The general problem with this code is that for W it returns '/' and for Z it returns _, which I'm guessing means it's using the ASCII values. I really don't know how to approach this.
Edit: New code
for (int i=0;i<source.length();i++)
{
    char letter = source.charAt(i);
    letterMix=source.charAt(i);
    if (source.charAt(i) >=65 && source.charAt(i) <=90 ){
        letterMix=(char)('A' + ( ( (letter - 'A') + input ) % 26));
    }
}

Well, I'm not sure if this is homework, so I'll be stingy with the code.
You're writing a Caesar cipher with a shift of 5.
To address your Z -> _ problem: I'm assuming you want every letter to be encoded as another letter (and not as a weird symbol). The problem is that the ASCII values of A-Z lie between 65 and 90.
When encoding Z, for example, you end up adding 5 to it, which gives you the value 95 (_).
What you need to do is wrap around the alphabet. First isolate the relative position of the character in the alphabet (i.e. A = 0, B = 1, ...) by subtracting 65 (the ASCII value of A). Then add your shift and apply modulus 26. This causes the value to wrap around.
E.g., if you're encoding Z (ASCII 90), the relative position is 25 (= 90 - 65).
Now, 25 + 5 = 30, but you need the value to stay below 26, so you take it modulo 26:
30 % 26 is 4, which is E.
So here it is:
char letter = message.charAt(i);
int relativePosition = letter - 'A'; // 0-25
int encode = (relativePosition + shift) % 26;
char encodedChar = (char) (encode + 'A'); // convert it back to ASCII
So in one line:
char encodedChar = (char) ('A' + ((letter - 'A' + shift) % 26));
Note: this will work only for upper case; if you're planning to use lower case, you'll need some extra processing.
You can use Character.isUpperCase() to check for upper case.
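Putting the steps together, a complete method might look like the sketch below. It handles both cases with explicit ASCII range checks and leaves every other character untouched; the class name CaesarShift, the method name shift, and the use of Math.floorMod (so that negative shifts also wrap correctly) are choices made for this example, not part of the answer above.

```java
// Caesar shift over ASCII letters; other characters pass through unchanged.
class CaesarShift {
    static String shift(String message, int shift) {
        StringBuilder out = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            char letter = message.charAt(i);
            if ('A' <= letter && letter <= 'Z') {
                out.append(rotate(letter, 'A', shift));
            } else if ('a' <= letter && letter <= 'z') {
                out.append(rotate(letter, 'a', shift));
            } else {
                out.append(letter);
            }
        }
        return out.toString();
    }

    // Relative position (0-25), plus shift, wrapped to 0-25, back to a char.
    private static char rotate(char letter, char base, int shift) {
        int relativePosition = letter - base;
        int encoded = Math.floorMod(relativePosition + shift, 26);
        return (char) (base + encoded);
    }

    public static void main(String[] args) {
        System.out.println(shift("WEDGEZ", 5));  // BJILJE
        System.out.println(shift("BJILJE", -5)); // WEDGEZ
    }
}
```

The plain % operator would be enough for non-negative shifts; floorMod is used here only so that decoding with a negative shift does not produce negative positions.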

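You can try this code to convert between a char and its ASCII value:
import java.util.Scanner;

class Ascii {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        char ch = sc.next().charAt(0); // first character of the next input token
        int in = ch;                   // widening conversion: char to its code
        System.out.println(in);
        System.out.println((char) in); // casting the code back gives the char
    }
}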
