counting alphabetic characters between 'a' and 'z' - java

I came across some code that checks whether a character is between 'a' and 'z', case insensitive. However, I don't understand what the line after that check does:
alphabets[c - 'a']++;
Could someone please explain this code to me?
alphabets = new int[26];
for (int i = 0; i < str.length(); i++)
{
char c = str.charAt(i);
if ('a' <= c && c <= 'z')
{
alphabets[c - 'a']++; // what does this do?
}
}

This code counts the number of times each lower-case letter appears in the string. alphabets is an array whose first element (index 0) holds the number of 'a's, the second the number of 'b's, and so on.
Subtracting 'a' from the character produces that letter's index, and ++ then increments the counter for that letter.

A char in Java is just a small integer, 16 bits wide. Generally speaking, the values it holds are the values that Unicode [aside: Java does not represent characters as "ASCII"] assigns to characters, but fundamentally, chars are just integers. Thus 'a' is the integer 0x0061, which can also be written as 97.
So, if you have a value in the range 'a' to 'z', you have a value in the range 97 to 122. Subtracting 'a' (subtracting 97) puts it in the range 0 to 25, which is suitable for indexing the 26-element array alphabets.
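As a rough sketch of the same idea, the counting loop can be extended to fold upper-case letters into the same 26 counters. The class name LetterCounter and the method countLetters are made up for this illustration and are not part of the original code; only the ASCII letters A-Z and a-z are counted.

```java
class LetterCounter {
    // Hypothetical helper: index 0 counts 'a'/'A', index 25 counts 'z'/'Z'.
    static int[] countLetters(String str) {
        int[] counts = new int[26];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if ('a' <= c && c <= 'z') {
                counts[c - 'a']++;      // 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
            } else if ('A' <= c && c <= 'Z') {
                counts[c - 'A']++;      // same mapping for upper-case letters
            }
            // Anything else (digits, spaces, punctuation) is ignored.
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countLetters("Hello, World");
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                // Adding the index back to 'a' recovers the letter.
                System.out.println((char) ('a' + i) + ": " + counts[i]);
            }
        }
    }
}
```

Note that going back from an index to a letter is the reverse operation: add 'a' and cast the int result to char.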

What is "charAt(i) - 'a'" means in Trie structure? [duplicate]

I am reading about the search function for a Trie data structure, but I don't understand why the code subtracts the character 'a' to get the index. Can anyone help? Thanks in advance!
// Returns true if key is present in the trie, else false
static boolean search(String key)
{
int level;
int length = key.length();
int index;
TrieNode pCrawl = root;
for (level = 0; level < length; level++)
{
index = key.charAt(level) - 'a';
if (pCrawl.children[index] == null)
return false;
pCrawl = pCrawl.children[index];
}
return (pCrawl != null && pCrawl.isEndOfWord);
}
Assuming key contains only lower case English letters, key.charAt(i) - 'a' maps each lower case letter to an index between 0 (for 'a') and 25 (for 'z').
The children array probably has a length of 26, and each element of that array corresponds to a letter between 'a' and 'z'.
In Java, whenever you subtract one character from another, both are promoted to int (their character codes) and the result is the difference of those codes. For example, the code of 'a' is 97 and the code of 'b' is 98, so 'b' - 'a' evaluates to 1.
In your code, the method computes this difference between each character of the string and 'a', and uses it as the index.
char variables are actually integral, reflecting the Unicode value of the corresponding char. 'a' is thus in fact 97; 'b' is 98 etc. Subtracting 97 from a character will translate characters between 'a' and 'z' to numbers between 0 and 25.
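To make the indexing concrete, here is a minimal trie sketch under the same assumption (keys contain only 'a' to 'z'). The class SimpleTrie and its Node type are invented for this example; the question's TrieNode class is not shown, so this is only a guess at its general shape.

```java
class SimpleTrie {
    private static final int ALPHABET_SIZE = 26;

    // Hypothetical node type: one child slot per lower-case letter.
    private static final class Node {
        final Node[] children = new Node[ALPHABET_SIZE];
        boolean isEndOfWord;
    }

    private final Node root = new Node();

    // Assumes key contains only 'a'..'z'; any other character would
    // produce an index outside 0..25 and throw ArrayIndexOutOfBoundsException.
    void insert(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            int index = key.charAt(i) - 'a';   // 'a' -> 0, ..., 'z' -> 25
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
        node.isEndOfWord = true;
    }

    boolean search(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            int index = key.charAt(i) - 'a';
            if (node.children[index] == null) {
                return false;                  // no path for this letter
            }
            node = node.children[index];
        }
        return node.isEndOfWord;               // a prefix alone does not count
    }
}
```

The point of the subtraction is that the letter itself becomes the array position, so no lookup structure is needed to find the right child.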

How do I get the numerical value/position of a character in the alphabet (1-26) in constant time (O(1)) without using any built in method or function?

How do I get the numerical value/position of a character in the alphabet (1-26) in constant time (O(1)) without using any built in method or function and without caring about the case of the character?
If your compiler supports binary literals you can use
int value = 0b00011111 & character;
If it does not, you can use 31 instead of 0b00011111 since they are equivalent.
int value = 31 & character;
or if you want to use hex
int value = 0x1F & character;
or in octal
int value = 037 & character;
You can use any way to represent the value 31.
This works because in ASCII, lowercase letters are prefixed with 011 and uppercase letters with 010, followed by the binary equivalent of 1-26.
By using the bitmask 00011111 and the AND operator, we clear the 3 most significant bits. This leaves us with 00001 to 11010, that is, 1 to 26.
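A small Java sketch of the mask, assuming the input really is an ASCII letter (for any other character the result is not meaningful). The class and method names are made up for the illustration.

```java
class AlphabetPosition {
    // Hypothetical helper: keeps only the low five bits of the character code.
    // Assumes c is in 'A'..'Z' or 'a'..'z'.
    static int position(char c) {
        return c & 0x1F;   // 0x1F == 31 == 0b00011111
    }

    public static void main(String[] args) {
        System.out.println(position('a')); // 0x61 & 0x1F = 1
        System.out.println(position('A')); // 0x41 & 0x1F = 1
        System.out.println(position('Z')); // 0x5A & 0x1F = 26
    }
}
```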
Adding to the very good (self) answer of Charles Staal.
Assuming ASCII encoding, the following will work (updated following the kind comment of Yves Daoust):
int Get1BasedIndex(char ch) {
return ( ch | ('a' ^ 'A') ) - 'a' + 1;
}
This forces the character to lowercase (by setting the 0x20 bit) and then computes the 1-based index.
However a more readable solution (O(1)) is:
int Get1BasedIndex(char ch) {
return ('a' <= ch && ch <= 'z') ? ch - 'a' + 1 : ch - 'A' + 1;
}
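One more solution that is constant time but requires some extra memory is:
#include <algorithm>  // std::fill_n

static int cha[256];
static void init() {
    // Mark every entry as "not a letter" first.
    std::fill_n(cha, 256, -1);
    int code = 1;
    for (char s = 'a', l = 'A'; s <= 'z'; ++s, ++l) {
        cha[static_cast<unsigned char>(s)] = cha[static_cast<unsigned char>(l)] = code++;
    }
}
int Get1BasedIndex(char ch) {
    // Cast so that a negative char value cannot index before the table.
    return cha[static_cast<unsigned char>(ch)];
}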
We can take the character's ASCII value and subtract the ASCII value of the starting character ('a' is 97, 'A' is 65):
char ch = 'a';
if(ch >=65 && ch <= 90)//if capital letter
System.out.println((int)ch - 65);
else if(ch >=97 && ch <= 122)//if small letters
System.out.println((int)ch - 97);
Strictly speaking it is not possible to do it portably in C/C++ because there is no guarantee on the ordering of the characters.
This said, with a contiguous sequence, Char - 'a' and Char - 'A' obviously give you the position of a lowercase or uppercase letter, and you could write
Ord= 'a' <= Char && Char <= 'z' ? Char - 'a' :
('A' <= Char && Char <= 'Z' ? Char - 'A' : -1);
If you want to favor efficiency over safety, exploit the binary representation of ASCII codes and use the branchless
#define ToLower(Char) ((Char) | 0x20)
Ord= ToLower(Char) - 'a';
(the output for non-letter character is considered unspecified).
Contrary to the specs, these snippets return the position in range [0, 25], more natural with zero-based indexing languages.

charvariable=(char)(charvariable+3) what does this syntax mean?

I have been looking around on the internet at caesar ciphers and while I understand the loop I don't understand why this line of code is able to shift a char to another char? I don't understand this line here:
letter = (char)(letter - 26);
When I take (char) out it doesn't work and I have never seen it with the type being in parentheses followed by an operation.
Hopefully this is an easy question and thanks for the help.
for (int i = 0; i < buffer.Length; i++)
{
// Letter.
char letter = buffer[i];
// Add shift to all.
letter = (char)(letter + shift);
// Subtract 26 on overflow.
// Add 26 on underflow.
if (letter > 'z')
{
//The following line is the line I don't understand. Why char in parentheses then another parentheses?
letter = (char)(letter - 26);
}
else if (letter < 'a')
{
letter = (char)(letter + 26);
}
// Store.
buffer[i] = letter;
}
(char) is a cast. That means that it takes a value which is of one type, and converts it to a value of another type. Thus, if x is an int, (double)x yields a double whose value is the same value as the integer value.
The reason (char) is necessary in this expression is that Java does all its integer arithmetic on values of type int or long. So even though letter is a char, in the expression letter + 26, letter will be automatically converted to an int, and then 26 is added to the integer. (char) converts it back to a char type (which is an integer value from 0 to 65535). Java will not automatically convert a larger integer type (int, whose values are from -2147483648 to 2147483647) to a shorter integer type (char), therefore it's necessary to use a cast.
However, Java does allow this:
letter += 26;
which has the same effect, and does not require a cast.
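To see both forms side by side, here is a short sketch. The class and method names are invented for the example; the behaviour shown (promotion to int, explicit cast, implicit narrowing in a compound assignment) is standard Java.

```java
class CharCastDemo {
    // letter + amount has type int, so an explicit cast back to char is required.
    static char shiftWithCast(char letter, int amount) {
        return (char) (letter + amount);
    }

    // A compound assignment performs the narrowing conversion implicitly.
    static char shiftWithCompound(char letter, int amount) {
        letter += amount;
        return letter;
    }

    public static void main(String[] args) {
        char letter = 'a';
        int asInt = letter + 3;                           // int arithmetic: 97 + 3
        System.out.println(asInt);                        // prints 100
        System.out.println(shiftWithCast(letter, 3));     // prints d
        System.out.println(shiftWithCompound(letter, 3)); // prints d
        // char broken = letter + 3;  // does not compile: int cannot be assigned to char
    }
}
```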
There are 26 letters in the English alphabet, and char is an integral type:
char ch = 'Z' - 25;
System.out.println(ch); // <-- A
JLS-4.2.1 - Integral Types and Values says (in part),
For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

Character subtraction in String

Here is the code snippet; I am trying to figure out what's happening:
String line = "Hello";
for (int i = 0; i < line.length(); i++) {
char character = line.charAt(i);
int srcX = 0;
if (character == '.') {
}else{
srcX = (character - '0') * 20;
System.out.println("Character is " + (character - '0') +" " + srcX);
}
}
Executing that code produces this output:
Character is 24 480
Character is 53 1060
Character is 60 1200
Character is 60 1200
Character is 63 1260
How can a character minus the character '0' result in an integer? And what does the system base its answer on to get 24, 53, 60, 60, 63?
You are allowed to subtract characters because char is an integer type.
The value of a character is the value of its codepoint (more or less, the details are tricky due to Unicode and UTF-16 and all that).
When you subtract the character '0' from another character, you are essentially subtracting 48, the code point of the character DIGIT ZERO.
So, for example, something like '5' - '0' would evaluate to 53 - 48 = 5. You commonly see this pattern when "converting" strings containing digits to numeric values. It is not common to subtract '0' from a character like 'H' (whose codepoint is 72), but it is possible and Java does not care. It simply treats characters like integers.
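The digit-conversion pattern mentioned above can be sketched like this. DigitParser and parse are hypothetical names, and the sketch assumes the string contains only the characters '0' to '9' and that the value fits in an int.

```java
class DigitParser {
    // Hypothetical helper: builds the number one decimal digit at a time.
    // Assumes every character is in '0'..'9'; no validation or overflow check.
    static int parse(String digits) {
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            int digit = digits.charAt(i) - '0';   // '0' -> 0, ..., '9' -> 9
            value = value * 10 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("480"));   // prints 480
        System.out.println('5' - '0');      // prints 5  (53 - 48)
        System.out.println('H' - '0');      // prints 24 (72 - 48)
    }
}
```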
http://www.ascii.cl/
'0' is 48 in ASCII.
'H' is 72.
Therefore 72 - 48 gives you 24.

Conversion from ASCII values to Char

String source = "WEDGEZ"
char letter = source.charAt(i);
shift=5;
for (int i=0;i<source.length();i++){
if (source.charAt(i) >=65 && source.charAt(i) <=90 )
letterMix =(char)(('D' + (letter - 'D' + shift) % 26));
}
Ok what I'm trying to do is take the string WEDGEZ, and shift each letter by 5, so W becomes B and E becomes J, etc. However I feel like there is some inconsistency with the numbers I'm using.
For the if statement, I'm using ASCII values, and for the
letterMix= statement, I'm using the numbers from 1-26 (I think). Well actually, the question is about that too:
What does
(char)(('D' + (letter - 'D' + shift) % 26)); return anyway? It returns a char right, but converted from an int. I found that statement online somewhere I didn't compose it entirely myself so what exactly does that statement return.
The general problem with this code is that for W it returns '/' and for Z it returns _, which I'm guessing means it's using the ASCII values. I really don't know how to approach this.
Edit: New code
for (int i=0;i<source.length();i++)
{
char letter = source.charAt(i);
letterMix=source.charAt(i);
if (source.charAt(i) >=65 && source.charAt(i) <=90 ){
letterMix=(char)('A' + ( ( (letter - 'A') + input ) % 26));
}
}
Well, I'm not sure if this is homework, so I'll be stingy with the code.
You're writing a Caesar cipher with a shift of 5.
To address your Z -> _ problem: I'm assuming you want all the letters to be changed into encoded letters (and not weird symbols). The problem is that the ASCII values of A-Z lie between 65 and 90.
When encoding Z, for example, you end up adding 5 to it, which gives you the value 95 (_).
What you need to do is wrap around the alphabet. First isolate the relative position of the character in the alphabet (i.e. A = 0, B = 1, ...) by subtracting 65 (the ASCII value of A). Then add your shift and apply modulus 26. This causes the value to wrap around.
E.g., if you're encoding Z (ASCII 90), the relative position is 25 (= 90 - 65).
Now 25 + 5 = 30, but you need the value to be below 26, so you take it modulo 26:
30 % 26 is 4, which is E.
So here it is:
char letter = message.charAt(i);
int relativePosition = letter - 'A'; // 0-25
int encode = (relativePosition + shift) % 26;
char encodedChar = (char) (encode + 'A'); // convert it back to ASCII
So in one line,
char encodedChar = (char) ('A' + ((letter - 'A') + shift) % 26);
Note: this will work only for upper case; if you're planning to use lower case, you'll need some extra processing.
You can use Character.isUpperCase() to check for upper case.
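For reference, one possible way to put the pieces together and also cover lower case is sketched below. The class CaesarCipher and the method encode are invented for this example. It uses plain range checks instead of Character.isUpperCase() so that only ASCII letters are shifted, and Math.floorMod instead of % so that a negative shift (for decoding) also wraps correctly; both are choices made here, not something the answer above prescribes.

```java
class CaesarCipher {
    // Hypothetical helper: shifts ASCII letters by `shift`, leaves everything else alone.
    static String encode(String message, int shift) {
        StringBuilder out = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if ('A' <= c && c <= 'Z') {
                out.append(shiftFrom('A', c, shift));
            } else if ('a' <= c && c <= 'z') {
                out.append(shiftFrom('a', c, shift));
            } else {
                out.append(c);   // digits, spaces, punctuation stay unchanged
            }
        }
        return out.toString();
    }

    private static char shiftFrom(char base, char c, int shift) {
        int relativePosition = c - base;                           // 0..25
        int wrapped = Math.floorMod(relativePosition + shift, 26); // stays in 0..25
        return (char) (base + wrapped);                            // back to a letter
    }

    public static void main(String[] args) {
        System.out.println(encode("WEDGEZ", 5));    // prints BJILJE
        System.out.println(encode("BJILJE", -5));   // prints WEDGEZ
    }
}
```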
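You can try this code to see the ASCII value of a char:
import java.util.Scanner;

class Ascii {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        // Read the first character of the next whitespace-delimited token.
        char ch = sc.next().charAt(0);
        // Assigning a char to an int widens it to its character code.
        int in = ch;
        System.out.println(in);
    }
}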
