Too many characters in character literal error - java

I'm creating a stylish text app, but in some places I'm getting the error "Too many characters in character literal". I am writing only one letter, but when I paste it, it is converted into several characters like this: "\uD83C\uDD89", while the original letter is "🆉".
Please tell me how to write this correctly.
for (int charOne = 0; charOne <= strBld.length() - 1; charOne++) {
char a = strBld.charAt(charOne);
char newCh = getSpecialCharEighth(a);
strBld.setCharAt(charOne, newCh);
}
private char getSpecialCharEighth(char a) {
char ch = a;
if (ch == 'Z' || ch == 'z') {
ch = '\uD83C\uDD89';
}
return ch;
}

A Java char stores a 16-bit value, i.e. can store 65536 different values. There are currently 137929 characters in Unicode (12.1).
To handle this, Java strings are stored in UTF-16, which is a 16-bit encoding. Most Unicode characters, known as code points, are stored in a single 16-bit value. Some are stored in a pair of 16-bit values, known as surrogate pairs.
This means that a Unicode character may be stored as 2 char "characters" in Java, which means that if you want your code to have full Unicode character support, you can't store a Unicode character in a single char value.
They can be stored in an int variable, where the value is then referred to as a code point in Java. It is however often easier to store them as a String.
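As a minimal sketch of the point above (the class and method names are illustrative, not from the question), a supplementary character such as U+1F189 fits in one int code point but occupies two char values once it is inside a String:

```java
class CodePointDemo {
    // Builds a String from a single code point using Character.toChars,
    // which returns one char for BMP code points and a surrogate pair otherwise.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Counts Unicode code points, as opposed to String.length(),
    // which counts 16-bit char units.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

With this, fromCodePoint(0x1F189) yields "🆉", whose length() is 2 while codePointCount reports 1.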
In your case, you seem to be replacing Unicode characters, so a regex replacement call might be better, e.g.
s = s.replaceAll("[Zz]", "\uD83C\uDD89");
// Or like this if source file is UTF-8
s = s.replaceAll("[Zz]", "🆉");
UPDATE
If you want to keep a method for determining the replacement value, you could do this:
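// quoteReplacement keeps "$" and "\" in the input from being read as group references
s = Pattern.compile(".").matcher(s).replaceAll(mr -> Matcher.quoteReplacement(getSpecialCharEighth(mr.group())));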
private static String getSpecialCharEighth(String s) {
int cp = s.codePointAt(0);
if (cp >= 'A' && cp <= 'Z')
return Character.toString(cp - 'A' + 0x1f170); // "🅰" - "🆉"
if (cp >= 'a' && cp <= 'z')
return Character.toString(cp - 'a' + 0x1f170); // "🅰" - "🆉"
return s;
}
Note: replaceAll(replacer) is Java 9+ and Character.toString(codePoint) is Java 11+.
UPDATE 2
Since the question is tagged android, Java 9 and Java 11 APIs are not available, so here is a Java 7+ solution.
StringBuffer buf = new StringBuffer(s.length() + 16);
Matcher m = Pattern.compile(".").matcher(s);
// quoteReplacement keeps "$" and "\" in the input from being read as group references
while (m.find())
m.appendReplacement(buf, Matcher.quoteReplacement(getSpecialCharEighth(m.group())));
s = m.appendTail(buf).toString();
private static String getSpecialCharEighth(String s) {
int cp = s.codePointAt(0);
if (cp >= 'A' && cp <= 'Z')
return new String(new int[] { cp - 'A' + 0x1f170 }, 0, 1);
if (cp >= 'a' && cp <= 'z')
return new String(new int[] { cp - 'a' + 0x1f170 }, 0, 1);
return s;
}
Result with s = "Hello World!"
🅷🅴🅻🅻🅾 🆆🅾🆁🅻🅳!
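For comparison, here is a sketch of the same mapping without regular expressions. It walks the string one code point at a time; the class name Stylizer and the constant STYLED_A are made up for this illustration, and the calls used (codePointAt, Character.charCount, appendCodePoint) have been in the JDK since Java 5.

```java
class Stylizer {
    // U+1F170 is "🅰", the first letter of the styled range used above (assumed name).
    private static final int STYLED_A = 0x1F170;

    static String stylize(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // advance by 1 or 2 char units
            if (cp >= 'A' && cp <= 'Z') {
                out.appendCodePoint(cp - 'A' + STYLED_A);
            } else if (cp >= 'a' && cp <= 'z') {
                out.appendCodePoint(cp - 'a' + STYLED_A);
            } else {
                out.appendCodePoint(cp); // everything else is copied unchanged
            }
        }
        return out.toString();
    }
}
```

Because no replacement-string syntax is involved, characters such as "$" pass through without any quoting.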

You can't do that with char data type. Use String instead.
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).

Related

Caesar Cipher decryption not working for non-alphabetical characters

I'm completely new to programming and have been tasked with writing a method in Java to decrypt a message encrypted using a Caesar Cipher (without importing any utilities).
The following code was provided for encrypting a message:
public String encrypt(String plainText, int offset) {
String cipher = "";
char[] arr = plainText.toCharArray();
for (int i = 0; i < arr.length; i++) {
int numericalVal = (int) arr[i];
if (Character.isUpperCase(arr[i])) {
cipher += (char) (((numericalVal + offset - 65) % 26) + 65);
} else if (numericalVal == 32) {
cipher += arr[i];
}
else {
cipher += (char) (((numericalVal + offset - 97) % 26) + 97);
}
}
return cipher;
}
My solution must begin with the line public String decrypt(String plainText, int offset) {
This is how I attempted to solve the problem:
public String decrypt(String plainText, int offset) {
String decipher = "";
char[] d_arr = plainText.toCharArray();
for (int i = 0; i < d_arr.length; i++) {
int numericalVal = (int) d_arr[i];
if (Character.isUpperCase(d_arr[i])) {
decipher += (char) ((((numericalVal - offset - 65) % 26 + 26) % 26) + 65);
//to get remainder for negative values too
} else if (numericalVal == 32) {
decipher += d_arr[i];
}
else {
decipher += (char) ((((numericalVal - offset - 97) % 26 + 26) % 26) + 97);
}
}
return decipher;
}
This works when decrypting letters of the alphabet, but non-alphabetical characters are not decrypted properly, and I am unsure what the issue is.
For example:
public static void main(String[] args) {
CaesarCipher C = new CaesarCipher();
System.out.println(C.encrypt("?", 4)); //returns the ] symbol
System.out.println(C.decrypt("]", 4)); //returns the letter s
}
We were told that adjusting the code to ignore non-alphabetical characters entirely was possible but would require more work, so I changed the } else if (numericalVal == 32) { cipher += arr[i]; code in both the encrypt and decrypt methods to } else if (numericalVal < 65 || (numericalVal > 90 && numericalVal < 97) || numericalVal > 122) { cipher += arr[i];.
This circumvented the issue but I was told that it's much easier to just decrypt the non-alphabetical characters as well, so I reverted this change, but now I'm at a complete loss as to how to solve this problem. I feel like I'm missing something very simple as I managed to do it "the hard way" but cannot do it the easier way. I can see that when a non-alphabetical value is encrypted, the alphabetical letters are essentially skipped, but the same is not occurring for the decryption process. I presume this is related to the adjustment I made to find the remainder of negative values, but I am unsure.
What you generally try to do is define your own alphabet rather than using the ABC, and put that in a string (or a char array). Then you replace the - 65 (a magic number that obscures the intent; you could have written e.g. - 'A') by looking up the character in the alphabet.
Then you can perform the modulus operation on the size of the alphabet, i.e. alphabet.length() for strings or alphabet.length for char arrays. Then you perform the modular addition / subtraction, and finally you find the corresponding character in your alphabet again.
Now you have some special code for space and upper / lowercase. That would not work anymore when you'd include special characters. There are two ways around this. The simplest one is to create one big alphabet with uppercase, lowercase and special characters. If you want to keep the case you could also use e.g. 3 separate alphabets.
So you start off with e.g.
private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.;";
or you could use:
private static final String ALPHABET_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static final String ALPHABET_LOWER = "abcdefghijklmnopqrstuvwxyz";
private static final String ALPHABET_SIGNS = ",.;";
I'd say the single alphabet is a bit more secure, since with separate alphabets the signs remain directly recognizable as signs in the ciphertext. Then again, the Caesar cipher was only somewhat secure when almost nobody was able to read in the first place.
Better split your application into methods, have at least:
charToIndex(char c): int;
indexToChar(int i): char;
shiftIndex(int i, int shift): int.
and while we are at it:
public static int mod(int i, int n) {
return ((i % n) + n) % n;
}
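Putting these pieces together, a minimal sketch of the single-alphabet variant could look as follows. The class name AlphabetCaesar and the helper shift are assumptions made for this illustration; characters that are not in the alphabet are copied through unchanged, which is one possible choice rather than something the assignment prescribes.

```java
class AlphabetCaesar {
    // One combined alphabet, as suggested above; extend it with any signs you need.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.;";

    // Always returns a value in 0..n-1, even for negative i.
    static int mod(int i, int n) {
        return ((i % n) + n) % n;
    }

    // Shifts every character that occurs in ALPHABET by offset positions.
    static String shift(String text, int offset) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int index = ALPHABET.indexOf(c);          // charToIndex
            if (index < 0) {
                out.append(c);                        // not in the alphabet: keep as-is
            } else {
                int shifted = mod(index + offset, ALPHABET.length());
                out.append(ALPHABET.charAt(shifted)); // indexToChar
            }
        }
        return out.toString();
    }

    static String encrypt(String plainText, int offset) {
        return shift(plainText, offset);
    }

    static String decrypt(String cipherText, int offset) {
        return shift(cipherText, -offset);
    }
}
```

Since encryption and decryption share the same lookup and the same mod, decrypt(encrypt(text, n), n) returns the original text for every character, inside or outside the alphabet.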

What makes the 'getCharNumber' method case-insensitive while it only ever checks for lowercase (by author of CtCI)

public class Common {
public static int getCharNumber(Character c) {
int a = Character.getNumericValue('a');
int z = Character.getNumericValue('z');
int val = Character.getNumericValue(c);
if (a <= val && val <= z) {
return val - a;
}
return -1;
}
public static int[] buildCharFrequencyTable(String phrase) {
int[] table = new int[Character.getNumericValue('z') - Character.getNumericValue('a') + 1];
for (char c : phrase.toCharArray()) {
int x = getCharNumber(c);
if (x != -1) {
table[x]++;
}
}
return table;
}
}
The above algorithm was used for testing whether a string is a permutation of a palindrome and was authored by CtCI (Cracking the Coding Interview).
My question: Why is the getCharNumber method case-insensitive?
I thought it should be case-sensitive as it only checks for lowercase characters.
Why is the getCharNumber case-insensitive?
The getCharNumber method uses Java's Character#getNumericValue(char) method for which its JavaDoc states in particular:
The letters A-Z in their uppercase ('\u0041' through '\u005A'), lowercase ('\u0061' through '\u007A'), and full width variant ('\uFF21' through '\uFF3A' and '\uFF41' through '\uFF5A') forms have numeric values from 10 through 35. This is independent of the Unicode specification, which does not assign numeric values to these char values.
Meaning that for example for character A and a this API method returns the same value, i.e. 10, and thus there's no case-sensitivity.
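A small sketch makes the difference visible. The class and method names are invented for this illustration; the first method mirrors the logic of getCharNumber, the second is a case-sensitive alternative based on plain char arithmetic.

```java
class NumericValueDemo {
    // Case-insensitive: Character.getNumericValue maps both 'A' and 'a' to 10.
    static int indexViaNumericValue(char c) {
        int a = Character.getNumericValue('a'); // 10
        int z = Character.getNumericValue('z'); // 35
        int val = Character.getNumericValue(c);
        return (a <= val && val <= z) ? val - a : -1;
    }

    // Case-sensitive: compares the raw char values, so 'A' is rejected.
    static int indexViaCharValue(char c) {
        return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
    }
}
```

Here indexViaNumericValue('A') and indexViaNumericValue('a') both return 0, while indexViaCharValue('A') returns -1.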
For reference, see also
Character.getNumericValue(..) in Java returns same number for upper and lower case characters
What is the reverse of Character.getNumericValue
Java Character literals value with getNumericValue()
Character.getNumericValue() issue
Character.getNumericvalue in char Frequency table

Getting a character to be represented by an int 0-25

I'm working on a cryptography program that implements various traditional methods. For some of them it is best for my message to be represented as numbers 0-25. A is 0, B is 1, etc. This shift cipher is a case of that since you must take mod 26 for a wrap around. Also spaces and punctuation must be preserved.
Here is the code for the method that does the shift cipher:
public static void shift(char k, char eord)
{
if(eord=='E' || eord=='e')
{
int [] mi= new int[mc.length];
for(int i=0; i<mc.length; i++)
{
if ((mc[i]>='a' && mc[i]<='z') || (mc[i]>='A' && mc[i]<='Z'))
{
mi[i]=(int)(mc[i]+k);
mi[i]=mi[i]%26;
//mc[i]=(char)mi[i];
//System.out.println(mi[i]);
}
}
}
}
mc is an array of characters that holds the message and eord is a char that will determine whether to run the algorithm to encrypt or decrypt. What I have the code do is check to make sure that mc[i] is a letter and then add the char k (the key) and then I type cast it into an integer so I can mod 26. Something does not work correctly because when I have a key of 'b' (1) and see what the integer representation is it is definitely not correct. I also need to convert it back to a character when I'm done so I can give the user the plaintext/cipher text of the message.
To get an integer value of 0..25 of a character you need to make sure that you only get upper- or lowercase characters in the alphabet.
Let's assume lowercase. Then you can simply convert the characters to integer by subtracting the value of the 'a' character. As the character values are ordered as in the normal alphabet, this will give 'a' value of 0 and 'z' value of 25... and all the letters in between will get the correct value as well.
I'll show a lowercase version as I don't like shouting:
public class CharacterToZeroBasedIntegerRange {
public static int characterToIntegerRange(char c) {
if (c < 'a' || c > 'z') {
throw new IllegalArgumentException(String.format("Character with value %04X is not a letter", (int) c));
}
return c - 'a';
}
public static void main(String[] args) {
String test = "Hello world!".toLowerCase().replaceAll("[^a-z]", "");
System.out.println(test);
for (int i = 0; i < test.length(); i++) {
char c = test.charAt(i);
System.out.println(characterToIntegerRange(c));
}
}
}
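To cover the rest of the question (wrapping mod 26, preserving spaces and punctuation, and converting back to a character), here is a hedged sketch. The class name ShiftCipher and its methods are assumptions for this illustration, and unlike the example above it keeps the case of each letter. Math.floorMod requires Java 8+.

```java
class ShiftCipher {
    // Shifts a single letter within its own case; any other character is returned unchanged.
    static char shiftLetter(char c, int key) {
        if (c >= 'a' && c <= 'z') {
            // c - 'a' is 0..25; floorMod keeps the result in 0..25 for negative keys too.
            return (char) ('a' + Math.floorMod(c - 'a' + key, 26));
        }
        if (c >= 'A' && c <= 'Z') {
            return (char) ('A' + Math.floorMod(c - 'A' + key, 26));
        }
        return c; // spaces and punctuation are preserved
    }

    // Encrypts with a positive key; pass the negated key to decrypt.
    static String shift(String message, int key) {
        StringBuilder out = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            out.append(shiftLetter(message.charAt(i), key));
        }
        return out.toString();
    }
}
```

If the key arrives as a letter, as in the question, it can be turned into a number the same way, e.g. 'b' - 'a' gives 1. Adding the raw char value of the key, as mi[i]=(int)(mc[i]+k) does, adds 98 for 'b' rather than 1.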

Convert a letter to the corresponding letter on the opposite counting direction in the alphabet

I am self-studying Java and am at the very beginning, learning the basics. With the code below I am trying to convert a letter to the corresponding letter in the opposite counting direction of the alphabet (i.e. A to Z, Z to A, etc.). It works for a single letter but not for a series of letters. How can I make it work with more than one letter? The simplest way would be best, as I am quite new to Java. I don't (know how to) import any built-in classes etc.
Thank you.
class Turner{
int find(int fin, int mi,int ma,char ch[]){
int mid = (ma+mi)/2;
int x;
if(ch[mid]==fin)
return mid;
else if(fin<ch[mid])
return(find(fin, mi,mid-1,ch));
else
return x = find(fin,(mid+1),ma,ch);
}
}
class Turn {
public static void main(String args[]) throws java.io.IOException
{
Turner try1 = new Turner();
char arra[] = new char[26];
char arrb[] = new char[26];
int min = 0;
int max = arra.length;
char a = 'A';
char b = 'Z';
int i;
char letter;
for(i=0;i<26;i++)
{
arra[i]=a;
a++;
arrb[i]=b;
b--;
}
System.out.println("Enter a letter: ");
letter = (char)System.in.read();
System.out.print(arrb[try1.find(letter,min,max,arra)]);
}
}
Have you considered just doing some math?
letter = Character.toUpperCase((char)System.in.read());
System.out.print((char)('Z' - (letter - 'A')));
And it works for only one letter because you are not repeating the conversion procedure. The program reads one char, prints its opposite and then terminates.
All you have to do is to put the read and print code inside some sort of loop, so every time it runs it will promptly wait for the next letter.
while (true) {
letter = Character.toUpperCase((char)System.in.read());
if ((letter > 'Z') || (letter < 'A'))
break; // user inputted an invalid symbol, terminating the program
System.out.print((char)('Z' - (letter - 'A')));
}
If your function (they are called methods in Java) works, good. Just put it in a while loop or otherwise call it when you need it.
boolean done = false;
while(!done){
System.out.println("Enter a letter (space to quit): ");
letter = (char)System.in.read();
if(letter == ' ') done = true;
else System.out.print(arrb[try1.find(letter,min,max,arra)]);
}
And Havenard is right, this can be written considerably more simply, with arithmetic on chars. For instance ch -'a' == 1 evaluates to true when ch is 'b'.
One other note: find and Turner aren't very descriptive names for what these things do. Before long it could get messy without simple and to the point naming.
A character has an equivalent numerical value. For "basic characters", this mapping is called the ASCII table: http://www.asciitable.com/
Now, in java, you can convert a char into an int by casting. Example: int nValue=(int) 'a'.
Since there is a numerical value associated to 'a' and another one associated with 'z', you could use some simple math to solve your problem.
See:
int aNumericalValue = (int) 'a';
int zNumericalValue = (int) 'z';
char characterToConvert = ...;
int characterToConvertAsNumericalValue = (int) characterToConvert;
int resultCharacterAsNumericalValue = zNumericalValue - (characterToConvertAsNumericalValue - aNumericalValue);
char resultCharacter = (char) resultCharacterAsNumericalValue;
Or, you could write all of this as a single line of code:
char resultCharacter = (char) ((int) 'z' - ((int) characterToConvert - (int) 'a'));
And finally, if you are willing to hardcode some ASCII values:
char resultCharacter = (char) (122 - ((int) characterToConvert - 97));
Note that this is for lowercase letters. For caps, use 'A' (ascii 65), 'Z' (ascii 90).
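The arithmetic from these answers can be combined into one sketch that handles a whole series of letters in both cases. The class name AlphabetMirror and its methods are invented for this illustration; non-letters are passed through unchanged as an assumption about the desired behavior.

```java
class AlphabetMirror {
    // Maps A<->Z, B<->Y, ... and a<->z, b<->y, ...; other characters are returned as-is.
    static char mirror(char c) {
        if (c >= 'A' && c <= 'Z') {
            return (char) ('Z' - (c - 'A'));
        }
        if (c >= 'a' && c <= 'z') {
            return (char) ('z' - (c - 'a'));
        }
        return c;
    }

    // Applies the single-character mapping to every character of the text.
    static String mirror(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(mirror(text.charAt(i)));
        }
        return out.toString();
    }
}
```

Applying mirror twice returns the original text, since the mapping is its own inverse.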

Convert byte array to escaped string

I need some help converting a Java byte array to a 7-bit ASCII string. However, I am getting 8-bit sequences and need to replace any unreadable character with its escape sequence. Is there a simple solution for this or do I need to build my own?
Seeing that the range of readable characters in 7-bit ASCII is continuous right now I am thinking of the following:
for( int i = 0; i < buffer.length; i++ ) {
int codePoint = ( (int) buffer[ i ] ) & 255;
if( 0x20 <= codePoint && codePoint <= 0x7e ) {
res = res + (char) codePoint;
} else {
String c = Integer.toHexString( codePoint );
if( c.length() < 2 ) {
c = "0" + c;
}
res = res + "\\0x" + c;
}
}
However this seems like an awful lot of work for such a simple conversion. Is there a better way?
Also I might need to do the same to data that has been converted from the byte array to strings. Is there a simpler solution in this case?
public static String escape(byte[] data) {
StringBuilder cbuf = new StringBuilder();
for (byte b : data) {
if (b >= 0x20 && b <= 0x7e) {
cbuf.append((char) b);
} else {
cbuf.append(String.format("\\0x%02x", b & 0xFF));
}
}
return cbuf.toString();
}
You can use the format method to pare back the verbiage.
Note that this method is only safe because the ASCII range matches the lower range of the UTF-16 encoding used by Java Strings.
If Base64 doesn't fit, a second standard option is java.net.URLEncoder.
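As a rough sketch of those two standard encodings (the class and method names are made up here): java.util.Base64 requires Java 8+, and because URLEncoder works on strings rather than bytes, this example first maps each byte to one char via ISO-8859-1, which is an assumption about how you want to treat the raw bytes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BinaryToText {
    // Base64: every 3 input bytes become 4 ASCII characters.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Percent-encoding: letters and digits stay readable, a space becomes "+",
    // and other bytes become %XX.
    static String toPercentEncoded(byte[] data) {
        try {
            // ISO-8859-1 maps each byte 0x00..0xFF to exactly one char and back.
            String latin1 = new String(data, StandardCharsets.ISO_8859_1);
            return URLEncoder.encode(latin1, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }
}
```

Percent-encoding keeps mostly-text data legible, whereas Base64 is more compact for data that is mostly binary.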
