Base64 encoding btoa

I am encoding some text on my frontend part using btoa function:
const encodedText = btoa(searchText);
This seems to work totally fine and decoding goes like this on backend part:
byte[] decodedBytes = Base64.getDecoder().decode(searchedText);
String decodedString = new String(decodedBytes, Charset.defaultCharset());
Which also works fine. However, this seems to fail when using the letter ü. My program encodes it as /A==, and as far as I know, it should be w7w=
I am not sure what I did wrong.

You could use
const encodedText = btoa(unescape(encodeURIComponent(searchText)));
instead, which converts the string to UTF-8 bytes before Base64-encoding it.
See Unicode strings and The "Unicode Problem" for further reading.
console.log(btoa('ü'));
console.log(btoa(unescape(encodeURIComponent('ü'))));
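
On the backend, the decoder has to use the same charset the frontend encoded with. The following is a minimal sketch, assuming the Java backend from the question; the class and method names are hypothetical. It decodes both variants: UTF-8 for the encodeURIComponent approach, and ISO-8859-1 for plain btoa, which emits one byte per character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the original question.
final class SearchTextDecoder {

    // For input produced by btoa(unescape(encodeURIComponent(text))):
    // the Base64 payload holds the UTF-8 bytes of the text.
    static String decodeUtf8(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // For input produced by plain btoa(text): one byte per character,
    // so only code points up to U+00FF can be represented.
    static String decodeLatin1(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8("w7w="));   // UTF-8 bytes C3 BC
        System.out.println(decodeLatin1("/A==")); // single byte FC
    }
}
```

Using an explicit charset instead of Charset.defaultCharset() keeps the result independent of the server's platform settings.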

Related

Encode to UTF-8. Encode character eg. ö to Ã¶

I want to encode a string in Android to UTF-8. For example this string:
Grüne Ähren beißen Flöhe
to
GrÃ¼ne Ãhren beiÃen FlÃ¶he
But no matter what I do I encode ü to ü or ü to %C3%BC (online often called 'raw URL encode').
I found solutions that convert to byte[] or use URI.toASCIIString(), but none of them work for me.
UPDATE
I am participating in the eBay partner network and am trying to append a search term to my partner URL.
The people at eBay must be using a different character set, as UTF-8 URL-encoded strings don't work.
A search term with UTF-8 URL encoding
(Grüne Ähren beißen Flöhe
to
Gr%C3%BCne%20%C3%84hren%20bei%C3%9Fen%20Fl%C3%B6he)
comes out to this result in the eBay searchbox:
If I encode my searchword with ISO_8859_1 it works (GrÃ¼ne Ãhren beiÃen FlÃ¶he):
Thank you very much community

What you essentially want is to convert a String to its byte representation according to UTF-8 and interpret these bytes using a different Charset, such as ISO-8859-1.
This is usually the cause of many problems. You want to intentionally do what most developers do by mistake (or because they simply ignore the problems of charsets).
Since you just need this to work, use this piece of code:
byte[] bytes = "Grüne Ähren beißen Flöhe".getBytes("UTF-8");
String result = new String(bytes, "ISO-8859-1");
see it at work here.
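
As a small illustration of the same idea, here is a sketch using StandardCharsets constants (which avoid the checked UnsupportedEncodingException) together with the reverse conversion. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class MojibakeDemo {

    // Encode as UTF-8, then misread those bytes as ISO-8859-1.
    static String utf8BytesAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the step above: ISO-8859-1 maps every byte to exactly one char,
    // so the original UTF-8 bytes can be recovered without loss.
    static String latin1CharsAsUtf8(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fcne \u00c4hren bei\u00dfen Fl\u00f6he";
        String garbled = utf8BytesAsLatin1(original);
        System.out.println(garbled);
        System.out.println(latin1CharsAsUtf8(garbled).equals(original));
    }
}
```

The garbled form contains C1 control characters for Ä and ß, which is why parts of it may look missing when printed.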

Weird Base64 Encoding

I have found a weird thing about base64 decoding.
As far as I know, the asterisk (*) is not a base64 character.
Accidentally, I tested base64 decoding with a string that includes an asterisk.
Here is the code:
String test = "icUI6W/ViQ4HNnA62XDbnBknC4CO4hpizE726rPT2Z+Z+/VHteyx1txPpb4ytqngfezfVo/*bTVr";
sun.misc.BASE64Decoder bd = new sun.misc.BASE64Decoder();
byte[] data = bd.decodeBuffer(test);
I expected an exception to occur, but none did.
Can anyone explain that?
Is it a Sun Java bug?
Thanks
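
For comparison, here is a sketch using the public java.util.Base64 API (Java 8+) instead of the internal sun.misc class. It says nothing about how sun.misc.BASE64Decoder is implemented; it only shows that the standard API offers both a strict decoder, which rejects characters outside the Base64 alphabet, and a MIME decoder, which skips them. The class name and the sample string are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; the input string is illustrative.
final class AsteriskBase64Demo {

    // The basic decoder throws IllegalArgumentException on illegal characters.
    static boolean basicDecoderAccepts(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // The MIME decoder ignores characters outside the Base64 alphabet.
    static String mimeDecode(String s) {
        return new String(Base64.getMimeDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String withAsterisk = "aGVs*bG8="; // "aGVsbG8=" with a stray '*'
        System.out.println(basicDecoderAccepts(withAsterisk));
        System.out.println(mimeDecode(withAsterisk));
    }
}
```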

Charset bug in AES Decryption on Android system

Good evening!
In my Android app, the smartphone loads an AES-encrypted String from my server and stores it in a variable. After that, the variable and a key are passed to a method which decrypts the string. My problem is that German umlauts (ä, ü, ö) aren't decoded correctly. All umlauts are displayed as question marks with a black background...
My Code:
public static String decrypt(String input, String key) {
    byte[] output = null;
    String newString = "";
    try {
        SecretKeySpec skey = new SecretKeySpec(key.getBytes(), "AES");
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, skey);
        output = cipher.doFinal(Base64.decode(input, Base64.DEFAULT));
        newString = new String(output);
    } catch(Exception e) {}
    return newString;
}
The code works perfectly; only umlauts are not displayed correctly. An example is this (should be "ö-ä-ü"):
How can I set the encoding of the decrypted String? In my iOS app I use ASCII to decode the downloaded String. That works perfectly! Android and iOS get the String from the same server in the same way, so I think the problem is the local code above.
I hope you can help me with my problem... Thanks!

There is no text but encoded text.
It seems like you are guessing at the character set and encoding. That's no way to communicate.
To recover the text, you need to reverse the original process applied to it with the parameters associated with each step.
For explanation, assume that the server is taking text from a Java String and sending it to you securely.
1. String uses the Unicode character set (specifically, Unicode's UTF-16 encoding).
2. Get the bytes for the String, using some specific encoding, say ISO8859-1. (UTF-8 could be better because it is also an encoding for the Unicode character set, whereas ISO8859-1 has a lot fewer characters.) As @Andy points out, exceptions are your friends here.
3. Encrypt the bytes with a specific key. The key is a sequence of bytes, so, if you are generating this from a string, you have to use a specific encoding.
4. Encode the encrypted bytes with Base64, producing a Java String (again, UTF-16) with a subset of characters so reduced that it can be re-encoded in just about any character encoding and placed in just about any context such as SMTP, XML, or HTML without being misinterpreted or making it invalid.
5. Transmit the string using a specific encoding. An HTTP header and/or HTML charset value is usually used to communicate which encoding.
To receive the text, you have to get:
the bytes,
the encoding from step 5,
the key from step 3,
the encoding from step 3 and
the encoding from step 2.
Then you can reverse all of the steps. Per your comments, you discovered you weren't using the encoding from step 2. You also need to use the encoding from step 3.
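
The steps and their reversal can be sketched as follows. This is only an illustration under stated assumptions: UTF-8 is picked for both the text and the key, java.util.Base64 stands in for android.util.Base64, the cipher transformation is copied from the question (ECB is kept for parity, not as a recommendation), and the class name and key are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; charsets and key are assumptions for the example.
final class CharsetAwareAes {
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding"; // as in the question

    private static Cipher cipher(int mode, String key) throws GeneralSecurityException {
        // Key string -> bytes with an explicit encoding
        SecretKeySpec skey = new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "AES");
        Cipher c = Cipher.getInstance(TRANSFORMATION);
        c.init(mode, skey);
        return c;
    }

    static String encrypt(String text, String key) {
        try {
            // Text -> bytes with an explicit encoding
            byte[] plain = text.getBytes(StandardCharsets.UTF_8);
            byte[] encrypted = cipher(Cipher.ENCRYPT_MODE, key).doFinal(plain);
            // Encrypted bytes -> Base64 text
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    static String decrypt(String base64, String key) {
        try {
            byte[] encrypted = Base64.getDecoder().decode(base64);
            byte[] plain = cipher(Cipher.DECRYPT_MODE, key).doFinal(encrypted);
            // Decode with the same charset that was used before encryption
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        String key = "0123456789abcdef"; // 16 ASCII chars = 128-bit key
        String roundTrip = decrypt(encrypt("\u00f6-\u00e4-\u00fc", key), key);
        System.out.println(roundTrip.equals("\u00f6-\u00e4-\u00fc"));
    }
}
```

Unlike the empty catch block in the question, failures surface here as exceptions, so a wrong key or charset mismatch does not silently produce an empty string.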

Convert UTF-16 to UTF-8 strings with data loss using Java

I have to insert text which is 99.9% UTF-8 but has 0.01% UTF-16 characters. So when I try to save it in my MySQL database using Hibernate and Spring, an exception occurs. Removing these characters would be no problem, so I want to convert all my text to UTF-8 and save it to my database with data loss, so that the problem characters are removed. I tried
String string = "😈 Devil Emoji";
byte[] converttoBytes = string.getBytes("UTF-16");
string = new String(converttoBytes, "UTF-8");
System.out.println(string);
But nothing happens.
😈 Devil Emoji
Is there any external library in order to do that?

😈 probably has nothing to do with UTF-16. Its hex is F09F9888. Notice that that is 4 bytes. Also notice that that is a UTF-8 encoding, not a "Unicode" code point: U+1F608. In UTF-16 it would be the surrogate pair D83D DE08, which is none of the above. More (scarfboy).
MySQL's utf8 handles only 3-byte (or shorter) UTF-8 characters. MySQL's utf8mb4 also handles 4-byte characters like that little devil.
You need to change the CHARACTER SET of the column you are storing him into. And you need to establish that your connection is charset=UTF-8.
Note: things outside MySQL call it UTF-8, but MySQL calls it utf8mb4.
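
If changing the column to utf8mb4 is not an option and dropping such characters is acceptable, as the question suggests, the 4-byte characters can be filtered out before saving. This is a minimal sketch of that alternative; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for removing characters MySQL's 3-byte utf8 cannot store.
final class BmpOnly {

    // Keeps only code points in the Basic Multilingual Plane (U+0000..U+FFFF),
    // which are the ones that need at most 3 bytes in UTF-8.
    static String stripSupplementary(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(Character::isBmpCodePoint)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String devil = "\uD83D\uDE08"; // U+1F608 as a UTF-16 surrogate pair
        System.out.println(utf8Length(devil));
        System.out.println(stripSupplementary(devil + " Devil Emoji"));
    }
}
```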

String holds Unicode in Java, so all scripts can be combined.
byte[] converttoBytes = string.getBytes("UTF-16");
These bytes are binary data that store the text, encoded in UTF-16.
string = new String(converttoBytes, "UTF-8");
Now String thinks that the bytes represent text encoded in UTF-8, and converts those. This is wrong.
Detecting the encoding, either UTF-8 or UTF-16, is best done on bytes, not on a String, as that String would already have gone through an erroneous conversion with possible loss.
As UTF-8 has the stricter format of the two, we'll check that one.
Also, UTF-16 has a 0 byte for every ASCII character, and that byte almost never occurs in normal text.
So something like
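import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public static String string(byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    // Strict decoder: report malformed input instead of silently replacing it.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    decoder.onMalformedInput(CodingErrorAction.REPORT);
    decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
        String s = decoder.decode(buffer).toString();
        if (!s.contains("\u0000")) { // No NUL chars, so accept it as UTF-8
            return s;
        }
        // NUL chars present: could be UTF-16, fall through
    } catch (CharacterCodingException e) {
        // Not valid UTF-8, fall through
    }
    return new String(bytes, StandardCharsets.UTF_16LE);
}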
If you only have a String (for instance from the database), then
if (s.contains("\u0000")) { // Could be UTF-16
s = new String(s.getBytes("Windows-1252"), "UTF-16LE");
}
might work or make a larger mess.
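
To see why both checks in the heuristic above are needed, here is a small probe, written as a sketch with a hypothetical class name. UTF-16LE bytes of pure ASCII text are still well-formed UTF-8 (they just contain NUL characters), whereas UTF-16LE bytes of a character like ü are rejected by a strict UTF-8 decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical probe class for illustration.
final class Utf8Probe {

    // Returns the decoded text if the bytes are well-formed UTF-8, else null.
    static String strictUtf8(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] asciiAsUtf16 = "Hi".getBytes(StandardCharsets.UTF_16LE);      // 48 00 69 00
        byte[] umlautAsUtf16 = "\u00fc".getBytes(StandardCharsets.UTF_16LE); // FC 00

        String decoded = strictUtf8(asciiAsUtf16);
        System.out.println(decoded != null && decoded.contains("\u0000")); // valid UTF-8, but with NULs
        System.out.println(strictUtf8(umlautAsUtf16) == null);             // FC is not valid in UTF-8
    }
}
```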

String encoding - Shift_JIS / UTF-8

I get a string from a 3rd party library, and it is not encoded correctly.
Unfortunately I'm not allowed to change the library or use another one...
So the actual problem is that the 3rd party library's result string will encode characters like "è ò à ù ì ä ö ü, ..." as SHIFT_JIS (Kanji) inside a UTF-8 string. But only if the character is connected to a word and isn't standalone.
For example:
"Ö Just a simple test"
"ÖJust a simple test"
I tried the following without success:
byte[] b = resultString.getBytes("Shift_JIS");
String value = new String(b, "UTF-8");
UPDATE 1:
That's the content of "resultString".
Note:
The byte array shown, is without any modifications (such as getBytes("Shift_JIS"), it's just the resultString as bytes)
Do you have any ideas?
Any help would be greatly appreciated.
Thank you.

Well, very strange:
As
byte[] b = resultString.getBytes("Shift_JIS");
String value = new String(b, "UTF-8");
didn't work for me I tried the following:
String value = new String(resultString.getBytes("SHIFT-JIS"), "UTF-8");
Works like a charm.
Maybe it was because of the underscore and the lower case characters in "Shift_JIS".
