String encoding - Shift_JIS / UTF-8

I get a string from a third-party library that is not encoded correctly.
Unfortunately, I'm not allowed to change the library or use another one.
The actual problem is that the library's result string contains characters like "è ò à ù ì ä ö ü ..." encoded as Shift_JIS (kanji) inside a UTF-8 string, but only when the character is attached to a word rather than standing alone.
For example:
"Ö Just a simple test"
"ÖJust a simple test"
I tried the following without success:
byte[] b = resultString.getBytes("Shift_JIS");
String value = new String(b, "UTF-8");
UPDATE 1:
That's the content of "resultString".
Note:
The byte array shown is taken without any modification (no getBytes("Shift_JIS") or similar); it is just resultString as bytes.
Do you have any ideas?
Any help would be greatly appreciated.
Thank you.
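When diagnosing a string like this, it can help to look at both the code points the String actually holds and the bytes a given charset produces for it. The following is a minimal sketch of such a dump; the class and method names (HexDump, toHex, codePoints) and the sample value are assumptions for illustration, not part of the library in question.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class HexDump {
    // Formats bytes as space-separated uppercase hex pairs, e.g. "C3 96 4A".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Lists the Unicode code points held by the String itself.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the library's resultString: "Ö" followed by "J".
        String resultString = "\u00D6J";
        System.out.println(codePoints(resultString));
        System.out.println(toHex(resultString.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Comparing the code point dump with the expected characters shows whether the String was already decoded wrongly before any getBytes call is made.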

Well, very strange.
Since
byte[] b = resultString.getBytes("Shift_JIS");
String value = new String(b, "UTF-8");
didn't work for me, I tried the following:
String value = new String(resultString.getBytes("SHIFT-JIS"), "UTF-8");
It works like a charm.
Maybe it was because of the underscore and the lowercase characters in "Shift_JIS".
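One plausible explanation for the "only when attached to a word" behaviour is that the library decodes UTF-8 bytes as Shift_JIS. "Ö" is the two bytes C3 96 in UTF-8, and 0x96 is a Shift_JIS lead byte, so it pairs with a following letter to form a kanji but cannot pair with a space. The sketch below simulates that assumption and reverses it; the class and method names are hypothetical. Note also that the Java documentation describes charset names as not case-sensitive, so the spelling of the name is unlikely to be the whole story.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ShiftJisRoundTrip {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Simulates the suspected bug: UTF-8 bytes wrongly decoded as Shift_JIS.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), SHIFT_JIS);
    }

    // Reverses it: re-encode as Shift_JIS, then decode those bytes as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(SHIFT_JIS), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00D6Just a simple test"; // "ÖJust a simple test"
        String garbled = garble(original);
        System.out.println(garbled.equals(original));         // false
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The repair is only lossless when every byte sequence was decodable as Shift_JIS in the first place. A lead byte followed by a space is malformed, so that information may already be lost by the time the String reaches your code.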

Related

Encode to UTF-8. Encode character eg. ö to Ã¶

I want to encode a string in Android to UTF-8. For example this string:
Grüne Ähren beißen Flöhe
to
GrÃ¼ne Ãhren beiÃen FlÃ¶he
But no matter what I do, I end up encoding ü to ü or ü to %C3%BC (often called 'raw URL encode' online).
I found solutions that convert to byte[] or use URI.toASCIIString(), but none of them work for me.
UPDATE
I am participating in the eBay partner network and am trying to append a search word to my partner URL.
The people at eBay must be using the wrong character set, as UTF-8 URL-encoded strings don't work.
A search word with UTF-8 URL encoding
(Grüne Ähren beißen Flöhe
to
Gr%C3%BCne%20%C3%84hren%20bei%C3%9Fen%20Fl%C3%B6he)
comes out to this result in the eBay searchbox:
If I encode my searchword with ISO_8859_1 it works (GrÃ¼ne Ãhren beiÃen FlÃ¶he):
Thank you very much community

What you essentially want is to convert a String to its byte representation in UTF-8 and then interpret those bytes using a different charset, such as ISO-8859-1.
This is usually the cause of many problems: you want to do intentionally what most developers do by mistake (or by simply ignoring charset issues).
Since you just need this to work, use this piece of code:
byte[] bytes = "Grüne Ähren beißen Flöhe".getBytes("UTF-8");
String result = new String(bytes, "ISO-8859-1");
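The same idea can be written with the StandardCharsets constants, which avoids the checked UnsupportedEncodingException that the string-named overloads declare. This is a minimal sketch; the class and method names are made up for illustration, and the reverse method is included only to show that the transformation can be undone.

```java
import java.nio.charset.StandardCharsets;

class Latin1View {
    // Encodes the text as UTF-8, then reads each byte as one ISO-8859-1 character.
    static String utf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse direction: turns the ISO-8859-1 view back into the original text.
    static String latin1ViewBackToText(String view) {
        byte[] raw = view.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Grüne Ähren beißen Flöhe" written with escapes to be independent of source encoding.
        String text = "Gr\u00FCne \u00C4hren bei\u00DFen Fl\u00F6he";
        String view = utf8BytesAsLatin1(text);
        System.out.println(view.length() > text.length());           // true: each umlaut became two chars
        System.out.println(latin1ViewBackToText(view).equals(text)); // true
    }
}
```

Some of the resulting characters (for example the second byte of Ä and ß) fall in the ISO-8859-1 control range and are invisible when printed, which is why the visible result looks shorter than it really is.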

Base64 encoding btoa

I am encoding some text on my frontend using the btoa function:
const encodedText = btoa(searchText);
This seems to work fine, and the decoding on the backend looks like this:
byte[] decodedBytes = Base64.getDecoder().decode(searchedText);
String decodedString = new String(decodedBytes, Charset.defaultCharset());
That also works fine. However, it fails for the letter ü. My program encodes it as /A==, and as far as I know it should be w7w=
I am not sure what I did wrong.

You could use
const encodedText = btoa(unescape(encodeURIComponent(searchText)));
instead, which converts the text to its UTF-8 bytes before Base64-encoding it.
See Unicode strings and The "Unicode Problem" for further reading.
console.log(btoa('ü'));
console.log(btoa(unescape(encodeURIComponent('ü'))));
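On the backend side, the decoded bytes only turn into the right text if the charset matches what the frontend produced. Plain btoa emits one byte per character (the Latin-1 value), while the encodeURIComponent variant emits UTF-8 bytes. The sketch below, with hypothetical class and method names, decodes both forms with an explicit charset instead of Charset.defaultCharset(), which varies between machines.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Text {
    // For input produced by btoa(unescape(encodeURIComponent(text))): bytes are UTF-8.
    static String decodeUtf8(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // For input produced by plain btoa(text): each byte is one Latin-1 character.
    static String decodeLatin1(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String expected = "\u00FC"; // "ü"
        System.out.println(decodeUtf8("w7w=").equals(expected));   // true
        System.out.println(decodeLatin1("/A==").equals(expected)); // true
        System.out.println(decodeUtf8("/A==").equals(expected));   // false: 0xFC alone is not valid UTF-8
    }
}
```

Agreeing on UTF-8 on both ends is usually the simpler choice, since plain btoa throws for characters outside the Latin-1 range.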

Java cyrillic encoding

I have an input string, UAH;"Ãîëüô 855229-7", which should be displayed as UAH;"Гольф 855229-7". I'm trying to use the Cp1251 encoding, but the output is UAH;"????? 855229-7".
String cyrillic = row[0] + row[1];
String utf8String= new String(cyrillic.getBytes("Cp1251"), "UTF-8");
lbl1.setText(utf8String);

UTF-8 has nothing to do with this. Every Cyrillic character in your string is represented by a single byte.
Currently, those bytes have been decoded as ISO 8859-1 (Latin-1), whose printable characters are a subset of the Windows Western European code page, Cp1252. So you want to encode the string as Cp1252, then decode the resulting bytes as Cp1251:
String corrected8String = new String(cyrillic.getBytes("Cp1252"), "Cp1251");
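As a self-contained check of that fix, the sketch below applies it to the sample from the question. The class and method names are hypothetical, and it assumes the text was originally Cp1251 bytes that were decoded as Cp1252.

```java
import java.nio.charset.Charset;

class CyrillicRepair {
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Recovers the original bytes via Cp1252, then decodes them as Cp1251.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(CP1252), CP1251);
    }

    public static void main(String[] args) {
        // The misdecoded sample from the question, written with escapes.
        String input = "UAH;\"\u00C3\u00EE\u00EB\u00FC\u00F4 855229-7\"";
        // The expected Cyrillic text, also written with escapes.
        String expected = "UAH;\"\u0413\u043E\u043B\u044C\u0444 855229-7\"";
        System.out.println(repair(input).equals(expected)); // true
    }
}
```

If the text was misdecoded as ISO-8859-1 rather than Cp1252, using ISO-8859-1 for the getBytes step is the safer choice, because the two charsets differ in the 0x80 to 0x9F range.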

Base64 String to Windows1251 (cyrillic symbols)

I am having trouble converting an email attachment (a simple text file in windows-1251 encoding with Latin and Cyrillic symbols) to a String. That is, I have a problem converting the Cyrillic characters.
I get the attachment file as a Base64-encoded String like this:
Base64Encoded email Attachment
Original file
When I try to decode it, I get "?" instead of the Cyrillic symbols.
How can I get the right Cyrillic (Russian) symbols instead of "?"?
I've already tried this code with every encoding, but nothing helps to get the correct Russian symbols.
BASE64Decoder dec = new BASE64Decoder();
// Try every charset the JVM knows and print the decoded result for each.
for (String key : Charset.availableCharsets().keySet()) {
    System.out.println("K=" + key + " Value:" + Charset.availableCharsets().get(key));
    try {
        System.out.println(new String(dec.decodeBuffer(encoded), key));
    } catch (Exception e) {
        continue;
    }
}
Thank you in advance.

I am not very familiar with BPEL and the protocols it uses. If you communicate between nodes using a binary protocol, then you must 1) ensure that the client and the receiver use the same charset and 2) convert the Java string into the proper bytes in that encoding. Java stores strings internally as UTF-16. So when you execute String correct = new String(commonName.getBytes("ISO-8859-1"), "ISO-8859-5"), you get the correct string in UTF-16. Then you need to export it to bytes in the requested encoding, e.g. byte[] buff = correct.getBytes("UTF-8"), assuming the encoding used between nodes is UTF-8. If the encoding is different, you must make sure it actually supports Cyrillic characters (ISO-8859-1, for example, does not).
If you use XML for data exchange, make sure it declares a suitable encoding, as in <?xml version="1.0" encoding="UTF-8"?>. Then you don't need to play with bytes; you just need to "import" the string correctly (see the correct variable). Writing to XML converts characters automatically, but the declared encoding must support the characters you want to write. So if you set encoding="ISO-8859-1", you will get those question marks again.
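Applied to the attachment from the question, the "import correctly, then export" idea might look like the sketch below. It uses java.util.Base64 rather than the internal BASE64Decoder, and it assumes the attachment really is windows-1251 text. The class name, method names, and sample text are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AttachmentDecoder {
    static final Charset WINDOWS_1251 = Charset.forName("windows-1251");

    // "Import": Base64 -> raw bytes -> String, decoding with the file's real charset.
    static String decodeAttachment(String base64) {
        // The MIME decoder tolerates the line breaks that mail bodies usually contain.
        byte[] raw = Base64.getMimeDecoder().decode(base64);
        return new String(raw, WINDOWS_1251);
    }

    // "Export": String -> bytes in whatever encoding the receiver expects (UTF-8 here).
    static byte[] exportUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a sample attachment: Cyrillic text stored as windows-1251, then Base64-encoded.
        String text = "\u0413\u043E\u043B\u044C\u0444 855229-7";
        String base64 = Base64.getMimeEncoder().encodeToString(text.getBytes(WINDOWS_1251));

        String decoded = decodeAttachment(base64);
        System.out.println(decoded.equals(text));       // true
        System.out.println(exportUtf8(decoded).length); // more bytes than chars: Cyrillic takes two bytes each in UTF-8
    }
}
```

If the decoded String is correct but still prints as "?", the remaining suspect is the output side (console or file encoding) rather than the decoding step.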

Convert an object containing UTF-8 strings to a String with the proper encoding

I'm processing an MMS and get its text part with:
mmsBodyPart.getContent();
It's simply an Object. Now I need to convert it to a String using UTF-8. I have tried:
String contentText = (String) mmsBodyPart.getContent();
but it doesn't work with special characters, and some strange characters appear.
I also tried:
String content = new String(contentText.getBytes("UTF-8"), "UTF-8");
Unsurprisingly, that failed as well.
How can that be done?
EDIT: The problem was caused by bad encoding in the file. Nothing was wrong with the code; I just didn't think of that in the first place.

Strings don't have an encoding in Java. If you need a specific one, you should start from a byte[] and decode it with that encoding to get a String.
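In practice this means the charset only matters at the point where bytes become a String. A minimal sketch of that rule for an Object of unknown type follows; the helper name contentToString is hypothetical, the handled types are assumptions about what getContent() might return, and readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentDecoder {
    // Converts content to a String, applying the charset only where raw bytes are involved.
    static String contentToString(Object content, Charset charset) {
        if (content instanceof String) {
            // Already decoded; re-encoding and decoding here cannot repair anything.
            return (String) content;
        }
        if (content instanceof byte[]) {
            return new String((byte[]) content, charset);
        }
        if (content instanceof InputStream) {
            try {
                return new String(((InputStream) content).readAllBytes(), charset);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException("Unsupported content type: " + content.getClass());
    }

    public static void main(String[] args) {
        byte[] utf8 = "Gr\u00FCne".getBytes(StandardCharsets.UTF_8);
        String text = contentToString(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(text.equals("Gr\u00FCne")); // true
    }
}
```

If the content arrives as a String that is already garbled, the wrong charset was applied earlier in the pipeline (for example when the file was written), which matches the cause reported in the question's edit.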
