How to decode a UTF-8 encoded String using Java?

I have a String in UTF-8 encoded form in an email, and I want to decode it. I use JavaMail's MimeUtility.decodeText, but it doesn't decode properly.
Example String
=?UTF-8?B?0J/RgNC40LLQtdGC?==?UTF-8?B?0JfQtNGA0LDQstGB0YLQstGD0LnRgtC1?=
When I used
MimeUtility.decodeText("=?UTF-8?B?0J/RgNC40LLQtdGC?==?UTF-8?B?0JfQtNGA0LDQstGB0YLQstGD0LnRgtC1?=")
it yields
Привет=?UTF-8?B?0JfQtNGA0LDQstGB0YLQstGD0LnRgtC1?=
Please help me. Thanks in advance

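It is MIME-encoded: the "B" encoding, to be specific (RFC 2047, section 4.1).
I think you can decode it using JavaMail's javax.mail.internet.InternetHeaders or MimeUtility class.
Note that RFC 2047 requires adjacent encoded-words to be separated by whitespace, which the example string lacks; that may be why only the first word is decoded.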
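As a rough illustration of what the "B" encoding involves, here is a JDK-only sketch that Base64-decodes every encoded-word it finds, whether or not the words are separated by whitespace. The class name EncodedWordDecoder and its decode method are made up for this example; it handles only "B" words (not "Q") and is not a substitute for JavaMail's own header parsing.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JavaMail.
class EncodedWordDecoder {
    // Matches =?charset?B?base64-text?= (RFC 2047 "B" encoded-words only).
    private static final Pattern B_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([^?]*)\\?=");

    static String decode(String header) {
        Matcher m = B_WORD.matcher(header);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy any literal text that sits between encoded-words.
            out.append(header, last, m.start());
            byte[] raw = Base64.getDecoder().decode(m.group(2));
            // Interpret the decoded bytes in the charset named by the word itself.
            out.append(new String(raw, Charset.forName(m.group(1))));
            last = m.end();
        }
        out.append(header, last, header.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String subject = "=?UTF-8?B?0J/RgNC40LLQtdGC?="
                + "=?UTF-8?B?0JfQtNGA0LDQstGB0YLQstGD0LnRgtC1?=";
        System.out.println(decode(subject));
    }
}
```

For the example string from the question, both words are decoded and joined into a single string.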

Related

Encode to UTF-8. Encode character eg. ö to Ã¶

I want to encode a string in Android to UTF-8. For example this string:
Grüne Ähren beißen Flöhe
to
GrÃ¼ne Ãhren beiÃen FlÃ¶he
But no matter what I do, ü either stays ü or becomes %C3%BC (often called 'raw URL encode' online).
I found solutions that convert to byte[] or use URI.toASCIIString(), but none of them work for me.
UPDATE
I am participating in the eBay partner network and am trying to append a search word to my partner URL.
The people at eBay must be using a wrong character set, as UTF-8 URL-encoded strings don't work.
A searchword with UTF-8 URL encoding
(Grüne Ähren beißen Flöhe
to
Gr%C3%BCne%20%C3%84hren%20bei%C3%9Fen%20Fl%C3%B6he)
comes out to this result in the eBay searchbox:
If I encode my searchword with ISO_8859_1 it works (GrÃ¼ne Ãhren beiÃen FlÃ¶he):
Thank you very much community

What you essentially want is to convert a String to its byte representation in UTF-8 and then interpret those bytes using a different charset, such as ISO-8859-1.
This is usually the cause of many problems: you want to do intentionally what most developers do by mistake (or by simply ignoring charsets).
Since you just need this to work, use this piece of code:
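// requires: import java.nio.charset.StandardCharsets;
byte[] bytes = "Grüne Ähren beißen Flöhe".getBytes(StandardCharsets.UTF_8); // no checked UnsupportedEncodingException
String result = new String(bytes, StandardCharsets.ISO_8859_1); // reinterpret the UTF-8 bytes as ISO-8859-1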
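To make the effect of that reinterpretation visible, the following sketch wraps the two steps in a helper and adds the reverse direction, which recovers the original text. The class and method names (MojibakeDemo, utf8AsLatin1, latin1AsUtf8) are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing the UTF-8 / ISO-8859-1 reinterpretation.
class MojibakeDemo {
    // Encode as UTF-8, then read those bytes as if they were ISO-8859-1.
    static String utf8AsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse step: take the ISO-8859-1 bytes back and decode them as UTF-8.
    static String latin1AsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00FCne \u00C4hren"; // "Grüne Ähren"
        String garbled = utf8AsLatin1(original);
        // Each non-ASCII character becomes two characters after reinterpretation.
        System.out.println(original.length() + " -> " + garbled.length());
        System.out.println(latin1AsUtf8(garbled).equals(original));
    }
}
```

Because Java's ISO-8859-1 maps every byte value to a character, the round trip is lossless, even where the intermediate string contains invisible control characters.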

Java Encoding for "GB2312" CHARACTER ® replacing with question mark(?)

I'm trying to get an encoded value using the GB2312 character set, but I'm getting '?' instead of '®'.
Below is my sample code:
new String("Test ®".getBytes("GB2312"));
but I'm getting Test ? instead of Test ®.
Has anyone faced this issue?
Java version: JDK 6
Platform: Windows 7
I'm not familiar with Chinese character encodings, so I need suggestions.

For better understanding, split the statement into two parts:
byte[] bytes = "Test ®".getBytes("GB2312"); // bytes, encoding the string to GB2312
new String(bytes); // back to string, using default encoding
Probably ® is not a valid GB2312 character, so it is converted to ?. See the result of
Charset.forName("GB2312").newEncoder().canEncode("®")
According to the documentation of getBytes:
The behavior of this method when this string cannot be encoded in the given charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
which also suggests using CharsetEncoder.
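A minimal sketch of what using CharsetEncoder could look like: with the error action set to REPORT, an unmappable character raises an exception instead of silently turning into ?. The class StrictEncodeDemo and the method encodesCleanly are hypothetical names, and the GB2312 check runs only if that charset is available in the JVM.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class; only java.nio.charset APIs are used.
class StrictEncodeDemo {
    // Returns true if every character of text can be encoded in the named charset.
    static boolean encodesCleanly(String text, String charsetName) {
        try {
            Charset.forName(charsetName).newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown instead of substituting '?' for the offending character.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(encodesCleanly("Test \u00AE", "US-ASCII"));
        if (Charset.isSupported("GB2312")) {
            System.out.println(encodesCleanly("Test \u00AE", "GB2312"));
        }
    }
}
```

If the GB2312 check reports false, the charset itself cannot represent ®, and a different encoding (UTF-8, for instance) would be needed to carry that character.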

Foreign Language Characters decode in Java & iacute;

I want to display foreign-language characters in Jasper reports. The report passes the text to Java code for RTF formatting. Unfortunately, the MySQL database returns an encoded string like the one below (without the space):
& iacute;
what I want to display is
í
Any suggestions how to do it with java?
text: bebida fría
from database: bebida fr& iacute;a

Those are HTML entities. You can use StringEscapeUtils.unescapeHtml4 from the Apache Commons Lang library.
It remains to be seen how your RTF output handles Unicode.
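If adding a library is not an option, the idea behind unescaping can be sketched with the JDK alone. The class EntityDemo, its unescape method, and the tiny entity table below are assumptions made for illustration; the table covers only a handful of names, whereas the Commons method knows the full HTML 4 set.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately incomplete entity decoder.
class EntityDemo {
    // A few named entities only; extend as needed.
    private static final Map<String, String> NAMED = Map.of(
            "aacute", "\u00E1", "eacute", "\u00E9", "iacute", "\u00ED",
            "oacute", "\u00F3", "uacute", "\u00FA", "ntilde", "\u00F1",
            "amp", "&");

    // Named entities (&iacute;) and decimal references (&#237;).
    private static final Pattern ENTITY = Pattern.compile("&(#\\d{1,6}|[A-Za-z]+);");

    static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            String name = m.group(1);
            if (name.startsWith("#")) {
                // Decimal character reference: convert the code point to chars.
                out.append(Character.toChars(Integer.parseInt(name.substring(1))));
            } else {
                // Unknown names are left untouched.
                out.append(NAMED.getOrDefault(name, m.group()));
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("bebida fr&iacute;a"));
    }
}
```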

If I understand your question, then you could use the unicode literal,
System.out.println("bebida fr\u00EDa");
Output is (the requested)
bebida fría

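Check the encoding of the database table. You can also try encoding your string explicitly:
ByteBuffer encoded = StandardCharsets.UTF_8.encode(myString);
// array() can be larger than the encoded data, so pass only the remaining() bytes and name the charset when decoding
String encodedStr = new String(encoded.array(), encoded.arrayOffset() + encoded.position(), encoded.remaining(), StandardCharsets.UTF_8);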

Decrypt servlet parameter, encrypted with BasicTextEncryptor

I am having a little problem with BasicTextEncryptor.
The resulting strings are encoded in Base64 after encryption. In my case I want to encrypt a string and send it to a servlet as a URL parameter. Within the servlet I want to decrypt this parameter and get the original string. The problem is that the encrypted string sometimes contains characters (such as +) that are interpreted differently in a URL (a + is read as a space, for example). In that case I can't decrypt the string any more, because it is no longer the same string.
Can anyone give me a hint on how to solve this? I am doing this to perform an email confirmation through a servlet link; if anyone could suggest another solution, it would be much appreciated.

In the end the problem was simpler than I thought; see "Java URL encoding of query string parameters".
I just encoded the string like this:
String url = "http://example.com/query?q=" + URLEncoder.encode(MyString, "ISO-8859-1");
The string that I then read from the servlet request is decoded to the right string by default.
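The effect on a Base64-style value can be sketched as follows. The class UrlParamDemo, its helper methods, and the sample token are invented for this example; UTF-8 is used here, but since Base64 output is plain ASCII, ISO-8859-1 would give the same result.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class for percent-encoding a query parameter.
class UrlParamDemo {
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static String decodeParam(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String token = "a+b/c="; // stand-in for an encrypted Base64 string
        String encoded = encodeParam(token);
        System.out.println(encoded);              // '+', '/' and '=' are percent-encoded
        System.out.println(decodeParam(encoded)); // back to the original token
        System.out.println(decodeParam(token));   // unencoded '+' comes back as a space
    }
}
```

In a servlet, request.getParameter normally performs the decoding step itself, so the explicit decodeParam call is only there to show the round trip.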

How to check encoding in java?

I am facing an encoding problem.
For example, I have an XML message whose declared encoding is "UTF-8".
<message>
<product_name>apple</product_name>
<price>1.3</price>
<product_name>orange</product_name>
<price>1.2</price>
.......
</message>
Now, this message supports multiple languages:
Traditional Chinese (big5),
Simplified Chinese (gb),
English (utf-8)
The encoding changes only in specific fields.
For example (Traditional Chinese),
<product_name>蘋果</product_name>
<price>1.3</price>
<product_name>橙</product_name>
<price>1.2</price>
.......
Only "蘋果" and "橙" are using big5, "<product_name>" and "</product_name>" are still using utf-8.
<price>1.3</price> and <price>1.2</price> are using utf-8.
How do I know which words are using a different encoding?

It looks like whoever is providing the XML is providing incorrect XML. They should be using a consistent encoding.
http://sourceforge.net/projects/jchardet/files/ is a pretty good heuristic charset detector.
It's a port of the one used in Firefox to detect the encoding of pages that are missing a charset in content-type or a BOM.
You could use that to try and figure out the encoding for substrings in a malformed XML file if you can't get the provider to fix their output.

You should use only one encoding in one XML file. There are counterparts of the Big5 characters in UTF-8.

Because I cannot get the provider to fix the output, I have to handle it myself, and I cannot use an external library in this project.
The only way I can solve it is like this,
String str = new String(big5String.getBytes("UTF-8"));
before displaying the message.
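When no external library is allowed, one JDK-only heuristic is to work on the raw bytes of each field: try a strict UTF-8 decode first and fall back to another charset only if that fails. This is a sketch under that assumption; FallbackDecodeDemo and its decode method are hypothetical names, and the heuristic can misfire, because some Big5 byte sequences also happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strict UTF-8 first, then a caller-supplied fallback charset.
class FallbackDecodeDemo {
    static String decode(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the field uses the fallback charset.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        // Big5 is an extended charset and may be missing from a minimal runtime.
        Charset fallback = Charset.isSupported("Big5")
                ? Charset.forName("Big5")
                : StandardCharsets.ISO_8859_1;
        byte[] field = "apple".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(field, fallback));
    }
}
```

This only works if the field bytes are still available before they have been turned into a String; once bytes have been decoded with the wrong charset, the original content may no longer be recoverable.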
