I am supposed to show, or at least print, the attached characters in a string:
String str = (attached)
System.out.println("Str : " + str);
But I am unable to print the exact characters. I tried both UTF-8 and UTF-16 encoding.
Thanks in advance
What you have is known as double encoding.
You have the three-character sequence "你好吗", which, as you correctly point out, is encoded in UTF-8 as E4BDA0 E5A5BD E59097.
But now treat each byte of THAT encoding as a code point of its own and encode it in UTF-8 again. Start with E4. How is code point U+00E4 encoded in UTF-8? Try it! It's C3 A4!
You get the idea.... :-)
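Here is a Java app which illustrates this:
import java.nio.charset.StandardCharsets;

public class DoubleEncoding {
    public static void main(String[] args) {
        // First (correct) encoding: the text as UTF-8 bytes.
        byte[] encoding1 = "你好吗".getBytes(StandardCharsets.UTF_8);
        // The mistake: each byte is read as one ISO-8859-1 character.
        String string1 = new String(encoding1, StandardCharsets.ISO_8859_1);
        for (byte b : encoding1) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
        // Second encoding: each of those characters is encoded as UTF-8 again.
        byte[] encoding2 = string1.getBytes(StandardCharsets.UTF_8);
        for (byte b : encoding2) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}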
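If text has already been double-encoded this way, the damage can usually be undone by running the two steps in reverse. The following is only a minimal sketch, assuming the wrong intermediate charset was ISO-8859-1 as in the app above (with other charsets, such as Windows-1252, some bytes may not survive the round trip). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingRepair {

    // Simulates the mistake: UTF-8 bytes are misread as ISO-8859-1 text.
    public static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses it: recover the original bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] utf8 = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = garble("你好吗");
        System.out.println(garbled.length());                  // 9: one char per UTF-8 byte
        System.out.println(repair(garbled).equals("你好吗"));  // true
    }
}
```

This works because ISO-8859-1 maps every byte value to exactly one character, so no information is lost in the intermediate string.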
I am writing a Java test class and would like to iterate over all the Charsets specified in the StandardCharsets class and specify each encoding when performing the .getBytes() on the myString variable.
I want to try something like this:
String myString = "Some Junk";
for (Charset encoding : StandardCharsets) {
    System.out.println("Using Encoding: " + encoding.displayName());
    byte[] newBytes = myString.getBytes(encoding);
    for (byte b : newBytes) {
        System.out.print(b + " ");
    }
    System.out.println("");
}
Obviously that is not correct. Short of doing each one manually how can I step through all the Charsets defined in the StandardCharsets class?
So, based on your suggestions, I came up with this, which works (and probably can/should be improved on):
String myString = "Some Junk";
for (Field charSet : StandardCharsets.class.getDeclaredFields()) {
    String encoding = charSet.getName();
    //This is because the Charsets in StandardCharsets all use underscores however
    //when passing the string to .getBytes() you need to pass UTF-8 and not UTF_8
    //** All except for ISO_8859_1 ** - Sigh I wish I could do this better.
    if (encoding.startsWith("U")) {
        encoding = encoding.replaceAll("_", "-");
    }
    System.out.println("Using Encoding: " + encoding);
    byte[] newByteStr = myString.getBytes(Charset.forName(encoding));
    for (byte b : newByteStr) {
        System.out.print(b + " ");
    }
    System.out.println("");
}
This converts the string in myString to a byte array using each Charset found in StandardCharsets, which is what I wanted in the end.
EDIT 1: So, based on MC Emperor's comment, I now have this:
String myString = "Some Junk";
for (Field field : StandardCharsets.class.getDeclaredFields()) {
    if (field.get(null) instanceof Charset charset) {
        System.out.println("Using Encoding: " + charset.displayName());
        byte[] newByteStr = myString.getBytes(charset);
        for (byte b : newByteStr) {
            System.out.print(b + " ");
        }
        System.out.println();
    }
}
This seems far better as I no longer have to use string replacements.
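For anyone who wants to run this as a whole program, here is a rough sketch of the same reflection idea as a self-contained class. It is an illustration only: the class name StandardCharsetLister and the helper standardCharsets() are made up, and it checks the declared field type instead of using pattern matching, on the assumption that the code may need to compile on Java versions before 16.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StandardCharsetLister {

    // Collects every public static Charset constant declared on StandardCharsets.
    public static List<Charset> standardCharsets() {
        List<Charset> result = new ArrayList<>();
        for (Field field : StandardCharsets.class.getFields()) {
            boolean isStatic = Modifier.isStatic(field.getModifiers());
            if (isStatic && Charset.class.isAssignableFrom(field.getType())) {
                try {
                    result.add((Charset) field.get(null)); // null receiver: static field
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field.getName(), e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String myString = "Some Junk";
        for (Charset charset : standardCharsets()) {
            System.out.println("Using Encoding: " + charset.displayName());
            System.out.println(Arrays.toString(myString.getBytes(charset)));
        }
    }
}
```

Wrapping the checked IllegalAccessException keeps the calling loop free of try/catch; whether that trade-off is appropriate depends on the test class.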
First of all, sorry for my English; it is not my first language.
I'm trying to convert a byte array to a string in Java, inside try/catch error handling. If the user enters an integer, the result should be that number plus 2. I've tried, but I always get an error. I hope someone can help me.
Here's my code:
package exception;

public class TugasTiga {
    public static void main (String [] args) {
        byte[] b = new byte[5];
        System.out.println("Enter an integer: ");
        try { System.in.read(b);
        } catch (java.io.IOException e);
        int N = Integer.valueOf(b).intValue();
        System.out.println("Result: " + (N+2));
    }
}
To convert a byte[] to a String, use s = new String(bytes, "UTF-8"), or whichever encoding was used.
However, I assume that you misunderstood how the console works, because you asked for a conversion to a string but what you actually need is an integer. We normally use the Scanner class to read interactive input and convert it.
Take a look at this tutorial, which explains how to use the scanner class: https://www.w3schools.com/java/java_user_input.asp
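As a rough sketch of the Scanner approach (not a drop-in replacement for the class in the question), something like the following should work. The class name AddTwo and the helper readAndAddTwo are hypothetical names chosen for this example.

```java
import java.io.InputStream;
import java.util.InputMismatchException;
import java.util.Scanner;

public class AddTwo {

    // Reads one integer from the given stream and returns it plus 2.
    public static int readAndAddTwo(InputStream in) {
        Scanner scanner = new Scanner(in);
        return scanner.nextInt() + 2; // throws InputMismatchException on non-numeric input
    }

    public static void main(String[] args) {
        System.out.println("Enter an integer: ");
        try {
            System.out.println("Result: " + readAndAddTwo(System.in));
        } catch (InputMismatchException e) {
            System.out.println("That was not an integer.");
        }
    }
}
```

Here the try/catch handles invalid input, which is probably closer to what the exercise intends than catching IOException around System.in.read.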
Like this (note that this constructor decodes the bytes with the platform's default charset):
byte[] bytes = new byte[5];
String stringFromByteArray = new String(bytes);
Here's the solution:
byte[] bytes = new byte[5]; // create the array
String finalS = ""; // create the string
for (byte element : bytes) { // for every element of the array
    finalS += Byte.toString(element); // append the byte's decimal value, converted to a string
}
EXAMPLE: if the 5 bytes are {1,2,3,4,5}, the string is "12345"
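Note that the loop above joins the numeric value of each byte, which is not the same as decoding the bytes as characters. The sketch below contrasts the two, assuming the bytes hold UTF-8 (or plain ASCII) text as typed on a console; the class and method names are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class BytesToText {

    // Interprets the bytes as encoded characters (here: UTF-8).
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Concatenates the decimal value of each byte, as in the loop above.
    public static String joinDecimal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b); // resolves to append(int), so the number is appended
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] typed = {'4', '2'};               // the bytes produced by typing "42"
        System.out.println(decode(typed));        // 42
        System.out.println(joinDecimal(typed));   // 5250
        System.out.println(Integer.parseInt(decode(typed).trim()) + 2); // 44
    }
}
```

For the question's goal (user input plus 2), decoding and then parsing is the variant that gives the expected number.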
As shown in the picture, I'd like to convert between the encoded UTF-8 String and the native String in Java.
Would anyone have some suggestions? Thanks a lot!
P.S.
For example,
String a = "&#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example";
String b = null;
// block A: processing a, and let b = "这是一个例子,this is a example"
How to implement the "block A"?
Apache Commons Lang StringEscapeUtils.unescapeXml(...) is what you want. Depending on where your original string came from, one of the HTML variants may be more appropriate.
Use like so:
String a = "&#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example";
String b = StringEscapeUtils.unescapeXml(a);
// block A: processing a, and let b = "这是一个例子,this is a example"
System.out.println(a);
System.out.println(b);
Output:
&#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example
这是一个例子,this is a example
There are methods for converting the other way also.
You can use Charset (see its documentation):
Charset.forName("UTF-8").encode(text)
Or you can use the getBytes() method of the java.lang.String class:
text.getBytes(Charset.forName("UTF-8"));
Documentation:
public byte[] getBytes(Charset charset)
Encodes this String into a sequence of bytes using the given charset, storing the result into a new byte array.
This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array. The CharsetEncoder class should be used when more control over the encoding process is required.
Parameters: charset - The Charset to be used to encode the String
Returns: The resultant byte array
Since: 1.6
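To make the quoted difference concrete, here is a small sketch contrasting getBytes() with a CharsetEncoder. It assumes US-ASCII as the target charset purely to force an unmappable character; the class and method names are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncodeStrictness {

    // getBytes silently substitutes the charset's replacement bytes ('?' for US-ASCII).
    public static byte[] lenient(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // A freshly created CharsetEncoder reports unmappable characters instead.
    public static byte[] strict(String text) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.US_ASCII.newEncoder().encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println((char) lenient("é")[0]); // ?
        try {
            strict("é");
        } catch (CharacterCodingException e) {
            System.out.println("strict encoding failed: " + e);
        }
    }
}
```

With UTF-8 as the target, both paths give the same bytes for any well-formed string, so the distinction only matters for charsets that cannot represent every character.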
What is shown on the right are hexadecimal numeric HTML entities.
The Apache Commons library has a StringEscapeUtils class that can convert from those to a String, but the reverse is not as obvious (it should be tried; it might produce named entities instead).
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

// The class name is chosen for illustration only.
public class HtmlEntities {
    public static void main(String[] args) {
        String a = "&#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example";
        String b = fromHtmlEntities(a);
        System.out.println(b);
        String a2 = toHtmlEntities(b);
        System.out.println(a2.equals(a));
        System.out.println(a);
        System.out.println(a2);
    }

    public static String fromHtmlEntities(String s) {
        Pattern numericEntityPattern = Pattern.compile("\\&#[Xx]([0-9A-Fa-f]{1,6});");
        Matcher m = numericEntityPattern.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String replacement = new String(new int[] { codePoint }, 0, 1);
            // Quote the replacement so that '$' and '\' are taken literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Uses java 8
    public static String toHtmlEntities(String s) {
        int[] codePoints = s.codePoints().flatMap(
                (cp) -> cp < 128 // ASCII?
                    ? IntStream.of(cp)
                    : String.format("&#x%X;", cp).codePoints())
            .toArray();
        return new String(codePoints, 0, codePoints.length);
    }
}
public static void main(String[] args) {
    String name = "i love my country";
    byte[] sigToVerify = name.getBytes();
    System.out.println("file data:" + sigToVerify);
    String name1 = "data";
    byte[] sigToVerify1 = name1.getBytes();
    System.out.println("file data1:" + sigToVerify1);
}
I am trying to execute the above program but getBytes() gives me different values for the same String. Is there any way to get the same byte while executing multiple times for a given string?
System.out.println("file data:" + sigToVerify);
Here you are not printing the contents of the array. As owlstead correctly pointed out in the comments, the string concatenation invokes Object.toString() on the byte array sigToVerify, which leads to output of this form:
getClass().getName() + '@' + Integer.toHexString(hashCode())
If you want to print each element in the array you have to loop through it.
byte[] bytes = "i love my country".getBytes();
for (byte b : bytes) {
    System.out.println("byte = " + b);
}
Or even simpler, use the Arrays.toString() method:
System.out.println(Arrays.toString(bytes));
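To check that the bytes really are the same on every call, compare the array contents instead of the printed references. A minimal sketch follows; the class name and the describe helper are made up, and UTF-8 is named explicitly as an assumption so that the result does not depend on the platform's default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayPrinting {

    // Returns a readable listing of the UTF-8 bytes of the given text.
    public static String describe(String text) {
        return Arrays.toString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] first = "data".getBytes(StandardCharsets.UTF_8);
        byte[] second = "data".getBytes(StandardCharsets.UTF_8);
        System.out.println(first);                        // type name and hash code, not the contents
        System.out.println(Arrays.equals(first, second)); // true
        System.out.println(describe("data"));             // [100, 97, 116, 97]
    }
}
```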
Try printing out the contents of the byte array instead of the toString() result of the variable:
for (byte b : sigToVerify)
    System.out.print(b + "\t");
If the bytes that get printed are the same, then you're good to go.
I got a Unicode string from an external server like this:
005400610020007400650020007400ED0020007400FA0020003F0020003A0029
and I have to decode it using Java. I know that the '\u' prefix does the magic (i.e. '\u0054' -> 'T'), but I don't know how to transform it into an ordinary string.
Thanks in advance.
Edit: Thanks to everybody. All the answers work, but I had to choose only one :(
Again, thanks.
It looks like a UTF-16 encoding. Here is a method to transform it:
import java.io.UnsupportedEncodingException;

// The class name is chosen for illustration only.
public class HexDecoder {
    public static String decode(String hexCodes, String encoding) throws UnsupportedEncodingException {
        if (hexCodes.length() % 2 != 0)
            throw new IllegalArgumentException("Illegal input length");
        // Every two hex digits make up one byte.
        byte[] bytes = new byte[hexCodes.length() / 2];
        for (int i = 0; i < bytes.length; i++)
            bytes[i] = (byte) Integer.parseInt(hexCodes.substring(2 * i, 2 * i + 2), 16);
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String hexCodes = "005400610020007400650020007400ED0020007400FA0020003F0020003A0029";
        // Without a byte order mark, Java decodes "UTF-16" as big-endian.
        System.out.println(decode(hexCodes, "UTF-16"));
    }
}
Your example returns "Ta te tí tú ? :)"
You can simply split the String into substrings of length 4 and then use Integer.parseInt(s, 16) to get the numeric value of each. Cast that to a char and build a String out of the results. For the above example you will get:
Ta te tí tú ? :)
It can be interpreted as UTF-16 or as UCS-2 (a sequence of code points, each coded in 2 bytes, in hexadecimal representation). The two are equivalent as long as we do not fall outside the BMP.
An alternative parsing method:
public static String mydecode(String hexCode) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < hexCode.length(); i += 4)
        sb.append((char) Integer.parseInt(hexCode.substring(i, i + 4), 16));
    return sb.toString();
}

public static void main(String[] args) {
    String hexCodes = "005400610020007400650020007400ED0020007400FA0020003F0020003A0029";
    System.out.println(mydecode(hexCodes));
}
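As for characters outside the BMP: if the server sends them as UTF-16 surrogate pairs (an assumption, since the sample input contains none), parsing four hex digits at a time still works, because a Java char is itself a UTF-16 code unit. The sketch below compares that approach with byte-wise decoding as UTF-16BE; the class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

public class HexUtf16 {

    // Parses each group of four hex digits as one UTF-16 code unit.
    public static String decodeUnits(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    // Converts the hex digits to bytes and decodes them as big-endian UTF-16.
    public static String decodeBytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String hex = "0048D83DDE00"; // 'H' followed by the surrogate pair for U+1F600
        String s = decodeUnits(hex);
        System.out.println(s.equals(decodeBytes(hex)));             // true
        System.out.println(s.codePointCount(0, s.length()));        // 2
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1f600
    }
}
```

If the input were true UCS-2 there would be no surrogate pairs at all, and both methods would behave exactly as in the answers above.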