I have a string in UTF-8 format. I want to convert it to clean ANSI format. How can I do that?
You could use a Java function like this one to convert from UTF-8 to ISO_8859_1 (which is close to what Windows calls ANSI: Windows-1252 covers the same printable characters and adds a few more):
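// requires: import java.nio.charset.StandardCharsets;
private static String convertFromUtf8ToIso(String s1) {
    if (s1 == null) {
        return null;
    }
    // A Java String is already Unicode, so no UTF-8 round trip is needed.
    // Encoding as ISO-8859-1 replaces every unmappable character with '?'.
    byte[] b = s1.getBytes(StandardCharsets.ISO_8859_1);
    return new String(b, StandardCharsets.ISO_8859_1);
}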
Here is a simple test:
String s1 = "your utf8 stringáçﬠ";
String res = convertFromUtf8ToIso(s1);
System.out.println(res);
This prints out:
your utf8 stringáç?
The ﬠ character gets lost because it cannot be represented with ISO_8859_1 (it has 3 bytes when encoded in UTF-8). ISO_8859_1 can represent á and ç.
You can do something like this:
new String("your utf8 string".getBytes(Charset.forName("utf-8")));
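Note that this decodes the UTF-8 bytes with the platform default charset. If that default is a single-byte ANSI code page, every byte of a multi-byte UTF-8 sequence becomes a separate character, so the result depends on the JVM's settings.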
Converting UTF-8 to ANSI is not possible in general, because ASCII has only 128 characters (7 bits) and the 8-bit ANSI code pages have at most 256, while UTF-8 can encode every Unicode character using up to 4 bytes. That's like converting long to int: you lose information in most cases.
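One way to find out whether a particular string would survive such a conversion is to ask the target charset's encoder before converting. This is only a minimal sketch, not something from the answers above: the class and method names are made up for illustration, and ISO-8859-1 stands in for "ANSI".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AnsiConversionSketch {
    // Returns true when every character of s exists in the target charset.
    static boolean isRepresentable(String s, Charset target) {
        return target.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        // U+00E1 (á) and U+00E7 (ç) exist in ISO-8859-1
        System.out.println(isRepresentable("\u00E1\u00E7", latin1)); // true
        // U+FB20 does not, so getBytes would replace it with '?'
        System.out.println(isRepresentable("\uFB20", latin1));       // false
    }
}
```

If the check fails, you can decide yourself whether to reject the input or to substitute characters, instead of silently getting '?'.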
Related
There are some SO questions, but none of them helped me. I would like to convert a byte[] from org.apache.commons.codec.digest.HmacUtils to a String. This code produces some weird output:
final String value = "value";
final String key = "key";
byte[] bytes = HmacUtils.hmacSha1(key, value);
String s = new String(bytes);
What am I doing wrong?
Try to use:
String st = HmacUtils.hmacSha1Hex(key, value);
First, hmacSha1 produces a digest, not a readable String. Besides, you may have to specify an encoding, for example
String s = new String(bytes, "US-ASCII");
or
String s = new String(bytes, "UTF-8");
For a more general solution, if you don't have HmacUtils available:
// Prepare a buffer for the string
StringBuilder builder = new StringBuilder(bytes.length*2);
// Iterate through all bytes in the array
for(byte b : bytes) {
// Convert them into a hex string
builder.append(String.format("%02x",b));
// builder.append(String.format("%02x",b).toUpperCase()); // for upper case characters
}
// Done
String s = builder.toString();
To explain your problem:
You are using a hash function. So a hash is usually an array of bytes which should look quite random.
If you use new String(bytes) you try to create a string from these bytes. But Java will try to convert the bytes to characters.
For example: the byte 65 (hex 0x41) becomes the letter 'A', 66 (hex 0x42) the letter 'B', and so on. Some byte values can't be converted into readable characters. That's why you see strange characters like '�'.
So new String(new byte[]{0x41, 0x42, 0x43}) will become 'ABC'.
You want something else: You want each byte converted into a 2 digit hex String (and append these strings).
Greetings!
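To tie the explanation above together, here is a small sketch that computes the HMAC with the JDK's own javax.crypto.Mac instead of HmacUtils and prints it as hex. The class and method names are hypothetical, and I am assuming the key and value should be encoded as UTF-8 before hashing.

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class HmacHexSketch {
    // Render each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // HMAC-SHA1 using only JDK classes (assumes UTF-8 for key and value).
    static byte[] hmacSha1(String key, String value) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        return mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHex(new byte[] {0x41, 0x42, 0x43})); // 414243
        // A SHA-1 based digest is 20 bytes, so this prints 40 hex digits
        System.out.println(toHex(hmacSha1("key", "value")));
    }
}
```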
You may need to specify an encoding. See the related question "UTF-8 byte[] to String".
I have a string like "QQBkADEBbgAxAXoA". I create a byte array from the string and use this code to convert it to a string in C#:
string value = new UnicodeEncoding().GetString(array);
I need this UnicodeEncoding in Java. Is there a class that can do the same in Java?
The C# class UnicodeEncoding encodes as UTF-16, little endian by default.
String value = new String(bytes, "UTF-16LE");
The code above worked for me, since C# was using the little-endian representation, whereas Java's "UTF-16" assumes big endian when the data has no byte-order mark.
The C# class UnicodeEncoding encodes the string using the UTF-16 encoding.
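In Java you should be able to convert the bytes back to a string like this (note that "UTF-16" assumes big endian unless the bytes start with a byte-order mark; for little-endian bytes without one, use "UTF-16LE"):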
byte[] bytes = ...;
String value = new String(bytes, "UTF-16");
Or the other way around, convert a Java string to bytes using UTF-16 encoding:
byte[] bytes = value.getBytes("UTF-16");
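As a concrete check, the sample string from the question looks like Base64 text. Assuming that is the case and that the bytes behind it are UTF-16LE (both are my assumptions, the question does not say so), a sketch with a hypothetical helper class could look like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Utf16LeSketch {
    // Assumption: the input is Base64 text wrapping UTF-16LE bytes.
    static String decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String value = decode("QQBkADEBbgAxAXoA");
        // Print the code points so the result does not depend on the console encoding
        for (char ch : value.toCharArray()) {
            System.out.format("U+%04X%n", (int) ch);
        }
    }
}
```

Under those assumptions the decoded string has six characters (U+0041, U+0064, U+0131, U+006E, U+0131, U+007A). Decoding the same bytes as big endian would give unrelated characters.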
How can I convert this byte[] to a String?
byte[] mytest = new byte[] {100,25,28,-122,-26,94,-3,-26};
I get this: "d��^�" when I use:
new String( mytest , "UTF-8" )
Here is the Java code that creates the key:
m_key = new javax.crypto.spec.SecretKeySpec(new byte[] {100,25,28,-122,-26,94,-3,-26}, "DES");
Thanks.
In order to decode the byte array into something like ASCII, you need to know its original encoding. Otherwise you would need to treat it as binary.
Note: Base64 is intended for transferring binary data across networks.
I would suggest Base64 encoding your byte array, then decoding the Base64 string back into the original bytes in your PHP code.
In Java, here's how to Base64 encode your byte array and then decode it back to the original bytes:
import org.apache.commons.codec.binary.Base64;

public class MyTest {
    public static void main(String[] args) throws Exception {
        byte[] byteArray = new byte[] {100,25,28,-122,-26,94,-3,-26};
        System.out.println("To UTF-8 string: " + new String(byteArray, "UTF-8"));
        byte[] base64 = Base64.encodeBase64(byteArray);
        System.out.println("To Base64 string: " + new String(base64, "UTF-8"));
        byte[] decoded = Base64.decodeBase64(base64);
        System.out.println("Back to UTF-8 string: " + new String(decoded, "UTF-8"));
        /* the decoded byte array is the same as the original byte array */
        for (int i = 0; i < decoded.length; i++) {
            assert byteArray[i] == decoded[i];
        }
    }
}
The output from the above code is:
To UTF-8 string: d��^�
To Base64 string: ZBkchuZe/eY=
Back to UTF-8 string: d��^�
So if you want to use the same binary data in your PHP code, cut and paste the Base64 string into your PHP code and decode it back to the raw bytes. Something like this:
<?php
$str = 'ZBkchuZe/eY=';
$key = base64_decode($str);
echo $key;
?>
I don't code in PHP, but you should be able to decode Base64 using this method:
http://php.net/manual/en/function.base64-decode.php
The above code should echo back the original binary data (albeit as funny-looking characters, since these bytes are not valid UTF-8 text). The point is that the string in the $key variable holds the same binary data you had in the Java byte array:
d��^�
You should be able to pass the $key variable into your PHP encryption method.
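If you don't have Apache Commons Codec on the classpath, the same round trip should also work with java.util.Base64 from the JDK (Java 8 and later). A minimal sketch, with a made-up class name, using the key bytes from the question:

```java
import java.util.Arrays;
import java.util.Base64;

class KeyBase64Sketch {
    // The DES key bytes from the question
    static final byte[] KEY = {100, 25, 28, -122, -26, 94, -3, -26};

    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String text = encode(KEY);
        System.out.println(text);                             // ZBkchuZe/eY=
        System.out.println(Arrays.equals(KEY, decode(text))); // true
    }
}
```

The bytes never pass through a String constructor here, so no character decoding can corrupt them.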
The way you are doing it makes no sense, IMO. You are creating a new String with the byte[] as an argument, but I don't think that constructor is supposed to parse arbitrary binary data, so what you end up with is a lot of junk. A little bit of googling got me this: http://www.mkyong.com/java/how-do-convert-byte-array-to-string-in-java/
Would m_key.getEncoded() give you the desired result?
Javadocs - SecretKeySpec
If not, you have to identify the Key provider that was used for the encoding (which resulted in the byte array that you have now) and decode.
I have the code of a character in the Windows-1251 code table.
How can I get the code of this character in Unicode (UTF-8)?
For example, the character 'А' has the code 192 in Windows-1251; the corresponding Unicode code point is 1040.
How can I initialize a Character or char in Java with code 192 from the Windows-1251 code table?
char c = (char)192; //how to specify the encoding ?
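To convert a byte[] encoded in one character encoding to another you can do
// requires: import java.io.UnsupportedEncodingException;
public static byte[] convertEncoding(byte[] bytes, String from, String to)
        throws UnsupportedEncodingException {
    // Decode with the source charset, then re-encode with the target charset
    return new String(bytes, from).getBytes(to);
}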
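For the single-character case from the question, the same idea applies: decode the one byte with the Windows-1251 charset and read the resulting char. This is just a sketch with a hypothetical helper name, and it assumes your JVM provides the "windows-1251" charset (it is not one of the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;

class Cp1251Sketch {
    // Decode a single Windows-1251 byte value (0..255) to its Unicode char.
    static char fromWindows1251(int code) {
        Charset cp1251 = Charset.forName("windows-1251"); // assumed to be available
        return new String(new byte[] {(byte) code}, cp1251).charAt(0);
    }

    public static void main(String[] args) {
        char c = fromWindows1251(192);
        System.out.println((int) c); // 1040, i.e. U+0410
    }
}
```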
How can I convert so-called "PHP unicode" to a normal character in Java? Example: \xEF\xBC\xA1 -> A. Are there any built-in methods in the JDK, or should I use a regex for this conversion?
You first need to get the bytes out of the string into a byte-array without changing them and then decode the byte-array as a UTF-8 string.
The simplest way to get the string into a byte array is to encode it using ISO-8859-1, which maps every character with a Unicode value less than 256 to a byte with the same value (or the equivalent negative)
String phpUnicode = "\u00EF\u00BC\u00A1";
byte[] bytes = phpUnicode.getBytes("ISO-8859-1"); // maps to bytes with the same ordinal value
String javaString = new String(bytes, "UTF-8");
System.out.println(javaString);
Edit
The above converts the UTF-8 to the Unicode character. If you then want to convert it to a reasonable ASCII equivalent, there's no standard way of doing that: but see this question
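For compatibility characters such as the fullwidth letter in this example, one option that may be close enough is Unicode NFKC normalization via java.text.Normalizer. It is not a general "to ASCII" conversion, and the class name below is made up for illustration:

```java
import java.text.Normalizer;

class FullwidthSketch {
    // NFKC folds compatibility characters such as fullwidth letters into
    // their ordinary counterparts where Unicode defines such a mapping.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FF21 FULLWIDTH LATIN CAPITAL LETTER A becomes U+0041
        System.out.println(fold("\uFF21")); // A
    }
}
```

Characters without a compatibility mapping are left unchanged, so this only helps for cases like fullwidth forms.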
Edit
I assumed that you had a string containing characters that had the same ordinal value as the UTF-8 sequence but you indicate that your string literally contains the escape sequence, as in:
String phpUnicode = "\\xEF\\xBC\\xA1";
The JDK doesn't have any built-in methods to convert Strings like this so you'll need to use your own regex. Since we ultimately want to convert a utf-8 byte-sequence into a String, we need to set up a byte-array, using maybe:
Pattern oneChar = Pattern.compile("\\\\x([0-9A-F]{2})|(.)", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher matcher = oneChar.matcher(phpUnicode);
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
while (matcher.find()) {
int ch;
if (matcher.group(1) == null) {
ch = matcher.group(2).charAt(0);
}
else {
ch = Integer.parseInt(matcher.group(1), 16);
}
bytes.write((int) ch);
}
String javaString = new String(bytes.toByteArray(), "UTF-8");
System.out.println(javaString);
This will generate a UTF-8 stream by converting \xAB sequences. This UTF-8 stream is then converted to a Java string. It's important to note that any character that is not part of an escape sequence will be converted to a byte equal to the low-order 8 bits of the Unicode character. This works fine for ASCII but can cause transcoding problems for non-ASCII characters.
@McDowell:
The sequence:
String phpUnicode = "\u00EF\u00BC\u00A1";
byte[] bytes = phpUnicode.getBytes("ISO-8859-1");
creates a byte array containing as many bytes as the original string has characters and for each character with a unicode value below 256, the same numeric value is stored in the byte-array.
The character FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) is not present in the original String, so the fact that it is not in ISO-8859-1 is irrelevant.
I know that transcoding bugs can occur when you convert characters to bytes; that's why I said that ISO-8859-1 would only "map every character with a unicode value less than 256 to a byte with the same value".
The character in question is U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A). The PHP form (\xEF\xBC\xA1) is a UTF-8 encoded octet sequence.
In order to decode this sequence to a Java String (which is always UTF-16), you would use the following code:
// \xEF\xBC\xA1
byte[] utf8 = { (byte) 0xEF, (byte) 0xBC, (byte) 0xA1 };
String utf16 = new String(utf8, Charset.forName("UTF-8"));
// print the char as hex
for(char ch : utf16.toCharArray()) {
System.out.format("%02x%n", (int) ch);
}
If you want to decode the data from a string literal you could use code of this form:
public static void main(String[] args) {
String utf16 = transformString("This is \\xEF\\xBC\\xA1 string");
for (char ch : utf16.toCharArray()) {
System.out.format("%s %02x%n", ch, (int) ch);
}
}
// Match only hex digits after \x, so Integer.parseInt(hex, 16) cannot fail
private static final Pattern SEQ
        = Pattern.compile("(\\\\x\\p{XDigit}\\p{XDigit})+");
private static String transformString(String encoded) {
StringBuilder decoded = new StringBuilder();
Matcher matcher = SEQ.matcher(encoded);
int last = 0;
while (matcher.find()) {
decoded.append(encoded.substring(last, matcher.start()));
byte[] utf8 = toByteArray(encoded.substring(matcher.start(), matcher.end()));
decoded.append(new String(utf8, Charset.forName("UTF-8")));
last = matcher.end();
}
return decoded.append(encoded.substring(last, encoded.length())).toString();
}
private static byte[] toByteArray(String hexSequence) {
byte[] utf8 = new byte[hexSequence.length() / 4];
for (int i = 0; i < utf8.length; i++) {
int offset = i * 4;
String hex = hexSequence.substring(offset + 2, offset + 4);
utf8[i] = (byte) Integer.parseInt(hex, 16);
}
return utf8;
}