Unicode Base64 encoding with Java

I am trying to encode and decode a UTF8 string to base64.
In theory this is not a problem, but when decoding I never seem to get the correct characters back, only ?.
String original = "خهعسيبنتا";
B64encoder benco = new B64encoder();
String enc = benco.encode(original);
try
{
String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println("Original: " + original);
prtHx("ara", original.getBytes());
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes());
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes());
} catch (UnsupportedEncodingException e)
{
e.printStackTrace();
}
The output to the console is as follows:
Original: خهعسيبنتا
ara = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
Encoded: Pz8/Pz8/Pz8/
enc = 50, 7A, 38, 2F, 50, 7A, 38, 2F, 50, 7A, 38, 2F
Decoded: ?????????
dec = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
prtHx simply writes the hex value of the bytes to the output.
Am I doing something obviously wrong here?
Andreas pointed to the correct solution by highlighting that the getBytes() method uses the platform default encoding (Cp1252) even though the source file itself is UTF-8. By using getBytes("UTF-8") I was able to see that the encoded and decoded bytes were actually different.
Further investigation showed that the encode method also used getBytes(). Changing this did the trick nicely.
try
{
String enc = benco.encode(original);
String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println("Original: " + original);
prtHx("ori", original.getBytes("UTF-8"));
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes("UTF-8"));
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes("UTF-8"));
} catch (UnsupportedEncodingException e)
{
e.printStackTrace();
}
System encoding Cp1252
Original: خهعسيبنتا
ori = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
Encoded: 2K7Zh9i52LPZitio2YbYqtin
enc = 32, 4B, 37, 5A, 68, 39, 69, 35, 32, 4C, 50, 5A, 69, 74, 69, 6F, 32, 59, 62, 59, 71, 74, 69, 6E
Decoded: خهعسيبنتا
dec = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
Thanks.

String#getBytes() encodes the characters using the platform's default charset. The actual encoding of the String literal "خهعسيبنتا" is "defined" in the Java source file (you choose a character encoding when you create or save the file).
This could be the reason why ara is encoded to 0x3F bytes: the platform charset replaces unmappable characters with ?, which is 0x3F.
Give this a try:
out.println("Original: " + original);
prtHx("ara", original.getBytes("UTF-8"));
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes("UTF-8"));
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes("UTF-8"));
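For reference, since Java 8 the standard java.util.Base64 class can stand in for the custom B64encoder used in the question. A minimal round-trip sketch with the charset made explicit at every String/byte[] boundary (the encoded value matches the one in the corrected output above):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTrip {
    public static void main(String[] args) {
        String original = "خهعسيبنتا";

        // Encode: String -> UTF-8 bytes -> Base64 text
        String enc = Base64.getEncoder()
                .encodeToString(original.getBytes(StandardCharsets.UTF_8));

        // Decode: Base64 text -> UTF-8 bytes -> String
        String dec = new String(Base64.getDecoder().decode(enc),
                StandardCharsets.UTF_8);

        System.out.println("Encoded: " + enc);  // 2K7Zh9i52LPZitio2YbYqtin
        System.out.println("Round-trip OK: " + original.equals(dec));
    }
}
```

Because the charset is named on both conversions, the result no longer depends on the platform default (Cp1252 here).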

Related

Decode Korean String from Bytes in Java

I ran into struggles converting a byte array of Korean characters in Java.
Wikipedia states that somehow 3 bytes are being used for each char, but not all bits are taken into account.
Is there a simple way of converting this very special... format? I don't want to write loops and counters keeping track of bits and bytes, as it would get messy, and I can't imagine that there is no simple solution. A native Java lib would be perfect, or maybe someone has figured out some smart bit-shift logic.
UPDATE 2:
A working solution has been posted by @DavidConrad below; I was wrong to assume it is UTF-8 encoded.
UPDATE:
These bytes
[91, -80, -8, -69, -25, 93, 32, -64, -78, -80, -18, -73, -50]
should output this:
[공사] 율곡로
But using
new String(shortStrBytes,"UTF8"); // or
new String(shortStrBytes,StandardCharsets.UTF_8);
turns them to this:
[����] �����
The returned string has 50% more chars
Since you added the bytes to the question, I have done a little research and some experimenting, and I believe that the text you have is encoded as EUC-KR. I got the expected Korean characters when interpreting them as that encoding.
// convert bytes to a Java String
byte[] data = {91, -80, -8, -69, -25, 93, 32, -64, -78, -80, -18, -73, -50};
String str = new String(data, "EUC-KR");
// now convert String to UTF-8 bytes
byte[] utf8 = str.getBytes(StandardCharsets.UTF_8);
System.out.println(HexFormat.ofDelimiter(" ").formatHex(utf8));
This prints the following hexadecimal values:
5b ea b3 b5 ec 82 ac 5d 20 ec 9c a8 ea b3 a1 eb a1 9c
Which is the proper UTF-8 encoding of those Korean characters and, with a terminal that supported them, printing the string should display them properly, too.
You should use StandardCharsets.UTF_8.
Converting from String to byte[] and vice versa:
import java.util.*;
import java.nio.charset.StandardCharsets;
public class Translater {
public static String translateBytesToString(byte[] b) {
return new String(b, StandardCharsets.UTF_8);
}
public static byte[] translateStringToBytes(String s) {
return s.getBytes(StandardCharsets.UTF_8);
}
public static void main(String[] args) {
final String STRING = "[공사] 율곡로";
final byte[] BYTES = {91, -22, -77, -75, -20, -126, -84, 93, 32, -20, -100, -88, -22, -77, -95, -21, -95, -100};
String s = translateBytesToString(BYTES);
byte[] b = translateStringToBytes(STRING);
System.out.println("String: " + translateBytesToString(BYTES));
System.out.print("Bytes: ");
for (int i=0; i<b.length; i++)
System.out.print(b[i] + " ");
}
}

Java Byte to String giving weird result?

I have this code:
private static c e;
private static byte[] f = { 55, -86, -102, 55, -23, 26, -83, 103, 125, -57, -110, -34, 70, 102, 48, -103 };
private String a;
private SecureRandom b;
private int c;
private byte[] d;
public c(String paramString, SecureRandom paramSecureRandom)
{
this.a = paramString;
this.b = paramSecureRandom;
}
public static c a()
{
if (e == null)
{
e = new c("AES/CBC/PKCS7Padding", new SecureRandom());
e.a(f, 16);
}
return e;
}
With f being the array of bytes and 16 to do with reading 16 bytes of the IV generated with SecureRandom() (at least I assume that's what it is doing?). However, when I use this:
byte[] byteArray = { 55, -86, -102, 55, -23, 26, -83, 103, 125, -57, -110, -34, 70, 102, 48, -103 };
String value = new String(byteArray, "ISO-8859-1");
System.out.println(value);
I get this output: 7ª7é­g}ÇÞFf0
I'm attempting to work out how this app i've got generates the encryptionkey used for encrypting/decrypting... that result above surely can't be anything right? Am I completely on the wrong track here?
I've included the full class code here in case it helps: http://pastie.org/private/5fhp9yqknzoansd1vc0xfg
Would really love to know what the code above is actually doing so I can port it to PHP; I'm not too good at Java.
Thanks in advance.
Your output 7ª7é­g}ÇÞFf0 makes sense to me.
You are using the character set: ISO-8859-1, and thus the bytes will be decoded to the characters they are mapped to in that character set.
Your byte array is created using base 10, and java bytes are signed. This means your byte array has the following hexadecimal values (in order):
37, AA, 9A, 37, E9, 1A, AD, 67, 7D, C7, 92, DE, 46, 66, 30, 99
According to the ISO-8859-1 character set, these values map to the following:
7, ª, (nil), 7, é, (nil), SHY, g, }, Ç, (nil), Þ, F, f, 0, (nil)
That is pretty close to what your string actually shows. The (nil) characters are not rendered in your string because the character set does not have glyphs for the corresponding values. And for the character SHY (soft hyphen), I will again assume there is no glyph (while the standard indicates there actually should be).
Your output seems correct to me! :)
Remember, your encryption key is just a sequence of bytes. You shouldn't expect the data to be human-readable.
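The decimal-to-hex mapping described above can be reproduced directly; masking with 0xFF widens each signed Java byte to its unsigned 0..255 value before formatting:

```java
public class SignedByteHex {
    public static void main(String[] args) {
        byte[] f = { 55, -86, -102, 55, -23, 26, -83, 103,
                     125, -57, -110, -34, 70, 102, 48, -103 };
        StringBuilder sb = new StringBuilder();
        for (byte b : f) {
            // b & 0xFF converts e.g. -86 to 170, which formats as AA
            sb.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(sb.toString().trim());
        // 37 AA 9A 37 E9 1A AD 67 7D C7 92 DE 46 66 30 99
    }
}
```

This is usually a more useful way to inspect key material than decoding it as text.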

Why does Java not respect the given array length?

I saw problem in this piece of code:
byte[] buf = new byte[6];
buf = "abcdef".getBytes();
System.out.println(buf.length);
The array was made for 6 bytes. If I get the bytes from a string of length 6, I may get many more bytes, so how would all those bytes fit into this array? Yet this works. Moreover, buf.length shows the length of the byte array produced for the string, not of the array I allocated.
Afterwards, I realized that in
byte[] buf = new byte[6];
the 6 does not mean much, i.e. I can put 0 or 1 or 2 or so on there and the code still works (with buf.length showing the length derived from the string, not from the allocated array, which I see as a second problem or discrepancy).
This question is different from Why does Java's String.getBytes() uses "ISO-8859-1" because it has at least one more aspect: a variable-assignment oversight (getBytes() returns a new array), so it doesn't fully address my question.
That is not how variable assignments work
Thinking that assigning a 6-byte array to a variable will limit the length of any other array later assigned to the same variable shows a fundamental misunderstanding of what variables are and how they work.
A variable holds a reference to an array, not the array itself, so assigning a fixed-length array to it places no limit on the length of the next array assigned to it.
Strings are Unicode in Java
Strings in Java are Unicode and internally represented as UTF-16 which means they are 2 or 4 bytes per character in memory.
When they are converted to a byte array the number of bytes that represents the string is determined by what encoding is used when converting to the byte[].
Always specify an appropriate character encoding when converting Strings to arrays to get what you expect.
But even then, UTF-8 does not guarantee a single byte per character, and ASCII is not able to represent non-ASCII Unicode characters.
Character encoding is tricky
The ubiquitous internet encoding standard is UTF-8; it will be correct in 99.9999999% of all cases, and in the cases where it isn't, converting from UTF-8 to the correct encoding is trivial because UTF-8 is so well supported in every toolchain.
Learn to make everything final and you will have a much easier time and less confusion.
import com.google.common.base.Charsets;
import javax.annotation.Nonnull;
import java.util.Arrays;
public class Scratch
{
public static void main(final String[] args)
{
printWithEncodings("Hello World!");
printWithEncodings("こんにちは世界!");
}
private static void printWithEncodings(@Nonnull final String s)
{
System.out.println("s = " + s);
final byte[] defaultEncoding = s.getBytes(); // never do this, you do not know what you will get!
// for ASCII characters the first three will all be the same single byte representations
final byte[] iso88591Encoding = s.getBytes(Charsets.ISO_8859_1);
final byte[] asciiEncoding = s.getBytes(Charsets.US_ASCII);
final byte[] utf8Encoding = s.getBytes(Charsets.UTF_8);
final byte[] utf16Encoding = s.getBytes(Charsets.UTF_16);
System.out.println("Arrays.toString(defaultEncoding) = " + Arrays.toString(defaultEncoding));
System.out.println("Arrays.toString(iso88591) = " + Arrays.toString(iso88591Encoding));
System.out.println("Arrays.toString(asciiEncoding) = " + Arrays.toString(asciiEncoding));
System.out.println("Arrays.toString(utf8Encoding) = " + Arrays.toString(utf8Encoding));
System.out.println("Arrays.toString(utf16Encoding) = " + Arrays.toString(utf16Encoding));
}
}
results in
s = Hello World!
Arrays.toString(defaultEncoding) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(iso88591) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(asciiEncoding) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(utf8Encoding) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(utf16Encoding) = [-2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33]
s = こんにちは世界!
Arrays.toString(defaultEncoding) = [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95, -29, -127, -81, -28, -72, -106, -25, -107, -116, 33]
Arrays.toString(iso88591) = [63, 63, 63, 63, 63, 63, 63, 33]
Arrays.toString(asciiEncoding) = [63, 63, 63, 63, 63, 63, 63, 33]
Arrays.toString(utf8Encoding) = [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95, -29, -127, -81, -28, -72, -106, -25, -107, -116, 33]
Arrays.toString(utf16Encoding) = [-2, -1, 48, 83, 48, -109, 48, 107, 48, 97, 48, 111, 78, 22, 117, 76, 0, 33]
Always specify the Charset encoding!
getBytes(Charset) is always the correct way to convert a String to bytes. Use whatever encoding you need.
See the list of internally supported encodings for JDK 7.
new byte[6] has no lasting effect, because the reference buf is immediately reassigned to the array returned by "abcdef".getBytes().
That's because String.getBytes() returns an entirely different array object, which is then assigned to buf. You could have just as easily done this:
byte[] buf = "abcdef".getBytes();
System.out.println(buf.length);
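A small sketch of the rebinding, using an initial length of 2 (one of the values the question notes also "works"), to make it visible that two distinct arrays are involved:

```java
import java.nio.charset.StandardCharsets;

public class ArrayRebinding {
    public static void main(String[] args) {
        byte[] buf = new byte[2];   // first array, length 2
        byte[] original = buf;      // keep a second reference to the first array

        // getBytes() creates a brand-new array; buf now points at it
        buf = "abcdef".getBytes(StandardCharsets.US_ASCII);

        System.out.println(original.length);  // 2: the first array is untouched
        System.out.println(buf.length);       // 6: length of the new array
        System.out.println(buf == original);  // false: two distinct objects
    }
}
```

The allocated length is never consulted by the assignment; only the reference stored in buf changes.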

LDAP query doesn't return correct data from Active Directory

I'm working on a tool to get user details from the AD and import them into another system. We were planning on using the objectSid as the unique identifier, but I've found that, for some reason, the objectSid in the LDAP result does not match what's in Active Directory. Most of the bytes are the same, but some are different, and sometimes the LDAP result has fewer bytes than AD does.
objectSid from user in AD:
decimal: [ 1, 5, 0, 0, 0, 0, 0, 5, 21, 0, 0, 0, 35, 106, 222, 96, 236, 251, 239, 68, 32, 255, 234, 203, 122, 4, 0, 0]
hex: [01, 05, 00, 00, 00, 00, 00, 05, 15, 00, 00, 00, 23, 6A, DE, 60, EC, FB, EF, 44, 20, FF, EA, CB, 7A, 04, 00, 00]
objectSid for same user via LDAP result:
decimal: [ 1, 5, 0, 0, 0, 0, 0, 5, 21, 0, 0, 0, 35, 106, 63, 96, 63, 63, 63, 68, 32, 63, 63, 63, 122, 4, 0, 0]
hex: [01, 05, 00, 00, 00, 00, 00, 05, 15, 00, 00, 00, 23, 6A, 3F, 60, 3F, 3F, 3F, 44, 20, 3F, 3F, 3F, 7A, 04, 00, 00]
It almost seems as if any value over 128 comes back as 63/3F in the LDAP result. For another user, the LDAP result is missing 1 byte (the question marks):
hex from AD: [01 05 00 00 00 00 00 05 15 00 00 00 23 6A DE 60 EC FB EF 44 20 FF EA CB 88 04 00 00]
hex from LDAP: [01 05 00 00 00 00 00 05 15 00 00 00 23 6A 3F 60 3F 3F 3F 44 20 3F 3F 3F ?? 04 00 00]
Here's the main portion of the code I'm using to do these tests.
final String ldapADServer = "ldap://" + cmdLine.getOptionValue("ldap");
final String bindDN = cmdLine.getOptionValue("u");
final String bindCredential = cmdLine.getOptionValue("p");
final String baseCtxDN = cmdLine.getOptionValue("d");
final Hashtable<String, Object> env = new Hashtable<String, Object>();
env.put(Context.SECURITY_AUTHENTICATION, "simple");
env.put(Context.SECURITY_PRINCIPAL, bindDN);
env.put(Context.SECURITY_CREDENTIALS, bindCredential);
env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
env.put(Context.PROVIDER_URL, ldapADServer);
env.put("com.sun.jndi.ldap.trace.ber", System.err);
final LdapContext ctx = new InitialLdapContext(env, null);
final String searchFilter = "(&(objectClass=user) (sAMAccountName=" + accountName + "))";
final SearchControls searchControls = new SearchControls();
searchControls.setSearchScope(SearchControls.SUBTREE_SCOPE);
final StringBuilder builder = new StringBuilder();
final NamingEnumeration<SearchResult> results = ctx.search(baseCtxDN, searchFilter, searchControls);
while (results != null && results.hasMoreElements()) {
final SearchResult result = results.nextElement();
builder.append(LdapHelper.getSearchResultDetails(result, ""));
}
logger.info("Search results: {}{}", StringUtils.NEW_LINE, builder.toString());
The LdapHelper simply loops through all attributes and returns them in a nicely formatted string. The objectGUID and objectSid are printed in hex format.
I was running the test using JRE 6 as well as JRE 7 with the same result. Our AD server is Windows Server 2008 R2, and I've tried both AD ports, 389 and 3268.
I'm going to look into other Java LDAP libraries now but I wanted to see if anyone else had run into these issues or does anyone know why this is and how to get around it? I.e. is there a way to get the proper values from AD?
I've now done the same using the UnboundID LDAP SDK and this works properly and returns the full and correct objectSid as well as objectGUID. So this seems to be a bug in the standard J2SE library?
Code to do that in case anyone is interested:
private static void unboundIdLdapSearch(final String ldapADServer, final String bindDN, final String bindCredential, final String baseCtxDN, final String userName) throws LDAPException, Exception {
final LDAPConnection connection = new LDAPConnection(ldapADServer.substring(0, ldapADServer.indexOf(':')),
Integer.parseInt(ldapADServer.substring(ldapADServer.indexOf(':') + 1)), bindDN, bindCredential);
findAccountByAccountName(connection, baseCtxDN, userName);
connection.close();
}
private static void findAccountByAccountName(final LDAPConnection connection, final String baseCtxDN, final String accountName) throws Exception {
final String searchFilter = "(&(objectClass=user)(sAMAccountName=" + accountName + "))";
logger.info("LDAP search filter: {}", searchFilter);
final SearchRequest request = new SearchRequest(baseCtxDN, SearchScope.SUB, searchFilter);
final com.unboundid.ldap.sdk.SearchResult result = connection.search(request);
final int numOfResults = result.getEntryCount();
final StringBuilder builder = new StringBuilder();
builder.append("Search returned with ").append(numOfResults).append(" results: ").append(StringUtils.NEW_LINE);
for (final SearchResultEntry entry : result.getSearchEntries()) {
builder.append(LdapHelper.getSearchResultDetails(entry, ""));
}
logger.info("Search results: {}{}", StringUtils.NEW_LINE, builder.toString());
}
In addition, I happened to stumble across why the JNDI LDAP method didn't work properly for objectSid and objectGUID and got it working in addition to my UnboundID solution.
First of all, I realized that when I used the UnboundID 'getValue' method, which returns a String, it returned the same values the J2SE JNDI version did; that's when I figured out that JNDI was converting the retrieved binary value to a UTF-8 String.
I then happened to come across another blog post (http://www.jroller.com/eyallupu/entry/java_jndi_how_to_convert) as well as this page: http://docs.oracle.com/javase/jndi/tutorial/ldap/misc/attrs.html . So all that's needed in order to get the objectSid and objectGUID properly is to add them to the list of binary attributes by adding a space separated list of attribute names to the map for the LDAP context:
env.put("java.naming.ldap.attributes.binary", "objectSid objectGUID");
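A minimal JNDI sketch with that property in place (the host, bind DN, credentials, and search base below are placeholders, not values from the question):

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;

public class BinarySidLookup {
    // Builds the JNDI environment; the binary-attributes entry is the key fix.
    static Hashtable<String, Object> buildEnv() {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ad.example.com:389");           // placeholder
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "CN=binduser,DC=example,DC=com"); // placeholder
        env.put(Context.SECURITY_CREDENTIALS, "secret");                      // placeholder
        // Without this, JNDI decodes objectSid as a String and mangles bytes >= 0x80.
        env.put("java.naming.ldap.attributes.binary", "objectSid objectGUID");
        return env;
    }

    static byte[] fetchSid(String baseDN, String accountName) throws Exception {
        LdapContext ctx = new InitialLdapContext(buildEnv(), null);
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> results = ctx.search(baseDN,
                    "(&(objectClass=user)(sAMAccountName=" + accountName + "))", controls);
            if (results.hasMore()) {
                // With the binary hint, the attribute value is a byte[], not a String.
                return (byte[]) results.next().getAttributes().get("objectSid").get();
            }
            return null;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(buildEnv().get("java.naming.ldap.attributes.binary"));
    }
}
```

With the hint set, the raw SID bytes (e.g. 01 05 00 00 ...) come back intact instead of with 0x3F substitutions.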

Coldfusion and java encryption functions

While trying to translate a token generator for UserVoice from Java to ColdFusion, I noticed that the hash function in Java does not give the same output as the one in ColdFusion:
String salted = "63bfb29835aedc55aae944e7cc9a202dmbdevsite";
byte[] hash = DigestUtils.sha(salted);
gives = [-19, -18, 7, 92, -121, 13, 88, 68, -84, 61, -77, -20, -85, -102, -102, -62, -70, 45, -16, 18]
<cfset Salted="63bfb29835aedc55aae944e7cc9a202dmbdevsite" />
<cfset hash=Hash(Salted,"SHA") />
<cfset arrBytes = hash.GetBytes() />
gives = 69686969485553675655486853565252656751686651696765665765576567506665506870484950
Can anyone explain this ?
Thanks
You are actually getting the same result; however, the outputs are encoded differently. For Java it's a byte array, and it's important to note that byte is signed. For ColdFusion you get a hex string, and calling GetBytes() on it returns the decimal ASCII code of each character of that hex string. If you look at http://asciitable.com/ and map the decimal numbers to their characters (e.g. 69 to E, 68 to D, 48 to 0), you get:
EDEE075C870D5844AC3DB3ECAB9A9AC2BA2DF012
Hashed results are often stored as hex. If you encode the Java version into hex, you'll get the same:
byte[] hash = { -19, -18, 7, 92, -121, 13, 88, 68, -84, 61, -77, -20,
-85, -102, -102, -62, -70, 45, -16, 18 };
StringBuilder sb = new StringBuilder(2 * hash.length);
for (byte b : hash) {
sb.append("0123456789ABCDEF".charAt((b & 0xF0) >> 4));
sb.append("0123456789ABCDEF".charAt((b & 0x0F)));
}
String hex = sb.toString();
System.out.println(hex);
You can use BinaryDecode to get the same byte array as the Java Hash.
<cfset Salted="63bfb29835aedc55aae944e7cc9a202dmbdevsite" />
<cfset hash = Hash(Salted,"SHA") />
<cfset arrBytes = BinaryDecode(hash, "hex") />
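For completeness, the same digest-plus-hex result can also be produced on the Java side with just the JDK (MessageDigest instead of Commons Codec's DigestUtils); the resulting hex string matches ColdFusion's Hash() output quoted above:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ShaHex {
    public static void main(String[] args) throws Exception {
        String salted = "63bfb29835aedc55aae944e7cc9a202dmbdevsite";

        // SHA-1 digest of the salted string (the input is ASCII, so any
        // ASCII-compatible charset gives the same bytes here)
        byte[] hash = MessageDigest.getInstance("SHA-1")
                .digest(salted.getBytes(StandardCharsets.UTF_8));

        // Hex-encode the signed bytes, masking with 0xFF
        StringBuilder sb = new StringBuilder(2 * hash.length);
        for (byte b : hash) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        System.out.println(sb);  // EDEE075C870D5844AC3DB3ECAB9A9AC2BA2DF012
    }
}
```

Comparing the two systems in hex form avoids the signed-byte confusion entirely.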
