MalformedInputException with Utility.convertByteArrayToCharArray and MessageDigest - java

I'm trying to create a GlassFish custom JDBCRealm, and during some tests on it I got a MalformedInputException when using the com.sun.enterprise.util.Utility.convertByteArrayToCharArray function.
So I decided to extract the part of my function that throws this error, in order to test it and understand where it comes from.
Reduced function:
public void justATestFunction()
        throws Exception
{
    final char[] password = "myP4ssW0rd42".toCharArray();
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    // according to the Utility doc, if the Charset parameter is null or empty,
    // it will call the Charset.defaultCharset() function to define the charset to use
    byte[] hashedPassword = Utility.convertCharArrayToByteArray(password, null);
    hashedPassword = md.digest(hashedPassword);
    Utility.convertByteArrayToCharArray(hashedPassword, null); // throws a MalformedInputException
}
Thank you in advance for your answer.

Let's look at each step:
byte[] hashedPassword= Utility.convertCharArrayToByteArray(password, null);
The above converts the UTF-16 characters to a byte array using the default encoding, probably windows-1252 or UTF-8. Since the password contains nothing outside standard 7-bit ASCII, the result is the same for either encoding.
hashedPassword = md.digest(hashedPassword);
hashedPassword now refers to a completely different byte array containing the BINARY digest of the original password. This is a BINARY string and no longer represents anything in any character encoding. It is pure binary data.
Utility.convertByteArrayToCharArray(hashedPassword, null);
Now you attempt to "decode" the binary string as if it were encoded with the default character set, which will undoubtedly throw an exception.
I suspect you really wanted to display either the hexadecimal representation of the digest, or maybe the base-64 version. In either case, what you have done will never work.
Since you haven't explained what you want to accomplish, this is the best anyone can do.
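For reference, here is a minimal sketch of those two options using only standard JDK classes (the class and variable names are made up for the example):
import java.security.MessageDigest;
import java.util.Base64;

public class DigestPrinter {
    public static void main(String[] args) throws Exception {
        // digest some sample input (the password from the question)
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest("myP4ssW0rd42".getBytes("UTF-8"));

        // option 1: Base64, a compact reversible text form of binary data
        System.out.println(Base64.getEncoder().encodeToString(digest));

        // option 2: hexadecimal, two characters per byte
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(hex);
    }
}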

Related

How to detect encoding mismatch

I have a bunch of old AES-encrypted Strings encrypted roughly like this:
String is converted to bytes with ISO-8859-1 encoding
Bytes are encrypted with AES
Result is converted to BASE64 encoded char array
Now I would like to change the encoding to UTF-8 for new values (e.g. '€' does not work with ISO-8859-1). This will of course cause problems if I try to decrypt the old ISO-8859-1-encoded values with UTF-8 encoding:
org.junit.ComparisonFailure: expected:<!#[¤%&/()=?^*ÄÖÖÅ_:;>½§#${[]}<|'äöå-.,+´¨]'-Lorem ipsum dolor ...> but was:<!#[�%&/()=?^*����_:;>��#${[]}<|'���-.,+��]'-Lorem ipsum dolor ...>
I'm thinking of creating some automatic encoding fallback for this.
So the main question would be: is it enough to inspect the decrypted char array for '�' characters to detect an encoding mismatch? And what is the 'correct' way to refer to the '�' symbol when comparing?
if (new String(utf8decryptedCharArray).contains("�")) {
    // Revert to doing the decrypting with ISO-8859-1
    decryptAsISO...
}
When decrypting, you get back the original byte sequence (result of your step 1), and then you can only guess whether these bytes denote characters according to the ISO-8859-1 or the UTF-8 encoding.
From a byte sequence, there's no way to clearly tell how it is to be interpreted.
A few ideas:
You could migrate all the old encrypted strings (decrypt, decode to string using ISO-8859-1, encode to byte array using UTF-8, encrypt). Then the problem is solved once and for all.
You could try to decode the byte array in both versions, see if one version is illegal, or if both versions are equal, and if it is still ambiguous, take the one with the higher probability according to expected characters. I wouldn't recommend going that way, as it needs a lot of work and there's still some probability of error.
For the new entries, you could prepend the string / byte sequence with some marker that doesn't appear in ISO-8859-1 text. E.g. some people follow the convention of prepending a Byte Order Mark at the beginning of UTF-8 encoded files. Although the resulting bytes (EF BB BF) aren't strictly illegal in ISO-8859-1 (being read as ï»¿), they are highly unlikely. Then, when your decrypted bytes start with EF BB BF, decode to string using UTF-8, otherwise using ISO-8859-1 (a sketch follows below). Still, there's a non-zero probability of error.
If ever possible, I'd go for migrating the existing entries. Otherwise, you'll have to carry on with "old-format compatibility stuff" in your code base forever, and still can't absolutely guarantee correct behaviour.
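As a sketch of the marker idea above (assuming you control both the writing and the reading of the encrypted payload; the class name is illustrative):
import java.nio.charset.StandardCharsets;

public class BomMarkedText {

    private static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    // For new entries: prepend the UTF-8 BOM bytes before encrypting.
    static byte[] encodeWithMarker(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        return out;
    }

    // After decrypting: bytes starting with the BOM are treated as UTF-8,
    // anything else falls back to ISO-8859-1 (the old format).
    static String decodeWithMarker(byte[] bytes) {
        if (bytes.length >= 3 && bytes[0] == UTF8_BOM[0]
                && bytes[1] == UTF8_BOM[1] && bytes[2] == UTF8_BOM[2]) {
            return new String(bytes, 3, bytes.length - 3, StandardCharsets.UTF_8);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}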
When decoding bytes to text, don't rely on the � character to detect malformed input. Use a strict decoder. Here is a helper method for that:
static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
    return charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes))
            .toString();
}
Here is the corresponding strict encoder helper method, in case you need it:
static byte[] encodeStrict(String str, Charset charset) throws CharacterCodingException {
    ByteBuffer buf = charset.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .encode(CharBuffer.wrap(str));
    byte[] bytes = buf.array();
    if (bytes.length == buf.limit())
        return bytes;
    return Arrays.copyOfRange(bytes, 0, buf.limit());
}
Since ISO-8859-1 maps every possible byte to a character, you can't use it to detect malformed input. UTF-8, however, is validating, so it is very likely to detect malformed input. It is not 100% guaranteed, but it's the best we can do.
So, try decoding using strict UTF-8, and then fall back to ISO-8859-1 if it fails:
static String decode(byte[] bytes) {
    try {
        return decodeStrict(bytes, StandardCharsets.UTF_8);
    } catch (CharacterCodingException e) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
Test
System.out.println(decode("señor".getBytes(StandardCharsets.ISO_8859_1))); // prints: señor
System.out.println(decode("señor".getBytes(StandardCharsets.UTF_8))); // prints: señor
System.out.println(decode("€100".getBytes(StandardCharsets.UTF_8))); // prints: €100

How to retrieve actual secret key from KeyStore in java?

I'm not well acquainted with the Java KeyStore. What I want is an encrypted structure in which to store my keys.
I have multiple clusters, and there is a key associated with every cluster. Now I want to store those keys securely, such that they are all encrypted using a single main key (for instance, 'loginid').
I searched a lot for this issue, and somewhere on Stack Overflow itself someone suggested the Java KeyStore for storing a SecretKey (symmetric encryption). I read its documentation and found it perfect for my requirements, but I couldn't understand its implementation properly.
Here is the code snippet I'm working on:
public class Prac {
    public static void main(String[] args) throws KeyStoreException, FileNotFoundException, IOException, NoSuchAlgorithmException,
            CertificateException, UnrecoverableKeyException, UnrecoverableEntryException {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        char[] ksPwd = "yashkaranje98".toCharArray();
        ks.load(null, ksPwd);
        KeyStore.ProtectionParameter protParam = new KeyStore.PasswordProtection(ksPwd);
        javax.crypto.SecretKey mySecretKey = new SecretKeySpec("_anky!#ubn#$0e41".getBytes(), "AES");
        KeyStore.SecretKeyEntry skEntry = new KeyStore.SecretKeyEntry(mySecretKey);
        ks.setEntry("cluster1", skEntry, protParam);
        java.io.FileOutputStream fos = null;
        try {
            fos = new java.io.FileOutputStream("keystore.ks");
            ks.store(fos, ksPwd);
        } finally {
            if (fos != null) {
                fos.close();
            }
        }
        java.io.FileInputStream fis = null;
        try {
            fis = new java.io.FileInputStream("keystore.ks");
            ks.load(fis, ksPwd);
        } finally {
            if (fis != null) {
                fis.close();
            }
        }
        SecretKey key = (SecretKey) ks.getKey("cluster1", ksPwd);
        String encodedKey = Base64.getEncoder().encodeToString(key.getEncoded());
        System.out.println(encodedKey);
    }
}
Alias: "cluster1"
Key to store: _anky!#ubn#$0e41
Protection Parameter: yashkaranje98
It prints: X2Fua3khQHVibiMkMGU0MQ==
What I expect is the key itself: _anky!#ubn#$0e41
Kindly let me know what I'm missing... but first, please tell me: is what I'm expecting even legit? Does it make sense?
(I am still learning about this KeyStore concept so there might be some silly mistakes.)
A secret AES key consists of random bytes. Such a key should not be printed directly, because the bytes may not represent valid characters, or they may represent control characters that don't print on screen. If you copy them, you might miss data; if you print them in the wrong terminal, you may send terminal control codes.
Because of this you need to print key values as hexadecimals or base 64. For symmetric keys, hex is normally preferred, as it is easy to see the contents and size from the hex (the size in bytes is half the hex size; the size in bits is 4 times the hex size, as each hex digit represents a 4-bit nibble). However, as Java long lacked a built-in hex encoder (java.util.HexFormat only arrived in Java 17), base 64 is also a good option.
Of course, in that case, to compare, you should also decode the value from base 64 before you insert it into the key store.
Also beware that you don't specify the character encoding when you call getBytes on the string. If you used higher-valued characters, you could get different results on different systems, as getBytes without an argument assumes the platform encoding. Specifying StandardCharsets.UTF_8 usually makes more sense.
Of course, since keys should consist of random bytes, the getBytes approach needs to go entirely, but keep this in mind anyway.
Looking at the code, it seems you've missed the last 10 years of Java progress: no var, no null avoidance, missing imports, and no try-with-resources. That's a shame, because those would make your code a lot more readable. It's valid, mind you, but still.
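To illustrate those points, here is a hypothetical modernized sketch (not the original code): it generates a proper random AES key with KeyGenerator, uses try-with-resources and var, and prints the key as hex. It assumes a PKCS12 keystore, which supports secret-key entries in modern JDKs:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class KeyStoreSketch {
    public static void main(String[] args) throws Exception {
        char[] ksPwd = "changeit".toCharArray(); // placeholder password
        var ks = KeyStore.getInstance("PKCS12");
        ks.load(null, ksPwd);

        // proper AES keys come from a KeyGenerator, not from String.getBytes()
        var keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        ks.setEntry("cluster1", new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(ksPwd));

        try (var fos = new FileOutputStream("keystore.p12")) {
            ks.store(fos, ksPwd);
        }
        try (var fis = new FileInputStream("keystore.p12")) {
            ks.load(fis, ksPwd);
        }

        SecretKey reloaded = (SecretKey) ks.getKey("cluster1", ksPwd);

        // print the raw key bytes as hex, two digits per byte
        var hex = new StringBuilder();
        for (byte b : reloaded.getEncoded()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(hex);
    }
}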

Charset bug in AES Decryption on Android system

Good evening!
In my Android app, the phones load an AES-encrypted string from my server and store it in a variable. After that, the variable and a key are passed to a method which decrypts the string. My problem is that the German umlauts (ä, ü, ö) aren't decoded correctly; all umlauts are displayed as question marks with a black background...
My Code:
public static String decrypt(String input, String key) {
    byte[] output = null;
    String newString = "";
    try {
        SecretKeySpec skey = new SecretKeySpec(key.getBytes(), "AES");
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, skey);
        output = cipher.doFinal(Base64.decode(input, Base64.DEFAULT));
        newString = new String(output);
    } catch (Exception e) {}
    return newString;
}
The code works perfectly; only the umlauts are displayed incorrectly (output that should read "ö-ä-ü" shows the question-mark boxes instead).
How can I set the encoding of the decrypted string? In my iOS app I use ASCII encoding for the decoded downloaded string, and that works perfectly! Android and iOS get the string from the same server in the same way, so I think the problem is in the local code above.
I hope you can help me with my problem... Thanks!
There is no text but encoded text.
It seems like you are guessing at the character set and encoding. That's no way to communicate.
To recover the text, you need to reverse the original process applied to it with the parameters associated with each step.
For explanation, assume that the server is taking text from a Java String and sending it to you securely.
1. String uses the Unicode character set (specifically, Unicode's UTF-16 encoding).
2. Get the bytes for the String, using some specific encoding, say ISO8859-1. (UTF-8 could be better, because it is also an encoding for the Unicode character set, whereas ISO8859-1 has a lot fewer characters.) As @Andy points out, exceptions are your friends here.
3. Encrypt the bytes with a specific key. The key is a sequence of bytes, so if you are generating this from a string, you have to use a specific encoding.
4. Encode the encrypted bytes with Base64, producing a Java String (again, UTF-16) with a subset of characters so reduced that it can be re-encoded in just about any character encoding and placed in just about any context, such as SMTP, XML, or HTML, without being misinterpreted or made invalid.
5. Transmit the string using a specific encoding. An HTTP header and/or HTML charset value is usually used to communicate which encoding.
To receive the text, you have to get:
the bytes,
the encoding from step 5,
the key from step 3,
the encoding from step 3 and
the encoding from step 2.
Then you can reverse all of the steps. Per your comments, you discovered you weren't using the encoding from step 2. You also need to use the encoding from step 3.
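As a sketch of the reversal, here is the question's decrypt method with explicit charsets (assuming the server used UTF-8 in steps 2 and 3; adjust to whatever your server actually uses), letting exceptions propagate instead of swallowing them:
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import android.util.Base64;

public static String decrypt(String input, String key) throws Exception {
    // step 3 reversed: derive the key bytes with an explicit encoding
    SecretKeySpec skey = new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "AES");
    Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
    cipher.init(Cipher.DECRYPT_MODE, skey);
    // step 4 reversed: Base64 back to the encrypted bytes, then decrypt
    byte[] plainBytes = cipher.doFinal(Base64.decode(input, Base64.DEFAULT));
    // step 2 reversed: decode the bytes with the SAME charset the server used
    return new String(plainBytes, StandardCharsets.UTF_8);
}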

Converting string to byte[] returns wrong value (encoding?)

I read a byte[] from a file and convert it to a String:
byte[] bytesFromFile = Files.readAllBytes(...);
String stringFromFile = new String(bytesFromFile, "UTF-8");
I want to compare this to another byte[] I get from a web service:
String stringFromWebService = webService.getMyByteString();
byte[] bytesFromWebService = stringFromWebService.getBytes("UTF-8");
So I read a byte[] from a file and convert it to a String and I get a String from my web service and convert it to a byte[]. Then I do the following tests:
// works!
org.junit.Assert.assertEquals(stringFromFile, stringFromWebService);
// fails!
org.junit.Assert.assertArrayEquals(bytesFromFile, bytesFromWebService);
Why does the second assertion fail?
Other answers have covered the likely cause: the file is not UTF-8 encoded, giving rise to the symptoms described.
However, I think the most interesting aspect of this is not that the byte[] assert fails, but that the assert that the string values are the same passes. I'm not 100% sure why this is, but I think the following trawl through the source code might give us the answer:
Looking at how new String(bytesFromFile, "UTF-8"); works - we see that the constructor calls through to StringCoding.decode()
This in turn, if supplied with the UTF-8 character set, calls through to StringDecoder.decode().
This calls through to CharsetDecoder.decode(), which decides what to do if a character is unmappable (which I guess will be the case if a non-UTF-8 byte sequence is presented).
In this case it uses an action defined by
private CodingErrorAction unmappableCharacterAction
        = CodingErrorAction.REPORT;
This suggests that it still reports the character it has decoded, even though it's technically unmappable.
I think this means that even when the code gets an unmappable character, it substitutes its best guess (the replacement character), so the String representations come out the same under comparison, but the byte[]s are no longer the same.
This hypothesis is kind of supported by the fact that the catch block for CharacterCodingException in StringCoding.decode() says:
} catch (CharacterCodingException x) {
    // Substitution is always enabled,
    // so this shouldn't happen
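To see that substitution concretely, here is a small self-contained sketch (the byte values are chosen to be invalid UTF-8): new String(..., UTF_8) silently replaces malformed input with U+FFFD, so the round trip back to bytes yields different bytes while the decoded strings can still compare equal:
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {
    public static void main(String[] args) {
        // 0xE9 is 'é' in ISO-8859-1 but an invalid lone byte in UTF-8
        byte[] original = { 'c', 'a', 'f', (byte) 0xE9 };

        // the decoder substitutes U+FFFD instead of throwing
        String decoded = new String(original, StandardCharsets.UTF_8);
        System.out.println(decoded); // caf�

        // re-encoding yields the 3-byte UTF-8 form of U+FFFD,
        // so the round-tripped array differs from the original
        byte[] roundTripped = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, roundTripped)); // false
    }
}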
I don't understand it fully, but here's what I've got so far:
The problem is that the data contains some bytes which are not valid UTF-8, as I know from the following check:
// returns false for my data!
public static boolean isValidUTF8(byte[] input) {
    CharsetDecoder cs = Charset.forName("UTF-8").newDecoder();
    try {
        cs.decode(ByteBuffer.wrap(input));
        return true;
    } catch (CharacterCodingException e) {
        return false;
    }
}
When I change the encoding to ISO-8859-1, everything works fine. The strange thing (which I don't understand yet) is why my conversion (new String(bytesFromFile, "UTF-8");) doesn't throw any exception (as my isValidUTF8 method does), although the data is not valid UTF-8.
However, I think I will go another way and encode my byte[] as a Base64 string, as I don't want more trouble with encodings.
The real problem in your code is that you don't know what the real file encoding is.
When you read the string from the web service, you get a sequence of chars; when you convert the string from chars to bytes, the conversion is done right because you specify how to transform chars into bytes with a specific encoding ("UTF-8"). When you read a text file, you face a different problem: you have a sequence of bytes that needs to be converted to chars. To do it properly you must know how the chars were converted to bytes, i.e. what the file encoding is. For files (unless specified) it's a platform constant; on Windows, files are encoded in windows-1252 (which is very close to ISO-8859-1); on Linux/Unix it depends, but I think UTF-8 is the default.
By the way, the web service call does a second operation under the hood: the HTTP call uses a header that defines how chars are encoded, i.e. how to read the bytes from the socket and transform them into chars. So calling a SOAP web service gives you back XML (which can be unmarshalled into a Java object) with all the encoding operations done properly.
So if you must read chars from a file, you must face the encoding issue; you can use Base64 as you stated, but you lose one of the main benefits of text files: they are human-readable, which eases debugging and development.
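If the file's encoding is known or agreed upon, a sketch of reading it with an explicit charset instead of the platform default (the file name is a placeholder):
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithCharset {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("data.txt"); // placeholder path

        // Files.readString (Java 11+) uses a strict decoder: on invalid
        // UTF-8 it throws MalformedInputException instead of silently
        // substituting U+FFFD like new String(bytes, charset) does
        String text = Files.readString(file, StandardCharsets.UTF_8);
        System.out.println(text);
    }
}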

MessageDigest hashes differently on different machines

I'm having a problem with MessageDigest returning different hash values on different computers.
One computer is running 32-bit Java on Windows Vista and the other is running 64-bit Java on Mac OS. I'm not sure if it is because MessageDigest is machine-dependent, whether I need to explicitly specify a character encoding somewhere, or whether it is something else. Here's the code:
public static boolean authenticate(String salt, String encryptedPassword,
        char[] plainTextPassword) throws NoSuchAlgorithmException {
    // do I need to explicitly specify character encoding here? -->
    String saltPlusPlainTextPassword = salt + new String(plainTextPassword);
    MessageDigest sha = MessageDigest.getInstance("SHA-512");
    // is this machine dependent? -->
    sha.update(saltPlusPlainTextPassword.getBytes());
    byte[] hashedByteArray = sha.digest();
    // or... perhaps there's a translation problem here? -->
    String hashed = new String(hashedByteArray);
    return hashed.equals(encryptedPassword);
}
Should this code execute differently on these two machines?
If it is machine-dependent the way I've written it, is there another way to hash these passwords that is more portable? Thanks!
Edit:
This is the code I'm using to generate the salts:
public static String getSalt() {
    int size = 16;
    byte[] bytes = new byte[size];
    new Random().nextBytes(bytes);
    return org.apache.commons.codec.binary.Base64.encodeBase64URLSafeString(bytes);
}
Solution:
Thanks to the accepted solution, I was able to fix my code:
public static boolean authenticate_(String salt, String encryptedPassword,
        char[] plainTextPassword) throws NoSuchAlgorithmException, UnsupportedEncodingException {
    // This was ok
    String saltPlusPlainTextPassword = salt + new String(plainTextPassword);
    MessageDigest sha = MessageDigest.getInstance("SHA-512");
    // must specify "UTF-8" encoding
    sha.update(saltPlusPlainTextPassword.getBytes("UTF-8"));
    byte[] hashedByteArray = sha.digest();
    // Use Base64 encoding here -->
    String hashed = org.apache.commons.codec.binary.Base64.encodeBase64URLSafeString(hashedByteArray);
    return hashed.equals(encryptedPassword);
}
Encodings are causing you problems. First here:
saltPlusPlainTextPassword.getBytes()
That will use the default encoding for the machine. Bad idea. Specify "UTF-8" as a simple solution. (It's guaranteed to be present.)
Next this causes issues:
String hashed = new String(hashedByteArray);
hashedByteArray is arbitrary binary data. To safely convert it to text, either use a base64 encoding or just hex. Again, you're currently using the default encoding, which will vary from machine to machine. There are loads of third-party libraries for base64 encoding in Java (and java.util.Base64 has been built in since Java 8).
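For completeness, here is a sketch of both fixes using only built-in JDK classes (the stored encryptedPassword must of course use the same Base64 format):
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public static boolean authenticate(String salt, String encryptedPassword,
        char[] plainTextPassword) throws NoSuchAlgorithmException {
    String saltPlusPlainTextPassword = salt + new String(plainTextPassword);
    MessageDigest sha = MessageDigest.getInstance("SHA-512");
    // fix 1: an explicit, platform-independent encoding
    sha.update(saltPlusPlainTextPassword.getBytes(StandardCharsets.UTF_8));
    // fix 2: encode the binary digest as Base64 text instead of "decoding" it
    String hashed = Base64.getUrlEncoder().withoutPadding()
            .encodeToString(sha.digest());
    return hashed.equals(encryptedPassword);
}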
Jon Skeet's answer above likely identifies the cause, and his recommendations should definitely be taken into account, but another possible cause is a misunderstanding of salt.
Salt is a semi-secret random value that is applied to a string prior to hashing. This makes a brute-force attack on the original string harder, because the salt is presumably unknown to the attacker.
Salt values generally differ from installation to installation. It's possible that the actual cause is simply that you have the salt values set differently on the different machines.
