NSData bytes not matching bytes from Java WS with same base64 string - java

I am using protocol buffers in an iOS application. The app consumes a web service written in Java, which spits back a base64 encoded string.
The base64 string is the same on both ends.
In the app, however, whenever I convert the string to NSData, the number of bytes may or may not be the same on both ends. The result can be an invalid protocol buffer exception: invalid end tag.
For example:
Source (bytes) | NSData | Diff
93             | 93     | 0
6739           | 6735   | -4
5745           | 5739   | -6
The bytes are equal in the trivial case of an empty protocol buffer.
Here is the Java source:
import org.apache.commons.codec.binary.Base64;
....
public static String bytesToBase64(byte[] bytes) {
    return Base64.encodeBase64String(bytes);
}
On the iOS side, I have tried various algorithms from similar questions which all agree in byte size and content.
What could be causing this?

On closer inspection, the issue was my assumption that Base64 is Base64. I was using the URL-safe variant in the web service while the app's decoder was expecting the standard version.
I noticed underscores in the Base64 output, which I thought odd.
The value/character table on the Wikipedia Base64 page (http://en.wikipedia.org/wiki/Base64) shows no underscores, but the article later covers variants, which do use underscores.
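For anyone hitting the same mismatch, here is a minimal sketch of the difference, using the same Apache Commons Codec class as the web service above; the sample bytes are arbitrary and only for illustration:

import org.apache.commons.codec.binary.Base64;

public class Base64VariantDemo {
    public static void main(String[] args) {
        byte[] data = {(byte) 0xFB, (byte) 0xEF, (byte) 0xFF}; // arbitrary sample bytes

        // Standard alphabet: may contain '+' and '/' (and '=' padding).
        System.out.println(Base64.encodeBase64String(data));        // prints ++//

        // URL-safe alphabet: '-' and '_' instead, and no padding.
        System.out.println(Base64.encodeBase64URLSafeString(data)); // prints --__
    }
}

If the service keeps emitting the URL-safe form, the iOS side has to decode with a matching URL-safe decoder (or translate - and _ back to + and / before decoding).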

Related

Using Amazon AWS Cognito `.well-known/jwks.json` data fails to base64 decode some fields

When using Amazon AWS Cognito Federated Identities, and parsing the data at:
https://cognito-identity.amazonaws.com/.well-known/jwks_uri which looks like:
{"keys":[
{"kty":"RSA",
"alg":"RS512",
"use":"sig",
"kid":"ap-northeast-11",
"n":"AI7mc1assO5n6yB4b7jPCFgVLYPSnwt4qp2BhJVAmlXRntRZ5w4910oKNZDOr4fe/BWOI2Z7upUTE/ICXdqirEkjiPbBN/duVy5YcHsQ5+GrxQ/UbytNVN/NsFhdG8W31lsE4dnrGds5cSshLaohyU/aChgaIMbmtU0NSWQ+jwrW8q1PTvnThVQbpte59a0dAwLeOCfrx6kVvs0Y7fX7NXBbFxe8yL+JR3SMJvxBFuYC+/om5EIRIlRexjWpNu7gJnaFFwbxCBNwFHahcg5gdtSkCHJy8Gj78rsgrkEbgoHk29pk8jUzo/O/GuSDGw8qXb6w0R1+UsXPYACOXM8C8+E=",
"e":"AQAB"},
... }
This works fine decoding the n field using this code (Kotlin calling JDK 8 Base64 class):
Base64.getDecoder().decode(encodedN.toByteArray())
But when using Cognito User Pools which has data at a URL in the form of: https://cognito-idp.${REGION}.amazonaws.com/${POOLID}/.well-known/jwks.json
It has the same type of data, but it will not decode. Instead I end up with errors such as:
Illegal base64 character 5f
Since that is an underscore _, which is in the Base64 URL alphabet, I tried changing my decoding to:
Base64.getUrlDecoder().decode(encodedN.toByteArray())
But then the first set of data no longer decodes correctly because it contains / and other invalid characters for Base64 URL encoding.
Is there a method that can handle both of these jwks sets of data with the same decoder?!?
Note: this question is intentionally written and answered by the author (a self-answered question), so that solutions to interesting problems are shared on SO.
The issue is that the Amazon AWS Cognito team is using two different Base64 encoding alphabets for basically the same thing. So you will need to detect which is being used.
If the encoded string ends with = or contains + or /, then it was definitely encoded with the standard alphabet, so use Base64.getDecoder(). If it contains - or _, then it was definitely encoded with the URL-safe alphabet, so use Base64.getUrlDecoder(). Otherwise nothing distinctive is present, and it is best to use Base64.getUrlDecoder(), because you do not know whether the length would need padding or not.
This translates to (in Kotlin, but logically is applicable to any language):
fun base64SafeDecoder(encoded: String): ByteArray {
    val decoder = if (encoded.endsWith('=') || encoded.any { it == '+' || it == '/' }) {
        Base64.getDecoder()
    } else {
        Base64.getUrlDecoder()
    }
    return decoder.decode(encoded.toByteArray())
}
This would be a problem in any language that has Base64 decoding: implementations might be loose and silently ignore the invalid character (some do), or they might be strict and throw an exception. Some Base64 test websites exhibit both of these behaviors as well, and silently ignoring invalid characters is dangerous, because you would only hit an error later, when using the results of the decoding.
You can try the Apache Commons variant of Base64 decoding (org.apache.commons.codec.binary.Base64).
The decodeBase64(String base64String) method handles both the standard and URL-safe Base64 alphabets seamlessly, and the isBase64 method lets you check whether a string is encoded in either form.
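A rough sketch of that approach (assuming commons-codec is on the classpath; isBase64(String) appeared around version 1.5):

import org.apache.commons.codec.binary.Base64;

public class LenientJwkDecoding {
    // Decodes a JWK field whether it uses the standard or the URL-safe alphabet;
    // commons-codec's decode table accepts both.
    public static byte[] decodeField(String encoded) {
        if (!Base64.isBase64(encoded)) {
            throw new IllegalArgumentException("Not Base64 in either alphabet: " + encoded);
        }
        return Base64.decodeBase64(encoded);
    }
}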

Charset bug in AES Decryption on Android system

Good evening!
In my Android app, the phones load an AES-encrypted string from my server and store it in a variable. The variable and a key are then passed to a method which decrypts the string. My problem is that German umlauts (ä, ü, ö) are not decoded correctly. All umlauts are displayed as question marks on a black background...
My Code:
public static String decrypt(String input, String key) {
    byte[] output = null;
    String newString = "";
    try {
        SecretKeySpec skey = new SecretKeySpec(key.getBytes(), "AES");
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, skey);
        output = cipher.doFinal(Base64.decode(input, Base64.DEFAULT));
        newString = new String(output);
    } catch (Exception e) {}
    return newString;
}
The code works, except that the umlauts are not displayed correctly; for example, what should be "ö-ä-ü" comes out garbled.
How can I set the encoding of the decrypted string? In my iOS app I use ASCII to encode the decoded, downloaded string, and that works perfectly. Android and iOS get the string from the same server in the same way, so I think the problem is in the code above.
I hope you can help me with my problem... Thanks!
There is no text but encoded text.
It seems like you are guessing at the character set and encoding; that's no way to communicate.
To recover the text, you need to reverse the original process applied to it with the parameters associated with each step.
For explanation, assume that the server is taking text from a Java String and sending it to you securely.
1. String uses the Unicode character set (specifically, Unicode's UTF-16 encoding).
2. Get the bytes for the String, using some specific encoding, say ISO8859-1. (UTF-8 could be better because it is also an encoding for the Unicode character set, whereas ISO8859-1 has a lot fewer characters.) As #Andy points out, exceptions are your friends here.
3. Encrypt the bytes with a specific key. The key is a sequence of bytes, so, if you are generating it from a string, you have to use a specific encoding.
4. Encode the encrypted bytes with Base64, producing a Java String (again, UTF-16) with a subset of characters so reduced that it can be re-encoded in just about any character encoding and placed in just about any context, such as SMTP, XML, or HTML, without being misinterpreted or making it invalid.
5. Transmit the string using a specific encoding. An HTTP header and/or HTML charset value is usually used to communicate which encoding.
To receive the text, you have to get:
- the bytes,
- the encoding from step 5,
- the key from step 3,
- the encoding from step 3, and
- the encoding from step 2.
Then you can reverse all of the steps. Per your comments, you discovered you weren't using the encoding from step 2. You also need to use the encoding from step 3.
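For illustration, a minimal sketch of the receiving side with every charset spelled out. The "AES/ECB/PKCS5Padding" transformation and the key handling are carried over from the question's code as assumptions (the key string's UTF-8 bytes are assumed to have a valid AES key length), and java.util.Base64 stands in for android.util.Base64; this is not a recommendation of that cipher mode:

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class ExplicitCharsetDecrypt {
    public static String decrypt(String base64Ciphertext, String key) throws Exception {
        // Step 3 in reverse: derive the key bytes with a specific encoding.
        SecretKeySpec skey = new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "AES");
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); // as in the question
        cipher.init(Cipher.DECRYPT_MODE, skey);
        // Step 4 in reverse: Base64 text back to the encrypted bytes.
        byte[] plainBytes = cipher.doFinal(Base64.getDecoder().decode(base64Ciphertext));
        // Step 2 in reverse: interpret the plaintext bytes with the agreed charset,
        // assumed here to be UTF-8; this is what keeps ä, ü and ö intact.
        return new String(plainBytes, StandardCharsets.UTF_8);
    }
}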

Base64 encode gives different result on linux CentOS terminal and in Java

I am trying to generate a random password on Linux CentOS and store it in a database as Base64. The password is 'KQ3h3dEN', and when I convert it with 'echo KQ3h3dEN | base64' I get 'S1EzaDNkRU4K'.
I have function in java:
public static String encode64Base(String stringToEncode) throws UnsupportedEncodingException
{
    byte[] encodedBytes = Base64.getEncoder().encode(stringToEncode.getBytes());
    String encodedString = new String(encodedBytes, "UTF-8");
    return encodedString;
}
And result of encode64Base("KQ3h3dEN") is 'S1EzaDNkRU4='.
So, Linux is adding "K" where Java has "=" in this example. How can I ensure that I always get the same result when using base64 on Linux and Base64 encoding in Java?
UPDATE: I have updated the question, as I had not noticed the "K" at the end of the Linux-encoded string. Here are a few more examples:
'echo KQ3h3dENa | base64' => result='S1EzaDNkRU5hCg==', but it should be 'S1EzaDNkRU5h'
'echo KQ3h3dENaa | base64' => result='S1EzaDNkRU5hYQo=', but it should be 'S1EzaDNkRU5hYQ=='
I found the solution after a few hours of experimenting. It turns out a newline was being appended to the string I wanted to encode. The solution is:
echo -n KQ3h3dEN | base64
The result will then be the same as with the Java Base64 encoder.
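For comparison, a small Java snippet that reproduces both results; the only difference is the trailing newline that plain echo appends:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TrailingNewlineDemo {
    public static void main(String[] args) {
        String password = "KQ3h3dEN";
        Base64.Encoder encoder = Base64.getEncoder();
        // What the Java code and 'echo -n KQ3h3dEN | base64' both produce:
        System.out.println(encoder.encodeToString(password.getBytes(StandardCharsets.UTF_8)));          // S1EzaDNkRU4=
        // What plain 'echo KQ3h3dEN | base64' produces, because echo appends '\n':
        System.out.println(encoder.encodeToString((password + "\n").getBytes(StandardCharsets.UTF_8))); // S1EzaDNkRU4K
    }
}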
Padding
The '==' sequence indicates that the last group contained only one byte, and '=' indicates that it contained two bytes.
In theory, the padding character is not needed for decoding, since the number of missing bytes can be calculated from the number of Base64 digits. In some implementations, the padding character is mandatory, while for others it is not used.
So it depends on the tools and libraries you use. If Base64 with padding is treated the same as Base64 without padding by them, there is no problem. As insurance, you can use a tool on Linux that generates Base64 with padding.
Use withoutPadding() of the Base64.Encoder class to get a Base64.Encoder instance that encodes without adding any padding character at the end.
Check the link:
https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html#withoutPadding
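A minimal sketch of what that looks like with the JDK encoder (the sample value is the same 'KQ3h3dEN' password as above):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class WithoutPaddingDemo {
    public static void main(String[] args) {
        byte[] bytes = "KQ3h3dEN".getBytes(StandardCharsets.UTF_8);
        System.out.println(Base64.getEncoder().encodeToString(bytes));                  // S1EzaDNkRU4=
        System.out.println(Base64.getEncoder().withoutPadding().encodeToString(bytes)); // S1EzaDNkRU4
    }
}

Note that this only removes the padding on the Java side; the simpler fix for the original mismatch is still echo -n.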

Why do I have to encode a utf-8 parameter String to iso-Latin and then decode as utf-8 to get Java utf-8 String?

I have a Java servlet that takes a parameter String (inputString) that may contain Greek letters from a web page marked up as utf-8. Before I send it to a database I have to convert it to a new String (utf8String) as follows:
String utf8String = new String(inputString.getBytes("8859_1"), "UTF-8");
This works, but, as I hope will be appreciated, I hate doing something I don't understand, even if it works.
From the method description in the Java doc the getBytes() method "Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array" i.e. I am encoding it in 8859_1 — isoLatin. And from the Constructor description "Constructs a new String by decoding the specified array of bytes using the specified charset" i.e. decodes the byte array to utf-8.
Can someone explain to me why this is necessary?
My question is based on a misconception regarding the character set used for the HTTP request. I had assumed that because I marked up the web page from which the request was sent as UTF-8 the request would be sent as UTF-8, and so the Greek characters in the parameter sent to the servlet would be read as a UTF-8 String (‘inputString’ in my line of code) by the HttpRequest.getParameter() method. This is not the case.
HTTP requests are sent as ISO-8859-1 (POST) or ASCII (GET), which are generally the same. This is part of the URI Syntax specification — thanks to Andreas for pointing me to http://wiki.apache.org/tomcat/FAQ/CharacterEncoding where this is explained.
I had also forgotten that the encoding of Greek letters such as α for the request is URL-encoding, which produces %CE%B1. The getParameter() handles this by decoding it as two ISO-8859-1 characters, %CE and %B1 — Î and ± (I checked this).
I now understand why this needs to be turned into a byte array and the bytes interpreted as UTF-8. 0xCE does not represent a one-byte character in UTF-8, and hence it is combined with the next byte, 0xB1, to be interpreted as α. (Î is 0xC3 0x8E and ± is 0xC2 0xB1 in UTF-8.)
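A small sketch that reproduces this round trip for α (the URLDecoder call stands in for what getParameter() effectively did; the literal "%CE%B1" is just the example value from above):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class GreekParamDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String urlEncoded = "%CE%B1"; // how α arrives in the request
        // The container decodes the two percent-escaped bytes as ISO-8859-1.
        String misread = URLDecoder.decode(urlEncoded, "ISO-8859-1"); // "Î±"
        // The line from the question: recover the original bytes and
        // reinterpret them as UTF-8.
        String utf8String = new String(misread.getBytes("8859_1"), "UTF-8"); // "α"
        System.out.println(misread + " -> " + utf8String);
    }
}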
When decoding, could you not create a class with a decoder method that takes a byte[] as a parameter and returns it as a String? Here is an example that I have used before.
public class Decoder
{
    public String decode(byte[] bytes)
    {
        // Turns the byte array into a string
        String decodedString = new String(bytes);
        return decodedString;
    }
}
Try using this instead of .getBytes(). Hope this works.

Thales HSM - Windows cp1252 success / Linux UTF-8 fail

I am working on a TCP/IP application integrated with an HSM module.
My Java code was working fine on Windows 32-bit/JRE 32-bit/IBM WebSphere 7. Since upgrading to RedHat Linux 64-bit/JRE 64-bit/IBM WebSphere 8, sending a string of length below 127 works fine, but with more than 127 it fails. I have also tried some encoding techniques, but I face the same problem; please guide me.
If the command length is less than 127, it works fine, but when it is greater than 127 the UTF-8 encoding fails.
So for lengths greater than 127 I am using extended ASCII, but that does not work with UTF-8 (it works fine with windows-1252).
// hsmMessage.insert(0, (char) commandLength);
char[] extended_ascii = new char[1];
byte[] cp437bytes = new byte[1];
cp437bytes[0] = (byte) commandLength;
extended_ascii = new String(cp437bytes).toCharArray(); // extended_ascii = new String(cp437bytes, "CP437").toCharArray();
hsmMessage.insert(0, extended_ascii);
Thanks
Never use String objects to hold arbitrary binary data; use byte arrays or wrappers thereof.
The reason is that when converting from a byte array to a String, a character encoding (the platform default, if you do not specify one) is used to map the bytes to characters, and in many circumstances the String ends up not holding the exact bytes you think it should, especially for byte values >= 128.
I had a very similar problem many years ago in the RADIUS server I had written. It would work fine for the vast majority of passwords, but if a user password had a £ symbol in it the differences between US-ASCII and UK-ASCII caused the underlying byte value to get mangled, resulting in mis-calculated encrypted passwords, and failed logins.
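To illustrate that advice for the HSM case, here is a minimal sketch that keeps the length prefix and payload as bytes until they are written out; the single-byte length and ASCII command text are assumptions based on the question, not the actual Thales wire format:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class HsmFrameSketch {
    // Builds a length-prefixed frame as raw bytes. Because no String ever
    // holds the length byte, values of 128..255 cannot be mangled by a
    // charset conversion.
    public static byte[] frame(String command) throws IOException {
        byte[] payload = command.getBytes(StandardCharsets.US_ASCII); // assumed ASCII command text
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload.length & 0xFF); // single length byte, 0..255
        out.write(payload);
        return out.toByteArray();
    }
}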
