Illegal base64 character "a" using java.util.Base64 from within Scala - java

Suppose I have the following Base64 encoded String from a github API call to a file:
LyoKICogQ29weXJpZ2h0IDIwMTkgY29tLmdpdGh1Yi50aGVvcnlkdWRlcwog
KgogKiBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNp
b24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKICogeW91IG1heSBub3QgdXNlIHRo
aXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNl
LgogKiBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQK
ICoKICogICAgIGh0dHA6Ly93d3cuYXBhY2hlLm9yZy9saWNlbnNlcy9MSUNF
TlNFLTIuMAogKgogKiBVbmxlc3MgcmVxdWlyZWQgYnkgYXBwbGljYWJsZSBs
YXcgb3IgYWdyZWVkIHRvIGluIHdyaXRpbmcsIHNvZnR3YXJlCiAqIGRpc3Ry
aWJ1dGVkIHVuZGVyIHRoZSBMaWNlbnNlIGlzIGRpc3RyaWJ1dGVkIG9uIGFu
ICJBUyBJUyIgQkFTSVMsCiAqIFdJVEhPVVQgV0FSUkFOVElFUyBPUiBDT05E
SVRJT05TIE9GIEFOWSBLSU5ELCBlaXRoZXIgZXhwcmVzcyBvciBpbXBsaWVk
LgogKiBTZWUgdGhlIExpY2Vuc2UgZm9yIHRoZSBzcGVjaWZpYyBsYW5ndWFn
ZSBnb3Zlcm5pbmcgcGVybWlzc2lvbnMgYW5kCiAqIGxpbWl0YXRpb25zIHVu
ZGVyIHRoZSBMaWNlbnNlLgogKi8KCnBhY2thZ2UgY29tLmdpdGh1Yi50aGVv
cnlkdWRlcy5tb2RlbAoKaW1wb3J0IGNvbS5naXRodWIudGhlb3J5ZHVkZXMu
dXRpbC5LaXZ5UHJldHR5UHJpbnRlcgppbXBvcnQgb3JnLmJpdGJ1Y2tldC5p
bmt5dG9uaWsua2lhbWEuPT0+CmltcG9ydCBvcmcuYml0YnVja2V0Lmlua3l0
b25pay5raWFtYS5yZXdyaXRpbmcuUmV3cml0ZXIuXwppbXBvcnQgb3JnLmJp
dGJ1Y2tldC5pbmt5dG9uaWsua2lhbWEucmV3cml0aW5nLlN0cmF0ZWd5Cgov
KioKICogQmFzZSBUeXBlIGZvciBhbGwgbm9kZXMgb2YgYSBLaXZ5LUFTVAog
Ki8KdHJhaXQgQVNUTm9kZSBleHRlbmRzIEZvbGRhYmxlQVNUIHsgc2VsZiA9
PgogIC8qKgogICAqIFRyYXZlcnNlcyB0aGUgQVNUTm9kZSBhbmQgYXBwbGll
cyBTdHJhdGVneSBgc2Agb250byBgc2VsZmAgYW5kIGFsbCBjaGlsZHJlbiBv
ZiBzZWxmLgogICAqCiAgICogYHNgIGlzIGhlcmVieSBhcHBsaWVkIGJvdHRv
bSB1cCBpbiBsZWZ0IHRvIHJpZ2h0IG9yZGVyLgogICAqCiAgICogQHNlZSBb
W2h0dHBzOi8vYml0YnVja2V0Lm9yZy9pbmt5dG9uaWsva2lhbWEvc3JjLzAz
MjYzMGZhMjFkZGFkNWNmMzNjYmQ2ZWY5YzJmMDI3ODY2MWE2NzUvd2lraS9S
ZXdyaXRpbmcubWRdXQogICAqIEBwYXJhbSBzIHN0cmF0ZWd5IHRoYXQgaXMg
YXBwbGllZCB0byBgc2VsZmAgYW5kIGFsbCBjaGlsZHJlbi4KICAgKiBAcmV0
dXJuIGEgcmV3cml0dGVuIEFTVE5vZGUgYWNjb3JkaW5nIHRvIHRoZSBzdHJh
dGVneSBgc2AKICAgKi8KICBwcml2YXRlW3RoZW9yeWR1ZGVzXSBkZWYgdHJh
dmVyc2VBbmRBcHBseShzOlN0cmF0ZWd5KTpBU1ROb2RlCgogIC8qKgogICAq
IFJld3JpdGUgdGhlIEFTVE5vZGUgYHNlbGZgIGJ5IHRoZSBzcGVjaWZpY2F0
aW9uIG9mIGEgcGFydGlhbCBmdW5jdGlvbiBgZnBgLgogICAqCiAgICogSWYg
d2Ugd2FudCB0byBjaGFuZ2UgYSBzcGVjaWZpYyBbW21vZGVsLlB5dGhvbl1d
LW5vZGUgaW4gdGhlIEFTVCBmb3IgZXhhbXBsZSB3ZSBjb3VsZAogICAqIGFw
cGx5IHRoZSBmb2xsb3dpbmcgcmV3cml0ZSBzdHJhdGVneToKICAgKnt7ewog
ICAqICAgYXN0LnJld3JpdGUoewogICAqICAgIGNhc2UgUHl0aG9uKCJbMSwy
LDNdIikgPT4gUHl0aG9uKCJbMSwyLDMsNF0iKQogICAqICAgfSkKICAgKn19
fQogICAqCiAgICogUGxlYXNlIG5vdGUsIHRoYXQgQVNUTm9kZXMgY2FuIG5v
dCBiZSByZXdyaXR0ZW4gYXJiaXRyYXJpbHkuIFNpbmNlIGVhY2ggQVNUTm9k
ZSBpbXBsaWVzCiAgICogYSBzcGVjaWZpYyBwYXJhbWV0ZXIgbGlzdC4gQW4g
QVNUIGhhcyB0byBzdGF5IHN0cnVjdHVyZS1jb25zaXN0ZW50IGFmdGVyIGFw
cGx5aW5nIHJld3JpdGluZyBydWxlcy4KICAgKiBBIHJld3JpdGluZyBydWxl
IGFzOgogICAqIHt7ewogICAqICAgewogICAqICAgIGNhc2UgUHl0aG9uKHMp
ID0+IFRvcExldmVsKE5pbCkKICAgKiAgIH0KICAgKiB9fX0KICAgKiBpcyBu
b3QgdmFsaWQgYXMgYSBbW21vZGVsLlRvcExldmVsXV0tbm9kZSBjYW4gbm90
IG9jY3VyIGF0IHBvc2l0aW9ucyB3aGVyZSBhIFtbbW9kZWwuUHl0aG9uXV0t
bm9kZSBjYW4uCiAgICoKICAgKiBAc2VlIFtbaHR0cHM6Ly9iaXRidWNrZXQu
b3JnL2lua3l0b25pay9raWFtYS9zcmMvMDMyNjMwZmEyMWRkYWQ1Y2YzM2Ni
ZDZlZjljMmYwMjc4NjYxYTY3NS93aWtpL1Jld3JpdGluZy5tZF1dCiAgICog
QHBhcmFtIGZwIFBhcnRpYWwgZnVuY3Rpb24gdGhhdCBkZWZpbmVzIGhvdyB0
aGUgYXN0IHNob3VsZCBiZSByZXdyaXR0ZW4uCiAgICogQHJldHVybiBBIHJl
d3JpdHRlbiBBU1QgYWNjb3JkaW5nIHRvIHRoZSBzcGVjaWZpY2F0aW9uIGlu
IGBmcGAgb3IgdGhlIHNhbWUgYXN0IGlmIGBmcGAgY291bGQgbm90IGJlIGFw
cGxpZWQuCiAgICovCiAgZGVmIHJld3JpdGUoZnA6QVNUTm9kZSA9PT4gQVNU
Tm9kZSk6IEFTVE5vZGUgPSBzZWxmLnRyYXZlcnNlQW5kQXBwbHkocnVsZShm
cCkpCgogIC8qKgogICAqIFRyYW5zZm9ybXMgYHNlbGZgIGludG8gYSB3ZWxs
IGZvcm1hdHRlZCBraXZ5IHByb2dyYW0gdGhhdCBjYW4gYmUgd3JpdHRlbgog
ICAqIGludG8gYSBmaWxlLgogICAqCiAgICogVGhlIGZvbGxvd2luZyBBU1RO
b2RlIGZvciBleGFtcGxlOgogICAqIHt7ewogICAqICAgVG9wTGV2ZWwoCiAg
ICogICAgTGlzdCgKICAgKiAgICAgIFJvb3QoCiAgICogICAgICAgIFdpZGdl
dCgKICAgKiAgICAgICAgICBQbG90LAogICAqICAgICAgICAgIExpc3QoCiAg
ICogICAgICAgICAgICBXaWRnZXQoCiAgICogICAgICAgICAgICAgIExpbmVH
cmFwaCwKICAgKiAgICAgICAgICAgICAgTGlzdCgKICAgKiAgICAgICAgICAg
ICAgICBQcm9wZXJ0eShiYWNrZ3JvdW5kX25vcm1hbCxMaXN0KCcnKSksCiAg
ICogICAgICAgICAgICAgICAgUHJvcGVydHkoYmFja2dyb3VuZF9jb2xvcixM
aXN0KFswLDAsMCwxXSkpCiAgICogICApKSkpKSkpCiAgICogfX19CiAgICoK
ICAgKiBpcyBwcmludGVkOgogICAqIHt7ewogICAqIFBsb3Q6CiAgICogIExp
bmVHcmFwaDoKICAgKiAgICBiYWNrZ3JvdW5kX25vcm1hbDogJycKICAgKiAg
ICBiYWNrZ3JvdW5kX2NvbG9yOiBbMCwwLDAsMV0KICAgKiB9fX0KICAgKgog
ICAqIEByZXR1cm4gQSBmb3JtYXR0ZWQgQVNUTm9kZSB0aGF0IGNhbiBiZSBp
bnRlcnByZXRlZCBhcyBhIEtpdnkgZmlsZS4KICAgKi8KICBkZWYgcHJldHR5
OlN0cmluZyA9IEtpdnlQcmV0dHlQcmludGVyLmZvcm1hdChzZWxmKS5sYXlv
dXQKfQ==
As far as I see, this encoding is correct and only contains the standard alphabet of characters for a Base64 encoding. If I decode this encoding here, I get a correct translation. However, I tried various approaches to decode it programmatically and did not find a solution yet.
Let contentEncoded be the string containing the encoded file. I tried the following:
java.util.Base64.getDecoder.decode(contentEncoded)
java.util.Base64.getDecoder.decode(contentEncoded.getBytes)
java.util.Base64.getDecoder.decode(contentEncoded.getBytes(StandardCharsets.UTF_8))
java.util.Base64.getUrlDecoder.decode(contentEncoded)
java.util.Base64.getUrlDecoder.decode(contentEncoded.getBytes(StandardCharsets.UTF_8))
java.util.Base64.getMimeDecoder.decode(contentEncoded.replaceAll("\\n", "").replaceAll("\\r", ""))
However, all of them resulted in an error message: java.lang.IllegalArgumentException: Illegal base64 character a.
My question is: Am I not seeing something obvious? Are there some hidden control characters? Has anybody had similar issues and was able to fix them?

Just remove line breaks and it should work.
contentEncoded.replace("\n", "")

The following snippet decodes the encoding correctly:
val decodedWithMime = java.util.Base64.getMimeDecoder.decode(contentEncoded)
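// Build the String from the bytes with an explicit charset; mapping each byte to a Char breaks on non-ASCII content.
val decodedString = new String(decodedWithMime, java.nio.charset.StandardCharsets.UTF_8)
As pointed out in the comments, the a in the error message Illegal base64 character a is the hexadecimal value (0x0a) of the newline character \n. The MIME decoder ignores line separators, so it can decode the string without removing the newline characters beforehand.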
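The difference between the two decoders can be sketched in plain Java as follows. This is only an illustration under assumptions: the class name MimeDecodeSketch, its helper methods, and the short sample string are made up for this example and stand in for the line-wrapped GitHub payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class MimeDecodeSketch {
    /** Decodes Base64 text that may contain line breaks between chunks. */
    static String decodeWrapped(String encoded) {
        // The MIME decoder skips characters outside the Base64 alphabet,
        // which includes the \n and \r used for line wrapping.
        byte[] raw = Base64.getMimeDecoder().decode(encoded);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Returns true if the strict basic decoder refuses the input. */
    static boolean basicDecoderRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            // A line feed (0x0a) is reported as "Illegal base64 character a".
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: "Hello, World" wrapped after every 8 characters.
        String wrapped = "SGVsbG8s\nIFdvcmxk\n";
        System.out.println(basicDecoderRejects(wrapped)); // true
        System.out.println(decodeWrapped(wrapped));       // Hello, World
    }
}
```

Stripping the line breaks first and using the basic decoder should give the same bytes; the MIME decoder just saves that step.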

Related

Java Encoding for "GB2312" CHARACTER ® replacing with question mark(?)

I'm trying to get an encoded value using the GB2312 character set, but I'm getting '?' instead of '®'.
Below is my sample code:
new String("Test ®".getBytes("GB2312"));
but I'm getting Test ? instead of Test ®.
Has anyone faced this issue?
Java version: JDK 6
Platform: Windows 7
I'm not familiar with Chinese character encodings, so I need suggestions.
For better understanding, the statement can be divided into two parts:
byte[] bytes = "Test ®".getBytes("GB2312"); // bytes, encoding the string to GB2312
new String(bytes); // back to string, using default encoding
Probably ® is not a valid GB2312 character, so it is converted to ?. See the result of
Charset.forName("GB2312").newEncoder().canEncode("®")
Based on documentation of getBytes:
The behavior of this method when this string cannot be encoded in the given charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
which also suggests using CharsetEncoder.
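A minimal sketch of the CharsetEncoder approach is shown below. The class and method names (StrictEncodeSketch, encodeStrict, roundTrip) are hypothetical, and US-ASCII and ISO-8859-1 are used only because they are guaranteed to exist on every JVM; passing Charset.forName("GB2312") instead should work the same way where that charset is installed. Note that the sketch also decodes with the same charset it encoded with, rather than the platform default.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictEncodeSketch {
    /** Encodes text, failing loudly instead of silently substituting '?'. */
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    /** Encodes and decodes with the same charset, not the platform default. */
    static String roundTrip(String text, Charset charset) throws CharacterCodingException {
        return new String(encodeStrict(text, charset), charset);
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "Test \u00AE"; // "Test ®"
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text)); // true
        try {
            encodeStrict(text, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            System.out.println("not encodable in US-ASCII: " + e);
        }
    }
}
```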

Unexpected output from URLEncoder in Java

I am trying to encode a URL parameter.
For example when I am encoding
qOddENxeLxL+13drGKYUgA==\n
using a URL encoder tool,
it gives the following output, which works when I call the API:
qOddENxeLxL%2B13drGKYUgA%3D%3D%5Cn
But when I am encoding URL from my Java code (Android) using URLEncoder.encode("qOddENxeLxL+13drGKYUgA==\n", "UTF-8");
It gives me the following result
qOddENxeLxL%252B13drGKYUgA%253D%253D%250A
I tried using other Encoding schemes too but could not produce the same result.
The issue is because the \n is being interpreted as a new line character. Java will treat \ inside a string as starting an escape sequence.
You have to escape it in order to get the same thing as in the URL you provided.
System.out.println(URLEncoder.encode("qOddENxeLxL+13drGKYUgA==\\n", "UTF-8"));
This will provide the same result:
qOddENxeLxL%2B13drGKYUgA%3D%3D%5Cn
The issue is that you are feeding \n to the URLEncoder tool, which doesn't understand it as an escape sequence and so gives you %5Cn, and to the Java compiler inside a string literal, which does understand it and so gives you 0x0A.
Figured out the issue: the string was being encoded twice.
Retrofit automatically encodes parameters passed to a call, and I was passing an already-encoded parameter, so it got encoded again.
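Both effects discussed in this thread, the escape sequence and the double encoding, can be reproduced with URLEncoder alone. The sketch below is illustrative only; the class name UrlEncodeSketch and its encode helper are made up, and Retrofit itself is not involved (the second encode call merely simulates what a library that encodes parameters automatically would do).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodeSketch {
    /** URL-encodes a value as UTF-8, hiding the checked exception. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "\n" in a Java literal is a single line feed character (0x0A).
        String withLineFeed = encode("qOddENxeLxL+13drGKYUgA==\n");
        System.out.println(withLineFeed); // qOddENxeLxL%2B13drGKYUgA%3D%3D%0A

        // "\\n" is a backslash followed by the letter n, as typed into the web tool.
        System.out.println(encode("qOddENxeLxL+13drGKYUgA==\\n")); // qOddENxeLxL%2B13drGKYUgA%3D%3D%5Cn

        // Encoding an already-encoded value turns every % into %25.
        System.out.println(encode(withLineFeed)); // qOddENxeLxL%252B13drGKYUgA%253D%253D%250A
    }
}
```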
BTW thanks for the explanations. :)

Using Amazon AWS Cognito `.well-known/jwks.json` data fails to base64 decode some fields

When using Amazon AWS Cognito Federated Identities, and parsing the data at:
https://cognito-identity.amazonaws.com/.well-known/jwks_uri which looks like:
{"keys":[
{"kty":"RSA",
"alg":"RS512",
"use":"sig",
"kid":"ap-northeast-11",
"n":"AI7mc1assO5n6yB4b7jPCFgVLYPSnwt4qp2BhJVAmlXRntRZ5w4910oKNZDOr4fe/BWOI2Z7upUTE/ICXdqirEkjiPbBN/duVy5YcHsQ5+GrxQ/UbytNVN/NsFhdG8W31lsE4dnrGds5cSshLaohyU/aChgaIMbmtU0NSWQ+jwrW8q1PTvnThVQbpte59a0dAwLeOCfrx6kVvs0Y7fX7NXBbFxe8yL+JR3SMJvxBFuYC+/om5EIRIlRexjWpNu7gJnaFFwbxCBNwFHahcg5gdtSkCHJy8Gj78rsgrkEbgoHk29pk8jUzo/O/GuSDGw8qXb6w0R1+UsXPYACOXM8C8+E=",
"e":"AQAB"},
... }
This works fine decoding the n field using this code (Kotlin calling JDK 8 Base64 class):
Base64.getDecoder().decode(encodedN.toByteArray())
But when using Cognito User Pools which has data at a URL in the form of: https://cognito-idp.${REGION}.amazonaws.com/${POOLID}/.well-known/jwks.json
It has the same type of data, but it will not decode. Instead I end up with errors such as:
Illegal base64 character 5f
Since that is an underscore _ and in the Base64 URL alphabet, I tried changing my decoding to:
Base64.getUrlDecoder().decode(encodedN.toByteArray())
But then the first set of data no longer decodes correctly because it contains / and other invalid characters for Base64 URL encoding.
Is there a method that can handle both of these jwks sets of data with the same decoder?!?
Note: this question is intentionally written and answered by the author (Self-Answered Questions), so that solutions for interesting problems are shared in SO.
The issue is that the Amazon AWS Cognito team is using two different Base64 encoding alphabets for basically the same thing. So you will need to detect which is being used.
If the encoded string ends with = or contains + or / then it is definitely the normal Base64.getDecoder(). If it contains a - or _ then it is definitely the Base64.getUrlDecoder(). Otherwise nothing special is there and it is best to use the Base64.getUrlDecoder() because you do not know if the length would need padding or not.
This translates to (in Kotlin, but logically is applicable to any language):
fun base64SafeDecoder(encoded: String): ByteArray {
    val decoder = if (encoded.endsWith('=') || encoded.any { it == '+' || it == '/' }) {
        Base64.getDecoder()
    } else {
        Base64.getUrlDecoder()
    }
    return decoder.decode(encoded.toByteArray())
}
This would be a problem for any language that has Base64 decoding, in that the decoder might be loose and ignore the invalid character (some do), or it might be strict and throw an exception. Some test websites for Base64 encoding/decoding exhibit both of these behaviors as well, and silently ignoring invalid characters is dangerous: you would then hit an error later, when using the results of the decoding.
You can try the Apache Commons Codec variant of the Base64 decoder (org.apache.commons.codec.binary.Base64).
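For readers working in Java, the detection idea can be sketched as below. This is a slight variation rather than a literal port of the Kotlin function: it only looks for + and /, and sends everything else (including padded input) to the URL decoder, which in the JDK also accepts trailing = padding. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64AlphabetSketch {
    /** Picks the decoder by looking for characters unique to the standard alphabet. */
    static byte[] decodeEither(String encoded) {
        boolean standardAlphabet = encoded.indexOf('+') >= 0 || encoded.indexOf('/') >= 0;
        Base64.Decoder decoder = standardAlphabet
                ? Base64.getDecoder()      // alphabet with + and /
                : Base64.getUrlDecoder();  // alphabet with - and _
        return decoder.decode(encoded);
    }

    public static void main(String[] args) {
        // These three bytes encode to "++//" (standard) and "--__" (URL-safe).
        byte[] sample = {(byte) 0xFB, (byte) 0xEF, (byte) 0xFF};
        String standard = Base64.getEncoder().encodeToString(sample);
        String urlSafe = Base64.getUrlEncoder().encodeToString(sample);
        System.out.println(Arrays.equals(decodeEither(standard), decodeEither(urlSafe))); // true
    }
}
```

A string that mixes both alphabets is still rejected by whichever decoder is chosen, which seems preferable to silently skipping characters.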
The decodeBase64(String base64String) method handles both base64 and base64 url safe encodings seamlessly. And the isBase64 method provides a check to detect if a string is encoded in either base64 or base64 url safe.

Android can't decode PHP base64 code

I'm trying to decode a Base64 string that was encoded by my PHP server. It decodes normally with PHP, but when I try to decode it on my Android phone I get the following error:
Base64DecoderException: encoded value has invalid trailing byte
My Base64 Code:
oLAwb6uSn2JXqAFTX+qJXaOawOYF3kDDK2HlCb7ItCeimVCsDE7OYH5OgsixKpIAM6KgkCktnB4HsLQtA5Ig1fQvDrRcct9dQi4m8wPpF7a3sFHSG29j2aItKeouflTtsSZgKWvSjg0gBBGM/7PlvkuK+8W4/GXS0QrqV1jcngWrspYmAdi0GiJbPm8b/zlscOIa1z1df11SuQH5+GiUzqZ4WDFOpoH0WWVW3KmbMQ2yifBmXnhn80qZct6KiN7aL8PHEczhNrRqAKfUuEwmsWOnEOyh7UOU6FcnW3VAo2BWd5dJRGgWb5Py09l0XmrdWdzin7klKtMqXOWQRcvEVT7PKtQxQotRpOa+2IQQirVfybyuMipY9YORuW1hqmc95Tdt1WHdIzVwEtq6NXx9AC5mSklbxrcOpINfS2RPFcK0UUMV2xQKAQ+u8PzTj/KBEmb04ObBbnX6y3uL1KT58lDecA9lIbNYuttlgRMzRdxFOvkk21wou2vtMBtIxk0XFJJGjazqqcxVeSxTvQ68wdNSkRmvteowkSq2Vi09CmOhToRHemFyZgKTxSBoNaFuVuYGVggEFIR9kHVrLxoK2Q==
Any ideas?
If the invalid trailing byte is a newline (\n), the easiest solution would be to leave out the closing PHP tag (in all of your PHP files):
<?php
//Code goes here
//Leave out the closing tag: ?>
Otherwise, you may be able to trim surrounding whitespace and line breaks (in your Java code) before you Base64-decode the value.
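The trimming suggestion could look roughly like this. The sketch uses java.util.Base64 as a stand-in, since the decoder class behind the Base64DecoderException in the question is not shown; the class name TrimBeforeDecodeSketch and the short sample value are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class TrimBeforeDecodeSketch {
    /** Strips leading and trailing whitespace, then decodes strictly. */
    static byte[] decodeTrimmed(String encoded) {
        // String.trim() removes leading/trailing characters up to U+0020,
        // which covers spaces, \r and \n appended by the server response.
        return Base64.getDecoder().decode(encoded.trim());
    }

    public static void main(String[] args) {
        // Hypothetical response body: "hello" in Base64 plus a trailing line break.
        String body = "aGVsbG8=\r\n";
        System.out.println(new String(decodeTrimmed(body), StandardCharsets.UTF_8)); // hello
    }
}
```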

Chinese character in URL with Java

I used the following line in Firefox's URL field :
http://www.baidu.com/s?wd=你
This line was generated by my Java program.
The last Chinese character in the URL field sometimes became: %C4%E3 [Correct]
Other times it became: %E4%BD%A0 [Incorrect]
I tried to use the URL with IE. It shows up still as 你, but the result page search field shows the character as 浣. Could this be a UTF-8 or UTF-16 encoding problem? How do I get the correct code %C4%E3 from the char 你 with my Java program?
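Use URLEncoder.encode(string, encoding) and pass the encoding the target site expects: %C4%E3 is the GB2312 encoding of 你, while %E4%BD%A0 is its UTF-8 encoding.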
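A small sketch of how the encoding argument changes the result is given below. The class name ChineseUrlSketch is made up, and the GB2312 call assumes that charset is available on the running JVM, which is why the example checks Charset.isSupported first.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class ChineseUrlSketch {
    /** Percent-encodes a query value using the named charset. */
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String ni = "\u4F60"; // 你
        System.out.println(encode(ni, "UTF-8")); // %E4%BD%A0
        if (Charset.isSupported("GB2312")) {
            System.out.println(encode(ni, "GB2312")); // %C4%E3
        }
    }
}
```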
