Convert JSON Base64 string to String in Java

I am trying to convert a protobuf stream to a JSON object using the com.google.protobuf.util.JsonFormat class, as below.
String jsonFormat = JsonFormat.printer().print(data);
As per the documentation (https://developers.google.com/protocol-buffers/docs/proto3#json), I am getting the bytes fields as Base64 strings (for example, "hashedStaEthMac": "QDOMIxG+tTIRi7wlMA9yGtOoJ1g="). But I would like to get them as readable strings (for example, "locAlgorithm": "ALGORITHM_ESTIMATION"). Below is a sample output. Is there a way to get the JSON object as plain text, or any workaround to get the actual values?
{
  "seq": "71811887",
  "timestamp": 1488640438,
  "op": "OP_UPDATE",
  "topicSeq": "9023777",
  "sourceId": "xxxxxxxx",
  "location": {
    "staEthMac": {
      "addr": "xxxxxx"
    },
    "staLocationX": 1148.1763,
    "staLocationY": 980.3377,
    "errorLevel": 588,
    "associated": false,
    "campusId": "n5THo6IINuOSVZ/cTidNVA==",
    "buildingId": "7hY/jVh9NRqqxF6gbqT7Jw==",
    "floorId": "LV/ZiQRQMS2wwKiKTvYNBQ==",
    "hashedStaEthMac": "xxxxxxxxxxx",
    "locAlgorithm": "ALGORITHM_ESTIMATION",
    "unit": "FEET"
  }
}
Expected format is as below.
seq: 85264233
timestamp: 1488655098
op: OP_UPDATE
topic_seq: 10955622
source_id: 00505698749E
location {
  sta_eth_mac {
    addr: xx:xx:xx:xx:xx:xx
  }
  sta_location_x: 916.003
  sta_location_y: 580.115
  error_level: 854
  associated: false
  campus_id: 9F94C7A3A20836E392559FDC4E274D54
  building_id: EE163F8D587D351AAAC45EA06EA4FB27
  floor_id: 83144E609EEE3A64BBD22C536A76FF5A
  hashed_sta_eth_mac:
  loc_algorithm: ALGORITHM_ESTIMATION
  unit: FEET
}

Not easily, because the actual values are binary, which is why they're Base64-encoded in the first place.
Try to decode one of these values:
$ echo -n 'n5THo6IINuOSVZ/cTidNVA==' | base64 -D
??ǣ6?U??N'MT
In order to get more readable values, you have to understand what the binary data actually is, and then decide what format you want to use to display it.
The field called staEthMac.addr is 6 bytes and is probably an Ethernet MAC address. It's usually displayed as xx:xx:xx:xx:xx:xx where xx are the hexadecimal values of each byte. So you could decode the Base64 strings into a byte[] and then call a function to convert each byte to hex and delimit them with ':'.
The fields campusId, buildingId, and floorId are 16 bytes (128 bits) and are probably UUIDs. UUIDs are usually displayed as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx where each x is a hex digit (4 bits). So you could (again) convert the Base64 string to byte[] and then print the hex digits, optionally adding the dashes.
Not sure about sourceId and hashedStaEthMac, but you could just follow the pattern of converting to byte[] and printing as hex. Essentially you're just doing a conversion from base 64 to base 16. You'll wind up with something like this:
$ echo -n 'n5THo6IINuOSVZ/cTidNVA==' | base64 -D | xxd -p
9f94c7a3a20836e392559fdc4e274d54
A point that I'm not sure you are getting is that it's binary data. There is no "readable" version that makes sense like "ALGORITHM_ESTIMATION" does; the best you can do is encode the binary data using letters and numbers so you can at least pronounce it.
Base64 (which encodes binary using 64 different characters) is pronounceable "N five T H lowercase-O six ..." but it's not real friendly because letter case is significant and because it uses letters like O and I that look like numbers. Hex (which encodes binary using just 16 characters) is a little easier to read.
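The Base64-to-hex conversion described above could be sketched in Java roughly as follows. This is only a sketch under assumptions: the class name Base64ToHex and the helpers toHex and toUuidString are made up for illustration, treating the 16-byte IDs as UUIDs is just the guess made above, and "ABEiM0RV" is an invented 6-byte sample standing in for a MAC address.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper class; none of these names come from protobuf or JsonFormat.
class Base64ToHex {

    // Decodes a Base64 string and renders each byte as two hex digits,
    // joined by the given delimiter ("" for plain hex, ":" for MAC style).
    static String toHex(String base64, String delimiter) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Assumes the decoded value is exactly 16 bytes and prints it in the dashed UUID layout.
    static String toUuidString(String base64) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(base64));
        return new UUID(buf.getLong(), buf.getLong()).toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("n5THo6IINuOSVZ/cTidNVA==", ""));    // 9F94C7A3A20836E392559FDC4E274D54
        System.out.println(toUuidString("n5THo6IINuOSVZ/cTidNVA==")); // 9f94c7a3-a208-36e3-9255-9fdc4e274d54
        System.out.println(toHex("ABEiM0RV", ":"));                   // 00:11:22:33:44:55 (made-up sample)
    }
}
```

The same decoded bytes feed both formats, so the choice between plain hex, colon-delimited hex, and the dashed UUID layout is purely a display decision.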

Related

String comparison in UTF8

I have a PHP script which is supposed to return a UTF-8 encoded string. However, in Java I can't seem to compare it with Java's internal string in any way.
If I print "OK" and response, they appear the same in the console. However, if I check equality
if ( "OK".equals(response) ) {
the result is false. I printed out both in binary: response is 11101111 10111011 10111111 01001111 01001011, while Java's String "OK" is 01001111 01001011, which is clearly ASCII. I tried to convert it to UTF-8 in a few ways, but to no avail:
String result2 = new String("OK".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
and
String result2 = new String("OK".getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
Neither works; both still return ASCII codes for some reason.
byte[] result2 = "OK".getBytes(StandardCharsets.UTF_8); System.out.print(new String(result2));
While this also prints the correct "OK" result, in binary it is still ASCII.
I've tried changing the communication to numbers instead, but 1 still does not equal 1: Integer.parseInt(response) fails with a '"1" is not a String' error message, although in every other respect the response is recognised as a normal String.
I'm looking for a solution, preferably one where "OK" is converted to UTF-8 rather than response to ASCII, since I need to communicate with a PHP script along with 2 databases, all set to UTF-8. Java is started with the switch -Dfile.encoding=UTF8 to ensure national characters are not broken.

In UTF-8, all characters with codes 127 or less are encoded as a single byte. Therefore "OK" in UTF-8 and in ASCII is the same two bytes.
11101111 10111011 10111111 01001111 01001011 is not just a simple "OK"; it is
0xEF, 0xBB, 0xBF, "OK"
where 0xEF, 0xBB, 0xBF are a BOM (byte order mark).
The BOM is a character that editors do not display but that is used to determine the encoding.
Those bytes probably ended up in your PHP script before <?php.
You have to configure your editor to remove the BOM from the file.
UPD
If it is not possible to alter the PHP script, you can use a workaround:
// check if the first symbol of the response is BOM
if (!response.isEmpty() && (response.charAt(0) == 0xFEFF)) {
    // removing the first symbol
    response = response.substring(1);
}
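To see why the comparison fails, the five bytes from the question can be decoded in a small standalone program. This is a sketch only: the class name BomDemo and the helper stripBom are made up for illustration, and the byte array simply hard-codes the binary dump quoted in the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; stripBom mirrors the workaround shown above.
class BomDemo {

    // Removes a leading U+FEFF (the decoded form of the bytes EF BB BF), if present.
    static String stripBom(String s) {
        if (!s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        // 0xEF 0xBB 0xBF followed by 'O' 'K', as in the binary dump from the question
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x4F, 0x4B};
        String response = new String(raw, StandardCharsets.UTF_8);

        System.out.println(response.length());               // 3: the BOM survives as one char
        System.out.println("OK".equals(response));           // false
        System.out.println("OK".equals(stripBom(response))); // true
    }
}
```

Java's UTF-8 decoder keeps the BOM as an ordinary character, which is why the string prints as "OK" yet has length 3.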

Java Base64 MIME decoding/encoding throws away delimiters

I have a Base64-encoded string that looks like "data:image/png;base64,iVBORw0K". I'm trying to decode it back to binary and later encode it again into Base64 using java.util.Base64. Strangely, after decoding and encoding again, I would lose the delimiters and get back "dataimage/pngbase64iVBORw0I=".
This is how I do the decoding and encoding (written in Scala, but you get the idea):
import java.util.Base64
val b64mime = "data:image/png;base64,iVBORw0K"
val decoder = Base64.getMimeDecoder
val encoder = Base64.getMimeEncoder
println(encoder.encodeToString(decoder.decode(b64mime)))
Here is an example: https://scalafiddle.io/sf/TJY7eeg/0
This also happens with javax.xml.bind.DatatypeConverter. What am I doing wrong? Is this the expected behavior?

The string you are trying to deal with looks like an example of a "data:" URL as specified in RFC 2397.
The correct way to deal with one of these is to parse it into its components, and then decode only the component that is Base64-encoded. Here is the syntax:
dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype := [ type "/" subtype ] *( ";" parameter )
data := *urlchar
parameter := attribute "=" value
So this says that everything up to the comma in your example is non-base64 data. You cannot simply treat the whole string as base64 because it contains characters that are not valid in any standard variant of the base64 encoding scheme.
This Q&A talks about RFC 2397 parsers in Java:
Any RFC 2397 Data URI Parser for Java?
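As a rough illustration of "decode only the Base64 component", the following sketch splits the string at the first comma and leaves the header untouched. It is not a full RFC 2397 parser: the class name DataUrlDemo and both helper methods are made up for this example, and it assumes the URL carries the ";base64" marker.

```java
import java.util.Base64;

// Hypothetical helper class; a simplified split, not a complete RFC 2397 parser.
class DataUrlDemo {

    // Returns the decoded payload of a Base64 "data:" URL.
    static byte[] decodePayload(String dataUrl) {
        int comma = dataUrl.indexOf(',');
        if (!dataUrl.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("not a data: URL");
        }
        String header = dataUrl.substring("data:".length(), comma);
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("payload is not marked as Base64");
        }
        return Base64.getDecoder().decode(dataUrl.substring(comma + 1));
    }

    // Decodes and re-encodes the payload while keeping everything up to the comma as-is.
    static String reencode(String dataUrl) {
        String prefix = dataUrl.substring(0, dataUrl.indexOf(',') + 1);
        return prefix + Base64.getEncoder().encodeToString(decodePayload(dataUrl));
    }

    public static void main(String[] args) {
        String url = "data:image/png;base64,iVBORw0K";
        System.out.println(decodePayload(url).length); // 6
        System.out.println(reencode(url));             // data:image/png;base64,iVBORw0K
    }
}
```

The sketch uses the basic decoder rather than the MIME one, so stray characters in the payload raise an IllegalArgumentException instead of being skipped silently.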

Base64 doesn't have those characters in it. It looks like the decoder is ignoring those invalid characters.
@ decoder.decode(";")
res10: Array[Byte] = Array()
However if you just decode the last part you get what you want.
@ decoder.decode("iVBORw0K")
res9: Array[Byte] = Array(-119, 80, 78, 71, 13, 10)
@ encoder.encodeToString(res9)
res12: String = "iVBORw0K"

Base64 encode gives different result on linux CentOS terminal and in Java

I am trying to generate a random password on Linux CentOS and store it in a database as Base64. The password is 'KQ3h3dEN', and when I convert it with 'echo KQ3h3dEN | base64' I get 'S1EzaDNkRU4K'.
I have this function in Java:
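import java.nio.charset.StandardCharsets;
import java.util.Base64;
public static String encode64Base(String stringToEncode)
{
    // Use an explicit charset for both conversions; the charset-name overload
    // of new String(...) would also require handling UnsupportedEncodingException.
    byte[] encodedBytes = Base64.getEncoder().encode(stringToEncode.getBytes(StandardCharsets.UTF_8));
    String encodedString = new String(encodedBytes, StandardCharsets.UTF_8);
    return encodedString;
}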
And the result of encode64Base("KQ3h3dEN") is 'S1EzaDNkRU4='.
So the Linux command produces "K" where Java produces "=" in this example. How can I ensure that I always get the same result when using base64 on Linux and Base64 encoding in Java?
UPDATE: Updated the question, as I didn't notice the "K" at the end of the Linux-encoded string. Also, here are a few more examples:
'echo KQ3h3dENa | base64' => result='S1EzaDNkRU5hCg==', but it should be 'S1EzaDNkRU5h'
'echo KQ3h3dENaa | base64' => result='S1EzaDNkRU5hYQo=', but it should be 'S1EzaDNkRU5hYQ=='

Found the solution after a few hours of experimenting. echo appends a newline to the string, so the newline was being encoded as well. The solution is to suppress it with -n:
echo -n KQ3h3dEN | base64
The result will then be the same as with Java's Base64 encoder.
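The effect of the trailing newline can be reproduced from the Java side as well. This is a small sketch; the class name EchoNewlineDemo and the helper encode are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class comparing the input with and without a trailing newline.
class EchoNewlineDemo {

    // Base64-encodes the UTF-8 bytes of the given string.
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(encode("KQ3h3dEN"));   // S1EzaDNkRU4=  (same as echo -n)
        System.out.println(encode("KQ3h3dEN\n")); // S1EzaDNkRU4K  (same as plain echo)
    }
}
```

With the newline, the input is 9 bytes, which fills the last 3-byte group exactly, so no padding is emitted and the final character becomes "K".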

Padding
The '==' sequence indicates that the last group contained only one byte, and '=' indicates that it contained two bytes.
In theory, the padding character is not needed for decoding, since the number of missing bytes can be calculated from the number of Base64 digits. In some implementations, the padding character is mandatory, while for others it is not used.
So it depends on the tools and libraries you use. If they treat Base64 with padding the same as Base64 without padding, there is no problem. To be safe, you can use a tool on Linux that generates Base64 with padding.

Use the withoutPadding() method of the Base64.Encoder class to get a Base64.Encoder instance that encodes without adding any padding characters at the end.
See the documentation:
https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html#withoutPadding
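A brief sketch of what withoutPadding() changes, using the password from the question. The class name PaddingDemo and its helper methods are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class showing padded vs. unpadded output from java.util.Base64.
class PaddingDemo {

    static String encodePadded(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    static String encodeUnpadded(String s) {
        return Base64.getEncoder().withoutPadding().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodePadded("KQ3h3dEN"));   // S1EzaDNkRU4=
        System.out.println(encodeUnpadded("KQ3h3dEN")); // S1EzaDNkRU4
        // java.util.Base64's decoder accepts both forms
        System.out.println(decode("S1EzaDNkRU4="));     // KQ3h3dEN
        System.out.println(decode("S1EzaDNkRU4"));      // KQ3h3dEN
    }
}
```

Note that dropping the padding does not address the trailing-newline difference from the question; it only controls whether "=" characters are appended.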

NSData bytes not matching bytes from Java WS with same base64 string

I am using protocol buffers in an iOS application. The app consumes a web service written in Java, which spits back a base64 encoded string.
The base64 string is the same on both ends.
In the app however, whenever I try to convert the string to NSData, the number of bytes may or may not be the same on both ends. The result is a possible invalid protocol buffer exception, invalid end tag.
For example:
Source (bytes) | NSData (bytes) | Diff
93 | 93 | 0
6739 | 6735 | -4
5745 | 5739 | -6
The bytes are equal in the trivial case of an empty protocol buffer.
Here is the Java source:
import org.apache.commons.codec.binary.Base64;
....
public static String bytesToBase64(byte[] bytes) {
    return Base64.encodeBase64String(bytes);
}
On the iOS side, I have tried various algorithms from similar questions which all agree in byte size and content.
What could be causing this?

On closer inspection, the issue was my assumption that Base64 is Base64. I was using the url variant in the web service while the app's decode was expecting a normal version.
I noticed underscores in the Base64, which I thought odd.
The Base64 page http://en.wikipedia.org/wiki/Base64 map of value/char shows no underscores, but later in the article goes over variants, which do use underscores.
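The difference between the two alphabets is easy to see with bytes that map to the last two Base64 values (62 and 63). The sketch below uses java.util.Base64 rather than the Apache Commons Codec class from the question, purely to stay self-contained; the class name Base64VariantDemo and its helpers are made up for illustration.

```java
import java.util.Base64;

// Hypothetical demo class contrasting the standard and URL-safe Base64 alphabets.
class Base64VariantDemo {

    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // Returns true if the standard decoder accepts the input, false if it rejects it.
    static boolean standardDecoderAccepts(String base64) {
        try {
            Base64.getDecoder().decode(base64);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(data));                 // +/8=
        System.out.println(urlSafe(data));                  // -_8=
        System.out.println(standardDecoderAccepts("-_8=")); // false
    }
}
```

Java's standard decoder rejects the URL-safe characters outright. A more lenient decoder that skips characters it does not recognise would instead return fewer bytes, which would be consistent with the shortfall described in the question.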

Perl Client to Java Server

I'm trying to write a Perl client program to connect to a Java server application (JDuplicate). I see that the Java server uses the DataInput.readUTF and DataOutput.writeUTF methods, which the JDuplicate website lists as "Java's modified UTF-8 protocol".
My test program is pretty simple. I'm trying to send client type data, which should invoke a response from the server; however, it just times out:
#!/usr/bin/perl
use strict;
use Encode;
use IO::Socket;
my $remote = IO::Socket::INET->new(
    Proto => 'tcp',
    PeerAddr => 'localhost',
    PeerPort => '10421'
) or die "Cannot connect to server\n";
$|++;
$remote->send(encode_utf8("CLIENTTYPE|JDSC#0.5.9#0.2"));
while (<$remote>) {
    print $_,"\n";
}
close($remote);
exit(0);
I've tried $remote->send(pack("U","..."));, I've tried "use utf8;", I've tried binmode($remote, ":utf8"), and I've tried sending just plain ASCII text; nothing ever gets a response.
I can see the data being sent with tcpdump, all in one packet, but the server itself does nothing with it (other than ACK the packet).
Is there something additional I need to do to satisfy the "modified" UTF implementation of Java?
Thanks.

You have to implement the protocol correctly. The documentation for writeUTF says:
First, the total number of bytes needed to represent all the characters of s is calculated. If this number is larger than 65535, then a UTFDataFormatException is thrown. Otherwise, this length is written to the output stream in exactly the manner of the writeShort method; after this, the one-, two-, or three-byte representation of each character in the string s is written.
As indicated in the docs for writeShort, it sends a 16-bit quantity in network order.
In Perl, that resembles
sub sendmsg {
    my($s,$msg) = @_;
    die "message too long" if length($msg) > 0xffff;
    # prefix the payload with its length as a 16-bit big-endian value (like writeShort)
    my $sent = $s->send(
        pack(n => (length($msg) & 0xffff)) .
        $msg
    );
    die "send: $!" unless defined $sent;
    die "short write" unless $sent == length($msg) + 2;
}
sub readmsg {
    my($s) = @_;
    my $buf;
    my $nread;
    # read the 2-byte length prefix first
    $nread = $s->read($buf, 2);
    die "read: $!" unless defined $nread;
    die "short read" unless $nread == 2;
    my $len = unpack n => $buf;
    # then read exactly that many payload bytes
    $nread = $s->read($buf, $len);
    die "read: $!" unless defined $nread;
    die "short read" unless $nread == $len;
    $buf;
}
Although the code above doesn't perform modified UTF encoding, it elicits a response:
my $remote = IO::Socket::INET->new(
    Proto => 'tcp',
    PeerAddr => 'localhost',
    PeerPort => '10421'
) or die "Cannot connect to server: $@\n";
my $msg = "CLIENTTYPE|JDSC#0.5.9#0.2";
sendmsg $remote, $msg;
my $buf = readmsg $remote;
print "[$buf]\n";
Output:
[SERVERTYPE|JDuplicate#0.5.9 beta (build 584)#0.2]
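For comparison, the framing that the Java side produces can be inspected by writing into an in-memory stream. This is a sketch only; the class name WriteUtfFrameDemo and the helpers frame and hex are made up for illustration and are not part of JDuplicate.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class that exposes the bytes DataOutputStream.writeUTF emits.
class WriteUtfFrameDemo {

    // Returns the exact bytes writeUTF produces: a 2-byte big-endian length, then the payload.
    static byte[] frame(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(frame("OK"))); // 00 02 4F 4B
        System.out.println(frame("CLIENTTYPE|JDSC#0.5.9#0.2").length); // 27: 2-byte prefix + 25 ASCII bytes
    }
}
```

A reader on the other end blocks until it has received the length prefix and the announced number of bytes, which is why sending the bare string without the prefix gets no reply.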

This is unrelated to the main part of your question, but I thought I would explain what the "Java's modified UTF-8" that the API expects is; it's UTF-8, except with UTF-16 surrogate pairs encoded as their own codepoints, instead of having the characters represented by the pairs encoded directly in UTF-8. For instance, take the character U+1D11E MUSICAL SYMBOL G CLEF.
In UTF-8 it's encoded as the four bytes F0 9D 84 9E.
In UTF-16, because it's beyond U+FFFF, it's encoded using the surrogate pair 0xD834 0xDD1E.
In "modified UTF-8", it's given the UTF-8 encoding of the surrogate pair codepoints: that is, you encode "\uD834\uDD1E" into UTF-8, giving ED A0 B4 ED B4 9E, which happens to be fully six bytes long.
When using this format, Java will also encode any embedded nulls using the illegal overlong form C0 80 instead of encoding them as nulls, ensuring that there are never any embedded nulls in a "modified UTF-8" string.
If you're not sending any characters outside of the BMP or any nulls, though, there's no difference from the real thing ;)
Here's some documentation courtesy of Sun.
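The byte sequences quoted above can be checked with a short program that compares standard UTF-8 against what writeUTF emits. This is a sketch; the class name ModifiedUtf8Demo and its helpers are made up for illustration, and the 2-byte length prefix that writeUTF adds is stripped before printing.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class contrasting standard UTF-8 with Java's modified UTF-8.
class ModifiedUtf8Demo {

    // Standard UTF-8 bytes of the string.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 bytes as written by writeUTF, without the 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String gClef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(hex(standardUtf8(gClef))); // F0 9D 84 9E
        System.out.println(hex(modifiedUtf8(gClef))); // ED A0 B4 ED B4 9E
        System.out.println(hex(standardUtf8("\0")));  // 00
        System.out.println(hex(modifiedUtf8("\0")));  // C0 80
        System.out.println(hex(modifiedUtf8("OK")));  // 4F 4B (identical to standard UTF-8)
    }
}
```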
