tl;dr: Is there an efficient way to exchange msgpack data between Java and C# over HTTP?
I've just discovered the msgpack data format. I use JSON for just about everything I send over the wire between client and server (over HTTP), so I'm keen to try this format out.
I have the Java library and the C# library, but I want to transport msgpacks over HTTP. Is this the intended use, or is it meant more as a local storage format?
I noticed a couple of RPC implementations, but they're whole RPC servers.
Thoughts?
-Shane
Transport and encoding are two very different things, and it's entirely up to you to choose which transport and which data encoding to use, depending on the needs of your application. Sending msgpack data over HTTP is a perfectly valid use case, but keep in mind the following two points:
msgpack is a binary encoding, which means it needs to be serialized into bytes before sending and deserialized from the received bytes on the other end. It also means it is not human-readable (or writable, for that matter), so it's really hard to inspect, or generate by hand, the HTTP traffic.
unless you intend to stream msgpack-encoded data over HTTP, you'll incur a fairly high overhead, since the HTTP headers will most likely greatly overshadow the size of the data you're sending. Note that this also applies to JSON, but to a lesser extent, since JSON's encoding is less compact to begin with.
As far as implementation goes, the sending side has to serialize your msgpack object into a byte[] and send it as the body of the HTTP request, with the Content-Type header set to application/x-msgpack. On the receiving end, read the request body from the input stream (you can probably get your hands on a ByteArrayInputStream) and deserialize it into your msgpack object.
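A minimal sketch of the sending side, using the JDK's HttpURLConnection; the msgpackBytes argument stands in for whatever byte[] your msgpack library's serializer produced, and the class and method names are made up:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class MsgpackHttpPost {
    // POSTs an already-serialized msgpack payload as the raw request body.
    public static int post(String endpoint, byte[] msgpackBytes) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-msgpack");
        conn.setFixedLengthStreamingMode(msgpackBytes.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(msgpackBytes);
        }
        return conn.getResponseCode();
    }
}

On the receiving end, reading the request's input stream fully into a byte[] gives you exactly the bytes to hand to your msgpack deserializer.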
Related
I am sending data via a JSON body in a POST request from a client (Java) to a server (Java) using a Spring RestTemplate and RestController.
The data is present as a POJO on the client and will be parsed into a POJO with the same structure on the server.
On the client I read a file with Files.readAllBytes into a byte[] and store it in the content field.
On the server side the whole object, including the byte[], is marshalled to XML using JAXB annotations.
class BinaryObject {
    String fileName; // original name of the uploaded file
    String mimeCode; // MIME type of the content
    byte[] content;  // raw file bytes read via Files.readAllBytes
}
Everything is working fine and running as intended.
I heard it could be beneficial to encode the content field before transmitting the data to the server and decode it there before it is marshalled into XML.
My Question
Is it necessary or recommended to additionally encode / decode the content field with base64?
TL;DR
To the best of my knowledge, you are not going against any good practice with your current implementation. One might question the design (exchanging files in JSON? storing binary inside XML?), but that is a separate question.
Still, there is room for possible optimization, but the toolset you use (e.g. Spring RestTemplate + Spring Controller + JSON serialization (Jackson) + XML via JAXB) somewhat hides the possible optimizations from you.
You have to carefully weigh the pros and cons of working around your comfortable "automagical" serializations, which work well as of today, to see if it is worth the trouble to tweak them.
We can nonetheless discuss the theory of what could be done.
A discussion about Base64
Base64 is an efficient way to encode binary data in pure text formats (e.g. MIME structures such as email or some HTTP bodies, JSON, XML, ...), but it has two costs: the first is a non-negligible size increase (~33%), the second is CPU time.
Sometimes (but you'd have to profile and check whether that is your case) this cost is not negligible, especially for large files: due to buffering and char/byte conversions in the frameworks, you could easily end up using e.g. 4x the size of the encoded file in the Java heap.
When handling 10 kB files at 10 requests/second, this is usually NOT an issue. But 10 MB files at 100 requests/second is another ballpark altogether. So you'd have to check (though I doubt your typical server will reach 100 req/s with 10 MB files, because that amounts to 1 GB/s of incoming network bandwidth).
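For concreteness, the ~33% figure comes from base64 mapping every 3 input bytes onto 4 output characters, which a few lines of Java can verify:

import java.util.Base64;

public class Base64Overhead {
    public static void main(String[] args) {
        byte[] raw = new byte[3_000_000]; // 3 MB of arbitrary binary data
        String encoded = Base64.getEncoder().encodeToString(raw);
        // Prints "3000000 bytes -> 4000000 chars": the ~33% size increase
        System.out.println(raw.length + " bytes -> " + encoded.length() + " chars");
    }
}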
What is optimizable in your current process
In your current process, multiple encodings take place: the client base64-encodes the bytes read from the file.
When the request hits the server, the server decodes the base64 back to a byte[], and then your XML serialization (JAXB) re-encodes the byte[] to base64.
So in effect, "you" (more exactly, the REST controller side of things) decoded the base64 content for nothing, because the XML side of things could have used it directly.
What could be done
A few things.
Do you need base64 at the calling site?
First, you do not have to encode at the client side. When using JSON there is no choice, but the world did not wait for JSON to exchange files (i.e. arbitrary binary content) over HTTP.
If your content is a file name, a MIME type, and a file body, then a standard, direct HTTP call with no JSON at all is perfectly fine.
The MIME type maps to the Content-Type HTTP header, the file name goes inside the Content-Disposition HTTP header, and the contents go out as the raw HTTP body. No base64 needed (but your server side needs to accept raw HTTP content as is). This is as standard as can be.
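A rough sketch of such a raw upload with plain HttpURLConnection (the endpoint and names are illustrative, not a prescribed API):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawFileUpload {
    // POSTs the file as the raw HTTP body; the metadata travels in standard headers.
    public static int upload(String endpoint, Path file, String mimeType) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", mimeType);
        conn.setRequestProperty("Content-Disposition",
                "attachment; filename=\"" + file.getFileName() + "\"");
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            in.transferTo(out); // streams the file; no full buffering, no base64
        }
        return conn.getResponseCode();
    }
}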
This change would let you remove the encoding on the client side, shrink the network size of the call (~33% less), and remove one decoding on the server side. The server would just have to base64-encode the raw stream once to produce the XML, and it would not even need to buffer the whole file contents for that (you'd have to tweak your JAXB model a bit, but you can feed JAXB bytes directly from an InputStream, which means almost no buffering; and since your CPU probably encodes faster than your network serves content, no real latency is incurred).
If this, for some reason, is not an option (let's say your client has to send JSON, and therefore base64 content):
Can you avoid decoding at the server side?
Sort of. You can use a server-side bean where the content is actually a String and NOT a byte[]. This is hacky, but your REST controller will no longer deserialize the base64; it will keep it "as is", as a JSON string (which happens to be base64-encoded content, but the controller does not care).
So your server will have saved the CPU cost of one base64 decoding, but in exchange you'll have a base64 String in the Java heap (compared to the raw byte[]: +33% size on Java >= 9 with compact strings, +166% size on Java < 9).
To profit from this, you also have to get JAXB to treat the base64-encoded String as a byte[], which is not trivial as far as I can tell, unless you modify the JAXB object so that it accepts a String instead of the byte[], which is kind of hacky (if your JAXB objects are generated from an XML schema, this might really become a pain to implement).
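A sketch of what that hacky pass-through bean could look like (the class name is made up; it assumes the content element in your XML schema is declared as xs:base64Binary, so base64 text written verbatim remains schema-valid):

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class BinaryObjectPassthrough {
    String fileName;
    String mimeCode;
    // Holds the base64 text exactly as received in the JSON body; Jackson never
    // decodes it, and JAXB writes it out verbatim as element text.
    String content;
}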
All in all, this is much harder; probably too much trouble if you are not actually hitting a performance wall on this particular issue.
A few other things
Are your files pure binary, or are they actually text? If they are text, you may benefit from using CDATA sections on the XML side instead of base64.
Is your XML actually a SOAP call? If so, and if the service supports MTOM, you could avoid base64 completely, but that is an altogether different subject.
I am receiving bytes of data on a Kafka topic, and those bytes can be sent (by an application) using plain Java serialization, JSON serialization, or protocol buffers.
So, when my application reads those bytes from the Kafka topic, how does it know which serialization technique was used: Java serialization, JSON serialization, or protocol buffers?
Is there a way to check this? Does the serialization format differ between these mechanisms?
Any information to understand this would be of great help.
The serialization technique must be agreed/defined between the sender and the receiver.
It must be the contract between the two parties.
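One common way to make that contract explicit on the wire is to prefix every message with a format tag (Confluent's schema registry uses a similar magic-byte convention). A minimal sketch, with made-up tag values:

public final class PayloadFormat {
    // Hypothetical one-byte tags agreed between producer and consumer.
    public static final byte JAVA_SERIALIZATION = 0x01;
    public static final byte JSON = 0x02;
    public static final byte PROTOBUF = 0x03;

    // Producer side: prepend the tag to the serialized payload.
    public static byte[] tag(byte format, byte[] payload) {
        byte[] framed = new byte[payload.length + 1];
        framed[0] = format;
        System.arraycopy(payload, 0, framed, 1, payload.length);
        return framed;
    }

    // Consumer side: read the tag, then dispatch to the matching deserializer.
    public static byte formatOf(byte[] framed) {
        return framed[0];
    }
}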
Basically I am doing some networking with a client and server sending "packets" back and forth to each other. I have it working with basic variable data such as ints or strings passing back and forth; however, now I want to pass an object.
So I know I have to serialize the data of the object to pass it through the socket. That is working as well (as I can get the correct information if I serialize then de-serialize right away) but the problem comes in when my server receives a packet.
My server interprets packet data based on the first 2 characters of the packet. So 01foobar is a type of packet correlating to whatever "01" is assigned to, 02foobar is a different packet, and so on. I don't know the best way to do this with an object attached. What I mean is this...
The way I have tried to do it right now is: serialize my object and get its string. Then append 03 to the front, so I have a string that looks like 03[B#3e9513b7 (or whatever), then call getBytes() on that string, which gives me another byte[] (so I can send it through the socket). When the server receives that information, I strip the 03 off and I'm left with just [B#3e9513b7. The problem is that [B#3e9513b7 is now a String, not a byte[], and in order to deserialize I need to pass in the same byte[] the serializer gave me. That got me looking for a way to make [B#3e9513b7 BE the byte[] (i.e. so that calling toString() on the new byte[] returns [B#3e9513b7), but I had trouble assigning it that way, because it just gives me a new byte[] for the string [B#3e9513b7. So when I send it to be deserialized, the deserializer gets a byte[] it doesn't know what to do with and throws an error.
So I have to imagine there's a better way to do this, and I'm just making things more complicated than they should be. Any recommendations? I can provide code snippets if needed.
Thanks guys!
Edit: I guess I should mention that I am using Java with UDP sockets.
If you are looking for a reliable and efficient solution for client-server communication, I would suggest looking at Netty.
As for how to serialize/deserialize your objects, you have many choices, such as Java serialization, XML, JSON ...
You would have to pass your serialized objects in UDP datagrams. Be aware, however, that UDP datagram size is limited. If you're exchanging big objects, you may want to switch to TCP transport, which is more reliable.
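On the question's byte[]-vs-String problem specifically: the fix is to never call toString() on the serialized payload. Keep everything as bytes and prepend the two prefix characters as bytes too. A rough sketch (names are made up):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class PacketSender {
    // Serializes obj and prepends the two-character type prefix (e.g. "03")
    // as raw bytes, so the payload never round-trips through a String.
    public static void send(DatagramSocket socket, InetAddress host, int port,
                            String typePrefix, Serializable obj) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(typePrefix.getBytes(StandardCharsets.US_ASCII)); // "03" -> 2 bytes
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(obj);
        }
        byte[] data = buf.toByteArray();
        socket.send(new DatagramPacket(data, data.length, host, port));
    }
}

On the receiving side, read the first two bytes of the datagram as the type code, then wrap the remaining bytes in a ByteArrayInputStream and an ObjectInputStream to deserialize.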
You may also want to look at SOAP/REST web services.
I'm using org.apache.commons.httpclient.HttpClient and need to set the response encoding (for some reason the server returns an incorrect encoding in Content-Type). My approach is to get the response as raw bytes and convert them to a String with the desired encoding. I'm wondering if there is a better way to do this (e.g. by configuring HttpClient). Thanks for suggestions.
I don't think there's a better answer using HttpClient 3.x APIs.
The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules of the spec so that you can talk to the non-compliant server. Notwithstanding, this is not a use case the API designers saw a need to support explicitly.
If you were using HttpClient 4.x, you could write your own ResponseHandler that reads the body from the HttpEntity yourself, ignoring the response message's notional character set.
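A sketch of that approach with the 4.x APIs: EntityUtils lets you read the entity with a charset of your own choosing, regardless of what the server declared (the class and method names here are made up):

import java.nio.charset.StandardCharsets;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class ForcedCharsetFetch {
    // Reads the body as UTF-8, ignoring the (wrong) charset in Content-Type.
    public static String fetchAsUtf8(String url) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            return client.execute(new HttpGet(url), response ->
                    EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8));
        }
    }
}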
A few notes:
1. A server serves data, so it's up to the server to serve it in an appropriate format; the response encoding is set by the server, not the client. The client can, however, suggest what format it would like via Accept and Accept-Charset:
Accept: text/plain
Accept-Charset: utf-8
However, HTTP servers usually do not convert between formats.
2. If option 1 does not work, you should look at the configuration of the server.
3. When a String is sent as raw bytes (and it always is, because that is what networks transmit), an encoding is always involved. Since the server produces those raw bytes, it defines the encoding. So you cannot take the raw bytes and use an encoding of your choice to create a String; you must use the encoding that was used when the String was converted to bytes.
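A tiny demonstration of point 3: decoding with a different charset than the one used for encoding garbles the text.

import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        String original = "héllo"; // non-ASCII content
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // server encodes as UTF-8

        System.out.println(new String(wire, StandardCharsets.UTF_8));      // héllo
        System.out.println(new String(wire, StandardCharsets.ISO_8859_1)); // hÃ©llo (mojibake)
    }
}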
Disclaimer: I don't really know HttpClient; I'm only reading the API.
I would use the execute method returning an HttpResponse, then .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding reported by the server, you can simply wrap your own InputStreamReader around it.
Okay, looks like I had the wrong version (obviously, there are too many HttpClient classes out there).
But it's the same as before, just located on other classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or get the whole byte array at once, if it is not too big, and convert it to a String, as you wrote.)
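Putting that together for the 3.x API the question uses, a sketch (assuming you know the correct charset is UTF-8; the class and method names are illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class ResponseCharsetOverride {
    // Wraps the raw response stream in a reader using the charset we know to be
    // right, bypassing whatever charset the server put in Content-Type.
    public static String fetchAsUtf8(String url) throws Exception {
        HttpClient client = new HttpClient();
        GetMethod method = new GetMethod(url);
        try {
            client.executeMethod(method);
            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    method.getResponseBodyAsStream(), StandardCharsets.UTF_8))) {
                char[] chunk = new char[4096];
                int n;
                while ((n = reader.read(chunk)) != -1) {
                    body.append(chunk, 0, n);
                }
            }
            return body.toString();
        } finally {
            method.releaseConnection();
        }
    }
}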
I think trying to change the response and letting the HttpClient analyze it is not the right way here.
I suggest sending a message to the server administrator/webmaster about the wrong charset, though.
Greetings folks,
Just in case someone finds this post while googling how to make HttpClient use UTF-8.
This line of code should be handy...
// Server side: presumably on a javax.servlet.http.HttpServletResponse,
// declaring the charset of the response body in the Content-Type header.
response.setContentType("text/html; charset=UTF-8");
Best
My REST web service has to send an image file to the client. I am torn between two options: send the image as a byte array, or encode it as a base64 string? What are the pros and cons of each? I may have to compress the image using gzip; would that cause problems with either method? I may also need to expose my method as a SOAP service; which approach should I prefer in that case?
Thanks!
The wonderful thing about a RESTful interface is that it's just HTTP. So if you expose the "byte array" version via REST, any browser can do an HTTP GET on your REST URL and receive and directly render your image. Returning the payload verbatim is far more RESTful than placing an encoding on it. There is not much to recommend an extra base64 encoding layer via REST.
If you're returning SOAP, you absolutely want to return a base64 string. Raw binary data is not compatible with XML, upon which SOAP is built. You can try to work around it via MTOM, but for general-purpose compatibility with SOAP clients you probably want inlined base64-encoded data.
In general, there's no benefit to be gained by compressing an image file. The image formats themselves internally involve compression, and a second compression pass will not gain any more space savings.
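To illustrate the byte-array option, a minimal JAX-RS sketch (the path and the helper are made up; a Spring MVC handler returning byte[] with the right Content-Type behaves the same way):

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;

@Path("/images")
public class ImageResource {
    // Returns the raw bytes; a browser doing a GET on this URL renders the image.
    @GET
    @Path("/logo")
    @Produces("image/png")
    public Response getLogo() {
        byte[] imageBytes = loadImageBytes(); // hypothetical helper reading the file
        return Response.ok(imageBytes).build();
    }

    private byte[] loadImageBytes() {
        return new byte[0]; // placeholder: read from disk, database, etc.
    }
}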
If your service returns JSON or XML (image + some information), then you should encode the image in base64, because both formats are string-based and you want to transfer a byte array. The only question is whether you should do it yourself or whether the framework you use does it for you.
The situation with gzip is clear: delegate compression to the servlet container (in Tomcat, for example, you can configure whether responses should be gzipped). Alternatively you can use something like GZipFilter.