I am writing a .NET/C# client for a Java server running on Solaris.
The Java server writes raw byte data in gzip format, which I need to extract, but I am having trouble reading the data in the right buffer sizes. Non-deterministically, I read the message either complete or incomplete, and I cannot read the second message in any case.
I am reading the bytes using the NetworkStream class with the DataAvailable property.
My guess is that it could be related to a little/big endian problem.
Do I need a special conversion to change the data from big-endian to little-endian? Do I need to read the necessary bytes using the gzip header?
I previously used the same server with an uncompressed protocol and had no problem using a StreamReader with the ReadLine function, but that protocol was purely text based.
Edit: Unfortunately I have no choice, as the remote server and protocol are given. Is the endianness part of the GZip format, or do I only need to convert the header accordingly? The uncompressed data are pure UTF-8 encoded strings with line breaks as delimiters.
The GZIP format is not complex. It is available in all its glory in a simple, accessible specification document, IETF RFC 1952.
The GZIP format specifies the bit-order for bytes. It is not tunable with a flag for endianness. The producer of a GZIP stream is responsible for conforming to the spec in that regard, and a consumer of a GZIP stream, likewise.
If I were debugging this, I would look at the bytes on either end of the wire and verify that the bytes going in are the same as the bytes coming out. That's enough to put aside the endian issues.
If you don't have success transmitting a GZIP bytestream, try transmitting test data - 16 bytes of 0xFF, followed by 16 bytes of 0xAA, and so on. Then verify that this is the data coming out the other end.
I'm sorry, but I don't understand what you mean when you say that you read the message non-deterministically complete or incomplete and cannot read the second message in any case. Second message? What second message? Endianness shouldn't affect the amount of data you receive.
It feels to me that you don't have confidence that you are successfully transmitting data. I would suggest that you verify that before working on endian issues and GZIP format issues.
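A minimal sketch of that byte-level check on the Java side, assuming the server compresses with java.util.zip.GZIPOutputStream (the question does not say which API it uses). The class name GzipCheck and the sample text are made up for illustration. Per RFC 1952, every GZIP member starts with the bytes 0x1f 0x8b followed by the compression method (8 for deflate), so dumping the first few bytes at both ends of the wire is a quick way to see whether they match.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCheck {

    // Compress UTF-8 text into one complete GZIP member.
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompress a complete GZIP member back into UTF-8 text.
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hex dump of the first `count` bytes, for comparing both ends of the wire.
    static String hex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            sb.append(String.format("%02x ", data[i] & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("first line\nsecond line\n"); // hypothetical payload
        System.out.println("header bytes: " + hex(compressed, 3));
        System.out.print(gunzip(compressed));
    }
}
```

If the same hex dump shows up on the receiving side and the stream still cannot be decompressed, the bytes were transmitted correctly and the problem lies elsewhere than in byte order.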
OkHttp does a great job of transparently handling GZIP content encoding. When I call response.body().contentLength() I get the decoded size of the response.
How can I get the number of bytes actually transferred in the HTTP response?
Alternately, getting the value from the original Content-Length header would do.
I am trying to keep track of how many bytes I have downloaded over a metered connection.
Look at EventListener, which tracks bytes transmitted over the network.
https://square.github.io/okhttp/events/
I am sending data in the JSON body of a POST request from a client (Java) to a server (Java), using a Spring RestTemplate and RestController.
The data is present as a POJO on the client and will be parsed into a POJO with the same structure on the server.
On the client I convert a file to a byte[] with Files.readAllBytes and store it in the content field.
On the server side the whole object including the byte[] will be marshalled to XML using JAXB annotations.
class BinaryObject {
String fileName;
String mimeCode;
byte[] content;
}
Everything is working fine and running as intended.
I heard it could be beneficial to encode the content field before transmitting the data to the server and to decode it there before it is marshalled into XML.
My Question
Is it necessary or recommended to additionally encode / decode the content field with base64?
TL;DR
To the best of my knowledge, you are not going against any good practice with your current implementation. One might question the design (exchanging files in JSON? Storing binary inside XML?), but that is a separate question.
Still, there is room for optimization, but the toolset you use (Spring RestTemplate + Spring controller + JSON serialization with Jackson + XML using JAXB) kind of hides the possible optimizations from you.
You have to carefully weigh the pros and cons of working around your comfortable "automat(g)ical" serializations, which work well today, to see whether tweaking them is worth the trouble.
We can nonetheless discuss the theory of what could be done.
A discussion about Base64
Base64 encoding is an efficient way to encode binary data in pure text formats (e.g. MIME structures such as email or some HTTP bodies, JSON, XML, ...), but it has two costs: the first is a non-negligible size increase (about 33% larger), the second is CPU time.
Sometimes (but you'd have to profile to check whether this is your case) this cost is not negligible, especially for large files: due to buffering and char/byte conversions in the frameworks, you could easily end up using e.g. 4x the size of the encoded file in the Java heap.
When handling 10 KB files at 10 requests/sec, this is usually NOT an issue.
But 10 MB files at 100 requests/sec, well, that is another ballpark.
So you'd have to check (I doubt your typical server will reach 100 requests/sec with 10 MB files, because that is 1 GB/s of incoming network bandwidth).
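To put a number on the size increase, here is a small sketch using the JDK's java.util.Base64. The class name Base64Overhead and the 10 KB buffer are made up for illustration. Standard base64 with padding turns every 3 input bytes into 4 output characters, so the encoded length is 4 * ceil(n / 3).

```java
import java.util.Base64;

public class Base64Overhead {

    // Length of padded base64 output for rawLength input bytes: 4 * ceil(n / 3).
    static int encodedLength(int rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[10 * 1024]; // hypothetical 10 KB file content
        String encoded = Base64.getEncoder().encodeToString(raw);

        System.out.println("raw bytes:     " + raw.length);
        System.out.println("encoded chars: " + encoded.length());
        System.out.println("predicted:     " + encodedLength(raw.length));
        System.out.printf("ratio:         %.3f%n", (double) encoded.length() / raw.length);
    }
}
```

This only measures the size on the wire. The heap cost mentioned above depends on how many intermediate copies the frameworks make, which you would have to profile.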
What is optimizable in your current process
In your current process, you have multiple encodings taking place: the client needs to base64 encode the bytes read from the file.
When the request hits the server, the server decodes the base64 to a byte[], then your XML serialization (JAXB) converts the byte[] back to base64.
So in effect, "you" (more exactly, the REST controller side of things) decoded the base64 content for nothing, because the XML side of things could have used it directly.
What could be done
A few things.
Do you need base64 at the calling site?
First, you do not have to encode on the client side. When using JSON, there is no choice, but the world did not wait for JSON to exchange files (i.e. arbitrary binary content) over HTTP.
If your content is a file name, a MIME type, and a file body, then standard, direct HTTP calls with no JSON at all are perfectly fine.
The MIME type can be mapped to the Content-Type HTTP header, the file name to the Content-Disposition HTTP header, and the contents to the raw HTTP body. No base64 needed (but your server side has to accept raw HTTP content as is). This is as standard as can be.
This change would let you remove the client-side encoding, shrink the call on the network (the body becomes about 25% smaller, since base64 inflates it by ~33%), and remove one decoding on the server side. The server would just have to base64 encode a raw stream (once) to produce the XML, and you would not even need to buffer the whole file contents for that. You'd have to tweak your JAXB model a bit, but you can JAXB-serialize bytes directly from an InputStream, which means almost no buffering, and since your CPU probably encodes faster than your network delivers content, no real added latency.
If, for some reason, this is not an option, let's say because your client has to send JSON (and therefore base64 content), then the next question applies.
Can you avoid decoding at the server side?
Sort of. You can use a server-side bean where the content is actually a String and NOT a byte[]. This is hacky, but your REST controller will no longer deserialize the base64; it will keep it "as is", as a JSON string (which happens to be base64-encoded content, but the controller does not care).
So your server will have saved the CPU cost of one base64 decoding, but in exchange you'll have a base64 String in the Java heap (compared to the raw byte[], +33% size on Java >= 9 with compact strings, +166% size on Java < 9).
If you are to profit from this, you also have to tweak your JAXB setup to treat the base64-encoded String as a byte[], which is not trivial as far as I can tell, unless you modify the JAXB object so that it accepts a String instead of the byte[], which is kind of hacky (if your JAXB objects are generated from an XML schema, this might really become a pain to implement).
All in all this is much harder - probably too much so if you are not really hitting the wall, performance-wise, on this particular issue.
A few other things
Are your files pure binary, or are they actually text? If they are text, you may benefit from using CDATA sections on the XML side instead of base64.
Is your XML actually a SOAP call? If so, and if the service supports MTOM, you could avoid base64 completely, but that is an altogether different subject.
I have a duplicate check that does not work, because my zlib hashes are different for the same file.
I get encrypted data (an XML file), encrypted with AES, from my client.
I decrypt the data (with Cipher) and get a byte array of the data, zipped and base64-encoded.
I decode the base64, decompress with zlib, and get my XML file.
If I do it again, I get a different base64 value out of the Cipher. I decode it, decompress it, and get exactly the same XML as before.
Because of this, my duplicate check does not work: the base64 value is different, and I don't understand why.
My base64 value is around 3000 characters long and only the last 10-15 characters differ.
Currently this software is written in PHP and everything works fine with it. On the new Java server we get this error.
So the client data is correct; Java does something I can't explain.
Any ideas?
Thanks
Your question is rather difficult to parse, but I think what you're saying is that if you decompress something compressed by PHP and then recompress it with Java, you get different compressed data. When you decompress that data, you get exactly the original uncompressed data.
If that is correct, then there is no problem. There is no assurance that a different compressor will produce the same result, or even the same compressor, since you can have different settings, or even the same compressor with the same settings, since you could be using a different version. The fact that you decode it, decompress it, and get exactly the same XML means that all the compressors and decompressors are doing what they are supposed to do. There is no assurance that decompression followed by compression will ever produce exactly the same result. The only assurance of a lossless compressor is that compression followed by decompression will produce exactly the same result.
You are creating a problem for yourself by running the duplicate check on the compressed data. Checking the compressed data does not check for duplicated uncompressed data. If you want to look for duplicates, or if you want to check the integrity of your compression, transmission, and decompression process, then you need to do both using the uncompressed data, not the compressed data.
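As a rough illustration of that point, here is a sketch using the JDK's java.util.zip.Deflater and Inflater. The class name DuplicateCheck, the sample XML, and the choice of SHA-256 as the fingerprint are assumptions for the example, not something taken from the question. It compresses the same input at two different levels, which yields different zlib streams, and then fingerprints the decompressed bytes, which are identical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DuplicateCheck {

    // Compress into a zlib stream at the given compression level.
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a complete zlib stream.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Fingerprint used for the duplicate check; computed on UNCOMPRESSED bytes.
    static String sha256Hex(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<order><id>42</id><item>widget</item></order>" // hypothetical payload
                .repeat(50).getBytes(StandardCharsets.UTF_8);
        byte[] fast = deflate(xml, Deflater.BEST_SPEED);
        byte[] small = deflate(xml, Deflater.BEST_COMPRESSION);

        System.out.println("compressed bytes equal: " + Arrays.equals(fast, small));
        System.out.println("content hashes equal:   "
                + sha256Hex(inflate(fast)).equals(sha256Hex(inflate(small))));
    }
}
```

The compression level stands in here for any difference between two compressors (settings, version, implementation). The duplicate check stays stable only because it hashes what comes out of inflate.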
tl;dr: Is there an efficient way to transport msgpack data between Java and C# over HTTP?
I've just discovered the msgpack data format. I use JSON for just about everything I send over the wire between client and server (over HTTP), so I'm keen to try this format out.
I have the Java library and the C# library, but I want to transport msgpacks over HTTP. Is this the intended use, or is it more for a local format?
I noticed a couple of RPC implementations, but they're whole RPC servers.
Thoughts?
-Shane
Transport and encoding are two very different things and it's entirely up to you to choose which transport to use and what data encoding to use, depending on the needs of your application. Sending msgpack data over HTTP is a perfectly valid use case and it is possible, but keep in mind the following two points:
msgpack is a binary encoding, which means it needs to be serialized into bytes before sending, and deserialized from the received bytes on the other end. It also means that it is not human-readable (or writable, for that matter), so it's really hard to inspect (or generate by hand) the HTTP traffic.
Unless you intend to stream msgpack-encoded data over HTTP, you'll incur a fairly high overhead cost, since the HTTP header size will most likely greatly overshadow the size of the data you're sending. Note that this also applies to JSON, but to a lesser extent, since JSON is not as efficient in its encoding.
As far as implementation goes, the sending side would have to serialize your msgpack object into a byte[] before sending it as the request body in your HTTP request. You'll need to set the HTTP Content-Type to application/x-msgpack as well. On the receiving end, read the request body from the input stream (you probably can get your hands on a ByteArrayInputStream and deserialize into your msgpack object).
I'm trying to understand how platform independent socket communication works, because I would like to share socket data between a Java server and some native Unix and Windows clients. Sockets are platform independent by design, but the data representation is machine-related, hence it is advantageous if the TCP data abstracts the real data format, because a data format that is supported on one system doesn't have to be necessarily supported on another.
For example, if I want to send an unsigned int value from a C++ client program to a Java server, I must tell the server that this number should be interpreted as a negative integer. How does this kind of abstraction work? With my limited knowledge, I would just send a number as text and then append some kind of unique character sequence that tells the receiver what kind of data it has received, but I don't know if this is a viable approach.
To be a bit more concrete: I would like to send messages that contain the following content:
At the beginning of the message some kind of short signal or command so that the receiver exactly knows what to do with the data that will follow.
Then some textual content of arbitrary length.
Followed by a number, which can be also text, but should be interpreted separately.
At the end maybe a mark that tells the server that the message ends here.
TCP processes the data in byte chunks. Does this mean that when I write a UTF-8 encoded char in one byte, this char is interpreted in the same way on different machines, provided the client machines take Java's big-endian byte order into account? Thanks for any input and help.
Sockets are platform independent, but the data transmitted over them is not (type lengths, byte order, string encoding, ...).
Look at Thrift, Protobuf or Avro if you want to send binary data with cross-language and cross-platform functionality.
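If you would rather see what such a library has to pin down, here is a hand-rolled sketch in plain Java. The layout is an assumption made up for this example, not a standard: a 1-byte command, a 4-byte big-endian length, the text as UTF-8 bytes, then a 4-byte big-endian unsigned number. The class name WireMessage is hypothetical too. A length prefix takes the place of the end mark, UTF-8 is a byte sequence and so has no byte-order issue, and Integer.toUnsignedLong recovers a C++ unsigned int on the Java side, assuming the C++ client writes it in network byte order (e.g. with htonl).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WireMessage {
    final int command;   // 0..255
    final String text;   // arbitrary length, sent as UTF-8
    final long number;   // 0..4294967295, sent as a 32-bit unsigned value

    WireMessage(int command, String text, long number) {
        this.command = command;
        this.text = text;
        this.number = number;
    }

    // Serialize using the (assumed) layout described above.
    static byte[] encode(WireMessage m) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes); // always big-endian
            byte[] utf8 = m.text.getBytes(StandardCharsets.UTF_8);
            out.writeByte(m.command);
            out.writeInt(utf8.length);    // length prefix instead of an end mark
            out.write(utf8);
            out.writeInt((int) m.number); // keeps the low 32 bits
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read exactly one message from the stream (e.g. socket.getInputStream()).
    static WireMessage decode(InputStream stream) {
        try {
            DataInputStream in = new DataInputStream(stream);
            int command = in.readUnsignedByte();
            int length = in.readInt();
            if (length < 0) {
                throw new IOException("negative text length: " + length);
            }
            byte[] utf8 = new byte[length];
            in.readFully(utf8); // blocks until all `length` bytes have arrived
            long number = Integer.toUnsignedLong(in.readInt());
            return new WireMessage(command, new String(utf8, StandardCharsets.UTF_8), number);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(new WireMessage(1, "hello", 4_000_000_000L));
        WireMessage back = decode(new ByteArrayInputStream(wire));
        System.out.println(back.command + " / " + back.text + " / " + back.number);
    }
}
```

Because the receiver knows the text length up front, it never has to guess where one message stops and the next begins, and it does not depend on how TCP happens to split the bytes into chunks.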