Cannot deserialize protobuf data from C++ in Java

My problem is that I serialize protobuf data in C++ but cannot deserialize it properly in Java.
Here is the code I use, following the hints given by dcn:
With this I create the protobuf data in C++ and write it to an ostream, which is sent via socket.
Name name;
name.set_name("platzhirsch");
boost::asio::streambuf b;
std::ostream os(&b);
ZeroCopyOutputStream *raw_output = new OstreamOutputStream(&os);
CodedOutputStream *coded_output = new CodedOutputStream(raw_output);
coded_output->WriteLittleEndian32(name.ByteSize());
name.SerializeToCodedStream(coded_output);
socket.send(b);
This is the Java side where I try to parse it:
NameProtos.Name name = NameProtos.Name.parseDelimitedFrom(socket.getInputStream());
System.out.println(name.newBuilder().build().toString());
However, this gives me the following exception:
com.google.protobuf.UninitializedMessageException: Message missing required fields: name
What am I missing?
The flawed code line is: name.newBuilder().build().toString()
This would never have worked: a new instance is created with an uninitialized name field. Anyway, the answer here solved the rest of my problem.
One last thing, which I was told on the protobuf mailing list: in order to flush the CodedOutputStreams, the objects have to be deleted!
delete coded_output;
delete raw_output;
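For reference, a minimal sketch of the fixed Java read side, assuming the C++ writer is switched to a varint length prefix as in the answer below, so that parseDelimitedFrom can find the frame:
NameProtos.Name name = NameProtos.Name.parseDelimitedFrom(socket.getInputStream());
System.out.println(name.toString()); // print the parsed instance, not a fresh builder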

I don't know what received is in your Java code, but your problem may be due to some charset conversion. Note also that protobuf does not delimit messages when serializing.
Therefore you should use raw data to transmit the messages (a byte array, or (de)serialize directly from/to streams).
If you intend to send many messages, you should also send the size before each actual message.
In Java you can do this directly via parseDelimitedFrom(InputStream) and writeDelimitedTo(OutputStream). You can do the same in C++, a little more verbosely, using CodedOutputStream like
codedOutput.WriteVarint32(protoMessage.ByteSize());
protoMessage.SerializeToCodedStream(&codedOutput);
See also this earlier thread.
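For reference, this is how the two Java calls pair up on either end of the same varint framing used above (a sketch; the message and stream variables are assumptions, not from the question):
// Writer: prefixes the message with a varint length, matching WriteVarint32 above.
name.writeDelimitedTo(outputStream);
// Reader: consumes the varint length and then exactly that many bytes.
NameProtos.Name parsed = NameProtos.Name.parseDelimitedFrom(inputStream);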

You're writing two things to the stream, a size and the Name object, but only trying to read one.
As a general question: why do you feel the need to use CodedInputStream? To quote the docs:
Typically these classes will only be used internally by the protocol buffer library in order to encode and decode protocol buffers. Clients of the library only need to know about this class if they wish to write custom message parsing or serialization procedures.
And to emphasize jtahlborn's comment: why little-endian? Java deals with big-endian values, so it will have to convert on reading.
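To illustrate, if the little-endian length prefix is kept, the Java reader has to consume both parts explicitly; a rough sketch (not from the original post):
DataInputStream din = new DataInputStream(socket.getInputStream());
// Read the 4-byte length written by WriteLittleEndian32 and convert the byte order.
byte[] lenBytes = new byte[4];
din.readFully(lenBytes);
int size = ByteBuffer.wrap(lenBytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
// Then read exactly that many bytes and parse them as the message.
byte[] body = new byte[size];
din.readFully(body);
NameProtos.Name name = NameProtos.Name.parseFrom(body);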

Related

Decoding OPC-UA extension object

How does one decode an extension object obtained from HistoryReadResult to the HistoryData type? I read documentation that suggests simply using the decode() method, but the only variant I can find in the source code is decode(EncoderContext).
You forgot to mention which stack or SDK you are using, but I can guess that it's the OPC Foundation Java Stack, in which case you can use EncoderContext.getDefaultInstance(). This will work fine with the standard structure types, such as HistoryData. For server-specific types, you may need to use a connection-specific EncoderContext.
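A rough sketch of what that could look like (the accessor names are assumptions; check them against your SDK version):
// Decode the ExtensionObject returned by the history read (assumed accessor names).
EncoderContext ctx = EncoderContext.getDefaultInstance();
HistoryData data = (HistoryData) historyReadResult.getHistoryData().decode(ctx);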

What is the need to choose serialization frameworks, when Java provides APIs to do it?

I am reading about Avro and trying to compare it with the Java serialization system. But somehow I am not able to gauge why Avro is used for data serialization instead of Java serialization. As a matter of fact, why did another system come in to replace the Java serialization system?
Here is the summary of my understanding.
To use Java's serialization capabilities, we have to make the class implement the Serializable interface. If you do so and serialize the object, then during deserialization you do something like
e = (Employee) in.readObject();
Next we can use the getters/setters to play with the employee object.
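For reference, a minimal sketch of that round trip (Employee is a hypothetical Serializable class and the file path is arbitrary):
// Hypothetical Serializable POJO
class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
}

Employee employee = new Employee();
employee.name = "Jane";
// Write the object ...
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.ser"))) {
    out.writeObject(employee);
}
// ... and read it back with the cast from the question.
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.ser"))) {
    Employee e = (Employee) in.readObject();
}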
In Avro, the schema definition comes first. Next you use the Avro APIs to serialize. On deserialization there is something like this:
public AvroHttpRequest deSerealizeAvroHttpRequestJSON(byte[] data) {
    DatumReader<AvroHttpRequest> reader = new SpecificDatumReader<>(AvroHttpRequest.class);
    try {
        Decoder decoder = DecoderFactory.get().jsonDecoder(
                AvroHttpRequest.getClassSchema(), new String(data));
        return reader.read(null, decoder);
    } catch (IOException e) {
        logger.error("Deserialization error: " + e.getMessage());
        return null; // the original snippet was missing a return on this path
    }
}
Next we can use the getters/setters to play with the employee object.
My question is that I don't see any difference between these two approaches; both do the same thing, only the APIs are different. Can anyone please help me understand this better?
The inbuilt Java serialization has some pretty significant downsides. For instance, without careful consideration, you may not be able to deserialize an object even when nothing about its data has changed, only the class's methods.
You can also create a case in which the serial version UID is the same (set manually) but the object still cannot be deserialized because of an incompatibility in type between the two systems.
A third-party serialization library can help mitigate this by using an abstract mapping to pair data together. Well-conceived serialization libraries can even provide mappings between different versions of the object.
Finally, the error handling in third-party serialization libraries is typically more useful for a developer or operator.
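To illustrate that last point with Avro: the writer's schema is resolved against the reader's schema at read time, so old data stays readable after the record evolves. A sketch with hypothetical Employee schemas:
// Schema the data was written with (old version).
Schema writerSchema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
    + "{\"name\":\"name\",\"type\":\"string\"}]}");
// Schema the reader expects (new version, extra field with a default).
Schema readerSchema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
    + "{\"name\":\"name\",\"type\":\"string\"},"
    + "{\"name\":\"department\",\"type\":\"string\",\"default\":\"unknown\"}]}");
// Records written with the old schema are readable with the new one;
// the missing field is filled in from its default.
DatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema, readerSchema);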

Confusion regarding protobufs

I have a server that makes frequent calls to microservices (actually AWS Lambda functions written in python) with raw JSON payloads and responses on the order of 5-10 MB. These payloads are gzipped to bring their total size under lambda's 6MB limit.
Currently payloads are serialized to JSON, gzipped, and sent to Lambda. The responses are then gunzipped, and deserialized from JSON back into Java POJOs.
Via profiling we have found that this process of serializing, gzipping, gunzipping, and deserializing accounts for the majority of our server's CPU usage by a large margin. Looking into ways to make serialization more efficient led me to protobufs.
Switching our serialization from JSON to protobufs would certainly make our (de)serialization more efficient, and might also have the added benefit of eliminating the need to gzip to get payloads under 6MB (network latency is not a concern here).
The POJOs in question look something like this (Java):
public class InputObject {
... 5-10 metadata fields containing primitives or other simple objects ...
List<Slot> slots; // usually around 2000
}
public class Slot {
public double field1; //20ish fields with a single double
public double[] field2; //10ish double arrays of length 5
public double[][] field3; //1 2x2 matrix of doubles
}
This is super easy with JSON: gson.toJson(inputObj) and you're good to go. Protobufs seem like a whole different beast, requiring you to use the generated classes and litter your code with stuff like
Blah blah = Blah.newBuilder()
.setFoo(f)
.setBar(b)
.build()
Additionally, this results in an immutable object, which requires more hoop-jumping to update. It just seems like a bad thing to put all that transport-layer-dependent code into the business logic.
I have seen some people recommend writing wrappers around the generated classes so that all the protobuffy-ness doesn't leak into the rest of the codebase, and that seemed like a good idea. But then I am not sure how I could serialize the top-level InputObject in one go.
Maybe protobufs aren't the right tool for the job here, but it seems like the go-to solution for inter-service communication when you start looking into improving efficiency.
Am I missing something?
With your proto you can always serialize in one go. There is an example in the Java tutorial online:
https://developers.google.com/protocol-buffers/docs/javatutorial
AddressBook.Builder addressBook = AddressBook.newBuilder();
...
FileOutputStream output = new FileOutputStream(args[0]);
addressBook.build().writeTo(output);
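Applied to the question's objects, a rough sketch with hypothetical generated classes (InputObjectProto and SlotProto mirroring the POJOs) would be:
// Build the top-level message once inside a wrapper; only this code sees protobuf types.
InputObjectProto.Builder builder = InputObjectProto.newBuilder();
for (Slot s : inputObject.slots) {
    SlotProto.Builder slot = SlotProto.newBuilder().setField1(s.field1);
    for (double d : s.field2) {
        slot.addField2(d); // repeated double in the hypothetical .proto
    }
    builder.addSlots(slot);
}
byte[] payload = builder.build().toByteArray(); // the whole InputObject in one go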
Also, what you might want to do is serialize your proto into a byte array and then encode it in Base64 to carry it through your wrapper:
String yourPayload = BaseEncoding.base64().encode(blah.toByteArray())
There are additional libraries that can help you transform existing JSON into a proto, such as JsonFormat:
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/util/JsonFormat
And the usage is straightforward as well:
To serialize as JSON:
JsonFormat.printer().print(yourProto)
To build a proto from JSON:
JsonFormat.parser().merge(yourJSONstring, yourProtoBuilder);
No need to iterate through each element of the object.
Let me know if this answers your question!

Writing BitSet to output file without overhead?

I get a line of overhead ("java.util.BitSet") when writing a BitSet to an output file using ObjectOutputStream.writeObject().
Anyway around it?
That is not "overhead"; that's the marker that lets Java figure out what type it needs to create when deserializing the object from that file.
Since ObjectInputStream has no idea what you have serialized into a file, and has no way for you to provide a "hint", ObjectOutputStream must "embed" something for the input stream to be able to decide what class needs to be instantiated. That is why it places the "java.util.BitSet" string in front of the data of your BitSet.
You cannot get around writing this marker when you use the serialization capabilities built into the BitSet class. If you are serializing the object into a file by itself, with no other objects going in with it, you could instead write the result of a toByteArray() call to the file, and call BitSet.valueOf(byteArray) after reading byteArray back from the file.
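A minimal sketch of that approach (the file path is arbitrary):
BitSet bits = new BitSet();
bits.set(3);
// Write only the raw bit data, with no serialization header.
Files.write(Paths.get("bits.bin"), bits.toByteArray());
// Read it back without ObjectInputStream.
BitSet restored = BitSet.valueOf(Files.readAllBytes(Paths.get("bits.bin")));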

Java Thrift Client and Binary data

So, by my understanding of Thrift, Java is the only supported language that does not have binary-safe strings, hence the Thrift binary type. My problem is that it doesn't seem to work.
My definition file is:
service myService {
    i32 myMethod(1:binary input)
}
My Java client builds a ByteBuffer from binary data that is observed to have positive length, and printable bytes prior to calling myMethod.
Immediately inside the C++ implementation of myMethod (from the thrift generated server skeleton), attempts to print input show it as always being empty of size 0.
Any ideas what I'm missing here? Changing binary to string makes everything work like a charm, minus the fact that I don't want to deal with the unsafe Java-converted string later...
Most likely you're having problems because ByteBuffer in Java has mutable state: any read operation actually modifies the ByteBuffer, since it advances the read position.
The simplest (though not the most efficient) way to work with Thrift binaries in Java is to create them as byte arrays and wrap them into buffers immediately before the invocation, i.e.:
byte[] input = ....;
myService.myMethod(ByteBuffer.wrap(input));
Another possible solution is to use ByteBuffer.duplicate to keep the original buffer safe, i.e.:
ByteBuffer input = ....;
dump(input.duplicate()); // the dump function may change the buffer position
myService.myMethod(input);
