The only socket programming I have done in the past is simple text streams. I am wondering what is the most effective way to send something like a Java object through a socket.
For instance if I have the following Employee class (Dependent would be a simple class composed of a dependent's information):
public class Employee {
private String name;
private double salary;
private ArrayList<Dependent> dependents;
}
Should I just make the Employee object Serializable and send instances through the socket? Or should I write up an XML file containing the Employee's information and send that? Or is there some completely different and better way? Any guidance would be greatly appreciated. Thank you!
If you are only sending data between Java JVMs, then either choice is possible.
A textual representation (XML, JSON, or custom) has several advantages:
it's easier to make it interoperable between Java and other languages
it's less brittle in the face of version changes or slightly different versions of your code at each end of the socket
it's vastly easier to test and debug
Depending on the format, it may be a little slower, but this is often not significant.
If you are not necessarily tied to using XML, you could also try JSON. The google-gson library makes this trivial. Serialising the object is as simple as:
Employee employee = new Employee();
...
Gson gson = new Gson();
String json = gson.toJson(employee);
And to deserialise the String at the other end:-
String socketDataAsString = null;
...<read from socket>...
Gson gson = new Gson();
Employee employee = gson.fromJson(socketDataAsString, Employee.class);
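One detail the snippets above leave open is how the receiver knows where one JSON string ends on the socket. A minimal sketch, assuming a 4-byte length prefix followed by UTF-8 bytes; the TextFraming class and its method names are hypothetical, and the byte-array streams stand in for socket.getOutputStream() and socket.getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames each text message as a 4-byte length plus UTF-8 bytes.
class TextFraming {

    // Writes one message; 'out' would be socket.getOutputStream() in real code.
    static void writeMessage(OutputStream out, String text) {
        try {
            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(payload.length); // big-endian length prefix
            data.write(payload);
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads exactly one message; 'in' would be socket.getInputStream() in real code.
    static String readMessage(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int length = data.readInt();
            byte[] payload = new byte[length];
            data.readFully(payload); // blocks until the whole payload has arrived
            return new String(payload, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        writeMessage(wire, "{\"name\":\"Ann\",\"salary\":1000.0}");
        writeMessage(wire, "{\"name\":\"Bob\",\"salary\":2000.0}");

        ByteArrayInputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(readMessage(in));
        System.out.println(readMessage(in));
    }
}
```

The string returned by readMessage is what would be passed to gson.fromJson on the receiving side.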
If you must use low-level sockets directly, there are a couple of ways you could do it. You could convert the object to a text format, send the bytes, and then reconstruct it on the other side. If your objects are serializable, you can send them over the socket (http://www.rgagnon.com/javadetails/java-0043.html).
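For the serializable route, a rough sketch of the round trip might look like the following. The fields follow the Employee class from the question; the ObjectStreamDemo helper, the constructors, and the in-memory byte array standing in for the socket are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class Dependent implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;

    Dependent(String name) {
        this.name = name;
    }
}

class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final double salary;
    final List<Dependent> dependents = new ArrayList<>();

    Employee(String name, double salary) {
        this.name = name;
        this.salary = salary;
    }
}

// Hypothetical helper; with a real socket you would wrap its streams instead.
class ObjectStreamDemo {

    // Sender side: write the whole object graph in one call.
    static byte[] send(Employee employee) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.writeObject(employee);
            }
            return wire.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    // Receiver side: the Employee and Dependent classes must be on the classpath.
    static Employee receive(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return (Employee) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Employee ann = new Employee("Ann", 1000.0);
        ann.dependents.add(new Dependent("Sam"));
        Employee copy = receive(send(ann));
        System.out.println(copy.name + " has " + copy.dependents.size() + " dependent(s)");
    }
}
```

Keep in mind that readObject will instantiate whatever classes the stream names, so this approach is best kept to peers you trust.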
If you have some flexibility, you could use RMI to interact remotely as well.
Related
I am reading about Avro, and I am trying to compare Avro with the Java serialization system. But I am not able to gauge why Avro is used for data serialization instead of Java serialization. For that matter, why did another system come in to replace the Java serialization system at all?
Here is the summary of my understanding.
To use Java's serialization capabilities, we have to make the class implement the Serializable interface. Once an object has been serialized, deserialization looks something like:
e = (Employee) in.readObject();
After that, we can use the getters/setters to work with the employee object.
In Avro, the first step is the schema definition. The next is to use the Avro APIs to serialize. Deserialization again looks something like this:
public AvroHttpRequest deSerealizeAvroHttpRequestJSON(byte[] data) {
    DatumReader<AvroHttpRequest> reader
        = new SpecificDatumReader<>(AvroHttpRequest.class);
    try {
        // Decode the JSON-encoded bytes against the schema of the generated class
        Decoder decoder = DecoderFactory.get().jsonDecoder(
            AvroHttpRequest.getClassSchema(), new String(data));
        return reader.read(null, decoder);
    } catch (IOException e) {
        logger.error("Deserialization error:" + e.getMessage());
        return null; // without this the method does not compile
    }
}
After that, we can use the getters/setters to work with the object.
My question is that I don't see any difference between these two approaches. Both do the same thing; only the APIs are different. Can anyone please help me understand this better?
The built-in Java serialization has some pretty significant downsides. For instance, unless you declare serialVersionUID yourself, you may not be able to deserialize an object whose class has no changes to its data, only changes to its methods.
You can also create a case in which the serialVersionUID is the same (set manually) but the object still cannot be deserialized because the field types are incompatible between the two systems.
A 3rd party serialization library can help mitigate this by using an abstract mapping to pair data together. Well conceived serialization libraries can even provide mappings between different versions of the object.
Finally, the error handling in 3rd party serialization libraries is typically more useful for a developer or operator.
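To make the first point concrete: when a class does not declare serialVersionUID, the JVM derives one from the class's structure, which is why touching methods can break compatibility. A small sketch, assuming two made-up classes (PinnedRecord and DefaultRecord) and a hypothetical uidOf helper:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Declares its version explicitly, so adding or changing methods leaves it stable.
class PinnedRecord implements Serializable {
    private static final long serialVersionUID = 42L;
    String name;
}

// No explicit version: the JVM computes one from the class structure, so the value
// can change when fields or non-private methods change.
class DefaultRecord implements Serializable {
    String name;

    String greet() {
        return "hi " + name;
    }
}

class SerialUidDemo {

    // Returns the version id the serialization runtime will use for the given class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned:  " + uidOf(PinnedRecord.class));
        System.out.println("default: " + uidOf(DefaultRecord.class));
    }
}
```

Pinning the id only removes the accidental mismatch; it does nothing for the second case above, where the id matches but the field types do not.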
I have a server that makes frequent calls to microservices (actually AWS Lambda functions written in python) with raw JSON payloads and responses on the order of 5-10 MB. These payloads are gzipped to bring their total size under lambda's 6MB limit.
Currently payloads are serialized to JSON, gzipped, and sent to Lambda. The responses are then gunzipped, and deserialized from JSON back into Java POJOs.
Via profiling we have found that this process of serializing, gzipping, gunzipping, and deserializing accounts for the majority of our server's CPU usage by a large margin. Looking into ways to make serialization more efficient led me to protobufs.
Switching our serialization from JSON to protobufs would certainly make our (de)serialization more efficient, and might also have the added benefit of eliminating the need to gzip to get payloads under 6MB (network latency is not a concern here).
The POJOs in question look something like this (Java):
public class InputObject {
... 5-10 metadata fields containing primitives or other simple objects ...
List<Slot> slots; // usually around 2000
}
public class Slot {
public double field1; //20ish fields with a single double
public double[] field2; //10ish double arrays of length 5
public double[][] field3; //1 2x2 matrix of doubles
}
This is super easy with JSON, gson.toJson(inputObj) and you're good to go. Protobufs seem like a whole different beast, requiring you to use the generated classes and litter your code with stuff like
Blah blah = Blah.newBuilder()
.setFoo(f)
.setBar(b)
.build()
Additionally, this results in an immutable object which requires more hoop jumping to update. Just seems like a bad bad thing to put all that transport layer dependent code into the business logic.
I have seen some people recommend writing wrappers around the generated classes so that all the protobuffy-ness doesn't leak into the rest of the codebase and that seemed like a good idea. But then I am not sure how I could serialize the top level InputObject in one go.
Maybe protobufs aren't the right tool for the job here, but it seems like the go-to solution for inter-service communication when you start looking into improving efficiency.
Am I missing something?
With your proto you can always serialize in one go. There is an example in the online Java tutorial:
https://developers.google.com/protocol-buffers/docs/javatutorial
AddressBook.Builder addressBook = AddressBook.newBuilder();
...
FileOutputStream output = new FileOutputStream(args[0]);
addressBook.build().writeTo(output);
Another option is to serialize your proto into a byte array and then Base64-encode it to carry it through your wrapper:
String yourPayload = BaseEncoding.base64().encode(blah.toByteArray());
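BaseEncoding in the line above comes from Guava. If you would rather not add that dependency, the JDK's java.util.Base64 does the same job. A minimal sketch; the Base64Demo class and the stand-in byte array are assumptions, and in real code the bytes would come from something like blah.toByteArray():

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper around the JDK's Base64 codec.
class Base64Demo {

    static String encode(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[3000]; // stand-in for serialized proto bytes
        String text = encode(payload);
        System.out.println(payload.length + " bytes -> " + text.length() + " characters");
        System.out.println("round trip ok: " + Arrays.equals(payload, decode(text)));
    }
}
```

Base64 turns every 3 bytes into 4 characters, so the encoded payload is about a third larger than the raw bytes. That is worth factoring in when the goal is to stay under a size limit.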
There are also libraries that can convert between JSON and protos, such as JsonFormat:
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/util/JsonFormat
And the usage is straightforward as well.
To serialize as JSON:
JsonFormat.printer().print(yourProto);
To build a proto from JSON:
JsonFormat.parser().merge(yourJsonString, yourProtoBuilder);
No need to iterate through each element of the object.
Let me know if this answers your question!
I want to convert a string-based protocol to JSON. Performance is key.
The String based protocol is something like
<START>A12B13C14D15<END>
and json is
{"A":12,"B":13,"C":14,"D":15}
I can regex-parse the string, create a map, and serialize it to JSON, but that seems like a lot of work given that I need to convert a stream in real time.
Would it be more efficient if I just do string manipulation to get the Json output? How can I do the conversion efficiently?
JSON serialization performance is likely not a problem. Don't optimize it prematurely. If you roll your own JSON serializer, you need to put some effort into e.g. getting the escapes right. If the performance does become a problem, take a look at Jackson, which is fairly fast.
Java seems to do regex quite fast, so you might be fine with it but beware that it is quite possible to accidentally build a regex that with some inputs starts backtracking heavily and takes several minutes to evaluate. You could use native String methods to parse the string.
If performance is really a concern, do timing tests on different approaches, select right tools, see what takes time and optimize accordingly.
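As one candidate for such a timing test, here is a rough sketch of the plain string-scanning approach. It assumes every key is a run of letters and every value a run of digits without leading zeros, so no JSON escaping is needed; the ProtocolToJson class is hypothetical.

```java
// Hypothetical converter for frames such as <START>A12B13C14D15<END>.
class ProtocolToJson {
    private static final String START = "<START>";
    private static final String END = "<END>";

    // Scans the frame once and appends key/value pairs straight into the output.
    static String convert(String frame) {
        if (!frame.startsWith(START) || !frame.endsWith(END)) {
            throw new IllegalArgumentException("Not a complete frame: " + frame);
        }
        int end = frame.length() - END.length();
        StringBuilder json = new StringBuilder(frame.length() + 16).append('{');
        int i = START.length();
        while (i < end) {
            int keyStart = i;
            while (i < end && Character.isLetter(frame.charAt(i))) {
                i++;
            }
            int valueStart = i;
            while (i < end && Character.isDigit(frame.charAt(i))) {
                i++;
            }
            // Reject an empty key or an empty value (this also guards against looping forever).
            if (keyStart == valueStart || valueStart == i) {
                throw new IllegalArgumentException("Malformed pair at index " + keyStart);
            }
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(frame, keyStart, valueStart).append("\":")
                .append(frame, valueStart, i);
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<START>A12B13C14D15<END>"));
    }
}
```

Whether this beats a regex or a JSON library for your stream is exactly what the timing tests should tell you.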
Lots of ways to go about it, on JSON side. Instead of Map, which is not needed, POJO is often most convenient. Following uses Jackson (https://github.com/FasterXML/jackson-databind) library:
final static ObjectMapper MAPPER = new ObjectMapper(); // remember to reuse for good perf
public class ABCD {
public int A, B, C, D;
}
// if you have output stream handy:
ABCD value = new ABCD(...);
OutputStream out = ...;
MAPPER.writeValue(out, value);
// or if not
byte[] raw = MAPPER.writeValueAsBytes(value);
or, if you want to eliminate even more of overhead (which, really, is unlikely to matter here):
JsonGenerator jgen = MAPPER.getFactory().createGenerator(out);
jgen.writeStartObject();
jgen.writeNumberField("A", valueA);
jgen.writeNumberField("B", valueB);
jgen.writeNumberField("C", valueC);
jgen.writeNumberField("D", valueD);
jgen.writeEndObject();
jgen.close();
and that gets quite close to the optimal performance you'd get with hand-written code.
In my case I used the org.json library to handle JSON in a web application.
I don't remember where to download it, but this may help:
http://www.findjar.com/class/org/json/JSONArray.html
I'm considering using Java for a large project but I haven't been able to find anything that remotely represented structures in Java. I need to be able to convert network packets to structures/classes that can be used in the application.
I know that it is possible to use RandomAccessFile but this way is NOT acceptable. So I'm curious if it is possible to "cast" a set of bytes to a structure like I could do in C. If this is not possible then I cannot use Java.
So the question I'm asking is if it is possible to cast aligned data to a class without any extra effort beyond specifying the alignment and data types?
No. You cannot cast an array of bytes to a class object.
That being said, you can use a java.nio.ByteBuffer and easily extract the fields you need into an object like this:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Packet {
private final int type;
private final float data1;
private final short data2;
public Packet(byte[] bytes) {
ByteBuffer bb = ByteBuffer.wrap(bytes);
bb.order(ByteOrder.BIG_ENDIAN); // or LITTLE_ENDIAN
type = bb.getInt();
data1 = bb.getFloat();
data2 = bb.getShort();
}
}
You're basically asking whether you can use a C-specific solution to a problem in another language. The answer is, predictably, 'no'.
However, it is perfectly possible to construct a class that takes a set of bytes in its constructor and constructs an appropriate instance.
class Foo {
int someField;
String anotherField;
public Foo(byte[] bytes) {
someField = someFieldFromBytes(bytes);
anotherField = anotherFieldFromBytes(bytes);
// etc.
}
}
You can ensure there is a one-to-one mapping of class instances to byte arrays. Add a toBytes() method to serialize an instance into bytes.
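A minimal sketch of that constructor/toBytes() pairing, assuming a made-up fixed layout of one int, one float and one short in big-endian order (the Reading class and its field names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fixed-layout record: 4-byte int, 4-byte float, 2-byte short (10 bytes).
final class Reading {
    static final int SIZE = Integer.BYTES + Float.BYTES + Short.BYTES;

    final int type;
    final float value;
    final short flags;

    Reading(int type, float value, short flags) {
        this.type = type;
        this.value = value;
        this.flags = flags;
    }

    // Decodes the fields in the same order that toBytes() writes them.
    static Reading fromBytes(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        return new Reading(bb.getInt(), bb.getFloat(), bb.getShort());
    }

    // Encodes the instance into exactly SIZE bytes.
    byte[] toBytes() {
        return ByteBuffer.allocate(SIZE)
            .order(ByteOrder.BIG_ENDIAN)
            .putInt(type)
            .putFloat(value)
            .putShort(flags)
            .array();
    }

    public static void main(String[] args) {
        Reading original = new Reading(7, 1.5f, (short) 3);
        Reading copy = Reading.fromBytes(original.toBytes());
        System.out.println(copy.type + " " + copy.value + " " + copy.flags);
    }
}
```

Keeping the read order and the write order side by side in one class makes it harder for the two to drift apart.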
No, you cannot do that. Java simply doesn't have the same concepts as C.
You can create a class that behaves much like a struct:
public class Structure {
public int field1;
public String field2;
}
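and you can have a constructor that takes an array of bytes or a DataInput to read the bytes:
public class Structure {
    ...
    // Convenience constructor: wraps the raw bytes in a DataInput
    public Structure(byte[] data) throws IOException {
        this(new DataInputStream(new ByteArrayInputStream(data)));
    }
    // Reads the fields in the order they appear on the wire
    public Structure(DataInput in) throws IOException {
        field1 = in.readInt();
        field2 = in.readUTF();
    }
}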
then read bytes off the wire and pump them into Structures:
byte[] bytes = network.read();
DataInputStream stream = new DataInputStream(new ByteArrayInputStream(bytes));
Structure structure1 = new Structure(stream);
Structure structure2 = new Structure(stream);
...
It's not as concise as C but it's pretty close. Note that DataInput always reads multi-byte values in big-endian (network) byte order, regardless of the platform, so you never have to swap bytes by hand for network-order data. That's definitely a benefit over C.
As Joshua says, serialization is the typical way to do these kinds of things. However, there are other binary protocols like MessagePack, Protocol Buffers, and Avro.
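A small sketch of what the byte order handling means in practice, assuming a hypothetical EndianDemo class: DataInput reads the same four bytes as a big-endian int, while a ByteBuffer with an explicit order is one way to read them if the sender happens to be little-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class comparing the two ways of reading an int.
class EndianDemo {

    // DataInput has a fixed byte order: most significant byte first.
    static int readWithDataInput(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // ByteBuffer lets the caller choose the byte order explicitly.
    static int readLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] wire = {0x01, 0x00, 0x00, 0x00};
        System.out.println("DataInput (big-endian): " + readWithDataInput(wire));
        System.out.println("ByteBuffer (little-endian): " + readLittleEndian(wire));
    }
}
```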
If you want to play with the bytecode structures, look at ASM and CGLIB; these are very common in Java applications.
There is nothing which matches your description.
The closest thing to a struct in Java is a simple class which holds values accessible either through its fields or through set/get methods.
The typical means to convert between Java class instances and on-the-wire representations is Java serialization which can be heavily customized as need be. It is what is used by Java's Remote Method Invocation API and works extremely well.
ByteBuffer.wrap(new byte[8]).getDouble(); // needs at least 8 bytes; an empty array throws BufferUnderflowException
No, this is not possible. You're trying to use Java like C, which is bound to cause complications. Either learn to do things the Java way, or go back to C.
In this case, the Java way would probably involve DataInputStream and/or DataOutputStream.
You cannot cast an array of bytes to an instance of a class.
But you can do much, much more with Java.
Java has a built-in, very strong and very flexible serialization mechanism. This is what you need. You can read and write objects to and from a stream.
If both sides are written in Java, there is no problem at all. If one of the sides is not Java, you can customize your serialization. Start by reading the javadoc of java.io.Serializable.
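As a rough illustration of the customization hooks described in that javadoc, the sketch below overrides how one field is written by implementing the private writeObject and readObject methods. The Temperature class, its hundredths-of-a-degree encoding, and the CustomSerialDemo helper are all made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class that stores a double but writes it as an int of hundredths.
class Temperature implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient double celsius; // transient: skipped by the default mechanism

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    double celsius() {
        return celsius;
    }

    // Called by ObjectOutputStream in place of the default field-by-field writing.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt((int) Math.round(celsius * 100));
    }

    // Called by ObjectInputStream; must read back exactly what writeObject wrote.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        celsius = in.readInt() / 100.0;
    }
}

class CustomSerialDemo {

    // Serializes and deserializes in memory; a socket stream would work the same way.
    static Temperature roundTrip(Temperature value) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Temperature(21.574)).celsius());
    }
}
```

Note that these hooks only change the per-object payload. The stream header and class descriptors are still in Java's own stream format, so a non-Java peer would have to understand that format as well.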
I have a requirement to transfer information over the wire (binary over TCP) between two applications. One is in Java and the other in C++. I need a protocol implementation to transfer objects between these two applications. The object classes are present in both applications (and are mapped accordingly). I just need an encoding scheme that retains the object representation on one side and can be decoded on the other side as a complete object.
For example:
C++ class
class Person
{
int age;
string name;
};
Java class
class Person
{
int age;
String name;
}
C++ encoding
Person p;
p.age = 20;
p.name = "somename";
char[] arr = SomeProtocolEncoder.encode(p);
socket.send(arr);
Java decoding
byte[] arr = socket.read();
SomeProtocolIntermediateObject object = SomeProtocolDecoder.decode(arr);
Person p = (Person)ReflectionUtil.get(object);
The protocol should provide some intermediate object which maintains the object representational state so that using reflection i can get back the object later.
Sounds like you want Protobufs: http://code.google.com/apis/protocolbuffers/docs/tutorials.html
Check out Google's protocol buffers.
Thrift is what you're looking for. You just create a definition of the structs and methods you need to call and it does all of the heavy lifting. It's got binary protocols (optionally with zlib compression or ssl). It'll probably do your taxes but you didn't hear that from me.
You might want to check out these projects and choose one:
Protocol Buffers
Thrift
Apache Avro
Here is a Thrift-vs-PB comparison I read recently. You should also refer to this Wiki for performance comparisons between these libraries.
You can check out the AMEF protocol. An example of C++ encoding in AMEF would look like:
//Create a new AMEF object
AMEFObject *object = new AMEFObject();
//Add a child string object
object->addPacket("This is the Automated Message Exchange Format Object property!!","adasd");
//Add a child integer object
object->addPacket(21213);
//Add a child boolean object
object->addPacket(true);
AMEFObject *object2 = new AMEFObject();
string j = "This is the property of a nested Automated Message Exchange Format Object";
object2->addPacket(j);
object2->addPacket(134123);
object2->addPacket(false);
//Add a child character object
object2->addPacket('d');
//Add a child AMEF Object
object->addPacket(object2);
//Encode the AMEF object
string str = new AMEFEncoder()->encode(object,false);
Decoding in Java would look like:
byte[] arr = ...; // the AMEF-encoded byte array
AMEFDecoder decoder = new AMEFDecoder();
AMEFObject object1 = AMEFDecoder.decode(arr,true);
The protocol implementation has codecs for both C++ and Java. The interesting part is that it can retain the object's class representation in the form of name-value pairs.
I needed a similar protocol in my last project, and when I stumbled upon this one I modified the base library to fit my requirements. Hope this helps you.
What about plain old ASN.1?
It would have the advantage of being really backed by a standard (and widely used). The problem is finding a compiler/runtime for each language.
This project is the ultimate comparison of Java serialization protocols:
https://github.com/eishay/jvm-serializers/wiki
Some libraries also provide C++ serialization.
I've personally ported Python Construct to Java. If there's some interest I'll be happy to start a conversion project to C++ and/or JavaScript!
http://construct.wikispaces.com/
https://github.com/ZiglioNZ/construct