Is it possible to use struct-like constructs in Java?

I'm considering using Java for a large project, but I haven't been able to find anything that even remotely resembles structures in Java. I need to be able to convert network packets to structures/classes that can be used in the application.
I know that it is possible to use RandomAccessFile but this way is NOT acceptable. So I'm curious if it is possible to "cast" a set of bytes to a structure like I could do in C. If this is not possible then I cannot use Java.
So my question is: is it possible to cast aligned data to a class without any extra effort beyond specifying the alignment and data types?

No. You cannot cast an array of bytes to a class object.
That being said, you can use a java.nio.ByteBuffer and easily extract the fields you need into an object like this:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
class Packet {
    private final int type;
    private final float data1;
    private final short data2;
    public Packet(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        bb.order(ByteOrder.BIG_ENDIAN); // or LITTLE_ENDIAN
        type = bb.getInt();
        data1 = bb.getFloat();
        data2 = bb.getShort();
    }
}

You're basically asking whether you can use a C-specific solution to a problem in another language. The answer is, predictably, 'no'.
However, it is perfectly possible to construct a class that takes a set of bytes in its constructor and constructs an appropriate instance.
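class Foo {
    int someField;
    String anotherField;
    public Foo(byte[] bytes) {
        // someFieldFromBytes/anotherFieldFromBytes stand for your own decoding helpers
        someField = someFieldFromBytes(bytes);
        anotherField = anotherFieldFromBytes(bytes);
        // etc. for the remaining fields
    }
}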
You can ensure there is a one-to-one mapping of class instances to byte arrays. Add a toBytes() method to serialize an instance into bytes.
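A minimal sketch of that round trip, assuming a made-up wire layout (a 4-byte big-endian int, then a 2-byte length followed by that many UTF-8 bytes for the string). The field names follow Foo above; the layout and the extra convenience constructor are illustrative assumptions, not a fixed format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Assumed wire layout (illustrative only):
//   4-byte big-endian int  -> someField
//   2-byte unsigned length -> number of UTF-8 bytes that follow
//   N bytes of UTF-8       -> anotherField
class Foo {
    final int someField;
    final String anotherField;

    Foo(int someField, String anotherField) {
        this.someField = someField;
        this.anotherField = anotherField;
    }

    // Decode an instance from bytes in the layout above.
    Foo(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        someField = bb.getInt();
        byte[] str = new byte[bb.getShort() & 0xFFFF]; // treat the length as unsigned
        bb.get(str);
        anotherField = new String(str, StandardCharsets.UTF_8);
    }

    // Encode this instance into the same layout.
    byte[] toBytes() {
        byte[] str = anotherField.getBytes(StandardCharsets.UTF_8);
        if (str.length > 0xFFFF) {
            throw new IllegalArgumentException("string too long for a 2-byte length");
        }
        ByteBuffer bb = ByteBuffer.allocate(4 + 2 + str.length);
        bb.putInt(someField);
        bb.putShort((short) str.length);
        bb.put(str);
        return bb.array();
    }
}
```

Because the byte[] constructor and toBytes() read and write the same layout, new Foo(foo.toBytes()) reproduces the original field values.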

No, you cannot do that. Java simply doesn't have the same concepts as C.
You can create a class that behaves much like a struct:
public class Structure {
    public int field1;
    public String field2;
}
and you can have a constructor that takes an array of bytes or a DataInput to read the bytes from:
import java.io.*;
public class Structure {
    // ... fields as above
    public Structure(byte[] data) throws IOException {
        this(new DataInputStream(new ByteArrayInputStream(data)));
    }
    public Structure(DataInput in) throws IOException {
        field1 = in.readInt();
        field2 = in.readUTF();
    }
}
then read bytes off the wire and pump them into Structures:
byte[] bytes = network.read();
DataInputStream stream = new DataInputStream(new ByteArrayInputStream(bytes));
Structure structure1 = new Structure(stream);
Structure structure2 = new Structure(stream);
...
It's not as concise as C, but it's pretty close. Note that DataInput always reads multi-byte values in big-endian (network) byte order, so you don't have to muck around with endianness yourself as long as the sender also writes big-endian. That's definitely a benefit over C.
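A self-contained sketch of the same idea, assuming the sender writes each Structure as writeInt followed by writeUTF. The writeTo() and encodeAll() helpers and the (int, String) constructor are illustrative additions, not part of any library:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class Structure {
    public int field1;
    public String field2;

    Structure(int field1, String field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    // Reads the fields in exactly the order writeTo() writes them.
    Structure(DataInput in) throws IOException {
        field1 = in.readInt(); // 4 bytes, big-endian
        field2 = in.readUTF(); // 2-byte length, then modified UTF-8
    }

    void writeTo(DataOutput out) throws IOException {
        out.writeInt(field1);
        out.writeUTF(field2);
    }

    // Packs several structures back to back, as they might arrive in one packet.
    static byte[] encodeAll(Structure... structures) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (Structure s : structures) {
            s.writeTo(out);
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

Reading two structures from one DataInputStream over such a packet consumes it field by field, with nothing left over.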

As Joshua says, serialization is the typical way to do these kinds of things. However, there are other binary protocols such as MessagePack, Protocol Buffers, and Avro.
If you want to play with the bytecode structures, look at ASM and CGLIB; these are very common in Java applications.

There is nothing which matches your description.
The closest thing to a struct in Java is a simple class which holds values, accessible either through its fields or through get/set methods.
The typical way to convert between Java class instances and on-the-wire representations is Java serialization, which can be heavily customized as needed. It is what Java's Remote Method Invocation (RMI) API uses, and it works extremely well.

ByteBuffer.wrap(bytes).getDouble(); // bytes: your byte[] of at least 8 bytes; big-endian unless you call order()
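Spelled out a little more: ByteBuffer.wrap() reads big-endian by default, and order() switches it for data produced on a little-endian machine. DoubleFromBytes is a hypothetical helper name; the array must contain at least 8 bytes or getDouble() throws BufferUnderflowException:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class DoubleFromBytes {
    // Interprets the first 8 bytes of 'bytes' as an IEEE 754 double in the given byte order.
    // Throws BufferUnderflowException if the array is shorter than 8 bytes.
    static double read(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getDouble();
    }
}
```

For example, the bytes produced by ByteBuffer.allocate(8).putDouble(1.5).array() read back as 1.5 with ByteOrder.BIG_ENDIAN.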

No, this is not possible. You're trying to use Java like C, which is bound to cause complications. Either learn to do things the Java way, or go back to C.
In this case, the Java way would probably involve DataInputStream and/or DataOutputStream.

You cannot cast an array of bytes to an instance of a class.
But you can do much more with Java.
Java has a built-in, very strong and very flexible serialization mechanism. This is what you need. You can write objects to a stream and read them back from it.
If both sides are written in Java, there is no problem at all. If one of the sides is not Java, you can customize your serialization. Start by reading the javadoc of java.io.Serializable.
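A minimal round trip through java.io.Serializable, using ObjectOutputStream and ObjectInputStream over in-memory byte arrays. Point and SerializationDemo are illustrative names. Note that this stream format is Java-specific, so it only helps directly when both sides are Java:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Point implements Serializable {
    private static final long serialVersionUID = 1L; // pin the version explicitly
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class SerializationDemo {
    // Writes any Serializable object graph to a byte array.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reads the object back; the caller casts it to the expected type.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```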

Related

static creator method for each implementing class of interface

I need to make a server which can process different kinds of messages.
The messages come in a stream of bytes and are identified by the first 2 bytes (opcode).
So I have this interface
public interface Message<T> extends Encodeable {
    Message<T> act(T actOn, UserCommandInvocationProtocol<T> protocol);
    byte[] encode(); // from Encodeable
}
The server should create a Message according to the opcode in the stream.
My current solution is a HashMap that maps each opcode to a matching lambda, for example:
addMessageDecoder(3, nextByte -> {
    String[] strArr = strPtrnDec2.decodeNextByte(nextByte);
    return strArr != null ? new LoginMessage(strArr[0], strArr[1]) : null; // LoginMessage implements Message
});
The interface used for the lambdas:
interface MessageDecoder<T> {
    Message<T> decodeNextByte(byte nextByte);
}
Notice how I have encode() in the Message interface, so each message can turn to bytes.
What I would have liked is a static decode() for each message that creates a matching Message object. If Java had allowed it, it would have looked something like this:
public interface Message<T> extends Encodeable {
    Message<T> act(T actOn, UserCommandInvocationProtocol<T> protocol);
    byte[] encode();
    static Message<T> decodeNextByte(byte nextByte); // does the job of interface MessageDecoder<T>
}
What would be the best practice here? Or maybe my current solution is fine?
Thanks
Since Java 8 you can declare static methods in interfaces, but that's probably not what you want: a static interface method belongs to the interface itself and cannot be overridden by each implementing class.
You need another class that knows how to create instances of your messages: a switch that reads in a couple of bytes, determines which message you're trying to decode, and then hands the rest of the input to that message's decoding code. You won't be able to do that with a single level of indirection. In effect, you want a MessageFactory to do your parsing for you based on the bytes read at the top level.
That said, if you’re looking to do this more widely, you should probably consider a library like protobuf which will do this for you.
There are times when a simple static method will do to sort all this out, and attempts at magical syntactic sugar do not help. There is no getting around the fact that you have to read two bytes and associate them with an object instance. So write a simple static factory method that does this: it reads two bytes and returns an instance of the right class.
But where do you specify the associations? This can either be done in a global static hashtable, or each class can specify their associated bytes, and an initializer method will walk through the known classes and build that map. There is not a whole lot of advantage to either way, because at the end of the day, what is required is knowing the list of implemented classes.
I always just write a single static factory method. In it is a simple if-then-else chain that tests the incoming bytes and returns the right instance. If there are a lot of classes, I will build some sort of hashtable to speed up this lookup. But if the number of classes is small, e.g. fewer than 30, then simple if-then-else code is probably fast enough.
The advantage of the static factory method is that there is one place where the code is associated with the class, and that is in that one method.
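A sketch of such a factory, assuming a 2-byte big-endian opcode and that the whole message is already available in a ByteBuffer (the question decodes byte by byte, which this simplifies away). The Message interface is cut down to encode() only, and PingMessage, LoginMessage and their opcodes are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified stand-in for the question's Message interface (act() omitted).
interface Message {
    byte[] encode();
}

// Hypothetical message types; opcodes and payload layouts are made up.
final class PingMessage implements Message {
    static final short OPCODE = 1;

    public byte[] encode() {
        return ByteBuffer.allocate(2).putShort(OPCODE).array();
    }
}

final class LoginMessage implements Message {
    static final short OPCODE = 3;
    final String user;

    LoginMessage(String user) {
        this.user = user;
    }

    // Layout: opcode, 2-byte length, UTF-8 user name.
    public byte[] encode() {
        byte[] u = user.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 + 2 + u.length)
                .putShort(OPCODE).putShort((short) u.length).put(u).array();
    }
}

final class MessageFactory {
    private MessageFactory() {}

    // The one place that maps opcodes to classes: read the opcode,
    // then let the matching branch decode the rest of the payload.
    static Message decode(ByteBuffer in) {
        short opcode = in.getShort();
        switch (opcode) {
            case PingMessage.OPCODE:
                return new PingMessage();
            case LoginMessage.OPCODE: {
                byte[] u = new byte[in.getShort() & 0xFFFF];
                in.get(u);
                return new LoginMessage(new String(u, StandardCharsets.UTF_8));
            }
            default:
                throw new IllegalArgumentException("Unknown opcode " + opcode);
        }
    }
}
```

Adding a message type means adding one case; if the list grows large, the switch could be replaced by a Map<Short, Function<ByteBuffer, Message>> filled in the same class.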

Compact Java Externalization

I am trying to figure out a way to serialize simple Java objects (ie all the fields are primitive types) compactly, without the big header that normally gets added on when you use writeExternal. It does not need to be super general, backwards compatible across versions, or anything like that, I just want it to work with ObjectOutputStreams (or something similar) and not add ~100 bytes to the size of each object I serialize.
More concretely, I have a class that has 3 members: a boolean flag and two longs. I should be able to represent this object in 17 bytes. Here is a simplified version of the code:
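import java.io.*;
import java.nio.ByteBuffer;
class Record implements Externalizable {
    boolean b;
    long id;
    long uid;
    public Record() {} // Externalizable requires a public no-arg constructor
    public void writeExternal(ObjectOutput out) throws IOException {
        int size = 1 + 8 + 8; // I know, I know, but there's no sizeof
        ByteBuffer buff = ByteBuffer.allocate(size);
        if (b) {
            buff.put((byte) 1);
        } else {
            buff.put((byte) 0);
        }
        buff.putLong(id);
        buff.putLong(uid);
        out.write(buff.array(), 0, size);
    }
    public void readExternal(ObjectInput in) throws IOException {
        b = in.readByte() != 0;
        id = in.readLong();
        uid = in.readLong();
    }
}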
Elsewhere, these are stored by being passed into a method like the following:
public void store(Object value) throws IOException {
    ObjectOutputStream out = getStream();
    out.writeObject(value);
    out.close();
}
After I store just one of these objects in a file this way, the file has a size of 128 bytes (and 256 for two of them, so it's not amortized). Looking at the file, it is clear that it is writing a header similar to the one used in default serialization (which, for the record, uses about 376 bytes to store one of these). I can see that my writeExternal method is getting invoked (I put in some logging), so that isn't the problem. Is this just a fundamental limitation of the way ObjectOutputStream serializes things? Do I need to work on raw DataOutputStreams to get the kind of compactness I want?
[EDIT: In case anyone is wondering, I ended up using DataOutputStreams directly, which turned out to be easier than I'd feared]
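For reference, the DataOutputStream approach from the edit can look like this sketch. CompactRecord mirrors the question's Record (a boolean flag plus two longs); writeTo() and readFrom() are illustrative names. Each instance takes exactly 1 + 8 + 8 = 17 bytes, with no stream header or class descriptor:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Mirrors the question's Record: one boolean and two longs.
class CompactRecord {
    final boolean b;
    final long id;
    final long uid;

    CompactRecord(boolean b, long id, long uid) {
        this.b = b;
        this.id = id;
        this.uid = uid;
    }

    // Writes exactly 1 + 8 + 8 = 17 bytes: no stream header, no class descriptor.
    void writeTo(DataOutput out) throws IOException {
        out.writeBoolean(b);
        out.writeLong(id);
        out.writeLong(uid);
    }

    // Java evaluates arguments left to right, so fields are read in write order.
    static CompactRecord readFrom(DataInput in) throws IOException {
        return new CompactRecord(in.readBoolean(), in.readLong(), in.readLong());
    }
}
```

The trade-off is that nothing in the stream identifies the type, so the reader has to know in advance that it is reading CompactRecords.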

Byte based (de)serialization in Java?

I want to try abusing Java classes as structures, and for that I'm wondering whether it is possible to deserialize a byte array into a class and the other way around.
So if I have a class like this:
public class Handshake
{
    byte command;
    byte error;
    short size;
    int major;
    int ts;
    char[] secret; // aligned size = 32 bytes
}
Is there an easy way (without having to manually read bytes and fill out the class which requires 3 times as much code) to deserialize a set of bytes into this class? I know that Java doesn't have structs but I'm wondering if it is possible to simplify the serialization process so it does it automatically. The bytes are not from Java's serializer, they are just aligned bytes derived from C structs.
The bytes are not from Java's serializer, they are just aligned bytes derived from C structs.
Bad idea. It can break as soon as someone compiles that code on a different platform, using a different compiler or settings, etc.
Much better: use a standardized binary interface with implementations in Java and C++ like ASN.1 or Google's Protocol Buffers.
You can write a library to do the deserialization using reflection. This may result in more code being required, but may suit your needs. It is worth noting that char in Java is 16-bit rather than 8-bit, and a char[] is a separate object, unlike in C.
In short, you can write a library which reads this data without touching the Handshake class. Only you can decide whether this is actually easier than adding a method or two to the Handshake class.
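A minimal sketch of the reflection approach, assuming the incoming struct has no padding between fields and that the caller lists field names in wire order (Class.getDeclaredFields() does not promise declaration order). ReflectiveReader is a hypothetical name, only primitive fields are handled, and the char[] secret is left out of this Handshake subset:

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;

final class ReflectiveReader {
    private ReflectiveReader() {}

    // Fills the named primitive fields of 'target', in the order given, from 'bb'.
    // Byte order is whatever 'bb' is set to (big-endian unless changed).
    static <T> T read(T target, ByteBuffer bb, String... fieldNames)
            throws ReflectiveOperationException {
        for (String name : fieldNames) {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            Class<?> t = f.getType();
            if (t == byte.class) f.setByte(target, bb.get());
            else if (t == short.class) f.setShort(target, bb.getShort());
            else if (t == int.class) f.setInt(target, bb.getInt());
            else if (t == long.class) f.setLong(target, bb.getLong());
            else if (t == float.class) f.setFloat(target, bb.getFloat());
            else if (t == double.class) f.setDouble(target, bb.getDouble());
            else throw new IllegalArgumentException("Unsupported type " + t + " for field " + name);
        }
        return target;
    }
}

// Subset of the question's Handshake; a char[] would need its own rule,
// since a Java char is 16 bits and the array is a separate object.
class Handshake {
    byte command;
    byte error;
    short size;
    int major;
    int ts;
}
```

If the C side is little-endian, calling bb.order(ByteOrder.LITTLE_ENDIAN) before read() handles that; padding bytes inserted by the C compiler would still have to be skipped explicitly.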
Do not do that! It will break sooner or later. Use a binary serialization format like Hessian, which supports both Java and C++ (I'm not aware of anything that works with plain C).
Also remember that C does not fix the sizes of int or long; they are platform-dependent.
So if you must use C, and you are forced to write your own library, be very careful.

Binary serialization protocol

I have a requirement where I need to transfer information over the wire (binary over TCP) between 2 applications. One is in Java and the other in C++. I need a protocol implementation to transfer objects between these 2 applications. The object classes are present in both applications (and are mapped accordingly). I just need an encoding scheme that preserves the object representation on one side so that it can be decoded on the other side as a complete object.
For example:
C++ class
class Person
{
    int age;
    string name;
};
Java class
class Person
{
    int age;
    String name;
}
C++ encoding
Person p;
p.age = 20;
p.name = "somename";
char[] arr = SomeProtocolEncoder.encode(p);
socket.send(arr);
Java decoding
byte[] arr = socket.read();
SomeProtocolIntermediateObject object = SomeProtocolDecoder.decode(arr);
Person p = (Person)ReflectionUtil.get(object);
The protocol should provide some intermediate object which maintains the object's representational state, so that using reflection I can get the object back later.
Sounds like you want Protobufs: http://code.google.com/apis/protocolbuffers/docs/tutorials.html
Check out Google's protocol buffers.
Thrift is what you're looking for. You just create a definition of the structs and methods you need to call and it does all of the heavy lifting. It's got binary protocols (optionally with zlib compression or ssl). It'll probably do your taxes but you didn't hear that from me.
You might want to check out these projects and choose one:
Protocol Buffers
Thrift
Apache Avro
Here is a Thrift-vs-PB comparison I read recently. You should also refer to this Wiki for performance comparisons between these libraries.
You can check out the AMEF protocol; an example of C++ encoding in AMEF would look like this:
//Create a new AMEF object
AMEFObject *object = new AMEFObject();
//Add a child string object
object->addPacket("This is the Automated Message Exchange Format Object property!!","adasd");
//Add a child integer object
object->addPacket(21213);
//Add a child boolean object
object->addPacket(true);
AMEFObject *object2 = new AMEFObject();
string j = "This is the property of a nested Automated Message Exchange Format Object";
object2->addPacket(j);
object2->addPacket(134123);
object2->addPacket(false);
//Add a child character object
object2->addPacket('d');
//Add a child AMEF Object
object->addPacket(object2);
//Encode the AMEF object
string str = new AMEFEncoder()->encode(object,false);
Decoding in Java would look like this:
byte[] arr = ...; // the AMEF-encoded byte array
AMEFDecoder decoder = new AMEFDecoder();
AMEFObject object1 = AMEFDecoder.decode(arr,true);
The protocol implementation has codecs for both C++ and Java; the interesting part is that it can retain the object's class representation in the form of name-value pairs.
I needed a similar protocol in my last project. When I stumbled upon this protocol, I actually modified the base library according to my requirements. Hope this helps you.
What about plain old ASN.1?
It would have the advantage of being really backed by a standard (and widely used). The problem is finding a compiler/runtime for each language.
This project is the ultimate comparison of Java serialization protocols:
https://github.com/eishay/jvm-serializers/wiki
Some libraries also provide C++ serialization.
I've personally ported Python Construct to Java. If there's some interest I'll be happy to start a conversion project to C++ and/or JavaScript!
http://construct.wikispaces.com/
https://github.com/ZiglioNZ/construct

Large amount of constants in Java

I need to include about 1 MByte of data in a Java application, for very fast and easy access in the rest of the source code. My main background is not Java, so my initial idea was to convert the data directly to Java source code, defining 1MByte of constant arrays, classes (instead of C++ struct) etc., something like this:
public final/immutable/const MyClass MyList[] = {
{ 23012, 22, "Hamburger"} ,
{ 28375, 123, "Kieler"}
};
However, it seems that Java does not support such constructs. Is this correct? If yes, what is the best solution to this problem?
NOTE: The data consists of 2 tables with about 50000 records each, which are to be searched in various ways. This may require some indexes later, with significantly more records (maybe 1 million) saved this way. I expect the application to start up very fast, without iterating through these records.
I personally wouldn't put it in source form.
Instead, include the data in some appropriate raw format in your jar file (I'm assuming you'll be packaging the application or library up) and use Class.getResourceAsStream or ClassLoader.getResourceAsStream to load it.
You may very well want a class to encapsulate loading, caching and providing this data - but I don't see much benefit from converting it into source code.
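A sketch of the resource-based loading, assuming a hypothetical resource /data/table.bin in a simple binary format (a row count, then two ints and a writeUTF string per row). The format and the names TableLoader and Row are made up for illustration:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class TableLoader {
    // One row of the table: two ints and a name, as in { 23012, 22, "Hamburger" }.
    static final class Row {
        final int i;
        final int j;
        final String s;

        Row(int i, int j, String s) {
            this.i = i;
            this.j = j;
            this.s = s;
        }
    }

    // Hypothetical resource path inside the jar.
    static final String RESOURCE = "/data/table.bin";

    static List<Row> loadFromJar() throws IOException {
        InputStream raw = TableLoader.class.getResourceAsStream(RESOURCE);
        if (raw == null) {
            throw new FileNotFoundException(RESOURCE);
        }
        return read(raw);
    }

    // Assumed format: a 4-byte row count, then per row: int, int, writeUTF string.
    static List<Row> read(InputStream raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(raw))) {
            int count = in.readInt();
            List<Row> rows = new ArrayList<>(count);
            for (int k = 0; k < count; k++) {
                rows.add(new Row(in.readInt(), in.readInt(), in.readUTF()));
            }
            return rows;
        }
    }
}
```

Keeping the parsing in read(InputStream) separate from the resource lookup makes it easy to test without building a jar.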
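Due to limitations of the Java class-file format, class files are not intended for this type of data: the bytecode of a single method (including the static initializer that would build your arrays) cannot exceed 64 KB, and the constant pool is limited to 65535 entries.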
I would load the data upon starting the program, using something like the following lines of code:
import java.io.*;
import java.util.*;
public class Test {
    public static void main(String... args) throws IOException {
        List<DataRecord> records = new ArrayList<DataRecord>();
        try (BufferedReader br = new BufferedReader(new FileReader("data.txt"))) {
            String s;
            while ((s = br.readLine()) != null) {
                String[] arr = s.split(" ");
                int i = Integer.parseInt(arr[0]);
                int j = Integer.parseInt(arr[1]);
                records.add(new DataRecord(i, j, arr[2])); // third column is the name
            }
        }
    }
}
class DataRecord {
    public final int i, j;
    public final String s;
    public DataRecord(int i, int j, String s) {
        this.i = i;
        this.j = j;
        this.s = s;
    }
}
(NB: The Scanner is quite slow, so don't be tempted to use it just because it has a simple interface. Stick with some form of BufferedReader and split, or StringTokenizer.)
Efficiency can of course be improved if you transform the data into a binary format. In that case, you can use DataInputStream (but don't forget to wrap the underlying stream in a BufferedInputStream).
Depending on how you wish to access the data, you might be better off storing the records in a hash-map (HashMap<Integer, DataRecord>) (having i or j as the key).
If you wish to load the data at the same time as the JVM loads the class file itself (roughly!), you could do the reading and initialization not within a method but encapsulated in a static { ... } block.
For a memory-mapped approach, have a look at the java.nio.channels package, especially the FileChannel method
public abstract MappedByteBuffer map(FileChannel.MapMode mode, long position,long size) throws IOException
Complete code examples can be found here.
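A small sketch of the memory-mapped variant, assuming the file is a plain sequence of fixed-size records made of big-endian ints. MappedTable, recordCount() and intAt() are hypothetical names; note that a single MappedByteBuffer is limited to 2 GB, which is plenty for this amount of data:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedTable {
    private final MappedByteBuffer buf;
    private final int recordSize;

    MappedTable(Path file, int recordSize) throws IOException {
        this.buf = map(file);
        this.recordSize = recordSize;
    }

    // Maps the whole file read-only. The mapping stays valid after the channel
    // is closed, and the OS typically pages data in only when it is touched.
    private static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    int recordCount() {
        return buf.capacity() / recordSize;
    }

    // Absolute read: the buffer's position is not changed, so lookups don't interfere.
    int intAt(int index, int offsetInRecord) {
        return buf.getInt(index * recordSize + offsetInRecord);
    }
}
```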
Dan Bornstein (the lead developer of DalvikVM) explains a solution to your problem in this talk (Look around 0:30:00). However I doubt the solution applies to as much data as a megabyte.
One idea is to use enums, but I'm not sure whether this suits your implementation, and it also depends on how you are planning to use the data.
public enum Stuff {
    HAMBURGER (23012, 22),
    KIELER (28375, 123);
    private int a;
    private int b;
    //private instantiation, does not need to be called explicitly.
    private Stuff(int a, int b) {
        this.a = a;
        this.b = b;
    }
    public int getAvalue() {
        return this.a;
    }
    public int getBvalue() {
        return this.b;
    }
}
These can then be accessed like:
Stuff someThing = Stuff.HAMBURGER;
int hamburgerA = Stuff.HAMBURGER.getAvalue(); // = 23012
Another idea is using a static initializer to set private fields of a class.
Putting the data into source code would actually not be the fastest solution, not by a long shot. Loading a Java class is quite complex and slow (at least on a platform that does bytecode verification; not sure about Android).
The fastest possible way to do this would be to define your own binary index format. You could then read that as a byte[] (possibly using memory mapping) or even a RandomAccessFile without interpreting it in any way until you start accessing it. The cost of this would be the complexity of the code that accesses it. With fixed-size records, a sorted list of records that's accessed via binary search would still be pretty simple, but anything else is going to get ugly.
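A sketch of the fixed-size, sorted-record idea, assuming each record is an int key followed by an int value and that the records are sorted by key. SortedRecordIndex is a hypothetical name; only the records probed by the binary search are ever decoded:

```java
import java.nio.ByteBuffer;
import java.util.OptionalInt;

// Assumed layout: records of RECORD_SIZE bytes, each an int key followed by an
// int value, sorted by key in ascending order.
final class SortedRecordIndex {
    static final int RECORD_SIZE = 8;
    private final ByteBuffer data;

    SortedRecordIndex(byte[] bytes) {
        this.data = ByteBuffer.wrap(bytes); // no parsing happens here
    }

    // Binary search over the raw bytes; only about log2(n) records are read.
    OptionalInt lookup(int key) {
        int lo = 0;
        int hi = data.capacity() / RECORD_SIZE - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int midKey = data.getInt(mid * RECORD_SIZE);
            if (midKey < key) {
                lo = mid + 1;
            } else if (midKey > key) {
                hi = mid - 1;
            } else {
                return OptionalInt.of(data.getInt(mid * RECORD_SIZE + 4));
            }
        }
        return OptionalInt.empty();
    }
}
```

Swapping the wrapped byte[] for a memory-mapped buffer would avoid reading the whole file up front.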
Though before doing that, are you sure this isn't premature optimization? The easiest (and probably still quite fast) solution would be to just serialize a Map, List or array. Have you tried this and determined that it is, in fact, too slow?
convert the data directly to Java source code, defining 1MByte of constant arrays, classes
Be aware that there are strict constraints on the size of classes and their structures (see the JVM Specification).
This is how you define it in Java, if I understood what you are after:
public final Object[][] myList = {
    { 23012, 22, "Hamburger"},
    { 28375, 123, "Kieler"}
};
It looks like you plan to write your own lightweight database.
If you can limit the length of the String to a realistic max size the following might work:
Write each entry into a binary file. All entries have the same size, so you waste some bytes with each entry (int a, int b, int stringsize, string, padding).
To read an entry open the file as a random access file, multiply the index with the length of an entry to get the offset and seek the position.
Put the bytes into a ByteBuffer and read the values; the String has to be converted with the String(byte[] bytes, int offset, int length, Charset charset) constructor.
If you can't limit the length of a block, dump the strings into an additional file and only store their offsets in your table. This requires an additional file access and makes modifying the data hard.
Some information about random file access in Java can be found here: http://java.sun.com/docs/books/tutorial/essential/io/rafs.html
For faster access you can cache some of the entries you have read in a HashMap and always remove the oldest one from the map when reading a new one.
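Sketch code (assumes every entry is exactly 100 bytes; FIFO cache as described above):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
class MyDataStore {
    static final int ENTRY_LENGTH = 100; // bytes per fixed-size entry
    static final int CACHE_SIZE = 50000;
    private FileChannel fc = null;
    private final Map<Integer, Entry> cache = new HashMap<>();
    private final ArrayDeque<Integer> queue = new ArrayDeque<>(); // cached indexes, oldest first
    void open(Path p) throws IOException {
        fc = FileChannel.open(p, StandardOpenOption.READ);
    }
    void close() throws IOException {
        fc.close();
        fc = null;
    }
    Entry getEntryAt(int index) throws IOException {
        Entry cached = cache.get(index);
        if (cached != null) return cached;
        long pos = (long) index * ENTRY_LENGTH; // long math avoids int overflow
        ByteBuffer b = ByteBuffer.allocate(ENTRY_LENGTH);
        while (b.hasRemaining()) {
            // positional read; loop because a read may return fewer bytes than requested
            if (fc.read(b, pos + b.position()) < 0) break; // end of file
        }
        b.flip();
        Entry a = new Entry(b);
        cache.put(index, a);
        queue.addLast(index);
        if (queue.size() > CACHE_SIZE) cache.remove(queue.removeFirst()); // evict the oldest entry
        return a;
    }
}
class Entry {
    int a; int b; String s;
    public Entry(ByteBuffer bb) {
        a = bb.getInt();
        b = bb.getInt();
        int size = bb.getInt();
        byte[] bin = new byte[size];
        bb.get(bin);
        s = new String(bin, StandardCharsets.UTF_8);
    }
}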
Missing from the pseudocode:
writing, since you need it for constant data
total number of entries/sizeof file, only needs an additional integer at the beginning of the file and an additional 4 byte offset for each access operation.
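A sketch of the missing writing side, assuming the same layout the reader expects (int a, int b, int length, UTF-8 bytes, zero padding up to 100 bytes) and no file header. EntryWriter is a hypothetical name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EntryWriter {
    static final int ENTRY_LENGTH = 100; // bytes; must match the reader

    private EntryWriter() {}

    // Encodes one entry as: int a, int b, int length, UTF-8 bytes, zero padding.
    static byte[] encode(int a, int b, String s) {
        byte[] str = s.getBytes(StandardCharsets.UTF_8);
        if (12 + str.length > ENTRY_LENGTH) {
            throw new IllegalArgumentException("String too long for a fixed-size entry: " + s);
        }
        ByteBuffer bb = ByteBuffer.allocate(ENTRY_LENGTH); // zero-filled, so the tail is padding
        bb.putInt(a).putInt(b).putInt(str.length).put(str);
        return bb.array();
    }
}
```

Entries produced this way can be appended one after another to the file, so entry n starts at byte n * 100.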
You could also declare a static class (or a set of static classes) exposing the desired values as methods. After all, you want your code to be able to find the value for a given name, and don't want the value to change.
So: location=MyLibOfConstants.returnHamburgerLocation().zipcode
And you can store this stuff in a hashtable with lazy initialization, if you think that calculating it on the fly would be a waste of time.
Isn't a cache what you need?
As ordinary objects, the data is loaded into memory, is not really limited to a predefined size, and should be as fast as using constants...
It can even search the data through some kind of index (for example using the objects' hash codes...)
For example, you can create all your data items (e.g. { 23012, 22, "Hamburger"}) and then create 3 hashmaps:
map1.put(23012,hamburgerItem);
map2.put(22,hamburgerItem);
map3.put("Hamburger",hamburgerItem);
This way you can search very fast in whichever map matches the parameter you have...
(but this works only if your keys are unique within each map... this is just an example that could inspire you)
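The three-map idea as a sketch; Item and ItemIndex are illustrative names, and as noted above each key must be unique, otherwise the last item put under a key wins:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Item {
    final int a;
    final int b;
    final String name;

    Item(int a, int b, String name) {
        this.a = a;
        this.b = b;
        this.name = name;
    }
}

// One map per searchable column; all three maps share the same Item objects.
final class ItemIndex {
    final Map<Integer, Item> byA = new HashMap<>();
    final Map<Integer, Item> byB = new HashMap<>();
    final Map<String, Item> byName = new HashMap<>();

    ItemIndex(List<Item> items) {
        for (Item it : items) {
            byA.put(it.a, it); // a duplicate key silently replaces the earlier item
            byB.put(it.b, it);
            byName.put(it.name, it);
        }
    }
}
```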
At work we have a very big webapp (80 WebLogic instances) and that's almost exactly what we do: caching everywhere. From a country list in the database, we create a cache...
There are many different kinds of caches; check the link and choose what you need...
http://en.wikipedia.org/wiki/Cache_algorithms
Java serialization sounds like something that needs to be parsed... not good. Isn't there some kind of standard format for storing data in a stream, that can be read/looked up using a standard API without parsing it?
If you were to create the data in code, then it would all be loaded on first use. This is unlikely to be much more efficient than loading from a separate file - as well as parsing the data in the class file, the JVM has to verify and compile the bytecodes to create each object a million times, rather than just the once if you load it from a loop.
If you want random access and can't use a memory mapped file, then there is a RandomAccessFile which might work. You need either to load a index on start, or you need to make the entries a fixed length.
You might want to check whether the HDF5 libraries run on your platform; it may be overkill for such a simple and small dataset though.
I would recommend to use assets for storing such data.
