Let's say I have an object I'd like to store in a direct byte buffer.
I'd like to be able to access parts of the object from the direct byte buffer without de-serializing the whole object. Is there a safe way to do this?
I'm thinking you could somehow capture the byte array offsets when serializing the object, then once it's been written to the direct byte buffer you would adjust these offsets according to the offset of the direct byte buffer. I'm not sure whether it's possible to do this...
The real question is not whether this is possible, since it almost certainly is, but why you want to do this in the first place.
If you just want to access a few fields from the object, the easiest way to do that will be to deserialize it and then copy those few fields out.
The only reason you might want to avoid the (de-)serialization is speed, but if this is in one of your busy loops, you have already lost anyway. If network (de-)serialization is the issue, you should design your protocol better.
I think the best way of doing this is as follows. It's a bit of a workaround, but it should be effective.
import java.util.Map;

interface OffsetMemberMap {
    // Maps each member's name to its offset within the direct byte buffer.
    Map<String, Long> offsetMemberMap();
}
The idea is to create objects that implement the above interface; the map stores a memory offset against a String key for each member. Child objects would be created first, and once added to a DirectByteBuffer, the offset position would be stored in the parent within this map.
To access a specific member, the user would supply the String that addresses that member, and thus only what's needed would be de-serialized. This would allow you to store large linked objects in DirectByteBuffers whilst serializing/de-serializing only the bits you need when writing/reading.
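For illustration, here is a minimal sketch of that idea; the member names and layout are hypothetical, not part of the original suggestion. Two members are written into a direct buffer, their offsets recorded, and a single member is later read back without touching the rest:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class OffsetMapSketch {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        Map<String, Long> offsets = new HashMap<>();

        // Write the "age" member and remember where it starts.
        offsets.put("age", (long) buf.position());
        buf.putInt(42);

        // Write the "name" member: a length prefix followed by UTF-8 bytes.
        offsets.put("name", (long) buf.position());
        byte[] name = "Alice".getBytes(StandardCharsets.UTF_8);
        buf.putInt(name.length);
        buf.put(name);

        // Later: de-serialize only the "age" member, nothing else.
        int age = buf.getInt(offsets.get("age").intValue());
        System.out.println(age); // 42
    }
}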
I was just introduced to the concept of serialisation in Java, and while I 'get' the fundamentals, I can't help but feel like it's a bit of overkill. My logic is that I have pointers to the objects and I know how many bytes an object takes up in memory. Why can't I just theoretically write these bytes to a file, along with some extra bytes to indicate the type? With this, can't I just read these bytes back and restore my original object?
The amount of detail my book goes into on serialisation gives me a good indication that I'm not really understanding its importance, and that there is probably something more subtle than just writing out all the bytes exactly as they are. Any help is greatly appreciated! (I have some background in C++ if that helps.)
Why can't I just theoretically write these bytes to a file, along with some extra bytes to indicate the type? With this, can't I just read these bytes back and restore my original object?
How could anyone ever read them back in? Say I'm writing code that's supposed to read in your file. Please tell me what the third byte means so that I can decode it properly.
What if the internal representation of the object contains pointers to other objects that might be in different memory locations the next time the program runs? For example, it is quite common to manage identical strings by having internal references to the same internal string object. How will writing that reference to a file be sensible given that the internal string object may not exist in the next run?
To write data to a file, you need to write it out in some specific format that actually contains all the information needed to read it back in. What happens to work internally for this program, at this time, just won't do, as there's no guarantee another program at another time can make sense of it.
What you suggest works provided:
the order and type of the fields don't change (note this is not fixed at compile time);
the byte order doesn't change;
you don't have any references, e.g. no String, enum, List or Map;
the name and package of the type don't change.
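Under those constraints, a raw fixed-layout scheme can look like this minimal sketch (the Point type and field order are hypothetical; writer and reader simply agree on both):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class RawLayoutSketch {
    // Only primitive fields, so the raw-bytes approach can work.
    static class Point { int x; int y; }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 3; p.y = 7;

        // Writer and reader must agree on field order and byte order.
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(p.x).putInt(p.y);
        buf.flip();

        Point q = new Point();
        q.x = buf.getInt();
        q.y = buf.getInt();
        System.out.println(q.x + "," + q.y); // 3,7
    }
}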
We at Chronicle use a form of serialization which supports this; it's much faster, but very limiting. You have to be very aware of those limitations and have a problem which suits them. We also have a form of serialization which has none of these constraints, but it is slower.
The purpose of Java Serialization is to support arbitrary object graphs even if data is exchanged between systems which might arrange the data differently.
Serialization is the process of converting an object stored in memory into a stream of bytes to be transferred over a network, stored in a DB, etc.
But isn't the object already stored in memory as bits and bytes? Why do we need another process to convert the object stored as bytes into another byte representation? Can't we just transmit the object directly over the network?
I think I may be missing something in the way the objects are stored in memory, or the way the object fields are accessed.
Can someone please help me in clearing up this confusion?
Different systems don't store things in memory in the same way. The obvious example is endianness.
Serialization defines a way by which systems using different in-memory representations can communicate.
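To make the endianness point concrete, here is a minimal sketch showing the same int producing different bytes under the two byte orders; a serialized format pins down one of them (Java's DataOutput, for instance, is defined as big-endian):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class EndiannessSketch {
    public static void main(String[] args) {
        int value = 1;

        byte[] big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
        byte[] little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();

        System.out.println(Arrays.toString(big));    // [0, 0, 0, 1]
        System.out.println(Arrays.toString(little)); // [1, 0, 0, 0]
    }
}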
Another important fact is that the requirements on in-memory and serialized data may be different: when in-memory, fast read (and maybe write) access is desirable; when serialized, small size is desirable. It is easier to create two different formats to fit these two use cases than it is to create one format which is good for both.
An example which springs to mind is LinkedHashMap: this basically stores two versions of the mapping when in memory (one to capture insertion order; one as a traditional hash map). However, you don't need both of these representations to reconstruct the same map from a serialized form: you only need the insertion order of key/value pairs. As such, the serialized form does not store the same data as the in-memory form.
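A quick way to see this is to round-trip a LinkedHashMap through Java serialization: the hash structure is rebuilt on the receiving side, yet insertion order survives. A minimal sketch:

import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;

class LinkedHashMapRoundTrip {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("first", 1);
        map.put("second", 2);
        map.put("third", 3);

        // Serialize to bytes...
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(map);

        // ...and back: the hash table is rebuilt, insertion order is kept.
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        @SuppressWarnings("unchecked")
        Map<String, Integer> copy = (Map<String, Integer>) in.readObject();
        System.out.println(copy); // {first=1, second=2, third=3}
    }
}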
Serialization turns the pre-existing bytes from memory into a universal form.
This is done because different systems lay out memory in different ways, so we cannot ensure that an object saved directly from memory on one machine can be loaded back properly on a different machine.
Maybe you can find more information on this page of the Oracle docs.
Explanation of object serialization from the book Thinking in Java:
When you create an object, it exists for as long as you need it, but under no circumstances does it exist when the program terminates. While this makes sense at first, there are situations in which it would be incredibly useful if an object could exist and hold its information even while the program wasn’t running. Then, the next time you started the program, the object would be there and it would have the same information it had the previous time the program was running. Of course, you can get a similar effect by writing the information to a file or to a database, but in the spirit of making everything an object, it would be quite convenient to declare an object to be "persistent," and have all the details taken care of for you.
Java’s object serialization allows you to take any object that implements the Serializable interface and turn it into a sequence of bytes that can later be fully restored to regenerate the original object. This is even true across a network, which means that the serialization mechanism automatically compensates for differences in operating systems. That is, you can create an object on a Windows machine, serialize it, and send it across the network to a Unix machine, where it will be correctly reconstructed. You don’t have to worry about the data representations on the different machines, the byte ordering, or any other details.
Hope this helps you.
Let's go with that mindset: we take the object as is, and we send it as a byte array over the network. Another socket/HTTP handler receives that byte array.
Now, two things come to mind:
How many bytes to send?
What are these bytes? What class do these bytes represent?
You will have to provide this data as well, so for this action alone we need two extra steps.
Now, in C# and Java, as opposed to C++, objects are scattered throughout the heap, and each object holds references to the objects it contains, so now we have another requirement:
Recursively "catch" all the inner objects and pack them into the byte array.
Now we have a packed byte array which represents some object hierarchy, and we need to tell the other side how to unpack this byte array back into the object plus the objects it holds, so:
Send information on how to unpack that byte array into an object hierarchy.
Some things an object holds cannot be sent over the network, such as functions. So now we have yet another step:
Strip away things that cannot be serialized, like functions.
This process goes on and on; for every new solution you will find many problems. Serialization is the process of taking that byte array you are talking about and making it something that can be handled in other environments, like networks and files.
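Java's built-in mechanism handles all of those steps for you. A minimal sketch, with a hypothetical class whose non-serializable member is marked transient so it gets stripped away:

import java.io.*;

class Session implements Serializable {
    String user = "alice";
    transient Runnable onClose = () -> System.out.println("bye"); // stripped during serialization
}

class TransientSketch {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new Session());

        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        Session copy = (Session) in.readObject();
        System.out.println(copy.user);    // alice
        System.out.println(copy.onClose); // null: the transient field was not serialized
    }
}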
Suppose I'm using a Deflater to compress a stream of bytes, and at some intervals I have the option of feeding it with two different byte arrays (two alternative representations of the same info), so that I can choose the most compressible one. Ideally, I would like to be able to clone the state of a "live" deflater, so that I can feed each instance with an array, check the results, and discard the undesirable one.
Alternatively, I'd like to mark the current state (sort of a savepoint) so that, after feeding and compressing with setInput() + deflate() I can rollback/reset to that state to try with different data.
Looking at the API, this seems rather impossible... not even by reimplementing the Deflater (at least not if one wants to take advantage of the internal native implementation). Am I right? Any ideas or experiences?
It does not appear that the Java interface to zlib provides zlib's deflateCopy() operation. It is possible that the inherited clone operation is properly implemented and does a deflateCopy(), but I don't know.
I'm trying to design a lightweight way to store persistent data in Java. I've already got a very efficient way to serialize POJOs to DataOutputStreams (and back), but I'm trying to think of a good way to ensure that changes to the data in the POJOs get serialized when necessary.
This is for a client-side app where I'm trying to keep the size of the eventual distributable as low as possible, so I'm reluctant to use anything that would pull in heavyweight dependencies. Right now my distributable is almost 10MB, and I don't want it to get much bigger.
I've considered DB4O, but it's too heavy; I need something light. Really, it's probably more a design pattern I need than a library.
Any ideas?
The 'lightest weight' persistence option will almost surely be simply marking some classes Serializable and reading/writing from some fixed location. Are you trying to accomplish something more complex than this? If so, it's time to bundle hsqldb and use an ORM.
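For what it's worth, the simplest version of that looks roughly like this sketch (the Settings class and file location are hypothetical):

import java.io.*;

class Settings implements Serializable {
    int volume = 7;
}

class SimplePersistence {
    // Hypothetical fixed location in the user's home directory.
    static final File STORE = new File(System.getProperty("user.home"), ".myapp.dat");

    static void save(Settings s) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(STORE))) {
            out.writeObject(s);
        }
    }

    static Settings load() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(STORE))) {
            return (Settings) in.readObject();
        }
    }
}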
If your users are tech savvy, or you're just worried about initial payload, there are libraries which can pull dependencies at runtime, such as Grape.
If you already have a compact data output format in bytes (which I assume you have if you can persist efficiently to a DataOutputStream), then an efficient and general technique is to use run-length encoding on the difference between the previous byte array output and the new one; a sketch of the diff step follows the notes below.
Points to note:
If the object has not changed, the difference between the byte arrays will be an array of zeros, and hence will compress to almost nothing.
The first time you serialize the object, treat the previous output as all zeros, so that you communicate a complete set of data.
You probably want to be a bit clever when the object has variable-sized substructures.
You can also try zipping the difference rather than run-length encoding it; this might be more efficient in cases where you have a large object graph with a lot of changes.
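Here is a sketch of the diff step under the assumption of fixed-length snapshots; the RLE here is a toy encoding (each zero run becomes a 0 marker plus a run length), not a production codec:

import java.io.ByteArrayOutputStream;

class DiffSketch {
    // XOR the new snapshot against the previous one; unchanged bytes become zero.
    static byte[] diff(byte[] prev, byte[] next) {
        byte[] out = new byte[next.length];
        for (int i = 0; i < next.length; i++) {
            out[i] = (byte) (next[i] ^ (i < prev.length ? prev[i] : 0));
        }
        return out;
    }

    // Toy RLE: each zero run becomes (0, runLength); other bytes pass through.
    static byte[] rle(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length) {
            if (data[i] == 0) {
                int run = 0;
                while (i < data.length && data[i] == 0 && run < 255) { run++; i++; }
                out.write(0);
                out.write(run);
            } else {
                out.write(data[i++]);
            }
        }
        return out.toByteArray();
    }
}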
Learning Java, so be gentle please. Ideally I need to create an array of bytes that will point to a portion of a bigger array:
byte[] big = new byte[1000];
// C-style code starts
load(file,big);
byte[100] sub = big + 200;
// C-style code ends
I know this is not possible in Java, and two workarounds come to mind:
Either copying a portion of big into sub by iterating through big,
or writing my own class that takes a reference to big plus an offset and a size, and implements the "subarray" through accessor methods using big as the actual underlying data structure.
The task I am trying to solve is to load a file into memory and then gain read-only access to the records stored within the file through a class. Speed is paramount, hence ideally I'd like to avoid copying or accessor methods. And since I'm learning Java, I'd like to stick with it.
Any other alternatives I've got? Please do ask questions if I didn't explain the task well enough.
Creating an array as a "view" of an other array is not possible in Java. But you could use java.nio.ByteBuffer, which is basically the class you suggest in work-around #2. For instance:
ByteBuffer subBuf = ByteBuffer.wrap(big, 200, 100).slice().asReadOnlyBuffer();
No copying involved (some object creation, though). As a standard library class, I'd also assume that ByteBuffer is more likely to receive special treatment wrt. "JIT" optimizations by the JVM than a custom one.
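As a hypothetical follow-up, reads on the slice are relative to its own origin, and writes are rejected:

byte first = subBuf.get(0);  // reads big[200]
byte last = subBuf.get(99);  // reads big[299]
// subBuf.put(0, (byte) 1);  // would throw ReadOnlyBufferException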
If you want to read a file fast and with low-level access, check out the java.nio stuff. Here's an example from the Java Almanac.
You can use a mapped byte buffer to navigate within the file content.
Take a look at the source for java.lang.String (it'll be in src.zip or src.jar). You will see that it has an array of chars plus an offset and a count. So, yes, the solution is to use a class to do it.
Here is a link to the source online.
The variables of interest are:
value
offset
count
substring is probably a good method to look at as a starting point.
If you want to read directly from the file, make use of the java.nio.channels.FileChannel class, specifically the map() method; that will let you use memory-mapped I/O, which will be very fast and use less memory than copying to arrays.
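A minimal sketch of that approach (the file name and record offset are hypothetical):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedReaderSketch {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("records.dat", "r");
             FileChannel ch = raf.getChannel()) {
            // Map the whole file read-only; nothing is copied into a byte[].
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Random access to any record, e.g. a 4-byte field at offset 200:
            int field = buf.getInt(200);
            System.out.println(field);
        }
    }
}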