Partial deserialization of a huge binary file - Java

This is my first question on StackOverflow. Please let me know if the question is unclear or needs more detail.
I have a class which has three attributes like this:
class SampleClass {
    long[] field1;
    float[] field2;
    float[] field3;
}
A huge SampleClass object is built (with about a billion entries in each array). This object is serialized on one host, and the serialized file is uploaded to another machine. I now want to deserialize only a portion of the file, so that I get a smaller SampleClass object with about 10 indices filled for each field rather than the complete object, because this machine does not have enough memory to load such a huge object. Is this possible?
The object is serialized using Java's writeObject method by a different utility, so I have no control over how it is written. Thanks in advance.

Forget using the Java serialization API: it is designed to deserialize an entire object, not part of one. If you have no control over how the serialized file is generated, consider parsing the serialized file yourself and extracting the necessary parts.
The Java serialization format is well documented (see e.g. official docs, informative article), and tools exist to parse it (e.g. Serialysis, jdeserialize), though it isn't particularly hard to write your own tool based on the format spec.
Once you can parse the serialized data, you can simply extract what you need and skip over what you don't need.
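As a rough illustration of the "parse it yourself and skip what you don't need" idea, here is a minimal sketch. PartialArrayReader and its methods are hypothetical names, not an existing library. It assumes the stream contains a single object whose fields are all distinct, non-null long[] or float[] arrays, written by default serialization (no custom writeObject, no serializable superclass); anything else in the stream would need more of the format spec implemented.

```java
import static java.io.ObjectStreamConstants.*;

import java.io.*;
import java.util.*;

// Hypothetical helper: walks the stream by hand and keeps only a slice of each array.
class PartialArrayReader {
    // Stand-in for the class from the question (it must be Serializable to be written).
    static class SampleClass implements Serializable {
        private static final long serialVersionUID = 1L;
        long[] field1;
        float[] field2;
        float[] field3;
    }

    private final DataInputStream in;
    // Handle table: each new class descriptor, string, object and array takes the next slot.
    private final List<String> handles = new ArrayList<>();

    PartialArrayReader(InputStream raw) {
        in = new DataInputStream(new BufferedInputStream(raw));
    }

    // Returns field name -> long[] or float[] holding elements [from, from + count).
    Map<String, Object> readSlices(int from, int count) throws IOException {
        if (in.readShort() != STREAM_MAGIC || in.readShort() != STREAM_VERSION)
            throw new StreamCorruptedException("not a Java serialization stream");
        expect(TC_OBJECT);
        List<String> fields = new ArrayList<>();
        readClassDesc(fields);
        handles.add(null);                                    // handle of the object itself
        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : fields) {
            expect(TC_ARRAY);
            String type = readClassDesc(new ArrayList<>());   // "[J" = long[], "[F" = float[]
            boolean isLong = type.equals("[J");
            if (!isLong && !type.equals("[F"))
                throw new StreamCorruptedException("unsupported array type " + type);
            int length = in.readInt();
            handles.add(null);                                // handle of the array itself
            if (from < 0 || count < 0 || (long) from + count > length)
                throw new IndexOutOfBoundsException(field);
            int size = isLong ? 8 : 4;
            skipFully((long) from * size);                    // elements before the slice
            if (isLong) {
                long[] slice = new long[count];
                for (int i = 0; i < count; i++) slice[i] = in.readLong();
                result.put(field, slice);
            } else {
                float[] slice = new float[count];
                for (int i = 0; i < count; i++) slice[i] = in.readFloat();
                result.put(field, slice);
            }
            skipFully((long) (length - from - count) * size); // elements after the slice
        }
        return result;
    }

    // Reads a new class descriptor or a back-reference; returns the class name.
    private String readClassDesc(List<String> fieldNames) throws IOException {
        byte tc = in.readByte();
        if (tc == TC_REFERENCE) return handles.get(in.readInt() - baseWireHandle);
        if (tc != TC_CLASSDESC) throw new StreamCorruptedException("unexpected type code " + tc);
        String className = in.readUTF();
        in.readLong();                                        // serialVersionUID
        handles.add(className);
        in.readByte();                                        // flags
        int fieldCount = in.readShort();
        for (int i = 0; i < fieldCount; i++) {
            char typeCode = (char) in.readByte();
            fieldNames.add(in.readUTF());
            if (typeCode == '[' || typeCode == 'L') {         // type name: new string or reference
                if (in.readByte() == TC_STRING) handles.add(in.readUTF());
                else in.readInt();
            }
        }
        expect(TC_ENDBLOCKDATA);                              // assumes no class annotations
        expect(TC_NULL);                                      // assumes no serializable superclass
        return className;
    }

    private void expect(byte typeCode) throws IOException {
        byte actual = in.readByte();
        if (actual != typeCode) throw new StreamCorruptedException("unexpected type code " + actual);
    }

    private void skipFully(long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);                        // may skip fewer bytes than asked
            if (skipped <= 0 && in.read() < 0) throw new EOFException();
            n -= Math.max(skipped, 1);                        // a successful read() consumed one byte
        }
    }
}
```

With a real file, something like new PartialArrayReader(new FileInputStream(path)).readSlices(0, 10) would return the first 10 entries of each array while only ever allocating the small slices.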

Given that you cannot control or override the serialization itself, your best bet is to serialize only the portion you need. On the machine that serialized the entire file and is able to deserialize it:
1) load the entire file into an object
2) create a new SampleClass object
3) copy the elements from the required region of each array into the new SampleClass object
4) serialize this smaller version
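Steps 2 to 4 above could look roughly like the sketch below. SliceExporter, slice and write are hypothetical names, and the nested SampleClass is a stand-in for the real class (assumed to implement Serializable). Step 1 would be an ordinary ObjectInputStream.readObject() call on the machine that has enough memory.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Arrays;

class SliceExporter {
    // Stand-in for the class from the question.
    static class SampleClass implements Serializable {
        private static final long serialVersionUID = 1L;
        long[] field1;
        float[] field2;
        float[] field3;
    }

    // Steps 2 and 3: a new object that holds only elements [from, to) of each array.
    static SampleClass slice(SampleClass full, int from, int to) {
        SampleClass small = new SampleClass();
        small.field1 = Arrays.copyOfRange(full.field1, from, to);
        small.field2 = Arrays.copyOfRange(full.field2, from, to);
        small.field3 = Arrays.copyOfRange(full.field3, from, to);
        return small;
    }

    // Step 4: serialize the smaller object; the stream is closed when done.
    static void write(SampleClass small, OutputStream target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(target)) {
            out.writeObject(small);
        }
    }
}
```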
If it helps any, fields can be made transient so they will not be serialized.
Still, it looks to me like this object belongs in a database:
It does not fit in virtual memory.
Only a portion of it is required at any given time.
So you could store it on the hard disk and use queries to get the required portions.

Related

Serialize same type objects to the same file

I want objects of the same serializable class to be serialized to the same file. I have tried putting everything in an array and serializing that, but I want the objects to be serialized individually and saved into the same file.
If you want to serialize multiple objects, why not serialize a collection of those objects? When you deserialize the collection you can access the objects again by iterating. Or, if you have some unique identifier for each object, you can put them in a map instead and serialize that.
It is worth mentioning that there has been talk of phasing out built-in Java serialization in future versions. In my opinion you are better off using a JSON serializer/deserializer, unless of course you are trying to hide the contents somehow. I use FasterXML myself, and it works really well with POJOs.
Have you already tried this?
FileOutputStream fout = new FileOutputStream("YourPath", true);
Passing true as the second parameter opens the file in append mode.
So you can serialize the objects individually and they will all end up in the same file.
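One caveat with append mode: every new ObjectOutputStream writes its own stream header, so a single ObjectInputStream cannot read past the first appended object. The sketch below (AppendSerialization, append and readAll are hypothetical names) shows one way to deal with that, by opening a fresh ObjectInputStream for each appended segment on top of the same underlying stream. It assumes exactly one object is written per append.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PushbackInputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class AppendSerialization {
    // Appends one object; the second constructor argument (true) opens the file in append mode.
    static void append(File file, Serializable obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file, true))) {
            out.writeObject(obj);
        }
    }

    // Reads every appended object back, one ObjectInputStream per stream header.
    static List<Object> readAll(File file) throws IOException, ClassNotFoundException {
        List<Object> result = new ArrayList<>();
        try (PushbackInputStream in =
                 new PushbackInputStream(new BufferedInputStream(new FileInputStream(file)))) {
            int next;
            while ((next = in.read()) != -1) {   // peek: is there another segment?
                in.unread(next);
                // Not closed here, because closing it would also close the shared stream.
                ObjectInputStream segment = new ObjectInputStream(in);
                result.add(segment.readObject());
            }
        }
        return result;
    }
}
```

An alternative design is to keep a single ObjectOutputStream open and call writeObject repeatedly, which avoids the extra headers altogether.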

Sending object from python to java

In Python I have a class with some string attributes and a function that returns an object of this class with its attributes set (it can sometimes return an array of objects).
Is there any way to get this return value in Java, so that I can see the strings of the object?
I tried Jython but couldn't make it work!
Use the json.dump function in Python to serialize your object into JSON format. Then use something like the json.org library in Java to parse it into a Java object; some example over here.
Mind that not every object is serializable. In general, data structures like dictionaries or lists are easily serializable. From your description it seems you want to move an instance of an object from one program into another, which cannot be done automatically and requires rewriting the code by hand, because instances of classes contain not only data but also functions (methods).
Good luck!

Parsing data from untrusted Java serialized object

I need to parse untrusted Java serialized objects. The data is given to me as a byte array (written at some point by ObjectOutputStream).
I do not want to simply call ObjectInputStream.readObject() and/or load the actual object. I am looking for a way to safely parse the bytes and grab field names & values.
--
Here's a little summary of my attempt so far, after taking a look at the ObjectInputStream procedure for deserializing objects.
I have tried to extract field types/names (as unicode strings) recursively based on expected stream constants. I end up with a list of field names whose values should appear in the byte array in order. I am uneasy about this approach because it is probably buggy. Especially accommodating for what seems to be individual serialization protocols followed by HashMap, ArrayList, etc. But it might work, if I can figure out a way to read the bytes that represent field values:
I can try to read and store primitives based on size/offset, but when I encounter my first object, it gets a bit more complicated -- there is no clear way to distinguish between which bytes are associated with which values anymore (without actually loading the object in the way that ObjectInputStream probably does?).
--
Can anyone suggest either a potential solution that I'm obviously looking past, or a trusted library that can help parse the serialized data without loading objects?
Thank you for reading, and for all comments/suggestions!!! I apologize if something is unclear and I would be happy to clarify if you bear with me.
You can't do this in principle. Any Java class can take over its own Serialization and write arbitrary data to the stream that only it knows how to parse and reconstruct, via code that is only invoked during deserialization.
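To illustrate the point, here is a small sketch of a class that takes over its own serialization. OpaquePoint is a made-up example, not taken from the question. Its two values are packed into a single long that appears in the stream as anonymous block data, so a generic parser can see the 8 bytes but has no field names or types to attach to them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class whose stream layout is known only to its own code.
class OpaquePoint implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient int x;   // transient: not listed in the class descriptor
    private transient int y;

    OpaquePoint(int x, int y) { this.x = x; this.y = y; }
    int x() { return x; }
    int y() { return y; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();                             // no non-transient fields to write
        out.writeLong(((long) x << 32) | (y & 0xFFFFFFFFL));  // private packing of both values
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        long packed = in.readLong();                          // only this code knows the layout
        x = (int) (packed >> 32);
        y = (int) packed;
    }

    static byte[] serialize(OpaquePoint p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.toByteArray();
    }

    // For trusted data only: this runs the class's own readObject code.
    static OpaquePoint deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (OpaquePoint) in.readObject();
        }
    }
}
```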

Store byte array and string in same file

Earlier I was storing only strings in my file, which is kept on the SD card; now I would like to store byte[] in the same file as well. So do I just need to write to the file as usual, like this:
for string:- bufferWritter.write(data);
for bytes:- FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray)
or fos.write(myByteArray);
If I do it like that, how can I tell whether a message is a string or a byte[]?
I would also like to know whether this is a good way to do it.
You would need to write some sort of header to your file stating what the data represents.
If you have multiple items in one file, also specify the length of each data part.
[byte] kind of data (0=String, 1=Image)
And then the actual data.
But I would recommend you use a different format like JSON, or make a serializable object.
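The header-based layout described above could be sketched as follows: each record is a type byte, a 4-byte length, and then the payload. TaggedRecords and its constants are hypothetical names, and encoding strings as UTF-8 is an assumption of this sketch.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TaggedRecords {
    static final int TYPE_STRING = 0;   // payload is UTF-8 text
    static final int TYPE_BYTES = 1;    // payload is raw bytes (e.g. an image)

    static void writeString(DataOutputStream out, String text) throws IOException {
        writeRecord(out, TYPE_STRING, text.getBytes(StandardCharsets.UTF_8));
    }

    static void writeBytes(DataOutputStream out, byte[] data) throws IOException {
        writeRecord(out, TYPE_BYTES, data);
    }

    // Record layout: [1 byte type][4 bytes length][payload]
    private static void writeRecord(DataOutputStream out, int type, byte[] payload)
            throws IOException {
        out.writeByte(type);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads records until end of stream; each element is either a String or a byte[].
    static List<Object> readAll(DataInputStream in) throws IOException {
        List<Object> records = new ArrayList<>();
        int type;
        while ((type = in.read()) != -1) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            if (type == TYPE_STRING) {
                records.add(new String(payload, StandardCharsets.UTF_8));
            } else if (type == TYPE_BYTES) {
                records.add(payload);
            } else {
                throw new IOException("unknown record type " + type);
            }
        }
        return records;
    }
}
```

Wrapping a FileOutputStream opened in append mode in a DataOutputStream lets the same file grow record by record, since this layout has no file-level header.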
I would try a JSON implementation (maybe GSON, or some alternative) so you can store mixed data types in one file, or write your own de/serialization routine so you can store whole objects.
Note: if the class that represents the stored object implements the Serializable interface, don't forget to update its serialVersionUID each time you change the contents of that class; it will save you some time figuring out what went wrong.

Java statically defined dictionary / HashMap

I have a block of static data that I need to organize into an array containing hash maps. Specifically, I want to have a static object in my app that contains the time zone information like this: https://gist.github.com/pamelafox/986163
Seeing how clean the definition looks in Python, and knowing how a similarly clean definition can be created in some of the other languages I know, I was hoping there is a cleaner approach in Java than just running map.put(...) repeatedly. I have seen this question: How to give the static value to HashMap? but was wondering if there is a better way to do it.
One solution would be to store the data as a normal string in whatever format you can think of and then convert the string representation into the map (static, non-static or as a one-time initialized instance).
An improvement on this method would be to store the data in a file and load it (the file can be included in the .jar package if you use one). This solution has the advantage that the data can be easily updated.
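A minimal sketch of the string-based approach might look like this. The class name, the "key=value;key=value" format and the sample entries are all made up for illustration; the real time zone data would replace them.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class TimeZoneTable {
    // Placeholder data in an assumed "key=value;key=value" format.
    private static final String RAW =
          "America/New_York=Eastern Time;"
        + "Europe/Berlin=Central European Time;"
        + "Asia/Tokyo=Japan Standard Time";

    // Parsed once when the class is initialized, then read-only.
    static final Map<String, String> ZONES = parse(RAW);

    static Map<String, String> parse(String raw) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String entry : raw.split(";")) {
            String[] keyValue = entry.split("=", 2);   // split on the first '=' only
            map.put(keyValue[0].trim(), keyValue[1].trim());
        }
        return Collections.unmodifiableMap(map);
    }
}
```

The same parse method could be fed the contents of a resource file instead of the RAW constant, which is the file-based variant mentioned above.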
