Reading and writing objects via GZIP streams? - java

I am new to Java. I want to learn to use GZIP streams. I have already tried this:
ArrayList<SubImage> myObject = new ArrayList<SubImage>(); // SubImage is a Serializable class
ObjectOutputStream compressedOutput = new ObjectOutputStream(
        new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(
                new File("....")))));
compressedOutput.writeObject(myObject);
and
ObjectInputStream compressedInput = new ObjectInputStream(
        new BufferedInputStream(new GZIPInputStream(new FileInputStream(
                new File("....")))));
myObject = (ArrayList<SubImage>) compressedInput.readObject();
The program writes myObject to the file without throwing any exception, but when it reaches the line
myObject=(ArrayList<SubImage>)compressedInput.readObject();
it throws this exception:
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
How can I solve this problem?

You have to close your output stream. Otherwise the BufferedOutputStream will not write everything to the file (it writes in big chunks to avoid penalizing performance), and the GZIPOutputStream will not write the trailer that ends the compressed data.
Calling compressedOutput.close() will suffice: it flushes every stream in the chain before closing it. Calling compressedOutput.flush() alone is not enough, because it does not finish the GZIP stream.
You can try writing a simple String object and checking whether the file is written correctly.
How? If you write an xxx.txt.gz file you can open it with your preferred zip app and look at xxx.txt. If the app complains, the content was not fully written.
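A minimal sketch of the full round trip, using try-with-resources so that the whole stream chain is closed (and the GZIP trailer written) before the file is read back. The class name, the temporary file, and the use of String elements in place of SubImage are assumptions made for illustration, not part of the question's code.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipObjectRoundTrip {

    // Closing the outermost stream flushes the buffers and lets
    // GZIPOutputStream write its trailer.
    static void write(File file, ArrayList<String> list) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(file))))) {
            out.writeObject(list);
        }
    }

    @SuppressWarnings("unchecked")
    static ArrayList<String> read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new GZIPInputStream(new FileInputStream(file))))) {
            return (ArrayList<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("objects", ".gz"); // hypothetical location
        file.deleteOnExit();
        ArrayList<String> original = new ArrayList<>();
        original.add("first");
        original.add("second");
        write(file, original);
        System.out.println(read(file).equals(original));
    }
}
```

If the try-with-resources block in write is replaced by a plain writeObject call without close(), reading the file back should fail with the EOFException from the question.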
Extended answer to a comment: compressing even more the data
Changing serialization
You could change the standard serialization of the SubImage class if it's a class of your own. Check the java.io.Serializable javadoc to see how to do it. It's pretty straightforward.
Writing just what you need
Serialization has the drawback that it needs to write "this is a SubImage" before the instances you write. That's not necessary if you know beforehand what is going to be there. So you could try to serialize it more manually.
To write your list, instead of writing an object, write directly the values that make up your list. You will need just a DataOutputStream (ObjectOutputStream implements the same DataOutput interface, so you can use it as well).
dos.writeInt(yourList.size()); // tell how many items follow
for (SubImage si : yourList) {
    // write every field, in order (this should be a method called writeSubImage :)
    dos.writeInt(...);
    dos.writeInt(...);
    ...
}
// to read the thing back:
int size = dis.readInt();
for (int i = 0; i < size; i++) {
    // read every field, in the same order (this should be a method called readSubImage :)
    ... = dis.readInt(); // readInt() takes no arguments
    ... = dis.readInt();
    ...
    // create the SubImage
    // add it to the list you are recreating
}
This method is more manual, but if:
you know what's going to be written
you will not need this kind of serialization for many types
then it's pretty affordable and definitely more compact than the Serializable counterpart.
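The manual approach could be sketched as follows. The real SubImage fields are not shown in the question, so the four int fields used here (x, y, width, height) are purely hypothetical, as are the class and method names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ManualSubImageIO {

    // Hypothetical stand-in for the question's SubImage class.
    static final class SubImage {
        final int x, y, width, height;

        SubImage(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    static void writeList(DataOutputStream dos, List<SubImage> list) throws IOException {
        dos.writeInt(list.size()); // item count first
        for (SubImage si : list) {
            dos.writeInt(si.x);
            dos.writeInt(si.y);
            dos.writeInt(si.width);
            dos.writeInt(si.height);
        }
    }

    static List<SubImage> readList(DataInputStream dis) throws IOException {
        int size = dis.readInt();
        List<SubImage> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            // fields are read in exactly the order they were written
            int x = dis.readInt();
            int y = dis.readInt();
            int width = dis.readInt();
            int height = dis.readInt();
            list.add(new SubImage(x, y, width, height));
        }
        return list;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            writeList(dos, Arrays.asList(new SubImage(0, 0, 16, 16), new SubImage(16, 0, 16, 16)));
        }
        // 4 bytes for the count plus 16 bytes per item
        System.out.println(bytes.size() + " bytes");
        List<SubImage> back = readList(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(back.size() + " items read back");
    }
}
```

The same two methods work unchanged on top of a GZIPOutputStream / GZIPInputStream chain, since they only depend on DataOutputStream and DataInputStream.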
Keep in mind that there are alternative frameworks for serializing objects or creating string messages (XStream for XML, Google Protocol Buffers for binary messages, and so on). Those frameworks either write binary data directly or produce a string that can then be written.
If your app will need more of this, or you are just curious, you should take a look at them.
Alternative serialization frameworks
Just looked in SO and found several questions (and answers) addressing this issue:
https://stackoverflow.com/search?q=alternative+serialization+frameworks+java
I've found that XStream is pretty easy and straightforward to use. And JSON is a pretty readable and succinct format (and JavaScript compatible, which could be a plus :).
I would go for:
Object -> JSON -> OutputStreamWriter(UTF-8) -> GZIPOutputStream -> FileOutputStream
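A rough sketch of the last three stages of that chain. The Object -> JSON step is left out, since it depends on whichever JSON library you pick; the hard-coded JSON string, the class name, and the temporary file below are assumptions for illustration only.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipJsonText {

    // Text -> UTF-8 bytes -> GZIP -> file; close() finishes the GZIP stream.
    static void write(File file, String json) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(file)), StandardCharsets.UTF_8)) {
            writer.write(json);
        }
    }

    static String read(File file) throws IOException {
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new FileInputStream(file)), StandardCharsets.UTF_8)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("subimages", ".json.gz"); // hypothetical location
        file.deleteOnExit();
        // Assumed to be the output of a JSON library for a list of SubImage objects.
        String json = "[{\"x\":0,\"y\":0},{\"x\":16,\"y\":0}]";
        write(file, json);
        System.out.println(read(file));
    }
}
```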

Related

Skipping bytes with the Jackson JsonParser

I currently have a FileInputStream that I know contains interleaved objects (Metadata.class and BigInfo.class) in json format, ordered like:
[Metadata1, BigInfo1, Metadata2, BigInfo2, Metadata3, BigInfo3, ...]
I'm using Jackson's JsonParser to read these like parser.readValueAs(Metadata.class) and parser.readValueAs(BigInfo.class).
One thing I'd like to take advantage of is that the Metadata objects contain the length of the following serialized BigInfo objects, as well as whether I need to read it or not. So I want to be able to skip the appropriate number of bytes corresponding to a BigInfo object, if I don't need to read it:
Metadata metadata = parser.readValueAs(Metadata.class);
// Whether I need to read the BigInfo object that comes after
boolean mustRead = metadata.isMustReadBigInfo();
if (!mustRead) {
    // Size of the bigInfo object that comes after
    int bigInfoSize = metadata.getBigInfoSize();
    parser.skip(bigInfoSize); // This 'skip' method is needed
}
I can achieve "skipping" by using parser.skipChildren(), but this will read (and discard) all bytes of the inputStream sequentially, and will be comparatively much slower than the underlying FileInputStream's 'skip' method, which makes use of a random access 'seek' into a position in the file.
I've tried calling 'skip(bigInfoSize)' on the parser's underlying inputStream. However, this doesn't work since JsonParser reads and stores information from the inputStream in an internal buffer, so the inputStream's position is further along than where the parser is at.
Any ideas on how to approach this would be greatly appreciated.
Thanks!
So after looking around for quite a bit, I don't think there's a clean way to do this with the jsonParser.
I ended up implementing a reader for a general InputStream that looked for '{' and '}' (taking nested objects into account, of course) and parsed the underlying object out of the retrieved byte array through ObjectMapper.
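The answer does not include that reader, so the following is only a guess at what such a brace-matching scanner might look like. The class and method names are made up, and it additionally tracks string literals so that braces inside strings are not counted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class JsonObjectScanner {

    // Returns the raw bytes of the next top-level {...} object, or null if the
    // stream ends before a complete object was seen. Everything between
    // objects (commas, brackets, whitespace) is skipped.
    static byte[] nextObject(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        int b;
        while ((b = in.read()) != -1) {
            if (depth == 0) {
                if (b == '{') {
                    depth = 1;
                    out.write(b);
                }
                continue;
            }
            out.write(b);
            if (inString) {
                if (escaped) {
                    escaped = false;       // character after a backslash is literal
                } else if (b == '\\') {
                    escaped = true;
                } else if (b == '"') {
                    inString = false;
                }
            } else if (b == '"') {
                inString = true;
            } else if (b == '{') {
                depth++;
            } else if (b == '}' && --depth == 0) {
                return out.toByteArray();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "[{\"size\":12}, {\"payload\":\"a}b\"}]".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        byte[] object;
        while ((object = nextObject(in)) != null) {
            System.out.println(new String(object, StandardCharsets.UTF_8));
        }
    }
}
```

Because the scanner reads one byte at a time and keeps no look-ahead, the stream position after each call is exactly the end of the returned object. That should make it possible to call skip(n) on the same stream before the next call, and each returned byte array can be handed to ObjectMapper.readValue. For file input it is worth wrapping the stream in a BufferedInputStream and calling skip on that wrapper.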
You might be able to do something like this:
RandomAccessFile file = new RandomAccessFile(filename, "r");
InputStream inputStream = Channels.newInputStream(file.getChannel());
file.seek(byteLocationToSkipTo); // moves the file pointer, which the channel shares, to this location
// take the factory from an ObjectMapper so that readValueAs() has a codec to bind with
JsonParser parser = new ObjectMapper().getFactory().createParser(inputStream);
Metadata metadata = parser.readValueAs(Metadata.class);

How do I check if ObjectInputStream has something to read? [duplicate]

I'm using an ObjectInputStream to call readObject for reading in serialized objects. I would like to avoid having this method block, so I'm looking to use something like InputStream.available().
InputStream.available() will tell you there are bytes available and that read() will not block. Is there an equivalent method for serialization that will tell you whether there are objects available and readObject will not block?
No. Although you could use the ObjectInputStream in another thread and check to see whether that has an object available. Generally polling isn't a great idea, particularly with the poor guarantees of InputStream.available.
The Java serialization API was not designed to support an available() function. If you implement your own object reader/writer functions, you can read any amount of data off the stream you like, and there is no reporting method.
So readObject() does not know how much data it will read, so it does not know how many objects are available.
As the other post suggested, your best bet is to move the reading into a separate thread.
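One possible shape for that separate thread, as a sketch only: a background thread blocks in readObject() and hands every object to a BlockingQueue, and the consumer then uses the non-blocking poll() instead of asking the stream whether something is available. The class name and the in-memory stream are assumptions for the demo, and the sketch assumes no null objects are written (a queue cannot hold null).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BackgroundObjectReader {

    // Starts a daemon thread that does the blocking reads and returns the
    // queue that receives the deserialized objects.
    static BlockingQueue<Object> start(InputStream source) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        Thread reader = new Thread(() -> {
            try (ObjectInputStream ois = new ObjectInputStream(source)) {
                while (true) {
                    queue.put(ois.readObject()); // blocks here, not in the caller
                }
            } catch (EOFException endOfStream) {
                // the writer closed the stream: nothing more to read
            } catch (IOException | ClassNotFoundException | InterruptedException e) {
                e.printStackTrace();
            }
        }, "object-reader");
        reader.setDaemon(true);
        reader.start();
        return queue;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject("hello");
            oos.writeObject(Integer.valueOf(42));
        }
        BlockingQueue<Object> queue = start(new ByteArrayInputStream(bytes.toByteArray()));
        // poll() with no arguments returns null immediately when nothing is ready;
        // the timed variant is used here only so the demo waits for the reader.
        System.out.println(queue.poll(1, TimeUnit.SECONDS));
        System.out.println(queue.poll(1, TimeUnit.SECONDS));
    }
}
```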
I have an idea that by adding another InputStream into the chain one can make availability information readable by the client:
HACK!
InputStream is = ... // where we actually read the data
BufferedInputStream bis = new BufferedInputStream(is);
ObjectInputStream ois = new ObjectInputStream(bis);
if (bis.available() > N) {
    Object o = ois.readObject();
}
The tricky point is the value of N. It should be big enough to cover both the serialization header and the object data. If those vary wildly, no luck.
The BufferedInputStream works for me. Why not just check if (bis.available() > 0) instead of comparing against N? That works perfectly for me.
I think ObjectInputStream.readObject blocks (that is, waits) when there is no input to be read. So if there is any input at all in the stream, i.e. if (bis.available() > 0), ObjectInputStream.readObject will not block. Keep in mind that ObjectInputStream.readObject might throw a ClassNotFoundException, which isn't a problem at all for me.

Is putting data into a ByteBuffer and then writing it to a file more efficient than writing the individual fields?

I want to write ONLY the values of the data members of an object into a file, so I can't use serialization, since it writes a whole lot of other information which I don't need. Here is what I have implemented in two ways: one using a ByteBuffer and the other without it.
Without using ByteBuffer:
1st method
public class DemoSecond {
    byte characterData;
    byte shortData;
    byte[] integerData;
    byte[] stringData;

    public DemoSecond(byte characterData, byte shortData, byte[] integerData,
            byte[] stringData) {
        super();
        this.characterData = characterData;
        this.shortData = shortData;
        this.integerData = integerData;
        this.stringData = stringData;
    }

    public static void main(String[] args) {
        DemoSecond dClass = new DemoSecond((byte) 'c', (byte) 0x7, new byte[]{3, 4},
                new byte[]{(byte) 'p', (byte) 'e', (byte) 'n'});
        File checking = new File("c:/objectByteArray.dat");
        try {
            if (!checking.exists()) {
                checking.createNewFile();
            }
            // POINT A
            FileOutputStream bo = new FileOutputStream(checking);
            bo.write(dClass.characterData);
            bo.write(dClass.shortData);
            bo.write(dClass.integerData);
            bo.write(dClass.stringData);
            // POINT B
            bo.close();
        } catch (FileNotFoundException e) {
            System.out.println("FNF");
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println("IOE");
            e.printStackTrace();
        }
    }
}
Using ByteBuffer: one more thing is that the size of the data members will always remain fixed, i.e. characterData = 1 byte, shortData = 1 byte, integerData = 2 bytes and stringData = 3 bytes. So the total size of this class is ALWAYS 7 bytes.
2nd method
// POINT A
FileOutputStream bo = new FileOutputStream(checking);
ByteBuffer buff= ByteBuffer.allocate(7);
buff.put(dClass.characterData);
buff.put(dClass.shortData);
buff.put(dClass.integerData);
buff.put(dClass.stringData);
bo.write(buff.array());
// POINT B
I want to know which of the two methods is more optimized, and the reason why.
The class DemoSecond above is just a sample class.
My original classes will be 5 to 50 bytes in size, so I don't think size will be the issue here.
But each of my classes is of fixed size, like DemoSecond.
Also, there are a great many objects of this type which I am going to write to the binary file.
PS
If I use serialization it also writes the words "characterData", "shortData", "integerData", "stringData" and other information which I don't want to write to the file. What I am concerned about here is THEIR VALUES ONLY. In this example that is: 'c', 7, 3, 4, 'p', 'e', 'n'. I want to write only these 7 bytes into the file, NOT the other information, which is USELESS to me.
As you are doing file I/O, you should bear in mind that the I/O operations are likely to be very much slower than any work done by the CPU in your output code. To a first approximation, the cost of I/O is an amount proportional to the amount of data you are writing, plus a fixed cost for each operating system call made to do the I/O.
So in your case you want to minimise the number of operating system calls made to do the writing. This is done by buffering data in the application, so that the application performs fewer but larger operating system calls.
Using a byte buffer, as you have done, is one way of doing this, so your ByteBuffer code will be more efficient than your FileOutputStream code.
But there are other considerations. Your example is not performing many writes, so it is likely to be very fast anyway. Any optimisation is likely to be a premature optimisation. Optimisations tend to make code more complicated and harder to understand. To understand your ByteBuffer code a reader needs to understand how a ByteBuffer works, in addition to everything they need to understand for the FileOutputStream code. And if you ever change the file format, you are more likely to introduce a bug with the ByteBuffer code (for example, by having too small a buffer).
Buffering of output is commonly done. So it should not surprise you that Java already provides code to help you. That code will have been written by experts, tested and debugged. Unless you have special requirements you should always use such code rather than writing your own. The code I am referring to is the BufferedOutputStream class.
To use it simply adapt your code that does not use the ByteBuffer, by changing the line of your code that opens the file to
OutputStream bo = new BufferedOutputStream(new FileOutputStream(checking));
The two methods differ only in the byte buffer allocated.
If you are concerned about unnecessary write operations on the file, there is already a BufferedOutputStream you can use, which allocates its buffer internally. If you are writing to the same output stream multiple times, it is definitely more efficient than allocating a buffer manually every time.
It would be simplest to use a DataOutputStream around a BufferedOutputStream around the FileOutputStream.
NB You can't squeeze 'shortData' into a byte. Use the various primitives of DataOutputStream, and use the corresponding ones of DataInputStream when reading them back.
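A small sketch of that DataOutputStream-over-BufferedOutputStream arrangement. The record layout here (one byte, one short, one int, three ASCII bytes, so 10 bytes instead of the question's 7) and all names are assumptions chosen to show the typed write and read methods, not the asker's real format.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class FixedRecordIO {

    // Writes only the field values; nothing but these bytes ends up in the file.
    static void write(File file, char c, short s, int i, String threeChars) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeByte(c);   // 1 byte: the low 8 bits of the char
            out.writeShort(s);  // 2 bytes
            out.writeInt(i);    // 4 bytes
            out.write(threeChars.getBytes(StandardCharsets.US_ASCII)); // 3 bytes for 3 ASCII chars
        }
    }

    // Reads the fields back with the matching DataInputStream methods, in the same order.
    static String read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            char c = (char) in.readUnsignedByte();
            short s = in.readShort();
            int i = in.readInt();
            byte[] text = new byte[3];
            in.readFully(text);
            return c + "," + s + "," + i + "," + new String(text, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("record", ".dat"); // hypothetical location
        file.deleteOnExit();
        write(file, 'c', (short) 7, 772, "pen");
        System.out.println(file.length() + " bytes: " + read(file));
    }
}
```

The buffering means the four writes reach the operating system as a single call when the stream is closed, which is the same effect the hand-rolled ByteBuffer was after.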

writing many java objects to a single file

How can I write many serializable objects to a single file and then read a few of the objects as and when needed?
You'd have to implement the indexing aspect yourself, but otherwise this could be done. When you serialize an object you write it to an OutputStream, which you can point wherever you want. Storing multiple objects in a file this way is straightforward.
The tough part comes when you want to read "a few" objects back. How are you going to know how to seek to the position in the file that contains the specific object you want? If you're always reading objects back in the same order you wrote them, from the start of the file onwards, this will not be a problem. But if you want to have random access to objects in the "middle" of the stream, you're going to have to come up with some way to determine the byte offset of the specific object you're interested in.
(This method would have nothing to do with serialization or even Java per se; you've got to design a scheme that will fit with your requirements and environment.)
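One such scheme, sketched under assumptions of my own: every object is serialized into its own length-prefixed blob, and the byte offset of each blob is recorded so that a single object can be read back later with a seek. The class name, the length prefix, and keeping the offsets in memory are all illustrative choices; a real design would also persist the index somewhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

class IndexedObjectFile {

    // Writes each object as an independent serialized blob and returns the
    // file offset at which each blob starts.
    static long[] writeAll(File file, List<? extends Serializable> objects) throws IOException {
        long[] offsets = new long[objects.size()];
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0); // start from an empty file
            for (int i = 0; i < objects.size(); i++) {
                ByteArrayOutputStream blob = new ByteArrayOutputStream();
                try (ObjectOutputStream oos = new ObjectOutputStream(blob)) {
                    oos.writeObject(objects.get(i));
                }
                offsets[i] = raf.getFilePointer();
                raf.writeInt(blob.size());      // length prefix
                raf.write(blob.toByteArray());  // the serialized object itself
            }
        }
        return offsets;
    }

    // Seeks straight to one blob and deserializes only that object.
    static Object readAt(File file, long offset) throws IOException, ClassNotFoundException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] blob = new byte[raf.readInt()];
            raf.readFully(blob);
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(blob))) {
                return ois.readObject();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("objects", ".bin"); // hypothetical location
        file.deleteOnExit();
        long[] offsets = writeAll(file, Arrays.asList("alpha", "beta", "gamma"));
        System.out.println(readAt(file, offsets[2]));
    }
}
```

The trade-off is that every blob carries its own stream header and class descriptors, so the file is larger than one written through a single ObjectOutputStream.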
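The writing part is easy. You just have to remember that you have to write all objects 'at once', through the same ObjectOutputStream. You can't create a file with serialized objects, close it, and open it again with a new ObjectOutputStream to append more objects: the new stream writes a second header, and you'll get a StreamCorruptedException when reading past the first batch.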
For deserializing, I think you have to process the complete file and keep the objects you're interested in. The others will be created but collected by the gc on the next occasion.
Put your objects into an Object[] and write the array with a single writeObject call. It worked for me.
I'd use a flat-file database (e.g. Berkeley DB Java Edition). Just write your nodes as rows in a table like:
Node
----
id
value
parent_id
To read several objects back from the file:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadObjectFromFile {
    // Reads back an Object[] that was written with a single writeObject call.
    public static Object[] readObject() throws IOException, ClassNotFoundException {
        byte[] bytes = Files.readAllBytes(Paths.get("src/objectFile.txt"));
        // try-with-resources closes the stream even if readObject throws
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Object[]) ois.readObject();
        }
    }
}

Java serialization problem

I have following code to serialize my data into a file:
out = new ObjectOutputStream(new FileOutputStream(file));
out.writeObject(chunk);
out.flush();
I read with the following:
in = new ObjectInputStream(new FileInputStream(file));
Chunk chunk = (Chunk) in.readObject();
The weird thing is, when I read the data, all members are set to default and I get no data back that I wrote before.
If I use the XML variant all works fine.
e = new XMLEncoder(new FileOutputStream(file));
e.writeObject(chunk);
e.flush();
and
e = new XMLDecoder(new FileInputStream(file));
Chunk chunk = (Chunk) e.readObject();
What is wrong with the binary format?
Update
OK, I've got it now: Chunk is a complex class that contains other classes, which in turn contain further classes, and so on. At some point one of the contained fields is declared as Object when it should be Serializable, as Steve mentioned.
Thank you for your answers.
While I can't think of a good reason why one decoder would work differently than another, I'd suggest posting the code of the Chunk object. Things to look at:
Are you declaring any fields transient? These won't get serialized.
Are any of the problems occurring with nested objects or collections which themselves may not be serializable?
Are the defaults overwritten in the constructor, or somewhere else that's not going to be called in a deserialization operation?
The only reason I can think of for fields being set to default during serialization would be that they're defined as transient.
If that's not it, try distilling your code to a small, self-contained program that reproduces the problem. Most likely, you will spot the cause of the problem while you do that, otherwise post it here.
Another (admittedly unlikely) possibility besides the obvious transient fields mentioned by others is that Chunk might implement Externalizable but not actually override the necessary writeExternal / readExternal methods. That would also explain why XMLEncoder works.
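The transient case mentioned in the answers is easy to reproduce in isolation. In this sketch the nested Chunk class is a made-up stand-in with two int fields, not the asker's real class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {

    // Hypothetical stand-in for the question's Chunk class.
    static class Chunk implements Serializable {
        private static final long serialVersionUID = 1L;
        int kept = 42;              // serialized normally
        transient int skipped = 42; // left out of the stream
    }

    // Serializes the object to memory and reads it straight back.
    static Chunk roundTrip(Chunk chunk) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(chunk);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Chunk) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Chunk copy = roundTrip(new Chunk());
        // The transient field comes back as 0: field initializers of a
        // Serializable class do not run during deserialization.
        System.out.println(copy.kept + " " + copy.skipped);
    }
}
```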
