Skipping bytes with the Jackson JsonParser - java

I currently have a FileInputStream that I know contains interleaved objects (Metadata.class and BigInfo.class) in json format, ordered like:
[Metadata1, BigInfo1, Metadata2, BigInfo2, Metadata3, BigInfo3, ...]
I'm using Jackson's JsonParser to read these like parser.readValueAs(Metadata.class) and parser.readValueAs(BigInfo.class).
One thing I'd like to take advantage of is that the Metadata objects contain the length of the following serialized BigInfo objects, as well as whether I need to read it or not. So I want to be able to skip the appropriate number of bytes corresponding to a BigInfo object, if I don't need to read it:
Metadata metadata = parser.readValueAs(Metadata.class);
// Whether I need to read the BigInfo object that comes after
boolean mustRead = metadata.isMustReadBigInfo();
if (!mustRead) {
    // Size of the BigInfo object that comes after
    int bigInfoSize = metadata.getBigInfoSize();
    parser.skip(bigInfoSize); // This 'skip' method is needed
}
I can achieve "skipping" by using parser.skipChildren(), but this will read (and discard) all bytes of the inputStream sequentially, and will be comparatively much slower than the underlying FileInputStream's 'skip' method, which makes use of a random access 'seek' into a position in the file.
I've tried calling 'skip(bigInfoSize)' on the parser's underlying inputStream. However, this doesn't work since JsonParser reads and stores information from the inputStream in an internal buffer, so the inputStream's position is further along than where the parser is at.
Any ideas on how to approach this would be greatly appreciated.
Thanks!

After looking around for quite a bit, I don't think there's a clean way to do this with JsonParser.
I ended up implementing a reader for a general InputStream that looks for '{' and '}' (taking nested objects into account, of course) and parses the underlying object from the retrieved byte array through ObjectMapper.
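A minimal sketch of such a brace-matching reader, using only the JDK. The class and method names (JsonObjectScanner, nextObject) are made up for illustration and this is not the answerer's actual code. It also tracks string literals, so that a '{' or '}' inside a JSON string is not counted, and it assumes UTF-8 (or ASCII) input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class JsonObjectScanner {

    /**
     * Returns the bytes of the next top-level JSON object, up to and including
     * its closing '}', or null if the stream ends before a complete object.
     * Reads one byte at a time and never reads past the closing brace.
     */
    static byte[] nextObject(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int depth = 0;            // current nesting level of '{'
        boolean inString = false; // inside a JSON string literal?
        boolean escaped = false;  // previous byte was a backslash inside a string?
        int b;
        while ((b = in.read()) != -1) {
            if (depth == 0 && b != '{') {
                continue; // skip '[', ',', ']' and whitespace between objects
            }
            out.write(b);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (b == '\\') {
                    escaped = true;
                } else if (b == '"') {
                    inString = false;
                }
            } else if (b == '"') {
                inString = true;
            } else if (b == '{') {
                depth++;
            } else if (b == '}') {
                depth--;
                if (depth == 0) {
                    return out.toByteArray();
                }
            }
        }
        return null; // stream ended without a complete object
    }
}
```

Because the scanner stops exactly at the closing brace, the stream position stays accurate, so calling skip(bigInfoSize) on the same InputStream afterwards should land where expected. The returned byte array can then be handed to ObjectMapper.readValue(byte[], Class). For file input, wrapping the FileInputStream in a BufferedInputStream avoids one system call per byte; skip on the wrapper still forwards to the file once its buffer is used up.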

You might be able to do something like this:
RandomAccessFile file = new RandomAccessFile(filename, "r");
InputStream inputStream = Channels.newInputStream(file.getChannel());
file.seek(byteLocationToSkipTo); // Sets the file pointer (shared with the channel) to this location
// Create the parser after seeking, from an ObjectMapper's factory so that readValueAs has a codec to work with
JsonParser parser = new ObjectMapper().getFactory().createParser(inputStream);
Metadata metadata = parser.readValueAs(Metadata.class);

Related

Byte array to File object without saving to disk

I have a method that takes in a byte[] that came from Files.readAllBytes() in a different part of the code for either .txt or .docx files. I want to create a new File object from the bytes to later read contents from, without saving that file to disk. Is this possible? Or is there a better way to get the contents from the File bytes?
That's not how it works. A java.io.File object is a light wrapper: check out the source code. It has a String field that contains the path, and that is all it has aside from some bookkeeping stuff.
It is not possible to represent arbitrary data with a java.io.File object. java.io.File objects represent literal files on disk and are not capable of representing anything else.
Files.readAllBytes already gives you the file's contents as bytes; that is why the method has that name.
The usual solution is that a method in some library that takes a File is overloaded; there will also be a method that takes a byte[], or, if that isn't around, a method that takes an InputStream (you can make an IS from a byte[] easily: new ByteArrayInputStream(byteArr) will do the job).
If the API you are using doesn't contain any such methods, it's a bad API and you should either find something else, or grit your teeth and accept that you're using a bad API, with all the workarounds that this implies, including having to save bytes to disk just to satisfy the asinine API.
But look first; I bet there is a byte[] and/or InputStream variant (or possibly URL or ByteBuffer or ByteStream or a few other more exotic variants).
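As a rough illustration of the InputStream route, the sketch below wraps a byte[] in a ByteArrayInputStream and reads it as text lines without touching the disk. The class and method names are made up, and UTF-8 is an assumption about the .txt content; a .docx file would instead need a library that accepts an InputStream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BytesAsStream {

    /** Reads text lines straight from an in-memory byte[]; no File is involved. */
    static List<String> readLines(byte[] content) throws IOException {
        InputStream in = new ByteArrayInputStream(content);
        // Assumption: the bytes are UTF-8 encoded text.
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }
}
```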

Java Pattern for generating and parsing data in stream

I have a certain protocol that I need to parse or generate. The parser takes an InputStream and produces different types of objects depending on the byte stream. The generator takes different inputs and returns an OutputStream that allows writing to a target stream. Before reading from or writing to the stream, there might be some header variables that need to be initialized.
The code right now looks something like this:
// Parser.
DataX parsed = DataX.parse(new ByteInputStream(new byte [] {..}));
// Access short field of DataX.
System.out.println(parsed.getX() + parsed.getY()); // data in the header.
// Access some long field by spitting InputStream.
System.out.println(parsed.buildInputStream().readFully()); // data as bytes.
// Generator.
OutputStream outstream =
    DataX.Generator(new FileOutputStream("output")).setX(x).setY(y).build();
// Write data.
outstream.write(new byte[] {...});
DataX extends a class Data that declares two abstract methods, deserialize and serialize, which are eventually called somewhere inside parse() and Generator().
This is a self-made design pattern, so I would like to ask whether it makes sense and whether there is a more idiomatic Java way to do this kind of thing.
Edit: The reason streams need to be involved is that the data might be huge (such as a file), so it is not feasible or desirable to store it entirely in memory.
In general it is a good idea to keep data (header values) and its presentation (streams) separate.
Some component accepts streams (Factory method pattern) and returns plain objects.
Those objects are serialized to streams via a different component later on.
It shouldn't matter that it is a stream at the moment. If you later want to work with JSON objects, the design doesn't need to change dramatically.
I think a symmetrical pattern is easy to understand.
// Parser
DataX header = new DataX(); // uninitialized header
InputStream is = header.input(new FileInputStream(...));
// At this point header is initialized.
// user reads data from is.
// Generator
DataX header = new DataX(); // uninitialized header
header.setX(x).setY(y); // initialize header
OutputStream os = header.output(new FileOutputStream(...));
// At this point header is written to os.
// user writes data to os.
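One possible shape for the input/output methods used above, as a sketch only. It assumes the header consists of two short fields x and y written in that order; the real protocol's header layout is not given in the question, so treat the field types and order as placeholders.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class DataX {
    // Assumed header layout: two 16-bit values, x then y.
    private short x;
    private short y;

    DataX setX(short x) { this.x = x; return this; }
    DataX setY(short y) { this.y = y; return this; }
    short getX() { return x; }
    short getY() { return y; }

    /** Reads the header from source and returns a stream positioned at the payload. */
    InputStream input(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        x = in.readShort();
        y = in.readShort();
        return in; // DataInputStream does not read ahead, so the payload is untouched
    }

    /** Writes the header to target and returns a stream for writing the payload. */
    OutputStream output(OutputStream target) throws IOException {
        DataOutputStream out = new DataOutputStream(target);
        out.writeShort(x);
        out.writeShort(y);
        return out;
    }
}
```

The payload never passes through the header object, so it can be arbitrarily large and is streamed by the caller.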

Writing BitSet to output file without overhead?

I get a line of overhead ("java.util.BitSet") when writing a BitSet to an output file using ObjectOutputStream.writeObject().
Any way around it?
That is not "overhead"; that's the marker that lets Java figure out what type it needs to create when deserializing the object from that file.
Since ObjectInputStream has no idea what you have serialized into a file, and has no way for you to provide a "hint", ObjectOutputStream must "embed" something for the input stream to be able to decide what class needs to be instantiated. That is why it places the "java.util.BitSet" string in front of the data of your BitSet.
You cannot get around writing this marker when you use serialization capabilities built into BitSet class. If you are serializing the object into a file by itself, with no other objects going in with it, you could write the result of toByteArray() call into a file, and call BitSet.valueOf(byteArray) after reading byteArray from the file.
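A small sketch of that toByteArray()/valueOf() round trip. The class and method names (BitSetFile, write, read) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

class BitSetFile {

    /** Writes only the raw bits; no class name or serialization header is stored. */
    static void write(BitSet bits, Path file) throws IOException {
        Files.write(file, bits.toByteArray());
    }

    /** Rebuilds the BitSet from the raw bytes written by write(). */
    static BitSet read(Path file) throws IOException {
        return BitSet.valueOf(Files.readAllBytes(file));
    }
}
```

Note that toByteArray() only covers the bits up to the highest set bit, so trailing zero bits are not stored. If the intended number of bits matters, it would have to be saved separately.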

Reading and writing objects via GZIP streams?

I am new to Java. I want to learn to use GZIPstreams. I already have tried this:
ArrayList<SubImage>myObject = new ArrayList<SubImage>(); // SubImage is a Serializable class
ObjectOutputStream compressedOutput = new ObjectOutputStream(
new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(
new File("....")))));
compressedOutput.writeObject(myObject);
and
ObjectInputStream compressedInput = new ObjectInputStream(
new BufferedInputStream(new GZIPInputStream(new FileInputStream(
new File("....")))));
myObject=(ArrayList<SubImage>)compressedInput.readObject();
The program writes myObject to a file without throwing any exception, but when it reaches the line
myObject=(ArrayList<SubImage>)compressedInput.readObject();
it throws this exception:
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
How can I solve this problem?
You have to flush and close your output stream. Otherwise, at the very least, the BufferedOutputStream will not write everything to the file (it writes in big chunks to avoid penalizing performance).
Calling compressedOutput.flush() and compressedOutput.close() will suffice.
You can try writing a simple String object and checking whether the file is written correctly.
How? If you write an xxx.txt.gz file, you can open it with your preferred zip app and look at xxx.txt. If the app complains, the content was not fully written.
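One way to make sure the stream is always closed is try-with-resources. The sketch below keeps the same stream chain as the question; the class and method names are made up, and the path is passed in by the caller.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipObjects {

    static void write(Path file, Serializable value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new GZIPOutputStream(Files.newOutputStream(file))))) {
            out.writeObject(value);
        } // closing the outermost stream flushes the buffer and finishes the GZIP data
    }

    static Object read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new GZIPInputStream(Files.newInputStream(file))))) {
            return in.readObject();
        }
    }
}
```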
Extended answer to a comment: compressing even more the data
Changing serialization
You could change the standard serialization of SubImage object if it's an object of your own. Check java.io.Serializable javadoc to know how to do it. It's pretty straightforward.
Writing just what you need
Serialization has the drawback that it writes type information ("it's a SubImage") along with the instances. That is not necessary if you know beforehand what is going to be there. So you could try to serialize it more manually.
To write your list, instead of writing an object, write the values that make up your list directly. You just need a DataOutputStream (ObjectOutputStream offers the same DataOutput methods, so you can use it as well).
dos.writeInt(yourList.size()); // tell how many items
for (SubImage si : yourList) {
    // write every field, in order (this should be a method called writeSubImage :)
    dos.writeInt(...);
    dos.writeInt(...);
    ...
}
// to read the thing just:
int size = dis.readInt();
for (int i = 0; i < size; i++) {
    // read every field, in the same order (this should be a method called readSubImage :)
    ... = dis.readInt(); // readInt() takes no arguments; it returns the value
    ... = dis.readInt();
    ...
    // create the subimage
    // add it to the list you are recreating
}
This method is more manual, but if
you know what's going to be written, and
you will not need this kind of serialization for many types,
it's quite affordable and definitely more compact than the Serializable counterpart.
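A filled-in version of that manual approach might look like the sketch below. The SubImage here is a hypothetical stand-in with four int fields (x, y, width, height), since the question does not show the real class; the method names follow the writeSubImage/readSubImage suggestion in the comments above.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SubImageCodec {

    // Hypothetical stand-in for the question's SubImage class.
    static final class SubImage {
        final int x, y, width, height;
        SubImage(int x, int y, int width, int height) {
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    static void writeList(List<SubImage> list, DataOutputStream dos) throws IOException {
        dos.writeInt(list.size()); // tell how many items follow
        for (SubImage si : list) {
            writeSubImage(si, dos);
        }
    }

    static void writeSubImage(SubImage si, DataOutputStream dos) throws IOException {
        dos.writeInt(si.x);
        dos.writeInt(si.y);
        dos.writeInt(si.width);
        dos.writeInt(si.height);
    }

    static List<SubImage> readList(DataInputStream dis) throws IOException {
        int size = dis.readInt();
        List<SubImage> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(readSubImage(dis));
        }
        return list;
    }

    static SubImage readSubImage(DataInputStream dis) throws IOException {
        // Fields must be read in exactly the order they were written.
        int x = dis.readInt();
        int y = dis.readInt();
        int width = dis.readInt();
        int height = dis.readInt();
        return new SubImage(x, y, width, height);
    }
}
```

With this layout each SubImage costs 16 bytes plus 4 bytes for the list size. The DataOutputStream can still be wrapped around a GZIPOutputStream as before.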
Keep in mind that there are alternative frameworks to serialize objects or create string messages (XStream for XML, Google Protocol Buffers for binary messages, and so on). Those frameworks can either produce binary output directly or produce a string that can then be written.
If your app needs more than this, or you are just curious, maybe you should look at them.
Alternative serialization frameworks
Just looked in SO and found several questions (and answers) addressing this issue:
https://stackoverflow.com/search?q=alternative+serialization+frameworks+java
I've found that XStream is pretty easy and straightforward to use. And JSON is a pretty readable and succinct format (and JavaScript compatible, which could be a plus :).
I would go for:
Object -> JSON -> OutputStreamWriter(UTF-8) -> GZIPOutputStream -> FileOutputStream

writing many java objects to a single file

How can I write many serializable objects to a single file and then read a few of the objects as and when needed?
You'd have to implement the indexing aspect yourself, but otherwise this could be done. When you serialize an object you essentially get back an OutputStream, which you can point to wherever you want. Storing multiple objects into a file this way would be straightforward.
The tough part comes when you want to read "a few" objects back. How are you going to know how to seek to the position in the file that contains the specific object you want? If you're always reading objects back in the same order you wrote them, from the start of the file onwards, this will not be a problem. But if you want to have random access to objects in the "middle" of the stream, you're going to have to come up with some way to determine the byte offset of the specific object you're interested in.
(This method would have nothing to do with synchronization or even Java per se; you've got to design a scheme that will fit with your requirements and environment.)
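One possible indexing scheme, as a sketch under assumptions: each object is serialized on its own and stored as a length-prefixed record, and the byte offset of every record is remembered so it can be sought later. The class and method names are made up, and the index is only kept in memory here; a real application would have to persist it somewhere (for example in a second file).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class IndexedObjectFile {

    /** Writes each object as [int length][serialized bytes]; returns each record's offset. */
    static List<Long> writeAll(Path file, List<? extends Serializable> objects) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0); // start from an empty file
            for (Serializable object : objects) {
                offsets.add(raf.getFilePointer());
                byte[] bytes = serialize(object);
                raf.writeInt(bytes.length);
                raf.write(bytes);
            }
        }
        return offsets;
    }

    /** Seeks directly to one record and deserializes only that object. */
    static Object readAt(Path file, long offset) throws IOException, ClassNotFoundException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            byte[] bytes = new byte[raf.readInt()];
            raf.readFully(bytes);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return in.readObject();
            }
        }
    }

    private static byte[] serialize(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }
}
```

Since every record is a complete serialization stream of its own, the trade-off is a few extra header bytes per object in exchange for being able to read any single object without touching the others.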
The writing part is easy. You just have to remember that you have to write all objects 'at once', through a single ObjectOutputStream. You can't create a file with serialized objects, close it and open it again to append more objects with a new ObjectOutputStream: each new stream writes its own header, and you'll get a StreamCorruptedException on reading.
For deserializing, I think you have to process the complete file and keep the objects you're interested in. The others will be created but collected by the gc on the next occasion.
Make Object[] for storing your objects. It worked for me.
I'd use a flat-file database (e.g. Berkeley DB Java Edition). Just write your nodes as rows in a table like:
Node
----
id
value
parent_id
To read several objects from a file (stored as a single Object[]):
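import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadObjectFromFile {
    // Reads back an Object[] that was written with ObjectOutputStream.writeObject.
    // Returns null if the file cannot be read or deserialized.
    public static Object[] readObject() throws IOException {
        Object[] list = null;
        try {
            byte[] bytes = Files.readAllBytes(Paths.get("src/objectFile.txt"));
            // try-with-resources closes the stream even if readObject fails
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                list = (Object[]) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return list;
    }
}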
