Java pattern for generating and parsing data in a stream

I have a certain protocol that I need to parse or generate. The parsing takes an InputStream and produces different types of objects depending on the byte stream. The generator takes different inputs and spits out an OutputStream that allows writing to a target stream. Before reading / writing the stream there may be some header variables that need to be initialized.
The code right now looks something like this:
// Parser.
DataX parsed = DataX.parse(new ByteArrayInputStream(new byte[] {..}));
// Access small header fields of DataX.
System.out.println(parsed.getX() + parsed.getY()); // data in the header.
// Access a long field by spitting out an InputStream.
System.out.println(parsed.buildInputStream().readFully()); // data as bytes.
// Generator.
OutputStream outstream =
    DataX.Generator(new FileOutputStream("output")).setX(x).setY(y).build();
// Write data.
outstream.write(new byte[] {...});
DataX extends a class Data that declares two abstract methods, deserialize and serialize, which are eventually called somewhere inside parse() and Generator().
This is a self-made design pattern, so I would like to ask if this makes sense and whether there is a more idiomatic Java way to do this kind of thing?
Edit: The reason a stream needs to be involved is that the data might be huge (such as a file), and it would not be feasible/desirable to store it entirely in memory.

In general it is a good idea to keep the data (header values) and its presentation (streams) separate.
Some component accepts streams (the Factory Method pattern) and returns plain objects. Those objects are serialized to streams via a different component later on.
It shouldn't matter that it is a stream at the moment. If you later want to work with JSON objects, the design doesn't need to change dramatically.

I think a symmetrical pattern is easy to understand.
// Parser
DataX header = new DataX(); // uninitialized header
InputStream is = header.input(new FileInputStream(...));
// At this point header is initialized.
// user reads data from is.
// Generator
DataX header = new DataX(); // uninitialized header
header.setX(x).setY(y); // initialize header
OutputStream os = header.output(new FileOutputStream(...));
// At this point header is written to os.
// user writes data to os.
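A minimal sketch of what that symmetrical DataX could look like, assuming a fixed header of two int fields x and y (the field names come from the question; everything else here is an assumption):
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DataX {
    private int x;
    private int y;

    public DataX setX(int x) { this.x = x; return this; }
    public DataX setY(int y) { this.y = y; return this; }
    public int getX() { return x; }
    public int getY() { return y; }

    // Reads the header fields, then returns the stream positioned
    // at the start of the payload so the caller can stream the rest.
    public InputStream input(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        x = din.readInt();
        y = din.readInt();
        return din;
    }

    // Writes the header fields, then returns the stream so the
    // caller can append the payload.
    public OutputStream output(OutputStream out) throws IOException {
        DataOutputStream dout = new DataOutputStream(out);
        dout.writeInt(x);
        dout.writeInt(y);
        return dout;
    }
}
Because the header object never touches the payload, arbitrarily large data can still be streamed through the returned stream.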

Related

Byte array to File object without saving to disk

I have a method that takes in a byte[] that came from Files.readAllBytes() in a different part of the code for either .txt or .docx files. I want to create a new File object from the bytes to later read contents from, without saving that file to disk. Is this possible? Or is there a better way to get the contents from the File bytes?
That's not how it works. A java.io.File object is a light wrapper: check out the source code - it's got a String field that contains the path, and that is all it has aside from some bookkeeping stuff.
It is not possible to represent arbitrary data with a java.io.File object. j.i.File objects represent literal files on disk and are not capable of representing anything else.
Files.readAllBytes already gets you the contents; that's why the method has that name.
The usual solution is that a method in some library that takes a File is overloaded; there will also be a method that takes a byte[], or, if that isn't around, a method that takes an InputStream (you can make an IS from a byte[] easily: new ByteArrayInputStream(byteArr) will do the job).
If the API you are using doesn't contain any such methods, it's a bad API and you should either find something else, or grit your teeth and accept that you're using a bad API, with all the workarounds that this implies, including having to save bytes to disk just to satisfy the asinine API.
But look first; I bet there is a byte[] and/or InputStream variant (or possibly URL or ByteBuffer or ByteStream or a few other more exotic variants).
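For example, a minimal sketch (the file name and the UTF-8 assumption are made up for illustration): for a .txt file the bytes are the text, and for stream-based APIs you can wrap the array without touching the disk:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NoTempFile {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("notes.txt")); // hypothetical path
        // For a plain .txt file, decoding the bytes *is* reading the contents:
        String text = new String(bytes, StandardCharsets.UTF_8);
        // For APIs that accept an InputStream, wrap the array; no disk involved:
        InputStream in = new ByteArrayInputStream(bytes);
        System.out.println(text.length() + " chars, " + in.available() + " bytes");
    }
}
(For .docx you would still hand the InputStream to a library that understands the format.)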

Skipping bytes with the Jackson JsonParser

I currently have a FileInputStream that I know contains interleaved objects (Metadata.class and BigInfo.class) in json format, ordered like:
[Metadata1, BigInfo1, Metadata2, BigInfo2, Metadata3, BigInfo3, ...]
I'm using Jackson's JsonParser to read these like parser.readValueAs(Metadata.class) and parser.readValueAs(BigInfo.class).
One thing I'd like to take advantage of is that the Metadata objects contain the length of the following serialized BigInfo objects, as well as whether I need to read it or not. So I want to be able to skip the appropriate number of bytes corresponding to a BigInfo object, if I don't need to read it:
Metadata metadata = parser.readValueAs(Metadata.class);
// Whether I need to read the BigInfo object that comes after
boolean mustRead = metadata.isMustReadBigInfo();
if (!mustRead) {
// Size of the bigInfo object that comes after
int bigInfoSize = metadata.getBigInfoSize();
parser.skip(bigInfoSize); // This 'skip' method is needed
}
I can achieve "skipping" by using parser.skipChildren(), but this will read (and discard) all bytes of the inputStream sequentially, and will be comparatively much slower than the underlying FileInputStream's 'skip' method, which makes use of a random access 'seek' into a position in the file.
I've tried calling 'skip(bigInfoSize)' on the parser's underlying inputStream. However, this doesn't work since JsonParser reads and stores information from the inputStream in an internal buffer, so the inputStream's position is further along than where the parser is at.
Any ideas on how to approach this would be greatly appreciated.
Thanks!
So after looking around for quite a bit, I don't think there's a clean way to do this with the jsonParser.
I ended up implementing a reader for a general InputStream that looked for '{' and '}' (taking into account nested objects), and parsed out the underlying object through ObjectMapper from the retrieved byte array.
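A rough sketch of that idea (simplified: it ignores the possibility of braces inside JSON string literals, which a real implementation must handle):
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BraceScanningReader {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Reads one top-level {...} object from the stream and deserializes it.
    public static <T> T readObject(InputStream in, Class<T> type) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int depth = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '{') {
                depth++;
            }
            if (depth > 0) {
                buf.write(b); // only collect bytes inside the object
            }
            if (b == '}' && depth > 0 && --depth == 0) {
                break; // matched the closing brace of the top-level object
            }
        }
        return MAPPER.readValue(buf.toByteArray(), type);
    }
}
With the object boundaries located by hand like this, the skip can be done directly on the underlying stream between objects.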
You might be able to do something like this
RandomAccessFile file = new RandomAccessFile(filename, "r");
InputStream inputStream = Channels.newInputStream(file.getChannel());
file.seek(byteLocationToSkipTo); // this moves the file pointer, and with it the stream
// Create the parser from an ObjectMapper so it has a codec for readValueAs.
JsonParser parser = new ObjectMapper().getFactory().createParser(inputStream);
Metadata metadata = parser.readValueAs(Metadata.class);

How do I choose which of the main Java IO classes I need to write strings to a file?

There are many different classes in the IO package, and some seem to have only subtle differences.
Suppose you want to write some Strings to a text file and then read them back for later use.
Which classes would you use for each of those two purposes, and why?
BufferedInputStream
BufferedOutputStream
BufferedWriter
BufferedReader
DataInputStream
DataOutputStream
FileInputStream
FileOutputStream
FileReader
FileWriter
InputStreamReader
OutputStreamReader
Reader
Writer
It all depends on what you want to do.
Output Streams
In case you want to write byte data, you would use some subclass of OutputStream, which writes bytes to a destination defined by the concrete subclass.
For example, FileOutputStream is used to write data to a file, byte by byte (or in bunches of bytes).
BufferedOutputStream wraps another OutputStream (despite the similar name it extends FilterOutputStream, not FileOutputStream) and writes blocks of bytes to minimize the number of disk operations. So when you call write, the data may or may not be written to the file yet, depending on whether the buffer is full. When the buffer reaches its maximum capacity, all the bytes are flushed to the file at once.
ByteArrayOutputStream uses a block of memory as the destination instead of a file. The current stream content is obtained via the toByteArray() method (why you would need such a stream is explained later).
ObjectOutputStream is used to write objects (class instances) to a destination defined by the underlying OutputStream. For example, ObjectOutputStream(FileOutputStream) writes objects to a file, while ObjectOutputStream(ByteArrayOutputStream) writes objects to memory. The latter option allows you to serialize objects into a byte buffer (an array of bytes), which can then be sent somewhere over the network.
Note that any object you want to write via ObjectOutputStream has to implement the Serializable interface. Since an object may contain references to other objects, every object reachable from the object being serialized also has to implement Serializable, because serialization is, by default, the process of writing the full graph of objects. The exceptions are fields you exclude with the transient keyword, and classes where you define the special methods writeObject/readObject that override the default serialization behaviour. These methods are designed for custom serialization, i.e. you can define on your own how to write/read class fields to/from ObjectOutputStream/ObjectInputStream.
Suppose class A, for which you are implementing custom serialization, contains an object of class B. Instead of writing object b as a full graph, i.e. instead of calling oos.writeObject(b) (where oos is the argument of the writeObject method being implemented in class A), you would write to the stream only some fields of class B, e.g. oos.writeBoolean(b.isEnabled), oos.writeInt(b.value). As long as object b is not written via oos.writeObject(b), it does not have to implement the Serializable interface. For more details, see the Serializable documentation.
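A sketch of that custom-serialization idea, using hypothetical A/B classes along the lines of the paragraph above:
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class B {                            // note: B does not implement Serializable
    boolean enabled;
    int value;
}

class A implements Serializable {
    private transient B b = new B(); // transient: excluded from the default write

    // Called by ObjectOutputStream in place of the default behaviour.
    private void writeObject(ObjectOutputStream oos) throws IOException {
        oos.defaultWriteObject();    // A's own non-transient fields
        oos.writeBoolean(b.enabled); // only the parts of B we actually need
        oos.writeInt(b.value);
    }

    private void readObject(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        ois.defaultReadObject();
        b = new B();
        b.enabled = ois.readBoolean(); // read back in the same order
        b.value = ois.readInt();
    }
}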
DataOutputStream is used to write primitives like boolean, char, int, long, float, and double. Since any object can be decomposed into primitives (e.g. class A { int i = 0; B b = new B(); } with class B { double d = 0.0; } can be written simply as an int and a double), you can use DataOutputStream to serialize objects in a compact way, unlike standard serialization, which writes much more data (class path, class version, ...) so that the ClassLoader can identify the given class at runtime, instantiate it, and finally load the data into that empty object. Note that instantiating a class by its name and initialising the object this way is much slower than instantiating the class yourself and initializing its fields yourself from a "raw" DataInputStream (that's why Android prefers its custom Parcelable interface over standard serialization; it works much like writing/reading to/from DataOutputStream/DataInputStream).
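For comparison, the compact DataOutputStream version of that A/B example (a sketch; the classes are the hypothetical ones from the paragraph above):
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class B { double d = 0.0; }
class A { int i = 0; B b = new B(); }

class CompactIO {
    // The whole object graph flattens to exactly one int and one double.
    static void write(DataOutputStream out, A a) throws IOException {
        out.writeInt(a.i);
        out.writeDouble(a.b.d);
    }

    static A read(DataInputStream in) throws IOException {
        A a = new A();            // instantiate the class yourself...
        a.i = in.readInt();       // ...and fill its fields from the raw stream,
        a.b.d = in.readDouble();  // in the same order they were written
        return a;
    }
}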
In case you want to write string data, you would use OutputStreamWriter, a higher-level output stream that writes characters/Strings to a destination defined by an OutputStream (for writing to a file you would pass a FileOutputStream; for writing to memory you would pass a ByteArrayOutputStream).
FileWriter is an extension of OutputStreamWriter designed for writing to files.
BufferedWriter works the same way as BufferedOutputStream, except it is designed to work with strings and characters.
CharArrayWriter works the same way as ByteArrayOutputStream, except the characters are stored in a char[] which is obtained by calling toCharArray().
StringWriter is similar to CharArrayWriter, except it lets you write strings, which are stored in a StringBuffer that is obtained by calling getBuffer().
PrintWriter allows you to write formatted strings to a destination defined by an OutputStream or Writer (this writer is convenient for things like logging).
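For instance (a minimal sketch; the file name and format string are made up):
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class PrintWriterDemo {
    public static void main(String[] args) throws IOException {
        // try-with-resources flushes and closes the writer automatically
        try (PrintWriter pw = new PrintWriter(new FileWriter("app.log"))) {
            pw.printf("user=%s attempts=%d%n", "alice", 3); // formatted output
            pw.println("done");                             // plain line output
        }
    }
}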
Input Streams
The same concepts apply to the input streams (and readers) that correspond to the output streams (and writers) mentioned above.
Readers and InputStreams are for input while Writers and OutputStreams are for output.
Readers and Writers are for text (with a character set) while InputStreams and OutputStreams are for binary data (such as images).
Buffering helps with performance because reads and writes are batched rather than making a system call each time any method is used.
If it has "File" in the name, it is used for files.
So for your specific case, where you want to write text to a file, the most obvious candidate is a FileWriter:
Writer out = new FileWriter(
new File("foo.txt")
);
If you want better performance for making many small writes to it, you could wrap it in a buffer:
Writer out = new BufferedWriter(
new FileWriter(
new File("foo.txt")
)
);
The FileWriter assumes the default character encoding when writing text. If you want to choose the encoding, or ensure that it is consistent on all platforms, you could create the writer from an OutputStream specifying the encoding explicitly:
Writer out = new BufferedWriter(
new OutputStreamWriter(
new FileOutputStream(
new File("foo.txt")
), "UTF-8"
)
);
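For the reading half of the question, the mirror image of the last example would be a BufferedReader over an InputStreamReader (a sketch, reusing the same foo.txt and encoding):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadBack {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(
                    new FileInputStream(new File("foo.txt")), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) { // null signals end of file
                System.out.println(line);
            }
        }
    }
}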

Reading and writing objects via GZIP streams?

I am new to Java. I want to learn to use GZIP streams. I have already tried this:
ArrayList<SubImage> myObject = new ArrayList<SubImage>(); // SubImage is a Serializable class
ObjectOutputStream compressedOutput = new ObjectOutputStream(
new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(
new File("....")))));
compressedOutput.writeObject(myObject);
and
ObjectInputStream compressedInput = new ObjectInputStream(
new BufferedInputStream(new GZIPInputStream(new FileInputStream(
new File("....")))));
myObject=(ArrayList<SubImage>)compressedInput.readObject();
The program writes myObject to a file without throwing any exception, but when it reaches the line
myObject=(ArrayList<SubImage>)compressedInput.readObject();
it throws this exception:
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
How can I solve this problem?
You have to flush and close your output stream. Otherwise, at the very least, the BufferedOutputStream will not write everything to the file (it writes in big chunks to avoid penalizing performance).
If you call compressedOutput.flush() and compressedOutput.close() it will suffice.
You can try writing a simple String object and checking whether the file is written correctly.
How? If you write an xxx.txt.gz file, you can open it with your preferred zip app and look at the xxx.txt inside. If the app complains, the content was not fully written.
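The simplest way to guarantee both the flush and the close is try-with-resources, which closes the whole chain even if an exception is thrown (a sketch; the file name is made up, myObject is the list from the question):
try (ObjectOutputStream compressedOutput = new ObjectOutputStream(
        new BufferedOutputStream(new GZIPOutputStream(
            new FileOutputStream(new File("subimages.gz")))))) {
    compressedOutput.writeObject(myObject);
} // close() flushes the buffer and writes the GZIP trailer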
Extended answer to a comment: compressing the data even more
Changing serialization
You could change the standard serialization of the SubImage object if it's a class of your own. Check the java.io.Serializable javadoc to see how to do it. It's pretty straightforward.
Writing just what you need
Serialization has the drawback that it needs to write "it's a SubImage" just before every instance it writes. That's not necessary if you know beforehand what's going to be there. So you could try to serialize it more manually.
To write your list, instead of writing an object, write directly the values that make up your list. You will need just a DataOutputStream (ObjectOutputStream implements DataOutput, so you can use it anyway).
dos.writeInt(yourList.size()); // tell how many items
for (SubImage si : yourList) {
    // write every field, in order (this should be a method called writeSubImage :)
    dos.writeInt(...);
    dos.writeInt(...);
    ...
}
// to read the thing back, just:
int size = dis.readInt();
for (int i = 0; i < size; i++) {
    // read every field, in the same order (this should be a method called readSubImage :)
    int field1 = dis.readInt(); // readInt() takes no arguments; it returns the value
    int field2 = dis.readInt();
    ...
    // create the SubImage from the fields read
    // and add it to the list you are recreating
}
This method is more manual, but if:
you know what's going to be written, and
you will not need this kind of serialization for many types,
then it's pretty affordable and definitely more compact than the Serializable counterpart.
Bear in mind that there are alternative frameworks to serialize objects or create string messages (XStream for XML, Google Protocol Buffers for binary messages, and so on). Those frameworks can either work directly with binary or produce a string that can then be written to a stream.
If your app will need more of this, or you are just curious, maybe you should look at them.
Alternative serialization frameworks
Just looked in SO and found several questions (and answers) addressing this issue:
https://stackoverflow.com/search?q=alternative+serialization+frameworks+java
I've found that XStream is pretty easy and straightforward to use. And JSON is a pretty readable and succinct format (and JavaScript compatible, which could be a plus :).
I would go for:
Object -> JSON -> OutputStreamWriter(UTF-8) -> GZIPOutputStream -> FileOutputStream
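That pipeline could look like this (a sketch assuming Gson for the JSON step; any JSON library with a writer-based API would slot in the same way, and the file name and payload are made up):
import com.google.gson.Gson;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class JsonGzipWrite {
    public static void main(String[] args) throws IOException {
        List<String> payload = Arrays.asList("a", "b", "c"); // placeholder data
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream("data.json.gz")),
                StandardCharsets.UTF_8)) {
            new Gson().toJson(payload, out); // Object -> JSON -> UTF-8 -> gzip -> file
        }
    }
}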

Lazy access to InputStream by offering InputStream getter

An API that I'm implementing deals with InputStreams containing hierarchically structured data, i.e. nested blocks, including a number of images in the leaf blocks. (If you must know, it is CBEFF data that I'm parsing.) Each block of data is prefixed with a header containing some metadata about that block.
1st level 1 header
    1st level 2 header
    1st level 2 data block
    2nd level 2 header
    2nd level 2 data block
2nd level 1 header
    3rd level 2 header
    3rd level 2 data block
The original InputStream is an argument to the constructor of my API classes and is passed around down the hierarchy.
Currently I'm reading the images into byte arrays in the constructor of my API classes, so each constructor blocks while reading the complete data that that class is responsible for and later on when clients call the relevant getter method of that API class they will get the image data served from memory. I'd much rather offer the contained images in the form of some kind of lazy InputStreams to clients of my API, so that the image bytes are only read from the original InputStream as a result of clients reading bytes of the resulting InputStream delivered by the getter. This makes it possible, for example, to do progressive rendering, which is useful as the original InputStream is slow.
Is there an elegant way to solve this with InputStreams?
InputStream isn't suitable for random access. So reading parts of it isn't going to work in most cases, even though you can achieve similar effects on some input streams using a combination of reset and skip. But not all streams support reset, and skipping bytes is often as expensive as reading bytes.
So I suggest you try some alternate approach. Either you buffer the whole stream into some random-access buffer, like a temporary file (which still means reading all the bytes off the stream in the first place), or you find a way to get random access to the original source. You didn't specify what kind of source you're dealing with, but e.g. for an HTTP connection you could download parts using a range request. Similar solutions might work for other sources.
No matter how you implement the random access (and seeing your comment, you'll likely do so using an InputStream with reset and skip), you can create your own class to represent a part of that stream. You can let that class itself be an instance of InputStream by subclassing FilterInputStream.
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class SubStream extends FilterInputStream {
    private final long offset;

    public SubStream(long offset, InputStream parent) {
        super(parent);
        this.offset = offset;
    }

    public SubStream(InputStream parent) {
        this(0, parent);
    }

    @Override
    public void reset() throws IOException {
        in.reset();
        in.skip(offset); // note: skip may skip fewer bytes; loop if that matters
    }

    public SubStream subStream(long offset) {
        return new SubStream(this.offset + offset, in);
    }

    public Object syncObject() {
        return in;
    }
}
You'd have to ensure that any operation using one of these streams calls reset first. If you need to enforce a proper end-of-stream treatment, you'll have to override most read implementations. If concurrent access might be possible, then you'll want to synchronize operations on the underlying stream. So the code using this class could look something like this:
synchronized(part.syncObject()) {
part.reset();
return read(part);
}
