Lazy access to InputStream by offering InputStream getter - java

An API that I'm implementing deals with InputStreams containing hierarchically structured data i.e. nested blocks, including a number of images in the leaf blocks. (If you must know, it is CBEFF data that I'm parsing.) Each block of data is prefixed with a header containing some meta-data about that block.
1st level 1 header
1st level 2 header
1st level 2 data block
2nd level 2 header
2nd level 2 data block
2nd level 1 header
3rd level 2 header
3rd level 2 data block
The original InputStream is an argument to the constructor of my API classes and is passed around down the hierarchy.
Currently I'm reading the images into byte arrays in the constructor of my API classes, so each constructor blocks while reading the complete data that that class is responsible for and later on when clients call the relevant getter method of that API class they will get the image data served from memory. I'd much rather offer the contained images in the form of some kind of lazy InputStreams to clients of my API, so that the image bytes are only read from the original InputStream as a result of clients reading bytes of the resulting InputStream delivered by the getter. This makes it possible, for example, to do progressive rendering, which is useful as the original InputStream is slow.
Is there an elegant way to solve this with InputStreams?

InputStream isn't suitable for random access. So reading parts of it isn't going to work in most cases, even though you can achieve similar effects on some input streams using a combination of reset and skip. But not all streams support reset, and skipping bytes is often as expensive as reading bytes.
So I suggest an alternative approach. Either you buffer the whole stream into some random-access store, such as a temporary file (which still means reading all the bytes off the stream up front), or you find a way to get random access to the original source. You didn't specify what kind of source you're dealing with, but for an HTTP connection, for example, you can download parts using a range request. Similar solutions might work for other sources.
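For the HTTP case, a range request can be issued with plain HttpURLConnection. A minimal sketch, assuming the server honors range requests (answers 206 Partial Content); the method names are made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangeFetch {
    // Builds the HTTP Range header value for bytes [offset, offset + length).
    static String rangeHeader(long offset, long length) {
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    // Opens a stream over only part of the resource, assuming the server
    // honors range requests (i.e. answers 206 Partial Content).
    public static InputStream openRange(URL url, long offset, long length) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, length));
        if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
            throw new IOException("Server ignored the range request");
        }
        return conn.getInputStream();
    }
}
```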
No matter how you implement the random access (and seeing your comment, you'll likely do so using an InputStream with reset and skip), you can create your own class to represent a part of that stream. You can let that class itself be an instance of InputStream by subclassing FilterInputStream.
class SubStream extends FilterInputStream {
    private final long offset;

    public SubStream(long offset, InputStream parent) {
        super(parent);
        this.offset = offset;
    }

    public SubStream(InputStream parent) {
        this(0, parent);
    }

    @Override
    public void reset() throws IOException {
        in.reset();      // back to the parent's mark
        in.skip(offset); // forward to this part's start
    }

    public SubStream subStream(long offset) {
        return new SubStream(this.offset + offset, in);
    }

    public Object syncObject() {
        return in;
    }
}
You'd have to ensure that any operation using one of these streams calls reset first. If you need to enforce a proper end-of-stream treatment, you'll have to override most read implementations. If concurrent access might be possible, then you'll want to synchronize operations on the underlying stream. So the code using this class could look something like this:
synchronized (part.syncObject()) {
    part.reset();
    return read(part); // read(...) is whatever code consumes the stream
}
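Putting the pieces together, here is a self-contained sketch. It repeats a trimmed-down SubStream (without the subclassing extras) and uses a ByteArrayInputStream as the parent because it supports mark/reset; the data and offset are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SubStream extends FilterInputStream {
    private final long offset;

    SubStream(long offset, InputStream parent) {
        super(parent);
        this.offset = offset;
    }

    @Override
    public void reset() throws IOException {
        in.reset();      // back to the parent's mark (position 0 here)
        in.skip(offset); // then forward to this part's start
    }
}

public class SubStreamDemo {
    // Reads the part of 'source' starting at 'offset' through a SubStream.
    public static String readPart(byte[] source, long offset) throws IOException {
        InputStream base = new ByteArrayInputStream(source); // supports mark/reset
        base.mark(source.length);
        SubStream part = new SubStream(offset, base);
        synchronized (base) {   // serialize access to the shared parent stream
            part.reset();       // position the parent at this part's offset
            return new String(part.readAllBytes(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "HEADERIMAGEBYTES".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readPart(data, 6)); // prints IMAGEBYTES
    }
}
```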

Related

Java Interface for saving values in Sets or Lists and other Objects

I want to code a program where the user can decide in which object to store values. The user can use Sets, Lists and Files (e.g. .txt, .xml). I want to write an interface so that, in the end, it doesn't matter which ("storing-")object the user chooses, and I don't have to write the same methods for every decision.
How should I make an interface for that? Is the approach with the interface even suitable enough and what else do I need to do/consider?
import java.io.File;

public class StoreValues {
    void store(int decision, SomeObject someObject) {
        // user's decision (LinkedList, Set or File)
        SaveInGeneral<SomeObject> obj;
        if (decision == 1) {
            obj = new LinkedList<>();
        }
        if (decision == 2) {
            obj = new File();
        }
        // ...
        obj.add(someObject);
    }
}
SaveInGeneral doesn't fit common naming strategies, which involve trying to name things with nouns. I'd call it Storage, for example.
The generics don't seem useful here - the whole point is to abstract away what the underlying storage mechanism is. So get rid of them.
Then, just define what, exactly, 'save an object' means. For example, a List can store items (.add(newItem)), but you can retrieve items by index (.get(5)), create an iterator (with .iterator()) so that you can for (String v : list) through it, and ask it its size (.size()), etc.
What kind of primitives are you looking for?
Presumably if all this does is store objects and nothing else, the one and only method you're looking for is .store(Object o).
The problem is, the task: "Store an arbitrary object on disk" just does not work. Most objects cannot be stored to disk at all. I strongly suggest you then limit the .store() method to only allow things you know how to store. You could go with Serializable, but that is a giant can of worms (serializable is extremely convoluted), or you need to involve third party libraries such as Jackson that attempt to marshall objects into e.g. JSON or XML.
You then need to think about the needs of your various targeted platforms (files, databases, lists, sets, etc), and cross that off vs. the needs of your code that needs to store things. Find the subset such that it consists solely of things which are feasible to implement in all targeted storage mechanisms, and which is sufficient for your code that needs a storage backend.
This can get complicated fast. For example, when reading out JSON produced by Jackson, you need to provide which class you want to read the JSON into, which is not a thing lists need (they know which object kind they stored already). Files, in turn, don't like it if you keep writing a tiny blob of data, then close the file, then open it again - the overhead means that this:
loop 1000 times and do: Open file, write ~50 bytes of data, close file.
is literally about 1000 times slower vs:
Open file, loop 1000 times and do: write 50 bytes of data. Then close file.
In other words, you'd have to update your API to involve an opening and a closing step, or accept that the file based storage backend is incredibly (1000x) slow.
Here is the most simple take - let's store only Strings because that's easy to send to anything from DBs to files to lists to network sockets, and let's just accept an inefficient algorithm for now:
public interface Storage {
    public void store(String data) throws IOException;
}
some implementations:
public class ListBasedStorage implements Storage {
    private final List<String> list = new ArrayList<String>();

    public List<String> getBackingList() {
        return list;
    }

    @Override
    public void store(String data) {
        list.add(data);
    }
}
public class FileBasedStorage implements Storage {
    private static final Charset CHARSET = StandardCharsets.UTF_8;
    private final Path target;

    public FileBasedStorage(Path target) {
        this.target = target;
    }

    @Override
    public void store(String data) throws IOException {
        String line = data.replaceAll("\\R", " ") + "\n";
        Files.write(target, line.getBytes(CHARSET),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
and to use this:
public static void main(String[] args) throws Exception {
    Storage storage = new FileBasedStorage(Paths.get("mydata.txt"));
    new MyApp().sayHello(storage);
}

public void sayHello(Storage storage) throws IOException {
    storage.store("Hello!");
    storage.store("World");
}
You can then start complicating matters: add more data types, or use JSON and a JSON marshaller like Jackson to turn the data into stuff you can put in a file or DB. Add retrieval code, so that the storage can also be asked how many entries it holds, or for an iterator to go through the data, and so on. Finally, add a two-layered approach where you ask the storage for a 'session', which must be safely closed using try (Session session = storage.start()) {}, in order to get fast file and DB writes. Both files and DBs are transactional, in the sense that they work far better if you explicitly start, do stuff, and then save all you just did.
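That session idea might be sketched like this. The Session interface, the start() method and the list-backed implementation are all invented for illustration; a file- or DB-backed variant would open the resource in start() and commit/close it in close():

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-layer API: the Storage hands out a Session that batches
// writes and commits them in one go when closed.
interface Storage2 {
    Session start() throws IOException;

    interface Session extends Closeable {
        void store(String data) throws IOException;
    }
}

class ListBackedStorage implements Storage2 {
    final List<String> list = new ArrayList<>();

    public Session start() {
        List<String> batch = new ArrayList<>();
        return new Session() {
            public void store(String data) { batch.add(data); }
            public void close() { list.addAll(batch); } // one bulk commit per session
        };
    }
}

public class SessionDemo {
    public static void main(String[] args) throws IOException {
        ListBackedStorage storage = new ListBackedStorage();
        try (Storage2.Session s = storage.start()) {
            s.store("Hello!");
            s.store("World");
        } // close() commits the whole batch here
        System.out.println(storage.list);
    }
}
```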

What is the use case for null(Input/Output)Stream API in Java?

With Java 11, I could initialize an InputStream as:
InputStream inputStream = InputStream.nullInputStream();
But I am unable to understand a potential use case of InputStream.nullInputStream or a similar API for OutputStream
i.e. OutputStream.nullOutputStream.
From the API Javadocs, I could figure out that it
Returns a new InputStream that reads no bytes. The returned stream is
initially open. The stream is closed by calling the close() method.
Subsequent calls to close() have no effect. While the stream is open,
the available(), read(), read(byte[]), ...
skip(long), and transferTo() methods all behave as if end of stream
has been reached.
I went through the detailed release notes further, which state:
There are various times where I would like to use methods that require
as a parameter a target OutputStream/Writer for sending output, but
would like to execute those methods silently for their other effects.
This corresponds to the ability in Unix to redirect command output to
/dev/null, or in DOS to append command output to NUL.
Yet I fail to understand which methods the statement refers to by "... execute those methods silently for their other effects". (Blame my lack of hands-on experience with the APIs.)
Can someone help me understand what is the usefulness of having such an input or output stream with a help of an example if possible?
Edit: One of a similar implementation I could find on browsing further is apache-commons' NullInputStream, which does justify the testing use case much better.
Sometimes you want to have a parameter of InputStream type, but also to be able to choose not to feed your code with any data. In tests it's probably easier to mock it but in production you may choose to bind null input instead of scattering your code with ifs and flags.
compare:
class ComposableReprinter {
    void reprint(InputStream is) throws IOException {
        System.out.println(is.read());
    }

    void bla() throws IOException {
        reprint(InputStream.nullInputStream());
    }
}
with this:
class ControllableReprinter {
    void reprint(InputStream is, boolean forReal) throws IOException {
        if (forReal) {
            System.out.println(is.read());
        }
    }

    void bla() throws IOException {
        reprint(new ByteArrayInputStream(new byte[0]), false);
    }
}
or this:
class NullableReprinter {
    void reprint(InputStream is) throws IOException {
        if (is != null) {
            System.out.println(is.read());
        }
    }

    void bla() throws IOException {
        reprint(null);
    }
}
It makes more sense with output IMHO. Input is probably more for consistency.
This approach is called Null Object: https://en.wikipedia.org/wiki/Null_object_pattern
I see it as a safer (1) and more expressive (2) alternative to initialising a stream variable with null.
1. No worries about NPEs.
2. [Output|Input]Stream is an abstraction. In order to return a null/empty/mock stream, you had to deviate from the core concept down to a specific implementation.
I think nullOutputStream is very easy and clear: just to discard output (similar to > /dev/null) and/or for testing (no need to invent an OutputStream).
An (obviously basic) example:
PrintStream out = ... // either System.out, or new PrintStream(OutputStream.nullOutputStream()) to discard all prints
out.println("yeah... or not");
exporter.exportTo(out); // discard or real export?
Regarding nullInputStream: it's probably more for testing (I don't like mocks), and for APIs that require an input stream, or (this now being more probable) that deliver an input stream which happens to contain no data, where null is not a viable option:
importer.importDocument("name", /* input stream... */);
InputStream inputStream = content.getInputStream(); // better to have no data to read than to get a null
When you test that importer, you can just use a nullInputStream there, again instead of inventing your own InputStream or instead of using a mock. Other use cases here rather look like a workaround or misuse of the API ;-)
Regarding the return of an InputStream: that rather makes sense. If you haven't any data you may want to return that nullInputStream instead of null so that callers do not have to deal with null and can just read as they would if there was data.
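A minimal sketch of that convention; the Content class and its field are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical holder: returns an empty stream instead of null when no data is set.
class Content {
    private final byte[] data; // may be null when nothing was loaded

    Content(byte[] data) {
        this.data = data;
    }

    InputStream getInputStream() {
        return data == null
                ? InputStream.nullInputStream()   // callers just see end-of-stream
                : new ByteArrayInputStream(data);
    }
}

public class NullStreamDemo {
    public static void main(String[] args) throws IOException {
        // No null check needed: an absent payload simply reads as EOF (-1).
        int first = new Content(null).getInputStream().read();
        System.out.println(first); // -1
    }
}
```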
Finally, these are just convenience methods to make our lives easier without adding another dependency ;-) and, as others already stated (comments/answers), it's basically an implementation of the null object pattern.
Using the null*Stream might also have the benefit that tests are executed faster... if you stream real data (of course... depending on size, etc.) you may just slow down your tests unnecessarily and we all want tests to complete fast, right? (some will put in mocks here... well...)

Java Pattern for generating and parsing data in stream

I have a certain protocol that I need to parse or generate. The parser takes an InputStream and produces different types of objects depending on the byte stream. The generator takes different inputs and spits out an OutputStream that allows writing to a target stream. Before reading/writing the stream there might be some header variables that need to be initialized.
The code right now looks something like this:
// Parser.
DataX parsed = DataX.parse(new ByteArrayInputStream(new byte[] {...}));
// Access a short field of DataX.
System.out.println(parsed.getX() + parsed.getY()); // data in the header.
// Access some long field by way of an InputStream.
System.out.println(parsed.buildInputStream().readFully()); // data as bytes.

// Generator.
OutputStream outstream =
        DataX.Generator(new FileOutputStream("output")).setX(x).setY(y).build();
// Write data.
outstream.write(new byte[] {...});
DataX extends a class Data that declares two abstract methods, deserialize and serialize, which will eventually be called somewhere inside parse() and Generator().
This is a self-made design pattern, so I would like to ask if this makes sense and whether there is a more Java-ist way to do this kind of thing ?
Edit: The reason the stream needs to be incorporate is because the data might be huge (such as a file) and will not be feasible/desirable to store it entirely in the memory.
In general it is a good idea to keep data (header values) and its presentation (streams) separate.
Some component accepts streams (Factory method pattern) and returns plain objects. Those objects are serialized to streams via a different component later on.
It shouldn't matter if it is a stream at the moment. If later you want to work with Json objects - the design doesn't need to change dramatically.
I think a symmetrical pattern is easy to understand.
// Parser
DataX header = new DataX();   // uninitialized header
InputStream is = header.input(new FileInputStream(...));
// At this point header is initialized.
// User reads data from is.

// Generator
DataX header = new DataX();   // uninitialized header
header.setX(x).setY(y);       // initialize header
OutputStream os = header.output(new FileOutputStream(...));
// At this point header is written to os.
// User writes data to os.
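Filling in that skeleton with DataInputStream/DataOutputStream for the header fields (the field names x and y come from the question; everything else here is an assumption):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the symmetrical pattern: the header reads itself from an
// InputStream, or writes itself to an OutputStream, then hands the stream back.
class Header {
    int x, y;

    Header setX(int x) { this.x = x; return this; }
    Header setY(int y) { this.y = y; return this; }

    InputStream input(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        x = in.readInt();  // header fields initialized here...
        y = in.readInt();
        return in;         // ...then the caller reads the payload
    }

    OutputStream output(OutputStream raw) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeInt(x);   // header written up front...
        out.writeInt(y);
        return out;        // ...then the caller writes the payload
    }
}

public class HeaderDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        OutputStream os = new Header().setX(1).setY(2).output(buf);
        os.write("payload".getBytes(StandardCharsets.US_ASCII));
        os.flush();

        Header h = new Header();
        InputStream is = h.input(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(h.x + "," + h.y + ","
                + new String(is.readAllBytes(), StandardCharsets.US_ASCII));
        // prints 1,2,payload
    }
}
```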

Is put-ing to a ByteBuffer then writing it to a file more efficient than writing the individual fields

I want to write ONLY the values of the data members of an object into a file, so I can't use serialization here, since it writes a whole lot of other information which I don't need. Here is what I have implemented in two ways, one using a byte buffer and the other without it.
Without using ByteBuffer:
1st method
public class DemoSecond {
    byte characterData;
    byte shortData;
    byte[] integerData;
    byte[] stringData;

    public DemoSecond(byte characterData, byte shortData, byte[] integerData,
            byte[] stringData) {
        super();
        this.characterData = characterData;
        this.shortData = shortData;
        this.integerData = integerData;
        this.stringData = stringData;
    }

    public static void main(String[] args) {
        DemoSecond dClass = new DemoSecond((byte) 'c', (byte) 0x7, new byte[] {3, 4},
                new byte[] {(byte) 'p', (byte) 'e', (byte) 'n'});
        File checking = new File("c:/objectByteArray.dat");
        try {
            if (!checking.exists()) {
                checking.createNewFile();
            }
            // POINT A
            FileOutputStream bo = new FileOutputStream(checking);
            bo.write(dClass.characterData);
            bo.write(dClass.shortData);
            bo.write(dClass.integerData);
            bo.write(dClass.stringData);
            // POINT B
            bo.close();
        } catch (FileNotFoundException e) {
            System.out.println("FNF");
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println("IOE");
            e.printStackTrace();
        }
    }
}
Using byte buffer: One more thing: the size of the data members will always remain fixed, i.e. characterData = 1 byte, shortData = 1 byte, integerData = 2 bytes and stringData = 3 bytes. So the total size of this class is 7 bytes, ALWAYS.
2nd method
// POINT A
FileOutputStream bo = new FileOutputStream(checking);
ByteBuffer buff = ByteBuffer.allocate(7);
buff.put(dClass.characterData);
buff.put(dClass.shortData);
buff.put(dClass.integerData);
buff.put(dClass.stringData);
bo.write(buff.array());
// POINT B
I want to know which of the two methods is more optimized? And kindly give the reason also.
The above class DemoSecond is just a sample class.
My original classes will be of size 5 to 50 bytes. I don't think here size might be the issue.
But each of my classes is of fixed size like the DemoSecond
Also there are so many files of this type which I am going to write in the binary file.
PS
if I use serialization, it also writes the words "characterData", "shortData", "integerData", "stringData" and other information which I don't want to write to the file. What I am concerned with here is THEIR VALUES ONLY. In the case of this example that's: 'c', 7, 3, 4, 'p', 'e', 'n'. I want to write only these 7 bytes into the file, NOT the other information, which is USELESS to me.
As you are doing file I/O, you should bear in mind that the I/O operations are likely to be very much slower than any work done by the CPU in your output code. To a first approximation, the cost of I/O is an amount proportional to the amount of data you are writing, plus a fixed cost for each operating system call made to do the I/O.
So in your case you want to minimise the number of operating system calls used to do the writing. This is done by buffering data in the application, so that the application performs fewer but larger operating system calls.
Using a byte buffer, as you have done, is one way of doing this, so your ByteBuffer code will be more efficient than your FileOutputStream code.
But there are other considerations. Your example is not performing many writes, so it is likely to be very fast anyway; any optimisation is likely to be premature. Optimisations tend to make code more complicated and harder to understand. To understand your ByteBuffer code, a reader needs to understand how a ByteBuffer works in addition to everything they need to understand for the FileOutputStream code. And if you ever change the file format, you are more likely to introduce a bug with the ByteBuffer code (for example, by making the buffer too small).
Buffering of output is commonly done. So it should not surprise you that Java already provides code to help you. That code will have been written by experts, tested and debugged. Unless you have special requirements you should always use such code rather than writing your own. The code I am referring to is the BufferedOutputStream class.
To use it simply adapt your code that does not use the ByteBuffer, by changing the line of your code that opens the file to
OutputStream bo = new BufferedOutputStream(new FileOutputStream(checking));
The two methods differ only in the byte buffer allocated.
If you are concerned about unnecessary write actions to the file, there is already a BufferedOutputStream you can use, for which the buffer is allocated internally; if you are writing to the same output stream multiple times, it is definitely more efficient than allocating a buffer manually every time.
It would be simplest to use a DataOutputStream around a BufferedOutputStream around the FileOutputStream.
NB You can't squeeze 'shortData' into a byte. Use the various primitives of DataOutputStream, and use the corresponding ones of DataInputStream when reading them back.
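A sketch of that combination. The field values mirror the question's example, but note that DataOutputStream has fixed sizes of its own (writeChar is 2 bytes, writeShort 2, writeInt 4, writeUTF adds a 2-byte length prefix), so the output is not the exact 7-byte layout the question describes; a ByteArrayOutputStream stands in for the FileOutputStream here so the example is self-contained:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FieldWriter {
    // Writes only the field values, using fixed-width primitives, so they can
    // be read back symmetrically with DataInputStream.
    public static byte[] write(char c, short s, int i, String str) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(bytes))) {
            out.writeChar(c);  // 2 bytes
            out.writeShort(s); // 2 bytes
            out.writeInt(i);   // 4 bytes
            out.writeUTF(str); // 2-byte length prefix + modified UTF-8 bytes
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write('c', (short) 7, 34, "pen");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(in.readChar() + " " + in.readShort() + " "
                + in.readInt() + " " + in.readUTF()); // c 7 34 pen
    }
}
```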

Java: FilterInputStream what are the advantages and use compared to other streams

I’ve been reading on InputStream, FileInputStream, ByteArrayInputStream and how their use seems quite clear (output streams too).
What I’m struggling is to understand the use of FilterInputStream & FilterOutputStream:
What is the advantage of using it compared to the other stream classes?
When should I use it?
Please provide a theoretical explanation and a basic example.
FilterInputStream is an example of the Decorator pattern.
This class must be extended, since its constructor is protected. The derived class would add additional capabilities, but still expose the basic interface of an InputStream.
For example, a BufferedInputStream provides buffering of an underlying input stream to make reading data faster, and a DigestInputStream computes a cryptographic hash of data as it's consumed.
You would use this to add functionality to existing code that depends on the InputStream or OutputStream API. For example, suppose that you use some library that saves data to an OutputStream. The data are growing too large, so you want to add compression. Instead of modifying the data persistence library, you can modify your application so that it "decorates" the stream that it currently creates with a ZipOutputStream. The library will use the stream just as it used the old version that lacked compression.
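The DigestInputStream case mentioned above illustrates the decoration well: the consumer reads the stream normally, and the hash falls out as a side effect. A sketch (the method name and the use of nullOutputStream as a sink are choices made for this example):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestDemo {
    // Wraps any InputStream so that a SHA-256 hash is computed as a side effect
    // of normal reading: the consumer never knows the decorator is there.
    public static byte[] hashWhileReading(InputStream source, OutputStream sink)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (DigestInputStream in = new DigestInputStream(source, md)) {
            in.transferTo(sink); // ordinary consumption of the stream
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] hash = hashWhileReading(new ByteArrayInputStream(payload),
                                       OutputStream.nullOutputStream());
        System.out.println(hash.length); // 32 (bytes in a SHA-256 digest)
    }
}
```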
You use them when you want to decorate the stream of data.
Remember that these stream class instances wrap themselves around another stream instance (whether another subclass of one of these or not) and add some feature, add some processing, make some changes to the data as it passes through.
For example, you might want to remove all the multiple spaces from some stream. You make your own subclass of FilterInputStream and override the read() method. I'm not going to bother all the details but here's some sorta-java for the method in the subclass:
private boolean lastWasBlank = false;

@Override
public int read() throws IOException {
    int chr = super.read();
    if (chr == ' ') {
        if (lastWasBlank) {
            return read(); // swallow this blank and deliver the next character instead
        } else {
            lastWasBlank = true;
        }
    } else {
        lastWasBlank = false;
    }
    return chr;
}
In real life, you would probably mess with the other two read() methods too.
Other uses:
Log everything flowing through the stream
Duplicate the 'tee' utility so the stream being read is handled two ways.
Convert line endings between Windows, Mac and Unix/Linux formats
Add delays to simulate slow transmission methods like modems or serial ports or wireless network connections.
FilterInputStream and FilterOutputStream are there to ease the job of developers who wish to implement their own input/output streams. Implementations such as BufferedInputStream may add their own decorations around the basic InputStream API while delegating the methods they don't need to override to the superclass, FilterInputStream in this case.
Neither FilterInputStream nor FilterOutputStream are designed for end users to use directly.
