Java ByteBuffer for Zipfile

I have a binary file that contains big-endian data. I am using this code to read it in:
FileChannel fileInputChannel = new FileInputStream(fileInput).getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect((int)fileInputChannel.size());
while (bb.remaining() > 0)
fileInputChannel.read(bb);
fileInputChannel.close();
bb.flip();
I have to do something identical for zip files. In other words, decompress a file from a zip archive and read it with the correct byte order. I understand I can read it in via ZipInputStream, but then I have to handle the endianness myself. With ByteBuffer you can use ByteOrder.
Is there an NIO alternative for zip files ?

If you have your ZipInputStream, just use Channels.newChannel to convert it to a Channel, then proceed as you wish. Keep in mind that a ZipInputStream can't predict its uncompressed size, so you might have to guess an appropriate buffer size and possibly re-allocate a bigger buffer when needed. And since the underlying API uses byte arrays, there is no benefit in using direct ByteBuffers in the case of ZipInputStream; I recommend using ByteBuffer.allocate instead of ByteBuffer.allocateDirect for this use case.
By the way, you can replace while (bb.remaining() > 0) with while (bb.hasRemaining()). And since Java 7 you can use FileChannel.open to open a FileChannel without the detour via FileInputStream.
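Putting the pieces together, here is a minimal, self-contained sketch of the Channels.newChannel approach (it builds a small zip archive in memory to stand in for the zip file on disk; the entry name and values are invented for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipChannelDemo {
    public static void main(String[] args) throws Exception {
        // Build a zip archive in memory holding two big-endian ints
        // (stands in for the real zip file).
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipBytes)) {
            zos.putNextEntry(new ZipEntry("data.bin"));
            zos.write(ByteBuffer.allocate(8).putInt(42).putInt(7).array());
            zos.closeEntry();
        }

        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes.toByteArray()))) {
            zis.getNextEntry(); // position the stream at the first entry
            // Wrap the stream as a channel; a heap buffer is fine here
            // because the zip API works on byte arrays anyway.
            ReadableByteChannel ch = Channels.newChannel(zis);
            ByteBuffer bb = ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN);
            while (bb.hasRemaining() && ch.read(bb) != -1) {
                // keep filling until the buffer is full or the entry ends
            }
            bb.flip();
            System.out.println(bb.getInt()); // 42
            System.out.println(bb.getInt()); // 7
        }
    }
}
```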

Related

Why to use ByteArrayInputStream rather than byte[] in Java

As I understand ByteArrayInputStream is used to read byte[] data.
Why should I use it rather than simple byte[] (for example reading it from DB).
What is the difference between them?
If the input is always a byte[], then you're right, there's often no need for the stream. And if you don't need it, don't use it. One additional advantage of a ByteArrayInputStream is that it serves as a very strong indication that you intend the bytes to be read-only (since the stream doesn't provide an interface for changing them), though it's important to note that a programmer can often still access the bytes directly, so you shouldn't use that in a situation where security is a concern.
But if it's sometimes a byte[], sometimes a file, sometimes a network connection, etc, then you need some sort of abstraction for "a stream of bytes, and I don't care where they come from." That's what an InputStream is. When the source happens to be a byte array, ByteArrayInputStream is a good InputStream to use.
This is helpful in many situations, but to give two concrete examples:
You're writing a library that takes bytes and processes them somehow (maybe it's an image processing library, for instance). Users of your library may supply bytes from a file, or from a byte[] in memory, or from some other source. So, you provide an interface that accepts an InputStream — which means that if what they have is a byte[], they need to wrap it in a ByteArrayInputStream.
You're writing code that reads a network connection. But to unit test that code, you don't want to have to open up a connection; you want to just supply some bytes in the code. So the code takes an InputStream, and your test provides a ByteArrayInputStream.
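To make the second point concrete, here is a small sketch (the method name and data are made up; the point is that the "library" code only sees an InputStream, so a test can feed it a ByteArrayInputStream instead of a real connection):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChecksumDemo {
    // Hypothetical library method: it only needs "a stream of bytes",
    // so it takes an InputStream rather than a byte[] or a File.
    static long sumBytes(InputStream in) throws IOException {
        long sum = 0;
        int b;
        while ((b = in.read()) != -1) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // In production this might be a socket or file stream;
        // in a unit test we just feed it an in-memory array.
        byte[] data = {1, 2, 3, 4};
        long sum = sumBytes(new ByteArrayInputStream(data));
        System.out.println(sum); // 10
    }
}
```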
A ByteArrayInputStream contains an internal buffer that contains bytes that
may be read from the stream. An internal counter keeps track of the next byte to be supplied by the read method.
ByteArrayInputStream is like a wrapper which protects the underlying array from external modification.
It gives you higher-level read, mark, and skip operations.
A stream also has the advantage that you don't have to have all the bytes in memory at the same time, which is convenient when the data is large but can be handled in small chunks.
Reference doc
Whereas if you choose byte[], you have to reinvent the wheel: do the reading and skipping yourself and track the current index explicitly.
byte data[] = { 65, 66, 67, 68, 69 }; // data
for (int index = 0; index < data.length; index++) {
    System.out.print((char) data[index] + " ");
}
int c;
ByteArrayInputStream bInput = new ByteArrayInputStream(data);
while ((c = bInput.read()) != -1) {
    System.out.println(Character.toUpperCase((char) c));
}
ByteArrayInputStream is a good wrapper for byte[]; the core is understanding streams. A stream is an ordered sequence of bytes of indeterminate length. Input streams move bytes of data into a Java program from some (generally external) source. In java.io you can decorate one stream with another to gain more functionality, though performance may suffer. The power of the stream metaphor is that the differences between these sources and destinations are abstracted away: all input and output operations are simply treated as streams using the same classes and the same methods. You don't learn a new API for every different kind of device; the same API that reads files can read network sockets, serial ports, Bluetooth transmissions, and more.

Reading certain number of bytes into ByteBuffer

I have a binary file of 10 MB. I need to read it in chunks of different sizes (e.g. 300 or 273 bytes). For reading I use FileChannel and ByteBuffer. Right now, on each read iteration I allocate a new ByteBuffer of the size I need to read.
Is it possible to allocate only once (let's say 200 KB) for the ByteBuffer and read into it (300, 273 bytes, etc.)? I will not read more than 200 KB at once. The entire file must be read.
UPD
public void readFile(FileChannel fc, int amountOfBytesToRead) throws IOException
{
    ByteBuffer bb = ByteBuffer.allocate(amountOfBytesToRead); // new buffer on every call
    fc.read(bb);
    bb.flip();
    // do something with bytes
    bb = null;
}
I cannot read the whole file at once due to memory constraints; that's why I am reading in chunks. Efficiency is also very important (which is why I don't want my current approach of allocating a new buffer per read). Thanks
Declare several ByteBuffers of the sizes you need and use scatter-read: read(ByteBuffer[] dsts, ...).
Or forget about NIO and use DataInputStream.readFully(). If you put a BufferedInputStream underneath you won't suffer any performance loss: it may even be faster.
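If you'd rather keep the single-allocation idea from the question, one buffer can be reused by resetting it before each read: clear() rewinds the position and limit() caps how much the next read may fill. A minimal sketch (the in-memory channel and chunk sizes are invented to stand in for the real file):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ReuseBufferDemo {
    // Read up to 'amount' bytes into the same buffer on every call:
    // clear() resets the position, limit() caps this read's size.
    static int readChunk(ReadableByteChannel ch, ByteBuffer bb, int amount)
            throws IOException {
        bb.clear();
        bb.limit(amount);
        while (bb.hasRemaining() && ch.read(bb) != -1) {
            // keep filling until 'amount' bytes are in or the stream ends
        }
        bb.flip();
        return bb.remaining(); // number of bytes actually read
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000]; // stands in for the 10 MB file
        ReadableByteChannel ch =
            Channels.newChannel(new ByteArrayInputStream(data));
        ByteBuffer bb = ByteBuffer.allocate(200 * 1024); // allocate once
        System.out.println(readChunk(ch, bb, 300)); // 300
        System.out.println(readChunk(ch, bb, 273)); // 273
    }
}
```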

What is the fastest way to load a big 2D int array from a file?

I'm loading a 2D array from file, it's 15,000,000 * 3 ints big (it will be 40,000,000 * 3 eventually). Right now, I use dataInputStream.readInt() to sequentially read the ints. It takes ~15 seconds. Can I make it significantly (at least 3x) faster or is this about as fast as I can get?
Yes, you can. From a benchmark of 13 different ways of reading files:
If you have to pick the fastest approach, it would be one of these:
FileChannel with a MappedByteBuffer and array reads.
FileChannel with a direct ByteBuffer and array reads.
FileChannel with a wrapped array ByteBuffer and direct array access.
For the best Java read performance, there are 4 things to remember:
Minimize I/O operations by reading an array at a time, not a byte at a time. An 8 KB array is a good size (that's why it's the default for BufferedInputStream).
Minimize method calls by getting data an array at a time, not a byte at a time. Use array indexing to get at bytes in the array.
Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class like FileChannel and MappedByteBuffer.
Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use FileChannel with memory mapping, or a direct or wrapped array ByteBuffer.
Map your file into memory!
Java 7 code:
FileChannel channel = FileChannel.open(Paths.get("/path/to/file"),
        StandardOpenOption.READ);
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY,
        0, channel.size());
// use buf
See here for more details.
If you use Java 6, you'll have to:
RandomAccessFile file = new RandomAccessFile("/path/to/file", "r");
FileChannel channel = file.getChannel();
// same thing to obtain buf
You can even use .asIntBuffer() on the buffer if you want. And you can read only what you actually need to read, when you need to read it. And it does not impact your heap.
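As a runnable sketch of the mapping-plus-asIntBuffer approach (the file path and values are invented; the example writes its own temp file so it can run standalone):

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedIntDemo {
    public static void main(String[] args) throws Exception {
        // Write three big-endian ints to a temp file
        // (stands in for the real data file).
        Path p = Files.createTempFile("ints", ".bin");
        Files.write(p,
            ByteBuffer.allocate(12).putInt(10).putInt(20).putInt(30).array());

        try (FileChannel channel = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY,
                    0, channel.size());
            IntBuffer ints = buf.asIntBuffer(); // view the mapping as ints
            while (ints.hasRemaining()) {
                System.out.println(ints.get());
            }
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```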

How to create a ZipFile from a byte array in Java

In Java you have to do
new ZipFile(new File("xxx.zip"));
to unzip a zip file.
Now I get a byte array whose content is a zip file. I get this byte array from database instead of a file. I would like to unzip this "byte array file" but there is no ZipFile constructor for byte array or String (I mean the content instead of the file path).
Is there any solution? (Of course I do not want to write this byte array to an actual file and read it into memory again.)
Thanks!
Use a ByteArrayInputStream inside a ZipInputStream created from the byte array:
byte[] ba;
InputStream is = new ByteArrayInputStream(ba);
ZipInputStream zis = new ZipInputStream(is);
Use zis to read the contents uncompressed.
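A self-contained sketch of that idea (it builds a small zip in memory to play the role of the bytes from the database; entry name and contents are invented):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class UnzipBytesDemo {
    public static void main(String[] args) throws Exception {
        // Build a zip in memory so the example is self-contained; in the
        // question the bytes would come from the database instead.
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipped)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        byte[] ba = zipped.toByteArray();

        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(ba))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // zis now yields the uncompressed bytes of this entry
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = zis.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                System.out.println(entry.getName() + ": "
                        + out.toString(StandardCharsets.UTF_8.name()));
            }
        }
    }
}
```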
The implementation of ZipFile requires an actual file; memory mapping may be used internally, for instance, and in general it is assumed that the archive may be huge.
As @Perception mentions, ZipInputStream can be used to read sequentially through the stream. Third-party libraries may be available to replace ZipFile.

How to initialize a ByteBuffer if you don't know how many bytes to allocate beforehand?

Is this:
ByteBuffer buf = ByteBuffer.allocate(1000);
...the only way to initialize a ByteBuffer?
What if I have no idea how many bytes I need to allocate?
Edit: More details:
I'm converting one image file format to a TIFF file. The problem is the starting file format can be any size, but I need to write the data in the TIFF as little endian. So I'm reading the data I'm eventually going to write to the TIFF file into the ByteBuffer first, so I can put everything in little endian, and then I'm going to write it to the output file. I guess since I know how long IFDs and headers are, and I can probably figure out how many bytes are in each image plane, I can just use multiple ByteBuffers during this whole process.
The types of places that you would use a ByteBuffer are generally the types of places that you would otherwise use a byte array (which also has a fixed size). With synchronous I/O you often use byte arrays, with asynchronous I/O, ByteBuffers are used instead.
If you need to read an unknown amount of data using a ByteBuffer, consider using a loop with your buffer and append the data to a ByteArrayOutputStream as you read it. When you are finished, call toByteArray() to get the final byte array.
Any time when you aren't absolutely sure of the size (or maximum size) of a given input, reading in a loop (possibly using a ByteArrayOutputStream, but otherwise just processing the data as a stream, as it is read) is the only way to handle it. Without some sort of loop, any remaining data will of course be lost.
For example:
final byte[] buf = new byte[4096];
int numRead;
// Use try-with-resources to auto-close streams.
try(
final FileInputStream fis = new FileInputStream(...);
final ByteArrayOutputStream baos = new ByteArrayOutputStream()
) {
while ((numRead = fis.read(buf)) > 0) {
baos.write(buf, 0, numRead);
}
final byte[] allBytes = baos.toByteArray();
// Do something with the data.
}
catch( final Exception e ) {
// Do something on failure...
}
If you instead wanted to write Java ints, or other things that aren't raw bytes, you can wrap your ByteArrayOutputStream in a DataOutputStream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
while (thereAreMoreIntsFromSomewhere()) {
int someInt = getIntFromSomewhere();
dos.writeInt(someInt);
}
byte[] allBytes = baos.toByteArray();
Depends.
Library
Converting file formats tends to be a solved problem for most problem domains. For example:
Batik can transcode between various image formats (including TIFF).
Apache POI can convert between office spreadsheet formats.
Flexmark can generate HTML from Markdown.
The list is long. The first question should be, "What library can accomplish this task?" If performance is a consideration, your time is likely better spent optimising an existing package to meet your needs than writing yet another tool. (As a bonus, other people get to benefit from the centralised work.)
Known Quantities
Reading a file? Allocate file.size() bytes.
Copying a string? Allocate string.length() bytes.
Copying a TCP packet? Allocate 1500 bytes, for example.
Unknown Quantities
When the number of bytes is truly unknown, you can do a few things:
Make a guess.
Analyze example data sets and use the average length.
Example
Java's StringBuffer, unless otherwise instructed, uses an initial buffer size to hold 16 characters. Once the 16 characters are filled, a new, longer array is allocated, and then the original 16 characters copied. If the StringBuffer had an initial size of 1024 characters, then the reallocation would not happen as early or as often.
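To illustrate the growth behavior described above (using StringBuilder, which grows the same way as StringBuffer; the sizes are just examples):

```java
public class CapacityDemo {
    public static void main(String[] args) {
        // A default builder starts with room for 16 characters and must
        // reallocate and copy as soon as a 17th character arrives.
        StringBuilder small = new StringBuilder();
        System.out.println(small.capacity()); // 16

        // A pre-sized builder never reallocates while appending up to
        // 1024 characters.
        StringBuilder big = new StringBuilder(1024);
        for (int i = 0; i < 1024; i++) {
            big.append('x');
        }
        System.out.println(big.capacity()); // still 1024
    }
}
```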
Optimization
Either way, this is probably a premature optimization. Typically you would allocate a set number of bytes when you want to reduce the number of internal memory reallocations that get executed.
It is unlikely that this will be the application's bottleneck.
The idea is that it's only a buffer - not the whole of the data. It's a temporary resting spot for data as you read a chunk, process it (possibly writing it somewhere else). So, allocate yourself a big enough "chunk" and it normally won't be a problem.
What problem are you anticipating?
