Transfer data from ReadableByteChannel to file - java

Goal: Decrypt data from one source and write the decrypted data to a file.
try (FileInputStream fis = new FileInputStream(targetPath.toFile());
ReadableByteChannel channel = newDecryptedByteChannel(path, associatedData))
{
FileChannel fc = fis.getChannel();
long position = 0;
while (position < ???)
{
position += fc.transferFrom(channel, position, CHUNK_SIZE);
}
}
The implementation of newDecryptedByteChannel(Path,byte[]) should not be of interest, it just returns a ReadableByteChannel.
Problem: What is the condition to end the while loop? When is the "end of the byte channel" reached? Is transferFrom the right choice here?
This question might be related (answer is to just set the count to Long.MAX_VALUE). Unfortunately this doesn't help me because the docs say that up to count bytes may be transfered, depending upon the natures and states of the channels.
Another thought was to just check whether the amount of bytes actually transferred is 0 (returned from transferFrom), but this condition may be true if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer.

It is one of the bizarre features of FileChannel. transferFrom() that it never tells you about end of stream. You have to know the input length independently.
I would just use streams for this: specifically, a CipherInputStream around a BufferedInputStream around a FileInputStream, and a FileOutputStream.
But the code you posted doesn't make any sense anyway. It can't work. You are transferring into the input file, and via a channel that was derived from a FileInputStream, so it is read-only, so transferFrom() will throw an exception.

As commented by #user207421, as you are reading from ReadableByteChannel, the target channel needs to be derived from FileOutputStream rather than FileInputStream. And the condition for ending loop in your code should be the size of file underlying the ReadableByteChannel which is not possible to get from it unless you are able to get FileChannel and find the size through its size method.
The way I could find for transferring is through ByteBuffer as below.
ByteBuffer buf = ByteBuffer.allocate(1024*8);
while(readableByteChannel.read(buf)!=-1)
{
buf.flip();
fc.write(buf); //fc is FileChannel derived from FileOutputStream
buf.compact();
}
buf.flip();
while(buf.hasRemainig())
{
fc.write(buf);
}

Related

Java - Resetting InputStream

I'm dealing with some Java code in which there's an InputStream that I read one time and then I need to read it once again in the same method.
The problem is that I need to reset it's position to the start in order to read it twice.
I've found a hack-ish solution to the problem:
is.mark(Integer.MAX_VALUE);
//Read the InputStream is fully
// { ... }
try
{
is.reset();
}
catch (IOException e)
{
e.printStackTrace();
}
Does this solution lead to some unespected behaviours? Or it will work in it's dumbness?
As written, you have no guarantees, because mark() is not required to report whether it was successful. To get a guarantee, you must first call markSupported(), and it must return true.
Also as written, the specified read limit is very dangerous. If you happen to be using a stream that buffers in-memory, it will potentially allocate a 2GB buffer. On the other hand, if you happen to be using a FileInputStream, you're fine.
A better approach is to use a BufferedInputStream with an explicit buffer.
It depends on the InputStream implementation. You can also think whether it will be better if you use byte[]. The easiest way is to use Apache commons-io:
byte[] bytes = IOUtils.toByteArray(inputSream);
You can't do this reliably; some InputStreams (such as ones connected to terminals or sockets) don't support mark and reset (see markSupported). If you really have to traverse the data twice, you need to read it into your own buffer.
Instead of trying to reset the InputStream load it into a buffer like a StringBuilder or if it's a binary data stream a ByteArrayOutputStream. You can then process the buffer within the method as many times as you want.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int read = 0;
byte[] buff = new byte[1024];
while ((read = inStream.read(buff)) != -1) {
bos.write(buff, 0, read);
}
byte[] streamData = bos.toByteArray();
For me, the easiest solution was to pass the object from which the InputStream could be obtained, and just obtain it again. In my case, it was from a ContentResolver.

How do I use a ChannelBufferOutputStream to check compression size

In a java program I am compressing an InputStream like this:
ChannelBufferOutputStream outputStream = new ChannelBufferOutputStream(ChannelBuffers.dynamicBuffer(BUFFER_SIZE));
GZIPOutputStream compressedOutputStream = new GZIPOutputStream(outputStream);
try {
IOUtils.copy(inputStream, compressedOutputStream);
} finally {
// this should print the byte size after compression
System.out.println(outputStream.writtenBytes());
}
I am testing this code with a json file that is ~31.000 byte uncompressed and ~7.000 byte compressed on disk. Sending a InputStream that is wrapping the uncompressed json file to the code above, outputStream.writtenBytes() returns 10 which would indicate that it compressed down to only 10 byte. That seems wrong, so I wonder where the problem is. ChannelBufferOutputStream javadoc says: Returns the number of written bytes by this stream so far. So it should be working.
Try calling GZIPOutputStream.finish() or flush() methods before counting bytes
If that does not work, you can create a proxy stream, whose mission - to count the number of bytes that have passed through it

How can I use ReadableByteChannel to get file contents and store it in byteBuffer?

Below is the code that I have written. I want to do the simple thing, storing binary file data into byteBuffer.
File file = new File(fileName);
try {
ReadableByteChannel channel = new FileInputStream(fileName).getChannel();
ByteBuffer buf = ByteBuffer.allocateDirect(file.length());
// How can use buf.read to get all the contents?
} catch (Exception e){
}
I was wondering
how can I use read to get all data from channel and store it in ByteBuffer
if there is more elegant way to allocate ByteBuffer, other than using File object to get the length of the file
I prefer to use memory mapping.
FileChannel channel = new FileInputStream(fileName).getChannel();
ByteBuffer buf = channel.map(MapMode.READ_ONLY,0,channel.size());
If the file is greater than 2 GB, you have to have more than one mapping. On the plus side this takes around 10 ms regardless of size and doesn't use much heap or direct memory regardless of the size of the file.
From the ReadableByteChannel Javadocs
read(ByteBuffer dst)
An attempt is made to read up to r bytes from the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked.
So ... channel.read(buf);
As for your second question, if you want to read the entire contents of the file into memory at once that seems like a reasonable approach.

Checking if a stream is a zip file

We have a requirement to determine whether an incoming InputStream is a reference to an zip file or zip data. We do not have reference to the underlying source of the stream. We aim to copy the contents of this stream into an OutputStream directed at an alternate location.
I tried reading the stream using ZipInputStream and extracting a ZipEntry. The ZipEntry is null if the stream is a regular file - as expected - however, in checking for a ZipEntry I loose the initial couple of bytes from the stream. Hence, by the time I know that the stream is a regular stream, I have already lost initial data from the stream.
Any thoughts around how to check if the InputStream is an archive without data loss would be helpful.
Thanks.
Assuming your original inputstream is not buffered, I would try wrapping the original stream in a BufferedInputStream, before wrapping that in a ZipInputStream to check. You can use "mark" and "reset" in the BufferedInputStream to return to the initial position in the stream, after your check.
This is how I did it.
Using mark/reset to restore the stream if the GZIPInputStream detects incorrect zip format (throws the ZipException).
/**
* Wraps the input stream with GZIPInputStream if needed.
* #param inputStream
* #return
* #throws IOException
*/
private InputStream wrapIfZip(InputStream inputStream) throws IOException {
if (!inputStream.markSupported()) {
inputStream = new BufferedInputStream(inputStream);
}
inputStream.mark(1000);
try {
return new GZIPInputStream(inputStream);
} catch (ZipException e) {
inputStream.reset();
return inputStream;
}
}
You can check first bytes of stream for ZIP local header signature (PK 0x03 0x04), that would be enough for most cases. If you need more precision, you should take last ~100 bytes and check for central directory locator fields.
You have described a java.io.PushbackInputStream - in addition to read(), it has an unread(byte[]) which allows you push them bck to the front of the stream, and to re-read() them again.
It's in java.io since JDK1.0 (though I admit I haven't seen a use for it until today).
It sounds a bit like a hack, but you could implement a proxy java.io.InputStream to sit between ZipInputStream and the stream you originally passed to ZipInputStream's constructor. Your proxy would stream to a buffer until you know whether it's a ZIP file or not. If not, then the buffer saves your day.

how to write a file without allocating the whole byte array into memory?

This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thins is for certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to re-write a applet that downloads a base64 encoded file and decodes it to the "normal" file. Because it's made with byte arrays, it holds twice the file size in memory: one base64 encoded and the other one decoded. My question is not about base64. It's about saving memory.
Can you point me in the right direction?
Thanks!
From the question, it appears that you are reading the base64 encoded contents of a file into an array, decoding it into another array before finally saving it.
This is a bit of an overhead when considering memory. Especially given the fact that Base64 encoding is in use. It can be made a bit more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?
Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html
Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
byte[] buffer = new byte[1024];
int readOffset = 0;
while(input.available() > 0) {
int bytesRead = input.read(buffer, readOffset, buffer.length);
readOffset += bytesRead;
output.write(buffer, 0, bytesRead);
}
} finally {
input.close();
output.close();
}
You can use org.apache.commons.io.FileUtils. This util class provides other options too beside what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on.. Also you can follow this tut.

Categories

Resources