In a Java program, I am compressing an InputStream like this:
ChannelBufferOutputStream outputStream = new ChannelBufferOutputStream(ChannelBuffers.dynamicBuffer(BUFFER_SIZE));
GZIPOutputStream compressedOutputStream = new GZIPOutputStream(outputStream);
try {
IOUtils.copy(inputStream, compressedOutputStream);
} finally {
// this should print the byte size after compression
System.out.println(outputStream.writtenBytes());
}
I am testing this code with a JSON file that is ~31,000 bytes uncompressed and ~7,000 bytes compressed on disk. When I pass an InputStream wrapping the uncompressed JSON file to the code above, outputStream.writtenBytes() returns 10, which would indicate that it compressed down to only 10 bytes. That seems wrong, so I wonder where the problem is. The ChannelBufferOutputStream javadoc says: "Returns the number of written bytes by this stream so far." So it should be working.
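Call GZIPOutputStream.finish() (or close()) before counting bytes. Until then the compressed data may still be buffered inside the Deflater; the 10 bytes you see are just the GZIP header, which the GZIPOutputStream constructor writes immediately. A plain flush() is not enough: it never writes the trailer, and unless the stream was created with syncFlush set to true it does not even push out pending deflate output.
If that does not work, you can create a proxy stream whose only job is to count the number of bytes that pass through it.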
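A minimal sketch of such a counting proxy, assuming plain java.io streams instead of Netty's ChannelBufferOutputStream (CountingOutputStream and GzipSize are hypothetical names). It also shows where finish() has to go: only after it returns does the count include the compressed body and the 8-byte trailer.

```java
import java.io.*;
import java.util.zip.GZIPOutputStream;

// Pass-through stream that counts every byte written to the wrapped stream.
class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) { super(out); }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // forward the whole chunk instead of byte by byte
        count += len;
    }

    long getCount() { return count; }
}

class GzipSize {
    // Returns the size of the gzip-compressed form of everything read from 'in'.
    static long compressedSize(InputStream in) throws IOException {
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        try (GZIPOutputStream gz = new GZIPOutputStream(counter)) { // header (10 bytes) written here
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
            gz.finish(); // writes the remaining deflate output plus the 8-byte trailer
        }
        return counter.getCount();
    }
}
```

Reading counter.getCount() right after the GZIPOutputStream constructor gives 10, which matches what the question observed.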
Goal: Decrypt data from one source and write the decrypted data to a file.
try (FileInputStream fis = new FileInputStream(targetPath.toFile());
ReadableByteChannel channel = newDecryptedByteChannel(path, associatedData))
{
FileChannel fc = fis.getChannel();
long position = 0;
while (position < ???)
{
position += fc.transferFrom(channel, position, CHUNK_SIZE);
}
}
The implementation of newDecryptedByteChannel(Path,byte[]) should not be of interest, it just returns a ReadableByteChannel.
Problem: What is the condition to end the while loop? When is the "end of the byte channel" reached? Is transferFrom the right choice here?
This question might be related (the answer there is to just set the count to Long.MAX_VALUE). Unfortunately, this doesn't help me, because the docs say that up to count bytes may be transferred, depending upon the natures and states of the channels.
Another thought was to check whether the number of bytes actually transferred (returned from transferFrom) is 0, but this condition may also be true if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer.
It is one of the bizarre features of FileChannel.transferFrom() that it never tells you about end of stream. You have to know the input length independently.
I would just use streams for this: specifically, a CipherInputStream around a BufferedInputStream around a FileInputStream, and a FileOutputStream.
But the code you posted doesn't make sense anyway; it can't work. You are transferring into the input file, via a channel obtained from a FileInputStream. That channel is read-only, so transferFrom() will throw a NonWritableChannelException.
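A sketch of the stream-based approach, assuming you can build a javax.crypto.Cipher already initialized in DECRYPT_MODE for your data (the question's newDecryptedByteChannel hides how decryption is done, so StreamDecrypt and decryptFile are hypothetical names). Unlike transferFrom(), InputStream.read() returns -1 at end of stream, so the loop condition is unambiguous.

```java
import java.io.*;
import java.nio.file.Path;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;

class StreamDecrypt {
    // Decrypts the file 'encrypted' into 'plain'; 'cipher' must already be in DECRYPT_MODE.
    static void decryptFile(Path encrypted, Path plain, Cipher cipher) throws IOException {
        try (InputStream in = new CipherInputStream(
                     new BufferedInputStream(new FileInputStream(encrypted.toFile())), cipher);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(plain.toFile()))) {
            byte[] buf = new byte[8192];
            int n;
            // -1 means end of stream; there is no "0 bytes, try again" ambiguity here
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```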
As commented by @user207421, since you are reading from a ReadableByteChannel, the target channel needs to be derived from a FileOutputStream rather than a FileInputStream. And the condition for ending the loop in your code would have to be the size of the data underlying the ReadableByteChannel, which you cannot get from the channel itself unless it is actually a FileChannel, whose size() method returns it.
The way I found to do the transfer is through a ByteBuffer, as below.
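ByteBuffer buf = ByteBuffer.allocate(1024 * 8);
// read() returns -1 only at end of stream
while (readableByteChannel.read(buf) != -1)
{
    buf.flip();
    fc.write(buf); // fc is a FileChannel derived from a FileOutputStream
    buf.compact(); // keep any bytes that write() did not consume
}
// write out whatever is still in the buffer after end of stream
buf.flip();
while (buf.hasRemaining())
{
    fc.write(buf);
}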
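A self-contained version of the loop above, assuming the source is a blocking channel (with a non-blocking one, read() can keep returning 0 and this loop would spin). ChannelCopy and copyToFile are hypothetical names; in the question, src would be whatever newDecryptedByteChannel returns.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.file.*;

class ChannelCopy {
    // Copies everything from 'src' into the file 'target' and returns the number of bytes written.
    static long copyToFile(ReadableByteChannel src, Path target) throws IOException {
        long total = 0;
        try (FileChannel fc = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
            while (src.read(buf) != -1) {   // -1 marks end of stream
                buf.flip();
                total += fc.write(buf);
                buf.compact();              // keep unwritten bytes for the next round
            }
            buf.flip();                     // drain what is left after end of stream
            while (buf.hasRemaining()) {
                total += fc.write(buf);
            }
        }
        return total;
    }
}
```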
Below is the code that I have written. I want to do a simple thing: store binary file data in a ByteBuffer.
File file = new File(fileName);
try {
    ReadableByteChannel channel = new FileInputStream(fileName).getChannel();
    // allocateDirect() takes an int, but File.length() returns a long
    ByteBuffer buf = ByteBuffer.allocateDirect((int) file.length());
    // How can I use read() to get all the contents?
} catch (Exception e) {
}
I was wondering:
- how I can use read() to get all the data from the channel and store it in the ByteBuffer, and
- whether there is a more elegant way to allocate the ByteBuffer than using a File object to get the length of the file.
I prefer to use memory mapping.
FileChannel channel = new FileInputStream(fileName).getChannel();
ByteBuffer buf = channel.map(MapMode.READ_ONLY,0,channel.size());
If the file is greater than 2 GB, you have to have more than one mapping. On the plus side this takes around 10 ms regardless of size and doesn't use much heap or direct memory regardless of the size of the file.
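A slightly more complete version of the mapping approach, assuming Java 7+ for FileChannel.open and try-with-resources (MapFile and mapReadOnly are hypothetical names). Closing the channel does not invalidate the mapping, so the channel can be closed right away; the size check reflects the single-mapping limit mentioned above.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

class MapFile {
    // Maps the whole file read-only; files over Integer.MAX_VALUE bytes need several mappings.
    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("needs more than one mapping: " + size + " bytes");
            }
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }
}
```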
From the ReadableByteChannel Javadocs
read(ByteBuffer dst)
An attempt is made to read up to r bytes from the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked.
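So channel.read(buf) reads up to buf.remaining() bytes. A single call may return fewer, though, so call it in a loop until the buffer is full or read() returns -1.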
As for your second question, if you want to read the entire contents of the file into memory at once that seems like a reasonable approach.
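A sketch that combines both points: it loops on read() until the buffer is full, and it takes the size from FileChannel.size() instead of a separate File object. It assumes the file fits in a single buffer (under 2 GB); ReadWholeFile and readAll are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

class ReadWholeFile {
    // Reads the entire file into a direct ByteBuffer, ready for reading (already flipped).
    static ByteBuffer readAll(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for one ByteBuffer: " + size);
            }
            ByteBuffer buf = ByteBuffer.allocateDirect((int) size);
            // read() may return fewer bytes than requested, so repeat until full or EOF
            while (buf.hasRemaining()) {
                if (channel.read(buf) == -1) {
                    break;
                }
            }
            buf.flip(); // switch from filling to draining
            return buf;
        }
    }
}
```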
I am having trouble using Deflater to write a GZIP file. I created a default header and used CRC32 to keep track of the checksum.
The file I am zipping is smaller than my buffer, but the output of my compressor is ~200 bytes larger than it should be: gzip creates a file of 457 bytes, while my code creates a file of 652 bytes. (I printed the compressedSize and it says 634 bytes.) I did a hexdump of my final file, and it shows that both my trailer and my compressed body are incorrect, but my header is correct. I am not allowed to use GZIPOutputStream for this assignment, but I used its code to write the header and trailer. The number of bytes read in is correct.
The "manage" object is an object that does the reading and writing from System.in and System.out in a synchronized manner (this is for multithreading), and I verified that it reads and writes a file in order. I looked at the GZIPOutputStream source and the DeflaterOutputStream source, and my code looks similar, so I am unsure why my compressor gives me such a large compressed byte array. I played with the Deflater levels and strategies, but they give me the same result.
EDIT: The constructor for my Deflater is
Deflater compressor = new Deflater(Deflater.DEFAULT_LEVEL, true);
CRC32 checksum = new CRC32();
checksum.reset();
int uncompressedLength = 0;
uncompressedLength = manage.read(buff, threadNum, prime);
if (uncompressedLength > 0)
{
checksum.update(buff, 0, uncompressedLength);
compressor.setInput(buff);
compressor.finish();
byte[] output = new byte[BUFFER_SIZE];
compressor.deflate(output);
int compressedDataLength = (int) compressor.getBytesWritten();
manage.write(output, compressedDataLength, threadNum, (int) checksum.getValue(), uncompressedLength);
}
The Deflater class has three constructors. The one with two arguments uses a boolean that, if true, indicates ZLIB header and checksum fields should not be used, which is what GZIP needs. The other two constructors (one no-args, the other only specifying compression level) default to using those header and checksum fields. In other words, it's like the two-args constructor with false. Maybe try the one with the boolean and set it to true?
Here's the constructor doc.
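A minimal sketch of writing a single-member GZIP file with a raw Deflater, assuming the whole input fits in one in-memory buffer (RawGzip and gzip() are hypothetical names, not the asker's code). Two details differ from the question's code. First, compressor.setInput(buff) hands the Deflater the entire array, including the unused tail beyond uncompressedLength, which may explain the extra bytes. Second, a single deflate() call is not guaranteed to drain all output, so this version loops until finished().

```java
import java.io.*;
import java.util.zip.*;

class RawGzip {
    // 10-byte GZIP header: magic, CM=8 (deflate), no flags, MTIME=0, XFL=0, OS=255 (unknown)
    private static final byte[] HEADER = {
        (byte) 0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff
    };

    // Compresses the first 'len' bytes of 'buff' into a complete .gz byte array.
    static byte[] gzip(byte[] buff, int len) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER);

        CRC32 crc = new CRC32();
        crc.update(buff, 0, len);                 // checksum over the real data only

        Deflater deflater = new Deflater(Deflater.DEFAULT_LEVEL, true); // raw deflate, no zlib wrapper
        deflater.setInput(buff, 0, len);          // not setInput(buff): that includes the unused tail
        deflater.finish();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {            // keep deflating until all output is drained
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();

        writeIntLE(out, (int) crc.getValue());    // trailer: CRC32, little-endian
        writeIntLE(out, len);                     // trailer: ISIZE (input length mod 2^32)
        return out.toByteArray();
    }

    private static void writeIntLE(OutputStream out, int v) throws IOException {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }
}
```

GZIPInputStream checks both the CRC32 and ISIZE fields, so decompressing the result with it is a quick way to validate the trailer.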
This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thing is for certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to rewrite an applet that downloads a base64-encoded file and decodes it to the "normal" file. Because it works with byte arrays, it holds twice the file size in memory: once base64-encoded and once decoded. My question is not about base64. It's about saving memory.
Can you point me in the right direction?
Thanks!
From the question, it appears that you are reading the base64 encoded contents of a file into an array, decoding it into another array before finally saving it.
This is a bit of an overhead in terms of memory, especially since Base64 encoding makes the data about a third larger than the decoded bytes. It can be made more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
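A sketch of the same idea that swaps in the JDK's java.util.Base64 (Java 8+) instead of hand-decoding groups of 4 characters: Decoder.wrap() decodes on the fly as bytes are read, so only one buffer's worth of data is in memory. Using the MIME decoder is an assumption; it tolerates the line breaks many base64 files contain. StreamingBase64 and decodeFile are hypothetical names.

```java
import java.io.*;
import java.util.Base64;

class StreamingBase64 {
    // Decodes the base64 file 'encoded' into 'decoded' without loading either into memory.
    static void decodeFile(File encoded, File decoded) throws IOException {
        try (InputStream in = Base64.getMimeDecoder().wrap(
                     new BufferedInputStream(new FileInputStream(encoded)));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(decoded))) {
            byte[] buffer = new byte[8192]; // a power of two, as suggested above
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```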
Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?
Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html
Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 at end of stream; available() is not a reliable end-of-stream test,
    // and the offset argument of read() indexes into 'buffer', not into the file
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
} finally {
    input.close();
    output.close();
}
You can use org.apache.commons.io.FileUtils. This utility class provides other options besides what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on. You can also follow this tutorial.
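If you would rather not add the Commons IO dependency, Java 7+ has java.nio.file.Files.copy, which likewise copies without reading the whole file into memory. A minimal sketch (JdkCopy is a hypothetical wrapper name):

```java
import java.io.IOException;
import java.nio.file.*;

class JdkCopy {
    // Copies src to dest, overwriting dest if it already exists.
    static void copy(Path src, Path dest) throws IOException {
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
    }
}
```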
I'm updating some old code to grab some binary data from a URL instead of from a database (the data is about to be moved out of the database and will be accessible by HTTP instead). The database API seemed to provide the data as a raw byte array directly, and the code in question wrote this array to a file using a BufferedOutputStream.
I'm not at all familiar with Java, but a bit of googling led me to this code:
URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream();
final int BUF_SIZE = 1 << 8;
byte[] buffer = new byte[BUF_SIZE];
int bytesRead = -1;
while((bytesRead = in.read(buffer)) > -1) {
out.write(buffer, 0, bytesRead);
}
in.close();
fileBytes = out.toByteArray();
That seems to work most of the time, but I have a problem when the data being copied is large - I'm getting an OutOfMemoryError for data items that worked fine with the old code.
I'm guessing that's because this version of the code has multiple copies of the data in memory at the same time, whereas the original code didn't.
Is there a simple way to grab binary data from a URL and save it in a file without incurring the cost of multiple copies in memory?
Instead of writing the data to a byte array and then dumping it to a file, you can directly write it to a file by replacing the following:
ByteArrayOutputStream out = new ByteArrayOutputStream();
With:
FileOutputStream out = new FileOutputStream("filename");
If you do so, there is no need for the call out.toByteArray() at the end. Just make sure you close the FileOutputStream object when done, like this:
out.close();
See the documentation of FileOutputStream for more details.
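Put together, the download loop from the question with the FileOutputStream swapped in might look like this sketch (UrlToFile and download are hypothetical names). Try-with-resources closes both streams even if the copy fails, and only one buffer's worth of data is in memory at a time; the 8 KB buffer size is an arbitrary choice.

```java
import java.io.*;
import java.net.URL;
import java.net.URLConnection;

class UrlToFile {
    // Streams the content at 'url' straight into 'target'; returns the number of bytes written.
    static long download(URL url, File target) throws IOException {
        URLConnection uc = url.openConnection();
        long total = 0;
        try (InputStream in = uc.getInputStream();
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }
}
```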
I don't know what you mean by "large" data, but try using the JVM parameter
java -Xmx256m ...
which sets the maximum heap size to 256 MB (or any value you like).
If you need the content length and your web server is reasonably standards-conforming, it should send a "Content-Length" header.
URLConnection#getContentLength() gives you that value up front (or -1 if the header is missing), so that you are able to create your file. (Be aware that if your HTTP server is misconfigured or under the control of a malicious party, that header may not match the number of bytes actually received. In that case, why not stream to a temp file first and copy that file later?)
In addition to that: a ByteArrayOutputStream is a horrible memory allocator. It doubles its buffer size whenever it runs out of room, so if you read a 32 MB + 1 byte file, you end up with a 64 MB buffer (and toByteArray() then makes yet another copy). It might be better to implement your own, smarter byte-array stream, like this one:
http://source.pentaho.org/pentaho-reporting/engines/classic/trunk/core/source/org/pentaho/reporting/engine/classic/core/util/MemoryByteArrayOutputStream.java
subclassing ByteArrayOutputStream gives you access to the buffer and the number of bytes in it.
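A sketch of such a subclass: if you know the size up front (for example from Content-Length), pre-sizing the buffer means it never has to double, and handing out the internal array avoids the copy toByteArray() makes. ExposedByteArrayOutputStream is a hypothetical name; it relies only on the protected buf and count fields of ByteArrayOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    ExposedByteArrayOutputStream(int expectedSize) {
        super(expectedSize); // pre-size to avoid growth when the length is known
    }

    // The internal array itself (no copy); only the first size() bytes are valid.
    byte[] buffer() {
        return buf;
    }

    // Reads back the valid bytes without copying them.
    InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}
```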
But of course, if all you want to do is store the data in a file, you are better off using a FileOutputStream.