IOUtils.copyLarge not working for compressed input - java

I am attempting to transfer a gzipped file using IOUtils.copyLarge. When I transfer from a GZIPInputStream to a non-compressed output, it works fine, but when I transfer the original InputStream (attempting to leave it compressed) the end result is 0 bytes.
I have verified the input file is correct. Here is an example of what works:
IOUtils.copyLarge(new GZIPInputStream(inputStream), out)
This of course results in an uncompressed file being written out. I would like to keep the file compressed as it is in the original input.
When I try val countBytes = IOUtils.copyLarge(inputStream, out) the result is 0, and the resulting file is empty. The desired result is simply copying the already compressed gzip file to a new destination maintaining compression.
Reading the documentation for the API, I should be using this correctly. Any ideas on what is preventing it from working?
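For reference, the raw copy I am attempting looks like this in full (a minimal sketch; the paths are placeholders, and it assumes the stream has not already been consumed or wrapped elsewhere):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;

// Copy the gzipped bytes verbatim: no GZIPInputStream in the chain,
// so the destination stays compressed exactly like the source.
try (InputStream inputStream = new FileInputStream("input.gz");  // placeholder path
     OutputStream out = new FileOutputStream("output.gz")) {     // placeholder path
    long countBytes = IOUtils.copyLarge(inputStream, out);
    // countBytes should equal the compressed size of input.gz;
    // 0 means the stream was already at end-of-stream when passed in.
}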

Related

Read AWS S3 GZIP Object using GetObjectRequest with range

I am trying to read a big compressed (gz) AWS S3 object. I don't want to read the whole object; I want to read it in parts so that I can process the uncompressed data in parallel.
I am reading it with a GetObjectRequest with the "Range" header, where I set a byte range.
However, when I give a byte range from the middle of the object, such as (100, 200), it fails with "Not in GZIP format".
The reason for the failure is that the AWS request returns a stream, but when I wrap it in a GZIPInputStream it fails, because GZIPInputStream expects the stream to begin with the two-byte magic number (GZIP_MAGIC = 0x8b1f) confirming it is gzip, which is not present in a mid-file range.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(<<Bucket>>, <<Key>>).withRange(100, 200);
S3Object object = s3Client.getObject(rangeObjectRequest);
S3ObjectInputStream rawData = object.getObjectContent();
InputStream data = new GZIPInputStream(rawData);
Can anyone suggest the right approach?
GZIP is a compression format in which each byte in the file depends on all of the bytes that precede it, which means that you can't pick an arbitrary byte range out of the file and make sense of it.
If you need to read byte ranges, you'll need to store the object uncompressed.
You could also create your own file storage format that stores chunks of the file as separately-compressed blocks. You could do this using the ZIP format, where each file in the archive represents a fixed-size block of the original data. But you'd need to implement your own ZIP directory reader to make that work.
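A rough local-filesystem sketch of that block idea using java.util.zip (the block size and entry names are arbitrary choices here; doing this against S3 with ranged reads would indeed need the custom directory reader mentioned above):
import java.io.*;
import java.util.zip.*;

// Write: split the source into fixed-size blocks, each stored as its
// own, separately-compressed ZIP entry.
int blockSize = 64 * 1024; // arbitrary block size for illustration
try (InputStream in = new FileInputStream("source.dat");
     ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("blocks.zip"))) {
    byte[] block = new byte[blockSize];
    int n, index = 0;
    // readNBytes (Java 9+) fills the block completely except at end-of-file,
    // keeping block boundaries aligned with byte offsets.
    while ((n = in.readNBytes(block, 0, blockSize)) > 0) {
        zip.putNextEntry(new ZipEntry("block-" + index++));
        zip.write(block, 0, n);
        zip.closeEntry();
    }
}

// Read: decompress one block by index without touching the others.
try (ZipFile zf = new ZipFile("blocks.zip")) {
    InputStream blockStream = zf.getInputStream(zf.getEntry("block-3"));
    // ... process just this block ...
}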

Video File to String and back - Corrupted File

I'm writing an app that takes a 5 second video (.mp4 file) and uploads it to a server as a base64 String. The user can download the video, at which point the String should be converted back to a File.
I'm running into a problem where the downloaded File is saved, but it is slightly smaller (2 to 3 KB smaller) than the original File and is unplayable. This issue does not happen every time that a video is taken; I have been trying to find a pattern for this behavior, but haven't found one yet.
videoFile is a File created through a MediaRecorder recording. I can play back this original File without issue.
Encoding snippet:
//videoFile path: Environment.getExternalStorageDirectory() + "/Pictures/" + timestamp + ".mp4"
byte[] bytes = FileUtils.readFileToByteArray(videoFile);
String encodedVideo = Base64.encodeToString(bytes, Base64.URL_SAFE);
Decoding snippet:
File download = new File(Environment.getExternalStorageDirectory() +"/Pictures/test_"+videoFile.getName());
byte[] decode = Base64.decode(encodedVideo, Base64.URL_SAFE);
FileUtils.writeByteArrayToFile(download, decode);
Log.i("compare arrays",""+(Arrays.toString(bytes)).compareTo(Arrays.toString(decoded)));
The downloaded File is being stored in the same directory as the original File.
Edit to clarify:
The "compare arrays" Log statement gives a result of 0, so the arrays bytes and decode should have the same contents.
I added additional log statements:
Log.i("compare contents", ""+FileUtils.contentEquals(videoFile,download));
Log.i("original checksum", ""+FileUtils.checksumCRC32(videoFile));
Log.i("download checksum", ""+FileUtils.checksumCRC32(download));
to compare the contents of the files more directly.
The log statement comparing the arrays always returns 0, but the contentEquals log statement is not always true. When it is false, the checksums of the files are different. Since the byte arrays have the same contents, I believed the checksums and the actual file contents would be the same as well. This is clearly incorrect, but I don't know how to resolve this issue. Again, I have not found a discernible pattern for when the downloaded File is incorrect.
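For reference, a direct element-wise comparison of the two arrays (rather than comparing their string renderings) would be:
// Arrays.equals checks length and every element in one call; it is a
// more direct test than compareTo() on the Arrays.toString renderings.
Log.i("arrays equal", "" + java.util.Arrays.equals(bytes, decode));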
Any help is appreciated. Where am I going wrong?

Java writing two buffered streams to response OutputStream

I am currently working on exporting huge data to an Excel file using Jasper Reports. I am splitting the dataset into chunks, creating a JasperPrint object for each chunk and exporting it to its own Excel file.
After that, I read each file using a FileInputStream wrapped in a BufferedInputStream and copy it to response.getOutputStream(). I need to do this for every file, copying them all to response.getOutputStream().
But in the end, the exported file is corrupted: the data is not readable, and I don't know what format it is.
Any workable solution for exporting huge data using Jasper Reports is also appreciated.
while ((readBytes = buf1.read(buffer)) != -1) {
    servletOutputStream.write(buffer, 0, readBytes);
    servletOutputStream.flush();
}
I repeat the code above in a loop, reading the data from each file by pointing buf1 at the next file's stream.
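Put together, the loop described above looks roughly like this (a sketch; response, chunkFiles and the buffer size are placeholders). Worth noting: byte-concatenating several complete Excel files cannot produce one valid Excel file, which would match the corruption described.
import java.io.*;
import javax.servlet.ServletOutputStream;

ServletOutputStream servletOutputStream = response.getOutputStream();
byte[] buffer = new byte[8192]; // arbitrary buffer size
for (File chunk : chunkFiles) { // chunkFiles: the exported per-chunk Excel files
    try (BufferedInputStream buf1 = new BufferedInputStream(new FileInputStream(chunk))) {
        int readBytes;
        while ((readBytes = buf1.read(buffer)) != -1) {
            servletOutputStream.write(buffer, 0, readBytes);
        }
    }
}
servletOutputStream.flush();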

How to convert byte array to file

I have connected to an FTP location using:
URL url = new URL("ftp://user:password@mydomain.com/" + file_name + ";type=i");
I read the content into a byte array as shown below:
InputStream fis = url.openStream(); // stream opened from the FTP URL above
byte[] buffer = new byte[1024];
int count = 0;
while ((count = fis.read(buffer)) > 0)
{
    // check whether the bytes in buffer constitute a file
}
I want to be able to check whether the bytes in buffer are a file, without explicitly creating a specific file to write them to, like this:
File xfile = new File("dir1/");
FileOutputStream fos = new FileOutputStream(xfile);
fos.write(bytes);
if (xfile.isFile())
{
}
In an ideal world, something like this:
File xfile = new File(buffer); // Note: you cannot do this in Java
if (xfile.isFile())
{
}
The isFile() check is meant to verify that the bytes read from the FTP location are a file. I don't want to pass an explicit file name, as I do not know the name of the file at the FTP location.
Any solutions available?
What is a file?
A computer file is a block of arbitrary information [...] which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished.
Your bytes that are stored in the byte array will be a part of a file if you write them on some kind of durable storage.
Sure, we often say that we read a file or write a file, but basically we read bytes from a file and write bytes to a file.
So we can't test whether a byte array's content is a file or not, simply because every byte array can be used to create a file (even an empty one).
BTW, the FTP server does not send a file: it (1) reads bytes and (2) a filename, then (3) sends the bytes and (4) the filename, so that a client can (5) read the bytes and (6) the filename and use both to (7) create a file. The FTP server doesn't even have to access a file; it can take the bytes and names from a database, or create both in memory...
I guess you cannot check whether the byte[] array is a file or not. Why don't you just use an already written and tested library, for example Apache Commons Net: http://commons.apache.org/net/
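For instance, with Commons Net's FTPClient you can list the directory first, so the server tells you each entry's name and whether it is a file before you download anything (host, credentials and paths below are placeholders):
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

FTPClient ftp = new FTPClient();
ftp.connect("mydomain.com");   // placeholder host
ftp.login("user", "password"); // placeholder credentials
ftp.enterLocalPassiveMode();
ftp.setFileType(FTPClient.BINARY_FILE_TYPE); // binary mode, like ";type=i"

for (FTPFile entry : ftp.listFiles("/")) {
    if (entry.isFile()) { // the server-side "is it a file?" answer
        try (OutputStream out = new FileOutputStream(entry.getName())) {
            ftp.retrieveFile(entry.getName(), out);
        }
    }
}
ftp.logout();
ftp.disconnect();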
There is no way to do that easily.
A file is a byte array on a disk and a byte array will be a file if you write it to disk. There is no reliable way of telling what is in the data you just received, without parsing the data and checking if you can find a valid file header in it.
Regarding whether isFile() can tell you that the content fetched from the FTP stream is a file:
The answer to that is simple. You can't do it because it doesn't make any sense.
What you have read from the stream IS a sequence of bytes stored in memory.
A file is a sequence of bytes stored on a disk (typically).
These are not the same thing. (Or, if you want to get all theoretical/philosophical, you have to answer the question "when is a sequence of bytes a file, and when is it not a file?")
Now a more sensible question to ask might be:
How do I know if the stuff I fetched by FTP is the contents of a file on the FTP server?
(... as distinct from a rendering of a directory or something.)
The answer is that you can't be sure when you have fetched it by opening a URLConnection to the FTP server ... like you have done. It is like asking "is '(123) 555-5555' a phone number?". It could be a phone number, or it could just be a sequence of characters that looks like a phone number.

how to write a file without allocating the whole byte array into memory?

This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thing is for certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to rewrite an applet that downloads a base64 encoded file and decodes it back into the "normal" file. Because it's done with byte arrays, it holds twice the file size in memory: one copy base64 encoded and the other decoded. My question is not about base64; it's about saving memory.
Can you point me in the right direction?
Thanks!
From the question, it appears that you are reading the base64 encoded contents of a file into an array, decoding it into another array before finally saving it.
This is a bit of an overhead when considering memory. Especially given the fact that Base64 encoding is in use. It can be made a bit more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
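On Java 8 and later, the decode-on-the-fly approach described above is available out of the box: java.util.Base64 decoders can wrap an input stream directly. A minimal sketch (file names are placeholders):
import java.io.*;
import java.util.Base64;

// The wrapping stream decodes 4-character groups as they are read,
// so the file is never held in memory in full. The MIME decoder is
// used because it tolerates line breaks in the encoded file.
try (InputStream in = Base64.getMimeDecoder().wrap(
         new BufferedInputStream(new FileInputStream("encoded.txt")));
     OutputStream out = new BufferedOutputStream(new FileOutputStream("decoded.bin"))) {
    byte[] buffer = new byte[8192]; // a power of two, per the note above
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
}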
Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?
Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html
Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 at end of stream. (Note: available() is not a
    // reliable end-of-stream test, and the offset argument of read()
    // is an offset into the buffer, not into the stream, so it must
    // stay 0 here rather than accumulate.)
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
} finally {
    input.close();
    output.close();
}
You can use org.apache.commons.io.FileUtils. This utility class provides other options too, besides what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on. You can also follow this tutorial.
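For example, a one-line copy with Commons IO (paths are placeholders):
import java.io.File;
import org.apache.commons.io.FileUtils;

// Copies the contents of source.dat to backup/source.dat, creating the
// destination's parent directory if it does not exist.
FileUtils.copyFile(new File("source.dat"), new File("backup/source.dat"));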
