I have a GZIPOutputStream which has already been defined. Now I want to convert it to a byte array.
I tried the code below, but it gives an error.
GZIPOutputStream zipStream = createGZIP();
byte[] compressedData = zipStream.toByteArray();
Error: cannot resolve method "toByteArray()"
I checked "GZIP compression to a byte array", but that takes a byte[] as input; I need to convert a GZIP stream which I already have.
You will need to modify the method that is creating the GZIPOutputStream so that it sends its output to a ByteArrayOutputStream.
Alternatively, after closing the FileOutputStream for the file where you are (presumably) writing the compressed data, open it for input and read it into a byte array.
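For example, here is a minimal sketch of the first approach; dataToCompress is a hypothetical name standing in for whatever uncompressed byte[] you are writing:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
try (GZIPOutputStream zipStream = new GZIPOutputStream(byteStream)) {
    zipStream.write(dataToCompress); // your uncompressed byte[]
} // closing the GZIPOutputStream finishes the compression and writes the GZIP trailer
byte[] compressedData = byteStream.toByteArray(); // only complete after close()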
Concerning your current attempt:
GZIPOutputStream zipStream = createGZIP();
byte[] compressedData = zipStream.toByteArray();
This approach is not going to work.
GZIPOutputStream provides no API methods for retrieving the compressed data.
GZIPOutputStream provides no API methods for retrieving the stream that it is writing to. (And even if there were, most OutputStream types don't allow you to retrieve the data ...)
In general, it is better to find and read the javadocs for the classes that you use. Programming by guessing what methods they provide is liable to waste your time when your guesses are wrong.
Related
As I understand it, ByteArrayInputStream is used to read byte[] data.
Why should I use it rather than a simple byte[] (for example, when reading it from a DB)?
What is the difference between them?
If the input is always a byte[], then you're right, there's often no need for the stream. And if you don't need it, don't use it. One additional advantage of a ByteArrayInputStream is that it serves as a very strong indication that you intend the bytes to be read-only (since the stream doesn't provide an interface for changing them), though it's important to note that a programmer can often still access the bytes directly, so you shouldn't use that in a situation where security is a concern.
But if it's sometimes a byte[], sometimes a file, sometimes a network connection, etc, then you need some sort of abstraction for "a stream of bytes, and I don't care where they come from." That's what an InputStream is. When the source happens to be a byte array, ByteArrayInputStream is a good InputStream to use.
This is helpful in many situations, but to give two concrete examples:
You're writing a library that takes bytes and processes them somehow (maybe it's an image processing library, for instance). Users of your library may supply bytes from a file, or from a byte[] in memory, or from some other source. So, you provide an interface that accepts an InputStream — which means that if what they have is a byte[], they need to wrap it in a ByteArrayInputStream.
You're writing code that reads a network connection. But to unit test that code, you don't want to have to open up a connection; you want to just supply some bytes in the code. So the code takes an InputStream, and your test provides a ByteArrayInputStream.
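For instance, a minimal sketch of the second case; the names (countBytes, the test data) are made up for illustration:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// The code under test accepts any InputStream; it doesn't care about the source.
static int countBytes(InputStream in) throws IOException {
    int count = 0;
    while (in.read() != -1) {
        count++;
    }
    return count;
}

// In a unit test, supply in-memory bytes instead of a real network connection.
byte[] testData = "hello".getBytes(StandardCharsets.UTF_8);
int n = countBytes(new ByteArrayInputStream(testData)); // n == 5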
A ByteArrayInputStream contains an internal buffer holding bytes that may be read from the stream. An internal counter keeps track of the next byte to be supplied by the read method.
ByteArrayInputStream is like a wrapper which protects the underlying array from external modification.
It provides higher-level operations: read, mark, and skip.
A stream also has the advantage that you don't have to hold all the bytes in memory at the same time, which is convenient when the data is large and can easily be handled in small chunks.
Reference doc
Whereas if you choose byte[], you have to reinvent the wheel: do the reading and skipping yourself, and track the current index explicitly.
byte data[] = { 65, 66, 67, 68, 69 }; // "ABCDE"

// With a plain byte[], you track the index yourself.
for (int index = 0; index < data.length; index++) {
    System.out.print((char) data[index] + " ");
}

// With a ByteArrayInputStream, the stream tracks the position for you.
int c;
ByteArrayInputStream bInput = new ByteArrayInputStream(data);
while ((c = bInput.read()) != -1) { // read one byte at a time until end of stream
    System.out.println(Character.toUpperCase((char) c));
}
ByteArrayInputStream is a good wrapper for byte[]. The core is understanding streams: a stream is an ordered sequence of bytes of indeterminate length. Input streams move bytes of data into a Java program from some (generally external) source. In Java I/O you can decorate one stream with another to get more functionality, though the performance may suffer. The power of the stream metaphor is that the differences between these sources and destinations are abstracted away: all input and output operations are simply treated as streams, using the same classes and the same methods. You don't have to learn a new API for every different kind of device; the same API that reads a file can read network sockets, serial ports, Bluetooth transmissions, and more.
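For instance, a sketch of that decoration; the file name "data.bin" is hypothetical:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Each wrapper adds capability: buffering, then typed reads, over the raw file stream.
try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(
            new FileInputStream("data.bin")))) {
    int value = in.readInt(); // reads four bytes as a big-endian int
}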
I'm writing a program which takes in a byte array of potentially millions of bytes and reads each one from a ByteArrayInputStream. If a byte is not "printable" (ASCII 32-126), it is encoded in a certain way and written to a ByteArrayOutputStream instance; if the byte is "printable", it is written directly to that same ByteArrayOutputStream instance.
So from a broader view I am taking in a byte array, and getting back a similar byte array except certain characters have been encoded.
My question is: would it be faster to write my data out to a file or to continuously be writing to this OutputStream?
It will be faster to write the data to your output stream. Writing to a file will involve disk access, which is slower than access to the RAM where the byte array inside the ByteArrayOutputStream lives.
However, if you eventually want to write your byte array out to some other place (say a file) then the intermediate step of the ByteArrayOutputStream is unnecessary and you should just write straight to the end destination e.g. FileOutputStream.
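As a rough sketch of the loop described in the question (encodeByte is a hypothetical placeholder for whatever encoding you apply to non-printable bytes):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

static byte[] encodePrintable(byte[] input) throws IOException {
    ByteArrayInputStream in = new ByteArrayInputStream(input);
    // Presize with input.length, since the output is at least as large as the input.
    ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
    int b;
    while ((b = in.read()) != -1) {
        if (b >= 32 && b <= 126) {
            out.write(b); // printable ASCII passes through unchanged
        } else {
            out.write(encodeByte(b)); // encode non-printable bytes
        }
    }
    return out.toByteArray();
}

// Hypothetical placeholder encoding: escape non-printables as \xHH.
static byte[] encodeByte(int b) {
    return String.format("\\x%02X", b).getBytes(StandardCharsets.US_ASCII);
}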
I have a function in which I am only given a BufferedInputStream and no other information about the file to be read. I unfortunately cannot alter the method definition as it is called by code I don't have access to. I've been using the code below to read the file and place its contents in a String:
public String[] doImport(BufferedInputStream stream) throws IOException, PersistenceException {
    int bytesAvail = stream.available();
    byte[] bytesRead = new byte[bytesAvail];
    stream.read(bytesRead);
    stream.close();
    String fileContents = new String(bytesRead);
    // more code here working with fileContents
}
My problem is that for large files (> 2 GB), this code causes the program either to run extremely slowly or to truncate the data, depending on the computer it runs on. Does anyone have a recommendation for dealing with large files in this situation?
You're assuming that available() returns the size of the file; it does not. It returns the number of bytes available to be read, and that may be any number less than or equal to the size of the file.
Unfortunately there's no way to do what you want in just one shot without some other source of information on the length of the file data (e.g., by calling java.io.File.length()). Instead, you have to accumulate from possibly multiple reads. One way is to use a ByteArrayOutputStream: read into a fixed, finite-size array, then write the data you read into the ByteArrayOutputStream. At the end, pull the byte array out. You'll need to use the three-argument forms of read() and write(), and check the return value of read() so you know exactly how many bytes were read into the buffer on each call.
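A minimal sketch of that accumulation loop, using the stream parameter from the question:

byte[] buffer = new byte[8192];
ByteArrayOutputStream out = new ByteArrayOutputStream();
int n;
// read() returns how many bytes were actually read, or -1 at end of stream.
while ((n = stream.read(buffer, 0, buffer.length)) != -1) {
    out.write(buffer, 0, n);
}
stream.close();
byte[] allBytes = out.toByteArray();
String fileContents = new String(allBytes); // consider passing an explicit Charset

Note that a Java array cannot exceed Integer.MAX_VALUE elements, so a file larger than 2 GB cannot fit in a single byte[] (or String) at all; truly huge inputs have to be processed in chunks.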
I'm not sure why you don't think you can read it line-by-line. BufferedInputStream only describes how the underlying stream is accessed, it doesn't impose any restrictions on how you ultimately read data from it. You can use it just as if it were any other InputStream.
Namely, to read it line-by-line you could do
InputStreamReader streamReader = new InputStreamReader(stream);
BufferedReader lineReader = new BufferedReader(streamReader);
String line = lineReader.readLine();
...
[Edit] This response is to the original wording of the question, which asked specifically for a way to read the input file line-by-line.
So - I've got a third party library that needs a File as input. I've got a byte array.
I don't want to write the bytes to disk .. I'd like to keep this in memory. Any idea on how I can create a File from the provided byte array (without writing to disk)?
Sorry, not possible. A File is inherently an on-disk entity, unless you have a RAM disk - but that's not something you can create in Java.
That's exactly the reason why APIs should not be based on File objects (or should at least be overloaded to accept an InputStream as well).
There's one possibility, but it's a real long-shot.
If the API uses new FileReader(file) or new FileInputStream(file) then you're hosed, but...
If it converts the file to a URL or URI (using toURL() or toURI()) then, since File is not final, you can pass in a subclass of File in which you control the construction of the URL/URI and, more importantly, the handler.
But the chances are VERY slim!
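A heavily hedged sketch of that long shot; InMemoryFile and the "memory" protocol are made-up names, and this only helps if the library reads via toURL() and openStream():

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class InMemoryFile extends File {
    private final byte[] data;

    InMemoryFile(String name, byte[] data) {
        super(name);
        this.data = data;
    }

    @Override
    public URL toURL() throws MalformedURLException {
        // Control the handler so the URL serves our bytes instead of touching disk.
        return new URL("memory", null, -1, getName(), new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new URLConnection(u) {
                    @Override
                    public void connect() { /* nothing to do */ }

                    @Override
                    public InputStream getInputStream() {
                        return new ByteArrayInputStream(data);
                    }
                };
            }
        });
    }
}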
So I see there is an accepted answer (and this is old), but I found a way to do this. I was using the IDOL On Demand API and needed to convert a byte array to a File.
Here is an example of taking a byte array of an image and turning into a File:
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import javax.imageio.ImageIO;

// imageByte is the byte array that is already defined
ByteArrayInputStream bis = new ByteArrayInputStream(imageByte);
BufferedImage image = ImageIO.read(bis); // decode the bytes into an image
bis.close();

// write the image to a file
File outputfile = new File("image.png");
ImageIO.write(image, "png", outputfile);
And so outputfile is a File that can be used later in your program.
Is this:
ByteBuffer buf = ByteBuffer.allocate(1000);
...the only way to initialize a ByteBuffer?
What if I have no idea how many bytes I need to allocate?
Edit: More details:
I'm converting one image file format to a TIFF file. The problem is that the starting file format can be any size, but I need to write the data to the TIFF in little-endian order. So I'm reading the data I'm eventually going to write to the TIFF file into the ByteBuffer first, so I can put everything in little endian, and then I'll write it to the output file. I guess since I know how long IFDs and headers are, and I can probably figure out how many bytes are in each image plane, I can just use multiple ByteBuffers during this whole process.
The types of places that you would use a ByteBuffer are generally the types of places that you would otherwise use a byte array (which also has a fixed size). With synchronous I/O you often use byte arrays, with asynchronous I/O, ByteBuffers are used instead.
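As a side note for the TIFF use case, a minimal sketch of writing little-endian values through a ByteBuffer:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

ByteBuffer buf = ByteBuffer.allocate(8);
buf.order(ByteOrder.LITTLE_ENDIAN); // TIFF data here must be little endian
buf.putInt(42);          // stored as 2A 00 00 00
buf.putShort((short) 7); // stored as 07 00
byte[] bytes = new byte[buf.position()];
buf.flip();              // prepare to read back what was written
buf.get(bytes);          // bytes now holds the little-endian encoding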
If you need to read an unknown amount of data using a ByteBuffer, consider using a loop with your buffer and append the data to a ByteArrayOutputStream as you read it. When you are finished, call toByteArray() to get the final byte array.
Any time when you aren't absolutely sure of the size (or maximum size) of a given input, reading in a loop (possibly using a ByteArrayOutputStream, but otherwise just processing the data as a stream, as it is read) is the only way to handle it. Without some sort of loop, any remaining data will of course be lost.
For example:
final byte[] buf = new byte[4096];
int numRead;
// Use try-with-resources to auto-close streams.
try (
    final FileInputStream fis = new FileInputStream(...);
    final ByteArrayOutputStream baos = new ByteArrayOutputStream()
) {
    while ((numRead = fis.read(buf)) > 0) {
        baos.write(buf, 0, numRead);
    }
    final byte[] allBytes = baos.toByteArray();
    // Do something with the data.
}
catch (final Exception e) {
    // Do something on failure...
}
If you instead wanted to write Java ints, or other things that aren't raw bytes, you can wrap your ByteArrayOutputStream in a DataOutputStream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
while (thereAreMoreIntsFromSomewhere()) {
    int someInt = getIntFromSomewhere();
    dos.writeInt(someInt);
}
byte[] allBytes = baos.toByteArray();
Depends.
Library
Converting file formats tends to be a solved problem for most problem domains. For example:
Batik can transcode between various image formats (including TIFF).
Apache POI can convert between office spreadsheet formats.
Flexmark can generate HTML from Markdown.
The list is long. The first question should be, "What library can accomplish this task?" If performance is a consideration, your time is likely better spent optimising an existing package to meet your needs than writing yet another tool. (As a bonus, other people get to benefit from the centralised work.)
Known Quantities
Reading a file? Allocate file.length() bytes.
Copying a string? Allocate string.length() bytes.
Copying a TCP packet? Allocate 1500 bytes, for example.
Unknown Quantities
When the number of bytes is truly unknown, you can do a few things:
Make a guess.
Analyze example data sets; use the average length as the initial buffer size.
Example
Java's StringBuffer, unless otherwise instructed, uses an initial buffer size that holds 16 characters. Once those 16 characters are filled, a new, longer array is allocated and the original 16 characters are copied into it. If the StringBuffer had an initial size of 1024 characters, the reallocation would not happen as early or as often.
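For instance, the difference in construction:

// Default capacity holds 16 chars; appending more forces a reallocation and copy.
StringBuffer small = new StringBuffer();
// Presized: no internal reallocation until 1024 chars are exceeded.
StringBuffer big = new StringBuffer(1024);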
Optimization
Either way, this is probably a premature optimization. Typically you would allocate a set number of bytes when you want to reduce the number of internal memory reallocations that get executed.
It is unlikely that this will be the application's bottleneck.
The idea is that it's only a buffer, not the whole of the data. It's a temporary resting spot for data: you read a chunk, process it (possibly writing it somewhere else), and repeat. So allocate yourself a big enough "chunk" and it normally won't be a problem.
What problem are you anticipating?