We are using Apache Camel for compressing and decompressing our files.
We use the standard .marshal().gzip() and .unmarshal().gzip() APIs.
Our problem is that when we get really large files, say 800MB to more than 1GB, our application runs out of memory, since the entire file is loaded into memory for compression and decompression.
Are there any Camel APIs or Java libraries that can zip/unzip the file without loading the entire file into memory?
There is a similar unanswered question here
Explanation
Use a different approach: Stream the file.
That is, don't load it into memory completely; read it in small chunks and write each chunk back out as soon as it has been read.
Get an InputStream to the file and wrap a GZIPInputStream (or your library's equivalent) around it. Read a chunk, write it to an OutputStream.
Do the opposite if you want to compress an archive: wrap the OutputStream in a GZIPOutputStream.
Code
The example uses Apache Commons Compress but the logic of the code remains the same for all libraries.
Unpacking a gz archive:
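import java.io.BufferedInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

Path inputPath = Paths.get("archive.tar.gz");
Path outputPath = Paths.get("archive.tar");
try (InputStream fin = Files.newInputStream(inputPath);
        // Declared as a resource so the decompressor is closed as well
        GzipCompressorInputStream in = new GzipCompressorInputStream(
            new BufferedInputStream(fin));
        OutputStream out = Files.newOutputStream(outputPath)) {
    // Copy in fixed-size chunks; only one buffer is held in memory
    final byte[] buffer = new byte[8192];
    int n;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}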
Packing as gz archive:
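import java.io.BufferedOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

Path inputPath = Paths.get("archive.tar");
Path outputPath = Paths.get("archive.tar.gz");
try (InputStream in = Files.newInputStream(inputPath);
        OutputStream fout = Files.newOutputStream(outputPath);
        // Must be closed: closing flushes the buffer and writes the gzip trailer
        GzipCompressorOutputStream out = new GzipCompressorOutputStream(
            new BufferedOutputStream(fout))) {
    // Copy in fixed-size chunks; only one buffer is held in memory
    final byte[] buffer = new byte[8192];
    int n;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}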
You could also wrap a BufferedReader and a PrintWriter around the streams if you feel more comfortable with them. They manage the buffering themselves, and you can read and write lines instead of bytes. Note that this only works correctly if the file consists of lines of text and not some other format.
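The line-based variant mentioned above could look roughly like the following. This is a minimal sketch, not part of the original answer: it assumes UTF-8 text, uses the JDK's java.util.zip classes instead of Apache Commons Compress, and the class and method names (GzipLines, packLines, unpackLines) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip and gunzip a text file one line at a time.
class GzipLines {

    // Compresses textFile into gzFile; only the current line is held in memory.
    static void packLines(Path textFile, Path gzFile) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(textFile, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(new OutputStreamWriter(
                     new GZIPOutputStream(Files.newOutputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(line);
            }
            // PrintWriter swallows IOExceptions, so ask it explicitly
            if (writer.checkError()) {
                throw new IOException("Failed to write " + gzFile);
            }
        }
    }

    // Decompresses gzFile into textFile, again line by line.
    static void unpackLines(Path gzFile, Path textFile) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8));
             PrintWriter writer = new PrintWriter(
                     Files.newBufferedWriter(textFile, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(line);
            }
            if (writer.checkError()) {
                throw new IOException("Failed to write " + textFile);
            }
        }
    }
}
```

Because println writes the platform line separator, line endings may change on a round trip, which is one more reason to keep this variant for text files only.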
I'm trying to get a zip file from the server.
I'm using HttpURLConnection to get an InputStream, and this is what I have:
myInputStream.toString().getBytes().toString() is equal to [B@4.....
byte[] bytes = Base64.decode(myInputStream.toString(), Base64.DEFAULT);
String string = new String(bytes, "UTF-8");
string == �&ܢ��z�m����y....
I really tried to unzip this file but nothing works. There are also so many questions: should I use GZIPInputStream or ZipInputStream? Do I have to save this stream as a file, or can I work on the InputStream directly?
Please help, my boss is getting impatient :O
I have no idea what is in this file; I have to find out :)
GZIPInputStream and ZipInputStream handle two different formats. https://en.wikipedia.org/wiki/Gzip
It is not a good idea to retrieve a string directly from the stream. From an InputStream, you can create a File and write the data into it using a FileOutputStream.
Base64 decoding is a separate concern. If the stream has already been decoded upstream, that's fine; otherwise you have to wrap your stream in another input stream that decodes the Base64 format.
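Wrapping a stream in a Base64-decoding stream, as described above, can be sketched with the JDK's java.util.Base64 (Java 8+, or Android API 26+), which is an assumption here since the question uses android.util.Base64. The class and method names are hypothetical. The basic decoder rejects line breaks; Base64.getMimeDecoder() would be the one to try if the payload is line-wrapped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical helper: decode Base64 on the fly instead of going through a String.
class Base64Streams {

    // Returns a stream whose reads yield the decoded bytes of the Base64 text in source.
    static InputStream decoding(InputStream source) {
        return Base64.getDecoder().wrap(source);
    }

    // Copies in to out through a fixed-size buffer and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

The decoded stream can then be passed to a ZipInputStream or GZIPInputStream, or copied straight to a FileOutputStream to inspect the raw file.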
The best practice is to copy through a fixed-size buffer so that the whole payload is never held in memory.
Here is some Kotlin code that decompresses the zipped InputStream into a file (simpler than Java, where managing byte[] is tedious):
val fileBinaryDecompress = File(..path..)
val outputStream = FileOutputStream(fileBinaryDecompress)
val zipStream = ZipInputStream(myInputStream)
// Position the stream on the first entry; read() returns -1 until this is called
zipStream.nextEntry
readFromStream(zipStream, BUFFER_SIZE_BYTES,
    object : ReadBytes {
        override fun read(buffer: ByteArray) {
            outputStream.write(buffer)
        }
    })
outputStream.close()
interface ReadBytes {
    /**
     * Called after each buffer fill
     * @param buffer filled
     */
    @Throws(IOException::class)
    fun read(buffer: ByteArray)
}
@Throws(IOException::class)
fun readFromStream(inputStream: InputStream, bufferSize: Int, readBytes: ReadBytes) {
    val buffer = ByteArray(bufferSize)
    var read = 0
    while (read != -1) {
        read = inputStream.read(buffer, 0, buffer.size)
        if (read != -1) {
            // Hand over only the bytes that were actually read
            val optimizedBuffer: ByteArray = if (buffer.size == read) {
                buffer
            } else {
                buffer.copyOf(read)
            }
            readBytes.read(optimizedBuffer)
        }
    }
}
If you want to get the file from the server without decompressing it, remove the ZipInputStream() decorator.
GZIPInputStream and ZipInputStream read different formats (a single gzip-compressed stream versus a ZIP archive containing entries), so only the one that matches what the server sends will work.
Next, you need to identify whether the zipped stream was Base64 encoded, or whether some Base64-encoded content was put into a zipped stream. From what you put in your question, it seems to be the latter.
So you should try
ZipInputStream zis = new ZipInputStream(myInputStream);
ZipEntry ze = zis.getNextEntry();
// ZipInputStream has no getInputStream(); after getNextEntry(), zis itself yields the entry's bytes
InputStream is = zis;
and proceed from there ...
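To "proceed from there", one option is to walk all entries and stream each one to disk. The following is a sketch under the assumption that the server really sends a ZIP archive; ZipStreamReader and extract are hypothetical names, and the path check is a precaution against entry names that point outside the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: unpack a ZIP archive directly from an InputStream.
class ZipStreamReader {

    // Writes every file entry below targetDir and returns the names of the extracted files.
    static List<String> extract(InputStream source, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                // Reads until the end of the current entry; it does not close zis
                Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

If getNextEntry() returns null straight away, the payload is probably not a ZIP archive, and gzip or a Base64 wrapper is worth checking next.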
Basically, wrapping the inputStream in a GZIPInputStream should let you read the actual content.
Also, for simplicity, IOUtils from Apache Commons IO makes your life easier.
This works for me:
InputStream is = ...; // initialize your InputStream
is = new GZIPInputStream(is);
byte[] bytes = IOUtils.toByteArray(is);
String s = new String(bytes);
System.out.println(s);
I currently am accessing a streaming h264 file and want to save it off for the ability to slice frames. However, I'm having issues saving/opening the .flv file
When pointing to the URL in the address bar - I am told it's an x-flv file.
I then attempt to do the following to save a chunk of the stream.
URL url = new URL("http://foo.bar.com/foo/bar");
HttpURLConnection conn = (HttpURLConnection) url
.openConnection(proxy);
conn.setRequestMethod("GET");
File f = new File("C:\\tmpArea\\tmp.flv");
FileWriter fr = new FileWriter(f);
bw = new BufferedWriter(fr);
BufferedReader br = new BufferedReader(new InputStreamReader(
(conn.getInputStream())));
String output = "";
int i = 0;
while (((output = br.readLine()) != null) && i < 100000) {
bw.write(output);
i++;
}
Upon doing this I've attempted to open the file in VLC Media Player and am told:
No suitable decoder module: VLC does not support the audio or video format "undf".
Unfortunately there is no way for you to fix this.
I then thought, well maybe it's not really an FLV file and that's on me. So I used a run of the mill hex-editor. Opening up the file in the HexEditor gives me the following information:
FLV
onMetaData
duration
width
height
videodatarate
framerate
videocodecid
audiodatarate
audiosamplerate
audiosamplesize
stereo
audiocodecid
encoder
Lavf52.10.6.0
filesize....
Is there a different way I should be trying to save off this data? Is there a conversion/codec issue I'm not seeing?
You are using a Reader and Writer, which are intended to read bytes and convert them to characters, to read a binary file that consists of bytes. The conversion from bytes to characters will corrupt the data. You should be using InputStream and OutputStream instead.
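A byte-oriented version of the save loop could look like this. It is a sketch only: BinaryStreamSaver, saveChunk and the maxBytes limit are illustrative names, with the limit standing in for the line counter in the question.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: save the first maxBytes bytes of a binary stream unchanged.
class BinaryStreamSaver {

    // Copies at most maxBytes bytes from source to target and returns the number written.
    static long saveChunk(InputStream source, Path target, long maxBytes) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            while (total < maxBytes) {
                int want = (int) Math.min(buffer.length, maxBytes - total);
                int n = source.read(buffer, 0, want);
                if (n == -1) {
                    break; // end of stream reached before the limit
                }
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

With the question's connection this would be called as saveChunk(conn.getInputStream(), f.toPath(), someLimit), where someLimit is whatever chunk size you want to keep.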
Hello Stack Overflow community,
I am doing multistep processing on some data I am receiving with a java Servlet. The current process I have is that I input the files to a server using Apache File Upload and convert them to a File. Then once input1 is populated with data, I run through a flow similar to this (where the process functions are xsl transforms):
File input1 = new File(FILE_NAME); // <---this is populated with data
File output1 = new File(TEMP_FILE); // <---this is the temporary file
InputStream read = new FileInputStream(input1);
OutputStream out = new FileOutputStream(output1);
process1ThatReadsProcessesOutputs( read, out);
out.close();
read.close();
//this is basically a repeat of the above process!
File output2 = new File(RESULT_FILE); // <--- This is the result file
InputStream read1 = new FileInputStream(output1);
OutputStream out1 = new FileOutputStream(output2);
Process2ThatReadsProcessesOutputs( read1, out1);
read1.close();
out1.close();
…
So my question is whether there is a better way to do this, so that I do not have to create those temporary Files and recreate streams to them? (I am assuming I am incurring a decent performance penalty.)
I saw this Most Efficient Way to create InputStream from OutputStream but I am not sure if this is the best route to go...
Just replace the FileOutputStream with a ByteArrayOutputStream, and the FileInputStream that reads it back with a ByteArrayInputStream.
Example:
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
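Applied to a multistep flow, the idea might be sketched as follows. The Step interface is a hypothetical stand-in for the question's process functions (the XSL transforms), and InMemoryPipeline is an illustrative name. Each intermediate result is held fully in memory, so this only suits data that comfortably fits in the heap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: chain processing steps without temporary files.
class InMemoryPipeline {

    // Stand-in for one processing step that reads from in and writes to out.
    interface Step {
        void run(InputStream in, OutputStream out) throws IOException;
    }

    // Feeds the output of each step into the next one and returns the final result.
    static byte[] runAll(byte[] input, Step... steps) throws IOException {
        byte[] current = input;
        for (Step step : steps) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            step.run(new ByteArrayInputStream(current), out);
            current = out.toByteArray();
        }
        return current;
    }
}
```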
I don't know why you are converting the FileItem retrieved with Apache Commons to a File if you don't really need to. You can use the InputStream that each FileItem provides to read the content of the uploaded file directly:
// create/retrieve a new file upload handler
ServletFileUpload upload = ...;
// parse the request
List<FileItem> items = (List<FileItem>) upload.parseRequest(request);
/* get the FileItem from the List. Yes, it's not a best practice because you must verify
how many you receive, and check everything is ok, etc.
Let's suppose you've done it */
//...
FileItem item = items.get(0);
// get the InputStream to read the contents of the file
InputStream is = item.getInputStream();
So finally, you can use the InputStream object to read the uploaded stream sent by the client, avoiding unnecessary instantiations.
And yes, it's strongly recommended to use buffered classes like BufferedInputStream and BufferedOutputStream.
Another idea is to avoid the intermediate FileOutputStream and replace it with a ByteArrayOutputStream if the intermediate result does not need to be written to disk (which is always slower than working in memory).
Java 9 brings a new answer to the question:
// All bytes from an InputStream at once
byte[] result = new ByteArrayInputStream(buf)
.readAllBytes();
// Directly redirect an InputStream to an OutputStream
new ByteArrayInputStream(buf)
.transferTo(System.out);
Someone explain to me what InputStream and OutputStream are?
I am confused about the use cases for both InputStream and OutputStream.
If you could also include a snippet of code to go along with your explanation, that would be great. Thanks!
The goal of InputStream and OutputStream is to abstract different ways to input and output: whether the stream is a file, a web page, or the screen shouldn't matter. All that matters is that you receive information from the stream (or send information into that stream.)
InputStream is used for many things that you read from.
OutputStream is used for many things that you write to.
Here's some sample code. It assumes the InputStream instr and OutputStream osstr have already been created:
int i;
while ((i = instr.read()) != -1) {
osstr.write(i);
}
instr.close();
osstr.close();
InputStream is used for reading, OutputStream for writing. They are connected as decorators to one another such that you can read/write all different types of data from all different types of sources.
For example, you can write primitive data to a file:
File file = new File("C:/text.bin");
file.createNewFile();
DataOutputStream stream = new DataOutputStream(new FileOutputStream(file));
stream.writeBoolean(true);
stream.writeInt(1234);
stream.close();
To read the written contents:
File file = new File("C:/text.bin");
DataInputStream stream = new DataInputStream(new FileInputStream(file));
boolean isTrue = stream.readBoolean();
int value = stream.readInt();
stream.close();
System.out.println(isTrue + " " + value);
You can use other types of streams to enhance the reading/writing. For example, you can introduce a buffer for efficiency:
DataInputStream stream = new DataInputStream(
new BufferedInputStream(new FileInputStream(file)));
You can write other data such as objects:
MyClass myObject = new MyClass(); // MyClass must implement Serializable
ObjectOutputStream stream = new ObjectOutputStream(
new FileOutputStream("C:/text.obj"));
stream.writeObject(myObject);
stream.close();
You can read from other different input sources:
byte[] test = new byte[] {0, 0, 1, 0, 0, 0, 1, 1, 8, 9};
DataInputStream stream = new DataInputStream(new ByteArrayInputStream(test));
int value0 = stream.readInt();
int value1 = stream.readInt();
byte value2 = stream.readByte();
byte value3 = stream.readByte();
stream.close();
System.out.println(value0 + " " + value1 + " " + value2 + " " + value3);
For most input streams there is a corresponding output stream. You can define your own streams for reading/writing special things, and there are complex streams for reading complex things (for example, there are streams for reading/writing the ZIP format).
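Defining your own stream usually means extending FilterInputStream or FilterOutputStream and overriding the read or write methods. A minimal sketch, with CountingInputStream as a made-up example class that counts the bytes read through it:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Example of a custom decorator: counts the bytes that pass through it.
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    // Number of bytes read so far (bytes skipped via skip() are not counted).
    long getCount() {
        return count;
    }
}
```

It composes like any other stream, for example new DataInputStream(new CountingInputStream(new FileInputStream(file))).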
From the Java Tutorial:
A stream is a sequence of data.
A program uses an input stream to read data from a source, one item at a time.
A program uses an output stream to write data to a destination, one item at a time.
The data source and data destination can be anything that holds, generates, or consumes data. Obviously this includes disk files, but a source or destination can also be another program, a peripheral device, a network socket, or an array.
Sample code from oracle tutorial:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class CopyBytes {
public static void main(String[] args) throws IOException {
FileInputStream in = null;
FileOutputStream out = null;
try {
in = new FileInputStream("xanadu.txt");
out = new FileOutputStream("outagain.txt");
int c;
while ((c = in.read()) != -1) {
out.write(c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
This program uses byte streams to copy the file xanadu.txt to outagain.txt, writing one byte at a time.
Have a look at this SE question to know more details about advanced Character streams, which are wrappers on top of Byte Streams :
byte stream and character stream
You read from an InputStream and write to an OutputStream.
For example, say you want to copy a file. You would create a FileInputStream to read from the source file and a FileOutputStream to write to the new file.
If your data is a character stream, you could use a FileReader instead of an InputStream and a FileWriter instead of an OutputStream if you prefer.
InputStream input = ... // many different types
OutputStream output = ... // many different types
byte[] buffer = new byte[1024];
int n = 0;
while ((n = input.read(buffer)) != -1)
output.write(buffer, 0, n);
input.close();
output.close();
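For the character-stream case mentioned above, the same loop works with a Reader, a Writer and a char[] buffer. The sketch below uses Files.newBufferedReader and Files.newBufferedWriter rather than FileReader and FileWriter so the charsets are explicit; TextTranscoder and transcode are illustrative names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: copy a text file while converting its character encoding.
class TextTranscoder {

    // Reads source as sourceCharset and writes the same characters to target as targetCharset.
    static void transcode(Path source, Charset sourceCharset,
                          Path target, Charset targetCharset) throws IOException {
        try (Reader reader = Files.newBufferedReader(source, sourceCharset);
             Writer writer = Files.newBufferedWriter(target, targetCharset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }
}
```

This is only appropriate for text: running binary data through a Reader and Writer changes the bytes.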
OutputStream is an abstract class that represents writing output. There are many different OutputStream classes, and they write out to certain things (like the screen, or Files, or byte arrays, or network connections, or etc). InputStream classes access the same things, but they read data in from them.
Here is a good basic example of using FileOutputStream and FileInputStream to write data to a file, then read it back in.
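In case the linked example is unavailable, a basic round trip could look like the sketch below. It is not the example the answer refers to, and FileRoundTrip with its two methods is a made-up name.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: write bytes to a file, then read them back.
class FileRoundTrip {

    // Writes data to file, replacing any previous content.
    static void write(File file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
        }
    }

    // Reads the whole file back in 1 KB chunks.
    static byte[] read(File file) throws IOException {
        try (FileInputStream in = new FileInputStream(file);
             ByteArrayOutputStream collected = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                collected.write(buffer, 0, n);
            }
            return collected.toByteArray();
        }
    }
}
```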
A stream is a continuous flow of liquid, air, or gas.
A Java stream is a flow of data from a source into a destination. The source or destination can be a disk, memory, a socket, or another program. The data can be bytes, characters, or objects. The same applies to C# or C++ streams. A good metaphor for Java streams is water flowing from a tap into a bathtub and later into a drain.
The data represents the static part of the stream; the read and write methods the dynamic part of the stream.
InputStream represents a flow of data from the source, the OutputStream represents a flow of data into the destination.
Finally, InputStream and OutputStream are abstractions over low-level access to data, such as C file pointers.
Stream: In layman's terms, a stream is data; the most generic stream is a binary representation of data.
Input stream: If you are reading data from a file or any other source, the stream used is an input stream. In simpler terms, an input stream acts as a channel for reading data.
Output stream: If you read and process data from a source (a file, etc.), you then need somewhere to save the result; the means of storing that data is an output stream.
An output stream is generally tied to some data destination such as a file or a network connection. In Java, an output stream is the destination where data is eventually written and where it ends up.
import java.io.PrintStream;
class PPrint {
    static PPrintStream oout = new PPrintStream();
}
class PPrintStream {
    void print(String str) {
        System.out.println(str);
    }
}
class outputstreamDemo {
    public static void main(String args[]) {
        System.out.println("hello world");
        System.out.println("this is output stream demo");
    }
}
For one kind of InputStream, you can think of it as a "representation" of a data source, like a file.
For example:
FileInputStream fileInputStream = new FileInputStream("/path/to/file/abc.txt");
fileInputStream represents the data at this path; you can use its read method to read bytes from the file.
The other kind of InputStream takes in another InputStream and does further processing, such as decompression.
For example:
GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
gzipInputStream will treat the fileInputStream as a compressed data source. When you call read(buffer, 0, buffer.length), it decompresses part of the gzip file into the buffer you provide.
The reason we use an InputStream is that as the data in the source grows, say to 500GB in the source file, we don't want to hold everything in memory (that needs an expensive machine and puts pressure on the garbage collector), and we want to get some results sooner (reading the whole file may take a long time).
The same goes for OutputStream: we can start moving results to the destination without waiting for the whole thing to finish, and we use less memory.
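To make the memory argument concrete, the sketch below walks through a gzip file of any size while holding only one 8 KB buffer. GzipChunkReader and countDecompressedBytes are illustrative names; counting bytes stands in for whatever per-chunk processing you actually need.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: process a gzip file chunk by chunk.
class GzipChunkReader {

    // Returns the decompressed size of gzFile without ever holding the whole file in memory.
    static long countDecompressedBytes(Path gzFile) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(gzFile)))) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            // Each call decompresses only as much as fits into the buffer
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                total += n;
            }
            return total;
        }
    }
}
```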
If you want more explanations and examples, you can check these summaries: InputStream, OutputStream, How To Use InputStream, How To Use OutputStream
Continuing from the other great answers, in my own simple words:
Stream - as @Sher Mohammad mentioned, it is data.
Input stream - used, for example, to get input (data) from a file. The case is when I have a file (the user uploads a file, the input) and I want to read what is in it.
Output stream - the reverse. For example, you are generating an Excel file and outputting it somewhere.
How to write to the file is defined by the sender (the Excel workbook class), not by the file output stream.
See this example in context:
try (OutputStream fileOut = new FileOutputStream("xssf-align.xlsx")) {
wb.write(fileOut);
}
wb.close();