Decompress large binary files - java

I have a function to decompress large gzip files using the method below. At times I run into an OutOfMemoryError because the file is just too large. Is there a way I can optimize my code? I have read about breaking the file into smaller parts that fit into memory and decompressing those, but I don't know how to do that. Any help or suggestion is appreciated.
private static String decompress(String s) {
    String pathOfFile = null;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
            new GZIPInputStream(new FileInputStream(s)), Charset.defaultCharset()))) {
        File file = new File(s);
        FileOutputStream fos = new FileOutputStream(file);
        String line;
        while ((line = reader.readLine()) != null) {
            fos.write(line.getBytes());
            fos.flush();
        }
        pathOfFile = file.getAbsolutePath();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return pathOfFile;
}
The stacktrace:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
at java.base/java.util.ArrayList.grow(ArrayList.java:237)
at java.base/java.util.ArrayList.ensureCapacity(ArrayList.java:217)

Don't use Reader classes, because you don't need to process the output file character by character or line by line. Read and write raw bytes with the InputStream.transferTo() method:
try (var in = new GZIPInputStream(new FileInputStream(inFile));
     var out = new FileOutputStream(outFile)) {
    in.transferTo(out);
}
Also, you probably don't need to call flush() explicitly; doing it after every line is wasteful.
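Note that transferTo was only added in Java 9. If you're stuck on Java 8 or earlier, a plain buffered copy loop does the same job. A minimal sketch, assuming the same inFile/outFile variables as above:

try (InputStream in = new GZIPInputStream(new FileInputStream(inFile));
     OutputStream out = new FileOutputStream(outFile)) {
    byte[] buffer = new byte[8192]; // copy in 8 KB chunks; only this buffer is held in memory
    int n;
    while ((n = in.read(buffer)) > 0) {
        out.write(buffer, 0, n);
    }
}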

Related

Can't read from binary file - read some lines in UTF-8, some in binary

I have this code:
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        zero("zero.out");
        System.out.println(zeroRead("zero.out"));
    }

    public static String zeroRead(String name) {
        try (FileInputStream fos = new FileInputStream(name);
             BufferedInputStream bos = new BufferedInputStream(fos);
             DataInputStream dos = new DataInputStream(bos)) {
            StringBuffer inputLine = new StringBuffer();
            String tmp;
            String s = "";
            while ((tmp = dos.readLine()) != null) {
                inputLine.append(tmp);
                System.out.println(tmp);
            }
            dos.close();
            return s;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void zero(String name) {
        File file = new File(name);
        String text = "König" + "\t";
        try (FileOutputStream fos = new FileOutputStream(file);
             BufferedOutputStream bos = new BufferedOutputStream(fos);
             DataOutputStream dos = new DataOutputStream(bos)) {
            dos.write(text.getBytes(StandardCharsets.UTF_8));
            dos.writeInt(50);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The zero() method writes data to the file: the string is written in UTF-8, while the number is written in binary. zeroRead() reads the data back from the file.
After zero() is executed the file contains the UTF-8 bytes of "König\t" followed by the four big-endian bytes of the int 50 (0x00 0x00 0x00 0x32). zeroRead() returns mojibake instead: readLine decodes each byte as a separate character, so "König" comes out as "KÃ¶nig".
How do I read the real data König\t50 from the file?
DataInputStream's readLine method has javadoc that is almost yelling that it doesn't want to be used. You should heed this javadoc: that method is bad and you should not use it. It doesn't do charset decoding.
Your file format is impossible as stated: You have no idea when to stop reading the string and start reading the binary numbers. However, the way you've described things, it sounds like the string is terminated by a newline, so, the \n character.
There is no easy 'just make this filter-reader and call .nextLine() on it' option available, as such readers tend to buffer. You can try this:
InputStreamReader isr = new InputStreamReader(bos, StandardCharsets.UTF_8);
However, basic readers do not have a readLine method, and if you wrap this in a BufferedReader, it may read past the end (the 'buffer' in that name is not just there for kicks). You'd have to hand-roll a method that fetches one character at a time, appending each to a StringBuilder and ending on a newline:
StringBuilder out = new StringBuilder();
for (int c = isr.read(); c != -1 && c != '\n'; c = isr.read())
    out.append((char) c);
String line = out.toString();
will get the job done and won't read 'past' the newline and gobble up your binary number.
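Putting that together, here is a minimal sketch of the whole read path. Two assumptions go beyond the answer above: the delimiter used is the '\t' the question actually writes (not '\n'), and raw bytes are collected one at a time from the stream itself, which sidesteps reader-level read-ahead entirely:

try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream("zero.out")))) {
    // Collect raw bytes until the delimiter, then decode them as UTF-8 in one go.
    ByteArrayOutputStream textBytes = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1 && b != '\t') {
        textBytes.write(b);
    }
    String text = new String(textBytes.toByteArray(), StandardCharsets.UTF_8);
    int number = in.readInt(); // the four bytes written by writeInt(50)
    System.out.println(text + "\t" + number); // prints: König	50
}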

How to efficiently read and write to files using minimal RAM

My aim is to read from a large file, process two lines at a time, and write the result to one or more new files. These files can get very large, from 1GB to 150GB in size, so I'd like to do this processing using the least RAM possible.
The processing is very simple: each line is split on a tab delimiter, certain elements are selected, and the new String is written to the new files.
So far I have attempted using BufferedReader to read the File and PrintWriter to output the lines to a file:
while ((line1 = br.readLine()) != null) {
    if (!line1.startsWith("#")) {
        line2 = br.readLine();
        recordCount++;
        one.println(String.format("%s\n%s\n+\n%s", line1.split("\t")[0] + ".1", line1.split("\t")[9], line1.split("\t")[10]));
        two.println(String.format("%s\n%s\n+\n%s", line2.split("\t")[0] + ".2", line2.split("\t")[9], line2.split("\t")[10]));
    }
}
I have also attempted to use Java 8 streams to read from and write to the file:
stream.forEach(line -> {
    if (!line.startsWith("#")) {
        try {
            if (counter.getAndIncrement() % 2 == 0)
                Files.write(path1, String.format("%s\n%s\n+\n%s", line.split("\t")[0] + ".1", line.split("\t")[9], line.split("\t")[10]).getBytes(), StandardOpenOption.APPEND);
            else
                Files.write(path2, String.format("%s\n%s\n+\n%s", line.split("\t")[0] + ".2", line.split("\t")[9], line.split("\t")[10]).getBytes(), StandardOpenOption.APPEND);
        } catch (IOException ioe) {
        }
    }
});
Finally, I have tried using an InputStream and a Scanner to read the file and a PrintWriter to output the lines:
inputStream = new FileInputStream(inputFile);
sc = new Scanner(inputStream, "UTF-8");
String line1, line2;
PrintWriter one = new PrintWriter(new FileOutputStream(dotOne));
PrintWriter two = new PrintWriter(new FileOutputStream(dotTwo));
while (sc.hasNextLine()) {
    line1 = sc.nextLine();
    if (!line1.startsWith("#")) {
        line2 = sc.nextLine();
        one.println(String.format("%s\n%s\n+\n%s", line1.split("\t")[0] + ".1", line1.split("\t")[9], line1.split("\t")[10]));
        two.println(String.format("%s\n%s\n+\n%s", line2.split("\t")[0] + ".2", line2.split("\t")[9], line2.split("\t")[10]));
    }
}
The issue I'm facing is that the program seems to be holding either the data to be written or the input file data in RAM.
All of the above methods do work, but they use more RAM than I'd like.
Thanks in advance,
Sam
What you did not try is a MappedByteBuffer. FileChannel.map might be usable for your purpose, since the mapped region is not allocated on the Java heap.
Functioning code with a self-made byte buffer would be:
try (FileInputStream fis = new FileInputStream(source);
     FileChannel fic = fis.getChannel();
     FileOutputStream fos = new FileOutputStream(target);
     FileChannel foc = fos.getChannel()) {
    ByteBuffer buffer = ByteBuffer.allocate(1024);
    while (true) {
        int nread = fic.read(buffer);
        if (nread == -1) {
            break;
        }
        buffer.flip();
        foc.write(buffer);
        buffer.clear();
    }
}
Using fic.map to consecutively map regions into OS memory seems easy, but I would need to test such more complex code first.
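For what it's worth, here is a hedged sketch of that mapping approach, copying a file in fixed-size mapped chunks so even a 150GB file never needs one huge mapping. The 64MB chunk size and the file names are illustrative, not from the answer:

try (FileChannel in = FileChannel.open(Paths.get("input.txt"), StandardOpenOption.READ);
     FileChannel out = FileChannel.open(Paths.get("output.txt"),
             StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
    long size = in.size();
    long chunk = 64L * 1024 * 1024; // map 64 MB at a time
    for (long pos = 0; pos < size; pos += chunk) {
        // The mapped region lives in the OS page cache, not on the Java heap.
        MappedByteBuffer mapped = in.map(FileChannel.MapMode.READ_ONLY,
                pos, Math.min(chunk, size - pos));
        out.write(mapped);
    }
}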
When creating the PrintWriter, set autoFlush to true:
new PrintWriter(new FileOutputStream(dotOne), true)
This way the buffered data will be flushed with every println.

How to copy image in java using bufferedreader/writer

File file = new File("download.png");
File newfile = new File("D:\\Java.png");
BufferedReader br = null;
BufferedWriter bw = null;
try {
    FileReader fr = new FileReader(file);
    FileWriter fw = new FileWriter(newfile);
    br = new BufferedReader(fr);
    bw = new BufferedWriter(fw);
    char[] buf = new char[1024];
    int bytesRead;
    while ((bytesRead = br.read(buf)) > 0) {
        bw.write(buf, 0, bytesRead);
    }
    bw.flush();
} catch (Exception e) {
    e.printStackTrace();
} finally {
    try {
        br.close();
        bw.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
What's wrong with this code? Is it possible with the BufferedReader and BufferedWriter classes?
I know how to make a copy of an image using InputStream and OutputStream, so don't paste a solution using those!
What's wrong with this code?
You're using text-based classes for binary data.
Is it possible with the BufferedReader and BufferedWriter classes?
Not while you're dealing with binary data, no.
I know how to make a copy of an image using InputStream and OutputStream, so don't paste a solution using those!
That's the solution you should use, because those are the classes designed for binary data.
Fundamentally, using Reader or Writer for non-text data is broken, and asking for trouble. If you open up the file in a text editor and don't see text, it's not a text file... (Alternatively, it could be a text file that you're using the wrong encoding for, but things like images and sound aren't naturally text.)
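As an aside (not part of the original answer), the byte-level copy being recommended is a one-liner on Java 7+; the paths here are the ones from the question:

// Copies the raw bytes, so the PNG arrives intact.
Files.copy(Paths.get("download.png"), Paths.get("D:\\Java.png"),
        StandardCopyOption.REPLACE_EXISTING);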
Use the javax.imageio.ImageIO utility class, which has lots of utility methods related to image processing.
try {
    File imagefile = new File("download.png");
    BufferedImage image = ImageIO.read(imagefile);
    ImageIO.write(image, "png", new File("D:\\Java.png"));
    .....
}

Android Read a Text file Exactly

I have a text file that has been signed and I need to read this file into a string exactly as it is. The code I am currently using:
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
    invitationText.append(line);
    invitationText.append('\n');
}
invitationText.deleteCharAt(invitationText.length() - 1);
This works if the file has no trailing newline, but if it does, the signature check fails. There are a lot of questions around this, so I'm having a hard time finding one that specifically answers it; this may be a duplicate question. Some restrictions I have, though, are:
It can't use the methods added in Java 7 (I'm on android, I don't have access)
It can't use the org.apache IOUtils method (I can't bring in that library)
Whether it loops or reads the whole thing in one go doesn't matter to me; I just need a 100% guarantee that, regardless of carriage returns in the file, the file will be read in exactly as it is on disk.
Here's what I use:
public static String readResponseFromFile() throws IOException {
    File path = new File("some_path");
    File file = new File(path, "some.file");
    path.mkdirs();
    String response = null;
    InputStream is = null;
    try {
        // Size the array to the file; read() may return fewer bytes than
        // requested, so loop until the whole array is filled.
        byte[] bytes = new byte[(int) file.length()];
        is = new FileInputStream(file);
        int offset = 0;
        while (offset < bytes.length) {
            int read = is.read(bytes, offset, bytes.length - offset);
            if (read == -1) {
                break;
            }
            offset += read;
        }
        response = new String(bytes); // consider specifying the charset the file was signed with
    } finally {
        if (is != null) {
            is.close();
        }
    }
    return response;
}
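A shorter byte-exact variant (my sketch, not from the answer above) uses DataInputStream.readFully, which either fills the whole array or throws EOFException, and is available on Android:

byte[] bytes = new byte[(int) file.length()];
DataInputStream in = new DataInputStream(new FileInputStream(file));
try {
    in.readFully(bytes); // reads every byte exactly as it is on disk
} finally {
    in.close();
}
String text = new String(bytes, "UTF-8"); // use the charset the file was signed with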

Java file not written to stream with new line characters

We're streaming a CSV file from a web service. It appears that we're losing the newline characters when streaming: the client gets the file all on a single line. Any idea what we're doing wrong?
Code:
public static void writeFile(OutputStream out, File file) throws IOException {
    BufferedReader input = new BufferedReader(new FileReader(file)); // File input stream
    String line;
    while ((line = input.readLine()) != null) { // Read file
        out.write(line.getBytes()); // Write to output stream
        out.flush();
    }
    input.close();
}
Don't use BufferedReader. You already have an OutputStream at hand, so just get an InputStream of the file and pipe the bytes from input to output the usual Java IO way. This way you also don't need to worry about newlines being eaten by BufferedReader:
public static void writeFile(OutputStream output, File file) throws IOException {
    InputStream input = null;
    byte[] buffer = new byte[10240]; // 10KB.
    try {
        input = new FileInputStream(file);
        for (int length = 0; (length = input.read(buffer)) > 0;) {
            output.write(buffer, 0, length);
        }
    } finally {
        if (input != null) try { input.close(); } catch (IOException logOrIgnore) {}
    }
}
Using a Reader/Writer would introduce character encoding problems if you don't know/specify the encoding beforehand, and you actually don't need to care about encoding here at all, so just leave them aside.
To improve performance a bit more, you can always wrap the InputStream and OutputStream in a BufferedInputStream and BufferedOutputStream respectively.
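As a side note (not part of the original answer), on Java 7+ the same byte-for-byte pipe is a single call, using the parameters of the method above:

// Streams the file's bytes straight to the output, newlines untouched.
Files.copy(file.toPath(), output);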
The readLine method uses the newline chars to delimit what gets read, so the newlines themselves are not returned by readLine.
Don't use readLine; you can use a BufferedInputStream and read the file one byte at a time if you want, or pass your own buffer to OutputStream.write.
Note that, like BalusC and Michael Borgwardt say, Readers and Writers are for text, if you just want to copy the file you should use InputStream and OutputStream, you are only concerned with bytes.
There are several things wrong with that code. It may also mutilate any NON-ASCII text since it converts via the platform default encoding twice - and for no good reason at all.
Don't use a Reader to read the file, use a FileInputStream and transfer bytes, avoiding the unnecessary and potentially destructive charset conversions. The line break problem will also be gone.
Any idea what we're doing wrong?
Yes. This line drops the "new line character"
while ((line = input.readLine()) != null) {
And then you write it without it:
out.write(line.getBytes());
See this related question.
BufferedReader.readLine() does not preserve the newline, so you'll have to add it back when writing out.
Alternatively, you can use a PrintWriter, which offers a println() method. This will also save you from converting the string into an array of bytes.
public static void writeFile(OutputStream o, File file) throws IOException {
    PrintWriter out = new PrintWriter(new OutputStreamWriter(o));
    BufferedReader input = new BufferedReader(new FileReader(file)); // File input stream
    String line;
    while ((line = input.readLine()) != null) { // Read file
        out.println(line); // Write to output stream
        out.flush();
    }
    input.close();
}
