How to efficiently read and write to files using minimal RAM - java

My aim is to read from a large file, process two lines at a time, and write the result to one or more new files. These files can get very large, from 1 GB to 150 GB in size, so I'd like to attempt to do this processing using the least RAM possible.
The processing is very simple: each line is split by a tab delimiter, certain elements are selected, and the new String is written to the new files.
So far I have attempted using BufferedReader to read the File and PrintWriter to output the lines to a file:
while ((line1 = br.readLine()) != null) {
    if (!line1.startsWith("#")) {
        line2 = br.readLine();
        recordCount++;
        one.println(String.format("%s\n%s\n+\n%s", line1.split("\t")[0] + ".1", line1.split("\t")[9], line1.split("\t")[10]));
        two.println(String.format("%s\n%s\n+\n%s", line2.split("\t")[0] + ".2", line2.split("\t")[9], line2.split("\t")[10]));
    }
}
I have also attempted to use Java 8 Streams to read and write the file:
stream.forEach(line -> {
    if (!line.startsWith("#")) {
        try {
            if (counter.getAndIncrement() % 2 == 0)
                Files.write(path1, String.format("%s\n%s\n+\n%s", line.split("\t")[0] + ".1", line.split("\t")[9], line.split("\t")[10]).getBytes(), StandardOpenOption.APPEND);
            else
                Files.write(path2, String.format("%s\n%s\n+\n%s", line.split("\t")[0] + ".2", line.split("\t")[9], line.split("\t")[10]).getBytes(), StandardOpenOption.APPEND);
        } catch (IOException ioe) {
        }
    }
});
Finally, I have tried using an InputStream and a Scanner to read the file and a PrintWriter to output the lines:
inputStream = new FileInputStream(inputFile);
sc = new Scanner(inputStream, "UTF-8");
String line1, line2;
PrintWriter one = new PrintWriter(new FileOutputStream(dotOne));
PrintWriter two = new PrintWriter(new FileOutputStream(dotTwo));
while (sc.hasNextLine()) {
    line1 = sc.nextLine();
    if (!line1.startsWith("#")) {
        line2 = sc.nextLine();
        one.println(String.format("%s\n%s\n+\n%s", line1.split("\t")[0] + ".1", line1.split("\t")[9], line1.split("\t")[10]));
        two.println(String.format("%s\n%s\n+\n%s", line2.split("\t")[0] + ".2", line2.split("\t")[9], line2.split("\t")[10]));
    }
}
The issue I'm facing is that the program seems to be storing either the data to write, or the input file data, in RAM.
All of the above methods do work, but they use more RAM than I'd like.
Thanks in advance,
Sam

What you did not try is a MappedByteBuffer. FileChannel.map might be usable for your purpose, since the mapped buffer is not allocated on the Java heap.
Functioning code with a self-made byte buffer would be:
try (FileInputStream fis = new FileInputStream(source);
     FileChannel fic = fis.getChannel();
     FileOutputStream fos = new FileOutputStream(target);
     FileChannel foc = fos.getChannel()) {
    ByteBuffer buffer = ByteBuffer.allocate(1024);
    while (true) {
        int nread = fic.read(buffer);
        if (nread == -1) {
            break;
        }
        buffer.flip();
        foc.write(buffer);
        buffer.clear();
    }
}
Using fic.map to consecutively map regions into OS memory seems easy, but I would need to test such more complex code first.
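A rough sketch of that mapping approach, copying one fixed-size region at a time; the file names and region size here are placeholders for illustration:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MapCopy {
    public static void main(String[] args) throws IOException {
        // Hypothetical file names; create a small sample input for the demo
        Path source = Paths.get("map_src.txt");
        Path target = Paths.get("map_dst.txt");
        Files.write(source, "line one\nline two\n".getBytes());

        long chunk = 8L * 1024 * 1024; // map 8 MB regions at a time
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            for (long pos = 0; pos < size; pos += chunk) {
                long len = Math.min(chunk, size - pos);
                // The mapped region lives in OS memory, not on the Java heap
                MappedByteBuffer region = in.map(FileChannel.MapMode.READ_ONLY, pos, len);
                out.write(region);
            }
        }
    }
}
```

Each map call is limited to regions of at most Integer.MAX_VALUE bytes, which is why a 150 GB file has to be mapped in consecutive chunks like this.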

When creating the PrintWriter, set autoFlush to true:
new PrintWriter(new FileOutputStream(dotOne), true)
This way the buffered data will be flushed with every println.

Related

Replace Timestamp in Written Text File

I need to read a file and write it to a new file. The file that I read has its own timestamp, but in the written file I need to replace it with the current time, e.g. change 2020-09-20 19:30 to 2020-09-30 01:30. I have been able to read and write it using the following code, but I am struggling with the change-timestamp part:
FileInputStream inputRead = null;
FileOutputStream outWrite = null;
try {
    File infile = new File("test.txt");
    File outfile = new File("test_log.txt");
    inputRead = new FileInputStream(infile);
    outWrite = new FileOutputStream(outfile);
    byte[] buffer = new byte[2048];
    int length;
    while ((length = inputRead.read(buffer)) > 0) {
        System.getProperty("line.separator");
        outWrite.write(buffer, 0, length);
    }
    inputRead.close();
    outWrite.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
Sorry, I forgot to mention: the line read is of the form "20-09-2020 19:30 (Some Parameter)" and should become "(current date) (Some Parameter)".
Try this
File infile = new File("test.txt");
File outfile = new File("test_log.txt");
DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");
try (
    BufferedReader inputRead = new BufferedReader(new InputStreamReader(new FileInputStream(infile)));
    BufferedWriter outWrite = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outfile)))
) {
    String line;
    while ((line = inputRead.readLine()) != null) {
        String modifiedString = line.replaceAll("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}", dateTimeFormatter.format(LocalDateTime.now()));
        outWrite.write(modifiedString);
        outWrite.write(System.lineSeparator());
    }
}
Expanding my answer a bit for clarity.
I've added try-with-resources, so you don't need to close the streams manually, which would otherwise have to be done in a finally block (if the code crashes, they are left open).
I've added readers, which are a higher-level API on top of the streams, for convenience.
I've added search-and-replace on the string using a regex that matches your example string. In my example above I'm using strings, which could probably be optimized if needed.
String nextTimeStamp = "2020-09-30 01:30";
Path infile = Paths.get("test.txt");
Path outfile = Paths.get("test_log.txt");
try (Stream<String> lines = Files.lines(infile, Charset.defaultCharset());
     PrintWriter out = new PrintWriter(outfile.toFile())) {
    lines.map(line -> line.replaceAll("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}", nextTimeStamp))
         .forEach(out::println);
}
(For a fixed Charset, add an extra parameter with this Charset to the PrintWriter too.)
Java String is for text, byte[] for binary data. And on String one can do replace and other text operations.
line.replaceAll uses a regular expression to match. If you have the exact string to be replaced, you can use line.replace.
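A minimal illustration of the difference, using a made-up sample line:

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String line = "2020-09-20 19:30 temp=21"; // hypothetical input line

        // replace: matches the exact literal string only
        System.out.println(line.replace("2020-09-20 19:30", "2020-09-30 01:30"));
        // -> 2020-09-30 01:30 temp=21

        // replaceAll: matches via regex, so it works for any timestamp in this shape
        System.out.println(line.replaceAll("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}", "2020-09-30 01:30"));
        // -> 2020-09-30 01:30 temp=21
    }
}
```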
The syntax of what is called try-with-resources accepts AutoCloseable's inside try (...) and will ensure the objects are closed even on exception or return/break.

Decompress large binary files

I have a function to decompress large zip files using the method below. There are times when I run into an OutOfMemoryError because the file is just too large. Is there a way I can optimize my code? I have read something about breaking the file into smaller parts that fit into memory and decompressing those, but I don't know how to do that. Any help or suggestion is appreciated.
private static String decompress(String s) {
    String pathOfFile = null;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(s)), Charset.defaultCharset()))) {
        File file = new File(s);
        FileOutputStream fos = new FileOutputStream(file);
        String line;
        while ((line = reader.readLine()) != null) {
            fos.write(line.getBytes());
            fos.flush();
        }
        pathOfFile = file.getAbsolutePath();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return pathOfFile;
}
The stacktrace:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
at java.base/java.util.ArrayList.grow(ArrayList.java:237)
at java.base/java.util.ArrayList.ensureCapacity(ArrayList.java:217)
Don't use Reader classes, because you don't need to write the output file character by character or line by line. You should copy raw bytes with the InputStream.transferTo() method:
try (var in = new GZIPInputStream(new FileInputStream(inFile));
     var out = new FileOutputStream(outFile)) {
    in.transferTo(out);
}
Also, you probably don't need to call flush() explicitly; doing it after every line is wasteful.

How to deal with read/write function for others file type in Java

I am writing a program which receives files from my remote server; these files can be .doc, .pdf, and some other file types. I read the content of those files and write it into new files with the same file extension. But when I receive a .doc file from the remote server and try to read it and write it into another file, I get something like #²Ó\ç¨ Þ¢·S \Ò Þ¢·S \Ò PK £ JT in my test.doc. I have no idea about this issue; I tried PrintStream, BufferedWriter, and PrintWriter, but unfortunately none of them helped. This is my source code for reading/writing the file:
try {
    InputStream is1 = con1.getInputStream();
    BufferedReader read1 = new BufferedReader(new InputStreamReader(is1));
    String data1 = "";
    while ((data1 = read1.readLine()) != null) {
        PrintWriter pw = new PrintWriter("test.doc", "UTF-8");
        pw.write(data1);
        pw.close();
    }
    System.out.println("done");
} catch (IOException e) {
    e.printStackTrace();
}
May I know what is the best way to do the read/write if we have different file types?
These file types contain binary data and should not be read as characters. (Also, note that you are creating a new PrintWriter every time through the while loop, which truncates the file on each iteration. This will never work.) Just deal with the binary data directly. Something like this (untested) might work:
InputStream is1 = con1.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is1);
byte[] buffer = new byte[2048]; // or whatever size you want
int n;
OutputStream os = new FileOutputStream("test.doc");
while ((n = bis.read(buffer)) >= 0) {
    os.write(buffer, 0, n);
}
os.close();
bis.close();
Also, you should be using a try-with-resources statement.
Another approach (more concise and expressive):
Files.copy(is1, Paths.get("test.doc"), StandardCopyOption.REPLACE_EXISTING);

how can I delete line from txt?

I want to delete a line from a text file on Android. How can I do that?
I do not want to read one file and create another with the line removed; I want to delete the line from my existing file.
Thanks.
This is a pretty tricky problem, despite it looking like a trivial one. In the case of variable line lengths, maybe your only option is reading the file line by line to identify the offset and length of the target line, then copying the following portion of the file starting at that offset, and finally truncating the file to its original size minus the target line's length. I use a RandomAccessFile to access the internal pointer and also to read by lines.
This program requires two command line arguments:
args[0] is the filename
args[1] is the target line number (1-based: first line is #1)
public class RemoveLine {
    public static void main(String[] args) throws IOException {
        // Use a random access file
        RandomAccessFile file = new RandomAccessFile(args[0], "rw");
        int counter = 0, target = Integer.parseInt(args[1]);
        long offset = 0, length = 0;
        while (file.readLine() != null) {
            counter++;
            if (counter == target)
                break; // Found target line's offset
            offset = file.getFilePointer();
        }
        length = file.getFilePointer() - offset;
        if (target > counter) {
            file.close();
            throw new IOException("No such line!");
        }
        byte[] buffer = new byte[4096];
        int read = -1; // will store bytes read from file.read()
        while ((read = file.read(buffer)) > -1) {
            file.seek(file.getFilePointer() - read - length);
            file.write(buffer, 0, read);
            file.seek(file.getFilePointer() + length);
        }
        file.setLength(file.length() - length); // truncate by length
        file.close();
    }
}
Here is the full code, including a JUnit test case. The advantage of this solution is that it should be fully scalable with respect to memory, i.e. since it uses a fixed buffer, its memory requirements are predictable and don't change with the input file size.
Try storing the file in a String buffer, remove the line you intend to delete, then rewrite the contents of the file entirely.
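A minimal sketch of that idea using NIO; the file name and line contents are placeholders, and note that this holds the whole file in memory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DeleteLine {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("notes.txt"); // hypothetical file
        Files.write(file, List.of("keep me", "delete me", "keep me too")); // sample content

        // Read everything, drop the unwanted line, rewrite the file entirely
        List<String> kept = new ArrayList<>(Files.readAllLines(file));
        kept.remove("delete me");
        Files.write(file, kept);
    }
}
```

This trades memory for simplicity, so it only suits files that comfortably fit in RAM; the RandomAccessFile answer above avoids that cost.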
You can delete a line by copying the rest of the data in the file, then emptying the file and finally writing the copied data back. The following code searches for the string to be deleted and skips it while copying into a StringBuilder; the contents of the StringBuilder are then written back to the same file.
try {
    InputStream inputStream = openFileInput(FILENAME);
    if (inputStream != null) {
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
        String receiveString;
        String deleteString = "<string you want to del>";
        StringBuilder stringBuilder = new StringBuilder();
        while ((receiveString = bufferedReader.readLine()) != null) {
            if (!receiveString.equals(deleteString)) {
                stringBuilder.append(receiveString);
                stringBuilder.append(System.lineSeparator());
            }
        }
        FileOutputStream fos = openFileOutput(FILENAME, Context.MODE_PRIVATE);
        fos.write(stringBuilder.toString().getBytes());
        fos.close();
        inputStream.close();
    }
} catch (IOException e) {
    e.printStackTrace();
}

how to make readLine() of Filereader to go to previous location

I am creating a text file which is connected to a server.
This text file receives its contents from the server; it receives text data continuously.
To limit the file size, I check the number of lines in the file, and if it exceeds the mark I clear the file contents. The server then writes from the beginning.
Below is the code I have used to do this :
LineNumberReader myReader = new LineNumberReader(new FileReader(new File("mnt/sdcard/abc.txt")));
while (true) {
    while (myReader.readLine() != null) {
        counter++;
    }
    if (counter > 100) {
        PrintWriter writer = new PrintWriter("/mnt/sdcard/abc.txt");
        writer.print("");
        writer.close();
        writer = null;
        counter = 0;
    }
}
But after I clear the contents of the file, my counter is not increasing, even though the file has some data.
I think after reading is done, my myReader is left at the end of the file?
If so, how do I set it back to the start so that readLine() reads from the beginning?
Shouldn't you close myReader before writing to the file??
LineNumberReader myReader = new LineNumberReader(new FileReader(new File("mnt/sdcard/abc.txt")));
while (true) {
    while (myReader.readLine() != null) {
        counter++;
    }
    if (counter > 100) {
        // CLOSE myReader
        myReader.close();
        PrintWriter writer = new PrintWriter("/mnt/sdcard/abc.txt");
        writer.print("");
        writer.close();
        writer = null;
        counter = 0;
        // REOPEN myReader
        myReader = new LineNumberReader(new FileReader(new File("mnt/sdcard/abc.txt")));
    }
}
Shouldn't you make sure that changes to the file done by the server and changes to the file done by this loop are synchronized??
Can you show how and where counter is declared and what other code might be modifying it? It is a matter of guessing without seeing that. Meanwhile, maybe you can consider not reading the file content all the time and instead use the file size to determine whether you should clean it:
long limit = ....; // add your limit in bytes
long fileSize = new File("mnt/sdcard/abc.txt").length();
if (fileSize > limit) {
    // clean the file
}
Please also check what was mentioned in the other answers regarding closing the file, and about trying to clean it while it is open and the server is writing to it.
Issue a myReader.reset() after clearing the contents (note that reset() only works if you previously called mark(); otherwise, re-open the reader).
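A small sketch of how mark()/reset() behave on a buffered reader, using hypothetical in-memory contents:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        BufferedReader r = new BufferedReader(new StringReader("first\nsecond\n"));
        r.mark(1024);                     // remember this position (argument = read-ahead limit in chars)
        System.out.println(r.readLine()); // reads "first"
        r.reset();                        // jump back to the marked position
        System.out.println(r.readLine()); // reads "first" again
    }
}
```

Without a prior mark(), reset() throws an IOException, which is why simply re-opening the reader (as in the accepted answer above) is often the simpler fix.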
