Merge huge files without loading whole file into memory? - java

I want to merge huge files containing strings into one file and tried to use nio2. I do not want to load the whole file into memory, so I tried it with BufferedReader:
public void mergeFiles(List<Path> filesToBeMerged) throws IOException{
Path mergedFile = Paths.get("mergedFile");
Files.createFile(mergedFile);
List<Path> _filesToBeMerged = filesToBeMerged;
try (BufferedWriter writer = Files.newBufferedWriter(mergedFile,StandardOpenOption.APPEND)) {
for (Path file : _filesToBeMerged) {
// this does not work as write()-method does not accept a BufferedReader
writer.append(Files.newBufferedReader(file));
}
} catch (IOException e) {
System.err.println(e);
}
}
I then tried the following. It works; however, the format of the strings (e.g. new lines) is not copied to the merged file:
...
try (BufferedWriter writer = Files.newBufferedWriter(mergedFile,StandardOpenOption.APPEND)) {
for (Path file : _filesToBeMerged) {
// writer.write(Files.newBufferedReader(file));
String line = null;
BufferedReader reader = Files.newBufferedReader(file);
while ((line = reader.readLine()) != null) {
writer.append(line);
writer.append(System.lineSeparator());
}
reader.close();
}
} catch (IOException e) {
System.err.println(e);
}
...
How can I merge huge Files with NIO2 without loading the whole file into memory?

If you want to merge two or more files efficiently, you should ask yourself why on earth you are using the char-based Reader and Writer to perform that task.
By using these classes you are converting the file's bytes to characters (from the file's encoding to Java's internal UTF-16) and then back from characters to bytes. This means the program has to perform two data conversions on the entire content of the files.
And, by the way, BufferedReader and BufferedWriter are by no means NIO2 artifacts. These classes have existed since Java 1.1.
When you copy byte-wise via real NIO functions, the files can be transferred without being touched by the Java application; in the best case the transfer is performed directly in the file system's buffer:
import static java.nio.file.StandardOpenOption.*;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
public class MergeFiles
{
    public static void main(String[] arg) throws IOException {
        if(arg.length<2) {
            System.err.println("Syntax: infiles... outfile");
            System.exit(1);
        }
        Path outFile=Paths.get(arg[arg.length-1]);
        System.out.println("TO "+outFile);
        // TRUNCATE_EXISTING discards old content if the output file already exists
        try(FileChannel out=FileChannel.open(outFile, CREATE, WRITE, TRUNCATE_EXISTING)) {
            for(int ix=0, n=arg.length-1; ix<n; ix++) {
                Path inFile=Paths.get(arg[ix]);
                System.out.println(inFile+"...");
                try(FileChannel in=FileChannel.open(inFile, READ)) {
                    // transferTo may move fewer bytes than requested, so loop until done
                    for(long p=0, l=in.size(); p<l; )
                        p+=in.transferTo(p, l-p, out);
                }
            }
        }
        System.out.println("DONE.");
    }
}

With
Files.newBufferedReader(file).readLine()
you create a new reader every time, so reading always starts again at the first line.
Replace it with
String line;
BufferedReader reader = Files.newBufferedReader(file);
while ((line = reader.readLine()) != null) {
    writer.write(line);
    writer.newLine(); // readLine() strips the line terminator
}
and close() the reader when done.
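If the files have to go through a Reader and Writer anyway (for example, to change the encoding), the original line endings survive when the text is copied in fixed-size chunks instead of line by line. This is only a minimal sketch that assumes UTF-8 input and output; the class name CharCopy and the method appendAll are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class CharCopy {
    // Hypothetical helper: appends the text of every source file to target.
    static void appendAll(List<Path> sources, Path target) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            char[] buffer = new char[8192]; // only this much text is held in memory at once
            for (Path source : sources) {
                try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
                    int n;
                    while ((n = reader.read(buffer)) != -1) {
                        writer.write(buffer, 0, n); // line endings are copied unchanged
                    }
                }
            }
        }
    }
}
```

Because nothing is split into lines, "\n" and "\r\n" are written exactly as they were read.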

readLine() does not yield the line ending ("\n" or "\r\n"). That was the error.
while ((line = reader.readLine()) != null) {
writer.write(line);
writer.write("\r\n"); // Windows
}
You might also skip this line-by-line handling of (possibly different) line endings and copy the bytes directly:
try (OutputStream out = Files.newOutputStream(mergedFile)) {
    for (Path source : filesToBeMerged) {
        Files.copy(source, out);
        out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
    }
}
This writes a line break explicitly after each file, in case its last line does not end with one.
There might still be a problem with the optional Unicode BOM (byte order mark) that some files carry at the very beginning to mark the text as UTF-8/UTF-16LE/UTF-16BE: when files are concatenated, the BOMs of the later files end up in the middle of the merged file.
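The BOM issue can be handled while still copying bytes. The sketch below assumes the inputs are UTF-8 and drops a leading UTF-8 BOM (bytes EF BB BF) from every file; UTF-16 BOMs are not handled. The names BomAwareMerge, copySkippingBom and merge are illustrative only:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BomAwareMerge {
    // Copies one stream to out in 8 KiB chunks, skipping a leading UTF-8 BOM if present.
    static void copySkippingBom(InputStream rawIn, OutputStream out) throws IOException {
        BufferedInputStream in = new BufferedInputStream(rawIn);
        in.mark(3);
        byte[] head = new byte[3];
        int got = 0;
        while (got < 3) {
            int n = in.read(head, got, 3 - got);
            if (n == -1) break; // file shorter than three bytes
            got += n;
        }
        boolean bom = got == 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!bom) {
            in.reset(); // no BOM: rewind so the first bytes are copied too
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Hypothetical entry point: concatenates all sources into target.
    static void merge(List<Path> sources, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path source : sources) {
                try (InputStream in = Files.newInputStream(source)) {
                    copySkippingBom(in, out);
                }
            }
        }
    }
}
```

Only one 8 KiB buffer is in memory at any time, so the file size does not matter.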

Related

Deleting and Renaming file java

I am trying to delete and rename a file, but the delete() and renameTo() calls do not work. I can't find the bug; by my logic the code should run properly (I think). Can anyone tell me why it can't delete the file? The code works except for deleting the old txt file and renaming temp.txt to the old file name.
public Boolean deleteItem(String item){
try{
// creating and opening file
File f = new File("temp.txt");
f.delete(); // to delete existing data inside file;
File old = new File(file);
FileWriter writer = new FileWriter(new File("temp.txt"), true);
FileReader fr = new FileReader(old);
BufferedReader reader = new BufferedReader(fr);
String s;
// creating temporary item object
String[] strArr;
//searching for data inside the file
while ((s = reader.readLine()) != null){
strArr = s.split("\\'");
if (!strArr[0].equals(item)){
writer.append(s + System.getProperty("line.separator"));
}
}
//rename old file to file.txt
old.delete();
boolean successful = f.renameTo(new File(file));
writer.flush();
writer.close();
fr.close();
reader.close();
return successful;
}
catch(Exception e){ e.printStackTrace();}
return false;
}
The logic seems a little tangled. Here's what I think it does:
1. You delete temp.txt.
2. You create a new temp.txt and copy the wanted lines of 'file' into it.
3. You delete 'file'.
4. You rename temp.txt to 'file'.
5. You close the input and output files.
My guess would be that your operating system (unspecified) is preventing deletes and renames of open files. Move the closing to before the delete/rename. And check the return values of those functions.
Aside: as a minor improvement to readability, you don't need to keep calling 'new File(xxx)' with the same xxx. A File is just a representation of the name of the file. Do it once. And 'File tempFile = new File("temp.txt")' would be easier to follow than calling it 'f'.
Don't use the old java.io.File. It is notorious for its lax error handling and useless error messages. Use the "newer" NIO.2 java.nio.file.Path and the java.nio.file.Files methods that were added in Java 7.
E.g. the file.delete() method returns false if the file was not deleted. No exception is thrown, so you'll never know why, and since you don't even check the return value, you don't know that it didn't delete the file either.
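The difference is easy to see with a file that does not exist. This is only an illustrative sketch; the class name DeleteDemo and the method describe are made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteDemo {
    // Reports what each API tells you when asked to delete a missing file.
    static String describe(Path missing) {
        boolean deleted = missing.toFile().delete(); // old API: just false, no reason given
        String nio;
        try {
            Files.delete(missing);                   // NIO.2: throws an exception naming the cause
            nio = "deleted";
        } catch (NoSuchFileException e) {
            nio = "NoSuchFileException";
        } catch (IOException e) {
            nio = e.getClass().getSimpleName();
        }
        return "File.delete=" + deleted + ", Files.delete=" + nio;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("delete-demo");
        // prints: File.delete=false, Files.delete=NoSuchFileException
        System.out.println(describe(dir.resolve("does-not-exist.txt")));
    }
}
```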
The file isn't deleted because you still have it open. Close the files before attempting to delete+rename, and do it using try-with-resources, also added in Java 7.
Your code should look like the following, though catching exceptions and turning them into a boolean return value is error-prone (see the issue with file.delete()).
public boolean deleteItem(String item){
    try {
        // creating and opening file
        Path tempFile = Paths.get("temp.txt");
        Files.deleteIfExists(tempFile); // Throws exception if delete failed
        Path oldFile = Paths.get(file);
        try ( BufferedWriter writer = Files.newBufferedWriter(tempFile);
              BufferedReader reader = Files.newBufferedReader(oldFile); ) {
            //searching for data inside the file
            for (String line; (line = reader.readLine()) != null; ) {
                String[] strArr = line.split("\\'");
                if (! strArr[0].equals(item)){
                    writer.append(line + System.lineSeparator());
                }
            }
        } // Files are flushed and closed here
        // replace file with temp file
        Files.delete(oldFile); // Throws exception if delete failed
        Files.move(tempFile, oldFile); // Throws exception if rename failed
        return true;
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
}
Basically you want to remove selected lines from a text file.
The following code uses the stream API [1]. It filters out all the unwanted lines and writes the lines that you do want to a temporary file. Then it renames the temporary file to your original file, thus effectively removing the unwanted lines from the original file. Note that I am assuming that your "global" variable file is a String.
/* Following imports required.
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;
*/
Path path = Paths.get(file);
try {
    File tempFile = File.createTempFile("temp", ".txt");
    tempFile.deleteOnExit();
    // Both the writer and the line stream must be closed before the move,
    // otherwise the original file may still be open when it is replaced.
    try (PrintWriter pw = new PrintWriter(tempFile);
         Stream<String> lines = Files.lines(path)) {
        lines.filter(l -> !item.equals(l.split("'")[0]))
             .forEach(pw::println);
    }
    Files.move(tempFile.toPath(), path, StandardCopyOption.REPLACE_EXISTING);
}
catch (IOException xIo) {
    xIo.printStackTrace();
}
[1] The stream API was introduced in Java 8.

How to create and write a .Txt file in Java? [duplicate]

This question already has an answer here:
Java PrintWriter not working
(1 answer)
Closed 6 years ago.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class FileGenerator {
/**
* @param args
*/
public static void main(String[] args) {
File outputFile;
BufferedReader reader;
FileWriter fileWriter;
try {
outputFile = new File("test.txt");
outputFile.createNewFile();
fileWriter = new FileWriter(outputFile, false);
reader = new BufferedReader(new FileReader("template.txt"));
StringBuilder sb = new StringBuilder();
String line = reader.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = reader.readLine();
}
String everything = sb.toString();
fileWriter.write(everything);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally {
}
}
}
The FileWriter creates test.txt, but the file is empty, and I don't want it to be empty. You may say that the String everything could be empty, but it isn't. When I try it without the reader, i.e. with String everything = "some text", the same thing happens: the file is empty.
The file is empty because the contents of everything are smaller than the operating system's and/or Java's I/O buffers, and the program ends without properly closing the file.
When you write something to a file and need to ensure that it is written without closing the file yet, call flush().
Whenever you open an I/O resource, close it with close() after use. close() implies flushing the buffers.
Java 7 provides try-with-resources for that, like this:
try (FileWriter writer = new FileWriter("foo.txt")) {
writer.write("Hello, world!\n");
writer.flush();
// do more stuff with writer
} // <- closes writer implicitly as part of the try-with-resources feature
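Applied to the template-copying program from the question, this could look like the sketch below. It assumes both files are UTF-8 text and uses Files.newBufferedReader/newBufferedWriter instead of FileReader/FileWriter; the class name TemplateCopy and the method copyTemplate are invented for the example:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TemplateCopy {
    // Copies template to output line by line; both resources are closed automatically.
    static void copyTemplate(Path template, Path output) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(template, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        } // close() runs here and flushes the buffered text to disk
    }

    public static void main(String[] args) throws IOException {
        copyTemplate(Paths.get("template.txt"), Paths.get("test.txt"));
    }
}
```

No explicit flush() is needed, because nothing reads the output before the writer is closed.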
As suggested in the comments, you need to call fileWriter.close() to close the output stream. If the FileWriter is wrapped in a BufferedWriter, closing the BufferedWriter is enough, because that also closes the underlying FileWriter, as explained here:
Is it necessary to close a FileWriter, provided it is written through a BufferedWriter?

Java fileinput (read and write) to a file looping thru each line

I have this method that accesses an existing file, loops through each line and replaces (string to string) a certain line if the condition is met:
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.InputStreamReader;
private void UpdateConfig() {
try {
FileInputStream fstream = new FileInputStream("c:\\user\\config.properties");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
while ((strLine = br.readLine()) != null) {
if (strLine.contains("FTPDate=2014/07/01 00:59:00")) {
System.out.println("FILE " + strLine);
strLine.replace("FTPDate=2014/07/01 00:59:00", "FTPDate=2014/09/10 00:00:00");
//strLine.replace("((19|20)\\d\\d/(0?[1-9]|1[012])/(0?[1-9]|[12][0-9]|3[01])) ([2][0-3]|[0-1][0-9]|[1-9]):[0-5][0-9]:([0-5][0-9]|[6][0])", "2014/09/10 00:00:00");
System.out.println("FILE " + strLine);
}
}
in.close();
} catch (Exception e) {
}
}
In the sysout it seems its being replaced:
FILE FTPDateTejas=2014/07/01 00:59:00
FILE FTPDateTejas=2014/09/10 00:00:00
But when I check the file, the date is still the same. Does anyone know what I missed? Thank you.
When you call strLine = br.readLine(), the next line is read from the BufferedReader into memory. From then on the data exists both on disk and in memory, and the two copies are not linked to each other in any way. Since your output shows the replaced text, I assume your actual code assigns the result back to strLine:
strLine = strLine.replace("FTPDate=2014/07/01 00:59:00", "FTPDate=2014/09/10 00:00:00");
The assignment is needed because replace does not modify the String it is called on; it returns a new String object (Strings are immutable). So this creates a new object in memory, but it does not modify your on-disk data (as I said, it is not linked to it any more).
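A tiny illustration of the immutability point, using the dates from the question. The class name ReplaceDemo and the helper withNewDate are made up for this sketch:

```java
public class ReplaceDemo {
    static final String OLD_DATE = "FTPDate=2014/07/01 00:59:00";
    static final String NEW_DATE = "FTPDate=2014/09/10 00:00:00";

    // Returns a new String with the date replaced; the argument itself is never changed.
    static String withNewDate(String line) {
        return line.replace(OLD_DATE, NEW_DATE);
    }

    public static void main(String[] args) {
        String line = OLD_DATE;
        line.replace(OLD_DATE, NEW_DATE);  // result is thrown away
        System.out.println(line);          // still shows the old date
        line = withNewDate(line);          // result is kept
        System.out.println(line);          // now shows the new date
    }
}
```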
You might think: "OK, then how do I link those two and overwrite the file in place?" Java does provide random file access (RandomAccessFile), as described in the docs, but all you can do with it is overwrite bytes at a given position; you cannot insert anything in the middle. If the new string is shorter or longer than the one it replaces, you have to read the rest of the file, write your modification, and then write the rest back after it, shifting everything that follows.
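For the special case where the old and new text have the same length (as the two dates in the question happen to), an in-place overwrite could be sketched as follows. It assumes the file content is plain ASCII; InPlaceReplace and overwrite are hypothetical names:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class InPlaceReplace {
    // Overwrites the first occurrence of oldText with newText.
    // Both must be ASCII and equally long, because nothing can be inserted or removed.
    static boolean overwrite(Path file, String oldText, String newText) throws IOException {
        if (oldText.length() != newText.length()) {
            throw new IllegalArgumentException("in-place overwrite needs equal lengths");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long lineStart = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                int idx = line.indexOf(oldText);
                if (idx >= 0) {
                    raf.seek(lineStart + idx); // readLine maps one byte to one char
                    raf.write(newText.getBytes(StandardCharsets.US_ASCII));
                    return true;
                }
                lineStart = raf.getFilePointer(); // start of the next line
            }
        }
        return false;
    }
}
```

As soon as the lengths differ, this approach no longer works without shifting the rest of the file.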
That's why an easier solution would be to:
open a new file to write to
write line by line to it (the strings after the replace)
delete the old file and rename the new file
Without the final delete/rename step, the code would look something like this:
private void UpdateConfig() {
    File oldFile = new File("c:\\user\\config.properties");
    File newFile = new File("c:\\user\\config.properties-new");
    try (BufferedReader br = new BufferedReader(new FileReader(oldFile));
         BufferedWriter bw = new BufferedWriter(new FileWriter(newFile))) {
        String strLine;
        while ((strLine = br.readLine()) != null) {
            if (strLine.contains("FTPDate=2014/07/01 00:59:00")) {
                System.out.println("FILE " + strLine);
                strLine = strLine.replace("FTPDate=2014/07/01 00:59:00",
                        "FTPDate=2014/09/10 00:00:00");
                //strLine.replace("((19|20)\\d\\d/(0?[1-9]|1[012])/(0?[1-9]|[12][0-9]|3[01])) ([2][0-3]|[0-1][0-9]|[1-9]):[0-5][0-9]:([0-5][0-9]|[6][0])", "2014/09/10 00:00:00");
                System.out.println("FILE " + strLine);
            }
            // write every line, changed or not, and restore the line break readLine() removed
            bw.write(strLine);
            bw.newLine();
        }
    } catch (IOException e) {
        // handle
    }
    // delete the old file and rename the new one here
}
There might be some logical/syntactic problems, as I wrote this in a plain text editor. I modified the code a bit to use Java 7's try-with-resources, which is a cleaner way of closing resources than what you were doing: in your code, the stream might not be closed if an exception is thrown.
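All three steps (write a new file, copy every line with the replacement applied, swap the files) can also be done with NIO.2. The following is a rough sketch; it assumes ISO-8859-1, the traditional encoding of .properties files, and the names ConfigRewrite and replaceInFile are made up:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ConfigRewrite {
    // Rewrites config line by line into a sibling file, then moves it over the original.
    static void replaceInFile(Path config, String oldText, String newText) throws IOException {
        Path temp = config.resolveSibling(config.getFileName() + ".new");
        try (BufferedReader reader = Files.newBufferedReader(config, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.replace(oldText, newText)); // unchanged lines are copied as they are
                writer.newLine();
            }
        } // both files are closed before the move
        Files.move(temp, config, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Files.move with REPLACE_EXISTING replaces the old file in one call, so no separate delete is needed. Note that line breaks are normalized to the platform's line separator.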

Reading in text file in Java

I wrote some code to read in a text file and to return an array with each line stored in an element. I can't for the life of me work out why this isn't working...can anyone have a quick look? The output from the System.out.println(line); is null so I'm guessing there's a problem reading the line in, but I can't see why. Btw, the file i'm passing to it definitely has something in it!
public InOutSys(String filename) {
try {
file = new File(filename);
br = new BufferedReader(new FileReader(file));
bw = new BufferedWriter(new FileWriter(file));
} catch (Exception e) {
e.printStackTrace();
}
}
public String[] readFile() {
ArrayList<String> dataList = new ArrayList<String>(); // use ArrayList because it can expand automatically
try {
String line;
// Read in lines of the document until you read a null line
do {
line = br.readLine();
System.out.println(line);
dataList.add(line);
} while (line != null && !line.isEmpty());
br.close();
} catch (Exception e) {
e.printStackTrace();
}
// Convert the ArrayList into an Array
String[] dataArr = new String[dataList.size()];
dataArr = dataList.toArray(dataArr);
// Test
for (String s : dataArr)
System.out.println(s);
return dataArr; // Returns an array containing the separate lines of the
// file
}
First, you open a FileWriter on the file right after opening the FileReader. new FileWriter(file) opens the file in create mode, which truncates it, so the file is empty once your program has run.
Second, is there an empty line in your file? If so, !line.isEmpty() will terminate your do-while loop.
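The first point can be checked in isolation: merely opening a FileWriter without the append flag empties the file, even if nothing is ever written. A small sketch, with TruncateDemo and sizeAfterOpeningWriter as invented names:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TruncateDemo {
    // Opens and closes a FileWriter on the file without writing, then returns the file size.
    static long sizeAfterOpeningWriter(Path file) throws IOException {
        try (FileWriter writer = new FileWriter(file.toFile())) {
            // nothing is written; opening the writer already truncated the file
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("truncate-demo", ".txt");
        Files.write(file, "some content".getBytes(StandardCharsets.UTF_8));
        System.out.println(sizeAfterOpeningWriter(file)); // prints 0
    }
}
```

Passing true as the second constructor argument, new FileWriter(file, true), opens the file for appending instead and keeps the existing content.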
You're using a FileWriter to the file you're reading, so the FileWriter clears the content of the file. Don't read and write to the same file concurrently.
Also:
don't assume a file contains a line. You shouldn't use a do/while loop, but rather a while loop;
always close streams, readers and writers in a finally block;
catch(Exception) is a bad practice. Only catch the exceptions you want, and can handle. Else, let them go up the stack.
I'm not sure if you're looking for a way to improve your provided code or just for a solution for "Reading in text file in Java" as the title said, but if you're looking for a solution I'd recommend using apache commons io to do it for you. The readLines method from FileUtils will do exactly what you want.
If you're looking to learn from a good example, FileUtils is open source, so you can take a look at how they chose to implement it by looking at the source.
There are several possible causes for your problem:
The file path is incorrect
You shouldn't try to read/write the same file at the same time
It's not such a good idea to initialize the buffers in the constructor, think of it - some method might close the buffer making it invalid for subsequent calls of that or other methods
The loop condition is incorrect
Better try this approach for reading:
String line;
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    while ((line = br.readLine()) != null) {
        System.out.println(line);
        dataList.add(line);
    }
} // br is closed here, even if an exception is thrown
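Since the method returns all lines as an array anyway, the whole loop can also be replaced by Files.readAllLines (Java 7+). A minimal sketch, assuming the file is UTF-8 text and small enough to fit in memory; ReadLines is an invented class name:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadLines {
    // Reads every line of the file into an array; the file is opened and closed internally.
    static String[] readFile(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }
}
```

Empty lines in the middle of the file are returned as empty strings and do not end the reading.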

Java - Reading input from a file. java.io.FilterInputStream.available(Unknown Source)?

I haven't written any Java in years and I went back to refresh my memory with a simple 'read-from-file' example. Here is my code..
import java.io.*;
public class filereading {
public static void main(String[] args) {
File file = new File("C:\\file.txt");
FileInputStream fs = null;
BufferedInputStream bs = null;
DataInputStream ds = null;
try
{
fs = new FileInputStream(file);
bs = new BufferedInputStream(bs);
ds = new DataInputStream(ds);
while(ds.available()!= 0)
{
String readLine = ds.readLine();
System.out.println(readLine);
}
ds.close();
bs.close();
fs.close();
}
catch(FileNotFoundException e)
{
e.printStackTrace();
}
catch(IOException e)
{
e.printStackTrace();
}
}
}
This compiles fine (although apparently ds.readLine() is deprecated), but at runtime, this gives me
Exception in thread "main" java.lang.NullPointerException
    at java.io.FilterInputStream.available(Unknown Source)
    at filereading.main(filereading.java:21)
What gives?
You made a simple typo:
ds = new DataInputStream(ds);
should be
ds = new DataInputStream(bs);
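Your code is initializing the DataInputStream with a null source, since ds hasn't been created yet. The line above it has the same problem: bs = new BufferedInputStream(bs) should be bs = new BufferedInputStream(fs).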
Having said that, Jon Skeet's answer gives a better way to write a file-reading program (and you should always use Readers/Writers rather than Streams when dealing with text).
To read a text file, use BufferedReader - in this case, wrapped round an InputStreamReader, wrapped round a FileInputStream. (This allows you to set the encoding explicitly - which you should definitely do.) You should also close resources in finally blocks, of course.
You should then read lines until readLine() returns null, rather than relying on available() IMO. I suspect you'll find that readLine() was returning null for the last line in the file, even though available() returned 2 to indicate the final \r\n. Just a hunch though.
String line;
while ((line = reader.readLine()) != null)
{
System.out.println(line);
}
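Put together, the reader chain described above might look like this. The sketch assumes the file is UTF-8 and uses try-with-resources (Java 7+) in place of an explicit finally block; ReadWithEncoding and readLines are illustrative names:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWithEncoding {
    // Reads all lines using an explicitly chosen encoding instead of the platform default.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) { // null marks the end of the file
                lines.add(line);
            }
        } // closing the BufferedReader also closes the wrapped streams
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines(new File("C:\\file.txt"))) {
            System.out.println(line);
        }
    }
}
```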
