This question already has answers here:
What causes a java.lang.ArrayIndexOutOfBoundsException and how do I prevent it?
When I reference lines as stringArray[i+2] (there was a problem with [i+1] as well), I get an ArrayIndexOutOfBoundsException. Is there any way I can safely reference those lines, without the possibility of hitting an index that does not exist and without fundamentally changing my code?
import java.io.*;
import java.util.Scanner;
public class Test {
public static void main(String [] args) {
/** Gets input from text file **/
//defines file name for use
String fileName = "temp.txt";
//try-catches for file location
Scanner fullIn = null;
try {
fullIn = new Scanner(new FileReader(fileName));
} catch (FileNotFoundException e) {
System.out.println("File Error : ");
}
Scanner in = null;
try {
in = new Scanner(new FileReader(fileName));
} catch (FileNotFoundException e) {
System.out.println("Error: File " + fileName + " has not been found. Try adjusting the file address or moving the file to the correct location." );
e.printStackTrace();
}
//finds the amount of blocks in the file
int blockCount = 0;
for (;in.hasNext() == true;in.next()) {
blockCount++;
}
//adding "" to every value of stringArray for each block in the file; created template for populating
String[] stringArray = new String[blockCount];
for (int x = 0; x == blockCount;x++) {
stringArray[x] = "";
}
//we are done with first scanner
in.close();
//populating array with individual blocks
for(int x = 0; x < blockCount; x++) {
stringArray[x]=fullIn.next();
}
//we are done with second scanner
fullIn.close();
//for later
Scanner reader;
boolean isLast;
for (int i = 0; i < stringArray.length; i++) {
isLast = true;
String currWord = stringArray[i].trim();
int nextNew = i+1;
String nextWord = stringArray[nextNew].trim();
String thirdWord = stringArray[nextNew+1].trim();
String fourthWord = stringArray[nextNew+2].trim();
if (stringArray.length != i) {
isLast = false;
}
String quotes = "\"";
if (isLast == false) {
if (currWord.equalsIgnoreCase("say") && nextWord.startsWith(quotes) && nextWord.endsWith(quotes)) {
System.out.println(nextWord.substring(1, nextWord.length()-1));
}
if (currWord.equalsIgnoreCase("say") && isFileThere.isFileThere(nextWord) == true){
System.out.println(VariableAccess.accessIntVariable(nextWord));
}
if (currWord.equalsIgnoreCase("lnsay") && nextWord.startsWith(quotes) && nextWord.endsWith(quotes)){
System.out.print(nextWord.substring(1, nextWord.length()-1) + " ");
}
if (currWord.equalsIgnoreCase("get")) {
reader = new Scanner(System.in); // Reading from System.ins
Variable.createIntVariable(nextWord, reader.nextInt()); // Scans the next token of the input as an int
//once finished
reader.close();
}
if (currWord.equalsIgnoreCase("int") && thirdWord.equalsIgnoreCase("=")) {
String tempName = nextWord;
try {
int tempVal = Integer.parseInt(fourthWord);
Variable.createIntVariable(tempName, tempVal);
} catch (NumberFormatException e) {
System.out.println("Integer creation error");
}
}
}
}
}
}
The problem is that you are looping over the entire stringArray. When you reach the last elements of the array and this code
String nextWord = stringArray[nextNew].trim();
String thirdWord = stringArray[nextNew+1].trim();
String fourthWord = stringArray[nextNew+2].trim();
executes, stringArray[nextNew + 2] will not exist because you are at the end of the array.
Consider shortening your loop, like so:
for (int i = 0; i < stringArray.length - 3; i++) {
Since you are already checking for the last word, all you have to do is move these four lines of code:
int nextNew = i+1;
String nextWord = stringArray[nextNew].trim();
String thirdWord = stringArray[nextNew+1].trim();
String fourthWord = stringArray[nextNew+2].trim();
in your:
if (isLast == false) {
That should solve your problem. Also, you should compare against length - 1, not length, when checking for the last word.
for (int i = 0; i < stringArray.length; i++) {
isLast = true;
String currWord = stringArray[i].trim();
if (stringArray.length-1 != i) {
isLast = false;
}
String quotes = "\"";
if (isLast == false) {
int nextNew = i+1;
String nextWord = stringArray[nextNew].trim();
String thirdWord = stringArray[nextNew+1].trim();
String fourthWord = stringArray[nextNew+2].trim();
// rest of the code
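If you would rather not track an isLast flag at all, a minimal alternative sketch (using the same stringArray, and assuming every command needs all three lookahead words) is to guard the indices explicitly before reading ahead:
// Hedged sketch, not the poster's exact logic: only read ahead when the indices actually exist.
for (int i = 0; i < stringArray.length; i++) {
    String currWord = stringArray[i].trim();
    // fourthWord lives at i + 3, so require i + 3 to be a valid index
    if (i + 3 < stringArray.length) {
        String nextWord   = stringArray[i + 1].trim();
        String thirdWord  = stringArray[i + 2].trim();
        String fourthWord = stringArray[i + 3].trim();
        // ... the existing say / lnsay / get / int handling goes here ...
    }
    // commands that only need currWord can be handled outside the guard
}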
I am trying to develop a basic Java program to compare two huge text files and print the non-matching records, i.e. similar to the MINUS function in SQL. But I am not getting the expected results: all the records are printed even though both files are the same. Also, please suggest whether this approach is efficient for comparing two huge text files.
import java.io.*;
public class CompareTwoFiles {
static int count1 = 0 ;
static int count2 = 0 ;
static String arrayLines1[] = new String[countLines("\\Files_Comparison\\File1.txt")];
static String arrayLines2[] = new String[countLines("\\Files_Comparison\\File2.txt")];
public static void main(String args[]){
findDifference("\\Files_Comparison\\File1.txt","\\Files_Comparison\\File2.txt");
displayRecords();
}
public static int countLines(String File){
int lineCount = 0;
try {
BufferedReader br = new BufferedReader(new FileReader(File));
while ((br.readLine()) != null) {
lineCount++;
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return lineCount;
}
public static void findDifference(String File1, String File2){
String contents1 = null;
String contents2 = null;
try
{
FileReader file1 = new FileReader(File1);
FileReader file2 = new FileReader(File2);
BufferedReader buf1 = new BufferedReader(file1);
BufferedReader buf2 = new BufferedReader(file2);
while ((contents1 = buf1.readLine()) != null)
{
arrayLines1[count1] = contents1 ;
count1++;
}
while ((contents2 = buf2.readLine()) != null)
{
arrayLines2[count2] = contents2 ;
count2++;
}
}catch (Exception e){
e.printStackTrace();
}
}
public static void displayRecords() {
for (int i = 0 ; i < arrayLines1.length ; i++) {
String a = arrayLines1[i];
for (int j = 0; j < arrayLines2.length; j++){
String b = arrayLines2[j];
boolean result = a.contains(b);
if(result == false){
System.out.println(a);
}
}
}
}
}
Based on your explanation, you do not need nested loops.
Consider:
public static void displayRecords() {
for (int i = 0 ; i < arrayLines1.length && i < arrayLines2.length; i++)
{
String a = arrayLines1[i];
String b = arrayLines2[i];
if (!a.contains(b)) {
System.out.println(a);
}
}
}
As for performance, you could first compare the sizes of the files: if the sizes (in bytes) differ, the files cannot be identical, so you can report a difference immediately; equal sizes alone, however, do not guarantee the contents match.
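If you only need the SQL-MINUS behaviour (lines in File1 that are absent from File2) and the files are unsorted, a different sketch, not based on the code above, is to load one file's lines into a HashSet and stream the other file against it. This assumes java.io.* and java.util.* are imported and that whole-line equality is the comparison you want:
// Rough sketch of a set-based "minus": print lines of file1 that never appear in file2.
// Memory use grows with the number of distinct lines in file2.
public static void printMinus(String file1, String file2) throws IOException {
    Set<String> linesInFile2 = new HashSet<>();
    try (BufferedReader br = new BufferedReader(new FileReader(file2))) {
        String line;
        while ((line = br.readLine()) != null) {
            linesInFile2.add(line);
        }
    }
    try (BufferedReader br = new BufferedReader(new FileReader(file1))) {
        String line;
        while ((line = br.readLine()) != null) {
            if (!linesInFile2.contains(line)) {
                System.out.println(line); // present in file1 but not in file2
            }
        }
    }
}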
Sorry for the clickbait title, but it is my problem, and I can't really change the wording without losing the question.
I have the following code, which is meant to select a file, read it, and find its mode. I think I have it mostly done, but I have one issue:
import java.awt.FileDialog;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import javax.swing.JFrame;
public class ModeFinder
{
public static int countDoubles(File file) throws FileNotFoundException
{
Scanner reader = new Scanner(file);
int count = 0;
while (reader.hasNextDouble())
{
count++;
reader.nextDouble();
}
reader.close();
return count;
}
public static void main(String args[]) throws FileNotFoundException
{
String filename;
FileDialog filePicker = new FileDialog(new JFrame());
filePicker.setVisible(true);
filename = filePicker.getFile();
String folderName = filePicker.getDirectory();
filename = folderName + filename;
System.out.println("filename = " +filename);
File inputFile = new File(filename);
Scanner fileReader = new Scanner (inputFile);
int maxValue = 0,
maxCount = 0;
int[] a = new int[countDoubles(inputFile)];
while (fileReader.hasNextInt())
{
for (int i = 0; i < a.length; i++)
{
int count = 0;
for (int j = 0; j < a.length; j++)
{
if (a[j] == a[i])
count++;
}
if (count > maxCount)
{
maxCount = count;
maxValue = a[i];
}
}
}
System.out.println("The most common grade is: " +maxValue);
}
}
The last bit with the most common grade doesn't even print and I don't know why.
You aren't calling nextInt() to get the next value from the file, so your while loop is going to loop forever. You need something like:
while (fileReader.hasNextInt())
{
int value = fileReader.nextInt();
...
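For example, a rough sketch of the rest of the loop might look like this (it keeps your variable names; note that countDoubles() may not return the same count as the number of ints in the file, hence the extra index guard, which is an assumption on my part):
// Consume every int into the array first, then scan the array for the mode.
int index = 0;
while (fileReader.hasNextInt() && index < a.length) {
    a[index++] = fileReader.nextInt(); // actually advance the scanner
}
for (int i = 0; i < index; i++) {
    int count = 0;
    for (int j = 0; j < index; j++) {
        if (a[j] == a[i]) {
            count++;
        }
    }
    if (count > maxCount) {
        maxCount = count;
        maxValue = a[i];
    }
}
System.out.println("The most common grade is: " + maxValue);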
I am a beginner with Java.
This is my approach:
I am trying to read two files and then get their union. I am supposed to use an array of size 100 (just one array is allowed; reading and writing line by line, ArrayList, or other structures are not allowed).
First, I read all records from file1 and write them to the output, a third file. For that purpose, I read 100 records at a time and write them to the third file in a loop.
After that, as with the first file, I read the second file 100 records at a time and write them to memory[]. Then I find the common records: if a record read from File2 is not in File1, I write it to the output file. I do this until reader2.readLine() returns null, re-opening file1 in each iteration.
This is what I have done so far, almost done. Any help would be appreciated.
Edit: OK, now it doesn't throw any exception, but it can't find the differing records and doesn't write them. I guess the last for loop and the booleans don't work; why? I really need help. Thanks for your patience.
import java.io.*;
public class FileUnion
{
private static long startTime, endTime;
public static void main(String[] args) throws IOException
{
System.out.println("PROCESSING...");
reset();
startTimer();
String[] memory = new String[100];
int memorySize = memory.length;
File file1 = new File("stdlist1.txt");
BufferedReader reader1 = new BufferedReader(new FileReader(file1));
File file3 = new File("union.txt");
BufferedWriter writer = new BufferedWriter(new FileWriter(file3));
int numberOfLinesFile1 = 0;
String line1 = null;
String line11 = null;
while((line1 = reader1.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line1;
i++;
if(i < memorySize)
{
line1 = reader1.readLine();
}
}
for (int i = 0; i < memorySize; i++)
{
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile1++;
}
}
reader1.close();
File file2 = new File("stdlist2.txt");
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
String line2 = null;
while((line2 = reader2.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line2;
i++;
if(i < memorySize)
{
line2 = reader2.readLine();
}
}
for (int k = 0; k < memorySize; k++ )
{
boolean found = false;
File f1 = new File("stdlist1.txt");
BufferedReader buff1 = new BufferedReader(new FileReader(f1));
for (int m = 0; m < numberOfLinesFile1; m++)
{
line11 = buff1.readLine();
if (line11.equals(memory[k]) && found == false);
{
found = true;
}
}
buff1.close();
if (found == false)
{
writer.write(memory[k]);
writer.newLine();
}
}
}
reader2.close();
writer.close();
endTimer();
long time = duration();
System.out.println("PROCESS COMPLETED SUCCESSFULLY");
System.out.println("Duration: " + time + " ms");
}
public static void startTimer()
{
startTime = System.currentTimeMillis();
}
public static void endTimer()
{
endTime = System.currentTimeMillis();
}
public static long duration()
{
return endTime - startTime;
}
public static void reset()
{
startTime = 0;
endTime = 0;
}
}
EDIT! Redo.
OK, so to use 100 lines at a time you need to check for null; otherwise, trying to write null to a file could cause errors.
You are checking whether the file is at the end once, and then gathering 99 more pieces of data without checking for null.
What if when this line is called:
while((line2 = reader2.readLine()) != null)
there is only one line left in the file? Then your memory array contains 99 instances of null, and you try to write null to the file 99 times. That's the worst-case scenario.
I don't really know how much help we are supposed to give to people looking for homework help; on most sites I'm familiar with, it's not even allowed.
Here is an example of one way to write the first file:
String line1 = reader1.readLine();
boolean end_of_file1 = false;
while(!end_of_file1)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line1;
i++;
if(i < memorySize)
{
if((line1 = reader1.readLine()) == null)
{
end_of_file1 = true;
}
}
}
for (int i = 0; i < memorySize; i++)
{
if(memory[i] != null)
{
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile1++;
}
}
}
reader1.close();
Once you have that, to make checking for duplicates easier, write a public static boolean method that searches the file for the item; then you can call that, which will make the code cleaner.
public static boolean isUsed(File f1, String item, int dist) throws IOException
{
// take a File so this can be called with file1 from main; close the reader on every return path
try (BufferedReader buff1 = new BufferedReader(new FileReader(f1)))
{
for (int i = 0; i < dist; i++)
{
String line = buff1.readLine();
if (line == null)
{
return false;
}
if (line.equals(item))
{
return true;
}
}
}
return false;
}
Then use the same method as for writing file 1, only before writing each line, check !isUsed():
boolean end_of_file2 = false;
memory = new String[memorySize];// Reset the memory, erase old data from file1
int numberOfLinesFile2=0;
String line2 = reader2.readLine();
while(!end_of_file2)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line2;
i++;
if(i < memorySize)
{
if((line2 = reader2.readLine()) == null)
{
end_of_file2 = true;
}
}
}
for (int i = 0; i < memorySize; i++)
{
if(memory[i] != null)
{
//Check is current item was used in file 1.
if(!isUsed(file1, memory[i], numberOfLinesFile1)){//If not used already
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile2++;
}
}
}
}
reader2.close();
writer.close();
Hope this helps. Notice I'm not supplying the full code, because I've learned that just pasting complete code makes it more likely someone will copy and paste it without understanding it. I hope you find it useful.
I use huge data files; sometimes I only need to know the number of lines in these files. Usually I open them and read line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.
This is the fastest version I have found so far, about 6 times faster than readLines. On a 150 MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean empty = true;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
}
return (count == 0 && !empty) ? 1 : count;
} finally {
is.close();
}
}
EDIT, 9 1/2 years later: I have practically no java experience, but anyways I have tried to benchmark this code against the LineNumberReader solution below since it bothered me that nobody did it. It seems that especially for large files my solution is faster. Although it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code, and have produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int readChars = is.read(c);
if (readChars == -1) {
// bail out if nothing to read
return 0;
}
// make it easy for the optimizer to tune this loop
int count = 0;
while (readChars == 1024) {
for (int i=0; i<1024;) {
if (c[i++] == '\n') {
++count;
}
}
readChars = is.read(c);
}
// count remaining characters
while (readChars != -1) {
for (int i=0; i<readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
readChars = is.read(c);
}
return count == 0 ? 1 : count;
} finally {
is.close();
}
}
Benchmark results for a 1.3 GB text file, y-axis in seconds. I performed 100 runs with the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers and countLinesNew has none; while it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.
I have implemented another solution to the problem; I found it more efficient at counting rows:
try
(
FileReader input = new FileReader("input.txt");
LineNumberReader count = new LineNumberReader(input);
)
{
while (count.skip(Long.MAX_VALUE) > 0)
{
// Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
}
result = count.getLineNumber() + 1; // +1 because line index starts at 0
}
The accepted answer has an off by one error for multi line files which don't end in newline. A one line file ending without a newline would return 1, but a two line file ending without a newline would return 1 too. Here's an implementation of the accepted solution which fixes this. The endsWithoutNewLine checks are wasteful for everything but the final read, but should be trivial time wise compared to the overall function.
public int count(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean endsWithoutNewLine = false;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n')
++count;
}
endsWithoutNewLine = (c[readChars - 1] != '\n');
}
if(endsWithoutNewLine) {
++count;
}
return count;
} finally {
is.close();
}
}
With java-8, you can use streams:
try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
long numOfLines = lines.count();
...
}
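For completeness, a self-contained sketch of that stream-based count could look like this (imports from java.nio.file, java.nio.charset, and java.util.stream are assumed; the path in the usage line is just a placeholder):
// Count lines with the Java 8 streams API; try-with-resources closes the underlying file.
public static long countLines(Path path) throws IOException {
    try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
        return lines.count();
    }
}
// Usage: long n = countLines(Paths.get("input.txt"));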
The answer with the method count() above gave me line miscounts if a file didn't have a newline at the end of the file - it failed to count the last line in the file.
This method works better for me:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
I tested the above methods for counting lines; here are my observations for the different methods as tested on my system:
File size: 1.6 GB
Methods:
Using Scanner: approx. 35 s
Using BufferedReader: approx. 5 s
Using Java 8: approx. 5 s
Using LineNumberReader: approx. 5 s
Moreover, the Java 8 approach seems quite handy:
Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
(Return type: long)
I know this is an old question, but the accepted solution didn't quite match what I needed it to do. So, I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):
public static long getLinesCount(String fileName, String encodingName) throws IOException {
long linesCount = 0;
File file = new File(fileName);
FileInputStream fileIn = new FileInputStream(file);
try {
Charset encoding = Charset.forName(encodingName);
Reader fileReader = new InputStreamReader(fileIn, encoding);
int bufferSize = 4096;
Reader reader = new BufferedReader(fileReader, bufferSize);
char[] buffer = new char[bufferSize];
int prevChar = -1;
int readCount = reader.read(buffer);
while (readCount != -1) {
for (int i = 0; i < readCount; i++) {
int nextChar = buffer[i];
switch (nextChar) {
case '\r': {
// The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
linesCount++;
break;
}
case '\n': {
if (prevChar == '\r') {
// The current line is terminated by a carriage return immediately followed by a line feed.
// The line has already been counted.
} else {
// The current line is terminated by a line feed.
linesCount++;
}
break;
}
}
prevChar = nextChar;
}
readCount = reader.read(buffer);
}
if (prevChar != -1) {
switch (prevChar) {
case '\r':
case '\n': {
// The last line is terminated by a line terminator.
// The last line has already been counted.
break;
}
default: {
// The last line is terminated by end-of-file.
linesCount++;
}
}
}
} finally {
fileIn.close();
}
return linesCount;
}
This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (Stream<String> lines = Files.lines(file.toPath())) {
return lines.count();
}
}
Tested on JDK 8u31. Performance is slow, however, compared to this method:
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
byte[] c = new byte[1024];
boolean empty = true,
lastEmpty = false;
long count = 0;
int read;
while ((read = is.read(c)) != -1) {
for (int i = 0; i < read; i++) {
if (c[i] == '\n') {
count++;
lastEmpty = true;
} else if (lastEmpty) {
lastEmpty = false;
}
}
empty = false;
}
if (!empty) {
if (count == 0) {
count = 1;
} else if (!lastEmpty) {
count++;
}
}
return count;
}
}
Tested and very fast.
A straight-forward way using Scanner
static void lineCounter (String path) throws IOException {
int lineCount = 0, commentsCount = 0;
Scanner input = new Scanner(new File(path));
while (input.hasNextLine()) {
String data = input.nextLine();
if (data.startsWith("//")) commentsCount++;
lineCount++;
}
System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}
I concluded that wc -l's method of counting newlines is fine, but it returns non-intuitive results on files where the last line doesn't end with a newline.
And @er.vikas's solution based on LineNumberReader, with one added to the line count, returned non-intuitive results on files where the last line does end with a newline.
I therefore made an algorithm which handles these cases as follows:
@Test
public void empty() throws IOException {
assertEquals(0, count(""));
}
@Test
public void singleNewline() throws IOException {
assertEquals(1, count("\n"));
}
@Test
public void dataWithoutNewline() throws IOException {
assertEquals(1, count("one"));
}
@Test
public void oneCompleteLine() throws IOException {
assertEquals(1, count("one\n"));
}
@Test
public void twoCompleteLines() throws IOException {
assertEquals(2, count("one\ntwo\n"));
}
@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
assertEquals(2, count("one\ntwo"));
}
@Test
public void aFewLines() throws IOException {
assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}
And it looks like this:
static long countLines(InputStream is) throws IOException {
try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
char[] buf = new char[8192];
int n, previousN = -1;
//Read will return at least one byte, no need to buffer more
while((n = lnr.read(buf)) != -1) {
previousN = n;
}
int ln = lnr.getLineNumber();
if (previousN == -1) {
//No data read at all, i.e file was empty
return 0;
} else {
char lastChar = buf[previousN - 1];
if (lastChar == '\n' || lastChar == '\r') {
//Ending with newline, deduct one
return ln;
}
}
//normal case, return line number + 1
return ln + 1;
}
}
If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas's solution, but don't add one to the result, and retry the skip:
try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
while(lnr.skip(Long.MAX_VALUE) > 0){};
return lnr.getLineNumber();
}
How about using the Process class from within Java code and then reading the output of the command?
Process p = Runtime.getRuntime().exec("wc -l " + yourfilename);
p.waitFor();
BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
int lineCount = 0;
while ((line = b.readLine()) != null) {
System.out.println(line);
// wc -l prints the count followed by the file name, so parse only the first token
lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
}
Need to try it though. Will post the results.
It seems that there are a few different approaches you can take with LineNumberReader.
I did this:
int lines = 0;
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
String line = count.readLine();
if(count.ready())
{
while(line != null) {
lines = count.getLineNumber();
line = count.readLine();
}
lines+=1;
}
count.close();
System.out.println(lines);
Even more simply, you can use the BufferedReader lines() method to get a stream of the lines, and then use the Stream count() method to count them. Then simply add one to the result to get the number of rows in the text file.
As example:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
int lines = (int)count.lines().count() + 1;
count.close();
System.out.println(lines);
This funny solution actually works really well!
public static int countLines(File input) throws IOException {
try (InputStream is = new FileInputStream(input)) {
int count = 1;
for (int aChar = 0; aChar != -1;aChar = is.read())
count += aChar == '\n' ? 1 : 0;
return count;
}
}
On Unix-based systems, use the wc command on the command-line.
The only way to know how many lines there are in a file is to count them. You can of course derive a metric from your data giving you the average length of one line and then divide the file size by that average, but the result won't be accurate.
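If an approximation is acceptable, that idea could be sketched roughly like this (the sample size is arbitrary, and it assumes roughly one byte per character, so the result is only an estimate):
// Estimate the line count: average line length over a small sample, then divide the file size by it.
public static long estimateLineCount(File file, int sampleLines) throws IOException {
    long sampledBytes = 0;
    int sampled = 0;
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while (sampled < sampleLines && (line = br.readLine()) != null) {
            sampledBytes += line.length() + 1; // +1 for the stripped newline (approximate)
            sampled++;
        }
    }
    if (sampled == 0) {
        return 0; // empty file
    }
    double avgLineLength = (double) sampledBytes / sampled;
    return Math.round(file.length() / avgLineLength);
}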
If you don't have any index structures, you won't get around reading the complete file. But you can optimize it by avoiding reading it line by line and instead using a regex to match all line terminators.
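As a sketch of that regex idea (it still reads the whole file, and loads it into memory, so it only suits files that fit comfortably in the heap; the UTF-8 assumption is mine):
// Count line terminators with a single regex pass instead of readLine().
public static long countLinesRegex(Path path) throws IOException {
    String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    Matcher m = Pattern.compile("\r\n|\r|\n").matcher(content);
    long count = 0;
    while (m.find()) {
        count++;
    }
    // A non-empty file whose last line has no terminator still contributes one more line.
    if (!content.isEmpty() && !content.endsWith("\n") && !content.endsWith("\r")) {
        count++;
    }
    return count;
}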
Optimized code for multi-line files that have no newline ('\n') character at EOF:
/**
*
* @param filename
* @return
* @throws IOException
*/
public static int countLines(String filename) throws IOException {
int count = 0;
boolean empty = true;
FileInputStream fis = null;
InputStream is = null;
try {
fis = new FileInputStream(filename);
is = new BufferedInputStream(fis);
byte[] c = new byte[1024];
int readChars = 0;
boolean isLine = false;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if ( c[i] == '\n' ) {
isLine = false;
++count;
}else if(!isLine && c[i] != '\n' && c[i] != '\r'){ //Case to handle line count where no New Line character present at EOF
isLine = true;
}
}
}
if(isLine){
++count;
}
}catch(IOException e){
e.printStackTrace();
}finally {
if(is != null){
is.close();
}
if(fis != null){
fis.close();
}
}
LOG.info("count: "+count);
return (count == 0 && !empty) ? 1 : count;
}
Scanner with regex:
public int getLineCount() {
Scanner fileScanner = null;
int lineCount = 0;
Pattern lineEndPattern = Pattern.compile("(?m)$");
try {
fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern);
while (fileScanner.hasNext()) {
fileScanner.next();
++lineCount;
}
}catch(FileNotFoundException e) {
e.printStackTrace();
return lineCount;
}
fileScanner.close();
return lineCount;
}
Haven't clocked it.
If you use this:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
you can't handle files with a very large number of rows, because reader.getLineNumber() returns an int, which overflows past Integer.MAX_VALUE lines; you would need a long to process more rows than that.
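If you really do need to go past Integer.MAX_VALUE lines, a minimal variant keeps its own long counter instead of relying on getLineNumber():
// Same readLine() loop, but the counter is a long, so it doesn't overflow at 2^31 - 1 lines.
public static long countLinesLong(String filename) throws IOException {
    long cnt = 0;
    try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
        while (reader.readLine() != null) {
            cnt++;
        }
    }
    return cnt;
}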