I am trying to write a basic Java program that compares two huge text files and prints the non-matching records, i.e. something similar to the MINUS operator in SQL. However, I am not getting the expected results: all the records are printed even though both files are the same. Please also tell me whether this approach is efficient enough for comparing two huge text files.
import java.io.*;
public class CompareTwoFiles {
static int count1 = 0 ;
static int count2 = 0 ;
static String arrayLines1[] = new String[countLines("\\Files_Comparison\\File1.txt")];
static String arrayLines2[] = new String[countLines("\\Files_Comparison\\File2.txt")];
public static void main(String args[]){
findDifference("\\Files_Comparison\\File1.txt","\\Files_Comparison\\File2.txt");
displayRecords();
}
public static int countLines(String File){
int lineCount = 0;
try {
BufferedReader br = new BufferedReader(new FileReader(File));
while ((br.readLine()) != null) {
lineCount++;
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return lineCount;
}
public static void findDifference(String File1, String File2){
String contents1 = null;
String contents2 = null;
try
{
FileReader file1 = new FileReader(File1);
FileReader file2 = new FileReader(File2);
BufferedReader buf1 = new BufferedReader(file1);
BufferedReader buf2 = new BufferedReader(file2);
while ((contents1 = buf1.readLine()) != null)
{
arrayLines1[count1] = contents1 ;
count1++;
}
while ((contents2 = buf2.readLine()) != null)
{
arrayLines2[count2] = contents2 ;
count2++;
}
}catch (Exception e){
e.printStackTrace();
}
}
public static void displayRecords() {
for (int i = 0 ; i < arrayLines1.length ; i++) {
String a = arrayLines1[i];
for (int j = 0; j < arrayLines2.length; j++){
String b = arrayLines2[j];
boolean result = a.contains(b);
if(result == false){
System.out.println(a);
}
}
}
}
}
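Based on your explanation, you do not need nested loops. Consider comparing the lines position by position:
public static void displayRecords() {
    for (int i = 0; i < arrayLines1.length && i < arrayLines2.length; i++) {
        String a = arrayLines1[i];
        String b = arrayLines2[i];
        // print the line from the first file when it does not contain
        // the line at the same position in the second file
        if (!a.contains(b)) {
            System.out.println(a);
        }
    }
}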
As for performance, compare the file sizes first. If the sizes (in bytes) differ, the files cannot be identical. If the sizes are exactly the same, the contents may still differ, so the line comparison is still needed.
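For the minus-style comparison itself, a set lookup avoids both the nested loop and the assumption that matching records sit at the same position. This is only a minimal sketch: it assumes that a record is a whole line and that the distinct lines of the second file fit in memory. `LineMinus` and `minus` are illustrative names, not something from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LineMinus {
    // Returns the lines of 'left' that never occur in 'right', in their original order.
    static List<String> minus(Reader left, Reader right) throws IOException {
        // Remember every distinct line of the second input.
        Set<String> seen = new HashSet<>();
        try (BufferedReader br = new BufferedReader(right)) {
            String line;
            while ((line = br.readLine()) != null) {
                seen.add(line);
            }
        }
        // Stream the first input and keep only the lines that were not seen.
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(left)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!seen.contains(line)) {
                    result.add(line);
                }
            }
        }
        return result;
    }
}
```

With real files you would pass something like `new FileReader(path1)` and `new FileReader(path2)`. Each line of the first file then costs one hash lookup instead of a scan over the whole second file.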
I'm creating an application that collects all the RPMs listed in a table. When I try to write them to a text file, something goes wrong. Please see the code below.
public class rpms {
public static void main(String[] args) {
URLget rpms = new URLget();
try {
getTdSibling(sendGetRequest(URL).toString());
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
public static void getTdSibling(String sourceTd) throws FileNotFoundException, UnsupportedEncodingException {
String fragment = sourceTd;
Document doc = Jsoup.parseBodyFragment(fragment);
for (Element table : doc.select("table")) {
for (Element row : table.select("tr")) {
Elements lines = row.select("td");
String linesToStr = lines.text();
String[] linestoStrArray = linesToStr.split("\n");
for (String line : linestoStrArray)
if (!line.contains("Outdated")){
//System.out.println(""+line);// display the rpms that do not have outdated
for (int i = 0; i < lines.size(); i++) {
if(!lines.eq(i).text().toString().equals(" ")){
splitStr(lines.eq(i).text().toString());
}
}
}
}
}
}
public static void splitStr(String str1) throws FileNotFoundException, UnsupportedEncodingException{
ArrayList<String> outputContent = new ArrayList<String>();
String[] split1 = str1.split(" ");
for (int i = 0; i < split1.length; i++) {
if(fileExplode(split1[i])){
System.out.println(split1[i]);
outputContent.add(split1[i]);
}
}
copyFile(outputContent);
}
public static void copyFile(ArrayList<String> fileCon1) throws FileNotFoundException, UnsupportedEncodingException{
PrintWriter writer1 = new PrintWriter("C:\\Users\\usersb\\Downloads\\rpms\\newrpms.txt", "UTF-8");
for(int i = 0 ; i < fileCon1.size() ; i++){
writer1.println(fileCon1.get(i));
}
System.out.println("updated newrpms.txt");
writer1.close();
}
public static boolean fileExplode(String str1) {
boolean hasRPM = false;
String[] split1 = str1.replace(".", " ").split(" ");
for (int i = 0; i < split1.length; i++) {
if ((i + 1) == split1.length) {
if (split1[i].endsWith("rpm")
|| (split1[i].length() > 2 && split1[i].charAt(0) == '.' && split1[i].charAt(1) == 'r'
&& split1[i].charAt(2) == 'p' && split1[i]
.charAt(3) == 'm')) {
hasRPM = true;
}
break;
}
}
return hasRPM;
}
}
After I execute the code, the file is empty. What should I do to get the same output in the file that is displayed by the statement System.out.println(split1[i]);?
Add writer1.flush(); before you close the PrintWriter with writer1.close();
or
This is working code for writing a String value to a text file. Try this in your copyFile method:
PrintWriter out = null;
try {
    out = new PrintWriter("C:\\45.txt"); // path where you want to create your file
} catch (FileNotFoundException ex) {
    ex.printStackTrace(); // do not swallow the error silently
}
if (out != null) {
    StringBuilder writesb = new StringBuilder();
    // append text
    writesb.append("w");
    // line separator: the text that follows goes on a new line
    writesb.append(System.getProperty("line.separator"));
    writesb.append("e");
    out.print(writesb.toString());
    out.flush();
    out.close();
}
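Another thing worth checking, judging only from the code in the question: copyFile is called once for every table cell, and new PrintWriter(path, "UTF-8") truncates an existing file every time it is opened. If the last call receives an empty list, the file ends up empty. Below is a sketch that opens the file in append mode instead. `AppendLines` and `appendLines` are hypothetical names, and I am assuming the truncation is what empties your file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class AppendLines {
    // Appends the given lines to 'target', creating the file if it does not exist yet.
    static void appendLines(Path target, List<String> lines) throws IOException {
        // APPEND keeps what earlier calls wrote; try-with-resources flushes and closes the writer.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }
}
```

If content from a previous run should not carry over, delete the file once at program start. Alternatively, collect all matches in one list and write the file a single time after the loops.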
I'm going to show all of my code here so you guys get a gist of what I'm doing.
import java.io.*;
import java.util.*;
public class Plagiarism {
public static void main(String[] args) {
Plagiarism myPlag = new Plagiarism();
if (args.length == 0) {
System.out.println("Error: No files input");
}
else if (args.length > 0) {
try {
List<String> foo = new ArrayList<String>();
for (int i = 0; i < 2; i++) {
BufferedReader reader = new BufferedReader (new FileReader (args[i]));
foo = simplify(reader);
for (int j = 0; j < foo.size(); j++) {
System.out.print(foo.get(j));
}
}
int blockSize = Integer.valueOf(args[2]);
System.out.println(args[2]);
// String line = foo.toString();
List<String> list = new ArrayList<String>();
for (int k = 0; k < foo.size() - blockSize; k++) {
list.add(foo.toString().substring(k, k+blockSize));
}
System.out.println(list);
}
catch (Exception e) {
e.printStackTrace();
}
}
}
public static List<String> simplify(BufferedReader input) throws IOException {
String line = null;
List<String> myList = new ArrayList<String>();
while ((line = input.readLine()) != null) {
myList.add(line.replaceAll("[^a-zA-Z]","").toLowerCase());
}
return myList;
}
}
This is the code that is using substring.
int blockSize = Integer.valueOf(args[2]);
//"foo" is an ArrayList<String> which I have to convert toString() to use substring().
String line = foo.toString();
List<String> list = new ArrayList<String>();
for (int k = 0; k < line.length() - blockSize; k++) {
list.add(line.substring(k, k+blockSize));
}
System.out.println(list);
When I specify blockSize as 4 in cmd this is the result:
[[, a, , ab, abc ]
The text file (standardised using my other code) is this:
abcdzaabcdd
So the result should be this:
[abcd, bcdz, cdza, ] etc.
Any help?
Thanks in advance.
Here is code showing how to improve your code a little. The main change is that the simplify method now returns the simplified text as a single String instead of a List<String> of simplified lines. Converting that list with toString() gave a String of the form
[value0, value1, value2, ...]
Now the code returns a String of the form value0value1value2.
Another change is lowering the indentation level by removing the unnecessary else if statement and breaking the control flow with System.exit(0); (you can also use return; here).
import java.io.*;
import java.util.*;
class Plagiarism {
    public static void main(String[] args) throws Exception {
        // you are not using 'myPlag' anywhere, you can safely remove it
        // Plagiarism myPlag = new Plagiarism();
        if (args.length == 0) {
            System.out.println("Error: No files input");
            System.exit(0);
        }
        String foo = null;
        for (int i = 0; i < 2; i++) {
            BufferedReader reader = new BufferedReader(new FileReader(args[i]));
            foo = simplify(reader);
            reader.close();
            System.out.println(foo);
        }
        // note: after the loop 'foo' holds the text of the last file read
        int blockSize = Integer.valueOf(args[2]);
        System.out.println(args[2]);
        List<String> list = new ArrayList<String>();
        // '<=' so that the last full block is included; 'foo' is already a String
        for (int k = 0; k <= foo.length() - blockSize; k++) {
            list.add(foo.substring(k, k + blockSize));
        }
        System.out.println(list);
    }
    public static String simplify(BufferedReader input)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        String line = null;
        while ((line = input.readLine()) != null) {
            // keep letters only and lower-case them
            sb.append(line.replaceAll("[^a-zA-Z]", "").toLowerCase());
        }
        return sb.toString();
    }
}
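The block extraction can also be pulled out into a small method, which makes it easy to check on its own. This is just a sketch: `Blocks` and `blocks` are names I made up, and it assumes you want every overlapping block of exactly blockSize characters. The loop condition k + blockSize <= text.length() makes sure the final block is included.

```java
import java.util.ArrayList;
import java.util.List;

final class Blocks {
    // Returns every overlapping substring of length 'blockSize', from left to right.
    static List<String> blocks(String text, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<String> result = new ArrayList<>();
        // Stop as soon as a full block no longer fits into the remaining text.
        for (int k = 0; k + blockSize <= text.length(); k++) {
            result.add(text.substring(k, k + blockSize));
        }
        return result;
    }
}
```

For the sample text from the question, Blocks.blocks("abcdzaabcdd", 4) starts with abcd, bcdz, cdza and ends with bcdd.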
I have been given this question for practice and am kind of stuck on how to complete it. It basically asks us to create a program that uses a BufferedReader object to read values (55, 96, 88, 32) given in a txt file (say "s.txt") and then returns the smallest of the given values.
So far I have got two parts of the program, but I'm not sure how to join them together.
import java.io.*;
class CalculateMin
{
public static void main(String[] args)
{
try {
BufferedReader br = new BufferedReader(new FileReader("grades.txt"));
int numberOfLines = 5;
String[] textInfo = new String[numberOfLines];
for (int i = 0; i < numberOfLines; i++) {
textInfo[i] = br.readLine();
}
br.close();
} catch (IOException ie) {
}
}
}
and then I have the loop which I made, but I'm not sure how to integrate it into the program above. Ugh, I know I'm complicating things.
int[] numArray;
numArray = new int[Integer.parseInt(br.readLine())];
int smallestSoFar = numArray[0];
for (int i = 0; i < numArray.length; i++) {
if (numArray[i] < smallestSoFar) {
smallestSoFar = numArray[i];
}
}
Appreciate your help
Try this code. It iterates through the entire file, comparing the number on each line with the lowest number read so far:
public static void main(String[] args) {
try {
BufferedReader br = new BufferedReader(new FileReader("grades.txt"));
String line;
int lowestNumber = Integer.MAX_VALUE;
int number;
while ((line = br.readLine()) != null) {
try {
number = Integer.parseInt(line);
lowestNumber = number < lowestNumber ? number : lowestNumber;
} catch (NumberFormatException ex) {
// print the error saying that the line does not contain a number
}
}
br.close();
System.out.println("Lowest number is " + lowestNumber);
} catch (IOException ie) {
// print the exception
}
}
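The same idea can be sketched with try-with-resources, so the reader is closed even if reading fails part-way. `MinReader` and `smallest` are illustrative names, and returning an OptionalInt for the "no number found" case is my own assumption, not something the exercise asks for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.OptionalInt;

final class MinReader {
    // Returns the smallest integer found on any line, or empty if no line holds a number.
    static OptionalInt smallest(Reader source) throws IOException {
        boolean found = false;
        int lowest = Integer.MAX_VALUE;
        // The reader is closed automatically when the try block ends.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                try {
                    lowest = Math.min(lowest, Integer.parseInt(line.trim()));
                    found = true;
                } catch (NumberFormatException ex) {
                    // Skip lines that do not contain a whole number.
                }
            }
        }
        return found ? OptionalInt.of(lowest) : OptionalInt.empty();
    }
}
```

You could call it as MinReader.smallest(new FileReader("grades.txt")) and print the value if it is present.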
I am a beginner in Java.
I am trying to read two files and then get the union of them. I have to use an array of size 100 (only one array is allowed).
First, I read all records from file1, and write them to the output, file3. For that purpose, I read 100 records at a time, and write them to file3 using iteration.
After that, like file1, this time I read second file as 100 records at a time, and write them to the array, memory[]. Then I find the common records, if the record which I read from file2 is not in file1, I write it to the output file. I do this until reader2.readLine() gets null and I re-open file1 in each iteration.
This is what I have done so far. It is almost done, but it throws a NullPointerException. Any help would be appreciated.
Edit: OK, now it doesn't throw any exception, but it doesn't find the differing records and can't write them. I guess the last for loop and the booleans don't work. Why? Please help.
import java.io.*;
public class FileUnion
{
private static long startTime, endTime;
public static void main(String[] args) throws IOException
{
System.out.println("PROCESSING...");
reset();
startTimer();
String[] memory = new String[100];
int memorySize = memory.length;
File file1 = new File("stdlist1.txt");
BufferedReader reader1 = new BufferedReader(new FileReader(file1));
File file3 = new File("union.txt");
BufferedWriter writer = new BufferedWriter(new FileWriter(file3));
int numberOfLinesFile1 = 0;
String line1 = null;
String line11 = null;
while((line1 = reader1.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line1;
i++;
if(i < memorySize)
{
line1 = reader1.readLine();
}
}
for (int i = 0; i < memorySize; i++)
{
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile1++;
}
}
reader1.close();
File file2 = new File("stdlist2.txt");
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
String line2 = null;
while((line2 = reader2.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line2;
i++;
if(i < memorySize)
{
line2 = reader2.readLine();
}
}
for (int k = 0; k < memorySize; k++ )
{
boolean found = false;
File f1 = new File("stdlist1.txt");
BufferedReader buff1 = new BufferedReader(new FileReader(f1));
for (int m = 0; m < numberOfLinesFile1; m++)
{
line11 = buff1.readLine();
if (line11.equals(memory[k]) && found == false);
{
found = true;
}
}
buff1.close();
if (found == false)
{
writer.write(memory[k]);
writer.newLine();
}
}
}
reader2.close();
writer.close();
endTimer();
long time = duration();
System.out.println("PROCESS COMPLETED SUCCESSFULLY");
System.out.println("Duration: " + time + " ms");
}
public static void startTimer()
{
startTime = System.currentTimeMillis();
}
public static void endTimer()
{
endTime = System.currentTimeMillis();
}
public static long duration()
{
return endTime - startTime;
}
public static void reset()
{
startTime = 0;
endTime = 0;
}
}
memory[k] is null. Why is this null? Because in this code:
while((line2 = reader2.readLine()) != null)
{
for (int i = 0; i < 100; i++)
{
memory[i] = line1;
i++;
if(i < 100)
{
line2 = reader2.readLine();
}
}
you say memory[i] = line1;
line1 however is always null because you used it before in a loop which ended when line1 is null.
I believe you intended to write memory[i] = line2; in the above code :)
You have to check that you've not yet reached the end of the file. In all loops where you have a lineX = readerX.readLine(), immediately check whether lineX == null and break out of the loop if it is.
Editing my own answer because code doesn't show well in comments.
while(!line11.equals(memory[k]))
{
line11 = buff1.readLine();
}
It's line11 that is (sometimes) null here. If memory[k] is not in file1, what happens?
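One more detail in the posted code: the line if (line11.equals(memory[k]) && found == false); ends with a semicolon. That semicolon is the whole body of the if, so the block after it always runs and found becomes true for every record. A minimal sketch of a search that avoids both this and the null problem could look like the following; `LineSearch` and `containsLine` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;

final class LineSearch {
    // Returns true if some remaining line of 'reader' equals 'wanted'.
    static boolean containsLine(BufferedReader reader, String wanted) throws IOException {
        String line;
        // readLine() returns null at the end of the file, which ends the loop safely.
        while ((line = reader.readLine()) != null) {
            if (line.equals(wanted)) { // no semicolon after the condition
                return true;
            }
        }
        return false;
    }
}
```

Because the loop stops at null, there is no need to count the lines of file1 beforehand.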
I am a beginner with Java.
This is my approach:
I am trying to read two files and then get the union of them. I am using an array of size 100 (just one array is allowed; reading and writing line by line, ArrayList, or other structures are not allowed).
First, I read all records from file1 and write them to the output, a third file. For that purpose, I read 100 records at a time and write them to the third file using iteration.
After that, as with the first file, I read the second file 100 records at a time and store them in memory[]. Then I look for the common records: if a record which I read from File2 is not in File1, I write it to the output file. I do this until reader2.readLine() returns null, and I re-open file1 in each iteration.
This is what I have done so far; it is almost done. Any help would be appreciated.
Edit: OK, now it doesn't throw any exception, but it can't find the differing records and can't write them. I guess the last for loop and the booleans don't work. Why? I really need help. Thanks for your patience.
import java.io.*;
public class FileUnion
{
private static long startTime, endTime;
public static void main(String[] args) throws IOException
{
System.out.println("PROCESSING...");
reset();
startTimer();
String[] memory = new String[100];
int memorySize = memory.length;
File file1 = new File("stdlist1.txt");
BufferedReader reader1 = new BufferedReader(new FileReader(file1));
File file3 = new File("union.txt");
BufferedWriter writer = new BufferedWriter(new FileWriter(file3));
int numberOfLinesFile1 = 0;
String line1 = null;
String line11 = null;
while((line1 = reader1.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line1;
i++;
if(i < memorySize)
{
line1 = reader1.readLine();
}
}
for (int i = 0; i < memorySize; i++)
{
writer.write(memory[i]);
writer.newLine();
numberOfLinesFile1++;
}
}
reader1.close();
File file2 = new File("stdlist2.txt");
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
String line2 = null;
while((line2 = reader2.readLine()) != null)
{
for (int i = 0; i < memorySize; )
{
memory[i] = line2;
i++;
if(i < memorySize)
{
line2 = reader2.readLine();
}
}
for (int k = 0; k < memorySize; k++ )
{
boolean found = false;
File f1 = new File("stdlist1.txt");
BufferedReader buff1 = new BufferedReader(new FileReader(f1));
for (int m = 0; m < numberOfLinesFile1; m++)
{
line11 = buff1.readLine();
if (line11.equals(memory[k]) && found == false);
{
found = true;
}
}
buff1.close();
if (found == false)
{
writer.write(memory[k]);
writer.newLine();
}
}
}
reader2.close();
writer.close();
endTimer();
long time = duration();
System.out.println("PROCESS COMPLETED SUCCESSFULLY");
System.out.println("Duration: " + time + " ms");
}
public static void startTimer()
{
startTime = System.currentTimeMillis();
}
public static void endTimer()
{
endTime = System.currentTimeMillis();
}
public static long duration()
{
return endTime - startTime;
}
public static void reset()
{
startTime = 0;
endTime = 0;
}
}
EDIT! Redo.
OK, so to use 100 lines at a time you need to check for null; otherwise, trying to write null to a file could cause errors.
You check whether the file is at its end once, and then gather 99 more pieces of data without checking for null.
What if, when this line is called:
while((line2 = reader2.readLine()) != null)
there is only 1 line left in the file? Then your memory array contains 99 instances of null, and you try to write null to the file 99 times. That's the worst-case scenario.
I don't really know how much help we are supposed to give to people looking for homework help; on most sites I'm familiar with, it's not even allowed.
Here is an example of one way to write the first file.
String line1 = reader1.readLine();
boolean end_of_file1 = (line1 == null); // an empty file has nothing to copy
while(!end_of_file1)
{
    for (int i = 0; i < memorySize; i++)
    {
        memory[i] = line1; // null once the end of the file has been reached
        if(line1 != null)
        {
            // always fetch the next line, so the next chunk does not start with a stale one
            line1 = reader1.readLine();
        }
    }
    end_of_file1 = (line1 == null);
    for (int i = 0; i < memorySize; i++)
    {
        if(memory[i] != null)
        {
            writer.write(memory[i]);
            writer.newLine();
            numberOfLinesFile1++;
        }
    }
}
reader1.close();
Once you have that, to make the check for duplicates easier, write a public static boolean method that searches the file for a given line. Calling that method will make the code cleaner.
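public static boolean isUsed(File f1, String item, int dist) throws IOException
{
    // try-with-resources closes the reader on every return path
    try (BufferedReader buff1 = new BufferedReader(new FileReader(f1)))
    {
        for(int i = 0; i < dist; i++)
        {
            String line = buff1.readLine();
            if(line == null){
                return false; // reached the end of the file without a match
            }
            if(line.equals(item))
            {
                return true;
            }
        }
    }
    return false;
}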
Then use the same approach as for writing file 1, only before writing each line, check that !isUsed() holds.
memory = new String[memorySize]; // Reset the memory, erase old data from file1
int numberOfLinesFile2 = 0;
String line2 = reader2.readLine();
boolean end_of_file2 = (line2 == null); // an empty file has nothing to add
while(!end_of_file2)
{
    for (int i = 0; i < memorySize; i++)
    {
        memory[i] = line2; // null once the end of the file has been reached
        if(line2 != null)
        {
            line2 = reader2.readLine();
        }
    }
    end_of_file2 = (line2 == null);
    for (int i = 0; i < memorySize; i++)
    {
        if(memory[i] != null)
        {
            // Check whether the current item was already used in file 1.
            if(!isUsed(file1, memory[i], numberOfLinesFile1)){ // If not used already
                writer.write(memory[i]);
                writer.newLine();
                numberOfLinesFile2++;
            }
        }
    }
}
reader2.close();
writer.close();
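If the end-of-file flags get confusing, the chunk reading can be isolated in a helper that reports how many slots it filled. This is only a sketch of that one step, not the full assignment; `ChunkReader` and `readChunk` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;

final class ChunkReader {
    // Fills 'memory' from the start and returns how many slots hold fresh lines (0 at end of file).
    static int readChunk(BufferedReader reader, String[] memory) throws IOException {
        int filled = 0;
        String line;
        // The size check comes first, so no line is read and then dropped when the array is full.
        while (filled < memory.length && (line = reader.readLine()) != null) {
            memory[filled++] = line;
        }
        return filled;
    }
}
```

A caller can then loop with int n; while ((n = ChunkReader.readChunk(reader1, memory)) > 0) and only touch memory[0] to memory[n - 1], so stale or null entries are never written.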
Hope this helps. Notice I'm not supplying the full code, because I've learned that pasting complete code makes it more likely that people will just copy and paste it without understanding it. I hope you find it useful.