Java text file comparison: missing and excess lines

I need to compare two text files (MasterCopy.txt and ClientCopy.txt). I would like to get the list of strings that are missing from ClientCopy.txt, and also the list of strings that are in excess (present in ClientCopy.txt but not in MasterCopy.txt).
Contents of MasterCopy.txt
London
Paris
Rome
Contents of ClientCopy.txt
London
Berlin
Rome
Amsterdam
I would like to get these results
Missing:
Paris
Excess:
Berlin
Amsterdam

Two ideas come to mind. One is to compute the diff of the two files with java-diff-utils:
https://code.google.com/p/java-diff-utils/
From their wiki:
Task 1: Compute the difference between two files and print its deltas
Solution:
import difflib.*;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class BasicJavaApp_Task1 {
    // Helper method to read the file content into a list of lines
    private static List<String> fileToLines(String filename) {
        List<String> lines = new LinkedList<String>();
        String line;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(filename))) {
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> original = fileToLines("originalFile.txt");
        List<String> revised = fileToLines("revisedFile.txt");
        // Compute diff. Get the Patch object. Patch is the container for computed deltas.
        Patch patch = DiffUtils.diff(original, revised);
        for (Delta delta : patch.getDeltas()) {
            System.out.println(delta);
        }
    }
}
The other is to use a HashSet:
http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html
A modification of @Nic's answer to use HashSet:
// Read each file line by line into a set
Scanner s = new Scanner(new File("MasterCopy.txt"));
HashSet<String> masterlist = new HashSet<String>();
while (s.hasNextLine()) {
    masterlist.add(s.nextLine());
}
s.close();
s = new Scanner(new File("ClientCopy.txt"));
HashSet<String> clientlist = new HashSet<String>();
while (s.hasNextLine()) {
    clientlist.add(s.nextLine());
}
s.close();
// Do the comparison
ArrayList<String> missing = new ArrayList<String>();
ArrayList<String> excess = new ArrayList<String>();
// Check for missing or excess; HashSet has no get(), so use contains()
for (String line : masterlist) {
    if (!clientlist.contains(line)) missing.add(line);
}
for (String line : clientlist) {
    if (!masterlist.contains(line)) excess.add(line);
}

If execution time is not a big factor you can do this, assuming you are only comparing each line:
// Get the files into sets, one entry per line
Scanner s = new Scanner(new File("MasterCopy.txt"));
HashSet<String> masterlist = new HashSet<String>();
while (s.hasNextLine()) {
    masterlist.add(s.nextLine());
}
s.close();
s = new Scanner(new File("ClientCopy.txt"));
HashSet<String> clientlist = new HashSet<String>();
while (s.hasNextLine()) {
    clientlist.add(s.nextLine());
}
s.close();
// Do the comparison
HashSet<String> missing = new HashSet<String>();
HashSet<String> excess = new HashSet<String>();
// Check for missing or excess
// (the loop variable must not be named s, which is already the Scanner)
for (String line : masterlist) {
    if (!clientlist.contains(line)) missing.add(line);
}
for (String line : clientlist) {
    if (!masterlist.contains(line)) excess.add(line);
}
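The two loops above amount to two set differences. As a minimal sketch (the class and method names are illustrative and not taken from the answers), the same result can be obtained with removeAll on a copy of each set. A TreeSet is used here only so that the result order is predictable:

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; the name is not from the original answers.
class LineSetDiff {
    // Returns the elements of 'from' that do not occur in 'other'.
    static Set<String> difference(Collection<String> from, Collection<String> other) {
        Set<String> result = new TreeSet<>(from); // copy, so the inputs stay untouched
        result.removeAll(other);
        return result;
    }
}
```

Calling difference(masterlist, clientlist) gives the missing strings and difference(clientlist, masterlist) gives the excess ones. For the sample files in the question, the first call yields Paris and the second yields Amsterdam and Berlin.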

Related

How to remove specific duplicate data in array after sorting?

This is my code:
FileWriter writers = null;
try {
BufferedReader reader = new BufferedReader(new FileReader("Database.txt"));
ArrayList<Data> dataList = new ArrayList<>();
String line = "";
while ((line = reader.readLine()) != null) {
//split string and construct Data object and add it to dataList
dataList.add(parse(line));
}
reader.close();
Collections.sort(dataList);
writers = new FileWriter("final.txt");
for (Data d : dataList) {
writers.write(d.toString());
writers.write("\r\n");
}
writers.close();
} catch (Exception ex) {
ex.printStackTrace();
} finally {
}
Input/Output in this code:
input: mamy, 30, new, old
daddy, 21, new, new
output: daddy, 21,new,new
mamy , 30, new, old
Expected output:
daddy,21,new
mamy,30,new,old
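My problem is how to remove the duplicates before storing the result in final.txt. Any suggestions?
I think a Set is a good fit here, since it eliminates duplicates. Note that a HashSet only treats two Data objects as duplicates if Data overrides equals() and hashCode().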
Set<Data> dataSet = new HashSet<>(dataList);
To remove duplicates use this code right before sorting.
ArrayList<Data> newDataList = new ArrayList<>();
for (Data element : dataList) {
if (!newDataList.contains(element)) {
newDataList.add(element);
}
}
dataList = newDataList;
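Judging by the expected output in the question (daddy,21,new), the duplicates to remove may be repeated fields within a single line rather than repeated lines. Here is a minimal sketch under that assumption. The class and method names are hypothetical, and a LinkedHashSet is used so that the first occurrence of each field keeps its position:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; assumes fields are comma-separated and that
// repeated field values within one line should be collapsed.
class FieldDedup {
    static String dedupFields(String line) {
        Set<String> fields = new LinkedHashSet<>(); // keeps insertion order, drops repeats
        for (String field : line.split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return String.join(",", fields);
    }
}
```

Be aware that this collapses any repeated value in a line, so it is only suitable if two fields are never legitimately equal.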

Reading a text file into multiple arrays in Java

I'm currently working on a program that reads in a preset text file and then manipulates the data in various ways. I've got the data manipulation to work with some dummy data but I still need to get the text file read in correctly.
The test file looks like this for 120 lines:
Aberdeen,Scotland,57,9,N,2,9,W,5:00,p.m. Adelaide,Australia,34,55,S,138,36,E,2:30,a.m. Algiers,Algeria,36,50,N,3,0,E,6:00,p.m.(etc etc)
So each of these needs to be read into its own array, in this order: String[] CityName, String[] Country, int[] LatDeg, int[] LatMin, String[] NorthSouth, int[] LongDeg, int[] LongMin, String[] EastWest, int[] Time, String[] AMPM.
So the problem is that while I'm reasonably comfortable with buffered readers, designing this particular function has proven difficult. In fact, I've been drawing a blank for the past few hours. It seems like it would need multiple loops and counters, but I can't figure out precisely how.
I am assuming that the file has one city per line. If it does not, the following solution will need a bit of tweaking.
Since you say you are comfortable with BufferedReader, I would do it the following way:
List<List<String>> addresses = new ArrayList<List<String>>();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    for (String line; (line = br.readLine()) != null; ) {
        // split() returns a String[], so wrap it in a List before adding
        addresses.add(Arrays.asList(line.split(",")));
    }
}
Later, say you want to retrieve the country of 'Adelaide'; you can try the following:
String country = null;
for (List<String> cityInfo : addresses) {
    if ("Adelaide".equals(cityInfo.get(0))) {
        country = cityInfo.get(1);
    }
}
Instead of creating separate arrays (String[] CityName, String[] Country, etc.), try using a domain object.
Here, you can have a domain object (a custom class) Location with these attributes:
public class Location
{
    private String cityName;
    private String country;
    private String latDeg;
    // ... the remaining fields, plus getters and setters
}
Then you can write a file reader in which each line of the file becomes a Location. The result will be
Location[] locations;
or
List<Location> locations;
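As a rough sketch of how such a domain object could be filled from the file, the class below parses one comma-separated line per Location. The parse and parseAll methods, the choice of int for the degree and minute fields, and the requirement of exactly 10 fields are assumptions made for illustration, not part of the answer above:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the domain object; field names follow the arrays listed in the question.
class Location {
    final String cityName;
    final String country;
    final int latDeg;
    final int latMin;
    final String northSouth;
    final int longDeg;
    final int longMin;
    final String eastWest;
    final String time;   // kept as text because of the colon, e.g. "5:00"
    final String amPm;

    private Location(String[] f) {
        cityName = f[0];
        country = f[1];
        latDeg = Integer.parseInt(f[2]);
        latMin = Integer.parseInt(f[3]);
        northSouth = f[4];
        longDeg = Integer.parseInt(f[5]);
        longMin = Integer.parseInt(f[6]);
        eastWest = f[7];
        time = f[8];
        amPm = f[9];
    }

    // Parses one comma-separated line; rejects lines without exactly 10 fields.
    static Location parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 10) {
            throw new IllegalArgumentException("Expected 10 fields: " + line);
        }
        return new Location(fields);
    }

    // Parses every non-blank line into a Location.
    static List<Location> parseAll(List<String> lines) {
        List<Location> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                result.add(parse(line));
            }
        }
        return result;
    }
}
```

With this shape, the number of lines in the file no longer matters: the list simply grows as lines are read.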
To carry out this task, the first thing to do is establish how many lines of data actually exist in the data file. You say it is 120 lines, but what if it turns out to be more or fewer? We want to know the exact count so that all the arrays can be initialized properly. A simple method can do this; let's call it getFileLinesCount(). It returns an int holding the number of text lines in the data file:
private int getFileLinesCount(final String filePath) {
int lines = 0;
try{
File file =new File(filePath);
if(file.exists()){
FileReader fr = new FileReader(file);
try (LineNumberReader lnr = new LineNumberReader(fr)) {
while (lnr.readLine() != null){ lines++; }
}
}
else {
throw new IllegalArgumentException("GetFileLinesCount() Method Error!\n"
+ "The supplied file path does not exist!\n(" + filePath + ")");
}
}
catch(IOException e){ e.printStackTrace(); }
return lines;
}
Place this method somewhere within your main class. Now you need to Declare and initialize all your Arrays:
String filePath = "C:\\My Files\\MyDataFile.txt";
int lines = getFileLinesCount(filePath);
String[] CityName = new String[lines];
String[] Country = new String[lines];
int[] LatDeg = new int[lines];
int[] LatMin = new int[lines];
String[] NorthSouth = new String[lines];
int[] LongDeg = new int[lines];
int[] LongMin = new int[lines];
String[] EastWest = new String[lines];
String[] Time = new String[lines]; // a String[], since values like "5:00" are not integers
String[] AMPM = new String[lines];
Now to fill up all those arrays:
public static void main(String args[]) {
    // loadUpArrays() is an instance method, so call it on an instance.
    // YourMainClass is a placeholder for the class that holds these members.
    new YourMainClass().loadUpArrays();
    // Do whatever you want to do
    // with all those Arrays.....
}
private void loadUpArrays() {
    // Read in the data file.
    try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
        String sCurrentLine;
        int x = 0;
        // Read in one line at a time and fill the arrays...
        while ((sCurrentLine = br.readLine()) != null) {
            // Split each line into its comma-separated fields.
            String[] fileLine = sCurrentLine.split(",");
            // Fill our required arrays...
            CityName[x] = fileLine[0];
            Country[x] = fileLine[1];
            LatDeg[x] = Integer.parseInt(fileLine[2]);
            LatMin[x] = Integer.parseInt(fileLine[3]);
            NorthSouth[x] = fileLine[4];
            LongDeg[x] = Integer.parseInt(fileLine[5]);
            LongMin[x] = Integer.parseInt(fileLine[6]);
            EastWest[x] = fileLine[7];
            Time[x] = fileLine[8];
            AMPM[x] = fileLine[9];
            x++;
        }
        // No explicit close() needed: try-with-resources closes the reader.
    }
    catch (IOException ex) { ex.printStackTrace(); }
}
Now, I haven't tested this; I just quickly punched it out, but I think you can get the gist of it.
EDIT:
As @Mad Physicist has so graciously pointed out in his comment below, a List can be used to eliminate the need to count file lines, and therefore the need to read the data file twice. All the file lines can be placed into the List, and the number of valid file lines is then simply the size of the List. The desired arrays can be filled by iterating through the List elements and processing the data accordingly. Everything can be done in a single method we'll call fillArrays(). Your array declarations will be a little different, however:
String[] CityName;
String[] Country;
int[] LatDeg;
int[] LatMin;
String[] NorthSouth;
int[] LongDeg;
int[] LongMin;
String[] EastWest;
String[] Time;
String[] AMPM;
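public static void main(String args[]) {
// fillArrays() is an instance method, so call it on an instance.
// YourMainClass is a placeholder for the class that holds these members.
new YourMainClass().fillArrays("C:\\My Files\\MyDataFile.txt");
// Whatever you want to do with all
// those Arrays...
}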
private void fillArrays(final String filePath) {
List<String> fileLinesList = new ArrayList<>();
try{
File file = new File(filePath);
if(file.exists()){
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String strg;
while((strg = br.readLine()) != null){
// Make sure there is no blank line. If not
// then add line to List.
if (!strg.equals("")) { fileLinesList.add(strg); }
}
// No explicit close() needed: try-with-resources closes the reader.
}
}
else {
throw new IllegalArgumentException("fillArrays() Method Error!\n"
+ "The supplied file path does not exist!\n(" + filePath + ")");
}
// Initialize all the Arrays...
int lines = fileLinesList.size();
CityName = new String[lines];
Country = new String[lines];
LatDeg = new int[lines];
LatMin = new int[lines];
NorthSouth = new String[lines];
LongDeg = new int[lines];
LongMin = new int[lines];
EastWest = new String[lines];
Time = new String[lines];
AMPM = new String[lines];
// Fill all the Arrays...
for (int i = 0; i < fileLinesList.size(); i++) {
String[] lineArray = fileLinesList.get(i).split(",");
CityName[i] = lineArray[0];
Country[i] = lineArray[1];
LatDeg[i] = Integer.parseInt(lineArray[2]);
LatMin[i] = Integer.parseInt(lineArray[3]);
NorthSouth[i] = lineArray[4];
LongDeg[i] = Integer.parseInt(lineArray[5]);
LongMin[i] = Integer.parseInt(lineArray[6]);
EastWest[i] = lineArray[7];
Time[i] = lineArray[8];
AMPM[i] = lineArray[9];
}
}
catch(IOException e){ e.printStackTrace(); }
}
On another note: your Time array cannot be an int[], since the time values in the data contain a colon (:), which Integer.parseInt() cannot parse. Therefore (in case you haven't noticed) I have changed its declaration to String[].

Comparing two text files in random order with Java

I am trying to compare two text files that are randomized and print out the lines that match in both of the files.
File 1:
Student1
Student2
Student3
Student4
File 2:
Student6
Student1
Student2
I want the output as
Student1
Student2
My code is below.
public static void main(String[] args) throws IOException {
String first = "file1.txt";
String second = "file2.txt";
BufferedReader fBr = new BufferedReader(new FileReader(first));
BufferedReader sBr = new BufferedReader(new FileReader(second));
PrintWriter writer = new PrintWriter("test.txt", "UTF-8");
while ((first = fBr.readLine()) != null) {
String partOne1 = fBr.readLine();
String partTwo1 = sBr.readLine();
while ((second = sBr.readLine()) != null) {
System.out.println(first);
writer.println(first);
break;
}
}
writer.close();
fBr.close();
sBr.close();
It's quite simple. Store all the lines from the first file, then compare each line of the second file against them. It will look like this:
package com.company;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
public class Main {
public static void main(String[] args) throws IOException {
String first = "file1.txt";
String second = "file2.txt";
BufferedReader fBr = new BufferedReader(new FileReader(first));
BufferedReader sBr = new BufferedReader(new FileReader(second));
ArrayList<String> strings = new ArrayList<String>();
while ((first = fBr.readLine()) != null) {
strings.add(first);
}
fBr.close();
while ((second = sBr.readLine()) != null) {
if (strings.contains(second)) {
System.out.println(second);
}
}
sBr.close();
}
}
It's better to use memory when possible; your nested 'while' loops can take too long and obfuscate the logic.
Another alternative is to put both files into two ArrayLists and use ArrayList's retainAll() method to get the common lines. You can then do whatever you need with the result, such as printing it.
public static void main(String[] args) throws IOException {
String first = "file1.txt";
String second = "file2.txt";
BufferedReader fBr = new BufferedReader(new FileReader(first));
BufferedReader sBr = new BufferedReader(new FileReader(second));
List<String> firstFile = new ArrayList<>();
List<String> secondFile = new ArrayList<>();
PrintWriter writer = new PrintWriter("test.txt", "UTF-8");
while ((first = fBr.readLine()) != null) {
firstFile.add(first);
}
while ((second = sBr.readLine()) != null) {
secondFile.add(second);
}
List<String> commonFile = new ArrayList<>(firstFile);
commonFile.retainAll(secondFile);
System.out.println(commonFile);
writer.close();
fBr.close();
sBr.close();
}
If you are using Java 8, the following is a terse way of achieving this. Note that it requires Java 8 or later, since it uses lambda expressions and the Stream API to avoid a lot of boilerplate code. Hope you find it amusing at least.
List<String> file1Lines = Files.readAllLines(Paths.get("C:\\DevelopmentTools\\student-file1.txt"), Charset.defaultCharset());
List<String> file2Lines = Files.readAllLines(Paths.get("C:\\DevelopmentTools\\student-file2.txt"), Charset.defaultCharset());
List<String> matchingStrings = file1Lines.stream().
filter(studentInfo -> file2Lines.contains(studentInfo))
.collect(Collectors.toList());
matchingStrings.forEach(System.out::println);
Prints:
Student1
Student2
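One caveat with the stream version: List.contains() is a linear scan, so the filter does O(n*m) comparisons overall. A small variation, sketched here with hypothetical class and method names, copies the second file's lines into a HashSet first so that each lookup is constant time on average:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper; same idea as the stream answer, with a hash-based lookup.
class MatchingLines {
    static List<String> matching(List<String> file1Lines, List<String> file2Lines) {
        Set<String> lookup = new HashSet<>(file2Lines); // built once, queried per line
        return file1Lines.stream()
                .filter(lookup::contains)
                .collect(Collectors.toList());
    }
}
```

The result keeps the order of the first file's lines.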
If you want an elegant solution:
Sort both
Compare as sorted lists
First of all, this is very simple. Secondly, sorting is so incredibly well optimized, this will usually be faster than anything manually written, and yield elegant and easy to understand code.
Most of the other solutions here are O(n*m). This approach is O(n log n + m log m) with small constants. You could use a hashmap for lookups, which would theoretically yield O(n + m) but may have too large constants.
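The sort-then-compare idea can be sketched as a merge-style walk over two sorted copies of the lists. The class and method names below are made up for illustration, and the inputs are copied so the caller's lists are left in their original order:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating the sort-and-merge comparison.
class SortedCompare {
    static List<String> common(List<String> first, List<String> second) {
        List<String> a = new ArrayList<>(first);
        List<String> b = new ArrayList<>(second);
        Collections.sort(a);
        Collections.sort(b);

        List<String> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        // Advance whichever side holds the smaller string; equal strings are matches.
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                result.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }
}
```

The matches come out in sorted order rather than file order, which is usually acceptable when the files are randomized anyway.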
Here is sample code that prints the matching values and also the non-matching values of the two lists:
private static void getMatchAndDiff(List<String> list1, List<String> list2) {
    // Work on copies so the caller's lists are not modified
    List<String> matching = new ArrayList<>(list1);
    matching.retainAll(list2);
    List<String> onlyInList1 = new ArrayList<>(list1);
    onlyInList1.removeAll(list2);
    List<String> onlyInList2 = new ArrayList<>(list2);
    onlyInList2.removeAll(list1);
    System.out.println("Matching results: ");
    matching.forEach(System.out::println);
    System.out.println("Non Matching results: ");
    System.out.println(onlyInList1 + "\n" + onlyInList2);
}

Need to remove duplicates from a text file by comparing the 1st and 5th strings of every line

As part of a project I'm working on, I'd like to clean up a file I generate by removing duplicate line entries. These duplicates often won't occur near each other, however. I came up with a method of doing so in Java (it looks for duplicates by storing the two strings from each line in two ArrayLists and iterating over them), but it is not working: because of the nested for loops, the condition is satisfied in many unintended ways.
I need an integrated solution for this, however. Preferably in Java. Any ideas?
public class duplicates {
static BufferedReader reader = null;
static BufferedWriter writer = null;
static String currentLine;
public static void main(String[] args) throws IOException {
int count=0,linecount=0;;
String fe = null,fie = null,pe=null;
File file = new File("E:\\Book.txt");
ArrayList<String> list1=new ArrayList<String>();
ArrayList<String> list2=new ArrayList<String>();
reader = new BufferedReader(new FileReader(file));
while((currentLine = reader.readLine()) != null)
{
StringTokenizer st = new StringTokenizer(currentLine,"/"); //splits data into strings
while (st.hasMoreElements()) {
count++;
fe=(String) st.nextElement();
//System.out.print(fe+"/// ");
//System.out.println("count="+count);
if(count==1){ //stores 1st string
pe=fe;
// System.out.println("first element "+fe);
}
else if(count==5){
fie=fe; //stores 5th string
// System.out.println("fifth element "+fie);
}
}
count=0;
if(linecount>0){
for(String s1:list1)
{
for(String s2:list2){
if(pe.equals(s1)&&fie.equals(s2)){ //checking condition
System.out.println("duplicate found");
//System.out.println(s1+ " "+s2);
}
}
}
}
list1.add(pe);
list2.add(fie);
linecount++;
}
}
}
Input:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
Expected output:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
Use a Set<String> instead of ArrayList<String>.
Duplicates aren't allowed in a Set, so if you just add every line to it and then read them back out, you'll have all distinct strings.
Performance-wise it's also quicker than your nested for loop.
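Since the question asks to compare only the 1st and 5th strings of each line, a Set of whole lines may keep entries that differ elsewhere (the ERJ170 line in the sample input, for example). A sketch of a keyed variant follows. The class and method names are hypothetical, and falling back to the whole line when a line has fewer than five segments is an assumption:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps the first line seen for each (1st, 5th) segment pair.
class KeyDedup {
    // Builds the comparison key from the 1st and 5th non-empty "/"-separated segments.
    static String key(String line) {
        List<String> parts = new ArrayList<>();
        for (String part : line.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        if (parts.size() < 5) {
            return line; // assumption: short lines are compared as a whole
        }
        // "\n" cannot occur inside a line, so it is a safe separator
        return parts.get(0) + "\n" + parts.get(4);
    }

    static List<String> dedup(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (seen.add(key(line))) { // add() returns false for a key already seen
                result.add(line);
            }
        }
        return result;
    }
}
```

Applied to the nine input lines from the question, this keeps the six lines listed as the expected output, in their original order.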
public static void removeDups() {
String[] input = new String[] { //Lets say you read whole file in this string array
"/book1/_cwc/B737/customer/Special_Reports/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/book1/_cwc/B737/customer/Special_Reports/",
"/jangeer/_cwc/Crj_200/customer/plots/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
"/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/01_Highlights/",
"/jangeer/_cwc/ERJ170/customer/01_Highlights/"
};
ArrayList<String> outPut = new ArrayList<>(); //The array list for storing output i.e. distincts.
Arrays.stream(input).distinct().forEach(x -> outPut.add(x)); //using java 8 and stream you get distinct from input
outPut.forEach(System.out::println); //I will write back to the file, just for example I am printing out everything but you can write back the output to file using your own implementation.
}
The output when I ran this method was
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
EDIT
Non Java 8 answer
public static void removeDups() {
String[] input = new String[] {
"/book1/_cwc/B737/customer/Special_Reports/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/book1/_cwc/B737/customer/Special_Reports/",
"/jangeer/_cwc/Crj_200/customer/plots/",
"/Airbook/_cwc/A330-200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
"/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
"/jangeer/_cwc/Crj_200/customer/01_Highlights/",
"/jangeer/_cwc/ERJ170/customer/01_Highlights/"
};
LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input)); //output is your set of unique strings in preserved order
}

Reading from a text file on Android doesn't work

Update: I have changed my code a bit, but there is still a bug with lines.get(0), which makes my app crash again.
final List<String> lines = new ArrayList<String>();
String line;
int i = 0;
BufferedReader buffreader = null;
try {
buffreader = new BufferedReader(
new InputStreamReader(getAssets().open("test.txt")));
while((line = buffreader.readLine()) != null)
{
lines.add(line);
i++;
}
} catch (IOException e) {
}
This
final String lines[]= {};
creates an array of length 0, so you can't access any elements.
Using a List<String> would be a better idea, since you don't know how many lines you'll read.
List<String> lines = new ArrayList<String>();
// in the loop:
lines.add( line );
Use ArrayList<String> array = new ArrayList<String>();
It will allocate memory for you as needed.
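Even with a List, lines.get(0) throws an IndexOutOfBoundsException when nothing was read, for example if the file is empty or opening it failed and the exception was swallowed. A minimal sketch of a guarded read follows. The class and method names are hypothetical, and it takes a plain Reader so it is not tied to Android; on Android the reader could be an InputStreamReader around getAssets().open("test.txt"):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading lines and safely accessing the first one.
class LineLoader {
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Surface the failure instead of silently returning an empty list
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Returns the first line, or the fallback when nothing was read.
    static String firstLineOr(List<String> lines, String fallback) {
        return lines.isEmpty() ? fallback : lines.get(0);
    }
}
```

Not swallowing the IOException also makes it easier to see why the list ended up empty in the first place.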
