Read and discard CSV data - Java

I have one CSV containing a list of users (users.csv); on the other hand, I also have a second CSV with users (users2.csv).
The problem is that I want to "compare" these two documents and discard from one file the users that already exist in the other.
Any ideas or advice on how I could do it?

Load the first file into a List<String> users.
Load the second file into a List<String> users2.
Use Apache commons-collections: CollectionUtils.removeAll(Collection<E> users, Collection<?> users2).
To load a file into a list, you can find inspiration here.
Et voilà.
This only works if the files are small enough to load into memory. Otherwise it requires another approach, such as sorting both files with command-line sort, then walking through both files line by line and deciding whether or not to write each line to the output.
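A minimal sketch of these steps, using the plain JDK's List.removeAll instead of commons-collections (temp files stand in for users.csv and users2.csv, and this assumes whole-line equality is the right comparison):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompareUsers {
    public static void main(String[] args) throws IOException {
        // Temp files stand in for users.csv and users2.csv
        Path usersFile = Files.createTempFile("users", ".csv");
        Path users2File = Files.createTempFile("users2", ".csv");
        Files.write(usersFile, Arrays.asList("alice", "bob", "carol"));
        Files.write(users2File, Arrays.asList("bob", "dave"));

        // Load each file into a list of lines
        List<String> users = new ArrayList<>(Files.readAllLines(usersFile));
        List<String> users2 = Files.readAllLines(users2File);

        // Discard from the first list every user that appears in the second
        users.removeAll(users2);
        System.out.println(users); // [alice, carol]
    }
}
```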

You can use BeyondCompare to compare the two CSVs. It will distinctly identify the missing users, along with any other data mismatches. If you want to do it programmatically, you can load each CSV into a list/map of user beans (overriding the equals method to compare the username, or any other fields you want) and compare those.

The best way I see:
1) Read both files separately using the Java NIO API (which is very fast) and store each in a list.
Path path = Paths.get("src/main/resources/shakespeare.txt");
try (Stream<String> lines = Files.lines(path)) {
    lines.forEach(System.out::println); // print each line
} catch (IOException ex) {
    ex.printStackTrace(); // handle exception here
}
2) Compare the two lists using a Java 8 Predicate.
public static List<String> filterAndGetEmployees(List<String> employees,
        Predicate<String> predicate) {
    return employees.stream().filter(predicate).collect(Collectors.toList());
}
3) If you wish to write a file again, you can go like:
Path path = Paths.get("src/main/resources/shakespeare.txt");
try (BufferedWriter writer = Files.newBufferedWriter(path, Charset.forName("UTF-8"))) {
    writer.write("To be, or not to be. That is the question.");
} catch (IOException ex) {
    ex.printStackTrace();
}
Hope this helps.


Checking for duplicate string in file java

I have a file that I am writing data to. I've tried googling, but all the examples I have tried have just confused me more.
I am inputting data into a file, and this is happening correctly: the items selected are being appended to the file. My issue is that I want to check whether the string being inputted already exists in the file, and if it does, I want to skip it.
The code I am using to write the data to the file is below, but I am not sure how to change it to check for a duplicate.
for (EventsObj p : boxAdapter.getBox()) {
    if (p.box) {
        String result = p.name + " " + p.price;
        try {
            // open file for appending
            OutputStreamWriter out = new OutputStreamWriter(openFileOutput("UserEvents.txt", MODE_APPEND));
            // write the contents to the file
            out.write(result);
            out.write('\n');
            // close the file
            out.close();
        } catch (java.io.IOException e) {
            // do something if an IOException occurs
            Toast.makeText(this, "Sorry, text couldn't be added", Toast.LENGTH_LONG).show();
        }
    }
}
It gets the ticked checkboxes, then gets the name and price related to each and appends them to the file. But I want to check that the entry does not already exist. Any help would be appreciated; I've exhausted Google and tried many things.
So, if I understood your question correctly, the file contains a number of strings delimited by newlines.
What you want to do is read the file contents line by line and store the lines in a HashSet<String>. Then you open the file for appending and append the additional string, but only if the file did not contain it already. As the other answer suggested, you use the contains method. However, unlike the other answer, I'm not suggesting a list of strings; instead, I suggest a HashSet, as it's more efficient.
While reading the file contents line by line, you can perform some basic checks: does the file already contain duplicate rows? You may want to handle those by warning the user that the file format is invalid, or you may want to proceed nevertheless.
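A minimal sketch of this approach (the file name and entry format are placeholders):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.HashSet;
import java.util.Set;

public class AppendIfAbsent {
    // Append 'line' to 'file' only if no existing line equals it.
    static boolean appendIfAbsent(Path file, String line) throws IOException {
        Set<String> existing = new HashSet<>();
        if (Files.exists(file)) {
            existing.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        if (existing.contains(line)) {
            return false; // duplicate: skip it
        }
        Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("events", ".txt");
        System.out.println(appendIfAbsent(file, "Concert 10.0")); // true, appended
        System.out.println(appendIfAbsent(file, "Concert 10.0")); // false, duplicate
        System.out.println(appendIfAbsent(file, "Fair 5.0"));     // true, appended
    }
}
```

Re-reading the whole file on every append is wasteful; in a real app you would load the set once and keep it in memory alongside the file.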
You should first read the file and create a list of strings with all your existing entries.
Then, before adding to the file, you can check whether the list of strings contains the string you want to add (just make sure the strings share the same format, so that a match will be found). If contains returns false, add to the file; if true, don't.
It shouldn't be such a tremendous task; you can make use of the contains method.
You might need to keep the contents of the file in a String in your program. A little inefficient, but at the moment I do not see any other way than to keep track of things in your program instead of in the file.
So before the program appends text to the file, the very first thing it should do is read all the existing text:
File yourFile = new File("file-path-goes-here");
Scanner input = null;
try {
    input = new Scanner(new FileInputStream(yourFile));
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
String textFromFile = "";
while (input.hasNextLine())
    textFromFile += input.nextLine() + "\n";

// Now, before adding to the file, simply run something like this:
if (textFromFile.contains("string-to-write-to-file")) {
    // do not write to file
} else {
    // write to file and add to textFromFile
    textFromFile += "string-you-added-to-file" + "\n";
}
Hope this answers your question. Let me know if something is not clear.

Regarding stitching of multiple files into a single file

I work on query latencies and have a requirement where I have several files containing data. I want to aggregate this data into a single file. I currently use a naive technique: I open each file and copy all its data into a global file. I do this for all the files, but it is time-consuming. Is there a way to stitch the end of one file to the beginning of another and create one big file containing all the data? I think many people have faced this problem before. Can anyone kindly help?
I suppose you are currently doing the opening and appending by hand; otherwise I do not know why it would take a long time to aggregate the data, especially since you describe the number of files using "multiple" and "several", which seems to indicate it's not an enormous number.
Thus, I think you are just looking for a way to automatically do the opening and appending for you. In that case, you can use an approach similar to the one below. Note this creates the output file, or overwrites it if it already exists, and then appends the contents of all specified files. If you want to call the method multiple times and append to the same file instead of overwriting it, an alternative is to use a FileWriter with true as the second argument to its constructor, so that it appends to an existing file.
void aggregateFiles(List<String> fileNames, String outputFile) {
    PrintWriter writer = null;
    try {
        writer = new PrintWriter(outputFile);
        for (String fileName : fileNames) {
            Path path = Paths.get(fileName);
            String fileContents = new String(Files.readAllBytes(path));
            writer.println(fileContents);
        }
    } catch (IOException e) {
        // Handle IOException
    } finally {
        if (writer != null) writer.close();
    }
}
List<String> files = new ArrayList<>();
files.add("f1.txt");
files.add("someDir/f2.txt");
files.add("f3.txt");
aggregateFiles(files, "output.txt");
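As noted above, if repeated calls should append rather than overwrite, a minimal variant (file names here are placeholders) wraps a FileWriter opened in append mode:

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class Aggregator {
    // Append the contents of each input file to outputFile (created if missing).
    static void aggregateFilesAppending(List<String> fileNames, String outputFile) throws IOException {
        // FileWriter's second constructor argument 'true' opens the file in append mode
        try (PrintWriter writer = new PrintWriter(new FileWriter(outputFile, true))) {
            for (String fileName : fileNames) {
                writer.println(new String(Files.readAllBytes(Paths.get(fileName))));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with temp files standing in for the real inputs
        Path a = Files.createTempFile("a", ".txt");
        Path b = Files.createTempFile("b", ".txt");
        Files.write(a, "first".getBytes());
        Files.write(b, "second".getBytes());
        Path out = Files.createTempFile("out", ".txt");
        aggregateFilesAppending(Arrays.asList(a.toString(), b.toString()), out.toString());
        aggregateFilesAppending(Arrays.asList(a.toString()), out.toString()); // appends again
        System.out.print(new String(Files.readAllBytes(out)));
    }
}
```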

Eclipse: how/where to include a text file in a Java project?

I'm using Eclipse (SDK v4.2.2) to develop a Java project (Java SE, v1.6) that currently reads information from external .txt files as part of methods used many times in a single pass. I would like to include these files in my project, making them "native" to make the project independent of external files. I don't know where to add the files into the project or how to add them so they can easily be used by the appropriate method.
Searching on Google has not turned up any solid guidance, nor have I found any similar questions on this site. If someone knows how to do add files and where they should go, I'd greatly appreciate any advice or even a point in the right direction. Also, if any additional information about the code or the .txt files is required, I'll be happy to provide as much detail as possible.
UPDATE 5/20/2013: I've managed to get the text files into the classpath; they're located in a package under a folder called 'resc' (per dharam's advice), which is on the same classpath level as the 'src' folder in which my code is packaged. Now I just need to figure out how to get my code to read these files properly. Specifically, I want to read a selected file into a two-dimensional array, reading line-by-line and splitting each line by a delimiter. Prior to packaging the files directly within the workspace, I used a BufferedReader to do this:
public static List<String[]> fileRead(String d) {
    // Initialize File 'f' with path completed by passed-in String 'd'.
    File f = new File("<incomplete directory path goes here>" + d);
    // Initialize some variables to be used shortly.
    String s = null;
    List<String> a = new ArrayList<String>();
    List<String[]> l = new ArrayList<String[]>();
    try {
        // Use new BufferedReader 'in' to read in 'f'.
        BufferedReader in = new BufferedReader(new FileReader(f));
        // Read the first line into String 's'.
        s = in.readLine();
        // So long as 's' is NOT null...
        while (s != null) {
            // Split the current line, using semicolons as delimiters, and store in 'a'.
            // Convert 'a' to array 'aSplit', then add 'aSplit' to 'l'.
            a = Arrays.asList(s.split("\\s*;\\s*"));
            String[] aSplit = a.toArray(new String[2]);
            l.add(aSplit);
            // Read next line of 'f'.
            s = in.readLine();
        }
        // Once finished, close 'in'.
        in.close();
    } catch (IOException e) {
        // If problems occur during 'try' code, catch exception and print stack trace.
        e.printStackTrace();
    }
    // Return value of 'l'.
    return l;
}
If I decide to use the method described in the link provided by Pangea (using getResourceAsStream to read in the file as an InputStream), I'm not sure how I would achieve the same results. Could someone help me find a solution in this same question, or should I ask about that issue in a different question to prevent headaches?
You can put them anywhere you wish; it depends on what you want to achieve by including the files.
A general practice is to create a folder named resc/resource, put the files in it, and include the folder in the classpath.
You can store the files within a Java package and read them as classpath resources. For example, you can add the text files to a Java package, say com.foo, and use this thread to learn how to read them: How to really read text file from classpath in Java
This way they are independent of the environment and are packaged together with the code itself.
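To address the update in the question, here is a sketch of reading a classpath resource into the same List<String[]> structure (the class name is a placeholder; the semicolon delimiter follows the question's code):

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class ResourceRead {
    // Parse semicolon-delimited lines into rows.
    public static List<String[]> readRows(BufferedReader in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            // Split each line on semicolons, trimming surrounding whitespace
            rows.add(line.split("\\s*;\\s*"));
        }
        return rows;
    }

    public static List<String[]> fileRead(String name) {
        // getResourceAsStream resolves 'name' against the classpath
        // (use a leading '/' for a path from the classpath root)
        try (InputStream is = ResourceRead.class.getResourceAsStream(name)) {
            if (is == null) {
                throw new FileNotFoundException(name + " not found on classpath");
            }
            return readRows(new BufferedReader(new InputStreamReader(is)));
        } catch (IOException e) {
            e.printStackTrace();
            return new ArrayList<>();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with an in-memory reader; with a real resource you would call fileRead("/data.txt")
        List<String[]> rows = readRows(new BufferedReader(new StringReader("a ; b\nc ; d")));
        System.out.println(rows.size() + " rows, first cell: " + rows.get(0)[0]);
    }
}
```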
Add the files to the project's classpath. (You can find the project's classpath by right-clicking the project in Eclipse -> Build Path -> Configure Build Path.)
I guess you want an internal .txt file.
Package Explorer => Right-click your project => New => File. Then type a file name and click Finish.
The path in your code should look like this:
Scanner diskScanner = new Scanner(new File("YourFile"));

Check if archives are identical

I'm using a shell script to automatically create a zipped backup of various directories every hour. If I haven't been working on any of them for quite some time, this creates a lot of duplicate archives. MD5 hashes of the files don't match because they have different filenames, creation dates, etc.
Other than making sure there won't be duplicates in the first place, another option is checking whether file sizes match, but that doesn't necessarily mean they are duplicates.
Filenames look like this:
Qt_2012-03-15_23_00.tgz
Qt_2012-03-16_00_00.tgz
So maybe it would be an option to check whether consecutive files have identical file sizes.
Pseudo code:
int previoussize = 0;
String previouspath = null;
String Filename = null;
String workDir = "/path/to/workDir";
String processedDir = "/path/to/processedDir";
// Loop over all files
for file in workDir
{
    // Match
    if (file.size() == previoussize)
    {
        if (previouspath != null) // skip first loop
        {
            rm previouspath; // delete file
        }
    }
    else // no match
    {
        /* If there's no match, we can move the previous file
           to another directory so it doesn't get checked again */
        if (previouspath != null) // skip first loop
        {
            mv previouspath processedDir/Filename;
        }
    }
    previoussize = file.size();
    previouspath = file.path();
    Filename = file.name();
}
Example:
Qt_2012-03-15_23_00.tgz 10KB
Qt_2012-03-16_00_00.tgz 10KB
Qt_2012-03-16_01_00.tgz 10KB
Qt_2012-03-16_02_00.tgz 15KB
Qt_2012-03-16_03_00.tgz 10KB
Qt_2012-03-16_04_00.tgz 10KB
If I'm correct, this would delete only the first two and the second-to-last one. The third and fourth should be moved to the processedDir.
So I guess I have 2 questions:
Would my pseudo code work the way I intend it to? (I find these things rather confusing.)
Is there a better/simpler/faster way? Even though the chance of accidentally deleting non-identical archives this way is very small, it's still a chance.
I can think of a couple of alternatives:
Deploy a version control system such as Git, Subversion, etc, and write a script that periodically checks in any changes. This will save a lot of space because only files that have actually changed get saved, and because changes to text files will be stored as diffs.
Use an incremental backup tool. This article lists a number of alternatives.
Normal practice is to put the version control system / backups on a different machine, but you don't have to do that.
It's not clear if this needs to run as a batch. If it's manual, you can run BeyondCompare or any decent comparison tool to diff the two archives.
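If you do want to detect duplicates by content rather than by size, one option is a sketch like the following: hash the decompressed stream instead of the .tgz file itself, on the assumption that the archives differ only in the gzip header (which stores a timestamp and the original filename). Note the caveat that the tar stream inside also records file mtimes, so this only helps when the tar contents are byte-identical.

```java
import java.io.*;
import java.security.MessageDigest;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TgzHash {
    // MD5 of the decompressed stream, ignoring the gzip header's timestamp/filename.
    static String contentHash(InputStream raw) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new GZIPInputStream(raw)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Demo with in-memory gzip data; for real files use new FileInputStream("x.tgz")
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("identical tar bytes".getBytes());
        }
        System.out.println(contentHash(new ByteArrayInputStream(bos.toByteArray())));
    }
}
```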

How would I delete data from an external file?

The format I have in the external file is:
name
tel no
mob no
address
From the GUI I would like to delete a contact, which is in the format above, using my delete button.
I have completed the export method and was wondering if deleting would be similar. Here is my code for exporting:
{
    FileOutputStream file;
    PrintStream out;
    try {
        file = new FileOutputStream("../files/example.buab", true);
        out = new PrintStream(file);
        out.println(txtname.getText());
        out.println(txtnum.getText());
        out.println(txtmob.getText());
        out.println(txtadd1.getText());
        out.close();
    }
    catch (Exception e)
    {
        System.err.println("Error in writing to file");
    }
}
Do you really have to delete the contact from the file immediately?
Usually you would do something like this:
Import the file content into your model, i.e. a list of Contact objects.
Apply all your edits to the model (change values, add a contact, delete a contact).
Save your edits, i.e. overwrite the file with your model.
Much, much easier than trying to delete a single row in a file...
I assume you really have to use a file and cannot use a table in a database.
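A minimal sketch of that read/modify/rewrite cycle, assuming each contact occupies exactly four lines (name, tel, mob, address) as in the question; the matching rule (first line equals the name) is a placeholder:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ContactFile {
    // Remove every 4-line contact record whose first line (the name) matches.
    static List<String> deleteContact(List<String> lines, String name) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i + 3 < lines.size(); i += 4) {
            if (!lines.get(i).equals(name)) {
                kept.addAll(lines.subList(i, i + 4)); // keep this contact
            }
        }
        return kept;
    }

    static void deleteContactFromFile(Path file, String name) throws IOException {
        List<String> lines = Files.readAllLines(file); // import into the model
        Files.write(file, deleteContact(lines, name)); // overwrite with the model
    }
}
```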
First of all, you will have to assign an id to each contact so that you can point to a certain contact. The id must be unique; other than that, it can be anything.
Why not organize that file as XML? Is that something your spec allows?
The easiest way is to read it fully, skip the lines that are supposed to be deleted, and then write it fully back to the file, thereby overwriting the original one. But that's also the least efficient way. For best results, you need to organize your data more into a model.
Why don't you use a (embedded) database instead so that you can just go ahead with a simple SQL DELETE statement?
