Java OpenCSV - Comparing two lists and removing duplicates

I am going to make an application that compares two .csv lists using OpenCSV. It should work like this:
Open two .csv files (each file has the columns Name,Emails).
Save the results (and here is the problem: I don't know whether they should be saved to a table or something else).
Compare the values of the "Emails" column in List1 and List2.
If an email from List1 appears in List2, delete it (from List1).
Export the results to a new .csv file.
I don't know if this is a good algorithm. Please tell me which option for storing the results of reading a .csv file is best in this case.
Kind regards

You can get around this more easily with univocity-parsers, as it can read your data into columns:
CsvParserSettings parserSettings = new CsvParserSettings(); //parser config with many options, check the tutorial
parserSettings.setHeaderExtractionEnabled(true); // uses the first row as headers
// To get the values of all columns, use a column processor
ColumnProcessor rowProcessor = new ColumnProcessor();
parserSettings.setRowProcessor(rowProcessor);
CsvParser parser = new CsvParser(parserSettings);
//This will parse everything and pass the data to the column processor
parser.parse(new FileReader(new File("/path/to/your/file.csv")));
//Finally, we can get the column values:
Map<String, List<String>> columnValues = rowProcessor.getColumnValuesAsMapOfNames();
Let's say you parsed the second CSV with that. Just grab the emails and create a set:
Set<String> emails = new HashSet<>(columnValues.get("Emails")); // header name from the question's Name,Emails files
Now just iterate over the first CSV and check if the emails are in the emails set.
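The approach above (load the second file's emails into a set, then keep only the first file's rows whose email is not in it) can be sketched in plain Java without any CSV library. This is a minimal sketch, assuming simple files with a Name,Emails header and no quoted fields containing commas; a real parser (univocity-parsers or OpenCSV) should replace the naive split(","). It also answers the storage question in one possible way: a List<String[]> for the rows plus a HashSet for the lookup. The names EmailDiff, readRows, emailOf, removeKnownEmails and writeRows are hypothetical, and in practice you would pass a FileReader/FileWriter instead of the in-memory StringReader/StringWriter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch: keep the rows of list 1 whose Emails value does not appear in list 2.
// Assumes a "Name,Emails" header and no quoted fields containing commas.
class EmailDiff {

    // Reads all data rows (header skipped) as String arrays.
    static List<String[]> readRows(Reader in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            reader.readLine(); // skip the "Name,Emails" header
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isBlank()) {
                    rows.add(line.split(",", -1));
                }
            }
        }
        return rows;
    }

    // Normalizes the Emails column (index 1); case-insensitive matching is a design choice.
    static String emailOf(String[] row) {
        return row.length > 1 ? row[1].trim().toLowerCase(Locale.ROOT) : "";
    }

    static List<String[]> removeKnownEmails(List<String[]> list1, List<String[]> list2) {
        // A HashSet gives constant-time average lookups, so the whole pass is O(n + m).
        Set<String> emails2 = new HashSet<>();
        for (String[] row : list2) {
            emails2.add(emailOf(row));
        }
        List<String[]> result = new ArrayList<>();
        for (String[] row : list1) {
            if (!emails2.contains(emailOf(row))) {
                result.add(row);
            }
        }
        return result;
    }

    // Writes the header and the surviving rows as plain comma-separated lines.
    static void writeRows(List<String[]> rows, Writer out) throws IOException {
        out.write("Name,Emails\n");
        for (String[] row : rows) {
            out.write(String.join(",", row) + "\n");
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        List<String[]> list1 = readRows(new StringReader(
                "Name,Emails\nAnn,ann@example.com\nBob,bob@example.com\n"));
        List<String[]> list2 = readRows(new StringReader(
                "Name,Emails\nRobert,BOB@example.com\n"));
        StringWriter out = new StringWriter();
        writeRows(removeKnownEmails(list1, list2), out);
        System.out.print(out); // only the Ann row survives
    }
}
```

Lowercasing both sides means "BOB@example.com" and "bob@example.com" count as the same address; drop the toLowerCase call if your data needs exact matching.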
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

If you have a hard requirement to use openCSV then here is what I believe is the easiest solution:
First off, I like Jeronimo's suggestion about the HashSet. Read the second csv file first using CsvToBean and save the email addresses in a HashSet.
Then create a filter class that implements the CsvToBeanFilter interface. Pass the set in through the constructor, and in the allowLine method look up the email address and return true if it is not in the set (the HashSet gives you a quick lookup).
Then pass the filter to CsvToBean.parse when reading the first file; all you will get back are the records from the first file whose email addresses are not in the second file. The CsvToBeanFilter javadoc has a good example that shows how this works.
Lastly, use BeanToCsv to create a file from the filtered list.
In the interest of fairness, I am the maintainer of the openCSV project, and it is also open source and free (Apache V2.0 license).
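The filter described above might look roughly like the following. This is a sketch under assumptions: LineFilter is a hypothetical local stand-in that declares the same allowLine(String[] line) method as opencsv's CsvToBeanFilter, so the example compiles without the library; with opencsv on the classpath you would implement CsvToBeanFilter directly and pass an instance to CsvToBean. The email column index (1 for Name,Emails) is also an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for opencsv's CsvToBeanFilter (same allowLine signature).
interface LineFilter {
    boolean allowLine(String[] line);
}

// Accepts only lines whose email is NOT among the emails read from the second file.
class EmailNotInSetFilter implements LineFilter {
    private final Set<String> excludedEmails;
    private final int emailColumn;

    EmailNotInSetFilter(Set<String> excludedEmails, int emailColumn) {
        this.excludedEmails = new HashSet<>(excludedEmails); // defensive copy
        this.emailColumn = emailColumn;
    }

    @Override
    public boolean allowLine(String[] line) {
        // Lines too short to have an email column are treated as malformed and dropped.
        return line.length > emailColumn && !excludedEmails.contains(line[emailColumn]);
    }
}

class FilterDemo {
    public static void main(String[] args) {
        LineFilter filter = new EmailNotInSetFilter(Set.of("bob@example.com"), 1);
        System.out.println(filter.allowLine(new String[] {"Ann", "ann@example.com"})); // true
        System.out.println(filter.allowLine(new String[] {"Bob", "bob@example.com"})); // false
    }
}
```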

Related

How to keep leading zeros when exporting data with the opencsv library

I am using the opencsv library in Java to export a csv, but I have a problem. When I use a string that begins with zero, like 0123456, the export drops the 0 and my csv looks like 123456. The zero is missing. I tried this:
"\"\t"+"0123456"+ "\""; but when the csv is exported it looks like "0123456". I don't want that; I want 0123456. I don't want to fix it in Excel, because some end users don't know how to edit it. How can I export a csv with opencsv and keep the leading 0? Please help.
I think the problem is not in generating the CSV but in the way Excel treats the data when the file is opened from Explorer.
I tried this code and viewed the CSV in a text editor (not Excel): the values show up correctly, but when the file is opened in Excel the leading 0s are lost!
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
// feed in your array (or convert your data to an array)
String[] entries = "0123131#21212#021213".split("#");
List<String[]> a = new ArrayList<>();
a.add(entries);
//don't apply quotes
writer.writeAll(a,false);
writer.close();
If you are sure you want Excel to show the leading 0s of numeric values when the user opens the file, then write each cell entry in the ="dataHere" format; see the code below:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
// feed in your array (or convert your data to an array)
String[] entries = "=\"0123131\"#=\"21212\"#=\"021213\"".split("#");
List<String[]> a = new ArrayList<>();
a.add(entries);
writer.writeAll(a);
writer.close();
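To see what the ="dataHere" trick produces on disk, here is a library-free sketch of the same idea: each value is wrapped as an Excel formula ="value", and the field is then quoted with any embedded quotes doubled, as RFC 4180 requires. Assuming opencsv's default quote and escape characters (both "), this should correspond to what the CSVWriter example above writes, but the helper names (asExcelText, quote, toCsvLine) are hypothetical and this is not opencsv's own code.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free sketch of the ="value" trick plus RFC 4180 quoting.
class ExcelTextCsv {

    // Wraps a value in an Excel formula so Excel keeps it as text: 0123 -> ="0123"
    static String asExcelText(String value) {
        return "=\"" + value + "\"";
    }

    // Quotes a field and doubles any embedded quotes (RFC 4180 escaping).
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Builds one CSV line in which every value is protected for Excel.
    static String toCsvLine(String... values) {
        List<String> fields = new ArrayList<>();
        for (String v : values) {
            fields.add(quote(asExcelText(v)));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // prints "=""0123131""","=""21212""","=""021213"""
        System.out.println(toCsvLine("0123131", "21212", "021213"));
    }
}
```

This is why the file looks noisy in a text editor: the formula's own quotes must be doubled inside the quoted CSV field.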
This is how Excel now shows the file when it is opened from Windows Explorer (by double-clicking): [screenshot not preserved]
But if we now view the CSV, with its data modified to "suit" Excel, in a text editor, it shows as: [screenshot not preserved]
See also: format-number-as-text-in-csv-when-open-in-both-excel-and-notepad
Have you tried using a String like "'"+"0123456"? The ' character marks the number as text when Excel parses it.
For me, OpenCSV (version 5.6) works correctly.
For example, my csv file has a row like the following extract:
"999739059";;;"abcdefgh";"001024";
and opencsv correctly reads the field "001024" as 001024, not 1024. Of course, I have mapped the field to a String, not to a Double.
But if you still have problems, you can grab a simple yet powerful parser that fully adheres to the RFC 4180 standard:
mkyong.com
Mkyong shows some examples using opencsv directly and, at the end, writes a simple parser you can use if you don't want to import OpenCSV. The parser works very well, and you can use it if you still have problems.
So you get easy-to-understand source code for a simple parser that you can modify and customize for your needs.

Java - method to look up a specific ID in a txt file and return that line's details

I am writing a method to look up a specific ID that is stored in a txt file.
The details are stored in an ArrayList named list. If the lookup string matches the data stored in list, the method reads the id, firstname and surname (i.e. the whole line of the txt file) and then creates an instance of another class, Profile.
I then want to add this looked-up data to a new ArrayList named lookup and output it. I have the method below; however, it does not work and always jumps to my else clause.
Could anyone tell me where I'm going wrong and how to fix it? Thanks.
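Could you instead use a Map whose values are tuples to get the same effect?
// Person and Tuple2 are hypothetical helper types (Java has no built-in tuple class)
record Person(String name) {}
record Tuple2<A, B>(A item1, B item2) {}
// create our map
Map<String, Tuple2<Person, Person>> peopleByForename = new HashMap<>();
// populate it
peopleByForename.put("Bob", new Tuple2<>(new Person("Bob Smith"), new Person("Bob Jones")));
// read from it
Tuple2<Person, Person> bobs = peopleByForename.get("Bob");
Person bob1 = bobs.item1();
Person bob2 = bobs.item2();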
An example of reading key:value pairs from the txt file into such a map can be found here: Java read txt file to hashmap, split by ":" using Buffered Reader.
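Reading key:value lines into a HashMap with a BufferedReader, as the linked example describes, might look like this. The file layout (an ID, a colon, then the rest of the line) and the class name KeyValueFile are assumptions for illustration; in real use you would pass a FileReader for your txt file instead of the in-memory StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class KeyValueFile {
    // Splits each "key:value" line on the first ':' and stores it in a HashMap.
    // Lines without a ':' are skipped.
    static Map<String, String> read(Reader in) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(":", 2);
                if (parts.length == 2) {
                    map.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> byId = read(new StringReader("42:Jane,Doe\n7:John,Smith\n"));
        System.out.println(byId.get("42")); // Jane,Doe
    }
}
```

Looking up an ID is then a single map.get(id) call instead of a scan over the whole list.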
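If you are using Java 8, you can use a lambda to help you. list holds Profile objects, so list.contains(IDlookup) compares the String ID with each Profile via equals() and never matches. Just replace this line: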
if(list.contains(IDlookup))
to this one:
boolean containsId = list.stream().anyMatch((Profile p) -> p.getId().equals(IDlookup));
if (containsId)
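A minimal sketch of the complete lookup with the stream approach, collecting the matches into a new list as the question wants. The Profile record is a hypothetical stand-in for the question's class (its accessor is id() rather than getId()), and printing "ID not found" for an empty result is a design choice.

```java
import java.util.List;
import java.util.stream.Collectors;

class ProfileLookup {
    // Hypothetical stand-in for the question's Profile class.
    record Profile(String id, String firstName, String surname) {}

    // Compares the lookup string with each profile's id, not with the Profile object itself.
    static List<Profile> lookup(List<Profile> list, String idLookup) {
        return list.stream()
                .filter(p -> p.id().equals(idLookup))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Profile> list = List.of(
                new Profile("1", "Jane", "Doe"),
                new Profile("2", "John", "Smith"));
        List<Profile> found = lookup(list, "2");
        if (found.isEmpty()) {
            System.out.println("ID not found");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```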

Map Multiple CSV to single POJO

I have many CSV files with different column headers. Currently I read those csv files and map them to different POJO classes based on their column headers. Some of the CSV files have around 100 columns, which makes it difficult to create a POJO class.
Is there any technique that lets me map all of those csv files to a single POJO class, or should I read each CSV file line by line and parse it accordingly, or create the POJO at runtime (javassist)?
If I understand your problem correctly, you can use uniVocity-parsers to process this and get the data in a map:
//First create a configuration object - there are many options
//available and the tutorial has a lot of examples
CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(settings);
parser.beginParsing(new File("/path/to/your.csv"));
// you can also apply some transformations:
// NULL year should become 0000
parser.getRecordMetadata().setDefaultValueOfColumns("0000", "Year");
// decimal separator in prices will be replaced by comma
parser.getRecordMetadata().convertFields(Conversions.replace("\\.00", ",00")).set("Price");
Record record;
while ((record = parser.parseNextRecord()) != null) {
Map<String, String> map = record.toFieldMap(/*you can pass a list of column names of interest here*/);
//for performance, you can also reuse the map and call record.fillFieldMap(map);
}
Or you can even parse the file and get beans of different types in a single step. Here's how you do it:
CsvParserSettings settings = new CsvParserSettings();
//Create a row processor to process input rows. In this case we want
//multiple instances of different classes:
MultiBeanListProcessor processor = new MultiBeanListProcessor(TestBean.class, AmountBean.class, QuantityBean.class);
// we also need to grab the headers from our input file
settings.setHeaderExtractionEnabled(true);
// configure the parser to use the MultiBeanProcessor
settings.setRowProcessor(processor);
// create the parser and run
CsvParser parser = new CsvParser(settings);
parser.parse(new File("/path/to/your.csv"));
// get the beans:
List<TestBean> testBeans = processor.getBeans(TestBean.class);
List<AmountBean> amountBeans = processor.getBeans(AmountBean.class);
List<QuantityBean> quantityBeans = processor.getBeans(QuantityBean.class);
See an example here and here
If your data is too big and you can't hold everything in memory, you can stream the input row by row by using the MultiBeanRowProcessor instead. The method rowProcessed(Map<Class<?>, Object> row, ParsingContext context) will give you a map of instances created for each class in the current row. Inside the method, just call:
AmountBean amountBean = (AmountBean) row.get(AmountBean.class);
QuantityBean quantityBean = (QuantityBean) row.get(QuantityBean.class);
...
//perform something with the instances parsed in a row.
Hope this helps.
Disclaimer: I'm the author of this library. It's open-source and free (Apache 2.0 license)
To me, creating a POJO class is not a good idea in this case, as neither the number of columns nor the number of files is constant.
Therefore, it is better to use something more dynamic, so that you do not have to change your code significantly just to support more columns or files.
I would go for a List (or Map) of Maps, i.e. List<Map<String, String>>, for a given csv file,
where each map represents a row in your csv file, keyed by column name.
You can easily extend this to multiple csv files.
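A minimal sketch of the List<Map<String, String>> idea using only the JDK. It assumes the first line is the header and that no field contains quoted commas or line breaks; for real files a CSV parser such as the ones mentioned above would handle quoting. The class name CsvToMaps is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Naive sketch: no support for quoted fields containing commas or line breaks.
class CsvToMaps {
    static List<Map<String, String>> read(Reader in) throws IOException {
        List<Map<String, String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return rows; // empty file
            }
            String[] headers = headerLine.split(",", -1);
            String line;
            while ((line = reader.readLine()) != null) {
                String[] values = line.split(",", -1);
                // LinkedHashMap keeps the column order of the file.
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < headers.length; i++) {
                    row.put(headers[i], i < values.length ? values[i] : null);
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<Map<String, String>> rows = read(new StringReader("Year,Price\n2016,10.00\n,5.00\n"));
        System.out.println(rows); // [{Year=2016, Price=10.00}, {Year=, Price=5.00}]
    }
}
```

Because the keys come from each file's own header row, the same code handles files with 5 or 100 columns without any new classes.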

Reading JSON file with BigQuery to make table

I'm new to Google Dataflow and can't get it to work with JSON. I've been reading through the documentation, but can't solve my problem.
So, following the WordCount example, I figured out how data is loaded from a .csv file with the following line:
PCollection<String> input = p.apply(TextIO.Read.from(options.getInputFile()));
where inputFile is the .csv file in my gcloud bucket. I can transform the lines read from the .csv with:
PCollection<TableRow> table = input.apply(ParDo.of(new ExtractParametersFn()));
(ExtractParametersFn is defined by me.) So far so good!
But then I realized my .csv file was too big and I had to convert it to JSON (https://cloud.google.com/bigquery/preparing-data-for-bigquery).
Since BigQueryIO is supposedly better for reading JSON, I tried the following code:
PCollection<TableRow> table = p.apply(BigQueryIO.Read.from(options.getInputFile()));
(inputFile is then the JSON file, and the output when reading with BigQueryIO is a PCollection of TableRows.) I tried TextIO too (which returns a PCollection of Strings), and neither of the two IO options works.
What am I missing? The documentation is not detailed enough to find an answer there, but perhaps some of you have already dealt with this problem?
Any suggestions would be greatly appreciated. :)
I believe there are two options to consider:
Use TextIO with TableRowJsonCoder to ingest the JSON files (e.g., like it is done in the TopWikipediaSessions example);
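Import the JSON files into a BigQuery table (https://cloud.google.com/bigquery/loading-data-into-bigquery), and then use BigQueryIO.Read to read from the table. BigQueryIO.Read expects a BigQuery table specification, not the path of a JSON file in a bucket, which is why passing the input file does not work.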

How to create new instances of classes based on a CSV file

Let's say I have a class, Car, and I'm trying to import a large set of data to create multiple instances of "Car".
My CSV file is laid out like so:
Car Manufacturer,Model,Color,Owner,MPG,License Plate,Country of Origin,VIN,... etc
The point is, there is a lot of data that needs to go into the constructor. If there were only a few of these, it wouldn't be that bad to instantiate them manually by writing Car FordFocus = new Car(Ford,Focus,Blue,John Doe,108-J1AZ,USA,194241-12e1...), but if I have hundreds of these, is there any way to import all this data to create the instances?
As George mentions, you need a tool. I have used opencsv before to achieve this.
opencsv provides three mapping strategies (which can be further extended) for mapping a CSV row to a bean. The simplest is ColumnPositionMappingStrategy. So if your CSV format is fixed, e.g. the header row looks like:
Car Manufacturer,Model,Color,Owner,MPG,License Plate,Country of Origin,VIN,... etc
This code snippet will help you. I have also used HeaderColumnNameTranslateMappingStrategy which lets you map CSV header names to bean field names e.g. "Car Manufacturer" -> carManufacturer.
CSVReader csvReader = new CSVReader(new FileReader(csvFile));
ColumnPositionMappingStrategy<Car> strategy = new ColumnPositionMappingStrategy<Car>();
strategy.setType(Car.class);
String[] columns = new String[] {"CarManufacturer","Model","Color","Owner","MPG","LicensePlate","CountryOfOrigin","VIN"}; // the fields to bind to in your JavaBean
strategy.setColumnMapping(columns);
CsvToBean<Car> csv = new CsvToBean<Car>();
List<Car> list = csv.parse(strategy, csvReader);
A self contained sample program can be found here
Reflection is a possibility.
You can associate an attribute with a position in your CSV file (a column).
See this example of setting a field value with reflection: https://docs.oracle.com/javase/tutorial/reflect/member/fieldValues.html
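A sketch of the reflection approach: an array lists the field name for each column position, and each value is assigned with Field.set. The trimmed-down Car class, its String-only fields and the column order are assumptions for illustration; numeric fields such as MPG would need a conversion before Field.set.

```java
import java.lang.reflect.Field;

class ReflectiveCarMapper {
    // Hypothetical Car with a few String fields; the real class has more columns.
    static class Car {
        String manufacturer;
        String model;
        String color;
    }

    // Field names in the same order as the CSV columns (an assumption about the file layout).
    static final String[] COLUMN_FIELDS = {"manufacturer", "model", "color"};

    static Car fromCsvLine(String line) throws ReflectiveOperationException {
        String[] values = line.split(",", -1); // assumes no quoted commas
        Car car = new Car();
        for (int i = 0; i < COLUMN_FIELDS.length && i < values.length; i++) {
            Field field = Car.class.getDeclaredField(COLUMN_FIELDS[i]);
            field.setAccessible(true);
            field.set(car, values[i]); // all fields are String in this sketch
        }
        return car;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Car car = fromCsvLine("Ford,Focus,Blue");
        System.out.println(car.manufacturer + " " + car.model + " " + car.color); // Ford Focus Blue
    }
}
```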
You can also read the csv file line by line and create each Car object with its constructor inside the loop.
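The loop-and-constructor approach, sketched with a hypothetical, trimmed-down Car record that covers only the first five columns of the question's header. It assumes the header row comes first, that MPG is a decimal number, and that no field contains a quoted comma; pass a FileReader for the real file instead of the StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CarLoader {
    // Hypothetical, trimmed-down Car; the real class would take every column.
    record Car(String manufacturer, String model, String color, String owner, double mpg) {}

    static List<Car> load(Reader in) throws IOException {
        List<Car> cars = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // ignore empty lines
                }
                String[] f = line.split(",", -1); // assumes no quoted commas
                cars.add(new Car(f[0], f[1], f[2], f[3], Double.parseDouble(f[4])));
            }
        }
        return cars;
    }

    public static void main(String[] args) throws IOException {
        String csv = "Car Manufacturer,Model,Color,Owner,MPG\n"
                + "Ford,Focus,Blue,John Doe,35.5\n";
        List<Car> cars = load(new StringReader(csv));
        System.out.println(cars.get(0).model()); // Focus
    }
}
```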
