Java CSV parser/writer

I'm trying to produce a CSV from data retrieved from Oracle. I just need to write the CSV, using the results of the query as the CSV's columns. This is my code:
// get data
final List<myDto> dataCsv = myDao.getdata();
StringWriter writer = new StringWriter();
CSVWriter csvWriter = new CSVWriter(writer,';');
List<String[]> result = toStringArray(dataCsv);
csvWriter.writeAll(result);
csvWriter.close();
return Response.ok(result).header("Content-Disposition", "attachment; filename=" + fileName).build();
Obviously it can't find toStringArray(). Do I have to build it myself? Do I really need it? How do I have to edit the code to get it working?

If you just follow the example from the link that you've given, you'll see what they're doing...
private static List<String[]> toStringArray(List<Employee> emps) {
List<String[]> records = new ArrayList<String[]>();
//add header record
records.add(new String[]{"ID","Name","Role","Salary"});
Iterator<Employee> it = emps.iterator();
while(it.hasNext()){
Employee emp = it.next();
records.add(new String[]{emp.getId(),emp.getName(),emp.getRole(),emp.getSalary()});
}
return records;
}
Essentially, you need to build a List of String[]. Each String[] represents a single line of data for the CSV, with each element of the array being a value. So, yes, you need to build a List from your data model and pass it to the CSVWriter's writeAll() method.
The first String[] in the list is the column headers for the CSV. The subsequent String arrays are the data itself.
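Since your DTO class (myDto) isn't shown, here is a minimal sketch of a reusable toStringArray helper, assuming you can describe each row with a function that turns one DTO into a String[]. The Person record and its fields are hypothetical stand-ins for your DTO; as in the linked example, the header row goes first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class CsvRows {

    // Hypothetical DTO standing in for myDto; replace with your own fields.
    record Person(int id, String name, String role) {}

    // Builds the List<String[]> that CSVWriter.writeAll() expects:
    // the header row first, then one String[] per DTO.
    static <T> List<String[]> toStringArray(List<T> items, String[] header,
                                            Function<T, String[]> toRow) {
        List<String[]> records = new ArrayList<>(items.size() + 1);
        records.add(header);
        for (T item : items) {
            records.add(toRow.apply(item));
        }
        return records;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(new Person(1, "Alice", "Boss"),
                                      new Person(2, "Bob", "Worker"));
        List<String[]> rows = toStringArray(people,
                new String[] {"ID", "Name", "Role"},
                p -> new String[] {String.valueOf(p.id()), p.name(), p.role()});
        for (String[] row : rows) {
            System.out.println(String.join(";", row));
        }
    }
}
```

With a helper like this, your call would become something like toStringArray(dataCsv, header, dto -> new String[] { ... }), filling the array from your DTO's getters.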

Apache Commons CSV
The Apache Commons CSV library can help with the chore of reading/writing CSV files. It handles several variants of the CSV format, including the one used by Oracle.
• CSVFormat.ORACLE
Employee.java
Let's make a class for Employee.
package work.basil.example;
import java.util.Objects;
public class Employee {
public Integer id;
public String name, role;
public Integer salary;
public Employee ( Integer id , String name , String role , Integer salary ) {
Objects.requireNonNull( id ); // etc.
this.id = id;
this.name = name;
this.role = role;
this.salary = salary;
}
@Override
public String toString ( ) {
return "Employee{ " +
"id=" + id +
" | name='" + name + '\'' +
" | role='" + role + '\'' +
" | salary=" + salary +
" }";
}
}
Example app
Make another class to mimic retrieving your DTOs. Then we write to a CSV file.
You asked whether you really need to build toStringArray() yourself.
To answer that specific question: there is no ready-made toStringArray method that creates the CSV field values from your DTO object's member variables, so you write that mapping yourself.
Binding
This idea of mapping input or output data with member variables of a Java object is generally known as binding.
Java has sophisticated binding libraries for XML and JSON: JAXB and JSON-B, respectively. Objects can be automatically written out to XML or JSON text, as well as “re-hydrated” back to objects when read from such XML or JSON text.
But for CSV, with a simpler library such as Apache Commons CSV, we read and write each field of data individually for each object. You pass each member variable of the DTO object, and Commons CSV writes those values out to the CSV text with any needed encapsulating quote marks, commas, and escaping.
You can see this in the code below, in this line:
printer.printRecord( e.id , e.name , e.role , e.salary );
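To make "any needed encapsulating quote marks, commas, and escaping" concrete, here is a small hand-written sketch of the minimal quoting rule from RFC 4180: a field is wrapped in double quotes only if it contains a comma, a double quote, or a line break, and embedded quotes are doubled. This illustrates the idea only; it is not Commons CSV's code, and predefined formats such as CSVFormat.ORACLE have their own quote and escape settings. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of RFC 4180-style minimal quoting (not Commons CSV's code).
final class CsvQuoting {

    // Wraps a field in double quotes only when needed, doubling any embedded quotes.
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins already-stringified values into one CSV line.
    static String toLine(List<String> fields) {
        return fields.stream().map(CsvQuoting::quote).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toLine(List.of("101", "Alice", "Boss, Senior", "He said \"hi\"")));
        // 101,Alice,"Boss, Senior","He said ""hi"""
    }
}
```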
EmployeeIo.java
Here is the entire EmployeeIo.java file where Io means input-output.
package work.basil.example;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
public class EmployeeIo {
public static void main ( String[] args ) {
EmployeeIo app = new EmployeeIo();
app.doIt();
}
private void doIt ( ) {
// Mimic a collection of DTO objects coming from the database.
List < Employee > employees = new ArrayList <>( 3 );
employees.add( new Employee( 101 , "Alice" , "Boss" , 11_000 ) );
employees.add( new Employee( 102 , "Bob" , "Worker" , 12_000 ) );
employees.add( new Employee( 103 , "Carol" , "Worker" , 13_000 ) );
Path path = Paths.get( "/Users/basilbourque/Employees.csv" );
this.write( employees , path );
}
public void write ( final List < Employee > employees , final Path path ) {
try ( final CSVPrinter printer = CSVFormat.ORACLE.withHeader( "Id" , "Name" , "Role" , "Salary" ).print( path , StandardCharsets.UTF_8 ) ; ) {
for ( Employee e : employees ) {
printer.printRecord( e.id , e.name , e.role , e.salary );
}
} catch ( IOException e ) {
e.printStackTrace();
}
}
}
When run, we produce a file.

Related

I cannot understand how an ArrayList can have a class as its data type

import java.time.LocalDate;
import java.util.ArrayList;
class Books{
String bookName, authorName;
public Books(String bName, String aName){
this.authorName = aName;
this.bookName = bName;
}
@Override
public String toString(){
return "Book Details{Book name: "+bookName+", Author: "+authorName+"}";
}
}
public class Ex7_LibraryManagementSystem {
// What is going on here? I'm new to java so I don't get the ArrayList that much. Are we creating an ArrayList with a class datatype??? Does this part fall under Advanced Java, or do I need to revise my basics again? I'm confused by all this passing of class Books as an argument.
ArrayList<Books> booksList;
Ex7_LibraryManagementSystem(ArrayList<Books> bookName){
this.booksList = bookName;
}
public void addBooks(Books b){
this.booksList.add(b);
System.out.println("Book Added Successfully");
}
public void issuedBooks(Books b,String issuedTo,String issuedOn){
if(booksList.isEmpty()){
System.out.println("All Books are issued.No books are available right now");
}
else{
if(booksList.contains(b))
{
this.booksList.remove(b);
System.out.println("Book "+b.bookName+" is issued successfully to "+issuedTo+" on "+issuedOn);
}
else{
System.out.println("Sorry! The Book "+b.bookName+" is already been issued to someone");
}
}
}
public void returnBooks(Books b, String returnFrom){
this.booksList.add(b);
System.out.println("Book is returned successfully from "+returnFrom+" to the Library");
}
public static void main(String[] args) {
// Also, please explain why we are creating the ArrayList below.
ArrayList<Books> book1 = new ArrayList<>();
LocalDate ldt = LocalDate.now();
Books b1 = new Books("Naruto","Kisishima");
book1.add(b1);
Books b2 = new Books("Naruto Shippuden","Kisishima");
book1.add(b2);
Books b3 = new Books("Attack On Titan","Bhaluche");
book1.add(b3);
Books b4 = new Books("Akame Ga Kill","Killer bee");
book1.add(b4);
Books b5 = new Books("Death Note","Light");
book1.add(b5);
Ex7_LibraryManagementSystem l = new Ex7_LibraryManagementSystem(book1);
// l.addBooks(new Books("Boruto","Naruto"));
l.issuedBooks(b3,"Sanan",ldt.getDayOfMonth()+"/"+ldt.getMonthValue()+"/"+ldt.getYear());
l.issuedBooks(b1,"Sandy",ldt.getDayOfMonth()+"/"+ldt.getMonthValue()+"/"+ldt.getYear());
// l.issuedBooks(b2,"Suleman",ldt.getDayOfMonth()+"/"+ldt.getMonthValue()+"/"+ldt.getYear());
// l.issuedBooks(b4,"Sanju",ldt.getDayOfMonth()+"/"+ldt.getMonthValue()+"/"+ldt.getYear());
// l.issuedBooks(b5,"Thor",ldt.getDayOfMonth()+"/"+ldt.getMonthValue()+"/"+ldt.getYear());
l.issuedBooks(b1,"anuj",ldt.getDayOfMonth()+"/"+ldt.getMonthValue()+"/"+ldt.getYear());
}
}
Please Help Me...Thank you!
Generics
You asked:
ArrayList<Books> booksList;
What is going on here? I'm new to java so I don't get the ArrayList that much. Are we creating an ArrayList with a class datatype???
You need to learn about Generics in Java.
ArrayList is a collection, a data structure for holding objects.
<Book> (after fixing your misnomer Books) is telling the compiler that we intend to store only objects of the Book class in this particular collection.
If we mistakenly try to put a Dog object or an Invoice object into that collection, the compiler will complain. You will get an error message at compile-time explaining that only objects of the Book class can be put into that collection.
Also, you can put objects that are from a class that is a subclass of Book. Imagine you had HardCoverBook and SoftCoverBook classes that both extend from the Book class. Objects of those subclasses can also go into a collection of Book objects.
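Here is a small sketch of both points, using hypothetical Book, HardCoverBook, and SoftCoverBook classes: the List<Book> accepts subclass instances, while adding an unrelated type is rejected at compile time (shown as a comment, since that line would not compile).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration only.
class Book {
    final String title;
    Book(String title) { this.title = title; }
}

class HardCoverBook extends Book {
    HardCoverBook(String title) { super(title); }
}

class SoftCoverBook extends Book {
    SoftCoverBook(String title) { super(title); }
}

final class GenericsDemo {
    static List<Book> shelf() {
        List<Book> books = new ArrayList<>();             // may hold only Book (or subclass) objects
        books.add(new Book("Death Note"));
        books.add(new HardCoverBook("Naruto"));           // OK: a HardCoverBook is a Book
        books.add(new SoftCoverBook("Attack On Titan"));  // OK: a SoftCoverBook is a Book
        // books.add("Some string");                      // compile-time error: String is not a Book
        return books;
    }

    public static void main(String[] args) {
        for (Book b : shelf()) {
            // No cast needed: the compiler knows every element is a Book.
            System.out.println(b.title + " (" + b.getClass().getSimpleName() + ")");
        }
    }
}
```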
Other issues
Naming is important. Clear naming makes your code easier to read and comprehend.
Your class Books describes a single book, so it should be named in the singular: Book.
When collecting a bunch of book objects, such as a List, name that collection in the plural. For example List < Book > books.
Your book class could be more briefly written as a record. And we could shorten the names of your member fields.
record Book( String title, String author ) {}
We could shorten Ex7_LibraryManagementSystem to Library.
We need two lists rather than the one seen in your code. Given your scenario, we want to move books between a list for books on hand and a list of books loaned out.
More naming: The argument in your constructor should not be bookName, it should be something like initialInventory. And the type of that parameter should be simply Collection rather than specifically ArrayList or even List.
When passing in a collection of Book objects, copy them into our internally managed lists. We don’t want the calling code to be able to change the collection behind our back. By making a copy, we take control.
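A minimal sketch of why the copy matters, assuming a trimmed-down, hypothetical Shelf class rather than the full Library: after construction, changes the caller makes to its own list no longer affect the copy held inside the object.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical class for illustration only.
final class Shelf {
    private final List<String> titles;

    Shelf(Collection<String> initialInventory) {
        // Defensive copy: we own this list; the caller keeps theirs.
        this.titles = new ArrayList<>(initialInventory);
    }

    int size() {
        return titles.size();
    }

    public static void main(String[] args) {
        List<String> callerList = new ArrayList<>(List.of("Naruto", "Death Note"));
        Shelf shelf = new Shelf(callerList);
        callerList.clear();               // the caller empties its own list...
        System.out.println(shelf.size()); // ...but the shelf still holds 2 titles
    }
}
```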
For that matter, your books are not ordered, so no need for List. If the book objects are meant to be unique, we can collect them as Set rather than List — but I'll ignore that point.
Your addBooks method adds only a single book, so rename it in the singular: addBook.
"Issue" is an odd term; "loan" seems more appropriate to a library. Similarly, the issuedBooks method could use better naming, including not being in past-tense. Use date-time classes to represent date-time values, such as LocalDate for a date-only value (without time-of-day, and without time zone or offset). Mark those arguments final to avoid accidentally changing them in your method.
loanBook ( final Book book , final String borrower , final LocalDate dateLoaned ) { … }
I recommend checking for conditions that should never happen, to make sure all is well. So rather than assume a book is on loan, verify. If things seem amiss, report.
After those changes, we have something like this.
package work.basil.example.lib;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
record Book( String title , String author )
{
}
public class Library
{
private List < Book > booksOnHand, booksOnLoan;
Library ( Collection < Book > initialInventory )
{
this.booksOnHand = new ArrayList <>( initialInventory );
this.booksOnLoan = new ArrayList <>( this.booksOnHand.size() );
}
public void addBook ( Book b )
{
this.booksOnHand.add( b );
System.out.println( "Book added successfully." );
}
public void loanBook ( final Book book , final String borrower , final LocalDate dateLoaned )
{
if ( this.booksOnHand.isEmpty() )
{
System.out.println( "All Books are issued. No books are available right now." );
}
else
{
if ( this.booksOnHand.contains( book ) )
{
this.booksOnHand.remove( book );
this.booksOnLoan.add( book );
System.out.println( "Book " + book.title() + " by " + book.author() + " is loaned to " + borrower + " on " + dateLoaned );
}
else if ( this.booksOnLoan.contains( book ) )
{
System.out.println( "Sorry! The Book " + book.title() + " by " + book.author() + " is out on loan." );
}
else
{
System.out.println( "ERROR – We should never have reached this point in the code. " );
}
}
}
public void returnBook ( Book book , String returnFrom )
{
if ( this.booksOnLoan.contains( book ) )
{
this.booksOnLoan.remove( book );
this.booksOnHand.add( book );
System.out.println( "The Book " + book.title() + " by " + book.author() + " has been returned to the Library." );
}
else
{
System.out.println( "The Book " + book.title() + " by " + book.author() + " is not out on loan, so it cannot be returned to the Library." );
}
}
public String reportInventory ( )
{
StringBuilder report = new StringBuilder();
report.append( "On hand: " + this.booksOnHand );
report.append( "\n" );
report.append( "On loan: " + this.booksOnLoan );
return report.toString();
}
public static void main ( String[] args )
{
List < Book > stockOfBooks =
List.of(
new Book( "Naruto" , "Kisishima" ) ,
new Book( "Naruto Shippuden" , "Kisishima" ) ,
new Book( "Attack On Titan" , "Bhaluche" ) ,
new Book( "Akame Ga Kill" , "Killer bee" ) ,
new Book( "Death Note" , "Light" )
);
Book b1 = stockOfBooks.get( 0 ), b2 = stockOfBooks.get( 1 ), b3 = stockOfBooks.get( 2 );
Library library = new Library( stockOfBooks );
library.loanBook( b3 , "Sanan" , LocalDate.now() );
library.loanBook( b1 , "Sandy" , LocalDate.now().plusDays( 1 ) );
library.loanBook( b2 , "anuj" , LocalDate.now().plusDays( 2 ) );
library.returnBook( b1 , "Sandy" );
System.out.println( library.reportInventory() );
}
}
When run:
Book Attack On Titan by Bhaluche is loaned to Sanan on 2022-04-19
Book Naruto by Kisishima is loaned to Sandy on 2022-04-20
Book Naruto Shippuden by Kisishima is loaned to anuj on 2022-04-21
The Book Naruto by Kisishima has been returned to the Library.
On hand: [Book[title=Akame Ga Kill, author=Killer bee], Book[title=Death Note, author=Light], Book[title=Naruto, author=Kisishima]]
On loan: [Book[title=Attack On Titan, author=Bhaluche], Book[title=Naruto Shippuden, author=Kisishima]]

Searching data from arraylist

I am new to Java.
Now I am doing a project that needs to read a customer list from a CSV file, then search for a username like "Ali" and print out all the data about Ali.
How can I search for "Ali" and print out all the data about Ali, such as CustomerNo, Name, PhoneNo and Status?
And if there are multiple entries for "Ali", how can I print out all of them?
Here is my code:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.Iterator;
public class LoadCustomer {
public static void main(String[] args) throws IOException{
System.out.println ("Load customer from file");
ArrayList<Customer> customers = readCustomerFromFile();
System.out.println (customers);
System.out.println ();
}
private static ArrayList<Customer> readCustomerFromFile() throws IOException{
ArrayList<Customer> customers = new ArrayList<>();
List<String> lines = Files.readAllLines(Paths.get("customer.csv"));
for (int i = 1 ; i < lines.size() ; i++){
String[] items = lines.get(i).split(",");
int customerNo = Integer.parseInt(items[0]);
int phoneNo = Integer.parseInt(items[2]);
customers.add (new Customer(customerNo,items[1],phoneNo,items[3]));
}
return customers;
}
}
Here is my Customer class (with an added getName getter):
public class Customer {
private int customerNo;
private String name;
private int phoneNo;
private String status;
public Customer () {}
public Customer (int customerNo, String name, int phoneNo, String status){
this.customerNo = customerNo;
this.name = name;
this.phoneNo = phoneNo;
this.status = status;
}
public String getName(){
return name;
}
public String toString(){
return customerNo + " " + name + " " + phoneNo + " " + status;
}
public String toCSVString(){
return customerNo + "," + name + "," + phoneNo + "," + status;
}
}
And here is my data:
CustomerNo Name PhoneNo Status
1 Ali 12345 Normal
2 Siti 23456 Normal
3 Rone 78910 Normal
4 Jean 56789 Normal
5 Roby 28573 Normal
6 Ali 78532 Normal
Thank you very much for your attention.
Edit:
Here is my code for this program:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
public class FindCustomer {
public static void main(String[] args) throws IOException{
System.out.println ("Load customer from file");
// try-with-resources closes the file that Files.lines keeps open
try (java.util.stream.Stream<String> lines = Files.lines(Paths.get("customer.csv"))) {
java.util.Map<String, List<Customer>> customers =
lines
.skip(1) // skip the header row; Integer.parseInt("CustomerNo") would throw
.map(line -> line.split(","))
.map(field -> new Customer(
Integer.parseInt(field[0]), field[1],
Integer.parseInt(field[2]), field[3]))
.collect(Collectors
.groupingBy(Customer::getName));
System.out.println (customers);
}
}
}
Bit of a broad question.
If you expect to do this a lot, and on a boatload of data, do what everybody else does when they are faced with a lot of relational data that they need to run queries on: use a database, like PostgreSQL or H2. To interact with those from Java, use JDBI or jOOQ.
If this is just a small simple text file and/or you're trying to learn some java, well, you still have two options here: You can loop through the data, or, you can build a mapping.
The loop option is simple:
for (Customer c : customers) if (c.getName().equals("Ali")) {
// do what you want here. 'c' holds the customer object of Ali.
}
But this does, of course, require a full run through all the entries every time. Another option is to build a mapping:
var map = new HashMap<String, List<Customer>>();
for (Customer c : customers) map.computeIfAbsent(c.getName(), k -> new ArrayList<>()).add(c);
// map now maps a customer name to every customer with that name (your data has two Alis).
List<Customer> alis = map.get("Ali");
Maps have the advantage of near-instant lookup: even if the map contains a million entries, map.get(x) is (near) instant. That makes them a decent solution if you have lots of data and need to do lots of lookups. But you have to build a complete map for everything you want to query on. So, if you want to look up by name, and later something like 'get all customers with a 6-digit phone number whose status is Normal', then get a database.
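Because the sample data contains two customers named Ali, here is a self-contained sketch of both approaches that keeps every match: a stream filter (the loop option) and a name-to-list map (the mapping option). The Customer record is a simplified, hypothetical stand-in for the question's Customer class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CustomerSearch {

    // Simplified, hypothetical stand-in for the question's Customer class.
    record Customer(int customerNo, String name, int phoneNo, String status) {}

    // Loop option: scans every customer on each call.
    static List<Customer> findByName(List<Customer> customers, String name) {
        return customers.stream()
                .filter(c -> c.name().equals(name))
                .collect(Collectors.toList());
    }

    // Map option: build once, then each lookup is near-instant.
    // Each name maps to a List so that customers with duplicate names are all kept.
    static Map<String, List<Customer>> indexByName(List<Customer> customers) {
        Map<String, List<Customer>> index = new HashMap<>();
        for (Customer c : customers) {
            index.computeIfAbsent(c.name(), k -> new ArrayList<>()).add(c);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer(1, "Ali", 12345, "Normal"),
                new Customer(2, "Siti", 23456, "Normal"),
                new Customer(6, "Ali", 78532, "Normal"));
        System.out.println(findByName(customers, "Ali"));
        System.out.println(indexByName(customers).getOrDefault("Ali", List.of()));
    }
}
```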
As was suggested, a map would be useful. You can create one on the fly as you read in the file: the code splits each line, creates a Customer, and groups the customers by name in a map.
Now the map holds, for each name, all customers that have that name.
Map<String, List<Customer>> customers =
Files.lines(Paths.get("customer.csv"))
.skip(1) // assumes the first line is the header row, as in the question's data
.map(line -> line.split("\\s*,\\s*"))
.map(field -> new Customer(
Integer.parseInt(field[0]), field[1],
Integer.parseInt(field[2]), field[3]))
.collect(Collectors
.groupingBy(Customer::getName));
To get the List of customers for the name Ali do the following.
List<Customer> ali = customers.get("Ali");
Now it's up to you to format or otherwise use the list as required. You will still need to handle exceptions via try/catch.

Java CSV Writing

I'm currently trying to write data to a CSV file to open in Excel for a report. I can write data to the CSV file; however, it's not coming out in Excel in the order I want. I need the data to appear in separate columns, under Worst fitness and Best fitness, instead of all of it appearing under Average fitness. Here is the relevant code; any help would be appreciated:
String [] Fitness = "Average fitness#Worst fitness #Best Fitness".split("#");
writer.writeNext(Fitness);
//takes data from average fitness and stores as an int
int aFit = myPop.individuals[25].getFitness();
//converts int to string
String aFit1 = Integer.toString(aFit);
//converts string to string array
String aFit2 [] = aFit1.split(" ");
//writes to csv
writer.writeNext(aFit2);
//String [] nextCol = "#".split("#");
int wFit = myPop.individuals[49].getFitness();
String wFit1 = Integer.toString(wFit);
String wFit2 [] = wFit1.split(" ");
writer.writeNext(wFit2);
int bFit = myPop.individuals[1].getFitness();
String bFit1 = Integer.toString(bFit);
String bFit2 [] = bFit1.split(" ");
writer.writeNext(bFit2);
I think you should call your "writeNext" method once per line of data:
String [] Fitness = "Average fitness#Worst fitness #Best Fitness".split("#");
writer.writeNext(Fitness);
int aFit = myPop.individuals[25].getFitness();
String aFit1 = Integer.toString(aFit);
int wFit = myPop.individuals[49].getFitness();
String wFit1 = Integer.toString(wFit);
int bFit = myPop.individuals[1].getFitness();
String bFit1 = Integer.toString(bFit);
writer.writeNext(new String[]{aFit1, wFit1, bFit1});
From the CSVWriter documentation for writeNext(java.lang.String[]):
public void writeNext(String[] nextLine)
- Writes the next line to the file.
The nextLine parameter is described as:
A string array with each comma-separated element as a separate entry.
You are writing 3 separate lines instead of 1 and each line you write contains an Array with a single entry.
writer.writeNext(aFit2);
writer.writeNext(wFit2);
writer.writeNext(bFit2);
Solution:
Create a single Array with all 3 entries (column values) and write that once on a single line.
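To see why the values all landed under Average fitness, here is a stdlib-only sketch with a hypothetical MiniCsvWriter whose writeNext mimics the documented behaviour (one call, one line). It is not the real CSVWriter and does no quoting; it only shows how rows are formed.

```java
import java.io.StringWriter;

// Hypothetical stand-in for CSVWriter: each writeNext call emits exactly one line.
final class MiniCsvWriter {
    private final StringWriter out = new StringWriter();

    void writeNext(String[] nextLine) {
        out.write(String.join(",", nextLine));
        out.write("\n");
    }

    String contents() {
        return out.toString();
    }

    public static void main(String[] args) {
        // Three calls with one-element arrays: three rows, all in the first column.
        MiniCsvWriter wrong = new MiniCsvWriter();
        wrong.writeNext(new String[] {"Average fitness", "Worst fitness", "Best fitness"});
        wrong.writeNext(new String[] {"10"});
        wrong.writeNext(new String[] {"5"});
        wrong.writeNext(new String[] {"15"});
        System.out.print(wrong.contents());

        // One call with a three-element array: one row, one value per column.
        MiniCsvWriter right = new MiniCsvWriter();
        right.writeNext(new String[] {"Average fitness", "Worst fitness", "Best fitness"});
        right.writeNext(new String[] {"10", "5", "15"});
        System.out.print(right.contents());
    }
}
```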
I am assuming you are using CSVWriter to write to a CSV file. Please make sure to include as much detail as possible in a question; it makes the question much more readable to others.
As you can see from the documentation of CSVWriter:
void writeNext(String[] nextLine)
Writes the next line to the file.
The writeNext method actually writes the array to an individual line of the file. From your code:
writer.writeNext(aFit2);
writer.writeNext(wFit2);
writer.writeNext(bFit2);
So, instead of doing this: String aFit2 [] = aFit1.split(" ");
create an array of the values and then pass that array to writeNext.
As an example, consider your own code that passes the array of column names, which gets written on a single line:
writer.writeNext(Fitness);
Apache Commons CSV
Here is the same kind of solution, but using the Apache Commons CSV library. This library specifically supports the Microsoft Excel variant of CSV format, so you may find it particularly useful.
CSVFormat.Predefined.EXCEL
Your data is both read and written in this example.
The Commons CSV library can read the first row as header names.
Here is a complete example app in a single .java file. First the app reads from an existing WorstBest.csv data file:
Average,Worst,Best
10,5,15
11,5,16
10,6,16
11,6,15
10,5,16
10,5,16
10,4,16
Each row is represented as a List of three String objects, a List< String >. We add each row to a collection, a list of lists, a List< List< String > >.
Then we write out that imported data to another file. Each written file is named WorstBest_xxx.csv, where xxx is the current moment in UTC.
package com.basilbourque.example;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class WorstBest {
public static void main ( String[] args ) {
WorstBest app = new WorstBest();
List < List < String > > data = app.read();
app.write( data );
}
private List < List < String > > read ( ) {
List < List < String > > listOfListsOfStrings = new ArrayList <>();
// Locate file to import and parse.
Path path = Paths.get( "/Users/basilbourque/WorstBest.csv" );
if ( Files.notExists( path ) ) {
System.out.println( "ERROR - no file found for path: " + path + ". Message # 3cf416de-c33b-4c39-8507-5fbb72e113f2." );
return listOfListsOfStrings; // Nothing to read, so return the empty list.
}
// Read CSV file. Try-with-resources closes the reader when we are done.
try ( BufferedReader reader = Files.newBufferedReader( path ) ) {
Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
for ( CSVRecord record : records ) {
// Average,Worst,Best
// 10,5,15
// 11,5,16
String average = record.get( "Average" ); // Look up each field by its header name.
String worst = record.get( "Worst" );
String best = record.get( "Best" );
// Collect
listOfListsOfStrings.add( List.of( average , worst , best ) ); // For real work, you would define a class to hold these values.
}
} catch ( IOException e ) {
e.printStackTrace();
}
return listOfListsOfStrings;
}
private void write ( List < List < String > > listOfListsOfStrings ) {
Objects.requireNonNull( listOfListsOfStrings );
// Determine file in which to write data.
String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" ); // Colons are forbidden in names by some file systems such as HFS+.
Path path = Paths.get( "/Users/basilbourque/WorstBest_" + when + ".csv" );
// Loop collection of data (a list of lists of strings).
try ( final CSVPrinter printer = CSVFormat.EXCEL.withHeader( "Average" , "Worst" , "Best" ).print( path , StandardCharsets.UTF_8 ) ; ) {
for ( List < String > list : listOfListsOfStrings ) {
printer.printRecord( list.get( 1 - 1 ) , list.get( 2 - 1 ) , list.get( 3 - 1 ) ); // Annoying zero-based index counting.
}
} catch ( IOException e ) {
e.printStackTrace();
}
}
}

How to read multiple csv files with different formats in java [duplicate]

This question already has answers here: how to read csv file without knowing header using java?
I am implementing a CSV file listener using WatchService. My requirements are:
Get CSV files from the registered directory with the Java WatchService. I am struggling to process the received files and store their data in the database. The problem is that the CSV file format is not predefined: a file can have any number of columns and different headers. I have referred to many sites but have not found any solution. Please help.
This sample code shows how you can process your files:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Main {
private static String HEADER_DELIMITER = ",";
private static String DATA_DELIMITER = ",";
public static void main(String[] args) {
/**
* for each file read lines and then ...
*/
String headerLine = "id,name,family";
String[] otherLines = { "1,A,B", "2,C,D" };
List<Student> students = new ArrayList<Student>();
String[] titles = headerLine.split(HEADER_DELIMITER);
for (String line : otherLines) {
String[] cells = line.split(DATA_DELIMITER);
Student student = new Student();
int i = 0;
for (String cell : cells) {
student.add(titles[i], cell);
i++;
}
students.add(student);
}
System.out.println(students);
/*
* output (HashMap does not guarantee key order, so the order inside each map may differ):
* [Student [data={id=1, family=B, name=A}], Student [data={id=2, family=D, name=C}]]
*/
/**
* save students in your table.
*/
}
static class Student {
Map<String, String> data = new HashMap<String, String>();
public void add(String key, String value) {
data.put(key, value);
}
@Override
public String toString() {
return "Student [data=" + data + "]";
}
}
}
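The sample above hard-codes the header and data lines. A minimal sketch of the "for each file read lines" step might look like this, assuming each file is small enough to read fully, has its header on the first line, and uses plain commas with no quoted fields (a real CSV library is safer otherwise). LinkedHashMap keeps the columns in file order. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: assumes header on line 1 and no quoted fields.
final class DynamicCsv {

    // Reads any comma-separated file whose first line is the header,
    // returning one column-name -> value map per data row.
    static List<Map<String, String>> readRows(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] titles = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            String[] cells = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            // Guard against rows with fewer or more cells than headers.
            for (int i = 0; i < Math.min(titles.length, cells.length); i++) {
                row.put(titles[i], cells[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("students", ".csv");
        Files.writeString(tmp, "id,name,family\n1,A,B\n2,C,D\n");
        System.out.println(readRows(tmp)); // [{id=1, name=A, family=B}, {id=2, name=C, family=D}]
        Files.delete(tmp);
    }
}
```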
For saving the result in your database, it depends on your choice of data model. For example, if these CSV files share most columns and differ in only a few, I suggest this data model:
You should have a table (for example, Student) with the common columns (id, name, family) and another table with only 3 columns (studentForeignKey, key, value) in which you store the extra columns. For example, the row (studentForeignKey=1, key=activity, value=TA) means that the student with id = 1 has the TA activity.
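To illustrate that data model, here is a sketch that splits one parsed row (a column-name to value map, like the Student data above) into the values for the common table and the (studentForeignKey, key, value) rows for the extra-columns table. The set of common columns and the ExtraValue record are hypothetical, and the actual database inserts (for example with JDBC) are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RowSplitter {

    // Hypothetical row for the key/value table: (studentForeignKey, key, value).
    record ExtraValue(String studentForeignKey, String key, String value) {}

    // Columns that every file is assumed to share; these go in the Student table.
    static final Set<String> COMMON_COLUMNS = Set.of("id", "name", "family");

    static Map<String, String> commonPart(Map<String, String> row) {
        Map<String, String> common = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (COMMON_COLUMNS.contains(e.getKey())) {
                common.put(e.getKey(), e.getValue());
            }
        }
        return common;
    }

    // Assumes every row has an "id" column to use as the foreign key.
    static List<ExtraValue> extraPart(Map<String, String> row) {
        List<ExtraValue> extras = new ArrayList<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (!COMMON_COLUMNS.contains(e.getKey())) {
                extras.add(new ExtraValue(row.get("id"), e.getKey(), e.getValue()));
            }
        }
        return extras;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("name", "A");
        row.put("family", "B");
        row.put("activity", "TA");
        System.out.println(commonPart(row)); // {id=1, name=A, family=B}
        System.out.println(extraPart(row));  // [ExtraValue[studentForeignKey=1, key=activity, value=TA]]
    }
}
```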
I hope this answer helps.

Fast CSV parsing

I have a Java server app that downloads a CSV file and parses it. The parsing can take from 5 to 45 minutes, and it happens every hour. This method is a bottleneck of the app, so it's not premature optimization. The code so far:
client.executeMethod(method);
InputStream in = method.getResponseBodyAsStream(); // this is http stream
String line;
String[] record;
reader = new BufferedReader(new InputStreamReader(in), 65536);
try {
// read the header line
line = reader.readLine();
// some code
while ((line = reader.readLine()) != null) {
// more code
line = line.replaceAll("\"\"", "\"NULL\"");
// Now remove all of the quotes
line = line.replaceAll("\"", "");
if (!line.startsWith("ERROR")) {
//bla bla
continue;
}
record = line.split(",");
//more error handling
// build the object and put it in HashMap
}
//exceptions handling, closing connection and reader
Is there any existing library that would help me to speed up things? Can I improve existing code?
Apache Commons CSV
Have you seen Apache Commons CSV?
Caveat On Using split
Bear in mind that in older JDKs (before Java 7 update 6), split returns substrings that share the original line's character array, meaning that the original line object is not eligible for garbage collection while there is a reference to any of those pieces. Perhaps making a defensive copy will help? (Java bug report)
It also does not correctly handle quoted CSV fields that contain commas.
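A quick illustration of the second point: on a line whose quoted field contains a comma, String.split(",") cuts the field in two, whereas a CSV parser would return three fields. The sample line is made up.

```java
final class SplitPitfall {

    // Naive splitting, as in the question's record = line.split(",").
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] parts = naiveSplit("42,\"Smith, John\",ERROR");
        // split knows nothing about quotes, so the name is cut in two:
        System.out.println(parts.length);              // 4, not 3
        System.out.println(String.join(" | ", parts)); // 42 | "Smith |  John" | ERROR
    }
}
```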
opencsv
Take a look at opencsv.
This blog post, opencsv is an easy CSV parser, has example usage.
The problem with your code is that it uses replaceAll and split, which are costly operations. You should definitely consider using a CSV parser/reader that parses each line in a single pass.
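For a sense of what single-pass parsing means, here is a minimal sketch that walks each line character by character exactly once, with no regular expressions. It assumes RFC 4180-style quoting (fields wrapped in double quotes, a doubled quote "" inside a quoted field meaning one literal quote) and that no quoted field spans multiple lines; a maintained library handles more edge cases. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-pass line parser; assumes no multi-line quoted fields.
final class OnePassCsv {

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(ch);      // commas inside quotes are kept
                }
            } else if (ch == '"') {
                inQuotes = true;           // opening quote
            } else if (ch == ',') {
                fields.add(field.toString());
                field.setLength(0);        // reuse the builder for the next field
            } else {
                field.append(ch);
            }
        }
        fields.add(field.toString());      // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        for (String f : parseLine("42,\"Smith, John\",\"say \"\"hi\"\"\",")) {
            System.out.println("[" + f + "]");
        }
    }
}
```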
There is a benchmark on GitHub:
https://github.com/uniVocity/csv-parsers-comparison
Unfortunately, it is run under Java 6. The numbers are slightly different under Java 7 and 8. I'm trying to get more detailed data for different file sizes, but it's a work in progress;
see https://github.com/arnaudroger/csv-parsers-comparison
Apart from the suggestions made above, I think you can try improving your code by using some threading and concurrency.
Following is a brief analysis and a suggested solution:
From the code, it seems that you are reading the data over the network (most likely with the Apache Commons HttpClient library).
You need to make sure that the bottleneck you describe is not in the data transfer over the network.
One way to check is to just dump the data to a file (without parsing) and measure how long that takes. This will give you an idea of how much time is actually spent in parsing (compared to your current observations).
Next, have a look at how the java.util.concurrent package is used.
What you can do is execute the work you are doing inside the loop on separate threads.
Using a thread pool and concurrency can greatly improve your performance, provided the per-line work (rather than the download) is what takes the time.
Although the solution involves some effort, in the end it should help you.
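A minimal sketch of that idea with java.util.concurrent, assuming the per-line work (your "build the object" step) is CPU-heavy enough to be worth parallelising: reading stays on one thread, lines are handed to a fixed thread pool in batches, and results go into a ConcurrentHashMap. The parseRecord method and the batch size are hypothetical placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelParse {

    static final int BATCH_SIZE = 10_000; // hypothetical; tune by measuring

    // Hypothetical per-line work: key = first field, value = the whole line.
    static void parseRecord(String line, Map<String, String> out) {
        int comma = line.indexOf(',');
        String key = comma < 0 ? line : line.substring(0, comma);
        out.put(key, line);
    }

    static Map<String, String> parse(Reader source, int threads)
            throws IOException, InterruptedException {
        Map<String, String> result = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (BufferedReader reader = new BufferedReader(source, 65536)) {
            reader.readLine(); // skip the header line
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == BATCH_SIZE) {
                    submit(pool, batch, result);
                    batch = new ArrayList<>(BATCH_SIZE);
                }
            }
            submit(pool, batch, result); // remaining partial batch
        } finally {
            pool.shutdown(); // no new tasks; queued batches still run
        }
        // Wait for the queued batches to finish (the timeout is arbitrary).
        pool.awaitTermination(1, TimeUnit.HOURS);
        return result;
    }

    private static void submit(ExecutorService pool, List<String> batch,
                               Map<String, String> result) {
        pool.execute(() -> batch.forEach(l -> parseRecord(l, result)));
    }

    public static void main(String[] args) throws Exception {
        String csv = "id,name\n1,Alice\n2,Bob\n3,Carol\n";
        System.out.println(parse(new StringReader(csv), 4).size()); // 3
    }
}
```

Batching keeps per-task overhead small; whether this pays off depends on how expensive the per-line work is compared with reading the stream, so measure first as described above.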
opencsv
You should have a look at OpenCSV. I would expect that they have performance optimizations.
A little late here, but there are now a few benchmarking projects for CSV parsers. Your selection will depend on the exact use case (e.g. raw data vs. data binding).
SimpleFlatMapper
uniVocity
sesseltjonna-csv (disclaimer: I wrote this parser)
Quirk-CSV
The new kid on the block. It uses Java annotations and is built on Apache Commons CSV, which is one of the faster libraries out there for CSV parsing.
This library is also thread-safe, so if you want to reuse the CSVProcessor, you can and should.
Example:
Pojo
@CSVReadComponent(type = CSVType.NAMED)
@CSVWriteComponent(type = CSVType.ORDER)
public class Pojo {
@CSVWriteBinding(order = 0)
private String name;
@CSVWriteBinding(order = 1)
@CSVReadBinding(header = "age")
private Integer age;
@CSVWriteBinding(order = 2)
@CSVReadBinding(header = "money")
private Double money;
@CSVReadBinding(header = "name")
public void setA(String name) {
this.name = name;
}
@Override
public String toString() {
return "Name: " + name + System.lineSeparator() + "\tAge: " + age + System.lineSeparator() + "\tMoney: "
+ money;
}}
Main
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.*;
public class SimpleMain {
public static void main(String[] args) {
String csv = "name,age,money" + System.lineSeparator() + "Michael Williams,34,39332.15";
CSVProcessor processor = new CSVProcessor(Pojo.class);
List<Pojo> list = new ArrayList<>();
try {
list.addAll(processor.parse(new StringReader(csv)));
list.forEach(System.out::println);
System.out.println();
StringWriter sw = new StringWriter();
processor.write(list, sw);
System.out.println(sw.toString());
} catch (IOException e) {
e.printStackTrace(); // don't silently swallow parse/write errors
}
}}
Since this is built on top of Apache Commons CSV, you can use its powerful CSVFormat class. Let's say the CSV uses pipes (|) instead of commas (,) as the delimiter; you could, for example, write:
CSVFormat csvFormat = CSVFormat.DEFAULT.withDelimiter('|');
List<Pojo> list = processor.parse(new StringReader(csv), csvFormat);
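Switching the delimiter also changes which fields need quoting: a value containing a pipe must now be quoted, while a comma no longer needs to be. As a rough illustration using only the JDK (not how Quirk-CSV or Apache Commons CSV implement it; `CsvQuoting`, `quote`, and `formatRecord` are hypothetical names), RFC 4180-style quoting with a configurable delimiter could look like this:

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of Quirk-CSV or Apache Commons CSV.
final class CsvQuoting {

    // Quote a field only when it contains the delimiter, a double quote, or a line break.
    // Embedded double quotes are escaped by doubling them (RFC 4180 style).
    static String quote(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    // Quote each field as needed and join them with the chosen delimiter.
    static String formatRecord(List<String> fields, char delimiter) {
        StringJoiner joiner = new StringJoiner(String.valueOf(delimiter));
        for (String field : fields) {
            joiner.add(quote(field, delimiter));
        }
        return joiner.toString();
    }
}
```

With `'|'` as the delimiter, a field like `a|b` comes out as `"a|b"`, while `x,y` is left bare. A library's format object covers more than this sketch (quote character, record separator, and so on), which is one reason to let CSVFormat do the work.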
Another benefit is that inheritance is also taken into account.
For other examples on handling reading/writing non-primitive data
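To give an idea of how annotation-driven binding like the one above can work, including picking up fields declared in superclasses, here is a minimal reflection-based sketch. It is not Quirk-CSV's implementation; the `WriteOrder` annotation and the `OrderedRowWriter`, `Person`, and `Customer` classes are hypothetical, and the sketch only covers writing values in a fixed order:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical annotation: the column position a field is written to.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface WriteOrder {
    int value();
}

final class OrderedRowWriter {

    // Collect annotated fields from the class and all its superclasses,
    // sort them by their order, and return their values as strings.
    static List<String> toRow(Object bean) {
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = bean.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isAnnotationPresent(WriteOrder.class)) {
                    fields.add(f);
                }
            }
        }
        fields.sort(Comparator.comparingInt(f -> f.getAnnotation(WriteOrder.class).value()));
        List<String> row = new ArrayList<>();
        for (Field f : fields) {
            f.setAccessible(true); // allow reading private fields
            try {
                row.add(String.valueOf(f.get(bean)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return row;
    }
}

// Hypothetical sample classes: Customer inherits name and age from Person.
class Person {
    @WriteOrder(1) private Integer age = 34;
    @WriteOrder(0) private String name = "Michael Williams";
    private String ignored = "not written"; // no annotation, so it is skipped
}

class Customer extends Person {
    @WriteOrder(2) private Double money = 39332.15;
}
```

`OrderedRowWriter.toRow(new Customer())` yields the values of `name`, `age`, and `money` in that order. Reflection is not free, so a production version would likely cache the sorted field list per class.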
For speed, you want to avoid replaceAll, and you want to avoid regular expressions too. What you almost always want in performance-critical cases like this is a state-machine parser that works character by character. I've done that by rolling the whole thing into an Iterable. It reads from the stream and parses as it goes, without saving the data out or caching it, so if you can abort early, that works well too. It should also be short enough and well enough written to make it obvious how it works.
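import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Place this method inside a utility class of your choice.
public static Iterable<String[]> parseCSV(final InputStream stream) throws IOException {
    // Decode the bytes as UTF-8 (assumed encoding) and buffer them, so multi-byte
    // characters survive and the underlying stream is not hit once per character.
    final Reader reader = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
    return new Iterable<String[]>() {
        @Override
        public Iterator<String[]> iterator() {
            return new Iterator<String[]>() {
                static final int UNCALCULATED = 0; // next row not read yet
                static final int READY = 1;        // return_value holds the next row
                static final int FINISHED = 2;     // end of input reached
                int state = UNCALCULATED;
                ArrayList<String> value_list = new ArrayList<>();
                StringBuilder sb = new StringBuilder();
                String[] return_value;
                boolean skipLF = false; // set after '\r' so the '\n' of "\r\n" does not produce an empty row

                // Finish the current field and turn the collected fields into a row.
                public void end() {
                    end_part();
                    return_value = value_list.toArray(new String[0]);
                    value_list.clear();
                }

                // Finish the current field.
                public void end_part() {
                    value_list.add(sb.toString());
                    sb.setLength(0);
                }

                public void append(int ch) {
                    sb.append((char) ch);
                }

                // Read characters until one full row (or the end of input) has been consumed.
                public void calculate() throws IOException {
                    boolean inquote = false;
                    boolean justClosedQuote = false; // a '"' right after a closing quote is an escaped quote
                    while (true) {
                        int ch = reader.read();
                        if (skipLF) {
                            skipLF = false;
                            if (ch == '\n') {
                                continue; // second half of a "\r\n" line ending
                            }
                        }
                        if (ch == '"') {
                            if (inquote) {
                                inquote = false;
                                justClosedQuote = true;
                            } else if (justClosedQuote) {
                                append(ch); // "" inside a quoted field stands for one literal quote
                                inquote = true;
                                justClosedQuote = false;
                            } else {
                                inquote = true;
                            }
                            continue;
                        }
                        justClosedQuote = false;
                        switch (ch) {
                            default: // regular character
                                append(ch);
                                break;
                            case -1: // read has reached the end
                                if ((sb.length() == 0) && (value_list.isEmpty())) {
                                    state = FINISHED;
                                } else {
                                    end();
                                    state = READY;
                                }
                                return;
                            case '\r':
                            case '\n': // end of line, unless inside quotes
                                if (inquote) {
                                    append(ch);
                                } else {
                                    skipLF = (ch == '\r');
                                    end();
                                    state = READY;
                                    return;
                                }
                                break;
                            case ',': // field separator, unless inside quotes
                                if (inquote) {
                                    append(ch);
                                } else {
                                    end_part();
                                }
                                break;
                        }
                    }
                }

                @Override
                public boolean hasNext() {
                    if (state == UNCALCULATED) {
                        try {
                            calculate();
                        } catch (IOException ex) {
                            throw new UncheckedIOException(ex); // surface I/O errors instead of swallowing them
                        }
                    }
                    return state == READY;
                }

                @Override
                public String[] next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    state = UNCALCULATED;
                    return return_value;
                }
            };
        }
    };
}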
You would typically consume it like this:
for (String[] csv : parseCSV(stream)) {
//<deal with parsed csv data>
}
The clean calling API is worth the rather cryptic-looking function.
Apache Commons CSV ➙ 12 seconds for a million rows
Is there any existing library that would help me to speed up things?
Yes, the Apache Commons CSV project works very well in my experience.
Here is an example app that uses the Apache Commons CSV library to write and read rows of 24 columns: a sequential integer, an Instant, and 22 random UUID objects.
For 10,000 rows, writing and reading each take about half a second. The reading includes reconstituting the Integer, Instant, and UUID objects.
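The reconstitution step itself does not depend on the CSV library. A minimal sketch, assuming each row arrives as a `String[]` in the column layout described above (with Apache Commons CSV you would pull the values out with `csvRecord.get( i )`); `DecodedRow` is a hypothetical holder class:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical holder for one decoded row: id, timestamp, and the UUID columns.
final class DecodedRow {
    final Integer id;
    final Instant instant;
    final List<UUID> uuids;

    DecodedRow(Integer id, Instant instant, List<UUID> uuids) {
        this.id = id;
        this.instant = instant;
        this.uuids = uuids;
    }

    // Column 0 is the id, column 1 the ISO-8601 instant, every later column a UUID.
    static DecodedRow fromFields(String[] fields) {
        Integer id = Integer.valueOf(fields[0]);
        Instant instant = Instant.parse(fields[1]);
        List<UUID> uuids = new ArrayList<>();
        for (int i = 2; i < fields.length; i++) {
            uuids.add(UUID.fromString(fields[i]));
        }
        return new DecodedRow(id, instant, uuids);
    }
}
```

Because `Instant.toString()` produces ISO-8601 text that `Instant.parse` accepts, and `UUID.toString()` output is read back by `UUID.fromString`, values written this way round-trip exactly.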
My example code lets you toggle the reconstitution of objects on or off. I ran both variants with a million rows, which creates a file of 850 MB. I am using Java 12 on a MacBook Pro (Retina, 15-inch, Late 2013), 2.3 GHz Intel Core i7, 16 GB 1600 MHz DDR3, Apple built-in SSD.
For a million rows, ten seconds for reading plus two seconds for parsing:
Writing: PT25.994816S
Reading only: PT10.353912S
Reading & parsing: PT12.219364S
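For a sense of scale, the measured times above work out to roughly the following throughput (simple division, rounded):

```latex
\begin{aligned}
\text{writing:} \quad & \frac{1\,000\,000\ \text{rows}}{25.99\ \text{s}} \approx 38\,500\ \text{rows/s} \\
\text{reading \& parsing:} \quad & \frac{1\,000\,000\ \text{rows}}{12.22\ \text{s}} \approx 81\,800\ \text{rows/s}
\end{aligned}
```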
The source code is a single .java file with a write method and a read method, both called from a main method.
I opened a BufferedReader by calling Files.newBufferedReader.
package work.basil.example;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;
public class CsvReadingWritingDemo
{
public static void main ( String[] args )
{
CsvReadingWritingDemo app = new CsvReadingWritingDemo();
app.write();
app.read();
}
private void write ()
{
Instant start = Instant.now();
int limit = 1_000_000; // 10_000 100_000 1_000_000
Path path = Paths.get( "/Users/basilbourque/IdeaProjects/Demo/csv.txt" );
try (
Writer writer = Files.newBufferedWriter( path, StandardCharsets.UTF_8 );
CSVPrinter printer = new CSVPrinter( writer , CSVFormat.RFC4180 );
)
{
printer.printRecord( "id" , "instant" , "uuid_01" , "uuid_02" , "uuid_03" , "uuid_04" , "uuid_05" , "uuid_06" , "uuid_07" , "uuid_08" , "uuid_09" , "uuid_10" , "uuid_11" , "uuid_12" , "uuid_13" , "uuid_14" , "uuid_15" , "uuid_16" , "uuid_17" , "uuid_18" , "uuid_19" , "uuid_20" , "uuid_21" , "uuid_22" );
for ( int i = 1 ; i <= limit ; i++ )
{
printer.printRecord( i , Instant.now() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() , UUID.randomUUID() );
}
} catch ( IOException ex )
{
ex.printStackTrace();
}
Instant stop = Instant.now();
Duration d = Duration.between( start , stop );
System.out.println( "Wrote CSV for limit: " + limit );
System.out.println( "Elapsed: " + d );
}
private void read ()
{
Instant start = Instant.now();
int count = 0;
Path path = Paths.get( "/Users/basilbourque/IdeaProjects/Demo/csv.txt" );
CSVFormat format = CSVFormat.RFC4180.withFirstRecordAsHeader(); // Treat the first line as column names, not data.
try (
Reader reader = Files.newBufferedReader( path , StandardCharsets.UTF_8 ) ;
CSVParser parser = CSVParser.parse( reader , format ) ; // Closed automatically along with the reader.
)
{
for ( CSVRecord csvRecord : parser )
{
if ( true ) // Toggle parsing of the string data into objects. Turn off (`false`) to see strictly the time taken by Apache Commons CSV to read & parse the lines. Turn on (`true`) to get a feel for real-world load.
{
Integer id = Integer.valueOf( csvRecord.get( 0 ) ); // Annoying zero-based index counting.
Instant instant = Instant.parse( csvRecord.get( 1 ) );
for ( int i = 2 ; i < csvRecord.size() ; i++ ) // Every column after the id (0) and the instant (1) holds a UUID.
{
UUID uuid = UUID.fromString( csvRecord.get( i ) );
}
}
count++;
if ( count % 1_000 == 0 ) // Every so often, report progress.
{
//System.out.println( "# " + count );
}
}
} catch ( IOException e )
{
e.printStackTrace();
}
Instant stop = Instant.now();
Duration d = Duration.between( start , stop );
System.out.println( "Read CSV for count: " + count );
System.out.println( "Elapsed: " + d );
}
}
