CSVPrinter with break line characters

I'm using org.apache.commons.csv.CSVPrinter (Java 8) to produce a CSV text file from a DB RecordSet. My DB table has a description field in which the user can insert whatever they want, including a new line!
When I import the CSV into Excel or Google Spreadsheet, each row with a new line character in the description corrupts the CSV structure, obviously.
Should I replace/remove these characters manually, or is there a way to configure CSVPrinter to remove them automatically?
Thank you all in advance.
F
Edit: here a code snippet:
CSVFormat csvFormat = CSVFormat.DEFAULT.withRecordSeparator("\n").withQuoteMode(QuoteMode.ALL).withQuote('"');
CSVPrinter csvPrinter = new CSVPrinter(csvContent, csvFormat);
// Prepare a list of strings gathered from the DB. I explicitly use a list of strings because I need to perform some text editing on the DB content before writing it to the CSV
List<String> fasciaOrariaRecord = new ArrayList<>();
fasciaOrariaRecord.add(...);
fasciaOrariaRecord.add(...);
// ...
csvPrinter.printRecord(csvHeader);
// more rows...
csvPrinter.close();

Any value with line endings should be escaped with quotes. If your CSV library is not doing this for you automatically I'd recommend using univocity-parsers. In your particular case, there is a pre-built routine you can use to dump database contents into CSV.
Try this:
ResultSet resultSet = statement.executeQuery("SELECT * FROM table");
//Get a CSV writer settings object pre-configured for Excel
CsvWriterSettings writerSettings = Csv.writeExcel();
writerSettings.setHeaderWritingEnabled(true); //writes the column names to the output file
CsvRoutines routines = new CsvRoutines(writerSettings);
//use an encoding Excel likes
routines.write(resultSet, new File("/path/to/output.csv"), "windows-1252");
Hope this helps.
Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license)
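If you decide to remove the line breaks yourself before handing the values to the CSV writer, a minimal sketch could look like the one below. It uses only the JDK; the class and method names (LineBreakSanitizer, stripLineBreaks, sanitizeRecord) are made up for illustration and are not part of Commons CSV or univocity-parsers. Replacing each line break with a single space is an assumption about what you want; a quoted field containing a line break is valid CSV, so this step is only needed if the consumer cannot cope with it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineBreakSanitizer {

    // Replaces every line break sequence matched by the regex \R
    // (for example \r\n, \n or \r) with a single space. Null stays null.
    static String stripLineBreaks(String value) {
        return value == null ? null : value.replaceAll("\\R", " ");
    }

    // Returns a cleaned copy of one record; the input list is not modified.
    static List<String> sanitizeRecord(List<String> record) {
        List<String> cleaned = new ArrayList<>(record.size());
        for (String field : record) {
            cleaned.add(stripLineBreaks(field));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("42", "first line\r\nsecond line");
        // The cleaned list could then be passed to csvPrinter.printRecord(...)
        System.out.println(sanitizeRecord(row));
    }
}
```

The cleaned list can be passed to printRecord in place of the raw one, so the rest of the export code stays unchanged.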

Related

Generate CSV via Apache CSV in UTF-8

How do I write a CSV file in UTF-8 via Apache CSV?
I am trying to generate a CSV with the following code, where Files.newBufferedWriter() encodes text as UTF-8 by default, but when I open the generated file in Excel it shows meaningless characters.
I create the CSVPrinter like this:
CSVPrinter csvPrinter = new CSVPrinter(Files.newBufferedWriter(Paths.get(filePath)), CSVFormat.EXCEL);
Next, I set the headers:
csvPrinter.printRecord(headers);
and then, in a loop, I print the values to the writer like this:
csvPrinter.printRecord("value1", "valu2", ...);
I also tried uploading the file to an online CSV lint validator, and it says that I am using ASCII-8BIT instead of UTF-8. What did I do wrong?

Microsoft software tends to assume windows-12* or UTF-16LE charsets, unless the content starts with a byte order mark which the software will use to identify the charset. Try adding a byte order mark at the start of your file:
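try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(filePath))) {
    // Write the byte order mark before any CSV content
    writer.write('\ufeff');
    // CSVPrinter needs a CSVFormat as its second argument
    CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.EXCEL);
    //...
}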
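To check that the byte order mark really ends up in the file, a small JDK-only sketch like the following can help. The class and method names (BomCheck, writeWithBom, startsWithUtf8Bom, roundTrip) are invented for this example, and the temporary file stands in for your real output path.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomCheck {

    // Writes a UTF-8 file that starts with a byte order mark, followed by one line of text.
    static void writeWithBom(Path target, String line) {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write('\ufeff');
            writer.write(line);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the file begins with the bytes EF BB BF (the UTF-8 encoded BOM).
    static boolean startsWithUtf8Bom(Path file) {
        try {
            byte[] bytes = Files.readAllBytes(file);
            return bytes.length >= 3
                    && (bytes[0] & 0xFF) == 0xEF
                    && (bytes[1] & 0xFF) == 0xBB
                    && (bytes[2] & 0xFF) == 0xBF;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temporary file with a BOM, checks its first bytes, then deletes it.
    static boolean roundTrip(String line) {
        try {
            Path tmp = Files.createTempFile("bom-demo", ".csv");
            try {
                writeWithBom(tmp, line);
                return startsWithUtf8Bom(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("name;caf\u00e9"));
    }
}
```

The same startsWithUtf8Bom check can be pointed at the file produced by CSVPrinter to confirm the mark was written before the first record.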

How to keep zero begin string when export data using opencsv library

I am using the opencsv library in Java to export CSV, but I have a problem. When a string begins with a zero, such as 0123456, the exported value loses the zero and shows up as 123456. I tried this:
"\"\t"+"0123456"+ "\""; but then the exported CSV shows "0123456", which I don't want. I want 0123456. I don't want to edit the file in Excel because some end users don't know how to do that. How can I export a CSV with opencsv and keep the leading zero? Please help.

I think the problem is not really in generating the CSV but in the way Excel treats the data when the file is opened via Explorer.
I tried this code and viewed the CSV in a text editor (not Excel). Notice that it shows up correctly there, though when it is opened in Excel the leading 0s are lost!
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
// feed in your array (or convert your data to an array)
String[] entries = "0123131#21212#021213".split("#");
List<String[]> a = new ArrayList<>();
a.add(entries);
//don't apply quotes
writer.writeAll(a,false);
writer.close();
If you are really sure that you want the leading 0s to be visible for numeric values when the user opens the file in Excel, then each cell entry must be in the format ="dataHere"; see the code below:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
// feed in your array (or convert your data to an array)
String[] entries = "=\"0123131\"#=\"21212\"#=\"021213\"".split("#");
List<String[]> a = new ArrayList<>();
a.add(entries);
writer.writeAll(a);
writer.close();
Excel now shows the leading zeros when the file is opened from Windows Explorer (by double-clicking).
But if we now view the CSV in a text editor, it shows the data as modified to "suit" Excel, with the ="..." wrapper around each value.
Also see:
format-number-as-text-in-csv-when-open-in-both-excel-and-notepad
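As a rough sketch of the ="dataHere" technique without hand-writing escaped strings, the helper below builds the wrapper for any value. It is JDK-only; ExcelTextCell, asExcelText and csvQuote are hypothetical names, and csvQuote only illustrates the usual CSV rule of doubling embedded quotes (a CSV library normally does that step for you).

```java
public class ExcelTextCell {

    // Wraps a value as a formula that evaluates to a text literal, e.g. ="0123131".
    // Double quotes inside the value are doubled so the literal stays well formed.
    static String asExcelText(String value) {
        return "=\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Quotes a field in the usual CSV way: surround with quotes, double embedded quotes.
    static String csvQuote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String cell = asExcelText("0123131");
        System.out.println(cell);           // ="0123131"
        System.out.println(csvQuote(cell)); // "=""0123131"""
    }
}
```

The second line of output is what a text editor would show for such a field once a quoting CSV writer has processed it, which is why the file looks odd outside Excel.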

Have you tried using a string like "'"+"0123456"? The ' character marks the number as text when it is parsed into Excel.

For me, OpenCSV works correctly (version 5.6).
For example, my CSV file has a row like the following extract:
"999739059";;;"abcdefgh";"001024";
and opencsv correctly reads the last field as 001024. Of course, I have mapped the field to a String, not to a Double.
If you still have problems, you can grab a simple yet powerful parser that fully adheres to the RFC 4180 standard:
mykong.com
Mykong shows some examples that use opencsv directly and, at the end, writes a simple parser that you can use if you don't want to import OpenCSV. The parser works very well.
Its source code is easy to understand, so you can modify or customize it for your needs.

opencsv content - values after comma

I am using Java and opencsv (2.3) to create CSV files.
The file is created properly, but when I open it, all the data appears in a single column.
To split the values into separate columns:
1. I select "Text to Columns" in the Data tab of Excel.
2. I select ";" as the delimiter.
All the values are then split into separate columns properly, but the values after the comma vanish.
This is the CSVWriter code I use to create the CSV files:
File file = new File(fileName);
CSVWriter writer = new CSVWriter(new FileWriter(fileName, true), ';');
// Five values are written per row, so the array needs five slots (new String[4] would overflow at col[4])
String[] col = new String[5];
for (Customer c : CustomerList) {
    col[0] = c.getCustomerName();
    col[1] = c.getCustomerId();
    col[2] = c.getCustomerBirthDate();
    col[3] = c.getRegFee(); // e.g. 145,65
    col[4] = c.getRegPlace();
    writer.writeNext(col);
}
writer.close();
CSV File - Actual content:
"Micky";"1";"19901220";"455,56";"Place1"
"Grace";"2";"19901231";"465,87";"Place2"
CSV File - while opening using excel:
"Micky";"1";"19901220";"455" // , 56 and Place1 are vanished
"Grace";"2";"19901231";"465" // , 87 and Place2 are vanished

I think the problem is to do with the way you're importing it to Excel.
Using your sample above, I've created a CSV file and opened it in Notepad to verify the content.
If you double-click a CSV file (and have Excel associated with that file type) it will open in Excel and it looks like Excel is attempting to use the comma as a delimiter by default. It displays the data across 2 columns.
If you open Excel and then import the CSV file, you can tell Excel that your file is delimited and that the semicolon is the delimiter. Import it using the From Text menu item on the Data tab.
It will then display correctly.

Error Parsing due to CSV Differences Before/After Saving (Java w/ Apache Commons CSV)

I have a 37 column CSV file that I am parsing in Java with Apache Commons CSV 1.2. My setup code is as follows:
//initialize FileReader object
FileReader fileReader = new FileReader(file);
//initialize CSVFormat object
CSVFormat csvFileFormat = CSVFormat.DEFAULT.withHeader(FILE_HEADER_MAPPING);
//initialize CSVParser object
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
//Get a list of CSV file records
List<CSVRecord> csvRecords = csvFileParser.getRecords();
// process accordingly
My problem is that when I copy the CSV to be processed to my target directory and run my parsing program, I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Index for header 'Title' is 7 but CSVRecord only has 6 values!
at org.apache.commons.csv.CSVRecord.get(CSVRecord.java:110)
at launcher.QualysImport.createQualysRecords(Unknown Source)
at launcher.QualysImport.importQualysRecords(Unknown Source)
at launcher.Main.main(Unknown Source)
However, if I copy the file to my target directory, open and save it, then try the program again, it works. Opening and saving the CSV adds back the commas needed at the end, so my program won't complain about not having enough values for the headers.
For context, here is a sample line of before/after saving:
Before (failing): "data","data","data","data"
After (working): "data","data",,,,"data",,,"data",,,,,,
So my question: why does the CSV format change when I open and save it? I'm not changing any values or encoding, and the behavior is the same for MS-DOS or regular .csv format when saving. Also, I'm using Excel to copy/open/save in my testing.
Is there some encoding or format setting I need to be using? Can I solve this programmatically?
Thanks in advance!
EDIT #1:
For additional context, when I first view an empty line in the original file, it just has the new line ^M character like this:
^M
After opening in Excel and saving, it looks like this with all 37 of my empty fields:
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,^M
Is this a Windows encoding discrepancy?

Maybe that's a compatibility issue with whatever generated the file in the first place. It seems that Excel accepts a blank line as a valid row with empty strings in each column, with the number of columns to match some other row(s). Then it saves it according to CSV conventions with the column delimiter.
(the ^M is the Carriage Return character; on Microsoft systems it precedes the Line Feed character at the end of a line in text files)
Perhaps you can deal with it by creating your own Reader subclass to sit between the FileReader and the CSVParser. Your reader will read a line, and if it is blank then return a line with the correct number of commas. Otherwise just return the line as-is.
For example:
class MyCSVCompatibilityReader extends BufferedReader
{
    private final BufferedReader delegate;
    public MyCSVCompatibilityReader(final FileReader fileReader)
    {
        super(fileReader); // BufferedReader has no no-argument constructor
        this.delegate = new BufferedReader(fileReader);
    }
    @Override
    public String readLine() throws IOException
    {
        final String line = this.delegate.readLine();
        // readLine() returns null at end of input, so check before calling trim()
        if (line != null && "".equals(line.trim()))
        { return ",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"; }
        else
        { return line; }
    }
}
There are a lot of other details to get right when subclassing a reader this way. You'll need to pass through calls to all the other methods (close, ready, reset, skip, etc.), and ensure that each of the various read() methods works correctly. If the file fits in memory easily, it might be easier to just read the file, write the fixed version to a new StringWriter, and then pass a StringReader over that content to the CSVParser.
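The in-memory alternative could be sketched roughly as follows, using only the JDK. BlankLinePadder and padBlankLines are made-up names, and the column count is passed in rather than hard-coded. Two assumptions to be aware of: the sketch works line by line, so it would also touch a blank line that sits inside a quoted multi-line field, and it only pads completely blank lines, not rows that are merely shorter than the header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class BlankLinePadder {

    // Copies the input line by line into memory. A blank line is replaced by
    // (columnCount - 1) commas, i.e. a record of columnCount empty fields.
    static String padBlankLines(Reader input, int columnCount) {
        StringBuilder emptyRecord = new StringBuilder();
        for (int i = 1; i < columnCount; i++) {
            emptyRecord.append(',');
        }
        StringWriter fixed = new StringWriter();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                fixed.write(line.trim().isEmpty() ? emptyRecord.toString() : line);
                fixed.write("\r\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fixed.toString();
    }

    public static void main(String[] args) {
        String raw = "\"a\",\"b\",\"c\"\r\n\r\n\"d\",\"e\",\"f\"\r\n";
        // This reader is what would be handed to the CSV parser instead of the FileReader.
        Reader forParser = new StringReader(padBlankLines(new StringReader(raw), 3));
        System.out.print(padBlankLines(new StringReader(raw), 3));
    }
}
```

For the file in the question, the call would use 37 as the column count and a FileReader as the input.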

Maybe try this:
Creates a parser for the given File.
parse(File file, Charset charset, CSVFormat format)
// import java.nio.charset.StandardCharsets;
//StandardCharsets.UTF_8
Note: This method internally creates a FileReader using FileReader.FileReader(java.io.File) which in turn relies on the default encoding of the JVM that is executing the code.

Or maybe try withAllowMissingColumnNames?
//initialize CSVFormat object
CSVFormat csvFileFormat = CSVFormat.DEFAULT.withHeader(FILE_HEADER_MAPPING).withAllowMissingColumnNames();

Java OpenCSV - 2 List comparison and duplication

I am going to make an application that compares two .csv lists using OpenCSV. It should work like this:
Open the 2 .csv files (each file has the columns Name and Emails)
Save the results (and here is a problem: I don't know whether they should be saved to a table or something else)
Compare the values of the "Emails" column in List1 and List2.
If an email from List1 appears in List2, delete it (from List1)
Export the results to a new .csv file
I don't know if it's a good algorithm. Please tell me which option for saving the results of reading the .csv files is best in this case.
Kind regards

You can get around this more easily with univocity-parsers as it can read your data into columns:
CsvParserSettings parserSettings = new CsvParserSettings(); //parser config with many options, check the tutorial
parserSettings.setHeaderExtractionEnabled(true); // uses the first row as headers
// To get the values of all columns, use a column processor
ColumnProcessor rowProcessor = new ColumnProcessor();
parserSettings.setRowProcessor(rowProcessor);
CsvParser parser = new CsvParser(parserSettings);
//This will parse everything and pass the data to the column processor
parser.parse(new FileReader(new File("/path/to/your/file.csv")));
//Finally, we can get the column values:
Map<String, List<String>> columnValues = rowProcessor.getColumnValuesAsMapOfNames();
Let's say you parsed the second CSV with that. Just grab the emails and create a set:
Set<String> emails = new HashSet<>(columnValues.get("Email"));
Now just iterate over the first CSV and check if the emails are in the emails set.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
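Independent of which library reads the files, the comparison step itself can be sketched with plain collections. The code below assumes each parsed row is a String[] and that the caller knows the index of the email column; EmailFilter, removeKnownEmails and normalize are hypothetical names. Trimming and lower-casing the addresses before comparing is an assumption of this sketch, not something the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class EmailFilter {

    // Assumption: emails are compared ignoring case and surrounding whitespace.
    static String normalize(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the rows of list1 whose email does not occur anywhere in list2.
    static List<String[]> removeKnownEmails(List<String[]> list1, List<String[]> list2, int emailColumn) {
        Set<String> known = new HashSet<>();
        for (String[] row : list2) {
            known.add(normalize(row[emailColumn]));
        }
        List<String[]> kept = new ArrayList<>();
        for (String[] row : list1) {
            if (!known.contains(normalize(row[emailColumn]))) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> list1 = Arrays.asList(
                new String[] {"Ann", "ann@example.org"},
                new String[] {"Bob", "bob@example.org"});
        List<String[]> list2 = Arrays.asList(
                new String[] {"Robert", "BOB@example.org"},
                new String[] {"Cid", "cid@example.org"});
        for (String[] row : removeKnownEmails(list1, list2, 1)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because the lookup goes through a HashSet, each row of the first list is checked in roughly constant time, and the kept rows can be written straight to the new .csv file.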

If you have a hard requirement to use openCSV then here is what I believe is the easiest solution:
First off I like Jeronimo's suggestion about the HashSet. Read the second csv file first using the CSVToBean and save off the email addresses in the HashSet.
Then create a Filter class that implements the CSVToBeanFilter interface. In the constructor pass in the set and in the allowLine method you look up the email address and return true if it is not in the set (so you have a quick lookup).
Then you pass the filter in the CsvToBean.parse when reading/parsing the first file and all you will get are the records from the first file whose email addresses are not on the second file. The CSVToBeanFilter javadoc has a good example that shows how this works.
Lastly use the BeanToCSV to create a file from the filtered list.
In interest of fairness I am the maintainer of the openCSV project and it is also open source and free (Apache V2.0 license).
