OpenCSV quoting null values

Using the OpenCSV library, when I call StatefulBeanToCsv.write() my null values are wrapped in quotes.
Example:
String[] columns = new String[] {
"Col1",
"Col2",
"Col3"
};
ColumnPositionMappingStrategy strat = new ColumnPositionMappingStrategy();
strat.setColumnMapping(columns);
Writer writer = new FileWriter(outputFilePath);
StatefulBeanToCsv beanToCsv = new StatefulBeanToCsvBuilder(writer)
.withMappingStrategy(strat)
.build();
beanToCsv.write(items);
writer.close();
will produce:
1,"",3
When I expect:
"1",,"3"
I have not set quotes to all fields via .withApplyQuotesToAll(true).
If I do use .withApplyQuotesToAll(true), I end up with
"1","","3"
At one point, it appears the library did the opposite of this:
OpenCSV CSVWriter does not add quote character for null element
How can I get null values written as a blank/empty value, rather than as a quoted empty string ("")?

It looks like the method that you mentioned not calling actually takes a boolean. Have you tried the following?
.withApplyQuotesToAll(false)

There is a way to do that. Setting .withApplyQuotesToAll(false) tells OpenCSV to quote only elements that contain special characters, but we can change what OpenCSV treats as special by extending the CSVWriter class like this:
public class CustomCsvWriter extends CSVWriter {
public CustomCsvWriter(Writer writer) {
super(writer);
}
@Override
protected boolean stringContainsSpecialCharacters(String line) {
return !line.isEmpty();
}
}
So, you can create a StatefulBeanToCsv like this:
new StatefulBeanToCsvBuilder<>(new CustomCsvWriter(writer))
.withMappingStrategy(strat)
.withApplyQuotesToAll(false)
.build();
Tested with OpenCSV 5.3
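To make the effect of that override concrete, here is a library-free sketch of the same quoting rule: every non-empty value is quoted, and a null or empty value is written as nothing at all. The class and method names are made up for illustration and are not part of OpenCSV; the sketch only assumes the usual CSV convention of doubling embedded quote characters.

```java
import java.util.StringJoiner;

// Hypothetical helper, not an OpenCSV class: mirrors the "quote unless empty" rule.
final class QuoteNonEmptySketch {

    /** Joins one row with commas, quoting non-empty values and leaving null or empty ones bare. */
    static String formatRow(String... values) {
        StringJoiner row = new StringJoiner(",");
        for (String value : values) {
            if (value == null || value.isEmpty()) {
                // Nothing between the separators, so the field reads back as blank.
                row.add("");
            } else {
                // Double any embedded quote, then wrap the value in quotes.
                row.add('"' + value.replace("\"", "\"\"") + '"');
            }
        }
        return row.toString();
    }
}
```

Calling formatRow("1", null, "3") returns "1",,"3", which is the shape the question asks for.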

Related

Univocity - writing out surrounding quotes even if field does not contain delimiter char

I have a file unloaded from a database in such a way that all varchar columns are surrounded by quotes, regardless of the actual content of the column (unfortunately the unload process is out of my control).
Like this:
1,"Alex ,/,awesome/,","chan"
2,"Peter ,boring","pitt"
When using the following code with univocity 2.2.3 in the pom:
public class Sample {
public static void main(String[] args) throws IOException {
BeanListProcessor<Person> rowProcessor = new BeanListProcessor<Person>(Person.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setProcessor(rowProcessor);
parserSettings.getFormat().setDelimiter(',');
parserSettings.getFormat().setQuote('"');
parserSettings.getFormat().setQuoteEscape('/');
CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader("src/main/resources/person.csv"));
List<Person> beans = rowProcessor.getBeans();
Writer outputWriter = new FileWriter("src/main/resources/personOut.csv", true);
CsvWriterSettings settings = new CsvWriterSettings();
settings.getFormat().setDelimiter(',');
settings.getFormat().setQuote('"');
settings.getFormat().setQuoteEscape('/');
settings.getFormat().setCharToEscapeQuoteEscaping('\0');
settings.setRowWriterProcessor(new BeanWriterProcessor<Person>(Person.class));
CsvWriter writer = new CsvWriter(outputWriter, settings);
for (Person person : beans) {
writer.processRecord(person);
}
writer.close();
}
}
Only the columns containing the delimiter are surrounded by quotes:
1,"Alex ,/,awesome/,",chan
2,"Peter ,boring",pitt
When using settings.setQuoteAllFields(true); on the writer setting, all the fields get surrounded by quotes, but now the non varchar fields are in trouble.
How do I surround only the columns that are surrounded by quotes from the source with quotes regardless of the content of the column (e.g. delimiter is or is not present)?
Desired result:
1,"Alex ,/,awesome/,","chan"
2,"Peter ,boring","pitt"

The CSV writer doesn't provide an explicit mechanism to configure this, but you can do the following:
Parse with this:
parserSettings.setKeepQuotes(true);
parserSettings.setKeepEscapeSequences(true);
These two settings will effectively work as a "split" operation over your input CSV - you will get the entire content between delimiters. Using your sample input, the values will be parsed as:
1 | "Alex ,/,awesome/," | chan |
2 | "Peter boring" | pitt |
I'm using pipes to separate the values above to make it easier to visualize what comes out.
Now for the hacky bit. I can't guarantee this will work with future versions of the library, as it uses internal APIs: the CsvWriter has a processRow method which you can override. As your input values already arrive formatted the way you want them, you can dump them out "as-is" by joining the values of each row with commas. Just do the following:
CsvWriter writer = new CsvWriter(outputWriter, settings){
@Override
protected void processRow(Object[] row) {
for(int i = 0; i < row.length; i++){
Object value = row[i];
appender.append(value.toString());
if(i + 1 < row.length) { //not the last column
appender.append(',');
}
appendValueToRow();
}
}
};
This will produce the output you expect, but I'm not sure if it's very useful because you simply depend on the input to be properly formatted and making changes over it will complicate things quite a bit.
The appropriate thing to do here is to add an additional configuration option to the library that would allow you to configure whether to quote a given column or not.
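As a rough idea of what such a per-column option would do, the following library-free sketch quotes a value only when its column is flagged. It is not univocity-parsers code; the class name, method name and the quoteColumn flag array are assumptions made for illustration.

```java
// Hypothetical helper, not part of univocity-parsers.
final class ColumnQuotingSketch {

    /**
     * Joins one row with the delimiter, wrapping a value in quotes only when its column is flagged.
     * Embedded quote characters are prefixed with the escape character; the escape character
     * itself is left alone. Unflagged columns are written as-is, so they must not contain
     * the delimiter.
     */
    static String formatRow(Object[] row, boolean[] quoteColumn, char delimiter, char quote, char escape) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                out.append(delimiter);
            }
            String value = String.valueOf(row[i]);
            if (quoteColumn[i]) {
                out.append(quote);
                for (char c : value.toCharArray()) {
                    if (c == quote) {
                        out.append(escape);
                    }
                    out.append(c);
                }
                out.append(quote);
            } else {
                out.append(value);
            }
        }
        return out.toString();
    }
}
```

With the flags {false, true, true}, the delimiter ',', the quote '"' and the escape '/', the row 1, Alex ,/,awesome/,, chan comes out as 1,"Alex ,/,awesome/,","chan".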

Apache common CSVParser/CSVRecord to return null for empty fields

Hi all,
I have a question about the Apache Commons CSV CSVParser/CSVRecord. Take a look at the CSV file below:
Header1,Header2,Header3
"",,"L1C3"
CSVParser/CSVRecord returns "" for the first two columns. In my case I want to distinguish an empty string ("") from a null value. Is there a configuration I can set so that CSVParser returns null for the second column?
Thank you.

I've used this format:
CSVFormat.RFC4180.withFirstRecordAsHeader()
.withIgnoreSurroundingSpaces()
.withNullString("")
The two relevant settings are:
ignore surrounding spaces - trims each value on both sides; a value that is all spaces is trimmed to an empty string
null string - treats those empty strings as null
Here's a sample usage:
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.junit.Test;
public class CsvParseTest {
@Test
public void testParseWillTrimAndConvertToNull() throws Exception {
String CSV_HEADER = "Name,MobileNo,Location";
String CSV_ROW_1 = "abc,   ,australia"; // MobileNo is three spaces
CSVParser parse = CSVFormat.RFC4180.withFirstRecordAsHeader().withIgnoreSurroundingSpaces().withNullString("")
.parse(new BufferedReader(new StringReader(CSV_HEADER + "\n" + CSV_ROW_1)));
CSVRecord rec = parse.getRecords().get(0);
assertEquals("abc", rec.get("Name"));
assertNull(rec.get("MobileNo"));
assertEquals("australia", rec.get("Location"));
}
}
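The combined effect of those two settings can be pictured as "trim, then compare against the null string". The sketch below shows that idea without any library; it is an approximation of the behaviour described in this answer, not Commons CSV's actual implementation, and the class and method names are made up.

```java
// Hypothetical helper, not part of Apache Commons CSV.
final class NullStringSketch {

    /** Trims the raw field, then returns null when the trimmed text equals nullString. */
    static String normalize(String raw, String nullString) {
        String trimmed = raw.trim();
        return trimmed.equals(nullString) ? null : trimmed;
    }
}
```

With nullString set to "", a field of three spaces becomes null, while " abc " becomes "abc".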

I think uniVocity-parsers is the only library that allows you to distinguish empty strings from nulls (I know this won't address your problem with Apache Commons CSV directly, but at least there's a way to get what you need).
Here's how to do it:
public static void main(String ... args){
String input = "Header1,Header2,Header3\n" +
"\"\",,\"L1C3\"";
CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial.
settings.setEmptyValue("I'm empty"); //value to use when the parser finds "". Set to "" to get an empty String.
settings.setNullValue("I'm null"); //value to use when the parser finds a null value (i.e. ,,).
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new StringReader(input));
for(String[] row : allRows){
System.out.println(Arrays.toString(row));
}
}
This will produce the following output:
[Header1, Header2, Header3]
[I'm empty, I'm null, L1C3]
uniVocity-parsers is also 3 times faster than Apache Commons CSV and has way more features.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

In the end I didn't find a good way to return null with the Apache Commons CSV library. I switched to OpenCSV 3.6; here is the code I used, which I also posted on another thread. Thanks to everyone who suggested OpenCSV.
CSVReaderBuilder has withFieldAsNull() for this purpose.
CSVReader csvReader = new CSVReaderBuilder(csvFileReader)
.withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
.build();
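The distinction being asked for (nothing between two separators is null, while "" is an empty string) can be sketched without any library. The splitter below is a minimal illustration for a single line, assuming comma separators, double-quote quoting and doubled quotes as the escape; it is not OpenCSV's parser, and its names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-line splitter, not an OpenCSV class.
final class NullAwareSplitter {

    /** Splits one CSV line; an unquoted empty field becomes null, a quoted empty field stays "". */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean sawQuote = false; // true once the current field has opened a quote
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
                sawQuote = true;
            } else if (c == ',') {
                fields.add(finish(current, sawQuote));
                current.setLength(0);
                sawQuote = false;
            } else {
                current.append(c);
            }
        }
        fields.add(finish(current, sawQuote));
        return fields;
    }

    private static String finish(StringBuilder current, boolean sawQuote) {
        // Only a field with no characters and no quotes counts as null.
        return (current.length() == 0 && !sawQuote) ? null : current.toString();
    }
}
```

For the data row from the question, "",,"L1C3", this yields an empty string, then null, then L1C3.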

In Apache Commons CSV 1.2, the CSVFormat method withNullString() converts a chosen string to null when reading. That string could be "", "N/A" or "Nill", according to your requirement.
CSVFormat csvFormat = CSVFormat.DEFAULT.withNullString("");
CSVParser csvParser = new CSVParser(fileReader, csvFormat);
This gives null, null, L1C3 for the record in the question.
Note: empty fields are read as empty strings, so with this setting they are converted to null as well.

Supercsv - unable to find method exception

I have the below implementation.
csvReader = new CsvBeanReader(new InputStreamReader(stream), CsvPreference.STANDARD_PREFERENCE);
lastReadIdentity = (T) csvReader.read(Packages.class, Packages.COLS);
In my Packages class I have declared the unitCount property:
public String getUnitCount() {
return unitCount;
}
public void setUnitCount(String unitCount) {
this.unitCount = unitCount;
}
This works fine when the property is a String, but when I change it to an int, it throws the exception below. Please help.
private int unitCount;
public int getUnitCount() {
return unitCount;
}
public void setUnitCount(int unitCount) {
this.unitCount = unitCount;
}
Exception:
org.supercsv.exception.SuperCsvReflectionException: unable to find method setUnitCount(java.lang.String) in class com.directv.sms.data.SubscriberPackages - check that the corresponding nameMapping element matches the field name in the bean, and the cell processor returns a type compatible with the field
context=null
at org.supercsv.util.ReflectionUtils.findSetter(ReflectionUtils.java:139)
at org.supercsv.util.MethodCache.getSetMethod(MethodCache.java:95)

I'm not sure about SuperCsv, but univocity-parsers should be able to handle this without a hitch, not to mention it is at least 3 times faster to parse your input.
Just annotate your class:
public class SubscriberPackages {
@Parsed(defaultNullRead = "0") // if the file contains nulls, then they will be converted to 0.
private int unitCount; // The attribute name will be matched against the column header in the file automatically.
}
To parse the CSV into beans:
// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<SubscriberPackages> rowProcessor = new BeanListProcessor<SubscriberPackages>(SubscriberPackages.class);
CsvParserSettings parserSettings = new CsvParserSettings(); //many options here, check the tutorial.
parserSettings.setRowProcessor(rowProcessor); //uses the bean processor to handle your input rows
parserSettings.setHeaderExtractionEnabled(true); // extracts header names from the input file.
CsvParser parser = new CsvParser(parserSettings); //creates a parser with your settings.
parser.parse(new FileReader(new File("/path/to/file.csv"))); //all rows parsed here go straight to the bean processor
// The BeanListProcessor provides a list of objects extracted from the input.
List<SubscriberPackages> beans = rowProcessor.getBeans();
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
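On the exception itself: the message says the reader looked for setUnitCount(java.lang.String), which suggests the cell was still a String when the setter was looked up, so the text has to be converted to an int before the setter can be called. The sketch below shows that idea with plain reflection. It is not Super CSV code; the class names and the setFromCell helper are made up for illustration.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of "convert the cell, then call the setter"; not Super CSV code.
final class SetterConversionSketch {

    /** Stand-in bean with the int-typed property from the question. */
    static final class Packages {
        private int unitCount;
        public int getUnitCount() { return unitCount; }
        public void setUnitCount(int unitCount) { this.unitCount = unitCount; }
    }

    /** Finds a one-argument setter by name and converts the cell text to the parameter type. */
    static void setFromCell(Object bean, String setterName, String cell) {
        try {
            for (Method m : bean.getClass().getMethods()) {
                if (!m.getName().equals(setterName) || m.getParameterCount() != 1) {
                    continue;
                }
                Class<?> type = m.getParameterTypes()[0];
                if (type == String.class) {
                    m.invoke(bean, cell);
                    return;
                }
                if (type == int.class || type == Integer.class) {
                    m.invoke(bean, Integer.parseInt(cell.trim()));
                    return;
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not call " + setterName, e);
        }
        throw new IllegalStateException("no usable setter " + setterName);
    }
}
```

Calling setFromCell(bean, "setUnitCount", "42") on a Packages instance leaves getUnitCount() returning 42.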

How to rename Columns via Lambda function - fasterXML

I'm using the FasterXML library to parse my CSV file. The CSV file has the column names in its first line. Unfortunately I need the columns to be renamed. I have a lambda function for this, to which I can pass the value read from the CSV file and get the new value back.
My code looks like this, but it does not work.
CsvSchema csvSchema =CsvSchema.emptySchema().withHeader();
ArrayList<HashMap<String, String>> result = new ArrayList<HashMap<String, String>>();
MappingIterator<HashMap<String,String>> it = new CsvMapper().reader(HashMap.class)
.with(csvSchema )
.readValues(new File(fileName));
while (it.hasNext())
result.add(it.next());
System.out.println("changing the schema columns.");
for (int i=0; i < csvSchema.size();i++) {
String name = csvSchema.column(i).getName();
String newName = getNewName(name);
csvSchema.builder().renameColumn(i, newName);
}
csvSchema.rebuild();
When I try to print out the columns later, they are still the same as in the top line of my CSV file.
Additionally, I noticed that csvSchema.size() equals 0. Why?

You could instead use uniVocity-parsers for that. The following solution streams the input rows to the output so you don't need to load everything in memory to then write your data back with new headers. It will be much faster:
public static void main(String ... args) throws Exception{
Writer output = new StringWriter(); // use a FileWriter for your case
CsvWriterSettings writerSettings = new CsvWriterSettings(); //many options here - check the documentation
final CsvWriter writer = new CsvWriter(output, writerSettings);
CsvParserSettings parserSettings = new CsvParserSettings(); //many options here as well
parserSettings.setHeaderExtractionEnabled(true); // indicates the first row of the input are headers
parserSettings.setRowProcessor(new AbstractRowProcessor(){
public void processStarted(ParsingContext context) {
writer.writeHeaders("Column A", "Column B", "... etc");
}
public void rowProcessed(String[] row, ParsingContext context) {
writer.writeRow(row);
}
public void processEnded(ParsingContext context) {
writer.close();
}
});
CsvParser parser = new CsvParser(parserSettings);
Reader reader = new StringReader("A,B,C\n1,2,3\n4,5,6"); // use a FileReader for your case
parser.parse(reader); // all rows are parsed and submitted to the RowProcessor implementation of the parserSettings.
System.out.println(output.toString());
//nothing else to do. All resources are closed automatically in case of errors.
}
You can easily select the columns by using parserSettings.selectFields("B", "A") in case you want to reorder/eliminate columns.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
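If the rows have already been read into maps, as in the question's code, the renaming can also be done on the map keys after parsing, independently of any CSV library. A minimal sketch, assuming the rename function is the question's lambda; the class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper: renames the keys of one parsed row.
final class RenameColumnsSketch {

    /** Returns a copy of the row with every column name passed through the rename function. */
    static Map<String, String> renameKeys(Map<String, String> row, UnaryOperator<String> rename) {
        // LinkedHashMap keeps the columns in their original order.
        Map<String, String> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : row.entrySet()) {
            renamed.put(rename.apply(entry.getKey()), entry.getValue());
        }
        return renamed;
    }
}
```

Applied to each element of the result list, this gives rows keyed by the new names. If two old names map to the same new name, the later column overwrites the earlier one.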

parsing csv using java [duplicate]

Here is the line I am currently using:
File booleanTopicFile;
// booleanTopicFile is csv file uploaded from form
CSVReader csvReader = new CSVReader(new InputStreamReader(new FileInputStream(booleanTopicFile), "UTF-8"));
I want to skip the first line of the CSV, which contains the headings.
I don't want to use any separator other than the default comma (,), which the default constructor already uses.
The parameterized constructor has an option to skip a number of lines, but what should I pass for its 2nd and 3rd parameters?
CSVReader(Reader reader, char c, char c1, int index)
--
Thanks

This constructor of CSVReader class will skip 1st line of the csv while reading the file.
CSVReader reader = new CSVReader(new FileReader(file), ',', '\'', 1);

At least since version 3.8 you can use the CSVReaderBuilder and set it to skip the first line.
Example:
CSVReader reader = new CSVReaderBuilder(inputStreamReader)
.withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
// Skip the header
.withSkipLines(1)
.build();

I found this question and response helpful, and I'd like to expand on Christophe Roussy's comment. In the latest opencsv (2.3 as of this writing), the actual line of code is:
new CSVReader( new StringReader(csvText), CSVParser.DEFAULT_SEPARATOR,
CSVParser.DEFAULT_QUOTE_CHARACTER, 1);
Note that the constants come from CSVParser, not CSVReader.

With the latest opencsv version, use:
CSVReader csvReader = new CSVReaderBuilder(new FileReader("book.csv")).withSkipLines(1).build();
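For comparison, this is what "skip the first N lines" amounts to when done by hand with the standard library. It is only a sketch with made-up names, it returns raw lines without splitting them into fields, and it counts physical lines, so a quoted field that spans several lines would be miscounted.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not an opencsv class.
final class SkipHeaderSketch {

    /** Reads all lines from the source and drops the first skipLines of them. */
    static List<String> readDataLines(Reader source, int skipLines) {
        // The caller stays responsible for closing the source reader.
        BufferedReader reader = new BufferedReader(source);
        return reader.lines().skip(skipLines).collect(Collectors.toList());
    }
}
```

With skipLines set to 1, the heading row is dropped and only the data rows are returned.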

watFileCsvBeans = new CsvToBeanBuilder<ClassType>(isr)
.withType(ClassType.class)
.withIgnoreLeadingWhiteSpace(true)
// CsvToBeanFilter with a custom allowLine implementation
.withFilter(line -> !line[0].equals("skipme"))
.build()
.parse();
This works in my case, whereas withSkipLines did not work for me.
opencsv version: 5.5.2
