Parsing CSV using Java [duplicate]

Here is the line I am currently using:
File booleanTopicFile;
// booleanTopicFile is csv file uploaded from form
CSVReader csvReader = new CSVReader(new InputStreamReader(new FileInputStream(booleanTopicFile), "UTF-8"));
I want to skip the first line of the CSV, which contains the headings.
I don't want to use any separator except the default comma (,), which the default constructor already uses.
The parameterized constructor has an option to skip a number of lines, but how should I deal with its 2nd and 3rd parameters?
CSVReader csvReader = new CSVReader(Reader reader, char c, char c1, int index);
--
Thanks

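This constructor of the CSVReader class (available in older opencsv versions) skips the first line of the CSV while reading the file. The 2nd and 3rd arguments are the separator and the quote character, so passing the defaults (',' and '"') keeps the standard behavior:
CSVReader reader = new CSVReader(new FileReader(file), ',', '"', 1);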
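For comparison, a sketch that keeps the default one-argument constructor (so the default comma separator and double-quote character apply) and simply discards the header by calling readNext() once. It assumes opencsv 5.x, where readNext() declares CsvValidationException; the class and method names are illustrative only:

```java
import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvValidationException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SkipHeaderWithReadNext {
    // Reads all data rows, discarding the first (header) row.
    // The default CSVReader constructor uses ',' as separator and '"' as quote character.
    static List<String[]> readDataRows(Reader source) throws IOException, CsvValidationException {
        List<String[]> rows = new ArrayList<>();
        try (CSVReader csvReader = new CSVReader(source)) {
            csvReader.readNext(); // discard the header line (returns null for empty input)
            String[] row;
            while ((row = csvReader.readNext()) != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String csv = "id,topic\n1,java\n2,\"csv, parsing\"\n";
        for (String[] row : readDataRows(new StringReader(csv))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

This works the same way whether the input comes from a StringReader or from the InputStreamReader in the question.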

At least since version 3.8 you can use the CSVReaderBuilder and set it to skip the first line.
Example:
CSVReader reader = new CSVReaderBuilder(inputStreamReader)
.withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
// Skip the header
.withSkipLines(1)
.build();
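Applied to the question's setup, a minimal sketch might look like this. It assumes opencsv 5.x (where readAll() declares CsvException) and that the uploaded file is UTF-8 encoded; SkipHeaderWithBuilder and readWithoutHeader are hypothetical names:

```java
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.exceptions.CsvException;
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

class SkipHeaderWithBuilder {
    // Opens the file as UTF-8 and skips its first (header) line.
    // The default parser settings (',' separator, '"' quote) are kept.
    static List<String[]> readWithoutHeader(File booleanTopicFile) throws IOException, CsvException {
        try (Reader reader = Files.newBufferedReader(booleanTopicFile.toPath(), StandardCharsets.UTF_8);
             CSVReader csvReader = new CSVReaderBuilder(reader)
                     .withSkipLines(1) // number of lines to skip before parsing starts
                     .build()) {
            return csvReader.readAll();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String[] row : readWithoutHeader(new File(args[0]))) {
            System.out.println(String.join(", ", row));
        }
    }
}
```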

I found this question and its answers helpful, and I'd like to expand on Christophe Roussy's comment. In the latest opencsv (2.3 as of this writing), the actual line of code is:
new CSVReader( new StringReader(csvText), CSVParser.DEFAULT_SEPARATOR,
CSVParser.DEFAULT_QUOTE_CHARACTER, 1);
Note that the default separator and quote constants are taken from CSVParser, not CSVReader.

With the latest opencsv version, use:
CSVReader csvReader = new CSVReaderBuilder(new FileReader("book.csv")).withSkipLines(1).build();

watFileCsvBeans = new CsvToBeanBuilder<ClassType>(isr)
.withType(ClassType.class)
.withIgnoreLeadingWhiteSpace(true)
// CsvToBeanFilter with a custom allowLine implementation
.withFilter(line -> !line[0].equals("skipme"))
.build()
.parse();
This was useful in my case; "withSkipLines", on the other hand, did not work for me.
opencsv version: 5.5.2


Related

How to skip comments in a record

I'm using the Apache Commons CSV 1.9.0 library to parse a CSV file. The problem is that I cannot set the comment marker "#" so that comments fill the comment field of the record and can be skipped when looping through the file.
This is the code I'm using:
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
Reader reader = Files.newBufferedReader((Paths.get(filename)), StandardCharsets.UTF_16LE);
CSVFormat csvFormat = CSVFormat.DEFAULT;
csvFormat.builder().setCommentMarker('#');
Iterable<CSVRecord> records = csvFormat.parse(reader);
char marker = csvFormat.getCommentMarker(); // marker is for test and it is empty.
for (CSVRecord record : records)
{
if (record.isSet(SHEET_COLUMN_1))
{
// TODO
}
}
Can you please help me with this?
Kind regards,
Maan
csvFormat.builder() creates a new builder instance, but you keep using the old csvFormat instance, whose settings are unchanged (CSVFormat is immutable). Build a new format and use that instead:
CSVFormat csvFormat = CSVFormat.DEFAULT
.builder()
.setCommentMarker('#')
.build();
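A self-contained sketch of the corrected approach, assuming Commons CSV 1.9.0 or later (for CSVFormat.Builder). Lines starting with '#' should not come back as records; as far as I can tell, a comment preceding a record is made available through record.getComment(). CommentMarkerDemo and parseSkippingComments are illustrative names:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

class CommentMarkerDemo {
    static List<CSVRecord> parseSkippingComments(Reader reader) throws IOException {
        // CSVFormat is immutable: keep and use the instance returned by build().
        CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
                .setCommentMarker('#')
                .build();
        try (CSVParser parser = csvFormat.parse(reader)) {
            // Lines that start with '#' are not returned as records.
            return parser.getRecords();
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "# exported sheet\nalpha,1\n# another comment\nbeta,2\n";
        for (CSVRecord record : parseSkippingComments(new StringReader(csv))) {
            System.out.println(record.get(0) + " comment=" + record.getComment());
        }
    }
}
```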

OpenCSV reads an additional byte value together with the first line's first value in Java

I was working on a project where we use OpenCSV to read CSV files and fill a database with them at startup. I noticed something strange: in certain cases a given identifier value cannot be queried. While debugging, I found that OpenCSV does not read the CSV correctly.
Let's say that I have the following CSV file:
01;foo
02;bar
...
The first line in the example is also the first line in the real CSV file. The file is encoded in UTF-8. The following code is used to read the values:
try (CSVReader csvReader = CSVUtils.createCSVReader(masterDataCSVPath, csvDelimiter)) {
List<String[]> masterData = csvReader.readAll();
}
The code creating the csvReader:
static private CSVParser createCSVParser(String CSVDelimiter) {
return new CSVParserBuilder().withSeparator(CSVDelimiter.charAt(0)).build();
}
static public CSVReader createCSVReader(String CSVPath, String CSVDelimiter) throws FileNotFoundException {
return new CSVReaderBuilder(new FileReader(CSVPath)).withCSVParser(createCSVParser(CSVDelimiter)).build();
}
When I read the CSV file with this code, the debugger shows the following byte values for 01:
However, if I change my CSV file to the following (notice the newline at the top):
01;foo
02;bar
...
The read-in data becomes:
In this case "all is good": if I remove the first item from my masterData list, I can read the values "properly". However, this is not a clean solution:
It raises the question: why does this happen?
Also, I would rather solve the problem than work around it; this workaround only works if there is a newline at the beginning of the source CSV.
So I kindly ask for help: how can this be fixed?
This is not an OpenCSV-specific problem; rather, FileReader reads in the byte order mark (BOM) at the start of the UTF-encoded file. This is somewhat unexpected, but it makes sense, as nothing tells FileReader that it should exclude those bytes.
The solution is either to remove the BOM manually or, as in my case, to use a library that makes sure it is excluded. I wrote the following utility class:
public class CSVUtils {
private static CSVParser createCSVParser(final String CSVDelimiter) {
return new CSVParserBuilder().withSeparator(CSVDelimiter.charAt(0)).build();
}
private static BOMInputStream versatileBOMInputStreamGenerator(final InputStream inputStream) {
// Detects and skips a leading BOM of any of these kinds; note that the readers below always decode as UTF-8
return new BOMInputStream(inputStream, ByteOrderMark.UTF_8, ByteOrderMark.UTF_16BE, ByteOrderMark.UTF_16LE,
ByteOrderMark.UTF_32LE, ByteOrderMark.UTF_32BE);
}
public static CSVReader createCSVReaderFromFile(final String CSVPath, final String CSVDelimiter) throws FileNotFoundException {
return new CSVReaderBuilder(new InputStreamReader(
versatileBOMInputStreamGenerator(new FileInputStream(CSVPath)), StandardCharsets.UTF_8))
.withCSVParser(createCSVParser(CSVDelimiter)).build();
}
public static CSVReader createCSVReaderFromString(final String content, final String CSVDelimiter) {
byte[] contentBytes = content.getBytes(StandardCharsets.UTF_8);
return new CSVReaderBuilder(new InputStreamReader(
versatileBOMInputStreamGenerator(new ByteArrayInputStream(contentBytes)), StandardCharsets.UTF_8))
.withCSVParser(createCSVParser(CSVDelimiter)).build();
}
}
All I have to do is use the CSVReader objects created this way wherever they are needed. As you can see, the class uses some dependencies, which can be imported with:
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;
These dependencies can be added to the project via the POM as follows:
<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.11.0</version>
</dependency>
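If you'd rather not add commons-io for the UTF-8 case alone, a JDK-only sketch is to decode the stream as UTF-8 and drop a leading U+FEFF character, which is what the UTF-8 BOM decodes to. Unlike the BOMInputStream version, this handles only UTF-8 input; Utf8BomSkipper and utf8ReaderWithoutBom are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class Utf8BomSkipper {
    // Wraps a UTF-8 byte stream in a reader that drops a leading BOM, if present.
    // Java's UTF-8 decoder does not strip the BOM; it yields the character U+FEFF.
    static BufferedReader utf8ReaderWithoutBom(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                  // remember the position before the first character
        if (reader.read() != '\uFEFF') { // not a BOM (or empty input)...
            reader.reset();              // ...so rewind and keep that character
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '0', '1', ';', 'f', 'o', 'o'};
        try (BufferedReader r = utf8ReaderWithoutBom(new ByteArrayInputStream(withBom))) {
            System.out.println(r.readLine()); // prints 01;foo
        }
    }
}
```

The returned reader can be passed to new CSVReaderBuilder(...) in place of new FileReader(CSVPath).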

OpenCSV quoting null values

Using the OpenCSV library, when I call StatefulBeanToCsv.write(), my null values are wrapped in quotes.
Example:
String[] columns = new String[] {
"Col1",
"Col2",
"Col3"
};
ColumnPositionMappingStrategy strat = new ColumnPositionMappingStrategy();
strat.setColumnMapping(columns);
Writer writer = new FileWriter(outputFilePath);
StatefulBeanToCsv beanToCsv = new StatefulBeanToCsvBuilder(writer)
.withMappingStrategy(strat)
.build();
beanToCsv.write(items);
writer.close();
will produce:
1,"",3
When I expect:
"1",,"3"
I have not set quotes to all fields via .withApplyQuotesToAll(true).
If I do use .withApplyQuotesToAll(true), I end up with
"1","","3"
At one point, the library appears to have done the opposite of this:
OpenCSV CSVWriter does not add quote character for null element
How can I get null values written as a blank (empty) field rather than as a quoted empty string?
It looks like the method that you mentioned not calling actually takes a boolean. Have you tried the following?
.withApplyQuotesToAll(false)
There is a way to do that. Setting .withApplyQuotesToAll(false) tells OpenCSV to quote only elements that contain special characters, but we can change what OpenCSV considers special by extending the CSVWriter class like this:
public class CustomCsvWriter extends CSVWriter {
public CustomCsvWriter(Writer writer) {
super(writer);
}
@Override
protected boolean stringContainsSpecialCharacters(String line) {
return !line.isEmpty();
}
}
So, you can create a StatefulBeanToCsv like this:
new StatefulBeanToCsvBuilder<>(new CustomCsvWriter(writer))
.withMappingStrategy(strat)
.withApplyQuotesToAll(false)
.build();
Tested with OpenCSV 5.3
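To see the effect of the override in isolation, here is a sketch that exercises the writer directly with CSVWriter.writeNext(String[], boolean) rather than through StatefulBeanToCsv. It assumes opencsv 5.x; QuoteNonEmptyDemo and toCsvLine are illustrative names. Both a null and an empty string should come out as an unquoted empty field, while non-empty values are quoted:

```java
import com.opencsv.CSVWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class QuoteNonEmptyDemo {
    // Same idea as CustomCsvWriter above: every non-empty value counts as
    // "special" and is therefore quoted; empty values are written as-is.
    static class CustomCsvWriter extends CSVWriter {
        CustomCsvWriter(Writer writer) {
            super(writer);
        }

        @Override
        protected boolean stringContainsSpecialCharacters(String line) {
            return !line.isEmpty();
        }
    }

    static String toCsvLine(String[] values) throws IOException {
        StringWriter out = new StringWriter();
        try (CSVWriter csvWriter = new CustomCsvWriter(out)) {
            csvWriter.writeNext(values, false); // applyQuotesToAll = false
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(toCsvLine(new String[] {"1", null, "3"}));
        System.out.print(toCsvLine(new String[] {"1", "", "3"}));
    }
}
```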

Java OpenCSV Split by pipe limited

I am having an issue when reading from a file using comma split. I can read the file like this:
CSVReader reader = new CSVReader(new FileReader(FileName), '|' , '"' , 0);
Then when I want to get the individual values, I can read them like this:
String[] record = rowString.split(",");
The issue, of course, is that splitting on commas is not the most reliable way to read a file. Is there any way to split the string by the pipe delimiter, like this?
String[] record = rowString.split("\\|");
This is how I am reading the lines; maybe this is the code where I need to make that adjustment?
for(String[] row : allRows){
String rowString = Arrays.toString(row).toString();
String[] record = rowString.split(",");
}
Thank you.
I don't know if this answers the question, but in my case this solved the problem:
val reader: Reader = Files.newBufferedReader(path)
val csvToBean = CsvToBeanBuilder<MyCsvSchema>(reader)
.withType(MyCsvSchema::class.java)
.withSeparator('|')
.withIgnoreLeadingWhiteSpace(true)
.build()
val list = csvToBean.parse()
This is Kotlin code.
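For the original Java question, the same idea with CSVReader is to configure the parser with '|' as the separator. Each String[] returned by the reader is already one record split into fields, so the Arrays.toString(...) and split(...) round-trip is unnecessary (and breaks on values that contain commas). A sketch assuming opencsv 5.x; PipeSeparatedDemo and readPipeSeparated are hypothetical names:

```java
import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.exceptions.CsvException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;

class PipeSeparatedDemo {
    // Parses pipe-delimited input; each String[] is one record already split into fields.
    static List<String[]> readPipeSeparated(Reader source) throws IOException, CsvException {
        CSVParser parser = new CSVParserBuilder()
                .withSeparator('|')
                .withQuoteChar('"') // quoted fields may contain '|'
                .build();
        try (CSVReader reader = new CSVReaderBuilder(source).withCSVParser(parser).build()) {
            return reader.readAll();
        }
    }

    public static void main(String[] args) throws Exception {
        String data = "1|Smith, John|NY\n2|\"Doe | Jane\"|CA\n";
        for (String[] record : readPipeSeparated(new StringReader(data))) {
            System.out.println(record[1]); // the second field, commas included
        }
    }
}
```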

Get CSV file header using apache commons

I have been looking for a solution to my problem for the past 2 hours, in vain. I'm trying to read a CSV file using Apache Commons CSV. I am able to read the whole file, but how do I extract only the header of the CSV into an array?
I looked everywhere and even the solution above didn't work.
For anyone else with this issue, this does.
Iterable<CSVRecord> records;
Reader in = new FileReader(fileLocation);
records = CSVFormat.EXCEL.withHeader().withSkipHeaderRecord(false).parse(in);
Set<String> headers = records.iterator().next().toMap().keySet();
Note that your use of .next() has consumed one row of the CSV.
If you pass the header names explicitly and do not skip the header record, the first record returned by CSVParser is the file's own header row, as in the example below:
CSVFormat csvFileFormat = CSVFormat.DEFAULT.withHeader(FILE_HEADER_MAPPING);
FileReader fileReader = new FileReader("file");
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
List csvRecords = csvFileParser.getRecords();
csvRecords.get(0) will return the header record.
BufferedReader br = new BufferedReader(new FileReader(filename));
CSVParser parser = CSVParser.parse(br, CSVFormat.EXCEL.withFirstRecordAsHeader());
List<String> headers = parser.getHeaderNames();
This worked for me. The last line is what you need: it extracts the headers found by the parser into a List of Strings.
Since Apache Commons CSV 1.9.0, the withSkipHeaderRecord() and withFirstRecordAsHeader() methods are deprecated in favor of a builder interface. Use it as follows:
CSVFormat.DEFAULT.builder()
.setHeader()
.setSkipHeaderRecord(true)
.build();
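Putting the builder form together with getHeaderNames() gives the following sketch, assuming Commons CSV 1.9.0 or later; HeaderNamesDemo and headerNames are illustrative names. setHeader() with no arguments makes the parser take the header names from the first record:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

class HeaderNamesDemo {
    // Returns only the header names, read from the first record of the input.
    static List<String> headerNames(Reader source) throws IOException {
        CSVFormat format = CSVFormat.DEFAULT.builder()
                .setHeader()              // no names given: use the first record as header
                .setSkipHeaderRecord(true)
                .build();
        try (CSVParser parser = format.parse(source)) {
            return parser.getHeaderNames();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(headerNames(new StringReader("name,age\nalice,30\n")));
    }
}
```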
In Kotlin:
val reader = File(path).bufferedReader()
val records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
.withIgnoreHeaderCase()
.withTrim()
.parse(reader)
println(records.headerNames)
The code below works for me:
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;
import org.apache.commons.csv.*;
public static String[] headersInCSVFile (String csvFilePath) throws IOException {
//reading file
CSVFormat csvFileFormat = CSVFormat.DEFAULT;
try (Reader fileReader = new FileReader(csvFilePath);
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat)) {
List<CSVRecord> csvRecords = csvFileParser.getRecords();
//Obtaining the first record and copying its values into an array
CSVRecord firstRecord = csvRecords.get(0);
String[] result = new String[firstRecord.size()];
for (int i = 0; i < firstRecord.size(); i++) {
//.replaceAll("\\s", "") removes spaces
result[i] = firstRecord.get(i).replaceAll("\\s", "");
}
return result;
}
}
