I'm using the Apache Commons CSV 1.9.0 library to parse a CSV file. The problem is that I cannot set the comment marker "#" to fill the comment field in the records, so comment lines can be skipped when looping through the file.
This is the code I'm using:
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
Reader reader = Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_16LE);
CSVFormat csvFormat = CSVFormat.DEFAULT;
csvFormat.builder().setCommentMarker('#');
Iterable<CSVRecord> records = csvFormat.parse(reader);
char marker = csvFormat.getCommentMarker(); // marker is just for testing, and it comes back empty.
for (CSVRecord record : records)
{
if (record.isSet(SHEET_COLUMN_1))
{
// TODO
}
}
Can you please help me with this?
Kind regards,
Maan
CSVFormat.builder() creates a new Builder instance, but you're still using the old csvFormat instance, so the comment marker is never applied.
Use:
CSVFormat csvFormat = CSVFormat.DEFAULT
.builder()
.setCommentMarker('#')
.build();
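For completeness, a minimal sketch of the asker's snippet with the corrected format (the file name is a placeholder; the UTF-16LE encoding is taken from the question):
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
public class CommentMarkerExample {
    public static void main(String[] args) throws Exception {
        CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
                .setCommentMarker('#')
                .build();
        try (Reader reader = Files.newBufferedReader(Paths.get("data.csv"), StandardCharsets.UTF_16LE);
             CSVParser parser = csvFormat.parse(reader)) {
            System.out.println(csvFormat.getCommentMarker()); // now prints #
            for (CSVRecord record : parser) {
                // lines starting with '#' are treated as comments and never show up here
                System.out.println(record);
            }
        }
    }
}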
I would like to process CSV file of such structure:
header1,header2
val1.1, val1.2
val2.1, val2.2
But only if the first line contains both header names - otherwise throw an exception.
My current implementation using Apache Commons CSV is:
Reader reader = new InputStreamReader(new ByteArrayInputStream(file.getContent()));
CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
.withHeader("header1", "header2")
.withSkipHeaderRecord());
for (CSVRecord csvRecord : csvParser) { /* records processing */ }
The problem is that the first line might have values different than header names and the file is still processed.
Referring to the Javadoc of CSVFormat:
Referencing columns safely
If your source contains a header record, you can simplify your code and safely reference columns, by using withHeader(String...) with no arguments:
CSVFormat.EXCEL.withHeader();
This causes the parser to read the first record and use its values as column names. Then, call one of the CSVRecord get method that takes a String column name argument:
String value = record.get("Col1");
This makes your code impervious to changes in column order in the CSV file.
So you can just follow this: use the first line as the header, then validate the headers with CSVParser#getHeaderNames().
Following is a simple demonstration:
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
public class UseFirstRowAsHeader {
public static void main(String[] args) throws IOException {
String validHeaderCsv = "header1,header2\r\n"
+ "val1.1,val1.2\r\n"
+ "val2.1,val2.2";
parseWithHeaderValidation(validHeaderCsv);
String invalidHeaderCsv = "header1,header2,header3\r\n"
+ "val1.1,val1.2\r\n"
+ "val2.1,val2.2";
parseWithHeaderValidation(invalidHeaderCsv);
}
private static void parseWithHeaderValidation(String csv) throws IOException {
Reader reader = new StringReader(csv);
List<String> expectedHeaders = new ArrayList<String>();
expectedHeaders.add("header1");
expectedHeaders.add("header2");
try (CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
.withHeader().withAllowMissingColumnNames(false)
.withSkipHeaderRecord())) {
if (!csvParser.getHeaderNames().equals(expectedHeaders)) {
throw new IllegalStateException("Unexpected headers: " + csvParser.getHeaderNames());
}
for (CSVRecord csvRecord : csvParser) {
System.out.println(csvRecord.get("header1") + "," + csvRecord.get("header2"));
}
}
}
}
I have been looking for the past 2 hours for a solution to my problem, in vain. I'm trying to read a CSV file using Apache Commons CSV; I am able to read the whole file, but my problem is: how do I extract only the header of the CSV into an array?
I looked everywhere and even the solution above didn't work.
For anyone else with this issue, this does.
Iterable<CSVRecord> records;
Reader in = new FileReader(fileLocation);
records = CSVFormat.EXCEL.withHeader().withSkipHeaderRecord(false).parse(in);
Set<String> headers = records.iterator().next().toMap().keySet();
Note that your use of .next() has consumed one row of the CSV.
Unless you skip the header record, the first record read by CSVParser will be the header row, e.g. in the example below:
CSVFormat csvFileFormat = CSVFormat.DEFAULT.withHeader(FILE_HEADER_MAPPING);
FileReader fileReader = new FileReader("file");
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
List<CSVRecord> csvRecords = csvFileParser.getRecords();
csvRecords.get(0) will return the header record.
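If you only want the header names out of that record, a small sketch continuing the snippet above (CSVRecord is Iterable<String>, so iterating the header record yields the column names; java.util.ArrayList and java.util.List are assumed to be imported):
CSVRecord headerRecord = csvRecords.get(0);
List<String> headerNames = new ArrayList<>();
for (String name : headerRecord) {
    headerNames.add(name);
}
System.out.println(headerNames);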
BufferedReader br = new BufferedReader(new FileReader(filename));
CSVParser parser = CSVParser.parse(br, CSVFormat.EXCEL.withFirstRecordAsHeader());
List<String> headers = parser.getHeaderNames();
This worked for me. The last line is what you need: it extracts the headers found by the parser into a List of Strings.
Since Apache Commons CSV v1.9.0, the withSkipHeaderRecord() and withFirstRecordAsHeader() methods are deprecated. A builder interface is provided instead. Use it like this:
CSVFormat.DEFAULT.builder()
.setHeader()
.setSkipHeaderRecord(true)
.build();
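A minimal usage sketch with that builder-built format (the file name is a placeholder):
CSVFormat format = CSVFormat.DEFAULT.builder()
        .setHeader()
        .setSkipHeaderRecord(true)
        .build();
try (Reader in = new FileReader("data.csv");
     CSVParser parser = format.parse(in)) {
    System.out.println(parser.getHeaderNames()); // column names taken from the first row
    for (CSVRecord record : parser) {
        // data rows only; the header row itself is skipped
    }
}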
In Kotlin:
val reader = File(path).bufferedReader()
val records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
.withIgnoreHeaderCase()
.withTrim()
.parse(reader)
println(records.headerNames)
The code below works for me:
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import org.apache.commons.csv.*;
public static String[] headersInCSVFile (String csvFilePath) throws IOException {
//reading file
CSVFormat csvFileFormat = CSVFormat.DEFAULT;
FileReader fileReader = new FileReader(csvFilePath);
CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
List<CSVRecord> csvRecords = csvFileParser.getRecords();
//Obtaining first record and splitting that into an array using delimiters and removing unnecessary text
String[] headers = csvRecords.get(0).toString().split("[,'=\\]\\[]+");
String[] result = new String[headers.length - 6];
for (int i = 6; i < headers.length; i++) {
//.replaceAll("\\s", "") removes spaces
result[i - 6] = headers[i].replaceAll("\\s", "");
}
return result;
}
Hi all,
I have a question about the Apache Commons CSVParser/CSVRecord. Take a look at the CSV file below:
Header1,Header2,Header3
"",,"L1C3"
CSVParser/CSVRecord returns "" for the first two columns. In my case I want to distinguish between an empty string ("") and a null value. Is there a configuration I could set so that CSVParser returns null for the second column?
Thank you.
I've used this format:
CSVFormat.RFC4180.withFirstRecordAsHeader()
.withIgnoreSurroundingSpaces()
.withNullString("")
Here the two configurations do the following:
ignore surrounding spaces - trims whitespace on both sides of a value, so an all-whitespace value becomes an empty string
null string - treats that empty string as null
Here's a sample usage:
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.junit.Test;
public class CsvParseTest {
@Test
public void testParseWillTrimAndConvertToNull() throws Exception {
String CSV_HEADER = "Name,MobileNo,Location";
String CSV_ROW_1 = "abc, ,australia"; // MobileNo is 3 whitespaces
CSVParser parse = CSVFormat.RFC4180.withFirstRecordAsHeader().withIgnoreSurroundingSpaces().withNullString("")
.parse(new BufferedReader(new StringReader(CSV_HEADER + "\n" + CSV_ROW_1)));
CSVRecord rec = parse.getRecords().get(0);
assertEquals("abc", rec.get("Name"));
assertNull(rec.get("MobileNo"));
assertEquals("australia", rec.get("Location"));
}
}
I think uniVocity-parsers is the only library that allows you to distinguish empty strings from nulls (I know this won't address your problem with Apache Commons CSV directly, but at least there's a way to get what you need).
Here's how to do it:
public static void main(String ... args){
String input = "Header1,Header2,Header3\n" +
"\"\",,\"L1C3\"";
CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial.
settings.setEmptyValue("I'm empty"); //value to use when the parser finds "". Set to "" to get an empty String.
settings.setNullValue("I'm null"); //value to use when the parser finds a null value (i.e. ,,).
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new StringReader(input));
for(String[] row : allRows){
System.out.println(Arrays.toString(row));
}
}
This will produce the following output:
[Header1, Header2, Header3]
[I'm empty, I'm null, L1C3]
uniVocity-parsers is also 3 times faster than Apache Commons CSV and has way more features.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
In the end I didn't find a good solution for returning null with the Apache Commons CSV library. I switched to OpenCSV 3.6, and here is the code I used, which I also posted in another thread. Thanks to everyone who suggested OpenCSV.
CSVReaderBuilder has withFieldAsNull() for this purpose.
CSVReader csvReader = new CSVReaderBuilder(csvFileReader)
.withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
.build();
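For the record from the question ("",,"L1C3") this reader should yield the following, since EMPTY_SEPARATORS turns a bare empty field into null but leaves an explicitly quoted empty string alone (a sketch, assuming the reader built above):
csvReader.readNext(); // skip the header row: Header1,Header2,Header3
String[] row = csvReader.readNext();
System.out.println(row[0]); // ""   (quoted empty string is preserved)
System.out.println(row[1]); // null (bare empty field between separators)
System.out.println(row[2]); // L1C3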
In Apache Commons CSV 1.2, you can use the CSVFormat method withNullString() to have a given string read back as null. The null string could be "", "N/A" or "Nil", according to your requirement.
CSVFormat csvFormat = CSVFormat.DEFAULT.withNullString("");
CSVParser csvParser = new CSVParser(fileReader, csvFormat);
This would give null, null, L1C3 for the record in the question.
Note: empty fields are automatically read as empty strings, so with the empty null string above they too end up converted to null.
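A hedged end-to-end sketch for the CSV from the question, showing that both the bare and the quoted empty field come back as null with this setting:
CSVFormat format = CSVFormat.DEFAULT.withFirstRecordAsHeader().withNullString("");
try (CSVParser parser = format.parse(new StringReader("Header1,Header2,Header3\n\"\",,\"L1C3\""))) {
    CSVRecord rec = parser.getRecords().get(0);
    System.out.println(rec.get("Header1")); // null
    System.out.println(rec.get("Header2")); // null
    System.out.println(rec.get("Header3")); // L1C3
}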
I'm using the FasterXML library to parse my CSV file. The CSV file has the column names in its first line. Unfortunately I need the columns to be renamed. I have a lambda function for this, where I can pass in the value read from the CSV file and get the new value back.
My code looks like this, but does not work:
CsvSchema csvSchema = CsvSchema.emptySchema().withHeader();
ArrayList<HashMap<String, String>> result = new ArrayList<HashMap<String, String>>();
MappingIterator<HashMap<String,String>> it = new CsvMapper().reader(HashMap.class)
.with(csvSchema )
.readValues(new File(fileName));
while (it.hasNext())
result.add(it.next());
System.out.println("changing the schema columns.");
for (int i=0; i < csvSchema.size();i++) {
String name = csvSchema.column(i).getName();
String newName = getNewName(name);
csvSchema.builder().renameColumn(i, newName);
}
csvSchema.rebuild();
When I try to print out the columns later, they are still the same as in the top line of my CSV file.
Additionally, I noticed that csvSchema.size() equals 0 - why?
You could instead use uniVocity-parsers for that. The following solution streams the input rows to the output so you don't need to load everything in memory to then write your data back with new headers. It will be much faster:
public static void main(String ... args) throws Exception{
Writer output = new StringWriter(); // use a FileWriter for your case
CsvWriterSettings writerSettings = new CsvWriterSettings(); //many options here - check the documentation
final CsvWriter writer = new CsvWriter(output, writerSettings);
CsvParserSettings parserSettings = new CsvParserSettings(); //many options here as well
parserSettings.setHeaderExtractionEnabled(true); // indicates the first row of the input are headers
parserSettings.setRowProcessor(new AbstractRowProcessor(){
public void processStarted(ParsingContext context) {
writer.writeHeaders("Column A", "Column B", "... etc");
}
public void rowProcessed(String[] row, ParsingContext context) {
writer.writeRow(row);
}
public void processEnded(ParsingContext context) {
writer.close();
}
});
CsvParser parser = new CsvParser(parserSettings);
Reader reader = new StringReader("A,B,C\n1,2,3\n4,5,6"); // use a FileReader for your case
parser.parse(reader); // all rows are parsed and submitted to the RowProcessor implementation of the parserSettings.
System.out.println(output.toString());
//nothing else to do. All resources are closed automatically in case of errors.
}
You can easily select the columns by using parserSettings.selectFields("B", "A") in case you want to reorder/eliminate columns.
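For instance, a one-line sketch reusing the parserSettings object from the example above:
parserSettings.selectFields("B", "A"); // keep only columns B and A, in that order; rowProcessed then receives two-element rows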
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
Here is the line I am using currently:
File booleanTopicFile;
// booleanTopicFile is csv file uploaded from form
CSVReader csvReader = new CSVReader(new InputStreamReader(new FileInputStream(booleanTopicFile), "UTF-8"));
I want to skip the first line of the CSV, which contains the headings.
I don't want to use any separator except the default comma (,), which the default constructor already uses.
The parameterized constructor has an option to skip a number of lines, but how do I deal with its 2nd and 3rd parameters?
CSVReader(Reader reader, char c, char c1, int index)
Thanks
This constructor of the CSVReader class will skip the first line of the CSV while reading the file:
CSVReader reader = new CSVReader(new FileReader(file), ',', '\'', 1);
At least since version 3.8 you can use the CSVReaderBuilder and set it to skip the first line.
Example:
CSVReader reader = new CSVReaderBuilder(inputStreamReader)
.withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
// Skip the header
.withSkipLines(1)
.build();
I found this question and its answers helpful; I'd like to expand on Christophe Roussy's comment. In the latest opencsv (2.3 as of this writing), the actual line of code is:
new CSVReader( new StringReader(csvText), CSVParser.DEFAULT_SEPARATOR,
CSVParser.DEFAULT_QUOTE_CHARACTER, 1);
Note it uses CSVParser instead of CSVReader.
With the latest opencsv version, use:
CSVReader csvReader = new CSVReaderBuilder(new FileReader("book.csv")).withSkipLines(1).build();
You can also use withFilter:
watFileCsvBeans = new CsvToBeanBuilder<ClassType>(isr)
.withType(ClassType.class)
.withIgnoreLeadingWhiteSpace(true)
// CsvToBeanFilter with a custom allowLine implementation
.withFilter(line -> !line[0].equals("skipme"))
.build()
.parse();
This was useful in my case; withSkipLines, by contrast, was not working for me.
opencsv version: 5.5.2