CSVFormat.RFC4180 ignores quoted values in .csv file - java

I have a .csv file that has quoted values:
Gender,ParentIncome,IQ,ParentEncouragement,CollegePlans
"Male",53900,118,"Encouraged","Plans to attend"
"Female",24900,87,"Not Encouraged","Does not plan to attend"
"Female",65800,93,"Not Encouraged","Does not plan to attend"
Reading this file with the following code (using IntelliJ and observing the values in the debugger) returns the values without quotes.
@Override
public CsvConnectorService read(String fullFileName, String outputAddress, int intervalMs, boolean repeat,
        Handler<AsyncResult<Void>> result) {
    CSVFormat format = CSVFormat.RFC4180.withHeader().withIgnoreEmptyLines().withQuote('"');
    Subscription subscription = createCsvObservable(fullFileName, format, intervalMs, repeat)
            .subscribeOn(Schedulers.io())
            .subscribe(record ->
                    eventBus.publish(outputAddress, convertRecordToJson(record)));
    subscriptions.add(subscription);
    result.handle(Future.succeededFuture());
    return this;
}
Reading with .withQuote('"') or without it makes no difference.

The quote " is the default character to represent quoted fields, and setting it explicitly makes no difference.
Do you want to get the original quote characters? In this case try setting the quote to a character that doesn't occur in the text, such as .withQuote('\0');
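A minimal sketch of the difference (assuming Apache Commons CSV on the classpath; the sample data is taken from the question):

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class QuoteDemo {
    public static void main(String[] args) throws Exception {
        String csv = "Gender,ParentIncome\n\"Male\",53900\n";

        // Default RFC4180 behaviour: '"' delimits fields and is stripped.
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.RFC4180.withHeader())) {
            for (CSVRecord record : parser) {
                System.out.println(record.get("Gender")); // prints: Male
            }
        }

        // With a quote character that never occurs in the data,
        // '"' is treated as ordinary text and kept.
        try (CSVParser parser = CSVParser.parse(csv,
                CSVFormat.RFC4180.withHeader().withQuote('\0'))) {
            for (CSVRecord record : parser) {
                System.out.println(record.get("Gender")); // prints: "Male"
            }
        }
    }
}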

Related

Reading data and storing in array Java

I am writing a program which will allow users to reserve a room in a hotel (university project). I have a problem where, when I try to read data from the file and store it in an array, I receive a NumberFormatException.
I have been stuck on this problem for a while now and cannot figure out where I am going wrong. I've read up on it, and apparently it's thrown when you try to convert a String to a numeric type, but I cannot figure out how to fix it.
Any suggestions, please?
This is my code for my reader.
FileReader file = new FileReader("rooms.txt");
Scanner reader = new Scanner(file);
int index = 0;
while (reader.hasNext()) {
    int RoomNum = Integer.parseInt(reader.nextLine());
    String Type = reader.nextLine();
    double Price = Double.parseDouble(reader.nextLine());
    boolean Balcony = Boolean.parseBoolean(reader.nextLine());
    boolean Lounge = Boolean.parseBoolean(reader.nextLine());
    String Reserved = reader.nextLine();
    rooms[index] = new Room(RoomNum, Type, Price, Balcony, Lounge, Reserved);
    index++;
}
reader.close();
The error message (a NumberFormatException stack trace) and the file data were posted as screenshots and are not reproduced here.
Change your while loop like this:
while (reader.hasNextLine()) {
    // split reader.nextLine() using .split() and store the pieces in a String array;
    // then extract the data from the array and do whatever you want with it
}
You're trying to parse the whole line as an Integer. Read the whole line as a String and call
.split(" ")
on it. This splits the line into multiple values and puts them into an array. Then you can grab each item from the array and parse it separately, as you intended (see the sketch below).
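For example, a minimal sketch, assuming each record sits on a single line with single-space-separated fields (reusing the rooms array, index, and field order from the question's code):

while (reader.hasNextLine()) {
    String[] parts = reader.nextLine().split(" ");
    int roomNum = Integer.parseInt(parts[0]);
    String type = parts[1];
    double price = Double.parseDouble(parts[2]);
    boolean balcony = Boolean.parseBoolean(parts[3]);
    boolean lounge = Boolean.parseBoolean(parts[4]);
    String reserved = parts[5];
    rooms[index] = new Room(roomNum, type, price, balcony, lounge, reserved);
    index++;
}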
Please avoid posting screenshots next time; use proper formatting and text so someone can easily copy your code or test data into an IDE and reproduce the scenario.
Use next() instead of nextLine().
With Scanner one should pair the has... checks with the matching reads: hasNextLine with nextLine, hasNext with next, hasNextInt with nextInt, et cetera. I would do it as follows:
Using Path and Files, the newer and more general classes (instead of File).
Files can read lines; here I use Files.lines, which gives a Stream of lines, a bit like a loop.
Try-with-resources: try (AutoCloseable in = ...) { ... } ensures that in.close() is always called implicitly, even on an exception or return.
Each line comes without its line ending.
The line is split into words separated by one or more spaces.
Only lines with at least 6 words are handled.
A Room is created from the words.
The Rooms are collected into an array.
So:
Path file = Paths.get("rooms.txt");
try (Stream<String> in = Files.lines(file)) {
    rooms = in                                 // Stream<String>
            .map(line -> line.split(" +"))     // Stream<String[]>
            .filter(words -> words.length >= 6)
            .map(words -> {
                int roomNum = Integer.parseInt(words[0]);
                String type = words[1];
                double price = Double.parseDouble(words[2]);
                boolean balcony = Boolean.parseBoolean(words[3]);
                boolean lounge = Boolean.parseBoolean(words[4]);
                String reserved = words[5];
                return new Room(roomNum, type, price, balcony, lounge, reserved);
            })                                 // Stream<Room>
            .toArray(Room[]::new);             // Room[]
}
For local variables, use camelCase starting with a lowercase letter.
The code above uses the system's default character encoding to convert the bytes in the file to a Java Unicode String. If you want all Unicode symbols,
you might store your list as UTF-8 and read it as follows:
try (Stream<String> in = Files.lines(file, StandardCharsets.UTF_8)) {
Another issue is the imprecision of floating-point double. You might use BigDecimal instead; it preserves the exact decimal value:
BigDecimal price = new BigDecimal(words[2]);
It is, however, much more verbose, so you need to look at a couple of examples.
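For instance, a small sketch of the difference (standard library only, java.math.BigDecimal):

double d = 0.1 + 0.2;
System.out.println(d); // 0.30000000000000004 - binary floating-point rounding

BigDecimal b = new BigDecimal("0.1").add(new BigDecimal("0.2"));
System.out.println(b); // 0.3 - exact
// Prefer the String constructor: new BigDecimal(0.1) would already
// carry the rounding error of the double literal.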

How do I skip white-space only lines and lines having variable columns using supercsv

I am working on a CSV parser requirement and I am using the Super CSV parser library. My CSV file can have 25 columns (separated by a pipe character, |) and up to 100k rows, plus a header row.
I would like to ignore whitespace-only lines and lines containing fewer than 25 columns.
I am using ICsvBeanReader with name mappings (to map CSV values onto a POJO) and field processors (to handle validation) for reading the file.
I am assuming that Super CSV's ICsvBeanReader will skip whitespace-only lines by default. But how do I handle rows that contain fewer than 25 columns?
You can easily do this by writing your own Tokenizer.
For example, the following Tokenizer has the same behaviour as the default one, but skips any lines that don't have the expected number of columns.
public class SkipBadColumnCountTokenizer extends Tokenizer {

    private final int expectedColumns;
    private final List<Integer> ignoredLines = new ArrayList<>();

    public SkipBadColumnCountTokenizer(Reader reader,
            CsvPreference preferences, int expectedColumns) {
        super(reader, preferences);
        this.expectedColumns = expectedColumns;
    }

    @Override
    public boolean readColumns(List<String> columns) throws IOException {
        boolean moreInputExists;
        // Keep reading until a row with the expected column count is found
        // (or the input runs out), noting each skipped line.
        while ((moreInputExists = super.readColumns(columns))
                && columns.size() != this.expectedColumns) {
            System.out.println(String.format("Ignoring line %s with %d columns: %s",
                    getLineNumber(), columns.size(), getUntokenizedRow()));
            ignoredLines.add(getLineNumber());
        }
        return moreInputExists;
    }

    public List<Integer> getIgnoredLines() {
        return this.ignoredLines;
    }
}
And a simple test using this Tokenizer...
@Test
public void testInvalidRows() throws IOException {
    String input = "column1,column2,column3\n" +
            "has,three,columns\n" +
            "only,two\n" +
            "one\n" +
            "three,columns,again\n" +
            "one,too,many,columns";
    CsvPreference preference = CsvPreference.EXCEL_PREFERENCE;
    int expectedColumns = 3;
    SkipBadColumnCountTokenizer tokenizer = new SkipBadColumnCountTokenizer(
            new StringReader(input), preference, expectedColumns);
    try (ICsvBeanReader beanReader = new CsvBeanReader(tokenizer, preference)) {
        String[] header = beanReader.getHeader(true);
        TestBean bean;
        while ((bean = beanReader.read(TestBean.class, header)) != null) {
            System.out.println(bean);
        }
        System.out.println(String.format("Ignored lines: %s", tokenizer.getIgnoredLines()));
    }
}
Prints the following output (notice how it's skipped all of the invalid rows):
TestBean{column1='has', column2='three', column3='columns'}
Ignoring line 3 with 2 columns: only,two
Ignoring line 4 with 1 columns: one
TestBean{column1='three', column2='columns', column3='again'}
Ignoring line 6 with 4 columns: one,too,many,columns
Ignored lines: [3, 4, 6]
(1) If the selection must be done by your Java program using Super CSV, then (quoting the documentation) "you'll have to use CsvListReader"; in particular, listReader.length() (see the sketch below).
See this Super CSV page for details.
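A minimal sketch of the CsvListReader approach (the pipe delimiter, file name, and preference setup are assumptions based on the question):

import java.io.FileReader;
import java.util.List;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class ListReaderFilter {
    public static void main(String[] args) throws Exception {
        // Pipe-delimited preference (delimiter assumed from the question).
        CsvPreference pipePreference =
                new CsvPreference.Builder('"', '|', "\n").build();
        try (ICsvListReader listReader =
                new CsvListReader(new FileReader("data.csv"), pipePreference)) {
            listReader.getHeader(true); // skip the header row
            List<String> row;
            while ((row = listReader.read()) != null) {
                if (listReader.length() != 25) {
                    continue; // skip rows with the wrong column count
                }
                // process the 25-column row here
            }
        }
    }
}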
(2) If you can perform the selection by preprocessing the CSV file, then you might wish to consider a suitable command-line tool (or tools, depending on how complicated the CSV format is). If the delimiter of the CSV file does not occur within any field, then awk would suffice. For example, if the assumption is satisfied, and if the delimiter is |, then the relevant awk filter could be as simple as:
awk -F'|' 'NF == 25 {print}'
If the CSV file format is too complex for a naive application of awk, then you may wish to convert the complex format to a simpler one; often TSV has much to recommend it.

Write file with SuperCsv preserving leading zeros while opening in excel

I was wondering if there is a way to keep the leading zeros while using Super CSV.
My problem is that I have a few columns which contain numbers with a leading 0. I want to keep the 0, but Excel keeps stripping it. I've also tried prepending a few characters to the number, such as ', =, or ", but with no good result.
Excel displays the first character I've added at the beginning of the number, so the column value looks like =0222333111, probably because Super CSV is wrapping the output in quotes.
I didn't find anything on the Super CSV website, and I guess I am not the only one who has this problem.
Should I migrate to an Excel Java library, or is there a workaround?
The CSV file format does not allow you to specify how the cells are treated by external programs. Even if the leading zeroes are written to the CSV file (please check that, if you have not already done so), Excel might think it's smarter than you, assume the leading zeroes are there by accident, and discard them.
Even if there were some workarounds, like adding all sorts of invisible Unicode characters, these are just hacks that are not guaranteed to work with other versions of Excel.
Therefore, CSV does not seem to be an adequate file format for your requirements. Either switch to a different file format, or configure Excel to treat all cells as strings instead of numbers (I don't know how, or whether, the latter is possible).
In Super CSV, you can use the custom cell processor below; it wraps the cell value as ="..." so that Excel treats it as text:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.util.CsvContext;

public class PreserveLeadingZeroes extends CellProcessorAdaptor {

    private static final Logger LOG = LoggerFactory.getLogger(PreserveLeadingZeroes.class);

    public PreserveLeadingZeroes() {
        super();
    }

    public PreserveLeadingZeroes(CellProcessor next) {
        super(next);
    }

    public Object execute(Object value, CsvContext context) {
        if (value == null) {
            // LOG.debug("null customer code");
            final String result = "";
            return next.execute(result, context);
        }
        // LOG.debug("parse customer code : " + value.toString());
        final String result = "=\"" + value.toString() + "\"";
        return next.execute(result, context);
    }
}
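A hypothetical usage sketch (the Customer bean, column names, and output file are made up for illustration; a null element in the processor array means no processing for that column):

import java.io.FileWriter;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.ICsvBeanWriter;
import org.supercsv.prefs.CsvPreference;

public class WriteDemo {
    // Minimal hypothetical bean; CsvBeanWriter reads values via getters.
    public static class Customer {
        private final String phoneNumber;
        private final String name;
        public Customer(String phoneNumber, String name) {
            this.phoneNumber = phoneNumber;
            this.name = name;
        }
        public String getPhoneNumber() { return phoneNumber; }
        public String getName() { return name; }
    }

    public static void main(String[] args) throws Exception {
        String[] header = {"phoneNumber", "name"};
        CellProcessor[] processors = {
                new PreserveLeadingZeroes(), // keep e.g. 0222333111
                null                         // no processing for name
        };
        try (ICsvBeanWriter writer = new CsvBeanWriter(
                new FileWriter("out.csv"), CsvPreference.STANDARD_PREFERENCE)) {
            writer.writeHeader(header);
            writer.write(new Customer("0222333111", "Alice"), header, processors);
        }
    }
}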

What approach to use for parsing a file with fixed length records, when the record layout isn't known until runtime?

I want to parse a file based on a record layout provided in another file.
Basically there will be a definition file, which is a comma-delimited list of fields and their respective lengths. There will be many of these; a new one will be loaded each time I run the program.
firstName,text,20
middleInitial,text,1
lastName,text,20
salary,number,10
Then I will display a blank table with the supplied column headings, and an option to add data by clicking a button or whatever - I haven't decided yet.
I also want options to both load data from a file and save data to a file, with the file matching the format described in the definition file.
For example, a file to load (or one produced by the save function) for the above definition file might look like this.
Adam DSmith 50000
Brent GWilliams 45000
Harry TThompson 47500
What kind of patterns could be useful here? Can anyone give me pointers or a rough guide on how to structure the way the data is internally stored and modeled?
I would like to think I can find my way around the Java documentation all right, but if anyone can point me at somewhere to start looking, it would be greatly appreciated!
Thanks
So it sounds like you have a howToParse file and an infoToParse file, holding the directions for parsing and the information to parse, respectively.
First, I would read in the howToParse file and create some sort of dynamic Parser object. Each line in this file is a different ParsingStep. Read each line as a String and split it into its 3 parts: field name, type of data, and length of data.
// Create a new parser to hold the parsing steps.
Parser dynamicParser = new Parser();
// Create a new scanner to read through the parse file.
// Note: new Scanner(String) scans the string itself, so wrap the
// file name in a File to read from the file.
Scanner parseFileScanner = new Scanner(new File(howToParseFileName));
// *** Add exception handling as necessary *** this is just an example
// Read till end of file.
while (parseFileScanner.hasNextLine()) {
    String line = parseFileScanner.nextLine(); // Get the next line in the file.
    String[] lineSplit = line.split(",");      // Split on comma.
    String fieldName = lineSplit[0];
    String dataType = lineSplit[1];
    int dataLength = Integer.parseInt(lineSplit[2]);
    ParsingStep step = new ParsingStep(fieldName, dataType, dataLength);
    dynamicParser.addStep(step);
}
parseFileScanner.close();
Now that you know how to parse a line, you just need to work through the other file and store the information from it, probably in an array or list.
// Open the infoToParse file and start reading.
Scanner infoScanner = new Scanner(new File(infoToParseFileName));
// Add exception handling.
while (infoScanner.hasNextLine()) {
    String line = infoScanner.nextLine();
    // Parse the line into a map of field names to values
    // (or build a Person object from it).
    Map<String, String> personMap = dynamicParser.parse(line);
    // Store personMap somewhere, e.g. add it to a List.
}
infoScanner.close();
Then the only other code is making sure the parser applies the steps in the correct order.
public class Parser {

    private final List<ParsingStep> steps = new ArrayList<>();

    public void addStep(ParsingStep step) {
        steps.add(step);
    }

    public Map<String, String> parse(String line) {
        Map<String, String> map = new LinkedHashMap<>();
        String remainingLine = line;
        for (ParsingStep step : steps) {
            // Each step consumes its fixed-length slice from the front of
            // the line, records the value in the map, and returns the rest.
            remainingLine = step.parse(remainingLine, map);
        }
        return map;
    }
}
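For completeness, here is a possible ParsingStep to match (hypothetical; the class is never shown in the answer, and the dataType field is carried along but unused in this sketch):

import java.util.Map;

public class ParsingStep {

    private final String fieldName;
    private final String dataType; // "text" or "number"; unused here
    private final int length;

    public ParsingStep(String fieldName, String dataType, int length) {
        this.fieldName = fieldName;
        this.dataType = dataType;
        this.length = length;
    }

    // Consume this step's fixed-length slice from the front of the line,
    // record the trimmed value in the map, and return the remainder.
    public String parse(String line, Map<String, String> map) {
        int end = Math.min(length, line.length());
        map.put(fieldName, line.substring(0, end).trim());
        return line.substring(end);
    }
}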
Personally, I would add some error checking to the parsing steps, just in case the infoToParse file is not in the proper format.
Hope this helps.
