Values are separated with comma, following format:
Country,Timescale,Vendor,Units
Africa,2010 Q3,Fujitsu Siemens,2924.742632
I want to make an array for every column. How can I do it?
I tried many things; here is my code:
BufferedReader br = null;
String line = "";
String cvsSplitBy = ",";
try {
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
String[] country = line.split(cvsSplitBy);
country[0] +=",";
String[] Krajina = country[0].split(",");
What you appear to be describing is what is otherwise known as Parallel Arrays, which are generally a bad idea in this particular use case since they are prone to ArrayIndexOutOfBoundsExceptions later on down the road. A better solution would be to use a Two Dimensional (2D) Array or an ArrayList. Nevertheless, parallel arrays it is:
You say an array size of 30. Well, maybe today, but tomorrow it might be 25 or 40, so in order to size your Arrays to hold the file data you will need to know how many lines of actual raw data the CSV file contains (excluding the Header, possible comments, and possible blank lines). The easiest way would be to just dump everything into separate ArrayLists and then convert them to their respective arrays later on, be it String, int, long, double, or whatever.
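A minimal sketch of that "collect first, convert later" approach, assuming the four-column layout from the question (Country, Timescale, Vendor, Units), a single header line, and plain comma separators with no quoted values. The ColumnArrays class and its read() method are illustrative names, not part of the original answer:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the four column arrays.
class ColumnArrays {
    String[] country;
    String[] timescale;
    String[] vendor;
    double[] units;

    // Reads every data line into growable lists first, then converts each list
    // to an array of exactly the right size. Assumes the first line is a header.
    static ColumnArrays read(Reader source) throws IOException {
        List<String> countries = new ArrayList<>();
        List<String> timescales = new ArrayList<>();
        List<String> vendors = new ArrayList<>();
        List<Double> unitList = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine(); // skip the header line
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] parts = line.split(",");
                countries.add(parts[0]);
                timescales.add(parts[1]);
                vendors.add(parts[2]);
                unitList.add(Double.parseDouble(parts[3]));
            }
        }
        ColumnArrays result = new ColumnArrays();
        result.country = countries.toArray(new String[0]);
        result.timescale = timescales.toArray(new String[0]);
        result.vendor = vendors.toArray(new String[0]);
        // Unbox List<Double> into a primitive double[].
        result.units = unitList.stream().mapToDouble(Double::doubleValue).toArray();
        return result;
    }
}
```

Because the lists grow as needed, no line count is required up front; the arrays are sized only once all rows are known.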
Counting file lines first so as to initialize Arrays:
One line of code can give you the number of lines contained within a supplied text file:
long count = Files.lines(Paths.get("C:\\MyDataFiles\\DataFile.csv")).count();
In reality, however, the above line does need to be enclosed within a try/catch block in case of an IOException, so there is a wee bit more code than a single line. Files.lines() also keeps the file open until the returned stream is closed, so ideally it belongs in a try-with-resources block. For a simple use case where the CSV file contains a Header Line and no Comment or Blank lines, this could be all you need, since all you would then have to do is subtract one from the overall count to eliminate the Header Line before initializing your Arrays. Another minor issue with the above one-liner is that it provides the count as a Long Integer (long). Java arrays can only be sized with an Integer (int) value, therefore the value obtained will need to be cast to int, for example:
String[] countries = new String[(int) count];
and this is only safe if count does not exceed the maximum array size, which is a few elements below Integer.MAX_VALUE (2147483647); the exact limit depends on the JVM. That's a lot of array elements, so in general you won't have a problem with this, but if you are dealing with extremely large arrays then you will also need to consider JVM memory and running out of it.
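A small sketch combining both points above: the stream is closed by try-with-resources, and Math.toIntExact (Java 8+) is used instead of a plain (int) cast so that an oversized count throws an ArithmeticException rather than silently wrapping. The LineCounter class and countDataLines() name are hypothetical, and the sketch assumes a file with at most one header line and no comment or blank lines:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LineCounter {
    // Counts the lines in a file and returns the count as an int suitable for
    // sizing an array. Files.lines() keeps the file open until the stream is
    // closed, so it is opened in a try-with-resources block.
    static int countDataLines(Path file, boolean hasHeader) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            long count = lines.count();
            if (hasHeader && count > 0) {
                count--; // exclude the header line
            }
            // Throws ArithmeticException instead of silently overflowing
            // if the count does not fit in an int.
            return Math.toIntExact(count);
        }
    }
}
```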
Sometimes it's just nice to have a method that can be used in a multitude of different situations for getting the total number of raw data lines in a CSV (or other) text file. The method provided below is obviously more than a single line of code, but it does provide a little more flexibility over what gets counted in a file. As mentioned earlier, there is the possibility of a Header Line. A Header line is very common in CSV files and it is usually the first line within the file, but this may not always be the case: the Header line could be preceded by a Comment Line or even a Blank Line. The Header line should, however, always be the line directly before the raw data lines. Here is an example of a possible CSV file:
Example CSV file contents:
# Units Summary Report
# End Date: May 27, 2019

Country,TimeScale,Vendor,Units
Czech Republic,2010 Q3,Fujitsu Siemens,2924.742032
Slovakia,2010 Q4,Dell,2525r.011404
Slovakia,2010 Q4,Lenovo,2648.973238
Czech Republic,2010 Q3,ASUS,1323.507139
Czech Republic,2010 Q4,Apple,266.7584542
The first two lines are Comment Lines. In this answer, Comment Lines are lines that begin with either a Hash (#) character or a Semicolon (;). These lines are to be ignored when read.
The third line is a Blank Line and serves absolutely no purpose other than aesthetics (easier on the eyes I suppose). These lines are also to be ignored.
The fourth line which is directly above the raw data lines is the Header Line. This line may or may not be contained within a CSV file. Its purpose is to provide the Column Names for the data records contained on each raw data line. This line can be read (if it exists) to acquire record field (column) names.
The remaining lines within the CSV file are Raw Data Lines otherwise considered data records. Each line is a complete record and each delimited element of that record is considered a data field value. These are the lines you want to count so as to initialize your different Arrays. Here is a method that allows you to do that:
The fileLinesCount() Method:
/**
 * Counts the number of lines within the supplied Text file. Which lines are
 * counted depends upon the optional arguments supplied. By default, all
 * file lines are counted.<br><br>
 *
 * @param filePath (String) The file path and name of file (with
 * extension) to count lines in.<br>
 *
 * @param countOptions (Optional - Boolean) Three Optional Parameters. If an
 * optional argument is provided then the preceding
 * optional argument MUST also be provided (be it true
 * or false):<pre>
 *
 * ignoreHeader - Default is false. If true is passed then a value of
 * one (1) is subtracted from the sum of lines detected.
 * You must know for a fact that a header exists before
 * passing <b>true</b> to this optional parameter.
 *
 * ignoreComments - Default is false. If true is passed then comment lines
 * are excluded from the count. Only file lines which (after
 * being trimmed) <b>start with</b> either a semicolon (;) or
 * a hash (#) character are considered comment lines. These
 * characters are typical for comment lines in CSV files and
 * many other text file formats.
 *
 * ignoreBlanks - Default is false. If true is passed then file lines
 * which contain nothing after they are trimmed are excluded
 * from the count.
 *
 * <u>When a line is Trimmed:</u>
 * String.trim() removes every leading and trailing character
 * whose code is less than or equal to unicode '\u0020' (the
 * space character). If there is no character with a code
 * greater than '\u0020', an empty string is returned.
 *
 * As an example, a trimmed line has no leading or
 * trailing whitespaces, tabs, Carriage Returns, or
 * Line Feeds.</pre>
 *
 * @return (Long) The number of lines counted in the supplied text
 * file.
 */
public long fileLinesCount(final String filePath, final boolean... countOptions) {
    // Requires: java.io.IOException, java.nio.file.Files, java.nio.file.Paths,
    // java.util.logging.Level, java.util.logging.Logger, java.util.stream.Stream
    // Defaults for optional parameters.
    final boolean ignoreHeader = (countOptions.length >= 1 && countOptions[0]);
    // Only lines that, once trimmed, start with ';' or '#' are considered comments.
    final boolean ignoreComments = (countOptions.length >= 2 && countOptions[1]);
    // Lines that contain nothing once trimmed (empty string).
    final boolean ignoreBlanks = (countOptions.length >= 3 && countOptions[2]);
    long count = 0; // Holds the number of lines counted.
    // try-with-resources closes the Stream, which releases the open file handle.
    try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
        count = lines.map(String::trim)
                .filter(line -> !(ignoreBlanks && line.isEmpty()))
                .filter(line -> !(ignoreComments && (line.startsWith(";") || line.startsWith("#"))))
                .count();
        // Never return a negative count for an empty file.
        if (ignoreHeader && count > 0) {
            count--;
        }
    }
    catch (IOException ex) {
        Logger.getLogger("fileLinesCount() Method Error!").log(Level.SEVERE, null, ex);
    }
    return count;
}
Filling the Parallel Arrays:
Now it's time to create a method to fill the desired Arrays. Looking at the data file, it looks like you need three String type arrays and one double type Array. You may want to make these instance (Class member) variables:
// Instance (Class Member) variables:
String[] country;
String[] timeScale;
String[] vendor;
double[] units;
Then, to fill these arrays, we would use a method like this:
/**
 * Fills the 4 class member array variables country[], timeScale[], vendor[],
 * and units[] with data obtained from the supplied CSV data file.<br><br>
 *
 * @param filePath (String) Full Path and file name of the CSV data file.<br>
 *
 * @param fileHasHeader (Boolean) Either true or false. Supply true if the CSV
 * file does contain a Header and false if it does not.
 */
public void fillDataArrays(String filePath, boolean fileHasHeader) {
    // Requires: java.io.File, java.io.IOException, java.util.Scanner,
    // java.util.logging.Level, java.util.logging.Logger, javax.swing.JOptionPane
    long dataCount = fileLinesCount(filePath, fileHasHeader, true, true);
    /* Java Arrays will not accept the long data type for sizing,
       therefore we cast to int. */
    country = new String[(int) dataCount];
    timeScale = new String[(int) dataCount];
    vendor = new String[(int) dataCount];
    units = new double[(int) dataCount];
    int lineCounter = 0; // counts all lines contained within the supplied text file
    // Read the file that was passed in rather than a hard-coded file name.
    try (Scanner reader = new Scanner(new File(filePath))) {
        int indexCounter = 0;
        while (reader.hasNextLine()) {
            lineCounter++;
            String line = reader.nextLine().trim();
            // Skip comment and blank file lines.
            if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
                continue;
            }
            if (indexCounter == 0 && fileHasHeader) {
                /* Since we are skipping the header right away we
                   no longer need the fileHasHeader flag. */
                fileHasHeader = false;
                continue; // Skip the first non-comment, non-blank line since it's the header
            }
            /* Split the raw data line based on a comma (,) delimiter.
               The Regular Expression ("\\s{0,},\\s{0,}") ensures that
               it doesn't matter how many spaces or tabs (if any at all)
               are before OR after the comma; the split removes them. */
            String[] splitLine = line.split("\\s{0,},\\s{0,}");
            country[indexCounter] = splitLine[0];
            timeScale[indexCounter] = splitLine[1];
            vendor[indexCounter] = splitLine[2];
            /* The Regular Expression ("-?\\d+(\\.\\d+)?") below ensures
               that the value in the Units element of the split array is
               a string representation of an optionally negative integer
               or decimal numerical value. */
            if (splitLine[3].matches("-?\\d+(\\.\\d+)?")) {
                units[indexCounter] = Double.parseDouble(splitLine[3]);
            }
            else {
                // A null parent means this class does not need to be a Swing Component.
                JOptionPane.showMessageDialog(null, "<html>An invalid Units value (<b><font color=blue>" +
                    splitLine[3] + "</font></b>) has been detected<br>in data file line number <b><font " +
                    "color=red>" + lineCounter + "</font></b>. A value of <b>0.0</b> has been applied<br>to " +
                    "the Units Array to replace the data provided on the data<br>line which consists of: " +
                    "<br><br><b><center>" + line + "</center></b>.", "Invalid Units Value Detected!",
                    JOptionPane.WARNING_MESSAGE);
                units[indexCounter] = 0.0d;
            }
            indexCounter++;
        }
    }
    catch (IOException ex) {
        Logger.getLogger("fillDataArrays() Method Error!").log(Level.SEVERE, null, ex);
    }
}
To get the ball rolling just run the following code:
// Fill the Arrays with data.
fillDataArrays("DataFile.txt", true);
// Display the filled Arrays.
System.out.println(Arrays.toString(country));
System.out.println(Arrays.toString(timeScale));
System.out.println(Arrays.toString(vendor));
System.out.println(Arrays.toString(units));
You have to define your arrays before processing your file:
String[] country = new String[30];
String[] timescale = new String[30];
String[] vendor = new String[30];
String[] units = new String[30];
While reading the lines, put the values into the defined arrays at the same index. To keep track of the index, use another variable and increase it on every iteration. It should look like this:
int index = 0;
// Stop at the end of the file, or once the arrays are full, to avoid an ArrayIndexOutOfBoundsException.
while (index < country.length && (line = br.readLine()) != null) {
    String[] splitted = line.split(",");
    country[index] = splitted[0];
    timescale[index] = splitted[1];
    vendor[index] = splitted[2];
    units[index] = splitted[3];
    index++;
}
Since your CSV file probably includes a header line, you may also want to skip the first line.
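A minimal sketch of skipping the header: call readLine() once before the loop and discard the result. It assumes the header is always the very first line and that values contain no quoted commas; FixedArrayReader and rowCount are illustrative names, not from the original answer:

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical class illustrating the fixed-size parallel arrays described above.
class FixedArrayReader {
    static final int CAPACITY = 30; // fixed size, as in the answer; extra rows are ignored
    String[] country = new String[CAPACITY];
    String[] timescale = new String[CAPACITY];
    String[] vendor = new String[CAPACITY];
    String[] units = new String[CAPACITY];
    int rowCount = 0; // number of array slots actually filled

    void read(BufferedReader br) throws IOException {
        br.readLine(); // read and discard the header line (assumes the file has one)
        String line;
        // Check the capacity first so no extra line is read once the arrays are full.
        while (rowCount < CAPACITY && (line = br.readLine()) != null) {
            String[] splitted = line.split(",");
            country[rowCount] = splitted[0];
            timescale[rowCount] = splitted[1];
            vendor[rowCount] = splitted[2];
            units[rowCount] = splitted[3];
            rowCount++;
        }
    }
}
```

Afterwards, rowCount tells you how many of the 30 slots were actually filled.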
Always try to use try-with-resources when using I/O
The following code should help you out:
String line = "";
String cvsSplitBy = ",";
List<String> countries = new ArrayList<>();
List<String> timeScales = new ArrayList<>();
List<String> vendors = new ArrayList<>();
List<String> units = new ArrayList<>();
// Use try-with-resources so the reader is always closed.
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        String[] parts = line.split(cvsSplitBy);
        countries.add(parts[0]);
        timeScales.add(parts[1]);
        vendors.add(parts[2]);
        units.add(parts[3]);
    }
}
catch (IOException e) { // also catches FileNotFoundException, a subclass of IOException
    e.printStackTrace();
}
for (String country: countries) {
System.out.println(country);
}
for (String scale: timeScales) {
System.out.println(scale);
}
for (String vendor: vendors) {
System.out.println(vendor);
}
for (String unit: units) {
System.out.println(unit);
}
My csv is getting read into the System.out, but I've noticed that any text with a space gets moved into the next line (as a return \n)
Here's how my csv starts:
first,last,email,address 1, address 2
john,smith,blah@blah.com,123 St. Street,
Jane,Smith,blech@blech.com,4455 Roger Cir,apt 2
After running my app, any cell with a space (address 1), gets thrown onto the next line.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class main {
public static void main(String[] args) {
// -define .csv file in app
String fileNameDefined = "uploadedcsv/employees.csv";
// -File class needed to turn stringName to actual file
File file = new File(fileNameDefined);
try{
// -read from filePooped with Scanner class
Scanner inputStream = new Scanner(file);
// hashNext() loops line-by-line
while(inputStream.hasNext()){
//read single line, put in string
String data = inputStream.next();
System.out.println(data + "***");
}
// after loop, close scanner
inputStream.close();
}catch (FileNotFoundException e){
e.printStackTrace();
}
}
}
So here's the result in the console:
first,last,email,address
1,address
2
john,smith,blah@blah.com,123
St.
Street,
Jane,Smith,blech@blech.com,4455
Roger
Cir,apt
2
Am I using Scanner incorrectly?
Please stop writing faulty CSV parsers!
I've seen hundreds of CSV parsers and so-called tutorials for them online.
Nearly every one of them gets it wrong!
This wouldn't be so bad if it didn't affect me, but people who write CSV readers and get them wrong tend to write CSV writers too, and get those wrong as well. And those are the ones I have to write parsers for.
Please keep in mind that CSV (in order of decreasing obviousness):
can have quoting characters around values
can have other quoting characters than "
can even have other quoting characters than " and '
can have no quoting characters at all
can even have quoting characters on some values and none on others
can have other separators than , and ;
can have whitespace between separators and (quoted) values
can have other charsets than ASCII
should have the same number of values in each row, but doesn't always
can contain empty fields, either quoted: "foo","","bar" or not: "foo",,"bar"
can contain newlines in values
cannot contain newlines in values that are not quoted
cannot contain newlines between values
can have the delimiting character within the value if properly escaped
does not use backslash to escape delimiters but...
uses the quoting character itself to escape it, e.g. Frodo's Ring will be 'Frodo''s Ring'
can have the quoting character at beginning or end of value, or even as only character ("foo""", """bar", """")
can even have the quoting character within an unquoted value; this one is not escaped
If you think this is obviously not a problem, then think again. I've seen every single one of these items implemented wrongly, even in major software packages (e.g. office suites, CRM systems).
There are good and correctly working out-of-the-box CSV readers and writers out there:
opencsv
Ostermiller Java Utilities
Apache Commons CSV
If you insist on writing your own, at least read the (very short) RFC for CSV, RFC 4180.
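To illustrate a few of the points in the list (quoting characters other than ", separators inside quoted values, a doubled quote as the escape, and a quote inside an unquoted value left as-is), here is a sketch of a single-line field splitter. It is not a complete RFC 4180 parser: it does not handle values spanning several lines or whitespace between a separator and a quoted value, and QuotedLineSplitter is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

class QuotedLineSplitter {
    // Splits ONE line into fields. A quote character only opens a quoted value
    // at the start of a field; inside a quoted value, a doubled quote stands
    // for one literal quote character.
    static List<String> splitLine(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        field.append(quote); // escaped quote: keep one
                        i++;                 // and skip the second one
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // separators are literal inside quotes
                }
            } else if (c == quote && field.length() == 0) {
                inQuotes = true;             // opening quote at the start of a field
            } else if (c == separator) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);             // includes quotes inside unquoted values
            }
        }
        fields.add(field.toString());        // the last field (may be empty)
        return fields;
    }
}
```

For example, with ' as the quoting character, 'Frodo''s Ring',b,,'x,y' yields the four fields Frodo's Ring, b, an empty field, and x,y.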
scanner.useDelimiter(",");
This should work.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class TestScanner {
public static void main(String[] args) throws FileNotFoundException {
Scanner scanner = new Scanner(new File("/Users/pankaj/abc.csv"));
scanner.useDelimiter(",");
while(scanner.hasNext()){
System.out.print(scanner.next()+"|");
}
scanner.close();
}
}
For CSV File:
a,b,c d,e
1,2,3 4,5
X,Y,Z A,B
Output is:
a|b|c d|e
1|2|3 4|5
X|Y|Z A|B|
Scanner.next() does not read a newline but reads the next token, delimited by whitespace (by default, if useDelimiter() was not used to change the delimiter pattern). To read a line use Scanner.nextLine().
Once you read a single line you can use String.split(",") to separate the line into fields. This enables identification of lines that do not consist of the required number of fields. Using useDelimiter(",") would ignore the line-based structure of the file (each line consists of a list of fields separated by a comma). For example:
while (inputStream.hasNextLine())
{
String line = inputStream.nextLine();
String[] fields = line.split(",");
if (fields.length >= 4) // At least one address specified.
{
for (String field: fields) System.out.print(field + "|");
System.out.println();
}
else
{
System.err.println("Invalid record: " + line);
}
}
As already mentioned, using a CSV library is recommended. For one thing, neither this nor the useDelimiter(",") solution will correctly handle quoted values that contain commas.
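Putting the line-based approach together as a runnable sketch: read with nextLine(), split, and check the field count. One extra detail that matters for the question's data: String.split(",") drops trailing empty strings, so john's line (which ends with a comma) would yield only 4 fields, while split(",", -1) keeps the empty fifth field. LineRecords and expectedFields are illustrative names, and quoted commas are still not handled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineRecords {
    // Reads line by line and splits each line on commas. The -1 limit keeps
    // trailing empty fields; lines with the wrong field count are reported.
    static List<String[]> read(Scanner in, int expectedFields) {
        List<String[]> records = new ArrayList<>();
        while (in.hasNextLine()) {
            String line = in.nextLine();
            String[] fields = line.split(",", -1);
            if (fields.length == expectedFields) {
                records.add(fields);
            } else {
                System.err.println("Invalid record: " + line);
            }
        }
        return records;
    }
}
```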
I agree with Scheintod that using an existing CSV library is a good idea to get RFC 4180 compliance from the start. Besides the mentioned OpenCSV and Ostermiller, there are a number of other CSV libraries out there. If you're interested in performance, you can take a look at uniVocity/csv-parsers-comparison. It shows that
uniVocity CSV parser
SimpleFlatMapper CSV parser
Jackson CSV parser
are consistently the fastest on any of JDK 6, 7, 8, or 9. The study did not find any RFC 4180 compatibility issues in any of those three. Both OpenCSV and Ostermiller were found to be about twice as slow as those.
I'm not in any way associated with the author(s), but concerning the uniVocity CSV parser, the study might be biased, since its author also wrote that parser.
To note, the author of SimpleFlatMapper has also published a performance comparison comparing only those three.
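Split the result of nextLine() with this regular expression, which matches only commas followed by an even number of double quotes, i.e. commas outside quoted values:
String[] fields = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");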
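A small sketch of that quote-aware split, assuming the intended pattern is ,(?=([^\"]*\"[^\"]*\")*[^\"]*$) (a comma followed by an even number of double quotes). QuoteAwareSplit is a hypothetical name. Note that the surrounding quotes stay in the fields and escaped quotes ("") are not unescaped, so this only solves the comma problem:

```java
class QuoteAwareSplit {
    // A comma counts as a separator only if an even number of double quotes
    // follows it on the line, meaning it is not inside a quoted value.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES, -1); // -1 keeps trailing empty fields
    }
}
```

For instance, a,"b,c",d splits into a, "b,c" (quotes included), and d.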
I have seen many production problems caused by code not handling quotes ("), newline characters within quotes, and quotes within the quotes; e.g.: "he said ""this""" should be parsed into: he said "this"
As mentioned earlier, many CSV parsing examples out there just read a line and then break it up by the separator character. This is rather incomplete and problematic.
For me, and probably for those who prefer build versus buy (or would rather not use somebody else's code and deal with their dependencies), classic character-by-character text parsing worked:
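/**
 * Parse CSV data into an array of String arrays. It handles double quoted values.
 * @param is input stream (decoded as UTF-8 below; change the charset if your data differs)
 * @param separator the character separating values, e.g. ','
 * @param trimValues whether leading and trailing whitespace is trimmed from each value
 * @param skipEmptyLines whether rows consisting only of empty values are dropped
 * @return an array of String arrays, one per row
 * @throws IOException if reading from the stream fails
 */
public static String[][] parseCsvData(InputStream is, char separator, boolean trimValues, boolean skipEmptyLines)
    throws IOException
{
    // Requires: java.io.*, java.nio.charset.StandardCharsets, java.util.ArrayList, java.util.List
    // Decode bytes into characters (and buffer the reads); casting raw bytes
    // to char would corrupt multi-byte characters. UTF-8 is an assumption.
    Reader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
    List<String[]> data = new ArrayList<>();
    List<String> row = new ArrayList<>();
    StringBuilder value = new StringBuilder();
    int ch = -1;
    int prevCh = -1;
    boolean inQuotedValue = false;
    boolean quoteAtStart = false;
    boolean rowIsEmpty = true;
    boolean isEOF = false;
    while (true)
    {
        prevCh = ch;
        ch = (isEOF) ? -1 : reader.read();
        // Treat CR LF as a single line break: skip the LF that follows a CR
        if (prevCh == '\r' && ch == '\n')
        {
            continue;
        }
        if (inQuotedValue)
        {
            if (ch == -1)
            {
                // Unterminated quoted value: finish at end of input
                inQuotedValue = false;
                isEOF = true;
            }
            else
            {
                value.append((char) ch);
                if (ch == '"')
                {
                    // Closing quote, or the first half of an escaped quote ("")
                    inQuotedValue = false;
                }
            }
        }
        else if (ch == separator || ch == '\r' || ch == '\n' || ch == -1)
        {
            // End of a value: strip the surrounding quotes and add it to the row
            String s = value.toString();
            // The length check avoids an exception on a lone, unterminated quote
            if (quoteAtStart && s.length() >= 2 && s.endsWith("\""))
            {
                s = s.substring(1, s.length() - 1);
            }
            if (trimValues)
            {
                s = s.trim();
            }
            rowIsEmpty = rowIsEmpty && s.isEmpty();
            row.add(s);
            value.setLength(0);
            quoteAtStart = false;
            if (ch == '\r' || ch == '\n' || ch == -1)
            {
                // End of a row: add the row to the result
                if (!skipEmptyLines || !rowIsEmpty)
                {
                    data.add(row.toArray(new String[0]));
                }
                row.clear();
                rowIsEmpty = true;
                if (ch == -1)
                {
                    break;
                }
            }
        }
        else if (prevCh == '"')
        {
            // A character right after a closing quote (normally the second quote
            // of an escaped "") re-enters the quoted value and is not appended
            inQuotedValue = true;
        }
        else
        {
            if (ch == '"')
            {
                inQuotedValue = true;
                quoteAtStart = (value.length() == 0);
            }
            value.append((char) ch);
        }
    }
    return data.toArray(new String[0][]);
}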
Unit Test:
String[][] data = parseCsvData(new ByteArrayInputStream("foo,\"\",,\"bar\",\"\"\"music\"\"\",\"carriage\r\nreturn\",\"new\nline\"\r\nnext,line".getBytes()), ',', true, true);
for (int rowIdx = 0; rowIdx < data.length; rowIdx++)
{
System.out.println(Arrays.asList(data[rowIdx]));
}
generates the output:
[foo, , , bar, "music", carriage
return, new
line]
[next, line]
If you absolutely must use Scanner, then you must set its delimiter via its useDelimiter(...) method; otherwise it defaults to using all whitespace as its delimiter. Better though, as has already been stated: use a CSV library, since this is what they do best.
For example, this delimiter will split on commas with or without surrounding whitespace:
scanner.useDelimiter("\\s*,\\s*");
Please check out the java.util.Scanner API for more on this.
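A quick way to see what a delimiter pattern does is to collect the tokens from a String source. This sketch (DelimiterDemo is a hypothetical name) shows that "\\s*,\\s*" strips the spaces around commas, but also that a line break without a comma is not a delimiter, so the last field of one line and the first field of the next come back as a single token:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Collects every token produced by a Scanner using the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }
}
```

For example, tokens("a , b,c d", "\\s*,\\s*") gives [a, b, c d], while tokens("a,b\nc,d", "\\s*,\\s*") gives a, then b and c joined by the newline, then d. That is one more reason to read line by line first.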
Well, I do my coding in NetBeans 8.1:
First: Create a new project, select Java application and name your project.
Then modify your code after public class to look like the following (and add imports for java.io.File, java.io.FileNotFoundException, and java.util.Scanner at the top of the file):
/**
 * @param args the command line arguments
 * @throws java.io.FileNotFoundException if the CSV file cannot be found
 */
public static void main(String[] args) throws FileNotFoundException {
    try (Scanner scanner = new Scanner(new File("C:\\Users\\YourName\\Folder\\file.csv"))) {
        scanner.useDelimiter(",");
        while (scanner.hasNext()) {
            System.out.print(scanner.next() + "|");
        }
    }
}
} // end of class
This is a problem I've encountered several times, and always wondered why.
For my code below as an example, if a string of whitespace is entered, the method will not print. However, after the next input with a value string containing characters, it will print all the whitespace strings and the valid character containing string. Why is this delayed and stored in memory?
Example for the code below:
Enter " " returns nothing.
Enter " " returns nothing.
Enter "SwiggitySwooty" returns " " \n " " \n "SwiggitySwooty"
Explanation: The whitespace-containing strings are delayed until a valid character string is entered.
Extra info: I use IntelliJ; this also happens when not sending the string to a method. I've also had this happen in a while(input.hasNext()) loop, in which I try to catch an invalid input as a string when I want to take an integer. If I enter n legitimate integers and then a string, it prints my "please enter an integer" message n times, like in this code.
Lastly, if anyone thinks of a better title for this, let me know so I can change it for more exposure for those with similar questions. Thank you.
Let me know if you guys need anything else!
/**
* Created by JacobHein on 4/19/15.
*/
import java.util.Scanner;
public class FizzString {
/*TODO
* Given a string str, if the string starts with "f" return "Fizz".
If the string ends
* with "b" return "Buzz". If both the "f" and "b" conditions are true, return
* "FizzBuzz". In all other cases, return the string unchanged. */
public static void main(String[] args) {
Scanner input=new Scanner(System.in);
while(input.hasNext()) {
System.out.println(fizzString(input.nextLine()));
}
}
public static String fizzString(String str) {
String result=str;
int l=str.length();
if (str.charAt(0)=='f'||str.charAt(l-1)=='b') {
result="";
if (l>0) {
if (str.charAt(0)=='f') {
result="Fizz";
}
if (str.charAt(0)=='b') {
result="Buzz";
}
if (l>1) {
/*technique: continue the process if l>1 (within l>0 if statement),
prevents breaking the program.*/
if (str.charAt(l-1)=='b') {
result="Buzz";
}
if (str.charAt(0)=='f'&&str.charAt(l-1)=='b') {
result="FizzBuzz";
}
}/*end l>1*/
}/*end l>0*/
}/*end charAt if*/
return result;
}
}
I believe this is what you're looking for:
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
String inputLine = "";
do {
inputLine = input.nextLine();
System.out.println(fizzString(inputLine));
System.out.println("");
} while (inputLine.length() > 0);
System.out.println("Goodbye");
}
public static String fizzString(String str) {
// Given a string
// If both the "f" and "b" conditions are true, return FizzBuzz
if (str.startsWith("f") && str.endsWith("b")) {
return "FizzBuzz";
}
// If the string starts with "f" return "Fizz".
if (str.startsWith("f")) {
return "Fizz";
}
// If the string ends with "b" return "Buzz".
if (str.endsWith("b")) {
return "Buzz";
}
// In all other cases, return the string unchanged.
return str;
}
The problem is the behavior of the Scanner class:
The next and hasNext methods and their primitive-type companion methods
(such as nextInt and hasNextInt) first skip any input that matches the
delimiter pattern, and then attempt to return the next token. Both
hasNext and next methods may block waiting for further input. Whether a
hasNext method blocks has no connection to whether or not its
associated next method will block.
Internally, hasNext() performs roughly the following loop:
while (!sourceClosed) {
if (hasTokenInBuffer())
return revertState(true);
readInput();
}
The method hasTokenInBuffer() skips all delimiter characters (by default \p{javaWhitespace}+), so hasNext() returns true only once the Scanner has found a non-delimiter token. Until then it keeps reading more input, which for System.in means blocking while only whitespace has been typed.
For example, if you type this content: "\n\n\n5\n" and then execute nextInt() method, you'll obtain a result of 5, because Scanner automatically skips all the return line characters.
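To see this buffering without typing at a console, a String can stand in for System.in. This sketch (class and method names are hypothetical) contrasts the question's hasNext()/nextLine() pairing with hasNextLine(). With only blank lines available, hasNext() reports false (on System.in it would keep blocking instead), yet once a real token arrives, nextLine() still returns the earlier blank lines one by one, which is exactly the delayed output described in the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class HasNextDemo {
    // The question's pattern: hasNext() looks ahead for a non-whitespace token
    // without consuming anything, then nextLine() returns the current line,
    // which may be blank.
    static List<String> readWithHasNext(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                lines.add(sc.nextLine());
            }
        }
        return lines;
    }

    // hasNextLine() is true as soon as any line, even a blank one, is available.
    static List<String> readWithHasNextLine(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
        }
        return lines;
    }
}
```

For interactive input, looping on hasNextLine() processes each blank line as soon as Enter is pressed.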
If you want to find a string within a line, try the method java.util.Scanner.findInLine instead of nextLine().
Use the patterns: ^f(.)* to look for every line that starts with a f character and the pattern (.)*b$ to look for every line that ends with a b character.
My csv is getting read into the System.out, but I've noticed that any text with a space gets moved into the next line (as a return \n)
Here's how my csv starts:
first,last,email,address 1, address 2
john,smith,blah#blah.com,123 St. Street,
Jane,Smith,blech#blech.com,4455 Roger Cir,apt 2
After running my app, any cell with a space (address 1), gets thrown onto the next line.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class main {
public static void main(String[] args) {
// -define .csv file in app
String fileNameDefined = "uploadedcsv/employees.csv";
// -File class needed to turn stringName to actual file
File file = new File(fileNameDefined);
try{
// -read from filePooped with Scanner class
Scanner inputStream = new Scanner(file);
// hashNext() loops line-by-line
while(inputStream.hasNext()){
//read single line, put in string
String data = inputStream.next();
System.out.println(data + "***");
}
// after loop, close scanner
inputStream.close();
}catch (FileNotFoundException e){
e.printStackTrace();
}
}
}
So here's the result in the console:
first,last,email,address
1,address
2
john,smith,blah#blah.com,123
St.
Street,
Jane,Smith,blech#blech.com,4455
Roger
Cir,apt
2
Am I using Scanner incorrectly?
Please stop writing faulty CSV parsers!
I've seen hundreds of CSV parsers and so called tutorials for them online.
Nearly every one of them gets it wrong!
This wouldn't be such a bad thing as it doesn't affect me but people who try to write CSV readers and get it wrong tend to write CSV writers, too. And get them wrong as well. And these ones I have to write parsers for.
Please keep in mind that CSV (in order of increasing not so obviousness):
can have quoting characters around values
can have other quoting characters than "
can even have other quoting characters than " and '
can have no quoting characters at all
can even have quoting characters on some values and none on others
can have other separators than , and ;
can have whitespace between seperators and (quoted) values
can have other charsets than ascii
should have the same number of values in each row, but doesn't always
can contain empty fields, either quoted: "foo","","bar" or not: "foo",,"bar"
can contain newlines in values
can not contain newlines in values if they are not delimited
can not contain newlines between values
can have the delimiting character within the value if properly escaped
does not use backslash to escape delimiters but...
uses the quoting character itself to escape it, e.g. Frodo's Ring will be 'Frodo''s Ring'
can have the quoting character at beginning or end of value, or even as only character ("foo""", """bar", """")
can even have the quoted character within the not quoted value; this one is not escaped
If you think this is obvious not a problem, then think again. I've seen every single one of these items implemented wrongly. Even in major software packages. (e.g. Office-Suites, CRM Systems)
There are good and correctly working out-of-the-box CSV readers and writers out there:
opencsv
Ostermiller Java Utilities
Apache Commons CSV
If you insist on writing your own at least read the (very short) RFC for CSV.
scanner.useDelimiter(",");
This should work.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class TestScanner {
public static void main(String[] args) throws FileNotFoundException {
Scanner scanner = new Scanner(new File("/Users/pankaj/abc.csv"));
scanner.useDelimiter(",");
while(scanner.hasNext()){
System.out.print(scanner.next()+"|");
}
scanner.close();
}
}
For CSV File:
a,b,c d,e
1,2,3 4,5
X,Y,Z A,B
Output is:
a|b|c d|e
1|2|3 4|5
X|Y|Z A|B|
Scanner.next() does not read a newline but reads the next token, delimited by whitespace (by default, if useDelimiter() was not used to change the delimiter pattern). To read a line use Scanner.nextLine().
Once you read a single line you can use String.split(",") to separate the line into fields. This enables identification of lines that do not consist of the required number of fields. Using useDelimiter(","); would ignore the line-based structure of the file (each line consists of a list of fields separated by a comma). For example:
while (inputStream.hasNextLine())
{
String line = inputStream.nextLine();
String[] fields = line.split(",");
if (fields.length >= 4) // At least one address specified.
{
for (String field: fields) System.out.print(field + "|");
System.out.println();
}
else
{
System.err.println("Invalid record: " + line);
}
}
As already mentioned, using a CSV library is recommended. For one, this (and useDelimiter(",") solution) will not correctly handle quoted identifiers containing , characters.
I agree with Scheintod that using an existing CSV library is a good idea to have RFC-4180-compliance from the start. Besides the mentioned OpenCSV and Oster Miller, there are a series of other CSV libraries out there. If you're interested in performance, you can take a look at the uniVocity/csv-parsers-comparison. It shows that
uniVocity CSV parser
SimpleFlatMapper CSV parser
Jackson CSV parser
are consistently the fastest using either JDK 6, 7, 8, or 9. The study did not find any RFC 4180 compatibility issues in any of those three. Both OpenCSV and Oster Miller are found to be about twice as slow as those.
I'm not in any way associated with the author(s), but regarding the uniVocity CSV parser, the study might be biased, since the author of the study also wrote that parser.
To note, the author of SimpleFlatMapper has also published a performance comparison comparing only those three.
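Split each line returned by nextLine() with this delimiter, which matches a comma only when it is followed by an even number of double quotes up to the end of the line, i.e. when the comma is outside a quoted field:
line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");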
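A minimal sketch of that quote-aware split, assuming the delimiter is a comma followed by the lookahead shown above. The class and method names are hypothetical. Note that the quotes stay in the field, that the lookahead rescans to the end of the line for every comma (quadratic on very long lines), and that line breaks inside quoted values are still not handled:

```java
public class QuoteAwareSplit {

    // A comma is a separator only if it is followed by an even number of
    // double quotes up to the end of the line, i.e. it is not inside quotes.
    static final String OUTSIDE_QUOTES_COMMA = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] splitLine(String line) {
        // limit -1 keeps trailing empty fields
        return line.split(OUTSIDE_QUOTES_COMMA, -1);
    }

    public static void main(String[] args) {
        String line = "Africa,\"Fujitsu, Siemens\",2924.742632";
        for (String field : splitLine(line)) {
            System.out.println(field);
        }
        // Africa
        // "Fujitsu, Siemens"
        // 2924.742632
    }
}
```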
I have seen many production problems caused by code not handling quotes ("), newline characters within quotes, and quotes within the quotes; e.g.: "he said ""this""" should be parsed into: he said "this"
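Splitting is only half of the job; a quoted field also has to be unescaped. A sketch of that step, using a hypothetical unquoteField helper that strips the outer quotes and collapses each doubled quote into one, as RFC 4180 specifies for escaped quotes. It assumes the field has already been isolated by some splitter:

```java
public class CsvUnquote {

    /**
     * Hypothetical helper: if a field is wrapped in double quotes, strip the
     * outer pair and turn each doubled quote ("") into a single one.
     * Unquoted fields are returned unchanged.
     */
    static String unquoteField(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field;
    }

    public static void main(String[] args) {
        String raw = "\"he said \"\"this\"\"\"";
        System.out.println(raw);               // "he said ""this"""
        System.out.println(unquoteField(raw)); // he said "this"
    }
}
```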
Like it was mentioned earlier, many CSV parsing examples out there just read a line, and then break up the line by the separator character. This is rather incomplete and problematic.
For those of us who prefer build versus buy (rather than using somebody else's code and dealing with its dependencies), classic hand-written text parsing is an option, and this is what worked for me:
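/**
* Parse CSV data into an array of String arrays. It handles double quoted values.
* @param is the input stream to read the CSV data from
* @param separator the field separator character, e.g. ','
* @param trimValues whether to trim leading and trailing whitespace from each value
* @param skipEmptyLines whether to omit rows in which every value is empty
* @return an array of String arrays, one per row
* @throws IOException if reading from the input stream fails
*/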
public static String[][] parseCsvData(InputStream is, char separator, boolean trimValues, boolean skipEmptyLines)
throws IOException
{
ArrayList<String[]> data = new ArrayList<String[]>();
ArrayList<String> row = new ArrayList<String>();
StringBuffer value = new StringBuffer();
int ch = -1;
int prevCh = -1;
boolean inQuotedValue = false;
boolean quoteAtStart = false;
boolean rowIsEmpty = true;
boolean isEOF = false;
while (true)
{
prevCh = ch;
ch = (isEOF) ? -1 : is.read();
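// Handle carriage return line feed: outside a quoted value \r\n is a single
// line break, but inside quotes both characters belong to the value
if (!inQuotedValue && prevCh == '\r' && ch == '\n')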
{
continue;
}
if (inQuotedValue)
{
if (ch == -1)
{
inQuotedValue = false;
isEOF = true;
}
else
{
value.append((char)ch);
if (ch == '"')
{
inQuotedValue = false;
}
}
}
else if (ch == separator || ch == '\r' || ch == '\n' || ch == -1)
{
// Add the value to the row
String s = value.toString();
if (quoteAtStart && s.endsWith("\""))
{
s = s.substring(1, s.length() - 1);
}
if (trimValues)
{
s = s.trim();
}
rowIsEmpty = rowIsEmpty && s.isEmpty();
row.add(s);
value.setLength(0);
if (ch == '\r' || ch == '\n' || ch == -1)
{
// Add the row to the result
if (!skipEmptyLines || !rowIsEmpty)
{
data.add(row.toArray(new String[0]));
}
row.clear();
rowIsEmpty = true;
if (ch == -1)
{
break;
}
}
}
else if (prevCh == '"')
{
inQuotedValue = true;
}
else
{
if (ch == '"')
{
inQuotedValue = true;
quoteAtStart = (value.length() == 0);
}
value.append((char)ch);
}
}
return data.toArray(new String[0][]);
}
Unit Test:
String[][] data = parseCsvData(new ByteArrayInputStream("foo,\"\",,\"bar\",\"\"\"music\"\"\",\"carriage\r\nreturn\",\"new\nline\"\r\nnext,line".getBytes()), ',', true, true);
for (int rowIdx = 0; rowIdx < data.length; rowIdx++)
{
System.out.println(Arrays.asList(data[rowIdx]));
}
generates the output:
[foo, , , bar, "music", carriage
return, new
line]
[next, line]
If you absolutely must use Scanner, then you must set its delimiter via its useDelimiter(...) method; otherwise it defaults to using whitespace as its delimiter. Better, though, as has already been stated: use a CSV library, since this is what they do best.
For example, this delimiter will split on commas with or without surrounding whitespace:
scanner.useDelimiter("\\s*,\\s*");
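A small sketch of that delimiter in use. It assumes you scan one line at a time (for example, each string returned by nextLine()), because "\\s*,\\s*" only consumes a line break when it sits next to a comma; on a whole file, the last field of one line and the first field of the next would come back as a single token. The tokens helper is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TrimmedDelimiterDemo {

    /**
     * Splits a single CSV line with Scanner, treating a comma plus any
     * surrounding whitespace as one delimiter.
     */
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter("\\s*,\\s*");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Africa , 2010 Q3,Fujitsu Siemens ,2924.742632"));
        // [Africa, 2010 Q3, Fujitsu Siemens, 2924.742632]
    }
}
```

Spaces inside a field ("2010 Q3") survive because the pattern requires a comma to match.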
Please check out the java.util.Scanner API for more on this.
Well, I do my coding in NetBeans 8.1:
First: Create a new project, select Java application and name your project.
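Then modify the code inside the generated public class so that it looks like the following (the file also needs the imports java.io.File, java.io.FileNotFoundException and java.util.Scanner at the top):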
/**
* @param args the command line arguments
* @throws java.io.FileNotFoundException if the CSV file does not exist
*/
public static void main(String[] args) throws FileNotFoundException {
try (Scanner scanner = new Scanner(new File("C:\\Users\\YourName\\Folder\\file.csv"))) {
scanner.useDelimiter(",");
while(scanner.hasNext()){
System.out.print(scanner.next()+"|");
}
}
}
}
I am working on a class assignment this morning and I want to try to solve a problem I have noticed in all of my teammates' programs so far: spaces in an int/float/double input cause Java to freak out.
To solve this issue I had a very crazy idea, and it does work under certain circumstances. However, the problem is that it does not always work and I cannot figure out why. Here is my "main" method:
import java.util.Scanner; //needed for scanner class
public class Test2
{
public static void main(String[] args)
{
BugChecking bc = new BugChecking();
String i;
double i2 = 0;
Scanner in = new Scanner(System.in);
System.out.println("Please enter a positive integer");
while (i2 <= 0.0)
{
i = in.nextLine();
i = bc.deleteSpaces(i);
//parse the cleaned string as a double
i2 = Double.parseDouble(i);
if (i2 <= 0.0)
{
System.out.println("Please enter a number greater than 0.");
}
}
in.close();
System.out.println(i2);
}
}
Here is the class. Note that I am working with floats, but I made it so that it can be used for any type, as long as the value can be converted to a string:
public class BugChecking
{
BugChecking()
{
}
public String deleteSpaces(String s)
{
//convert string into a char array
char[] cArray = s.toCharArray();
//now use for loop to find and remove spaces
for (i3 = 0; i3 < cArray.length; i3++)
{
if ((Character.isWhitespace(cArray[i3])) && (i3 != cArray.length)) //If current element contains a space remove it via overwrite
{
for (i4 = i3; i4 < cArray.length-1;i4++)
{
//move array elements over by one element
storage1 = cArray[i4+1];
cArray[i4] = storage1;
}
}
}
s = new String(cArray);
return s;
}
int i3; //for iteration
int i4; //for iteration
char storage1; //for storage
}
Now, the goal is to remove the spaces from the array in order to fix the problem stated at the beginning of the post. From what I can tell this code should achieve that, and it does, but only when the first character of the input is a space.
For example, if I input " 2.0332" the output is "2.0332".
However if I input "2.03 445 " the output is "2.03" and the rest gets lost somewhere.
This second example is what I am trying to figure out how to fix.
EDIT:
David's suggestion below fixed the problem: instead of reading the input as a number, read it directly as a String, then convert it (I always heard this described as casting) to the desired type. The corrected code is now in the main method above.
A little side note, if you plan on using this even though replace is much easier: the inner loop's condition i4 < cArray.length-1 is what keeps cArray[i4+1] in bounds, so when i3 reaches the last element the inner loop simply does not run. The extra i3 != cArray.length check in the if statement is always true inside the outer loop, so it does not guard against anything.
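For comparison, a sketch of the same manual approach without shifting array elements: copy every non-whitespace character into a StringBuilder. The class name WhitespaceStripper is hypothetical, and in practice String.replaceAll("\\s+", "") does the same job in one line:

```java
public class WhitespaceStripper {

    /**
     * Same goal as deleteSpaces above, but instead of shifting array elements
     * left (which leaves a duplicated character at the end and never re-checks
     * a whitespace character that slides into the current index), build a new
     * string from the non-whitespace characters only.
     */
    static String deleteSpaces(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cleaned = deleteSpaces("2.03 445 ");
        System.out.println(cleaned);                     // 2.03445
        System.out.println(Double.parseDouble(cleaned)); // 2.03445
    }
}
```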
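If you'd like to get rid of all whitespace within a String, use replaceAll(String regex, String replacement) or replace(CharSequence target, CharSequence replacement). Note that replace(char oldChar, char newChar) cannot delete characters, since there is no empty char literal: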
String sBefore = "2.03 445 ";
String sAfter = sBefore.replaceAll("\\s+", "");//remove all whitespace: spaces, tabs, line breaks
//String sAfter = sBefore.replace(" ", "");//remove plain spaces only
double i = 0;
try {
i = Double.parseDouble(sAfter);//parse to double
} catch (NumberFormatException nfe) {
nfe.printStackTrace();
}
System.out.println(i);//2.03445
UPDATE:
Looking at your code snippet, the problem might be that you read the input directly as a float/int/double, so a space stops nextFloat() partway through the number. Instead, read the input as a String using nextLine(), delete the whitespace, and then attempt to convert it to the appropriate type.
This seems to work fine for me:
public static void main(String[] args) {
//bugChecking bc = new bugChecking();
float i = 0.0f;
String tmp = "";
Scanner in = new Scanner(System.in);//requires import java.util.Scanner
System.out.println("Please enter a positive integer");
while (true) {
tmp = in.nextLine();//read the whole line
tmp = tmp.replaceAll("\\s+", "");//get rid of all whitespace
if (tmp.isEmpty()) {//wrong input: nothing but whitespace
System.err.println("Please enter a number greater than 0.");
continue;
}
try {//attempt to convert the string to a float
i = Float.parseFloat(tmp);
} catch (NumberFormatException nfe) {
System.err.println(nfe.getMessage());
continue;//not a number, ask again
}
if (i <= 0.0f) {//wrong input: not positive
System.err.println("Please enter a number greater than 0.");
continue;
}
System.out.println(i);
break;//got correct input, halt loop
}
in.close();
}
EDIT:
As a side note, please start all class names with a capital letter, i.e. the bugChecking class should be BugChecking, and the same applies to the test2 class, which should be Test2.
String objects have methods on them that allow you to do this kind of thing. The one you want in particular is String.replace. This pretty much does what you're trying to do for you.
String input = " 2.03 445 ";
input = input.replace(" ", ""); // "2.03445"
You could also use regular expressions to replace more than just spaces. For example, to get rid of everything that isn't a digit or a period:
String input = "123,232 . 03 445 ";
input = input.replaceAll("[^\\d.]", ""); // "123232.03445"
This will replace any non-digit, non-period character so that you're left with only those characters in the input. See the javadocs for Pattern to learn a bit about regular expressions, or search for one of the many tutorials available online.
Edit: One other remark: String.trim removes whitespace from the beginning and end of your string, turning " 2.0332" into "2.0332":
String input = " 2.0332 ";
input = input.trim(); // "2.0332"
Edit 2: With your update, I see the problem now. Scanner.nextFloat is what's breaking on the space. If you change your code to use Scanner.nextLine like so:
float i = 0.0f;
while (i <= 0.0f) {
String input = in.nextLine();
input = input.replaceAll("[^\\d.]", "");
i = Float.parseFloat(input);
if (i <= 0.0f) {
System.out.println("Please enter a number greater than 0.");
}
}
System.out.println(i);
That code will properly accept input such as "123,232 . 03 445". Use any of the solutions in place of my replaceAll and it will work.
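Scanner.nextFloat will split your input automatically based on whitespace. You can change the delimiter with useDelimiter, which takes a regular expression (for example, new Scanner(System.in).useDelimiter("[,./ ]") will delimit on ",", ".", "/", and " "). Note that new Scanner(System.in, ",./ ") does not set a delimiter: the second constructor argument is a charset name. The default constructor, new Scanner(System.in), automatically delimits based on whitespace.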
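A short sketch of both behaviors, using a String as input instead of System.in so it runs without typing. The class and helper names are hypothetical; Locale.US is set because nextFloat() parses numbers according to the scanner's locale:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScannerDelimiterDemo {

    // Reads only the first whitespace-delimited token as a float.
    static float firstFloat(String input) {
        try (Scanner scanner = new Scanner(input).useLocale(Locale.US)) {
            return scanner.nextFloat();
        }
    }

    // Collects every token, using a regex as the delimiter.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default delimiter is whitespace: nextFloat() stops at the space,
        // so only 2.03 is read and "445" is left for the next call.
        System.out.println(firstFloat("2.03 445"));        // 2.03
        // The character class splits on ',', '.', '/' and ' '.
        System.out.println(tokens("1,2.3/4 5", "[,./ ]")); // [1, 2, 3, 4, 5]
    }
}
```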
I guess you're using the first argument from your main method. If your main method looks something like this:
public static void main(String[] args){
System.out.println(deleteSpaces(args[0]));
}
Your problem is that spaces separate the arguments that get handed to your main method. So if you run your class like this:
java MyNumberConverter 22.2 33
The first argument, args[0], is "22.2" and the second, args[1], is "33".
But as others have suggested, String.replace is a better way of doing this anyway.
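If you do want to accept a number typed with spaces on the command line, one option is to quote it (in common shells, java MyNumberConverter "22.2 33" arrives as a single args[0]); another is to join the arguments back together before parsing. A sketch of the latter, assuming the user always means a single number; joinArgs is a hypothetical helper:

```java
public class MyNumberConverter {

    public static void main(String[] args) {
        // With `java MyNumberConverter 22.2 33` the shell passes two
        // arguments, "22.2" and "33"; joining them assumes one number was meant.
        String joined = joinArgs(args);
        if (joined.isEmpty()) {
            System.err.println("Usage: java MyNumberConverter <number>");
            return;
        }
        System.out.println(Double.parseDouble(joined));
    }

    // Concatenates all arguments with no separator.
    static String joinArgs(String[] args) {
        return String.join("", args);
    }
}
```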