Values are separated with commas, in the following format:
Country,Timescale,Vendor,Units
Africa,2010 Q3,Fujitsu Siemens,2924.742632
I want to make an array for every value. How can I do it?
I tried many things; my code so far is below:
BufferedReader br = null;
String line = "";
String cvsSplitBy = ",";

try {
    br = new BufferedReader(new FileReader(csvFile));
    while ((line = br.readLine()) != null) {
        String[] country = line.split(cvsSplitBy);
        country[0] += ",";
        String[] Krajina = country[0].split(",");
What you appear to be describing is what is otherwise known as Parallel Arrays. That is generally a bad idea in this particular use case since it is prone to out-of-bounds exceptions later on down the road; a better solution would be a two-dimensional (2D) array or an ArrayList. Nevertheless, parallel arrays it is:
You say an array size of 30, but while that may be true today, tomorrow it might be 25 or 40. To size your arrays to hold the file data you need to know how many lines of actual raw data are contained within the CSV file (excluding the header, possible comments, and possible blank lines). The easiest way would be to just dump everything into separate ArrayLists and then convert them to their respective arrays later on, be it String, int, long, double, whatever.
Counting file lines first so as to initialize Arrays:
One line of code can give you the number of lines contained within a supplied text file:
long count = Files.lines(Paths.get("C:\\MyDataFiles\\DataFile.csv")).count();
In reality, however, the line above needs to be enclosed within a try/catch block in case of an IOException, so there is a wee bit more code than a single line. For a simple use case where the CSV file contains a header line and no comment or blank lines, this could be all you need: just subtract one from the overall count to eliminate the header line before initializing your arrays. Another minor issue with the above one-liner is that it provides the count as a Long Integer (long). This is no good since Java arrays only accept Integer (int) values for initialization, so the value obtained needs to be cast to int, for example:
String[] countries = new String[(int) count];
and this is only good if count does not exceed Integer.MAX_VALUE - 2 (2147483645). That is a lot of array elements, so in general you would not really have a problem with this, but if you are dealing with extremely large array initializations then you will also need to consider JVM memory and the possibility of running out of it.
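Put together, the count-and-cast step might look like the following minimal sketch (the class name is made up for illustration, and a temp file stands in for your CSV file):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class LineCountSketch {

    // Counts the lines of a text file and casts the long result to int
    // so it can be used to size an array. Returns -1 on an I/O failure.
    public static int countLines(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            long count = lines.count();
            return (int) count;
        } catch (IOException ex) {
            ex.printStackTrace();
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // A throwaway file, created here just so the sketch runs.
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, List.of(
                "Country,Timescale,Vendor,Units",
                "Africa,2010 Q3,Fujitsu Siemens,2924.742632"));
        int total = countLines(tmp);
        // Subtract 1 for the header line before sizing the arrays.
        String[] countries = new String[total - 1];
        System.out.println(total + " lines, " + countries.length + " data row(s)");
    }
}
```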
Sometimes it's just nice to have a method that can be used in a multitude of different situations when getting the total number of raw data lines from a CSV (or other) text file. The method provided below is obviously more than a single line of code, but it does provide a little more flexibility about what to count in a file. As mentioned earlier, there is the possibility of a Header Line. A Header line is very common in CSV files and is usually the first line within the file, but this may not always be the case: the Header line could be preceded by a Comment Line or even a Blank Line. The Header line should, however, always be the last line before the raw data lines. Here is an example of a possible CSV file:
Example CSV file contents:
# Units Summary Report
# End Date: May 27, 2019

Country,TimeScale,Vendor,Units
Czech Republic,2010 Q3,Fujitsu Siemens,2924.742032
Slovakia,2010 Q4,Dell,2525r.011404
Slovakia,2010 Q4,Lenovo,2648.973238
Czech Republic,2010 Q3,ASUS,1323.507139
Czech Republic,2010 Q4,Apple,266.7584542
The first two lines are Comment Lines and Comment Lines always begin with either a Hash (#) character or a Semicolon (;). These lines are to be ignored when read.
The third line is a Blank Line and serves absolutely no purpose other than aesthetics (easier on the eyes I suppose). These lines are also to be ignored.
The fourth line which is directly above the raw data lines is the Header Line. This line may or may not be contained within a CSV file. Its purpose is to provide the Column Names for the data records contained on each raw data line. This line can be read (if it exists) to acquire record field (column) names.
The remaining lines within the CSV file are Raw Data Lines otherwise considered data records. Each line is a complete record and each delimited element of that record is considered a data field value. These are the lines you want to count so as to initialize your different Arrays. Here is a method that allows you to do that:
The fileLinesCount() Method:
/**
* Counts the number of lines within the supplied Text file. Which lines are
* counted depends upon the optional arguments supplied. By default, all
* file lines are counted.<br><br>
*
* @param filePath (String) The file path and name of the file (with
* extension) to count lines in.<br>
*
* @param countOptions (Optional - Boolean) Three optional parameters. If an
* optional argument is provided then the preceding
* optional argument MUST also be provided (be it true
* or false):<pre>
*
* ignoreHeader - Default is false. If true is passed then a value of
* one (1) is subtracted from the sum of lines detected.
* You must know for a fact that a header exists before
* passing <b>true</b> to this optional parameter.
*
* ignoreComments - Default is false. If true is passed then comment lines
* are ignored from the count. Only file lines (after being
* trimmed) which <b>start with</b> either a semicolon (;) or a
* hash (#) character are considered a comment line. These
* characters are typical for comment lines in CSV files and
* many other text file formats.
*
* ignoreBlanks - Default is false. If true is passed then file lines
* which contain nothing after they are trimmed is ignored
* in the count.
*
* <u>When a line is Trimmed:</u>
* If the String_Object represents an empty character
* sequence then reference to this String_Object is
* returned. If both the first & last character of the
* String_Object have codes greater than unicode ‘\u0020’
* (the space character) then reference to this String_Object
* is returned. When there is no character with a code
* greater than unicode ‘\u0020’ (the space character)
* then an empty string is created and returned.
*
* As an example, a trimmed line removes leading and
* trailing whitespaces, tabs, Carriage Returns, and
* Line Feeds.</pre>
*
* @return (Long) The number of lines contained within the supplied text
* file.
*/
public long fileLinesCount(final String filePath, final boolean... countOptions) {
// Defaults for optional parameters.
final boolean ignoreHeader = (countOptions.length >= 1 ? countOptions[0] : false);
// Only strings in lines that start with ';' or '#' are considered comments.
final boolean ignoreComments = (countOptions.length >= 2 ? countOptions[1] : false);
// All lines that when trimmed contain nothing (empty string).
final boolean ignoreBlanks = (countOptions.length >= 3 ? countOptions[2] : false);
long count = 0; // lines Count variable to hold the number of lines.
// Gather supplied arguments for optional parameters
try {
if (ignoreBlanks) {
// Using lambda along with Ternary Operator
count = Files.lines(Paths.get(filePath)).filter(line -> (ignoreComments
? (!line.trim().startsWith(";") && !line.trim().startsWith("#"))
&& line.trim().length() > 0 : line.trim().length() > 0)).count();
if (ignoreHeader) {
count--;
}
return count;
}
if (ignoreComments) {
// Using lambda along with Ternary Operator
count = Files.lines(Paths.get(filePath)).filter(line -> (ignoreBlanks ? line.trim().length() > 0
&& (!line.trim().startsWith(";") && !line.trim().startsWith("#"))
: (!line.trim().startsWith(";") && !line.trim().startsWith("#")))).count();
if (ignoreHeader) {
count--;
}
return count;
}
else {
count = Files.lines(Paths.get(filePath)).count();
if (ignoreHeader) {
count--;
}
}
}
catch (IOException ex) {
Logger.getLogger("fileLinesCount() Method Error!").log(Level.SEVERE, null, ex);
}
return count;
}
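As a quick standalone check, the same filtering the method applies can be sketched directly with the Stream API (the class name and sample file here are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class RawLineCount {

    // Counts lines that are neither blank nor comments (';' or '#'),
    // then optionally drops one more line for the header.
    public static long count(Path file, boolean ignoreHeader) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            long n = lines.map(String::trim)
                    .filter(l -> !l.isEmpty())
                    .filter(l -> !l.startsWith(";") && !l.startsWith("#"))
                    .count();
            return ignoreHeader ? n - 1 : n;
        }
    }

    public static void main(String[] args) throws IOException {
        // A throwaway file standing in for the CSV example above.
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, List.of(
                "# Units Summary Report",
                "",
                "Country,TimeScale,Vendor,Units",
                "Slovakia,2010 Q4,Dell,2525.011404"));
        System.out.println(count(tmp, true)); // only the one raw data line is counted
    }
}
```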
Filling the Parallel Arrays:
Now it's time to create a method to fill the desired arrays. Looking at the data file, it looks like you need three String arrays and one double array. You may want to make these instance (class member) variables:
// Instance (Class Member) variables:
String[] country;
String[] timeScale;
String[] vendor;
double[] units;
then, for filling these arrays, we would use a method like this:
/**
* Fills the 4 class member array variables country[], timeScale[], vendor[],
* and units[] with data obtained from the supplied CSV data file.<br><br>
*
* @param filePath (String) Full path and file name of the CSV data file.<br>
*
* @param fileHasHeader (Boolean) Either true or false. Supply true if the CSV
* file does contain a Header and false if it does not.
*/
public void fillDataArrays(String filePath, boolean fileHasHeader) {
long dataCount = fileLinesCount(filePath, fileHasHeader, true, true);
/* Java Arrays will not accept the long data type for sizing
therefore we cast to int. */
country = new String[(int) dataCount];
timeScale = new String[(int) dataCount];
vendor = new String[(int) dataCount];
units = new double[(int) dataCount];
int lineCounter = 0; // counts all lines contained within the supplied text file
try (Scanner reader = new Scanner(new File(filePath))) {
int indexCounter = 0;
while (reader.hasNextLine()) {
lineCounter++;
String line = reader.nextLine().trim();
// Skip comment and blank file lines.
if (line.startsWith(";") || line.startsWith("#") || line.equals("")) {
continue;
}
if (indexCounter == 0 && fileHasHeader) {
/* Since we are skipping the header right away we
now no longer need the fileHasHeader flag. */
fileHasHeader = false;
continue; // Skip the first line of data since it's a header
}
/* Split the raw data line based on a comma (,) delimiter.
The Regular Expression (\\s{0,},\\s{0,}) ensures that
it doesn't matter how many spaces (if any at all) are
before OR after the comma; the split removes those
unwanted spaces, and even tabs are removed if any.
*/
String[] splitLine = line.split("\\s{0,},\\s{0,}");
country[indexCounter] = splitLine[0];
timeScale[indexCounter] = splitLine[1];
vendor[indexCounter] = splitLine[2];
/* The Regular Expression ("-?\\d+(\\.\\d+)?") below ensures
that the value contained within what is to be the Units
element of the split array is actually a string representation
of a signed or unsigned integer or double/float numerical value.
*/
if (splitLine[3].matches("-?\\d+(\\.\\d+)?")) {
units[indexCounter] = Double.parseDouble(splitLine[3]);
}
else {
JOptionPane.showMessageDialog(this, "<html>An invalid Units value (<b><font color=blue>" +
splitLine[3] + "</font></b>) has been detected<br>in data file line number <b><font " +
"color=red>" + lineCounter + "</font></b>. A value of <b>0.0</b> has been applied<br>to " +
"the Units Array to replace the data provided on the data<br>line which consists of: " +
"<br><br><b><center>" + line + "</center></b>.", "Invalid Units Value Detected!",
JOptionPane.WARNING_MESSAGE);
units[indexCounter] = 0.0d;
}
indexCounter++;
}
}
catch (IOException ex) {
Logger.getLogger("fillDataArrays() Method Error!").log(Level.SEVERE, null, ex);
}
}
To get the ball rolling just run the following code:
// Fill the Arrays with data.
fillDataArrays("DataFile.txt", true);
// Display the filled Arrays.
System.out.println(Arrays.toString(country));
System.out.println(Arrays.toString(timeScale));
System.out.println(Arrays.toString(vendor));
System.out.println(Arrays.toString(units));
You have to define your arrays before processing your file:
String[] country = new String[30];
String[] timescale = new String[30];
String[] vendor = new String[30];
String[] units = new String[30];
And while reading lines you have to put the values into the defined arrays at the same index; to keep track of the index, use another variable and increase it at every iteration. It should look like this:
int index = 0;
while ((line = br.readLine()) != null) {
    String[] splitted = line.split(",");
    country[index] = splitted[0];
    timescale[index] = splitted[1];
    vendor[index] = splitted[2];
    units[index] = splitted[3];
    index++;
}
Since your CSV will probably include a header line, you may also want to skip the first line.
Always try to use try-with-resources when doing I/O.
The following code should help you out:
String line = "";
String cvsSplitBy = ",";

List<String> countries = new ArrayList<>();
List<String> timeScales = new ArrayList<>();
List<String> vendors = new ArrayList<>();
List<String> units = new ArrayList<>();

// use try-with-resources
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        String[] parts = line.split(cvsSplitBy);
        countries.add(parts[0]);
        timeScales.add(parts[1]);
        vendors.add(parts[2]);
        units.add(parts[3]);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
for (String country : countries) {
    System.out.println(country);
}
for (String scale : timeScales) {
    System.out.println(scale);
}
for (String vendor : vendors) {
    System.out.println(vendor);
}
for (String unit : units) {
    System.out.println(unit);
}
Related
I have a CSV file that I need to read and display the number of occurrences of each word. The application should only count words that have more than one letter and are not alphanumeric, all turned to lowercase.
This is what I have right now; I'm stuck at this point and have no idea where to go from here.
public static void countWordNumber() throws IOException, CsvException {
    String pathFile1 = "src/main/resources/Documents/Example.csv";
    CSVReader reader = new CSVReaderBuilder(new FileReader(pathFile1)).withSkipLines(1).build();
    Map<String, Integer> frequency = new HashMap<>();
    String[] line;
    while ((line = reader.readNext()) != null) {
        String words = line[1];
        words = words.replaceAll("\\p{Punct}", " ").trim();
        words = words.replaceAll("\\s{2}", " ");
        words = words.toLowerCase();
        if (frequency.containsKey(words)) {
            frequency.put(words, frequency.get(words) + 1);
        } else {
            frequency.put(words, 0);
        }
    }
}
I am trying to read the second index of each row of the CSV, which is line[1]; this is where the text of the document is located.
I have replaced all punctuation with spaces and trimmed the result; also, if there are two or more spaces I have replaced those with one, and made it lowercase.
The output I am trying to achieve is:
Title of document: XXXX
Word: is, Value: 3
EDIT: This is an example of my input file.
title,text,date
exampleTitle,This is is is an example example, April 2022
Your solution does not look that bad. But for the initialization I would replace
frequency.put(words, 0);
with
frequency.put(words, 1);
Since I am missing your input file, I created a dummy that works fine.
Map<String, Integer> frequency = new HashMap<>();
List<String> csvSimulation = new ArrayList<String>();
csvSimulation.add("test");
csvSimulation.add("marvin");
csvSimulation.add("aaaaa");
csvSimulation.add("nothing");
csvSimulation.add("test");
csvSimulation.add("test");
csvSimulation.add("aaaaa");
csvSimulation.add("stackoverflow");
csvSimulation.add("test");
csvSimulation.add("bread");

Iterator<String> iterator = csvSimulation.iterator();
while (iterator.hasNext()) {
    String words = iterator.next();
    words = words.toLowerCase();
    if (frequency.containsKey(words)) {
        frequency.put(words, frequency.get(words) + 1);
    } else {
        frequency.put(words, 1);
    }
}
System.out.println(frequency);
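As a side note, since Java 8 the containsKey/put dance can be collapsed into a single Map.merge call, which also gets the initialization to 1 right automatically. A small sketch (the class name is made up):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountMerge {

    // Counts word occurrences, case-insensitively, in one pass.
    public static Map<String, Integer> count(String... words) {
        Map<String, Integer> frequency = new HashMap<>();
        for (String w : words) {
            // Inserts 1 on first sight, otherwise adds 1 to the stored count.
            frequency.merge(w.toLowerCase(), 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        System.out.println(count("test", "Test", "bread", "test"));
    }
}
```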
Are you sure that accessing line[1] in a loop while iterating is correct? The correct reading of the input seems to be the problem to me. Without seeing your CSV file, however, I can't help you any further.
EDIT:
With the provided CSV data, an adjustment to your code like this would solve your problem:
.....
.....
while ((line = reader.readNext()) != null) {
    String words = line[1];
    words = words.replaceAll("\\p{Punct}", " ").trim();
    words = words.replaceAll("\\s{2}", " ");
    words = words.toLowerCase();
    String[] singleWords = words.split(" ");
    for (int i = 0; i < singleWords.length; i++) {
        String currentWord = singleWords[i];
        if (frequency.containsKey(currentWord)) {
            frequency.put(currentWord, frequency.get(currentWord) + 1);
        } else {
            frequency.put(currentWord, 1);
        }
    }
}
System.out.println("Word: is, Value: " + frequency.get("is"));
You can use a regex match to verify that words fits your criteria before adding it to your HashMap, like so:
if (words.matches("[a-z]{2,}"))
[a-z] specifies only lowercase alpha chars
{2,} specifies "Minimum of 2 occurrences, maximum of <undefined>"
Though, given you're converting punctuation to spaces, it sounds like you could have multiple words in line[1]. If you want to gather counts of multiple words across multiple lines, then you may want to split words on the space char, like so:
for (String word : words.split(" ")) {
    if (word.matches("[a-z]{2,}")) {
        // Then use your code for checking if frequency contains the term,
        // but use `word` instead of `words`
    }
}
Just another twist on things:
Since it's been established (by OP) that the CSV file consists of Title, Text, Date data, it can be assumed that every data line of that file is delimited with the typical comma (,) and that each line of that CSV data file (other than the Header line) can potentially contain a different Title.
Yet, the established desired output (by OP) is:
Title of document: exampleTitle
Word: is, Value: 3
Word: example, Value: 2
Let's change this output so that things are a wee bit more pleasing to the eye:
-------------------------------
Title of document: exampleTitle
-------------------------------
Word Value
=====================
an 1
is 3
example 2
this 1
=====================
Based on this information, it only seems logical that, because each file data line contains a Title, we need to process and store word occurrences from column 2 separately for each data line. With each word being processed we need to maintain the origin of that word so that we know which Title it came from, rather than just carrying out a simple occurrence count of all column 2 words in all rows (file lines). This of course means that the KEY for the Map, which would be a word, must also hold the origin of the word. This isn't a big deal, but a little more thought will be needed when it comes time to pull relevant data from the Map so as to display it properly in the Console Window or to use it for other purposes within the application. What we could do is utilize a List Interface of Map, for example:
List<Map<String, Integer>> mapsList = new ArrayList<>();
By doing this we can place each file line processed into a Map and then add that map to a List named mapsList.
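A minimal sketch of that keying scheme (the class name here is made up; the ":|:" separator is simply an unlikely token):

```java
import java.util.HashMap;
import java.util.Map;

public class OriginKeyedCounts {

    // Counts the words of one CSV line's text column, prefixing every
    // map KEY with the line's Title so the word's origin is preserved.
    public static Map<String, Integer> countForOrigin(String origin, String text) {
        Map<String, Integer> map = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            map.merge(origin + ":|:" + word, 1, Integer::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> m =
                countForOrigin("exampleTitle", "This is is is an example example");
        System.out.println(m.get("exampleTitle:|:is"));      // 3
        System.out.println(m.get("exampleTitle:|:example")); // 2
    }
}
```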
The provided text file contents example leaves a lot to be desired; nevertheless, it does help to some extent in the sense that it comforts me to know that, yes, there is a Header line in the CSV data file and that the typical comma is used as the delimiter in the file .... and that's all. So more questions come to mind:
Can more than one file data line contain the same Title (Origin)?
If "yes", then what do you want to do with the word
occurrences for the other line(s)?
Do you want to add them to the first established
Title?
Or do you want to start a new modified Title? (must be a different title name in this case)
Will the second column (the text column) ever possibly contain a
comma delimiter? Commas are very common in text of reasonable
length.
If "yes", is the text in column 2 enclosed in quotation marks?
How long can the text in column 2 possibly get to? (just curious - it's actually irrelevant).
Will there ever be different CSV files to get Word Occurrences from
which contain more than three columns?
Will there ever be a time where Word Occurrences will need to be
derived from more than one column on any CSV file data line?
The method provided below (in the runnable application) named getWordOccurrencesFromCSV() is flexible enough to basically cover all the questions above. This method also makes use of two other helper methods named getWordOccurrences() and combineSameOrigins() to get the job done. These helper methods can be used on their own for other situations if so desired although the combineSameOrigins() method is uniquely designed for the getWordOccurrencesFromCSV() method. The startApp() method gets the ball rolling and displays the generated List of Maps into the console Window.
Here is the runnable code (be sure to read ALL the comments in code):
package so_demo_hybridize;
public class SO_Demo_hybridize {
private final java.util.Scanner userInput = new java.util.Scanner(System.in);
public static void main(String[] args) {
// Started this way to avoid the need for statics.
new SO_Demo_hybridize().startApp(args);
}
private void startApp(String[] args) {
String ls = System.lineSeparator();
String filePath = "DemoCSV.csv";
/* Retrieve the necessary data from the supplied CSV
file and place it into a List of Maps:
getWordOccurrencesFromCSV() Parameters Info:
--------------------------------------------
filePath: Path and file name of the CSV file.
"," : The delimiter used in the CSV file.
1 : The literal data column which will hold the Origin String.
2 : The literal data column which will hold text of words to get occurrences from.
1 : The minimum number of occurrences needed to save. */
java.util.List<java.util.Map<String, Integer>> mapsList
= getWordOccurrencesFromCSV(filePath, ",", 1, new int[]{2}, 1);
/* Display what is desired from the gathered file data now
contained within the Maps held within the 'mapsList' List.
Now that you have this List of Maps, you can do whatever
and display whatever you like with the data. */
System.out.println("Word Occurrences In CSV File" + ls
+ "============================" + ls);
for (java.util.Map<String, Integer> maps : mapsList) {
String mapTitle = "";
int cnt = 0;
for (java.util.Map.Entry<String, Integer> entry : maps.entrySet()) {
/* Because the Origin is attached to the Map Key (a word)
we need to split it off. Note the special delimiter. */
String[] keyParts = entry.getKey().split(":\\|:");
String wordOrigin = keyParts[0];
String word = keyParts[1];
if (mapTitle.isEmpty()) {
mapTitle = "Title of document: " + wordOrigin;
// The Title underline...
String underLine = String.join("", java.util.Collections.nCopies(mapTitle.length(), "-"));
System.out.println(underLine + ls + mapTitle + ls + underLine);
// Display a Header and underline
String mapHeader = "Words Values" + ls
+ "=====================";
System.out.println(mapHeader);
cnt++;
}
System.out.println(String.format("%-15s%-6s", word, entry.getValue()));
}
if (cnt > 0) {
// The underline for the Word Occurrences table displayed.
System.out.println("=====================");
System.out.println();
}
}
}
/**
* Retrieves and returns a List of Word Occurrences Maps ({@code Map<String,
* Integer>}) from each CSV data line from the specified column. Each
* word is considered the KEY and the number of occurrences of that word
* is the VALUE. Each KEY in each Map is also prefixed with the Origin
* String of that word, delimited with ":|:". READ THIS DOCUMENT IN FULL!
*
* @param csvFilePath (String) The full path and file name of the
* CSV file to process.<br>
*
* @param csvDelimiter (String) The delimiter used within the CSV
File. This can be any single-character
string, including whitespace. Although
* not mandatory, adding whitespaces to your
* CSV Delimiter argument should be
* discouraged.<br>
*
* @param originFromColumn (Integer) The CSV file line data column
* literal number where the Origin for the
* evaluated occurrences will be related to. By
* <b><i>literal</i></b> we mean the actual
* column number, <u>not</u> the column Index
* Value. Whatever literal column number is
* supplied, the data within that column should
* be Unique to all other lines within the CSV
* data file. In most CSV files, records in that
* file (each line) usually contains one column
* that contains a Unique ID of some sort which
* identifies that line as a separate record. It
* would be this column which would make a good
* candidate to use as the <b><i>origin</i></b>
* for the <i>Word Occurrences</i> about to be
* generated from the line (record) column
* specified from the argument supplied to the
* 'occurrencesFromColumn' parameter.<br><br>
*
* If null is supplied to this parameter then
* Column one (1) (index 0) of every data line
* will be assumed to contain the Origin data
* string.<br>
*
* @param occurrencesFromColumn (int[] Array) Because this method can gather
* Word Occurrences from 1 <b><i>or
* more</i></b> columns within any given CSV
* file data line, the literal column number(s)
* must be supplied within an <b>int[]</b> array.
* The objective of this method is to obviously
* collect Word Occurrences contained within
* text that is mounted within at least one
* column of any CSV file data line, therefore,
* the literal column number(s) from where the
* Words to process are located need to be suppled.
* By <b><i>literal</i></b> we mean the actual
* column number, <u>not</u> the column Index
* Value. The desired column number (or column
* numbers) can be supplied to this parameter
* in this fashion: <b><i>new int[]{3}</i></b>
* OR <b><i>new int[]{3,5,6}</i></b>.<br><br>
* All words processed, regardless of what
* columns they come from, will all fall under
* the same Origin String.<br><br>
*
* Null <b><u>can not</u></b> be supplied as an
* argument to this parameter.<br>
*
* @param minWordCountRequired (Integer) If any integer value less than 2
* is supplied then all words within the
* supplied Input String will be placed into
* the map regardless of how many there are. If
* however you only want words where there are
* two (2) or more within the Input String then
* supply 2 as an argument to this parameter.
* If you only want words where there are three
* (3) or more within the Input String then
* supply 3 as an argument to this parameter...
* and so on.<br><br>
*
* If null is supplied to this parameter then a
* default of one (1) will be assumed.<br>
*
* @param options (Optional - two parameters, both Boolean):<pre>
*
* noDuplicateOrigins - Optional - Default is true. By default, duplicate
* Origins are not permitted. This means that no two
* Maps within the List can contain the same Origin
* for word occurrences. Obviously, it is possible
* for a CSV file to contain data lines which holds
* duplicate Origins but in this case the words in
* the duplicate Origin are added to the Map within
* the List which already contains that Origin and
* if any word in the new duplicate Origin Map is
* found to be in the already stored Map with the
* original Origin then the occurrences count for
* the word in the new Map is added to the same word
* occurrences count of the already stored Map.
*
* If boolean <b>false</b> is optionally supplied to this
* parameter then a duplicate Map with the duplicate
* Origin is added to the List of Maps.
*
* Null can be supplied to this optional parameter.
* You can not just supply a blank comma.
*
* noNumerics - Optional - Default is false. By default, this
* method considers numbers (either integer or
* floating point) as words therefore this
* parameter would be considered as always
false. If however you don't want the
* occurrences of numbers to be placed into the
* returned Map then you can optionally supply
* an argument of boolean true here.
*
* If null or a boolean value is supplied to this
* optional parameter then null or a boolean value
* <u>must</u> be supplied to the noDuplicateOrigins
* parameter.</pre>
*
* @return ({@code List<Map<String, Integer>>})
*/
@SuppressWarnings("CallToPrintStackTrace")
public java.util.List<java.util.Map<String, Integer>> getWordOccurrencesFromCSV(
String csvFilePath, String csvDelimiter, Integer originFromColumn,
int[] occurrencesFromColumn, Integer minWordCountRequired, Boolean... options) {
String ls = System.lineSeparator();
// Handle invalid arguments to this method...
if (csvFilePath == null || csvFilePath.isEmpty()) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The csvFilePath parameter can not be supplied "
+ "null or a null string!" + ls);
}
else if (!new java.io.File(csvFilePath).exists()) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The file indicated below can not be found!" + ls
+ csvFilePath + ls);
}
else if (csvDelimiter == null || csvDelimiter.isEmpty()) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The csvDelimiter parameter can not be supplied "
+ "null or a null string!" + ls);
}
if (originFromColumn == null || originFromColumn < 1) {
originFromColumn = 1;
}
for (int i = 0; i < occurrencesFromColumn.length; i++) {
if (occurrencesFromColumn[i] == originFromColumn) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The 'occurrencesFromColumn' argument ("
+ occurrencesFromColumn[i] + ")" + ls + "can not be the same column "
+ "as the 'originFromColumn' argument (" + originFromColumn
+ ")!" + ls);
}
else if (occurrencesFromColumn[i] < 1) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The argument for the occurrencesFromColumn "
+ "parameter can not be less than 1!" + ls);
}
}
if (minWordCountRequired == null || minWordCountRequired < 2) {
minWordCountRequired = 1;
}
final int minWrdCnt = minWordCountRequired;
// Take care of the Optional Parameters
boolean noDuplicateOrigins = true;
boolean noNumerics = false;
if (options != null && options.length > 0) {
if (options[0] != null) {
noDuplicateOrigins = options[0];
}
if (options.length >= 2 && options[1] != null) {
noNumerics = options[1];
}
}
java.util.List<java.util.Map<String, Integer>> mapsList = new java.util.ArrayList<>();
// 'Try With Resources' is used here to auto-close file and free resources.
try (java.util.Scanner reader = new java.util.Scanner(new java.io.FileReader(csvFilePath))) {
String line = reader.nextLine(); // Skip the Header line (first line in file).
String origin = "";
java.util.Map<String, Integer> map;
while (reader.hasNextLine()) {
line = reader.nextLine().trim();
// Skip blank lines (if any)
if (line.isEmpty()) {
continue;
}
// Get columnar data from data line
// If there are no quotation marks in data line.
String regex = "\\s*\\" + csvDelimiter + "\\s*";
/* If there are quotation marks in data line and they are
actually balanced. If they're not balanced then we obviously
use the regular expression above. The regex below ignores
the supplied delimiter contained between quotation marks. */
if (line.contains("\"") && line.replaceAll("[^\"]", "").length() % 2 == 0) {
regex = "\\s*\\" + csvDelimiter + "\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
}
String[] csvColumnStrings = line.split(regex);
// Acquire the Origin String
origin = csvColumnStrings[originFromColumn - 1];
// Get the Word Occurrences from the provided column number(s)...
for (int i = 0; i < occurrencesFromColumn.length; i++) {
/* Acquire the String to get Word Occurrences from
and remove any punctuation characters from it. */
line = csvColumnStrings[occurrencesFromColumn[i] - 1];
line = line.replaceAll("\\p{Punct}", "").replaceAll("\\s+", " ").toLowerCase();
// Get Word Occurrences...
map = getWordOccurrences(origin, line, noNumerics);
/* Has same Origin been processed before?
If so, do we add this one to the original? */
if (noDuplicateOrigins || i > 0) {
if (combineSameOrigins(mapsList, map) > 0) {
continue;
}
}
mapsList.add(map);
}
}
}
catch (java.io.FileNotFoundException ex) {
ex.printStackTrace();
}
/* Remove any words from all the Maps within the List that
does not meet our occurrences minimum count argument
supplied to the 'minWordCountRequired' parameter.
(Java8+ needed) */
for (java.util.Map<String, Integer> mapInList : mapsList) {
mapInList.entrySet().removeIf(e -> e.getValue() < minWrdCnt);
}
return mapsList; // Return the generated List of Maps
}
/**
* This method will go through all Maps contained within the supplied List of
* Maps and see if the Origin String within the supplied Map already exists
within a Listed Map. If it does then those words within the Supplied Map
are added to the Map within the List if they don't already exist there.
If any words from the Supplied Map do exist within the Listed Map then
only the count values from those words are summed to the words within the
Listed Map.<br>
*
@param list ({@code List<Map<String, Integer>>}) The List Interface which
contains all the Maps of Word Occurrences for different Origins.<br>
*
@param suppliedMap ({@code Map<String, Integer>}) The Map to check
against all Maps contained within the List of Maps for duplicate Origins.
*
@return (int) The number of Maps within the List found to share the
supplied Map's Origin.
*/
public int combineSameOrigins(java.util.List<java.util.Map<String, Integer>> list,
java.util.Map<String, Integer> suppliedMap) {
int wrdCnt = 0;
String newOrigin = suppliedMap.keySet().stream().findFirst().get().split(":\\|:")[0].trim();
String originInListedMap;
for (java.util.Map<String, Integer> mapInList : list) {
originInListedMap = mapInList.keySet().stream().findFirst().get().split(":\\|:")[0].trim();
if (originInListedMap.equals(newOrigin)) {
wrdCnt++;
for (java.util.Map.Entry<String, Integer> suppliedMapEntry : suppliedMap.entrySet()) {
String key = suppliedMapEntry.getKey();
int value = suppliedMapEntry.getValue();
boolean haveIt = false;
for (java.util.Map.Entry<String, Integer> mapInListEntry : mapInList.entrySet()) {
if (mapInListEntry.getKey().equals(key)) {
haveIt = true;
mapInListEntry.setValue(mapInListEntry.getValue() + value);
break;
}
}
if (!haveIt) {
mapInList.put(key, value);
}
}
}
}
return wrdCnt;
}
/**
* Find the Duplicate Words In a String And Count the Number Of Occurrences
* for each of those words. This method will fill and return a Map of all
* the words within the supplied string (as Key) and the number of
* occurrences for each word (as Value).<br><br>
* <p>
* <b>Example to read the returned Map and display in console:</b><pre>
* {@code for (java.util.Map.Entry<String, Integer> entry : map.entrySet()) {
* System.out.println(String.format("%-12s%-4s", entry.getKey(), entry.getValue()));
* } }</pre>
*
@param origin (String) The UNIQUE origin String of what the word is
* related to. This can be anything as long as this same
* origin string is applied to all the related words of
* the same Input String. A unique ID of some sort or a
* title string of some kind would work fine for
* this.<br>
*
@param inputString (String) The string to process for word
* occurrences.<br>
*
@param noNumerics (Optional - Default - false) By default, this method
* considers numbers (either integer or floating point)
* as words therefore this parameter would be considered
* as always false. If however you don't want the
* occurrences of numbers to be placed into the returned
* Map then you can optionally supply an argument of
* boolean true here.<br>
*
@return ({@code java.util.Map<String, Integer>}) Consisting of individual
* words found within the supplied String as KEY and the number of
* occurrences as VALUE.
*/
public static java.util.Map<String, Integer> getWordOccurrences(String origin,
String inputString, Boolean... noNumerics) {
boolean allowNumbersAsWords = true;
if (noNumerics.length > 0) {
if (noNumerics[0] != null) {
allowNumbersAsWords = !noNumerics[0];
}
}
// Use this to have Words in Ascending order.
java.util.TreeMap<String, Integer> map = new java.util.TreeMap<>();
// Use this to have Words in the order of Insertion (when they're added).
//java.util.LinkedHashMap<String, Integer> map = new java.util.LinkedHashMap<>();
// Use this to have Words in a 'who cares' order.
//java.util.Map<String, Integer> map = new java.util.HashMap<>();
String[] words = inputString.replaceAll("[^A-Za-z0-9' ]", "")
.replaceAll("\\s+", " ").trim().split("\\s+");
for (int i = 0; i < words.length; i++) {
String w = words[i];
if (!allowNumbersAsWords && w.matches("-?\\d+(\\.\\d+)?")) {
continue;
}
if (map.containsKey(origin + ":|:" + w)) {
int cnt = map.get(origin + ":|:" + w);
map.put(origin + ":|:" + w, ++cnt);
}
else {
map.put(origin + ":|:" + w, 1);
}
}
return map;
}
}
I'm working on an assignment where I'm supposed to read from a text file like this...
Student Name: John
Student ID: 12344/19
College: Science
Credits Attempted: 15
Credits Earned: 15
Grade Points: 41.2
Course Code Course Title Credit Grade
COMP1007, Amazing Applications of AI, 2, B
COMP2202, Fund. of Object Oriented Prog., 3, C-
MATH2108, Calculus (2), 3, C-
MATH3340, Discrete Math. for Comp. Sci., 3, B-
STAT2101, Introduction to Statistics, 4, C+
I should read this text file and calculate the GPA of the student and create an output file that should look like this...
Output text file
So basically I'm stuck and I have no idea what to do...
I know how to read line by line and split a line into different parts, but this doesn't seem to work here since every line is different from the other. For example the first line has two parts, the "Student Name" and the name itself in this case "John". But in line 9, there are four different parts, the course code, course name, credit and grade.
I'm honestly not looking to cheat on the assignment but only to understand it
help :)
Note I can't use Stream or Hashmap or BufferedReader
Each data record in a text file always has a Start and an End. The easiest records are obviously those that are contained on a single delimited line within the text file, where each file line is in fact a record, as you can see within a typical CSV format data file. The harder records to read are Multi-Line records, where each data record consists of several sequential text file lines but still has a Start and an End.
The Start of a record is usually pretty easy to distinguish. For example, in the file example you provided in your post it is obviously any file line that starts with Student Name:.
The End of a record may not always be so easy to determine since many applications do not save fields which contain no data value in order to help increase access speed and reduce file bloat. The thought is "why have a text file full of empty fields" and to be honest, rightly so. I'm not a big fan of text file records anyways since utilizing a database would make far better sense for larger amounts of data. In any case, there will always be a file line that will indicate the Start of a record so it would make sense to read from Start to Start of the next record or in the case of the last record in file, from Start to End Of File (EOF).
Here is an example (read the comments in code):
// System line separator to use in files.
String ls = System.lineSeparator();
/* Array will hold student data: Student Name, Student ID, College,
Credits Attempted, Credits Earned, and finally Grade Points. */
String[] studentData = new String[6];
// String Array to hold Course Table Header Names.
String[] coursesHeader = {"COURSE NO", "COURSE TITLE", "CREDITS", "GRADE"};
// List Interface to hold all the course Data line Arrays for each record
java.util.List<String[]> cousesList = new java.util.ArrayList<>();
// Underlines to be used for Console display and file records
// Under courses Header
String underline1 = "-------------------------------------------------------------";
// Under all the courses
String underline2 = "------------------------------------------------------------------------------------";
/* Read and Write to files using 'Try With Resources' so to
automatically close the reader and writer objects. */
try (Scanner reader = new Scanner(new java.io.File("StudentData.txt"), "UTF-8");
java.io.Writer writer = new java.io.FileWriter("StudentsGPA.txt")) {
// For console display only! [Anything (except caught errors) to Console can be deleted]
System.out.println("The 'StudentsGPA.txt' file will contain:");
System.out.println("======================================");
System.out.println();
// Will hold each line read from the reader
String line = "";
/* Will hold the name for the next record. This would be the record START
but only AFTER the first record has been read. */
String newName = "";
// Start reading the 'StudentData.txt' file (line by line)...
while (reader.hasNextLine()) {
/* If newName is empty then we're on our first record or
there is only one record in file. */
if (newName.isEmpty()) {
line = reader.nextLine(); // read in a file line...
}
else {
/* newName contains a name so we must have bumped into
the START of a new record during processing of the
previous record. We aleady now have the first line
of this new record (which is the student's name line)
currently held in the 'newName' variable so we just
make 'line' equal what is in the 'newName' variable
and carry on processing the data as normal. In essence,
we simply skipped a read because we've already read it
earlier when processing the previous record. */
line = newName;
// Clear this variable in preparation for another record START.
newName = "";
}
/* Skip comment lines (lines that start with a semicolon (;)
or a hash mark (#). Also skip any blank lines. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* Does this file line start with 'Student Name:'? If so then
this is a record START, let's process this record. If not
then just keep reading the file. */
if (line.startsWith("Student Name:")) {
/* Let's put the student name into the studentData array at
index 0. If it is detected that there has been no name
applied for some reason then we place "N/A" as the name.
We use a Ternary Operator for this. So, "N/A" will be a
Default if there is no name. This will be typical for
the other portions of student data. */
studentData[0] = line.split("\\s*:\\s*").length < 2 ? "N/A" : line.split("\\s*:\\s*")[1].trim();
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentData[]
Array... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
/* Store the other portions of student data into the
studentData Array using "N/A" as a default should
any student data field contain nothing. */
studentData[i] = line.split("\\s*:\\s*").length < 2 ? "N/A" : line.split("\\s*:\\s*")[1].trim();
}
// The current Student's Courses...
/* Clear the List Interface object in preparation for new
Courses from this particular record. */
cousesList.clear();
// Read past the courses header line...We don't want it.
reader.nextLine();
// Get the courses data (line by line)...
while (reader.hasNextLine()) {
line = reader.nextLine().trim();
/* Again, if we encounter a comment line or a blank line
in this section then let's just skip past it. We don't
want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* At this point, if we have read in a line that starts
with 'Student Name:' then we just hit the START of a
NEW Record! This then means that the record we're
currently working on is now finished. Let's store this
file line into the 'newName' variable and then break
out of this current record read. */
if (line.startsWith("Student Name:")) {
newName = line;
break;
}
/* Well, we haven't reached the START of a New Record yet
so let's keep creating the courses list (line by line).
Break the read in course line into a String[] array.
We use the String#split() method for this with a small
Regular Expression (regex) to split each line based on
comma delimiters no matter how the delimiter spacing
might be (ex: "," " ," " , " or even " , "). */
String[] coursesData = line.split("\\s*,\\s*");
/* Add this above newly created coursesData string array
to the list. */
cousesList.add(coursesData);
}
/* Write (append) this current record to new file. The String#format()
method is used here to save the desired data into the 'StudentGPA.txt'
file in a table style format. */
// Student Data...
writer.append(String.format("%-12s: %-25s", "ID", studentData[1])).append(ls);
writer.append(String.format("%-12s: %-25s", "Name", studentData[0])).append(ls);
writer.append(String.format("%-12s: %-25s", "College", studentData[2])).append(ls);
// Student Courses...
// The Header line
writer.append(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3])).append(ls);
// Apply an Underline (underline1) under the header.
writer.append(underline1).append(ls);
// Write the Courses data in a table style format to make the Header format.
for (String[] cData : cousesList) {
writer.append(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3])).append(ls);
}
// Apply an Underline (underline2) under the Courses table.
writer.append(underline2).append(ls);
// Display In Console Window (you can delete this if you want)...
System.out.println(String.format("%-12s: %-25s", "ID", studentData[1]));
System.out.println(String.format("%-12s: %-25s", "Name", studentData[0]));
System.out.println(String.format("%-12s: %-25s", "College", studentData[2]));
System.out.println(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3]));
System.out.println(underline1);
for (String[] cData : cousesList) {
System.out.println(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3]));
}
System.out.println(underline2);
// The LAST line of each record, the Credits...
// YOU DO THE CALCULATIONS FOR: totalAttempted, semGPA, and cumGPA
String creditsAttempted = studentData[3];
String creditsEarned = studentData[4];
int credAttempted = 0;
int credEarned = 0;
int totalAttempted = 0;
double semGPA = 0.0d;
double cumGPA = 0.0d;
/* Make sure the 'credits attempted' numerical value is in fact
a string representation of an integer value. If it is then
convert that string numerical value to integer. */
if (creditsAttempted.matches("\\d+")) {
credAttempted = Integer.valueOf(creditsAttempted);
}
/* Make sure the 'credits earned' numerical value is in fact
a string representation of an integer value. If it is then
convert that string numerical value to integer. */
if (creditsEarned.matches("\\d+")) {
credEarned = Integer.valueOf(creditsEarned);
}
// Build the last record line (the Credits string) with the acquired data.
String creditsString = new StringBuilder("CREDITS: TOTAL.ATTEMPTED ")
.append(totalAttempted).append("? EARNED ").append(credEarned)
.append(" ATTEMPTED ").append(credAttempted).append(" SEM GPA ")
.append(semGPA).append("? CUM GPA ").append(cumGPA).append("?")
.toString();
// Display it to the console Window (you can delete this).
System.out.println(creditsString);
System.out.println();
// Write the built 'credit string' to file which finishes this record.
writer.append(creditsString).append(ls);
writer.append(ls); // Blank Line in preparation for next record.
writer.flush(); // Flush the data buffer - write record to disk NOW.
}
}
}
// Trap Errors...Do whatever you want with these.
catch (FileNotFoundException ex) {
System.err.println("File Not Found!\n" + ex.getMessage());
}
catch (IOException ex) {
System.err.println("IO Error Encountered!\n" + ex.getMessage());
}
Yes, it looks long but if you get rid of all the comments you can see that it really isn't. Don't be afraid to experiment with the code. Make it do what you want.
EDIT: (as per comments)
To place the student info portion of each record into an ArrayList so that you can parse it the way you want:
Where the for loop is located within the example code above for gathering the student info, just change that loop to the following code and parse the data the way you want:
// Place this ArrayList declaration underneath the 'underline2' variable declaration:
java.util.ArrayList<String> studentInfo = new java.util.ArrayList<>();
then:
if (line.startsWith("Student Name:")) {
studentInfo.clear();
studentInfo.add(line);
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentData[]
Array... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
studentInfo.add(line);
}
// .................................................
// .... The rest of the code for this `if` block ...
// .................................................
}
You will of course need to change the code after this loop to properly represent this ArrayList.
OK, so here's how you do it ...
You read in all of the file and store each line in a List<String>
For the first 8 lines you process each one in a separate way. You can even write a separate function to parse the necessary info out of every line for lines 0-7
All the remaining lines have identical structure. Therefore, you can process them all in the same way to parse out and then process the necessary data.
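The steps above can be sketched as follows. This is a minimal sketch under the stated constraints (no Stream, HashMap, or BufferedReader), reading here from an in-memory string for self-containment; for the real file you would pass `new java.io.File("Students.txt")` to the Scanner instead. The helper names and sample layout are my own assumptions, not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class StudentRecordSketch {

    // Value after the first ':' in a "Label: value" line.
    static String valueOf(String line) {
        int i = line.indexOf(':');
        return (i < 0) ? line.trim() : line.substring(i + 1).trim();
    }

    // Split a course line such as "COMP1007, Amazing Applications of AI, 2, B".
    static String[] courseParts(String line) {
        return line.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        // In-memory stand-in for the text file; swap in new Scanner(new File(...)) for real use.
        String sample = "Student Name: John\n"
                + "Student ID: 12344/19\n"
                + "College: Science\n"
                + "Credits Attempted: 15\n"
                + "Credits Earned: 15\n"
                + "Grade Points: 41.2\n"
                + "Course Code Course Title Credit Grade\n"
                + "COMP1007, Amazing Applications of AI, 2, B\n"
                + "MATH2108, Calculus (2), 3, C-\n";

        // Step 1: read every line into a List<String>.
        List<String> lines = new ArrayList<>();
        Scanner reader = new Scanner(sample);
        while (reader.hasNextLine()) {
            lines.add(reader.nextLine());
        }
        reader.close();

        // Step 2: the labelled lines each get their own parse.
        String name = valueOf(lines.get(0));
        double gradePoints = Double.parseDouble(valueOf(lines.get(5)));
        System.out.println(name + " has " + gradePoints + " grade points");

        // Step 3: the remaining lines share one structure (course lines follow the header).
        for (int i = 7; i < lines.size(); i++) {
            String[] c = courseParts(lines.get(i));
            System.out.println(c[0] + " earned grade " + c[3]);
        }
    }
}
```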
Add a comment to this answer if something is unclear and I'll clarify.
I have a pretty strange question here. After throwing and handling my ReaderException, my read-in still stops at the first occurrence of the exception. Can somebody please explain why this is happening?
Input:
Hotel Paradis;Strada Ciocarliei, Cluj-Napoca 400124;46.779862;23.611739;7;200;8;250;1;400
Hotel Sunny Hill;Strada Fagetului 31A, Cluj-Napoca 400497;46.716030;23.573740;4;150;6;190
Golden Tulip Ana Dome;Strada Observatorului 129, Cluj-Napoca 400352;46.751989;23.576580;0;330;0;350;0;600
Code:
public HotelDescriptor readLine(final String line) throws ReaderException {
System.out.println(line);
String info[] = line.split(";");
for (String i:info)
System.out.println(i);
String tempname = info[0];
String tempaddress = info[1];
float templatitudeh = Float.parseFloat(info[2]);
float templongitudeh = Float.parseFloat(info[3]);
int singleroom = Integer.parseInt(info[4]);
int singleprice = Integer.parseInt(info[5]);
int doubleroom = Integer.parseInt(info[6]);
int doubleprice = Integer.parseInt(info[7]);
int suiteroom = Integer.parseInt(info[8]);
int suiteprice = Integer.parseInt(info[9]);
Hotel tempHotel = new Hotel(tempname, tempaddress, templatitudeh, templongitudeh, singleroom, singleprice, doubleroom, doubleprice, suiteroom, suiteprice);
System.out.println(tempHotel.getName());
return tempHotel;
}
public List<HotelDescriptor> readFile(final String hotels) {
try (BufferedReader buff = new BufferedReader(new FileReader(hotels))) {
String line = "";
while ((line = buff.readLine()) != null) {
try {
hotelData.add(readLine(line));
} catch (ReaderException e){
e.printStackTrace();
} catch (ArrayIndexOutOfBoundsException ex){
ex.printStackTrace();
}
//line = buff.readLine();
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return hotelData;
}
I take it that hotelData is declared as a Class field (class global).
When reading in a text file you should take into consideration some anomalies that can happen (or not). Simple steps can be taken to ensure that reading of that text file will be relatively successful. If your application is creating the text file then your success rate rises considerably since you can control how it is written, but if your application is not creating the text file, or the text file is compiled from remote sources, then the success rate can be reduced unless steps are taken to ensure expected results.
In my opinion:
A text file should be identifiable so as to ensure that the proper
text file is actually being read to process. If the text data is from
a CSV file then a CSV Header line should be the very first line
within the file and a read of this line should be done and compared
to so as to verify that the correct file is being accessed. This is
especially true if the file is to be selectable by any number of
Users (perhaps by way of a file chooser). If a File Identifier
(or Descriptor) line does not exist within your text file as the first line of
the file then perhaps you should consider using one even if it is
considered a Comment Line where the line might start with perhaps
a semi-colon (;) as the first character of the line. Anything
that can identify the file as being the correct file to process.
Blank lines and any lines deemed to be Comment Lines should be
ignored. This includes any file lines known not to be actual Data
Lines whatever they may be. In general, a couple lines of code
consisting of an if statement along with a few conditions can
take care of this situation.
Never count on actual Data Lines (the lines of data you will be
processing) to be holding all the required data expected. This is
especially true when manipulating split delimited data with
methods such as Integer.parseInt() or Float.parseFloat() as
mere examples. This is actually the biggest problem in your particular
situation. Take note of the example data lines you have provided
within your post. The first line consists of 10 delimited pieces of
data, the second data line consists of 8 delimited pieces of
data, and the third line consists of again, 10 pieces of delimited
data. It is the second data line that is the issue here. When
this line is split the result will be an Array (info[]) which
will hold 8 elements (index 0 to 7) yet the readLine() method is
expecting to always deal with an array consisting of 10 elements
(index 0 to 9). While processing the second data line, guess what
happens when the code line int suiteroom = Integer.parseInt(info[8]); is hit. That's right, you get an
ArrayIndexOutOfBoundsException because there simply is no index 8 within the info[] array. You need to handle situations like this in your code and prepare to deal with them. Don't rely on
exception handling to take care of business for you. The whole idea
is to avoid exceptions if at all possible, though there are times when handling them is necessary. I don't believe this is one of them.
Without access to your code classes I'm just going to naturally assume that your method returns are valid and functioning as planned. With this in mind, here is how I would format the Hotels text file:
My App Name - Hotels Data File

;Hotel Name; Hotel Address; Latitude; Longitude; Single Room; Single Price; Double Room; Double Price; Suite Room; Suite Price

Hotel Paradis;Strada Ciocarliei, Cluj-Napoca 400124;46.779862;23.611739;7;200;8;250;1;400
Hotel Sunny Hill;Strada Fagetului 31A, Cluj-Napoca 400497;46.716030;23.573740;4;150;6;190
Golden Tulip Ana Dome;Strada Observatorului 129, Cluj-Napoca 400352;46.751989;23.576580;0;330;0;350;0;600
The first line of the file is the File Descriptor line. The second line is a Blank Line simply for easier viewing of the file. The third line is considered a Comment Line because in this case it starts with a semi-colon (;). It's actually up to you to decide what is to be in place to make a file line considered as a Comment line. This line simply acts as a Header Line and describes what each delimited piece of data on any Data Line means. The fourth line is of course yet another blank line and again, for easier viewing of the file. The remaining file lines are all Data Lines and these are the file lines you want to process.
To read the file your methods might look like this:
public HotelDescriptor readLine(final String line) {
// Split on various possible combinations of how the
// delimiter might be formatted within a file line.
String info[] = line.split(" ; |; |;");
// Variables declaration and default initialization values
String tempname = "";
String tempaddress = "";
float templatitudeh = 0.0f;
float templongitudeh = 0.0f;
int singleroom = 0;
int singleprice = 0;
int doubleroom = 0;
int doubleprice = 0;
int suiteroom = 0;
int suiteprice = 0;
String strg; // Used to hold the current Array Element in the for/loop
String regExF = "-?\\d+(\\.\\d+)?"; // RegEx to validate a string float or double value.
String regExI = "\\d+"; // RegEx to validate a string Integer value.
for (int i = 0; i < info.length; i++) {
strg = info[i].trim(); // remove leading/trailing spaces if any
switch (i) {
case 0:
tempname = strg;
break;
case 1:
tempaddress = strg;
break;
case 2:
// Is it a float or double numerical value?
if (strg.matches(regExF)) {
templatitudeh = Float.parseFloat(strg);
}
break;
case 3:
// Is it a float or double numerical value?
if (strg.matches(regExF)) {
templongitudeh = Float.parseFloat(strg);
}
break;
case 4:
// Is it an Integer numerical value?
if (strg.matches(regExI)) {
singleroom = Integer.parseInt(strg);
}
break;
case 5:
// Is it an Integer numerical value?
if (strg.matches(regExI)) {
singleprice = Integer.parseInt(strg);
}
break;
case 6:
// Is it an Integer numerical value?
if (strg.matches(regExI)) {
doubleroom = Integer.parseInt(strg);
}
break;
case 7:
// Is it an Integer numerical value?
if (strg.matches(regExI)) {
doubleprice = Integer.parseInt(strg);
}
break;
case 8:
// Is it an Integer numerical value?
if (strg.matches(regExI)) {
suiteroom = Integer.parseInt(strg);
}
break;
case 9:
// Is it an Integer numerical value?
if (strg.matches(regExI)) {
suiteprice = Integer.parseInt(strg);
}
break;
}
}
}
Hotel tempHotel = new Hotel(tempname, tempaddress, templatitudeh, templongitudeh,
singleroom, singleprice, doubleroom, doubleprice, suiteroom, suiteprice);
System.out.println(tempHotel.getName());
return tempHotel;
}
public List<HotelDescriptor> readFile(final String hotels) {
try (BufferedReader buff = new BufferedReader(new FileReader(hotels))) {
String line;
int lineCounter = 0;
while ((line = buff.readLine()) != null) {
// Trim any leading or trailing spaces (spaces, tabs, etc)
line = line.trim();
lineCounter++;
// Is this the right file to read?
if (lineCounter == 1) {
if (!line.equalsIgnoreCase("My App Name - Hotels Data File")) {
//No it isn't...
JOptionPane.showMessageDialog(this, "Invalid Hotels Data File!",
"Invalid Data File", JOptionPane.WARNING_MESSAGE);
break; // Get out of while loop
}
// Otherwise skip the File Descriptor line.
else { continue; }
}
// Is this a blank or Comment line...
// Lines that start with ; are comment lines
if (line.equals("") || line.startsWith(";")) {
// Yes it is...skip this line.
continue;
}
// Process the data line...
hotelData.add(readLine(line));
}
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
return hotelData;
}
In the readLine() method, variables are initialized to hold default values should not all values be present on any given file Data Line. The switch block ensures that only the supplied data line values are processed regardless of how the data is provided; defaults fill in the rest. This eliminates the possibility of an ArrayIndexOutOfBoundsException when working with the info[] Array.
Where parseFloat() and parseInt() are used, the string to be converted into its respective data type is first checked to ensure that it is a valid numerical representation of the data type we are converting to. The String.matches() method is used for this in conjunction with a regular expression.
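The validate-then-parse pattern can also be factored into small reusable helpers. A sketch; the helper names here are mine, not from the code above:

```java
public class SafeParse {

    // Parse an int only when the string is a plain unsigned integer; otherwise return the fallback.
    static int parseIntOr(String s, int fallback) {
        return (s != null && s.trim().matches("\\d+"))
                ? Integer.parseInt(s.trim()) : fallback;
    }

    // Parse a float only when the string is a valid decimal number; otherwise return the fallback.
    static float parseFloatOr(String s, float fallback) {
        return (s != null && s.trim().matches("-?\\d+(\\.\\d+)?"))
                ? Float.parseFloat(s.trim()) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(parseIntOr("150", 0));            // valid integer
        System.out.println(parseIntOr("", 0));               // missing field falls back to 0
        System.out.println(parseFloatOr("46.716030", 0.0f)); // valid float
        System.out.println(parseFloatOr("N/A", 0.0f));       // bad data falls back to 0.0
    }
}
```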
The code above can of course be optimized much further but I feel it provides a good description of what can be done to increase the success of reading and processing the file(s).
As a side note, it is understandably confusing to call one of your own methods (readLine()) by the same name as a method used by BufferedReader. Up to you, but perhaps this would be better named something like processReadLine().
Prices should probably be stored in at least a float or double data type.
I have a flat comma-separated file that has "\N" for some rows. I need to load all rows but skip those that contain \N.
I am trying to do the following but it doesn't work.
if (!line.contains("\\N")) {
//do load here
}
The above code still passes the line from the csv below:
1,text,abc,\N,23,56
and then we get a NumberFormatException (it should be an int value there).
Why is this happening?
If you want to only process file lines that contain a \N then omit the ! (exclamation) character from your if statement condition. This is the NOT flag.
What your condition is basically saying right now is:
If the current text file line contained within the string variable
line does NOT contain a \N then do execute the code within the if statement block.
IF statements are only executed if the supplied condition is boolean true. Applying the ! operator inverts the condition: the overall condition becomes true when the supplied condition is NOT true.
If you want just those lines that DO contain a \N then your code should look like:
if (line.contains("\\N")) {
//process line here
}
If you DO NOT want to process those file lines that contain \N then what you are using right now should work just fine.
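To make the two conditions concrete, a small sketch with made-up sample lines (remember that "\\N" in Java source is the two-character text \N):

```java
public class ContainsDemo {
    public static void main(String[] args) {
        String withMarker = "1,text,abc,\\N,23,56"; // contains the \N marker
        String without    = "2,text,def,7,23,56";   // does not

        // Condition WITHOUT '!': true only for lines that DO contain \N.
        System.out.println(withMarker.contains("\\N")); // true
        System.out.println(without.contains("\\N"));    // false

        // Condition WITH '!': true only for lines that do NOT contain \N,
        // so lines carrying \N get skipped by the if block.
        System.out.println(!withMarker.contains("\\N")); // false
        System.out.println(!without.contains("\\N"));    // true
    }
}
```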
Regarding your question:
and then we get a NumberFormatException (it should be an int value there).
Why is this happening?
\n (lowercase n) is generally an escape sequence applied within a string to force a new line when processed; an uppercase N does not do this. In general, a lot of CSV files use \N to mean NULL while others simply place nothing, just the delimiter. You will need to look into what is creating the CSV file to find the actual reason, since they may be using it for something else, but for now you can consider it as NULL. Primitive int variables are never null; they default to 0, so you could change your code to:
if (line.contains("\\N")) { line = line.replace("\\N", "0"); }
You could however also encounter \N where there should be a String so the above line will do you no good. One solution would be to handle \N within the contents of each array element (should it be there) after you have split the file line, for example:
String csvFilePath = "MyCSVfile.txt"; // assuming file is in classpath
try (BufferedReader br = new BufferedReader(new FileReader(csvFilePath))) {
int j = 0; //used as a counter
String line = "";
while ((line = br.readLine()) != null) {
j++; //increment counter
String[] data = line.split(",");
//if the ID value contains null then apply the counter (j) value.
int id = Integer.parseInt(data[0].replace("\\N",String.valueOf(j)));
String type = data[1].replace("\\N","");
String text = data[2].replace("\\N","");
int value1 = Integer.parseInt(data[3].replace("\\N","0"));
int value2 = Integer.parseInt(data[4].replace("\\N","0"));
int value3 = Integer.parseInt(data[5].replace("\\N","0"));
System.out.println("ID is:\t\t" + id + "\nData Type is:\t" + type +
"\nText is:\t" + text + "\nValue 1 is:\t" + value1 +
"\nValue 2 is:\t" + value2 + "\nValue 3 is:\t" +
value3 + "\n");
}
}
catch (IOException ex) {
//however you want to handle exception
}
This will handle the \N tag regardless of where it is encountered within any one of your CSV file lines.
This question already has answers here:
Java Replace Line In Text File
(8 answers)
Closed 6 years ago.
I have a csv file that's formatted like this: id,text. Here is an example:
helloText,How are you
goodbyeMessage,Some new text for a change
errorMessage,Oops something went wrong
Now let's say for example I want to edit the text part of goodbyeMessage, which is Some new text for a change, to See you later
The resulting csv should then look like this:
helloText,How are you
goodbyeMessage,See you later
errorMessage,Oops something went wrong
I have code that can write to the file but when the code finishes executing, this is the resulting csv file:
helloText,How are you
goodbyeMessage,Some new text for a change
errorMessage,Oops something went wronggoodbyeMessage,See you later
I know this is occurring because I set the FileWriter's append value to true. If I don't, everything gets wiped.
I have tried using FileWriter.newLine() to make it look better but that is not what I am trying to achieve. I still want the same number of line in the file.
MyApp.java
public static void main(String[] args) throws FileNotFoundException, IOException {
PropsWriter pw = new PropsWriter("test_props.txt");
pw.updateElementText("goodbyeMessage", "See you later");
}
PropsWriter.java
/**
 * Updates the text of a given element in the properties file.
 *
 * @param id The id of the element
 * @param newText The text that will replace the original text.
 *
 * @throws IOException If an I/O error occurs
 */
public void updateElementText(String id, String newText) throws IOException {
Assertions.checkNotNull(id, "Id must not be null.");
Assertions.checkNotNull(id, "Id must not be an empty string.");
File file = new File(pathName);
BufferedReader br = new BufferedReader(new FileReader(file));
BufferedWriter wr = new BufferedWriter(new FileWriter(file, true));
try {
String line;
while((line = br.readLine()) != null) {
if(line.contains(id)) {
//returns true
System.out.println("Is line there: " + line.contains(id));
//returns helloText
System.out.println("ID: " + extractId(line));
//returns How are you
System.out.println("TEXT: " + extractText(line));
//returns Some new text for a change
System.out.println("NEW_TEXT: " + newText);
// This is where I am trying to replace the old text
// with new text that came from the main method.
line = line.replaceAll(extractText(line), newText);
//wr.newLine();
wr.write(line);
}
}
} catch(IOException e) {
e.printStackTrace();
} finally {
wr.close();
}
}
/**
 * Gets the id part of a line that is stored in the
 * properties file.
 *
 * @param line The line the id is extracted from.
 * @return String representation of the id.
 */
private static String extractId(String line) {
final int commaOccurence = getFirstCommaOccurrence(line);
return line.substring(0, commaOccurence);
}
/**
 * Gets the text part of a line that is stored in the
 * properties file.
 *
 * @param line The line the text is extracted from.
 * @return String representation of the text.
 */
private static String extractText(String line) {
final int commaOccurence = getFirstCommaOccurrence(line);
return line.substring(commaOccurence + 1, line.length());
}
/**
 * Gets the first occurrence of a comma in any given line of a text file.
 * @param line The line to search.
 * @return Index of the first comma, or -1 if there is none.
 */
private static int getFirstCommaOccurrence(String line) {
return line.indexOf(",");
}
You just said it: do not set the FileWriter's append flag to true (it then only appends new content). You need to read the whole file and save all lines (for example in a List<String>), manipulate the data, and then rewrite the whole file from that list.
In your code above, this means first replacing true with false:
BufferedWriter wr = new BufferedWriter(new FileWriter(file, false));
Second, always write the current line back to the file; otherwise the lines that don't match get lost:
if (line.contains(id)) {
    ...
    line = line.replaceAll(extractText(line), newText);
}
wr.write(line);
wr.newLine(); // keep one record per line
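The whole read, modify, rewrite cycle can also be sketched with java.nio.file.Files instead of hand-rolled readers and writers. This is a minimal, self-contained sketch; the class name, file handling, and matching rule (id followed by a comma at the start of the line) are my own illustration, not the asker's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class PropsUpdater {

    // Replaces the text part of the line whose id matches, then rewrites
    // the whole file. Files.write truncates the file by default, so no
    // append flag is involved and no lines are duplicated.
    public static void updateElementText(Path file, String id, String newText)
            throws IOException {
        List<String> lines = Files.readAllLines(file);
        List<String> updated = lines.stream()
                .map(line -> line.startsWith(id + ",") ? id + "," + newText : line)
                .collect(Collectors.toList());
        Files.write(file, updated);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("props", ".txt");
        Files.write(file, List.of(
                "helloText,How are you",
                "goodbyeMessage,Some new text for a change",
                "errorMessage,Oops something went wrong"));
        updateElementText(file, "goodbyeMessage", "See you later");
        Files.readAllLines(file).forEach(System.out::println);
        // helloText,How are you
        // goodbyeMessage,See you later
        // errorMessage,Oops something went wrong
    }
}
```

Matching on the id prefix rather than line.contains(id) also avoids false hits when one id is a substring of another.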
However, if you have a very big file and don't want to rewrite all lines, you can go for a lower-level tool like RandomAccessFile. It lets you start manipulating the file at a given position without rewriting everything before that position: you seek to the line (or jump straight there, if you know the offset) and make your change. But you will still need to rewrite everything that comes after the change unless the replacement has exactly the same byte length. Here's a usage example: RandomAccessFile example
There is no better solution, as this is platform dependent. More specifically, it depends on the file system in use, for example NTFS or FAT32.
In general, a file system stores a file by splitting it into blocks. The blocks are scattered across the drive, placed wherever they fit best. The file system keeps a pointer to each file's first block in a master table, and every block then points to the next one, and so on until EOF is reached.
So it is easy to change something in the middle without touching everything that comes before it. But you will probably need to rewrite the data that comes after, because you cannot control how the OS splits the new data into blocks. If you went very low-level you might manipulate a single block and inject data without changing everything after it, but I don't think you want to do that :)
Unfortunately, there isn't really a way to do this (that I know of) besides loading all the file data into some mutable object, and then writing it all back to the file.
The Files class provides a convenient method, Files.readAllLines(Path), for reading an entire file into a List<String>.
As @Zabuza mentioned in the comments, Java provides a class called RandomAccessFile, which can navigate to a specific place in a file and overwrite bytes.
Methods that may be of interest to you would be:
getFilePointer() - Returns the current file offset
seek(long pos) - sets the file offset to pos, measured from the beginning of the file
write(byte[] b) - writes b to current file position
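Putting those three methods together, here is a minimal sketch of an in-place overwrite. The file contents and the byte offset are made up for illustration, and note that this only works cleanly when the replacement has exactly the same byte length as the text it overwrites; otherwise the tail of the file must be rewritten:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class RafOverwrite {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("raf", ".txt");
        Files.write(file, List.of("helloText,How are you"));

        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long pos = 10;               // byte offset of "How" (right after "helloText,")
            raf.seek(pos);               // move the file pointer to that offset
            raf.write("Who".getBytes()); // overwrite exactly 3 bytes in place
            System.out.println("pointer now at: " + raf.getFilePointer()); // 13
        }
        System.out.println(Files.readAllLines(file).get(0));
        // helloText,Who are you
    }
}
```

Because "Who" and "How" are both three bytes, nothing after the edit needs to move; the rest of the line is untouched.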
Hope this helps