I have a somewhat strange question here. After throwing and handling my ReaderException, my read-in still stops at the first occurrence of the exception. Can somebody please explain why this is happening?
Input:
Hotel Paradis;Strada Ciocarliei, Cluj-Napoca 400124;46.779862;23.611739;7;200;8;250;1;400
Hotel Sunny Hill;Strada Fagetului 31A, Cluj-Napoca 400497;46.716030;23.573740;4;150;6;190
Golden Tulip Ana Dome;Strada Observatorului 129, Cluj-Napoca 400352;46.751989;23.576580;0;330;0;350;0;600
Code:
public HotelDescriptor readLine(final String line) throws ReaderException {
    System.out.println(line);
    String info[] = line.split(";");
    for (String i : info)
        System.out.println(i);
    String tempname = info[0];
    String tempaddress = info[1];
    float templatitudeh = Float.parseFloat(info[2]);
    float templongitudeh = Float.parseFloat(info[3]);
    int singleroom = Integer.parseInt(info[4]);
    int singleprice = Integer.parseInt(info[5]);
    int doubleroom = Integer.parseInt(info[6]);
    int doubleprice = Integer.parseInt(info[7]);
    int suiteroom = Integer.parseInt(info[8]);
    int suiteprice = Integer.parseInt(info[9]);
    Hotel tempHotel = new Hotel(tempname, tempaddress, templatitudeh, templongitudeh, singleroom, singleprice, doubleroom, doubleprice, suiteroom, suiteprice);
    System.out.println(tempHotel.getName());
    return tempHotel;
}
public List<HotelDescriptor> readFile(final String hotels) {
    try (BufferedReader buff = new BufferedReader(new FileReader(hotels))) {
        String line = "";
        while ((line = buff.readLine()) != null) {
            try {
                hotelData.add(readLine(line));
            } catch (ReaderException e) {
                e.printStackTrace();
            } catch (ArrayIndexOutOfBoundsException ex) {
                ex.printStackTrace();
            }
            //line = buff.readLine();
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return hotelData;
}
I take it that hotelData is declared as a Class field (class global).
When reading a text file you should take into consideration some anomalies that can happen (or not). Simple steps can be taken to ensure that reading of that text file will be relatively successful. If your application is creating the text file then your success rate rises considerably, since you can control how it is written. If, however, your application is not creating the text file, or the text file is compiled from remote sources, then the success rate can drop unless steps are taken to ensure expected results.
In my opinion:
A text file should be identifiable so as to ensure that the proper text file is actually being read and processed. If the text data is from a CSV file then a CSV Header line should be the very first line within the file, and this line should be read and compared against so as to verify that the correct file is being accessed. This is especially true if the file is to be selectable by any number of Users (perhaps by way of a file chooser). If a File Identifier (or Descriptor) line does not exist as the first line of your text file then perhaps you should consider using one, even if it is only a Comment Line that starts with, say, a semicolon (;) as the first character. Anything that can identify the file as being the correct file to process.
Blank lines and any lines deemed to be Comment Lines should be ignored, as should any other file lines known not to be actual Data Lines. In general, a couple of lines of code (an if statement with a few conditions) can take care of this situation, as sketched below.
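A minimal sketch of that check (assuming line holds the current file line, already trimmed) might be:
if (line.isEmpty() || line.startsWith(";")) {
    continue; // skip blank lines and comment lines
}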
Never count on actual Data Lines (the lines of data you will be processing) to hold all the required data expected. This is especially true when feeding split delimited data into methods such as Integer.parseInt() or Float.parseFloat(), as mere examples. This is actually the biggest problem in your particular situation. Take note of the example data lines you have provided within your post. The first line consists of 10 delimited pieces of data, the second data line consists of 8 delimited pieces of data, and the third line again consists of 10 pieces of delimited data. It is the second data line that is the issue here. When this line is split, the result will be an array (info[]) holding 8 elements (index 0 to 7), yet the readLine() method expects to always deal with an array consisting of 10 elements (index 0 to 9). While processing the second data line, guess what happens when the code line int suiteroom = Integer.parseInt(info[8]); is hit. That's right, you get an ArrayIndexOutOfBoundsException because there simply is no index 8 within the info[] array. You need to anticipate situations like this in your code and prepare to deal with them. Don't rely on exception handling to take care of business for you. The whole idea is to avoid exceptions if at all possible; mind you, there are times when catching them is necessary, but I don't believe this is one of them.
Without access to your code classes I'm just going to naturally assume that your method returns are valid and functioning as planned. With this in mind, here is how I would format the Hotels text file:
My App Name - Hotels Data File
;Hotel Name; Hotel Address; Latitude; Longitude; Single Room; Single Price; Double Room; Double Price; Suite Room; Suite Price
Hotel Paradis;Strada Ciocarliei, Cluj-Napoca 400124;46.779862;23.611739;7;200;8;250;1;400
Hotel Sunny Hill;Strada Fagetului 31A, Cluj-Napoca 400497;46.716030;23.573740;4;150;6;190
Golden Tulip Ana Dome;Strada Observatorului 129, Cluj-Napoca 400352;46.751989;23.576580;0;330;0;350;0;600
The first line of the file is the File Descriptor line. The second line is a Blank Line simply for easier viewing of the file. The third line is considered a Comment Line because in this case it starts with a semi-colon (;). It's actually up to you to decide what is to be in place to make a file line considered as a Comment line. This line simply acts as a Header Line and describes what each delimited piece of data on any Data Line means. The fourth line is of course yet another blank line and again, for easier viewing of the file. The remaining file lines are all Data Lines and these are the file lines you want to process.
To read the file your methods might look like this:
public HotelDescriptor readLine(final String line) {
// Split on various possible combinations of how the
// delimiter might be formatted within a file line.
String info[] = line.split(" ; |; |;");
// Variables declaration and default initialization values
String tempname = "";
String tempaddress = "";
float templatitudeh = 0.0f;
float templongitudeh = 0.0f;
int singleroom = 0;
int singleprice = 0;
int doubleroom = 0;
int doubleprice = 0;
int suiteroom = 0;
int suiteprice = 0;
String strg; // Used to hold the current Array Element in the for/loop
String regExF = "-?\\d+(\\.\\d+)?"; // RegEx to validate a string float or double value.
String regExI = "\\d+"; // RegEx to validate a string Integer value.
for (int i = 0; i < info.length; i++) {
strg = info[i].trim(); // remove leading/trailing spaces if any
switch (i) {
case 0:
tempname = info[i];
break;
case 1:
tempaddress = info[i];
break;
case 2:
// Is it a float or double numerical value
if (strg.matches(regExF)) {
templatitudeh = Float.parseFloat(info[i]);
}
break;
case 3:
// Is it a float or double numerical value
if (strg.matches(regExF)) {
templongitudeh = Float.parseFloat(info[i]);
}
break;
case 4:
// Is it an Integer numerical value
if (strg.matches(regExI)) {
singleroom = Integer.parseInt(info[i]);
}
break;
case 5:
// Is it an Integer numerical value
if (strg.matches(regExI)) {
singleprice = Integer.parseInt(info[i]);
}
break;
case 6:
// Is it an Integer numerical value
if (strg.matches(regExI)) {
doubleroom = Integer.parseInt(info[i]);
}
break;
case 7:
// Is it an Integer numerical value
if (strg.matches(regExI)) {
doubleprice = Integer.parseInt(info[i]);
}
break;
case 8:
// Is it an Integer numerical value
if (strg.matches(regExI)) {
suiteroom = Integer.parseInt(info[i]);
}
break;
case 9:
// Is it an Integer numerical value
if (strg.matches(regExI)) {
suiteprice = Integer.parseInt(info[i]);
}
break;
}
}
Hotel tempHotel = new Hotel(tempname, tempaddress, templatitudeh, templongitudeh,
singleroom, singleprice, doubleroom, doubleprice, suiteroom, suiteprice);
System.out.println(tempHotel.getName());
return tempHotel;
}
public List<HotelDescriptor> readFile(final String hotels) {
try (BufferedReader buff = new BufferedReader(new FileReader(hotels))) {
String line;
int lineCounter = 0;
while ((line = buff.readLine()) != null) {
// Trim any leading or trailing spaces (spaces, tabs, etc)
line = line.trim();
lineCounter++;
// Is this the right file to read?
if (lineCounter == 1) {
if (!line.equalsIgnoreCase("My App Name - Hotels Data File")) {
//No it isn't...
JOptionPane.showMessageDialog(this, "Invalid Hotels Data File!",
"Invalid Data File", JOptionPane.WARNING_MESSAGE);
break; // Get out of while loop
}
// Otherwise skip the File Descriptor line.
else { continue; }
}
// Is this a blank or Comment line...
// Lines that start with ; are comment lines
if (line.equals("") || line.startsWith(";")) {
// Yes it is...skip this line.
continue;
}
// Process the data line...
hotelData.add(readLine(line));
}
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
return hotelData;
}
In the readLine() method, variables are initialized with default values in case not all values are present on any given file Data Line. The switch block ensures that only the values actually supplied on a data line are processed, regardless of how many are provided; defaults fill in the rest. This eliminates the possibility of an ArrayIndexOutOfBoundsException when working with the info[] array.
Where parseFloat() and parseInt() are used, the string to be converted into its respective data type is first checked to ensure that it is a valid numerical representation of that data type. The String.matches() method is used for this in conjunction with a regular expression.
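For instance, with the float/double pattern used above:
"23.611739".matches("-?\\d+(\\.\\d+)?");   // true  -> safe to parse
"abc".matches("-?\\d+(\\.\\d+)?");         // false -> skip it and keep the default value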
The code above can of course be optimized much further, but I feel it provides a good description of what can be done to increase the success of reading and processing the file(s).
As a side note, it is also understandably confusing to give one of your own methods (readLine()) the same name as a method used by BufferedReader. It's up to you, but perhaps it would be better named something like processReadLine().
Prices should probably be stored in a float or double data type as well.
Related
I'm working on an assignment where I'm supposed to read from a text file like this...
Student Name: John
Student ID: 12344/19
College: Science
Credits Attempted: 15
Credits Earned: 15
Grade Points: 41.2
Course Code Course Title Credit Grade
COMP1007, Amazing Applications of AI, 2, B
COMP2202, Fund. of Object Oriented Prog., 3, C-
MATH2108, Calculus (2), 3, C-
MATH3340, Discrete Math. for Comp. Sci., 3, B-
STAT2101, Introduction to Statistics, 4, C+
I should read this text file and calculate the GPA of the student and create an output file that should look like this...
Output text file
So basically I'm stuck and I have no idea what to do...
I know how to read line by line and split a line into different parts, but this doesn't seem to work here since every line is different from the other. For example the first line has two parts, the "Student Name" and the name itself in this case "John". But in line 9, there are four different parts, the course code, course name, credit and grade.
I'm honestly not looking to cheat on the assignment, only to understand it.
Help :)
Note: I can't use Stream, HashMap, or BufferedReader.
Each data record in a text file always has a Start and an End. The easiest records are obviously those that are contained on a single delimited line within the text file, where each file line is in fact a record, as you can see within a typical CSV format data file. The harder records to read are Multi-Line records, where each data record consists of several sequential text file lines, but still, there is a Start and an End to each record.
The Start of a record is usually pretty easy to distinguish. For example, in the file example you provided in your post it is obviously any file line that starts with Student Name:.
The End of a record may not always be so easy to determine since many applications do not save fields which contain no data value, in order to help increase access speed and reduce file bloat. The thought is "why have a text file full of empty fields" and, to be honest, rightly so. I'm not a big fan of text file records anyway, since utilizing a database would make far better sense for larger amounts of data. In any case, there will always be a file line that indicates the Start of a record, so it makes sense to read from the Start of one record to the Start of the next record or, in the case of the last record in the file, from Start to End Of File (EOF).
Here is an example (read the comments in code):
// System line separator to use in files.
String ls = System.lineSeparator();
/* Array will hold student data: Student Name, Student ID, College,
Credits Attempted, Credits Earned, and finally Grade Points. */
String[] studentData = new String[6];
// String Array to hold Course Table Header Names.
String[] coursesHeader = {"COURSE NO", "COURSE TITLE", "CREDITS", "GRADE"};
// List Interface to hold all the course Data line Arrays for each record
java.util.List<String[]> cousesList = new java.util.ArrayList<>();
// Underlines to be used for Console display and file records
// Under courses Header
String underline1 = "-------------------------------------------------------------";
// Under all the courses
String underline2 = "------------------------------------------------------------------------------------";
/* Read and Write to files using 'Try With Resources' so to
automatically close the reader and writer objects. */
try (Scanner reader = new Scanner(new java.io.File("StudentData.txt"), "UTF-8");
java.io.Writer writer = new java.io.FileWriter("StudentsGPA.txt")) {
// For console display only! [Anything (except caught errors) to Console can be deleted]
System.out.println("The 'StudentsGPA.txt' file will contain:");
System.out.println("======================================");
System.out.println();
// Will hold each line read from the reader
String line = "";
/* Will hold the name for the next record. This would be the record START
but only AFTER the first record has been read. */
String newName = "";
// Start reading the 'StudentData.txt' file (line by line)...
while (reader.hasNextLine()) {
/* If newName is empty then we're on our first record or
there is only one record in file. */
if (newName.isEmpty()) {
line = reader.nextLine(); // read in a file line...
}
else {
/* newName contains a name so we must have bumped into
the START of a new record during processing of the
previous record. We already have the first line
of this new record (which is the student's name line)
currently held in the 'newName' variable so we just
make 'line' equal what is in the 'newName' variable
and carry on processing the data as normal. In essence,
we simply skipped a read because we've already read it
earlier when processing the previous record. */
line = newName;
// Clear this variable in preparation for another record START.
newName = "";
}
/* Skip comment lines (lines that start with a semicolon (;)
or a hash mark (#). Also skip any blank lines. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* Does this file line start with 'Student Name:'? If so then
this is a record START, let's process this record. If not
then just keep reading the file. */
if (line.startsWith("Student Name:")) {
/* Let's put the student name into the studentData array at
index 0. If it is detected that there has been no name
applied for some reason then we place "N/A" as the name.
We use a Ternary Operator for this. So, "N/A" will be a
Default if there is no name. This will be typical for
the other portions of student data. */
studentData[0] = line.isEmpty() ? "N/A" : line.split("\\s*:\\s*")[1].trim();
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentData[]
Array... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
/* Store the other portions of student data into the
studentData Array using "N/A" as a default should
any student data field contain nothing. */
studentData[i] = line.isEmpty() ? "N/A" : line.split("\\s*:\\s*")[1].trim();
}
// The current Student's Courses...
/* Clear the List Interface object in preparation for new
Courses from this particular record. */
cousesList.clear();
// Read past the courses header line...We don't want it.
reader.nextLine();
// Get the courses data (line by line)...
while (reader.hasNextLine()) {
line = reader.nextLine().trim();
/* Again, if we encounter a comment line or a blank line
in this section then let's just skip past it. We don't
want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* At this point, if we have read in a line that starts
with 'Student Name:' then we just hit the START of a
NEW Record! This then means that the record we're
currently working on is now finished. Let's store this
file line into the 'newName' variable and then break
out of this current record read. */
if (line.startsWith("Student Name:")) {
newName = line;
break;
}
/* Well, we haven't reached the START of a New Record yet
so let's keep creating the courses list (line by line).
Break the read in course line into a String[] array.
We use the String#split() method for this with a small
Regular Expression (regex) to split each line based on
comma delimiters no matter how the delimiter spacing
might be (ex: "," " ," " , " or even " , "). */
String[] coursesData = line.split("\\s*,\\s*");
/* Add this above newly created coursesData string array
to the list. */
cousesList.add(coursesData);
}
/* Write (append) this current record to new file. The String#format()
method is used here to save the desired data into the 'StudentGPA.txt'
file in a table style format. */
// Student Data...
writer.append(String.format("%-12s: %-25s", "ID", studentData[1])).append(ls);
writer.append(String.format("%-12s: %-25s", "Name", studentData[0])).append(ls);
writer.append(String.format("%-12s: %-25s", "College", studentData[2])).append(ls);
// Student Courses...
// The Header line
writer.append(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3])).append(ls);
// Apply an Underline (underline1) under the header.
writer.append(underline1).append(ls);
// Write the Courses data in a table style format to match the Header format.
for (String[] cData : cousesList) {
writer.append(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3])).append(ls);
}
// Apply an Underline (underline2) under the Courses table.
writer.append(underline2).append(ls);
// Display In Console Window (you can delete this if you want)...
System.out.println(String.format("%-12s: %-25s", "ID", studentData[1]));
System.out.println(String.format("%-12s: %-25s", "Name", studentData[0]));
System.out.println(String.format("%-12s: %-25s", "College", studentData[2]));
System.out.println(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3]));
System.out.println(underline1);
for (String[] cData : cousesList) {
System.out.println(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3]));
}
System.out.println(underline2);
// The LAST line of each record, the Credits...
// YOU DO THE CALCULATIONS FOR: totalAttemped, semGPA, and cumGPA
String creditsAttempted = studentData[3];
String creditsEarned = studentData[4];
int credAttempted = 0;
int credEarned = 0;
int totalAttempted = 0;
double semGPA = 0.0d;
double cumGPA = 0.0d;
/* Make sure the 'credits attempted' numerical value is in fact
a string representation of an integer value. If it is then
convert that string numerical value to integer. */
if (creditsAttempted.matches("\\d+")) {
credAttempted = Integer.valueOf(creditsAttempted);
}
/* Make sure the 'credits earned' numerical value is in fact
a string representation of an integer value. If it is then
convert that string numerical value to integer. */
if (creditsEarned.matches("\\d+")) {
credEarned = Integer.valueOf(creditsEarned);
}
// Build the last record line (the Credits string) with the acquired data.
String creditsString = new StringBuilder("CREDITS: TOTAL.ATTEMPTED ")
.append(totalAttempted).append("? EARNED ").append(credEarned)
.append(" ATTEMPTED ").append(credAttempted).append(" SEM GPA ")
.append(semGPA).append("? CUM GPA ").append(cumGPA).append("?")
.toString();
// Display it to the console Window (you can delete this).
System.out.println(creditsString);
System.out.println();
// Write the built 'credit string' to file which finishes this record.
writer.append(creditsString).append(ls);
writer.append(ls); // Blank Line in preparation for next record.
writer.flush(); // Flush the data buffer - write record to disk NOW.
}
}
}
// Trap Errors...Do whatever you want with these.
catch (FileNotFoundException ex) {
System.err.println("File Not Found!\n" + ex.getMessage());
}
catch (IOException ex) {
System.err.println("IO Error Encountered!\n" + ex.getMessage());
}
Yes, it looks long but if you get rid of all the comments you can see that it really isn't. Don't be afraid to experiment with the code. Make it do what you want.
EDIT: (as per comments)
To place the student info portion of each record into an ArrayList so that you can parse it the way you want:
Where the for loop is located within the example code above for gathering the student info, just change that loop to the code below and parse the data the way you want:
// Place this ArrayList declaration underneath the 'underline2' variable declaration:
java.util.ArrayList<String> studentInfo = new java.util.ArrayList<>();
then:
if (line.startsWith("Student Name:")) {
studentInfo.clear();
studentInfo.add(line);
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentData[]
Array... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
studentInfo.add(line);
}
// .................................................
// .... The rest of the code for this `if` block ...
// .................................................
}
You will of course need to change the code after this loop to properly represent this ArrayList.
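As a rough sketch of what that could look like (assuming each entry in studentInfo still has the "Label: value" form used in the file):
for (String entry : studentInfo) {
    String[] parts = entry.split("\\s*:\\s*", 2);   // "Student Name: John" -> ["Student Name", "John"]
    String label = parts[0].trim();
    String value = (parts.length > 1) ? parts[1].trim() : "N/A";
    // ...use label/value however suits your parsing (e.g. fill studentData based on the label)
}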
OK, so here's how you do it ...
You read in all of the file and store each line in a List<String>
For the first 8 lines you process each one in a separate way. You can even write a separate function to parse the necessary info out of every line for lines 0-7
All the remaining lines have identical structure. Therefore, you can process them all in the same way to parse out and then process the necessary data.
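Roughly, that could look like this (just a sketch using Scanner, since BufferedReader is off limits, and assuming the input file is named StudentData.txt):
List<String> lines = new ArrayList<>();
try (Scanner in = new Scanner(new File("StudentData.txt"))) {
    while (in.hasNextLine()) {
        lines.add(in.nextLine());
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
// Lines 0-7 each get their own handling, e.g. the name on the first line:
String studentName = lines.get(0).split(":", 2)[1].trim();
// The remaining lines all share the same comma-separated course layout:
for (int i = 8; i < lines.size(); i++) {
    String[] course = lines.get(i).split("\\s*,\\s*"); // code, title, credits, grade
    // ...accumulate credits and grade points here for the GPA calculation
}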
Add a comment to this answer if something is unclear and I'll clarify.
Values are separated with commas, in the following format:
Country,Timescale,Vendor,Units
Africa,2010 Q3,Fujitsu Siemens,2924.742632
I want to make an array for every value. How can I do it?
I tried many things, code below:
BufferedReader br = null;
String line = "";
String cvsSplitBy = ",";
try {
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
String[] country = line.split(cvsSplitBy);
country[0] +=",";
String[] Krajina = country[0].split(",");
What you appear to be talking about is utilizing what is otherwise known as Parallel Arrays, which is generally a bad idea in this particular use case since it can be prone to out-of-bounds exceptions later on down the road. A better solution would be to utilize a Two Dimensional (2D) Array or an ArrayList. Nevertheless, parallel arrays it is:
You say an array size of 30; well, maybe today, but tomorrow it might be 25 or 40, so in order to size your Arrays to hold the file data you will need to know how many lines of actual raw data are contained within the CSV file (excluding the Header, possible comments, and possible blank lines). The easiest way would be to just dump everything into separate ArrayLists and then convert them to their respective arrays later on, be it String, int, long, double, whatever.
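As a sketch of that list-to-array conversion (the names here are just placeholders):
List<String> countryList = new ArrayList<>();
List<Double> unitsList = new ArrayList<>();
// ...fill the lists while reading the file, then convert them:
String[] country = countryList.toArray(new String[0]);
double[] units = new double[unitsList.size()];
for (int i = 0; i < units.length; i++) {
    units[i] = unitsList.get(i); // auto-unboxing Double -> double
}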
Counting file lines first so as to initialize Arrays:
One line of code can give you the number of lines contained within a supplied text file:
long count = Files.lines(Paths.get("C:\\MyDataFiles\\DataFile.csv")).count();
In reality, however, on its own the above code line needs to be enclosed within a try/catch block in case of an IOException, so there is a wee bit more code than a single line. For a simple use case where the CSV file contains a Header Line and no Comment or Blank lines this could be all you need, since all you would need to do is subtract one from the overall count to eliminate the Header Line before initializing your Arrays. Another minor issue with the above one-liner is the fact that it provides a count value as a Long Integer (long) data type. This is no good since Java arrays will only accept Integer (int) values for initialization, therefore the value obtained will need to be cast to int, for example:
String[] countries = new String[(int) count];
and this is only good if count does not exceed Integer.MAX_VALUE - 2 (2147483645). That's a lot of array elements, so in general you wouldn't really have a problem with this, but if you are dealing with extremely large array initializations then you will also need to consider JVM memory and running out of it.
Sometimes it's just nice to have a method that can be used for a multitude of different situations when getting the total number of raw data lines from a CSV (or other) text file. The method provided below is obviously more than a single line of code, but it does provide a little more flexibility towards what to count in a file. As mentioned earlier, there is the possibility of a Header Line. A Header line is very common in CSV files and it is usually the first line within the file, but this may not always be the case. The Header line could be preceded by a Comment Line or even a Blank Line. The Header line, however, should always be the line directly before the raw data lines. Here is an example of a possible CSV file:
Example CSV file contents:
# Units Summary Report
# End Date: May 27, 2019
Country,TimeScale,Vendor,Units
Czech Republic,2010 Q3,Fujitsu Siemens,2924.742032
Slovakia,2010 Q4,Dell,2525r.011404
Slovakia,2010 Q4,Lenovo,2648.973238
Czech Republic,2010 Q3,ASUS,1323.507139
Czech Republic,2010 Q4,Apple,266.7584542
The first two lines are Comment Lines and Comment Lines always begin with either a Hash (#) character or a Semicolon (;). These lines are to be ignored when read.
The third line is a Blank Line and serves absolutely no purpose other than aesthetics (easier on the eyes I suppose). These lines are also to be ignored.
The fourth line which is directly above the raw data lines is the Header Line. This line may or may not be contained within a CSV file. Its purpose is to provide the Column Names for the data records contained on each raw data line. This line can be read (if it exists) to acquire record field (column) names.
The remaining lines within the CSV file are Raw Data Lines otherwise considered data records. Each line is a complete record and each delimited element of that record is considered a data field value. These are the lines you want to count so as to initialize your different Arrays. Here is a method that allows you to do that:
The fileLinesCount() Method:
/**
* Counts the number of lines within the supplied Text file. Which lines are
* counted depends upon the optional arguments supplied. By default, all
* file lines are counted.<br><br>
*
@param filePath (String) The file path and name of file (with
* extension) to count lines in.<br>
*
@param countOptions (Optional - Boolean) Three Optional Parameters. If an
optional argument is provided then the preceding
optional argument MUST also be provided (be it true
* or false):<pre>
*
* ignoreHeader - Default is false. If true is passed then a value of
* one (1) is subtracted from the sum of lines detected.
* You must know for a fact that a header exists before
* passing <b>true</b> to this optional parameter.
*
* ignoreComments - Default is false. If true is passed then comment lines
* are ignored from the count. Only file lines (after being
* trimmed) which <b>start with</b> either a semicolon (;) or a
* hash (#) character are considered a comment line. These
* characters are typical for comment lines in CSV files and
* many other text file formats.
*
* ignoreBlanks - Default is false. If true is passed then file lines
* which contain nothing after they are trimmed are ignored
* in the count.
*
* <u>When a line is Trimmed:</u>
* If the String_Object represents an empty character
* sequence then reference to this String_Object is
* returned. If both the first & last character of the
* String_Object have codes greater than unicode ‘\u0020’
* (the space character) then reference to this String_Object
* is returned. When there is no character with a code
* greater than unicode ‘\u0020’ (the space character)
* then an empty string is created and returned.
*
* As an example, a trimmed line removes leading and
* trailing whitespaces, tabs, Carriage Returns, and
* Line Feeds.</pre>
*
@return (Long) The number of lines contained within the supplied text
* file.
*/
public long fileLinesCount(final String filePath, final boolean... countOptions) {
// Defaults for optional parameters.
final boolean ignoreHeader = (countOptions.length >= 1 ? countOptions[0] : false);
// Only strings in lines that start with ';' or '#' are considered comments.
final boolean ignoreComments = (countOptions.length >= 2 ? countOptions[1] : false);
// All lines that when trimmed contain nothing (null string).
final boolean ignoreBlanks = (countOptions.length >= 3 ? countOptions[2] : false);
long count = 0; // lines Count variable to hold the number of lines.
// Gather supplied arguments for optional parameters
try {
if (ignoreBlanks) {
// Using lambda along with Ternary Operator
count = Files.lines(Paths.get(filePath)).filter(line -> (ignoreComments
? (!line.trim().startsWith(";") && !line.trim().startsWith("#"))
&& line.trim().length() > 0 : line.trim().length() > 0)).count();
if (ignoreHeader) {
count--;
}
return count;
}
if (ignoreComments) {
// Using lambda along with Ternary Operator
count = Files.lines(Paths.get(filePath)).filter(line -> (ignoreBlanks ? line.trim().length() > 0
&& (!line.trim().startsWith(";") && !line.trim().startsWith("#"))
: (!line.trim().startsWith(";") && !line.trim().startsWith("#")))).count();
if (ignoreHeader) {
count--;
}
return count;
}
else {
count = Files.lines(Paths.get(filePath)).count();
if (ignoreHeader) {
count--;
}
}
}
catch (IOException ex) {
Logger.getLogger("fileLinesCount() Method Error!").log(Level.SEVERE, null, ex);
}
return count;
}
Filling the Parallel Arrays:
Now it's time to create a method to fill the desired Arrays, and looking at the data file it looks like you need three String type arrays and one double type array. You may want to make these instance or Class member variables:
// Instance (Class Member) variables:
String[] country;
String[] timeScale;
String[] vendor;
double[] units;
Then, for filling these arrays, we would use a method like this:
/**
* Fills the 4 class member array variables country[], timeScale[], vendor[],
* and units[] with data obtained from the supplied CSV data file.<br><br>
*
@param filePath (String) Full Path and file name of the CSV data file.<br>
*
@param fileHasHeader (Boolean) Either true or false. Supply true if the CSV
* file does contain a Header and false if it does not.
*/
public void fillDataArrays(String filePath, boolean fileHasHeader) {
long dataCount = fileLinesCount(filePath, fileHasHeader, true, true);
/* Java Arrays will not accept the long data type for sizing
therefore we cast to int. */
country = new String[(int) dataCount];
timeScale = new String[(int) dataCount];
vendor = new String[(int) dataCount];
units = new double[(int) dataCount];
int lineCounter = 0; // counts all lines contained within the supplied text file
try (Scanner reader = new Scanner(new File(filePath))) {
int indexCounter = 0;
while (reader.hasNextLine()) {
lineCounter++;
String line = reader.nextLine().trim();
// Skip comment and blank file lines.
if (line.startsWith(";") || line.startsWith("#") || line.equals("")) {
continue;
}
if (indexCounter == 0 && fileHasHeader) {
/* Since we are skipping the header right away we
now no longer need the fileHasHeader flag. */
fileHasHeader = false;
continue; // Skip the first line of data since it's a header
}
/* Split the raw data line based on a comma (,) delimiter.
The Regular Expression (\\s{0,},\\s{0,}) ensures that
it doesn't matter how many spaces (if any at all) are
before OR after the comma, the split removes those
unwanted spaces, even tabs are removed if any.
*/
String[] splitLine = line.split("\\s{0,},\\s{0,}");
country[indexCounter] = splitLine[0];
timeScale[indexCounter] = splitLine[1];
vendor[indexCounter] = splitLine[2];
/* The Regular Expression ("-?\\d+(\\.\\d+)?") below ensures
that the value contained within what is to be the Units
element of the split array is actually a string representation
of a signed or unsigned integer or double/float numerical value.
*/
if (splitLine[3].matches("-?\\d+(\\.\\d+)?")) {
units[indexCounter] = Double.parseDouble(splitLine[3]);
}
else {
JOptionPane.showMessageDialog(this, "<html>An invalid Units value (<b><font color=blue>" +
splitLine[3] + "</font></b>) has been detected<br>in data file line number <b><font " +
"color=red>" + lineCounter + "</font></b>. A value of <b>0.0</b> has been applied<br>to " +
"the Units Array to replace the data provided on the data<br>line which consists of: " +
"<br><br><b><center>" + line + "</center></b>.", "Invalid Units Value Detected!",
JOptionPane.WARNING_MESSAGE);
units[indexCounter] = 0.0d;
}
indexCounter++;
}
}
catch (IOException ex) {
Logger.getLogger("fillDataArrays() ethod Error!").log(Level.SEVERE, null, ex);
}
}
To get the ball rolling just run the following code:
// Fill the Arrays with data.
fillDataArrays("DataFile.txt", true);
// Display the filled Arrays.
System.out.println(Arrays.toString(country));
System.out.println(Arrays.toString(timeScale));
System.out.println(Arrays.toString(vendor));
System.out.println(Arrays.toString(units));
You have to define your arrays before processing your file:
String[] country = new String[30];
String[] timescale = new String[30];
String[] vendor = new String[30];
String[] units = new String[30];
And while reading lines you have to put the values into the defined arrays at the same index. To keep track of the index, use another variable and increase it at every iteration. It should look like this:
int index = 0;
while ((line = br.readLine()) != null) {
String[] splitted = line.split(",");
country[index] = splitted[0];
timescale[index] = splitted[1];
vendor[index] = splitted[2];
units[index] = splitted[3];
index++;
}
Since your CSV probably includes a header, you may also want to skip the first line.
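One simple way to do that (just a sketch) is to read and throw away the first line before the loop:
br.readLine(); // read and discard the header line before the while loop starts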
Always try to use try-with-resources when using I/O
The following code should help you out:
String line = "";
String cvsSplitBy = ",";
List<String> countries = new ArrayList<>();
List<String> timeScales = new ArrayList<>();
List<String> vendors = new ArrayList<>();
List<String> units = new ArrayList<>();
//use try-with resources
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
String[] parts = line.split(cvsSplitBy);
countries.add(parts[0]);
timeScales.add(parts[1]);
vendors.add(parts[2]);
units.add(parts[3]);
}
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
for (String country: countries) {
System.out.println(country);
}
for (String scale: timeScales) {
System.out.println(scale);
}
for (String vendor: vendors) {
System.out.println(vendor);
}
for (String unit: units) {
System.out.println(unit);
}
I am trying to run a MapReduce job on Hadoop which reads the fifth entry of a tab-delimited file (the fifth entry is the user review) and then does some sentiment analysis and word count on it.
However, as you know, user reviews usually include line breaks and empty lines. My code iterates through the words of each review to find keywords and checks the sentiment if a keyword is found.
The problem is that, as the code iterates through the review, it gives me an ArrayIndexOutOfBoundsException because of these line breaks and empty lines within a review.
I have tried using replaceAll("\r", " ") and replaceAll("\n", " ") to no avail.
I have also tried
if (tokenizer.countTokens() == 2) {
    word.set(tokenizer.nextToken());
} else {
}
also to no avail. Below is my code:
public class KWSentiment_Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
ArrayList<String> keywordsList = new ArrayList<String>();
ArrayList<String> posWordsList = new ArrayList<String>();
ArrayList<String> tokensList = new ArrayList<String>();
int e;
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] line = value.toString().split("\t");
String Review = line[4].replaceAll("[\\-\\+\\\\)\\.\\(\"\\{\\$\\^:,]", "").toLowerCase();
StringTokenizer tokenizer = new StringTokenizer(Review);
while (tokenizer.hasMoreTokens()) {
// 1- first read the review line and store the tokens in an arraylist, 2-
// iterate through review to check for KW if found
// 3-check if there's PosWord near (upto +3 and -2)
// 4- setWord & context.write 5- null the review line arraylist
String CompareString = tokenizer.nextToken();
tokensList.add(CompareString);
}
{
for (int i = 0; i < tokensList.size(); i++)
{
for (int j = 0; j < keywordsList.size(); j++) {
boolean flag = false;
if (tokensList.get(i).startsWith(keywordsList.get(j)) == true) {
for (int e = Math.max(0, i - 2); e < Math.min(tokensList.size(), i + 4); e++) {
if (posWordsList.contains(tokensList.get(e))) {
word.set(keywordsList.get(j));
context.write(word, one);
flag = true;
break; // breaks out of e loop }}
}
}
}
if (flag)
break;
}
}
tokensList.clear();
}
}
Expected results are such that:
Take these two cases of reviews where error occurs:
Case 1: "Beautiful and spacious!
I highly recommend this place and great host."
Case 2: "The place in general was really silent but we didn't feel stayed.
Aside from this, the bathroom is big and the shower is really nice but there problem. "
The system should read the whole review as one line and iterate through the words in it. However, it just stops when it finds a line break or an empty line, as in case 2.
Case 1 should be read such as: "Beautiful and spacious! I highly recommend this place and great host."
Case 2 should be:"The place in general was really silent but we didn't feel stayed. Aside from this, the bathroom is big and the shower is really nice but there problem. "
I am running out of time and would really appreciate help here.
Thanks!
So, I hope I am understanding what you are trying to do....
If I am reading what you have above correctly, the value of 'value' passed into your map function contains the delimited line that you would like to parse the user review out of. If that is the case, I believe we can make use of the escaping functionality in the opencsv library, using tabs as your delimiting character instead of commas, to correctly populate the user review field:
http://opencsv.sourceforge.net
In this example we are reading one line from the input that is passed in, parsing it into 'columns' based on the tab character, and placing the results in the 'nextLine' array. This allows us to use the escaping functionality of the CSVReader without reading an actual file, instead using the value of the text passed into your map function.
StringReader reader = new StringReader(value.toString());
CSVReader csvReader = new CSVReader(reader, '\t', '\"', '\\', 0);
String [] nextLine = csvReader.readNext();
if(nextLine != null && nextLine.length >= 5) {
// Do some stuff
}
In the example that you pasted above, I think even that split("\t") will be problematic, as tabs within a user review would split it into extra fields, in addition to new lines being treated as new records. But both of these characters are legal as long as they are inside a quoted value (as they should be in a properly escaped file, and as they are in your example). CSVReader should handle all of these.
Validate each line at the start of the map method, so that you know line[4] exists and isn't null.
if (value == null || value.toString() == null) {
    return;
}
String[] line = value.toString().split("\t");
if (line == null || line.length < 5 || line[4] == null) {
    return;
}
As for line breaks, you'll need to show some sample input. By default MapReduce passes each line into the map method independently, so if you do want to read multiple lines as one message, you'll have to write a custom InputSplit, or pre-format your data so that all data for each review is on the same line.
I have a flat comma-separated file that has "\N" in some rows. I need to load all rows and skip all those that do not contain \N.
I am trying to do the following but it doesn't work.
if (!line.contains("\\N")) {
//do load here
}
The above code still passes the line from the CSV below:
1,text,abc,\N,23,56
and then we have a NumberFormatException (it should be an int value there).
Why is this happening?
If you want to only process file lines that contain a \N then omit the ! (exclamation) character from your if statement condition. This is the NOT flag.
What your condition is basically saying right now is:
If the current text file line contained within the string variable
line does NOT contain a \N, then execute the code within the if statement block.
If statements are only executed when the supplied condition is boolean true. Applying the ! flag basically inverts the condition, making it true only when the supplied condition is NOT true. This may help you more.
If you want just those lines that DO contain a \N then your code should look like:
if (line.contains("\\N")) {
//process line here
}
If you DO NOT want to process those file lines that contain \N then what you are using right now should work just fine.
Regarding your question:
and then we have NumberFormatException (it should be Int Value there).
Why is this happening?
\n (lowercase n) is generally an escape sequence applied within a string to force a New Line when processed; an uppercase N does not do this. In general, a lot of CSV files use \N to mean NULL, while others simply place nothing between the delimiters. You will need to look into what is creating the CSV file to find the actual reason, since it may be being used for something else, but for now you can consider it as NULL. Integer variables are never null; they contain 0 by default, so you could change your code to:
if (line.contains("\\N") { line = line.replace("\\N", "0"); }
You could, however, also encounter \N where there should be a String, so the above line will do you no good on its own. One solution would be to handle \N within the contents of each array element (should it be there) after you have split the file line, for example:
String csvFilePath = "MyCSVfile.txt"; // path to the CSV file, relative to the working directory
try (BufferedReader br = new BufferedReader(new FileReader(csvFilePath))) {
int j = 0; //used as a counter
String line = "";
while ((line = br.readLine()) != null) {
j++; //increment counter
String[] data = line.split(",");
//if the ID value contains null then apply the counter (j) value.
int id = Integer.parseInt(data[0].replace("\\N",String.valueOf(j)));
String type = data[1].replace("\\N","");
String text = data[2].replace("\\N","");
int value1 = Integer.parseInt(data[3].replace("\\N","0"));
int value2 = Integer.parseInt(data[4].replace("\\N","0"));
int value3 = Integer.parseInt(data[5].replace("\\N","0"));
System.out.println("ID is:\t\t" + id + "\nData Type is:\t" + type +
"\nText is:\t" + text + "\nValue 1 is:\t" + value1 +
"\nValue 2 is:\t" + value2 + "\nValue 3 is:\t" +
value3 + "\n");
}
}
catch (IOException ex) {
//however you want to handle exception
}
This will handle the \N tag regardless of where it is encountered within any one of your CSV file lines.
I want to read a file and detect whether the text after the symbol is a number or a word. If it is a number, I want to delete the symbol in front of it, translate the number into binary and replace it in the file. If it is a word, I want to assign it the number 16 at first, but then, if another word is used, I want to add 1 to the original number. Here's what I want:
If the file reads (... represents a string that does not need to be translated):
%10
...
%firststring
...
%secondstring
...
%firststring
...
%11
...
and so on...
I want it to look like this:
0000000000001010 (10 in binary)
...
0000000000010000 (16 in binary)
...
0000000000010001 (another word was used, so 16+1 = 17 in binary)
...
0000000000010000 (16 in binary)
...
0000000000001011 (11 in binary)
And here's what I tried:
anyLines is just a string array which has the contents of the file (if I were to say System.out.println(anyLines[i]), I would get the file's contents printed out).
UPDATED!
try {
ReadFile files = new ReadFile(file.getPath());
String[] anyLines = files.OpenFile();
int i;
int wordValue = 16;
// to keep track words that are already used
Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
for (String line : anyLines) {
// if the line doesn't begin with the marker symbol, ignore it
if (!line.startsWith("#")) {
continue;
}
// remove the leading symbol
line = line.substring(1);
Integer binaryValue = null;
if (line.matches("\\d+")) {
binaryValue = Integer.parseInt(line);
}
else if (line.matches("\\w+")) {
binaryValue = wordValueMap.get(line);
// if the map doesn't contain the word value, then assign and store it
if (binaryValue == null) {
binaryValue = wordValue;
wordValueMap.put(line, binaryValue);
++wordValue;
}
}
// I'm using Commons Lang's StringUtils.leftPad(..) to create the zero padded string
System.out.println(Integer.toBinaryString(binaryValue));
}
Now, I only have to replace the symbols (%10, %firststring, etc) with the binary value.
After executing this code, what I get as the output is:
1010
10000
10001
10000
1011
%10
...
%firststring
...
%secondstring
...
%firststring
...
%11
...
Now I just need to replace the %10 with 1010, the %firststring with 10000 and so on, so that the file would read like this:
0000000000001010 (10 in binary)
...
0000000000010000 (16 in binary)
...
0000000000010001 (another word was used, so 16+1 = 17 in binary)
...
0000000000010000 (16 in binary)
...
0000000000001011 (11 in binary)
Do you have any suggestions on how to make this work?
This may not be doing what you think it's doing:
int binaryValue = wordValue++;
Because you are using the post-increment operator, binaryValue is being assigned the old wordValue, and then wordValue is incremented. I'd do this on two separate lines with the increment being done first:
wordValue++;
int binaryValue = wordValue; // binaryValue now gets the new value for wordValue
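To illustrate the difference (a quick sketch):
int wordValue = 16;
int a = wordValue++; // post-increment: a gets the old value (16), wordValue is now 17
int b = ++wordValue; // pre-increment: wordValue becomes 18 first, so b is 18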
EDIT 1
OK, if you still need our help, I suggest you do the following:
Show us a sample of the data file so we can see what it actually looks like.
Explain the difference between the anyLines array and the lines array and how they relate to the data file. They both hold Strings, and lines is obviously the result of splitting anyLines with "\n", but what, again, is anyLines? You state that the file is a text file, but how do you get the initial array of Strings from this text file? Is there another delimiter that you use to get this array? Have you tried to debug the code by printing out the contents of anyLines and lines?
If you need wordValue to persist with each iteration of a loop through anyLines (again, knowing what this is would help), you will need to declare and initialize it before the loop.
If you can't create and post an SSCCE, at least make your code formatting consistent and readable, something like the code below.
Have a look at the link on how to ask smart questions for more tips on information that you could give us that would help us to help you.
Sample code formatting:
try {
ReadFile files = new ReadFile(file.getPath());
String[] anyLines = files.OpenFile();
int i;
// test if the program actually read the file
for (i = 0; i < anyLines.length; i++) {
String[] lines = anyLines[i].split("\n");
int wordValue = 76;
Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
for (String currentLine : lines) {
if (!currentLine.startsWith("%")) {
continue;
}
currentLine = currentLine.substring(1);
Integer value;
if (currentLine.matches("\\d+")) {
value = Integer.parseInt(currentLine);
} else if (currentLine.matches("\\w+")) {
value = wordValueMap.get(currentLine);
if (value == null) {
int binaryValue = wordValue++;
wordValueMap.put(currentLine, binaryValue);
// TODO: fix below
// !! currentLine.replace(currentLine, binaryValue);
value = binaryValue;
}
} else {
System.out.println("Invalid input");
break;
}
System.out.println(Integer.toBinaryString(value));
}
}
} finally {
// Do we need a catch block? If so, catch what?
// What's supposed to go in here?
}
Luck!