I have a csv file the format is like this:
I have another CSV file with the same number of columns and rows.
I need to check whether file 1 has the same first value (values[0]) as file 2 and, if not, copy the value from file 2.
Below is the code I have written. When the first-row value of file 1 is not equal to the current row of file 2, I need to move on and check the next row of file 2 without exiting the if statement.
while ((line = br4.readLine()) != null){
while ((line5 = br5.readLine()) != null){
String[] values = line.split(",");
String[] values5 = line5.split(",");
fw5.append("0").append('\n');
String comp2 = values[0];
String comp1 = values5[0];
if (values5[0] == null ? values[0] == null : values5[0].equals(values[0]))
{
fw6.append(values[0]).append("mad men ").append('\n');
}
else if ( values5[0] == null ? (values[0]) != null : !values5[0].equals(values[0])){
System.out.println("value is " +values5[0]);
fw6.append(values5[0]).append("mad women").append('\n');
fw6.flush();
}
break;
}
}
You are facing a typical newbie problem: insufficient abstractions.
You are trying to solve your whole problem in one method. Instead, create helpful abstractions, like this:
First, create a class representing one row of data. You might pass a string (the content of one row) to the constructor. The class then splits the string and puts all entries into meaningfully named fields (instead of using an array named values, which does not say anything).
Then add methods to that class to compare two instances of the class (this could be the equals method or something you define on your own).
While doing all of that, keep testing the code as you write it.
Once you can parse a line of text and compare it as desired, add the code to read lines from your files. Read all lines, create objects and update them as required.
Finally, write the code that writes the updated objects back into a file.
Long story short: slice your big problem into smaller ones and solve them one after the other.
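A minimal sketch of what such an abstraction could look like. The names CsvRow and CsvRowMerger are made up for illustration, and the sketch assumes the two files are compared row by row at the same position (the question says both files have the same number of rows):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: one CSV row, with the first column exposed under a meaningful name.
final class CsvRow {
    private final String key;      // first column (values[0] in the question)
    private final String[] cells;  // all columns, kept so the row can be written back

    CsvRow(String line) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        this.cells = line.split(",", -1);
        this.key = cells[0];
    }

    String key() {
        return key;
    }

    boolean hasSameKeyAs(CsvRow other) {
        return key.equals(other.key);
    }

    String toLine() {
        return String.join(",", cells);
    }
}

// Hypothetical helper: compares the rows of two files position by position.
final class CsvRowMerger {
    // Keeps the row from file 1 when the keys match, otherwise takes the row from file 2.
    static List<String> merge(List<String> file1Lines, List<String> file2Lines) {
        List<String> result = new ArrayList<>();
        int n = Math.min(file1Lines.size(), file2Lines.size());
        for (int i = 0; i < n; i++) {
            CsvRow a = new CsvRow(file1Lines.get(i));
            CsvRow b = new CsvRow(file2Lines.get(i));
            result.add(a.hasSameKeyAs(b) ? a.toLine() : b.toLine());
        }
        return result;
    }
}
```

Reading the lines from disk and writing the merged list back are then two more small, separately testable steps.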
I'm working on an assignment where I'm supposed to read from a text file like this...
Student Name: John
Student ID: 12344/19
College: Science
Credits Attempted: 15
Credits Earned: 15
Grade Points: 41.2
Course Code Course Title Credit Grade
COMP1007, Amazing Applications of AI, 2, B
COMP2202, Fund. of Object Oriented Prog., 3, C-
MATH2108, Calculus (2), 3, C-
MATH3340, Discrete Math. for Comp. Sci., 3, B-
STAT2101, Introduction to Statistics, 4, C+
I should read this text file and calculate the GPA of the student and create an output file that should look like this...
Output text file
So basically I'm stuck and I have no idea what to do...
I know how to read line by line and split a line into different parts, but this doesn't seem to work here since every line is different from the other. For example the first line has two parts, the "Student Name" and the name itself in this case "John". But in line 9, there are four different parts, the course code, course name, credit and grade.
I'm honestly not looking to cheat on the assignment but only to understand it
help :)
Note I can't use Stream or Hashmap or BufferedReader
Each data record in a text file always has a Start and an End. The easiest records are obviously those that are contained on a single delimited line within the text file, where each file line is in fact a record, as you can see in a typical CSV format data file. The harder records to read are Multi-Line records, where each data record consists of several sequential text file lines, but there is still a Start and an End to each record.
The Start of a record is usually pretty easy to distinguish. For example, in the file example you provided in your post it is obviously any file line that starts with Student Name:.
The End of a record may not always be so easy to determine since many applications do not save fields which contain no data value in order to help increase access speed and reduce file bloat. The thought is "why have a text file full of empty fields" and to be honest, rightly so. I'm not a big fan of text file records anyways since utilizing a database would make far better sense for larger amounts of data. In any case, there will always be a file line that will indicate the Start of a record so it would make sense to read from Start to Start of the next record or in the case of the last record in file, from Start to End Of File (EOF).
Here is an example (read the comments in code):
// System line separator to use in files.
String ls = System.lineSeparator();
/* Array will hold student data: Student Name, Student ID, College,
Credits Attempted, Credits Earned, and finally Grade Points. */
String[] studentData = new String[6];
// String Array to hold Course Table Header Names.
String[] coursesHeader = {"COURSE NO", "COURSE TITLE", "CREDITS", "GRADE"};
// List Interface to hold all the course Data line Arrays for each record
java.util.List<String[]> cousesList = new java.util.ArrayList<>();
// Underlines to be used for Console display and file records
// Under courses Header
String underline1 = "-------------------------------------------------------------";
// Under all the courses
String underline2 = "------------------------------------------------------------------------------------";
/* Read and Write to files using 'Try With Resources' so as to
automatically close the reader and writer objects. */
try (Scanner reader = new Scanner(new java.io.File("StudentData.txt"), "UTF-8");
java.io.Writer writer = new java.io.FileWriter("StudentsGPA.txt")) {
// For console display only! [Anything (except caught errors) to Console can be deleted]
System.out.println("The 'StudentsGPA.txt' file will contain:");
System.out.println("======================================");
System.out.println();
// Will hold each line read from the reader
String line = "";
/* Will hold the name for the next record. This would be the record START
but only AFTER the first record has been read. */
String newName = "";
// Start reading the 'StudentData.txt' file (line by line)...
while (reader.hasNextLine()) {
/* If newName is empty then we're on our first record or
there is only one record in file. */
if (newName.isEmpty()) {
line = reader.nextLine(); // read in a file line...
}
else {
/* newName contains a name so we must have bumped into
the START of a new record during processing of the
previous record. We already have the first line
of this new record (which is the student's name line)
currently held in the 'newName' variable so we just
make 'line' equal what is in the 'newName' variable
and carry on processing the data as normal. In essence,
we simply skipped a read because we've already read it
earlier when processing the previous record. */
line = newName;
// Clear this variable in preparation for another record START.
newName = "";
}
/* Skip comment lines (lines that start with a semicolon (;)
or a hash mark (#). Also skip any blank lines. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* Does this file line start with 'Student Name:'? If so then
this is a record START, let's process this record. If not
then just keep reading the file. */
if (line.startsWith("Student Name:")) {
/* Let's put the student name into the studentData array at
index 0. If it is detected that there has been no name
applied for some reason then we place "N/A" as the name.
We use a Ternary Operator for this. So, "N/A" will be a
Default if there is not name. This will be typical for
the other portions of student data. */
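// Limit of 2 keeps an empty value after the colon, so a missing name becomes "N/A" instead of an exception.
String[] nameParts = line.split("\\s*:\\s*", 2);
studentData[0] = (nameParts.length < 2 || nameParts[1].trim().isEmpty()) ? "N/A" : nameParts[1].trim();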
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentData[]
Array... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
/* Store the other portions of student data into the
studentData Array using "N/A" as a default should
any student data field contain nothing. */
String[] fieldParts = line.split("\\s*:\\s*", 2);
studentData[i] = (fieldParts.length < 2 || fieldParts[1].trim().isEmpty()) ? "N/A" : fieldParts[1].trim();
}
// The current Student's Courses...
/* Clear the List Interface object in preparation for new
Courses from this particular record. */
cousesList.clear();
// Read past the courses header line...We don't want it.
reader.nextLine();
// Get the courses data (line by line)...
while (reader.hasNextLine()) {
line = reader.nextLine().trim();
/* Again, if we encounter a comment line or a blank line
in this section then let's just skip past it. We don't
want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* At this point, if we have read in a line that starts
with 'Student Name:' then we just hit the START of a
NEW Record! This then means that the record we're
currently working on is now finished. Let's store this
file line into the 'newName' variable and then break
out of this current record read. */
if (line.startsWith("Student Name:")) {
newName = line;
break;
}
/* Well, we haven't reached the START of a New Record yet
so let's keep creating the courses list (line by line).
Break the read in course line into a String[] array.
We use the String#split() method for this with a small
Regular Expression (regex) to split each line based on
comma delimiters no matter how the delimiter spacing
might be (ex: "," " ," " , " or even " , "). */
String[] coursesData = line.split("\\s*,\\s*");
/* Add this above newly created coursesData string array
to the list. */
cousesList.add(coursesData);
}
/* Write (append) this current record to new file. The String#format()
method is used here to save the desired data into the 'StudentGPA.txt'
file in a table style format. */
// Student Data...
writer.append(String.format("%-12s: %-25s", "ID", studentData[1])).append(ls);
writer.append(String.format("%-12s: %-25s", "Name", studentData[0])).append(ls);
writer.append(String.format("%-12s: %-25s", "College", studentData[2])).append(ls);
// Student Courses...
// The Header line
writer.append(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3])).append(ls);
// Apply an Underline (underline1) under the header.
writer.append(underline1).append(ls);
// Write the Courses data in a table style format to make the Header format.
for (String[] cData : cousesList) {
writer.append(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3])).append(ls);
}
// Apply an Underline (underline2) under the Courses table.
writer.append(underline2).append(ls);
// Display In Console Window (you can delete this if you want)...
System.out.println(String.format("%-12s: %-25s", "ID", studentData[1]));
System.out.println(String.format("%-12s: %-25s", "Name", studentData[0]));
System.out.println(String.format("%-12s: %-25s", "College", studentData[2]));
System.out.println(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3]));
System.out.println(underline1);
for (String[] cData : cousesList) {
System.out.println(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3]));
}
System.out.println(underline2);
// The LAST line of each record, the Credits...
// YOU DO THE CALCULATIONS FOR: totalAttempted, semGPA, and cumGPA
String creditsAttempted = studentData[3];
String creditsEarned = studentData[4];
int credAttempted = 0;
int credEarned = 0;
int totalAttempted = 0;
double semGPA = 0.0d;
double cumGPA = 0.0d;
/* Make sure the 'credits attempted' numerical value is in fact
a string representation of an integer value. If it is then
convert that string numerical value to integer. */
if (creditsAttempted.matches("\\d+")) {
credAttempted = Integer.valueOf(creditsAttempted);
}
/* Make sure the 'credits earned' numerical value is in fact
a string representation of an integer value. If it is then
convert that string numerical value to integer. */
if (creditsEarned.matches("\\d+")) {
credEarned = Integer.valueOf(creditsEarned);
}
// Build the last record line (the Credits string) with the acquired data.
String creditsString = new StringBuilder("CREDITS: TOTAL.ATTEMPTED ")
.append(totalAttempted).append("? EARNED ").append(credEarned)
.append(" ATTEMPTED ").append(credAttempted).append(" SEM GPA ")
.append(semGPA).append("? CUM GPA ").append(cumGPA).append("?")
.toString();
// Display it to the console Window (you can delete this).
System.out.println(creditsString);
System.out.println();
// Write the built 'credit string' to file which finishes this record.
writer.append(creditsString).append(ls);
writer.append(ls); // Blank Line in preparation for next record.
writer.flush(); // Flush the data buffer - write record to disk NOW.
}
}
}
// Trap Errors...Do whatever you want with these.
catch (FileNotFoundException ex) {
System.err.println("File Not Found!\n" + ex.getMessage());
}
catch (IOException ex) {
System.err.println("IO Error Encountered!\n" + ex.getMessage());
}
Yes, it looks long but if you get rid of all the comments you can see that it really isn't. Don't be afraid to experiment with the code. Make it do what you want.
EDIT: (as per comments)
To place the student info portion of each record into an ArrayList so that you can parse it the way you want:
Where the for loop is located within the example code above for gathering the student info, just change this loop to this code and parse the data the way you want:
// Place this ArrayList declaration underneath the 'underline2' variable declaration:
java.util.ArrayList<String> studentInfo = new java.util.ArrayList<>();
then:
if (line.startsWith("Student Name:")) {
studentInfo.clear();
studentInfo.add(line);
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentInfo
ArrayList... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
studentInfo.add(line);
}
// .................................................
// .... The rest of the code for this `if` block ...
// .................................................
}
You will of course need to change the code after this loop to properly represent this ArrayList.
OK, so here's how you do it ...
You read in all of the file and store each line in a List<String>.
For the first 8 lines you process each one in a separate way. You can even write a separate function to parse the necessary info out of every line for lines 0-7
All the remaining lines have identical structure. Therefore, you can process them all in the same way to parse out and then process the necessary data.
Add a comment to this answer if something is unclear and I'll clarify.
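As a rough sketch of the two kinds of parsing described above (not a full solution to the assignment): one helper for the "Label: value" lines and one for the comma-separated course lines. The class and method names are invented for illustration, and no Stream, HashMap or BufferedReader is used.

```java
import java.util.List;

// Hypothetical helper class for the two line shapes in the student file.
final class StudentLineParser {

    // "Student Name: John" -> "John" (the text after the first colon, trimmed).
    static String valueAfterColon(String line) {
        int colon = line.indexOf(':');
        return colon < 0 ? "" : line.substring(colon + 1).trim();
    }

    // "COMP1007, Amazing Applications of AI, 2, B" -> code, title, credit, grade.
    static String[] courseParts(String line) {
        return line.trim().split("\\s*,\\s*");
    }

    // Adds up the credit column (index 2) of every course line.
    static int totalCredits(List<String> courseLines) {
        int total = 0;
        for (String courseLine : courseLines) {
            total += Integer.parseInt(courseParts(courseLine)[2]);
        }
        return total;
    }
}
```

For the sample file in the question, totalCredits over the five course lines gives 15, which matches the "Credits Attempted" line. The mapping from letter grades to grade points is left out because the question does not state it.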
I previously created a CSV file with TextEdit called titanicLinearisedDataSet.csv. My goal is to access this file using Processing 3 and check whether the elements in a column are equal to the string value "Nil". I get no result even though the CSV file contains "Nil"s.
I have attached an image of the CSV file
(CSV_file_image.jpg).
Thanks for your help !
String [][] array;
void setup() {
String [] lines = loadStrings("titanicLinearisedDataSet.csv");
array = new String[lines.length][3];
int i = 0;
for(String line: lines){
String [] pieces = split(line,",");
if(pieces[3] == "Nil"){
println("It worked");
}
}
You should not use == to compare String values. You should use the equals() function instead:
if(pieces[3].equals("Nil")){
println("It worked");
}
From the reference:
To compare the contents of two Strings, use the equals() method, as in if (a.equals(b)), instead of if (a == b). A String is an Object, so comparing them with the == operator only compares whether both Strings are stored in the same memory location. Using the equals() method will ensure that the actual contents are compared.
Also watch your curly brackets. The code you posted seems to be missing one.
If that doesn't work, try to get into the habit of debugging your code. For example, try printing out the values of pieces and pieces[3] to see exactly what's going on.
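Here is a small plain-Java sketch of the same comparison. The helper name isNil and the sample line are made up for illustration; the trim() call is an extra precaution in case the file has spaces around the value, which the question does not confirm.

```java
// Hypothetical demo class: comparing a parsed CSV field against "Nil".
final class StringCompareDemo {

    // Putting the constant first avoids a NullPointerException inside equals();
    // trim() guards against stray spaces around the field.
    static boolean isNil(String field) {
        return field != null && "Nil".equals(field.trim());
    }

    public static void main(String[] args) {
        String[] pieces = "1,Smith,male,Nil".split(",");
        // pieces[3] is a new String object built by split(), so == against the
        // literal "Nil" compares references and is not a reliable content check.
        System.out.println(isNil(pieces[3])); // true
    }
}
```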
I am having a small problem, I hope you can help.
I am reading a CSV in java, in which one of the column has string as follows:
a. "123345"
b. "12345 - 67890"
I want to split this into two separate columns, like this:
a. "123345", ""
b. "12345","67890"
Now, when I am using Java's default split function, it splits the string as follows:
a. "123345,"
b. "12345,67890" (Which is basically a string)
Any idea how I can achieve this? I have already spent 3 hours on this. I hope someone can help.
Code as follows:
while ((line = bufferReader.readLine()) != null)
{
main = line.split("\\,"); //splitting the CSV using ","
//I know that column # 13 is the column where I can find digits like "123-123" etc.,
//therefore I have hard coded it.
if (main[12].contains("-"))
{
temp = main[12].split("-");
//At this point, when I print temp, it still shows me a string.
//What I have to do is to write them to the csv file.
//E.g: ["86409, 3567"] <--Problem here!
}
else
{
//do nothing
}
}
After this, I will write the main[] array to the file.
Please check if java.util.StringTokenizer helps
Example:
StringTokenizer tokenizer = new StringTokenizer(inputString, ";");
Manual: StringTokenizer docs
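As an alternative sketch using plain String.split: the helper below always returns exactly two values, so a field without a dash yields an empty second column. The name splitRange is invented for illustration. Note also that in the question's code the result of the split is stored in temp but main[] is what gets written, so the two parts would still need to be copied into the output row.

```java
// Hypothetical helper: turns "12345 - 67890" into two separate column values.
final class RangeSplitter {

    // Always returns exactly two elements; the second is "" when there is no dash.
    static String[] splitRange(String field) {
        // Limit 2: split on the first dash only, swallowing spaces around it.
        String[] parts = field.split("\\s*-\\s*", 2);
        String first = parts[0].trim();
        String second = parts.length > 1 ? parts[1].trim() : "";
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        String[] a = splitRange("123345");
        String[] b = splitRange("12345 - 67890");
        System.out.println(a[0] + "|" + a[1]); // 123345|
        System.out.println(b[0] + "|" + b[1]); // 12345|67890
    }
}
```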
I have the following issue: I am trying to parse a .csv file in java, and store specifically 3 columns of it in a 2 Dimensional array. The Code for the method looks like this:
public static void parseFile(String filename) throws IOException{
FileReader readFile = new FileReader(filename);
BufferedReader buffer = new BufferedReader(readFile);
String line;
String[][] result = new String[10000][3];
String[] b = new String[6];
for(int i = 0; i<10000; i++){
while((line = buffer.readLine()) != null){
b = line.split(";",6);
System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]); // Here is where the outofbounds exception occurs...
result[i][0] = b[0];
result[i][1] = b[3];
result[i][2] = b[4];
}
}
buffer.close();
}
I feel like I have to specify this: the .csv file is HUGE. It has 32 columns, and (almost) 10.000 entries (!).
When Parsing, I keep getting the following:
XXXXX CHUNKS OF SUCCESFULLY EXTRACTED CODE
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:3
at ParseCSV.parseFile(ParseCSV.java:24)
at ParseCSV.main(ParseCSV.java:41)
However, I realized that SOME of the stuff in the file has a strange format e.g. some of the texts inside it for instance have newlines in them, but there is no newline character involved in any way. However, if I delete those blank lines manually, the output generated (before the error message is prompted) adds the stuff to the array up until the next blank line ...
Does anyone have an idea how to fix this? Any help would be greatly appreciated...
Your first problem is that you probably have at least one blank line in your csv file. You need to replace:
b = line.split(";", 6);
with
b = line.split(";");
if(b.length < 5){
System.err.println("Warning, line has only " + b.length +
" entries, so skipping it:\n" + line);
continue;
}
If your input can legitimately have new lines or embedded semi-colons within your entries, that is a more complex parsing problem, and you are probably better off using a third-party parsing library, as there are several very good ones.
If your input is not supposed to have new lines in it, the problem probably is \r. Windows uses \r\n to represent a new line, while most other systems just use \n. If multiple people/programs edited your text file, it is entirely possible to end up with stray \r by themselves, which are not easily handled by most parsers.
An easy way to check whether that's your problem is to do the following before you split your line:
line = line.replace("\r","");
If this is a process you are repeating many times, you might need to consider using a Scanner (or library) instead to get more efficient text processing. Otherwise, you can make do with this.
When you have new lines in your CSV file, after this line
while((line = buffer.readLine()) != null){
the variable line will hold not a complete CSV record but just a fragment of text, possibly without any ;
For example, if you have file
column1;column2;column
3 value
after first iteration variable line will have
column1;column2;column
after second iteration it will have
3 value
When you call "3 value".split(";",6), it returns an array with one element, and when you later access b[3] it throws an exception.
The CSV format has many small details that take a lot of time to implement correctly. This is a good article covering the possible CSV variants:
http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules_and_examples
I would recommend a ready-made CSV parser like this one:
https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html
String's split(pattern, limit) method returns an array sized to the number of tokens found, up to the number specified by the limit parameter. The limit is the maximum, not the minimum, number of array elements returned.
"1,2,3" split with (",", 6) will return an array of 3 elements: "1", "2" and "3".
"1,2,3,4,5,6,7" will return 6 elements: "1", "2", "3", "4", "5" and "6,7". The last element is goofy because the split method stopped splitting after the fifth delimiter and returned the rest of the source string as the sixth element.
An empty line is represented as an empty string (""). Splitting "" will return an array of 1 element, the empty string.
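These three cases can be checked with a short program. The class and helper names are made up for the demo; the split calls themselves are plain java.lang.String.

```java
import java.util.Arrays;

// Hypothetical demo of how the limit argument of String.split behaves.
final class SplitLimitDemo {

    // Number of array elements produced for a given line and limit.
    static int fieldCount(String line, int limit) {
        return line.split(",", limit).length;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("1,2,3".split(",", 6)));         // [1, 2, 3]
        System.out.println(Arrays.toString("1,2,3,4,5,6,7".split(",", 6))); // [1, 2, 3, 4, 5, 6,7]
        System.out.println(fieldCount("", 6));                              // 1
    }
}
```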
In your case, the string array created here
String[] b = new String[6];
and assigned to b is replaced by the array returned by
b = line.split(";",6);
and meets its ultimate fate at the hands of the garbage collector, unseen and unloved.
Worse, in the case of the empty lines, it's replaced by a one element array, so
System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);
blows up when trying to access b[3].
Suggested solution is to either
while((line = buffer.readLine()) != null){
if (line.length() != 0)
{
b = line.split(";",6);
System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]); // Here is where the outofbounds exception occurs...
...
}
or (better because the previous could trip over a malformed line)
while((line = buffer.readLine()) != null){
b = line.split(";",6);
if (b.length == 6)
{
System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]); // Here is where the outofbounds exception occurs...
...
}
You might also want to think about the for loop around the while. I don't think it's doing you any good.
while((line = buffer.readLine()) != null)
is going to read every line in the file, so
for(int i = 0; i<10000; i++){
while((line = buffer.readLine()) != null){
is going to read every line in the file on the first pass of the for loop. The for loop then makes 9999 more attempts to read the file, finds nothing new each time, and exits the while loop immediately.
Note also that i never changes inside the while loop, so every line is written to result[0]. If you move the counter into the while loop instead, you are still not protected against files with more than 10000 lines: the while loop will happily read a 10001st line and overrun your array. Look into replacing the big array with an ArrayList or Vector, as they will size themselves to fit your file.
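Putting these suggestions together, a sketch might look like the following. The class and method names are made up, the method takes a Reader so it can be tried without a real file, and the length check uses 5 because index 4 is the highest one accessed; adjust it if your rows must have more columns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader: collects columns 0, 3 and 4 of every well-formed line.
final class SemicolonCsvReader {

    static List<String[]> readRows(Reader source) {
        List<String[]> rows = new ArrayList<>(); // grows with the file, no fixed 10000 limit
        try (BufferedReader buffer = new BufferedReader(source)) {
            String line;
            while ((line = buffer.readLine()) != null) {
                String[] b = line.split(";", 6);
                if (b.length < 5) {
                    continue; // blank or malformed line: b[3] and b[4] would not exist
                }
                rows.add(new String[] { b[0], b[3], b[4] });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

With a real file you would pass new FileReader(filename) as the source. This still does not handle quoted fields or line breaks inside a field; that is where a CSV library earns its keep.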
Please check b.length>0 before accessing b[].
I am parsing a .csv file that is tab delimited. As you can see, there are arrows in place of the tabs; this is because I have enabled the "Show all characters" option in my notepad.
AAA->BBB->CCC->CRLF
agf->jui->kje->aweCRLF
bvg->qaz->plm->yhbCRLF
Now I am using the csv beans 0.7 parser to parse this .csv file, and I am getting the objects for each column like this
if(f.getAAA() && f.getBBB() && f.getCCC() && f.getDDD()) // IT IS GETTING THE VALUE OF ROW1 agfjuikjeawe
{ }
Now as this .csv file is received from backend, it's possible that the value of any column could also be null, as shown below
AAA->BBB->CCC->CRLF
agf->->kje->aweCRLF
bvg->qaz->plm->yhbCRLF
I am using a condition like this to check for null values. As you can see, the value may be missing while the tab is still there; is my condition correct in that case?
if(f.getAAA()!=null && f.getBBB()!=null && f.getCCC()!=null && f.getDDD()!=null)
{ }
I don't know the library you're using, but with plain java you could use the split(string) method of String in each of the lines of the csv file as this:
final String s = "abc\tefg\thij";
final String [] values = s.split("\t");
System.out.println(Arrays.toString(values));
EDIT: The "\t" argument of the method is the tab character. You can then check the elements of the array for an empty string, which would be your null value.
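One detail worth knowing: with the one-argument split, trailing empty strings are dropped, so a row whose last column is empty comes back with fewer elements. Passing a negative limit keeps them. A small sketch, with made-up class and method names:

```java
// Hypothetical demo: detecting empty columns in a tab-delimited row.
final class TabSplitDemo {

    // A limit of -1 keeps trailing empty columns, so every row has the same length.
    static String[] columns(String line) {
        return line.split("\t", -1);
    }

    // True if any column in the row is an empty string.
    static boolean hasEmptyColumn(String line) {
        for (String column : columns(line)) {
            if (column.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasEmptyColumn("agf\tjui\tkje\tawe")); // false
        System.out.println(hasEmptyColumn("agf\t\tkje\tawe"));    // true
    }
}
```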