openCSV parse individual columns

openCSV parse individual columns - java

I wrote a script parsing a .csv file in groovy using tokenize, which ended up not doing exactly what I needed, I am trying to use the openCSV library but I am unsure as to how I can parse out individual columns. here is my code so far:
List<String[]> rows = new CSVReader(
new InputStreamReader(getClass().classLoader.getResourceAsStream(inputFileString)))
.readAll()
rows.each { row ->
row.each { it ->
println it
}
}
and here is my input data:
1,"unknown","positive","full message","I love it."
So what I am trying to figure out is how to print select columns in said row. Also thanks in advance, I am trying to get my head around groovy/java, I come from a Ruby background.

Not sure what you mean by '...how to print select columns in said row'
But this script (for example) prints the 4th column for each row:
#Grab( 'net.sf.opencsv:opencsv:2.3' )
import au.com.bytecode.opencsv.CSVReader
// This sets example to a 2 line string
// I'm using it instead of a file, as it makes
// an easier example to follow
def example = '''1,"unknown","positive","full message","I love it."
|2,"tim","negative","whoop!","It's ok"'''.stripMargin()
List<String[]> rows = new CSVReader( new StringReader( example ) ).readAll()
rows.each {
// print the 4th column
println it[ 3 ]
}
That prints:
full message
whoop!

Related

Reading/splitting different parts from different lines in a text file in Java

I'm working on this assignment I'm supposed to read from a text file like this...
Student Name: John
Student ID: 12344/19
College: Science
Credits Attempted: 15
Credits Earned: 15
Grade Points: 41.2
Course Code Course Title Credit Grade
COMP1007, Amazing Applications of AI, 2, B
COMP2202, Fund. of Object Oriented Prog., 3, C-
MATH2108, Calculus (2), 3, C-
MATH3340, Discrete Math. for Comp. Sci., 3, B-
STAT2101, Introduction to Statistics, 4, C+
I should read this text file and calculate the GPA of the student and create an output file that should look like this...
Output text file
So basically I'm stuck and I have no idea what I to do...
I know how to read line by line and split a line into different parts, but this doesn't seem to work here since every line is different from the other. For example the first line has two parts, the "Student Name" and the name itself in this case "John". But in line 9, there are four different parts, the course code, course name, credit and grade.
I'm honestly not looking to cheat on the assignment but only to understand it
help :)
Note I can't use Stream or Hashmap or BufferedReader

Each data record in a text file always has a Start and an End. The easiest records are obviously those that are contained on a single delimited line within the text file, where each file line is in fact a record as you can see within a typical CSV format data file. The harder records to read are the Multi-Line records whereas each data record consists of several sequential text file lines but still, there is a Start and a End to each record.
The Start of a record is usually pretty easy to distinguish. For example, in the file example you provided in your post it is obviously any file line that starts with Student Name:.
The End of a record may not always be so easy to determine since many applications do not save fields which contain no data value in order to help increase access speed and reduce file bloat. The thought is "why have a text file full of empty fields" and to be honest, rightly so. I'm not a big fan of text file records anyways since utilizing a database would make far better sense for larger amounts of data. In any case, there will always be a file line that will indicate the Start of a record so it would make sense to read from Start to Start of the next record or in the case of the last record in file, from Start to End Of File (EOF).
Here is an example (read the comments in code):
// System line separator to use in files.
String ls = System.lineSeparator();
/* Array will hold student data: Student Name, Student ID, College,
Credits Attempted, Credits Earned, and finally Grade Points. */
String[] studentData = new String[6];
// String Array to hold Course Table Header Names.
String[] coursesHeader = {"COURSE NO", "COURSE TITLE", "CREDITS", "GRADE"};
// List Interface to hold all the course Data line Arrays for each record
java.util.List<String[]> cousesList = new java.util.ArrayList<>();
// Underlines to be used for Console display and file records
// Under courses Header
String underline1 = "-------------------------------------------------------------";
// Under all the courses
String underline2 = "------------------------------------------------------------------------------------";
/* Read and Write to files using 'Try With Resources' so to
automatically close the reader an writer objects. */
try (Scanner reader = new Scanner(new java.io.File("StudentData.txt"), "UTF-8");
java.io.Writer writer = new java.io.FileWriter("StudentsGPA.txt")) {
// For console display only! [Anything (except caught errors) to Console can be deleted]
System.out.println("The 'StudentsGPA.txt' file will contain:");
System.out.println("======================================");
System.out.println();
// Will hold each line read from the reader
String line = "";
/* Will hold the name for the next record. This would be the record START
but only AFTER the first record has been read. */
String newName = "";
// Start reading the 'StudentData.txt' file (line by line)...
while (reader.hasNextLine()) {
/* If newName is empty then we're on our first record or
there is only one record in file. */
if (newName.isEmpty()) {
line = reader.nextLine(); // read in a file line...
}
else {
/* newName contains a name so we must have bumped into
the START of a new record during processing of the
previous record. We aleady now have the first line
of this new record (which is the student's name line)
currently held in the 'newName' variable so we just
make 'line' equal what is in the 'newName' variable
and carry on processing the data as normal. in essance,
we simply skipped a read because we've already read it
earlier when processing the previous record. */
line = newName;
// Clear this variable in preparation for another record START.
newName = "";
}
/* Skip comment lines (lines that start with a semicolon (;)
or a hash mark (#). Also skip any blank lines. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* Does this file line start with 'Student Name:'? If so then
this is a record START, let's process this record. If not
then just keep reading the file. */
if (line.startsWith("Student Name:")) {
/* Let's put the student name into the studentData array at
index 0. If it is detected that there has been no name
applied for some reason then we place "N/A" as the name.
We use a Ternary Operator for this. So, "N/A" will be a
Default if there is not name. This will be typical for
the other portions of student data. */
studentData[0] = line.isEmpty() ? "N/A" : line.split("\\s*:\\s*")[1].trim();
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentData[]
Array... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
/* Store the other portions of student data into the
studentData Array using "N/A" as a default should
any student data field contain nothing. */
studentData[i] = line.isEmpty() ? "N/A" : line.split("\\s*:\\s*")[1].trim();
}
// The current Student's Courses...
/* Clear the List Interface object in preparation for new
Courses from this particular record. */
cousesList.clear();
// Read past the courses header line...We don't want it.
reader.nextLine();
// Get the courses data (line by line)...
while (reader.hasNextLine()) {
line = reader.nextLine().trim();
/* Again, if we encounter a comment line or a blank line
in this section then let's just skip past it. We don't
want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
continue;
}
/* At this point, if we have read in a line that starts
with 'Student Name:' then we just hit the START of a
NEW Record! This then means that the record we're
currently working on is now finished. Let's store this
file line into the 'newRecord' variable and then break
out of this current record read. */
if (line.startsWith("Student Name:")) {
newName = line;
break;
}
/* Well, we haven't reached the START of a New Record yet
so let's keep creating the courses list (line by line).
Break the read in course line into a String[] array.
We use the String#split() method for this with a small
Regular Expression (regex) to split each line based on
comma delimiters no matter how the delimiter spacing
might be (ex: "," " ," " , " or even " , "). */
String[] coursesData = line.split("\\s*,\\s*");
/* Add this above newly created coursesData string array
to the list. */
cousesList.add(coursesData);
}
/* Write (append) this current record to new file. The String#format()
method is used here to save the desired data into the 'StudentGPA.txt'
file in a table style format. */
// Student Data...
writer.append(String.format("%-12s: %-25s", "ID", studentData[1])).append(ls);
writer.append(String.format("%-12s: %-25s", "Name", studentData[0])).append(ls);
writer.append(String.format("%-12s: %-25s", "College", studentData[2])).append(ls);
// Student Courses...
// The Header line
writer.append(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3])).append(ls);
// Apply an Underline (underline1) under the header.
writer.append(underline1).append(ls);
// Write the Courses data in a table style format to make the Header format.
for (String[] cData : cousesList) {
writer.append(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3])).append(ls);
}
// Apply an Underline (underline2) under the Courses table.
writer.append(underline2).append(ls);
// Display In Console Window (you can delete this if you want)...
System.out.println(String.format("%-12s: %-25s", "ID", studentData[1]));
System.out.println(String.format("%-12s: %-25s", "Name", studentData[0]));
System.out.println(String.format("%-12s: %-25s", "College", studentData[2]));
System.out.println(String.format("%-13s %-30s %-10s %-4s", coursesHeader[0],
coursesHeader[1], coursesHeader[2], coursesHeader[3]));
System.out.println(underline1);
for (String[] cData : cousesList) {
System.out.println(String.format("%-13s %-33s %-9s %-4s",
cData[0], cData[1], cData[2], cData[3]));
}
System.out.println(underline2);
// The LAST line of each record, the Credits...
// YOU DO THE CALCULATIONS FOR: totalAttemped, semGPA, and cumGPA
String creditsAttempted = studentData[3];
String creditsEarned = studentData[4];
int credAttempted = 0;
int credEarned = 0;
int totalAttempted = 0;
double semGPA = 0.0d;
double cumGPA = 0.0d;
/* Make sure the 'credits attemted' numerical value is in fact
a string representaion of an integer value. if it is then
convert that string numerical value to integer. */
if (creditsAttempted.matches("\\d+")) {
credAttempted = Integer.valueOf(creditsAttempted);
}
/* Make sure the 'credits earned' numerical value is in fact
a string representaion of an integer value. if it is then
convert that string numerical value to integer. */
if (creditsEarned.matches("\\d+")) {
credEarned = Integer.valueOf(creditsEarned);
}
// Build the last record line (the Credits string) with the acquired data.
String creditsString = new StringBuilder("CREDITS: TOTAL.ATTEMPTED ")
.append(totalAttempted).append("? EARNED ").append(credEarned)
.append(" ATTEMPTED ").append(credAttempted).append(" SEM GPA ")
.append(semGPA).append("? CUM GPA ").append(cumGPA).append("?")
.toString();
// Display it to the console Window (you can delete this).
System.out.println(creditsString);
System.out.println();
// Write the built 'credit string' to file which finishes this record.
writer.append(creditsString).append(ls);
writer.append(ls); // Blank Line in preparation for next record.
writer.flush(); // Flush the data buffer - write record to disk NOW.
}
}
}
// Trap Errors...Do whatever you want with these.
catch (FileNotFoundException ex) {
System.err.println("File Not Found!\n" + ex.getMessage());
}
catch (IOException ex) {
System.err.println("IO Error Encountered!\n" + ex.getMessage());
}
Yes, it looks long but if you get rid of all the comments you can see that it really isn't. Don't be afraid to experiment with the code. Make it do what you want.
EDIT: (as per comments)
To place the student info portion of each record into an ArrayList so that you can parse it the way you want:
Where the forloop is located within the example code above for gathering the student info, just change this loop to this code and parse the data the way you want:
// Place this ArrayList declaration underneath the 'underline2' variable declaration:
java.util.ArrayList<String> studentInfo = new java.util.ArrayList<>();
then:
if (line.startsWith("Student Name:")) {
studentInfo.clear();
studentInfo.add(line);
/* Let's keep reading the file from this point on and retrieve
the other bits of student data to fill the studentData[]
Array... */
for (int i = 1; i < 6; i++) {
line = reader.nextLine().trim();
/* If we encounter a comment line or a blank line then let's
just skip past it. We don't want these. */
if (line.startsWith(";") || line.startsWith("#") || line.isEmpty()) {
i--;
continue;
}
studentInfo.add(line);
}
// .................................................
// .... The rest of the code for this `if` block ...
// .................................................
}
You will of course need to change the code after this loop to properly represent this ArrayList.

OK, so here's how you do it ...
You read in all of the file and store each line in a List<String>
For the first 8 lines you process each one in a separate way. You can even write a separate function to parse the necessary info out of every line for lines 0-7
All the remaining lines have identical structure. Therefore, you can process them all in the same way to parse out and then process the necessary data.
And a comment to this answer if something is unclear and I'll clarify.

String split from a CSV - Java

I am having a small problem, I hope you can help.
I am reading a CSV in java, in which one of the column has string as follows:
a. "123345"
b. "12345 - 67890"
I want to split this like(Split it into two separate columns):
a. "123345", ""
b. "12345","67890"
Now, when I am using Java's default split function, it splits the string as follows:
a. "123345,"
b. "12345,67890" (Which is basically a string)
Any idea how can I achieve this? I have wasted my 3 hours on this. Hope any one can help.
Code as follows:
while ((line = bufferReader.readLine()) != null)
{
main = line.split("\\,"); //splitting the CSV using ","
//I know that column # 13 is the column where I can find digits like "123-123" etc..
therefore I have hard coded it.
if (main[12].contains("-"))
{
temp = main[12].split("-");
//At this point, when I print temp, it still shows me a string.
//What I have to do is to write them to the csv file.
E.g: ["86409, 3567"] <--Problem here!
}
else
{
//do nothing
}
}
after this, i will write the main[] array to the file.

Please check if java.util.StringTokenizer helps
Example:
StringTokenizer tokenizer = new StringTokenizer(inputString, ";")
Manual: StringTokenizer docs

Get the list of object containing text matching a pattern

I'm currently working with the API Apache POI and I'm trying to edit a Word document with it (*.docx). A document is composed by paragraphs (in XWPFParagraph objects) and a paragraph contains text embedded in 'runs' (XWPFRun). A paragraph can have many runs (depending on the text properties, but it's sometimes random). In my document I can have specific tags which I need to replace with data (all my tags follows this pattern <#TAG_NAME#>)
So for example, if I process a paragraph containing the text Some text with a tag <#SOMETAG#>, I could get something like this
XWPFParagraph paragraph = ... // Get a paragraph from the document
System.out.println(paragraph.getText());
// Prints: Some text with a tag <#SOMETAG#>
But if I want to edit the text of that paragraph I need to process the runs and the number of runs is not fixed. So if I show the content of runs with that code:
System.out.println("Number of runs: " + paragraph.getRuns().size());
for (XWPFRun run : paragraph.getRuns()) {
System.out.println(run.text());
}
Sometimes it can be like this:
// Output:
// Number of runs: 1
// Some text with a tag <#SOMETAG#>
And other time like this
// Output:
// Number of runs: 4
// Some text with a tag
// <#
// SOMETAG
// #>
What I need to do is to get the first run containing the start of the tag and the indexes of the following runs containing the rest of the tag (if the tag is divided in many runs). I've managed to get a first version of that algorithm but it only works if the beginning of the tag (<#) and the end of the tag (#>) aren't divided. Here's what I've already done.
So what I would like to get is an algorithm capable to manage that problem and if possible get it work with any given tag (not necessarily <# and #>, so I could replace with something like this {{{ and this }}}).
Sorry if my English isn't perfect, don't hesitate to ask me to clarify any point you want.

Finally I found the answer myself, I totally changed my way of thinking my original algorithm (I commented it so it might help someone who could be in the same situation I was)
// Before using the function, I'm sure that:
// paragraph.getText().contains(surroundedTag) == true
private void editParagraphWithData(XWPFParagraph paragraph, String surroundedTag, String replacement) {
List<Integer> runsToRemove = new LinkedList<Integer>();
StringBuilder tmpText = new StringBuilder();
int runCursor = 0;
// Processing (in normal order) the all runs until I found my surroundedTag
while (!tmpText.toString().contains(surroundedTag)) {
tmpText.append(paragraph.getRuns().get(runCursor).text());
runsToRemove.add(runCursor);
runCursor++;
}
tmpText = new StringBuilder();
// Processing back (in reverse order) to only keep the runs I need to edit/remove
while (!tmpText.toString().contains(surroundedTag)) {
runCursor--;
tmpText.insert(0, paragraph.getRuns().get(runCursor).text());
}
// Edit the first run of the tag
XWPFRun runToEdit = paragraph.getRuns().get(runCursor);
runToEdit.setText(tmpText.toString().replaceAll(surroundedTag, replacement), 0);
// Forget the runs I don't to remove
while (runCursor >= 0) {
runsToRemove.remove(0);
runCursor--;
}
// Remove the unused runs
Collections.reverse(runsToRemove);
for (Integer runToRemove : runsToRemove) {
paragraph.removeRun(runToRemove);
}
}
So now I'm processing all runs of the paragraph until I found my surrounded tag, then I'm processing back the paragraph to ignore the first runs if I don't need to edit them.

Not Displaying Array List properly?

I only have one arrayList and I want the out put to print in table format I know with Arrays you would need to use a nested for loop one for the rows and the other for the columns, How would I be able to have my output be in a table format when using arrayList my for loop:
System.out.print("Inv/Mo.\tRate\tYears\tFuture Value\n");
for (int i = 0; i < FutureValueArrayList.size(); i++)
{
String FutureValueArray = FutureValueArrayList.get(i);
System.out.print(FutureValueArray + "\t");
}
my for loop gives me an output like this:
$100.00 2.0% 2 $2,450.64 $150.00 2.0% 2 $36,420.71
The bold values are a second entry by the user. How do I get it to display on the second line and for every new entry of values it outputs it line by line as opposed to everything in one line? I tried print/println and it still out puts everything in the first line.

Use Guava library where is Joiner.on(" ").join(arraylist); it produces nicely formatted output, you can define even custom iterators or filters in guava.
https://code.google.com/p/guava-libraries/wiki/StringsExplained#Joiner
To format output better you can use stringObject.replace(x,y); which allows you to replace symbols by e.g. \n - new line or you can add more spaces ...
String str = sentence.replace("and", " ");

What approach to use for parsing a file with fixed length records, when the record layout isn't known until runtime?

I want to parse a file based on a record layout provided in another file.
Basically there will be a definition file, which is a comma delimited list of fields and their respective lengths. There will be many of these, a new one will be loaded each time I run the program.
firstName,text,20
middleInitial,text,1
lastName,text,20
salary,number,10
Then I will display a blank table with the supplied column headings, and an option to add data by clicking a button or whatever - I haven't decided yet.
I also want to have an option to both load data from a file, or save data to a file, with the file matching the format described in the definition file.
For example, a file to load (or one produced by the save function) for the above definition file might look like this.
Adam DSmith 50000
Brent GWilliams 45000
Harry TThompson 47500
What kind of patterns could be useful here, and can anyone give me pointers of a rough guide on how to structure the way data is internally stored and modeled.
I would like to think I can find my way around the java documentation alright, but if anyone can point me at somewhere to start looking, it would be greatly appreciated!
Thanks

So it sounds like to me that you have a howToParse file and infoToParse file with the directions of how to parse information and the information to parse in these files respectively.
First, I would read in the howToParse file and create some sort of dynamic Parser object. It looks like each line in this file is a different ParsingStep object. Then you just need to read the line which will be stored as a String object and just split the ParsingStep into its 3 parts: field name, type of data, length of data.
// Create new parser to hold parsing steps.
Parser dynamicParser = new Parser();
// Create new scanner to read through parse file.
Scanner parseFileScanner = new Scanner(howToParseFileName);
// *** Add exception handling as necessary *** this is just an example
// Read till end of file.
while (parseFileScanner.hasNext()) {
String line = parseFileScanner.nextLine(); // Get next line in file.
String[] lineSplit = line.split(","); // Split on comma
String fieldName = lineSplit[0];
String dataType = lineSplit[1];
String dataLength = lineSplit[2]; // Convert to Integer with Integer.parseInt();
ParsingStep step = new ParsingStep(fieldName, dataType, dataLength);
dynamicParser.addStep(step);
}
parseFileScanner.close();
Then you would have how to parse a line, then you just need to parse through the other file and store the information from that file, probably in an array.
// Open infoToParse file and start reading.
Scanner infoScanner = new Scanner(infoToParseFileName);
// Add exception handling.
while (infoScanner.hasNext()) {
String line = infoScanner.nextLine();
// Parse line and return a Person object or maybe just a Map of field names to values
Map<String,String> personMap = dynamicParser.parse(line);
}
infoScanner.close();
Then the only other code is just making sure the parser is parsing in the correct order.
public class Parser {
private ArrayList<ParsingStep> steps;
public Parser() {
steps = new ArrayList<ParsingStep>();
}
public void addStep(ParsingStep step) {
steps.add(step);
}
public Map<String,String> parse(String line) {
String remainingLine = line;
for (ParsingStep step : steps) {
remainingLine = step.parse(remainingLine);
}
return map; // Somehow convert to map.
}
}
Personally, I would add some error checking in the parse steps just in case the infoToParse file is not in the proper format.
Hope this helps.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

openCSV parse individual columns - java

Related

Reading/splitting different parts from different lines in a text file in Java

String split from a CSV - Java

Get the list of object containing text matching a pattern

Not Displaying Array List properly?

What approach to use for parsing a file with fixed length records, when the record layout isn't known until runtime?

Categories

Resources