Having trouble parsing CSV files in Java

I'm trying to parse a folder of CSV files (balance sheets), and everything has gone smoothly up until I tried to separate the row names from the values.
It looks like the last cell of each row is being combined with the first cell (the row name in column A) of the next row.
File path = new File("/Users/Zack/Desktop/JavaDB/BALANCESHEETS");
for (File file : path.listFiles()) {
    if (file.isFile()) {
        String fileName = file.getName();
        String ticker = fileName.split("\\_")[0];
        if (ticker.equals("ASB") || ticker.equals("FRC")) {
            if (ticker.equals("ASB")) {
                ticker = ticker + "PRD";
            }
            if (ticker.equals("FRC")) {
                ticker = ticker + "PRD";
            }
        }
        Reader reader = new BufferedReader(new FileReader(file));
        StringBuilder builder = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            builder.append((char) c);
        }
        String string = builder.toString();
        ArrayList<String> stringResult = new ArrayList<String>();
        if (string != null) {
            String[] splitData = string.split("\\s*,\\s*");
            for (int i = 0; i < splitData.length; i++) {
                if (!(splitData[i] == null) || !(splitData[i].length() == 0)) {
                    stringResult.add(splitData[i].trim());
                }
            }
        }
        for (int i = 0; i < stringResult.size(); i++) {
            int cL = stringResult.get(i).length();
            for (int x = 0; x < cL; x++) {
                if (Character.isLetter(stringResult.get(i).charAt(x))) {
                    System.out.println("index: " + i);
                    System.out.println(stringResult.get(i));
                    break;
                }
            }
        }
    }
}
Here are some screenshots of what's happening:
https://postimg.org/image/a9qc1qggz/
https://postimg.org/image/mvna7p7s3/
Any idea on how to fix this?
I also noticed there is a space in front of the row names in the spreadsheets, which I suspect may be part of the problem.

The problem is coming from where you are reading in the file, here:
Reader reader = new BufferedReader(new FileReader(file));
StringBuilder builder = new StringBuilder();
int c;
while ((c = reader.read()) != -1) {
builder.append((char) c);
}
String string = builder.toString();
This reads all the characters, including the newline characters, into a single string. When you then split that string, you are not splitting on the newlines, so you end up with what you are seeing.
As others have mentioned, I strongly urge you to use one of the many CSV parsers that already exist.
The simple (but ugly) fix would be to also split on newlines. A better fix would be to use the readLine() method of the BufferedReader.
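Also, != is your friend: !(splitData[i] == null) is simply splitData[i] != null. Note too that the two tests in that condition are joined with ||, so empty strings still get added (and a null element would throw a NullPointerException on length()); you probably want && there.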
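A minimal sketch of the readLine() approach, assuming simple CSV with no quoted fields or embedded commas; the class and method names are hypothetical. Because each row is read on its own, the last cell of one row can no longer run into the first cell of the next, and trimming each cell also removes the leading space in front of the row names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reads simple CSV (no quoted fields) one row at a time.
class LineByLineCsv {

    // Returns one String[] per non-blank line; every cell is trimmed.
    static List<String[]> readRows(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) { // readLine() drops the line terminator
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = " Cash, 100, 200\n Total assets, 300, 400\n";
        for (String[] row : readRows(new StringReader(csv))) {
            // row[0] is the row name, the remaining cells are the values
            System.out.println(row[0] + " -> " + Arrays.toString(Arrays.copyOfRange(row, 1, row.length)));
        }
        // prints "Cash -> [100, 200]" and "Total assets -> [300, 400]"
    }
}
```

In the question's loop you would call readRows(new FileReader(file)) instead of the StringReader used here. For files that may contain quoted fields, a real CSV parser remains the safer choice.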

As Erwin stated in the comments, the pattern you are splitting on only looks for commas with optional whitespace around them. It looks like you know what format your data will be in: fields are separated either by a comma (with optional surrounding whitespace) or by a newline. So it seems you just need to change the pattern to say exactly that, for example "\\s*(?:,|\\R)\\s*" (\\R matches any line break and requires Java 8 or later). A pattern such as "\\s*,\\s*|$" would not do it, because without the MULTILINE flag $ only matches at the end of the input. As has been mentioned, you need to know beforehand that none of the fields contain a comma, or this breaks.
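A small sketch comparing the comma-only pattern with one that also splits on line breaks, using an in-memory sample (the class name and sample data are hypothetical). It relies on \\R, which needs Java 8 or later, and on the assumption above that no field contains a comma.

```java
import java.util.Arrays;

// Hypothetical demo comparing the question's split pattern with one that
// also treats any line break as a cell separator.
class SplitPatternDemo {

    // Original pattern: only a comma (with optional surrounding whitespace) separates cells.
    static final String COMMA_ONLY = "\\s*,\\s*";

    // A comma OR a line break (\R), each with optional surrounding whitespace.
    static final String COMMA_OR_LINE_BREAK = "\\s*(?:,|\\R)\\s*";

    static String[] splitCells(String data) {
        return data.split(COMMA_OR_LINE_BREAK);
    }

    public static void main(String[] args) {
        String data = "Cash, 100, 200\n Debt, 50, 60";
        // 5 tokens: "200" and " Debt" stay glued together, newline included
        System.out.println(data.split(COMMA_ONLY).length);
        // 6 tokens: [Cash, 100, 200, Debt, 50, 60]
        System.out.println(Arrays.toString(splitCells(data)));
    }
}
```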

Related

Reading semicolon delimited csv

I have the block of code below, which uses OpenCSV to read a CSV file and store the 7th column. The problem I face is that I use ; as the delimiter in the CSV file, but it treats , as a delimiter as well. How can I avoid this?
Quoting the fields in the CSV is not possible, since we get a non-editable file from the client.
CSVReader reader = null;
String[] nextCsvLine = new String[50];
String splitBy = ";";
int count = 0;
try {
StringReader sr = new StringReader(new String(in, offset, len));
reader = new CSVReader(sr);
while ((nextCsvLine = reader.readNext()) != null) {
for (String linewithsemicolon : nextCsvLine) {
log.debug("Line read : "+linewithsemicolon);
String[] b = linewithsemicolon.split(splitBy);
if (count==0){
count++;
continue;
}
else {
detailItems.add(b[7]);
log.debug("7th position: "+b[7]);
count++;
}
}
Use the overloaded OpenCSV constructor that takes a separator:
new CSVReader(reader, ';')
Update (thanks to @Matt): it is better to use
new CSVReaderBuilder(reader)
    .withCSVParser(new CSVParserBuilder()
        .withSeparator(';')
        .build()
    ).build()
I think the counting was done a bit wrong:
try (CSVReader reader = new CSVReader(sr, ';')) {
    String[] nextCsvLine;
    while ((nextCsvLine = reader.readNext()) != null) {
        int count = 0;
        for (String field : nextCsvLine) {
            log.debug("Field read : " + field);
            if (count == 6) { // 7th column
                detailItems.add(field);
                log.debug("7th position: " + field);
            }
            count++;
        }
    }
}
Instead of the for-loop, you could have done:
if (nextCsvLine.length > 6) {
    detailItems.add(nextCsvLine[6]);
}
since the seventh field has index 6 (arrays are zero-based).
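For comparison, a plain-Java sketch that does not use OpenCSV at all (hypothetical class and method names). It assumes the file has no quoted fields, so splitting each line on ';' is enough and commas remain ordinary characters inside a field. The skipHeader flag optionally skips a first header row, which the question's count == 0 check seems intended for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the 7th column (index 6) of ';'-separated lines.
// Assumes no quoted fields; commas are treated as ordinary characters.
class SeventhColumn {

    static List<String> collect(List<String> lines, boolean skipHeader) {
        List<String> detailItems = new ArrayList<>();
        for (int row = skipHeader ? 1 : 0; row < lines.size(); row++) {
            String[] fields = lines.get(row).split(";", -1); // -1 keeps empty trailing fields
            if (fields.length > 6) {        // guard against short rows
                detailItems.add(fields[6]); // 7th field, zero-based index 6
            }
        }
        return detailItems;
    }
}
```

The lines could come from Files.readAllLines(path), for example.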

Scanning a text file into an array and omitting one specified line

I'm a beginner and need some help. I'm trying to scan a text file into an array line by line, but omit one line. My text file is:
i am
you are
he is
she is
it is
I want to create a method that will scan this and put the lines into an array, except for one line (chosen by passing that String as a parameter to the method). Then it should erase the original text file and write the created array back into it (without the deleted line). Sorry, I suck at explaining.
I have tried this:
public static void deleteLine(String name, String line) throws IOException {
    String sc = System.getProperty("user.dir") + new File("").separator;
    FileReader fr = new FileReader(sc + name + ".txt");
    Scanner scan = new Scanner(fr);
    int n = countLines(name); // a well working method returning the number of lines in the file (here 5)
    String[] listArray = new String[n-1];
    for (int i = 0; i < n-1; i++) {
        if (scan.hasNextLine() && !scan.nextLine().equals(line))
            listArray[i] = scan.nextLine();
        else if (scan.hasNextLine() && scan.nextLine().equals(line))
            i--;
        else continue;
    }
    PrintWriter print = new PrintWriter(sc + name + ".txt");
    print.write("");
    for (int i = 0; i < n-2; i++) {
        print.write(listArray[i] + "\n");
    }
    print.close();
}
I get a "Line not found" error when I call deleteLine("all_names", "you are") (all_names is the name of the file). I'm sure the problem lies in the for-loop, but I have no idea why it doesn't work. :(
//SOLVED//
This code worked after all. Thanks for the answers!
public static void deleteLine(String name, String line) throws IOException {
    String sc = System.getProperty("user.dir") + new File("").separator;
    FileReader fr = null;
    fr = new FileReader(sc + name + ".txt");
    Scanner scan = new Scanner(fr);
    int n = LineCounter(name);
    String[] listArray = new String[n-1];
    for (int i = 0; i < n-1; i++) {
        if (scan.hasNextLine()) {
            String nextLine = scan.nextLine();
            if (!nextLine.equals(line)) {
                listArray[i] = nextLine;
            }
            else i--;
        }
    }
    PrintWriter print = new PrintWriter(sc + name + ".txt");
    print.write("");
    for (int i = 0; i < n-1; i++) {
        print.write(listArray[i] + System.lineSeparator());
    }
    print.close();
}
You are calling scan.nextLine() twice per iteration (once in the comparison and again in the assignment), so every pass consumes two lines and you run out of lines.
Replace your loop with this one or something similar:
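int j = 0; // next free slot in listArray
for (int i = 0; i < n; i++) {
    if (scan.hasNextLine()) {
        String nextLine = scan.nextLine(); // read each line exactly once
        if (!nextLine.equals(line)) {
            listArray[j++] = nextLine; // keep every line except the one to delete
        }
    }
}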
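To see why reading twice loses lines, here is a self-contained sketch (hypothetical class and method names) that runs a simplified version of the question's read-twice pattern and the read-once fix on the question's sample text, held in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo: each Scanner.nextLine() call consumes one more line.
class NextLineDemo {

    // Simplified version of the question's pattern: nextLine() is called once
    // in the comparison and again in the assignment, so lines get skipped.
    static List<String> buggy(String text, String skip) {
        List<String> kept = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                if (!scan.nextLine().equals(skip) && scan.hasNextLine()) {
                    kept.add(scan.nextLine()); // this is already the following line
                }
            }
        }
        return kept;
    }

    // Fixed pattern: read each line exactly once into a local variable.
    static List<String> fixed(String text, String skip) {
        List<String> kept = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                String next = scan.nextLine();
                if (!next.equals(skip)) {
                    kept.add(next);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "i am\nyou are\nhe is\nshe is\nit is";
        System.out.println(buggy(text, "you are")); // [you are, she is]
        System.out.println(fixed(text, "you are")); // [i am, he is, she is, it is]
    }
}
```

The buggy version even keeps "you are", the very line that was supposed to be removed, because it compares one line and stores the next.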
Have a look at how you are comparing String objects. You should use the equals method to compare a String's content; the == and != operators only check whether two references point to the same object.
Once you are using equals correctly, look at how you are using nextLine: every call consumes another line. Check its Javadoc.
I suspect LineCounter(name) works because you did not add ".txt" there. Try removing the ".txt" extension from the file name in the FileReader and PrintWriter calls and see if that works. Keep in mind that Windows Explorer hides known file extensions by default, so the name you see may not be the file's full name.
Here's an alternative, simpler solution that (I think) is easier to understand. It also avoids multiple loops and uses a single Java 8 stream to filter instead.
// imports used by this method (at the top of the file)
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public static void deleteLine(String name, String line) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(name));
    lines = lines.stream().filter(v -> !v.equals(line)).collect(Collectors.toList());
    System.out.println(lines);
    // if you want the String[] - but you don't need it
    String[] linesAsStringArr = lines.toArray(new String[0]);
    // write the file using our List<String>
    Path out = Paths.get("output.txt"); // or another filename you dynamically create
    Files.write(out, lines, StandardCharsets.UTF_8);
}
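The question asks to overwrite the original file rather than write output.txt. A sketch of that variant under stated assumptions: the caller passes the full path (including the .txt extension), the file is UTF-8 and small enough to hold in memory, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: removes every line equal to `line` and rewrites the same file.
class LineDeleter {

    static void deleteLine(Path file, String line) throws IOException {
        List<String> kept = Files.readAllLines(file, StandardCharsets.UTF_8)
                .stream()
                .filter(v -> !v.equals(line))
                .collect(Collectors.toList());
        // With no explicit options, Files.write truncates the existing file before writing.
        Files.write(file, kept, StandardCharsets.UTF_8);
    }
}
```

For example: LineDeleter.deleteLine(Paths.get(System.getProperty("user.dir"), "all_names.txt"), "you are");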

Java Processing input from a file

So I am working through a past sample final exam where the question asks you to read input from a file and then process it into words. The end of a sentence is marked by any word that ends with one of the three characters . ? !
I was able to write code for this; however, I can only split the input into sentences by using the Scanner class with useDelimiter. I want to process it word by word and, when a word ends in one of the sentence separators above, stop adding words to the current Sentence.
Any help would be appreciated, as I am learning this on my own and this is what I came up with. My code is here:
File file = new File("finalq4.txt");
Scanner scanner = new Scanner(file);
scanner.useDelimiter("[.?!]");
while(scanner.hasNext()){
sentCount++;
line = scanner.next();
line = line.replaceAll("\\r?\\n", " ");
line = line.trim();
StringTokenizer tokenizer = new StringTokenizer(line, " ");
wordsCount += tokenizer.countTokens();
sentences.add(new Sentence(line,wordsCount));
for(int i = 0; i < line.replaceAll(",|\\s+|'|-","").length(); i++){
currentChar = line.charAt(i);
if (Character.isDigit(currentChar)) {
}else{
lettersCount++;
}
}
}
In this code I split the input into sentences using useDelimiter, count the words and letters of the entire file, and store the sentences in a Sentence class.
If I want to split this into words, how can I do that without using the Scanner class?
Some of the input from the file that I have to process is here:
Text that follows is based on the Wikipedia page on cryptography!
Cryptography is the practice and study of hiding information. In modern times,
cryptography is considered to be a branch of both mathematics and computer
science, and is affiliated closely with information theory, computer security, and
engineering. Cryptography is used in applications present in technologically
advanced societies; examples include the security of ATM cards, computer
passwords, and electronic commerce, which all depend on cryptography.....
I can elaborate further on this question if it needs explanation.
What I want to be able to do is keep adding words to the Sentence and stop when a word ends in one of the sentence separators above, then read the next word and keep adding words until I hit another separator.
The snippet below should work:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public static void main(String[] args) throws FileNotFoundException {
    File file = new File("final.txt");
    Scanner scanner = new Scanner(file);
    scanner.useDelimiter("[.?!]");
    List<Sentence> sentences = new ArrayList<Sentence>();
    while (scanner.hasNext()) {
        String line = scanner.next().trim();
        if (!line.isEmpty()) { // skips the empty pieces produced by the ... at the end
            // split on any run of whitespace, including line breaks inside a sentence
            int wordsCount = line.split("\\s+").length;
            Sentence sentence = new Sentence(line, wordsCount);
            sentences.add(sentence);
        }
    }
    scanner.close();
}

public class Sentence {
    String line = "";
    int wordsCount = 0;

    public Sentence(String line, int wordsCount) {
        this.line = line;
        this.wordsCount = wordsCount;
    }
}
You can use a BufferedReader to read every line of the file. Then join the lines, split the text into sentences with the split method, and finally split each sentence into words with the same method. In the end it would look something like this:
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line).append(' '); // keep a space so words on adjacent lines don't merge
    }
} catch (IOException e) {
    e.printStackTrace();
}
String[] sentences = sb.toString().split("[.?!]");
for (String sentence : sentences) {
    String[] words = sentence.trim().split("\\s+");
    // Add words to sentence...
}
Okay, so I have been solving this question with several techniques, and one of the approaches was the one above. However, I was also able to solve it with another approach that does not use the Scanner class. This one was much more accurate and gave me the exact output, whereas with the approach above I was off by a few words and letters.
try {
    input = new BufferedReader(new FileReader("file.txt"));
    strLine = input.readLine();
    while (strLine != null) {
        String[] tokens = strLine.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String s = tokens[i];
            if (s.isEmpty()) {
                continue; // blank line or leading whitespace
            }
            wordsJoin += s + " ";
            wordCount++;
            int len = s.length();
            String charString = s.replaceAll("[^a-zA-Z ]", "");
            for (int k = 0; k < charString.length(); k++) {
                currentChar = charString.charAt(k);
                if (Character.isLetter(currentChar)) {
                    lettersCount++;
                }
            }
            if (s.charAt(len - 1) == '.' || s.charAt(len - 1) == '?' || s.charAt(len - 1) == '!') {
                sentences.add(new Sentence(wordsJoin, wordCount));
                sentCount++;
                numOfWords += countWords(wordsJoin);
                wordsJoin = "";
                wordCount = 0;
            }
        }
        strLine = input.readLine();
    }
} catch (IOException e) {
    e.printStackTrace();
}
This might be useful for anyone doing the same problem or just need an idea of how to count letters, words and sentences from a text file.
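For the word-by-word approach the question describes, here is a minimal sketch (hypothetical class and method names) that keeps adding words to the current sentence and closes it when a word ends in '.', '?' or '!'. It assumes whitespace-separated words and takes the file's lines as a List<String>, for example from Files.readAllLines; letter counting is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds sentences word by word, ending a sentence
// whenever a word ends with '.', '?' or '!'.
class SentenceSplitter {

    static boolean endsSentence(String word) {
        char last = word.charAt(word.length() - 1);
        return last == '.' || last == '?' || last == '!';
    }

    // Each inner list holds the words of one sentence.
    static List<List<String>> split(List<String> lines) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line splits into a single empty token
                }
                current.add(word);
                if (endsSentence(word)) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current); // trailing words without a final separator
        }
        return sentences;
    }
}
```

The number of words in a sentence is then just the size of its inner list, and the sentence text can be rebuilt with String.join(" ", words).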

How to concatenate two items in an array of strings using Java?

I have a text file that I have already read, and I return an array of strings ("lines") with this structure:
{(1),text,(2),text,(3),text........}
I want to restructure it as
{(1)text,(2)text,(3)text........}
which means concatenating every number like (1) with the text after it, and so on
public String[] openFile() throws IOException {
FileInputStream inStream = new FileInputStream(path);
InputStreamReader inReader = new InputStreamReader(inStream,"UTF-8");
BufferedReader textReader = new BufferedReader(inReader);
int numberOfLine = countLines();
String[] textData = new String[numberOfLine];
for (int i = 0; i < numberOfLine; i++) {
// if (textReader.readLine()!= null) {
textData[i] = textReader.readLine();
//}
}
textReader.close();
return textData;
}
How can I do this in Java?
Thanks for your help and your opinions.
String[] newArray = new String[textData.length / 2];
for (int i = 0; i < textData.length - 1; i += 2) {
    newArray[i / 2] = textData[i] + textData[i + 1];
}
But make sure that textData has an even length; if it is odd, the last element is silently dropped.
Put this snippet before the return statement and return newArray instead.
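If the array may have an odd length, a variant that keeps the last element instead of dropping it could look like this (hypothetical class and method names; it assumes the entries strictly alternate number, text):

```java
import java.util.Arrays;

// Hypothetical helper: joins each "(n)" entry with the text that follows it.
class PairJoiner {

    static String[] joinPairs(String[] textData) {
        // Round up so an odd trailing element is kept instead of dropped.
        String[] joined = new String[(textData.length + 1) / 2];
        for (int i = 0; i < textData.length; i += 2) {
            String next = (i + 1 < textData.length) ? textData[i + 1] : "";
            joined[i / 2] = textData[i] + next;
        }
        return joined;
    }

    public static void main(String[] args) {
        String[] lines = {"(1)", "text", "(2)", "more text", "(3)"};
        System.out.println(Arrays.toString(joinPairs(lines))); // [(1)text, (2)more text, (3)]
    }
}
```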
It seems that the comma you want to omit is always preceded by a closing parenthesis. Assuming that is the only place it happens in your string, you could just do a simple replacement in your for-loop:
textData[i] = textData[i].replace("),", ")");
If that isn't the case, another option is to rely on the comma you want to remove being the first one in the string:
// Locate the index of the first comma in the string
int firstComma = textData[i].indexOf(',');
if (firstComma >= 0) {
    // Concatenate the part before the comma and the part after it, stepping over the comma
    textData[i] = textData[i].substring(0, firstComma).concat(textData[i].substring(firstComma + 1));
}

String.split() will keep original char array inside

I've noticed that a Java String will reuse its internal char array, instead of creating a new one for the new String instance, in methods such as substring(). There are several non-public constructors in String for this purpose, which accept a char array and two ints as a range to construct a String instance.
But today I found that split() also reuses the char array of the original String instance. I read a very long line from a file, split it on "," and keep only one short column for real use. Because every part of the line secretly holds a reference to the very long char array, I got an OOM (OutOfMemoryError) very soon.
Here is some example code:
ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
"G:\\filewithlongline.txt")));
String line = origReader.readLine();
int i = 0;
while ((line = origReader.readLine()) != null) {
String name = line.split(',')[0];
test.add(name);
i++;
if (i % 100000 == 0) {
System.out.println(name);
}
}
System.out.println(test.size());
Is there any standard method in the JDK to make sure that every String instance produced by split is a "real deep copy" rather than a "shallow copy"?
Now I am using a very ugly workaround to force creating a new String instance:
ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
"G:\\filewithlongline.txt")));
String line = origReader.readLine();
int i = 0;
while ((line = origReader.readLine()) != null) {
String name = line.split(',')[0]+" ".trim(); // force creating a String instance
test.add(name);
i++;
if (i % 100000 == 0) {
System.out.println(name);
}
}
System.out.println(test.size());
The simplest approach is to create a new String directly. This is one of the rare cases where it's a good idea.
String name = new String(line.split(",")[0]); // note the use of ","
An alternative is to parse the file yourself.
do {
StringBuilder name = new StringBuilder();
int ch;
while((ch = origReader.read()) >= 0 && ch != ',' && ch >= ' ') {
name.append((char) ch);
}
test.add(name.toString());
} while(origReader.readLine() != null);
String has a copy constructor you can use for this purpose.
final String name = new String(line.substring(0, line.indexOf(',')));
... or, as Peter suggested, only read up to the first ','.
final StringBuilder buf = new StringBuilder();
do {
int ch;
while ((ch = origReader.read()) >= 0 && ch != ',') {
buf.append((char) ch);
}
test.add(buf.toString());
buf.setLength(0);
} while (origReader.readLine() != null);
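A minimal sketch combining the two suggestions above (hypothetical class and method names): take the text before the first comma and pass it through the String copy constructor. Note that this sharing only happens on older JDKs; since Java 7 update 6, substring() and split() copy the characters, so on current JDKs the extra copy is redundant but harmless.

```java
// Hypothetical helper: returns the first comma-separated field of a line
// as a String that does not pin the whole line in memory.
class FirstColumn {

    static String of(String line) {
        int comma = line.indexOf(',');
        String field = (comma >= 0) ? line.substring(0, comma) : line; // whole line if there is no comma
        // Copy constructor: on JDKs where substring() shares the line's char array,
        // this copies just these characters; on Java 7u6+ it is redundant.
        return new String(field);
    }
}
```

In the question's loop this would be test.add(FirstColumn.of(line));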
