Java Processing input from a file - java

So I am working through a past sample final exam in which the question asks to read input from a file and process it into words. The end of a sentence is marked by any word that ends with one of the three characters . ? !
I was able to write code for this, but I can only split the input into sentences using the Scanner class and useDelimiter. I want to process the input word by word and, if a word ends in one of the above sentence separators, stop adding words to the current Sentence object.
Any help would be appreciated, as I am learning this on my own. This is what I came up with:
File file = new File("finalq4.txt");
Scanner scanner = new Scanner(file);
scanner.useDelimiter("[.?!]");
while(scanner.hasNext()){
sentCount++;
line = scanner.next();
line = line.replaceAll("\\r?\\n", " ");
line = line.trim();
StringTokenizer tokenizer = new StringTokenizer(line, " ");
wordsCount += tokenizer.countTokens();
sentences.add(new Sentence(line,wordsCount));
for(int i = 0; i < line.replaceAll(",|\\s+|'|-","").length(); i++){
currentChar = line.charAt(i);
if (Character.isDigit(currentChar)) {
}else{
lettersCount++;
}
}
}
What this code does is split the input into sentences using the useDelimiter method, count the words and letters of the entire file, and store the sentences in a Sentence class.
If I want to split this into words instead, how can I do that without using the Scanner class?
Some of the input from the file that I have to process is here:
Text that follows is based on the Wikipedia page on cryptography!
Cryptography is the practice and study of hiding information. In modern times,
cryptography is considered to be a branch of both mathematics and computer
science, and is affiliated closely with information theory, computer security, and
engineering. Cryptography is used in applications present in technologically
advanced societies; examples include the security of ATM cards, computer
passwords, and electronic commerce, which all depend on cryptography.....
I can further elaborate on this question if it needs explanation.
What I want to be able to do is keep adding words to the Sentence object and stop when a word ends in one of the above sentence separators, then read the next word and keep adding words until I hit another separator.

The snippet below should work:
public static void main(String[] args) throws FileNotFoundException {
File file = new File("final.txt");
Scanner scanner = new Scanner(file);
scanner.useDelimiter("[.?!]");
int sentCount;
List<Sentence> sentences = new ArrayList<Sentence>();
while (scanner.hasNext()) {
String line = scanner.next().trim();
if (!line.isEmpty()) { // skip the empty tokens produced by the "....." at the end
// split on runs of whitespace so line breaks and double spaces do not count as words
int wordsCount = line.split("\\s+").length;
Sentence sentence = new Sentence(line, wordsCount);
sentences.add(sentence);
}
}
}
public class Sentence {
String line = "";
int wordsCount = 0;
public Sentence(String line, int wordsCount) {
this.line = line;
this.wordsCount=wordsCount;
}
}

You can use a BufferedReader to read every line of the file, then split the text into sentences with the split method, and finally split each sentence into words with the same method. In the end it would look something like this:
StringBuilder sb = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line).append(' '); // keep a space where the line break was
    }
} catch (IOException e) {
    e.printStackTrace();
}
// split on runs of sentence terminators, not only on periods
String[] sentences = sb.toString().split("[.?!]+");
for (String sentence : sentences) {
    String[] words = sentence.trim().split("\\s+");
    // Add words to sentence...
}

Okay, so I have been trying several techniques on this question, and one of the approaches is shown above. However, I was also able to solve it with another approach that does not use the Scanner class. This one was much more accurate and gave me the exact output, whereas with the approach above I was off by a few words and letters.
try {
input = new BufferedReader(new FileReader("file.txt"));
strLine = input.readLine();
while(strLine!= null){
String[] tokens = strLine.split("\\s+");
for (int i = 0; i < tokens.length; i++) {
String s = tokens[i];
if(s.isEmpty()){ // a blank line or leading whitespace yields an empty token
continue;
}
wordsJoin += s + " ";
wordCount++; // one more word in the current sentence
int len = s.length();
String charString = s.replaceAll("[^a-zA-Z ]", "");
for(int k =0; k<charString.length(); k++){
currentChar = charString.charAt(k);
if(Character.isLetter(currentChar)){
lettersCount++;
}
}
if (s.charAt(len - 1) == '.' || s.charAt(len - 1) == '?' || s.charAt(len - 1) == '!') {
sentences.add(new Sentence(wordsJoin, wordCount));
sentCount++;
numOfWords += countWords(wordsJoin);
wordsJoin = "";
wordCount = 0;
}
}
strLine = input.readLine();
}
input.close();
} catch (IOException e) {
e.printStackTrace();
}
This might be useful for anyone doing the same problem, or for anyone who just needs an idea of how to count letters, words and sentences in a text file.
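For anyone who wants to try the word-by-word idea in isolation, here is a minimal self-contained sketch. The names SentenceSplitter, split and countLetters are made up for illustration, and the nested Sentence class is only a stand-in for the one in the question. It assumes that trailing words without a terminator should still be kept as a final sentence, which the exam question may or may not want.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SentenceSplitter {

    /** Hypothetical stand-in for the Sentence class used in the question. */
    static class Sentence {
        final String text;
        final int wordCount;

        Sentence(String text, int wordCount) {
            this.text = text;
            this.wordCount = wordCount;
        }
    }

    /** Adds words one at a time and closes a sentence at any word ending in . ? or ! */
    static List<Sentence> split(Reader source) {
        List<Sentence> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int wordCount = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // a blank line splits into one empty string
                    }
                    if (current.length() > 0) {
                        current.append(' ');
                    }
                    current.append(word);
                    wordCount++;
                    char last = word.charAt(word.length() - 1);
                    if (last == '.' || last == '?' || last == '!') {
                        sentences.add(new Sentence(current.toString(), wordCount));
                        current.setLength(0);
                        wordCount = 0;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: leftover words without a terminator form a last sentence.
        if (wordCount > 0) {
            sentences.add(new Sentence(current.toString(), wordCount));
        }
        return sentences;
    }

    /** Counts letters only, ignoring digits, punctuation and whitespace. */
    static int countLetters(String text) {
        int letters = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                letters++;
            }
        }
        return letters;
    }

    public static void main(String[] args) {
        String sample = "Cryptography is the practice and study of hiding information. In modern\n"
                + "times, is it a branch of mathematics? Yes!";
        for (Sentence s : split(new StringReader(sample))) {
            System.out.println(s.wordCount + " words, " + countLetters(s.text) + " letters: " + s.text);
        }
    }
}
```

To read from a real file, pass a FileReader instead of the StringReader. Because the check looks only at the last character of each word, a word such as "cryptography....." closes exactly one sentence and produces no empty ones.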

Related

Java: Incrementing wordCount for each word stored into array from file

I'm trying to read a file and store the words from the file into a string array excluding spaces/multiple spaces. E.g. my file has the words "this is a test", the program should store into an array arr ["this","is","a","test"] and also increment a variable called wordCount every time it stores a word into the array.
I understand that I can use
fileContents.split("\\s+")
but that does not increment wordCount each time it adds a word to the array. I'm thinking a for loop will do the job, but I don't know how.
Any help is appreciated. Thanks.
You can iterate the result of the split call and add one to the word count in the same iteration:
String fileContents = "this is a test";
int wordCount = 0;
for (String word: fileContents.split("\\s+")) {
System.out.println(word);
wordCount++;
System.out.println(wordCount);
}
Inside the same loop, store each new word in your array structure as you count it.
BufferedReader reader;
try {
reader = new BufferedReader(new FileReader(
"/Users/pankaj/Downloads/myfile.txt"));
String line = reader.readLine();
int count = 0;
while (line != null) {
String[] array = line.split("\\s+");
count = count + array.length;
line = reader.readLine();
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
Here I read the file line by line, so you can add additional functionality per line. The total is accumulated in the count variable.
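Putting the two answers together, a rough sketch of storing the words and counting them in the same loop could look like this. WordCollector and readWords are names invented for this example, and it collects into a List first because the number of words is not known until the file has been read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordCollector {

    /** Reads every whitespace-separated word from the source into an array. */
    static String[] readWords(Reader source) {
        List<String> words = new ArrayList<>();
        int wordCount = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) { // a blank line splits into one empty string
                        words.add(word);
                        wordCount++;       // incremented once per stored word
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // wordCount equals words.size() here, so the array is sized exactly
        return words.toArray(new String[wordCount]);
    }

    public static void main(String[] args) {
        String[] arr = readWords(new StringReader("this   is a\n\n  test"));
        System.out.println(arr.length + " words: " + Arrays.toString(arr));
    }
}
```

For a real file, pass new FileReader(path) instead of the StringReader. The explicit counter is kept only to mirror the question; the length of the returned array gives the same number.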

Scanning a text file into an array and omitting one specified line

I'm a beginner and need some help. I'm trying to scan a text file into an array line by line, but omitting one line. My text file is
i am
you are
he is
she is
it is
I want to create a method that will scan this and put elements into an array with an exception for one line (that is chosen by entering the String as a parameter for the method). Then erase the original text file and print there the created array (without that one deleted line). Sorry, I suck at explaining.
I have tried this:
public static void deleteLine(String name, String line) throws IOException {
String sc = System.getProperty("user.dir") + new File("").separator;
FileReader fr = new FileReader(sc + name + ".txt");
Scanner scan = new Scanner(fr);
int n = countLines(name); // a well working method returning the number of lines in the file (here 5)
String[] listArray = new String[n-1];
for (int i = 0; i < n-1; i++) {
if (scan.hasNextLine() && !scan.nextLine().equals(line))
listArray[i] = scan.nextLine();
else if (scan.hasNextLine() && scan.nextLine().equals(line))
i--;
else continue;
}
PrintWriter print = new PrintWriter(sc + name + ".txt");
print.write("");
for (int i = 0; i < n-2; i++) {
print.write(listArray[i] + "\n");
}
print.close();
}
I get an error "Line not found" when I enter: deleteLine("all_names","you are") (all_names is the name of the file). I'm sure the problem lies in the for-loop, but I have no idea why this doesn't work. :(
//SOLVED//
This code worked after all. Thanks for answers!
public static void deleteLine(String name, String line) throws IOException{
String sc = System.getProperty("user.dir") + new File("").separator;
FileReader fr = null;
fr = new FileReader(sc+name+".txt");
Scanner scan = new Scanner(fr);
int n = LineCounter(name);
String[] listArray = new String[n-1];
for (int i = 0; i < n-1; i++) {
if (scan.hasNextLine()) {
String nextLine = scan.nextLine();
if (!nextLine.equals(line)) {
listArray[i] = nextLine;
}
else i--;
}
}
PrintWriter print = new PrintWriter(sc+name+".txt");
print.write("");
for(int i=0;i<n-1;i++){
print.write(listArray[i]+System.lineSeparator());
}
print.close();
}
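You are calling scan.nextLine() twice per iteration: once in the comparison and again in the assignment. Each call consumes a line, so you run out of lines.
Replace your loop with this one or similar:
int idx = 0; // next free slot in listArray
while (scan.hasNextLine()) {
String nextLine = scan.nextLine(); // read each line exactly once
if (!nextLine.equals(line) && idx < listArray.length) {
listArray[idx++] = nextLine; // keep every line except the one to delete
}
}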
Have a look at how you are comparing String objects. You should use the equals method to compare a String's content. Operators like == and != only check whether two references point to the same String object.
Now, after using equals correctly, have a look at how you are using nextLine. Check its Javadoc.
I feel LineCounter(name) works because you did not put a ".txt" there. Try removing the ".txt" extension from the file name in the FileReader and PrintWriter objects and see if it works. Usually in Windows, the extension is not shown as part of the file name.
Here's an alternative (easier) solution that does what you want with easier-to-understand code (I think). It also avoids multiple loops and uses a single Java 8 stream to filter instead.
public static void deleteLine(String name, String line) throws IOException {
List<String> lines = Files.readAllLines(Paths.get(name));
lines = lines.stream().filter(v -> !v.equals(line)).collect(Collectors.toList());
System.out.println(lines);
// if you want the String[] - but you don't need it
String[] linesAsStringArr = new String[lines.size()];
linesAsStringArr = lines.toArray(linesAsStringArr);
// write the file using our List<String>
Path out = Paths.get("output.txt"); // or another filename you dynamically create
Files.write(out, lines, Charset.forName("UTF-8"));
}

Having trouble with parsing csv files in Java

I'm trying to parse a folder of csv files (balance sheets), and everything had gone smoothly up until I tried to separate the row names from the values.
It looks like the last cell on the previous row is combining with the first cell (the row name in column A) in the next row.
File path = new File("/Users/Zack/Desktop/JavaDB/BALANCESHEETS");
for(File file: path.listFiles()) {
if (file.isFile()) {
String fileName = file.getName();
String ticker = fileName.split("\\_")[0];
if (ticker.equals("ASB") || ticker.equals("FRC")) {
if (ticker.equals("ASB")) {
ticker = ticker + "PRD";
}
if (ticker.equals("FRC")) {
ticker = ticker + "PRD";
}
}
Reader reader = new BufferedReader(new FileReader(file));
StringBuilder builder = new StringBuilder();
int c;
while ((c = reader.read()) != -1) {
builder.append((char) c);
}
String string = builder.toString();
ArrayList<String> stringResult = new ArrayList<String>();
if (string != null) {
String[] splitData = string.split("\\s*,\\s*");
for (int i = 0; i <splitData.length; i++) {
if (!(splitData[i] == null) || !(splitData[i].length() ==0)) {
stringResult.add(splitData[i].trim());
}
}
}
for (int i = 0; i < stringResult.size(); i++) {
int cL = stringResult.get(i).length();
for (int x = 0; x < cL; x++) {
if (Character.isLetter(stringResult.get(i).charAt(x))) {
System.out.println("index: " + i);
System.out.println(stringResult.get(i));
break;
}
}
}
Here are some photos of what's happening
https://postimg.org/image/a9qc1qggz/
https://postimg.org/image/mvna7p7s3/
Any idea on how to fix this?
I also noticed there is a space in front of the row names in the spreadsheets, which I suspect may be part of the problem.
The problem is coming from where you are reading in the file, here:
Reader reader = new BufferedReader(new FileReader(file));
StringBuilder builder = new StringBuilder();
int c;
while ((c = reader.read()) != -1) {
builder.append((char) c);
}
String string = builder.toString();
This reads all the characters into a single string, including the new line character(s). When you then split the string, you are not splitting on the new line character(s) and so you end up with what you are seeing.
As mentioned by others, I strongly urge you to use one of the many CSV parsers that already exist.
The simple (but ugly) fix would be to also split on newlines. A better fix would be to use the readLine() method of the BufferedReader.
Also != is your friend.
As Erwin stated in the comments, the pattern you are splitting on only looks for commas with optional whitespace around them. It looks like you know what format your data will be in, since the fields are separated either by whitespace-comma-whitespace or by a newline. It seems to me you just need to change your pattern to "\\s*,\\s*|\\r?\\n", which is the regex that says that. As has been mentioned, you need to know beforehand that no field contains a comma itself, or this breaks.
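The readLine() suggestion above could be sketched roughly as follows. CsvRows and parse are hypothetical names, and the sketch assumes the files contain no quoted fields with embedded commas or line breaks; for those, a real CSV parser is the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CsvRows {

    /** Returns one list of trimmed cells per non-blank line of the source. */
    static List<List<String>> parse(Reader source) {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                List<String> cells = new ArrayList<>();
                // limit -1 keeps trailing empty cells such as "Cash,50,"
                for (String cell : line.split(",", -1)) {
                    cells.add(cell.trim()); // also removes the space before row names
                }
                rows.add(cells);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = " Total Assets, 100, 200\n Cash,50,\n";
        for (List<String> row : parse(new StringReader(sample))) {
            System.out.println("row name: " + row.get(0) + ", cells: " + row.size());
        }
    }
}
```

Because each line is split on its own, the last cell of one row can no longer merge with the row name at the start of the next row.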

Split a text file into blocks

I need to read a text file, and break the text into blocks of 6 characters (including spaces), pad zeroes to the end of text to meet the requirement.
I tried doing it and here is what I have done.
File file = new File("Sample.txt");
String line;
try {
Scanner sc = new Scanner(file);
while(sc.hasNext()){
line = sc.next();
int chunk = line.length();
int block_size=6;
if((chunk%block_size) != 0)
{
StringBuilder sb = new StringBuilder(line);
int val = chunk%block_size;
for(int i=0; i<val; i++){
sb.append(" ");
}
line = new String(sb.toString());
}
int group = line.length() / block_size;
String[] b = new String[group];
System.out.println(line);
System.out.println(chunk);
int j =0;
for(int i=0; i<group;i++){
b[i] = line.substring(j,j+block_size);
j += block_size;
}
System.out.println("String after spliting is: ");
for(int i=0; i<group;i++){
System.out.println(b[i]);
}
}
}
Now this works fine when the text in the input file has no spaces between words, but when I add spaces it gives me a different output. I am stuck at this point. Any suggestions?
I don't want to write the solution for you, but I'd advise you that what you're trying to accomplish might be easier to do using a BufferedReader with a FileReader and by using Reader.read(buf) where buf is a char[6];
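For later readers, a minimal sketch of that Reader.read(buf) idea is below. BlockReader and readBlocks are invented names. The sketch assumes that line breaks count as ordinary characters (the question only mentions spaces) and that the padding character is '0'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BlockReader {

    /** Splits the source into blocks of blockSize chars, zero-padding the last one. */
    static List<String> readBlocks(Reader source, int blockSize) {
        List<String> blocks = new ArrayList<>();
        char[] buf = new char[blockSize];
        try (BufferedReader reader = new BufferedReader(source)) {
            int filled = 0;
            int n;
            // read() may return fewer chars than requested, so keep filling the buffer
            while ((n = reader.read(buf, filled, blockSize - filled)) != -1) {
                filled += n;
                if (filled == blockSize) {
                    blocks.add(new String(buf));
                    filled = 0;
                }
            }
            if (filled > 0) {
                // pad the incomplete final block with '0' characters
                Arrays.fill(buf, filled, blockSize, '0');
                blocks.add(new String(buf));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (String block : readBlocks(new StringReader("hello world"), 6)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

Reading raw characters avoids the Scanner tokenization that caused the trouble with spaces: sc.next() returns one whitespace-delimited word at a time, so the spaces never reach the chunking code.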

Decompose to more classes and add constructors

I want to ask whether it is possible to decompose my code into 2-3 classes, add constructors (non-empty if possible), and/or add more methods. If needed, the program can have more functions.
public class Testing {
public static void main(String args[]) throws Exception {
Scanner input = new Scanner(System.in);
System.out.println("Select word from list:");
System.out.println();
try {
FileReader fr = new FileReader("src/lt/kvk/i3_2/test/List.txt"); // this is list of words, everything all right here
BufferedReader br = new BufferedReader(fr);
String s;
while((s = br.readLine()) != null) {
System.out.println(s);
}
fr.close();
String stilius = input.nextLine(); // enter the word which I want to count in File.txt
BufferedReader bf = new BufferedReader(new FileReader("src/lt/kvk/i3_2/test/File.txt")); // from this file I need to count word which I entered before
int counter = 0;
String line;
System.out.println("Looking for information");
ArrayList<String> resultList = new ArrayList<String>();
String name = null;
while (( line = bf.readLine()) != null){
if (line.trim().length() == 0) name = null;
else if (name == null) name = line;
int indexfound = line.indexOf(stilius);
if (indexfound > -1) {
counter++;
resultList.add(name);
}
}
if (counter > 0) {
System.out.println("Word are repeated "+ counter + "times");}
else {
System.out.println("Error...");
}
bf.close();
}
catch (IOException e) {
System.out.println("Error:" + e.toString());
}
}
}
The program counts occurrences of a word (entered from the keyboard) in File.txt and lists who used that word. For example, if I enter the word One, it shows:
Word One repeated 3 times by John, Elisa, Albert
file.txt looks like:
John //first line - name
One
Three
Four
Peter //first line - name
Two
Three
Elisa //first line - name
One
Three
Albert //first line - name
One
Three
Four
Nicole //first line - name
Two
Four
I don't really know if it is possible to decompose this code into 2-3 classes. If someone could help me, thank you very much.
I would start by defining two classes:
WordFile
WordFileEntry
A WordFile-object should consist of a list of WordFileEntry-objects. A WordFileEntry consists of String name and List<String> words.
The counting of repetitions could be done by a WordFile-object itself. The logic of reading a file could be written in the WordFile-class or a separate class.
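A rough sketch of that two-class design might look like the following. Only the class names WordFile and WordFileEntry come from the suggestion above; the method names are made up for illustration. It assumes, as the question's code does, that blocks are separated by blank lines and that the first line of each block is the person's name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordFileEntry {
    private final String name;
    private final List<String> words;

    WordFileEntry(String name, List<String> words) {
        this.name = name;
        this.words = new ArrayList<>(words); // defensive copy
    }

    String getName() {
        return name;
    }

    boolean contains(String word) {
        return words.contains(word);
    }
}

class WordFile {
    private final List<WordFileEntry> entries = new ArrayList<>();

    /** Builds entries from lines; a blank line ends the current block. */
    WordFile(List<String> lines) {
        String name = null;
        List<String> words = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                if (name != null) {
                    entries.add(new WordFileEntry(name, words));
                }
                name = null;
                words = new ArrayList<>();
            } else if (name == null) {
                name = line;     // first line of a block is the name
            } else {
                words.add(line); // remaining lines are that person's words
            }
        }
        if (name != null) {
            entries.add(new WordFileEntry(name, words)); // last block without trailing blank line
        }
    }

    /** Names of everyone whose block contains the given word. */
    List<String> namesUsing(String word) {
        List<String> names = new ArrayList<>();
        for (WordFileEntry entry : entries) {
            if (entry.contains(word)) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("John", "One", "Three", "", "Peter", "Two", "", "Elisa", "One");
        List<String> names = new WordFile(lines).namesUsing("One");
        System.out.println("Word One repeated " + names.size() + " times by " + String.join(", ", names));
    }
}
```

In the real program the lines could come from Files.readAllLines(Paths.get(...)), which keeps the file reading separate from the counting logic, as the answer suggests.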
