FileStatistics -- trouble counting the number of words in a file - java

In my course, we are tasked with determining three key statistics about a file that is passed via the console input: 1) number of characters, 2) number of lines, 3) number of words. Before closing this question as a duplicate, please read on to see what unique problem I'm encountering. Thank you :)
I originally wrote a solution with three separate methods and three separate Scanner variables, but I realized that for larger files, this solution would be very inefficient. Instead, I decided to write up a solution that only runs through the file a single time and calculates all three statistics in one go. Here is what I have so far:
import java.util.*;
import java.io.*;
public class FileStatistics
{
// Note: uncomment (A) and (B) below to test execution time
public static void main( String [] args ) throws IOException
{
/* (A)
long startTime = System.currentTimeMillis();
*/
File file = new File(args[0]);
Scanner input = new Scanner(file);
int numChars = 0, numWords = 0, numLines = 0;
/* Calculations */
while( input.hasNextLine() )
{
String currentLine = input.nextLine();
numLines++;
numChars+= currentLine.length();
String [] words = currentLine.split(" ");
numWords += words.length;
}
input.close();
/* Results */
System.out.println( "File " + file.getName() + " has ");
System.out.println( numChars + " characters");
System.out.println( numWords + " words");
System.out.println( numLines + " lines");
/* (B)
long endTime = System.currentTimeMillis();
System.out.println("Execution took: " + (endTime-startTime)/1000.0 + " seconds");
*/
}
}
I've been comparing the results of my program to Microsoft Word's own file statistics by simply copy/pasting the contents of whatever file I'm using into Word. The number of characters and number of lines are calculated correctly.
However, my program does not count the number of words correctly. I added a test statement to print the contents of the array words, and it seems that certain "spatial formatting" (like tabs from a Java source file) is being treated as individual elements of the split array. I tried calling currentLine.replace("\t", "") before invoking split to remove those tabs, but this didn't change a thing.
Could someone please offer some advice or hints as to what I'm doing wrong?

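This happens because the String array returned by currentLine.split(" ") can contain empty Strings (""): every extra space in a run of spaces, and every leading space, produces one. You can see this if you call System.out.println(Arrays.toString(words)).
To get the desired behavior, split on runs of whitespace with the regex "\\s+" (which also covers tabs), store words.length in a variable count, and decrement count for each empty string "" that is still in words. An empty string can remain as the first element when the line is empty or has leading whitespace before its first word.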
Here is a sample solution:
while( input.hasNextLine() )
{
String currentLine = input.nextLine();
numLines++;
numChars+= currentLine.length();
String [] words = currentLine.split("\\s+");
int count = words.length;
for (int i = 0; i < words.length; i++) {
if (words[i].equals("")) {
count--;
}
}
numWords += count;
}
Alternatively, you can convert words to an ArrayList and use the removeAll() method to drop the empty strings:
while( input.hasNextLine() )
{
String currentLine = input.nextLine();
numLines++;
numChars+= currentLine.length();
ArrayList<String> words = new ArrayList<>(Arrays.asList(currentLine.split("\\s+")));
words.removeAll(Collections.singleton(""));
numWords += words.size();
}
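A more compact variant of the same idea, as a minimal sketch: trimming the line first removes the leading whitespace that produces the empty first element, so no filtering pass is needed. The class name WordCountSketch and the helper countWords are hypothetical names chosen for illustration. As a side note, a call like currentLine.replace("\t", "") has no effect unless its result is assigned, because Strings are immutable and replace returns a new String.

```java
import java.util.Arrays;

class WordCountSketch {
    // Hypothetical helper: counts the whitespace-separated words in one line.
    static int countWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0; // empty or whitespace-only line contains no words
        }
        // "\\s+" treats any run of spaces or tabs as a single separator.
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String line = "\t\tint  x = 5;"; // two tabs, then a double space after "int"
        // Splitting on a single space leaves an empty element for the extra space.
        System.out.println(Arrays.toString(line.split(" ")));
        System.out.println(countWords(line)); // prints 4
    }
}
```

Inside the original loop this would reduce the word-counting step to numWords += countWords(currentLine);.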

Related

Best way to read CSV file and store (array, 2darray?) and print to screen in tabular format?

I'm trying to read a CSV file with 6 rows and 6 columns using Java. I then need to print out each row and let the user select one option. Here is what I have. I know my code picks one row and prints it, but I don't know how to change it from printing one random row to printing all 6 rows. Should I store the rows in an ArrayList or a 2D array?
package theContest;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Random;
import java.util.Scanner;
public class theContest {
// The main() method
public static void main(String[] args) throws FileNotFoundException {
//
String fileName = "contest.csv";
File file = new File(fileName);
if (!file.isFile()) {
System.err.println("Cannot open file: " + fileName + ".");
System.exit(0);
}
//
int numContest = 0;
Scanner input = new Scanner(file);
while (input.hasNext()) {
input.nextLine();
numContest++;
}
input.close();
System.out.println("Total of " + numContest + " contestants.");
//
int winner = 0;
Random random = new Random();
winner = random.nextInt(numContest) + 1;
System.out.println("The winner is contestant number " + winner + ".");
//
String winnerDetails = "";
input = new Scanner(file);
for (int lineCount = 0; lineCount < winner; lineCount++) {
winnerDetails = input.nextLine();
}
input.close();
System.out.println("Winner is: " + winnerDetails);
//
String id = "";
String name = "";
String seats = "";
String trans = "";
String rate = "";
String price = "";
input = new Scanner(winnerDetails);
input.useDelimiter(",");
id = input.next();
name = input.next();
seats = input.next();
trans = input.next();
rate = input.next();
price = input.next();
input.close();
System.out.println("Details are:");
System.out.printf("%-5s : %s\n", "ID", id);
System.out.printf("%-5s : %s\n", "Name", name);
System.out.printf("%-5s : %s\n", "Seating", seats);
System.out.printf("%-5s : %s\n", "Transfer", trans);
System.out.printf("%-5s : %s\n", "Rate", rate);
System.out.printf("%-5s : %s\n", "Price", price);
}
}
Here:
for (int lineCount = 0; lineCount < winner; lineCount++) {
winnerDetails = input.nextLine();
}
Your file has N rows. The loop above reads lines from the start of the file up to the winner's line and stores each one in the same variable, so every iteration overwrites what was there before. In effect, your code reads those lines and throws away everything before the last one it read.
In other words: if you have 6 lines and you want to print all of them, then all of your processing needs to be part of a loop, too.
For example, you could turn winnerDetails into an array of String and put each line in its own slot. Then you loop over the array and print each slot.
And since you already know about ArrayList, it is best to use that. That also means you need to read the file only once: open the file, read each line, and add it to an ArrayList. Afterwards, you can do whatever you want with that list.
And note: that is actually where you should start. Don't solve your whole problem at once; slice it into smaller parts. Reading data from the CSV file has nothing to do with processing the lines later and printing them. You can write code that just takes an ArrayList, processes its entries and prints them, and you can test that code on its own, because you can hardcode such lists in your code.
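The advice above could be sketched roughly as follows. This is only an illustration under some assumptions: the class ContestRows and the helpers parseRows and formatRow are hypothetical names, the fallback rows are made-up sample data, and splitting on "," assumes that no field contains a quoted comma.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContestRows {
    // Hypothetical helper: turns each non-blank CSV line into an array of columns.
    static List<String[]> parseRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(line.split(",")); // assumes no quoted commas in the data
            }
        }
        return rows;
    }

    // Hypothetical helper: formats one row as left-aligned, fixed-width columns.
    static String formatRow(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (String cell : row) {
            sb.append(String.format("%-12s", cell.trim()));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("contest.csv");
        // Read the file once; fall back to a hardcoded list (made-up data)
        // so the processing and printing can be tested without the file.
        List<String> lines = Files.exists(path)
                ? Files.readAllLines(path)
                : Arrays.asList("1,Ann,2,bus,4.5,100", "2,Bob,4,train,3.9,80");
        for (String[] row : parseRows(lines)) {
            System.out.println(formatRow(row));
        }
    }
}
```

Because parseRows takes a plain list of strings, it can be exercised with hardcoded input, which is the kind of separation the answer recommends.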

Program that reads file input and displays proportion of the length of the letter and so on

I have an assignment due in two days and I have been trying for many days to do this, but I am burned out. I tried to come back to it, but still no progress.
THE ASSIGNMENT is the following:
Java program that computes the above statistics from
any text file. Here’s what it might look like in action:
Name of the input file: example.txt
The proportion of 1-letter words: 3.91% (74 words)
The proportion of 2-letter words: 18.52% (349 words)
The proportion of 3-letter words: 24.24% (456 words)
The proportion of 4-letter words: 19.80% (374 words)
The proportion of 5-letter words: 11.33% (212 words)
…
…
The proportion of 12-letter words: 0.45% (8 words)
Proportion of 13- (or more) letter words: 0.51% (9 words)
Now, in order to do this, I thought I would divide my program into three methods: read the file, count the letters and distinguish the words by length, and finally display the result as in the example above. With that said, here is my code right now:
/*like make smaller functions
where each function has one task
like to loop through the file and return an array of words
then use that as input to another function whose purpose is to count the
letters
and then pass that array into a function for printing that.
*/
import java.io.*;
import java.util.Scanner;
class Autorship {
public static void main(String[] args) {
try {
System.out.println("Name of input file: ");
Scanner sc1 = new Scanner(System. in );
sc1.useDelimiter("[^a-zA-Z]");
String fname = sc1.nextLine();
sc1.close();
sc1 = new Scanner(new FileReader(fname));
sc1.useDelimiter("[^a-zA-Z]");
String line;
System.out.println(WordCount(fname, sc1));
} catch (FileNotFoundException e) {
System.out.println("There was an error opening one of the files.");
}
}
public static int WordCount(String fname, Scanner sc1) {
int wordCount = 0;
int lineCount = 0;
while (sc1.hasNextLine()) {
String line;
line = sc1.nextLine();
lineCount++;
String[] strings = line.split(" ");
int[] counts = new int[14];
for (String str: strings)
if (str.length() < counts.length) counts[str.length()] += 1;
System.out.println("This is counts length: " + counts.length);
for (int i = 1; i < counts.length; i++)
System.out.println(i + " letter words: " + counts[i]);
}
return 0;
}
}
Now please, I do not want the answer, as that would be plagiarism, and I am not that kind of person. I just want a bit of help to keep making progress. I'm so stuck right now. Thanks ^^
Here is an adjusted and working version. I commented the lines I edited.
Your code wasn't that bad and it was working quite well. The only problem was that you printed the letter counts inside the while loop instead of after it, and you also created the counts array inside the loop. Therefore the output repeated with every new line that was read from the file.
Please note: I strongly recommend always using curly brackets, even though Java syntax allows omitting them for if statements and for loops that are followed by a single statement. Leaving them out makes the code harder to read and more error-prone.
public static void main(String[] args) {
try {
System.out.println("Name of input file: ");
Scanner sc1 = new Scanner(System. in );
sc1.useDelimiter("[^a-zA-Z]");
String fname = sc1.nextLine();
sc1.close();
sc1 = new Scanner(new FileReader(fname));
sc1.useDelimiter("[^a-zA-Z]");
String line;
System.out.println("WordCount: " + WordCount(fname, sc1)); // edited
} catch (FileNotFoundException e) {
System.out.println("There was an error opening one of the files.");
}
}
public static int WordCount(String fname, Scanner sc1) {
int wordCount = 0;
int lineCount = 0;
final int MAXIMUM_LENGTH = 14; // edited. Better use a constant here.
int[] counts = new int[MAXIMUM_LENGTH]; // edited. Constant applied
while (sc1.hasNextLine()) {
String line = sc1.nextLine();
// increment line count
lineCount++;
String[] strings = line.split(" ");
// increment word count
wordCount += strings.length; // added
// edited. curly brackets and constant MAXIMUM_LENGTH
for (String str: strings) {
if (str.length() < MAXIMUM_LENGTH) {
counts[str.length()] += 1;
}
}
}
// edited / added. finally show the results
System.out.println("maximum length: " + MAXIMUM_LENGTH);
System.out.println("line count: " + lineCount);
System.out.println("word count: " + wordCount);
// edited. moved out of the while-loop. MAXIMUM_LENGTH applied.
for (int i = 1; i < MAXIMUM_LENGTH; i++) {
System.out.println(i + " letter words: " + counts[i]);
}
// edited.
return wordCount;
}
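The remaining step, turning the counts array into lines like the ones in the assignment's sample output, could look roughly like this. It is a minimal sketch: ProportionSketch and formatProportion are hypothetical names, the counts in main are made-up data, and index 0 is left out of the total on the assumption that it only holds the empty strings produced by split(" ").

```java
import java.util.Locale;

class ProportionSketch {
    // Hypothetical helper: formats one line of the report.
    static String formatProportion(int length, int count, int total) {
        // Guard against an empty file so we never divide by zero.
        double percent = (total == 0) ? 0.0 : 100.0 * count / total;
        return String.format(Locale.ROOT,
                "The proportion of %d-letter words: %.2f%% (%d words)",
                length, percent, count);
    }

    public static void main(String[] args) {
        int[] counts = {0, 1, 3, 4}; // made-up data; counts[i] = number of i-letter words
        int total = 0;
        for (int i = 1; i < counts.length; i++) {
            total += counts[i]; // index 0 (empty strings) is skipped
        }
        for (int i = 1; i < counts.length; i++) {
            System.out.println(formatProportion(i, counts[i], total));
        }
    }
}
```

With the made-up counts above, the second line printed is "The proportion of 2-letter words: 37.50% (3 words)".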

Printing unknown number list of words

I created a program that checks how many words of a specific length appear in a text file. I want to print the number of words of that length that my program found and then print the list of those words. However, the list of words is printed first, inside my while loop, because I have to print the count outside of the loop. Do I have to store this list of unknown size in an array and return the array to the main method in order to print it second? Here's what I have so far:
public static void countLetters(PartArray part, int num) throws Exception{
Scanner inputFile = new Scanner(new File("2of12inf.txt"));
int count = 0;
while( inputFile.hasNext() ){
String word = inputFile.next();
if (word.length() == num)
{
count++;
expandArray (part , 2*MAX_SIZE);
System.out.println(word);
}
}
System.out.println("I found " + count + " " + num + "-letter words.");
System.out.println("The list of words is: ");
inputFile.close();
If you want to avoid printing the words in the while loop, you can take the println out of the loop. You don't have to add each word to a data structure if you don't want to. You can append each word to a "wordBuffer" StringBuffer (appending to a StringBuffer is faster and more efficient than repeated String concatenation; in single-threaded code the unsynchronized StringBuilder works the same way). For more info on this matter, read this: http://www.javaworld.com/article/2076072/build-ci-sdlc/stringbuffer-versus-string.html
Like this:
int count = 0;
StringBuffer wordBuffer = new StringBuffer ("");
while( inputFile.hasNext() ){
String word = inputFile.next();
if (word.length() == num)
{
count++;
//Adding \n assuming you want new line between elements
wordBuffer.append(word+"\n");
}
}
System.out.println("I found " + count + " " + num + "-letter words.");
System.out.println("The list of words is: " + wordBuffer);
inputFile.close();
Is this what you were looking for?
The problem with arrays in this context is that an array has a fixed size once it is created. Since you don't know in advance how many words will match, you need a collection that can grow. An ArrayList does this.
public static void countLetters(PartArray part, int num) throws Exception {
Scanner inputFile = new Scanner(new File("2of12inf.txt"));
int count = 0;
ArrayList<String> words = new ArrayList<>();
while( inputFile.hasNext() ){
String word = inputFile.next();
if (word.length() == num)
{
count++;
//expandArray (part , 2*MAX_SIZE);
//System.out.println(word);
words.add(word);//adding each word to a new index with each iteration
}
}
System.out.println("I found " + count + " " + num + "-letter words.");
System.out.println("The list of words is: ");
for (String w : words) {//for each word in the words ArrayList
System.out.println(w);//print out the values
}
inputFile.close();
}
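To address the part of the question about returning the words to main, here is a minimal self-contained sketch. WordsOfLength and findWords are hypothetical names, and the Scanner reads from an in-memory String with made-up words instead of 2of12inf.txt so that the example runs without the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordsOfLength {
    // Hypothetical helper: returns every token whose length equals num.
    static List<String> findWords(Scanner input, int num) {
        List<String> matches = new ArrayList<>();
        while (input.hasNext()) {
            String word = input.next();
            if (word.length() == num) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int num = 4;
        // Made-up input; a Scanner over a File would be used the same way.
        Scanner input = new Scanner("apple pear plum kiwi fig");
        List<String> words = findWords(input, num);
        input.close();
        // The count comes from the list itself, so it can be printed first.
        System.out.println("I found " + words.size() + " " + num + "-letter words.");
        System.out.println("The list of words is: ");
        for (String w : words) {
            System.out.println(w);
        }
    }
}
```

Returning the list keeps the searching and the printing separate, and words.size() replaces the separate count variable.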

For loop iterating through string and adding/replacing characters

I need to write a for loop to iterate through a String object (nested within a String[] array) and operate on each character within this string according to the following criteria.
first, add a hyphen to the string
if the character is not a vowel, add this character to the end of the string, and then remove it from the beginning of the string.
if the character is a vowel, then add "v" to the end of the string.
Every time I have attempted this with various loops and various strategies/implementations, I have somehow ended up with the StringIndexOutOfBoundsException error.
Any ideas?
Update: Here is all of the code. I did not need help with the rest of the program, simply this part. However, I understand that you have to see the system at work.
import java.util.Scanner;
import java.io.IOException;
import java.io.File;
public class plT
{
public static void main(String[] args) throws IOException
{
String file = "";
String line = "";
String[] tempString;
String transWord = ""; // final String for output
int wordTranslatedCount = 0;
int sentenceTranslatedCount = 0;
Scanner stdin = new Scanner(System.in);
System.out.println("Welcome to the Pig-Latin translator!");
System.out.println("Please enter the file name with the sentences you wish to translate");
file = stdin.nextLine();
Scanner fileScanner = new Scanner(new File(file));
fileScanner.nextLine();
while (fileScanner.hasNextLine())
{
line = fileScanner.nextLine();
tempString = line.split(" ");
for (String words : tempString)
{
if(isVowel(words.charAt(0)) || Character.isDigit(words.charAt(0)))
{
transWord += words + "-way ";
transWord.trim();
wordTranslatedCount++;
}
else
{
transWord += "-";
// for(int i = 0; i < words.length(); i++)
transWord += words.substring(1, words.length()) + "-" + words.charAt(0) + "ay ";
transWord.trim();
wordTranslatedCount++;
}
}
System.out.println("\'" + line + "\' in Pig-Latin is");
System.out.println("\t" + transWord);
transWord = "";
System.out.println();
sentenceTranslatedCount++;
}
System.out.println("Total number of sentences translated: " + sentenceTranslatedCount);
System.out.println("Total number of words translated: " + wordTranslatedCount);
fileScanner.close();
stdin.close();
}
public static boolean isVowel (char c)
{
return "AEIOUYaeiouy".indexOf(c) != -1;
}
}
Also, here is the example file from which text is being pulled (we are skipping the first line):
2
How are you today
This example has numbers 1234
Assuming that the issue is StringIndexOutOfBoundsException, the only way this can occur is when one of the words is an empty String, because words.charAt(0) fails on a String of length zero. Knowing this also provides the solution: handle the case where words has length zero separately (if/else). This is one way to do this:
if (!"".equals(words)) {
// your logic goes here
}
Another way is to simply do this inside the loop (when you have a loop):
if ("".equals(words)) continue;
// Then rest of your logic goes here
If that is not the case or the issue, then the clue is in the parts of the code you are not showing us (in that case you didn't give us the relevant code after all). It is better to provide a complete subset of the code that can be used to replicate the problem (a test case), and the complete exception (so we don't even have to try it out ourselves).
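To see where an empty word can come from, consider a line with two consecutive spaces (a completely empty line has the same effect). The following is a minimal sketch; EmptyWordSketch and firstLetters are hypothetical names, and the input line is made up.

```java
import java.util.Arrays;

class EmptyWordSketch {
    // Hypothetical helper: collects the first character of every non-empty word.
    static String firstLetters(String line) {
        StringBuilder letters = new StringBuilder();
        for (String word : line.split(" ")) {
            if (word.isEmpty()) {
                continue; // charAt(0) would throw StringIndexOutOfBoundsException here
            }
            letters.append(word.charAt(0));
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        String line = "How  are you"; // note the double space
        System.out.println(Arrays.toString(line.split(" "))); // prints [How, , are, you]
        System.out.println(firstLetters(line));                // prints Hay
    }
}
```

Without the isEmpty() check, the second element of the split array would trigger the exception on charAt(0).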

Cannot figure out Java's strings?

I am a student at the moment, so I am still learning. I picked up VB pretty quickly and it was simple; Java, on the other hand, I am pretty confused by.
The assignment I have been given this time has me confused: "Write a method to determine the number of positions that two strings differ by. For example, "Peace" and "Piece" differ in two positions. The method is declared int compare(String word1, String word2); if the strings are identical, the method returns 0. It returns -1 if the two strings have different lengths."
Additionally: "Write a main method to test the method. The main method should tell how many positions the strings differ in, or that they are identical, or, if they are different lengths, state the lengths. Get the strings from the console."
So far this is where I am at, and I am looking for someone to help break this down in dum-dum terms if they can. I don't need a solution, only help understanding it.
package arraysandstrings;
import java.util.Scanner;
public class differStrings {
public static void main (String agrs[]){
Scanner scanner = new Scanner (System.in);
System.out.print("Enter a word");
String word1;
String word2;
word1 = scanner.next();
System.out.print("Enter another word");
word2 = scanner.next();
int count = 0;
int length = word1.length();
for(int x = 0; x >= length; x = x+1) {
if (word1.charAt(x) == word2.charAt(x)) {
count = count + 1;
System.out.print (count);
}
}
}
}
Additional Question
package arraysandstrings;
import java.util.Scanner;
public class differStrings {
public static void main (String agrs[]){
Scanner scanner = new Scanner (System.in);
System.out.println("Enter a word");
String word1 = scanner.next();
System.out.println("Enter another word");
String word2 = scanner.next();
int count = 0;
int word1Length = word1.length();
int word2Length = word2.length();
if (word1Length != word2Length) {
System.out.println ("Words are a diffrent length");
System.out.println (word1 + "Has" + word1.length() + " chars");
System.out.println (word2 + "Has" + word2.length() + " chars");
}
for(int x = 0; x < word1Length; x = x+1) {
if (word1.charAt(x) != word2.charAt(x)) {
count = count + 1;
}}}
System.out.println (count+" different chars");
}
After implementing the knowledge I've gained from your responses, I have run into a problem with the last line:
System.out.println (count+" different chars");
It says Error expected; however, it worked before I added the next part of my assignment, which was this:
if (word1Length != word2Length) {
System.out.println ("Words are a diffrent length");
System.out.println (word1 + "Has" + word1.length() + " chars");
System.out.println (word2 + "Has" + word2.length() + " chars");
}
for(int x = 0; x >= length; x = x+1) {
With x >= length the condition is already false on the first check, so the loop body never runs. You probably mean
for(int x = 0; x < length; x = x+1) {
Shifting around some code, adding some line breaks and making 2 small tweaks to the logic produces a program that is closer to what you are trying to build.
package arraysandstrings;
import java.util.Scanner;
public class differStrings {
public static void main (String agrs[]){
Scanner scanner = new Scanner (System.in);
System.out.println("Enter a word");
String word1 = scanner.next();
System.out.println("Enter another word");
String word2 = scanner.next();
int count = 0;
int length = word1.length();
for(int x = 0; x < length; x = x+1) {
if (word1.charAt(x) != word2.charAt(x)) {
count = count + 1;
}
}
System.out.println (count+" different chars");
}
}
It looks like, in addition to the for loop that @LouisWasserman pointed out, you had code that was trying to find characters that are the same.
What you need is a loop which compares the two strings and counts the places where they are not equal.
Your logic counts the number of places where the two characters are the same. You are also printing the count each time the two characters are equal.
What it sounds like you need is a loop that iterates over the characters in the two strings comparing each character and incrementing the count of mis-matched or different characters. Then after getting a count of different characters by comparing all of the characters, you would print out the count of different characters.
So the basics would be: (1) read each of the strings, (2) check that the lengths are the same, (3) if same length then loop over the string comparing each character and incrementing the count of mis-matched characters each time there is a difference, (4) print out the count. If the string lengths are different then just set the count to negative one (-1) and do not bother to compare the two strings.
What would be kind of neat is to create a string of underscores and asterisks, in which each matching character position is represented by an underscore and each mismatching character position is represented by an asterisk. Alternatively, the string could contain all of the matching characters, with only the mismatching characters replaced by an asterisk.
Edit: adding example program
The example below is an annotated rewrite of your program. One change that I made was to use a function to perform the counting of the non-matching characters. The function, countNonMatchChars(), is declared static so that it can be called directly from main without first creating an object. It is a utility function rather than behavior that belongs to a particular object, and it should be available to anyone who wants to use it.
Also rather than incrementing variables with the syntax of var = var + 1; I instead use the postincrement operator of ++ as in var++;.
package arraysandstrings;
import java.util.Scanner;
public class so_strings_main {
// function to compare two strings and count the number
// of characters that do not match.
//
// this function returns an integer indicating the number
// of characters that did not match or a negative one if the
// strings are not equal in length.
//
// "john" "john" returns 0
// "john1" "john2" returns 1
// "mary1" "john1" returns 4
// "john" "john1" returns -1 (lengths are not equal)
public static int countNonMatchChars (String s1, String s2)
{
// initialize the count to negative one indicating strings unequal in length
// get the lengths of the two strings to see if any comparison is needed
int count = -1;
int word1Length = s1.length();
int word2Length = s2.length();
if (word1Length == word2Length) {
// the lengths of the two strings are equal so we now do our comparison
// we start count off at zero. as we find unmatched characters, we
// will increment our count. if no unmatched characters found then
// we will return a count of zero.
count = 0;
for(int iLoop = 0; iLoop < word1Length; iLoop++) {
if (s1.charAt(iLoop) != s2.charAt(iLoop)) {
// the characters at this position in the string do not match
// increment our count of non-matching characters
count++;
}
}
}
// return the count of non-matching characters we have found.
return count;
}
public static void main (String agrs[]){
Scanner scanner = new Scanner (System.in);
System.out.println("Count non-matching characters in two strings.");
System.out.println("Enter first word");
String word1 = scanner.next();
System.out.println("Enter second word");
String word2 = scanner.next();
int count = countNonMatchChars (word1, word2);
if (count < 0) {
System.out.println ("Words are a different length");
System.out.println (" " + word1 + " Has " + word1.length() + " chars");
System.out.println (" " + word2 + " Has " + word2.length() + " chars");
} else {
System.out.println (count + " different chars");
}
}
}
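The underscore-and-asterisk idea mentioned in this answer could be sketched as follows. DiffMaskSketch and diffMask are hypothetical names, and returning null for strings of unequal length is just one possible convention chosen for this illustration.

```java
class DiffMaskSketch {
    // Hypothetical helper: '_' marks a matching position, '*' a mismatching one.
    // Returns null when the strings differ in length (an assumed convention).
    static String diffMask(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return null;
        }
        StringBuilder mask = new StringBuilder();
        for (int i = 0; i < s1.length(); i++) {
            mask.append(s1.charAt(i) == s2.charAt(i) ? '_' : '*');
        }
        return mask.toString();
    }

    public static void main(String[] args) {
        System.out.println(diffMask("Peace", "Piece")); // prints _**__
    }
}
```

Counting the asterisks in the mask gives the same number that countNonMatchChars returns for strings of equal length.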
