Paragraphs being miscounted - java

I want my program to count the number of paragraphs in a text file, but I end up off by one: I need an answer of 4 and keep getting 5. Here is the text:
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in
Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so
dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a
portion of that field, as a final resting place for those who here gave their lives that that nation might
live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground.
The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add
or detract. The world will little note, nor long remember what we say here, but it can never forget
what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which
they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great
task remaining before us -- that from these honored dead we take increased devotion to that cause for which
they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have
died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of
the people, by the people, for the people, shall not perish from the earth.
Abraham Lincoln
November 19, 1863
And here is my code:
public static void main(String[] args) {
String input;
Scanner kbd = new Scanner(System.in);
System.out.print("Enter the name of the input file: ");
input = kbd.nextLine();
try
{
// Set up connection to the input file
Scanner input1 = new Scanner(new FileReader(input));
// Set up connection to an output file
PrintWriter output = new PrintWriter(new FileOutputStream(input + ".txt"));
// initialize the counter for the line numbers
int lineNum = 0;
int words = 0;
int characters = 0;
int paragraphs = 0;
// as long as there are more lines left in the input file
// read from the input file, and copy to the output file
while (input1.hasNextLine())
{
// read a line from the input file
String line;
line = input1.nextLine();
// copy the line to the output file, adding a
// line number at the front of the line
output.println(line + "\n");
// increment the line counter
lineNum++;
//Section for counting the words
boolean word = false;
for (int i = 0; i < line.length(); i++) {
//checks for letters and counts as word till it finds a space then checks for a letter again.
if (!Character.isWhitespace(line.charAt(i)) && !word) {
words++;
word = true;
}
else if (Character.isWhitespace(line.charAt(i)) && word){
word = false;
}
}
characters += line.length();
paragraphs += getPara(line);
}
// close the files
input1.close();
output.close();
System.out.println("Lines: " + lineNum);
System.out.println("Words: " + words);
System.out.println("Characters: " + ((characters)));
System.out.println("Paragraphs: " + paragraphs);
}
catch(FileNotFoundException e)
{
System.out.println("There was an error opening one of the files.");
}
}
public static int getPara(String line){
int count = 0;
boolean p = false;
if (line.isEmpty() && !p){
count++;
p = true;
}
else if (!line.isEmpty() && p){
p = false;
}
return count;
}
}

Your code counts empty lines rather than paragraphs. Stray empty lines in your input file will add to your paragraph count.
Assumptions and definitions are key in cases like this. How is a paragraph defined in your requirements documentation? Can you assume that paragraphs will be a single line terminated by a newline or will you need to account for newlines within paragraphs? If it's the former, all nonempty lines returned by nextLine() will be paragraphs. If it's the latter, then you will need to add to your paragraph count only if an empty line or EOF follows a nonempty line.
In any case, you will be better served using language utilities like String.split() to count your words, unless you're required to do it manually.
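As a rough sketch of the second definition above (a paragraph is a run of one or more non-empty lines), the state has to survive from one line to the next. In the posted getPara, the flag p is a local variable that is reset on every call, so the method simply returns 1 for every empty line. The class and method names below are made up for illustration, and counting at the first line of each run is just one way to get the same result as counting when an empty line or EOF follows a non-empty line.

```java
import java.util.Arrays;
import java.util.List;

class ParagraphCounter {
    // Counts paragraphs, assuming a paragraph is a run of consecutive non-blank lines.
    static int countParagraphs(List<String> lines) {
        int paragraphs = 0;
        boolean inParagraph = false; // persists across lines, unlike the local flag in getPara
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                inParagraph = false;   // a blank line ends the current paragraph
            } else if (!inParagraph) {
                paragraphs++;          // first non-blank line of a new paragraph
                inParagraph = true;
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "First paragraph,", "second line.", "", "", "Second paragraph.");
        System.out.println(countParagraphs(lines)); // prints 2
    }
}
```

With this approach, stray or repeated empty lines no longer inflate the count.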

Related

Java - How to Delimit Single Quotes Around a Phrase but Not an Apostrophe in a Word

I am practicing Java on my own from a book. I read the chapter on text processing and wrapper classes and attempted the exercise below.
Word Counter
Write a program that asks the user for the name of a file. The program should display the number of words that the file contains.
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;
public class FileWordCounter {
public static void main(String[] args) throws IOException {
// Create a Scanner object
Scanner keyboard = new Scanner(System.in);
// Ask user for filename
System.out.print("Enter the name of a file: ");
String filename = keyboard.nextLine();
// Open file for reading
File file = new File(filename);
Scanner inputFile = new Scanner(file);
int words = 0;
String word = "";
while (inputFile.hasNextLine()) {
String line = inputFile.nextLine();
System.out.println(line); // for debugging
StringTokenizer stringTokenizer = new StringTokenizer(line, " \n.!?;,()"); // Create a StringTokenizer object and use the current line contents and delimiters as parameters
while (stringTokenizer.hasMoreTokens()) { // for each line do this
word = stringTokenizer.nextToken();
System.out.println(word); // for debugging
words++;
}
System.out.println("Line contains " + words + " words");
}
// Close file
inputFile.close();
System.out.println("The file has " + words + " words.");
}
}
I chose this random poem from online to test this program. I put the poem in a file called TheSniper.txt:
Two hundred yards away he saw his head;
He raised his rifle, took quick aim and shot him.
Two hundred yards away the man dropped dead;
With bright exulting eye he turned and said,
'By Jove, I got him!'
And he was jubilant; had he not won
The meed of praise his comrades haste to pay?
He smiled; he could not see what he had done;
The dead man lay two hundred yards away.
He could not see the dead, reproachful eyes,
The youthful face which Death had not defiled
But had transfigured when he claimed his prize.
Had he seen this perhaps he had not smiled.
He could not see the woman as she wept
To the news two hundred miles away,
Or through his very dream she would have crept.
And into all his thoughts by night and day.
Two hundred yards away, and, bending o'er
A body in a trench, rough men proclaim
Sadly, that Fritz, the merry is no more.
(Or shall we call him Jack? It's all the same.)
Here is some of my output...
For debugging purposes, I print out each line and the running total of words in the file, up to and including those in the current line.
Enter the name of a file: TheSniper.txt
Two hundred yards away he saw his head;
Two
hundred
yards
away
he
saw
his
head
Line contains 8 words
He raised his rifle, took quick aim and shot him.
He
raised
his
rifle
took
quick
aim
and
shot
him
Line contains 18 words
...
At the end, my program displays that the poem has 176 words. However, Microsoft Word counts 174 words. I see from printing each word that I am miscounting apostrophes and single quotes. Here is the last section of the poem in my output where the problem occurs:
(Or shall we call him Jack? It's all the same.)
Or
shall
we
call
him
Jack
It
s
all
the
same
Line contains 176 words
The file has 176 words
In my StringTokenizer parameter list, when I don't delimit a single quote, which looks like an apostrophe, the word "It's" is counted as one. However, when I do, it's counted as two words (It and s) because the apostrophe, which looks like a single quote, gets delimited. Also, the phrase 'By Jove, I got him!' is miscounted when I don't delimit the single quote/apostrophe. Are the apostrophe and single quote the same character when it comes to delimiting them? I'm not sure how to delimit single quotes that surround a phrase but not an apostrophe inside a word like "It's". I hope my question is reasonably clear. Please ask for any clarifications. Any guidance is appreciated. Thank you!
Why not use another Scanner for each line to count the number of words?
int words = 0;
while (inputFile.hasNextLine()) {
int lineLength = 0;
Scanner lineScanner = new Scanner(inputFile.nextLine());
while (lineScanner.hasNext()) {
System.out.println(lineScanner.next());
lineLength++;
}
System.out.println("Line contains " + lineLength + " words");
words += lineLength;
}
I don't believe it is possible to delimit a single quote for a phrase like "'By Jove, I got him!'", but ignore it in "it's" unless you use a regex search to ignore single quotes in the middle of a word.
Alternatively, you could treat the characters ".!?;,()" as part of a single word (eg. "Jack?" is one word), which will give you the correct word count. This is what the scanner does. Just change the delimiter in your StringTokenizer to " " (\n isn't required since you're already searching each line):
StringTokenizer stringTokenizer = new StringTokenizer(line, " ");
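The regex idea mentioned above could look roughly like this. It is only a sketch: the class name and the pattern are my own assumptions, it treats a word as a run of ASCII letters that may contain apostrophes between letters, and it does not handle typographic apostrophes (’) or digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheWordCounter {
    // Letters, optionally followed by groups of "apostrophe + letters" (It's, o'er).
    // A quote at the start or end of a phrase is not between letters, so it is skipped.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)*");

    static int countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("'By Jove, I got him!'"));                           // prints 5
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // prints 10
    }
}
```

Unlike the whitespace-only delimiter, this keeps punctuation out of the words themselves, which matters if you later want to print or store them.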

How do I check for a newline character with my scanner if I'm using a delimiter?

I'm trying to make an undirected graph with some of the nodes (not all, unlike my example) being connected to one another. So my input format will look like
3
1:2,3
2:1,3
3:1,2
Meaning there's three nodes in all, and 1 is connected to 2 and 3, 2 is connected to 1 and 3 and so on.
However, I cannot understand how to take the input in a meaningful way. Here's what I've got so far.
public Graph createGraph() {
Scanner scan = new Scanner(System.in).useDelimiter("[:|,|\\n]");
int graphSize = scan.nextInt();
System.out.println(graphSize);
for (int i = 0; i < graphSize; i++) {
while (!scan.hasNext("\\n")) {
System.out.println("Scanned: " + scan.nextInt());
}
}
return new Graph(graphSize);
}
Can my
while (!scan.hasNext("\\n"))
see the newline character when I'm using a delimiter on it?
In my opinion, you shouldn't use those characters as delimiters if they are meaningful tokens. In the second line, for example, the first integer doesn't have the same meaning as the others, so the ':' is meaningful and should be scanned, even if only to be discarded later. The ',' however doesn't change the meaning of the tokens it separates, so it's safe to use as a delimiter: you can grab integers as long as they are delimited by ',' and they still have the same meaning.
So in conclusion, I would use ',' as a delimiter and check manually for \n and ':' so I can adapt my code's behaviour when I encounter them.
Yes, Scanner can definitely detect a newline. In fact, you don't even have to specify it explicitly. Just use
scan.hasNextLine()
which essentially keeps going as long as there are lines in your input
Edit
Why don't you read everything first and then use your for loop?
Alright I figured it out. It's not the prettiest code I've ever written, but it gets the job done.
public Graph createGraph() {
Scanner scan = new Scanner(System.in);
// Read the node count from the first line
String number = scan.nextLine();
int graphSize = Integer.parseInt(number.trim());
System.out.println(graphSize);
for (int i = 0; i < graphSize; i++) {
// Read one "node:neighbour,neighbour" line and scan it on its own
number = scan.nextLine();
Scanner reader = new Scanner(number).useDelimiter(",|:");
while (reader.hasNextInt()) { // hasNextInt() matches what nextInt() reads
System.out.println("Scanned: " + reader.nextInt());
}
}
return new Graph(graphSize);
}
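For what it's worth, here is a sketch of the same line-by-line idea that keeps the meaning of ':' (node versus neighbours) instead of flattening everything into one stream of integers. The Graph class from the question isn't shown, so this assumes a plain Map as the result; AdjacencyReader and read are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class AdjacencyReader {
    // Reads "<count>" followed by <count> lines of the form "node:n1,n2,...".
    static Map<Integer, List<Integer>> read(Scanner scan) {
        int graphSize = Integer.parseInt(scan.nextLine().trim());
        Map<Integer, List<Integer>> adjacency = new LinkedHashMap<>();
        for (int i = 0; i < graphSize; i++) {
            // Split once on ':' so the left side is the node and the right side its neighbours
            String[] parts = scan.nextLine().trim().split(":", 2);
            int node = Integer.parseInt(parts[0].trim());
            List<Integer> neighbours = new ArrayList<>();
            if (parts.length > 1 && !parts[1].trim().isEmpty()) {
                for (String token : parts[1].split(",")) {
                    neighbours.add(Integer.parseInt(token.trim()));
                }
            }
            adjacency.put(node, neighbours);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner("3\n1:2,3\n2:1,3\n3:1,2\n");
        System.out.println(read(scan)); // prints {1=[2, 3], 2=[1, 3], 3=[1, 2]}
    }
}
```

A node with no neighbours (a line such as "4:" or just "4") ends up with an empty list, which covers the case where not all nodes are connected.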

Using bufferedreader then convert to a string

Hi, I have an assignment that I don't really understand how to pull off.
I've been programming in Java for 2.5 weeks, so I'm really new.
I'm supposed to import a text document into my program and then do these operations: count letters, count sentences, and compute the average length of words. I have to do the counting letter by letter; I'm not allowed to scan the entire document at once. I've managed to import the text and print it out, but my problem is that I can't use my string "line" to do any of these operations. I've tried converting it to arrays and strings, and after a lot of failed attempts I'm giving up. So how do I convert my input into something I can use? I always get the error message "line is not a variable" or something like that.
Jesper
UPDATE WITH MY SOLUTION! Some of it is in Swedish, sorry about that.
Somehow the formatting came out wrong, so I uploaded the code here instead; I really don't feel like fighting with it right now!
http://txs.io/3eIb
To count letters, check each character. If it's a space or punctuation, ignore it. Otherwise, it's a letter and we should increment the letter count.
Every word should have a space after it unless it is the last word of the sentence. To get the number of words, track the number of spaces + number of sentences. To get number of sentences, find the number of ! ? and .
I would do that by looking at the ascii value of each character.
int numLetters = 0;
int numSentences = 0;
int numWords = 0;
while (line = ...) { // read the next line, as in your existing loop
for (int i = 0; i < line.length(); i++) {
int curCharAsc = (int) line.charAt(i); // get the ASCII value by casting char to int
if ((curCharAsc >= 65 && curCharAsc <= 90) || (curCharAsc >= 97 && curCharAsc <= 122)) { // uppercase or lowercase letter
numLetters++;
}
if (curCharAsc == 32) { // ASCII for space
numWords++;
}
else if (curCharAsc == 33 || curCharAsc == 46 || curCharAsc == 63) { // ASCII for '!', '.', '?'
numWords++;
numSentences++;
}
}
}
double avgWordLength = ((double) numLetters) / numWords; // cast to double before dividing to avoid integer division
Your code as presented works fine: it loads a file and prints out the contents line by line. What you probably need to do is capture each of those lines. Java has two useful classes for this: StringBuilder and StringBuffer (pick one).
BufferedReader input = new BufferedReader(new FileReader(args[0]));
String line;
StringBuffer buffer = new StringBuffer();
while ((line = input.readLine()) != null) {
System.out.println(line);
buffer.append(line+" ");
}
input.close();
performOperations(buffer.toString());
The only other possibility is (if your own code is not running for you) - possibly you aren't passing the input file name as a parameter when you run this class?
UPDATE
NB - I've modified the line
buffer.append(line+"\n");
to add a space instead of a line break, so that it is compatible with the algorithms in @faraza's answer
The method performOperations doesn't exist yet. So you should / could add something like this
public static void performOperations(String data){
}
Your method could in turn make calls out to separate methods for each operation
public static void performOperations(String data){
countWords(data);
countLetters(data);
averageWordLength(data);
}
To take it to the next level, and introduce Object Orientation, you could create a class TextStatsCollector.
public class TextStatsCollector{
private final String data;
public TextStatsCollector(final String data) {
this.data = data;
}
public int countWords(){
//word count impl here
}
public int countLetters(){
//letter count impl here
}
public int averageWordLength(){
//average word length impl here
}
public void performOperations(){
System.out.println("Number of Words is " + countWords());
System.out.println("Number of Letters is " + countLetters());
System.out.println("Average word length is " + averageWordLength());
}
}
Then you could use TextStatsCollector like the following in your main method
new TextStatsCollector(buffer.toString()).performOperations();
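To make the skeleton less abstract, here is one possible way the counting could be filled in, going through the text one character at a time as the assignment requires. This is only a sketch under my own assumptions: TextStatsSketch is a made-up name, a sentence is assumed to end at '.', '!' or '?', and an apostrophe is assumed not to end a word. Character.isLetter is used instead of ASCII ranges so that non-ASCII letters (for example Swedish å, ä, ö) count as letters too.

```java
class TextStatsSketch {
    final int letters;
    final int words;
    final int sentences;

    TextStatsSketch(String data) {
        int letterCount = 0;
        int wordCount = 0;
        int sentenceCount = 0;
        boolean inWord = false; // true while we are inside a run of letters
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (Character.isLetter(c)) {
                letterCount++;
                if (!inWord) {          // first letter of a new word
                    wordCount++;
                    inWord = true;
                }
            } else {
                if (c != '\'') {        // assumption: an apostrophe does not end a word ("It's")
                    inWord = false;
                }
                if (c == '.' || c == '!' || c == '?') {
                    sentenceCount++;
                }
            }
        }
        letters = letterCount;
        words = wordCount;
        sentences = sentenceCount;
    }

    double averageWordLength() {
        // Cast before dividing so the result is not truncated; guard against empty input
        return words == 0 ? 0.0 : (double) letters / words;
    }

    public static void main(String[] args) {
        TextStatsSketch stats = new TextStatsSketch("Hi there. It's me!");
        System.out.println(stats.letters + " letters, " + stats.words + " words, "
                + stats.sentences + " sentences, average " + stats.averageWordLength());
        // prints: 12 letters, 4 words, 2 sentences, average 3.0
    }
}
```

Counting a word at its first letter avoids depending on how many spaces or punctuation marks separate the words.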

Using a user inputted string of characters find the longest word that can be made

Basically I want to create a program which simulates the 'Countdown' game on Channel 4. In effect, a user must input 9 letters and the program will search for the longest word in the dictionary that can be made from these letters. I think a tree structure would be better to go with than hash tables. I already have a file which contains the words in the dictionary and will be using file IO.
This is my file io class:
public static void main(String[] args){
FileIO reader = new FileIO();
String[] contents = reader.load("dictionary.txt");
}
This is what I have so far in my Countdown class
public static void main(String[] args) throws IOException{
Scanner scan = new Scanner(System.in);
String letters = scan.nextLine();
}
I get totally lost from here. I know this is only the start, but I'm not looking for answers. I just want a small bit of help and maybe a pointer in the right direction. I'm new to Java, found this question in an interview book, and thought I should give it a try.
Thanks in advance
welcome to the world of Java :)
The first thing I see is that you have two main methods; you don't actually need that. Your program will have a single entry point in most cases, and from there it does all its logic and handles user input and everything else.
You're thinking of a tree structure which is good, though there might be a better idea to store this. Try this: http://en.wikipedia.org/wiki/Trie
What your program has to do is read all the words from the file line by line, and in this process build your data structure, the tree. When that's done you can ask the user for input and after the input is entered you can search the tree.
Since you asked specifically not to provide answers I won't put code here, but feel free to ask if you're unclear about something
There are only about 800,000 words in the English language, so an efficient solution would be to store those 800,000 words as 800,000 arrays of 26 1-byte integers that count how many times each letter is used in the word, and then for an input 9 characters you convert to similar 26 integer count format for the query, and then a word can be formed from the query letters if the query vector is greater than or equal to the word-vector component-wise. You could easily process on the order of 100 queries per second this way.
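A small sketch of the letter-count comparison described above. The names are hypothetical, the dictionary is assumed to contain only lowercase a-z words, and for brevity the counts are computed on the fly rather than precomputed and stored per word as the answer suggests.

```java
import java.util.Arrays;
import java.util.List;

class LetterCountSearch {
    // Builds the 26-entry vector: how many times each letter a-z occurs in s.
    static int[] countLetters(String s) {
        int[] counts = new int[26];
        for (char c : s.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // A word can be formed if it needs no letter more often than it is available.
    static boolean canForm(String word, int[] available) {
        int[] needed = countLetters(word);
        for (int i = 0; i < 26; i++) {
            if (needed[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the longest formable word, or "" if none can be formed.
    static String longestWord(List<String> dictionary, String letters) {
        int[] available = countLetters(letters);
        String best = "";
        for (String word : dictionary) {
            if (word.length() > best.length() && canForm(word, available)) {
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("ban", "banana", "ball", "banal");
        System.out.println(longestWord(dictionary, "aabnnalxy")); // prints banana
    }
}
```

Checking the length first skips the letter comparison for any word that could not beat the current best anyway.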
I would write a program that starts with all the two-letter words, then does the three-letter words, the four-letter words and so on.
When you do the two-letter words, you'll want some way of picking the first letter, then picking the second letter from what remains. You'll probably want to use recursion for this part. Lastly, you'll check it against the dictionary. Try to write it in a way that means you can re-use the same code for the three-letter words.
I believe, the power of Regular Expressions would come in handy in your case:
1) Create a regular expression string with a symbol class like: /^[abcdefghi]*$/ with your letters inside instead of "abcdefghi".
2) Use that regular expression as a filter to get a strings array from your text file.
3) Sort it by length. The longest word is what you need!
Check the Regular Expressions Reference for more information.
UPD: Here is a good Java Regex Tutorial.
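The three steps could be sketched like this, with made-up names and assuming the input letters are lowercase a-z. One thing to be aware of, as far as I can tell: a character class only restricts which letters may appear, not how many times, so with the letters "ban" the word "banana" still passes the filter. For Countdown rules you would need an extra check on how often each letter is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexLetterFilter {
    // Returns the longest dictionary word made only of the given letters.
    // Note: letters may be reused any number of times with this filter.
    static String longestMatching(List<String> dictionary, String letters) {
        if (!letters.matches("[a-z]+")) {
            throw new IllegalArgumentException("expected lowercase letters a-z only");
        }
        // matches() requires the whole word to match, so no ^ and $ are needed
        Pattern allowed = Pattern.compile("[" + letters + "]*");
        String best = "";
        for (String word : dictionary) {
            if (word.length() > best.length() && allowed.matcher(word).matches()) {
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("ban", "banana", "cat");
        System.out.println(longestMatching(dictionary, "ban")); // prints banana
    }
}
```

Tracking the longest match directly avoids sorting the whole filtered list.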
A first approach could be using a tree with all the letters present in the wordlist.
If a node is the end of a word, it is marked as an end-of-word node.
For example, in a tree built from the words banana, ball, ban, and banal, the longest word is banana, but the shorter words are stored along the same branches.
So, a node must have:
A character
If it is the end of a word
A list of children. (max 26)
The insertion algorithm is very simple: In each step we "cut" the first character of the word until the word has no more characters.
public class TreeNode {
public char c;
private boolean isEndOfWord = false;
private TreeNode[] children = new TreeNode[26];
public TreeNode(char c) {
this.c = c;
}
public void put(String s) {
if (s.isEmpty())
{
this.isEndOfWord = true;
return;
}
char first = s.charAt(0);
int pos = position(first);
if (this.children[pos] == null)
this.children[pos] = new TreeNode(first);
this.children[pos].put(s.substring(1));
}
public String search(char[] letters) {
String word = "";
String w = "";
for (int i = 0; i < letters.length; i++)
{
TreeNode child = children[position(letters[i])];
if (child != null)
w = child.search(letters);
//this is not efficient. It should be optimized.
if (w.contains("%")
&& w.substring(0, w.lastIndexOf("%")).length() > word
.length())
word = w;
}
// if a node its end-of-word we add the special char '%'
return c + (this.isEndOfWord ? "%" : "") + word;
}
//if 'a' returns 0, if 'b' returns 1...etc
public static int position(char c) {
return ((byte) c) - 97;
}
}
Example:
public static void main(String[] args) {
//root
TreeNode t = new TreeNode('R');
//for skipping words with "'" in the wordlist
Pattern p = Pattern.compile(".*\\W+.*");
int nw = 0;
try (BufferedReader br = new BufferedReader(new FileReader(
"files/wordsEn.txt")))
{
for (String line; (line = br.readLine()) != null;)
{
if (p.matcher(line).find())
continue;
t.put(line);
nw++;
}
// line is not visible here.
br.close();
System.out.println("number of words : " + nw);
String res = null;
// substring (1) because of the root
res = t.search("vuetsrcanoli".toCharArray()).substring(1);
System.out.println(res.replace("%", ""));
}
catch (Exception e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Output:
number of words : 109563
counterrevolutionaries
Notes:
The wordlist is taken from here
the reading part is based on another SO question : How to read a large text file line by line using Java?
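Note that the example output above is longer than the 12 input letters, so the search as written appears to let letters be reused. If each letter may be used only once, as in the Countdown game, one option is to carry a count of the remaining letters down the tree. The following is a sketch of that variation, not the code above: LetterTrie is a hypothetical class and it assumes lowercase a-z words and input.

```java
class LetterTrie {
    private final LetterTrie[] children = new LetterTrie[26];
    private boolean endOfWord;

    // Inserts a lowercase a-z word, creating child nodes as needed.
    void put(String word) {
        LetterTrie node = this;
        for (int i = 0; i < word.length(); i++) {
            int pos = word.charAt(i) - 'a';
            if (node.children[pos] == null) {
                node.children[pos] = new LetterTrie();
            }
            node = node.children[pos];
        }
        node.endOfWord = true;
    }

    // Returns the longest stored word that uses each given letter at most once, or "".
    String longest(String letters) {
        int[] available = new int[26];
        for (char c : letters.toCharArray()) {
            available[c - 'a']++;
        }
        String best = search(available, new StringBuilder());
        return best == null ? "" : best;
    }

    private String search(int[] available, StringBuilder prefix) {
        String best = endOfWord ? prefix.toString() : null;
        for (int i = 0; i < 26; i++) {
            if (children[i] == null || available[i] == 0) {
                continue;
            }
            available[i]--;                          // consume the letter
            prefix.append((char) ('a' + i));
            String candidate = children[i].search(available, prefix);
            if (candidate != null && (best == null || candidate.length() > best.length())) {
                best = candidate;
            }
            prefix.setLength(prefix.length() - 1);   // backtrack
            available[i]++;
        }
        return best;
    }

    public static void main(String[] args) {
        LetterTrie trie = new LetterTrie();
        for (String word : new String[] {"ban", "banana", "ball", "banal"}) {
            trie.put(word);
        }
        System.out.println(trie.longest("aaabnn")); // prints banana
        System.out.println(trie.longest("banl"));   // prints ban
    }
}
```

Iterating over the 26 children rather than over the input letters means duplicate letters in the input do not cause the same branch to be explored twice.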

do-while loop not working as it should

Ok so basically I am having trouble finding out why this is not working as I think it should, and need help getting to the right output. I have tried messing with this format a few ways, but nothing works, and I really don't understand why. Here are the instructions, followed by my source for it:
INSTRUCTIONS
Write a loop that reads strings from standard input where the string is either "land", "air", or "water". The loop terminates when "xxxxx" (five x characters) is read in. Other strings are ignored. After the loop, your code should print out 3 lines: the first consisting of the string "land:" followed by the number of "land" strings read in, the second consisting of the string "air:" followed by the number of "air" strings read in, and the third consisting of the string "water:" followed by the number of "water" strings read in. Each of these should be printed on a separate line.
ASSUME the availability of a variable, stdin , that references a Scanner object associated with standard input.
SOURCE:
int land = 0;
int air = 0;
int water = 0;
do
{
String stdInput = stdin.next();
if (stdInput.equalsIgnoreCase("land"))
{
land++;
}else if (stdInput.equalsIgnoreCase("air"))
{
air++;
}else if (stdInput.equalsIgnoreCase("water"))
{
water++;
}
}while (stdin.equalsIgnoreCase("xxxxx") == false); // I think the issue is here, I just don't know why it doesn't work this way
System.out.println("land: " + land);
System.out.println("air: " + air);
System.out.println("water: " + water);
You are storing user info in stdInput but your while checks stdin. Try this way
String stdInput = null;
do {
stdInput = stdin.next();
//your ifs....
} while (!stdInput.equalsIgnoreCase("xxxxx"));
This Works:)
I just submitted this code to codelab and it works just fine.
int land = 0;
int air = 0;
int water = 0;
String word = "";
while(!(word.equals("xxxxx"))) {
word = stdin.next();
if(word.equals("land")) {
land++;
}else if(word.equals("air")) {
air++;
}else if(word.equals("water")) {
water++;
}
}
System.out.println("land:" + land);
System.out.println("air:" + air);
System.out.println("water:" + water);
I think you want stdInput.equalsIgnoreCase("xxxxx") == false instead of stdin.equalsIgnoreCase("xxxxx") == false.
You are right: the problem is where you indicated. The condition should test the string you already read, not stdin itself:
Also, you must declare the stdInput before the loop so its scope reaches the while condition:
String stdInput = null;
do {
stdInput = stdin.next();
// rest of code the same
} while (!stdInput.equalsIgnoreCase("xxxxx"));
An alternate way would be a for loop:
for (String stdInput = stdin.next(); !stdInput.equalsIgnoreCase("xxxxx"); stdInput = stdin.next()) {
// rest of code the same
}
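For anyone who wants to run the loop outside CodeLab, here is a self-contained sketch. The exercise assumes stdin already exists; this version takes the Scanner as a parameter so it can be fed a string, and the class and method names are my own. It also stops quietly if the input ends before "xxxxx" is seen, which the exercise does not ask for.

```java
import java.util.Scanner;

class TransportCounter {
    // Returns {land, air, water} counts for the tokens read before "xxxxx".
    static int[] count(Scanner stdin) {
        int land = 0;
        int air = 0;
        int water = 0;
        while (stdin.hasNext()) {
            String word = stdin.next();   // read exactly once per iteration
            if (word.equals("xxxxx")) {
                break;                    // sentinel: stop reading
            }
            if (word.equals("land")) {
                land++;
            } else if (word.equals("air")) {
                air++;
            } else if (word.equals("water")) {
                water++;
            }
            // any other string is ignored
        }
        return new int[] {land, air, water};
    }

    public static void main(String[] args) {
        int[] counts = count(new Scanner("land air boat water land xxxxx air"));
        System.out.println("land:" + counts[0]);  // prints land:2
        System.out.println("air:" + counts[1]);   // prints air:1
        System.out.println("water:" + counts[2]); // prints water:1
    }
}
```

The "air" after the sentinel is never read, which is why air is 1 rather than 2.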
