Method count words in a file

Hi guys I'm writing a method which counts words in a file, but apparently there is a mistake somewhere in the code and the method does not work. Here's my code:
public class Main2 {
public static void main(String[] args) {
count("/home/bruno/Desktop/WAR_JEE_S_09_Podstawy/MojPlik");
}
static int count(String fileName){
Path path = Paths.get(fileName);
int ilosc = 0;
String wyjscie = "";
try {
for (String charakter : Files.readAllLines(path)){
wyjscie += charakter;
}
StringTokenizer token = new StringTokenizer(wyjscie," \n");
} catch (IOException e) {
e.printStackTrace();
}
return ilosc;
}
}
The file path is correct, here is the file content
test test
test
test
After I call the method in main, it displays nothing. Where is the mistake?

Your code would count lines in a file ... well, if you followed up on that thought.
Right now your code simply reads the lines, puts them into one large string, and then does nothing with the result. You have a single int counter that is initialized to 0 and then returned without ever being used or increased. And unless I am mistaken, readAllLines() strips the line terminator from each line, so concatenating the lines glues the last word of one line to the first word of the next. Overall, the code cannot produce a useful count.
To count words you have to take each line and (for example) split that one-line-string for spaces. That gives you a number. Then add up these numbers.
Long story short: the real answer here is that you should step back. Don't just write code, assuming that this will magically solve the problem. Instead: first think up a strategy (algorithm) that solves the problem. Write down the algorithm ideas using a pen and paper. Then "manually" run the algorithm on some sample data. Then, in the end, turn the algorithm into code.

Besides the fact that you do not output anything, there is a slight error in your logic. I have made a few changes here and there to get your code working.
s.trim() removes any leading and trailing whitespace, and trimmed.split("\\s+") splits the string on runs of whitespace characters (spaces, tabs and so on).
static int count(String fileName) throws IOException {
Path path = Paths.get(fileName);
int count = 0;
List<String> lines = Files.readAllLines(path);
for (String s : lines) {
String trimmed = s.trim();
count += trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
}
return count;
}
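To address the "displays nothing" part: the return value has to be printed by the caller. Below is a minimal sketch, assuming the per-line approach shown above. The class name WordCountDemo, the helper countSample and the temporary file are made up for illustration and are not part of the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names and the temp file are assumptions for illustration.
final class WordCountDemo {

    // Per-line approach: trim, split on runs of whitespace, add up the token counts.
    static int count(Path path) throws IOException {
        int count = 0;
        for (String line : Files.readAllLines(path)) {
            String trimmed = line.trim();
            count += trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
        return count;
    }

    // Writes the given lines to a temporary file, counts its words, then deletes the file.
    static int countSample(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("words", ".txt");
            try {
                Files.write(tmp, lines);
                return count(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same content as the sample file in the question.
        List<String> sample = Arrays.asList("test test", "test", "test");
        // The result must be printed explicitly; just calling the method shows nothing.
        System.out.println(countSample(sample)); // prints 4
    }
}
```

In the original program the equivalent change would be to wrap the call in main in System.out.println(...).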

Here is the code using functional-style programming in Java 8. This is also a common example of using Stream's flatMap - may be used for counting or printing words from a file.
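long n;
try (Stream<String> lines = Files.lines(Paths.get("test.txt"))) {
    n = lines.flatMap(s -> Stream.of(s.trim().split("\\s+")))
             .filter(w -> !w.isEmpty()) // a blank line splits into one empty token
             .count();
}
System.out.println("No. of words: " + n);
Note that Files.lines(Path) returns a Stream<String> containing the lines of the input file. This method is similar to readAllLines, but returns a stream instead of a List, and the stream should be closed after use (hence the try-with-resources).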

Related

Stop printing line of text from a file after a character appears a second time

I am currently trying to stop printing a line of text after a , character is read on that line a second time from a text file. For example, given 14, "Stanley #2 Philips Screwdriver", true, 6.95, I want to stop reading once the , character is read a second time, so the output text should look like 14, "Stanley #2 Philips Screwdriver". I tried to use a limit on the regex to achieve this, but it just omits all the commas and prints out the entire text. This is what my code looks like so far:
public static void fileReader() throws FileNotFoundException {
File file = new File("/Users/14077/Downloads/inventory.txt");
Scanner scan = new Scanner(file);
String test = "4452";
while (scan.hasNext()) {
String line = scan.nextLine();
String[] itemID = line.split(",", 5); //attempt to use a regex limit
if(itemID[0].equals(test)) {
for(String a : itemID)
System.out.println(a);
}//end if
}//end while
}//end fileReader
I also tried to print just part of the text up until the first comma like;
String itemID[] = line.split(",", 5);
System.out.println(itemID[0]);
But no luck; it just prints 14. Any help would be appreciated.

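What about using the String.indexOf and String.substring functions (https://docs.oracle.com/javase/7/docs/api/java/lang/String.html)?
int indexSecondOccurrence = line.indexOf(",", line.indexOf(",") + 1);
// The end index of substring is exclusive, so this stops just before the second comma.
// Assumes the line contains at least two commas; otherwise the index is -1.
System.out.println(line.substring(0, indexSecondOccurrence));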
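A slightly more defensive variant of the indexOf/substring idea, as a sketch only. The class and method names are hypothetical, and it assumes that a comma never appears inside the quoted item name.

```java
// Hypothetical helper class; names are assumptions for illustration.
final class SecondComma {

    // Returns the text before the second comma,
    // or the whole line if it contains fewer than two commas.
    static String beforeSecondComma(String line) {
        int first = line.indexOf(',');
        if (first < 0) {
            return line;
        }
        int second = line.indexOf(',', first + 1);
        return second < 0 ? line : line.substring(0, second);
    }

    public static void main(String[] args) {
        String line = "14, \"Stanley #2 Philips Screwdriver\", true, 6.95";
        System.out.println(beforeSecondComma(line)); // prints 14, "Stanley #2 Philips Screwdriver"
    }
}
```

Returning the line unchanged when there is no second comma is a design choice for this sketch; throwing an exception for malformed records would be equally reasonable.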

I'd suggest to modify your code as follows.
...
String[] itemID = line.split(",", 3); //attempt to use a regex limit
if(itemID[0].equals(test)) {
System.out.println(String.join (",", itemID[0],itemID[1]));
}
...
The split() call will produce an array with at most 3 elements. The first two are the string pieces that you need; the last element is the remaining "tail" of the original string.
Now we only need to merge the pieces back together with the join() method.
Hope this helps.

Java | Finding specific lines from file

So I'm currently working on a project from school. I've saved data from customers in a .txt file in this format
-----
17-03-2020 15:49
WashType: De Luxe
ID: 1, Name: Janus Pedersen
-----
-----
20-03-2020 13:07
WashType: Standard
ID: 2, Name: Hardy Akira
-----
In order to thank customers for using this service, I'd like to give a customer some cinema tickets after each 10th purchase from us. To do that, I thought of reading this file again and look for their ID and count that but I simply can't make that work. My initial idea was something like the following, but it keeps giving me a null pointer
String[] words;
FileReader fr = new FileReader("stats.txt");
BufferedReader br = new BufferedReader(fr);
String s;
String input = String.valueOf(washCard.getCardID());
int count=0;
while((s=br.readLine())!=null)
{
words=s.split(" ");
for (String word : words)
{
if (word.equals(input))
{
count++;
System.out.println(word);
}
}
}
Anyone who has any great ideas for this?
For things to be easier I've added it all to a github repo:
https://github.com/rasm937k/curly-broccoli

The below code is based on Michał Kaciuba's answer but adapted to suit the actual format of your stats.txt file. I didn't know how to post this as a comment, hence I am posting it as an answer, but as I said, Michał Kaciuba should get the credit and I think you should accept his answer. Note that explanation of the code follows the actual code.
String input = String.valueOf(washCard.getCardID());
Pattern pttrn = Pattern.compile("^ID: (\\d+)");
Path p = Paths.get("stats.txt");
try {
long count = Files.lines(p) //throws java.io.IOException
.filter(l -> {Matcher mtchr = pttrn.matcher(l); return mtchr.find() && input.equals(mtchr.group(1));})
.count();
System.out.println(count);
}
catch (IOException x) {
x.printStackTrace();
}
Files.lines(p) creates a Stream where every element in the stream is a line from file stats.txt, i.e. a String.
The regular expression matches lines that start with ID: followed by a single space, followed by a series of one or more digits. The digits part is known as a capturing group because it is enclosed in parentheses.
The filter() checks to see if the line from the file matches the regular expression and if it does, the filter() then checks whether the "digits" in that line match your input, i.e. String.valueOf(washCard.getCardID()).
The count() counts all the elements in the stream returned by filter() and count() returns a long.
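To see the filter in isolation, here is a minimal sketch that runs the same regular expression over a few in-memory lines instead of stats.txt. The class name, the method name and the extra sample records are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample data are assumptions for illustration.
final class IdCountDemo {

    // Lines that start with "ID: " followed by digits; the digits are capture group 1.
    private static final Pattern ID_LINE = Pattern.compile("^ID: (\\d+)");

    // Counts the records whose ID equals cardId exactly ("1" does not match "11").
    static long countPurchases(List<String> lines, String cardId) {
        return lines.stream()
                .filter(line -> {
                    Matcher m = ID_LINE.matcher(line);
                    return m.find() && cardId.equals(m.group(1));
                })
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "-----", "17-03-2020 15:49", "WashType: De Luxe", "ID: 1, Name: Janus Pedersen", "-----",
                "-----", "20-03-2020 13:07", "WashType: Standard", "ID: 2, Name: Hardy Akira", "-----",
                "-----", "21-03-2020 09:30", "WashType: Standard", "ID: 1, Name: Janus Pedersen", "-----",
                "-----", "22-03-2020 10:00", "WashType: De Luxe", "ID: 11, Name: Made Up", "-----");
        System.out.println(countPurchases(lines, "1")); // prints 2
    }
}
```

Comparing the whole captured group, rather than checking whether the line contains the ID, is what keeps card 1 from also matching card 11.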

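You can use Java 8 streams for that. Splitting "ID: 1, Name: Janus Pedersen" on a space gives "ID:", "1,", "Name:" and so on, so the ID is the second token and still carries its trailing comma. The separator and date lines have fewer tokens, so check the length first:
long count = Files.lines(Paths.get("stats.txt"))
    .map(line -> line.split(" "))
    .filter(words -> words.length > 1 && words[0].equals("ID:") && words[1].equals(washCardId + ","))
    .count();
Also here's a nice tutorial on Java 8 Streams: https://www.baeldung.com/java-8-streams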

Counting frequency of words from a .txt file in java

I am working on a Comp Sci assignment. In the end, the program will determine whether a file is written in English or French. Right now, I'm struggling with the method that counts the frequency of words that appears in a .txt file.
I have a set of text files in both English and French in their respective folders labeled 1-20. The method asks for a directory (which in this case is "docs/train/eng/" or "docs/train/fre/") and for how many files that the program should go through (there are 20 files in each folder). Then it reads that file, splits all the words apart (I don't need to worry about capitalization or punctuation), and puts every word in a HashMap along with how many times they were in the file. (Key = word, Value = frequency).
This is the code I came up with for the method:
public static HashMap<String, Integer> countWords(String directory, int nFiles) {
// Declare the HashMap
HashMap<String, Integer> wordCount = new HashMap();
// this large 'for' loop will go through each file in the specified directory.
for (int k = 1; k < nFiles; k++) {
// Puts together the string that the FileReader will refer to.
String learn = directory + k + ".txt";
try {
FileReader reader = new FileReader(learn);
BufferedReader br = new BufferedReader(reader);
// The BufferedReader reads the lines
String line = br.readLine();
// Split the line into a String array to loop through
String[] words = line.split(" ");
int freq = 0;
// for loop goes through every word
for (int i = 0; i < words.length; i++) {
// Case if the HashMap already contains the key.
// If so, just increments the value
if (wordCount.containsKey(words[i])) {
wordCount.put(words[i], freq++);
}
// Otherwise, puts the word into the HashMap
else {
wordCount.put(words[i], freq++);
}
}
// Catching the file not found error
// and any other errors
}
catch (FileNotFoundException fnfe) {
System.err.println("File not found.");
}
catch (Exception e) {
System.err.print(e);
}
}
return wordCount;
}
The code compiles. Unfortunately, when I asked it to print the results of all the word counts for the 20 files, it printed this. It's complete gibberish (though the words are definitely there) and is not at all what I need the method to do.
If anyone could help me debug my code, I would greatly appreciate it. I've been at it for ages, conducting test after test and I'm ready to give up.

Let me combine all the good answers here.
1) Split up your methods so that each handles one thing: one to read the files into a String[], one to process the String[], and one to call the first two.
2) When you split, think carefully about how you want to split. As @m0skit0 suggests, you should probably split on \b for this problem.
3) As @jas suggested, first check whether your map already contains the word. If it does, increment the count; if not, add the word to the map and set its count to 1.
4) To print out the map in the way you likely expect, take a look at the below:
Map<String, Integer> test = new HashMap<>();
for (Map.Entry<String, Integer> entry : test.entrySet()) {
    System.out.println(entry.getKey() + " " + entry.getValue());
}

I would have expected something more like this. Does it make sense?
if (wordCount.containsKey(words[i])) {
int n = wordCount.get(words[i]);
wordCount.put(words[i], ++n);
}
// Otherwise, puts the word into the HashMap
else {
wordCount.put(words[i], 1);
}
If the word is already in the hashmap, we want to get the current count, add 1 to that and replace the word with the new count in the hashmap.
If the word is not yet in the hashmap, we simply put it in the map with a count of 1 to start with. The next time we see the same word we'll up the count to 2, etc.
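The same get-then-put logic can be written more compactly with Map.merge, which is available from Java 8 on. This is a sketch under that assumption; the class name and the sample sentence are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; names and sample data are assumptions for illustration.
final class FrequencyDemo {

    // Stores 1 for a word seen for the first time, otherwise adds 1 to the stored count.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("le chat et le chien".split(" "));
        System.out.println(counts.get("le"));   // prints 2
        System.out.println(counts.get("chat")); // prints 1
    }
}
```

To accumulate counts across all files, the same map would be passed through every file and every line rather than being recreated each time.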

If you split by space only, then other signs (parenthesis, punctuation marks, etc...) will be included in the words. For example: "This phrase, contains... funny stuff", if you split it by space you get: "This" "phrase," "contains..." "funny" and "stuff".
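You can get closer by splitting on word boundaries (\b) instead. Note, however, that the separators then come back as tokens of their own (" ", ", " and so on), so they still have to be filtered out; splitting on runs of non-word characters avoids that.
line.split("\\b");  // words, but the separators are returned as tokens too
line.split("\\W+"); // words only; \W treats only ASCII letters, digits and _ as word characters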
Btw your if and else parts are identical. You're always incrementing freq by one, which doesn't make much sense. If the word is already in the map, you want to get the current frequency, add 1 to it, and update the frequency in the map. If not, you put it in the map with a value of 1.
And pro tip: always print/log the full stacktrace for the exceptions.
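A quick way to compare splitting strategies is to print what each one returns for the example phrase. The sketch below is only an illustration with made-up names. It uses \P{L}+ (runs of non-letters) as the delimiter, which is a swapped-in alternative and not something the answers above proposed; it is chosen here because it also keeps accented letters, as found in French text, inside words.

```java
import java.util.Arrays;

// Hypothetical demo class; names are assumptions for illustration.
final class SplitDemo {

    // Splits on runs of non-letter characters, so punctuation and spaces are dropped.
    // A line that starts with punctuation would still yield a leading empty token.
    static String[] words(String line) {
        return line.trim().split("\\P{L}+");
    }

    public static void main(String[] args) {
        String phrase = "This phrase, contains... funny stuff";
        // Splitting on a single space keeps the punctuation attached to the words.
        System.out.println(Arrays.toString(phrase.split(" ")));
        // Splitting on word boundaries returns the separators as tokens of their own.
        System.out.println(Arrays.toString(phrase.split("\\b")));
        // Splitting on runs of non-letters leaves only the words.
        System.out.println(Arrays.toString(words(phrase))); // prints [This, phrase, contains, funny, stuff]
    }
}
```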

Using a user inputted string of characters find the longest word that can be made

Basically I want to create a program which simulates the 'Countdown' game on Channel 4. In effect, a user must input 9 letters and the program will search for the largest word in the dictionary that can be made from these letters. I think a tree structure would be better to go with rather than hash tables. I already have a file which contains the words in the dictionary and will be using file IO.
This is my file io class:
public static void main(String[] args){
FileIO reader = new FileIO();
String[] contents = reader.load("dictionary.txt");
}
This is what I have so far in my Countdown class
public static void main(String[] args) throws IOException{
Scanner scan = new Scanner(System.in);
String letters = scan.nextLine();
}
I get totally lost from here. I know this is only the start but I'm not looking for answers. I just want a small bit of help and maybe a pointer in the right direction. I'm new to Java and found this question in an interview book and thought I should give it a try.
Thanks in advance

welcome to the world of Java :)
The first thing I see is that you have two main methods; you don't actually need that. In most cases your program will have a single entry point, from which it does all its logic and handles user input and everything else.
You're thinking of a tree structure which is good, though there might be a better idea to store this. Try this: http://en.wikipedia.org/wiki/Trie
What your program has to do is read all the words from the file line by line, and in this process build your data structure, the tree. When that's done you can ask the user for input and after the input is entered you can search the tree.
Since you asked specifically not to provide answers I won't put code here, but feel free to ask if you're unclear about something

There are only about 800,000 words in the English language, so an efficient solution would be to store each word as an array of 26 one-byte integers that count how many times each letter is used in the word. For an input of 9 characters, you convert the query to the same 26-integer count format. A word can then be formed from the query letters if the query vector is greater than or equal to the word vector component-wise. You could easily process on the order of 100 queries per second this way.
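The letter-count comparison could be sketched as follows. This is an illustration under stated assumptions, not a tuned implementation: the class and method names are made up, only the letters a to z are counted, int counts are used instead of bytes, and the word vectors are computed on the fly rather than precomputed once as the answer suggests.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical solver class; names and the tiny dictionary are assumptions for illustration.
final class CountdownSolver {

    // Counts how often each of 'a'..'z' occurs; any other character is ignored.
    static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // A word can be formed if it needs no letter more often than the input provides.
    static boolean canForm(int[] word, int[] available) {
        for (int i = 0; i < 26; i++) {
            if (word[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the longest formable word, or "" if none; the first longest word wins ties.
    static String longestWord(List<String> dictionary, String letters) {
        int[] available = letterCounts(letters);
        String best = "";
        for (String word : dictionary) {
            if (word.length() > best.length() && canForm(letterCounts(word), available)) {
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("ban", "banal", "banana", "ball");
        // "banana" needs three a's and "ball" needs two l's, so neither can be formed.
        System.out.println(longestWord(dictionary, "lanabxyzq")); // prints banal
    }
}
```

For repeated queries against a large dictionary, the per-word count arrays would be built once at load time and reused.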

I would write a program that starts with all the two-letter words, then does the three-letter words, the four-letter words and so on.
When you do the two-letter words, you'll want some way of picking the first letter, then picking the second letter from what remains. You'll probably want to use recursion for this part. Lastly, you'll check it against the dictionary. Try to write it in a way that means you can re-use the same code for the three-letter words.

I believe, the power of Regular Expressions would come in handy in your case:
1) Create a regular expression string with a symbol class like: /^[abcdefghi]*$/ with your letters inside instead of "abcdefghi".
2) Use that regular expression as a filter to get a strings array from your text file.
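3) Sort it by length. The longest word is what you need! Note that a character class lets each letter repeat any number of times, so the letter counts of the candidates still have to be checked against the input.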
Check the Regular Expressions Reference for more information.
UPD: Here is a good Java Regex Tutorial.

A first approach could be using a tree with all the letters present in the wordlist.
If a node is the end of a word, it is marked as an end-of-word node.
In the picture above, the longest word is banana. But there are other words, like ball, ban, or banal.
So, a node must have:
A character
If it is the end of a word
A list of children. (max 26)
The insertion algorithm is very simple: In each step we "cut" the first character of the word until the word has no more characters.
public class TreeNode {
public char c;
private boolean isEndOfWord = false;
private TreeNode[] children = new TreeNode[26];
public TreeNode(char c) {
this.c = c;
}
public void put(String s) {
if (s.isEmpty())
{
this.isEndOfWord = true;
return;
}
char first = s.charAt(0);
int pos = position(first);
if (this.children[pos] == null)
this.children[pos] = new TreeNode(first);
this.children[pos].put(s.substring(1));
}
public String search(char[] letters) {
String word = "";
String w = "";
for (int i = 0; i < letters.length; i++)
{
TreeNode child = children[position(letters[i])];
if (child != null)
w = child.search(letters);
//this is not efficient. It should be optimized.
if (w.contains("%")
&& w.substring(0, w.lastIndexOf("%")).length() > word
.length())
word = w;
}
// if a node its end-of-word we add the special char '%'
return c + (this.isEndOfWord ? "%" : "") + word;
}
//if 'a' returns 0, if 'b' returns 1...etc
public static int position(char c) {
return ((byte) c) - 97;
}
}
Example:
public static void main(String[] args) {
//root
TreeNode t = new TreeNode('R');
//for skipping words with "'" in the wordlist
Pattern p = Pattern.compile(".*\\W+.*");
int nw = 0;
try (BufferedReader br = new BufferedReader(new FileReader(
"files/wordsEn.txt")))
{
for (String line; (line = br.readLine()) != null;)
{
if (p.matcher(line).find())
continue;
t.put(line);
nw++;
}
// line is not visible here.
br.close();
System.out.println("number of words : " + nw);
String res = null;
// substring (1) because of the root
res = t.search("vuetsrcanoli".toCharArray()).substring(1);
System.out.println(res.replace("%", ""));
}
catch (Exception e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Output:
number of words : 109563
counterrevolutionaries
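Notes:
The wordlist is taken from here
The reading part is based on another SO question: How to read a large text file line by line using Java?
search() passes the full letters array down at every level, so each input letter can be reused; that is why a 22-letter word is found from 12 input letters. For Countdown, where each letter may be used only once, the used letter would have to be removed before recursing.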

More efficient or more modern? Reading in & Sorting A Text File With Java

I've been trying to upgrade my Java skills to use more of Java 5 & Java 6. I've been playing around with some programming exercises. I was asked to read in a paragraph from a text file and output a sorted (descending) list of words and output the count of each word.
My code is below.
My questions are:
Is my file input routine the most respectful of JVM resources?
Is it possible to cut steps out in regards to reading the file contents and getting the content into a collection that can make a sorted list of words?
Am I using the Collection classes and interface the most efficient way I can?
Thanks much for any opinions. I'm just trying to have some fun and improve my programming skills.
import java.io.*;
import java.util.*;
public class Sort
{
public static void main(String[] args)
{
String sUnsorted = null;
String[] saSplit = null;
int iCurrentWordCount = 1;
String currentword = null;
String pastword = "";
// Read the text file into a string
sUnsorted = readIn("input1.txt");
// Parse the String by white space into String array of single words
saSplit = sUnsorted.split("\\s+");
// Sort the String array in descending order
java.util.Arrays.sort(saSplit, Collections.reverseOrder());
// Count the occurrences of each word in the String array
for (int i = 0; i < saSplit.length; i++ )
{
currentword = saSplit[i];
// If this word was seen before, increase the count & print the
// word to stdout
if ( currentword.equals(pastword) )
{
iCurrentWordCount ++;
System.out.println(currentword);
}
// Output the count of the LAST word to stdout,
// Reset our counter
else if (!currentword.equals(pastword))
{
if ( !pastword.equals("") )
{
System.out.println("Word Count for " + pastword + ": " + iCurrentWordCount);
}
System.out.println(currentword );
iCurrentWordCount = 1;
}
pastword = currentword;
}// end for loop
// Print out the count for the last word processed
System.out.println("Word Count for " + currentword + ": " + iCurrentWordCount);
}// end function main()
// Read The Input File Into A String
public static String readIn(String infile)
{
String result = " ";
try
{
FileInputStream file = new FileInputStream (infile);
DataInputStream in = new DataInputStream (file);
byte[] b = new byte[ in.available() ];
in.readFully (b);
in.close ();
result = new String (b, 0, b.length, "US-ASCII");
}
catch ( Exception e )
{
e.printStackTrace();
}
return result;
}// end function readIn()
}// end class Sort()
/////////////////////////////////////////////////
// Updated Copy 1, Based On The Useful Comments
//////////////////////////////////////////////////
import java.io.*;
import java.util.*;
public class Sort2
{
public static void main(String[] args) throws Exception
{
// Scanner will tokenize on white space, like we need
Scanner scanner = new Scanner(new FileInputStream("input1.txt"));
ArrayList <String> wordlist = new ArrayList<String>();
String currentword = null;
String pastword = null;
int iCurrentWordCount = 1;
while (scanner.hasNext())
wordlist.add(scanner.next() );
// Sort in descending natural order
Collections.sort(wordlist);
Collections.reverse(wordlist);
for ( String temp : wordlist )
{
currentword = temp;
// If this word was seen before, increase the count & print the
// word to stdout
if ( currentword.equals(pastword) )
{
iCurrentWordCount ++;
System.out.println(currentword);
}
// Output the count of the LAST word to stdout,
// Reset our counter
else //if (!currentword.equals(pastword))
{
if ( pastword != null )
System.out.println("Count for " + pastword + ": " +
iCurrentWordCount);
System.out.println(currentword );
iCurrentWordCount = 1;
}
pastword = currentword;
}// end for loop
System.out.println("Count for " + currentword + ": " + iCurrentWordCount);
}// end function main()
}// end class Sort2

There are more idiomatic ways of reading in all the words in a file in Java.
BreakIterator is a better way of reading in words from an input.
Use List<String> instead of Array in almost all cases. Array isn't technically part of the Collection API and isn't as easy to replace implementations as List, Set and Map are.
You should use a Map<String,AtomicInteger> to do your word counting instead of walking the Array over and over. AtomicInteger is mutable unlike Integer so you can just incrementAndGet() in a single operation that just happens to be thread safe. A SortedMap implementation would give you the words in order with their counts as well.
Make as many variables as possible final, even local ones, and declare them right before you use them, not at the top where their intended scope will get lost.
You should almost always use a BufferedReader or BufferedInputStream with an appropriate buffer size (a multiple of your disk block size) when doing disk IO.
That said, don't concern yourself with micro optimizations until you have "correct" behavior.
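The Map<String,AtomicInteger> and SortedMap suggestions can be combined as in the sketch below. It assumes Java 8 or later (for computeIfAbsent and Map.forEach), which is newer than the Java 5/6 the question targets, and it reads from a String instead of input1.txt so that it is self-contained. The class name is made up for illustration.

```java
import java.util.Comparator;
import java.util.Scanner;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names and sample text are assumptions for illustration.
final class SortedWordCount {

    // Counts whitespace-separated words; the map keeps its keys in descending order.
    static SortedMap<String, AtomicInteger> count(String text) {
        SortedMap<String, AtomicInteger> counts =
                new TreeMap<>(Comparator.<String>reverseOrder());
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // Create the counter on first sight of a word, then increment it in place.
                counts.computeIfAbsent(scanner.next(), w -> new AtomicInteger())
                      .incrementAndGet();
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints "pear: 2", "fig: 1" and "apple: 1", one per line, in that order.
        count("pear apple pear fig").forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

Because the map is sorted as words are inserted, the separate sort and the pastword/currentword comparison loop are no longer needed.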

The SortedMap type might be efficient enough memory-wise to use here in the form SortedMap<String,Integer> (especially if the word counts are likely to be under 128, since autoboxing reuses cached Integer objects for values from -128 to 127).
You can provide custom delimiters to the Scanner type for breaking streams.
Depending on how you want to treat the data, you might also want to strip punctuation or go for more advanced word isolation with a break iterator - see the java.text package or the ICU project.
Also - I recommend declaring variables when you first assign them and stop assigning unwanted null values.
To elaborate, you can count words in a map like this:
void increment(Map<String, Integer> wordCountMap, String word) {
Integer count = wordCountMap.get(word);
wordCountMap.put(word, count == null ? 1 : ++count);
}
Due to the immutability of Integer and the behaviour of autoboxing, this might result in excessive object instantiation for large data sets. An alternative would be (as others suggest) to use a mutable int wrapper (of which AtomicInteger is a form.)
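The delimiter suggestion above could look like the following sketch, which makes Scanner treat every run of non-letter characters as a separator so that punctuation is stripped while tokenizing. The class name and the sample text are made up for illustration; the pattern \P{L}+ is one possible choice, not the only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; names and sample text are assumptions for illustration.
final class DelimiterDemo {

    // Tokenizes the text, using any run of non-letter characters as the delimiter.
    static List<String> words(String text) {
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("\\P{L}+")) {
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! Hello.")); // prints [Hello, world, Hello]
    }
}
```

For real text, word isolation with java.text.BreakIterator handles cases such as apostrophes and hyphens that a simple delimiter pattern does not.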

Can you use Guava for your homework assignment? Multiset handles the counting. Specifically, LinkedHashMultiset might be useful.

Some other things you might find interesting:
To read the file you could use a BufferedReader (if it's text only).
This:
for (int i = 0; i < saSplit.length; i++ ){
currentword = saSplit[i];
[...]
}
could be done using an enhanced for loop (Java's for-each).
if ( currentword.equals(pastword) ){
[...]
} else if (!currentword.equals(pastword)) {
[...]
}
In your case, you can simply use a single else so the condition isn't checked again (because if the words aren't the same, they can only be different).
if ( !pastword.equals("") )
I think using length is faster here, although the difference is likely negligible:
if (pastword.length() != 0)

Input method:
Make it easier on yourself and deal directly with characters instead of bytes. For example, you could use a FileReader and possibly wrap it inside a BufferedReader. At the least, I'd suggest looking at InputStreamReader, as the implementation to change from bytes to characters is already done for you. My preference would be using Scanner.
I would prefer returning null or throwing an exception from your readIn() method. Exceptions should not be used for flow control, but, here, you're sending an important message back to the caller: the file that you provided was not valid. Which brings me to another point: consider whether you truly want to catch all exceptions, or just ones of certain types. You'll have to handle all checked exceptions, but you may want to handle them differently.
Collections:
You're not really using the Collections classes; you're using an array. Your implementation seems fine, but...
There are certainly many ways of handling this problem. Your method -- sorting then comparing to last -- is O(nlogn) on average. That's certainly not bad. Look at a way of using a Map implementation (such as HashMap) to store the data you need while only traversing the text in O(n) (HashMap's get() and put() -- and presumably containsKey() -- methods are O(1) on average).
