listing the frequency of words from multiple files - java

I've created a program that will look at a text file in a certain directory and then proceed to list the words in that file.
For example, if my text file contained this:
hello my name is john hello my
The output would show
hello 2
my 2
name 1
is 1
john 1
However, now I want my program to search through multiple text files in a directory and list all the words that occur in all the text files.
Here is my program that will list the words in a single file.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;
public class WordCountstackquestion implements Runnable {
private String filename;
public WordCountstackquestion(String filename) {
this.filename = filename;
}
public void run() {
int count = 0;
try {
HashMap<String, Integer> map = new HashMap<String, Integer>();
Scanner in = new Scanner(new File(filename));
while (in.hasNext()) {
String word = in.next();
if (map.containsKey(word))
map.put(word, map.get(word) + 1);
else {
map.put(word, 1);
}
count++;
}
System.out.println(filename + " : " + count);
for (String word : map.keySet()) {
System.out.println(word + " " + map.get(word));
}
} catch (FileNotFoundException e) {
System.out.println(filename + " was not found.");
}
}
}
My main class.
public class Mainstackquestion
{
public static void main(String args[])
{
if(args.length > 0)
{
for (String filename : args)
{
CheckFile(filename);
}
}
else
{
CheckFile("C:\\Users\\User\\Desktop\\files\\1.txt");
}
}
private static void CheckFile(String file)
{
Runnable tester = new WordCountstackquestion(file);
Thread t = new Thread(tester);
t.start();
}
}
I've made an attempt using some online sources to make a method that will look at multiple files. However I'm struggling and can't seem to implement it correctly in my program.
I would have a worker class for each file.
int count;
@Override
public void run()
{
count = 0;
/* Count the words... */
...
++count;
...
}
Then this method to use them.
public static void main(String args[]) throws InterruptedException
{
WordCount[] counters = new WordCount[args.length];
for (int idx = 0; idx < args.length; ++idx) {
counters[idx] = new WordCount(args[idx]);
counters[idx].start();
}
int total = 0;
for (WordCount counter : counters) {
counter.join();
total += counter.count;
}
System.out.println("Total: " + total);
}

I'm going to assume that all of these files lie in the same directory. You can do it this way:
public void run() {
    // Replace this path with your filename variable
    File f = new File("link/to/folder/here");
    // Check that the path is a directory (always do this if you are going to use listFiles())
    if (f.isDirectory()) {
        // Lists all files in the directory
        // You could also use a regular for loop, but I prefer enhanced for loops
        for (File file : f.listFiles()) {
            // Everything here is your old code, reading the current file ("file") instead of "filename"
            int count = 0;
            // try-with-resources closes each Scanner once its file is done (I didn't see a close() in your code)
            try (Scanner in = new Scanner(file)) {
                HashMap<String, Integer> map = new HashMap<String, Integer>();
                while (in.hasNext()) {
                    String word = in.next();
                    if (map.containsKey(word))
                        map.put(word, map.get(word) + 1);
                    else {
                        map.put(word, 1);
                    }
                    count++;
                }
                System.out.println(file + " : " + count);
                for (String word : map.keySet()) {
                    System.out.println(word + " " + map.get(word));
                }
            } catch (FileNotFoundException e) {
                System.out.println(file + " was not found.");
            }
        }
    }
}
If you want to use a regular for loop rather than an enhanced for loop (for compatibility purposes), see the link shared in the comments.
Otherwise, you can keep reading file names from user input, add them to an ArrayList (or whatever List type suits your needs), then loop through the list and create the "File f" variable inside the loop, something like this:
for(String s : arraylist){
File f = new File(s);
}
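If the goal is one combined list of counts across all files rather than a separate list per file, the map can be shared between files. The following is only a sketch under some assumptions: the class name `DirectoryWordCount`, its two helper methods, and the `.txt` filter are made up for illustration, and it runs on a single thread instead of one thread per file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class DirectoryWordCount {

    // Adds the word counts of one file to the shared totals map.
    static void countWords(File file, Map<String, Integer> totals) throws FileNotFoundException {
        // try-with-resources closes the Scanner even if reading fails
        try (Scanner in = new Scanner(file)) {
            while (in.hasNext()) {
                // merge() inserts 1 for a new word, otherwise adds 1 to the old count
                totals.merge(in.next(), 1, Integer::sum);
            }
        }
    }

    // Counts words across every .txt file directly inside the given directory.
    static Map<String, Integer> countDirectory(File dir) throws FileNotFoundException {
        Map<String, Integer> totals = new TreeMap<>(); // sorted by word for stable output
        File[] files = dir.listFiles();                // null if dir is not a readable directory
        if (files == null) {
            return totals;
        }
        for (File file : files) {
            // Assumption: only files ending in ".txt" count as text files
            if (file.isFile() && file.getName().endsWith(".txt")) {
                countWords(file, totals);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Hypothetical default: the current directory, unless a path is passed as the first argument
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Integer> e : countDirectory(dir).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

If you keep one thread per file, a plain HashMap or TreeMap shared like this would not be safe; each thread would need its own map that is merged after join(), or a concurrent map.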

Related

how to find most repetitive word in a text file

The code :
import java.io.File;
import java.util.Scanner;
class Main {
public static void main(String[] args) throws Exception{
//code
int max = 0;
int count = 0;
String rep_word = "none";
File myfile = new File("rough.txt");
Scanner reader = new Scanner(myfile);
Scanner sub_reader = new Scanner(myfile);
while (reader.hasNextLine()) {
String each_word = reader.next();
while (sub_reader.hasNextLine()){
String check = sub_reader.next();
if (check == each_word){
count+=1;
}
}
if (max<count){
max = count;
rep_word = each_word;
}
}
System.out.println(rep_word);
reader.close();
sub_reader.close();
}
}
the rough.txt file :
I want to return the most repetitive word from the text file without using arrays.
I'm not getting the desired output. I found that the if condition is never satisfied, even when the variables 'check' and 'each_word' hold the same word. I don't understand where I went wrong.
You should use a HashMap to count the frequency of each word quickly and efficiently, without repeatedly re-reading the input file with two readers.
The Map::merge method does this. It also returns the updated frequency of the word, so the maximum frequency can be tracked as you go.
int max = 0;
int count = 0;
String rep_word = "none";
// use LinkedHashMap to maintain insertion order
Map<String, Integer> freqMap = new LinkedHashMap<>();
// use try-with-resources to automatically close scanner
try (Scanner reader = new Scanner(new File("rough.txt"))) {
while (reader.hasNext()) {
String word = reader.next();
count = freqMap.merge(word, 1, Integer::sum);
if (count > max) {
max = count;
rep_word = word;
}
}
}
System.out.println(rep_word + " repeated " + max + " times");
If several words share the same maximum frequency, it is easy to find all of them in the map:
for (Map.Entry<String, Integer> entry : freqMap.entrySet()) {
if (max == entry.getValue()) {
System.out.println(entry.getKey() + " repeated " + max + " times");
}
}
You could use a HashMap to store the text as key-value pairs: the key is a word and the value is its number of occurrences. Then get the key with the maximum value.
Something like the following :
class Main {
public static void main(String[] args) throws Exception{
Map<String, Integer> map = new HashMap<>();
File myfile = new File("/rough.txt");
Scanner reader = new Scanner(myfile);
while (reader.hasNextLine()) {
Scanner sub_reader = new Scanner(reader.nextLine());
while (sub_reader.hasNext()){
String word = sub_reader.next();
// if the word already exist increment the counter
if(map.containsKey(word)) map.put(word, map.get(word) + 1);
else map.put(word, 1);
}
sub_reader.close();
}
// get the key of the max value in the hashmap (java 8 and higher)
String mostRepeated = map.entrySet().stream().max(Comparator.comparing(Map.Entry::getValue)).get().getKey();
System.out.println(mostRepeated);
reader.close();
}
}
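As for why the original if statement never fires: in Java, == on two String variables compares object references, not the characters, and each call to Scanner.next() generally returns a new String object. A minimal sketch of the difference (the class and method names here are made up for illustration):

```java
class StringEqualityDemo {

    // True only when both variables point at the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // True when both strings contain the same characters.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Two distinct objects holding the same text, much like two tokens read from a file
        String check = new String("hello");
        String eachWord = new String("hello");

        System.out.println(sameReference(check, eachWord)); // false
        System.out.println(sameText(check, eachWord));      // true
    }
}
```

Switching to equals() alone would probably not be enough in the posted code: sub_reader is never reset, so it is exhausted after the first outer iteration, and count is never set back to 0 between words. The map-based answers above avoid both problems.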

Write a program NumberCount that counts the numbers (including integers and floating point values) in one or more text files. (Due today lol)

INSTRUCTIONS:
Write a program NumberCount that counts the numbers (including integers and floating point values) in
one or more text files. Note that only numbers separated by whitespace characters are counted, i.e., only
those numbers that can be read by either readInt() or readDouble() are considered.
So I've been trying to get this program to read text files (the title is pretty much the instructions), but it does not want to read the text files I have in the project folder. I tried moving them several times, but wherever I put them they didn't load. This is my code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
public class NumberCount implements Runnable {
private static int combinedCount = 0;
public static synchronized int getCombinedCount() {
return combinedCount;
}
public synchronized void setCombinedCount( int combinedCount) {
this.combinedCount = combinedCount;
}
String filename;
NumberCount(String filename) {
this.filename = filename;
}
NumberCount() {
}
@Override
public void run() {
String fileText = this.getTextFromFile(filename);
System.out.println(filename + ": " + countNumbers(fileText));
setCombinedCount(getCombinedCount() + countNumbers(fileText));
}
String getTextFromFile(String filename) {
try {
String data = "";
BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
String st;
while ((st = br.readLine()) !=null) {
data += "\n" + st;
}
data = data.replaceAll("\n", " ");
return data;
} catch(Exception e) {
System.out.println("Unable to retrieve text from file: " + filename );
return "";
}
}
int countNumbers(String text) {
int count = 0;
String words[] = text.split(" ");
for (String word : words) {
try {
Integer.parseInt(word);
Float.parseFloat(word);
count++;
} catch (Exception e) {
}
}
return count;
}
String helpMessage() {
String data = "Please call as NumberCount <list of file names>\n";
data += "File names to be present in the same directory";
return data;
}
public static void main(String[] args) {
if (args.length == 0) {
System.out.println(new NumberCount().helpMessage());
} else {
Thread threads[] = new Thread[args.length];
int i = 0;
for (String filename : args) {
NumberCount nc = new NumberCount(filename);
threads[i] = new Thread(nc);
threads[i++].start();
}
try {
for(Thread t : threads) {
t.join();
}
}catch(Exception e) {
e.printStackTrace();
}
System.out.println("combined count: " + getCombinedCount());
}
}
}
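Two things may be worth checking here. Relative file names are resolved against the JVM's working directory, which in an IDE is often the project root rather than the source folder, so printing that directory shows where the files are expected. Also, countNumbers in the posted code calls Integer.parseInt before Float.parseFloat, so a token like 2.5 throws on the first call and is never counted. The following single-threaded sketch illustrates both points; the class name `NumberCountSketch` is made up, and treating "anything Double.parseDouble accepts" as a number is an assumption (it also accepts forms such as NaN or 1e5).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class NumberCountSketch {

    // Counts whitespace-separated tokens that parse as a number.
    // Integers also parse as doubles, so a single parse attempt covers both cases.
    static int countNumbers(Scanner in) {
        int count = 0;
        while (in.hasNext()) {
            String token = in.next();
            try {
                Double.parseDouble(token);
                count++;
            } catch (NumberFormatException e) {
                // not a number: skip this token
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Relative file names are looked up from this directory
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        int combined = 0;
        for (String name : args) {
            File file = new File(name);
            try (Scanner in = new Scanner(file)) {
                int n = countNumbers(in);
                System.out.println(name + ": " + n);
                combined += n;
            } catch (FileNotFoundException e) {
                // The absolute path shows exactly where the program looked
                System.out.println("Not found: " + file.getAbsolutePath());
            }
        }
        System.out.println("combined count: " + combined);
    }
}
```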

Reading input files in Java

The purpose of this program is to read an input file and parse it looking for words. I used a class and instantiated objects to hold each unique word along with a count of that word as found in the input file. For instance, for a sentence “Word” is found once, “are” is found once, “fun” is found twice, ... This program ignores numeric data (e.g. 0, 1, ...) as well as punctuation (things like . , ; : - )
The assignment does not allow using a fixed size array to hold word strings or counts. The program should work regardless of the size of the input file.
I am getting the following compile error:
'<>' operator is not allowed for source level below 1.7 [line: 9]
import java.io.*;
import java.util.*;
public class Test {
public static void main(String args[]) throws IOException {
HashMap<String,Word> map = new HashMap<>();
// The name of the file to open.
String fileName = "song.txt";
// This will reference one line at a time
String line = null;
try {
// FileReader reads text files in the default encoding.
FileReader fileReader =
new FileReader(fileName);
// Always wrap FileReader in BufferedReader.
BufferedReader bufferedReader =
new BufferedReader(fileReader);
while((line = bufferedReader.readLine()) != null) {
String[] words = line.split(" ");
for(String word : words){
if(map.containsKey(word)){
Word w = map.get(word);
w.setCount(w.getCount()+1);
}else {
Word w = new Word(word, 1);
map.put(word,w);
}
}
}
// Always close files.
bufferedReader.close();
}
catch(FileNotFoundException ex) {
System.out.println(
"Unable to open file '" +
fileName + "'");
}
catch(IOException ex) {
System.out.println(
"Error reading file '"
+ fileName + "'");
// Or we could just do this:
// ex.printStackTrace();
}
for(Map.Entry<String,Word> entry : map.entrySet()){
System.out.println(entry.getValue().getWord());
System.out.println("count:"+entry.getValue().getCount());
}
}
static class Word{
public Word(String word, int count) {
this.word = word;
this.count = count;
}
String word;
int count;
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
}
}
You either need to compile at source level 1.7 or later (which requires JDK 7 or newer, where the diamond operator was introduced), or change the line:
HashMap<String,Word> map = new HashMap<>();
to
HashMap<String,Word> map = new HashMap<String,Word>();
Replace
HashMap<String,Word> map = new HashMap<>();
with:
HashMap<String,Word> map = new HashMap<String,Word>();
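Separately from the compile error, the question says the program should ignore numbers and punctuation, which `line.split(" ")` does not do on its own. One possible approach, shown here only as a sketch, is to split on every run of non-letter characters. The class name `LetterOnlyWordCount` is made up, the code keeps upper and lower case distinct, and it splits contractions such as "don't" into two tokens. It is written without the diamond operator so it should also compile at a source level below 1.7.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LetterOnlyWordCount {

    // Counts words in the text, treating digits, punctuation and whitespace as separators.
    static Map<String, Integer> countWords(String text) {
        // Explicit type arguments instead of <> keep this valid before Java 7
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        // "[^\\p{L}]+" matches one or more characters that are not letters
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.length() == 0) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            Integer old = counts.get(token);
            counts.put(token, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("Words are fun, fun! 1 2 3");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " count:" + e.getValue());
        }
    }
}
```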

How do I load a text file into this program?

How do I go about loading a text file into the Java program I have posted below? I have tried but am out of luck. Any help will be appreciated!
Thank you.
import java.io.*;
public class test1 {
public static void main(String args[]) throws Exception {
if (args.length != 1) {
System.out.println("usage: Tut16_ReadText filename");
System.exit(0);
}
try {
FileReader infile = new FileReader(args[0]);
BufferedReader inbuf = new BufferedReader(infile);
String str;
int totalwords = 0, totalchar = 0;
while ((str = inbuf.readLine()) != null) {
String words[] = str.split(" ");
totalwords += words.length;
for (int j = 0; j < words.length; j++) {
totalchar += words[j].length();
}
}
double density = (1.0 * totalchar) / totalwords;
if (totalchar > 0) {
System.out.print(args[0] + " : " + density + " : ");
if (density > 6.0)
System.out.println("heavy");
else
System.out.println("light");
} else
System.out.println("This is an error - denisty of zero.");
infile.close();
} catch (Exception ee) {
System.out.println("This is an error - execution caught.");
}
}
}
If you are running Java 8, it is a breeze with Files.lines and the Stream API. The advantage is that, for a large file, the whole text is not read into memory at once.
public void ReadFile(String filePath){
File txtFile = new File(filePath);
if (txtFile.exists()) {
System.out.println("reading file");
try (Stream<String> filtered = Files.
lines(txtFile.toPath()).
filter(s -> s.contains("2006]"))) {//you can leave this out, but is handy to do some pre filtering
filtered.forEach(s -> handleLine(s));
}
} else {
System.out.println("file not found");
}
}
private void handleLine(String lineText) {
System.out.println(lineText);
}
First of all, there is an easier way to read files. From Java 7 the Files and Paths classes can be used like this:
public static void main(String[] args) throws IOException {
if (args.length != 1) {
System.out.println("usage: Tut16_ReadText filename");
System.exit(0);
}
final List<String> lines = Files.readAllLines(Paths.get(args[0]));
for (String line : lines) {
// Do stuff...
}
// More stuff
}
Then, in order to start the program and get it to read a file that you specify you must provide an argument when starting the app. You pass that argument after the class name on the command prompt like this:
$ java Tut16_ReadText /some/path/someFile.txt
This passes "/some/path/someFile.txt" to the program and then the program will try to read that file.
Another method is to use a Scanner.
Scanner s = new Scanner(new File(args[0]));
while(s.hasNext()){..}
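Putting the pieces together, here is a sketch of how the density calculation from the question might look with Files.readAllLines. The class name `DensitySketch` and the density helper are made up for illustration. The sketch splits on runs of whitespace rather than single spaces, and it returns 0 for input with no words instead of dividing by zero.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class DensitySketch {

    // Average number of characters per word; 0.0 when there are no words at all.
    static double density(List<String> lines) {
        int totalWords = 0;
        int totalChars = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank line: contributes no words
            }
            // "\\s+" treats any run of spaces or tabs as one separator
            for (String word : trimmed.split("\\s+")) {
                totalWords++;
                totalChars += word.length();
            }
        }
        return totalWords == 0 ? 0.0 : (double) totalChars / totalWords;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: DensitySketch filename");
            return;
        }
        Path path = Paths.get(args[0]);
        // Printing the absolute path helps when a relative name is not found
        System.out.println("Reading " + path.toAbsolutePath());
        double d = density(Files.readAllLines(path));
        System.out.println(args[0] + " : " + d + " : " + (d > 6.0 ? "heavy" : "light"));
    }
}
```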

Listing and counting files by their extensions

So far I have code like this:
import java.io.File;
import java.util.Scanner;
public class Test {
static Scanner input = new Scanner(System.in);
public static void fileListing(File[] files, int depth) {
if(depth == 0)
return;
else {
for(File file: files) {
if(file.isDirectory())
fileListing(file.listFiles(), depth-1);
else {
String ext;
String fileName = file.getName();
if(fileName.lastIndexOf(".") != -1 && fileName.lastIndexOf(".") != 0)
ext = fileName.substring(fileName.lastIndexOf(".")+1);
else
return;
System.out.println(ext);
}
}
}
}
public static void main(String [] args) {
System.out.printf("Path: ");
String path = input.nextLine();
if(new File(path).isDirectory()) {
System.out.printf("Depth: ");
int depth = input.nextInt();
File[] file = new File(path).listFiles();
fileListing(file, depth);
}
else {
System.out.printf("The path %s isn't valid.", path);
System.exit(0);
}
}
}
My output lists the files' extensions in a certain directory, e.g.
txt
txt
doc
How can I improve this code to show each extension with a count? For the example above, the output should look like this:
2 txt
1 doc
You can use a Map for it. The code would be:
Map<String,Integer> countExt = new HashMap<String,Integer>();
// Start from here inside your if statement
ext = fileName.substring(fileName.lastIndexOf(".")+1);
// If the extension is already in the map
if(countExt.containsKey(ext)){
    Integer count = countExt.get(ext);
    count++;
    // put() replaces the old value, so no remove() is needed
    countExt.put(ext,count);
}
// If the extension is new
else
    countExt.put(ext,1);
// For display
Set<String> keySet = countExt.keySet();
for(String key : keySet){
    System.out.println(key +" : "+countExt.get(key));
}
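For completeness, here is a sketch of how the map could be threaded through the recursive listing from the question. The class name `ExtensionCounter` and its methods are made up for illustration, and Map.merge requires Java 8 or later. Unlike the question's code, a file without an extension is skipped instead of ending the whole loop with return.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

class ExtensionCounter {

    // Returns the text after the last dot, or null when there is no dot
    // or the name starts with its only dot (for example ".gitignore").
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(dot + 1) : null;
    }

    // Walks the files up to the given depth and adds each extension to counts.
    static void count(File[] files, int depth, Map<String, Integer> counts) {
        if (depth == 0 || files == null) {
            return; // depth used up, or listFiles() could not read the directory
        }
        for (File file : files) {
            if (file.isDirectory()) {
                count(file.listFiles(), depth - 1, counts);
            } else {
                String ext = extensionOf(file.getName());
                if (ext != null) {
                    counts.merge(ext, 1, Integer::sum);
                }
            }
        }
    }

    // Convenience wrapper: depth 1 means only the files directly inside dir.
    static Map<String, Integer> countExtensions(File dir, int depth) {
        Map<String, Integer> counts = new TreeMap<>();
        count(dir.listFiles(), depth, counts);
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical defaults: current directory, depth 1
        File dir = new File(args.length > 0 ? args[0] : ".");
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        for (Map.Entry<String, Integer> e : countExtensions(dir, depth).entrySet()) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
    }
}
```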
