Print Total Number of Different Words (case sensitive) from a file


Edit: updated after reviewing Tormod's answer and implementing his advice.
As the title states, I'm attempting to print the total number of different words in a file whose name is given as a command-line argument. I receive the following message when I compile the program:
Note: Project.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Here is my code. Any help is greatly appreciated:
import java.lang.*;
import java.util.*;
import java.io.*;
public class Project {
public static void main(String[] args) throws IOException {
File file = new File(args[0]);
Scanner s = new Scanner(file);
HashSet lib = new HashSet<>();
try (Scanner sc = new Scanner(new FileInputStream(file))) {
int count = 0;
while(sc.hasNext()) {
sc.next();
count++;
}
System.out.println("The total number of word in the file is: " + count);
}
while (s.hasNext()) {
String data = s.nextLine();
String[] pieces = data.split("\\s+");
for (int count = 0; count < pieces.length; count++)
{
if(!lib.contains(pieces[count])) {
lib.add(pieces[count]);
}
}
}
System.out.print(lib.size());
}
}

I would implement it using a HashSet: add all the words, then read out the size. If you want to make it case insensitive, just convert every word to upper case (or lower case) before adding it. This uses some memory, but...
One problem with your algorithm is that you only have one "words" array, and it only holds the words of the current line. So you only detect repeated words within the same line.
A HashSet locates strings by their hash value and stores each distinct word only once.
Construction: HashSet<String> lib = new HashSet<>();
Inside the loop: if(!lib.contains(word)){lib.add(word);} (the contains() check is optional, since add() ignores duplicates)
Check the word count: lib.size()
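The HashSet approach described above can be sketched as follows. This is a minimal illustration, not the asker's final program: the class name DistinctWordCount and the method countDistinct are made up for the example, and a literal string stands in for the file named on the command line. Declaring the set as Set<String> rather than a raw HashSet is also what avoids the "unchecked or unsafe operations" note from the compiler.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class DistinctWordCount {

    // Counts distinct whitespace-separated tokens; comparison is case sensitive.
    static int countDistinct(Scanner in) {
        Set<String> seen = new HashSet<>(); // typed set, so no unchecked warning
        while (in.hasNext()) {
            seen.add(in.next()); // add() ignores words that are already present
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // A literal string stands in for new Scanner(new File(args[0])).
        try (Scanner in = new Scanner("the cat The cat sat")) {
            System.out.println(countDistinct(in)); // prints 4
        }
    }
}
```

Because the set lives outside the reading loop, repeated words are detected across the whole input and not just within one line.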

for(String s : words) {
if(s.equals(word))
count++;
}
You are comparing the words to an empty String; since each one is a word, the comparison is always going to be false.
Like Tormod said, the best would be to store the words in a HashSet, as it won't keep duplicates. Then just read out its size.

Related

Java issues with Scanner and hasNextLine() while reading a file

I am having an issue with this unfinished program. I do not understand why this returns a "no Line found exception" when I run it. I have a while loop set up whose purpose is checking for this but I have done something wrong. I am trying to store information from a file into a 2d array for class.
import java.util.Scanner;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.File;
import java.util.Arrays;
public class LabProgram {
public static void main(String[] args) throws IOException {
Scanner scnr = new Scanner(System.in);
int NUM_CHARACTERS = 26; // Maximum number of letters
int MAX_WORDS = 10; // Maximum number of synonyms per starting letter
String userWord = (scnr.next()) + ".txt"; //Get word user wants to search
char userChar = scnr.next().charAt(0); //Get char user wants to search
String[][] synonyms = new String[NUM_CHARACTERS][MAX_WORDS]; // Declare 2D array for all synonyms
String[] words = new String[MAX_WORDS]; // The words of each input line
File aFile = new File(userWord);
Scanner inFile = new Scanner(aFile);
while(inFile.hasNextLine()) {
for(int i = 0; i < synonyms.length; i++) {
words = inFile.nextLine().trim().split(" ");
for(int wordCount = 0; wordCount < words.length; wordCount++) {
synonyms[i][wordCount] = words[wordCount];
}
}
}
}
}

The issue is with this for loop:
for (int i = 0; i < synonyms.length; i++) {
words = inFile.nextLine().trim().split(" ");
....
}
You're iterating from i = 0 up to synonyms.length - 1, but the file does not have that many lines. As soon as the file runs out of lines while the for loop still has iterations left, inFile.nextLine() finds no line and throws the exception.
I don't know exactly what you want to achieve with this code, but this is what is causing the trouble.
Hope that answers your query.

Basically your problem is that you're only checking hasNextLine() before the for loop starts, and you're actually getting the next line on every iteration of the loop. So if you run out of lines while in the middle of your for loop an exception is thrown.
I'm actually not quite sure what your code is supposed to do but at the very least you need to add a hasNextLine() check every time before you actually run nextLine() to avoid errors like this.
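One way to apply that advice is to fold the hasNextLine() check into the for loop condition, so it runs before every nextLine() call. The sketch below assumes each input line should fill one row of the 2D array, which is a guess at the intent; SynonymReader and readRows are hypothetical names, and a literal string stands in for the file.

```java
import java.util.Scanner;

// Hypothetical class name, used only for this sketch.
class SynonymReader {

    // Reads at most maxRows lines; each line contributes at most maxCols words.
    static String[][] readRows(Scanner in, int maxRows, int maxCols) {
        String[][] rows = new String[maxRows][maxCols];
        // Both conditions are evaluated before every nextLine() call.
        for (int i = 0; i < maxRows && in.hasNextLine(); i++) {
            String[] words = in.nextLine().trim().split("\\s+");
            for (int w = 0; w < words.length && w < maxCols; w++) {
                rows[i][w] = words[w];
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two lines of input but room for 26 rows: the loop simply stops early.
        Scanner in = new Scanner("able apt\nbig bold brave\n");
        String[][] rows = readRows(in, 26, 10);
        System.out.println(rows[1][2]); // prints brave
    }
}
```

Rows that never receive a line stay filled with null, so later code has to allow for that.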

Why is sequential file naming not working?

I am trying to solve this question.
Problem Statement
You are developing a File Manager but encountered a problem. You realised that two files cannot have the same name, and if a conflict arises, the file which came later has to be appended with a number N such that N is the smallest positive number that is not used with that particular file name. The number is appended in the form file_name(N). Write code to solve your problem. You will be given an array of strings of file names. You need to assume that if a file name appears earlier in the array, it was created first.
NOTE: file_name and file_name(2) are two different file names, i.e. if a file name already has a number appended to it, it's a different file name.
Input
The first line contains N, the number of strings.
The next line contains N space-separated strings (file names).
Output
Print the names of files, after making the necessary changes separated by space.
Constraints
1 ≤ N ≤ 50
1 ≤ file_name.length ≤ 25
filename has no white space characters
Sample Input
7
file sample sample file file file(1) file(1)
Sample Output
file sample sample(1) file(1) file(2) file(1)(1) file(1)(2)
Below is my code. When I tested it with my own file names, it renamed them correctly, but when I submit it, the tests fail. I would like to know what's wrong with my code and why it's not working.
import java.util.Scanner;
public class Dcoder {
public static void main (String[] args) {
Scanner scanner = new Scanner (System.in);
// Read number of file names and create
// an array to hold them
String[] fileNames = new String[scanner.nextInt ()];
// Fill the array with the supplied names
// from System.in
for (int i = 0; i < fileNames.length; i++)
fileNames [i] = scanner.next ();
// Modify the file names
for (String fileName : fileNames) {
int count = 0;
for (int i = 0; i < fileNames.length; i++)
if (fileName.equals (fileNames [i])) {
fileNames [i] = fileNames [i] + (count == 0 ? "" : "(" + count + ")");
count++;
}
}
// Print out the modified list of file names
for (String fileName : fileNames)
System.out.print (" " + fileName);
}
}

If all tests fail, then it is likely because your output has a space before the first name.
The output should be the file name, space-separated, not space-prefixed.
If you try input "file file(1) file file", your code outputs
file file(1) file(1)(1) file(2)
but correct output is
file file(1) file(2) file(3)
For better performance, you should use a Set.
static void printUnique(String... fileNames) {
Set<String> used = new HashSet<>();
for (int i = 0; i < fileNames.length; i++) {
String newName = fileNames[i];
for (int j = 1; ! used.add(newName); j++)
newName = fileNames[i] + "(" + j + ")";
if (i != 0)
System.out.print(" ");
System.out.print(newName);
}
System.out.println();
}
Test
printUnique("file", "sample", "sample", "file", "file", "file(1)", "file(1)");
printUnique("file", "file(1)", "file", "file");
Output
file sample sample(1) file(1) file(2) file(1)(1) file(1)(2)
file file(1) file(2) file(3)

Your solution is a procedural approach to the Problem.
Procedural approaches are not bad on their own.
But Java is an Object Oriented programming language and if you want to become a good Java programmer you should start looking for more OO-like solutions.
But OOP doesn't mean to "split up" code into random classes.
The ultimate goal of OOP is to reduce code duplication, improve readability and support reuse as well as extending the code.
Doing OOP means that you follow certain principles which are (among others):
information hiding / encapsulation
single responsibility
separation of concerns
KISS (Keep it simple (and) stupid.)
DRY (Don't repeat yourself.)
"Tell! Don't ask."
Law of demeter ("Don't talk to strangers!")
So what could a more OO-like approach look like?
The underlying question of that problem is: "How often does a specific file name appear in the input?" We want to find an association between Strings (file names) and integer values (number of occurrences). This could be represented as a Map<String,Integer>. The whole logic is as simple as checking whether the current fileName already exists in the output and, if so, adding the counter suffix. This means we need another Collection to hold the output.
My Solution would look like this:
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
public class FileNameCounter {
public List<String> renameDoubledFiles(List<String> input) {
Map<String, Integer> occurrencesOfNames = new HashMap<>();
LinkedList<String> output = new LinkedList<>();
for (String fileName : input) {
if (output.contains(fileName)) {
Integer counter = updateCountFor(fileName, occurrencesOfNames);
String suffixedName = appendCounterSuffix(fileName, counter);
output.add(suffixedName);
} else {
output.add(fileName);
}
}
return output;
}
private Integer updateCountFor(String fileName, Map<String, Integer> occurrencesOfNames) {
Integer counter = occurrencesOfNames.getOrDefault(fileName, Integer.valueOf(0));
occurrencesOfNames.put(fileName, ++counter);
return counter;
}
private String appendCounterSuffix(String fileName, Integer counter) {
return String.format("%s(%d)", fileName, counter);
}
}
and here is the JUnit test to prove that it works:
import static org.junit.jupiter.api.Assertions.*;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.junit.jupiter.api.Test;
class FileNameCounterTest {
@Test
void test() {
List<String> input = Arrays.asList("file sample sample file file file(1) file(1)".split(" "));
List<String> renamedDoubledFiles = new FileNameCounter().renameDoubledFiles(input);
String output = renamedDoubledFiles.stream().collect(Collectors.joining(" "));
assertEquals("file sample sample(1) file(1) file(2) file(1)(1) file(1)(2)", output);
}
}

How to take integer and remove other data types from the file java?

I do not know how to take the integer and ignore the strings from the file using scanner. This is what I have so far. I need to know how to read the file token by token. Yes, this is a homework problem. Thank you so much.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ClientMergeAndSort{
public static void main(String[] args){
int length = 13;
try{
Scanner input = new Scanner(System.in);
System.out.print("Enter the file name with extention : ");
File file = new File(input.nextLine());
input = new Scanner(file);
while (!input.hasNextInt()) {
input.next();
}
int[] arraylist = new int[length];
for(int i =0; i < length; i++){
length++;
arraylist[i] = input.nextInt();
System.out.print(arraylist[i] + " ");
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}

Take a look at the API for what you're doing.
http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#hasNextInt()
Specifically, Scanner.hasNextInt().
"Returns true if the next token in this scanner's input can be interpreted as an int value in the default radix using the nextInt() method. The scanner does not advance past any input."
So, your code:
while (!input.hasNextInt()) {
input.next();
}
That's going to look and see if input hasNextInt().
So if the next token (a run of characters up to the next whitespace, not a single character) is an int, the condition is false and the loop is skipped.
If the next token isn't an int, it goes into the loop... and moves on to the next token.
That's going to either:
- find the first number in the input, and stop.
- go to the end of the input without finding any numbers, and then throw a NoSuchElementException when next() is called with nothing left to read.
Write down in words what you want to do here.
Use the API docs to figure out how the hell to tell the computer that. :) Get one bit at a time right; this has several different parts, and the first one doesn't work yet.
Example: just get it to read a file, and display each line first. That lets you do debugging; it lets you build one thing at a time, and once you know that thing works, you build one more part on it.
Read the file first. Then display it as you read it, so you know it works.
Then worry about if it has numbers or not.
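Once reading the file works, the token-by-token part might look like the sketch below. It assumes the goal is to keep every token that Scanner can read as an int and to discard the rest; IntTokenFilter and readInts are hypothetical names, and a literal string stands in for the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name, used only for this sketch.
class IntTokenFilter {

    // Collects every token that parses as an int and skips everything else.
    static List<Integer> readInts(Scanner in) {
        List<Integer> numbers = new ArrayList<>();
        while (in.hasNext()) {              // is there any token left at all?
            if (in.hasNextInt()) {
                numbers.add(in.nextInt());  // consume the int
            } else {
                in.next();                  // consume and discard a non-int token
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // A literal string stands in for new Scanner(file).
        Scanner in = new Scanner("apple 12 pear 7 3.5 x9 40");
        System.out.println(readInts(in)); // prints [12, 7, 40]
    }
}
```

Using a List avoids having to know the number of integers in advance, which the fixed-length array in the question requires.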

An easy way to do this is to read all the data from the file in whatever way you prefer (line by line, for example). If you need tokens, use String.split (see the Java docs) or StringTokenizer on each line inside a loop, with a specific delimiter (a space, for example). Once you have the tokens, you can do whatever you need with them. I hope this helps you resolve it; if you have questions, just ask.
Happy programming.

import static java.nio.file.Files.readAllBytes;
import static java.nio.file.Paths.get;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String args[]) throws IOException {
String newStr=new String(readAllBytes(get("data.txt")));
Pattern p = Pattern.compile("-?\\d+");
Matcher m = p.matcher(newStr);
while (m.find()) {
System.out.println("- "+m.group());
}
}
}
This code will read the file, and the regular expression then extracts only the integer values.
Note: This code works in Java 8

I think this will work for your requirement.
Before reading the data, first write some content to the file (for example with Scanner and FileWriter), then run the code snippet below.
File file = new File(your filepath);
List<Integer> list = new ArrayList<Integer>();
try {
BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
String str =null;
while(true) {
str = bufferedReader.readLine();
if(str!=null) {
System.out.println(str);
char[] chars = str.toCharArray();
String finalInt = "";
for(int i=0;i<chars.length;i++) {
if(Character.isDigit(chars[i])) {
finalInt=finalInt+chars[i];
}
}
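// Skip lines without digits; parseInt("") would throw NumberFormatException
if(!finalInt.isEmpty()) {
list.add(Integer.parseInt(finalInt));
}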
System.out.println(list.size());
System.out.println(list);
} else {
break;
}
}
}catch (Exception e) {
// TODO: handle exception
e.printStackTrace();
}
The final println statement displays the list of integers collected so far, one per line of the file (all the digits on a line are joined into a single number).
Thanks

ArrayList: Get length of longest string, Get average length of string

In Java, I have a method that reads in a text file that has all the words in the dictionary, each on their own line.
It reads each line by using a for loop and adds each word to an ArrayList.
I want to get the length of the longest word (String) in the Array. In addition, I want to get the length of the longest word in the dictionary file. It would probably be easier to split this into several methods, but I don't know the syntax.
So far, the code I have is:
public class spellCheck {
static ArrayList <String> dictionary; //the dictonary file
/**
* load file
* @param fileName the file containing the dictionary
* @throws FileNotFoundException
*/
public static void loadDictionary(String fileName) throws FileNotFoundException {
Scanner in = new Scanner(new File(fileName));
while (in.hasNext())
{
for(int i = 0; i < fileName.length(); ++i)
{
String dictionaryword = in.nextLine();
dictionary.add(dictionaryword);
}
}

Assuming that each word is on its own line, you should be reading the file more like...
try (Scanner in = new Scanner(new File(fileName))) {
while (in.hasNextLine()) {
String dictionaryword = in.nextLine();
dictionary.add(dictionaryword);
}
}
Remember, if you open a resource, you are responsible for closing it. See The try-with-resources Statement for more details...
Calculating the metrics can be done after reading the file, but since you're here, you could do something like...
int totalWordLength = 0;
String longest = "";
while (in.hasNextLine()) {
String dictionaryword = in.nextLine();
totalWordLength += dictionaryword.length();
dictionary.add(dictionaryword);
if (dictionaryword.length() > longest.length()) {
longest = dictionaryword;
}
}
int averageLength = Math.round(totalWordLength / (float)dictionary.size());
But you could just as easily loop through the dictionary and use the same idea
(nb- I've used local variables, so you will either want to make them class fields or return them wrapped in some kind of "metrics" class - your choice)

Set up two counters and a variable that holds the longest word found so far before you start reading with your while loop. To find the average, increment one counter by one each time a line is read, and have the second counter add up the number of characters in each word. The total number of characters divided by the total number of words read (which equals the number of lines) is the average word length.
As for the longest word, initialise it to the empty string or some dummy value like a single character. Each time you read a line, compare the current word with the longest word found so far (using the .length() method on the String), and if it's longer, make it the new longest word.
Also, if you have all this in a file, I'd use a BufferedReader to read in your input data.
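The counters described above could be split into two small methods that work on the already loaded list. This is only a sketch under that assumption: WordStats, longest and averageLength are hypothetical names, and a short hard-coded list stands in for the dictionary file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class WordStats {

    // Returns the longest word; the first one wins on ties, "" for an empty list.
    static String longest(List<String> words) {
        String longest = "";
        for (String w : words) {
            if (w.length() > longest.length()) {
                longest = w;
            }
        }
        return longest;
    }

    // Total number of characters divided by the number of words.
    static double averageLength(List<String> words) {
        if (words.isEmpty()) {
            return 0.0; // avoid dividing by zero
        }
        long total = 0;
        for (String w : words) {
            total += w.length();
        }
        return (double) total / words.size();
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("a", "tree", "banana", "ox");
        System.out.println(longest(dictionary).length()); // prints 6
        System.out.println(averageLength(dictionary));    // prints 3.25
    }
}
```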

Maybe this could help:
String words = "Rookie never dissappoints, dont trust any Rookie";
// read your file to string if you get string while reading then you can use below code to do that.
String ss[] = words.split(" ");
List<String> list = Arrays.asList(ss);
Map<Integer,String> set = new Hashtable<Integer,String>();
int i =0;
for(String str : list)
{
set.put(str.length(), str);
System.out.println(list.get(i));
i++;
}
Set<Integer> keys = set.keySet();
System.out.println(keys);
System.out.println(set);
Object j[]= keys.toArray();
Arrays.sort(j);
Object max = j[j.length-1];
set.get(max);
System.out.println("The longest word is "+set.get(max));
System.out.println("Length is "+max);

Interview Coding Java Sorting

Write a Java program to read input from a file, and then sort the characters within each word. Once you have done that, sort all the resulting words in ascending order, followed finally by the sum of the numeric values in the file.
Remove the special characters and stop words while processing the data
Measure the time taken to execute the code
Let's say the content of the file is: Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs.
Output: achins adeklnrtu adn cdeors dio estt nrsu nrsu 32803
Time Taken: 3 milliseconds
My code takes 15 milliseconds to execute.
Please suggest a faster way to solve this problem.
Code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;
public class Sorting {
public static void main(String[] ags)throws Exception
{
long st=System.currentTimeMillis();
int v=0;
List ls=new ArrayList();
//To read data from file
BufferedReader in=new BufferedReader(
new FileReader("D:\\Bhive\\File.txt"));
String read=in.readLine().toLowerCase();
//Spliting the string based on spaces
String[] sp=read.replaceAll("\\.","").split(" ");
for(int i=0;i<sp.length;i++)
{
//Check for the array if it matches number
if(sp[i].matches("(\\d+)"))
//Adding the numbers
v+=Integer.parseInt(sp[i]);
else
{
//sorting the characters
char[] c=sp[i].toCharArray();
Arrays.sort(c);
String r=new String(c);
//Adding the resulting word into list
ls.add(r);
}
}
//Sorting the resulting words in ascending order
Collections.sort(ls);
//Appending the number in the end of the list
ls.add(v);
//Displaying the string using Iteartor
Iterator it=ls.iterator();
while(it.hasNext())
System.out.print(it.next()+" ");
long time=System.currentTimeMillis()-st;
System.out.println("\n Time Taken:"+time);
}
}

Use indexOf() to extract words from your string instead of split(" "). It improves performance.
See this thread: Performance of StringTokenizer class vs. split method in Java
Also, try to increase the size of the output, copy-paste the line Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs. 50,000 times in the text file and measure the performance. That way, you will be able to see considerable time difference when you try different optimizations.
EDIT
Tested this code (used .indexOf())
long st = System.currentTimeMillis();
int v = 0;
List ls = new ArrayList();
// To read data from file
BufferedReader in = new BufferedReader(new FileReader("D:\\File.txt"));
String read = in.readLine().toLowerCase();
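// Strings are immutable, so the result has to be assigned back.
// The trailing space lets the indexOf loop below also pick up the last token.
read = read.replaceAll("\\.", "") + " ";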
int pos = 0, end;
while ((end = read.indexOf(' ', pos)) >= 0) {
String curString = read.substring(pos,end);
pos = end + 1;
// Check for the array if it matches number
try {
// Adding the numbers
v += Integer.parseInt(curString);
}
catch (NumberFormatException e) {
// sorting the characters
char[] c = curString.toCharArray();
Arrays.sort(c);
String r = new String(c);
// Adding the resulting word into the list
ls.add(r);
}
}
//sorting the list
Collections.sort(ls);
//adding the number
ls.add(v);
// Displaying the list using a raw Iterator, since it now holds Strings and an Integer
Iterator it = ls.iterator();
while (it.hasNext()) {
System.out.print(it.next() + " ");
}
long time = System.currentTimeMillis() - st;
System.out.println("\n Time Taken: " + time + " ms");
Performance using 1 line in file
Your code: 3 ms
My code: 2 ms
Performance using 50K lines in file
Your code: 45 ms
My code: 32 ms
As you see, the difference is significant when the input size increases. Please test it on your machine and share results.

The only thing I see: the following line is needlessly expensive:
System.out.print(it.next()+" ");
That's because print is inefficient, due to all the flushing going on. Instead, construct the entire string using a string builder, and then reduce to one call of print.
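A minimal sketch of that suggestion follows. JoinedOutput and joinWithSpaces are hypothetical names, and whether the single print call is measurably faster will depend on the JVM and on where the output goes, so it is worth timing on your own machine.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class JoinedOutput {

    // Builds the whole output line in memory: the words, then the sum.
    static String joinWithSpaces(List<String> words, int sum) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append(' ');
        }
        sb.append(sum);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("achins", "adeklnrtu", "adn");
        // One call to println instead of one print per word.
        System.out.println(joinWithSpaces(words, 32803));
    }
}
```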

I removed the list and used arrays only. On my machine your code takes 6 ms; using arrays only, it takes 4 to 5 ms. Run this code on your machine and let me know the time.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;
public class Sorting {
public static void main(String[] ags)throws Exception
{
long st=System.currentTimeMillis();
int v=0;
//To read data from file
BufferedReader in=new BufferedReader(new FileReader("File.txt"));
String read=in.readLine().toLowerCase();
//Spliting the string based on spaces
String[] sp=read.replaceAll("\\.","").split(" ");
int j=0;
for(int i=0;i<sp.length;i++)
{
//Check for the array if it matches number
if(sp[i].matches("(\\d+)"))
//Adding the numbers
v+=Integer.parseInt(sp[i]);
else
{
//sorting the characters
char[] c=sp[i].toCharArray();
Arrays.sort(c);
read=new String(c);
sp[j]= read;
j++;
}
}
//Sorting only the first j entries (the resulting words) in ascending order
Arrays.sort(sp, 0, j);
//Displaying the words, followed by the sum of the numbers
for(int i=0;i<j; i++)
System.out.print(sp[i]+" ");
System.out.print(v);
st=System.currentTimeMillis()-st;
System.out.println("\n Time Taken:"+st);
}
}

I ran the same code using a PriorityQueue instead of a List. Also, as nes1983 suggested, building the output string first, instead of printing every word individually helps reduce the runtime.
My runtime after these modifications was definitely reduced.
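The answer above does not show its code, so the following is only a guess at what a PriorityQueue version might look like, combined with building the output string before printing. QueueSort and sortedWords are hypothetical names; the sketch strips full stops only and does not remove stop words.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical class name, used only for this sketch.
class QueueSort {

    // Sorts the letters of each word, orders the words, and appends the sum of the numbers.
    static String sortedWords(String line) {
        PriorityQueue<String> queue = new PriorityQueue<>(); // polls in ascending order
        long sum = 0;
        for (String token : line.toLowerCase().replace(".", "").split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.chars().allMatch(Character::isDigit)) {
                sum += Long.parseLong(token); // purely numeric token
            } else {
                char[] c = token.toCharArray();
                Arrays.sort(c);               // sort the characters within the word
                queue.add(new String(c));
            }
        }
        StringBuilder sb = new StringBuilder();
        while (!queue.isEmpty()) {
            sb.append(queue.poll()).append(' ');
        }
        return sb.append(sum).toString();
    }

    public static void main(String[] args) {
        String line = "Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs.";
        // prints achins adeklnrtu adn cdeors dio estt nrsu nrsu 32803
        System.out.println(sortedWords(line));
    }
}
```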

I have modified the code further, including @Teja's logic as well, and the time went from 2 milliseconds down to 1 millisecond:
long st=System.currentTimeMillis();
BufferedReader in=new BufferedReader(new InputStreamReader(new FileInputStream("D:\\Bhive\\File.txt")));
String read= in.readLine().toLowerCase();
String[] sp=read.replaceAll("\\.","").split(" ");
int v=0;
int len = sp.length;
int j=0;
for(int i=0;i<len;i++)
{
if(isNum(sp[i]))
v+=Integer.parseInt(sp[i]);
else
{
char[] c=sp[i].toCharArray();
Arrays.sort(c);
String r=new String(c);
sp[j] = r;
j++;
}
}
Arrays.sort(sp, 0, j); // only the first j entries hold the resulting words
long time=System.currentTimeMillis()-st;
System.out.println("\n Time Taken:"+time);
for(int i=0;i<j; i++)
System.out.print(sp[i]+" ");
System.out.print(v);
I wrote a small utility that checks whether a string consists only of digits, instead of using a regular expression:
private static boolean isNum(String cs){
if(cs.isEmpty())
{
return false;
}
char [] s = cs.toCharArray();
for(char c : s)
{
// a single non-digit means the token is a word, not a number
if(!Character.isDigit(c))
{
return false;
}
}
return true;
}
Calculate the elapsed time before calling System.out, as printing is a blocking operation.
