Write a Java program to read input from a file, and then sort the characters within each word. Once you have done that, sort all the resulting words in ascending order, followed finally by the sum of the numeric values in the file.
Remove the special characters and stop words while processing the data.
Measure the time taken to execute the code.
Let's say the content of the file is: Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs.
Output: achins adeklnrtu adn cdeors dio estt nrsu nrsu 32803
Time Taken: 3 milliseconds
My code takes 15 milliseconds to execute.
Please suggest a faster way to solve this problem.
Code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;
public class Sorting {
public static void main(String[] ags)throws Exception
{
long st=System.currentTimeMillis();
int v=0;
List ls=new ArrayList();
//To read data from file
BufferedReader in=new BufferedReader(
new FileReader("D:\\Bhive\\File.txt"));
String read=in.readLine().toLowerCase();
//Splitting the string based on spaces
String[] sp=read.replaceAll("\\.","").split(" ");
for(int i=0;i<sp.length;i++)
{
//Check for the array if it matches number
if(sp[i].matches("(\\d+)"))
//Adding the numbers
v+=Integer.parseInt(sp[i]);
else
{
//sorting the characters
char[] c=sp[i].toCharArray();
Arrays.sort(c);
String r=new String(c);
//Adding the resulting word into list
ls.add(r);
}
}
//Sorting the resulting words in ascending order
Collections.sort(ls);
//Appending the number in the end of the list
ls.add(v);
//Displaying the string using Iterator
Iterator it=ls.iterator();
while(it.hasNext())
System.out.print(it.next()+" ");
long time=System.currentTimeMillis()-st;
System.out.println("\n Time Taken:"+time);
}
}
Use indexOf() to extract words from your string instead of split(" "). It improves performance.
See this thread: Performance of StringTokenizer class vs. split method in Java
Also, try to increase the size of the output, copy-paste the line Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs. 50,000 times in the text file and measure the performance. That way, you will be able to see considerable time difference when you try different optimizations.
EDIT
Tested this code (used .indexOf())
long st = System.currentTimeMillis();
int v = 0;
List ls = new ArrayList();
// To read data from file
BufferedReader in = new BufferedReader(new FileReader("D:\\File.txt"));
String read = in.readLine().toLowerCase();
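// Strings are immutable, so the result of replaceAll must be assigned back
read = read.replaceAll("\\.", "");
int pos = 0, end;
while (pos < read.length()) {
end = read.indexOf(' ', pos);
if (end < 0) end = read.length(); // the last word has no trailing space
String curString = read.substring(pos, end);
pos = end + 1;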
// Check for the array if it matches number
try {
// Adding the numbers
v += Integer.parseInt(curString);
}
catch (NumberFormatException e) {
// sorting the characters
char[] c = curString.toCharArray();
Arrays.sort(c);
String r = new String(c);
// Adding the resulting word into the list
ls.add(r);
}
}
//sorting the list
Collections.sort(ls);
// Displaying the sorted words using Iterator, then the sum
// (the sum is printed separately so the list only ever holds Strings)
Iterator<String> it = ls.iterator();
while (it.hasNext()) {
System.out.print(it.next() + " ");
}
System.out.print(v);
long time = System.currentTimeMillis() - st;
System.out.println("\n Time Taken: " + time + " ms");
Performance using 1 line in file
Your code: 3 ms
My code: 2 ms
Performance using 50K lines in file
Your code: 45 ms
My code: 32 ms
As you see, the difference is significant when the input size increases. Please test it on your machine and share results.
The only thing I see: the following line is needlessly expensive:
System.out.print(it.next()+" ");
That's because print is inefficient, due to all the flushing going on. Instead, construct the entire string using a string builder, and then reduce to one call of print.
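A minimal sketch of that suggestion, assuming the sorted words are already in a list and the sum in an int. The class and method names (JoinedOutput, buildLine) are made up for illustration and are not part of the original code:

```java
import java.util.Arrays;
import java.util.List;

class JoinedOutput {
    // Builds the whole output line in memory: the words, then the numeric sum.
    static String buildLine(List<String> words, int sum) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append(' ');
        }
        sb.append(sum);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("achins", "adeklnrtu", "adn");
        // One print call instead of one call per word.
        System.out.println(buildLine(words, 32803));
    }
}
```

Whether this helps measurably on a one-line file is something to check on your own machine; the gain should be easier to see with the larger input suggested above.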
I removed the list and used only arrays. On my machine your code takes 6 ms; with arrays only it takes 4 to 5 ms. Run this code on your machine and let me know the time.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;
public class Sorting {
public static void main(String[] ags)throws Exception
{
long st=System.currentTimeMillis();
int v=0;
//To read data from file
BufferedReader in=new BufferedReader(new FileReader("File.txt"));
String read=in.readLine().toLowerCase();
//Splitting the string based on spaces
String[] sp=read.replaceAll("\\.","").split(" ");
int j=0;
for(int i=0;i<sp.length;i++)
{
//Check for the array if it matches number
if(sp[i].matches("(\\d+)"))
//Adding the numbers
v+=Integer.parseInt(sp[i]);
else
{
//sorting the characters
char[] c=sp[i].toCharArray();
Arrays.sort(c);
read=new String(c);
sp[j]= read;
j++;
}
}
//Sorting the resulting words in ascending order;
//only the first j entries of sp hold processed words
Arrays.sort(sp, 0, j);
//Displaying the words, followed by the sum
for(int i=0;i<j; i++)
System.out.print(sp[i]+" ");
System.out.print(v);
st=System.currentTimeMillis()-st;
System.out.println("\n Time Taken:"+st);
}
}
I ran the same code using a PriorityQueue instead of a List. Also, as nes1983 suggested, building the output string first, instead of printing every word individually helps reduce the runtime.
My runtime after these modifications was definitely reduced.
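The answer does not show its code, so the following is only a guess at what the PriorityQueue variant might look like. The class and method names are hypothetical. Note that iterating over a PriorityQueue does not visit elements in sorted order, so the sketch drains it with poll():

```java
import java.util.PriorityQueue;

class QueueSort {
    // Adds every word to a PriorityQueue, then drains it in ascending order
    // into a single string (one print call can then output the result).
    static String drainSorted(String[] words) {
        PriorityQueue<String> pq = new PriorityQueue<String>();
        for (String w : words) {
            pq.add(w);
        }
        StringBuilder sb = new StringBuilder();
        while (!pq.isEmpty()) {
            sb.append(pq.poll()); // poll() always removes the smallest element
            if (!pq.isEmpty()) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(drainSorted(new String[] {"nrsu", "adn", "dio"}));
    }
}
```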
I modified the code further by including @Teja's logic as well, which brought the time down from 2 milliseconds to 1 millisecond:
long st=System.currentTimeMillis();
BufferedReader in=new BufferedReader(new InputStreamReader(new FileInputStream("D:\\Bhive\\File.txt")));
String read= in.readLine().toLowerCase();
String[] sp=read.replaceAll("\\.","").split(" ");
int v=0;
int len = sp.length;
int j=0;
for(int i=0;i<len;i++)
{
if(isNum(sp[i]))
v+=Integer.parseInt(sp[i]);
else
{
char[] c=sp[i].toCharArray();
Arrays.sort(c);
String r=new String(c);
sp[j] = r;
j++;
}
}
// Only the first j entries hold processed words; the rest are leftovers from the split
Arrays.sort(sp, 0, j);
long time=System.currentTimeMillis()-st;
System.out.println("\n Time Taken:"+time);
for(int i=0;i<j; i++)
System.out.print(sp[i]+" ");
System.out.print(v);
I wrote a small utility to check whether a string is a number, instead of using a regular expression:
private static boolean isNum(String cs){
if(cs.isEmpty())
{
return false;
}
for(int i=0;i<cs.length();i++)
{
// a single non-digit character means the token is not a number
if(!Character.isDigit(cs.charAt(i)))
{
return false;
}
}
return true;
}
Calculate the time before calling the System.out operations, as printing is a blocking operation.
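One way to keep the printing out of the measurement is sketched below. It swaps in System.nanoTime() for currentTimeMillis(), which is my substitution rather than something the answer used, and the helper name elapsedNanos is hypothetical:

```java
import java.util.Arrays;

class TimedRun {
    // Runs the task and returns the elapsed time in nanoseconds.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        // Only the processing is timed; the result is collected, not printed.
        long nanos = elapsedNanos(() -> {
            char[] c = "tendulkar".toCharArray();
            Arrays.sort(c);
            out.append(c);
        });
        // Printing happens after the clock has stopped.
        System.out.println(out);
        System.out.println("Time Taken: " + (nanos / 1_000_000.0) + " ms");
    }
}
```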
Related
Edit after reviewing Tormod's answer and implementing his advice.
As the title states I'm attempting to print the total number of different words after receiving a file name from command line input. I receive the following message after attempting to compile the program:
Note: Project.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Here is my code. Any help is greatly appreciated:
import java.lang.*;
import java.util.*;
import java.io.*;
public class Project {
public static void main(String[] args) throws IOException {
File file = new File(args[0]);
Scanner s = new Scanner(file);
HashSet lib = new HashSet<>();
try (Scanner sc = new Scanner(new FileInputStream(file))) {
int count = 0;
while(sc.hasNext()) {
sc.next();
count++;
}
System.out.println("The total number of word in the file is: " + count);
}
while (s.hasNext()) {
String data = s.nextLine();
String[] pieces = data.split("\\s+");
for (int count = 0; count < pieces.length; count++)
{
if(!lib.contains(pieces[count])) {
lib.add(pieces[count]);
}
}
}
System.out.print(lib.size());
}
}
I would implement it using a HashSet: add all the words and read out the size. If you want to make it case-insensitive, just convert all the words to uppercase or something like that. This uses some memory, but...
One problem with your algorithm is that you only have one "words" collection, and it only holds the words of the current line, so you only detect repeated words within the same line.
A HashSet stores strings by their hash value, and thus stores each word only once. Declaring the element type, as below, also avoids the "unchecked or unsafe operations" note from the compiler.
construction: HashSet<String> lib = new HashSet<>();
inside the loop: if(!lib.contains(word)){lib.add(word);}
check the word count: lib.size()
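Put together, the three steps above might look like the sketch below. The class and method names are hypothetical, and lowercasing each word is an assumption that you want a case-insensitive count. The contains() check is dropped because add() already ignores duplicates:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class DistinctWords {
    // Counts distinct whitespace-separated words, ignoring case.
    static int countDistinct(Scanner in) {
        Set<String> seen = new HashSet<String>();
        while (in.hasNext()) {
            seen.add(in.next().toLowerCase()); // add() is a no-op for duplicates
        }
        return seen.size();
    }

    public static void main(String[] args) throws FileNotFoundException {
        // args[0] is the file name, as in the question.
        try (Scanner in = new Scanner(new File(args[0]))) {
            System.out.println(countDistinct(in));
        }
    }
}
```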
for(String s : words) {
if(s.equals(word))
count++;
}
You are comparing the words to an empty String, since it's a word it's always gonna be false.
Like Tormod said, the best would be to store the words in a HashSet, as it won't keep duplicates. Then just read out its size.
I have been working on an assignment in which I have to read words from a file, find the longest word, and check how many subwords that longest word contains.
This should work for all the words in the file.
I wrote the code in Java. It works for a small amount of data in the file, but my task is to process a huge amount of data.
Example:
File words: "call","me","later","hey","how","callmelater","now","iam","busy","noway","nowiambusy"
o/p:
callmelater : subwords->call,me,later
Here I read the words from the file into a linked list, then find the longest word, remove it from the list, and check how many subwords the extracted word contains.
Main Class Assignment:
import java.util.Scanner;
public class Assignment {
public static void main (String[] args){
long start = System.currentTimeMillis();
Assignment a = new Assignment();
a.throwInstructions();
Scanner userInput = new Scanner(System.in);
String filename = userInput.nextLine();
// String filename = "ab.txt";
// String filename = "abc.txt";
Logic testRun = new Logic(filename);
// //testRun.result();
long end = System.currentTimeMillis();
System.out.println("Time taken:"+(end - start) + " ms");
}
public void throwInstructions(){
System.out.println("Keep input file in same directory, where the code is");
System.out.println("Please specify the file name : ");
}
}
Subclass Logic for processing:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class Logic {
private String filename;
private File file;
private List<String> words = new LinkedList<String>();
private Map<String, String> matchedWords = new HashMap<String, String>();
@Override
public String toString() {
return "Logic [words=" + words + "]";
}
// constructor
public Logic(String filename) {
this.filename = filename;
file = new File(this.filename);
fetchFile();
run();
result();
}
// find the such words and store in map
public void run() {
while (!words.isEmpty()) {
String LongestWord = extractLongestWord(words);
findMatch(LongestWord);
}
}
// find longest word
private String extractLongestWord(List<String> words) {
String longWord;
longWord = words.get(0);
int maxLength = words.get(0).length();
for (int i = 0; i < words.size(); i++) {
if (maxLength < words.get(i).length()) {
maxLength = words.get(i).length();
longWord = words.get(i);
}
}
words.remove(words.indexOf(longWord));
return longWord;
}
// find the match for word in array of sub words
private void findMatch(String LongestWord) {
boolean chunkFound = false;
int chunkCount = 0;
StringBuilder subWords = new StringBuilder();
for (int i = 0; i < words.size(); i++) {
if (LongestWord.indexOf(words.get(i)) != -1) {
subWords.append(words.get(i) + ",");
chunkFound = true;
chunkCount++;
}
}
if (chunkFound) {
matchedWords.put(LongestWord,
"\t" + (subWords.substring(0, subWords.length() - 1))
+ "\t:Subword Count:" + chunkCount);
}
}
// fetch data from file and store in list
public void fetchFile() {
String word;
try {
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
while ((word = br.readLine()) != null) {
words.add(word);
}
fr.close();
br.close();
} catch (FileNotFoundException e) {
// e.printStackTrace();
System.out
.println("ERROR: File -> "
+ file.toString()
+ " not Exists,Please check filename or location and try again.");
} catch (IOException e) {
// e.printStackTrace();
System.out.println("ERROR: Problem reading -> " + file.toString()
+ " File, Some problem with file format.");
}
}
// display result
public void result() {
Set set = matchedWords.entrySet();
Iterator i = set.iterator();
System.out.println("WORD:\tWORD-LENGTH:\tSUBWORDS:\tSUBWORDS-COUNT");
while (i.hasNext()) {
Map.Entry me = (Map.Entry) i.next();
System.out.print(me.getKey() + ": ");
System.out.print("\t" + ((String) me.getKey()).length() + ": ");
System.out.println(me.getValue());
}
}
}
This is where my program falls short: it seems to go into a never-ending loop.
The complexity of my program is high.
To reduce the processing time I need an efficient approach, something like binary search or merge sort, that takes O(log n) or O(n log n) time.
I would appreciate help with this, or at least a suggestion about which direction to take. Also, which programming language would be good for implementing such text-processing tasks quickly?
Thanks in advance
This problem requires a Trie. But you have to augment your trie: a generic one will not do. Geek Viewpoint has a good Trie written in Java. Where your particular work will happen is in the method getWordList. Your getWordList will take as input the longest word (i.e. longestWord) and then try to see if each substring comprises words that exist in the dictionary. I think I have given you enough -- I can't do your work for you. But if you have further questions, don't hesitate to ask.
Other than in getWordList, you might be able to pretty much keep the trie from Geek Viewpoint the way it is.
You are also in luck because Geek Viewpoint demonstrates the trie using a Boggle example and your problem is a very very trivial version of Boggle.
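For readers who have not seen a trie, here is a small self-contained sketch of the idea. It is not the Geek Viewpoint implementation, and the names SubwordTrie and findSubwords are made up for illustration. It assumes a "subword" is any dictionary word that occurs as a contiguous substring of the longest word:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubwordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        boolean isWord; // true if a dictionary word ends at this node
    }

    private final Node root = new Node();

    // Adds one dictionary word to the trie, one character per level.
    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.isWord = true;
    }

    // Collects every stored word that occurs as a substring of text,
    // excluding text itself. Each start position walks the trie once.
    List<String> findSubwords(String text) {
        List<String> found = new ArrayList<String>();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int end = start; end < text.length(); end++) {
                node = node.children.get(text.charAt(end));
                if (node == null) {
                    break; // no dictionary word continues with this character
                }
                boolean wholeText = (start == 0 && end == text.length() - 1);
                if (node.isWord && !wholeText) {
                    found.add(text.substring(start, end + 1));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        SubwordTrie trie = new SubwordTrie();
        for (String w : new String[] {"call", "me", "later", "hey", "how", "callmelater", "now"}) {
            trie.insert(w);
        }
        System.out.println(trie.findSubwords("callmelater"));
    }
}
```

With this structure the cost of checking one long word depends on its length, not on how many words the dictionary holds, which is the main difference from scanning the whole linked list for every candidate.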
Not sure I understand your context, but from reading the problem description it sounds to me like a Linked List is an inappropriate data structure. You don't need to check every single word to the longest word.
A "trie" is probably a perfect data structure for this application.
But if you haven't learned about that in your class, then perhaps you can at least cut down your search space with hashtables. While you are doing the initial list processing calculating the longest word, you can simultaneously process each word into a hash table based on first letter. That way, when you are ready to check your longest word for subwords, you can check only those words with first letters in the longest word. (I'm assuming there could be overlapping words, unlike your example.)
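A rough sketch of that hash table idea follows. The names FirstLetterIndex, index and subwords are hypothetical, and it assumes a subword is any other dictionary word contained in the longest word:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FirstLetterIndex {
    // Groups the words by their first character.
    static Map<Character, List<String>> index(List<String> words) {
        Map<Character, List<String>> buckets = new HashMap<>();
        for (String w : words) {
            if (w.isEmpty()) {
                continue;
            }
            char first = w.charAt(0);
            List<String> bucket = buckets.get(first);
            if (bucket == null) {
                bucket = new ArrayList<>();
                buckets.put(first, bucket);
            }
            bucket.add(w);
        }
        return buckets;
    }

    // Checks only the buckets whose letter actually occurs in the longest word.
    static List<String> subwords(String longest, Map<Character, List<String>> buckets) {
        Set<Character> letters = new LinkedHashSet<>();
        for (char c : longest.toCharArray()) {
            letters.add(c);
        }
        List<String> found = new ArrayList<>();
        for (char c : letters) {
            List<String> bucket = buckets.get(c);
            if (bucket == null) {
                continue;
            }
            for (String w : bucket) {
                if (!w.equals(longest) && longest.contains(w)) {
                    found.add(w);
                }
            }
        }
        return found;
    }
}
```

This only prunes the candidates; every word in a matching bucket is still tested with contains(), so how much it saves depends on the distribution of first letters in the input.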
Do you know anything about the input you will be receiving? If you have more details about the input word distribution, then you can customize your solution to the data you expect.
If you can choose your language, and time efficiency is important, you might want to switch to C++, as for many applications it's several times faster than Java.
In Java, I have a method that reads in a text file that has all the words in the dictionary, each on their own line.
It reads each line by using a for loop and adds each word to an ArrayList.
I want to get the length of the longest word (String) in the Array. In addition, I want to get the length of the longest word in the dictionary file. It would probably be easier to split this into several methods, but I don't know the syntax.
So far, the code I have is:
public class spellCheck {
static ArrayList <String> dictionary; //the dictonary file
/**
* load file
* @param fileName the file containing the dictionary
* @throws FileNotFoundException
*/
public static void loadDictionary(String fileName) throws FileNotFoundException {
Scanner in = new Scanner(new File(fileName));
while (in.hasNext())
{
for(int i = 0; i < fileName.length(); ++i)
{
String dictionaryword = in.nextLine();
dictionary.add(dictionaryword);
}
}
Assuming that each word is on its own line, you should be reading the file more like...
try (Scanner in = new Scanner(new File(fileName))) {
while (in.hasNextLine()) {
String dictionaryword = in.nextLine();
dictionary.add(dictionaryword);
}
}
Remember, if you open a resource, you are responsible for closing it. See The try-with-resources Statement for more details...
Calculating the metrics can be done after reading the file, but since you're here, you could do something like...
int totalWordLength = 0;
String longest = "";
while (in.hasNextLine()) {
String dictionaryword = in.nextLine();
totalWordLength += dictionaryword.length();
dictionary.add(dictionaryword);
if (dictionaryword.length() > longest.length()) {
longest = dictionaryword;
}
}
int averageLength = Math.round(totalWordLength / (float)dictionary.size());
But you could just as easily loop through the dictionary and use the same idea
(nb- I've used local variables, so you will either want to make them class fields or return them wrapped in some kind of "metrics" class - your choice)
Set up two counters and a variable that holds the current longest word before you start reading in with your while loop. To find the average, increment one counter by one each time a line is read, and have the second counter add up the total number of characters in each word. The total number of characters read, divided by the total number of words read (that is, the total number of lines), is the average length of a word.
As for the longest word, initialize it to the empty string or some dummy value like a single character. Each time you read in a line, compare the current word with the longest word found so far (using the .length() method on the String to find its length), and if it's longer, make it the new longest word.
Also, if you have all this in a file, I'd use a buffered reader to read in your input data.
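The counters and the buffered reader could be combined roughly as follows. This is a sketch only: the DictionaryStats class and its fields are invented for illustration, and skipping blank lines is an assumption about the file format:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;

class DictionaryStats {
    String longest = "";  // longest word seen so far
    int totalChars = 0;   // sum of all word lengths
    int wordCount = 0;    // number of words (non-blank lines) read

    // Reads one word per line and updates the three values in a single pass.
    static DictionaryStats read(BufferedReader in) {
        DictionaryStats stats = new DictionaryStats();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (word.isEmpty()) {
                    continue; // assumption: blank lines are not words
                }
                stats.wordCount++;
                stats.totalChars += word.length();
                if (word.length() > stats.longest.length()) {
                    stats.longest = word;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }

    double averageLength() {
        return wordCount == 0 ? 0.0 : (double) totalChars / wordCount;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            DictionaryStats stats = read(in);
            System.out.println("Longest: " + stats.longest + " (" + stats.longest.length() + ")");
            System.out.println("Average length: " + stats.averageLength());
        }
    }
}
```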
Maybe this could help:
String words = "Rookie never dissappoints, dont trust any Rookie";
// read your file to string if you get string while reading then you can use below code to do that.
String ss[] = words.split(" ");
List<String> list = Arrays.asList(ss);
Map<Integer,String> set = new Hashtable<Integer,String>();
int i =0;
for(String str : list)
{
set.put(str.length(), str);
System.out.println(list.get(i));
i++;
}
Set<Integer> keys = set.keySet();
System.out.println(keys);
System.out.println(set);
Object j[]= keys.toArray();
Arrays.sort(j);
Object max = j[j.length-1];
System.out.println("The longest word is "+set.get(max));
System.out.println("Length is "+max);
This is my code. It produces the error java.util.NoSuchElementException.
It is meant to search a file, example.txt, for a word (e.g. and), find all instances of the word, and print each one together with the words on either side of it (e.g. cheese and ham, tom and jerry) in ONE JOptionPane. Code:
import java.io.File;
import java.util.Arrays;
import java.util.Scanner;
import javax.swing.JOptionPane;
public class openFileSearchWord {
public static void main(String Args[])
{
int i=0,j=0;
String searchWord = JOptionPane.showInputDialog("What Word Do You Want To Search For?");
File file = new File("example.txt");
try
{
Scanner fileScanner = new Scanner(file);
String[] array = new String[5];
String[] input = new String[1000];
while (fileScanner.hasNextLine())
{
for(i=0;i<1000;i++)
{
input[i] = fileScanner.next();
if(input[i].equalsIgnoreCase(searchWord))
{
array[j] = input[i-1] + input[i] + input[i+1];
j++;
}
}
}
Arrays.toString(array);
JOptionPane.showMessageDialog(null, array);
fileScanner.close();
}
catch(Exception e)
{
System.out.println(e);
}
}
}
It looks like you're assuming each line will have 1000 words.
while (fileScanner.hasNextLine())
{
for(i=0;i<1000;i++) // <-------- Hardcoded limit?
{
....
}
}
You can try adding another catch block, or check hasNext() in that for loop.
while (fileScanner.hasNextLine())
{
for(i=0;i<1000 && fileScanner.hasNext();i++)
{
....
}
}
There are also many issues with your code, like if input[i-1] hits the -1 index, or if your 'array' array hits the limit.
I took the liberty to have some fun.
Scanner fileScanner = new Scanner(file);
List<String> array = new ArrayList<String>();
// Start from empty strings so the first comparison is safe
String previous = "", current = "", next;
while (fileScanner.hasNext())
{
next = fileScanner.next(); // Get the next word
if(current.equalsIgnoreCase(searchWord))
{
array.add( previous + current + next );
}
// Shift stuff
previous = current;
current = next;
next = "";
}
fileScanner.close();
// Edge case check - if the last word was the keyword
if(current.equalsIgnoreCase(searchWord))
{
array.add( previous + current );
}
// Do whatever with array
....
I see a few errors here ...
You are creating two arrays one with 5 and one with 1000 elements.
In your code you are referencing elements directly by index ... but this index might not be present.
input[i-1] ... what if i = 0? ...index is -1
array[j] ... what if j > 4 ... index 5 doesn't exist
I suggest using List of elements instead of fixed arrays.
List<String> array = new ArrayList<>();
You are assuming that the input is something but don't do anything to check what it actually is.
Just as Drejc told you, the first iteration would fail because of the negative index, and the program will fail as well if it finds more than 5 matches of the desired word.
I also want to add another one. Consider what happens when you execute this line:
array[j] = input[i-1] + input[i] + input[i+1];
You have not assigned input[i+1] yet. In that iteration you've only assigned input[i], but not the next one.
You should process the concatenation of the three elements (previousWord + match + nextWord) when reaching nextWord.
Another solution, though less efficient, would be to copy all the words into an array at the beginning and then keep your current logic largely unchanged. This would work, but you would go through all the words twice.
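The two-pass alternative could be sketched like this, using a List instead of a fixed-size array as suggested in the other answer. The class and method names are hypothetical, and joining the words with spaces is an assumption based on the expected output ("cheese and ham"):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class NeighbourSearch {
    // First pass: read every word. Second pass: find matches and their neighbours.
    static List<String> findWithNeighbours(Scanner in, String searchWord) {
        List<String> words = new ArrayList<String>();
        while (in.hasNext()) {
            words.add(in.next());
        }
        List<String> matches = new ArrayList<String>();
        for (int i = 0; i < words.size(); i++) {
            if (!words.get(i).equalsIgnoreCase(searchWord)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            if (i > 0) {
                sb.append(words.get(i - 1)).append(' '); // no previous word at index 0
            }
            sb.append(words.get(i));
            if (i + 1 < words.size()) {
                sb.append(' ').append(words.get(i + 1)); // no next word at the end
            }
            matches.add(sb.toString());
        }
        return matches;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("cheese and ham tom and jerry");
        System.out.println(findWithNeighbours(in, "and"));
    }
}
```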
I need some help here with my java school work.
We were told to prompt the user for five words and from there determine the longest word of them and print to console the longest word as well as the number of characters in it.
Right now, I only manage to sort them using an array and display the largest number of characters, but I'm not sure how to display the word itself. Can someone please help me with it? Please bear in mind that I'm a total newbie in programming and my progress is still just in the basics, so try to make it not too complicated for me please. In addition, feel free to point out any redundant code, as I know I have quite a bit. :) Thanks!
import java.util.Scanner;
import java.util.Arrays;
class LongestWord
{
public static void main(String [] args)
{
Scanner theInput = new Scanner(System.in);
System.out.println("Please enter your five words");
String fWord = theInput.next();
String sWord = theInput.next();
String tWord = theInput.next();
String fhWord = theInput.next();
String ffWord = theInput.next();
System.out.println(fWord + sWord + tWord + fhWord + ffWord);
int [] wordCount = new int[5];
wordCount[0] = fWord.length();
wordCount[1] = sWord.length();
wordCount[2] = tWord.length();
wordCount[3] = fhWord.length();
wordCount[4] = ffWord.length();
Arrays.sort(wordCount);
System.out.println(wordCount[4]);
}
}
You need to add all the strings to an array and iterate over them.
Sample:
String [] wordCount = new String[5];
wordCount[0] = fWord;
wordCount[1] = sWord;
wordCount[2] = tWord;
wordCount[3] = fhWord;
wordCount[4] = ffWord;
String longest = "";
longest = wordCount[0]; //start with the first word in the array
for(String s : wordCount) //iterate over all the words in the array
{
if(longest.length() < s.length()) //check if the current word is longer than the longest word so far
longest = s; //if the current word is longer then make it the longest word
}
System.out.println("Longest Word: " + longest + " length: " + longest.length());
result:
Please enter your five words
12345
1234
123
12
1
123451234123121
Longest Word: 12345 length: 5
You need to store all the words in an array, sort it by length in descending order, and take the first element.
String[] words = ....//Store all words into this array.
Arrays.sort(words, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o2.length() - o1.length();
}
});
System.out.println(words[0]);
Or, if you use Java 8, you can get the result more easily:
String longWord=
Arrays.stream(words).max((o1, o2)->o1.length()-o2.length()).get();
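A variant of the same stream call, sketched with Comparator.comparingInt instead of a hand-written subtraction, and returning the Optional so that an empty array does not throw. The class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

class LongestByStream {
    // Returns the longest word, or an empty Optional if there are no words.
    static Optional<String> longest(String[] words) {
        return Arrays.stream(words).max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        String[] words = {"12345", "1234", "123", "12", "1"};
        Optional<String> result = longest(words);
        if (result.isPresent()) {
            System.out.println("Longest Word: " + result.get() + " length: " + result.get().length());
        }
    }
}
```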
Instead of putting the lengths into an array, you should put all the words in an array, loop over them using for/while, and compare the length of each string with the longest one seen so far to record the longest string.
Another way is to read the strings in a loop and apply the same length-comparison logic without an additional array.
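The second option, reading in a loop without any array, might look like the sketch below. The class and method names are hypothetical; the count of five comes from the question:

```java
import java.util.Scanner;

class LongestOfFive {
    // Reads up to count words and keeps only the longest one seen so far.
    static String longestOf(Scanner in, int count) {
        String longest = "";
        for (int i = 0; i < count && in.hasNext(); i++) {
            String word = in.next();
            if (word.length() > longest.length()) {
                longest = word;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        Scanner theInput = new Scanner(System.in);
        System.out.println("Please enter your five words");
        String longest = longestOf(theInput, 5);
        System.out.println("Longest Word: " + longest + " length: " + longest.length());
    }
}
```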