Finding Most Frequent Element(s) In A File Of Integers


I am working on a program to find the most frequent element(s) in a text file. So far, I read the file into a List, then iterate through the list to count the occurrences of every value and store them in a SortedMap.
The issue occurs with files where every digit occurs equally often: my Map is not filled with all the data and contains only one of the digits at the end.
Here is my code:
public class FileAnalyzer {
public static void main(String[] args) throws IOException, FileNotFoundException {
System.out.print("Please Enter A File Name: ");
String file = new Scanner(System.in).nextLine();
final long startTime = System.currentTimeMillis();
BufferedReader reader = new BufferedReader(new FileReader(file));
List<Integer> numbers = new ArrayList<>();
SortedMap<Integer, Integer> sortedMap = new TreeMap<>();
String line;
while ((line = reader.readLine()) != null) {
numbers.add(Integer.parseInt(line));
}
Collections.sort(numbers);
int frequency = 0;
int tempNum = 0;
for (int i = 0; i < numbers.size(); i++) {
if (tempNum == numbers.get(i)) {
frequency++;
} else {
if (frequency != 0) {
sortedMap.put((frequency+1), tempNum);
}
frequency = 0;
tempNum = numbers.get(i);
}
}
if (frequency !=0) {
sortedMap.put((frequency+1), tempNum);
}
final long duration = System.currentTimeMillis() - startTime;
System.out.println(sortedMap);
System.out.println("Runtime: " + duration + " ms\n");
System.out.println("Least Frequent Digit(s): " + sortedMap.get(sortedMap.firstKey()) + "\nOccurences: " + sortedMap.firstKey());
}
}
This is the text file that triggers the issue:
1
2
1
1
2
1
1
2
1
2
2
2
Thanks in advance!

You should look up the Java Documentation for TreeMap. It is designed to not store duplicate keys, so since you are sorting on frequency as a key, values with the same frequency will be overwritten in your map!
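One way around the overwriting is to key the map on the number itself and store its frequency as the value, then collect every number whose frequency equals the maximum. The following is a minimal sketch of that idea, not the asker's code; the class name MostFrequent and the method mostFrequent are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MostFrequent {
    // Returns every value that occurs the maximum number of times, in ascending order.
    static List<Integer> mostFrequent(List<Integer> numbers) {
        // Key on the value itself, so two values with equal frequency never collide.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int n : numbers) {
            counts.merge(n, 1, Integer::sum);
        }
        // Find the highest frequency.
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        // Collect all values that reach it.
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Same numbers as the text file in the question: 1 and 2 both occur six times.
        List<Integer> numbers = Arrays.asList(1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2);
        System.out.println(mostFrequent(numbers)); // prints [1, 2]
    }
}
```

Because ties are kept in a list, a file where every digit occurs equally often yields all of the digits rather than only the last one stored.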

Related

Is there a reason .contains() would not work with scanner?

I am working on a linear search problem that takes a file of names and compares it to a phonebook file of names and numbers. My only task right now is to see how many names are in the phonebook file. Everything works as expected up until the if statement in my main method, but for the life of me, I cannot figure out what I am doing wrong. Through testing, I can print out all the lines in both files, so I know I am reading the files correctly. Output should be 500 / 500 as all the names are in the phonebook file of over a million lines. Please help.
package phonebook;
import java.util.Objects;
import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;
public class Main {
final static String NAME_PATH = "C:\\Users\\{user}\\Downloads\\find.txt";
final static String PHONEBOOK_PATH = "C:\\Users\\{user}\\Downloads\\directory.txt";
private static String[] namesList(File file) {
int count = 0;
try (Scanner scanner = new Scanner(file)) {
while (scanner.hasNextLine()) {
scanner.nextLine();
count++;
}
String[] names = new String[count];
Scanner sc = new Scanner(file);
for (int i = 0; i < count; i++) {
names[i] = sc.nextLine();
}
return names;
} catch (FileNotFoundException e) {
System.out.printf("File not found: %s", NAME_PATH);
return null;
}
}
private static String timeDifference(long timeStart, long timeEnd) {
long difference = timeEnd - timeStart;
long minutes = (difference / 1000) / 60;
long seconds = (difference / 1000) % 60;
long milliseconds = difference - ((minutes * 60000) + (seconds * 1000));
return "Time taken: " + minutes + " min. " + seconds + " sec. " +
milliseconds + " ms.";
}
public static void main(String[] args) {
File findFile = new File(NAME_PATH);
File directoryFile = new File(PHONEBOOK_PATH);
String[] names = namesList(findFile);
int count = 0;
try (Scanner scanner = new Scanner(directoryFile)) {
System.out.println("Start searching...");
long timeStart = System.currentTimeMillis();
for (int i = 0; i < Objects.requireNonNull(names).length; i++) {
while (scanner.hasNextLine()) {
if (scanner.nextLine().contains(names[i])) {
count++;
break;
}
}
}
long timeEnd = System.currentTimeMillis();
System.out.print("Found " + count + " / " + names.length + " entries. " +
timeDifference(timeStart, timeEnd));
} catch (FileNotFoundException e) {
System.out.printf("File not found: %s", PHONEBOOK_PATH);
}
}
}
Output:
Start searching...
Found 1 / 500 entries. Time taken: 0 min. 0 sec. 653 ms.
Process finished with exit code 0

The problem is how you are searching. If you want to search iteratively then you need to re-start the iteration for each name. Otherwise, you are merely searching forward in the phonebook. If the second name in the name list appears before the first name then you will only find one name since you will have exhausted the phonebook before finding anything.
However, repeatedly reading the phonebook file is costly. Instead, load the phone list (as you have done for the name list) and then search that list for each element in the name list. The following examples assume you are using Lists rather than arrays, and use for-each loops (rather than the Stream API) to make it obvious what is going on.
List<String> names = loadNames();
// each phonebook entry contains the name and the phone number in one string
List<String> phonebook = loadPhonebook();
int numFound = 0;
for (String name : names) {
    for (String entry : phonebook) {
        if (entry.contains(name)) {
            ++numFound;
            break; // count each name at most once
        }
    }
}
However, this is still an expensive task because you are repeatedly doing nested iterations. Depending on the format of the phonebook file you should be able to parse out the names and store these in a Set. With a TreeSet each lookup takes O(log n) time; with a HashSet it takes expected constant time.
List<String> names = loadNames();
// phonebookNames are just the names - the phone number has been stripped away
TreeSet<String> phonebookNames = loadPhonebookNames();
int numFound = 0;
for (String name : names) {
if (phonebookNames.contains(name)) {
++numFound;
}
}
Presumably, your assignment will eventually want to use the phone number for something so you probably don't want to drop that on the floor. Instead of parsing out just the name, you can capture the name and the phone number using a Map (key=name, value=phone number). Then you can count the presence of names thusly.
List<String> names = loadNames();
// phonebook is a Map of phone number values keyed on name
Map<String,String> phonebook = loadPhonebook();
int numFound = 0;
for (String name : names) {
if (phonebook.containsKey(name)) {
++numFound;
}
}
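The loadPhonebook() call above is left undefined, so here is a rough sketch of how such a Map could be built. It assumes, purely for illustration, that each phonebook line has the form "number full name" with the number ending at the first space; the real file format is not given in the question. The class and method names (PhonebookLookup, toMap, countFound) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhonebookLookup {
    // Assumed (hypothetical) line format: "<number> <full name>", split at the first space.
    static Map<String, String> toMap(List<String> lines) {
        Map<String, String> phonebook = new HashMap<>();
        for (String line : lines) {
            int space = line.indexOf(' ');
            if (space < 0) {
                continue; // skip lines that do not match the assumed format
            }
            // key = name, value = phone number
            phonebook.put(line.substring(space + 1), line.substring(0, space));
        }
        return phonebook;
    }

    // Counts how many of the given names appear as keys in the phonebook.
    static int countFound(List<String> names, Map<String, String> phonebook) {
        int numFound = 0;
        for (String name : names) {
            if (phonebook.containsKey(name)) {
                numFound++;
            }
        }
        return numFound;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("5551234 Alice Smith", "5559876 Bob Jones");
        List<String> names = Arrays.asList("Bob Jones", "Carol White");
        System.out.println(countFound(names, toMap(lines))); // prints 1
    }
}
```

Note that containsKey requires an exact match on the whole name, whereas the original contains() call matched substrings of a line.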

You are moving forward in your file for every name (using nextLine). You should loop over the names for each line instead.
In your code, if the first name (names[0]) is on the last line of your file, you have already reached the end of the file during the first iteration, so when you search for the second name there are no more lines to read.
Try something like this:
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
for (int i = 0; i < Objects.requireNonNull(names).length; i++) {
if (line.contains(names[i])) {
count++;
break;
}
}
}

Java Write to File Performance

I am trying to write to a file using redirection from the command line.
My program is very slow when I read a 25 MB file, and 90% of the execution time is spent in System.out.println. I tried methods other than System.out.print but couldn't fix it.
Which method should I use to print a big ArrayList (with redirection)?
I would appreciate your help and an example.
Thanks.
Here is my code:
public class Ask0 {
public static void main(String args[]) throws IOException {
int i = 0, token0, token1;
String[] tokens;
List<String> inputList = new ArrayList<>();
Map<Integer, List<Integer>> map = new HashMap<>();
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String input;
while ((input = br.readLine()) != null) {
tokens = input.split("\\|");
inputList.add(tokens[0] + "|" + tokens[1]);
token0 = Integer.parseInt(tokens[0]);
token1 = Integer.parseInt(tokens[1]);
List<Integer> l = map.get(token0);
if (l == null) {
l = new ArrayList<>();
map.put(token0, l);
}
if (l.contains(token1) == false) {
l.add(token1);
}
i++;
}
i = 0;
for (int j = inputList.size(); j > 0; j--) {
tokens = inputList.get(i).split("\\|");
token0 = Integer.parseInt(tokens[0]);
token1 = Integer.parseInt(tokens[1]);
List l = map.get(token0);
System.out.println(tokens[0] + "|" + tokens[1] + "["
+ (l.indexOf(token1) + 1) + "," + l.size() + "]");
i++;
}
}
}
Input
3|78 4|7765 3|82 2|8 4|14 3|78 2|8 4|12
Desired result
3|78[1,2] 4|7765[1,3] 3|82[2,2] 2|8[1,1] 4|14[2,3] 3|78[1,2] 2|8[1,1] 4|12[3,3]

For speed, the below code:
Uses a StringBuilder for fast concatenation into a resulting String and fast output since only one massive String is printed at the end, saving unnecessary buffer flushes.
Doesn't create a bunch of Strings when parsing the input, just a small byte[] and Integers in an ArrayList.
Manually uses a 64 KiB buffer for reading.
Doesn't rejoin the tokens with "|" in the middle only to split them again later.
Uses a HashMap<Integer, HashMap<Integer, Integer>> instead of a HashMap<Integer, ArrayList<Integer>> to save time on element lookups in the list (turns the algorithm from O(n^2) time to O(n) time).
Some speedups that might not work as you want:
Doesn't waste time properly handling Unicode.
Doesn't waste time properly handling negative or overflowed numbers.
Doesn't care what the separator characters are (you could input "1,2,3,4,5,6" instead and it would still work just like "1|2\n3|4\n5|6\n").
It gives the correct results for your test input (except that it separates the outputs by newlines, like your code does).
private static final int BUFFER_SIZE = 65536;
private static enum InputState { START, MIDDLE }
public static void main(final String[] args) throws IOException {
// Input the numbers
final byte[] inputBuffer = new byte[BUFFER_SIZE];
final List<Integer> inputs = new ArrayList<>();
int inputValue = 0;
InputState inputState = InputState.START;
while (true) {
final int bytesRead = System.in.read(inputBuffer, 0, BUFFER_SIZE);
if (bytesRead == -1) {
if (inputState == InputState.MIDDLE) {
inputs.add(inputValue);
}
break;
}
for (int i = 0; i < bytesRead; i++) {
byte ch = inputBuffer[i];
if (ch < 48 || ch > 57) {
if (inputState == InputState.MIDDLE) {
inputs.add(inputValue);
inputState = InputState.START;
}
}
else {
if (inputState == InputState.START) {
inputValue = ch - 48;
inputState = InputState.MIDDLE;
}
else {
inputValue = 10*inputValue + ch - 48;
}
}
}
}
System.in.close();
// Put the numbers into a map
final Map<Integer, Map<Integer, Integer>> map = new HashMap<>();
for (int i = 0; i < inputs.size();) {
final Integer left = inputs.get(i++);
final Integer right = inputs.get(i++);
final Map<Integer, Integer> rights;
if (map.containsKey(left)) {
rights = map.get(left);
}
else {
rights = new HashMap<>();
map.put(left, rights);
}
rights.putIfAbsent(right, rights.size() + 1);
}
// Prepare StringBuilder with results
final StringBuilder results = new StringBuilder();
for (int i = 0; i < inputs.size();) {
final Integer left = inputs.get(i++);
final Integer right = inputs.get(i++);
final Map<Integer, Integer> rights = map.get(left);
results.append(left).append('|').append(right);
results.append('[').append(rights.get(right)).append(',');
results.append(rights.size()).append(']').append('\n');
}
System.out.print(results);
}
Alternatively, if you want to save memory, you can manually fill a 64 KiB byte[] output buffer and emit it with System.out.write(outputBuffer, 0, bytesToWrite); System.out.flush();, though that's a lot more work.
Also, if you know the minimum and maximum values that you'll see, you can use int[] or int[][] arrays instead of Map<Integer, Integer> or Map<Integer, Map<Integer, Integer>>, though that's somewhat more involved as well. It would be very fast, though.
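A less manual way to avoid a flush on every println is to wrap System.out in a BufferedWriter with a large buffer and flush once at the end. This is a swapped-in technique rather than the byte[] approach described above, and it is only a sketch; the class name BufferedOutput and the method writeLines are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class BufferedOutput {
    // Writes each line followed by '\n' through a 64 KiB buffer, flushing only once.
    static void writeLines(List<String> lines, OutputStream out) throws IOException {
        BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8), 65536);
        for (String line : lines) {
            writer.write(line);
            writer.write('\n');
        }
        // Flush instead of close: the caller owns the underlying stream.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        writeLines(Arrays.asList("3|78[1,2]", "4|7765[1,3]"), System.out);
    }
}
```

Compared with building one huge StringBuilder, this keeps memory use bounded by the buffer size while still avoiding per-line flushes.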

Sort the words and letters in Java

The code below counts how many times the words and letters appeared in the string. How do I sort the output from highest to lowest? The output should be like:
the - 2
quick - 1
brown - 1
fox - 1
t - 2
h - 2
e - 2
b - 1
My code:
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;
public class Tokenizer {
public static void main(String[] args) {
int index = 0;
int tokenCount;
int i = 0;
Map<String, Integer> wordCount = new HashMap<String, Integer>();
Map<Integer, Integer> letterCount = new HashMap<Integer, Integer>();
String message = "The Quick brown fox the";
StringTokenizer string = new StringTokenizer(message);
tokenCount = string.countTokens();
System.out.println("Number of tokens = " + tokenCount);
while (string.hasMoreTokens()) {
String word = string.nextToken().toLowerCase();
Integer count = wordCount.get(word);
Integer lettercount = letterCount.get(word);
if (count == null) {
// this means the word was encountered the first time
wordCount.put(word, 1);
} else {
// word was already encountered we need to increment the count
wordCount.put(word, count + 1);
}
}
for (String words : wordCount.keySet()) {
System.out.println("Word : " + words + " has count :" + wordCount.get(words));
}
for (i = 0; i < message.length(); i++) {
char c = message.charAt(i);
if (c != ' ') {
int value = letterCount.getOrDefault((int) c, 0);
letterCount.put((int) c, value + 1);
}
}
for (int key : letterCount.keySet()) {
System.out.println((char) key + ": " + letterCount.get(key));
}
}
}

You have a Map<String, Integer>; I'd suggest something along the lines of another LinkedHashMap<String, Integer> which is populated by inserting keys that are sorted by value.

It seems that you want to sort the Map by its value (i.e., count).
Specifically for your case, a simple solution might be:
1. Use a TreeSet<Integer> to save all distinct count values in the HashMap.
2. Iterate over the TreeSet from high to low.
3. Inside the iteration from step 2, use a loop to output all word-count pairs whose count equals the current count.
Please see if this may help.

Just add all your entries to a List and then sort that list.
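The list-based suggestion can be sketched as follows: copy the map entries into a List and sort them by count in descending order. This is one possible reading of the advice above, not the answerer's own code; the class name SortByCount and the method sortedByCount are hypothetical. A LinkedHashMap is used here (instead of the question's HashMap) so that words with equal counts keep their first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortByCount {
    // Returns the entries ordered from highest to lowest count.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts keep the map's iteration order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new LinkedHashMap<>();
        for (String word : "The Quick brown fox the".toLowerCase().split("\\s+")) {
            wordCount.merge(word, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : sortedByCount(wordCount)) {
            System.out.println(e.getKey() + " - " + e.getValue());
        }
        // prints:
        // the - 2
        // quick - 1
        // brown - 1
        // fox - 1
    }
}
```

The same method works for the letter counts if the letters are stored as String keys.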

How to find the number of different duplicate values in an array with Java

I have a slight problem with a program I'm working on. I need to look through an array in Java and find the number of different duplicates in that array. For example, if the array has the values 4, 6, 4, I need to display:
There are:
2 words of length 1 (4 characters)
1 word of length 2 (6 characters)
What I've currently got is -
public class wordLength {
public static void main(String[] args) {
String userInput1 = "Owen Bishop Java ";
String [] inputArray = userInput1.split(" ");
for (int i = 0; i < inputArray.length; i++) {
int length = inputArray[i].length();
int inputArray2 = length;
System.out.println(inputArray2);
}
}
}
This currently splits the string into an array wherever there is a space, and then finds and prints the length of each word in the array. I need to show the number of words that have the same length.
I'm really new to Java and appreciate this is probably an incredibly easy problem, but any help would be hugely appreciated. Thanks.

Without writing the whole thing (or making use of 3rd party libraries - I note you're new to Java so let's not complicate things), I would consider the following.
Make use of a Map<Integer,Integer> which would store the number of words of a particular length. e.g. to populate:
Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
for (String word : words) {
Integer current = counts.get(word.length());
if (current == null) {
current = 0;
}
current++;
counts.put(word.length(), current);
}
and then iterate through that to output the number of words per word length. Note that the above makes use of boxing.
The advantage of using a Map is that (unlike your array) you don't need to worry about empty counts (e.g. you won't have an entry if you have no words of length 5). That may/may not be an issue depending on your use case.

You can create an int array of length 20 (or the maximum word length in English) and, for each word, increment the element at the index equal to the word's length before you print the counts.
arr[length]++;
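As a sketch of this array approach, the version below sizes the array from the longest word actually present instead of a fixed 20, which is a small deviation from the suggestion above made to avoid an out-of-bounds index. The class name LengthCounts and the method countByLength are hypothetical.

```java
class LengthCounts {
    // counts[len] holds the number of words whose length is len.
    static int[] countByLength(String[] words) {
        int max = 0;
        for (String w : words) {
            max = Math.max(max, w.length());
        }
        int[] counts = new int[max + 1];
        for (String w : words) {
            counts[w.length()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countByLength("Owen Bishop Java ".split(" "));
        for (int len = 1; len < counts.length; len++) {
            if (counts[len] > 0) {
                System.out.println(counts[len] + " word(s) of length " + len);
            }
        }
        // prints:
        // 2 word(s) of length 4
        // 1 word(s) of length 6
    }
}
```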

import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
public class wordLength {
    public static void main(String[] args) {
        String userInput1 = "Owen Bishop Java ";
        String[] inputArray = userInput1.split(" ");
        // Maps a word length to the number of words with that length
        Map<Integer, Integer> wordLengths = new HashMap<Integer, Integer>();
        for (int i = 0; i < inputArray.length; i++) {
            int length = inputArray[i].length();
            if (wordLengths.containsKey(length))
                wordLengths.put(length, wordLengths.get(length) + 1);
            else
                wordLengths.put(length, 1);
        }
        // Copy the keys into a TreeSet to print the lengths in ascending order
        for (Integer length : new TreeSet<Integer>(wordLengths.keySet()))
            System.out.println("Length: " + length + " Count: " + wordLengths.get(length));
    }
}

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
public class wordLength {
public static void main(String[] args) {
String userInput1 = "Owen Bishop Java ";
String [] inputArray = userInput1.split(" ");
HashMap<Integer,Integer> map = new HashMap<Integer,Integer>();
for (int i = 0; i < inputArray.length; i++) {
int length = inputArray[i].length();
if(map.get(length)==null){
map.put(length, 1);
}
else map.put(length, map.get(length)+1);
}
Iterator it = map.entrySet().iterator();
while (it.hasNext()) {
Map.Entry pairs = (Map.Entry)it.next();
System.out.println("words of length " +pairs.getKey() + " are " + pairs.getValue());
}
}
}
The output will be:
words of length 4 are 2
words of length 6 are 1

The beginning of your code looks good. What you basically want to keep track of (in my understanding) is a mapping from String length to the number of times this occurred.
import java.util.HashMap;
import java.util.Map;
public class wordLength {
    public static void main(String[] args) {
        String userInput1 = "Owen Bishop Java ";
        String[] inputArray = userInput1.split(" ");
        // Please note that I use the 'diamond' notation here; this requires Java 1.7 or higher.
        Map<Integer, Integer> counter = new HashMap<>();
        for (int i = 0; i < inputArray.length; i++) {
            int length = inputArray[i].length();
            Integer num = counter.get(length);
            if (num == null) {
                num = 1;
            } else {
                num++;
            }
            counter.put(length, num);
            // or counter.put(length, num == null ? 1 : num + 1); instead of the if-else
        }
        // print the results
        for (Map.Entry<Integer, Integer> entry : counter.entrySet()) {
            System.out.println("There are " + entry.getValue() + " words with length " + entry.getKey() + ".");
        }
    }
}
The previously submitted method of arr[length]++; does work, but it can waste a lot of space. Say you only have words of length 20 and beyond: then the first 20 elements of this arr are useless.
Please also note that you can use the entrySet() method from the Map interface. It is better coding practice to use this method than to use keySet() and then look up the associated value, because it saves a lookup per entry (especially noticeable with large user inputs).

I have two solutions for the above problem:
Using extra space, i.e. a Map to store each unique value. Time complexity O(n), extra space O(N), where N is the number of unique values.
No extra space: sort the input and count runs of equal values. Time complexity O(n log n) + O(n).
First solution
Create a map to store each unique value as key and its count as value
Iterate over the input array
If the value exists as a key, increment its counter
Else, if the value does not exist in the map, put the value and set its counter to 1
private static void UniqueValueUsingMap(int[] a){
//Map with
// key: unique values
// Value: count as each value (key)
Map<Integer, Integer> store = new HashMap<Integer, Integer>();
for (int i : a) {
Integer key = Integer.valueOf(i);
Integer count = null;
if(store.containsKey(key)){
count = store.get(key);
}else{
count = 0;
}
count++;
store.put(key, count);
}
}
Second solution
Sort the given array in O(n log n). All equal values will be adjacent after the sort.
Iterate over the sorted array
Maintain two variables: the current value and the counter for that value
When the value changes, print the value/count and reset the counter
Note: the first element must not be compared against a default of 0, and the last run has to be printed after the loop ends.
private static void CountValueUsingSort(int[] a) {
    if (a.length == 0) {
        return;
    }
    // Sort the input array so equal values are adjacent
    Arrays.sort(a);
    int currentValue = a[0];
    int counter = 0;
    for (int curr : a) {
        if (curr != currentValue) {
            System.out.println("Value: " + currentValue + " Count:" + counter);
            // Start counting the next run
            currentValue = curr;
            counter = 0;
        }
        // Increment the count for the current run
        counter++;
    }
    // Print the final run
    System.out.println("Value: " + currentValue + " Count:" + counter);
}

How can I keep track of multiple counter variables

I have written some code that counts the number of "if" statements in an unknown number of files. How can I keep a separate count for each file as well as a total of "if" statements across all files?
code:
import java.io.*;
public class ifCounter4
{
public static void main(String[] args) throws IOException
{
// variable to keep track of number of if's
int ifCount = 0;
for (int c = 0; c < args.length; c++)
{
// parameter the TA will pass in
String fileName = args[c];
// create a new BufferReader
BufferedReader reader = new BufferedReader( new FileReader (fileName));
String line = null;
StringBuilder stringBuilder = new StringBuilder();
String ls = System.getProperty("line.separator");
// read from the text file
while (( line = reader.readLine()) != null)
{
stringBuilder.append(line);
stringBuilder.append(ls);
}
// create a new string with stringBuilder data
String tempString = stringBuilder.toString();
// create one last string to look for our valid if(s) in
// with ALL whitespace removed
String compareString = tempString.replaceAll("\\s","");
// check for valid if(s)
for (int i = 0; i < compareString.length(); i++)
{
if (compareString.charAt(i) == ';' || compareString.charAt(i) == '}' || compareString.charAt(i) == '{') // added opening "{" for nested ifs :)
{
i++;
if (compareString.charAt(i) == 'i')
{
i++;
if (compareString.charAt(i) == 'f')
{
i++;
if (compareString.charAt(i) == '(')
ifCount++;
} // end if
} // end if
} // end if
} // end for
// print the number of valid "if(s) with a new line after"
System.out.println(ifCount + " " + args[c]); // <-- this keeps running total
// but not count for each file
}
System.out.println();
} // end main
} // end class

You can create a Map that stores the file names as keys and the count as values.
Map<String, Integer> count = new HashMap<String, Integer>();
After each file,
count.put(fileName, ifCount);
ifCount = 0;
Walk the value set to get the total.

How about a Map which uses the file name as key and keeps the count of ifs as value? For overall count, store it in its own int, or just calculate it when needed by adding up all the values in the Map.
Map<String, Integer> ifsByFileName = new HashMap<String, Integer>();
int totalIfs = 0;
for each if in "file" {
totalIfs++;
Integer currentCount = ifsByFileName.get(file);
if (currentCount == null) {
currentCount = 0;
}
ifsByFileName.put(file, currentCount + 1);
}
// total from the map:
int totalIfsFromMap = 0;
for (Integer fileCount : ifsByFileName.values()) {
totalIfsFromMap += fileCount;
}
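Since the loop above is pseudocode, here is a runnable sketch of the same bookkeeping. To stay self-contained it works on in-memory source strings instead of files, and it uses a deliberately crude counter (occurrences of "if(" after whitespace is removed) as a stand-in for the question's character scanner. The names IfCounts, countIfs, countPerFile and total are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IfCounts {
    // Crude stand-in for the question's scanner: counts "if(" once whitespace is removed.
    static int countIfs(String source) {
        String compact = source.replaceAll("\\s", "");
        int count = 0;
        int from = 0;
        while ((from = compact.indexOf("if(", from)) != -1) {
            count++;
            from += 3; // continue searching after this match
        }
        return count;
    }

    // Maps each file name to its own count; the input maps file name to file content.
    static Map<String, Integer> countPerFile(Map<String, String> sources) {
        Map<String, Integer> ifsByFileName = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sources.entrySet()) {
            ifsByFileName.put(e.getKey(), countIfs(e.getValue()));
        }
        return ifsByFileName;
    }

    // Derives the overall total from the per-file counts.
    static int total(Map<String, Integer> ifsByFileName) {
        int sum = 0;
        for (int fileCount : ifsByFileName.values()) {
            sum += fileCount;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("A.java", "if (x) { y(); } if (z) { }");
        sources.put("B.java", "while (true) { if (done) break; }");
        Map<String, Integer> counts = countPerFile(sources);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
        System.out.println(total(counts) + " total");
        // prints:
        // 2 A.java
        // 1 B.java
        // 3 total
    }
}
```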

Using an array would solve this problem.
int[] ifCount = new int[args.length];
and then in your loop ifCount[c]++;

This scenario becomes problematic when many threads want to increment the same set of counters.
Operations such as ifCount[c]++; and ifsByFileName.put(file, currentCount + 1); are not thread safe.
Combining a ConcurrentMap with AtomicLong values works only if the initial value of 0 is installed atomically (for example with putIfAbsent); a plain check-then-put would need additional locking.
The Google Guava project provides a convenient out-of-the-box solution: AtomicLongMap
With this class you can write:
AtomicLongMap<String> cnts = AtomicLongMap.create();
cnts.incrementAndGet("foo");
cnts.incrementAndGet("bar");
cnts.incrementAndGet("foo");
for (Entry<String, Long> entry : cnts.asMap().entrySet()) {
System.out.println(entry);
}
which prints:
foo=2
bar=1
And is completely thread safe.
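If adding Guava is not an option, a similar effect can be had with the JDK alone, since ConcurrentHashMap.merge performs its update atomically. The following is a sketch of that alternative rather than part of the answer above; the class name ConcurrentCounts and the method countConcurrently are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCounts {
    // Starts several threads that all increment the counter for the same key.
    static Map<String, Long> countConcurrently(String key, int threads, int incrementsPerThread)
            throws InterruptedException {
        ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Atomic on ConcurrentHashMap: stores 1 if absent, otherwise adds 1.
                    counts.merge(key, 1L, Long::sum);
                }
            });
            workers[t].start();
        }
        // Wait for all workers before reading the result.
        for (Thread worker : workers) {
            worker.join();
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countConcurrently("foo", 4, 1000)); // prints {foo=4000}
    }
}
```

No explicit initial value of 0 is needed, because merge handles the absent-key case itself.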

This is a counter that counts to 100; if you edit the value of n, it puts a * next to the multiples of n.
public class SmashtonCounter_multiples {
    public static void main(String[] args) {
        int n = 3; // change this variable for different multiples
        for (int count = 1; count <= 100; count++) {
            System.out.print(count);
            // mark multiples of n
            if (count % n == 0) {
                System.out.print("*");
            }
            // separate all numbers except the last with a comma
            if (count < 100) {
                System.out.print(",");
            }
        }
    }
}