hashCode collisions are putting two different words in the same position - java

I am supposed to implement an interface with a hash table. The problem is that I'm getting the wrong output, and it's due to collisions (from what I understand). I haven't been writing this code completely solo; I've been getting help. I'm not a master at Java and I'm very early in my course, so this is all very hard for me. Please be patient.
Here is my code so far:
runStringDictionary.java
import java.io.BufferedReader;
import java.io.FileReader;
public class runStringDictionary {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
if (args.length == 0 || args.length > 1) {
System.out.println("Syntax to run the program: java runStringDictionary <inputFile>");
}
if (args.length == 1) {
try {
Dictionary myDictionary = new Dictionary(); //Initialize a Dictionary to store input words
BufferedReader br = new BufferedReader(new FileReader(args[0])); //Read the text file input
String line;
while ((line = br.readLine()) != null) {//Read each line
String[] strArray = line.split(" "); //Separate each word in the line and store in another Array
for (int i = 0; i < strArray.length; i++) { //Loop over the Array
if (myDictionary.contains(strArray[i])) { //Check if word exists in the dictionary
myDictionary.remove(strArray[i]); //if it does remove it
} else {
myDictionary.add(strArray[i]); //if it doesn't then add it
}
}
}//while loop ends
//print the contents of myDictionary
for (int i = 0; i < 25; i++) {
if (myDictionary.table[i] != null) {
System.out.println(myDictionary.table[i]);
}
}
} catch (Exception e) {
System.out.println("Error found : " + e);
}
}
}
}
StringDictionary.java
public interface StringDictionary {
public boolean add(String s);
public boolean remove(String s);
public boolean contains(String s);
}
Dictionary.java
public class Dictionary implements StringDictionary {
private int tableSize = 25;
Object[] table;
// constructor
Dictionary() {
this.table = new Object[this.tableSize];
}
@Override
public boolean add(String s) {
// TODO Auto-generated method stub
int hashCode = s.hashCode() % this.tableSize;
if (!this.contains(s)) {
this.table[hashCode] = s;
return true; // report that the word was added
}
return false;
}
@Override
public boolean remove(String s) {
// TODO Auto-generated method stub
int hashCode = s.hashCode() % this.tableSize;
if (this.contains(s)) {
this.table[hashCode] = null;
return true;
}
return false;
}
@Override
public boolean contains(String s) {
// TODO Auto-generated method stub
int hashCode = s.hashCode() % this.tableSize;
if (table[hashCode] != null) {
if (table[hashCode].equals(s))
return true;
}
return false;
}
}

Hashcode collisions are expected and normal; the hashcode is used to narrow down the pool of potential matches and those potential matches must then be checked for canonical equality.

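This line, int hashCode = s.hashCode() % this.tableSize;, means your Dictionary has only 25 slots, so it can hold at most 25 elements. For any string with a non-negative hashCode() you'll get an index from 0 to 24 (a negative hashCode() gives a negative index, which throws ArrayIndexOutOfBoundsException).
You need to keep an array of lists instead. Each list contains all the strings that map to the same index.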
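A minimal sketch of that idea (separate chaining), assuming a 25-slot table like the question's. The class name ChainedDictionary and the allWords helper are made up for illustration; add/remove/contains mirror the StringDictionary interface from the question, but the class does not implement it so that the example stays self-contained. Math.floorMod is used because String.hashCode() can be negative, which would make a plain % produce a negative index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: same add/remove/contains contract as StringDictionary,
// but each slot holds a list (bucket) so colliding words can coexist.
public class ChainedDictionary {
    private static final int TABLE_SIZE = 25;
    private final List<List<String>> table = new ArrayList<>();

    public ChainedDictionary() {
        for (int i = 0; i < TABLE_SIZE; i++) {
            table.add(new ArrayList<>()); // one empty bucket per slot
        }
    }

    // floorMod keeps the index in 0..TABLE_SIZE-1 even when hashCode() is negative
    private List<String> bucketFor(String s) {
        return table.get(Math.floorMod(s.hashCode(), TABLE_SIZE));
    }

    public boolean add(String s) {
        List<String> bucket = bucketFor(s);
        if (bucket.contains(s)) {
            return false; // already present, nothing to do
        }
        bucket.add(s);
        return true;
    }

    public boolean remove(String s) {
        return bucketFor(s).remove(s); // List.remove(Object) compares with equals()
    }

    public boolean contains(String s) {
        return bucketFor(s).contains(s);
    }

    // Collects every stored word, bucket by bucket
    public List<String> allWords() {
        List<String> words = new ArrayList<>();
        for (List<String> bucket : table) {
            words.addAll(bucket);
        }
        return words;
    }

    public static void main(String[] args) {
        ChainedDictionary d = new ChainedDictionary();
        d.add("Aa");
        d.add("BB"); // same hashCode() as "Aa", so both go into the same bucket
        System.out.println(d.contains("Aa") && d.contains("BB")); // prints "true"
    }
}
```

Because equals() is checked inside each bucket, a collision only costs a slightly longer search instead of overwriting the other word.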

Hash code collisions are normal in hash tables; you would need a perfect hash function to avoid them. There are multiple strategies you can implement to deal with collisions, but basically either you place a colliding item in a different bucket (open addressing, for example linear probing), or you allow each bucket to store multiple values (separate chaining), for example with an ArrayList.
Deciding which value to retrieve from the table when multiple values share the same hash code adds to the lookup time, so a good hash function minimises the number of collisions as much as possible.
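For comparison, here is a rough sketch of the other strategy, open addressing with linear probing: when a slot is taken by a different string, keep stepping to the next slot (wrapping around) until you find the string or an empty slot. ProbingDictionary is a hypothetical name, and remove is deliberately left out, because deleting from a probed table needs extra care (tombstones, or re-inserting the rest of the cluster) so that later lookups don't stop too early.

```java
// Hypothetical class: open addressing with linear probing, no removal.
public class ProbingDictionary {
    private final String[] table;
    private int size = 0;

    public ProbingDictionary(int capacity) {
        table = new String[capacity];
    }

    // Start at the home slot and step forward until we reach either the string
    // itself or an empty slot. Returns -1 if the table is full and s is absent.
    private int findSlot(String s) {
        int i = Math.floorMod(s.hashCode(), table.length);
        for (int step = 0; step < table.length; step++) {
            if (table[i] == null || table[i].equals(s)) {
                return i;
            }
            i = (i + 1) % table.length; // wrap around at the end
        }
        return -1;
    }

    public boolean add(String s) {
        int i = findSlot(s);
        if (i < 0 || table[i] != null) {
            return false; // table full, or s already stored
        }
        table[i] = s;
        size++;
        return true;
    }

    public boolean contains(String s) {
        int i = findSlot(s);
        return i >= 0 && table[i] != null;
    }

    public int size() {
        return size;
    }
}
```

With capacity 3, "Aa" and "BB" share home slot 0, so "BB" is pushed to slot 1; the table can still never hold more strings than it has slots.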

With int hashCode = s.hashCode() % this.tableSize; you are bound to get collisions. Valid indices into table run from 0 to this.tableSize - 1, which in your case is 24.
Hashcode collisions are expected and are a normal occurrence. Having the same hash code doesn't mean that the elements are equal; it just means that they hash to the same value. You have to look at the contents to be sure.
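A quick way to convince yourself that equal hash codes don't mean equal strings: "Aa" and "BB" are a well-known pair with the same String.hashCode() value. The snippet below (the class name is just for illustration) also shows that both land in the same slot of a 25-slot table.

```java
public class CollisionDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // Same hash code...
        System.out.println(a.hashCode() + " " + b.hashCode()); // prints "2112 2112"
        // ...but different contents, so only equals() can tell them apart
        System.out.println(a.equals(b)); // prints "false"
        // In a 25-slot table both map to the same index
        System.out.println(Math.floorMod(a.hashCode(), 25) + " "
                + Math.floorMod(b.hashCode(), 25)); // prints "12 12"
    }
}
```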
The aim in a structure like this is usually to use a hash function that reduces the probability of collisions. You currently have a very simple scheme that just takes the hash code modulo the size of your table, so, assuming hash codes are spread evenly, any two distinct strings have about a 1 / tableSize chance of landing in the same slot. Across many words the chance that at least one pair collides is much higher, so with only 25 slots collisions are practically certain.
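To put a number on that, here is a small birthday-problem calculation. It assumes each word lands in a uniformly random slot, which is an idealisation of real hash codes, and the class name is hypothetical. For two words it gives exactly 1 / tableSize; the chance that at least one pair collides grows quickly with more words, and past 25 words a collision is certain.

```java
public class CollisionOdds {
    // Probability that at least two of n keys share a slot in a table of m slots,
    // assuming every key lands in a uniformly random slot (birthday problem).
    static double atLeastOneCollision(int n, int m) {
        double noCollision = 1.0;
        for (int i = 0; i < n; i++) {
            noCollision *= (double) (m - i) / m; // key i must avoid the i slots already used
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 5, 10, 26}) {
            System.out.printf("%2d words, 25 slots: %.3f%n", n, atLeastOneCollision(n, 25));
        }
    }
}
```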

Related

How to get the data from a file into an array?

Over the past couple of hours I have been working on an assignment with no luck figuring it out. For reference, I am going to post the instructions below and then explain what I have done.
Write a Java class that implements the StringSet interface (see attached text document). One of your instance variables must be an array of Strings that holds the data; you may determine what, if any other instance variables you need. You will also need to implement the required methods, one or more constructors, and any other methods you deem necessary. I have also provided a tester class that you should be able to run your code with.
So far I have created an implementation of the interface named MyStringSet. I have put all of the methods from the interface into my implementation and written code that I think will work. My main problem is that I don't know how to put the data from the main method that is called into an array. The user types in a file name, and then the program is supposed to report the word count and the results of the other methods. Since the file is read in the tester class, I need to store that data in the array or array list I have already created. Below I have listed my current implementation and the tester class that I use. Any help is greatly appreciated!
My Implementation:
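import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class MyStringSet implements StringSet {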
String[] myArray = new String [] {};
List<String> myList = Arrays.asList(myArray);
//default constructor
public MyStringSet(){
resize(5);
}
// precondition: larger is larger than current Set size
// postcondition: enlarges Set
public void resize(int larger) {
myArray = Arrays.copyOf(myArray, myArray.length + larger);
}
// postcondition: entry is inserted in Set if identical String
// not already present; if identical entry exists, takes no
// action. Calls resize if necessary
public void insert(String entry) {
Set<String> myArray = new HashSet<String>();
Collections.addAll(myArray, entry);
}
// postcondition: removes target value from Set if target is
// present; takes no action otherwise
public void remove(String target) {
if(target != null){
int n = 0;
int index = n;
for(int i = index; i < myArray.length - 1; i++) {
myArray[i] = myArray[i+1];
}
}
}
// precondition: Set is not empty
// postcondition: A random String is retrieved and removed from
// the Set
public String getRandomItem () {
String s = "String is Empty";
if (myArray != null) {
int rnd = new Random().nextInt(myArray.length);
return myArray[rnd];
}
else {
return s ;
}
}
// precondition: Set is not empty
// postcondition: the first item in the Set is retrieved and
// removed from the Set
public String getFirstItem () {
String firstItem = myList.get(0);
return firstItem;
}
// postcondition: returns true if target is present, false
// if not
public boolean contains(String target) {
if (target == null) {
return false;
}
else {
return true;
}
}
// postcondition: returns true if Set is empty, false if not
public boolean is_empty( ) {
if(myArray == null){
return true;
}
else {
return false;
}
}
// postcondition: returns total number of Strings currently in set
public int inventory() {
int total = myList.size();
return total;
}
// postcondition: returns total size of Set (used & unused portions)
public int getCapacity( ) {
int capacity = myArray.length;
return capacity;
}
}
Tester class:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class SetTester
{
public static void main(String [] args) {
StringSet words = new MyStringSet();
Scanner file = null;
FileInputStream fs = null;
String input;
Scanner kb = new Scanner(System.in);
int wordCt = 0;
boolean ok = false;
while (!ok)
{
System.out.print("Enter name of input file: ");
input = kb.nextLine();
try
{
fs = new FileInputStream(input);
ok = true;
}
catch (FileNotFoundException e)
{
System.out.println(input + " is not a valid file. Try again.");
}
}
file = new Scanner(fs);
while (file.hasNext())
{
input = file.next();
words.insert(input);
System.out.println("Current capacity: " + words.getCapacity());
wordCt++;
}
System.out.println("There were " + wordCt + " words in the file");
System.out.println("There are " + words.inventory() + " elements in the set");
System.out.println("Enter a value to remove from the set: ");
input = kb.nextLine();
while (!words.contains(input))
{
System.out.println(input + " is not in the set");
System.out.println("Enter a value to remove from the set: ");
input = kb.nextLine();
}
words.remove(input);
System.out.println("There are now " + words.inventory() + " elements in the set");
System.out.println("The first 10 words in the set are: ");
for (int x=0; x<10; x++)
System.out.println(words.getFirstItem());
System.out.println("There are now " + words.inventory() + " elements in the set");
System.out.println("5 random words from the set are: ");
for (int x=0; x<5; x++)
System.out.println(words.getRandomItem());
System.out.println("There are now " + words.inventory() + " elements in the set");
}
}
main problem is that I don't know how to put the data from the main method that is called into an array
You're doing that correctly already, following this simple example
StringSet words = new MyStringSet();
words.insert("something");
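However, the contents of the insert method seem incorrect: 1) you never check for identical values, 2) you never resize, and 3) Collections.addAll most likely doesn't do what you want (it adds the entry to a new local HashSet named myArray, which shadows your field and is thrown away when the method returns).
So, with those in mind, and keeping an array, you should loop over it, stop if you find an identical value, and otherwise place the entry in the first empty (null) slot.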
// postcondition: entry is inserted in Set if identical String
// not already present; if identical entry exists, takes no
// action. Calls resize if necessary
public void insert(String entry) {
    int i = 0;
    while (i < myArray.length && myArray[i] != null) {
        if (myArray[i].equals(entry)) return; // end function because found matching entry
        i++;
    }
    if (i >= myArray.length) {
        resize(5); // array is full: grow it, so myArray[i] is now a free (null) slot
    }
    // store the entry in the first free array position
    myArray[i] = entry;
}
As far as displaying the MyStringSet class goes, to see the contents of the array, you'll want to add a toString method
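Putting the insert logic and a toString together, a self-contained sketch might look like this. It is not the assignment's MyStringSet: ArrayStringSet is a hypothetical stand-in that keeps an explicit count of used slots (an assumption; the question's class has no such field) so that toString can skip the trailing nulls, and it grows by 5 slots at a time like the question's resize(5).

```java
import java.util.Arrays;

// Hypothetical minimal array-backed set of Strings.
public class ArrayStringSet {
    private String[] myArray = new String[5];
    private int count = 0; // number of used slots at the front of myArray

    public void insert(String entry) {
        for (int i = 0; i < count; i++) {
            if (myArray[i].equals(entry)) {
                return; // identical entry already present: take no action
            }
        }
        if (count == myArray.length) {
            myArray = Arrays.copyOf(myArray, myArray.length + 5); // grow by 5 slots
        }
        myArray[count++] = entry;
    }

    public int inventory() {
        return count;
    }

    public int getCapacity() {
        return myArray.length;
    }

    @Override
    public String toString() {
        // Show only the used portion of the array, not the unused null slots
        return Arrays.toString(Arrays.copyOf(myArray, count));
    }

    public static void main(String[] args) {
        ArrayStringSet words = new ArrayStringSet();
        for (String w : "the cat sat on the mat with a hat".split(" ")) {
            words.insert(w);
        }
        System.out.println(words); // prints "[the, cat, sat, on, mat, with, a, hat]"
        System.out.println(words.inventory() + " of " + words.getCapacity()); // prints "8 of 10"
    }
}
```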

HashSet vs TreeSet different size()

I'm reading a file and adding the words to a HashSet and a TreeSet. HashSet.size() gives me 350 items but TreeSet.size() 349 items. Does anyone have an explanation of this difference?
public static void main(String[] args) throws FileNotFoundException {
File file = new File("src/words.txt");
Scanner read = new Scanner(file);
Set<Word> hashSet = new HashSet<Word>();
Set<Word> treeSet = new TreeSet<Word>();
while(read.hasNext()) {
Word word = new Word(read.next());
hashSet.add(word);
treeSet.add(word);
}
System.out.println(hashSet.size());
System.out.println(treeSet.size());
Iterator<Word> itr = treeSet.iterator();
while (itr.hasNext()) {
System.out.println(itr.next().toString());
}
}
public class Word implements Comparable<Word> {
private String word;
public Word(String str) {
this.word = str; }
public String toString() {
return word.toLowerCase(); }
/* Override Object methods */
public int hashCode() {
int hashCode = 0;
int temp;
for(int i = 0; i<word.length();i++){
temp = (int) word.charAt(i);
hashCode += temp^hashCode;
}
return hashCode;
}
public boolean equals(Word other) {
if(other instanceof Word){
if(compareTo(((Word) other))==0)
return true;
else
return false;}
else
return false;
}
public int compareTo(Word w) {
if(this.word.compareToIgnoreCase(w.toString())>0)
return 1;
if(this.word.compareToIgnoreCase(w.toString())<0)
return -1;
else
return 0;
}
}
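Change your equals from equals(Word) to equals(Object), and add the @Override annotation. As written, equals(Word) is an overload, not an override, so HashSet (which calls equals(Object)) still uses Object's identity comparison.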
Moreover, your hashCode method does not guarantee that two words that are equal (ignoring case) have the same hash code. You can call toUpperCase() on word before computing the hash code.
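Your equals and compareTo methods behave differently for the same input as soon as equals is called through an Object reference, which is how HashSet calls it.
E.g.
Word w1 = new Word("Word");
Word w2 = new Word("word");
System.out.println(w1 == w2);               // different objects
System.out.println(w1.equals(w2));          // picks your equals(Word) overload
System.out.println(w1.compareTo(w2));       // case-insensitive comparison
System.out.println(w1.equals((Object) w2)); // Object.equals, the one HashSet calls
will give
false
true
0
false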
HashSet uses hashCode and equals(Object) to compare keys, while TreeSet uses compareTo to check whether keys are equivalent. Because your equals, hashCode and compareTo are not consistent with each other, HashSet treats some keys as different (for example two words that differ only in case) while TreeSet considers them the same.
To see which values TreeSet treats as duplicates, print the result of each add call. Both sets return true if the key was not already present, and false otherwise.
while(read.hasNext()) {
Word word = new Word(read.next());
System.out.println(hashSet.add(word));
System.out.println(treeSet.add(word));
}
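A sketch of a Word class whose equals(Object), hashCode and compareTo agree, so HashSet and TreeSet end up the same size. As an assumption, it normalises every word to lower case once (with Locale.ROOT) and uses that key for all three methods instead of compareToIgnoreCase, so the TreeSet order could differ slightly from the original for some non-ASCII input. The sample words are made up.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class WordDemo {
    // Case-insensitive word: equals, hashCode and compareTo all use the same key.
    static final class Word implements Comparable<Word> {
        private final String key; // lower-cased form used for equality, hashing and ordering

        Word(String str) {
            this.key = str.toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object other) { // takes Object, so it really overrides
            if (this == other) return true;
            if (!(other instanceof Word)) return false;
            return key.equals(((Word) other).key);
        }

        @Override
        public int hashCode() {
            return key.hashCode(); // equal keys always give equal hash codes
        }

        @Override
        public int compareTo(Word w) {
            return key.compareTo(w.key); // 0 exactly when equals() is true
        }

        @Override
        public String toString() {
            return key;
        }
    }

    public static void main(String[] args) {
        String[] input = {"The", "the", "cat", "Cat", "sat"};
        Set<Word> hashSet = new HashSet<>();
        Set<Word> treeSet = new TreeSet<>();
        for (String s : input) {
            Word w = new Word(s);
            hashSet.add(w);
            treeSet.add(w);
        }
        System.out.println(hashSet.size() + " " + treeSet.size()); // prints "3 3"
    }
}
```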

Two Sum with classes

import java.util.HashMap;

public class TwoSum {
private HashMap<Integer, Integer> elements = new HashMap<Integer, Integer>();
public void add(int number) {
if (elements.containsKey(number)) {
elements.put(number, elements.get(number) + 1);
} else {
elements.put(number, 1);
}
}
public boolean find(int value) {
for (Integer i : elements.keySet()) {
int target = value - i;
if (elements.containsKey(target)) {
if (i == target && elements.get(target) < 2) {
continue;
}
return true;
}
}
return false;
}
}
I am not sure how the class is able to take the numbers in the hash map and tell us whether or not two of them add up to another number. Specifically, I do not understand how the boolean find method works, or why the void add method puts numbers in the hash map the way it does. Effectively, what this class is supposed to do is add items to a hash map with add and then use find to determine whether any two of the added integers sum to the target.
See comments in the code below.
import java.util.HashMap;

public class TwoSum {
// create a hashmap to contain the NUMBER added and the COUNT of that number
private HashMap<Integer, Integer> elements = new HashMap<Integer, Integer>();
public void add(int number) {
// does the hashmap have the NUMBER as a key
if (elements.containsKey(number)) {
// get the COUNT of the NUMBER and increment it by 1
// and update the hashmap
elements.put(number, elements.get(number) + 1);
} else {
// the NUMBER doesn't exist in the hashmap,
// so add it and set the COUNT to 1
elements.put(number, 1);
}
}
public boolean find(int value) {
// Loop through the NUMBERS (which are keys in the hashmap)
for (Integer i : elements.keySet()) {
// subtract the NUMBER (i) from the VALUE then
// all we have to do is look for the TARGET in the hashmap
int target = value - i;
// start looking for the TARGET
if (elements.containsKey(target)) {
// If we made it here, we found a match
// if I == TARGET, then there has to be a COUNT of at least 2
// for example if VALUE = 6 and I = 3 then TARGET also = 3
// so the COUNT of 3s in the hashmap has to be at least 2
// if the COUNT is not >= 2 then we jump to the next I
if (i == target && elements.get(target) < 2) {
continue; // jump to next I
}
return true; // we found a match to TARGET so we can exit
}
}
return false; // no matches for TARGET
}
}
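To see the counting logic in action, here is a self-contained version of the same class with a short run. The name TwoSumDemo is only for illustration, and Map.merge is just a shorter way to write the containsKey/put pair in add. The last two calls show why the count matters: 10 = 5 + 5 only becomes findable after 5 has been added twice.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoSumDemo {
    // NUMBER -> how many times it has been added
    private final Map<Integer, Integer> counts = new HashMap<>();

    public void add(int number) {
        counts.merge(number, 1, Integer::sum); // put 1, or add 1 to the existing count
    }

    public boolean find(int value) {
        for (int i : counts.keySet()) {
            int target = value - i;
            Integer c = counts.get(target);
            if (c == null) continue;            // the complement was never added
            if (i == target && c < 2) continue; // would need the same number twice
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TwoSumDemo t = new TwoSumDemo();
        t.add(1);
        t.add(3);
        t.add(5);
        System.out.println(t.find(4));  // prints "true"  (1 + 3)
        System.out.println(t.find(7));  // prints "false"
        System.out.println(t.find(10)); // prints "false" (5 + 5 needs two 5s)
        t.add(5);
        System.out.println(t.find(10)); // prints "true"  (now there are two 5s)
    }
}
```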

Check duplicates of numbers input by user

I'm writing a program where the user inputs five numbers and at the end the numbers are printed out, which is working fine. What I can't get to work is a boolean method to check for duplicates. It should check for duplicates as the user writes them in, so e.g. if the first number is 5 and the second number is also 5, you should get an error until you enter a different number. In other words, if the user inputs a duplicate, it should NOT be saved in the array. This is obviously an assignment, so I'm just asking for a hint or two.
This program is written based on pseudo-code given to me, and therefore I have to use a boolean method, public boolean duplicate( int number ), to check for duplicates.
I've tried getting my head around it and tried something by myself, but obviously I'm making a stupid mistake. E.g.:
if(int i != myNumbers[i])
checkDuplicates = false
else
checkDuplicates = true;
return checkDuplicates;
DuplicatesTest class:
public class DuplicatesTest {
public final static int AMOUNT = 5;
public static void main(String[] args) {
Duplicates d = new Duplicates(AMOUNT);
d.inputNumber();
d.duplicate(AMOUNT);
d.printInputNumbers();
}
}
Duplicates class:
import javax.swing.JOptionPane;
import javax.swing.JTextArea;

public class Duplicates {
private int amount;
private int[] myNumbers;
private boolean checkDuplicates;
public Duplicates(int a) {
amount = a;
myNumbers = new int[amount];
}
public void inputNumber() {
for(int i = 0; i < amount; i++ ) {
int input = Integer.parseInt(JOptionPane.showInputDialog("Input 5 numbers"));
myNumbers[i] = input;
}
}
public boolean duplicate( int number ) {
<BOOLEAN TO CHECK FOR DUPLICATES, RETURN FALSE OR TRUE>
}
public void printInputNumbers() {
JTextArea output = new JTextArea();
output.setText("Your numbers are:" + "\n");
for(int i = 0; i < myNumbers.length; i++) {
if (i % 5 == 0) {
output.append("\n");
}
output.append(myNumbers[i] + "\t");
}
JOptionPane.showMessageDialog(null, output, "Numbers", JOptionPane.PLAIN_MESSAGE);
}
}
Sorry if the code tag is messy, I had some trouble with white fields in between and such. I'm new here.
Don't store the numbers in an array. Use a Set<Integer> instead, and then call Set#contains(). On a HashSet that is an O(1) operation on average, which is far better than iterating over the array to search for duplicates.
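A minimal sketch of the Set approach. The input values stand in for what the user would type, and collectDistinct is a hypothetical helper; a LinkedHashSet is used instead of a plain HashSet only so the numbers print back in the order they were entered.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SetDuplicates {
    // Accepts values in order until howMany distinct numbers have been collected,
    // rejecting any value that is already in the set
    static Set<Integer> collectDistinct(int[] typed, int howMany) {
        Set<Integer> numbers = new LinkedHashSet<>();
        for (int n : typed) {
            if (numbers.size() == howMany) {
                break;
            }
            if (numbers.contains(n)) {
                System.out.println(n + " is a duplicate, please enter a different number");
            } else {
                numbers.add(n);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        int[] typed = {5, 5, 3, 8, 3, 1, 9}; // simulated user input
        System.out.println(collectDistinct(typed, 5)); // prints "[5, 3, 8, 1, 9]"
    }
}
```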
OK, if you are required to use an array, then you should modify your current approach to return true as soon as you find a duplicate, instead of continuing to iterate over the array. In your current approach, since you set the boolean variable to false in the else block, the result depends only on the last comparison: your method returns false whenever the last element you compare is not the same as the number you are checking. So, just modify your approach to:
// loop over the array
for (int i = 0; i < myNumbers.length; i++) {
    if (number == myNumbers[i])
        return true; // duplicate found: stop immediately
}
// only reached if no element matched
return false;
Note that your current if statement will not compile. You are declaring an int variable there, which you can't do.
if (int i == myNumbers[i]) // this is not valid Java code
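int[] nums = new int[5];
int count = 0; // number of values stored so far
public boolean duplicate(int number)
{
    boolean isDup = false;
    // compare only against the slots that have been filled
    for (int i = 0; i < count; i++)
    {
        if (number == nums[i])
        {
            isDup = true;
            break;
        }
    }
    if (!isDup && count < nums.length)
    {
        // store first, then advance count, so nums[0] is used and nums[5] is never touched
        nums[count] = number;
        count++;
    }
    return isDup;
}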
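A small driver for the array-based approach, with simulated input instead of JOptionPane so it can run anywhere. DuplicatesDemo is a hypothetical name. Because it only compares against slots that have already been filled, an entered 0 is not confused with an unused array slot (which also holds 0 in Java).

```java
import java.util.Arrays;

public class DuplicatesDemo {
    private final int[] nums = new int[5];
    private int count = 0; // how many slots of nums are actually in use

    // Returns true if number was already entered; otherwise stores it (if there is room)
    public boolean duplicate(int number) {
        for (int i = 0; i < count; i++) { // only look at filled slots
            if (nums[i] == number) {
                return true;
            }
        }
        if (count < nums.length) {
            nums[count++] = number;
        }
        return false;
    }

    public boolean isFull() {
        return count == nums.length;
    }

    public static void main(String[] args) {
        int[] typed = {5, 5, 0, 7, 0, 2, 4}; // simulated user input
        DuplicatesDemo d = new DuplicatesDemo();
        int next = 0;
        // keep "asking" until five distinct numbers are stored
        while (!d.isFull() && next < typed.length) {
            int n = typed[next++];
            if (d.duplicate(n)) {
                System.out.println(n + " was already entered, try again");
            }
        }
        System.out.println(Arrays.toString(d.nums)); // prints "[5, 0, 7, 2, 4]"
    }
}
```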

Priority Queues with Huffman tree

I am trying to create a Huffman tree by reading in a file and counting the frequency of each letter, space, symbol, etc. I'm using a PriorityQueue to queue the items from smallest to largest, but when I insert them into the queue they don't come out in the right order. Here is my code:
package huffman;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.PriorityQueue;
import java.util.Scanner;
public class Huffman {
public ArrayList<Frequency> fileReader(String file)
{
ArrayList<Frequency> al = new ArrayList<Frequency>();
Scanner s;
try {
s = new Scanner(new FileReader(file)).useDelimiter("");
while (s.hasNext())
{
boolean found = false;
int i = 0;
String temp = s.next();
while(!found)
{
if(al.size() == i && !found)
{
found = true;
al.add(new Frequency(temp, 1));
}
else if(temp.equals(al.get(i).getString()))
{
int tempNum = al.get(i).getFreq() + 1;
al.get(i).setFreq(tempNum);
found = true;
}
i++;
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return al;
}
public void buildTree(ArrayList<Frequency> al)
{
PriorityQueue<Frequency> pq = new PriorityQueue<Frequency>();
for(int i = 0; i < al.size(); i++)
{
pq.add(al.get(i));
}
while(pq.size() > 0)
{
System.out.println(pq.remove().getString());
}
}
public void printFreq(ArrayList<Frequency> al)
{
for(int i = 0; i < al.size(); i++)
{
System.out.println(al.get(i).getString() + "; " + al.get(i).getFreq());
}
}
}
The buildTree() method is where I'm having the problem. What I'm trying to do is queue Frequency objects, each of which holds a letter/space/symbol and its frequency as an int. The Frequency class is this:
public class Frequency implements Comparable {
private String s;
private int n;
Frequency(String s, int n)
{
this.s = s;
this.n = n;
}
public String getString()
{
return s;
}
public int getFreq()
{
return n;
}
public void setFreq(int n)
{
this.n = n;
}
@Override
public int compareTo(Object arg0) {
// TODO Auto-generated method stub
return 0;
}
}
How can I get the PriorityQueue to use the frequency number to queue them from smallest to biggest?
You haven't actually implemented the compareTo method, so your objects are not meaningfully comparable.
The compareTo method, as the documentation states, should
return a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
This means that in your case you should do something like:
public int compareTo(Object arg0)
{
Frequency other = (Frequency)arg0;
return n < other.n ? -1 : (n == other.n ? 0 : 1);
}
But note that Comparable has a generic type parameter, Comparable<T>, which is preferable: it lets you avoid casting arg0 to Frequency and gives you static type safety too:
class Frequency implements Comparable<Frequency> {
    public int compareTo(Frequency f2) {
        // compare the frequencies directly, no cast needed
        return Integer.compare(n, f2.n);
    }
}
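A complete, runnable version of that suggestion, with a trimmed-down Frequency class (setFreq is left out) and made-up sample frequencies. Integer.compare does the three-way comparison. Note that only remove() or poll() hand elements back in priority order; printing the queue or iterating over it with a for-each loop does not give sorted order.

```java
import java.util.PriorityQueue;

public class FrequencyQueueDemo {
    // Trimmed-down version of the question's Frequency class, using the generic Comparable
    static class Frequency implements Comparable<Frequency> {
        private final String s;
        private final int n;

        Frequency(String s, int n) {
            this.s = s;
            this.n = n;
        }

        String getString() {
            return s;
        }

        int getFreq() {
            return n;
        }

        @Override
        public int compareTo(Frequency other) {
            return Integer.compare(n, other.n); // smaller frequency = higher priority
        }
    }

    public static void main(String[] args) {
        PriorityQueue<Frequency> pq = new PriorityQueue<>();
        pq.add(new Frequency("e", 12));
        pq.add(new Frequency(" ", 20));
        pq.add(new Frequency("z", 1));
        pq.add(new Frequency("t", 9));
        // remove() always returns the smallest remaining frequency: z, t, e, then " "
        while (!pq.isEmpty()) {
            Frequency f = pq.remove();
            System.out.println(f.getString() + "; " + f.getFreq());
        }
    }
}
```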
I think that the "Auto-generated method stub" needs to be filled in with a real implementation of compareTo to satisfy the requirements for being Comparable, which the PriorityQueue relies on when no Comparator is given. The implementation will compare n with the other object's n (after downcasting arg0 from Object) and return a negative, zero or positive int, rather than the boolean "n < arg0".
A Priority Queue, just as a data structure, is based on the concept of an ordering - you use such a structure when you want to order elements in a certain way - which elements are more important than others, etc.
In Java, ordering objects is usually done in one of two ways - your objects implement the Comparable interface, or you supply a Comparator<E> which knows how to order objects of type E.
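If you'd rather not make Frequency itself comparable, the second option looks roughly like this: pass a Comparator to the PriorityQueue constructor. The stripped-down Frequency class and the newQueue helper are hypothetical; Comparator.comparingInt and the PriorityQueue(Comparator) constructor are standard since Java 8. Appending .reversed() to the comparator would give largest-first instead.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class ComparatorQueueDemo {
    // Plain data holder that does NOT implement Comparable
    static class Frequency {
        final String s;
        final int n;

        Frequency(String s, int n) {
            this.s = s;
            this.n = n;
        }
    }

    // The ordering lives in the Comparator, not in the Frequency class
    static PriorityQueue<Frequency> newQueue() {
        return new PriorityQueue<>(Comparator.comparingInt((Frequency f) -> f.n));
    }

    public static void main(String[] args) {
        PriorityQueue<Frequency> pq = newQueue();
        pq.add(new Frequency("a", 7));
        pq.add(new Frequency("b", 2));
        pq.add(new Frequency("c", 5));
        StringBuilder order = new StringBuilder();
        while (!pq.isEmpty()) {
            order.append(pq.poll().s); // smallest n first
        }
        System.out.println(order); // prints "bca"
    }
}
```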
To determine which object is "more important" than another, the compareTo() method is invoked. This method has a pretty simple contract:
Compares this object with the specified object for order. Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
Your implementation of Frequency.compareTo() always returns 0 for the comparison. Thus, you are specifying that all Frequency objects are equal to any other Frequency objects. This is clearly not what you want.
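Once the ordering works, the queue can drive the actual Huffman construction: repeatedly remove the two least frequent nodes and add back a parent whose frequency is their sum, until one node (the root) is left. This is a rough sketch with a hypothetical Node class and sample frequencies, assuming at least one symbol; the exact 0/1 codes printed depend on how ties are broken, so only the root total is fixed.

```java
import java.util.PriorityQueue;

public class HuffmanSketch {
    // A node is either a leaf (symbol != null) or an internal node with two children
    static class Node implements Comparable<Node> {
        final String symbol;
        final int freq;
        final Node left, right;

        Node(String symbol, int freq, Node left, Node right) {
            this.symbol = symbol;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(freq, other.freq); // lowest frequency first
        }
    }

    // Repeatedly merge the two least frequent nodes until one root remains
    static Node build(String[] symbols, int[] freqs) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int i = 0; i < symbols.length; i++) {
            pq.add(new Node(symbols[i], freqs[i], null, null));
        }
        while (pq.size() > 1) {
            Node a = pq.remove();
            Node b = pq.remove();
            pq.add(new Node(null, a.freq + b.freq, a, b));
        }
        return pq.remove(); // throws NoSuchElementException if there were no symbols
    }

    // Walk the tree: a left edge adds "0", a right edge adds "1"
    static void printCodes(Node node, String code) {
        if (node.symbol != null) {
            System.out.println(node.symbol + " -> " + (code.isEmpty() ? "0" : code));
            return;
        }
        printCodes(node.left, code + "0");
        printCodes(node.right, code + "1");
    }

    public static void main(String[] args) {
        Node root = build(new String[] {"a", "b", "c", "d"}, new int[] {5, 9, 12, 13});
        System.out.println("total = " + root.freq); // prints "total = 39"
        printCodes(root, "");
    }
}
```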
