HashSet vs TreeSet different size()

I'm reading a file and adding the words to a HashSet and a TreeSet. HashSet.size() gives me 350 items but TreeSet.size() 349 items. Does anyone have an explanation of this difference?
public static void main(String[] args) throws FileNotFoundException {
File file = new File("src/words.txt");
Scanner read = new Scanner(file);
Set<Word> hashSet = new HashSet<Word>();
Set<Word> treeSet = new TreeSet<Word>();
while(read.hasNext()) {
Word word = new Word(read.next());
hashSet.add(word);
treeSet.add(word);
}
System.out.println(hashSet.size());
System.out.println(treeSet.size());
Iterator<Word> itr = treeSet.iterator();
while (itr.hasNext()) {
System.out.println(itr.next().toString());
}
}
public class Word implements Comparable<Word> {
private String word;
public Word(String str) {
this.word = str; }
public String toString() {
return word.toLowerCase(); }
/* Override Object methods */
public int hashCode() {
int hashCode = 0;
int temp;
for(int i = 0; i<word.length();i++){
temp = (int) word.charAt(i);
hashCode += temp^hashCode;
}
return hashCode;
}
public boolean equals(Word other) {
if(other instanceof Word){
if(compareTo(((Word) other))==0)
return true;
else
return false;}
else
return false;
}
public int compareTo(Word w) {
if(this.word.compareToIgnoreCase(w.toString())>0)
return 1;
if(this.word.compareToIgnoreCase(w.toString())<0)
return -1;
else
return 0;
}
}

Change your equals from equals(Word) to equals(Object), and add the @Override annotation so the compiler verifies that it really overrides Object.equals.
Moreover, your hashCode method does not guarantee that two words that are equal (ignoring case) have the same hash code. You can call toUpperCase() on word before computing the hash code.
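A minimal sketch of what that could look like, assuming case-insensitive equality is what you want. WordDemo and its sample words are made up for illustration, and lower-casing with Locale.ROOT is my choice; it agrees with compareToIgnoreCase for plain ASCII words, but I have not checked every Unicode corner case.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Sketch: a case-insensitive Word whose equals, hashCode and compareTo agree.
class Word implements Comparable<Word> {
    private final String word;

    Word(String str) {
        this.word = str;
    }

    @Override
    public String toString() {
        return word.toLowerCase(Locale.ROOT);
    }

    @Override
    public int hashCode() {
        // Hash the normalized form so "Word" and "word" get the same hash code.
        return word.toLowerCase(Locale.ROOT).hashCode();
    }

    @Override
    public boolean equals(Object other) { // Object, not Word: this overrides Object.equals
        if (this == other) return true;
        if (!(other instanceof Word)) return false;
        return compareTo((Word) other) == 0;
    }

    @Override
    public int compareTo(Word w) {
        return word.compareToIgnoreCase(w.word);
    }
}

// Hypothetical driver, not part of the original question.
class WordDemo {
    public static void main(String[] args) {
        Set<Word> hashSet = new HashSet<>();
        Set<Word> treeSet = new TreeSet<>();
        for (String s : new String[] {"Word", "word", "WORD", "other"}) {
            Word w = new Word(s);
            hashSet.add(w);
            treeSet.add(w);
        }
        System.out.println(hashSet.size()); // 2
        System.out.println(treeSet.size()); // 2
    }
}
```

With all three methods based on the same case-insensitive view of the string, both sets should report the same size.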

Your equals(Word) and compareTo agree when you call them directly, but the two sets do not use them in the same way.
E.g.
Word w1 = new Word("Word");
Word w2 = new Word("word");
System.out.println(w1 == w2);
System.out.println(w1.equals(w2));
System.out.println(w1.compareTo(w2));
will give
false
true
0
HashSet uses hashCode() and equals(Object) to compare keys, while TreeSet uses compareTo to check whether keys are equivalent. Because your class overloads equals(Word) instead of overriding equals(Object), and its hashCode is case-sensitive, HashSet can treat two keys as different while TreeSet treats them as the same.
To find out which values TreeSet treats as duplicates, print the result of add for both sets. add returns true if the key was not already present and false otherwise.
while(read.hasNext()) {
Word word = new Word(read.next());
System.out.println(hashSet.add(word));
System.out.println(treeSet.add(word));
}

Related

Unable to print out b.toString and c.toString

The program's purpose was to teach me how to create a character list, and to practice using toString and boolean equals(Object other).
public class CharList {
private char[] Array = new char[100];
private int numElements = 0;
public CharList() {
}
public CharList(String startStr){
Array=startStr.toCharArray();
}
public CharList(CharList other){
other.Array=new char[100];
}
public void add(char next) {
Array[numElements++] = next;
}
public char get(int index) {
return Array[index];
}
private int size() {
return numElements;
}
@Override
public String toString() {
String str = new String(Array);
return str;
}
public boolean equals(Object other) {
if(other == null) {
return false;
}
if(other instanceof CharList == false) {
return false;
}
else {
CharList that = (CharList) other;
return this.Array == that.Array ;
}
}
public static void main(String[] args) {
System.out.println("uncomment the code to use the charListDriver");
CharList a = new CharList();
CharList b = new CharList("Batman");
CharList c = new CharList(b);
a.add('k');
a.add('a');
a.add('t');
a.add('n');
a.add('i');
a.add('s');
System.out.println("a is :"+a.toString() +" and has " + a.size() + " chars");
System.out.println("b is :"+b.toString() +" and has " + b.size() + " chars");
System.out.println("c is :"+c.toString() +" and has " + c.size() + " chars");
System.out.println("B and A are equal : " + b.equals(a));
System.out.println("B and C are equal : " + b.equals(c));
}
}
my output is:
a is: katnis and has 6 chars
b is: and has 0 chars
c is: and has 0 chars
The main function was provided for me by my instructor. I don't understand why it is not printing out "batman".

The issue is with your constructor that takes a CharList
public CharList(CharList other){
other.Array=new char[100];
}
You see that it is setting other.Array equal to a new array of size 100.
So when you do this
CharList c = new CharList(b);
You are setting the Array of b to be a new array wiping out the array that contained the characters from "Batman".
If you fix the constructor in question to be
Array = other.Array.clone();
it'll fix the problem. I cloned the other array so that b and c aren't pointing to the exact same array. If they were then when you added chars to one, it would add chars to the other as well.
Next you'll see an issue with your size() method. It returns numElements, but numElements isn't set in the constructors that take a String or a CharList, so it's always 0. Be sure to set numElements in those constructors. Because of this error, calling add on a CharList that was initialized from a String changes the first char instead of adding to the end.
I've only really answered the question about Batman and then size. But there are several other issues with this code as well.
What happens if someone calls add more than 100 times on a CharList initialized with the default constructor?
The equals method is doing a reference equality check rather than making sure the chars in the arrays are identical.
What happens when you call add to a CharList instantiated with String or CharList? As I noted it currently changes the char at index 0. But even if you fix that and set numElements correctly what will happen? It'll try to write past the end of the Array.
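Putting those points together, here is one possible sketch of a repaired CharList. Growing the array by doubling and comparing element by element are my choices, not something your assignment prescribes, and I renamed the field to lowercase array.

```java
import java.util.Arrays;

// Sketch of a CharList that addresses the issues listed above.
class CharList {
    private char[] array = new char[100];
    private int numElements = 0;

    CharList() { }

    CharList(String startStr) {
        array = startStr.toCharArray();
        numElements = array.length;        // keep the count in sync with the contents
    }

    CharList(CharList other) {
        array = other.array.clone();       // copy, so the two lists do not share storage
        numElements = other.numElements;
    }

    void add(char next) {
        if (numElements == array.length) { // grow instead of writing past the end
            array = Arrays.copyOf(array, Math.max(1, array.length * 2));
        }
        array[numElements++] = next;
    }

    char get(int index) {
        return array[index];
    }

    int size() {
        return numElements;
    }

    @Override
    public String toString() {
        return new String(array, 0, numElements); // only the used part of the array
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CharList)) return false; // also covers null
        CharList that = (CharList) other;
        if (numElements != that.numElements) return false;
        for (int i = 0; i < numElements; i++) {
            if (array[i] != that.array[i]) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        return toString().hashCode();      // consistent with equals
    }
}
```

With this version, new CharList("Batman") reports 6 chars, a copy compares equal to the original, and adding to the copy leaves the original untouched.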

2 Things to go over (plus a 0th thing):
0)
A getArray() accessor is worth adding. Array is marked private, so code outside CharList cannot read it. Inside CharList you can still write other.Array, because private access in Java is per class rather than per instance, but going through an accessor is better practice. It would be simple, and look like: char[] getArray() {return this.Array;}
1)
The constructors you wrote, which look like this:
public CharList() {
}
public CharList(CharList other){
other.Array=new char[100];
}
are wrong.
You should change these like so:
public CharList() {
this.Array=new char[100];
}
public CharList(CharList other){
this.Array=other.Array;
}
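Here, we made the empty constructor initialize Array to a fixed length of 100. For the other, we assign to this.Array instead of overwriting other.Array (you could equally use other.getArray() on the right-hand side). Note that after this assignment both objects share the same array, so use other.Array.clone() if they should be independent.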
Now, if you try this, it should work.
2)
Let's say you had this:
CharList batman1 = new CharList("batman");
CharList batman2 = new CharList("batman");
Then batman1.equals(batman2) would return false. This is because of how references and variable assignment work in Java. For batman1.Array == batman2.Array to be true, it is not enough for their values to be equal. They also have to be pointing to the same array. See "Shallow copy for arrays, why can't simply do newArr = oldArr?" for more info.
To fix this, compare the elements rather than the references:
public boolean equals(Object other) {
if(other == null) {
return false;
}
if(!(other instanceof CharList)) {
return false;
}
// Cast first: Object has no size() or get() methods.
CharList that = (CharList) other;
if(that.size()!=this.size()) {
return false;
}
for (int i=0; i<this.size(); i++) {
if (this.get(i)!=that.get(i)) return false;
}
return true;
}
I did a lot of things here. First, I cleaned up the if statements. You don't need that else at the end. Then, I implemented an element-wise check: it compares the two lists value by value instead of comparing the array references. If everything is the same, it returns true. This relies on size() returning the correct count.
If you have followed all of these steps, then it should work.

Why can't TreeSet have student ages similar when the whole student object is unique?

I have a data type StudentSet. It holds a name and an age. When I add instances to a TreeSet, students with the same age are not added. I know that a TreeSet only adds unique elements, but each student has a different name, so isn't each StudentSet unique as a whole? I want to know the reason behind this.
Here is my code:
import java.util.Comparator;
import java.util.TreeSet;
public class StudentSet implements Comparable<StudentSet> {
String name;
int age;
public StudentSet(String name, int age) {
super();
this.name = name;
this.age = age;
}
@Override
public String toString() {
return "Student[Name= " + name + "," + " Age= " + age + "]";
}
public static void main(String[] args) {
TreeSet<StudentSet> set = new TreeSet<>();
set.add(new StudentSet("xyz", 21));
set.add(new StudentSet("abc", 23));
set.add(new StudentSet("xyxyxr", 24));
System.out.println(set.add(new StudentSet("aaaaaa", 20))); //prints true (first element with age 20)
System.out.println(set.add(new StudentSet("bbbbbb", 20))); //prints false
System.out.println(set.add(new StudentSet("cccc", 20))); //prints false
TreeSet<StudentSet> sort = new TreeSet<>(new Comparator<StudentSet>() {
@Override
public int compare(StudentSet o1, StudentSet o2) {
return o1.name.compareTo(o2.name);
}
});
sort.addAll(set);
System.out.println("Sorting According to Name\n");
for (StudentSet s : sort) {
System.out.println(s);
}
System.out.println();
sort = new TreeSet<>(new Comparator<StudentSet>() {
@Override
public int compare(StudentSet o1, StudentSet o2) {
return o1.compareTo(o2);
}
});
System.out.println("Sorting According to Age\n");
sort.addAll(set);
for (StudentSet s : sort) {
System.out.println(s);
}
System.out.println();
sort = new TreeSet<>(new Comparator<StudentSet>() {
@Override
public int compare(StudentSet o1, StudentSet o2) {
int lastIndex1 = o1.name.lastIndexOf(" ");
int lastIndex2 = o2.name.lastIndexOf(" ");
String lastName1 = o1.name.substring(lastIndex1);
String lastName2 = o2.name.substring(lastIndex2);
if (lastName1.equals(lastName2)) {
return o1.name.compareTo(o2.name);
} else {
return lastName1.compareTo(lastName2);
}
}
});
System.out.println("Sorting According to Last Name\n");
sort.addAll(set);
for (StudentSet s : sort) {
System.out.println(s);
}
}
@Override
public int compareTo(StudentSet o) {
return ((Integer) this.age).compareTo(o.age);
}
}
Update:
The main culprit was that I had written the compareTo() method in StudentSet to compare only ages, and TreeSet uses it internally both to order elements and to check for uniqueness.
Here is my corrected code.
@Override
public int compareTo(StudentSet o)
{
int i = Integer.compare(this.age, o.age);
if (i == 0)
return this.name.compareTo(o.name);
else
return i;
}

You are creating 4 TreeSet instances, each with a different Comparator. The Comparator passed to the TreeSet determines if two elements are considered to be identical.
In the first TreeSet, you are not passing any Comparator to the constructor, which means the natural ordering (defined by Comparable) is used. The Comparable's compareTo compares by age only.
In the third TreeSet, you are using this compare method :
@Override
public int compare(StudentSet o1, StudentSet o2)
{
return o1.compareTo(o2);
}
Since compareTo compares only by age, two StudentSet instances having the same age are considered the same, and only one of them will be added to the TreeSet.
If you want your original set TreeSet as well as the 3 TreeSets assigned to the sort variable to contain all the unique elements, all your compare and compareTo methods must sort by all properties that determine a unique StudentSet instance.
They can sort the TreeSet using different orderings by comparing the properties in a different order each time. For example, one can compare the names first and then the ages (if the names are equal), while another can compare the ages first and then the names (if the ages are equal).
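As a sketch of those two orderings, here is one way to write them with the Comparator helpers available since Java 8. Student is a simplified stand-in for StudentSet, and the class and constant names are made up for illustration.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Simplified stand-in for StudentSet (hypothetical, for illustration only).
class Student {
    final String name;
    final int age;

    Student(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name + "(" + age + ")";
    }
}

class StudentOrderings {
    // Age first, name as tie-breaker: same-age students stay distinct.
    static final Comparator<Student> BY_AGE_THEN_NAME =
            Comparator.comparingInt((Student s) -> s.age).thenComparing(s -> s.name);

    // Name first, age as tie-breaker.
    static final Comparator<Student> BY_NAME_THEN_AGE =
            Comparator.comparing((Student s) -> s.name).thenComparingInt(s -> s.age);

    public static void main(String[] args) {
        TreeSet<Student> byAge = new TreeSet<>(BY_AGE_THEN_NAME);
        byAge.add(new Student("aaaaaa", 20));
        byAge.add(new Student("bbbbbb", 20));
        byAge.add(new Student("cccc", 20));
        System.out.println(byAge.size()); // 3: all three same-age students are kept

        TreeSet<Student> byName = new TreeSet<>(BY_NAME_THEN_AGE);
        byName.addAll(byAge);
        System.out.println(byName);
    }
}
```

Both comparators look at every property that identifies a student, so they differ only in iteration order, not in which elements count as duplicates.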

It is working as the compareTo method is implemented. If you want it to be based on names rather than ages, change compareTo as follows:
@Override
public int compareTo(StudentSet o)
{
return this.name.compareTo(o.name);
//return ((Integer) this.age).compareTo(o.age);
}

Redefine your compareTo method. StudentSet is your own custom class, so by implementing the Comparable interface you are telling TreeSet how to order its elements and how to decide whether two of them are duplicates.
The implementation below says that if two or more StudentSet objects have the same age, they are considered equal.
@Override
public int compareTo(StudentSet o)
{
return ((Integer) this.age).compareTo(o.age);
}
Modify the implementation as below:
@Override
public int compareTo(StudentSet o)
{
int i = ((Integer) this.age).compareTo(o.age);
if (i == 0)
return this.name.compareTo(o.name);
else
return i;
}
The above compares both age and name, so two objects are treated as duplicates only if both are the same.

hashCode collisions are putting two different words in same position

I am supposed to implement an interface with a hash table. Problem is that I'm getting the wrong output and it's due to collision (from what I understand). I haven't been writing this code completely solo, I've been getting help. I'm not a master at Java, very early in my course so this is all very hard for me so please be patient.
Here is my code so far:
runStringDictionary.java
import java.io.BufferedReader;
import java.io.FileReader;
public class runStringDictionary {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
if (args.length == 0 || args.length > 1) {
System.out.println("Syntax to run the program: java runStringDictionary <inputFile>");
}
if (args.length == 1) {
try {
Dictionary myDictionary = new Dictionary(); //Initialize a Dictionary to store input words
BufferedReader br = new BufferedReader(new FileReader(args[0])); //Read the text file input
String line;
while ((line = br.readLine()) != null) {//Read each line
String[] strArray = line.split(" "); //Separate each word in the line and store in another Array
for (int i = 0; i < strArray.length; i++) { //Loop over the Array
if (myDictionary.contains(strArray[i])) { //Check if word exists in the dictionary
myDictionary.remove(strArray[i]); //if it does remove it
} else {
myDictionary.add(strArray[i]); //if it doesn't then add it
}
}
}//while loop ends
//print the contents of myDictionary
for (int i = 0; i < 25; i++) {
if (myDictionary.table[i] != null) {
System.out.println(myDictionary.table[i]);
}
}
} catch (Exception e) {
System.out.println("Error found : " + e);
}
}
}
}
StringDictionary.java
public interface StringDictionary {
public boolean add(String s);
public boolean remove(String s);
public boolean contains(String s);
}
Dictionary.java
public class Dictionary implements StringDictionary {
private int tableSize = 25;
Object[] table;
// constructor
Dictionary() {
this.table = new Object[this.tableSize];
}
@Override
public boolean add(String s) {
// TODO Auto-generated method stub
int hashCode = s.hashCode() % this.tableSize;
if (!this.contains(s)) {
this.table[hashCode] = s;
}
return false;
}
@Override
public boolean remove(String s) {
// TODO Auto-generated method stub
int hashCode = s.hashCode() % this.tableSize;
if (this.contains(s)) {
this.table[hashCode] = null;
return true;
}
return false;
}
@Override
public boolean contains(String s) {
// TODO Auto-generated method stub
int hashCode = s.hashCode() % this.tableSize;
if (table[hashCode] != null) {
if (table[hashCode].equals(s))
return true;
}
return false;
}
}

Hashcode collisions are expected and normal; the hashcode is used to narrow down the pool of potential matches and those potential matches must then be checked for canonical equality.

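The line int hashCode = s.hashCode() % this.tableSize; maps every string to one of only 25 slots, and since each slot holds a single string, your Dictionary can contain at most 25 elements. For non-negative hash codes the index is between 0 and 24; note that String.hashCode() can also be negative, in which case % yields a negative index.
You need to keep an array of lists. Each list contains the strings that map to the same index.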
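A rough sketch of that idea (separate chaining), assuming the StringDictionary interface from the question. ChainedDictionary is a made-up name, and I use a list of lists rather than a raw array to avoid generic-array warnings.

```java
import java.util.ArrayList;
import java.util.List;

// Interface copied from the question so the sketch compiles on its own.
interface StringDictionary {
    boolean add(String s);
    boolean remove(String s);
    boolean contains(String s);
}

// Hypothetical dictionary where each slot holds a list of colliding strings.
class ChainedDictionary implements StringDictionary {
    private static final int TABLE_SIZE = 25;
    private final List<List<String>> table = new ArrayList<>(TABLE_SIZE);

    ChainedDictionary() {
        for (int i = 0; i < TABLE_SIZE; i++) {
            table.add(new ArrayList<>());
        }
    }

    // floorMod keeps the index in [0, TABLE_SIZE) even for negative hash codes.
    private List<String> bucketFor(String s) {
        return table.get(Math.floorMod(s.hashCode(), TABLE_SIZE));
    }

    @Override
    public boolean add(String s) {
        List<String> bucket = bucketFor(s);
        if (bucket.contains(s)) {
            return false;          // already present
        }
        bucket.add(s);
        return true;
    }

    @Override
    public boolean remove(String s) {
        return bucketFor(s).remove(s);   // true only if s was actually stored
    }

    @Override
    public boolean contains(String s) {
        return bucketFor(s).contains(s);
    }
}
```

For example, "Aa" and "BB" have the same String.hashCode(), so they share a bucket here, yet both can be stored and removed independently.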

Hash code collisions are normal in hash tables; you would need a perfect hash function to avoid them. There are multiple strategies for dealing with collisions, but basically either you move a colliding item to a different slot of the table, or you allow each bucket to store multiple values, for example in an ArrayList.
Deciding which value to retrieve when multiple values share the same bucket adds lookup cost, so a good hash function minimises the number of collisions as much as possible.

With int hashCode = s.hashCode() % this.tableSize; you are bound to get collisions. Valid indices into table run from 0 to this.tableSize - 1, which in your case is 24.
Hashcode collisions are expected and are a normal occurrence. Having the same hash code doesn't mean that the elements are equal; it just means that they hash to the same value. You have to look at the contents to be sure.
The aim in a structure like this is usually to use a hash function that reduces the probability of collisions. You currently have a very simple mapping that is just the hash code modulo the size of your table, so if hash codes are spread evenly, any two distinct strings have roughly a 1 / tableSize chance of colliding (someone please correct me here if I am wrong).
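One detail about the index range: in Java, % keeps the sign of the left operand, and String.hashCode() can be negative, so the computed index can fall outside the table. A small sketch; BucketIndex and indexFor are made-up names, and -7 is just a stand-in for a negative hash code.

```java
// Hypothetical helper showing % versus Math.floorMod for table indices.
class BucketIndex {
    // Maps any hash code to an index in [0, tableSize).
    static int indexFor(int hash, int tableSize) {
        return Math.floorMod(hash, tableSize);
    }

    public static void main(String[] args) {
        int hash = -7; // stand-in for a negative String.hashCode() value
        System.out.println(hash % 25);          // -7, not a valid array index
        System.out.println(indexFor(hash, 25)); // 18
    }
}
```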

Checking whether a number is symmetric

How can I check whether a number is symmetric?
public static int Symmetric(int a) {
if(new StringBuilder(Integer.toString(a)) ==
new StringBuilder(Integer.toString(a)).reverse())
return a;
else
return 0;
}
I tried something like this, but it always returns 0.

You can't use == to compare Strings (or StringBuilders); == compares references, so you need to use equals().
Also, StringBuilder does not override equals(), so you need to turn the StringBuilders back into Strings before comparing.
EDIT:
Also, there is really no need for the first StringBuilder:
public static int symmetric(int a) {
if (Integer.toString(a).equals(new StringBuilder(Integer.toString(a)).reverse().toString()))
return a;
else
return 0;
}
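To see the difference between the three comparisons side by side, here is a small self-contained sketch. SymmetricDemo and isSymmetric are names I made up, and I return a boolean instead of the number itself.

```java
// Hypothetical demo of ==, StringBuilder.equals and String.equals.
class SymmetricDemo {
    static boolean isSymmetric(int a) {
        String digits = Integer.toString(a);
        String reversed = new StringBuilder(digits).reverse().toString();
        return digits.equals(reversed);
    }

    public static void main(String[] args) {
        StringBuilder x = new StringBuilder("121");
        StringBuilder y = new StringBuilder("121");
        System.out.println(x == y);                                  // false: two different objects
        System.out.println(x.equals(y));                             // false: equals is inherited from Object
        System.out.println(x.toString().equals(y.toString()));       // true: String compares contents

        System.out.println(isSymmetric(12321)); // true
        System.out.println(isSymmetric(1231));  // false
    }
}
```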

Equality is explained here in JLS.
You must use equals() on Strings: StringBuilder.toString().equals().
public static int Symmetric( int a ) {
return
new StringBuilder(Integer.toString(a)).toString().equals(
new StringBuilder(Integer.toString(a)).reverse().toString())
? a : 0;
}

Priority Queues with Huffman tree

I am trying to create a Huffman tree by reading in a file and counting the frequency of each letter, space, symbol, etc. I'm using a PriorityQueue to queue the items from smallest to largest, but when I insert them into the queue they don't queue correctly. Here is my code.
package huffman;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.PriorityQueue;
import java.util.Scanner;
public class Huffman {
public ArrayList<Frequency> fileReader(String file)
{
ArrayList<Frequency> al = new ArrayList<Frequency>();
Scanner s;
try {
s = new Scanner(new FileReader(file)).useDelimiter("");
while (s.hasNext())
{
boolean found = false;
int i = 0;
String temp = s.next();
while(!found)
{
if(al.size() == i && !found)
{
found = true;
al.add(new Frequency(temp, 1));
}
else if(temp.equals(al.get(i).getString()))
{
int tempNum = al.get(i).getFreq() + 1;
al.get(i).setFreq(tempNum);
found = true;
}
i++;
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return al;
}
public void buildTree(ArrayList<Frequency> al)
{
PriorityQueue<Frequency> pq = new PriorityQueue<Frequency>();
for(int i = 0; i < al.size(); i++)
{
pq.add(al.get(i));
}
while(pq.size() > 0)
{
System.out.println(pq.remove().getString());
}
}
public void printFreq(ArrayList<Frequency> al)
{
for(int i = 0; i < al.size(); i++)
{
System.out.println(al.get(i).getString() + "; " + al.get(i).getFreq());
}
}
}
The buildTree() method is where I'm having the problem. What I'm trying to do is queue Frequency objects, each of which holds the letter/space/symbol and its frequency as an int. The Frequency class is this:
public class Frequency implements Comparable {
private String s;
private int n;
Frequency(String s, int n)
{
this.s = s;
this.n = n;
}
public String getString()
{
return s;
}
public int getFreq()
{
return n;
}
public void setFreq(int n)
{
this.n = n;
}
@Override
public int compareTo(Object arg0) {
// TODO Auto-generated method stub
return 0;
}
}
How can I get the PriorityQueue to use the frequency number to queue them from smallest to biggest?

You haven't actually implemented the compareTo method, so your objects are not effectively comparable.
The compareTo method, as documentation states, should
return a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
This means that in your case you should do something like:
public int compareTo(Object arg0)
{
Frequency other = (Frequency)arg0;
return n < other.n ? -1 : (n == other.n ? 0 : 1);
}
But mind that Comparable has a generic form that is preferable, Comparable<T>, so you can avoid the cast on arg0 and get static type safety too:
class Frequency implements Comparable<Frequency> {
public int compareTo(Frequency f2) {
// directly compare
}
}
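Filled in, the generic version could look roughly like this. Integer.compare is my choice for the "directly compare" step, and FrequencyDemo with its sample counts is made up for illustration.

```java
import java.util.PriorityQueue;

// Sketch: Frequency ordered by its count, smallest first.
class Frequency implements Comparable<Frequency> {
    private final String s;
    private int n;

    Frequency(String s, int n) {
        this.s = s;
        this.n = n;
    }

    String getString() {
        return s;
    }

    int getFreq() {
        return n;
    }

    void setFreq(int n) {
        this.n = n;
    }

    @Override
    public int compareTo(Frequency other) {
        // Negative, zero or positive as this.n is less than, equal to or greater than other.n.
        return Integer.compare(n, other.n);
    }
}

// Hypothetical driver with invented counts.
class FrequencyDemo {
    public static void main(String[] args) {
        PriorityQueue<Frequency> pq = new PriorityQueue<>();
        pq.add(new Frequency("e", 12));
        pq.add(new Frequency("z", 1));
        pq.add(new Frequency("t", 9));
        while (!pq.isEmpty()) {
            System.out.println(pq.remove().getString()); // z, then t, then e
        }
    }
}
```

One caveat: a PriorityQueue orders elements when they are inserted, so call setFreq before adding an element to the queue, not afterwards.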

I think that "Auto-generated method stub" needs to be filled in with a real implementation of compareTo so as to satisfy the requirements for something to be Comparable, which I assume the PriorityQueue is going to rely upon. The implementation is probably going to compare n with the other object's n, returning a negative number, zero, or a positive number rather than a boolean, with appropriate downcasting from Object.

A Priority Queue, just as a data structure, is based on the concept of an ordering - you use such a structure when you want to order elements in a certain way - which elements are more important than others, etc.
In Java, ordering objects is usually done in one of two ways - your objects implement the Comparable interface, or you supply a Comparator<E> which knows how to order objects of type E.
To determine which object is "more important" than another, the compareTo() method is invoked. This method has a pretty simple contract:
Compares this object with the specified object for order. Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
Your implementation of Frequency.compareTo() always returns 0 for the comparison. Thus, you are specifying that all Frequency objects are equal to any other Frequency objects. This is clearly not what you want.
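For the second option, supplying a Comparator, a sketch might look like this. CharCount is a hypothetical stand-in for Frequency that deliberately does not implement Comparable, and the counts are invented.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for Frequency without a natural ordering.
class CharCount {
    final String symbol;
    final int count;

    CharCount(String symbol, int count) {
        this.symbol = symbol;
        this.count = count;
    }
}

class ComparatorQueueDemo {
    // The queue gets its ordering from the Comparator: smallest count first.
    static PriorityQueue<CharCount> byCount() {
        return new PriorityQueue<>(Comparator.comparingInt((CharCount c) -> c.count));
    }

    public static void main(String[] args) {
        PriorityQueue<CharCount> pq = byCount();
        pq.add(new CharCount("e", 12));
        pq.add(new CharCount(" ", 7));
        pq.add(new CharCount("z", 1));
        while (!pq.isEmpty()) {
            CharCount c = pq.poll();
            System.out.println(c.symbol + ": " + c.count);
        }
    }
}
```

This keeps the ordering outside the element class, which helps if you need different orderings for the same objects.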
