YES, this is a homework project.
That being said, I'm looking to learn from my mistakes rather than just have someone do it for me.
My project is a word frequency list - I accept a text file (or website URL) and count the:
- Number of unique words, and
- How many times they appear.
All methods are provided for me except for one: the insert(E word) method, where the argument is a generic type word.
The word is stored in a Node (Linked List project) that also has a 'count' value, which is the value representing the number of times the word appears in the text being read.
What this method has to do is the following:
If the argument is already in the list, increment the count of that element. I have done this part
If the argument is not found in the list, append it to the list. I also have done this part.
sort the list by descending count value. i.e. highest -> lowest count
3.5. If two elements have the same count value, they are sorted by the dictionary order of their word.
I am VERY unfamiliar with Linked Lists, so as such I am running into a lot of NullPointerExceptions. This is my current insert method:
public void insert(E word){
if(word.equals("")){
return;
}
if(first == null){//if list is null (no elements)
/*Node item = new Node(word);
first = item;*/
first = new Node(word);
}
else{//first != null
Node itemToAdd = new Node(word);
boolean inList = false;
for(Node x = first; x != null; x=x.next){
if (x.key.equals(word)){// if word is found in list
x.count++;//incr
inList = true;//found in list
break;//get out of for
}//end IF
if(x.next == null && inList == false){//if end of list && not found
x.next = itemToAdd;//add to end of list
break;
}//end IF
}//end FOR
//EVERYTHING ABOVE THIS LINE WORKS.
if (!isSorted()){
countSort();
}
}//end ELSE
}//end method
My isSorted() method:
public boolean isSorted(){
for(Node copy = first; copy.next != null; copy = copy.next){
if (copy.count < copy.next.count){
return false;
}
}
return true;
}
and last but not least, the part where I'm struggling, the sort method:
public void countSort(){
for (Node x = first, p = x.next; p != null; x=x.next, p=p.next){
// x will start at the first Node, P will always be 1 node ahead of X.
if(x == first && (x.count < p.count)){
Node oldfirst = first;
x.next = p.next;
first = p;
first.next = oldfirst;
break;
}
if (x.count < p.count){
//copy.next == x.
Node oldfirst = first;
oldfirst.next = first.next;
x.next = p.next;
first = p;
first.next = oldfirst;
break;
}
if (x.count == p.count){
if(x.toString().charAt(0) < p.toString().charAt(0)){
//[x]->[p]->[q]
Node oldfirst = first;
x.next = p.next;
first = p;
first.next = oldfirst;
break;
}
}
}
}
Here is the output of my insert method when called by the classes/methods given to me:
Elapsed time:0.084
(the,60)
(of,49)
(a,39)
(is,46)
(to,36)
(and,31)
(can,9)
(in,19)
(more,7)
(thing,7)
(violent,3)
(things,3)
(from,9)
(collected,1)
(quotes,1)
(albert,1)
(einstein,2)
(any,2)
(intelligent,1)
(fool,1)
(make,1)
(bigger,1)
(complex,1)
(it,11)
(takes,1)
(touch,1)
(genius,1)
(lot,1)
(courage,1)
(move,1)
(opposite,1)
(direction,1)
(imagination,1)
(important,5)
(than,3)
(knowledge,3)
(gravitation,1)
(not,17)
(responsible,1)
(for,14)
(people,2)
(falling,1)
(love,2)
(i,13)
(want,1)
(know,3)
(god,4)
(s,8)
(thoughts,2)
(rest,2)
(are,11)
(details,2)
(hardest,1)
(world,7)
(understand,3)
(income,1)
(tax,1)
(reality,3)
(merely,1)
(an,7)
(illusion,2)
(albeit,1)
(very,3)
(persistent,2)
(one,12)
(only,7)
(real,1)
(valuable,1)
(intuition,1)
(person,1)
(starts,1)
(live,2)
(when,3)
(he,11)
(outside,1)
(himself,4)
(am,1)
(convinced,1)
(that,14)
(does,5)
(play,2)
(dice,1)
(subtle,1)
(but,8)
(malicious,1)
(weakness,2)
(attitude,1)
(becomes,1)
(character,1)
(never,3)
(think,1)
(future,2)
(comes,1)
(soon,1)
(enough,1)
(eternal,1)
(mystery,1)
(its,4)
(comprehensibility,1)
(sometimes,1)
My initial idea has been to try and loop the if(!isSorted()){ countSort();} part to just repeatedly run until it's sorted, but I seem to run into an infinite loop when doing that. I've tried following my professor's lecture notes, but unfortunately he posted the previous lecture's notes twice so I'm at a loss.
I'm not sure if it's worth mentioning, but they provided me an iterator with methods hasNext() and next() - how can I use this as well? I can't imagine they'd provide it if it were useless.
Where am I going wrong?
You are close. First the function to compare the items is not complete, so isSorted() could yield wrong results (if the count is the same but the words are in wrong order). This is also used to sort, so it's best to extract a method for the comparison:
// returns a value < 0 if a < b, a value > 0 if a > b and 0 if a == b
public int compare(Node a, Node b) {
if (a.count == b.count)
return a.word.compareTo(b.word);
// case-insensitive: a.word.toLoweCase().compareTo(b.word.toLowerCase())
} else {
return a.count - b.count;
}
}
Or simplified which is enough in your case:
public boolean correctOrder(Node a, Node b) {
if (a.count > b.count)
return true;
else if (a.count < b.count)
return false;
else
return a.word.compareTo(b.word) <= 0;
}
For the sort you seem to have chosen bubble sort, but you are missing the outer part:
boolean change;
do {
change = false;
Node oldX = null;
// your for:
for (Node x = first; x.next != null; x = x.next) {
if (!correctOrder(x, x.next)) {
// swap x and x.next, if oldX == null then x == first
change = true;
}
oldX = x;
}
} while (change);
We could use the help of Java native library implementation or more efficient sort algorithms, but judging from the exercise the performance of the sort algorithm is of no concern yet, first need to grasp basic concepts.
With looking your codes, it sounds like to me that two things can be done:
Firstly, you can make use of Comparable class method. So, I assume you wrote the class Node, thus you may want to inherit from Comparable class. When you inherited from that class, java will automatically provide you the compareTo method, and all you need to do is to specify in that method that "I want to compare according to your counts and I want it to be in ascending order."
**Edit(1):By the way, I forgot the mention before but after you impelement your compareTo method, you can use Collections.sort(LinkedList list), and it will be done.
The second solution came to mind is that you can sort your list during the countSort() operation with the technique of adding all to an another list with sorting and after add all them back to the real list. The sorting technique I'm trying to say is, keep going towards to the end of the list until you find a Node in the list that has a count smaller than currently adding Node's counts. Hope that doesn't confuse your head, but by this way you can achieve more clear method and less complicated view. To be clear I want to repeat the procedure:
Look the next
If (next is null), add it //You are at the end.
else{
if (count is smaller than current count), add it there
else, keep moving to the next Node. //while can be used for that.
}
It's basically just an implementation of the Huffman Coding algorithm, but when I check the probability of the end BinaryTree (the only item in the queue left) it's grossly high.
// Make a BinaryTree for each item in CharOccurrences and add as an entry in initialQueue
for (int i = 0; i < charOccurrences.size(); i++) {
BinaryTree<CharProfile> bTree = new BinaryTree<CharProfile>();
bTree.makeRoot(charOccurrences.get(i));
initialQueue.add(bTree);
}
// Create the BinaryTree that we're adding to the resultQueue
BinaryTree<CharProfile> treeMerge = new BinaryTree<CharProfile>();
// Create the CharProfile that will hold the probability of the two merged trees
CharProfile data;
while (!initialQueue.isEmpty()) {
// Check if the resultQueue is empty, in which case we only need to look at initialQueue
if (resultQueue.isEmpty()) {
treeMerge.setLeft(initialQueue.remove());
treeMerge.setRight(initialQueue.remove());
// Set treeMerge's data to be the sum of its two child trees' probabilities with a null char value
data = new CharProfile('\0');
data.setProbability(treeMerge.getLeft().getData().getProbability() + treeMerge.getRight().getData().getProbability());
treeMerge.setData(data);
}
else {
// Set the left part of treeMerge to the lowest of the front of the two queues
if (initialQueue.peek().getData().getProbability() <= resultQueue.peek().getData().getProbability()) {
treeMerge.setLeft(initialQueue.remove());
}
else {
treeMerge.setLeft(resultQueue.remove());
}
if (!initialQueue.isEmpty()) {
// Set the right part of treeMerge to the lowest of the front of the two queues
if (initialQueue.peek().getData().getProbability() <= resultQueue.peek().getData().getProbability()) {
treeMerge.setRight(initialQueue.remove());
}
else {
treeMerge.setRight(resultQueue.remove());
}
}
// In the case that initialQueue is now empty (as a result of just dequeuing the last element), simply make the right tree resultQueue's head
else {
treeMerge.setRight(resultQueue.remove());
}
// Set treeMerge's data to be the sum of its two child trees' probabilities with a null char value
data = new CharProfile('\0');
data.setProbability(treeMerge.getLeft().getData().getProbability() + treeMerge.getRight().getData().getProbability());
treeMerge.setData(data);
}
// Add the new tree we create to the resultQueue
resultQueue.add(treeMerge);
}
if (resultQueue.size() > 1) {
while (resultQueue.size() != 1) {
treeMerge.setLeft(resultQueue.remove());
treeMerge.setRight(resultQueue.remove());
data = new CharProfile('\0');
data.setProbability(treeMerge.getLeft().getData().getProbability() + treeMerge.getRight().getData().getProbability());
treeMerge.setData(data);
resultQueue.add(treeMerge);
}
}
I then have this at the end:
System.out.println("\nProbability of end tree: "
+ resultQueue.peek().getData().getProbability());
Which gives me:
Probability of end tree: 42728.31718061674
Move the following lines inside the while loop:
// Create the BinaryTree that we're adding to the resultQueue
BinaryTree<CharProfile> treeMerge = new BinaryTree<CharProfile>();
Otherwise, one iteration adds treeMerge to resultQueue, and the next one may do treeMerge.setLeft(resultQueue.remove());, which makes treeMerge a child of itself...
I've created a TreeNode class which holds an ArrayList of tree nodes (named branch) and I want to add new branches to the tree by user entered paths. An example path being /Monkey/King/Bar where each of those would ideally be an existing branch with the exception of the last one (Bar would be the branch I'd like to add to King). Temp is a global variable I use for adding new branches to the tree and with recursion I'm trying to move down the path validating that each branch is a child of the previous branch and I'm having some trouble getting it to work. This is what I have so far and was wondering if it has something to do with not setting the parent when I re-declare the temp TreeNode. Any help would be appreciated and if anything I said is too vague please ask for clarification.
TreeNode tree = root;
boolean valid = false;
String y = x; //User entered path
for (int i = 0; i < x.length(); i++)
{
if (x.charAt(i) == '/')
{
for (int j = 0; j < tree.branch.size(); j++){
if (tree.branch.get(j).toString().equals(y)){
System.out.println(temp.value);
tree = tree.branch.get(j);
temp = tree;
valid = true;
}
else
valid = false;
}
y = "";
}
It looks like you're trying to traverse the tree and add a node (or a branch) at the end if the path exists or something similar?
I think the problem is mainly how you're handling the string, not the nodes. You're not actually getting the parts of the path, you're doing the whole string, then nothing.
First, a better way to work with string in this way is to use String.split
String[] pathparts = String.split("/");
Next, you need to know if it's the part of the strign
for(int i=0;i<pathparts.length-1;i++){ // we don't want the last string
// your code with .get(pathparts[i])
}
It looks like you're handling the rest of the code correctly if my assesment of what you're doign is correct.
I am trying to build a binary tree from a string input piped to System.in with Java. Whenever a letter from a-z is encountered in the string I am making an internal node (with 2 children). Whenever a 0 is encountered in the string I am making an external node (a leaf). The string is in preorder, so just as an example, if I had an input such as:
abcd000e000
I am supposed to make the following binary tree
a
/ \
b 0
/ \
c e
/ \ / \
d 00 0
/ \
0 0
At least that is what I think I am supposed to make according to the assignment details (in a link below). We were also given sample input and output for the entire program:
Input
a0
0
a00
ab000
Output
Tree 1:
Invalid!
Tree 2:
height: -1
path length: 0
complete: yes
postorder:
Tree 3:
height: 0
path length: 0
complete: yes
postorder: a
Tree 4:
height: 1
path length: 1
complete: yes
postorder: ba
I am trying to implement a program that will do this for me with Java, but I don't think I am making the binary tree correctly. I have provided the code I have come up with so far and detailed in the comments above each method what trouble I have run into so far while debugging. If more context is needed the following link details the entire assignment and what the ultimate goal is supposed to be (building the binary tree is only the first step, but I'm stuck on it):
Link to Assignment
import java.io.*;
// Node
class TreeNode {
char value;
TreeNode left;
TreeNode right;
}
// Main class
public class btsmall {
// Global variables
char[] preorder = new char[1000];
int i = 0;
// Main method runs gatherOutput
public static void main(String[] args) throws IOException {
new btsmall().gatherOutput();
}
// This takes tree as input from the gatherOutput method
// and whenever a 0 is encountered in the preorder character array
// (from a string from System.in) a new external node is created with
// a value of 0. Whenever a letter is encountered in the character
// array, a new internal node is created with that letter as the value.
//
// When I debug through this method, the tree "appears" to be made
// as expected as the tree.value is the correct value, though I
// can't check the tree.left or tree.right values while debugging
// as the tree variable seems to get replaced each time the condition
// checks restart.
public void createTree(TreeNode tree) throws IOException {
// Check that index is not out of bounds first
if (i >= preorder.length) {
i++;
} else if (preorder[i] == '0') {
tree = new TreeNode();
tree.value = '0';
tree.left = tree.right = null;
i++;
} else {
tree = new TreeNode();
tree.value = preorder[i];
i++;
createTree(tree.left);
createTree(tree.right);
}
}
// Supposed to print out contents of the created binary trees.
// Intended only to test that the binary tree from createTree()
// method is actually being created properly.
public void preorderTraversal(TreeNode tree) {
if (tree != null) {
System.out.println(tree.value + " ");
preorderTraversal(tree.left);
preorderTraversal(tree.right);
}
}
// Reads System.in for the Strings used in making the binary tree
// and is supposed to make a different binary tree for every line of input
//
// While debugging after the createTree method runs, the resulting tree variable
// has values of tree.left = null, tree.right = null, and tree.value has no value
// (it's just a blank space).
//
// This results in preorderTraversal printing out a single square (or maybe the square
// is some character that my computer can't display) to System.out instead of all
// the tree values like it's supposed to...
public void gatherOutput() throws IOException {
InputStreamReader input = new InputStreamReader(System.in);
BufferedReader reader = new BufferedReader(input);
String line = null;
TreeNode tree = new TreeNode();
while ((line = reader.readLine()) != null) {
preorder = line.toCharArray();
createTree(tree);
preorderTraversal(tree);
i = 0;
}
}
}
Can anyone help me with building the binary tree properly and point out what I am doing wrong that would result in the output I'm currently getting? Am I at least on the right track? Any hints?
Thanks.
EDIT:
Here's a picture of the "square" output (this is in Eclipse).
Your createTree() method ... doesn't create a tree.
You never attach internal nodes to anything ... you just create them, insert the value, then pass them to the next createTree() call (which does the same).
A quick fix can be a simple modification of your createTree(..) method,
public void createTree(TreeNode tree) throws IOException {
// Check that index is not out of bounds first
if (i >= preorder.length) {
i++;
} else if (preorder[i] == '0') {
tree.value = '0';
tree.left = tree.right = null;
i++;
} else {
tree.value = preorder[i];
i++;
tree.left = new TreeNode();
createTree(tree.left);
tree.right = new TreeNode();
createTree(tree.right);
}
}
Notice you were creating a TreeNode inside this method, whereas it was already passed as an argument. So, you were not at all using the same. Whatever you did was not in the original TreeNode passed.
NB: Not arguing about the correctness of binary tree. Just fix a problem in hand. This might help the OP.
but I don't think I am making the binary tree correctly
Yes, it is incorrect. In binary tree one sub-tree is "less" than current element, another one is "more". You have, for example, "b" as the parent for "c" and "e", while (if followed natural sorting) both "c" and "e" are "more".
You need to rebalance you tree in the process.
P.S. I don't know what zeros supposed to mean in the input, but if the input is limited the simplest way to build a binary tree from a sorted sequence is:
load whole sequence into an array
get middle element as the root one
repeat step 2 recursively for the sub-arrays on the left and right of the root element
Update
And yes, as stated in some other answer, you need to have something like:
} else if (preorder[i] == '0') {
TreeNode subTree = new TreeNode();
subTree.value = '0';
tree.rigth = subTree;
i++;
and then pass subTree into recursive call.
Also I see an implementation problem:
while ((line = reader.readLine()) != null) {
does not seem to be a correct stop condition. It will loop forever because if you press just enter, line will not be null, but an empty string.
This one is more suitable:
while (!(line = reader.readLine()).equals("")) {
My Huffman tree which I had asked about earlier has another problem! Here is the code:
package huffman;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.PriorityQueue;
import java.util.Scanner;
public class Huffman {
public ArrayList<Frequency> fileReader(String file)
{
ArrayList<Frequency> al = new ArrayList<Frequency>();
Scanner s;
try {
s = new Scanner(new FileReader(file)).useDelimiter("");
while (s.hasNext())
{
boolean found = false;
int i = 0;
String temp = s.next();
while(!found)
{
if(al.size() == i && !found)
{
found = true;
al.add(new Frequency(temp, 1));
}
else if(temp.equals(al.get(i).getString()))
{
int tempNum = al.get(i).getFreq() + 1;
al.get(i).setFreq(tempNum);
found = true;
}
i++;
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return al;
}
public Frequency buildTree(ArrayList<Frequency> al)
{
Frequency r = al.get(1);
PriorityQueue<Frequency> pq = new PriorityQueue<Frequency>();
for(int i = 0; i < al.size(); i++)
{
pq.add(al.get(i));
}
/*while(pq.size() > 0)
{
System.out.println(pq.remove().getString());
}*/
for(int i = 0; i < al.size() - 1; i++)
{
Frequency p = pq.remove();
Frequency q = pq.remove();
int temp = p.getFreq() + q.getFreq();
r = new Frequency(null, temp);
r.left = p;
r.right = q;
pq.add(r); // put in the correct place in the priority queue
}
pq.remove(); // leave the priority queue empty
return(r); // this is the root of the tree built
}
public void inOrder(Frequency top)
{
if(top == null)
{
return;
}
else
{
inOrder(top.left);
System.out.print(top.getString() +", ");
inOrder(top.right);
return;
}
}
public void printFreq(ArrayList<Frequency> al)
{
for(int i = 0; i < al.size(); i++)
{
System.out.println(al.get(i).getString() + "; " + al.get(i).getFreq());
}
}
}
What needs to be done now is I need to create a method that will search through the tree to find the binary code (011001 etc) to the specific character. What is the best way to do this? I thought maybe I would do a normal search through the tree as if it were an AVL tree going to the right if its bigger or left if it's smaller.
But because the nodes don't use ints doubles etc. but only using objects that contain characters as strings or null to signify its not a leaf but only a root. The other option would be to do an in-order run through to find the leaf that I'm looking for but at the same time how would I determine if I went right so many times or left so many times to get the character.
package huffman;
public class Frequency implements Comparable {
private String s;
private int n;
public Frequency left;
public Frequency right;
Frequency(String s, int n)
{
this.s = s;
this.n = n;
}
public String getString()
{
return s;
}
public int getFreq()
{
return n;
}
public void setFreq(int n)
{
this.n = n;
}
#Override
public int compareTo(Object arg0) {
Frequency other = (Frequency)arg0;
return n < other.n ? -1 : (n == other.n ? 0 : 1);
}
}
What I'm trying to do is find the binary code to actually get to each character. So if I were trying to encode aabbbcccc how would I create a string holding the binary code for a going left is 0 and going right is 1.
What has me confused is because you can't determine where anything is because the tree is obviously unbalanced and there is no determining if a character is right or left of where you are. So you have to search through the whole tree but if you get to a node that isn't what you are looking for, you have backtrack to another root to get to the other leaves.
Traverse through the huffman tree nodes to get a map like {'a': "1001", 'b': "10001"} etc. You can use this map to get the binary code to a specific character.
If you need to do in reverse, just handle it as a state machine:
state = huffman_root
for each bit
if (state.type == 'leaf')
output(state.data);
state = huffman_root
state = state.leaves[bit]
Honestly said, I didn't look into your code. It ought be pretty obvious what to do with the fancy tree.
Remember, if you have 1001, you will never have a 10010 or 10011. So your basic method looks like this (in pseudocode):
if(input == thisNode.key) return thisNode.value
if(input.endsWith(1)) return search(thisNode.left)
else return search(thisNode.right)
I didn't read your program to figure out how to integrate it, but that's a key element of huffman encoding in a nutshell
Try something like this - you're trying to find token. So if you wanted to find the String for "10010", you'd do search(root,"10010")
String search(Frequency top, String token) {
return search(top,token,0);
}
// depending on your tree, you may have to switch top.left and top.right
String search(Frequency top, String token, int depth) {
if(token.length() == depth) return "NOT FOUND";
if(token.length() == depth - 1) return top.getString();
if(token.charAt(depth) == '0') return search(top.left,token,depth+1);
else return search(top.right,token,depth+1);
}
I considered two options when I was having a go at Huffman coding encoding tree.
option 1: use pointer based binary tree. I coded most of this and then felt that, to trace up the tree from the leaf to find an encoding, I needed parent pointers. other wise, like mentioned in this post, you do a search of the tree which is not a solution to finding the encoding straight away. The disadvantage of the pointer based tree is that, I have to have 3 pointers for every node in the tree which I thought was too much. The code to follow the pointers is simple but more complicated that in option 2.
option 2: use an array based tree to represent the encoding tree that you will use on the run to encode and decode. so if you want the encoding of a character, you find the character in the array. Pretty straight forward, I use a table so smack right and there I get the leaf. now I trace up to the root which is at index 1 in the array. I do a (current_index / 2) for the parent. if child index is parent /2 it is a left and otherwise right.
option 2 was pretty easy to code up and although the array can have a empty spaces. I thought it was better in performance than a pointer based tree. Besides identifying the root and leaf now is a matter of indices rather than object type. ;) This will also be very usefull if you have to send your tree!?
also, you dont search (root, 10110) while decoding the Huffman code. You just walk the tree through the stream of encoded bitstream, take a left or right based on your bit and when you reach the leaf, you output the character.
Hope this was helpful.
Harisankar Krishna Swamy (example)
I guess your homework is either done or very late by now, but maybe this will help someone else.
It's actually pretty simple. You create a tree where 0 goes right and 1 goes left. Reading the stream will navigate you through the tree. When you hit a leaf, you found a letter and start over from the beginning. Like glowcoder said, you will never have a letter on a non-leaf node. The tree also covers every possible sequence of bits. So navigating in this way always works no matter the encoded input.
I had an assignment to write an huffman encoder/decoder just like you a while ago and I wrote a blog post with the code in Java and a longer explanation : http://www.byteauthor.com/2010/09/huffman-coding-in-java/
PS. Most of the explanation is on serializing the huffman tree with the least possible number of bits but the encoding/decoding algorithms are pretty straightforward in the sources.
Here's a Scala implementation: http://blog.flotsam.nl/2011/10/huffman-coding-done-in-scala.html