Java: How should I create a binary tree given binary codes?

Say I have 2 leaves {1,2} and I am given their binary codes (of the same length). I should construct the binary tree based on those codes, so that traversing the finished tree retrieves the same binary codes for leaves 1 and 2.
The data format is as follows:
leaf : binary code
1: 0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1
2: 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 0 0 1 0 1
For example, when the binary code is 0 1 1 0, I start from the root of the tree and I go left when I see 0 and right when I see 1 ...
Question: How can I construct a binary tree from the given binary codes? (Please note that I am in fact dealing with 200000 leaves, and hence 200000 lines of binary code, so I need an efficient method.)

You may want to look at this:
http://www.cs.princeton.edu/courses/archive/spring01/cs126/assignments/prefix.html
It should hopefully give you insight into what you need to do to create the tree you want.
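As a minimal sketch of the idea described in the question (go left on 0, right on 1), the tree can be built like a binary trie: walk each code from the root and create children as needed. The names `CodeTree`, `Node`, `insert` and `lookup` are made up for this illustration, and skipping the spaces between digits is an assumption about the input format. Each code is processed once, so the work is proportional to the total number of digits.

```java
class CodeTree {
    // Tree node; leafId stays -1 for internal nodes.
    static final class Node {
        Node left, right;
        int leafId = -1;
    }

    final Node root = new Node();

    // Walk from the root, creating missing children: '0' goes left, '1' goes right.
    void insert(int leafId, String code) {
        Node cur = root;
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (c == '0') {
                if (cur.left == null) cur.left = new Node();
                cur = cur.left;
            } else if (c == '1') {
                if (cur.right == null) cur.right = new Node();
                cur = cur.right;
            } // any other character (for example a space) is skipped
        }
        cur.leafId = leafId;
    }

    // Follow a code down the tree; returns -1 if the path does not end at a leaf.
    int lookup(String code) {
        Node cur = root;
        for (int i = 0; i < code.length() && cur != null; i++) {
            char c = code.charAt(i);
            if (c == '0') cur = cur.left;
            else if (c == '1') cur = cur.right;
        }
        return cur == null ? -1 : cur.leafId;
    }

    public static void main(String[] args) {
        CodeTree tree = new CodeTree();
        tree.insert(1, "0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1");
        tree.insert(2, "0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 0 0 1 0 1");
        System.out.println(tree.lookup("0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1")); // prints 1
    }
}
```

With 200000 codes of 24 digits each, this creates at most a few million nodes, which should fit comfortably in a default JVM heap on most machines.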

Related

How to Implement Counting the Max Component of a 2D Array Representation of a Graph

Okay, so here is the scenario:
I have a 2D integer array representing my graph / matrix. If there is a connection, there is a 1, if no connection then there is a 0. Pretty simple, however I am iterating back through the array to create a subset as given. So if I have a graph:
{ 0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0 }
and I pass the subset {3,1}
then I will be left with
{ 0 0 1 0
0 0 0 0
1 0 0 0
0 0 0 0 }
Now my question is how to go about counting the maximum vertices in the components. The output I want is the number of vertices in the largest single component. My problem is that I don't understand how I'm supposed to tell the components apart. It's easier for me to understand on paper, but I am stumped on how to express it in code. I will say I am doing this in Java.
Any insight would be helpful
Edit Note:
I am trying to use BFS or some other search method to count each vertex and its connections, then iterate over each vertex that has not yet been seen or checked and continue, and finally output the size of the largest component.
Let's say I have a graph with the connections shown above before the subset is applied. The subset is removed, and we are left with pieces of a graph. I then need to iterate over those pieces to find which piece has the most connections.
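The BFS idea described in the edit note could look roughly like the sketch below. It is only an illustration: the class and method names are invented, vertices are assumed to be numbered from 0 (which matches the example, where removing {3,1} leaves only the connection between 0 and 2), and removed vertices are simply marked as already seen so the search never enters them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MaxComponent {
    // Returns the vertex count of the largest connected component that remains
    // after the vertices listed in `removed` are deleted from the graph.
    static int largestComponent(int[][] adj, int[] removed) {
        int n = adj.length;
        boolean[] seen = new boolean[n];
        for (int r : removed) seen[r] = true; // removed vertices are never visited
        int best = 0;
        for (int start = 0; start < n; start++) {
            if (seen[start]) continue;        // already counted in an earlier component
            int size = 0;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            seen[start] = true;
            while (!queue.isEmpty()) {
                int v = queue.poll();
                size++;
                for (int w = 0; w < n; w++) {
                    if (adj[v][w] == 1 && !seen[w]) {
                        seen[w] = true;
                        queue.add(w);
                    }
                }
            }
            best = Math.max(best, size);      // each BFS run covers exactly one component
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] g = {{0, 1, 1, 1}, {1, 0, 1, 1}, {1, 1, 0, 1}, {1, 1, 1, 0}};
        System.out.println(largestComponent(g, new int[] {3, 1})); // prints 2
    }
}
```

Every new BFS start from an unseen vertex marks the beginning of a new component, which is how the code tells the components apart.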

java understanding bitwise manipulation

Recently I have been learning about bitwise operators and along the way there is this code that finds the binary digits of a decimal number using the AND(&) bitwise operator, the code is as follows:
byte b = -34;
for (int t = 128; t > 0; t = t / 2)
{
    if ((b & t) != 0) System.out.println("1 ");
    else System.out.println("0 ");
    System.out.println("b & t yields: " + (b & t));
}
I have modified the code to show the value calculated by b&t during each iteration. I would like to understand the exact mechanism behind this code as to why it works to find the binary digits, please explain why is b compared to t each iteration and why is t divided by 2 each iteration?
In addition, I would like to know how (b&t) is calculated manually by listing the binary digits. I do have an understanding of how & works, but when I listed out the binary digits of 34 and 128 and compared them:
1 0 0 0 0 0 0 0(128)
0 0 1 0 0 0 1 0(34) //I am unsure if the negative sign should be included
---------------
0 0 0 0 0 0 0 0
the result I got was 0, however the program returns 128 which is perplexing.
Below I will also include the result of the execution of the program:
1
b & t yields: 128
1
b & t yields: 64
0
b & t yields: 0
1
b & t yields: 16
1
b & t yields: 8
1
b & t yields: 4
1
b & t yields: 2
0
b & t yields: 0
Much obliged for the help :)
Dividing t by 2 is a bit-shift to the right:
1 0 0 0 0 0 0 0 128 = t
0 1 0 0 0 0 0 0 64 = t / 2
0 0 1 0 0 0 0 0 32 = t / 2 / 2
...
t always has one bit set to 1, all others are 0.
Then you compare that to b using &. Each result bit is 1 if and only if the corresponding bit in both inputs is 1 as well.
That means that we basically check if the bit in b is 1 at the location that the t-bit is 1. That is done for all bits from left to right.
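To make the left-to-right check concrete, here is a small sketch that collects the result of each test into a string instead of printing line by line. The class name `BitMask` and the helper `bits` are invented for this example; `t >>= 1` is used in place of `t / 2`, which is equivalent for these positive values.

```java
class BitMask {
    // Builds the 8-bit pattern of b by testing one bit per iteration, highest bit first.
    static String bits(byte b) {
        StringBuilder sb = new StringBuilder();
        for (int t = 128; t > 0; t >>= 1) {
            // t has exactly one bit set, so (b & t) is nonzero only if b has that bit set
            sb.append((b & t) != 0 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bits((byte) -34)); // prints 11011110
        System.out.println(bits((byte) 34));  // prints 00100010
    }
}
```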
Since there were two questions:
(1) OP is perplexed as to why his hand calculation does not match the program's output and
(2) OP would like to know why one of the variables is divided by 2.
I merely combine the answers:
(1) Negative numbers are represented as two's complement. Therefore negative 34 is
0 0 1 0 0 0 1 0 <-- +34
1 1 0 1 1 1 0 1 <-- one's complement of 34
1 1 0 1 1 1 1 0 <-- two's complement of 34
Note: two's complement is one's complement + 1.
(2) Dividing by 2 shifts the single 1 bit of t one position to the right (t is a power of two, so the division is exact). That is why the third iteration outputs zero: by then t is 32 (0 0 1 0 0 0 0 0), and its only 1 bit is ANDed with the third bit of -34, which is 0.
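The two's complement steps listed above can be checked with a few lines of Java. This is just a sketch with invented helper names; the `& 0xFF` masks are there because Java widens a `byte` to a 32-bit `int` before applying `~` or `&`, so only the low 8 bits are kept for display.

```java
class TwosComplement {
    // Flip all bits of x and keep only the low 8 bits.
    static int onesComplement8(int x) {
        return ~x & 0xFF;
    }

    // One's complement plus 1, again limited to 8 bits.
    static int twosComplement8(int x) {
        return (onesComplement8(x) + 1) & 0xFF;
    }

    // Formats the low 8 bits of v as a zero-padded binary string.
    static String toBits(int v) {
        return String.format("%8s", Integer.toBinaryString(v & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toBits(34));                  // prints 00100010
        System.out.println(toBits(onesComplement8(34))); // prints 11011101
        System.out.println(toBits(twosComplement8(34))); // prints 11011110
        System.out.println(toBits((byte) -34));          // prints 11011110
    }
}
```

The last two lines print the same pattern, which is why the top bit of -34 matches the mask 128 in the question's loop.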

Evaluating the result of stanford nlp for sentiment analysis

I want to test a few sentences using the Stanford NLP package and get the sentiment result along with its score.
I tried a couple of ways. In a few tests I got a partial result, such as the polarity of the text I gave, but not the sentiment score.
This is the command I executed: H:\Drive E\Stanford\stanfor-corenlp-full-2013~>java -cp "*" -mx1g edu.stanford.nlp.sentiment.Evaluate edu/stanford/nlp/models/sentiment/sentiment.ser.gz test.txt
Gives result:
EVALUATION SUMMARY
Tested 0 labels
0 correct
0 incorrect
? accuracy
Tested 0 roots
0 correct
0 incorrect
? accuracy
Label confusion matrix: rows are gold label, columns predicted label
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
Root label confusion matrix: rows are gold label, columns predicted label
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
Approximate negative label accuracy: ?
Approximate positive label accuracy: ?
Combined approximate label accuracy: ?
Approximate negative root label accuracy: ?
Approximate positive root label accuracy: ?
Combined approximate root label accuracy: ?
text.txt contains
This movie doesn't care about cleverness, wit or any other kind of intelligent humor.
Those who find ugly meanings in beautiful things are corrupt without being charming.
There are slow and repetitive parts, but it has just enough spice to keep it interesting.
If you want to get sentiment scores, use this command:
java -cp stanford-corenlp-3.3.1.jar:stanford-corenlp-3.3.1-models.jar:xom.jar:joda-time.jar:jollyday.jar:ejml-0.23.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse,pos,sentiment -file input.txt
This will generate an XML file named input.txt.xml, with a sentiment score like this:
<sentence id="1" sentimentValue="1" sentiment="Negative">
You can't evaluate a plain text file. You have to evaluate a treebank.

Iterative Reduction to Null Matrix

Here's the problem: I'm given a matrix like
Input:
1 1 1
1 1 1
1 1 1
At each step, I need to find a "second" matrix of 1's and 0's with no two 1's on the same row or column. Then, I'll subtract the second matrix from the original matrix. I will repeat the process until I get a matrix with all 0's. Furthermore, I need to take the least possible number of steps.
I need to print all the "second" matrices in O(n) time. In the above example I can get to the null matrix in 3 steps by subtracting these three matrices in order:
Expected output:
1 0 0
0 1 0
0 0 1
0 0 1
1 0 0
0 1 0
0 1 0
0 0 1
1 0 0
I have coded an attempt, in which I am finding the first maximum value and creating the second matrices based on the index of that value. But for the above input I am getting 4 output matrices, which is wrong:
My output:
1 0 0
0 1 0
0 0 1
0 1 0
1 0 0
0 0 0
0 0 1
0 0 0
1 0 0
0 0 0
0 0 1
0 1 0
My solution works for most of the test cases but fails for the one given above. Can someone give me some pointers on how to proceed, or find an algorithm that guarantees optimality?
Test case that works:
Input:
0 2 1
0 0 0
3 0 0
Output
0 1 0
0 0 0
1 0 0
0 1 0
0 0 0
1 0 0
0 0 1
0 0 0
1 0 0
Sum each row and each column and take the largest of those sums: that gives you the optimal number of matrix subtractions required to reduce the matrix to a null matrix.
For example:
1 2 4 0 = 7
2 2 0 1 = 5
0 0 1 0 = 1
3 0 2 1 = 6
= = = =
6 4 7 2
This means the matrix will take 7 subtractions to empty when done optimally.
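The row and column sums from the example can be computed in a single pass over the matrix. The sketch below uses invented names (`MinSteps`, `minSteps`) and assumes a rectangular matrix of non-negative integers; it only computes the step count described above, not the subtraction matrices themselves.

```java
class MinSteps {
    // Returns the largest row sum or column sum of m, whichever is bigger.
    static int minSteps(int[][] m) {
        int best = 0;
        int[] colSums = new int[m.length == 0 ? 0 : m[0].length];
        for (int[] row : m) {
            int rowSum = 0;
            for (int c = 0; c < row.length; c++) {
                rowSum += row[c];
                colSums[c] += row[c];
            }
            best = Math.max(best, rowSum);
        }
        for (int s : colSums) best = Math.max(best, s);
        return best;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2, 4, 0}, {2, 2, 0, 1}, {0, 0, 1, 0}, {3, 0, 2, 1}};
        System.out.println(minSteps(m)); // prints 7
    }
}
```

For the all-ones 3x3 matrix from the question this returns 3, and for the question's working test case it also returns 3, in line with the expected outputs shown there.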
I believe that counting backwards from this and removing from columns / row with that value will solve your problem (I am not sure of an efficient way of selecting these - brute force?).
You can also use your previous method to remove extra elements.
For example (using the above matrix).
Step 7:
We must subtract from row 1 & column 3.
0 0 1 0
0 0 0 0
0 0 0 0
0 0 0 0
This takes care of the row and column with the largest sum, so now we can use your previous method to remove "bonus" elements.
0 0 1 0
1 0 0 0
0 0 0 0
0 0 0 1
Now apply the sum of each row / column again and continue for the next step.
Step 6:
1 2 3 0 = 6
1 2 0 1 = 4
0 0 1 0 = 1
3 0 2 0 = 5
= = = =
5 4 6 1
Next subtraction:
0 0 1 0
0 1 0 0
0 0 0 0
1 0 0 0
And so on.
Note: This still does not work very well with "all 1" matrices, as you get stuck on the problem of selecting 1 from every row and column (same as you did in your example).
But someone may be able to extend my solution.
Let number of rows = number of columns = N (indices run from 0 to N-1)
for iteration = 0 to N-1
    for row = 0 to N-1
        cell(row, (row + iteration) % N) := 0
The number of iterations is N. In every iteration, N ones are changed to 0.
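For the all-ones case, the cyclic pattern above can be written out as follows. This is a sketch with invented names; it produces the N "second" matrices for an N x N all-ones input and does not handle general matrices.

```java
class CyclicShift {
    // Returns N matrices; matrix k has a single 1 in each row, at column (row + k) % N.
    static int[][][] steps(int n) {
        int[][][] out = new int[n][n][n];
        for (int k = 0; k < n; k++) {
            for (int row = 0; row < n; row++) {
                out[k][row][(row + k) % n] = 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[][] step : steps(3)) {
            for (int[] row : step) {
                System.out.println(java.util.Arrays.toString(row));
            }
            System.out.println();
        }
    }
}
```

Because the column offset differs for every step, each cell is covered exactly once across the N matrices, and no matrix has two 1s in the same row or column.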
I'm not entirely sure if this is what you are after, but could you create a list of available columns and mark them as used in each iteration?
For example:
repeat until the matrix is empty
    mark all columns as available
    for each row
        find the maximum value in all available columns and store its coordinates
        mark that column as unavailable
    print, decrement and clear the list of stored coordinates
This doesn't work, but it does show the algorithm that user1459032 is using.
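A rough Java rendering of that greedy loop is shown below, mainly to reproduce the failure. The names are invented, the matrix is assumed to be square, and ties are broken by taking the leftmost maximum, which is an assumption about how the original attempt behaves.

```java
class GreedyReduction {
    // Runs the greedy loop on a copy of the input and returns how many
    // "second" matrices it subtracts before the matrix is all zeros.
    static int greedySteps(int[][] input) {
        int n = input.length;
        int[][] m = new int[n][];
        for (int i = 0; i < n; i++) m[i] = input[i].clone();
        int steps = 0;
        while (true) {
            boolean[] used = new boolean[n]; // all columns available again
            boolean any = false;
            for (int r = 0; r < n; r++) {
                int bestCol = -1;
                for (int c = 0; c < n; c++) {
                    // leftmost largest positive value among the available columns
                    if (!used[c] && m[r][c] > 0 && (bestCol < 0 || m[r][c] > m[r][bestCol])) {
                        bestCol = c;
                    }
                }
                if (bestCol >= 0) {
                    used[bestCol] = true;
                    m[r][bestCol]--;         // subtract this step's 1 from the chosen cell
                    any = true;
                }
            }
            if (!any) return steps;          // nothing left to subtract
            steps++;
        }
    }

    public static void main(String[] args) {
        int[][] ones = {{1, 1, 1}, {1, 1, 1}, {1, 1, 1}};
        System.out.println(greedySteps(ones)); // prints 4
    }
}
```

On the all-ones 3x3 input this takes 4 steps rather than the expected 3, while on the question's working test case it takes 3.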
1) If all you want to do is iterate through all the elements in your matrix...
2) then all you have to do is loop for (int i=0; i < rows*cols; i++) {} ...
3) And such a loop is ALREADY O(n) (i.e. it increases LINEARLY with the number of elements in your matrix)
I'm pretty sure that this is some kind of variant of the exact cover problem, which is known to be NP-complete. Your proposed algorithm is a simple greedy solution. The problem with greedy solutions is that they often work well enough to convince you that greed is good and then suddenly leave you high and dry looking for a better solution. (Consider the global economy, for example.) Anyway, Knuth's Dancing Links technique is a standard way of solving the problem (exact set cover, not global economy).

logics for crossword

I have a task to create a crossword, a specific one. All the answers are given, but their places are unknown. Program must read a file with board scheme like this :
0 1 0 0 0 0 0 0 1 0 0
0 1 0 1 1 1 1 1 1 1 1
0 1 0 1 0 0 1 0 1 0 1
0 S 1 1 0 1 1 1 1 0 1
0 1 0 0 1 0 1 0 1 0 0
1 1 1 1 1 1 1 S 1 1 0
0 0 0 0 1 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0
treating each column/row of ones as one possible answer. Is there any way to parse this file and mark the answers without using a gazillion ifs for each field?
The rest of the logic is as follows:
- the crossword is created on the basis of the parsed file
- the user selects answers from lists of possibilities
- the user clicks on the first block of an Answer, and if the length and letters of the selected answer and the Answer match, the fields are updated
The game board should be stored in a 2D array, I guess, and each Answer should hold the indexes of its fields?
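One way to avoid per-field ifs is to scan each row and each column for runs of open cells. The sketch below is only an illustration under stated assumptions: every cell that is not '0' (including 'S', whose meaning the question does not explain) counts as open, a run needs at least two cells to be an answer slot, and the names `CrosswordSlots`, `Slot`, `parse` and `findSlots` are invented.

```java
import java.util.ArrayList;
import java.util.List;

class CrosswordSlots {
    // A slot is a run of open cells: its first cell, its length, and its direction.
    static final class Slot {
        final int row, col, length;
        final boolean across;
        Slot(int row, int col, int length, boolean across) {
            this.row = row; this.col = col; this.length = length; this.across = across;
        }
    }

    // Turns lines such as "0 1 0" into a char grid by dropping the spaces.
    static char[][] parse(String... lines) {
        char[][] grid = new char[lines.length][];
        for (int i = 0; i < lines.length; i++) grid[i] = lines[i].replace(" ", "").toCharArray();
        return grid;
    }

    // Finds horizontal runs first, then vertical runs; assumes a rectangular grid.
    static List<Slot> findSlots(char[][] grid) {
        List<Slot> slots = new ArrayList<>();
        int rows = grid.length, cols = grid[0].length;
        for (int r = 0; r < rows; r++) {
            int c = 0;
            while (c < cols) {
                if (grid[r][c] == '0') { c++; continue; }
                int start = c;
                while (c < cols && grid[r][c] != '0') c++;
                if (c - start >= 2) slots.add(new Slot(r, start, c - start, true));
            }
        }
        for (int c = 0; c < cols; c++) {
            int r = 0;
            while (r < rows) {
                if (grid[r][c] == '0') { r++; continue; }
                int start = r;
                while (r < rows && grid[r][c] != '0') r++;
                if (r - start >= 2) slots.add(new Slot(start, c, r - start, false));
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        char[][] grid = parse("0 1 0", "1 S 1", "0 1 0");
        for (Slot s : findSlots(grid)) {
            System.out.println((s.across ? "across" : "down") + " at (" + s.row + "," + s.col + ") length " + s.length);
        }
    }
}
```

Each `Slot` stores where an answer starts and how long it is, so the cells it covers can be derived from the 2D array when the user clicks its first block.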
Crossword puzzle construction is NP-complete in general (i.e., for an n x n board of 1s and 0s and a given set from which to pick answers). See http://en.wikipedia.org/wiki/List_of_NP-complete_problems, which mentions this. Garey and Johnson's classic book also mentions it, saying that Exact Cover by 3-Sets can be reduced to it.
So, you probably will have to use some backtracking/heuristic to fill the grid.
Perhaps this project report of two students from Dartmouth college will be of some help: Crossword Puzzle Generator. It contains some heuristics which you might be able to use.
Of course, you seem to imply there is a human involved, but it is not clear if you can leverage that person to fill the grid and whether your problem is basically some UI programming problem in helping the user out.
