Shannon entropy not printing anything


Java, IntelliJ IDE
Coursera, Computer Science: Programming with a Purpose
Princeton University
My program is not printing any output. I checked with print statements, and n and f[] don't seem to hold any value outside the while loop; when I put the same print statement inside the while loop, it prints the values. It seems as if n and f[] become obsolete outside the while loop.
The exercise is Shannon entropy: write a program ShannonEntropy.java that takes a command-line integer m; reads a sequence of integers between 1 and m from standard input; and prints the Shannon entropy to standard output, with 4 digits after the decimal point. The Shannon entropy of a sequence of integers is given by the formula:
$$H = -(p_1 \log_2 p_1 + p_2 \log_2 p_2 + \dots + p_m \log_2 p_m)$$
where $p_i$ denotes the proportion of integers whose value is $i$. If $p_i = 0$, then treat $p_i \log_2 p_i$ as 0.
If the question is unclear, please take a look.
It would be great if you could help me out. Thanks in advance.
public class ShannonEntropy {
    public static void main(String[] args) {
        int m = Integer.parseInt(args[0]);
        int[] f = new int[m + 1];
        int n = 0;
        // calculating the frequency by incrementing the array and incrementing n alongside
        while (!StdIn.isEmpty()) {
            int value = StdIn.readInt();
            f[value]++;
            n++;
        }
        double entropy = 0;
        for (int i = 1; i <= m; i++) {
            double p = (double) f[i] / n;
            System.out.println(p);
            if (f[i] > 0)
                entropy -= p * (Math.log(p) / Math.log(2));
        }
        // printing the output
        StdOut.println((double) Math.round(entropy * 10000) / 10000);
    }
}

Hi, I have tested your code. There is indeed a problem with the way you print the output.
Instead of:
StdOut.println((double) Math.round(entropy * 10000) / 10000);
You should use formatted printing (printf()) to get 4 decimal places instead of Math.round(): with Math.round(), an entropy of 1.0 is printed as 1.0 instead of the desired 1.0000.
Use this:
StdOut.printf("%.4f\n", entropy);
For more about printf(), see this printf() guide.
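As a hedged sketch (not the course's reference solution), here is the same computation in plain Java. The helper class EntropySketch is hypothetical: it takes the values as an array instead of reading them with the course's StdIn, so it runs without the course libraries, and it prints the result both ways to show the formatting difference:

```java
import java.util.Locale;

class EntropySketch {
    // Shannon entropy of values in 1..m: H = -(p1 log2 p1 + ... + pm log2 pm),
    // where a value that never occurs contributes 0.
    static double shannonEntropy(int[] values, int m) {
        int[] freq = new int[m + 1];
        for (int v : values) {
            freq[v]++;
        }
        int n = values.length;
        double h = 0.0;
        for (int i = 1; i <= m; i++) {
            if (freq[i] == 0) {
                continue; // treat p log2 p as 0 when p = 0
            }
            double p = (double) freq[i] / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Always four digits after the decimal point, e.g. 1.0 -> "1.0000".
    // Locale.ROOT keeps '.' as the decimal separator regardless of the system locale.
    static String format4(double h) {
        return String.format(Locale.ROOT, "%.4f", h);
    }

    public static void main(String[] args) {
        double h = shannonEntropy(new int[] {1, 2, 1, 2}, 2);
        System.out.println((double) Math.round(h * 10000) / 10000); // rounding approach: prints 1.0
        System.out.println(format4(h));                              // printf-style approach: prints 1.0000
    }
}
```

Rounding changes the value but not how many digits Java prints, whereas a format string fixes the number of digits.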

Related

Java random with low percentage on boolean array (quantile function)

I have a boolean array of approximately 10,000 elements. I would like to flip the value of elements with a rather low, fixed probability (about 0.01 to 0.1), while knowing the indexes of the changed elements. The code that comes to mind is something like:
int count = 10000;
Random r = new Random();
for (int i = 0; i < count; i++) {
    double x = r.nextDouble();
    if (x < rate) {           // rate: the flip probability, e.g. 0.01 to 0.1
        field[i] = !field[i]; // field: the boolean array
        // do something with the index i...
    }
}
However, since I have to do this inside a larger loop (inevitably), it is slow. The only other possibility I can come up with is using a quantile function (Gaussian math), but I have yet to find any free code or library for it. Do you have any good idea how to work around this problem, or any library (the standard library would be best) that could be used?

Basically, you have set up a binomial model, with n == count and p == rate. The number of values you should flip, x, can be approximated by a normal model with center n*p == count*rate and standard deviation sigma == Math.sqrt(n*p*(1-p)) == Math.sqrt(count * rate * (1 - rate)).
You can easily calculate
int x = (int) Math.round(Math.sqrt(count * rate * (1 - rate))
        * r.nextGaussian() + count * rate);
x = Math.max(0, Math.min(count, x)); // the normal approximation can fall outside [0, count]
Then you can generate x distinct random indices in the range [0, count) using the following code.
Set<Integer> indices = new HashSet<Integer>();
while (indices.size() < x) {
    indices.add(r.nextInt(count));
}
indices will now contain x distinct random indices, which you can use as you wish.
You'll only have to call nextInt a little more than x times, which should be much less than the count calls to nextDouble you made before.
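A self-contained sketch of this approach, with hypothetical method names pickIndices and flipSparse. It uses the binomial standard deviation Math.sqrt(count * rate * (1 - rate)) and clamps the drawn count to [0, count], since the normal approximation can occasionally fall outside that range:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class SparseFlip {
    // Draws how many elements to flip from a normal approximation of
    // Binomial(count, rate), then picks that many distinct indices uniformly.
    static Set<Integer> pickIndices(int count, double rate, Random r) {
        double mean = count * rate;
        double sigma = Math.sqrt(count * rate * (1 - rate));
        long drawn = Math.round(mean + sigma * r.nextGaussian());
        // The normal approximation can land outside [0, count]; clamp it.
        int k = (int) Math.max(0, Math.min(count, drawn));
        Set<Integer> indices = new HashSet<>();
        while (indices.size() < k) {
            indices.add(r.nextInt(count)); // duplicates are simply ignored by the set
        }
        return indices;
    }

    // Flips field[i] for each chosen index and returns the changed indices.
    static Set<Integer> flipSparse(boolean[] field, double rate, Random r) {
        Set<Integer> changed = pickIndices(field.length, rate, r);
        for (int i : changed) {
            field[i] = !field[i];
        }
        return changed;
    }

    public static void main(String[] args) {
        boolean[] field = new boolean[10000];
        Set<Integer> changed = flipSparse(field, 0.05, new Random());
        System.out.println("flipped " + changed.size() + " elements");
    }
}
```

The retry loop in pickIndices is cheap only while x is small relative to count, which matches the low rates (0.01 to 0.1) in the question.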

Find missing term in arithmetic progression

So I'm working on this programming challenge online where I'm supposed to write a program that finds the missing term in an arithmetic progression. I solved the problem in two ways. The first sums all the given terms of the progression and subtracts that from the sum of the actual progression. The second finds the difference of the progression and then uses that difference in a for loop to find the missing term. While my first solution passes all test cases, my second solution fails two of the seven test cases. The challenge doesn't let anyone see its test cases, so I have no idea what is wrong. Can anyone think of cases where my second solution fails to find the missing term of an arithmetic progression? The code for my second solution is below.
import java.io.*;
import java.util.Vector;

public class Solution {
    public static void main(String[] args) throws Exception {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        int num = Integer.parseInt(br.readLine());
        String[] numbers = br.readLine().split(" ");
        Vector<Integer> ap = new Vector<Integer>();
        for (String str: numbers){
            ap.add(Integer.parseInt(str));
        }
        int first = ap.get(0);
        int last = ap.get(ap.size()-1);
        int incr = (last-first)/num;
        for(int i = first; i<=last; i+= incr){
            if(!ap.contains(i)){
                System.out.println(i);
                break;
            }
        }
    }
}
Input format: The first line contains an integer N, which is the number of terms provided as input. This is followed by N consecutive integers separated by single spaces, all on one line. They are in AP (except at the point where an integer is missing).
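For comparison, here is a minimal Java sketch of the sum-based first solution. The question does not show that code, so this is an assumption about what it does; the class name MissingBySum is hypothetical. It uses long arithmetic for the sums to avoid int overflow:

```java
class MissingBySum {
    // terms: the N given terms of an arithmetic progression with one interior term removed.
    // The complete progression has N + 1 terms from terms[0] to terms[N - 1], so its sum
    // is (N + 1) * (first + last) / 2; the shortfall of the actual sum is the missing term.
    static long missingTerm(int[] terms) {
        long first = terms[0];
        long last = terms[terms.length - 1];
        long expected = (terms.length + 1) * (first + last) / 2;
        long actual = 0;
        for (int t : terms) {
            actual += t;
        }
        return expected - actual;
    }

    public static void main(String[] args) {
        System.out.println(missingTerm(new int[] {1, 3, 7, 9}));     // prints 5
        System.out.println(missingTerm(new int[] {10, 8, 4, 2, 0})); // prints 6
    }
}
```

Because it never walks the progression, this version does not care whether the sequence is increasing or decreasing.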

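public class MissingAp
{
    public static void main(String args[]) {
        int arr[] = {10, 8, 4, 2, 0, -2, -4, -6, -8, -10, -12};
        // difference[i-1] holds the step from arr[i-1] to arr[i]
        int difference[] = new int[arr.length - 1];
        int missingTerm;
        for (int i = 1; i < arr.length; i++) {
            difference[i - 1] = arr[i] - arr[i - 1];
        }
        // compare neighbouring steps; stop at difference.length - 1 so difference[j + 1] stays in bounds
        for (int j = 0; j < difference.length - 1; j++) {
            if (difference[j] != difference[j + 1]) {
                missingTerm = arr[j] + difference[j + 1];
                System.out.println("The missing term is: " + missingTerm);
                break;
            }
        }
    }
}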
This program finds the missing term of an AP.

Wouldn't this fail if the sequence is decreasing instead of increasing?
If I had the numbers 10, 8, 4, 2, 0, the missing value would be 6.
You compute the increment of -10/5 = -2 correctly.
But then the loop starts from i = 10 and decreases by 2... as long as i <= 0. Well, i is immediately > 0, so it exits the loop before decreasing at all. The <= comparison only works if the sequence is increasing.
So I think the i<=last condition is the problem.
So you'd need some way to adjust the i<=last condition based on whether the increment is positive or negative. I'm thinking it would involve the absolute value and/or Math.signum, or a separate code section for a negative increment (not the fastest way, but reasonable). But I've never done much in Java, and you asked how it fails. So hopefully there's your answer :-)
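A minimal sketch of the fix this answer suggests, assuming the same input shape as the question: the loop condition switches on the sign of the increment, so decreasing progressions are walked downwards. The class name MissingByStep is hypothetical, and swapping the Vector lookup for a HashSet is a design choice, not part of the fix:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MissingByStep {
    // Assumes the N given terms form an AP with exactly one interior term removed,
    // so the complete progression has N + 1 terms spanning N steps.
    static int findMissing(List<Integer> terms) {
        int n = terms.size();
        int first = terms.get(0);
        int last = terms.get(n - 1);
        int incr = (last - first) / n;
        if (incr == 0) {
            throw new IllegalArgumentException("constant progression: nothing is missing");
        }
        Set<Integer> present = new HashSet<>(terms);
        // Walk towards last in whichever direction incr points.
        for (int i = first; incr > 0 ? i <= last : i >= last; i += incr) {
            if (!present.contains(i)) {
                return i;
            }
        }
        throw new IllegalArgumentException("no missing term found");
    }

    public static void main(String[] args) {
        System.out.println(findMissing(Arrays.asList(10, 8, 4, 2, 0))); // prints 6
        System.out.println(findMissing(Arrays.asList(1, 3, 7, 9)));     // prints 5
    }
}
```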

Sort the array first so that this works in every case (the loop then always runs upwards):
Arrays.sort(input_array);

A JavaScript-based solution for the same problem.
There are 2 cases:
CASE 1:
The array passed in has exactly one missing term, and neither the first nor the last term is the missing one.
In this case, we just need the array and the basic school formula
sum of n terms = n/2 * (first + last)
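function getMissingTerm(terms, n) {
    // n is not needed in this case; the full progression has terms.length + 1 terms
    var first = parseInt(terms[0], 10),
        last = parseInt(terms[terms.length - 1], 10),
        expectedSum = ((terms.length + 1) / 2) * (first + last),
        actualSum = 0;
    for (var i = 0; i < terms.length; ++i) {
        actualSum += parseInt(terms[i], 10);
    }
    return expectedSum - actualSum;
}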
CASE 2:
The array passed in may have no missing term inside it, meaning the missing term is either the first or the last term.
In this case one must pass n, the length of the complete progression, which is greater than the array length.
If n is not passed, the function looks for a gap inside the array instead.
function getMissingTerm(terms, n) {
    // assumes terms is an array of numbers with at least 2 entries
    // the common difference is the smaller-magnitude of the first two steps, since one of
    // them may be the doubled step across the gap (keeping the sign handles decreasing APs)
    var d1 = terms[1] - terms[0],
        d2 = terms.length > 2 ? terms[2] - terms[1] : d1,
        step = Math.abs(d1) <= Math.abs(d2) ? d1 : d2;
    if (n && terms.length < n) {
        // no gap inside the array: return the 2 possible terms, at the start and end of the array
        return [terms[0] - step, terms[terms.length - 1] + step];
    }
    for (var i = 1; i < terms.length; ++i) {
        if (terms[i] - terms[i - 1] !== step) {
            return terms[i - 1] + step; // the gap is between terms[i - 1] and terms[i]
        }
    }
    return null; // no missing term found
}

Use of integers and doubles gives different answers when it shouldn't

I'm solving Project Euler Problem 14 in Java. I am NOT asking for help solving the problem. I have already solved it, but I ran into something I can't figure out.
The problem is like this:
The following iterative sequence is defined for the set of positive integers:
n = n/2, if n is even
n = 3n + 1, if n is odd
Using the rule above and starting with 13, we generate the following sequence:
13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1. Here, the length of the chain is 10 numbers.
Find the starting number below 1,000,000 that produces the longest chain.
So I wrote this code:
public class Euler014 {
    public static void main(String[] args){
        int maxChainCount = 0;
        int answer = 0;
        int n;
        int chainCount = 1;
        for(int i = 0; i < 1000000; i++){
            n = i;
            while(n > 1){
                if(n%2 == 0){ //check if even
                    n /= 2;
                }else{ //else: odd
                    n = 3*n + 1;
                }
                chainCount++;
            }
            if(chainCount > maxChainCount){ //check if it's the longest chain so far
                maxChainCount = chainCount;
                answer = i;
            }
            chainCount = 1;
        }
        System.out.println("\n\nLongest chain: i = " + answer);
    }
}
This gives me the answer 910107, which is wrong.
However, if I change the type of my n variable to double, it runs and gives me the answer 837799, which is right!
This really confuses me, as I can't see what the difference would be at all. I understand that if we use int and do divisions, we can end up rounding numbers when we don't intend to. But in this case, we always check whether n is divisible by 2 BEFORE dividing by 2. So I thought it would be totally safe to use integers. What am I not seeing?
This is the code in its entirety; copy, paste and run it if you'd like to see for yourself. It runs in a couple of seconds despite the many iterations. =)

Your problem is overflow. If you change int n to long n, you'll get the right answer.
Remember: The numbers in the sequence can be really big. So big they overflow int's range. But not (in this case) double's, or long's.
At one point in the chain, n is 827,370,449 and you follow the 3n + 1 branch. That value wants to be 2,482,111,348, but it overflows the capacity of int (which is 2,147,483,647 in the positive realm) and takes you to -1,812,855,948. And things go south from there. :-)
So your theory that you'd be fine with integer (I should say integral) numbers is correct. But they have to have the capacity for the task.
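A small sketch (hypothetical class CollatzSketch) that reproduces the overflow described above and computes the chains with long. Math.multiplyExact and Math.addExact (Java 8+) throw ArithmeticException instead of silently wrapping, which is one way to catch this kind of bug early:

```java
class CollatzSketch {
    // Chain length counting the start and the final 1 (13 -> ... -> 1 has length 10).
    // long has room for every intermediate value reached from starts below 1,000,000.
    static int chainLength(long start) {
        long n = start;
        int length = 1;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    static int longestStartBelow(int limit) {
        int best = 1;
        int bestLength = 1;
        for (int i = 1; i < limit; i++) {
            int len = chainLength(i);
            if (len > bestLength) {
                bestLength = len;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 827_370_449;
        System.out.println(3 * n + 1);  // int arithmetic wraps: prints -1812855948
        System.out.println(3L * n + 1); // long arithmetic: prints 2482111348
        try {
            Math.addExact(Math.multiplyExact(3, n), 1); // throws instead of wrapping
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
        System.out.println(longestStartBelow(1_000_000)); // prints 837799
    }
}
```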

Improving performance of addition with very big numbers

I wrote this program to add very big numbers without using BigInteger. It's finished and working properly. I used StringBuilder and lots of parseInt calls to get it done. Is there a more efficient way to do this?
By the way, this is just a worksheet, so please ignore the bad programming style; I will reorganize it once I'm done.
private String add (String x, String y)
{
    String g = "";
    StringBuilder str = new StringBuilder();
    int sum;
    double atHand = 0;
    int dif = (int)(Math.abs(x.length()-y.length()));
    if(x.length() >= y.length()) //pad with zeros to equalize the number of digits
    {
        for(int i = 0; i<dif; i++)
            g += "0";
        y = g+y;
    }
    else
    {
        for(int i = 0; i<dif; i++)
            g += "0";
        x = g + x;
    }
    for (int i = y.length()-1; i >=0 ; i--)
    {
        sum = Integer.parseInt(x.substring(i, i+1)) +Integer.parseInt(y.substring(i,i+1)) + (int)atHand;
        if(sum<10)
        {
            str.insert(0, Integer.toString(sum));
            atHand = 0;
        }else
        {
            if(i==0)
                str.insert(0, Integer.toString(sum));
            else
            {
                atHand = sum *0.1;
                sum = sum %10;
                str.insert(0, Integer.toString(sum));
            }
        }
    }
    return str.toString();
}

Instead of doing it character by character, you should take k characters at a time, such that each chunk fits into a Java int or long. Use some predetermined threshold that can hold both the "block" and, depending on the implementation, any overflow (i.e. such that (threshold * 2) < positive_type_limit). To make things easier, use a threshold that is a power of ten, so you can map it directly to characters in the string representation of a base-10 number (e.g. if your overflow threshold is one million, you can take 6 characters at a time). This also has the added benefit that you can efficiently convert it back to a string.
Then your "blocks" are much bigger. You would then add and carry overflow using these blocks and your limit/threshold (which is based on the integer primitive type you use). So basically you are operating on an array of ints.
You will still have time complexity of O(n), but it will be a smaller n (more specifically, it will be O(n/k) where k is the number of decimal digits one block represents.)
I believe that all solutions involve splitting the big number into smaller blocks and operating on them. You have already done this, just your current solution is the special case of blocksize=k=1.
To get the most out of each block, you might use a power of 2 as the limit, e.g. for a 32-bit unsigned integer type you would set your threshold to 2^31 (or you could set it to 2^32, but that depends on where and how you store the overflow that is passed on to the next element).
I would not be surprised if BigInteger uses a similar technique.
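A sketch of the block approach with a hypothetical BlockAdd class, assuming non-negative decimal strings. It uses blocks of 9 digits (base 10^9), the largest power of ten for which the sum of two blocks plus a carry still fits in an int:

```java
import java.util.Locale;

class BlockAdd {
    // Each block holds BLOCK_DIGITS decimal digits, so BASE = 10^BLOCK_DIGITS.
    // Two blocks plus a carry is at most 1,999,999,999, which still fits in an int.
    static final int BLOCK_DIGITS = 9;
    static final int BASE = 1_000_000_000;

    // Splits a non-negative decimal string into blocks, least significant block first.
    static int[] toBlocks(String s) {
        int count = (s.length() + BLOCK_DIGITS - 1) / BLOCK_DIGITS;
        int[] blocks = new int[count];
        int end = s.length();
        for (int b = 0; b < count; b++) {
            int start = Math.max(0, end - BLOCK_DIGITS);
            blocks[b] = Integer.parseInt(s.substring(start, end));
            end = start;
        }
        return blocks;
    }

    // Adds two non-negative decimal strings one 9-digit block at a time.
    static String add(String x, String y) {
        int[] a = toBlocks(x);
        int[] b = toBlocks(y);
        int[] sum = new int[Math.max(a.length, b.length) + 1];
        int carry = 0;
        for (int i = 0; i < sum.length; i++) {
            int s = carry + (i < a.length ? a[i] : 0) + (i < b.length ? b[i] : 0);
            sum[i] = s % BASE;
            carry = s / BASE;
        }
        // Drop leading zero blocks but keep at least one.
        int top = sum.length - 1;
        while (top > 0 && sum[top] == 0) {
            top--;
        }
        StringBuilder out = new StringBuilder(Integer.toString(sum[top]));
        for (int i = top - 1; i >= 0; i--) {
            // Lower blocks are zero-padded to exactly 9 digits (matches BLOCK_DIGITS).
            out.append(String.format(Locale.ROOT, "%09d", sum[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(add("999999999999", "1")); // prints 1000000000000
    }
}
```

Each loop iteration now handles 9 digits with one int addition, a modulo and a division, instead of two parseInt calls per digit.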

Java: random integer with non-uniform distribution

How can I create a random integer n in Java, between 1 and k with a "linear descending distribution", i.e. 1 is most likely, 2 is less likely, 3 less likely, ..., k least likely, and the probabilities descend linearly, like this:
I know that there are dozens of threads on this topic already, and I apologize for making a new one, but I can't seem to create what I need from them. I know that, using import java.util.*;, the code
Random r=new Random();
int n=r.nextInt(k)+1;
creates a random integer between 1 and k, distributed uniformly.
GENERALIZATION: Any hints for creating an arbitrarily distributed integer, i.e. f(n) = some function, P(n) = f(n)/(f(1)+...+f(k)), would also be appreciated, for example:

This should give you what you need:
public static int getLinnearRandomNumber(int maxSize){
    //Get a linearly multiplied random number
    int randomMultiplier = maxSize * (maxSize + 1) / 2;
    Random r=new Random();
    int randomInt = r.nextInt(randomMultiplier);
    //Linearly iterate through the possible values to find the correct one
    int linearRandomNumber = 0;
    for(int i=maxSize; randomInt >= 0; i--){
        randomInt -= i;
        linearRandomNumber++;
    }
    return linearRandomNumber;
}
Also, here is a general solution for POSITIVE functions (negative functions don't really make sense) over the range from startIndex to stopIndex:
public static int getYourPositiveFunctionRandomNumber(int startIndex, int stopIndex) {
    // yourFunction(i) is a placeholder for your own weight function; it must return a value >= 0.
    //Generate a random number whose value ranges from 0.0 to the sum of the values of yourFunction for all the possible integer return values from startIndex to stopIndex.
    double randomMultiplier = 0;
    for (int i = startIndex; i <= stopIndex; i++) {
        randomMultiplier += yourFunction(i);//yourFunction(startIndex) + yourFunction(startIndex + 1) + .. yourFunction(stopIndex -1) + yourFunction(stopIndex)
    }
    Random r = new Random();
    double randomDouble = r.nextDouble() * randomMultiplier;
    //For each possible integer return value, subtract yourFunction value for that possible return value till you get below 0. Once you get below 0, return the current value.
    int yourFunctionRandomNumber = startIndex;
    randomDouble = randomDouble - yourFunction(yourFunctionRandomNumber);
    // the < stopIndex check keeps floating-point rounding from running past the range
    while (randomDouble >= 0 && yourFunctionRandomNumber < stopIndex) {
        yourFunctionRandomNumber++;
        randomDouble = randomDouble - yourFunction(yourFunctionRandomNumber);
    }
    return yourFunctionRandomNumber;
}
Note: For functions that may return negative values, one method could be to take the absolute value of that function and apply it to the above solution for each yourFunction call.
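A variant of the general solution above, sketched with the weight function passed in as a java.util.function.IntToDoubleFunction instead of the placeholder yourFunction (the class WeightedPick and its method names are hypothetical). Taking the uniform value u as a parameter makes the mapping deterministic and easy to check; the overload with a Random is what you would call in practice:

```java
import java.util.Random;
import java.util.function.IntToDoubleFunction;

class WeightedPick {
    // Maps u in [0, 1) to an integer in [startIndex, stopIndex] with probability
    // weight(i) / (weight(startIndex) + ... + weight(stopIndex)); weights must be >= 0.
    static int pick(IntToDoubleFunction weight, int startIndex, int stopIndex, double u) {
        double total = 0;
        for (int i = startIndex; i <= stopIndex; i++) {
            total += weight.applyAsDouble(i);
        }
        double remaining = u * total;
        int value = startIndex;
        remaining -= weight.applyAsDouble(value);
        // value < stopIndex stops rounding error from running past the range.
        while (remaining >= 0 && value < stopIndex) {
            value++;
            remaining -= weight.applyAsDouble(value);
        }
        return value;
    }

    static int pick(IntToDoubleFunction weight, int startIndex, int stopIndex, Random r) {
        return pick(weight, startIndex, stopIndex, r.nextDouble()); // nextDouble() is in [0, 1)
    }

    public static void main(String[] args) {
        int k = 4;
        IntToDoubleFunction linear = i -> k - i + 1; // weights 4, 3, 2, 1 for values 1..4
        System.out.println(pick(linear, 1, k, new Random()));
    }
}
```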

So we need the following distribution, from least likely to most likely:
*
**
***
****
*****
etc.
Let's try mapping a uniformly distributed integer random variable to that distribution:
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
etc.
This way, if we generate a uniformly distributed random integer from 1 to, say, 15 in this case for k = 5, we just need to figure out which bucket it fits in. The tricky part is how to do this.
Note that the numbers on the right are the triangular numbers! This means that for a randomly generated x from 1 to T_k, we just need to find the bucket b such that T_(b-1) < x <= T_b. Fortunately, there is a well-defined formula for the 'triangular root' of a given number, which we can use as the core of our mapping from the uniform distribution to a bucket:
// Assume k is given, via parameter or otherwise (5 here as an example)
int k = 5;
// r is a valid Random instance
Random r = new Random();
// First, generate a number from 1 to T_k
int triangularK = k * (k + 1) / 2;
int x = r.nextInt(triangularK) + 1;
// Next, figure out which bucket x fits into, bounded by
// triangular numbers by taking the triangular root
// We're dealing strictly with positive integers, so we can
// safely ignore the - part of the +/- in the triangular root equation
double triangularRoot = (Math.sqrt(8 * x + 1) - 1) / 2;
int bucket = (int) Math.ceil(triangularRoot);
// Buckets start at 1 as the least likely; we want k to be the least likely
int n = k - bucket + 1;
n should now have the specified distribution.
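A runnable sketch of the triangular-root mapping (the class TriangularBucket is hypothetical). Enumerating every x from 1 to T_k checks that value n receives exactly k - n + 1 of them; 8.0 * x is computed in double so it cannot overflow int:

```java
import java.util.Random;

class TriangularBucket {
    // Maps x in 1..T_k (T_k = k(k+1)/2) to n in 1..k, where value n receives k - n + 1 of the x's.
    static int bucketToValue(int x, int k) {
        // bucket = smallest b with T_b >= x, i.e. the ceiling of the triangular root of x
        double triangularRoot = (Math.sqrt(8.0 * x + 1) - 1) / 2;
        int bucket = (int) Math.ceil(triangularRoot);
        return k - bucket + 1; // bucket 1 is the smallest, so it maps to the least likely value k
    }

    static int sample(int k, Random r) {
        int triangularK = k * (k + 1) / 2;
        int x = r.nextInt(triangularK) + 1; // uniform in 1..T_k
        return bucketToValue(x, k);
    }

    public static void main(String[] args) {
        int k = 5;
        int[] counts = new int[k + 1];
        for (int x = 1; x <= k * (k + 1) / 2; x++) {
            counts[bucketToValue(x, k)]++;
        }
        for (int n = 1; n <= k; n++) {
            System.out.print(counts[n] + " "); // expected: 5 4 3 2 1
        }
        System.out.println();
    }
}
```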

Let me try another answer too, inspired by rlibby. This particular distribution is also the distribution of the smaller of two values chosen uniformly at random from the same range.
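A sketch of the "smaller of two uniform values" idea (the class MinOfTwo is hypothetical). For continuous values in [0, 1) the minimum has density 2(1 - x), exactly the triangular shape. For integers 1..k the exact probability is (2(k - n) + 1) / k^2, so the weights are 2k - 1, 2k - 3, ..., 1 rather than k, k - 1, ..., 1: still linear and descending, but not identical to the bucket methods above:

```java
import java.util.Random;

class MinOfTwo {
    // The smaller of two independent uniform draws from 1..k.
    static int sample(int k, Random r) {
        int a = r.nextInt(k) + 1;
        int b = r.nextInt(k) + 1;
        return Math.min(a, b);
    }

    // Exact counts of min(a, b) over all k * k equally likely pairs;
    // counts[n] equals 2(k - n) + 1.
    static int[] exactCounts(int k) {
        int[] counts = new int[k + 1];
        for (int a = 1; a <= k; a++) {
            for (int b = 1; b <= k; b++) {
                counts[Math.min(a, b)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = exactCounts(4);
        for (int n = 1; n <= 4; n++) {
            System.out.print(counts[n] + " "); // expected: 7 5 3 1
        }
        System.out.println();
    }
}
```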

There are lots of ways to do this, but probably the easiest is just to generate two random integers, one between 0 and k (call it x) and one between 0 and h (call it y). If y > mx + b (with m and b chosen appropriately...), return k - x; otherwise return x.
Edit: responding to comments up here so I can have a little more space.
Basically my solution exploits symmetry in your original distribution, where p(x) is a linear function of x. I responded before your edit about generalization, and this solution doesn't work in the general case (because there is no such symmetry in the general case).
I imagined the problem like this:
You have two right triangles, each k x h, with a common hypotenuse. The composite shape is a k x h rectangle.
Generate a random point that falls on each point within the rectangle with equal probability.
Half the time it will fall in one triangle, half the time in the other.
Suppose the point falls in the lower triangle.
The triangle basically describes the PMF, and the "height" of the triangle over each x-value describes the probability that the point will have such an x-value. (Remember that we're only dealing with points in the lower triangle.) So just yield the x-value.
Suppose the point falls in the upper triangle.
Invert the coordinates and handle it as above with the lower triangle.
You'll have to take care of the edge cases also (I didn't bother). E.g. I see now that your distribution starts at 1, not 0, so there's an off-by-one in there, but it's easily fixed.

There is no need to simulate this with arrays and such if your distribution is such that you can compute its cumulative distribution function (cdf). Above you have a probability density function (pdf). h is actually determined, since the area under the curve must be 1. For simplicity of math, let me also assume you're picking a number in [0,k).
The pdf here is f(x) = (2/k) * (1 - x/k), if I read you right. The cdf is just the integral of the pdf. Here, that's F(x) = (2/k) * (x - x^2 / (2k)). (You can repeat this logic for any pdf if it's integrable.)
Then you need to compute the inverse of the cdf, F^-1(x), and if I weren't lazy, I'd do it for you.
But the good news is this: once you have F^-1(x), all you do is draw a random value uniformly distributed in [0,1) and apply F^-1 to it. java.util.Random can provide that with some care. That's your randomly sampled value from your distribution.
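Working out the inverse for the pdf above (assuming the support [0, k)): solving F(x) = u gives x = k * (1 - sqrt(1 - u)). Here is a hedged sketch (the class InverseCdf is hypothetical) that floors the continuous sample and shifts it to 1..k. Flooring gives P(n) = F(n) - F(n - 1) = (2(k - n) + 1) / k^2, i.e. weights 2k - 1, 2k - 3, ..., 1 rather than weights proportional to k - n + 1:

```java
import java.util.Random;

class InverseCdf {
    // Inverse of F(x) = (2/k) * (x - x^2 / (2k)) on [0, k): x = k * (1 - sqrt(1 - u)).
    static double inverseCdf(double u, int k) {
        return k * (1 - Math.sqrt(1 - u));
    }

    // Continuous sample in [0, k), floored and shifted to 1..k.
    static int sample(int k, double u) {
        int n = (int) Math.floor(inverseCdf(u, k)) + 1;
        return Math.min(n, k); // guard in case rounding pushes x up to k
    }

    static int sample(int k, Random r) {
        return sample(k, r.nextDouble()); // nextDouble() is uniform in [0, 1)
    }

    public static void main(String[] args) {
        System.out.println(sample(4, 0.5)); // x = 4 * (1 - sqrt(0.5)) ~ 1.17, prints 2
    }
}
```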

This is called a triangular distribution, although yours is a degenerate case with the mode equal to the minimum value. Wikipedia has equations for how to create one given a uniformly distributed (0,1) variable.

The first solution that comes to mind is to use a blocked array. Each index would cover a range of values depending on how "probable" you want it to be. In this case, you would use a wide range for 1, a narrower one for 2, and so on until you reach a small value (let's say 1) for k.
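int[] indexBound = new int[k];
int prevBound = 0;
for (int i = 0; i < k; i++) {
    // prob(i): your integer weight for index i (e.g. k - i for a linear descent)
    indexBound[i] = prevBound + prob(i);
    prevBound = indexBound[i];
}
int r = new Random().nextInt(prevBound);
for (int i = 0; i < k; i++) {
    if (r < indexBound[i]) {
        return i; // first bucket whose upper bound exceeds r (0-based)
    }
}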
Now the problem is just generating a random number and mapping it to its bucket.
You can do this for any distribution, provided you can discretize the width of each interval.
Let me know if I am missing something, either in explaining the algorithm or in its correctness. Needless to say, this needs to be optimized.

Something like this:
import java.util.function.Function;

class DiscreteDistribution
{
    // cumulative distribution
    final private double[] cdf;
    final private int k;

    public DiscreteDistribution(Function<Integer, Double> pdf, int k)
    {
        this.k = k;
        this.cdf = new double[k];
        double S = 0;
        for (int i = 0; i < k; ++i)
        {
            double p = pdf.apply(i+1);
            S += p;
            this.cdf[i] = S;
        }
        for (int i = 0; i < k; ++i)
        {
            this.cdf[i] /= S;
        }
    }

    /**
     * transform a cumulative distribution between 0 (inclusive) and 1 (exclusive)
     * to an integer between 1 and k.
     */
    public int transform(double q)
    {
        if (q >= 1)
        {
            return k;
        }
        // binary search on cdf for the lowest index i where q < cdf[i]
        int lo = 0, hi = k - 1;
        while (lo < hi)
        {
            int mid = (lo + hi) >>> 1;
            if (q < cdf[mid])
                hi = mid;
            else
                lo = mid + 1;
        }
        return lo + 1; // convert to a 1-based value
    }
}

The Cumulative Distribution Function is x^2 for a triangular distribution [0,1] with mode (highest weighted probability) of 1, as shown here.
Therefore, all we need to do to transform a uniform distribution (such as Java's Random::nextDouble) into a convenient triangular distribution weighted towards 1 is simply to take the square root, Math.sqrt(rand.nextDouble()), which can then be multiplied by any desired range.
For your example:
Random rand = new Random();
int a = 1;     // lower bound, inclusive
int b = k + 1; // upper bound, exclusive (so that k itself can be returned)
double weightedRand = Math.sqrt(rand.nextDouble()); // use triangular distribution
weightedRand = 1.0 - weightedRand; // invert the distribution (greater density at bottom)
int result = (int) Math.floor((b-a) * weightedRand);
result += a; // offset by lower bound
if(result >= b) result = a; // handle the edge case (weightedRand == 1.0)

The simplest thing to do is to generate a list or array of all the possible values according to their weights.
int k = 4; // number of possible values (example)
Random random = new Random();
int[] results = new int[k*(k+1)/2];
for(int i=1,r=0;i<=k;i++)
    for(int j=0;j<=k-i;j++)
        results[r++] = i;
// k=4 => { 1,1,1,1,2,2,2,3,3,4 }
// to get a value with the given distribution:
int n = results[random.nextInt(results.length)];
This works best for relatively small values of k, e.g. k < 1000. ;)
For larger k you can use a bucket approach:
int k = 10000; // number of possible values (example)
Random random = new Random();
int[] buckets = new int[k+1];
for(int i=1;i<=k;i++)
    buckets[i] = buckets[i-1] + k - i + 1;
int r = random.nextInt(buckets[buckets.length-1]);
int n = Arrays.binarySearch(buckets, r);
// an exact match at index j starts bucket j+1; otherwise the insertion point is the bucket
n = n < 0 ? -n - 1 : n + 1;
The cost of the binary search is fairly small, but it is not as efficient as a direct lookup (for a small array).
For an arbitrary distribution you can use a double[] for the cumulative distribution and use a binary search to find the value.
