Selecting a value proportionally based on its double key - java

I have a list of values, keyed with doubles between 0 and 1 that represent how likely I think it is for a thing to be useful to me. For example, for getting an answer to a question:
0.5 call your mom
0.25 go to the library
0.6 StackOverflow
0.9 just Google it
So, we think that Googling it is (about) twice as likely to be helpful as asking your mom. When I attempt to figure out the next thing to do, I'd like "just Google it" to be returned about twice as often as "call your mom".
I've been searching for solutions with little success. Most of the things that I've found rely on having integer keys (like How to randomly select a key based on its Integer value in a Map with respect to the other values in O(n) time?), which I don't have and which I can't easily generate.
I feel like there should be some Java datatype that can do this for me. Any suggestions?

You can think of a solution based on the java interface NavigableMap, and if you use the TreeMap implementation you will always get a O(logn) complexity.
You can use one of the following:
lowerEntry
ceilingEntry
floorEntry
higherEntry
Now you just need to extract random numbers with the right probability. For that I would refer to this post:
How to generate a random number from specified discrete distribution?

If I understood correctly, what you're looking for is a weighted random.
You should sum all your weights, and maybe normalize this to an integer value, so you will be able to use the rand.nextInt as suggested by comments.
Normalization can be done by multiplying by 100 for example, so your normalized weights are now:
50, 25, 60, 90 - The sum is 225.
You should define ranges:
0 - 49 is for "calling your mum"
50 - 74 - is for "go to library"
Now you need to perform this.rand.nextInt(sum) - and get a value,
and this value should be mapped to one of the defined ranges.

If you keep track of what the total value of the probabilities are, you can do something like this:
double interval = 100;
double counter = 0;
double totalProbabilities = 2.25;
int randInt = new Random().nextInt((int)interval);
for (Element e: list) {
counter += (interval * e.probability() / totalProbabilities);
if (randInt < counter) {
return e.activity();
}
}

Related

First random number after setSeed in Java always similar

To give some context, I have been writing a basic Perlin noise implementation in Java, and when it came to implementing seeding, I had encountered a bug that I couldn't explain.
In order to generate the same random weight vectors each time for the same seed no matter which set of coordinates' noise level is queried and in what order, I generated a new seed (newSeed), based on a combination of the original seed and the coordinates of the weight vector, and used this as the seed for the randomization of the weight vector by running:
rnd.setSeed(newSeed);
weight = new NVector(2);
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
weight.normalize()
Where NVector is a self-made class for vector mathematics.
However, when run, the program generated very bad noise:
After some digging, I found that the first element of each vector was very similar (and so the first nextDouble() call after each setSeed() call) resulting in the first element of every vector in the vector grid being similar.
This can be proved by running:
long seed = Long.valueOf(args[0]);
int loops = Integer.valueOf(args[1]);
double avgFirst = 0.0, avgSecond = 0.0, avgThird = 0.0;
double lastfirst = 0.0, lastSecond = 0.0, lastThird = 0.0;
for(int i = 0; i<loops; i++)
{
ran.setSeed(seed + i);
double first = ran.nextDouble();
double second = ran.nextDouble();
double third = ran.nextDouble();
avgFirst += Math.abs(first - lastfirst);
avgSecond += Math.abs(second - lastSecond);
avgThird += Math.abs(third - lastThird);
lastfirst = first;
lastSecond = second;
lastThird = third;
}
System.out.println("Average first difference.: " + avgFirst/loops);
System.out.println("Average second Difference: " + avgSecond/loops);
System.out.println("Average third Difference.: " + avgSecond/loops);
Which finds the average difference between the first, second and third random numbers generated after a setSeed() method has been called over a range of seeds as specified by the program's arguments; which for me returned these results:
C:\java Test 462454356345 10000
Average first difference.: 7.44638117976783E-4
Average second Difference: 0.34131692827329957
Average third Difference.: 0.34131692827329957
C:\java Test 46245445 10000
Average first difference.: 0.0017196011123287126
Average second Difference: 0.3416750057190849
Average third Difference.: 0.3416750057190849
C:\java Test 1 10000
Average first difference.: 0.0021601598225344998
Average second Difference: 0.3409914232342002
Average third Difference.: 0.3409914232342002
Here you can see that the first average difference is significantly smaller than the rest, and seemingly decreasing with higher seeds.
As such, by adding a simple dummy call to nextDouble() before setting the weight vector, I was able to fix my perlin noise implementation:
rnd.setSeed(newSeed);
rnd.nextDouble();
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
Resulting in:
I would like to know why this bad variation in the first call to nextDouble() (I have not checked other types of randomness) occurs and/or to alert people to this issue.
Of course, it could just be an implementation error on my behalf, which I would be greatful if it were pointed out to me.
The Random class is designed to be a low overhead source of pseudo-random numbers. But the consequence of the "low overhead" implementation is that the number stream has properties that are a long way off perfect ... from a statistical perspective. You have encountered one of the imperfections. Random is documented as being a Linear Congruential generator, and the properties of such generators are well known.
There are a variety of ways of dealing with this. For example, if you are careful you can hide some of the most obvious "poor" characteristics. (But you would be advised to run some statistical tests. You can't see non-randomness in the noise added to your second image, but it could still be there.)
Alternatively, if you want pseudo-random numbers that have guaranteed good statistical properties, then you should be using SecureRandom instead of Random. It has significantly higher overheads, but you can be assured that many "smart people" will have spent a lot of time on the design, testing and analysis of the algorithms.
Finally, it is relatively simple to create a subclass of Random that uses an alternative algorithm for generating the numbers; see link. The problem is that you have to select (or design) and implement an appropriate algorithm.
Calling this an "issue" is debatable. It is a well known and understood property of LCGs, and use of LCGs was a concious engineering choice. People want low overhead PRNGs, but low overhead PRNGs have poor properties. TANSTAAFL.
Certainly, this is not something that Oracle would contemplate changing in Random. Indeed, the reasons for not changing are stated clearly in the javadoc for the Random class.
"In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code."
This is known issue. Similar seed will generate similar few first values. Random wasn't really designed to be used this way. You are supposed to create instance with a good seed and then generate moderately sized sequence of "random" numbers.
Your current solution is ok - as long as it looks good and is fast enough. You can also consider using hashing/mixing functions which were designed to solve your problem (and then, optionally, using the output as seed). For example see: Parametric Random Function For 2D Noise Generation
Move your setSeed out of the loop. Java's PRNG is a linear congruential generator, so seeding it with sequential values is guaranteed to give results that are correlated across iterations of the loop.
ADDENDUM
I dashed that off before running out the door to a meeting, and now have time to illustrate what I was saying above.
I've written a little Ruby script which implements Schrage's portable prime modulus multiplicative linear congruential generator. I instantiate two copies of the LCG, both seeded with a value of 1. However, in each iteration of the output loop I reseed the second one based on the loop index. Here's the code:
# Implementation of a Linear Congruential Generator (LCG)
class LCG
attr_reader :state
M = (1 << 31) - 1 # Modulus = 2**31 - 1, which is prime
# constructor requires setting a seed value to use as initial state
def initialize(seed)
reseed(seed)
end
# users can explicitly reset the seed.
def reseed(seed)
#state = seed.to_i
end
# Schrage's portable prime modulus multiplicative LCG
def value
#state = 16807 * #state % M
# return the generated integer value AND its U(0,1) mapping as an array
[#state, #state.to_f / M]
end
end
if __FILE__ == $0
# create two instances of LCG, both initially seeded with 1
mylcg1 = LCG.new(1)
mylcg2 = LCG.new(1)
puts " default progression manual reseeding"
10.times do |n|
mylcg2.reseed(1 + n) # explicitly reseed 2nd LCG based on loop index
printf "%d %11d %f %11d %f\n", n, *mylcg1.value, *mylcg2.value
end
end
and here's the output it produces:
default progression manual reseeding
0 16807 0.000008 16807 0.000008
1 282475249 0.131538 33614 0.000016
2 1622650073 0.755605 50421 0.000023
3 984943658 0.458650 67228 0.000031
4 1144108930 0.532767 84035 0.000039
5 470211272 0.218959 100842 0.000047
6 101027544 0.047045 117649 0.000055
7 1457850878 0.678865 134456 0.000063
8 1458777923 0.679296 151263 0.000070
9 2007237709 0.934693 168070 0.000078
The columns are iteration number followed by the underlying integer generated by the LCG and the result when scaled to the range (0,1). The left set of columns show the natural progression of the LCG when allowed to proceed on its own, while the right set show what happens when you reseed on each iteration.

algorithm to match number of items value to be as close to price point(JAVA)

I'm not looking for answers, as this is a internship interview question for their coding problem. Rather, i'm looking a clue to head in the right direction.
Basically, the user puts in 2 parameters. Number of items and price point. For example, if the user puts in 3 for items and $150 for price point, the algorithm should find as many combinations as possible that is close to the price point of 150.
I've thought really hard about this problem, and my initial attempt was to just divide the price point by the total number of items. But this answer only gives me a restricted range for each item.
Is this question a P NP type question?
This is a variation of Subset-Sum problem with an additional dimension of number of items. This problem is NP-Complete - so there is no known polynomial solution to it, but there is a pseudo polynomial one, assuming the prices are relatively small integers.
The dynamic programming is has 1 additional dimension from the 'usual' subset sum problem, because there is an additional constraint - the number of elements you want to chose. It is basically based on the following recursive approach:
base:
f(0,i,0) = 1 //a combination with the required number of items and the desired price
f(x,0,k) = 0 (x != 0) //no item is left to chose from and the price was not found
f(x,i,-1) = 0 //used more elements than desired
step:
f(x,i,k) = f(x,i-1,k) + f(x-price[i],i-1,k-1)
^ ^
did not take element i used element i
This approach is basically brute-force, checking all possibilities at each step, but avoiding double calculations for smaller subproblems that were already solved.
The dynamic programming solution to this problem will be solved in O(n*k*W) where n is the number of items in the collection, k is the given number of items you want to select (3 in your example) and W is the desired weight/price.
Edits and clarifications:
If you wish to allow an element to be picked more than once, change the step to:
f(x,i,k) = f(x,i-1,k) + f(x-price[i],i,k-1)
^
giving a chosen element a chance to be re-chosen
If you wish to allow some 'tolerance' (allow combinations that sums to W' such that |W-W'| <= CONSTANT, you can do it by changing the first two stop clauses to the following:
f(x,0,k) = 0 (|x| > CONSTANT)
f(x,i,0) = 1 (|x| <= CONSTANT)
An alternative will be a solution that is O(n^k) which is generating all combinations with k element and examining each of them.
The problem is same as subset-sum problem which can be solved using DP solution similar to knapsack problem but with slight variation as you can have subset which can be greater than price point. This variation can be managed using a smart adjustment in the solution to general subset-sum problem :-
DP solution for subset-sum :-
1. Sum = Price Point.
2. SubSum(Sum,n) = Maximum(SubSum(Sum-Price[n],n),SubSum(Sum,n-1)))
3. SubSum(PricePoint,n) = Maximum Closest Subset Sum to Price Point <= Price Point
The above gives the subset sum which is closest but less or equal to Price Point but there are cases where the subset sum which slightly greater than price point is correct subset sum so you need to evaluate Subset Sum for a value larger than Price Point. The upper bound of that value up to which you need to evaluate is Bound = PricePoint + MinPrice where MinPrice is minimum price of among all items. Than find value the SubSum(x,n) such that it is closest to PricePoint

FastSineTransformer - pad array with zeros to fit length

I'm trying to implement a poisson solver for image blending in Java. After descretization with 5-star method, the real work begins.
To do that i do these three steps with the color values:
using sine transformation on rows and columns
multiply eigenvalues
using inverse sine transformation on rows an columns
This works so far.
To do the sine transformation in Java, i'm using the Apache Commons Math package.
But the FastSineTransformer has two limitations:
first value in the array must be zero (well that's ok, number two is the real problem)
the length of the input must be a power of two
So right now my excerpts are of the length 127, 255 and so on to fit in. (i'm inserting a zero in the beginning, so that 1 and 2 are fulfilled) That's pretty stupid, because i want to choose the size of my excerpt freely.
My Question is:
Is there a way to extend my array e.g. of length 100 to fit the limitations of the Apache FastSineTransformer?
In the FastFourierTransfomer class it is mentioned, that you can pad with zeros to get a power of two. But when i do that, i get wrong results. Perhaps i'm doing it wrong, but i really don't know if there is anything i have to keep in mind, when i'm padding with zeros
As far as I can tell from http://books.google.de/books?id=cOA-vwKIffkC&lpg=PP1&hl=de&pg=PA73#v=onepage&q&f=false and the sources http://grepcode.com/file/repo1.maven.org/maven2/org.apache.commons/commons-math3/3.2/org/apache/commons/math3/transform/FastSineTransformer.java?av=f
The rules are as follows:
According to implementation the dataset size should be a power of 2 - presumable in order for algorithm to guarantee O(n*log(n)) execution time.
According to James S. Walker function must be odd, that is the mentioned assumptions must be fullfiled and implementation trusts with that.
According to implementation for some reason the first and the middle element must be 0:
x'[0] = x[0] = 0,
x'[k] = x[k] if 1 <= k < N,
x'[N] = 0,
x'[k] = -x[2N-k] if N + 1 <= k < 2N.
As for your case when you may have a dataset which is not a power of two I suggest that you can resize and pad the gaps with zeroes with not violating the rules from the above. But I suggest referring to the book first.

How to multiply two big big numbers

You are given a list of n numbers L=<a_1, a_2,...a_n>. Each of them is
either 0 or of the form +/- 2k, 0 <= k <= 30. Describe and implement an
algorithm that returns the largest product of a CONTINUOUS SUBLIST
p=a_i*a_i+1*...*a_j, 1 <= i <= j <= n.
For example, for the input <8 0 -4 -2 0 1> it should return 8 (either 8
or (-4)*(-2)).
You can use any standard programming language and can assume that
the list is given in any standard data structure, e.g. int[],
vector<int>, List<Integer>, etc.
What is the computational complexity of your algorithm?
In my first answer I addressed the OP's problem in "multiplying two big big numbers". As it turns out, this wish is only a small part of a much bigger problem which I'm going to address now:
"I still haven't arrived at the final skeleton of my algorithm I wonder if you could help me with this."
(See the question for the problem description)
All I'm going to do is explain the approach Amnon proposed in little more detail, so all the credit should go to him.
You have to find the largest product of a continuous sublist from a list of integers which are powers of 2. The idea is to:
Compute the product of every continuous sublist.
Return the biggest of all these products.
You can represent a sublist by its start and end index. For start=0 there are n-1 possible values for end, namely 0..n-1. This generates all sublists that start at index 0. In the next iteration, You increment start by 1 and repeat the process (this time, there are n-2 possible values for end). This way You generate all possible sublists.
Now, for each of these sublists, You have to compute the product of its elements - that is come up with a method computeProduct(List wholeList, int startIndex, int endIndex). You can either use the built in BigInteger class (which should be able to handle the input provided by Your assignment) to save You from further trouble or try to implement a more efficient way of multiplication as described by others. (I would start with the simpler approach since it's easier to see if Your algorithm works correctly and first then try to optimize it.)
Now that You're able to iterate over all sublists and compute the product of their elements, determining the sublist with the maximum product should be the easiest part.
If it's still to hard for You to make the connections between two steps, let us know - but please also provide us with a draft of Your code as You work on the problem so that we don't end up incrementally constructing the solution and You copy&pasting it.
edit: Algorithm skeleton
public BigInteger listingSublist(BigInteger[] biArray)
{
int start = 0;
int end = biArray.length-1;
BigInteger maximum;
for (int i = start; i <= end; i++)
{
for (int j = i; j <= end; j++)
{
//insert logic to determine the maximum product.
computeProduct(biArray, i, j);
}
}
return maximum;
}
public BigInteger computeProduct(BigInteger[] wholeList, int startIndex,
int endIndex)
{
//insert logic here to return
//wholeList[startIndex].multiply(wholeList[startIndex+1]).mul...(
// wholeList[endIndex]);
}
Since k <= 30, any integer i = 2k will fit into a Java int. However the product of such two integers might not necessarily fit into a Java int since 2k * 2k = 22*k <= 260 which fill into a Java long. This should answer Your question regarding the "(multiplication of) two numbers...".
In case that You might want to multiply more than two numbers, which is implied by Your assignment saying "...largest product of a CONTINUOUS SUBLIST..." (a sublist's length could be > 2), have a look at Java's BigInteger class.
Actually, the most efficient way of multiplication is doing addition instead. In this special case all you have is numbers that are powers of two, and you can get the product of a sublist by simply adding the expontents together (and counting the negative numbers in your product, and making it a negative number in case of odd negatives).
Of course, to store the result you may need the BigInteger, if you run out of bits. Or depending on how the output should look like, just say (+/-)2^N, where N is the sum of the exponents.
Parsing the input could be a matter of switch-case, since you only have 30 numbers to take care of. Plus the negatives.
That's the boring part. The interesting part is how you get the sublist that produces the largest number. You can take the dumb approach, by checking every single variation, but that would be an O(N^2) algorithm in the worst case (IIRC). Which is really not very good for longer inputs.
What can you do? I'd probably start from the largest non-negative number in the list as a sublist, and grow the sublist to get as many non-negative numbers in each direction as I can. Then, having all the positives in reach, proceed with pairs of negatives on both sides, eg. only grow if you can grow on both sides of the list. If you cannot grow in both directions, try one direction with two (four, six, etc. so even) consecutive negative numbers. If you cannot grow even in this way, stop.
Well, I don't know if this alogrithm even works, but if it (or something similar) does, its an O(N) algorithm, which means great performance. Lets try it out! :-)
Hmmm.. since they're all powers of 2, you can just add the exponent instead of multiplying the numbers (equivalent to taking the logarithm of the product). For example, 2^3 * 2^7 is 2^(7+3)=2^10.
I'll leave handling the sign as an exercise to the reader.
Regarding the sublist problem, there are less than n^2 pairs of (begin,end) indices. You can check them all, or try a dynamic programming solution.
EDIT: I adjusted the algorithm outline to match the actual pseudo code and put the complexity analysis directly into the answer:
Outline of algorithm
Go seqentially over the sequence and store value and first/last index of the product (positive) since the last 0. Do the same for another product (negative) which only consists of the numbers since the first sign change of the sequence. If you hit a negative sequence element swap the two products (positive and negative) along with the associagted starting indices. Whenever the positive product hits a new maximum store it and the associated start and end indices. After going over the whole sequence the result is stored in the maximum variables.
To avoid overflow calculate in binary logarithms and an additional sign.
Pseudo code
maxProduct = 0
maxProductStartIndex = -1
maxProductEndIndex = -1
sequence.push_front( 0 ) // reuses variable intitialization of the case n == 0
for every index of sequence
n = sequence[index]
if n == 0
posProduct = 0
negProduct = 0
posProductStartIndex = index+1
negProductStartIndex = -1
else
if n < 0
swap( posProduct, negProduct )
swap( posProductStartIndex, negProductStartIndex )
if -1 == posProductStartIndex // start second sequence on sign change
posProductStartIndex = index
end if
n = -n;
end if
logN = log2(n) // as indicated all arithmetic is done on the logarithms
posProduct += logN
if -1 < negProductStartIndex // start the second product as soon as the sign changes first
negProduct += logN
end if
if maxProduct < posProduct // update current best solution
maxProduct = posProduct
maxProductStartIndex = posProductStartIndex
maxProductEndIndex = index
end if
end if
end for
// output solution
print "The maximum product is " 2^maxProduct "."
print "It is reached by multiplying the numbers from sequence index "
print maxProductStartIndex " to sequence index " maxProductEndIndex
Complexity
The algorithm uses a single loop over the sequence so its O(n) times the complexity of the loop body. The most complicated operation of the body is log2. Ergo its O(n) times the complexity of log2. The log2 of a number of bounded size is O(1) so the resulting complexity is O(n) aka linear.
I'd like to combine Amnon's observation about multiplying powers of 2 with one of mine concerning sublists.
Lists are terminated hard by 0's. We can break the problem down into finding the biggest product in each sub-list, and then the maximum of that. (Others have mentioned this).
This is my 3rd revision of this writeup. But 3's the charm...
Approach
Given a list of non-0 numbers, (this is what took a lot of thinking) there are 3 sub-cases:
The list contains an even number of negative numbers (possibly 0). This is the trivial case, the optimum result is the product of all numbers, guaranteed to be positive.
The list contains an odd number of negative numbers, so the product of all numbers would be negative. To change the sign, it becomes necessary to sacrifice a subsequence containing a negative number. Two sub-cases:
a. sacrifice numbers from the left up to and including the leftmost negative; or
b. sacrifice numbers from the right up to and including the rightmost negative.
In either case, return the product of the remaining numbers. Having sacrificed exactly one negative number, the result is certain to be positive. Pick the winner of (a) and (b).
Implementation
The input needs to be split into subsequences delimited by 0. The list can be processed in place if a driver method is built to loop through it and pick out the beginnings and ends of non-0 sequences.
Doing the math in longs would only double the possible range. Converting to log2 makes arithmetic with large products easier. It prevents program failure on large sequences of large numbers. It would alternatively be possible to do all math in Bignums, but that would probably perform poorly.
Finally, the end result, still a log2 number, needs to be converted into printable form. Bignum comes in handy there. There's new BigInteger("2").pow(log); which will raise 2 to the power of log.
Complexity
This algorithm works sequentially through the sub-lists, only processing each one once. Within each sub-list, there's the annoying work of converting the input to log2 and the result back, but the effort is linear in the size of the list. In the worst case, the sum of much of the list is computed twice, but that's also linear complexity.
See this code. Here I implement exact factorial of a huge large number. I am just using integer array to make big numbers. Download the code from Planet Source Code.

Scale numbers to be <= 255?

I have cells for whom the numeric value can be anything between 0 and Integer.MAX_VALUE. I would like to color code these cells correspondingly.
If the value = 0, then r = 0. If the value is Integer.MAX_VALUE, then r = 255. But what about the values in between?
I'm thinking I need a function whose limit as x => Integer.MAX_VALUE is 255. What is this function? Or is there a better way to do this?
I could just do (value / (Integer.MAX_VALUE / 255)) but that will cause many low values to be zero. So perhaps I should do it with a log function.
Most of my values will be in the range [0, 10,000]. So I want to highlight the differences there.
The "fairest" linear scaling is actually done like this:
floor(256 * value / (Integer.MAX_VALUE + 1))
Note that this is just pseudocode and assumes floating-point calculations.
If we assume that Integer.MAX_VALUE + 1 is 2^31, and that / will give us integer division, then it simplifies to
value / 8388608
Why other answers are wrong
Some answers (as well as the question itself) suggsted a variation of (255 * value / Integer.MAX_VALUE). Presumably this has to be converted to an integer, either using round() or floor().
If using floor(), the only value that produces 255 is Integer.MAX_VALUE itself. This distribution is uneven.
If using round(), 0 and 255 will each get hit half as many times as 1-254. Also uneven.
Using the scaling method I mention above, no such problem occurs.
Non-linear methods
If you want to use logs, try this:
255 * log(value + 1) / log(Integer.MAX_VALUE + 1)
You could also just take the square root of the value (this wouldn't go all the way to 255, but you could scale it up if you wanted to).
I figured a log fit would be good for this, but looking at the results, I'm not so sure.
However, Wolfram|Alpha is great for experimenting with this sort of thing:
I started with that, and ended up with:
r(x) = floor(((11.5553 * log(14.4266 * (x + 1.0))) - 30.8419) / 0.9687)
Interestingly, it turns out that this gives nearly identical results to Artelius's answer of:
r(x) = floor(255 * log(x + 1) / log(2^31 + 1)
IMHO, you'd be best served with a split function for 0-10000 and 10000-2^31.
For a linear mapping of the range 0-2^32 to 0-255, just take the high-order byte. Here is how that would look using binary & and bit-shifting:
r = value & 0xff000000 >> 24
Using mod 256 will certainly return a value 0-255, but you wont be able to draw any grouping sense from the results - 1, 257, 513, 1025 will all map to the scaled value 1, even though they are far from each other.
If you want to be more discriminating among low values, and merge many more large values together, then a log expression will work:
r = log(value)/log(pow(2,32))*256
EDIT: Yikes, my high school algebra teacher Mrs. Buckenmeyer would faint! log(pow(2,32)) is the same as 32*log(2), and much cheaper to evaluate. And now we can also factor this better, since 256/32 is a nice even 8:
r = 8 * log(value)/log(2)
log(value)/log(2) is actually log-base-2 of value, which log does for us very neatly:
r = 8 * log(value,2)
There, Mrs. Buckenmeyer - your efforts weren't entirely wasted!
In general (since it's not clear to me if this is a Java or Language-Agnostic question) you would divide the value you have by Integer.MAX_VALUE, multiply by 255 and convert to an integer.
This works! r= value /8421504;
8421504 is actually the 'magic' number, which equals MAX_VALUE/255. Thus, MAX_VALUE/8421504 = 255 (and some change, but small enough integer math will get rid of it.
if you want one that doesn't have magic numbers in it, this should work (and of equal performance, since any good compiler will replace it with the actual value:
r= value/ (Integer.MAX_VALUE/255);
The nice part is, this will not require any floating-point values.
The value you're looking for is: r = 255 * (value / Integer.MAX_VALUE). So you'd have to turn this into a double, then cast back to an int.
Note that if you want brighter and brighter, that luminosity is not linear so a straight mapping from value to color will not give a good result.
The Color class has a method to make a brighter color. Have a look at that.
The linear implementation is discussed in most of these answers, and Artelius' answer seems to be the best. But the best formula would depend on what you are trying to achieve and the distribution of your values. Without knowing that it is difficult to give an ideal answer.
But just to illustrate, any of these might be the best for you:
Linear distribution, each mapping onto a range which is 1/266th of the overall range.
Logarithmic distribution (skewed towards low values) which will highlight the differences in the lower magnitudes and diminish differences in the higher magnitudes
Reverse logarithmic distribution (skewed towards high values) which will highlight differences in the higher magnitudes and diminish differences in the lower magnitudes.
Normal distribution of incidence of colours, where each colour appears the same number of times as every other colour.
Again, you need to determine what you are trying to achieve & what the data will be used for. If you have been tasked to build this then I would strongly recommend you get this clarified to ensure that it is as useful as possible - and to avoid having to redevelop it later on.
Ask yourself the question, "What value should map to 128?"
If the answer is about a billion (I doubt that it is) then use linear.
If the answer is in the range of 10-100 thousand, then consider square root or log.
Another answer suggested this (I can't comment or vote yet). I agree.
r = log(value)/log(pow(2,32))*256
Here are a bunch of algorithms for scaling, normalizing, ranking, etc. numbers by using Extension Methods in C#, although you can adapt them to other languages:
http://www.redowlconsulting.com/Blog/post/2011/07/28/StatisticalTricksForLists.aspx
There are explanations and graphics that explain when you might want to use one method or another.
The best answer really depends on the behavior you want.
If you want each cell just to generally have a color different than the neighbor, go with what akf said in the second paragraph and use a modulo (x % 256).
If you want the color to have some bearing on the actual value (like "blue means smaller values" all the way to "red means huge values"), you would have to post something about your expected distribution of values. Since you worry about many low values being zero I might guess that you have lots of them, but that would only be a guess.
In this second scenario, you really want to distribute your likely responses into 256 "percentiles" and assign a color to each one (where an equal number of likely responses fall into each percentile).
If you are complaining that the low numbers are becoming zero, then you might want to normalize the values to 255 rather than the entire range of the values.
The formula would become:
currentValue / (max value of the set)
I could just do (value / (Integer.MAX_VALUE / 255)) but that will cause many low values to be zero.
One approach you could take is to use the modulo operator (r = value%256;). Although this wouldn't ensure that Integer.MAX_VALUE turns out as 255, it would guarantee a number between 0 and 255. It would also allow for low numbers to be distributed across the 0-255 range.
EDIT:
Funnily, as I test this, Integer.MAX_VALUE % 256 does result in 255 (I had originally mistakenly tested against %255, which yielded the wrong results). This seems like a pretty straight forward solution.

Categories

Resources