In this code:
Random random = new Random(441287210);
for (int i = 0; i < 10; i++) {
    System.out.print(random.nextInt(10) + " ");
}
The output is 1 1 1 1 1 1 1 1 1 1, every time.
Why is this? Isn't Random supposed to be... well... random? I thought that the Random class uses System.nanoTime as its seed, so the output should be generally random. Can someone please explain?
Let it print a couple more; the first 100 are:
1 1 1 1 1 1 1 1 1 1 3 4 7 2 2 6 0 3 0 2 8 4 1 6 0 0 0 2 8 2 9 8 9 2 5 2 1 1 4 5 3 4 1 4 1
8 7 6 6 0 6 5 0 4 5 5 6 0 8 3 8 9 7 4 0 9 9 7 7 9 3 9 6 4 5 0 6 3 7 4 9 8 7 6 2 8 9 8 4 4
8 4 9 0 1 6 9 6 1 5
which looks okay.
Every good (pseudo-)random sequence contains streaks of repeated numbers; this one happens to begin with such a streak.
The values generated by the Random class are pseudo-random: they are created by a deterministic algorithm from a seed value. Typically (if you use the parameterless constructor, for example) the seed is initialized from the current time, which is effectively a unique value, so a unique, 'random'-looking sequence is generated.
Here you are using a constant seed value which doesn't change between executions of your code. Therefore you always get the same sequence. It just happens that this sequence is 1 1 1 1 1 1 ... for this particular seed.
There's nothing to say that a sequence of 10 1s in a row is not possible. Whoever gave you the seed value 441287210 just happens to have found such a value that results in starting with 10 1s in a row. If you continue calling nextInt() (i.e. more than 10 times) you will see random values. It should be possible to find other seed values that will result in other "apparently non-random" sequences.
Random is a linear congruential generator; i.e. it is based on a formula of the form:
N <- (N * C1 + C2) % M
where C1, C2 and M are constants.
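For reference, the java.util.Random documentation specifies exactly this recurrence, with 48 bits of state, C1 = 0x5DEECE66D, C2 = 0xB and M = 2^48. A minimal sketch of that seed update (constants copied from the Random javadoc; the class name and main method are mine):

public class TinyLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL; // C1
    private static final long INCREMENT  = 0xBL;         // C2
    private static final long MASK = (1L << 48) - 1;     // enforces M = 2^48

    private long seed;

    TinyLcg(long seed) {
        this.seed = (seed ^ MULTIPLIER) & MASK; // Random scrambles the initial seed this way
    }

    int next(int bits) {
        seed = (seed * MULTIPLIER + INCREMENT) & MASK; // N <- (N * C1 + C2) % M
        return (int) (seed >>> (48 - bits));           // the high bits are returned
    }

    public static void main(String[] args) {
        TinyLcg lcg = new TinyLcg(441287210L);
        for (int i = 0; i < 5; i++) {
            System.out.print(lcg.next(31) + " "); // the raw, fully deterministic stream
        }
    }
}

(Random.nextInt(10) adds a rejection step on top of next(31), so this prints the raw stream, not the 1 1 1 1 ... digits.)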
One of the properties of this class of generator is that it has high auto-correlation. Indeed, if you plot successive numbers you can see clear striping patterns.
Your test program has effectively taken 10 successive numbers from the underlying generator, calculated their value modulo 10 ... and found that they are all the same. Effectively, the modulo 10 is "resonating" with the natural periodicity of the generator ... over a short period of time.
This is one of the downsides of using a PRNG with high auto-correlation. In layman's terms ... it is "not very random" ... and you can get into trouble if you use it in a situation where randomness is critical.
Notes:
Random is not a random number generator. It is a pseudo-random number generator. That means that if you know the initial state, the numbers generated are entirely predictable.
Using a true random seed for Random doesn't really help. It just makes the problem harder to reproduce.
There are likely to be other seeds for Random that will give you similar patterns with this particular test.
From a purely probabilistic standpoint, ten ones in a row is no more or less "random" than any other sequence of ten digits. But from a mathematical perspective, Random is not random at all: it is totally predictable once you have figured out the current value of N. The problem is the auto-correlation, which makes the sequence appear intuitively non-random.
If you want to avoid this kind of intuitive non-randomness, use SecureRandom which should either be a true random number source, or a generator of pseudo-random numbers that are much, much harder to predict.
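Since SecureRandom extends Random, swapping it into the snippet from the question is a one-line change; a quick sketch:

import java.security.SecureRandom;

SecureRandom random = new SecureRandom(); // self-seeds from an OS entropy source
for (int i = 0; i < 10; i++) {
    System.out.print(random.nextInt(10) + " "); // not reproducible across runs
}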
If you use for (int i = 0; i < 100; i++), the output sequence looks "random" again. The probability of ten 1s occurring in succession in a random sequence may be small, but it is not zero. (Indeed, given enough samples, any fixed finite sequence is almost certain to occur eventually.)
It's merely an interesting coincidence.
The Random class uses the seed to generate a number each time you call nextInt(). The seed parameter is a long; here you are passing a fixed literal, so the generator starts from the same state and produces the same sequence on every run.
Try running the loop 20 or more times and you will see varied output, or drop the seed (use the no-arg constructor), or provide a different seed value.
I found this answer: Quickest way to find missing number in an array of numbers, which is great when you have only one number missing.
Further to that question, I have wondered what is the best (and quickest) way to find all missing numbers, and also to sort the unsorted array. (For this example the array is like the one described in the linked question: size 100, holding random numbers from 1-100, but with some of them missing.)
Obviously the numbers must lie in a certain range, or it makes no sense to talk about missing numbers. Therefore a counting sort can be applied. Then do a linear scan through the sorted result to find the holes; alternatively, scan the counts from the sorting step directly to find the missing elements.
This all runs in O(n).
Example:
Suppose the following array is given: 6, 1, 7, 8, 3, 4, 9, 1, 10, 3.
Now calculate how often each number appears in the array by going once through the array and incrementing the count for each encountered number:
number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
count  | 2 | 0 | 2 | 1 | 0 | 1 | 1 | 1 | 1 | 1
You immediately see that 2 and 5 appeared 0 times and are therefore missing.
The counts can also be used to produce a sorted array: we need two 1s, two 3s, one 4, and so on:
1 1 3 3 4 6 7 8 9 10
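A runnable sketch of the whole idea (class and variable names are mine; it assumes the values lie in 1..max):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CountingSortMissing {
    public static void main(String[] args) {
        int[] a = {6, 1, 7, 8, 3, 4, 9, 1, 10, 3}; // the example above
        int max = 10;

        int[] count = new int[max + 1];
        for (int v : a) {
            count[v]++; // one linear pass builds the counts
        }

        List<Integer> missing = new ArrayList<>();
        int[] sorted = new int[a.length];
        int pos = 0;
        for (int v = 1; v <= max; v++) {
            if (count[v] == 0) {
                missing.add(v); // a hole: v never appeared
            }
            for (int c = 0; c < count[v]; c++) {
                sorted[pos++] = v; // emit v count[v] times
            }
        }

        System.out.println("missing: " + missing);        // [2, 5]
        System.out.println("sorted: " + Arrays.toString(sorted));
    }
}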
First of all, I searched for this topic but could not find anything similar to what I'm trying to accomplish, so this could be a duplicate question.
I would like to have a function that returns a number (1 or 2), with a probability of 0.8 for the number one and 0.2 for the number two.
How can I do it?
Generate a random number between 0 and 1. If the number is below 0.8, return 1; otherwise, return 2 (rnd here is a java.util.Random instance):
return rnd.nextFloat() < 0.8 ? 1 : 2;
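Wrapped up as a complete runnable example (the class and method names are mine):

import java.util.Random;

public class OneOrTwo {
    private static final Random rnd = new Random();

    // Returns 1 with probability 0.8 and 2 with probability 0.2.
    static int oneOrTwo() {
        return rnd.nextFloat() < 0.8f ? 1 : 2;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            System.out.print(oneOrTwo() + " ");
        }
    }
}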
You just make intervals, and if your random number falls in an interval, you have a hit. In the simplest version, you make the intervals 1-4 and 5, then fetch a random number from 1 to 5: if it is 1-4 (80% probability) you treat it as 1, and if it is 5 (20% probability) you treat it as 2.
For example, make an array of 100 numbers: 20 entries holding the number two, and 80 entries holding the number one.
Use the Random class to generate an index between 0 (inclusive) and 100 (exclusive), then use that index to read the result from the array.
This is just an example, you can use another values in the array.
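A sketch of that lookup-table idea (class and variable names are mine):

import java.util.Random;

public class WeightedPick {
    public static void main(String[] args) {
        int[] table = new int[100];
        for (int i = 0; i < 100; i++) {
            table[i] = (i < 80) ? 1 : 2; // 80 slots map to 1, 20 slots map to 2
        }
        Random rnd = new Random();
        System.out.println(table[rnd.nextInt(100)]); // uniform index 0..99
    }
}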
I've been searching for, and found similar topics, but cannot understand them or work out how to apply them.
Simple: all I want is to generate a number between 1 and 100:
Numbers 1 to 30 should have a 60% probability.
Numbers 31 to 60 should have a 35% probability.
Numbers 61 to 100 should have a 5% probability.
Get numbers in your ranges
First generate 3 random numbers in your 3 intervals.
1: 1-30
2: 31-60
3: 61-100
Generate the probability number
Next, generate a number 1-100. If that number is 1-60, choose the first random number from the first step; if it is 61-95, choose the second; and if it is 96-100, choose the third.
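A runnable sketch of those two steps (class and variable names are mine):

import java.util.Random;

public class WeightedRanges {
    public static void main(String[] args) {
        Random rnd = new Random();
        int low  = 1  + rnd.nextInt(30); // step 1: a candidate from 1-30
        int mid  = 31 + rnd.nextInt(30); //         a candidate from 31-60
        int high = 61 + rnd.nextInt(40); //         a candidate from 61-100

        int roll = 1 + rnd.nextInt(100); // step 2: the probability roll, 1-100
        int result = (roll <= 60) ? low : (roll <= 95) ? mid : high;
        System.out.println(result);
    }
}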
This sounds like a homework problem so I will not provide the code, but here is a description of a simple algorithm:
Generate a random number between 1 and 100. Let's call this X. X will be used to determine how to generate your final result:
If X is between 1 and 60, generate a random number between 1 and 30 to be your final result.
If X is between 61 and 95, generate a random number between 31 and 60 to be your final result.
If X is between 96 and 100, generate a random number between 61 and 100 to be your final result.
You can see that this requires two random number generations for every weighted number that you want. It can actually be simplified into a single random number generation, and that is left as an exercise for you.
FYI, how to generate a random number within a range is found here: How do I generate random integers within a specific range in Java?
The simplest way for me would be to generate four randoms. The first would be a number 1-100. The second would be a number 1-30, the third a number 31-60, and the fourth a number 61-100. Think of the first random as a percent: if it is 1-60, return the second random; if it is 61-95, return the third; and if it is 96-100, return the fourth. There are ways to make this shorter, but in my opinion this is the easiest to comprehend.
Create a random number 1-100 with this:
(int) (Math.random() * 100) + 1
The rest is just conditionals.
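One possible reading of "the rest is just conditionals", as a hedged sketch (the structure here is mine):

public class WeightedRangesConditional {
    public static void main(String[] args) {
        int roll = (int) (Math.random() * 100) + 1; // 1-100
        int result;
        if (roll <= 60) {
            result = (int) (Math.random() * 30) + 1;  // 1-30, 60% of the time
        } else if (roll <= 95) {
            result = (int) (Math.random() * 30) + 31; // 31-60, 35% of the time
        } else {
            result = (int) (Math.random() * 40) + 61; // 61-100, 5% of the time
        }
        System.out.println(result);
    }
}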
I would like to have 5 random numbers for every object I process. I process many objects (separately) and need to make sure that randomness is achieved across all numbers. If I process 5 objects, I will have 25 random numbers:
           RN1  RN2  RN3  RN4  RN5
Object 1     1    2    3    4    5
Object 2     6    7    8    9   10
Object 3    11   12   13   14   15
Object 4    16   17   18   19   20
Object 5    21   22   23   24   25
Questions are:
for a single object, does it make a difference, in terms of randomness quality, whether I create a random number generator for every single number (using the current time in milliseconds as the seed) or create one generator and get a series of numbers from it using nextDouble()?
once I process multiple objects and take the first random number of each object, will these form a uniform random distribution (e.g. numbers 1, 6, 11, 16, 21), or will this be somehow broken?
My view is that it would be best to create one random number generator only (shared by all objects) so that whenever new random number is required I can call nextDouble() and get next number in sequence of random numbers.
Have a look at the ThreadLocalRandom class from Java.
It provides a uniform distribution and avoids a bottleneck, as each of your threads gets its own copy.
Regarding generators producing different sequences: it's all about their seeds. One common practice is to seed each generator with the thread/task/process identifier.
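A quick usage sketch (note that ThreadLocalRandom seeds itself; you never seed it manually):

import java.util.concurrent.ThreadLocalRandom;

public class TlrDemo {
    public static void main(String[] args) {
        // current() returns the generator bound to the calling thread.
        double d = ThreadLocalRandom.current().nextDouble(); // uniform in [0, 1)
        int n = ThreadLocalRandom.current().nextInt(1, 26);  // uniform in 1-25
        System.out.println(d + " " + n);
    }
}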
for a single object, does it make a difference, in terms of randomness quality, whether I create a random number generator for every single number (using the current time in milliseconds as the seed) or create one generator and get a series of numbers from it using nextDouble()?
Don't use the current time as the seed for every number. Generating a number takes less time than the resolution of the current time in milliseconds, so consecutive generators would be seeded with the same value.
The safest way is probably to generate the required quantity of random numbers beforehand, save them into an array, and establish rules for the access order. That way you have full control over the process, and there is no "loss of randomness".
Otherwise, if you launch several generators at once, they will most likely be seeded with the same value (the system time, by default); and if you use a single generator accessed simultaneously by different threads, you need to share one Random object, which may be fine but can also cost you reproducibility (I'm not sure whether that is crucial in your case).
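A sketch of the pre-generated-array approach for the 5x5 layout above (names are mine):

import java.util.Random;

public class PregeneratedRandoms {
    public static void main(String[] args) {
        Random rng = new Random();             // one generator, seeded once
        double[][] numbers = new double[5][5]; // 5 objects x 5 numbers each
        for (int obj = 0; obj < 5; obj++) {
            for (int k = 0; k < 5; k++) {
                numbers[obj][k] = rng.nextDouble(); // consecutive draws from one sequence
            }
        }
        // numbers[obj] holds the 5 values for object obj; the first values
        // numbers[0][0], numbers[1][0], ... are still uniform and independent.
        System.out.println(numbers[0][0]);
    }
}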
I'm reading through Chapter 3 of Joshua Bloch's Effective Java. In Item 8: Always override hashCode when you override equals, the author uses the following combining step in his hashing function:
result = 37 * result + c;
He then explains why 37 was chosen (emphasis added):
The multiplier 37 was chosen because it is an odd prime. If it was even and the multiplication overflowed, information would be lost because multiplication by two is equivalent to shifting. The advantages of using a prime number are less clear, but it is traditional to use primes for this purpose.
My question is why does it matter that the combining factor (37) is odd? Wouldn't multiplication overflow result in a loss of information regardless of whether the factor was odd or even?
Consider what happens when a positive value is repeatedly multiplied by two in a base-2 representation -- all the set bits eventually march off the end, leaving you with zero.
An even multiplier would result in hash codes with less diversity.
Odd numbers, on the other hand, may result in overflow, but without loss of diversity.
The purpose of a hashCode is to produce random-looking bits based on the input (especially the lower bits, as these are often the ones actually used).
When you multiply by 2, the lowest bit can only be 0, which lacks randomness. If you multiply by an odd number, the result's lowest bit can be 0 or 1, so it still carries information.
A similar question: what do you get here?
public static void main(String... args) {
    System.out.println(factorial(66));
}

// Repeatedly multiplies into a 64-bit long, overflowing silently.
public static long factorial(int n) {
    long product = 1;
    for (; n > 1; n--) {
        product *= n;
    }
    return product;
}
prints
0
Every second factor is even and every fourth is a multiple of 4, and so on. By 66! the product has accumulated at least 64 factors of two (33 + 16 + 8 + 4 + 2 + 1), so all 64 bits of the long have been shifted out and only zeros remain.
The solution lies in number theory, specifically in the greatest common divisor (GCD) of your multiplier and your modulus.
An example may help. Let's say that instead of 32 bits you only have 2 bits to represent a number, so you have 4 numbers (classes): 0, 1, 2 and 3. An overflow in the CPU is the same as a modulo operation:
class | x2 | mod 4 | x2 | mod 4
  0   |  0 |   0   |  0 |   0
  1   |  2 |   2   |  4 |   0
  2   |  4 |   0   |  0 |   0
  3   |  6 |   2   |  4 |   0
After two operations only one possible number (class) is left, so you have 'lost' information.
class | x3 | mod 4 | x3 | mod 4 ...
  0   |  0 |   0   |  0 |   0
  1   |  3 |   3   |  9 |   1
  2   |  6 |   2   |  6 |   2
  3   |  9 |   1   |  3 |   3
This can go on forever and you still have all 4 classes, so you don't lose information.
The key is that the GCD of your multiplier and your modulus is 1. That holds true for all odd multipliers, because the modulus here is always a power of 2. They don't have to be prime, and they don't have to be 37 specifically. But information loss is just one criterion for which 37 was picked; other criteria include the distribution of the resulting values.
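A small runnable experiment (entirely my own sketch) that counts how many distinct values survive repeated multiplication modulo 2^8:

import java.util.HashSet;
import java.util.Set;

public class MultiplierDiversity {
    public static void main(String[] args) {
        System.out.println("x2: " + distinctAfter(2)); // collapses to 1 class
        System.out.println("x3: " + distinctAfter(3)); // keeps all 256 classes
    }

    // Applies v -> (v * multiplier) mod 256 eight times to every byte value
    // and reports how many distinct results remain.
    static int distinctAfter(int multiplier) {
        Set<Integer> survivors = new HashSet<>();
        for (int x = 0; x < 256; x++) {
            int v = x;
            for (int i = 0; i < 8; i++) {
                v = (v * multiplier) & 0xFF; // overflow == mod 2^8
            }
            survivors.add(v);
        }
        return survivors.size();
    }
}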
Non-math simple version of why...
Prime numbers are used for hashing to keep diversity.
Perhaps diversity is more important because of the Set and Map implementations. These implementations use the last bits of an object's hash number to index their internal arrays of entries.
For example, a HashMap whose internal table (array) of entries has size 8 will use the last 3 bits of the hash number to address a table entry:
static int indexFor(int h, int length) {
    return h & (length - 1);
}
Integer's actual hashCode is not defined this way, but if it were
hash = 4 * number;
most of the table elements would be empty, while some would contain too many entries. This would lead to extra iterations and comparison operations when searching for a particular entry.
I guess the main concern of Joshua Bloch was to distribute hash integers as evenly as possible, to optimize the performance of collections by spreading objects evenly across Maps and Sets. Prime numbers intuitively seem to be a good factor for such distribution.
Prime numbers aren't strictly necessary to ensure diversity; what's necessary is that the factor be relatively prime to the modulus.
Since the modulus for binary arithmetic is always a power of two, any odd number is relatively prime to it and would suffice. If you were to take a modulus other than by overflow, though, a prime number would continue to ensure diversity (assuming you didn't choose the same prime...).