Making the range of Perlin noise smaller - Java

I have a Perlin noise function, and I want to use it to pick biomes for a map for my game. The problem is that biomes are determined by two factors - average precipitation and average temperature. So, I thought, I'd just make two Perlin noise functions and overlap them.
The issue now is that biomes do not encompass all possible precipitation temperature combinations. For example, there is no biome with high precipitation and low temperature, as shown in this picture.
(biome diagram image omitted; source: wikimedia.org)
How can I still use Perlin noise but never reach the areas that aren't covered by biomes?

You can clamp the values into the allowed range (e.g. the maximum allowed precipitation in an area with a temperature of 0 °C is 100 cm).
You can do this during the noise algorithm itself, not only after the entire value field is finished. I would imagine it would work like this:
First generate the temperature map.
When you are generating each random value for the rainfall noise, generate a value in a range scaled down appropriately to the range of values allowed by the temperature map.
Example:
If you would normally generate a random value in the range 0-250 mm (about 50% of the maximum possible rainfall, for one of the low-frequency noise layers), look at the temperature of that pixel; if it's 10 °C, the range of the random value is scaled down to 0-100 mm (50% of the 0-200 mm allowed by that temperature).
Therefore, even if you roll the maximum random value for each of the layers, the composite value will stay within the maximum dictated by the temperature.
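A rough sketch of that per-layer scaling (my own illustration, not actual Perlin noise: plain random values stand in for the noise layers, and maxRainfallAt() is a hypothetical lookup from temperature to the largest rainfall the biome chart allows at that temperature):
// Each layer's contribution is scaled by the rainfall cap for this pixel's
// temperature, so the summed value can never exceed that cap.
double rainfallAt(int x, int y, double[][] temperatureMap, java.util.Random rng) {
    double allowed = maxRainfallAt(temperatureMap[x][y]); // e.g. 200 mm at 10 °C
    double total = 0.0;
    double layerShare = 0.5;          // the low-frequency layer contributes up to 50%
    for (int layer = 0; layer < 4; layer++) {
        total += rng.nextDouble() * layerShare * allowed;
        layerShare *= 0.5;            // each finer layer contributes half as much
    }
    return total;                     // 0.5 + 0.25 + 0.125 + 0.0625 < 1, so total < allowed
}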
I don't know how realistic this is and how important realism is for you. What precisely prevents low temperature areas from having great rainfall? The solution I proposed might actually simulate factors like reduced evaporation in low temperature areas quite well.
EDIT:
One more idea, which might end up being equivalent to my first solution:
Generate both the temperature map and rainfall map independently.
Multiply the rainfall map by the temperature map (scaled to the range [0, 1]). This will reduce rainfall in areas with low temperature.
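A minimal sketch of this second idea, assuming both maps have already been generated and normalized to [0, 1]:
// Damp the independently generated rainfall by the temperature at each pixel;
// cold pixels (temperature near 0) end up with little rainfall, so the
// cold-and-wet corner of the biome chart is never reached.
static void dampRainfallByTemperature(double[][] rainfall, double[][] temperature) {
    for (int x = 0; x < rainfall.length; x++) {
        for (int y = 0; y < rainfall[x].length; y++) {
            rainfall[x][y] *= temperature[x][y];
        }
    }
}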

Related

How do we get the parameter of about 0.5 for hash map buckets?

The official Java documentation says "Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75". I want to know how we arrive at the parameter of about 0.5, and how to prove it mathematically.
Here's a discussion of where the 0.5 probably came from.
The context is in the Java HashMap implementation, regarding the point at which a bucket should be converted ("treeified") from a linked list to a red-black tree. This should occur extremely rarely under typical operation. What's a good value to choose that gives "extremely rarely"? An implementation comment from the HashMap implementation bases its justification on a Poisson distribution:
Ideally, under random hashCodes, the frequency of
nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average for the default resizing
threshold of 0.75, although with a large variance because of
resizing granularity. Ignoring variance, the expected
occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
factorial(k)).
https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/java/util/HashMap.java#L181
It concludes that a bucket with 8 or more collisions occurs with a probability of less than one in ten million, so 8 was chosen as the "treeify threshold".
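For reference, here is a small sketch of my own (not JDK code) that reproduces the comment's numbers; you can also plug in other parameters to see how sensitive the threshold is:
// Poisson probability of a bucket holding exactly k nodes:
// P(k) = exp(-lambda) * lambda^k / k!
static double poisson(double lambda, int k) {
    double kFactorial = 1.0;
    for (int i = 2; i <= k; i++) {
        kFactorial *= i;
    }
    return Math.exp(-lambda) * Math.pow(lambda, k) / kFactorial;
}
// poisson(0.5, 0) ≈ 0.607, poisson(0.5, 1) ≈ 0.303, ...
// poisson(0.5, 8) ≈ 0.00000006, the "less than one in ten million" figure.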
Where did the parameter of 0.5 come from? In this case, the parameter means the average load (or fullness) of a HashMap, that is, the number of mappings divided by the "capacity" (table length, or number of buckets). The question can thus be restated as: What is the average load of a HashMap? This can't be derived mathematically, because it depends on the way a program uses a HashMap, the program's input, and so forth. But we can make some reasonable assumptions and arrive at some conclusions which probably apply to typical cases.
First, assume most HashMaps use a load factor of 0.75. Anecdotally I believe this to be true, as I've rarely seen cases where code uses an explicit load factor.
Second, assume that the number of mappings in a HashMap is uniformly distributed. I'm almost positive this is not true, but let's start with this as a working assumption and revisit it later.
Third, let's set aside cases where a HashMap contains over a billion or so elements, where things like Java's array size limitations come into play.
Fourth, for simplicity, let's consider only HashMaps that are created without a preallocated capacity and are populated with some unknown number of mappings.
Recall the definition of load factor: if a HashMap's load exceeds the load factor, the HashMap will be resized larger to keep the load below the load factor.
Clearly, a HashMap in normal operation can't have a load of more than 0.75. A HashMap also won't have a load of less than 0.375, because it won't be resized until its load reaches 0.75, and resizing halves the load. So maybe the average load is midway between 0.375 and 0.75, which is 0.5625.
One could do some math to figure this out, but it was easier for me to write a program to populate HashMaps of sizes between 1 and N and compute the average load over all those cases. If N is about 0.75 of a power of two (say, 768) then the average load is indeed quite close to 0.5625. However, this varies a lot depending on one's choice of N. If a chosen N is midway between 0.75 of successive powers of two (say, 576) then the average load is only 0.507. So the average load of HashMaps with up to N mappings varies between about 0.507 and 0.5625, depending on one's choice of N. (This is the "resizing granularity" variance mentioned in the comment.)
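A sketch of that kind of experiment (my own reconstruction, not the original program; it models the doubling rule directly rather than inspecting real HashMap instances, so the exact figures may differ slightly):
// Average load over HashMaps holding 1..maxMappings entries, assuming the
// default initial capacity of 16 and load factor 0.75: capacity doubles once
// the number of mappings exceeds capacity * 0.75.
static double averageLoad(int maxMappings) {
    double totalLoad = 0.0;
    for (int n = 1; n <= maxMappings; n++) {
        int capacity = 16;
        while (n > capacity * 0.75) {
            capacity *= 2;
        }
        totalLoad += (double) n / capacity;   // load after inserting n mappings
    }
    return totalLoad / maxMappings;
}
// averageLoad(768) comes out a little above 0.56, and the value dips for N
// chosen between 0.75 of successive powers of two: the "resizing granularity"
// effect described above.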
Recall that we assumed that HashMap sizes are uniformly distributed. It's probably not true, but what is the actual distribution? I don't know, but I'd guess that there is exponential falloff as the sizes get larger. If so, this would skew the distribution of sizes towards smaller numbers, reducing the average load to be closer to 0.507 than 0.5625. I'd also guess that a lot of people create HashMaps with the default capacity (16) and populate them with six or fewer mappings. That would pull the average down farther.
In summary, our model is telling us that the average load is somewhere between 0.507 and 0.5625. Some educated guesses tell us that sizes are skewed smaller, so the average load is also smaller. The choice of 0.5 therefore seems reasonable.
But we really don't need a precise answer for the analysis of when to treeify HashMap buckets. The point of the analysis is to find a threshold such that conversion is extremely unlikely to occur during normal operation. If 0.4 or 0.7 were used instead of 0.5, the Poisson analysis would still put the chance of 8 collisions in a single bucket at well under one in a million, so the same threshold would be chosen.

How to calculate percentage format prediction confidence of face recognition using opencv?

I am doing a two-face comparison using the OpenCV FaceRecognizer of LBP type. My question is how to calculate the prediction confidence as a percentage. Given the following code (JavaCV):
int n[] = new int[1];
double p[] = new double[1];
personRecognizer.predict(mat, n, p);
double confidence = p[0];
But the confidence is a double value; how should I convert it into a percentage probability?
Is there an existing formula?
Sorry if I didn't state my question in a clear way. Ok, here is the scenario:
I want to compare two face images and get the likeness of the two faces. For example, input John's pic and his classmate Tom's pic, and let's say the likeness is 30%; then input John's pic and his brother Jack's pic, and the likeness comes out at 80%.
These two likeness values show that Jack looks more like his brother John than Tom does... so the likeness factor in percentage format is what I want; the higher the value, the more alike the two input faces are.
Currently I do this by computing the confidence value of the input using the OpenCV function FaceRecognizer.predict, but the confidence value actually represents the distance between the inputs in their feature vector space. So how can I scale the distance (confidence) into a likeness percentage?
You are digging quite deep with your question. Well, according to the OpenCV documentation:
predict()
Predicts a label and associated confidence (e.g. distance) for a given input image
I am not sure what you are looking for here, but the question is not easy to answer. Intra-person face variation (variation of the same person) can be vast, while inter-person face variation (faces of different people) can be more compact (e.g. when both faces are frontal while the second image of the same person is a profile view), so this is a whole research topic in itself.
Probably you should have a ground truth (i.e. some faces whose labels are already known) and derive from this set the percentage you want by associating the distances with the labels. Though this is also often inaccurate, as distance does not necessarily coincide with your perception of similarity (as mentioned before, faces of the same person can vary a lot).
Edit:
First of all, there is no universal human perception of face similarity. On the other hand, most people would recognize a face that belongs to the same person across various poses and postures. The word most here is important. As you push the limits, human perception starts to diverge, e.g. when asked to recognize a face over the years as the time span becomes quite large (child vs. adolescent vs. old person).
Are you asking to compute the similarity of noses/eyes, etc.? If so, I think the best way is to find a set of noses/eyes belonging to the same persons, train over this, and then check your performance on a different set from different persons.
The usual approach, as far as I know, is to train and test using pairs of images comprising positive and negative samples. A positive sample is a pair of images belonging to the same person, while a negative one is an image pair belonging to two different people.
I am not sure what you are asking exactly so maybe you can check out this link.
Hope it helped.
Edit 2:
Well, since you want to convert the distance you are getting into a similarity expressed as a percentage, you can invert the distance in some way. There are some problems arising here, though:
There is a value for an absolute match, dis = 0, or equivalently sim = 100%, but there is no explicit value for a total mismatch: dis = infinity would give sim = 0%. The similarity scale, on the other hand, has explicit boundaries of 0% - 100%.
Since the extreme values include 0 and infinity, there must be a smarter conversion than simple inversion.
You can easily assign 1.0 (or 100% similarity) to an absolute match, but what you take as a total mismatch is not clear. You can treat an arbitrarily high distance as 0.0 (since there is no big difference between, e.g., a distance of 10000 and one of 11000), and consider all distance values higher than that to be 0.0 as well.
To find what that value should be, I would suggest comparing two quite distinct images and using the distance between them as the 0.0 point.
Let's suppose that this value is disMax = 250.0; and simMax = 100.0;
then a simple approach could be:
double sim = simMax - simMax/disMax*dis;
which gives a 100.0 similarity for 0 distance and 0.0 for 250 distance. Values larger than 250 would give negative similarity values which should be considered 0.0.
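Putting it together, a hedged sketch (disMax = 250.0 is the arbitrary "total mismatch" distance; calibrate it against your own recognizer and training data):
// Distance 0 maps to 100% similarity, distances of disMax or more map to 0%,
// linear in between.
static double similarityPercent(double dis) {
    final double disMax = 250.0;
    final double simMax = 100.0;
    double sim = simMax - simMax / disMax * dis;
    return Math.max(0.0, sim);   // clamp negative values (dis > disMax) to 0%
}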

What are some ways to store and recover numbers in this situation?

So I'm going to be running a simulator that plays craps.
My assignment requires me to run the sim 10,000,000 times.
None of that is an issue; I have the sim made, I know how to run it, and I know how to create the required variables.
What I'm unsure of is how I should go about storing the results of each game.
What I need to find in the end is:
Average # of Rolls Per Game
Max # of Rolls in a game
number of games that needed more than 30 rolls
number of wins
number of losses
probability of a win
longest sequence of wins and longest sequence of losses
All easy enough, I'm just not sure how to store 10,000,000 numbers and then access them easily.
For example the first:
Average number of rolls
Should I create an ArrayList that has 10,000,000 items in it, add one item at the end of each game, and then add them all up and divide by 10,000,000?
I realize this should work, I'm just wondering if there is another way, or perhaps a better (more efficient) way.
New part to this question:
Can I return more than one value from a method? Currently the simulation runs 10,000,000 times and returns a win or loss from each time. But I also need it to return the number of rolls from each game... Otherwise I can't figure out the values for avg rolls and highest number of rolls and number of games over 30 rolls.
Any ideas here?
You don't need to maintain array for any of the statistics you want.
For the average number of rolls per game, just keep a variable, say cumulativeNumberOfRolls; after every game, take the number of rolls in that game and add it to this variable. When all simulations are done, just divide this value by the total number of simulations (10,000,000).
For the maximum number of rolls, again keep a single variable, say maxRolls; after every game, compare the number of rolls in that game with this variable. If the number of rolls in this game is greater, just update maxRolls with the new value. Use the same approach - a single variable updated after every game - for the number of games that required more than 30 rolls, the number of wins, and the number of losses. If you face problems, we can discuss them in comments.
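An illustrative sketch of those counters (GameResult and playOneGame() are stand-ins for your simulation; returning a small result object like this is also one way to answer your "return more than one value" follow-up):
// Holds both results of a single game so one method call can return them.
class GameResult {
    final int rolls;
    final boolean win;
    GameResult(int rolls, boolean win) { this.rolls = rolls; this.win = win; }
}

long cumulativeNumberOfRolls = 0;
int maxRolls = 0, gamesOverThirtyRolls = 0, wins = 0, losses = 0;

for (int game = 0; game < 10_000_000; game++) {
    GameResult result = playOneGame();            // your craps simulation
    cumulativeNumberOfRolls += result.rolls;
    maxRolls = Math.max(maxRolls, result.rolls);
    if (result.rolls > 30) gamesOverThirtyRolls++;
    if (result.win) wins++; else losses++;
}
double averageRolls = (double) cumulativeNumberOfRolls / 10_000_000;
double winProbability = (double) wins / 10_000_000;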
For longest sequence of wins and losses, you would need to maintain a bunch of variables:
longest win sequence overall
longest loss sequence overall
current sequence count
current sequence type (indicates if current sequence is a win sequence or loss sequence)
Here's the overview of the approach.
After every game, compare the result of the game with the current sequence type. If they are the same, for instance the result of the current game is a win and the current sequence type is also a win, then just update the current sequence count and move on to the next game. If they are different, you need to consider two scenarios and do slightly different things for them. I'll explain one: the result of the current game is a loss and the current sequence type is win. In this scenario, compare the current sequence count with the longest win sequence overall, and if it (the current sequence count) is greater, update the longest win sequence overall. After this, change the current sequence type to loss and set the current sequence count to 1.
Extend the above approach for the second scenario - the result of the current game is win and the current sequence type is loss. If you have clarifications, feel free to post back in comments.
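A compact way to express that bookkeeping (my own sketch; call record() once per game, then finishCurrentStreak() once after the final game so the last sequence is counted):
class StreakTracker {
    int longestWinStreak = 0;
    int longestLossStreak = 0;
    private int currentStreak = 0;
    private Boolean currentIsWin = null;   // null until the first game

    void record(boolean win) {
        if (currentIsWin != null && win == currentIsWin) {
            currentStreak++;               // same result as last game: extend the sequence
        } else {
            finishCurrentStreak();         // different result: close the old sequence
            currentIsWin = win;
            currentStreak = 1;
        }
    }

    void finishCurrentStreak() {
        if (currentIsWin == null) return;  // no games recorded yet
        if (currentIsWin) longestWinStreak = Math.max(longestWinStreak, currentStreak);
        else longestLossStreak = Math.max(longestLossStreak, currentStreak);
    }
}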
You could just calculate the statistics as you go without storing them. For instance, if you have an "average" field in your class, then after each game set average = (total rolls so far, including this game) / (number of games played so far). The same could be done for the other statistics.
Well, you've got a fixed number of runs, so you might as well use an array rather than an ArrayList (faster). It seems to me that you actually only need two arrays in total: one listing the outcome of each game (maybe true/false for win/lose), and one with the number of rolls in that game. You fill these up as you run the simulations; then you get to do a bunch of simple math involving one array or the other to get your stats. That seems like the best way to go about it to me; I don't think you're going to get much more efficient without a lot of undue effort.

What does Lucene's ScoreDoc.score mean?

I am performing a boolean query with multiple terms. I only want to process results with a score above a particular threshold. My problem is, I don't understand how this value is calculated. I understand that high numbers mean it's a good match and low numbers mean it's a bad match, but there doesn't seem to be any upper bound?
Is it possible to normalize the scores over the range [0,1]?
Here is a page describing how scores are calculated in Lucene:
http://lucene.apache.org/java/3_0_0/scoring.html
The short answer is that the absolute value of each document's score doesn't really mean anything outside the context of a given search result set. In other words, there isn't really a good way of translating the scores into a human definition of relevance, even if you do normalize them.
That being said, you can easily normalize the scores by dividing each hit's score by the maximum score. So if the first hit's score is 2.5, divide every hit's score by 2.5 and you'll get a number between 0 and 1.
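For example, a sketch assuming an existing IndexSearcher searcher and Query query (hits come back sorted by score, highest first):
// Normalize each hit's score by the top score so values fall in (0, 1].
TopDocs topDocs = searcher.search(query, 100);
ScoreDoc[] hits = topDocs.scoreDocs;
if (hits.length > 0) {
    float maxScore = hits[0].score;           // best match in this result set
    for (ScoreDoc hit : hits) {
        float normalized = hit.score / maxScore;
        // e.g. keep only hits with normalized >= 0.5f, relative to the best hit
    }
}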

calculate average based on 1 number only

How do you calculate a percentage (or average) when you have the dividend but not the divisor?
You have a lot of values, and some of them figure into your average - or percentage - and some of them probably don't. You are not expressing the problem clearly enough for anyone to be able to give you a meaningful answer.
A percentage represents a fraction, one value divided by another (multiplied by 100 to express it in percentage, but that's trivial and not part of the problem). What is the value that represents 100%? And what value are you trying to assign? In what way do you think that the quantity of bonuses should affect the percentage?
Some possible answers:
The total bonus earned by an individual, as compared to her nominal salary. If she earns $50k and her bonus is $20K, that is 20/50 *100 = 40%.
The total bonus earned by an individual, as compared to all the bonuses given out that year. If she received the same $20K, but the company gave out $100K in bonuses, then the percentage is 20/100 * 100 = 20%.
The most recent bonus earned by an individual, as compared to all bonuses awarded to her this year. If she got $5K for her last bonus, and the total was $20K, that's 5/20 * 100 = 25%.
We really don't have enough information to go on; it could be any of these, or something entirely different. It is entirely possible to have a percentage value greater than 100%.
The average of one value is that value (Total number=1).
But this probably means I don't understand your question.
Without knowing the number of years, you need to know something else about the range of bonuses possible, i.e. does it have to be a whole number between 15% and 25%? However, this is largely guessing.
To get an average, you need a total and a count. BTW: In your case you want the geometric average, but you need to know the same things.
If your input is a list of numbers, showing percentage values means you need to compute the total and then see how much of the total each of them is:
For instance, if you have 110, 110, 110, you'll have a total of 330 and each of the values will be shown as 110/330 = 0.33 = 33% of the total.
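In code, that calculation looks like this:
// Show each value as a percentage of the list's total.
double[] values = {110, 110, 110};
double total = 0;
for (double v : values) {
    total += v;                               // total = 330
}
for (double v : values) {
    System.out.printf("%.0f -> %.0f%%%n", v, v / total * 100);   // each prints 33%
}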
In addition, if I have three decimal values 120, 4420, and 230, how can I get a number less than 1 that represents the average of these 3 values?
You cannot. The average of those 3 numbers will be (120 + 4420 + 230) / 3. That will never be less than one. Maybe you are confused about what average means?
You need to be more specific or give an example. But I will give an answer based off of what I THINK you mean.
You cannot find the average of one lone number. If you were recording a temperature of 125 degrees every hour, you could do it, and the answer would obviously be 125. That is the closest thing I can think of to what you are asking. You need to be more specific or the problem cannot be done. Otherwise, use the simple formula: sum / number of values, also known as the mean. That would be 125/1, which is still 125.
