Random number generation within a range with different distributions in Java

I want to generate random numbers in different ranges, for example up to 10^14, in Java, with different distributions such as log, normal, binomial, etc. Is there a particular library for this? I found discussions of the Colt and Uncommons Maths libraries. But is it safe to generate values as int and then multiply them by the corresponding range factor? What is the best practice here?

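Apache Commons Math has a RandomDataImpl class that provides nextBinomial, nextExponential and several other distributions (some of them above my head, unfortunately). In Commons Math 3.x, RandomDataImpl is deprecated in favor of RandomDataGenerator.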
Hopefully that gets you everything you need. You might need to check some of the other classes in the library.
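On the int-times-factor question: an int has at most 2^32 distinct values, so scaling one up to a range of 10^14 leaves most values unreachable (only multiples of the factor can appear). Below is a minimal sketch using only the JDK rather than Commons Math, assuming a java.util.Random source is acceptable. The class and method names are hypothetical helpers, and the binomial sampler is the simple sum-of-trials method, not an optimized algorithm:

```java
import java.util.Random;

public class RangedRandom {
    static final long RANGE = 100_000_000_000_000L; // 10^14

    // Uniform long in [0, bound), bound > 0. Rejection sampling makes every value
    // reachable and avoids modulo bias.
    static long uniformLong(Random rng, long bound) {
        if (bound <= 0) throw new IllegalArgumentException("bound must be positive");
        long bits, val;
        do {
            bits = rng.nextLong() >>> 1;            // uniform in [0, 2^63)
            val = bits % bound;
        } while (bits - val + (bound - 1) < 0);     // overflow: bits fell in the incomplete last block
        return val;
    }

    // Normal distribution with the given mean and standard deviation.
    static double normal(Random rng, double mean, double sd) {
        return mean + sd * rng.nextGaussian();
    }

    // Log-normal: exp of a normal variate with parameters mu and sigma.
    static double logNormal(Random rng, double mu, double sigma) {
        return Math.exp(normal(rng, mu, sigma));
    }

    // Exponential with the given rate, by inverse transform sampling.
    static double exponential(Random rng, double rate) {
        return -Math.log(1.0 - rng.nextDouble()) / rate; // 1 - nextDouble() is in (0, 1]
    }

    // Binomial(n, p) as the number of successes in n Bernoulli trials; O(n) per sample.
    static int binomial(Random rng, int n, double p) {
        int successes = 0;
        for (int i = 0; i < n; i++) {
            if (rng.nextDouble() < p) successes++;
        }
        return successes;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println("uniform in [0, 10^14): " + uniformLong(rng, RANGE));
        System.out.println("normal(0, 1):          " + normal(rng, 0.0, 1.0));
        System.out.println("log-normal(0, 1):      " + logNormal(rng, 0.0, 1.0));
        System.out.println("exponential(rate=2):   " + exponential(rng, 2.0));
        System.out.println("binomial(20, 0.5):     " + binomial(rng, 20, 0.5));
    }
}
```

For a shifted range [lo, hi), add lo to uniformLong(rng, hi - lo). On Java 17 and later, Random implements RandomGenerator, whose nextLong(bound) can replace the hand-written loop.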

Related

Can BigIntegers be used in Java to represent bitboards?

I recently started working on my school project, which is writing a Chinese chess game with a computer player in Java. I want to represent the board with bitboards; however, since the board is 9x10, a 64-bit long or a double isn't large enough to represent it. I thought about using the BigInteger class from java.math, but I'm afraid it isn't efficient and that I will run into problems when writing the code for the computer player. Does anyone know how efficient the BigInteger class is? Will I run into problems with it when trying to calculate the best computer moves?
Thanks.
Either the Java SE BitSet or BigInteger class could be used to represent a bitboard. I also noticed that there are alternatives to the standard Java SE implementations (see footnote 1).
But the real question is whether you could come up with an implementation of the bitboard abstraction that is more efficient than those general-purpose data structures.
For example, if your bitboard requires 90 bits, you could represent it as a long array of length 2 or an int array of length 3. This should be at least as fast as the better of BitSet or BigInteger, because both of those Java SE classes use arrays under the hood (a long[] in BitSet, an int[] in BigInteger).
1 - A Google search is advised ...
My advice: pick whatever representation is easiest to use. Get the interesting parts of your game working first. Then test it to see how fast it is. If it is not fast enough, put some effort into profiling and optimizing it, e.g. by tuning the bitboard implementation. Don't optimize too early.
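A minimal sketch of the long[2] representation for a 90-square board, assuming a rank-major square numbering. The class Bitboard90 and its methods are hypothetical, not from any library. Every operation touches at most two words, and Long.bitCount gives population counts cheaply:

```java
public final class Bitboard90 {
    static final int FILES = 9, RANKS = 10, SIZE = FILES * RANKS; // 90 squares

    // Bits 0..63 live in words[0], bits 64..89 in words[1].
    private final long[] words = new long[2];

    // Hypothetical square numbering: rank-major, file 0..8, rank 0..9.
    static int square(int file, int rank) {
        return rank * FILES + file;
    }

    private static void checkSquare(int sq) {
        if (sq < 0 || sq >= SIZE) throw new IndexOutOfBoundsException("square " + sq);
    }

    void set(int sq)    { checkSquare(sq); words[sq >>> 6] |= 1L << (sq & 63); }
    void clear(int sq)  { checkSquare(sq); words[sq >>> 6] &= ~(1L << (sq & 63)); }
    boolean get(int sq) { checkSquare(sq); return (words[sq >>> 6] & (1L << (sq & 63))) != 0; }

    // Set intersection, e.g. "squares attacked AND occupied by the opponent".
    Bitboard90 and(Bitboard90 other) {
        Bitboard90 r = new Bitboard90();
        r.words[0] = words[0] & other.words[0];
        r.words[1] = words[1] & other.words[1];
        return r;
    }

    // Set union.
    Bitboard90 or(Bitboard90 other) {
        Bitboard90 r = new Bitboard90();
        r.words[0] = words[0] | other.words[0];
        r.words[1] = words[1] | other.words[1];
        return r;
    }

    // Number of set squares.
    int cardinality() {
        return Long.bitCount(words[0]) + Long.bitCount(words[1]);
    }

    public static void main(String[] args) {
        Bitboard90 a = new Bitboard90();
        a.set(square(0, 0));
        a.set(square(8, 9));
        Bitboard90 b = new Bitboard90();
        b.set(square(8, 9));
        System.out.println(a.and(b).cardinality()); // 1
    }
}
```

Whether this beats BitSet in practice is something to confirm by profiling, in line with the advice above.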

Using int flags in lieu of booleans

So, for example, Notification has the following flag:
public static final int FLAG_AUTO_CANCEL = 0x00000010;
This is hexadecimal for the number 16. There are other flags with values:
0x00000020
0x00000040
0x00000080
Each value is the next power of 2. Converting these to binary, we get:
00010000
00100000
01000000
10000000
Hence, we can use bitwise operators to determine which of the flags are present, since each flag has exactly one bit set and each in a different position.
Question:
This all makes perfect sense, but why not just use booleans? Is this merely stylistic, or are there memory or efficiency benefits?
EDIT:
I understand that by combining them, we can store a lot of information in a single int. Is this used solely so we can pass a lot of boolean type values in a single int instead of having to pass a ton of parameters? I don't mean to trivialize that, it's very convenient, but are there any other benefits?
What you're talking about is called a bit field. One advantage is that all the information can be contained in a single variable (with no overhead like that of an ArrayList). This keeps function signatures tidy and may give some minor efficiency benefits because fewer values are passed on the stack, though that is probably offset by the extra bitwise masking operations. Additionally, you can use (for example) one byte to store 8 fields rather than wasting 7 additional bytes. And if you're clever with it, you can perform several flag checks in a single operation.
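To illustrate the last point, here is a minimal sketch of setting, clearing, toggling, and testing flags with bitwise operators. Only FLAG_AUTO_CANCEL's value comes from the question; FLAG_OTHER_A and FLAG_OTHER_B are hypothetical names for the 0x20 and 0x40 values listed there:

```java
public class BitFlags {
    // FLAG_AUTO_CANCEL's value is the one quoted in the question; the other two names are hypothetical.
    static final int FLAG_AUTO_CANCEL = 0x00000010;
    static final int FLAG_OTHER_A     = 0x00000020;
    static final int FLAG_OTHER_B     = 0x00000040;

    // True if every bit in mask is set in flags: several checks in one AND.
    static boolean hasAll(int flags, int mask) { return (flags & mask) == mask; }

    // True if at least one bit in mask is set in flags.
    static boolean hasAny(int flags, int mask) { return (flags & mask) != 0; }

    public static void main(String[] args) {
        int flags = FLAG_AUTO_CANCEL | FLAG_OTHER_B;                         // set two flags: 0x50
        System.out.println(hasAll(flags, FLAG_AUTO_CANCEL | FLAG_OTHER_B)); // true
        System.out.println(hasAny(flags, FLAG_OTHER_A));                    // false
        flags &= ~FLAG_OTHER_B;                                              // clear one flag
        flags ^= FLAG_OTHER_A;                                               // toggle another
        System.out.printf("0x%08X%n", flags);                                // 0x00000030
    }
}
```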
Having said that, personal preference may favor a list of booleans as cleaner. Bit fields are most common in embedded systems and other environments where space is limited.
In reference to your edit: the flag values are stored in ints, but those are just reference constants. You aren't editing those; you're putting those bits into (or taking them out of) the flags field, which is a single int. I don't really know why they chose a bit field for this application; perhaps someone who grew up programming space-limited microcontrollers coded that specific class. The general consensus seems to be that bit fields shouldn't be used in new code.
This is a common idiom in C, where resource constraints are a much larger concern, and you usually see it in Java where a Java API maps directly onto a well-known underlying C API. However, it's not a great idea in Java, for a number of reasons.
As of Java 5, most of the uses for one-bit bit fields are taken care of very nicely by EnumSet, which is internally implemented using a bit field (so it's extremely fast) but is type-safe, easy to read, and Iterable.
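For comparison, here is a sketch of the same operations with EnumSet, using a hypothetical Flag enum in place of the int constants. Each int operation has a direct set-method counterpart:

```java
import java.util.EnumSet;

public class EnumSetFlags {
    // Hypothetical enum replacing the int constants; one constant per flag.
    enum Flag { AUTO_CANCEL, OTHER_A, OTHER_B }

    public static void main(String[] args) {
        EnumSet<Flag> flags = EnumSet.of(Flag.AUTO_CANCEL, Flag.OTHER_B); // like AUTO_CANCEL | OTHER_B

        // Several checks at once, like (flags & mask) == mask.
        System.out.println(flags.containsAll(EnumSet.of(Flag.AUTO_CANCEL, Flag.OTHER_B))); // true

        flags.remove(Flag.OTHER_B); // like flags &= ~OTHER_B
        flags.add(Flag.OTHER_A);    // like flags |= OTHER_A

        // Iterable and readable; iteration follows the enum's declaration order.
        System.out.println(flags);  // [AUTO_CANCEL, OTHER_A]
    }
}
```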

What Java libraries provide a facility to generate unique random string combinations from a given set of characters?

Say I have this set of characters: [a-zA-Z0-9]
And I need to generate 4-character strings from this set that are less likely to collide.
Apache Commons Lang has a RandomStringUtils class with a method that takes a sequence of characters and a count, and does what you ask. It makes no guarantee of collision avoidance, though, and with only 4 characters, you're going to struggle to achieve that.
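If you would rather not add a dependency, the same idea (uniform picks from a character set, with no collision avoidance) can be sketched with the JDK alone. The class and method names here are hypothetical; with Commons Lang the equivalent call would be RandomStringUtils.random(4, chars):

```java
import java.security.SecureRandom;

public class RandomStrings {
    // The character set from the question: [a-zA-Z0-9], 62 characters.
    static final String ALPHANUMERIC =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private static final SecureRandom RNG = new SecureRandom();

    // Picks `length` characters uniformly at random from `alphabet`.
    // Like RandomStringUtils, this makes no uniqueness guarantee.
    static String randomString(int length, String alphabet) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(RNG.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomString(4, ALPHANUMERIC));
    }
}
```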
"And I need to generate 4-character strings from this set that are less likely to collide."
Less likely than what? There are 62^4 = 14,776,336 (about 14.8 million) such strings. Due to the birthday paradox, you get about a 50% chance of at least one collision if you randomly generate about 4,500 of them (and already close to 40% at 3,800). If that's not acceptable, no library will help you: you need to use a longer string or establish uniqueness explicitly (e.g. by incrementing an integer and formatting it in base 62).
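The counter-plus-base-62 approach can be sketched as follows. The digit order and the class name are assumptions; any fixed ordering of [0-9A-Za-z] works. The ids are unique but sequential, so they are predictable:

```java
import java.util.concurrent.atomic.AtomicLong;

public class Base62Ids {
    // Assumed digit order; any fixed permutation of [0-9A-Za-z] works.
    private static final String DIGITS =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    static final long CAPACITY = 62L * 62 * 62 * 62; // 14,776,336 four-character ids

    private final AtomicLong counter = new AtomicLong();

    // Formats n in base 62, left-padded with '0' to exactly `width` characters.
    static String encode(long n, int width) {
        if (n < 0) throw new IllegalArgumentException("negative: " + n);
        char[] out = new char[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = DIGITS.charAt((int) (n % 62));
            n /= 62;
        }
        if (n != 0) throw new IllegalArgumentException("does not fit in " + width + " characters");
        return new String(out);
    }

    // Unique until the counter reaches CAPACITY, after which encode() throws.
    String nextId() {
        return encode(counter.getAndIncrement(), 4);
    }

    public static void main(String[] args) {
        Base62Ids ids = new Base62Ids();
        System.out.println(ids.nextId());            // 0000
        System.out.println(ids.nextId());            // 0001
        System.out.println(encode(CAPACITY - 1, 4)); // zzzz
    }
}
```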
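If you'd be OK with a longer hash, you could certainly find MD5 libraries, and the JDK itself provides MD5 via java.security.MessageDigest. It's a common choice for this kind of task. A lot of websites have used it to generate password hashes, although it is no longer considered safe for that purpose.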
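A minimal sketch using the JDK's MessageDigest (the class and helper names are hypothetical). The 128-bit output makes accidental collisions very unlikely, but it does not guarantee uniqueness, and identical inputs always produce identical hashes:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Hex {
    // Returns the MD5 digest of the UTF-8 bytes of `input` as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```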

Algorithm / Library for measuring degree of equality of strings

Is there an algorithm that, given two strings, yields the degree of equality between them, applying metrics that can be provided externally? For example, the two strings "Plant code" and "PlantCode" could be 0.8 equal, "Plant code" and "Plant" could be 0.6 equal, and "Truck no" and "shipment details" could be 0.6 equal (using an externally provided synonym dictionary). The numbers are made up, but I hope they get the point across. Does such an algorithm exist? I'd prefer a library rather than having to implement it on my own. Any help would be greatly appreciated. Thanks.
Try the SimMetrics library. It provides a large number of similarity metrics.
Maybe the google-diff-match-patch library can help. It implements Myers's diff algorithm, which is generally considered the best general-purpose diff.
There's also the Levenshtein distance algorithm, for which example Java implementations are available. It doesn't let you plug in external metrics (such as a synonym dictionary), though.
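A sketch of Levenshtein distance turned into a 0-to-1 score. Normalizing by the longer string's length is an assumption (one common choice, not the only one), and the class name is hypothetical. Under this normalization "Plant code" vs "PlantCode" has distance 2 and scores about 0.8:

```java
public class StringSimilarity {
    // Dynamic-programming Levenshtein distance (insert, delete, substitute, each cost 1),
    // keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // 1.0 for identical strings, 0.0 when the distance equals the longer length.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));       // 3
        System.out.println(similarity("Plant code", "PlantCode")); // about 0.8
    }
}
```

Synonym handling would need a separate layer on top, e.g. normalizing known synonyms before comparing.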

Best way to test CRC logic?

How can I verify two CRC implementations will generate the same checksums?
I'm looking for an exhaustive implementation evaluating methodology specific to CRC.
You can separate the problem into edge cases and random samples.
Edge cases. There are two variables in the CRC input: the number of bytes and the value of each byte. So create arrays of length 0, 1, and MAX_BYTES, with byte values ranging from 0 to MAX_BYTE_VALUE. You'll most likely want to keep the edge-case suite in JUnit.
Random samples. Using the ranges above, run the CRC on randomly generated byte arrays in a loop. The longer you let the loop run, the more of the input space you cover. If you are low on computing power, consider deploying the test to EC2.
Create several unit tests with the same input that will compare the output of both implementations against each other.
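A minimal differential test along the lines of the edge-case and random-sample suites above, assuming the two implementations compute CRC-32. Here java.util.zip.CRC32 serves as the reference and a bit-at-a-time version stands in for the implementation under test; the class name, bitwiseCrc32 and MAX_BYTES = 1024 are hypothetical. In a real project these loops would sit inside JUnit test methods; plain static methods keep the sketch dependency-free:

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32;

public class CrcDifferentialTest {
    static final int MAX_BYTES = 1024; // assumed upper bound on input length

    // Implementation under test (hypothetical): reflected CRC-32, poly 0xEDB88320,
    // init 0xFFFFFFFF, final XOR 0xFFFFFFFF, the same parameters as java.util.zip.CRC32.
    static long bitwiseCrc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1)); // XOR poly only if low bit was 1
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // Reference implementation from the JDK.
    static long jdkCrc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    static void check(byte[] data) {
        long a = bitwiseCrc32(data), b = jdkCrc32(data);
        if (a != b) {
            throw new AssertionError("mismatch for length " + data.length
                    + ": " + Long.toHexString(a) + " vs " + Long.toHexString(b));
        }
    }

    // Lengths 0, 1 and MAX_BYTES, every byte value.
    static void runEdgeCases() {
        check(new byte[0]);
        for (int v = 0; v <= 255; v++) {
            check(new byte[] {(byte) v});
            byte[] full = new byte[MAX_BYTES];
            Arrays.fill(full, (byte) v);
            check(full);
        }
    }

    // Random lengths and contents; more iterations cover more of the input space.
    static void runRandom(long seed, int iterations) {
        Random rng = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            byte[] data = new byte[rng.nextInt(MAX_BYTES + 1)];
            rng.nextBytes(data);
            check(data);
        }
    }

    public static void main(String[] args) {
        runEdgeCases();
        runRandom(12345L, 10_000);
        System.out.println("all checks passed");
    }
}
```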
One nice property of CRCs is that for a given set of parameters (polynomial, reflection, initial state, etc.) you will get a constant value when you recompute the CRC over the original dataset + the original CRC. These constants are documented for common CRCs but you can just blindly generate them using two different random data sets and check that they are the same:
implementation 1: crc(rand_data_1 + crc(rand_data_1)) -> constant_1
implementation 2: crc(rand_data_2 + crc(rand_data_2)) -> constant_2
assert constant_1 == constant_2
You can use the same method within an implementation to get a warm fuzzy feeling about its correctness. If your implementation works with arbitrary polynomials, you can have the unit test exhaustively check every possible polynomial using this method without needing to know what the constants are.
This technique is powerful but it would also be wise to add an independent test that verifies the result based on known input for the pathological case where your CRC implementations both produce bad results that happen to get by the constant equivalence check.
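The constant-residue check and an independent known-answer test can be sketched against java.util.zip.CRC32, assuming CRC-32 (the variant that class implements) is the CRC in question; for another implementation, swap in its function. The byte order of the appended CRC matters: for a reflected CRC such as CRC-32 it goes least-significant byte first, otherwise no constant appears. The known answer used here, CRC-32("123456789") = 0xCBF43926, is the standard published check value:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32;

public class CrcResidueCheck {
    static long crc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    // CRC of (data followed by its own CRC). CRC-32 is a reflected CRC, so the
    // appended value goes least-significant byte first.
    static long residue(byte[] data) {
        long crc = crc32(data);
        byte[] withCrc = Arrays.copyOf(data, data.length + 4);
        for (int i = 0; i < 4; i++) {
            withCrc[data.length + i] = (byte) (crc >>> (8 * i));
        }
        return crc32(withCrc);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        byte[] data1 = new byte[100];
        byte[] data2 = new byte[37];
        rng.nextBytes(data1);
        rng.nextBytes(data2);

        long constant1 = residue(data1);
        long constant2 = residue(data2);
        System.out.printf("constant_1 = %08x, constant_2 = %08x%n", constant1, constant2);
        if (constant1 != constant2) throw new AssertionError("residue constants differ");

        // Independent known-answer test, guarding against both checks being wrong in the same way.
        long check = crc32("123456789".getBytes(StandardCharsets.US_ASCII));
        if (check != 0xCBF43926L) throw new AssertionError("unexpected check value: " + Long.toHexString(check));
        System.out.println("residue and known-answer checks passed");
    }
}
```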
First, if it is a standard CRC, you should be able to find published check values somewhere on the net (commonly given as the CRC of the ASCII string "123456789").
Second, you could generate a number of payloads, run each CRC implementation on them, and check that the CRC values match.
Write a unit test for each implementation that takes the same input and verifies it against the expected output.
