Strategy to create a 4-byte unique ID in Java

Our Java application has a 4-byte size restriction for holding a unique ID.
We are forced to implement a strategy to create unique IDs that are 4 bytes in size.
Does anyone know a strategy for creating them?

Yes: start with a random 32-bit integer and increment it.
Anything else becomes too demanding as you scale up (e.g. if you have already created 1 billion IDs and need to randomly generate a new one, you need a 1-billion-entry table to check it against... ouch!).
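A minimal sketch of the increment approach, assuming a single process hands out all IDs (persisting the last issued value across restarts is not shown). IncrementingIdGenerator is a hypothetical name; the random start comes from java.security.SecureRandom, and AtomicInteger keeps the increments thread-safe:

```java
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: random 32-bit starting point, then sequential IDs.
class IncrementingIdGenerator {
    private final AtomicInteger next;

    // Picks the random starting point once per generator instance.
    IncrementingIdGenerator() {
        this(new SecureRandom().nextInt());
    }

    // Explicit start, e.g. the last issued ID + 1 restored after a restart.
    IncrementingIdGenerator(int start) {
        this.next = new AtomicInteger(start);
    }

    // Thread-safe; int overflow wraps from Integer.MAX_VALUE to Integer.MIN_VALUE,
    // so IDs stay unique until all 2^32 values have been handed out.
    int nextId() {
        return next.getAndIncrement();
    }
}
```

Because int arithmetic wraps, a random start near Integer.MAX_VALUE simply continues at Integer.MIN_VALUE; no existence check is ever needed.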
But if it absolutely has to be random and unique, the two strategies you can take are:
1) Keep a big HashSet of every ID used so far and check whether each newly generated random ID is already in the set. If it is, discard it and try again.
2) Store all used random IDs in the database, and do a SELECT to see whether your newly generated random ID exists. If it does, discard it and try again.
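Strategy 1 could look roughly like this (a sketch; RandomUniqueIdGenerator is a hypothetical name). It relies on Set.add returning false when the value is already present:

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper for strategy 1: random ints, checked against every ID issued so far.
class RandomUniqueIdGenerator {
    private final Random random;
    private final Set<Integer> used = new HashSet<>();

    RandomUniqueIdGenerator(Random random) {
        this.random = random;
    }

    RandomUniqueIdGenerator() {
        this(new SecureRandom());
    }

    int nextId() {
        while (true) {
            int candidate = random.nextInt();
            // add() returns false if the value was already issued: discard and retry.
            if (used.add(candidate)) {
                return candidate;
            }
        }
    }
}
```

Each stored ID costs a boxed Integer plus hash-table overhead, which is exactly the memory growth warned about above.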
If the unique ID could be larger, you could use a GUID (also known as a UUID). GUIDs are 128 bits and are generated in such a way that you will practically never see two with the same value, anywhere, without needing to check.
For GUIDs/UUIDs in Java, see http://docs.oracle.com/javase/7/docs/api/java/util/UUID.html

I think an int (exactly 4 bytes) can meet your needs.

You can try something like this:
private static byte[] synhead = {(byte)0xAA,0x55,0x7E,0x0B};
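The byte array above is a fixed constant, so on its own it does not produce unique values; it does show that a 4-byte ID is the same size as an int. A sketch of converting between an int ID and a byte[4] with java.nio.ByteBuffer (IntBytes is a hypothetical name); for instance, the bytes AA 55 7E 0B correspond to the int 0xAA557E0B:

```java
import java.nio.ByteBuffer;

// Hypothetical helper: an int ID is exactly 4 bytes (big-endian, ByteBuffer's default order).
class IntBytes {
    static byte[] toBytes(int id) {
        return ByteBuffer.allocate(4).putInt(id).array();
    }

    static int fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }
}
```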

Related

Pattern databases: storing all permutations

I am looking for some advice on storing all possible permutations for the fringe pattern database.
The fifteen-tile problem has 16! possible permutations; however, storing the values for the fringe, i.e. tiles 0 (the blank tile), 3, 7, 11, 12, 13, 14 and 15, gives 16!/(16-8)! = 518,918,400 permutations.
I am looking to store all of these permutations in a data structure along with the value of the heuristic function (which is simply incremented at each iteration of the breadth-first search). So far I am doing this, but very slowly: it took me 5 minutes to store 60,000, which is time I don't have!
At the moment I have a structure that looks like this:
Value Pos0 Pos3 Pos7 Pos11 Pos12 Pos13 Pos14 Pos15
where I store the positions of the given tiles. I have to use these positions as the ID so that, when I am calculating the heuristic value, I can quickly find the given configuration and retrieve the value.
I am pretty unsure about this. The state of the puzzle is represented by an array, for example:
int[] goalState = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
My question is: what would be the best data structure to store these values, and the best way to retrieve them?
(This question was originally about storing them in a database, but now I want to store them in some form of local data structure, as retrieving from a database is slow.)
I can't quite grasp what special meaning 0, 3, 7, 11, 12, 13, 14, 15 have in your case. Is their position unchangeable? Is their position enough to identify the whole puzzle state?
Anyway, here is a general approach; you can narrow it down anytime:
As each position holds one of at most 16 values, I would use hexadecimal digits to represent your permutations. So the state {1,2,3,6,5,4,7,8,9,10,11,12,13,14,15,0} would become 0x123654789ABCDEF0 = 1312329218393956080. The biggest possible number would be 0xFEDCBA9876543210, which still fits in 64 bits: it can be stored in a long treated as unsigned (Java 8 added helpers such as Long.parseUnsignedLong and Long.toUnsignedString) or alternatively in a BigInteger (there are many examples; I would prefer this). Such a number would be unique for each permutation and could be used as a primary key, and if you have the whole state, retrieving it from the database would be pretty fast.
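import java.math.BigInteger;
// saving your permutation (no "0x" prefix: new BigInteger(String, 16) rejects it)
String state = "FEDCBA9876543210";
BigInteger permutationForDatabase = new BigInteger(state, 16);
// and then you can insert it into the database as a number
// reading your permutation
char searchedCharacter = 'a'; // let's say you look for tile 10; toString(16) produces lowercase digits
BigInteger permutation = ...; // here you read the number from the database
// pad back to 16 digits, because toString(16) drops a leading zero (tile 0 in position 0)
String hex = String.format("%16s", permutation.toString(16)).replace(' ', '0');
int tilePosition = hex.indexOf(searchedCharacter);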
There might be a more elegant/performant solution to get the tile position (maybe some bit operation magic).
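One possible form of that bit-operation approach, as a sketch: pack the 16 tiles into a long, one 4-bit nibble per position (position 0 in the most significant nibble, matching the hex strings above), and find a tile by shifting and masking instead of building strings. PackedState is a hypothetical name. The value 0xFEDCBA9876543210 is negative as a signed long, but all 64 bits are still data, so the code uses the unsigned shift >>>:

```java
// Hypothetical helper: a 16-tile state packed into one long, 4 bits per position.
class PackedState {
    // Assumes state.length == 16 and every entry is in 0..15.
    static long pack(int[] state) {
        long packed = 0L;
        for (int tile : state) {
            packed = (packed << 4) | (tile & 0xF); // first tile ends up in the top nibble
        }
        return packed;
    }

    // Position 0 is the most significant nibble (bits 60..63).
    static int tileAt(long packed, int position) {
        return (int) ((packed >>> (4 * (15 - position))) & 0xF);
    }

    // Scans the 16 nibbles; returns -1 if the tile is not present.
    static int positionOf(long packed, int tile) {
        for (int position = 0; position < 16; position++) {
            if (tileAt(packed, position) == tile) {
                return position;
            }
        }
        return -1;
    }
}
```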
Each number 0-15 is a 4-bit number. You must represent 7 such numbers, making a minimum requirement of 28 bits, which is well within the 31 signed bit space of an int. Thus all permutations may be assigned, and derived from, an int.
To calculate this number, given variables a through g:
int key = a | (b << 4) | (c << 8) | (d << 12) | (e << 16) | (f << 20) | (g << 24);
To decode (if you need to):
int a = key & 0xF;
int b = (key >> 4) & 0xF;
int c = (key >> 8) & 0xF; // etc.
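The fringe in the question actually has eight tiles (0, 3, 7, 11, 12, 13, 14, 15), i.e. 8 * 4 = 32 bits. That still fits in an int if the sign bit is treated as an ordinary data bit. A sketch generalising the encode/decode lines above to an array of positions (FringeKey is a hypothetical name); the & 0xF mask keeps exactly one nibble, so a negative key still decodes correctly:

```java
// Hypothetical helper: up to eight 4-bit positions packed into one int,
// element 0 in the lowest nibble (same layout as a | (b << 4) | ...).
class FringeKey {
    static int encode(int[] positions) {
        int key = 0;
        for (int i = 0; i < positions.length; i++) {
            key |= (positions[i] & 0xF) << (4 * i);
        }
        return key;
    }

    // Shift the wanted nibble down, then mask off everything else.
    static int decode(int key, int i) {
        return (key >>> (4 * i)) & 0xF;
    }
}
```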
Storing ints in a database is very efficient and will use minimal disk space:
create table heuristics (
key_value int not null,
heuristic varchar(32) not null -- as small as you can, char(n) if all the same length
);
After inserting all the rows, create a covering index for super fast lookup:
create unique index heuristics_covering on heuristics(key_value, heuristic);
If you create this index before insertion, insertions will be very, very slow.
Creating the data and inserting it is relatively straightforward coding.
So is my understanding correct that you're calculating a heuristic value for each possible puzzle state, and you want to be able to look it up later based on a given puzzle state? So that you don't have to calculate it on the fly? Presumably because of the time it takes to calculate the heuristic value.
So you're iterating over all the possible puzzle states, calculating the heuristic, and then storing that result. And it's taking a long time to do that. It seems like your assumption is that it's taking a long time to store the value - but what if the time lag you're seeing isn't the time it's taking to store the values in the data store, but rather the time it's taking to generate the heuristic values? That seems far more likely to me.
In that case, if you want to speed up the process of generating and storing the values, I might suggest splitting up the task into sections, and using several threads at once.
The fastest data structure, I believe, is going to be an in-memory hash table, with the hash key being your puzzle state and the value being your heuristic value. Others have already suggested reasonable ways of generating puzzle-state hash keys. The same hash table structure could be accessed by each of the threads that are generating and storing heuristic values for sections of the puzzle-state domain.
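A sketch of splitting the work across threads, assuming each key's heuristic can be computed independently of the others (a breadth-first search over the pattern space may not split this cleanly). The heuristic is passed in as an IntUnaryOperator placeholder, and ParallelFill is a hypothetical name; ConcurrentHashMap lets all worker threads write to one table safely:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntUnaryOperator;

// Hypothetical helper: fill keys [0, total) using `threads` workers.
class ParallelFill {
    static Map<Integer, Integer> fill(int total, int threads, IntUnaryOperator heuristic)
            throws InterruptedException {
        Map<Integer, Integer> table = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (total + threads - 1) / threads; // ceiling division: one section per thread
        for (int start = 0; start < total; start += chunk) {
            final int from = start;
            final int to = Math.min(total, start + chunk);
            pool.submit(() -> {
                for (int key = from; key < to; key++) {
                    table.put(key, heuristic.applyAsInt(key)); // placeholder computation
                }
            });
        }
        pool.shutdown();                          // no new tasks; let queued ones finish
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for every section
        return table;
    }
}
```

For example, ParallelFill.fill(1000, 4, k -> k * 2) fills 1,000 entries using four threads.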
Once you've populated the hash table, you can simply serialize it and store it in a binary file in the filesystem. Then have your heuristic value server load that into memory (and deserialize it into the in-memory hash table) when it starts up.
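Saving and reloading the table could be sketched with plain Java serialization, since HashMap is Serializable. TableFile is a hypothetical name, and boxed Integer keys and values are an assumption; for hundreds of millions of entries a compact custom format would likely be smaller and faster to load:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical helper: write the populated table once, read it back at startup.
class TableFile {
    static void save(HashMap<Integer, Integer> table, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(table);
        }
    }

    @SuppressWarnings("unchecked") // the file is assumed to contain what save() wrote
    static HashMap<Integer, Integer> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (HashMap<Integer, Integer>) in.readObject();
        }
    }
}
```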
If my premise is incorrect that it's taking a long time to generate the heuristic values, then it seems like you're doing something grossly sub-optimal when you go to store them. For example reconnecting to a remote database each time you store a value. That could potentially explain the 5 minutes. And if you're reconnecting every time you go to look up a value, that could explain why that is taking too long, too.
Depending on how big your heuristic values are, an in memory hash table might not be practical. A random-access binary file of records (with each record simply containing the heuristic value) could accomplish the same thing, potentially, but you'd need some way of mathematically mapping the hash key domain to the record index domain (which consists of sequential integers). If you're iterating over all the possible puzzle states, it seems like you already have a way of mapping puzzle states to sequential integers; you just have to figure out the math.
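The "math" for mapping puzzle states to sequential integers can be a mixed-radix ranking of the fringe tiles' positions: the first tile has 16 possible squares, the next 15 remaining, and so on down to 9, giving exactly 16!/8! = 518,918,400 indices. A sketch (FringeRank is a hypothetical name; positions[i] is the square of the i-th fringe tile, all distinct). Assuming each heuristic value fits in one byte, the whole table is then a byte[] of roughly 519 MB, or a random-access file with 1-byte records at offset rank:

```java
// Hypothetical helper: perfect ranking of k distinct squares out of 16.
class FringeRank {
    static final int SQUARES = 16;

    static int rank(int[] positions) {
        boolean[] used = new boolean[SQUARES];
        int rank = 0;
        for (int i = 0; i < positions.length; i++) {
            int p = positions[i];
            // Digit i = number of still-free squares below p, in range [0, SQUARES - i).
            int smallerFree = 0;
            for (int q = 0; q < p; q++) {
                if (!used[q]) smallerFree++;
            }
            rank = rank * (SQUARES - i) + smallerFree;
            used[p] = true;
        }
        return rank;
    }

    // Number of distinct placements of `tiles` tiles: 16 * 15 * ... (tiles factors).
    static int size(int tiles) {
        int size = 1;
        for (int i = 0; i < tiles; i++) size *= SQUARES - i;
        return size;
    }
}
```

Lookup is then table[FringeRank.rank(positions)], with no keys stored at all.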
Using a local database table with each row simply having a key and a value is not unreasonable. You should definitely be able to insert 518 million rows in the space of a few minutes - you just need to maintain a connection during the data loading process, and build your index after your data load is finished. Once you've built the index on your key, a look up using the (clustered primary key integer) index should be pretty quick as long as you don't have to reconnect for every look up.
Also, if you're committing rows into a database, you don't want to commit after each row; you'll want to commit every 1,000 or 10,000 rows. Committing after each inserted row will substantially degrade your data-loading performance.

Generating a unique list of numbers each time in Java

I am trying to generate a list of integers, using Java, as below:
01,55,45,23,48,05,45,97
I want to build logic that always generates a unique list, i.e. it should never generate another list with the same numbers in the same sequence.
One way I thought of is to dump each generated list into a database and compare every subsequently generated list against it, saving the list only if it is not already present in the DB table. Is there another way you can think of?
I will describe my question through a use case:
1. The code generates a list of random numbers, e.g. 02,34,45,67,90
2. The second time the code generates a list of random numbers, I need to check whether the generated list is 02,34,45,67,90, i.e. the one generated in step 1.
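For comparison, the same check can be done in memory rather than in a database: List.equals compares elements in order, so a HashSet<List<Integer>> recognises an exact repeat of an earlier sequence. A sketch (UniqueListGenerator is a hypothetical name); note that it loops forever once every possible list for the requested length and bound has been produced:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: random lists, never the same sequence twice.
class UniqueListGenerator {
    private final Random random = new Random();
    private final Set<List<Integer>> seen = new HashSet<>();

    List<Integer> next(int length, int bound) {
        while (true) {
            List<Integer> candidate = new ArrayList<>(length);
            for (int i = 0; i < length; i++) {
                candidate.add(random.nextInt(bound)); // values in [0, bound)
            }
            // add() returns false if this exact sequence was generated before.
            if (seen.add(candidate)) {
                return Collections.unmodifiableList(candidate); // caller can't alter the stored key
            }
        }
    }
}
```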
Storing random numbers in a database for comparison sounds like a bad idea to me. Instead, try starting with something like:
Random random = new Random(System.nanoTime());
random.nextInt();

Unique alphanumeric String with a fixed length

How can I generate a unique alphanumeric String with a fixed length of 8 characters? I want to base it on an ID + the current time.
I tried MD5, but it makes the string too long.
Thanks!
The problem is that 8 alphanumeric characters are most likely too few to guarantee uniqueness ... using that approach.
You just need to do some arithmetic. Multiply the number of ids that your application could generate per second by the expected number of seconds that your application is expected to "live". Now figure out how many alphanumeric characters you need to encode that number ... and that gives you how large the "timestamp" part of your id would need to be. Then add the characters for the "id" part of your string.
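That arithmetic as a small sketch, assuming a case-insensitive alphanumeric alphabet of 36 symbols (a-z, 0-9). With made-up figures of 1,000 IDs per second over 10 years (3.1536 * 10^11 IDs), the timestamp part alone already needs 8 characters, since 36^7 (about 7.8 * 10^10) is too small and 36^8 (about 2.8 * 10^12) suffices. IdLength is a hypothetical name:

```java
// Hypothetical helper: smallest number of characters that can encode `count` distinct values.
class IdLength {
    static int charsNeeded(long count, int alphabetSize) {
        int chars = 0;
        long capacity = 1; // distinct values representable with `chars` characters
        while (capacity < count) {
            if (capacity > Long.MAX_VALUE / alphabetSize) {
                // One more character would exceed Long.MAX_VALUE, hence any long count.
                return chars + 1;
            }
            capacity *= alphabetSize;
            chars++;
        }
        return chars;
    }
}
```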
IMO, the best approach (if you have to use short strings) is to generate partially or fully random strings, and then check them against a (big) table of all previously issued id strings. If you get a collision, generate another string, and repeat.
If you also want your ids to be hard to predict (per your comment), then the "random number" approach is best. Make sure that you use a cryptographic-quality RNG or PRNG. The problem with a timestamp-based approach is that the resulting ids will be much easier to predict ... or guess.
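A sketch of the random approach with a cryptographic RNG (java.security.SecureRandom) and a collision check. RandomTokenGenerator is a hypothetical name, and the in-memory Set stands in for the table of previously issued IDs, which in practice would be a database table with a unique constraint:

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: random alphanumeric tokens, regenerated on collision.
class RandomTokenGenerator {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private final SecureRandom random = new SecureRandom();
    private final Set<String> issued = new HashSet<>(); // stand-in for the "big table"

    String next(int length) {
        while (true) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String candidate = sb.toString();
            if (issued.add(candidate)) { // false means already issued: try again
                return candidate;
            }
        }
    }
}
```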
Use java.util.UUID.
UUID uuid = UUID.randomUUID();
String id = uuid.toString().substring(0, 8);
Strings can't be unique by themselves: uniqueness refers to an item in the context of a collection without duplicates, called a set. Given a set of symbols (you said alphanumeric in your question) and a string length (8 in your example), there is a known number of possible combinations, which may or may not be enough for your needs.
Your requirements can't be satisfied (at least, not with the information you provided). If you really want the token to be unique and the given input (id, timestamp) is guaranteed to be the key (ie for each given ID you'll never have two or more identical timestamps), just put the ID and the timestamp side by side.
The size of the ID columns will be the maximum size for the username + the fixed size for the timestamp.

Generating unique reference number

Tech Stack: Java 1.6, JPA (Hibernate 3), Spring 3, Oracle 11g
I am working on a project where one of the requirement is to give back the customers a ‘ReferenceNumber’.
One option is to return the row ID, but for that to work, it must not be sequential. Otherwise, you can guess the next number etc.
I can generate a number in Java and store it in a separate column, but then I’ll have to make sure there are no collisions.
There are ways to generate such a number in the database, but I am not sure whether they guarantee uniqueness.
Is there a best practice for such a requirement from the database point of view?
UPDATE 1
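Currently I am using the following in Java to generate the number:
import java.math.BigInteger;
import java.security.SecureRandom;
private static final SecureRandom random = new SecureRandom();
// returns a random non-negative number in [0, 2^60 - 1]
public static BigInteger getNew() {
return new BigInteger(60, random);
}
// same, in [0, 2^numBits - 1]
public static BigInteger getNew(int numBits) {
return new BigInteger(numBits, random);
}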
UPDATE 2: Requirement
Allowing sequential numbers would allow:
the customer to guess the next number;
the customer to find out how many numbers (orders) there were between two numbers, etc.
It is preferable for this reference to be a number, but, say, a three-letter prefix followed by a number is also fine.
If your table has a sequence generated primary key (e.g. the customer_id) then you could reverse the digits and then convert that to an octal representation. Thus it still looks like a decimal number, but it is definitely no longer consecutive and hard to guess any ranges.
The process is even reversible if you can find a way to deal with trailing zeros in the original value (because they'd become leading zeros in the reversed number and thus will be "dropped" during the conversion).
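A sketch of the digit-reversal idea, assuming a non-negative sequence value below 10^18 so the reversed number still fits in a long. ReversedOctal is a hypothetical name; Long.toOctalString and Long.parseLong(s, 8) do the base conversion. decode() only round-trips IDs without trailing zeros, which is the caveat above:

```java
// Hypothetical helper: reverse the decimal digits, then render in octal.
class ReversedOctal {
    // Assumes 0 <= id < 10^18.
    static String encode(long id) {
        long reversed = Long.parseLong(new StringBuilder(Long.toString(id)).reverse().toString());
        return Long.toOctalString(reversed);
    }

    static long decode(String reference) {
        long reversed = Long.parseLong(reference, 8);
        return Long.parseLong(new StringBuilder(Long.toString(reversed)).reverse().toString());
    }
}
```

For example, 1234 reverses to 4321, which is "10341" in octal; 1200 becomes 21 after reversal, so decoding yields 12.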
How about prefixing the number with a customer abbreviation or name abbreviation or something (or 3 letters assigned when the customer is created, checked for uniqueness) and then just have a value stored that you increment sequentially for just that customer? That way they can't tell what the order numbers are in the rest of the system, but they can for themselves, which shouldn't really matter as they know how many orders they have placed anyway.
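A sketch of the per-customer counter. CustomerReferences and the six-digit zero padding are assumptions for illustration; in a real system the counter would be stored with the customer record, and the map here only stands in for that:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: 3-letter customer prefix + that customer's own sequence.
class CustomerReferences {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    String next(String customerPrefix) {
        long n = counters.computeIfAbsent(customerPrefix, k -> new AtomicLong()).incrementAndGet();
        return String.format("%s%06d", customerPrefix, n); // e.g. ABC000001
    }
}
```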
Why not take a SHA1 or MD5 Hash of a couple of pertinent fields (say the user's name and the time of record creation, etc.)? In many respects, if the strategy is well known, your services external to the Database will be able to recreate the reference number without having to query the database.
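A sketch of the hash approach using java.security.MessageDigest. The chosen fields, the "|" separator and HashedReference are illustrative assumptions. Because a hash is deterministic, any service that knows the fields can recompute the reference; a hash shortened to a few characters can collide, though, so uniqueness still needs to be checked when storing it:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: first `hexChars` hex digits (max 40) of SHA-1(userName|createdAt).
class HashedReference {
    static String reference(String userName, long createdAtMillis, int hexChars) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (userName + "|" + createdAtMillis).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.substring(0, hexChars);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```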

storing sets of integers to check if a certain set has already been mentioned

I've come across an interesting problem which I would love to get some input on.
I have a program that generates sets of numbers (based on some predefined conditions). Each set contains up to 6 numbers, which do not have to be unique, and each number is an integer from 1 to 100.
I would like to somehow store every set that is created so that I can quickly check if a certain set with the exact same numbers (order doesn't matter) has previously been generated.
Speed is a priority in this case, as there might be up to 100k sets stored before the program stops (maybe more, but most of the time probably fewer)! Would anyone have recommendations on what data structures I should use and how I should approach this problem?
What I have currently is this:
Sort each set before storing it into a HashSet of Strings. The string is simply each number in the sorted set with some separator.
For example, the set {4, 23, 67, 67, 71} would get encoded as the string "4-23-67-67-71" and stored into the HashSet. Then for every new set generated, sort it, encode it and check if it exists in the HashSet.
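The approach described above could be sketched like this (SeenSets is a hypothetical name). Sorting a copy makes the key independent of order, and repeated numbers stay in the key, so {67, 67} and {67} remain different:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: sorted, "-"-joined strings as HashSet keys.
class SeenSets {
    // Presized so ~100k keys fit without rehashing (default load factor 0.75).
    private final Set<String> seen = new HashSet<>(100_000 * 4 / 3 + 1);

    static String key(int[] numbers) {
        int[] sorted = numbers.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0) sb.append('-');
            sb.append(sorted[i]);
        }
        return sb.toString();
    }

    // Returns true if this set was not seen before (and records it).
    boolean addIfNew(int[] numbers) {
        return seen.add(key(numbers));
    }
}
```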
Thanks!
if you break it into pieces it seems to me that
creating a set (generate 6 numbers, sort, stringify) runs in O(1)
checking if this string exists in the hashset is O(1)
inserting into the hashset is O(1)
you do this n times, which gives you O(n).
this is already optimal as you have to touch every element once anyways :)
you might run into problems depending on the range of your random numbers.
e.g. assume you generate only numbers between one and one, then there's obviously only one possible outcome ("1-1-1-1-1-1") and you'll have only collisions from there on. however, as long as the number of possible sequences is much larger than the number of elements you generate i don't see a problem.
one tip: if you know the number of generated elements beforehand it would be wise to initialize the hashset with the correct number of elements (i.e. new HashSet<String>( 100000 ) );
p.s. now with other answers popping up i'd like to note that while there may be room for improvement on a microscopic level (i.e. using language-specific tricks), your overall approach can't be improved.
Create a class SetOfIntegers
Implement a hashCode() method that will generate reasonably unique hash values
Use HashMap to store your elements like put(hashValue,instance)
Use containsKey(hashValue) to check if the same hashValue already present
This way you will avoid sorting and conversion/formatting of your sets.
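If two different sets happen to share a hash value, using the hash alone as the HashMap key would report a false duplicate. A variant of these steps that avoids this also implements equals() and lets a HashSet compare candidates: HashSet uses hashCode() to find the bucket and equals() to confirm the match. A sketch, assuming values from 1 to 100 as in the question; counting occurrences keeps repeated numbers distinct and needs no sorting:

```java
import java.util.Arrays;

// Hypothetical class: an order-independent multiset of ints in 1..100,
// stored as occurrence counts; equals() and hashCode() both derive from the counts.
final class SetOfIntegers {
    private final byte[] counts = new byte[101]; // index = value

    SetOfIntegers(int... values) {
        for (int v : values) {
            counts[v]++;
        }
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SetOfIntegers && Arrays.equals(counts, ((SetOfIntegers) o).counts);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(counts);
    }
}
```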
Just use a java.util.BitSet for each set, adding integers with the set(int bitIndex) method. You don't have to sort anything: check a HashMap for an already existing BitSet before adding a new one, and it will be very fast. If speed is important, never sort the values and call toString for this purpose.
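A sketch of the BitSet answer (BitSetSeen is a hypothetical name). java.util.BitSet implements equals() and hashCode(), so a HashSet<BitSet> is enough for the membership check. One caveat with the question's data: a BitSet only records presence, so sets that differ only in repeated numbers, such as {67, 67} and {67}, produce the same BitSet:

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: one bit per value 1..100, no sorting needed.
class BitSetSeen {
    private final Set<BitSet> seen = new HashSet<>();

    static BitSet toBits(int... values) {
        BitSet bits = new BitSet(101);
        for (int v : values) {
            bits.set(v); // duplicates set the same bit again
        }
        return bits;
    }

    // Returns true if no equal BitSet was stored before.
    boolean addIfNew(int... values) {
        return seen.add(toBits(values));
    }
}
```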
