UUID generated using two programming languages? - java

Will UUIDs generated in two different programming languages, say Ruby and Java, be unique?

Yep. UUID is a standard format - the formatting will be the same in any language that accurately implements a UUID function. As for the IDs themselves being unique? You have a much better chance of being struck by lightning while being attacked by a shark, after your second plane crash of the day, after winning the lottery a few times in a row, than generating two identical UUIDs. :)

Yes, a UUID (universally unique identifier) will be unique for all practical purposes.
From Wikipedia:
only after generating 1 billion UUIDs every second for the next 100
years, the probability of creating just one duplicate would be about
50%. Or, to put it another way, the probability of one duplicate would
be about 50% if every person on earth owns 600 million UUIDs.
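As a rough illustration of the answers above, here is a minimal Java sketch. It assumes random (version 4) UUIDs, which is what java.util.UUID.randomUUID() produces; the class and method names are made up for this example. It also shows that a UUID string created elsewhere (for instance by Ruby's SecureRandom.uuid) parses into the same type, since the textual format is shared.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

final class UuidDemo {
    /** Generates {@code count} random (version 4) UUIDs and returns how many were distinct. */
    static int countDistinct(int count) {
        Set<UUID> seen = new HashSet<>();
        for (int i = 0; i < count; i++) {
            seen.add(UUID.randomUUID());
        }
        return seen.size();
    }

    /** Parses a UUID string produced by any other language or tool. */
    static UUID parse(String text) {
        return UUID.fromString(text);
    }
}
```

Calling UuidDemo.countDistinct(n) is only a sanity check, not a proof of uniqueness; the guarantee is probabilistic, as the Wikipedia quote says.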

Related

Rules on arbitrary combinations of fact attributes

I am a complete Drools noob. I have been tasked with implementing a set of rules which, in the absence of nested rules, seems very complex to me. The problem is as follows:
I have a fact called Person with attributes age, gender, income, height, weight and a few others. A person may be classified as level_1, level_2, ..., level_n based on the values of the attributes. For example,
when age < a and any value for other attributes then classification = level_1.
when gender == female and any value for other attributes then classification = level_2.
when age < a and gender == female and any value for other attributes then classification = level_10.
...
So, in any rule any arbitrary combination of attributes may be used. Can anyone help me in expressing this?
The second part of the problem is that the levels are ordered and if a person satisfies more than 1 rule, the highest level is chosen. The only way I can think of to order the levels is to order the rules themselves using salience, so rules resulting in higher levels will have higher salience. Is there a more elegant way of doing this?
I found a similar question here but that seems to deal with only 1 rule and the OP is probably more familiar with Drools than I am because I have not understood the solution. That talks about introducing a separate control fact but I didn't get how that works.
EDIT:
I would eventually have to create a template and supply the data using a csv. It probably does not matter for this problem, but, if it helps in any way...
The problem of assigning a discrete value to facts of a type based on attribute combinations is what I call "classification problem" (cf. Design Patterns in Production Systems). The simple approach is to write one rule for each discrete value, with constraints separating the attribute value cleanly from all other attribute sets. Note that statements such as
when attribute value age < a and any value for other attributes then classify as level 1
are misleading and must not be used to derive rules, because, evidently, this isn't a correct requirement since we have
when age < a && gender == female (...) then classify as level 10
and this contradicts the former requirement, correctly written as
when age < a && gender == male then classify as level 1
Likewise, the specification for level 2 must also be completed (and it'll become evident that there is no country for old men). With this approach, a classification based on n attributes with just 2 intervals each results in 2^n rules. If the number of resulting levels is anywhere near this number, this approach is best. For implementation, a decision table is suitable.
If major subsets of the n-dimensional space should fall into the same class, a more economical solution should be used. If, for instance, all women should fall into the same level, a rule selecting all women can be written and given highest precedence; the remaining rules will have to deal with n-1 dimensions. Thus, the simplest scenario would require just n rules, one for each dimension.
It is also possible to describe other intervals in an n-dimensional space, providing the full set of values for all dimensions with each interval. Using the appropriate subset of values for each interval avoids the necessity of ordering the rules (using salience) and ensures that really all cases are handled. Of course, a "fall-through" rule firing with low priority if the level hasn't been set is only prudent.
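To make the "highest matching level wins, with a fall-through" idea concrete outside of Drools, here is a plain Java sketch. It is not Drools syntax and not the answerer's implementation; the Person fields, the threshold a = 30 and the fall-through level 0 are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class LevelClassifier {
    /** Hypothetical fact type with only the two attributes used in the example. */
    static final class Person {
        final int age;
        final String gender;
        Person(int age, String gender) { this.age = age; this.gender = gender; }
    }

    /** One rule: a condition on the person and the level it assigns. */
    static final class Rule {
        final Predicate<Person> condition;
        final int level;
        Rule(Predicate<Person> condition, int level) { this.condition = condition; this.level = level; }
    }

    private final List<Rule> rules = new ArrayList<>();

    void addRule(Predicate<Person> condition, int level) {
        rules.add(new Rule(condition, level));
    }

    /** Returns the highest level among all matching rules, or 0 if no rule matches (fall-through). */
    int classify(Person p) {
        int best = 0;
        for (Rule r : rules) {
            if (r.level > best && r.condition.test(p)) {
                best = r.level;
            }
        }
        return best;
    }

    /** Builds the three example rules from the question, with an assumed threshold a = 30. */
    static LevelClassifier example() {
        final int a = 30; // assumed value for the threshold "a"
        LevelClassifier c = new LevelClassifier();
        c.addRule(p -> p.age < a, 1);
        c.addRule(p -> "female".equals(p.gender), 2);
        c.addRule(p -> p.age < a && "female".equals(p.gender), 10);
        return c;
    }
}
```

Because the maximum is taken over all matching rules, the order in which the rules are added does not matter; in a rule engine the same effect needs either salience or mutually exclusive conditions, as discussed above.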

strategy to create 4 bytes unique Id in java

Our Java application has a 4-byte size restriction for holding a unique id.
We are forced to implement a strategy for creating unique ids that are 4 bytes in size.
Does anyone know a strategy for creating them?
Yes, start with a random 32 bit integer and increment it.
Anything else will be too demanding as you scale up (e.g. if you have 1 billion already created ids and need to randomly generate a new one, you have to have a 1 billion entry table to check for existence inside... ouch!).
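A minimal sketch of the "random start, then increment" approach, assuming all ids are handed out by a single JVM (persistence across restarts and multiple nodes is not covered). The class name is made up for this example.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicInteger;

final class SequentialIdGenerator {
    private final AtomicInteger next;

    /** Starts from a random 32-bit value. */
    SequentialIdGenerator() {
        this(new SecureRandom().nextInt());
    }

    /** Starts from a given value, e.g. one restored from storage. */
    SequentialIdGenerator(int start) {
        this.next = new AtomicInteger(start);
    }

    /** Returns the next id; wraps from Integer.MAX_VALUE to Integer.MIN_VALUE on overflow. */
    int nextId() {
        return next.getAndIncrement();
    }

    /** Encodes an id as exactly 4 bytes (big-endian). */
    static byte[] toBytes(int id) {
        return ByteBuffer.allocate(4).putInt(id).array();
    }
}
```

Ids stay unique until 2^32 of them have been issued, after which the counter returns to its starting value.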
But if it absolutely has to be random and unique, the two strategies you can take are:
1) Have a big HashSet of every id used so far and check for existence in the set whenever you generate a new random ID. If it is already present, discard it and try again.
2) Store all randomly used IDs in the database, and do a SELECT to see if your newly generated random ID exists. If it does, discard and try again.
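Strategy 1 could be sketched as follows, assuming the set of used ids fits in memory; the class name is hypothetical. Strategy 2 has the same shape, with the set lookup replaced by a database query.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class RandomIdGenerator {
    private final Set<Integer> used = new HashSet<>();
    private final Random random;

    RandomIdGenerator(Random random) {
        this.random = random;
    }

    /** Draws random 32-bit values until one is found that has not been handed out before. */
    synchronized int nextId() {
        int candidate;
        do {
            candidate = random.nextInt();
        } while (!used.add(candidate)); // add() returns false if the id was already present
        return candidate;
    }
}
```

The retry loop gets slower as the 32-bit space fills up, which is the scaling problem mentioned above.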
If the unique ID could be larger, you could use a GUID (also known as a UUID). These are large enough, and generated in such a way, that for practical purposes you will never see two with the same value, anywhere, without needing to check.
For Guids/UUIDs in java, see http://docs.oracle.com/javase/7/docs/api/java/util/UUID.html
I think an int can meet your needs.
You can try something like this:
private static byte[] synhead = {(byte)0xAA,0x55,0x7E,0x0B};

Generating unique reference number

Tech Stack: Java 1.6, JPA (Hibernate 3), Spring 3, Oracle 11g
I am working on a project where one of the requirements is to give the customers a ‘ReferenceNumber’.
One option is to return the row ID, but for that to work, it must not be sequential. Otherwise, you can guess the next number etc.
I can generate a number in Java and store it in a separate column, but then I’ll have to make sure there are no collisions.
There are ways to generate such a number in the database, but I am not sure whether they guarantee uniqueness.
Is there a best practice for such a requirement from the database point of view?
UPDATE 1
Currently I am using the following in Java to generate the number:
    private static SecureRandom random = new SecureRandom();
    public static BigInteger getNew() {
        return new BigInteger(60, random);
    }
    public static BigInteger getNew(int numBits) {
        return new BigInteger(numBits, random);
    }
UPDATE 2: Requirement
Allowing sequential numbers would allow:
- a customer to guess the next number;
- finding out how many numbers (orders) there were between two numbers, etc.
It is preferable for this reference to be a number, but, say, a three-letter prefix followed by a number is also fine.
If your table has a sequence generated primary key (e.g. the customer_id) then you could reverse the digits and then convert that to an octal representation. Thus it still looks like a decimal number, but it is definitely no longer consecutive and hard to guess any ranges.
The process is even reversible if you can find a way to deal with trailing zeros in the original value (because they'd become leading zeros in the reversed number and thus will be "dropped" during the conversion).
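A small sketch of the reverse-then-octal idea, assuming non-negative ids; the class and method names are made up. It deliberately does not solve the trailing-zero problem, so that the limitation stays visible.

```java
final class ReferenceCodec {
    /** Reverses the decimal digits of the id and prints the result in octal. */
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        String reversed = new StringBuilder(Long.toString(id)).reverse().toString();
        // parseLong drops leading zeros, and may throw for ids close to Long.MAX_VALUE
        return Long.toOctalString(Long.parseLong(reversed));
    }

    /** Inverse of encode, except that trailing zeros of the original id are lost. */
    static long decode(String reference) {
        long reversed = Long.parseLong(reference, 8);
        return Long.parseLong(new StringBuilder(Long.toString(reversed)).reverse().toString());
    }
}
```

For example, encode(1234) gives "10341" and decodes back to 1234, while encode(1230) gives "501", which decodes to 123 rather than 1230. Note that this only obscures the sequence; anyone who knows the scheme can undo it.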
How about prefixing the number with a customer abbreviation or name abbreviation or something (or 3 letters assigned when the customer is created, checked for uniqueness) and then just have a value stored that you increment sequentially for just that customer? That way they can't tell what the order numbers are in the rest of the system, but they can for themselves, which shouldn't really matter as they know how many orders they have placed anyway.
Why not take a SHA1 or MD5 Hash of a couple of pertinent fields (say the user's name and the time of record creation, etc.)? In many respects, if the strategy is well known, your services external to the Database will be able to recreate the reference number without having to query the database.
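A sketch of the hashing idea using the standard MessageDigest API. The choice of fields (user name and creation time in milliseconds), the "|" separator and the class name are assumptions for illustration, and the code uses StandardCharsets, so it needs Java 7 or later rather than the Java 1.6 mentioned in the question.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashedReference {
    /** Returns the SHA-1 digest of "userName|createdAtMillis" as 40 lowercase hex characters. */
    static String of(String userName, long createdAtMillis) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] input = (userName + "|" + createdAtMillis).getBytes(StandardCharsets.UTF_8);
            byte[] digest = sha1.digest(input);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }
}
```

The result is not a number and is longer than a typical reference, so in practice it would probably be truncated or re-encoded, and a uniqueness constraint in the database is still advisable.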

Time table Generation using Genetic Algorithms in java

I am looking for a solution for timetable generation using Genetic Algorithms (GA).
In my scenario I view a timetable of 6 days, Monday to Saturday.
Each day is divided into a number of lectures/time slots (at most 6 lectures a day; each time slot is 1 hour, so that means 6 hours a day).
I have tried to represent a Class consisting of a Teacher, a Student Group (set), and a lecture.
I maintain a pool of possible teachers, possible subjects and possible student groups,
and I randomly assign them to these Class objects.
So a Class is a collection of all these references,
and each time slot holds a Class object.
Similarly, a day is made up of a number of lecture Class objects,
and so on, with the week made up of 6 days.
The set of possible constraints that I have is:
1. A teacher can take only one lecture in one time slot
2. A teacher can take a set of subjects (finite)
3. A teacher can be unavailable on a certain day
4. A teacher can be unavailable on a certain time slot
Other constraints may be added later.
Can anyone give me an idea of how to represent or handle these constraints, and how to calculate the fitness scores depending on them?
EDIT : The implementation is here https://github.com/shridattz/dynamicTimeTable
UPDATE:
The code can be found here
github.com/shridattz/dynamicTimeTable
In my timetable generation I have used a timetable object. This object consists of ClassRoom objects and the timetable schedule for each of them, plus a fitness score for the timetable.
The fitness score corresponds to the number of clashes the timetable has with the schedules of the other classes.
A ClassRoom object consists of Week objects. Week objects consist of Days, and Days consist of TimeSlots. A TimeSlot has a lecture, with which a subject, the student group attending the lecture and the professor teaching the subject are associated.
This way I have represented the timetable as a chromosome.
As for the constraints, I have used the composite design pattern, which makes it easy to add or remove constraints.
In each constraint class, the condition specified in my question is checked between two timetable objects.
If the condition is met, i.e. a clash is present, the score is incremented by one.
This way the timetable with the lowest score is the best we can get.
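The clash-counting fitness with composite constraints could look roughly like the sketch below. This is not the code from the linked repository: it assumes a simplified, flat list of lectures instead of the ClassRoom/Week/Day/TimeSlot hierarchy, and all class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TimetableFitness {
    /** Hypothetical flat representation of one scheduled lecture. */
    static final class Lecture {
        final String teacher;
        final int day;  // 0 = Monday ... 5 = Saturday
        final int slot; // 0 ... 5
        Lecture(String teacher, int day, int slot) {
            this.teacher = teacher;
            this.day = day;
            this.slot = slot;
        }
    }

    /** A constraint reports how many clashes it finds; 0 means it is satisfied. */
    interface Constraint {
        int clashes(List<Lecture> timetable);
    }

    /** Constraint 1: a teacher can take only one lecture in one time slot. */
    static final class TeacherDoubleBooked implements Constraint {
        @Override
        public int clashes(List<Lecture> timetable) {
            int count = 0;
            for (int i = 0; i < timetable.size(); i++) {
                for (int j = i + 1; j < timetable.size(); j++) {
                    Lecture a = timetable.get(i);
                    Lecture b = timetable.get(j);
                    if (a.teacher.equals(b.teacher) && a.day == b.day && a.slot == b.slot) {
                        count++;
                    }
                }
            }
            return count;
        }
    }

    /** Constraints 3 and 4: a teacher is unavailable on a day, or in one slot of that day. */
    static final class TeacherUnavailable implements Constraint {
        static final int ANY_SLOT = -1;
        private final String teacher;
        private final int day;
        private final int slot;
        TeacherUnavailable(String teacher, int day, int slot) {
            this.teacher = teacher;
            this.day = day;
            this.slot = slot;
        }
        @Override
        public int clashes(List<Lecture> timetable) {
            int count = 0;
            for (Lecture l : timetable) {
                boolean slotMatches = slot == ANY_SLOT || l.slot == slot;
                if (l.teacher.equals(teacher) && l.day == day && slotMatches) {
                    count++;
                }
            }
            return count;
        }
    }

    /** Composite: sums the clashes of its parts; the total is the score to minimise. */
    static final class Composite implements Constraint {
        private final List<Constraint> parts = new ArrayList<>();
        Composite add(Constraint c) {
            parts.add(c);
            return this;
        }
        @Override
        public int clashes(List<Lecture> timetable) {
            int total = 0;
            for (Constraint c : parts) {
                total += c.clashes(timetable);
            }
            return total;
        }
    }
}
```

New constraints are added by implementing Constraint and registering the instance with the composite, without touching the scoring loop of the genetic algorithm.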
For this problem there is no efficient solution. I think you realized that too, since you are using genetic algorithms. Some months ago I wrote a framework for genetic algorithms myself.
I think you missed this: every class has a list of lessons per week, and only one lesson can happen at a time. Now you can randomly combine teachers and classes for the time slots.
In the fitness function I'd give a big bonus if a class gets all of its lessons for the week. A big penalty would apply if teachers don't have a similar load (for example, teacher A has two lessons a week and teacher B has 12). You might put this into proportion if a teacher only has to work 20 hours a week (use percentages).
All in all it is not trivial, and you might look for an experienced co-worker or mentor to help you with this topic.
If you want more specific advice, please make your question more specific.

How should I go about optimizing a hash table for a given population?

Say I have a population of key-value pairs which I plan to store in a hash table. The population is fixed and will never change. What optimizations are available to me to make the hash table as fast as possible? Which optimizations should I concentrate on? This is assuming I have a lot of space. There will be a reasonable number of pairs (say no more than 100,000).
EDIT: I want to optimize look up. I don't care how long it takes to build.
I would make sure that your keys hash to unique values. This will ensure that every lookup will be constant time, and thus, as fast as possible.
Since you can never have more than 100,000 keys, it is entirely possible to have 100,000 hash values.
Also, make sure that you use the constructor that takes an int to specify the initial capacity (set it to 100,000) and a float to set the load factor (use 1). Doing this also requires that you have a perfect hash function for your keys, but it will result in the fastest possible lookup in the least amount of memory.
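With java.util.HashMap, that constructor call might look like the sketch below (the helper class name is made up). Note that HashMap rounds the capacity up to a power of two internally, and that pre-sizing only avoids rehashing during the build; it does not by itself remove collisions.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMap {
    /** Copies a fixed population into a HashMap sized so that it never has to resize. */
    static <K, V> Map<K, V> copyOf(Map<K, V> population) {
        // initial capacity = number of entries, load factor = 1
        Map<K, V> table = new HashMap<>(population.size(), 1.0f);
        table.putAll(population);
        return table;
    }
}
```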
In general, to optimize a hash table, you want to minimize collisions in the determination of your hash, so your buckets won't contain more than one item and the hash-search will return immediately.
Most of the time, that means you should measure the output of your hash function on the problem space, so I'd recommend looking into that.
Ensure there are no collisions. If there are no collisions, you are guaranteed O(1) constant look-up time. The next optimization would then be the look-up.
Use a profiler to optimize piece by piece. It's hard to do without one.
If it's possible to make a large hash table such that there are no collisions at all, that is ideal, since your insertions and lookups will be done in constant time.
But if that is not possible, try to choose a hash function such that your keys get distributed uniformly across the hash table.
Perfect hashing algorithms deal with the problem, but may not scale to 100k objects. I found a Java MPH package, but haven't tried it.
If the population is known at compile time, then the optimal solution is to use a minimal perfect hash function (MPH). The Wikipedia page on this subject links to several Java tools that can generate these.
The optimization must be done in the hashCode method of the key class. The thing to keep in mind is to implement this method so as to avoid collisions.
Getting the perfect hashing algorithm to give totally unique values to 100K objects is likely to be close to impossible. Consider the birthday paradox. The date on which people are born can be considered a perfect hashing algorithm, yet if you have 23 or more people you are more likely than not to have a collision, and that is in a table of 365 dates.
So how big a table will you need to have no collisions in 100K?
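The question can be explored numerically. The sketch below computes the birthday-paradox collision probability, assuming each key lands in a uniformly random slot (which is a model of a well-spread hash function, not of a constructed perfect hash). The class name is made up for this example.

```java
final class BirthdayBound {
    /**
     * Probability that at least two of {@code keys} items share a slot when each
     * item is placed independently and uniformly at random into {@code slots} slots.
     */
    static double collisionProbability(int keys, long slots) {
        if (keys > slots) {
            return 1.0; // pigeonhole: a collision is certain
        }
        // P(no collision) = product over i = 1 .. keys-1 of (1 - i/slots), summed in log space
        double logNoCollision = 0.0;
        for (int i = 1; i < keys; i++) {
            logNoCollision += Math.log1p(-(double) i / slots);
        }
        return 1.0 - Math.exp(logNoCollision);
    }
}
```

collisionProbability(23, 365) comes out just above one half, matching the classic result. For 100,000 keys, even a table with 2^32 slots still gives a collision probability well above one half under this model, which is why collision-free tables for a fixed key set are built with perfect-hash construction rather than by enlarging the table.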
If your keys are strings, your optimal strategy is a tree, not binary but n-branch at each character. If the keys are lower-case only it is easier still as you need just 26 whenever you create a branch.
We start with 26 keys. Follow the first character, say f
f might have a value associated with it. And it may have sub-trees. Look up a subtree of o. This leads to more subtrees then look up the next o. (You knew where that was leading!). If this doesn't have a value associated with it, or we hit a null sub-tree on the way, we know the value is not found.
You can optimise the space on the tree where you hit a point of uniqueness. Say you have a key january and it becomes unique at the 4th character. At this point where you assign the value you also store the actual string associated with it. In our example there may be one value associated with foo but the key it relates to may be food, not foo.
I think google search engines use a technique similar to this.
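The character-by-character tree described in this answer is usually called a trie. A minimal sketch for lower-case keys follows; it omits the space optimisation of storing the remaining key at the point of uniqueness, and the class name is made up.

```java
final class LowercaseTrie<V> {
    /** One node: up to 26 sub-trees (one per letter) and an optional value. */
    private static final class Node<V> {
        @SuppressWarnings("unchecked")
        final Node<V>[] children = (Node<V>[]) new Node[26];
        V value;
    }

    private final Node<V> root = new Node<>();

    /** Associates the value with the key, creating branches as needed. */
    void put(String key, V value) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            int index = indexOf(key.charAt(i));
            if (node.children[index] == null) {
                node.children[index] = new Node<>();
            }
            node = node.children[index];
        }
        node.value = value;
    }

    /** Returns the value for the key, or null if a sub-tree is missing or no value is stored. */
    V get(String key) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children[indexOf(key.charAt(i))];
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    private static int indexOf(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("only lower-case a-z is supported: " + c);
        }
        return c - 'a';
    }
}
```

Lookup cost depends on the key length rather than on the number of stored keys, at the price of one 26-entry array per node.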
The key question is what your key is. (No pun intended.) As others have pointed out, the goal is to minimize the number of hash collisions. If you can get the number of hash collisions to zero, i.e. your hash function generates a unique value for every key that is actually passed to it, you will have a perfect hash.
Note that in Java, a hash function really has two steps: First the key is run through the hashCode function for its class. Then we calculate an index value into the hash table by taking this value modulo the size of the hash table.
I think that people discussing the perfect hash function tend to forget that second step. Even if you wrote a hashCode function that generated a unique value for every key passed to it, you could still get an absolutely terrible hash if this value modulo the hash table size is not unique. For example, say you have 100 keys and your hashCode function returns the values 1, 1001, 2001, 3001, 4001, 5001, ... 99001. If your hash table has 100,000 slots, this would be a perfect hash. Every key gets its own slot. But if it has 1000 slots, they all hash to the same slot. It would be the worst possible hash.
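The example with hash codes 1, 1001, 2001, ... 99001 can be checked directly. The sketch below uses a plain modulo to map hash codes to slots, as described in this answer; actual hash table implementations may reduce the hash code differently. The class name is made up.

```java
import java.util.HashSet;
import java.util.Set;

final class SlotSpread {
    /** Counts how many distinct slots the 100 hash codes 1, 1001, ..., 99001 occupy. */
    static int distinctSlots(int tableSize) {
        Set<Integer> slots = new HashSet<>();
        for (int k = 0; k < 100; k++) {
            int hashCode = 1 + 1000 * k;
            slots.add(Math.floorMod(hashCode, tableSize));
        }
        return slots.size();
    }
}
```

With 100,000 slots the 100 keys occupy 100 distinct slots; with 1000 slots they all land in slot 1.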
So consider constructing a good hash function. Take the extreme cases. Suppose that your key is a date. You know that the dates will all be in January of the same year. Then using the day of the month as the hash value should be as good as it's going to get: everything will hash to a unique integer in a small range. On the other hand, if your dates were all the first of the month for many years and many months, taking the day of the month would be a terrible hash, as every actual key would map to "1".
My point being that if you really want to optimize your hash, you need to know the nature of your data. What is the actual range of values that you will get?
