Permutations of subsets and subsequent run-time analysis

I have 2 questions:
I would like to generate the permutations of subsets. For example, there are 20 possible amino acids and 5 positions where they can occur. What are the total permutations that can occur (as text)?
Once I have this list of permutations, certain values will be assigned to each one, and I would like to look up any given permutation at run time. The first idea that comes to mind is a look-up table, but I was wondering if there might be a better way of doing this.

You want combinations of length 5, not permutations. This is a standard problem, which can be solved with recursion. Use CombinationGenerator if you don't want to write it yourself.
Number the combinations using base 20 (not to be confused with the chemical definition of base). Use a hashtable if you'll be storing data for a limited subset of combinations, or a look-up array if you'll be storing data for most of them.
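The base-20 numbering can be sketched roughly as follows. This is only an illustration under some assumptions: the class name PeptideIndex, the one-letter alphabet string and its ordering are made up for the example, and every length-5 sequence over the 20 letters is treated as a 5-digit base-20 number.

```java
class PeptideIndex {
    // Assumed alphabet: the 20 standard amino acids as one-letter codes.
    // The ordering is arbitrary; it only has to stay fixed.
    static final String ALPHABET = "ACDEFGHIKLMNPQRSTVWY";
    static final int LENGTH = 5;
    static final int TOTAL = 3_200_000; // 20^5 possible sequences

    // Read the sequence as a 5-digit base-20 number.
    static int encode(String seq) {
        if (seq.length() != LENGTH) {
            throw new IllegalArgumentException("expected " + LENGTH + " letters");
        }
        int index = 0;
        for (int i = 0; i < seq.length(); i++) {
            int digit = ALPHABET.indexOf(seq.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("unknown letter: " + seq.charAt(i));
            }
            index = index * ALPHABET.length() + digit;
        }
        return index;
    }

    // Inverse of encode: peel off base-20 digits from the right.
    static String decode(int index) {
        char[] out = new char[LENGTH];
        for (int i = LENGTH - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt(index % ALPHABET.length());
            index /= ALPHABET.length();
        }
        return new String(out);
    }

    public static void main(String[] args) {
        double[] values = new double[TOTAL];   // look-up array, one slot per sequence
        values[encode("ACDEF")] = 1.5;         // store a value
        System.out.println(values[encode("ACDEF")]); // look it up again
    }
}
```

With a dense array like this, a look-up is a single index computation. If only a small fraction of the sequences carry values, the same index could instead serve as the key of a hash table.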

Java repeated Cartesian product and combinations

Is there an equivalent to Python's product and combinations functions?
In other words, given a set of Integers and a number REPEAT,
is there a way to create a list of lists, an array of arrays, or something similar that contains all the ways to choose REPEAT objects from the set, with or without choosing the same element twice?
The main issue is that the number of repetitions is not known at compile time.
Example of product (with choosing twice):
input [1,2,3,4,5] 3
output [[1,1,1], [1,1,2] ... ]
Example of combination (without choosing twice):
input [1,2,3,4,5] 3
output [[1,2,3], [1,2,4], [1,2,5] ... ]

Posting @RC's comment as an answer so that others can find this library.
Perhaps there is a more standard way, or some simple code to do it in Java (which people could copy and adapt without importing a package just for this).
But anyway, this seems like a good library for exactly that.
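For anyone who wants something to copy without a library, here is a minimal recursive sketch. It is not the library mentioned above; the class and method names (Tuples, product, combinations) are made up for the example and follow the naming of Python's itertools: product allows the same element to be chosen again, combinations does not. The repeat count is an ordinary run-time argument.

```java
import java.util.ArrayList;
import java.util.List;

class Tuples {
    // All ordered selections of `repeat` items, repeats allowed
    // (in the spirit of itertools.product(items, repeat=r)).
    static List<List<Integer>> product(List<Integer> items, int repeat) {
        List<List<Integer>> out = new ArrayList<>();
        build(items, repeat, 0, false, new ArrayList<>(), out);
        return out;
    }

    // All selections of r items taken at increasing positions, no repeats
    // (in the spirit of itertools.combinations(items, r)).
    static List<List<Integer>> combinations(List<Integer> items, int r) {
        List<List<Integer>> out = new ArrayList<>();
        build(items, r, 0, true, new ArrayList<>(), out);
        return out;
    }

    private static void build(List<Integer> items, int remaining, int start,
                              boolean distinct, List<Integer> current,
                              List<List<Integer>> out) {
        if (remaining == 0) {
            out.add(new ArrayList<>(current)); // copy, because current is reused
            return;
        }
        // For combinations only positions after the last pick are allowed.
        for (int i = distinct ? start : 0; i < items.size(); i++) {
            current.add(items.get(i));
            build(items, remaining - 1, i + 1, distinct, current, out);
            current.remove(current.size() - 1); // backtrack
        }
    }
}
```

Note that the result lists grow quickly (n^r tuples for product), so for large inputs an iterator that produces one tuple at a time would be kinder to memory.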

Implementing efficient data structure using Arrays only

As part of my programming course I was given an exercise to implement my own String collection. I was planning on using an ArrayList or a similar collection, but one of the constraints is that we are not allowed to use any Java API to implement it, so only arrays are allowed. I could have implemented this using plain arrays; however, efficiency is very important, as is the amount of data that this code will be tested with. It was suggested that I use hash tables or ordered trees, as they are more efficient than arrays. After doing some research I decided to go with hash tables because they seemed easy to understand and implement, but once I started writing code I realised it is not as straightforward as I thought.
So here are the problems I have come across; I would like some advice on the best approach to solving them, again with efficiency in mind:
ACTUAL SIZE: If I understood correctly, hash tables are not ordered (indexed), which means there are going to be gaps between items because the hash function gives scattered indices. So how do I know when the array is full and I need to resize it?
RESIZE: One of the difficulties is that I need to create a dynamic data structure using arrays. So if I have an array String[100], once it gets full I will need to resize it by some amount; I decided to increase it by 100 each time. Once I do that, I would need to change the positions of all existing values, since their hash keys will be different, as the key is calculated like this:
int position = "orange".hashCode() % currentArraySize;
So if I try to find a certain value, its hash key will be different from what it was when the array was smaller.
HASH FUNCTION: I was also wondering whether the built-in hashCode() method in the String class is efficient and suitable for what I am trying to implement, or whether it is better to create my own.
DEALING WITH MULTIPLE OCCURRENCES: One of the requirements is to be able to add multiple words that are the same, because I need to be able to count how many times a word is stored in my collection. Since they are going to have the same hash code, I was planning to add the next occurrence at the next index, hoping that there will be a gap. I don't know if it is the best solution, but here is how I implemented it:
public int count(String word) {
    int count = 0;
    while (collection[(word.hashCode() % size) + count] != null
            && collection[(word.hashCode() % size) + count].equals(word))
        count++;
    return count;
}
Thank you in advance for your advice. Please ask if anything needs to be clarified.
P.S. The length of words is not fixed and varies greatly.
UPDATE: Thank you for your advice. I know I made a few stupid mistakes there; I will try to do better. I took all your suggestions and quickly came up with the following structure. It is not elegant, but I hope it is roughly what you meant. I did have to make a few judgement calls, such as the bucket size; for now I halve the size of elements, but is there a way to calculate it, or some general value? Another uncertainty was the factor by which to grow my array: should I multiply by some number n, or is adding a fixed number also acceptable? I was also wondering about general efficiency, because I am actually creating instances of classes; but String is a class too, so I am guessing the difference in performance should not be too big?

ACTUAL SIZE: The built-in Java HashMap just resizes when the total number of elements exceeds the number of buckets multiplied by a number called the load factor, which is by default 0.75. It does not take into account how many buckets are actually full. You don't have to, either.
RESIZE: Yes, you'll have to rehash everything when the table is resized, which means recomputing each element's position from its hash code and the new array size.
So if I try to find a certain value, its hash key will be different from what it was when the array was smaller.
Yup.
HASH FUNCTION: Yes, you should use the built in hashCode() function. It's good enough for basic purposes.
DEALING WITH MULTIPLE OCCURRENCES: This is complicated. One simple solution would just be to have the hash entry for a given string also keep count of how many occurrences of that string are present. That is, instead of keeping multiple copies of the same string in your hash table, keep an int along with each String counting its occurrences.
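Putting these points together, a rough arrays-only sketch could look like the one below. It is one possible layout, not the only one: the class name WordCounts, the use of linear probing, the initial capacity of 16 and growth by doubling are all assumptions made for the example. The 0.75 load factor mirrors the HashMap default mentioned above, and each distinct word is stored once with an int counter next to it.

```java
class WordCounts {
    private static final double LOAD_FACTOR = 0.75;

    private String[] keys = new String[16];
    private int[] counts = new int[16];
    private int distinct = 0; // number of occupied slots

    public void add(String word) {
        // Grow before the table gets too full. This may trigger slightly early
        // when the word is already present, which is harmless.
        if (distinct + 1 > keys.length * LOAD_FACTOR) {
            resize();
        }
        int i = findSlot(keys, word);
        if (keys[i] == null) {
            keys[i] = word;
            distinct++;
        }
        counts[i]++;
    }

    public int count(String word) {
        int i = findSlot(keys, word);
        return keys[i] == null ? 0 : counts[i];
    }

    // Linear probing: start at the home slot and walk forward (wrapping around)
    // until the word or an empty slot is found. The load factor guarantees
    // that an empty slot always exists.
    private static int findSlot(String[] table, String word) {
        int i = (word.hashCode() & 0x7fffffff) % table.length; // never negative
        while (table[i] != null && !table[i].equals(word)) {
            i = (i + 1) % table.length;
        }
        return i;
    }

    // Double the capacity and re-insert every word, because each position
    // depends on the array length.
    private void resize() {
        String[] oldKeys = keys;
        int[] oldCounts = counts;
        keys = new String[oldKeys.length * 2];
        counts = new int[oldCounts.length * 2];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                int i = findSlot(keys, oldKeys[j]);
                keys[i] = oldKeys[j];
                counts[i] = oldCounts[j];
            }
        }
    }
}
```

Doubling rather than adding a fixed 100 keeps the total rehashing work proportional to the number of insertions. Removal is left out on purpose, since deleting from a linear-probing table needs extra care.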

So how do I know when the array is full and I need to resize it?
You keep track of the size, as HashMap does. When the size used > capacity * load factor, you grow the underlying array, either as a whole or in part.
int position = "orange".hashCode() % currentArraySize;
Some things to consider.
The % of a negative value is zero or negative, so it cannot be used directly as an array index.
Math.abs can return a negative value (for Integer.MIN_VALUE).
Using & with a bit mask is faster; however, you need a size which is a power of 2.
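A small sketch of these three points; the class and method names are made up for the illustration.

```java
class BucketIndex {
    // Naive version: % keeps the sign of the hash, so the result can be negative.
    static int naive(int hash, int size) {
        return hash % size;
    }

    // Clearing the sign bit first always gives a value in 0..size-1.
    static int signCleared(int hash, int size) {
        return (hash & 0x7fffffff) % size;
    }

    // Bit mask: only valid when size is a power of two.
    static int masked(int hash, int size) {
        return hash & (size - 1);
    }

    public static void main(String[] args) {
        System.out.println(naive(-7, 4));                // negative index
        System.out.println(Math.abs(Integer.MIN_VALUE)); // still negative
        System.out.println(signCleared(-7, 4));          // in range
        System.out.println(masked(-7, 4));               // in range
    }
}
```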
I was also wondering whether the built-in hashCode() method in the String class is efficient and suitable for what I am trying to implement, or whether it is better to create my own.
The built-in hashCode is cached, so it is fast. However, it is not a great hash: it has poor randomness in the lower bits, and in the higher bits for short strings. You might want to implement your own hashing strategy, possibly a 64-bit one.
DEALING WITH MULTIPLE OCCURRENCES:
This is usually done with a counter for each key. This way you can have say 32767 duplicates (if you use short) or 2 billion (if you use int) duplicates of the same key/element.

Sorting algorithm that is stable in sorting ten million objects

I am trying to sort 10 million Account objects in an array or ArrayList. The Account class implements the Comparable interface and has fields such as age, account number, etc. I need to sort this array or list by age, and I need to keep the relative ordering of accounts with the same age.
I am thinking of using merge sort for this, because it is a stable comparison sort that keeps the relative ordering, and its worst-case time of O(n log n) is the best possible for a comparison sort. However, a binary tree sort would have a similar effect with the same time complexity for this number of objects. What do you think?

If you really want to sort by 'age', how about using counting sort (http://en.wikipedia.org/wiki/Counting_sort)? You can maintain the original relative order with at most 2 passes over the data, or 2n lookups.
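A minimal sketch of the two-pass idea, assuming ages are small non-negative integers with a known upper bound. The Account class here is a hypothetical stand-in with just two fields, and maxAge is a parameter chosen by the caller.

```java
class CountingSortByAge {
    // Hypothetical stand-in for the real Account class.
    static class Account {
        final int age;
        final long number;

        Account(int age, long number) {
            this.age = age;
            this.number = number;
        }
    }

    // Stable counting sort; every age must lie in 0..maxAge.
    static Account[] sortByAge(Account[] in, int maxAge) {
        int[] start = new int[maxAge + 2];
        for (Account a : in) {
            start[a.age + 1]++;          // pass 1: count accounts per age
        }
        for (int i = 1; i < start.length; i++) {
            start[i] += start[i - 1];    // start[age] = first output index for that age
        }
        Account[] out = new Account[in.length];
        for (Account a : in) {
            out[start[a.age]++] = a;     // pass 2: place in input order, so ties stay stable
        }
        return out;
    }
}
```

The cost is O(n + maxAge) time plus one extra array of n references, which for 10 million accounts is modest compared with the objects themselves.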

From the javadoc of Collections.sort():
This sort is guaranteed to be stable: equal elements will not be reordered as a result of the sort.
So don't reinvent the wheel, and just use the standard sort algorithm that the JDK provides: Collections.sort() or, better if using Java 8: List.sort(). Without any warmup that would allow the JIT to optimize the code, sorting 10M accounts with an age between 0 and 30 takes 1.4 seconds on my machine.
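For illustration, sorting by age with the standard library might look like this. The Account class is a hypothetical stand-in, and a Comparator on the age field is used here instead of relying on the class's own compareTo, whose ordering is not shown in the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StableLibrarySort {
    // Hypothetical stand-in for the real Account class.
    static class Account {
        final int age;
        final long number;

        Account(int age, long number) {
            this.age = age;
            this.number = number;
        }
    }

    // Returns a sorted copy; accounts with equal age keep their original order.
    static List<Account> sortedByAge(List<Account> accounts) {
        List<Account> copy = new ArrayList<>(accounts);
        copy.sort(Comparator.comparingInt((Account a) -> a.age));
        return copy;
    }
}
```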

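I prefer merge sort because it is stable, although the usual array implementation needs O(n) auxiliary space.
Quicksort could also be considered if memory is the tighter constraint, but typical in-place implementations are not stable, so it would not keep the relative order of accounts with the same age.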

I think you can do it in serial steps:
step 1: split the 10 million objects into 2^N slices, and sort each slice;
step 2: merge two slices into a new slice by repeatedly selecting the smaller of their head objects;
step 3: repeat step 2 until only 1 slice remains.
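The merge in step 2 could be sketched like this. To keep the example short, each object is represented as an int pair {age, id}, which is purely an assumption for the illustration. Taking from the left slice on ties is what keeps the overall sort stable.

```java
class SliceMerge {
    // Merge two slices that are each already sorted by age (element [0]).
    // On equal ages the element from the left slice goes first.
    static int[][] merge(int[][] left, int[][] right) {
        int[][] out = new int[left.length + right.length][];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            out[k++] = (left[i][0] <= right[j][0]) ? left[i++] : right[j++];
        }
        while (i < left.length) {
            out[k++] = left[i++];   // copy what is left of the left slice
        }
        while (j < right.length) {
            out[k++] = right[j++];  // copy what is left of the right slice
        }
        return out;
    }
}
```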

It depends on parameters such as how optimal a solution the problem requires, what your main operations after sorting are, and whether the numbers are 32-bit or 64-bit; i.e. what are your project requirements?
Look at the difference between internal sorting and external sorting. Your approach requires an external sorting mechanism.
For example, if they want to count the ages of the employees, you would probably use counting sort, which can sort the data in memory.
But for fairly random data, you need external sorting.

query for Math.random to generate only a certain amount of each number needed

I am curious if anyone knows a way to use Math.random to generate random numbers between, say, 0 and 3, such that once it has generated two 0's or two 1's it rules out the possibility of generating those numbers again?
This is for a game assignment for college, and all I have left to do is make it generate only two of each number, with one 3. If anyone knows how, this would be very helpful (even if it is using something other than Math.random).
The language is Java.

So, you basically want to keep track of the number draws, right?
A possible solution is to use an array whose length is the valid range of your random numbers, where each cell counts the occurrences of the respective number. Then, for each draw, you check the contents of the respective cell in the array to see if you reached the limit. If you did, then redraw and repeat.
Note: at the time of writing this, the language of use is unknown, but the solution is generic enough to be implemented in virtually any language that provides a random() function.
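In Java, the counting-array idea might be sketched like this. The class name LimitedDraws and its constructor are made up for the example; the limits {2, 2, 2, 1} correspond to two each of 0, 1 and 2 and a single 3.

```java
class LimitedDraws {
    private final int[] limits; // limits[n] = how many times n may be drawn
    private final int[] drawn;  // drawn[n]  = how many times n has been drawn so far
    private int remaining;      // total draws still available

    LimitedDraws(int[] limits) {
        this.limits = limits.clone();
        this.drawn = new int[limits.length];
        for (int limit : limits) {
            remaining += limit;
        }
    }

    // Draw a number that has not reached its limit yet, redrawing as needed.
    int next() {
        if (remaining == 0) {
            throw new IllegalStateException("all numbers are used up");
        }
        while (true) {
            int n = (int) (Math.random() * limits.length);
            if (drawn[n] < limits[n]) {
                drawn[n]++;
                remaining--;
                return n;
            }
        }
    }

    public static void main(String[] args) {
        LimitedDraws draws = new LimitedDraws(new int[] {2, 2, 2, 1});
        for (int i = 0; i < 7; i++) {
            System.out.print(draws.next() + " ");
        }
        System.out.println();
    }
}
```

The remaining counter matters: without it, the redraw loop would spin forever once every number has reached its limit.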

storing sets of integers to check if a certain set has already been mentioned

I've come across an interesting problem which I would love to get some input on.
I have a program that generates sets of numbers (based on some predefined conditions). Each set contains up to 6 integers in the range 1 to 100, which do not have to be unique.
I would like to somehow store every set that is created so that I can quickly check if a certain set with the exact same numbers (order doesn't matter) has previously been generated.
Speed is a priority in this case as there might be up to 100k sets stored before the program stops (maybe more, but most of the time probably less)! Would anyone have any recommendations as to what data structures I should use and how I should approach this problem?
What I have currently is this:
Sort each set before storing it into a HashSet of Strings. The string is simply each number in the sorted set with some separator.
For example, the set {4, 23, 67, 67, 71} would get encoded as the string "4-23-67-67-71" and stored into the HashSet. Then for every new set generated, sort it, encode it and check if it exists in the HashSet.
Thanks!

if you break it into pieces it seems to me that
creating a set (generate 6 numbers, sort, stringify) runs in O(1)
checking if this string exists in the hashset is O(1)
inserting into the hashset is O(1)
you do this n times, which gives you O(n).
this is already optimal as you have to touch every element once anyways :)
you might run into problems depending on the range of your random numbers.
e.g. assume you generate only numbers between one and one, then there's obviously only one possible outcome ("1-1-1-1-1-1") and you'll have only collisions from there on. however, as long as the number of possible sequences is much larger than the number of elements you generate i don't see a problem.
one tip: if you know the number of generated elements beforehand it would be wise to initialize the hashset with the correct number of elements (i.e. new HashSet<String>( 100000 ) );
p.s. now with other answers popping up i'd like to note that while there may be room for improvement on a microscopic level (i.e. using language specific tricks), your overall approach can't be improved.
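for reference, the sort-encode-check approach from the question could be sketched as follows. the class and method names are made up for the example, and the '-' separator is the one used in the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SeenSets {
    // 100000 is only the initial capacity hint from the tip above.
    private final Set<String> seen = new HashSet<>(100000);

    // Sort a copy so that order does not matter, then join with '-'.
    static String encode(int[] numbers) {
        int[] sorted = numbers.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(sorted[i]);
        }
        return sb.toString();
    }

    // Returns true if this multiset has not been seen before.
    boolean addIfNew(int[] numbers) {
        return seen.add(encode(numbers));
    }
}
```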

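Create a class SetOfIntegers.
Implement a hashCode() method that generates reasonably unique hash values, plus a matching equals().
Use a HashMap to store your elements, like put(hashValue, instance).
Use containsKey(hashValue) to check whether the same hashValue is already present; because two different sets can share a hash value, confirm a match with equals().
This way you will avoid sorting and conversion/formatting of your sets, provided hashCode() and equals() do not depend on the order of the elements.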

Just use a java.util.BitSet for each set, adding integers to it with the set(int bitIndex) method, so you don't have to sort anything. Before adding a new BitSet, check a HashMap for an already existing equal BitSet. This will be very fast. Don't ever sort the values and use toString for this purpose if speed is important.
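A minimal sketch of the BitSet idea, using a HashSet of BitSets rather than a HashMap for brevity; the class and method names are made up for the example. One caveat worth stating: a BitSet only records whether a number is present, so {67, 67, 71} and {67, 71} produce the same key. The sketch therefore assumes that repeated numbers within a set do not need to be distinguished.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

class SeenBitSets {
    private final Set<BitSet> seen = new HashSet<>();

    // One bit per possible value 1..100; order and repetition are not recorded.
    static BitSet toBitSet(int[] numbers) {
        BitSet bits = new BitSet(101);
        for (int n : numbers) {
            bits.set(n);
        }
        return bits;
    }

    // Returns true if no set with the same distinct numbers was seen before.
    boolean addIfNew(int[] numbers) {
        return seen.add(toBitSet(numbers));
    }
}
```

BitSet implements equals() and hashCode() based on which bits are set, which is what makes it usable as a hash key here, as long as the stored instances are not modified afterwards.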
