I'm trying to create different selection methods for a genetic algorithm I'm working on, but one problem I meet in all selection methods is that the fitness of each node must be different. This is a problem for me, as my fitness calculator is quite basic and will yield several identical fitnesses.
public static Map<String, Double> calculateRouletteSelection(Map<String, Double> population) {
    String[] keys = new String[population.size()];
    Double[] values = new Double[population.size()];
    Double[] unsortedValues = new Double[population.size()];
    int index = 0;
    for (Map.Entry<String, Double> mapEntry : population.entrySet()) {
        keys[index] = mapEntry.getKey();
        values[index] = mapEntry.getValue();
        unsortedValues[index] = mapEntry.getValue();
        index++;
    }
    Arrays.sort(values);

    ArrayList<Integer> numbers = new ArrayList<>();
    while (numbers.size() < values.length / 2) {
        int random = rnd.nextInt(values.length);
        if (!numbers.contains(random)) {
            numbers.add(random);
        }
    }

    HashMap<String, Double> finalHashMap = new HashMap<>();
    for (int i = 0; i < numbers.size(); i++) {
        for (int j = 0; j < values.length; j++) {
            if (values[numbers.get(i)] == unsortedValues[j]) {
                finalHashMap.put(keys[j], unsortedValues[j]);
            }
        }
    }
    return finalHashMap;
}
90% of all my different selection methods are the same, so I'm sure that if I can solve it for one I can solve it for all.
Any help on what I'm doing wrong would be appreciated.
EDIT: I saw that I'm meant to post the general behaviour of what's happening, so essentially the method takes in a HashMap<>, sorts the values based on their fitness, picks half of the sorted values at random, and adds these to a new HashMap<> with their corresponding chromosomes.
I think you'd be much better off using collection classes.
List<Map.Entry<String, Double>> sorted = new ArrayList<>(population.entrySet());
// sort by fitness
Collections.sort(sorted, Comparator.comparing(Map.Entry::getValue));

Set<Integer> usedIndices = new HashSet<>(); // keep track of used indices
Map<String, Double> result = new HashMap<>();
while (result.size() < sorted.size() / 2) {
    int index = rnd.nextInt(sorted.size());
    if (!usedIndices.add(index)) {
        continue; // was already used
    }
    Map.Entry<String, Double> survivor = sorted.get(index);
    result.put(survivor.getKey(), survivor.getValue());
}
return result;
But, as Sergey stated, I don't believe this is what you need for your algorithm; you do need to favor the individuals with higher fitness.
As mentioned in the comments, in roulette wheel selection order is not important, only weights are. A roulette wheel is like a pie chart with different sections occupying different portions of the disk, but in the end they all sum up to unit area (the area of the disk).
I'm not sure if there is an equivalent in Java, but in C++ you have std::discrete_distribution. It generates a distribution [0,n) which you initialise with weights representing the probability of each of those integers being picked. So what I normally do is have the IDs of my agents in an array and their corresponding fitness values in another array. Order is not important as long as indices match. I pass the array of fitness values to the discrete distribution, which returns an integer interpretable as an array index. I then use that integer to select the individual from the other array.
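As far as I know there is no exact built-in equivalent in Java, but you can emulate a discrete distribution with a cumulative-weight array and a binary search. A minimal sketch (class and method names are my own, not from your code):

import java.util.Arrays;
import java.util.Random;

public class RouletteWheel {
    private static final Random rnd = new Random();

    // Returns an index in [0, weights.length), picked with probability
    // proportional to its weight; weights must be non-negative.
    public static int sample(double[] weights) {
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            cumulative[i] = total;
        }
        double r = rnd.nextDouble() * total;
        int index = Arrays.binarySearch(cumulative, r);
        // binarySearch returns (-(insertion point) - 1) when the value
        // is absent, which is the usual case for a random double
        return index >= 0 ? index : -index - 1;
    }
}

Note that duplicate fitness values are not a problem here: equal weights simply get equal-sized slices of the wheel, so the original concern about identical fitnesses goes away.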
I'm working on the following exercise from HackerRank: https://www.hackerrank.com/challenges/migratory-birds/problem?isFullScreen=false
So far I need to optimize my source code in order to pass the tests related to execution time.
This is my source code:
class Result {
    /*
     * Complete the 'migratoryBirds' function below.
     *
     * The function is expected to return an INTEGER.
     * The function accepts INTEGER_ARRAY arr as parameter.
     */
    public static int migratoryBirds(List<Integer> arr) {
        int coincidences = 0;
        int maxValuesPerCategory = 0;
        // I'm using TreeMap because sorting is the key on this exercise
        Map<Integer, Integer> results = new TreeMap<>();
        List<Integer> targetKeys = new ArrayList<>();

        // 1. classifying values by coincidences
        for (Integer element : arr) {
            coincidences = Collections.frequency(arr, element);
            results.put(element, coincidences);
        }

        /*
         * 2. filtering categories by highest coincidences;
         * if there is more than one, choose the label with the lowest value.
         * Example: 4=5, 3=5 -> output = 3
         */
        // getting the value with most coincidences
        maxValuesPerCategory = Collections.max(results.values());
        // iterate the map to identify which keys have the max value
        Set<Integer> keySet = results.keySet();
        for (Integer key : keySet) {
            if (results.get(key) == maxValuesPerCategory) {
                targetKeys.add(key);
            }
        }

        // 3. sorting the list ascending to obtain the lowest value
        Collections.sort(targetKeys);
        // get the first value (it should be the lowest label category)
        return targetKeys.get(0);
    }
}
I would like to ask for suggestions on how to optimize stages 2 and 3 because, from my point of view, the first stage is efficient in terms of execution; but if you have suggestions about it, please let me know.
Thanks a lot in advance.
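For what it's worth, the expensive part is actually stage 1: Collections.frequency rescans the whole list for every element, which makes that loop O(n^2). A single-pass count avoids this entirely. A sketch, assuming the problem's stated constraint that bird types are the integers 1 to 5:

public static int migratoryBirds(List<Integer> arr) {
    // types are 1..5 per the problem constraints, so a small array works;
    // a HashMap<Integer, Integer> would generalise the same idea
    int[] counts = new int[6];
    for (int bird : arr) {
        counts[bird]++;
    }
    int best = 1;
    for (int type = 2; type < counts.length; type++) {
        // strict '>' keeps the lowest type on ties
        if (counts[type] > counts[best]) {
            best = type;
        }
    }
    return best;
}

This is O(n) overall and needs no sorting at all, since tracking the maximum while scanning replaces stages 2 and 3.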
I have a HashMap in Java of this form: HashMap<String, Integer> frequency. The key is a string holding the name of a movie and the value is the frequency of said movie.
My program takes input from users, so whenever someone adds a video to their favorites I go into the hashmap and increment its frequency.
Now the problem is that at some point I need to take the k most frequent movies. I've found that I could use bucket sort or heapsort as in this leetcode problem (check the first comment), however I am not sure which is more efficient in my case. My hashmap updates constantly, therefore I would need to call the sorting algorithm again every time a frequency changes.
From my understanding, it takes O(N) time to build the map, where 'N' is the number of movies even with duplicates as it needs to add to the frequency, which gets me 'M' unique movie titles. Would that mean that heapsort will result in O(M * log(k)) and bucketsort O(M) for any given k?
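For reference, the heap variant translates to Java roughly as below. This is a sketch assuming the frequency map is already built; it keeps a min-heap of size k, which is where the O(M * log(k)) bound comes from:

public static List<String> topK(Map<String, Integer> frequency, int k) {
    // min-heap on frequency: the root is the weakest of the current top k
    PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(Map.Entry.comparingByValue());
    for (Map.Entry<String, Integer> e : frequency.entrySet()) {
        heap.offer(e);
        if (heap.size() > k) {
            heap.poll(); // evict the least frequent entry
        }
    }
    List<String> result = new ArrayList<>();
    while (!heap.isEmpty()) {
        result.add(heap.poll().getKey());
    }
    Collections.reverse(result); // most frequent first
    return result;
}

The catch, as the answers below discuss, is that you would re-run this after every update; a self-sorting structure avoids that.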
Having a map that sorts on values (the thing you map to) isn't a thing, unfortunately. You could instead have a set whose keys sort themselves on frequency, but given that frequency is the key at that point, you couldn't look up entries in this set without knowing the frequency beforehand which eliminates the point of the exercise.
One strategy that comes to mind is to have 2 separate data structures. One serves to let you look up the actual object based on the name of the movie, the other is to be self-sorting:
@Data
public class MovieFrequencyTuple implements Comparable<MovieFrequencyTuple> {
    @NonNull private final String name;
    private int frequency;

    public void incrementFrequency() {
        frequency++;
    }

    @Override public int compareTo(MovieFrequencyTuple other) {
        int c = Integer.compare(frequency, other.frequency);
        if (c != 0) return -c; // higher frequency sorts first
        return name.compareTo(other.name);
    }
}
and with that available to you:
SortedSet<MovieFrequencyTuple> frequencies = new TreeSet<>();
Map<String, MovieFrequencyTuple> movies = new HashMap<>();

public int increment(String movieName) {
    MovieFrequencyTuple tuple = movies.get(movieName);
    if (tuple == null) {
        tuple = new MovieFrequencyTuple(movieName);
        movies.put(movieName, tuple);
    }
    // Self-sorting data structures will just fail
    // to do the job if you modify a sorting order on
    // an object already in the collection. Thus,
    // we take it out, modify, put it back in.
    frequencies.remove(tuple);
    tuple.incrementFrequency();
    frequencies.add(tuple);
    return tuple.getFrequency();
}
public int get(String movieName) {
    MovieFrequencyTuple tuple = movies.get(movieName);
    if (tuple == null) return 0;
    return tuple.getFrequency();
}

public List<String> getTop10() {
    var out = new ArrayList<String>();
    for (MovieFrequencyTuple tuple : frequencies) {
        out.add(tuple.getName());
        if (out.size() == 10) break;
    }
    return out;
}
Each operation is amortized O(1) or O(log n), even the top-10 operation. So if you run 'increment a movie's frequency, then obtain the top 10' a million times, with n = the number of times you do that, the worst-case scenario is O(n log n) performance.
NB: This uses Lombok for constructors, getters, etc.; if you don't like that, have your IDE generate these things.
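Usage would look something like this (assuming the fields and methods above live in one class):

increment("Heat");
increment("Heat");
increment("Alien");
System.out.println(get("Heat"));  // prints 2
System.out.println(getTop10());   // prints [Heat, Alien] (highest frequency first)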
I have an algorithmic problem at hand. To easily explain the problem, I will be using a simple analogy.
I have an input file
Country,Exports
Austrailia,Sheep
US, Apple
Austrialia,Beef
End Goal:
I have to find the common products between pairs of countries, so:
{"Austrailia","New Zealand"}:{"apple","sheep"}
{"Austrailia","US"}:{"apple"}
{"New Zealand","US"}:{"apple","milk"}
Process:
I read in the input and store it in a TreeMap<String, List<String>>, where the Strings in the List are interned due to many duplicates.
Essentially, I am aggregating by country: the key is a country and the values are its exports.
{"austrailia":{"apple","sheep","koalas"}}
{"new zealand":{"apple","sheep","milk"}}
{"US":{"apple","beef","milk"}}
I have about 1200 keys (countries), and the total number of values (exports) is 80 million altogether.
I sort all the values of each key:
{"austrailia":{"apple","sheep","koalas"}} -- > {"austrailia":{"apple","koalas","sheep"}}
This is fast as there are only 1200 Lists to sort.
for (k1 : keys)
    for (k2 : keys)
        if (k1.compareTo(k2) < 0) { // don't want to double compare
            List<String> intersectList = intersectList_func(k1's exports, k2's exports);
            countriespair.put({k1, k2}, intersectList);
        }
This code block takes very long. I realise it is O(n^2), around 1200*1200 comparisons; it has been running for almost 3 hours so far.
Is there any way I can speed it up or optimise it?
Is there a better algorithm, or are there other technologies to consider?
Edit:
Since both lists are sorted beforehand, intersectList is O(n), where n is the length of the shorter list, and NOT O(n^2) as discussed below.
private static List<String> intersectList(List<String> listOne, List<String> listTwo) {
    int i = 0, j = 0;
    List<String> listResult = new LinkedList<String>();
    while (i != listOne.size() && j != listTwo.size()) {
        int compareVal = listOne.get(i).compareTo(listTwo.get(j));
        if (compareVal == 0) {
            listResult.add(listOne.get(i));
            i++;
            j++;
        }
        else if (compareVal < 0) i++;
        else j++;
    }
    return listResult;
}
Update 22 Nov
My current implementation is still running, almost 18 hours in now. :|
Update 25 Nov
I ran the new implementation as suggested by Vikram and a few others. It has been running since Friday.
My question is: how does grouping by exports rather than by country save computational complexity? I find that the complexity is the same. As Groo mentioned, I find that the complexity of the second part is O(E*C^2), where E is exports and C is countries.
This can be done in one statement as a self-join using SQL:
Test data. First create a test data set:
Lines <- "Country,Exports
Austrailia,Sheep
Austrailia,Apple
New Zealand,Apple
New Zealand,Sheep
New Zealand,Milk
US,Apple
US,Milk
"
DF <- read.csv(text = Lines, as.is = TRUE)
sqldf. Now that we have DF, issue this command:
library(sqldf)
sqldf("select a.Country, b.Country, group_concat(Exports) Exports
       from DF a join DF b using (Exports)
       where a.Country < b.Country
       group by a.Country, b.Country
")
giving this output:
Country Country Exports
1 Austrailia New Zealand Sheep,Apple
2 Austrailia US Apple
3 New Zealand US Apple,Milk
With index. If it's too slow, add an index to the Country column (and be sure not to forget the main. parts):
sqldf(c("create index idx on DF(Country)",
        "select a.Country, b.Country, group_concat(Exports) Exports
         from main.DF a join main.DF b using (Exports)
         where a.Country < b.Country
         group by a.Country, b.Country
"))
If you run out of memory, then add the dbname = tempfile() sqldf argument so that it uses disk.
Store something like the following data structure (pseudocode):
ValuesSet = {
    apple = {"Austrailia", "New Zealand", ..}
    sheep = {"Austrailia", "New Zealand", ..}
}

for k in ValuesSet
    for k1 in k.values()
        for k2 in k.values()
            if (k1 < k2)
                Set(k1, k2).add(k)
Time complexity: O(number of distinct pairs with similar products).
Note: I might be wrong, but I do not think you can reduce this time complexity.
Following is a Java implementation for your problem:
public class PairMatching {

    HashMap Country;
    ArrayList CountNames;
    HashMap ProdtoIndex;
    ArrayList ProdtoCount;
    ArrayList ProdNames;
    ArrayList[][] Pairs;

    int products = 0;
    int countries = 0;

    public void readfile(String filename) {
        try {
            BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
            String line;
            CountNames = new ArrayList();
            Country = new HashMap<String, Integer>();
            ProdtoIndex = new HashMap<String, Integer>();
            ProdtoCount = new ArrayList<ArrayList>();
            ProdNames = new ArrayList();
            products = countries = 0;
            while ((line = br.readLine()) != null) {
                String[] s = line.split(",");
                s[0] = s[0].trim();
                s[1] = s[1].trim();
                int k;
                if (!Country.containsKey(s[0])) {
                    CountNames.add(s[0]);
                    Country.put(s[0], countries);
                    k = countries;
                    countries++;
                } else {
                    k = (Integer) Country.get(s[0]);
                }
                if (!ProdtoIndex.containsKey(s[1])) {
                    ProdNames.add(s[1]);
                    ArrayList n = new ArrayList();
                    ProdtoIndex.put(s[1], products);
                    n.add(k);
                    ProdtoCount.add(n);
                    products++;
                } else {
                    int ind = (Integer) ProdtoIndex.get(s[1]);
                    ArrayList c = (ArrayList) ProdtoCount.get(ind);
                    c.add(k);
                }
            }
            System.out.println(CountNames);
            System.out.println(ProdtoCount);
            System.out.println(ProdNames);
        } catch (FileNotFoundException ex) {
            Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    void FindPairs() {
        Pairs = new ArrayList[countries][countries];
        for (int i = 0; i < ProdNames.size(); i++) {
            ArrayList curr = (ArrayList) ProdtoCount.get(i);
            for (int j = 0; j < curr.size(); j++) {
                for (int k = j + 1; k < curr.size(); k++) {
                    int u = (Integer) curr.get(j);
                    int v = (Integer) curr.get(k);
                    if (Pairs[u][v] == null) {
                        if (Pairs[v][u] != null) {
                            Pairs[v][u].add(i);
                        } else {
                            Pairs[u][v] = new ArrayList();
                            Pairs[u][v].add(i);
                        }
                    } else {
                        Pairs[u][v].add(i);
                    }
                }
            }
        }
        for (int i = 0; i < countries; i++) {
            for (int j = 0; j < countries; j++) {
                if (Pairs[i][j] == null)
                    continue;
                ArrayList a = Pairs[i][j];
                System.out.print("\n{" + CountNames.get(i) + "," + CountNames.get(j) + "} : ");
                for (int k = 0; k < a.size(); k++) {
                    System.out.print(ProdNames.get((Integer) a.get(k)) + " ");
                }
            }
        }
    }

    public static void main(String[] args) {
        PairMatching pm = new PairMatching();
        pm.readfile("Input data/BigData.txt");
        pm.FindPairs();
    }
}
[Update] The algorithm presented here shouldn't improve time complexity compared to the OP's original algorithm. Both algorithms have the same asymptotic complexity, and iterating through sorted lists (as OP does) should generally perform better than using a hash table.
You need to group the items by product, not by country, in order to be able to quickly fetch all countries belonging to a certain product.
This would be the pseudocode:
inputList contains a list of pairs {country, product}
// group by product
prepare mapA (product) => (list_of_countries)
for each {country, product} in inputList
{
if mapA does not contain (product)
create a new empty (list_of_countries)
and add it to mapA with (product) as key
add this (country) to the (list_of_countries)
}
// now group by country_pair
prepare mapB (country_pair) => (list_of_products)
for each {product, list_of_countries} in mapA
{
for each pair {countryA, countryB} in list_of_countries
{
if mapB does not countain country_pair {countryA, countryB}
create a new empty (list_of_products)
and add it to mapB with country_pair {countryA, countryB} as key
add this (product) to the (list_of_products)
}
}
If your input list is length N, and you have C distinct countries and P distinct products, then the running time of this algorithm should be O(N) for the first part and O(P*C^2) for the second part. Since your final list needs to have pairs of countries mapping to lists of products, I don't think you will be able to lose the P*C^2 complexity in any case.
I don't code in Java too much, so I added a C# example which I believe you'll be able to port pretty easily:
// mapA maps each product to a list of countries
var mapA = new Dictionary<string, List<string>>();
foreach (var t in inputList)
{
    List<string> countries = null;
    if (!mapA.TryGetValue(t.Product, out countries))
    {
        countries = new List<string>();
        mapA[t.Product] = countries;
    }
    countries.Add(t.Country);
}

// note (this is very important):
// CountryPair tuple must have value-type comparison semantics,
// i.e. you need to ensure that two CountryPairs are compared
// by value to allow hashing (mapping) to work correctly, in O(1).
// In C# you can also simply use a Tuple<string,string> to
// represent a pair of countries (which implements this correctly),
// but I used a custom class to emphasize the algorithm

// mapB maps each CountryPair to a list of products
var mapB = new Dictionary<CountryPair, List<string>>();
foreach (var kvp in mapA)
{
    var product = kvp.Key;
    var countries = kvp.Value;
    for (int i = 0; i < countries.Count; i++)
    {
        for (int j = i + 1; j < countries.Count; j++)
        {
            var pair = CountryPair.Create(countries[i], countries[j]);
            List<string> productsForCountryPair = null;
            if (!mapB.TryGetValue(pair, out productsForCountryPair))
            {
                productsForCountryPair = new List<string>();
                mapB[pair] = productsForCountryPair;
            }
            productsForCountryPair.Add(product);
        }
    }
}
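Since the question is in Java, a rough port of the above (untested sketch): I've stood in for the CountryPair class with a normalised "smaller|larger" string key, which gives the value-equality semantics the comment above asks for, and I assume inputList is a List<String[]> of {country, product} pairs:

// mapA maps each product to a list of countries
Map<String, List<String>> mapA = new HashMap<>();
for (String[] t : inputList) { // t[0] = country, t[1] = product (assumed input shape)
    mapA.computeIfAbsent(t[1], p -> new ArrayList<>()).add(t[0]);
}

// mapB maps each country pair to a list of products;
// the key is "smaller|larger" so that {A,B} and {B,A} collide as intended
Map<String, List<String>> mapB = new HashMap<>();
for (Map.Entry<String, List<String>> entry : mapA.entrySet()) {
    List<String> countries = entry.getValue();
    for (int i = 0; i < countries.size(); i++) {
        for (int j = i + 1; j < countries.size(); j++) {
            String a = countries.get(i), b = countries.get(j);
            String pair = a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
            mapB.computeIfAbsent(pair, p -> new ArrayList<>()).add(entry.getKey());
        }
    }
}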
This is a great example for using MapReduce.
At your map phase you just collect all the exports that belong to each country.
Then, the reducer sorts the products (products belong to the same country, because of the mapper).
You will benefit from a distributed, parallel algorithm that can be run on a cluster.
You are actually taking O(n^2 * time required for one intersect).
Let's see if we can improve the time for the intersect. We can maintain a map for every country which stores its products, so you have n hash maps for n countries; you just need to iterate through all products once to initialise them. If you want quick lookups, maintain a map of maps:
HashMap<String, HashMap<String, Boolean>> countryMap = new HashMap<String, HashMap<String, Boolean>>();
Now if you want to find the common products for countries str1 and str2, do:
HashMap<String, Boolean> map1 = countryMap.get("str1");
HashMap<String, Boolean> map2 = countryMap.get("str2");
ArrayList<String> common = new ArrayList<String>();
Iterator it = map1.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, Boolean> pairs = (Map.Entry) it.next();
    // add to common if it is there in the other map
    if (map2.containsKey(pairs.getKey()))
        common.add(pairs.getKey());
}
So, in total it will be O(n^2 * k) if there are k entries in one map, assuming the hash map lookup implementation is O(1) (I guess it is log k for Java).
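As an aside, since the Boolean values carry no information, a HashSet<String> per country does the same job, and retainAll gives you the intersection directly. A sketch of that variant:

HashMap<String, HashSet<String>> countryMap = new HashMap<String, HashSet<String>>();
// ... fill with one set of products per country, as above ...

HashSet<String> common = new HashSet<String>(countryMap.get("str1"));
common.retainAll(countryMap.get("str2")); // keeps only products present in both sets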
Using hashmaps where necessary to speed things up:
1) Go through the data and create a map whose keys are items and whose values are lists of the countries associated with that item. So e.g. Sheep: Australia, US, UK, New Zealand...
2) Create a hashmap with keys each pair of countries and (initially) an empty list as values.
3) For each Item retrieve the list of countries associated with it and for each pair of countries within that list, add that item to the list created for that pair in step (2).
4) Now output the updated list for each pair of countries.
The largest costs are in steps (3) and (4) and both of these costs are linear in the amount of output produced, so I think this is not too far from optimal.
This question is a bit more complex than the title states.
What I am trying to do is store a map of {Object: Item} for a game, where the Object represents a cupboard and the Item represents the content of the cupboard (i.e. the item inside).
Essentially what I need to do is update the values of the items in a clockwise (positive) rotation; though I do NOT want to modify the list in any way after it is created, only shift the positions of the values by one.
I am currently doing almost all that I need. However, there are more Objects than Items, so I use nulls to represent empty cupboards. When I run my code, though, the map is being modified (likely as it's in the for loop) and in turn elements are being overwritten incorrectly, which after a while may leave me with a list full of nulls (and empty cupboards).
What I have so far...
private static Map<Integer, Integer> cupboardItems = new HashMap<Integer, Integer>();
private static Map<Integer, Integer> rewardPrices = new HashMap<Integer, Integer>();

private static final int[] objects = { 10783, 10785, 10787, 10789, 10791, 10793, 10795, 10797 };
private static final int[] rewards = { 6893, 6894, 6895, 6896, 6897 };

static {
    int reward = rewards[0];
    for (int i = 0; i < objects.length; i++) {
        if (reward > rewards[rewards.length - 1])
            cupboardItems.put(objects[i], null);
        else
            cupboardItems.put(objects[i], reward);
        reward++;
    }
}
// updates the items in the cupboards in clockwise rotation
for (int i = 0; i < cupboardItems.size(); i++) {
    if (objects[i] == objects[objects.length - 2])
        cupboardItems.put(objects[i], cupboardItems.get(objects[0]));
    else if (objects[i] == objects[objects.length - 1])
        cupboardItems.put(objects[i], cupboardItems.get(objects[1]));
    else
        cupboardItems.put(objects[i], cupboardItems.get(objects[i + 2]));
}
So how may I modify my code so that the update gives the following results?
======
k1:v1
k2:v2
k3:v3
k4:none
=======
k1:none
k2:v1
k3:v2
k4:v3
HashMap doesn't guarantee ordering, therefore if you need ordering, use ArrayList or LinkedList.
If you want to stick with a HashMap, you need to sort the HashMap based on the keys before each rotation. You can sort easily since the keys are Integer objects, but this will affect performance.
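For example, a TreeMap iterates its keys in ascending order, so one clockwise rotation could be written as below (a sketch of the idea, not a drop-in replacement for your loop):

NavigableMap<Integer, Integer> cupboards = new TreeMap<>(cupboardItems);
// carry the last value around to the first key, then shift each value
// one key forward; replacing the value of an existing key during
// iteration is safe because it is not a structural modification
Integer carried = cupboards.lastEntry().getValue();
for (Integer key : cupboards.keySet()) {
    Integer current = cupboards.get(key);
    cupboards.put(key, carried);
    carried = current;
}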
Ragavan has a good answer if you want to stick to your approach. However, you are doing a lot of work to just rotate the items. It would be much more efficient to just rotate the index (using modulus) and keep the arrays the same:
final static List<Integer> objects = new ArrayList<Integer>(
        Arrays.asList(10783, 10785, 10787, 10789, 10791, 10793, 10795, 10797));
final static List<Integer> rewards = new ArrayList<Integer>(
        Arrays.asList(6893, 6894, 6895, 6896, 6897, -1, -1, -1));

public static int getReward(int obj, int rot) {
    int rotIndex = (objects.indexOf(obj) - rot) % objects.size();
    // modulus in java can be negative
    rotIndex = rotIndex < 0 ? rotIndex + objects.size() : rotIndex;
    return rewards.get(rotIndex);
}

public static void main(String... args) {
    // this should give 6897, which is the reward for obj 10783 after 4 rotations
    System.out.println(getReward(10783, 4));
}
I want to store all values of a certain variable in a dataset and the frequency of each of these values. To do so, I use an ArrayList<String> to store the values and an ArrayList<Integer> to store the frequencies (since I can't use int). The number of different values is unknown, which is why I use an ArrayList and not an array.
Example (simplified) dataset:
a,b,c,d,b,d,a,c,b
The ArrayList<String> with values looks like: {a,b,c,d} and the ArrayList<Integer> with frequencies looks like: {2,3,2,2}.
To fill these ArrayLists, I iterate over each record in the dataset using the following code.
public void addObservation(String obs) {
    if (values.size() == 0) { // first value
        values.add(obs);
        frequencies.add(new Integer(1));
        return; // added
    } else {
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i).equals(obs)) {
                frequencies.set(i, new Integer((int) frequencies.get(i) + 1));
                return; // added
            }
        }
        // only gets here if value of obs is not found
        values.add(obs);
        frequencies.add(new Integer(1));
    }
}
However, since the datasets I will use this for can be very big, I want to optimize my code, and using frequencies.set(i, new Integer((int)frequencies.get(i)+1)); does not seem very efficient.
That brings me to my question; how can I optimize the updating of the Integer values in the ArrayList?
Use a HashMap<String,Integer>
Create the HashMap like so
HashMap<String,Integer> hm = new HashMap<String,Integer>();
Then your addObservation method will look like:
public void addObservation(String obs) {
    if (hm.containsKey(obs))
        hm.put(obs, hm.get(obs) + 1);
    else
        hm.put(obs, 1);
}
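On Java 8 or later the same method collapses to a single merge call, which also avoids looking the key up twice:

public void addObservation(String obs) {
    hm.merge(obs, 1, Integer::sum); // inserts 1, or adds 1 to the existing count
}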
I would use a HashMap or a Hashtable as tskzzy suggested. Depending on your needs, I would also create an object that holds the name and count, as well as other metadata that you might need.
So the code would be something like:
Hashtable<String, FrequencyStatistics> statHash = new Hashtable<String, FrequencyStatistics>();
for (String value : values) {
    if (statHash.get(value) == null) {
        FrequencyStatistics newStat = new FrequencyStatistics(value);
        statHash.put(value, newStat);
    } else {
        statHash.get(value).incrementCount();
    }
}
Now, your FrequencyStatistics object's constructor would automatically set its initial count to 1, while the incrementCount() method would increment the count and perform any other statistical calculations that you might require. This should also be more extensible in the future than storing a hash of the String with only its corresponding Integer.
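The answer leaves FrequencyStatistics undefined; a minimal version consistent with how it is used above might look like this (hypothetical; the real class could carry whatever extra metadata you need):

public class FrequencyStatistics {
    private final String value;
    private int count;

    public FrequencyStatistics(String value) {
        this.value = value;
        this.count = 1; // a new statistic starts at one observation
    }

    public void incrementCount() {
        count++;
    }

    public String getValue() { return value; }
    public int getCount() { return count; }
}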