Compartmentalizing loops over a large iteration


The goal of my question is to improve the performance of my algorithm by splitting the range of my loop iterations over a large array list.
For example: I have an array list of about 10 billion long values. What I am trying to achieve is to run the loop from entry 0 to 100 million, output the result of whatever calculations are inside the loop for those 100 million entries, then run from 100 million to 200 million doing the same and outputting that result, then 300-400 million, 400-500 million, and so on.
After I get all the 100 billion/100 million results, I can sum them up outside of the loop, collecting the results from the loop outputs in parallel.
I have tried to achieve something similar with a dynamic range-shift method, but I can't seem to implement the logic the way I would like to.
public static void tt4() {
    long essir2 = 0;
    long essir3 = 0;
    List<Long> cc = new ArrayList<>();
    List<Long> range = new ArrayList<>();
    // break point is a method that returns list values, it was converted to
    // string because of some concatenations and would be converted back to long here
    for (String ari1 : Breakpoint()) {
        cc.add(Long.valueOf(ari1));
    }
    // the size of the List is huge about 1 trillion entries at the minimum
    long hy = cc.size() - 1;
    for (long k = 0; k < hy; k++) {
        long t1 = cc.get((int) k);
        long t2 = cc.get((int) (k + 1));
        // My main question: I am trying to iterate the entire list in a dynamic way
        // which would exclude repeated endpoints on each iteration.
        range = LongStream.rangeClosed(t1 + 1, t2)
                .boxed()
                .collect(Collectors.toList());
        for (long i : range) {
            // Hard is another method call on the iteration
            // complexcalc is a method as well
            essir2 = complexcalc((int) i, (int) Hard(i));
            essir3 += essir2;
        }
    }
    System.out.println("\n" + essir3);
}
I don't have any errors; I am just looking for a way to improve performance and running time. I can process a million entries in under a second directly, but when I use the size I require, it runs forever. The sizes I'm giving are abstractions to illustrate orders of magnitude, and I don't want opinions like "100 billion is not much if you can do a million in under a second". I'm talking about massively huge numbers that I need to iterate over while doing complex tasks and calls; I just need help with the logic I'm trying to achieve, if possible.

One thing I would suggest right off the bat would be to store your Breakpoint return value inside a simple array rather than using a List. This should improve your execution time significantly:
List<Long> cc = new ArrayList<>();
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
Long[] ccArray = cc.toArray(new Long[0]);
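Going one step further than a Long[], a primitive long[] avoids allocating one Long object per entry. The sketch below assumes Breakpoint() can be treated as a List<String>; the parameter name `breakpoints` and the class name are hypothetical, and List.of in the demo needs Java 9+. Keep in mind that any single Java array (like an ArrayList) is limited to roughly 2^31 - 1 elements, so a data set of billions of entries has to be processed in pieces rather than held in one array.

```java
import java.util.List;

class BreakpointParsing {
    // Parses decimal strings straight into a primitive long[] (no boxing).
    // 'breakpoints' stands in for whatever Breakpoint() returns (assumption).
    static long[] toLongArray(List<String> breakpoints) {
        return breakpoints.stream()
                .mapToLong(Long::parseLong)
                .toArray();
    }

    public static void main(String[] args) {
        long[] cc = toLongArray(List.of("0", "5", "12"));
        System.out.println(cc.length); // prints 3
    }
}
```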
I believe what you're looking for is to split your tasks across multiple threads. You can do this with ExecutorService "which simplifies the execution of tasks in asynchronous mode".
Note that I am not overly familiar with this whole concept, but I have experimented with it a bit recently and can give you a quick draft of how you could implement this.
I welcome those more experienced with multi-threading to either correct this post or provide additional information in the comments to help improve this answer.
Runnable Task class
public class CompartmentalizationTask implements Runnable {
    private final ArrayList<Long> cc;
    private final long index;
    public CompartmentalizationTask(ArrayList<Long> list, long index) {
        this.cc = list;
        this.index = index;
    }
    @Override
    public void run() {
        Main.compartmentalize(cc, index);
    }
}
Main class
// needs java.util.*, java.util.concurrent.* and java.util.concurrent.atomic.LongAdder
// a fixed pool sized to the CPU count; a cached pool would start a new thread for every waiting task
private static final ExecutorService exeService =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
private static final List<Future<?>> futureTasks = new ArrayList<>();
// shared, thread-safe accumulator: a plain "long essir3 += ..." would race between threads
private static final LongAdder essir3 = new LongAdder();
public static void tt4() throws ExecutionException, InterruptedException
{
    ArrayList<Long> cc = new ArrayList<>();
    // break point is a method that returns list values, it was converted to
    // string because of some concatenations and would be converted back to long here
    for (String ari1 : Breakpoint()) {
        cc.add(Long.valueOf(ari1));
    }
    // the size of the List is huge about 1 trillion entries at the minimum
    long hy = cc.size() - 1;
    for (long k = 0; k < hy; k++) {
        futureTasks.add(exeService.submit(new CompartmentalizationTask(cc, k)));
    }
    // wait for every task; get() rethrows any exception a task threw
    for (Future<?> task : futureTasks) {
        task.get();
    }
    exeService.shutdown();
    System.out.println("\n" + essir3.sum());
}
public static void compartmentalize(ArrayList<Long> cc, long index)
{
    long t1 = cc.get((int) index);
    long t2 = cc.get((int) (index + 1));
    // iterate (t1, t2] directly instead of boxing the whole range into a List<Long>
    for (long i = t1 + 1; i <= t2; i++) {
        // Hard is another method call on the iteration
        // complexcalc is a method as well
        essir3.add(complexcalc((int) i, (int) Hard(i)));
    }
}
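The "compartments" described in the question (0 to 100 million, 100 to 200 million, and so on, summed at the end) map naturally onto one Callable<Long> per chunk. Below is a minimal sketch assuming the per-index work is independent for each index; `work(long)` is a hypothetical stand-in for complexcalc/Hard, and the chunk size and thread count are tuning parameters, not recommendations. Each task returns its own partial sum, so no shared mutable counter is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {
    // Hypothetical stand-in for complexcalc((int) i, (int) Hard(i)).
    static long work(long i) {
        return i % 7;
    }

    // Sums work(i) for i in [start, end) by splitting the range into chunks of
    // chunkSize and running each chunk as a separate task on a fixed-size pool.
    static long parallelSum(long start, long end, long chunkSize, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long lo = start; lo < end; lo += chunkSize) {
                final long from = lo;
                final long to = Math.min(lo + chunkSize, end); // last chunk may be shorter
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = from; i < to; i++) {
                        sum += work(i);
                    }
                    return sum; // this chunk's partial result
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // waits for the chunk; rethrows its failure, if any
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(0, 1_000_000, 100_000, cores));
    }
}
```

Returning partial sums from each task keeps the threads from contending on one counter; the final reduction happens on the calling thread after all chunks finish.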

Related

Java Multithreading Implementation for generating unique codes

My question is how I would implement multithreading to this task correctly.
I have a program that takes quite a long time to finish executing. About an hour and a half. I need to generate about 10,000 random and unique number codes. The code below is how I first implemented it and have it right now.
import java.util.Random;
import java.util.ArrayList;
public class Main
{
    public static void main(String[] args) {
        Random random = new Random();
        // This holds all the codes
        ArrayList<String> database = new ArrayList<>();
        int counter = 0;
        while(counter < 10000){
            // Generate a 10 digit long code and append to sb
            StringBuilder sb = new StringBuilder();
            for(int i = 0; i < 10; i++){
                sb.append(random.nextInt(10));
            }
            String code = String.valueOf(sb);
            sb.setLength(0);
            // Check if this code already exists in the database
            // If not, then add the code and update counter
            if(!database.contains(code)){
                database.add(code);
                counter++;
            }
        }
        System.out.println("Done");
    }
}
This, of course, is incredibly inefficient. So my question is: is there a way to implement multithreading that can work on this single piece of code? The best way I can word it is to give two cores/threads the same code but have them both check a single ArrayList. Both cores/threads would generate codes, but check that the code just made doesn't already exist, either from the other core/thread or from itself. I drew a rough diagram below. Any insight, advice, or pointers are greatly appreciated.

Using a more appropriate data structure and a more appropriate representation of the data, this should be a lot faster and easier to read, too:
Set<Long> database = new HashSet<>(10000);
while(database.size() < 10000){
    database.add(ThreadLocalRandom.current().nextLong(10_000_000_000L));
}

Start with the more obvious optimizations:
Do not use ArrayList; use HashSet. ArrayList.contains() is O(n), while HashSet.contains() is O(1) on average. Read this question about a Big O summary for the Java collections framework, and read about Big O notation.
Initialize your collection with an appropriate initial capacity. For your case that would be:
new HashSet<>(10000);
This way the underlying array won't have to be resized and copied as the set grows. I would suggest reading and debugging the implementations of the Java collections to better understand how they work under the hood, or even trying to implement them on your own.
Before you delve into complex multithreading optimizations, fix the simple problems, like bad collection choices.
Edit: As per the suggestion from @Thomas in the comments, you can directly generate a number (long) in the range you need, 0 to 9_999_999_999. You can see in this question how to do it. Convert the resulting number to a string and, if its length is less than 10, pad it with leading zeros.
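Putting these suggestions together (a HashSet for average O(1) lookups, an initial capacity, and generating a number in the range 0 to 9_999_999_999 directly, then padding it with leading zeros), a single-threaded version might look like the sketch below. ThreadLocalRandom.nextLong(bound) and String.format are standard JDK methods; the class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

class UniqueCodes {
    // Generates 'count' distinct 10-digit codes; leading zeros are kept ("0000000042").
    static Set<String> generate(int count) {
        // capacity >= count / 0.75 (the default load factor), so the set should not need to resize
        Set<String> codes = new HashSet<>(count * 4 / 3 + 1);
        ThreadLocalRandom random = ThreadLocalRandom.current();
        while (codes.size() < count) {
            long n = random.nextLong(10_000_000_000L); // uniform in [0, 9_999_999_999]
            codes.add(String.format("%010d", n));      // add() simply ignores duplicates
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(generate(10_000).size()); // prints 10000
    }
}
```

With 10,000 codes drawn from 10^10 possibilities, duplicates are rare, so the loop runs only slightly more than 10,000 times; no threads are needed at this size.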

Example:
(use ConcurrentHashMap, use threads, use random.nextLong())
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class Main {
    static Map<String,Object> hashMapCache = new ConcurrentHashMap<String,Object>();
    public static void main(String[] args) {
        int NumOfThreads = 20;
        int total = 10000;
        int numberOfCreationsForThread = total/NumOfThreads;
        int leftOver = total%NumOfThreads;
        List<Thread> threadList = new ArrayList<>();
        for(int i=0;i<NumOfThreads;i++){
            // the first thread also takes the remainder, so the counts add up to 'total'
            if(i==0){
                threadList.add(new Thread(new OneThread(numberOfCreationsForThread+leftOver,hashMapCache)));
            }else {
                threadList.add(new Thread(new OneThread(numberOfCreationsForThread,hashMapCache)));
            }
        }
        for(int i=0;i<NumOfThreads;i++){
            threadList.get(i).start();
        }
        for(int i=0;i<NumOfThreads;i++){
            try {
                threadList.get(i).join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        // This holds all the codes
        List<String> database = new ArrayList<>(hashMapCache.keySet());
        System.out.println("Done");
    }
}
OneThread:
import java.util.Map;
import java.util.Random;
public class OneThread implements Runnable{
    int numberOfCreations;
    Map<String,Object> hashMapCache;
    public OneThread(int numberOfCreations,Map<String,Object> hashMapCache){
        this.numberOfCreations = numberOfCreations;
        this.hashMapCache = hashMapCache;
    }
    @Override
    public void run() {
        int counter = 0;
        Random random = new Random();
        System.out.println("thread "+ Thread.currentThread().getId() + " Start with " +numberOfCreations);
        while(counter < numberOfCreations){
            String code = generateRandom(random);
            while (code.length()!=10){
                code = generateRandom(random);
            }
            // putIfAbsent checks and inserts atomically, so two threads can never both
            // claim the same code (a separate get() followed by put() could race)
            if(hashMapCache.putIfAbsent(code, Boolean.TRUE)==null){
                counter++;
            }
        }
        System.out.println("thread "+ Thread.currentThread().getId() + " end with " +numberOfCreations);
    }
    private static String generateRandom(Random random){
        return digits(random.nextLong(),10);
    }
    /** Returns the first 'digits' decimal digits of val with its sign bit cleared (fewer if it is shorter). */
    private static String digits(long val, int digits) {
        // clearing the sign bit avoids Long.MIN_VALUE, where val*-1 stays negative;
        // note the leading digit of these codes is not uniformly distributed and is never 0
        String s = Long.toString(val & Long.MAX_VALUE);
        return s.length() > digits ? s.substring(0,digits) : s;
    }
}

Hackerrank: Frequency Queries Question Getting Timeout Error, How to optimize the code further?

I am getting timeout errors for my code, which I wrote using HashMap functions in Java 8. When I submitted my answer, 5 out of 14 test cases on the HackerRank platform failed due to timeouts.
Below is the question:
You are given q queries. Each query is of the form of two integers, described below:
1 x : Insert x in your data structure.
2 y : Delete one occurrence of y from your data structure, if present.
3 z : Check if any integer is present whose frequency is exactly z. If yes, print 1, else 0.
The queries are given in the form of a 2-D array where queries[i][0] contains the operation, and queries[i][1] contains the data element.
How should I optimize this code further ?
import java.util.*;
import java.util.stream.Collectors;
// enclosing class (name assumed)
public class Solution {
    static HashMap<Integer,Integer> buffer = new HashMap<Integer,Integer>();
    // Complete the freqQuery function below.
    static List<Integer> freqQuery(List<List<Integer>> queries) {
        // iterate over each query and perform the operation;
        // only type-3 queries produce an output value (the others return -1)
        List<Integer> output = queries.stream()
                .map(query -> performQuery(query))
                .filter(v -> v != -1)
                .collect(Collectors.toList());
        return output;
    }
    private static Integer performQuery(List<Integer> query) {
        if(query.get(0) == 1){
            buffer.put(query.get(1), buffer.getOrDefault(query.get(1), 0) + 1);
        }
        else if(query.get(0) == 2){
            if(buffer.containsKey(query.get(1)) && buffer.get(query.get(1))>0 ){
                buffer.put(query.get(1), buffer.get(query.get(1)) - 1);
            }
        }
        else{
            if(buffer.containsValue(query.get(1))){
                return 1;
            }
            else{
                return 0;
            }
        }
        return -1;
    }
    public static void main(String[] args) {
        List<List<Integer>> queries = Arrays.asList(
                Arrays.asList(1,5),
                Arrays.asList(1,6),
                Arrays.asList(3,2),
                Arrays.asList(1,10),
                Arrays.asList(1,10),
                Arrays.asList(1,6),
                Arrays.asList(2,5),
                Arrays.asList(3,2)
        );
        long start = System.currentTimeMillis();
        System.out.println(freqQuery(queries));
        long end = System.currentTimeMillis();
        //finding the time difference and converting it into seconds
        float sec = (end - start) / 1000F;
        System.out.println("FreqQuery function Took "+sec + " s");
    }
}

The problem with your code is the z operation. Specifically, the method containsValue has linear time complexity, making the complexity of the whole algorithm O(n*n). Here is a hint: add another HashMap on top of the one that you have, which counts the occurrences of occurrences by value of the other map. That way you can query this second map directly by the value (because it will be the key in this case).
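Following that hint, here is a sketch of the two-map approach. It keeps the question's method signature; `freq` is the "occurrences of occurrences" map from the hint, so a type-3 query becomes a single lookup and every query is O(1) on average. Unlike the question's static buffer, the state here is local to each call. Treat it as a sketch rather than a verified HackerRank submission.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyQueries {
    // count: value -> how many copies of it are currently stored
    // freq:  frequency -> how many distinct values currently have that frequency (0 is not tracked)
    static List<Integer> freqQuery(List<List<Integer>> queries) {
        Map<Integer, Integer> count = new HashMap<>();
        Map<Integer, Integer> freq = new HashMap<>();
        List<Integer> output = new ArrayList<>();
        for (List<Integer> q : queries) {
            int op = q.get(0);
            int x = q.get(1);
            if (op == 1) {                            // insert x
                int c = count.getOrDefault(x, 0);
                if (c > 0) {
                    freq.merge(c, -1, Integer::sum);  // x leaves frequency c ...
                }
                count.put(x, c + 1);
                freq.merge(c + 1, 1, Integer::sum);   // ... and joins frequency c + 1
            } else if (op == 2) {                     // delete one occurrence of x, if present
                int c = count.getOrDefault(x, 0);
                if (c > 0) {
                    freq.merge(c, -1, Integer::sum);
                    count.put(x, c - 1);
                    if (c - 1 > 0) {
                        freq.merge(c - 1, 1, Integer::sum);
                    }
                }
            } else {                                  // op == 3: any value with frequency exactly x?
                output.add(freq.getOrDefault(x, 0) > 0 ? 1 : 0);
            }
        }
        return output;
    }
}
```

On the sample queries from the question's main method, this returns [0, 1].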

Genetic Algorithm for Process Allocation

I have the following uni assignment that's been puzzling me. I have to implement a genetic algorithm that allocates processes to processors. More specifically, the problem is the following:
"You have a program that is computed in a parallel processor system. The program is made up of N processes that need to be allocated on n processors (where n is much smaller than N). The communication between processes can be quite time-consuming, so the best practice would be to assign processes that need to communicate with one another to the same processor.
In order to reduce the communication time between processes you could allocate all of these processes to the same processor, but this would negate the idea of parallel processing, where every processor needs to contribute to the whole computation.
Consider the following: let C_ij be the total amount of communication between process i and process j. Assume that every process needs the same amount of computing power, so that the processing load can be balanced by assigning the same number of processes to each processor. Use a genetic algorithm to assign N processes to n processors."
The above is a rough translation of the problem description. Now I have the following questions that puzzle me.
1) What would be the best viable solution in order for the genetic algorithm to run? I have the theory behind GAs, and I have deduced that you need a best possible solution against which to check each generation of the produced population.
2) How can I properly design the whole problem in order to be handled by a program.
I am planning to implement this in Java but any other recommendations for other programming languages would be welcome.

The Dude abides. Or El Duderino if you're not into the whole brevity thing.
What you're asking is really a two part question, but the Genetic Algorithm part can be easily illustrated in concept. I find that giving a basic start can be helpful, but this problem as a "whole" is too complicated to address here.
Genetic Algorithms (GA) can be used as an optimizer, as you note. In order to apply a GA to a process execution plan, you need to be able to score an execution plan, then clone and mutate the best plans. A GA works by running several plans, cloning the best, and then mutating some of them slightly to see if the offspring (cloned) plans are improved or worsened.
In this example, I created an array of 100 random Integers. Each Integer is a "process" to be run, and the value of the Integer is the "cost" of running that individual process.
List<Integer> processes = new ArrayList<Integer>();
The processes are then added to an ExecutionPlan, which is a List<List<Integer>>. This List of List of Integers will be used to represent 4 processors doing 25 rounds of processing:
class ExecutionPlan implements Comparable<ExecutionPlan> {
    List<List<Integer>> plan;
    int cost;
    // ... constructor, compareTo and toString are in the full listing
}
The total cost of an execution plan will be computed by taking the highest process cost per round (the greatest Integer) and summing the costs of all the rounds. Thus, the goal of the optimizer is to distribute the initial 100 integers (processes) into 25 rounds of "processing" on 4 "processors" such that total cost is as low as possible.
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
// Loop on generationCount
for (int generation = 0; generation < GENERATIONCOUNT; ++generation) {
    computeCostOfPlans(executionPlans);
    // sort plans by cost
    Collections.sort(executionPlans);
    // print execution plan costs
    System.out.println(generation + " = " + executionPlans);
    // clone 5 better plans over 5 worse plans
    // i.e., kill off the least fit and reproduce the best fit.
    cloneBetterPlansOverWorsePlans(executionPlans);
    // mutate 5 cloned plans
    mutateClones(executionPlans);
}
When the program is run, the cost is initially randomly determined, but with each generation it improves. If you run it for 1000 generations and plot the results, a typical run will look like this:
The purpose of the GA is to optimize, or attempt to determine the best possible solution. The reason it can be applied to your problem is that your ExecutionPlan can be scored, cloned and mutated. The path to success, therefore, is to separate the problems in your mind. First, figure out how to make an execution plan that can be scored by what it would cost to actually run it on an assumed set of hardware. Add routines to clone and mutate an ExecutionPlan. Once you have that, plug it into this GA example. Good luck and stay cool dude.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Optimize {
    private static int GENERATIONCOUNT = 1000;
    private static int PROCESSCOUNT = 100;
    private static int MUTATIONCOUNT = PROCESSCOUNT/10;
    public static void main(String...strings) {
        new Optimize().run();
    }
    // define an execution plan as 25 runs on 4 processors
    class ExecutionPlan implements Comparable<ExecutionPlan> {
        List<List<Integer>> plan;
        int cost;
        public ExecutionPlan(List<List<Integer>> plan) {
            this.plan = plan;
        }
        @Override
        public int compareTo(ExecutionPlan o) {
            return Integer.compare(cost, o.cost);
        }
        @Override
        public String toString() {
            return Integer.toString(cost);
        }
    }
    private void run() {
        // make 100 processes to be completed
        List<Integer> processes = new ArrayList<Integer>();
        // assign them a random cost between 1 and 99
        for (int index = 0; index < PROCESSCOUNT; ++index) {
            processes.add((int) (Math.random() * 99.0) + 1);
        }
        // make 10 execution plans of 25 execution rounds running on 4 processors;
        List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
        // Loop on generationCount
        for (int generation = 0; generation < GENERATIONCOUNT; ++generation) {
            computeCostOfPlans(executionPlans);
            // sort plans by cost
            Collections.sort(executionPlans);
            // print execution plan costs
            System.out.println(generation + " = " + executionPlans);
            // clone 5 better plans over 5 worse plans
            cloneBetterPlansOverWorsePlans(executionPlans);
            // mutate 5 cloned plans
            mutateClones(executionPlans);
        }
    }
    private void mutateClones(List<ExecutionPlan> executionPlans) {
        // after cloning, the first and second halves hold identical copies, so mutating
        // the first half still leaves one unmutated copy of each good plan
        for (int index = 0; index < executionPlans.size()/2; ++index) {
            ExecutionPlan execution = executionPlans.get(index);
            // mutate 10 different location swaps, maybe the plan improves
            for (int mutationCount = 0; mutationCount < MUTATIONCOUNT; ++mutationCount) {
                int location1 = (int) (Math.random() * PROCESSCOUNT);
                int location2 = (int) (Math.random() * PROCESSCOUNT);
                // swap two locations (location / 4 = round, location % 4 = processor)
                Integer processCostTemp = execution.plan.get(location1/4).get(location1%4);
                execution.plan.get(location1/4).set(location1%4, execution.plan.get(location2/4).get(location2%4));
                execution.plan.get(location2/4).set(location2%4, processCostTemp);
            }
        }
    }
    private void cloneBetterPlansOverWorsePlans(List<ExecutionPlan> executionPlans) {
        for (int index = 0; index < executionPlans.size()/2; ++index) {
            ExecutionPlan execution = executionPlans.get(index);
            List<List<Integer>> clonePlan = new ArrayList<List<Integer>>();
            for (int roundNumber = 0; roundNumber < 25; ++roundNumber) {
                clonePlan.add(new ArrayList<Integer>(execution.plan.get(roundNumber)));
            }
            executionPlans.set(index + executionPlans.size()/2, new ExecutionPlan(clonePlan));
        }
    }
    private void computeCostOfPlans(List<ExecutionPlan> executionPlans) {
        for (ExecutionPlan execution : executionPlans) {
            execution.cost = 0;
            for (int roundNumber = 0; roundNumber < 25; ++roundNumber) {
                // cost of a round is its greatest "communication time"
                int roundCost = Collections.max(execution.plan.get(roundNumber));
                // add all the round costs to determine total plan cost
                execution.cost += roundCost;
            }
        }
    }
    private List<ExecutionPlan> createAndIntializePlans(List<Integer> processes) {
        List<ExecutionPlan> executionPlans = new ArrayList<ExecutionPlan>();
        for (int planNumber = 0; planNumber < 10; ++planNumber) {
            // randomize the processes for this plan
            Collections.shuffle(processes);
            // and make the plan
            List<List<Integer>> currentPlan = new ArrayList<List<Integer>>();
            for (int roundNumber = 0; roundNumber < 25; ++roundNumber) {
                List<Integer> round = new ArrayList<Integer>();
                round.add(processes.get(4*roundNumber+0));
                round.add(processes.get(4*roundNumber+1));
                round.add(processes.get(4*roundNumber+2));
                round.add(processes.get(4*roundNumber+3));
                currentPlan.add(round);
            }
            executionPlans.add(new ExecutionPlan(currentPlan));
        }
        return executionPlans;
    }
}
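For the original assignment, the same generational loop can be reused once an allocation can be scored. One possible encoding (an assumption, not part of the answer above): a chromosome is an int[] of length N where proc[i] is the processor of process i, and the cost to minimize is the communication between processes placed on different processors, i.e. the sum of C_ij over pairs i < j with proc[i] != proc[j]. Starting from a balanced assignment and mutating only by swapping the processors of two processes keeps every processor's process count unchanged, which enforces the equal-load requirement. The class and method names are illustrative, and an exactly equal split assumes n divides N.

```java
import java.util.Random;

class ProcessAllocation {
    // Cost to minimize: communication between processes on different processors.
    // c is an N x N matrix; c[i][j] is the communication between processes i and j.
    static long cost(int[] proc, int[][] c) {
        long total = 0;
        for (int i = 0; i < proc.length; i++) {
            for (int j = i + 1; j < proc.length; j++) {
                if (proc[i] != proc[j]) {
                    total += c[i][j];
                }
            }
        }
        return total;
    }

    // Random balanced assignment: round-robin, then a Fisher-Yates shuffle,
    // which permutes positions but keeps each processor's count unchanged.
    static int[] randomBalanced(int processes, int processors, Random rnd) {
        int[] proc = new int[processes];
        for (int i = 0; i < processes; i++) {
            proc[i] = i % processors;
        }
        for (int i = processes - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = proc[i]; proc[i] = proc[j]; proc[j] = t;
        }
        return proc;
    }

    // Mutation: swap the processors of two random processes; the load stays balanced.
    static void swapMutation(int[] proc, Random rnd) {
        int a = rnd.nextInt(proc.length);
        int b = rnd.nextInt(proc.length);
        int t = proc[a]; proc[a] = proc[b]; proc[b] = t;
    }
}
```

Cloning a plan is then just proc.clone(), so cost, clone and swapMutation play the roles of computeCostOfPlans, cloneBetterPlansOverWorsePlans and mutateClones in the example above.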

Find the top N most popular elements

I have a List of TrackDay objects for a runner going around a track field on different days. Each pair of start/finish times signals a single lap run by the runner. We are guaranteed that every start time has a matching finish time (in the order in which they appear in the respective lists):
class TrackDay {
    List<DateTime> startTimes;
    List<DateTime> finishTimes;
}
I would like to find the top N days (let's say 3) on which the runner ran the most. This translates to finding the N TrackDay objects with the longest total start/finish time. The naive way would be to do the following:
for (TrackDay td : listOftrackDays) {
    // loop through each start/finish lists and find out the finish-start time for each pair.
    // Add the delta times (finish-start) up for each pair of start/finish objects.
    // Create a map to store the time for each TrackDay
    // sort the map and get the first N entries
}
Is there a better, more clean/efficient way to do the above?

The problem you're trying to solve is a well-known selection problem, and the classic algorithm for it is Quickselect. While sorting generally works well, for large collections it is worth considering this approach, since it gives you expected linear time instead of O(N log N).
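A minimal Quickselect sketch (Lomuto partition with a random pivot), assuming the total duration of each TrackDay has already been computed into a long[] (for example, in seconds). It reorders an index array so that its first k entries are the indices of the k largest totals. Expected time is linear; the worst case is quadratic, and many equal totals push it toward the worst case. The k selected entries are not sorted among themselves, so sort just those k if order matters. Names are illustrative.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

class TopK {
    // Returns the indices of the k largest values in score, in no particular order.
    static int[] topK(long[] score, int k) {
        int n = score.length;
        k = Math.min(k, n);
        if (k <= 0) {
            return new int[0];
        }
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        int lo = 0, hi = n - 1, target = k - 1;
        // Quickselect: shrink [lo, hi] until position 'target' holds the k-th largest value
        while (lo < hi) {
            int p = partition(score, idx, lo, hi);
            if (p == target) {
                break;
            } else if (p < target) {
                lo = p + 1;
            } else {
                hi = p - 1;
            }
        }
        return Arrays.copyOf(idx, k);
    }

    // Descending Lomuto partition: scores greater than the pivot end up left of the returned position.
    private static int partition(long[] score, int[] idx, int lo, int hi) {
        swap(idx, ThreadLocalRandom.current().nextInt(lo, hi + 1), hi);
        long pivot = score[idx[hi]];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (score[idx[i]] > pivot) {
                swap(idx, i, store++);
            }
        }
        swap(idx, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For example, topK(new long[]{5, 1, 9, 7, 3}, 3) returns the indices 0, 2 and 3 in some order.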

This solution should run in linear time in the number of track days (for a fixed limit). I have assumed that startTimes and finishTimes support random access. I don't know which API your DateTime is part of, so I have used java.time.LocalDateTime.
public List<TrackDay> findTop(List<TrackDay> trackDays, int limit) {
    limit = Math.min(limit, trackDays.size());
    List<Duration> durations = new ArrayList<>(Collections.nCopies(limit, Duration.ZERO));
    List<TrackDay> result = new ArrayList<>(Collections.nCopies(limit, null));
    int lastIndex = limit - 1;
    for (TrackDay trackDay : trackDays) {
        Duration duration = Duration.ZERO;
        for (int i = 0, n = trackDay.startTimes.size(); i < n; i++) {
            duration = duration.plus(Duration.between(trackDay.startTimes.get(i), trackDay.finishTimes.get(i)));
        }
        Integer destinationIndex = null;
        for (int i = lastIndex; i >= 0; i--) {
            if (durations.get(i).compareTo(duration) >= 0) {
                break;
            }
            destinationIndex = i;
        }
        if (destinationIndex != null) {
            durations.remove(lastIndex);
            result.remove(lastIndex);
            durations.add(destinationIndex, duration);
            result.add(destinationIndex, trackDay);
        }
    }
    return result;
}

For Loop is performing slow

Please have a look at the following code
// Divide the hash into sets of 3 pieces
private void devideHash(String str)
{
    int lastIndex = 0;
    for(int i=0;i<=str.length();i=i+3)
    {
        lastIndex = i;
        try
        {
            String stringPiece = str.substring(i, i+3);
            // pw.println(stringPiece);
            hashSet.add(stringPiece);
        }
        catch(Exception arr)
        {
            String stringPiece = str.substring(lastIndex, str.length());
            // pw.println(stringPiece);
            hashSet.add(stringPiece);
        }
    }
}
The above method receives a String like abcdefghijklmnop as the parameter. Inside the method, its job is to divide the string into sets of 3 letters. So when the operation is completed, the HashSet will contain the pieces abc def ghi jkl mno p.
But the problem is that if the input String is big, this loop takes a noticeable amount of time to complete. Is there any way I can speed this process up?

As an option, you could replace all your code with this line:
private void divideHash(String str) {
    hashSet.addAll(Arrays.asList(str.split("(?<=\\G...)")));
}
This should perform well.
Here's some test code:
Set<String> hashSet = new HashSet<>();
String str = "abcdefghijklmnop";
hashSet.addAll(Arrays.asList(str.split("(?<=\\G...)")));
System.out.println(hashSet);
Output:
[jkl, abc, ghi, def, mno, p]

There is nothing we can really tell unless you tell us what the "noticeable amount" of time is, and what time you expect. I recommend running a profiler to find out which logic takes the most time.
Some recommendations I can give from briefly reading your code are:
If the result Set is going to be huge, it will involve lots of resizing and rehashing as your HashSet grows. I recommend allocating the required size up front, e.g.
Set<String> hashSet = new HashSet<>(input.length() / 3 + 1, 1.0f);
This will save you the time spent on unnecessary rehashing.
Never use exceptions to control your program flow.
Why not simply do:
for (int i = 0; i < input.length(); i += 3) {
    if (i + 3 > input.length()) {
        // substring from i to end
        hashSet.add(input.substring(i));
    } else {
        // substring from i to i+3
        hashSet.add(input.substring(i, i + 3));
    }
}
