I have the following uni assignment that's been puzzling me. I have to implement a genetic algorithm that allocates processes to processors. More specifically, the problem is the following:
"You have a program that is computed in parallel processor system. The program is made up of a N number of processes that need to be allocated on a n number of processors (where n is way smaller than N). The communication of processes during this whole process can be quite time consuming, so the best practice would be to assign processes that need intercommunication with one another to same processor.
In order to reduce the communication time between processes you could allocate of these processes to the same processor but this would negate the parallel processing idea that every processor needs to contribute to the whole process.
Consider the following: Let's say that Cij is the total amount of communication between process i and process j. Assume that every process needs the same amount of computing power so that the limitations of the processing process can be handled by assigning the same amount of processes to a processor. Use a genetic algorithm to assign N processes to n processors."
The above is a rough translation of the problem description. Now I have the following questions that puzzle me.
1) What would be a viable way to set up the genetic algorithm? I know the theory behind GAs, and my understanding is that you need some best possible solution against which to check each generation of the produced population.
2) How can I properly model the whole problem so that it can be handled by a program?
I am planning to implement this in Java, but any recommendations for other programming languages would be welcome as well.
The Dude abides. Or El Duderino if you're not into the whole brevity thing.
What you're asking is really a two-part question, but the Genetic Algorithm part can be easily illustrated in concept. I find that giving a basic start can be helpful, but this problem as a "whole" is too complicated to address here.
Genetic Algorithms (GA) can be used as an optimizer, as you note. In order to apply a GA to a process execution plan, you need to be able to score an execution plan, then clone and mutate the best plans. A GA works by running several plans, cloning the best, and then mutating some of them slightly to see if the offspring (cloned) plans are improved or worsened.
In this example, I created a list of 100 random Integers. Each Integer is a "process" to be run, and the value of the Integer is the "cost" of running that individual process.
List<Integer> processes = new ArrayList<Integer>();
The processes are then added to an ExecutionPlan, which is a List<List<Integer>>. This List of List of Integers will be used to represent 4 processors doing 25 rounds of processing:
class ExecutionPlan implements Comparable<ExecutionPlan> {
List<List<Integer>> plan;
int cost;
The total cost of an execution plan will be computed by taking the highest process cost per round (the greatest Integer) and summing the costs of all the rounds. For example, a round holding processes with costs {7, 2, 9, 4} costs 9. Thus, the goal of the optimizer is to distribute the initial 100 integers (processes) into 25 rounds of "processing" on 4 "processors" such that the total cost is as low as possible.
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
// Loop on generationCount
for ( int generation = 0; generation < GENERATIONCOUNT; ++generation) {
computeCostOfPlans(executionPlans);
// sort plans by cost
Collections.sort(executionPlans);
// print execution plan costs
System.out.println(generation + " = " + executionPlans);
// clone 5 better plans over 5 worse plans
// i.e., kill off the least fit and reproduce the best fit.
cloneBetterPlansOverWorsePlans(executionPlans);
// mutate 5 cloned plans
mutateClones(executionPlans);
}
When the program is run, the cost is initially randomly determined, but with each generation it improves. If you run it for 1000 generations and plot the results, a typical run shows the cost falling steeply in early generations and then leveling off as the population converges (plot not reproduced here).
The purpose of the GA is to optimize, or attempt to determine, the best possible solution. The reason it can be applied to your problem is that your ExecutionPlan can be scored, cloned and mutated. The path to success, therefore, is to separate the problems in your mind. First, figure out how you can make an execution plan that can be scored as to what the cost will be to actually run it on an assumed set of hardware. Add routines to clone and mutate an ExecutionPlan. Once you have that, plug it into this GA example. Good luck and stay cool dude.
public class Optimize {
private static int GENERATIONCOUNT = 1000;
private static int PROCESSCOUNT = 100;
private static int MUTATIONCOUNT = PROCESSCOUNT/10;
public static void main(String...strings) {
new Optimize().run();
}
// define an execution plan as 25 runs on 4 processors
class ExecutionPlan implements Comparable<ExecutionPlan> {
List<List<Integer>> plan;
int cost;
public ExecutionPlan(List<List<Integer>> plan) {
this.plan = plan;
}
@Override
public int compareTo(ExecutionPlan o) {
return cost-o.cost;
}
@Override
public String toString() {
return Integer.toString(cost);
}
}
private void run() {
// make 100 processes to be completed
List<Integer> processes = new ArrayList<Integer>();
// assign each a random cost between 1 and 99;
for ( int index = 0; index < PROCESSCOUNT; ++index) {
processes.add((int)(Math.random() * 99.0) + 1);
}
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
// Loop on generationCount
for ( int generation = 0; generation < GENERATIONCOUNT; ++generation) {
computeCostOfPlans(executionPlans);
// sort plans by cost
Collections.sort(executionPlans);
// print execution plan costs
System.out.println(generation + " = " + executionPlans);
// clone 5 better plans over 5 worse plans
cloneBetterPlansOverWorsePlans(executionPlans);
// mutate 5 cloned plans
mutateClones(executionPlans);
}
}
private void mutateClones(List<ExecutionPlan> executionPlans) {
for ( int index = 0; index < executionPlans.size()/2; ++index) {
ExecutionPlan execution = executionPlans.get(index);
// mutate 10 different location swaps, maybe the plan improves
for ( int mutationCount = 0; mutationCount < MUTATIONCOUNT ; ++mutationCount) {
int location1 = (int)(Math.random() * 100.0);
int location2 = (int)(Math.random() * 100.0);
// swap two locations
Integer processCostTemp = execution.plan.get(location1/4).get(location1%4);
execution.plan.get(location1/4).set(location1%4, execution.plan.get(location2/4).get(location2%4));
execution.plan.get(location2/4).set(location2%4, processCostTemp);
}
}
}
private void cloneBetterPlansOverWorsePlans(List<ExecutionPlan> executionPlans) {
for ( int index = 0; index < executionPlans.size()/2; ++index) {
ExecutionPlan execution = executionPlans.get(index);
List<List<Integer>> clonePlan = new ArrayList<List<Integer>>();
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
clonePlan.add( new ArrayList<Integer>(execution.plan.get(roundNumber)) );
}
executionPlans.set( index + executionPlans.size()/2, new ExecutionPlan(clonePlan) );
}
}
private void computeCostOfPlans(List<ExecutionPlan> executionPlans) {
for ( ExecutionPlan execution: executionPlans) {
execution.cost = 0;
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
// cost of a round is greatest "communication time".
List<Integer> round = execution.plan.get(roundNumber);
int roundCost = Math.max(round.get(0), round.get(1));
roundCost = Math.max(roundCost, round.get(2));
roundCost = Math.max(roundCost, round.get(3));
// add all the round costs' to determine total plan cost
execution.cost += roundCost;
}
}
}
private List<ExecutionPlan> createAndIntializePlans(List<Integer> processes) {
List<ExecutionPlan> executionPlans = new ArrayList<ExecutionPlan>();
for ( int planNumber = 0; planNumber < 10; ++planNumber) {
// randomize the processes for this plan
Collections.shuffle(processes);
// and make the plan
List<List<Integer>> currentPlan = new ArrayList<List<Integer>>();
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
List<Integer> round = new ArrayList<Integer>();
round.add(processes.get(4*roundNumber+0));
round.add(processes.get(4*roundNumber+1));
round.add(processes.get(4*roundNumber+2));
round.add(processes.get(4*roundNumber+3));
currentPlan.add(round);
}
executionPlans.add(new ExecutionPlan(currentPlan));
}
return executionPlans;
}
}
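A starting point for that scoring step, sketched under my own assumptions (the communication matrix C and the assignment encoding below are mine, not part of the example above): represent a candidate solution as an array where assignment[i] is the processor given to process i, and charge C[i][j] whenever two communicating processes land on different processors. Lower scores are better, and the sort/clone/mutate loop above can be reused as-is.
// Hypothetical fitness function for the original assignment problem.
// assignment[i] = processor assigned to process i (0..n-1);
// C[i][j] = total communication between processes i and j.
static int communicationCost(int[] assignment, int[][] C) {
    int cost = 0;
    for (int i = 0; i < assignment.length; i++) {
        for (int j = i + 1; j < assignment.length; j++) {
            if (assignment[i] != assignment[j]) {
                cost += C[i][j]; // only cross-processor traffic is charged
            }
        }
    }
    return cost;
}
A mutation for this encoding could swap the processor slots of two random processes, which also preserves the equal-load constraint from the problem statement.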
Related
The goal of my question is to enhance the performance of my algorithm by splitting the range of my loop iterations over a large array list.
For example: I have an array list with roughly 10 billion long values. The goal I am trying to achieve is to loop from 0 to 100 million entries, output the result of whatever calculation happens inside the loop for those 100 million entries, then continue from 100 million to 200 million, then 200 to 300 million, and so on and so forth.
After I get all of the per-chunk results, I can sum them up outside the loop, collecting the outputs of the chunks in parallel.
I have tried to use a dynamic range-shift method to achieve something similar, but I can't seem to get the logic fully implemented the way I would like.
public static void tt4() {
long essir2 = 0;
long essir3 = 0;
List<Long> cc = new ArrayList<>();
List<Long> range = new ArrayList<>();
// break point is a method that returns list values, it was converted to
// string because of some concatenations and would be converted back to long here
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
// the size of the List is huge about 1 trillion entries at the minimum
long hy = cc.size() - 1;
for (long k = 0; k < hy; k++) {
long t1 = (long) cc.get((int) k);
long t2 = (long) cc.get((int) (k + 1));
// My main question: I am trying to iterate the entire list in a dynamic way
// which would exclude repeated endpoints on each iteration.
range = LongStream.rangeClosed(t1 + 1, t2)
.boxed()
.collect(Collectors.toList());
for (long i : range) {
// Hard is another method call on the iteration
// complexcalc is a method as well
essir2 = complexcalc((int) i, (int) Hard(i));
essir3 += essir2;
}
}
System.out.println("\n" + essir3);
}
I don't have any errors; I am just looking for a way to improve performance and running time. I can do a million entries in under a second directly, but at the sizes I require it runs forever. The sizes I'm giving are abstractions to illustrate magnitude; I don't want opinions like "100 billion is not much". If I can do a million in under a second, it's the massively huge counts, with complex tasks and method calls on each iteration, that I need help with. I just need help with the logic I'm trying to achieve, if possible.
One thing I would suggest right off the bat would be to store your Breakpoint return value inside a simple array rather than using a List. This should improve your execution time significantly:
List<Long> cc = new ArrayList<>();
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
Long[] ccArray = cc.toArray(new Long[0]);
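Going one step further (my note, not part of the answer above): since the values are long, a primitive long[] avoids the boxing of Long entirely, which matters far more than List-versus-array at these sizes:
// Unboxed storage: one contiguous array of primitives, no Long wrapper objects.
long[] ccPrimitive = cc.stream().mapToLong(Long::longValue).toArray();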
I believe what you're looking for is to split your tasks across multiple threads. You can do this with ExecutorService "which simplifies the execution of tasks in asynchronous mode".
Note that I am not overly familiar with this whole concept, but I have experimented with it a bit recently and can give you a quick draft of how you could implement this.
I welcome those more experienced with multi-threading to either correct this post or provide additional information in the comments to help improve this answer.
Runnable Task class
public class CompartmentalizationTask implements Runnable {
private final ArrayList<Long> cc;
private final long index;
public CompartmentalizationTask(ArrayList<Long> list, long index) {
this.cc = list;
this.index = index;
}
@Override
public void run() {
Main.compartmentalize(cc, index);
}
}
Main class
private static ExecutorService exeService = Executors.newCachedThreadPool();
private static List<Future<?>> futureTasks = new ArrayList<>();
// shared accumulator for the partial sums (requires java.util.concurrent.atomic.AtomicLong)
private static AtomicLong essir3 = new AtomicLong();
public static void tt4() throws ExecutionException, InterruptedException
{
ArrayList<Long> cc = new ArrayList<>();
// break point is a method that returns list values, it was converted to
// string because of some concatenations and would be converted back to long here
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
// the size of the List is huge about 1 trillion entries at the minimum
long hy = cc.size() - 1;
for (long k = 0; k < hy; k++) {
futureTasks.add(Main.exeService.submit(new CompartmentalizationTask(cc, k)));
}
for (Future<?> task : futureTasks) {
task.get(); // wait for every chunk to finish
}
exeService.shutdown();
System.out.println("\n" + essir3.get());
}
public static void compartmentalize(ArrayList<Long> cc, long index)
{
long t1 = cc.get((int) index);
long t2 = cc.get((int) (index + 1));
// iterate the range (t1, t2] so endpoints are not repeated between chunks
long partial = 0;
for (long i = t1 + 1; i <= t2; i++) {
// Hard is another method call on the iteration
// complexcalc is a method as well
partial += complexcalc((int) i, (int) Hard(i));
}
// publish this chunk's partial sum atomically
essir3.addAndGet(partial);
}
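A variation worth considering (my sketch, not part of the draft above): have each task return its partial sum through a Callable<Long>, so no shared state is needed at all and the main thread simply adds up the futures. The breakpoint values and the complexcalc/Hard bodies below are stand-ins for the poster's own methods.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class ChunkSum {
    // Placeholders for the poster's methods; the real implementations differ.
    static long complexcalc(int i, int hard) { return (long) i * hard; }
    static long Hard(long i) { return i % 7; }
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        long[] breakpoints = {0, 1_000_000, 2_000_000, 3_000_000}; // assumed sample data
        List<Future<Long>> partials = new ArrayList<>();
        for (int k = 0; k < breakpoints.length - 1; k++) {
            final long from = breakpoints[k] + 1, to = breakpoints[k + 1];
            // each task owns its own chunk: no locks, no shared accumulator
            partials.add(pool.submit(() -> {
                long sum = 0;
                for (long i = from; i <= to; i++) {
                    sum += complexcalc((int) i, (int) Hard(i));
                }
                return sum;
            }));
        }
        long total = 0;
        for (Future<Long> f : partials) {
            total += f.get(); // blocks until that chunk has finished
        }
        pool.shutdown();
        System.out.println(total);
    }
}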
I was coding a leetcode problem : https://oj.leetcode.com/problems/gas-station/ using Java 8.
My solution got TLE when I used Arrays.stream(integer_array).sum() to compute the sum, while the same solution was accepted when using a plain loop to sum the elements of the array. The best possible time complexity for this problem is O(n), and I was surprised to get TLE when using the streaming APIs from Java 8. My solution is O(n) as well.
import java.util.Arrays;
public class GasStation {
public int canCompleteCircuit(int[] gas, int[] cost) {
int start = 0, i = 0, runningCost = 0, totalGas = 0, totalCost = 0;
totalGas = Arrays.stream(gas).sum();
totalCost = Arrays.stream(cost).sum();
// for (int item : gas) totalGas += item;
// for (int item : cost) totalCost += item;
if (totalGas < totalCost)
return -1;
while (start > i || (start == 0 && i < gas.length)) {
runningCost += gas[i];
if (runningCost >= cost[i]) {
runningCost -= cost[i++];
} else {
runningCost -= gas[i];
if (--start < 0)
start = gas.length - 1;
runningCost += (gas[start] - cost[start]);
}
}
return start;
}
public static void main(String[] args) {
GasStation sol = new GasStation();
int[] gas = new int[] { 10, 5, 7, 14, 9 };
int[] cost = new int[] { 8, 5, 14, 3, 1 };
System.out.println(sol.canCompleteCircuit(gas, cost));
gas = new int[] { 10 };
cost = new int[] { 8 };
System.out.println(sol.canCompleteCircuit(gas, cost));
}
}
The solution gets accepted when,
I comment the following two lines: (calculating sum using streaming)
totalGas = Arrays.stream(gas).sum();
totalCost = Arrays.stream(cost).sum();
and uncomment the following two lines (calculating sum using iteration):
//for (int item : gas) totalGas += item;
//for (int item : cost) totalCost += item;
Now the solution gets accepted. Why is the Java 8 streaming API slower than iteration over primitives for large inputs?
The first step in dealing with problems like this is to bring the code into a controlled environment. That means running it in the JVM you control (and can invoke) and running tests inside a good benchmark harness like JMH. Analyze, don't speculate.
Here's a benchmark I whipped up using JMH to do some analysis on this:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class ArraySum {
static final long SEED = -897234L;
@Param({"1000000"})
int sz;
int[] array;
@Setup
public void setup() {
Random random = new Random(SEED);
array = new int[sz];
Arrays.setAll(array, i -> random.nextInt());
}
@Benchmark
public int sumForLoop() {
int sum = 0;
for (int a : array)
sum += a;
return sum;
}
@Benchmark
public int sumStream() {
return Arrays.stream(array).sum();
}
}
Basically this creates an array of a million ints and sums them twice: once using a for-loop and once using streams. Running the benchmark produces a bunch of output (elided for brevity and for dramatic effect) but the summary results are below:
Benchmark (sz) Mode Samples Score Score error Units
ArraySum.sumForLoop 1000000 avgt 3 514.473 398.512 us/op
ArraySum.sumStream 1000000 avgt 3 7355.971 3170.697 us/op
Whoa! That Java 8 streams stuff is teh SUXX0R! It's 14 times slower than a for-loop, don't use it!!!1!
Well, no. First let's go over these results, and then look more closely to see if we can figure out what's going on.
The summary shows the two benchmark methods, with the "sz" parameter of a million. It's possible to vary this parameter but it doesn't turn out to make a difference in this case. I also only ran the benchmark methods 3 times, as you can see from the "samples" column. (There were also only 3 warmup iterations, not visible here.) The score is in microseconds per operation, and clearly the stream code is much, much slower than the for-loop code. But note also the score error: that's the amount of variability in the different runs. JMH helpfully prints out the standard deviation of the results (not shown here) but you can easily see that the score error is a significant fraction of reported score. This reduces our confidence in the score.
Running more iterations should help. More warmup iterations will let the JIT do more work and settle down before running the benchmarks, and running more benchmark iterations will smooth out any errors from transient activity elsewhere on my system. So let's try 10 warmup iterations and 10 benchmark iterations:
Benchmark (sz) Mode Samples Score Score error Units
ArraySum.sumForLoop 1000000 avgt 10 504.803 34.010 us/op
ArraySum.sumStream 1000000 avgt 10 7128.942 178.688 us/op
Performance is overall a little faster, and the measurement error is also quite a bit smaller, so running more iterations has had the desired effect. But the streams code is still considerably slower than the for-loop code. What's going on?
A large clue can be obtained by looking at the individual timings of the streams method:
# Warmup Iteration 1: 570.490 us/op
# Warmup Iteration 2: 491.765 us/op
# Warmup Iteration 3: 756.951 us/op
# Warmup Iteration 4: 7033.500 us/op
# Warmup Iteration 5: 7350.080 us/op
# Warmup Iteration 6: 7425.829 us/op
# Warmup Iteration 7: 7029.441 us/op
# Warmup Iteration 8: 7208.584 us/op
# Warmup Iteration 9: 7104.160 us/op
# Warmup Iteration 10: 7372.298 us/op
What happened? The first few iterations were reasonably fast, but then the 4th and subsequent iterations (and all the benchmark iterations that follow) were suddenly much slower.
I've seen this before. It was in this question and this answer elsewhere on SO. I recommend reading that answer; it explains how the JVM's inlining decisions in this case result in poorer performance.
A bit of background here: a for-loop compiles to a very simple increment-and-test loop, and can easily be handled by usual optimization techniques like loop peeling and unrolling. The streams code, while not very complex in this case, is actually quite complex compared to the for-loop code; there's a fair bit of setup, and each loop requires at least one method call. Thus, the JIT optimizations, particularly its inlining decisions, are critical to making the streams code go fast. And it's possible for it to go wrong.
Another background point is that integer summation is about the simplest possible operation you can think of to do in a loop or stream. This will tend to make the fixed overhead of stream setup look relatively more expensive. It is also so simple that it can trigger pathologies in the inlining policy.
The suggestion from the other answer was to add the JVM option -XX:MaxInlineLevel=12 to increase the amount of code that can be inlined. Rerunning the benchmark with that option gives:
Benchmark (sz) Mode Samples Score Score error Units
ArraySum.sumForLoop 1000000 avgt 10 502.379 27.859 us/op
ArraySum.sumStream 1000000 avgt 10 498.572 24.195 us/op
Ah, much nicer. Disabling tiered compilation using -XX:-TieredCompilation also had the effect of avoiding the pathological behavior. I also found that making the loop computation even a bit more expensive, e.g. summing squares of integers -- that is, adding a single multiply -- also avoids the pathological behavior.
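For reference, this is my addition rather than part of the answer: with the standard JMH annotations, the option can be baked into the benchmark itself via @Fork, so every forked JVM gets it without anyone remembering a command-line flag:
// Append the JVM flag to each forked benchmark JVM (JMH's @Fork annotation).
@Fork(value = 1, jvmArgsAppend = {"-XX:MaxInlineLevel=12"})
@Benchmark
public int sumStreamDeepInline() {
    return Arrays.stream(array).sum();
}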
Now, your question is about running in the context of the leetcode environment, which seems to run the code in a JVM that you don't have any control over, so you can't change the inlining or compilation options. And you probably don't want to make your computation more complex to avoid the pathology either. So for this case, you might as well just stick to the good old for-loop. But don't be afraid to use streams, even for dealing with primitive arrays. It can perform quite well, aside from some narrow edge cases.
The normal iteration approach is going to be pretty much as fast as anything can be, but streams have a variety of overheads: even though it's coming directly from a stream, there's probably going to be a primitive Spliterator involved and lots of other objects being generated.
In general, you should expect the "normal approach" to usually be faster than streams unless you're both using parallelization and your data is very large.
My benchmark (see code below) shows that the streaming approach is about 10-15% slower than the iterative one. Interestingly enough, parallel stream results vary greatly on my 4-core (i7) MacBook Pro: while I have seen them come out about 30% faster than the iterative version a few times, the most common result is almost three times slower than the sequential stream.
Here is the benchmark code:
import java.util.*;
import java.util.function.*;
public class StreamingBenchmark {
private static void benchmark(String name, LongSupplier f) {
long start = System.currentTimeMillis(), sum = 0;
for(int count = 0; count < 1000; count ++) sum += f.getAsLong();
System.out.println(String.format(
"%10s in %d millis. Sum = %d",
name, System.currentTimeMillis() - start, sum
));
}
public static void main(String argv[]) {
int data[] = new int[1000000];
Random randy = new Random();
for(int i = 0; i < data.length; i++) data[i] = randy.nextInt();
benchmark("iterative", () -> { int s = 0; for(int n: data) s+=n; return s; });
benchmark("stream", () -> Arrays.stream(data).sum());
benchmark("parallel", () -> Arrays.stream(data).parallel().sum());
}
}
Here is the output from a few runs:
iterative in 350 millis. Sum = 564821058000
stream in 394 millis. Sum = 564821058000
parallel in 883 millis. Sum = 564821058000
iterative in 340 millis. Sum = -295411382000
stream in 376 millis. Sum = -295411382000
parallel in 1031 millis. Sum = -295411382000
iterative in 365 millis. Sum = 1205763898000
stream in 379 millis. Sum = 1205763898000
parallel in 1053 millis. Sum = 1205763898000
etc.
This got me curious, and I also tried running the equivalent logic in Scala:
object Scarr {
def main(argv: Array[String]) = {
val randy = new java.util.Random
val data = (1 to 1000000).map { _ => randy.nextInt }.toArray
val start = System.currentTimeMillis
var sum = 0l;
for ( _ <- 1 to 1000 ) sum += data.sum
println(sum + " in " + (System.currentTimeMillis - start) + " millis.")
}
}
This took 14 seconds! About 40 times(!) longer than streaming in Java. Ouch!
The sum() method is functionally equivalent to return reduce(0, Integer::sum); on a large array there is more overhead in making all the method calls than in the basic hand-written for-loop iteration. The bytecode for the for(int i : numbers) iteration is only very slightly more complicated than that generated by the hand-written for-loop. The stream operation is possibly faster in parallel-friendly environments (though maybe not for primitive sums), but we don't know that the environment is parallel-friendly; leetcode itself is probably tuned to favor low-level code over abstraction, since it measures efficiency rather than legibility.
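To make that equivalence concrete, here is a minimal sketch (mine, not the answerer's):
int[] numbers = {3, 1, 4, 1, 5};
int viaSum = Arrays.stream(numbers).sum(); // convenience method
int viaReduce = Arrays.stream(numbers).reduce(0, Integer::sum); // what sum() boils down to
// viaSum == viaReduce == 14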
The sum operation done in any of the three ways (Arrays.stream(int[]).sum, for (int i : ints){total+=i;}, and for(int i = 0; i < ints.length; i++){total+=ints[i];}) should be relatively similar in efficiency. I used the following test class (which sums a hundred million integers between 0 and 4096 a hundred times each and records the average times). All of them returned in very similar timeframes. It even attempts to limit parallel processing by occupying all but one of the available cores in while(true) loops, but I still found no particular difference:
import java.util.*;
import java.util.function.ToLongFunction;
import java.util.stream.IntStream;
public class SumTester {
private static final int ARRAY_SIZE = 100_000_000;
private static final int ITERATION_LIMIT = 100;
private static final int INT_VALUE_LIMIT = 4096;
public static void main(String[] args) {
Random random = new Random();
int[] numbers = new int[ARRAY_SIZE];
IntStream.range(0, ARRAY_SIZE).forEach(i->numbers[i] = random.nextInt(INT_VALUE_LIMIT));
Map<String, ToLongFunction<int[]>> inputs = new HashMap<String, ToLongFunction<int[]>>();
NanoTimer initializer = NanoTimer.start();
System.out.println("initialized NanoTimer in " + initializer.microEnd() + " microseconds");
inputs.put("sumByStream", SumTester::sumByStream);
inputs.put("sumByIteration", SumTester::sumByIteration);
inputs.put("sumByForLoop", SumTester::sumByForLoop);
System.out.println("Parallelables: ");
averageTimeFor(ITERATION_LIMIT, inputs, Arrays.copyOf(numbers, numbers.length));
int cores = Runtime.getRuntime().availableProcessors();
List<CancelableThreadEater> threadEaters = new ArrayList<CancelableThreadEater>();
if (cores > 1){
threadEaters = occupyThreads(cores - 1);
}
// Only one core should be left to our class
System.out.println("\nSingleCore (" + threadEaters.size() + " of " + cores + " cores occupied)");
averageTimeFor(ITERATION_LIMIT, inputs, Arrays.copyOf(numbers, numbers.length));
for (CancelableThreadEater cte : threadEaters){
cte.end();
}
System.out.println("Complete!");
}
public static long sumByStream(int[] numbers){
return Arrays.stream(numbers).sum();
}
public static long sumByIteration(int[] numbers){
int total = 0;
for (int i : numbers){
total += i;
}
return total;
}
public static long sumByForLoop(int[] numbers){
int total = 0;
for (int i = 0; i < numbers.length; i++){
total += numbers[i];
}
return total;
}
public static void averageTimeFor(int iterations, Map<String, ToLongFunction<int[]>> testMap, int[] numbers){
Map<String, Long> durationMap = new HashMap<String, Long>();
Map<String, Long> sumMap = new HashMap<String, Long>();
for (String methodName : testMap.keySet()){
durationMap.put(methodName, 0L);
sumMap.put(methodName, 0L);
}
for (int i = 0; i < iterations; i++){
for (String methodName : testMap.keySet()){
int[] newNumbers = Arrays.copyOf(numbers, ARRAY_SIZE);
ToLongFunction<int[]> function = testMap.get(methodName);
NanoTimer nt = NanoTimer.start();
long sum = function.applyAsLong(newNumbers);
long duration = nt.microEnd();
sumMap.put(methodName, sum);
durationMap.put(methodName, durationMap.get(methodName) + duration);
}
}
for (String methodName : testMap.keySet()){
long duration = durationMap.get(methodName) / iterations;
long sum = sumMap.get(methodName);
System.out.println(methodName + ": result '" + sum + "', elapsed time: " + duration + " microseconds on average over " + iterations + " iterations");
}
}
private static List<CancelableThreadEater> occupyThreads(int numThreads){
List<CancelableThreadEater> result = new ArrayList<CancelableThreadEater>();
for (int i = 0; i < numThreads; i++){
CancelableThreadEater cte = new CancelableThreadEater();
result.add(cte);
new Thread(cte).start();
}
return result;
}
private static class CancelableThreadEater implements Runnable {
// volatile flag: synchronizing on a reassigned Boolean field is broken,
// because each assignment replaces the very object being locked on
private volatile boolean stop = false;
public void run(){
while (!stop){
// spin, keeping this core busy
}
}
public void end(){
stop = true;
}
}
// Minimal stand-in for the NanoTimer helper used above (assumed implementation):
// start() records System.nanoTime(); microEnd() returns elapsed microseconds.
private static class NanoTimer {
private final long begin = System.nanoTime();
static NanoTimer start() { return new NanoTimer(); }
long microEnd() { return (System.nanoTime() - begin) / 1000; }
}
}
which returned
initialized NanoTimer in 22 microseconds
Parallelables:
sumByIteration: result '-1413860413', elapsed time: 35844 microseconds on average over 100 iterations
sumByStream: result '-1413860413', elapsed time: 35414 microseconds on average over 100 iterations
sumByForLoop: result '-1413860413', elapsed time: 35218 microseconds on average over 100 iterations
SingleCore (3 of 4 cores occupied)
sumByIteration: result '-1413860413', elapsed time: 37010 microseconds on average over 100 iterations
sumByStream: result '-1413860413', elapsed time: 38375 microseconds on average over 100 iterations
sumByForLoop: result '-1413860413', elapsed time: 37990 microseconds on average over 100 iterations
Complete!
That said, there's no real reason to do the sum() operation for this problem at all. You are iterating through each array, for a total of three iterations, the last of which may be longer than a normal iteration. It's possible to calculate the answer correctly with one full simultaneous iteration of the arrays plus one short-circuiting iteration. It may be possible to do it even more efficiently, but I couldn't figure out a better way than I did. My solution ended up being one of the fastest Java solutions on the chart: it ran in 223 ms, which was in amongst the middle pack of python solutions.
I'll add my solution to the problem if you care to see it, but I hope the actual question is answered here.
Stream functions are relatively slow, so during a leetcode contest (or any algorithms contest) always prefer classic loops over stream functions, as large inputs are prone to TLE. A TLE can in turn cause a penalty, which would affect your final ranking.
A detailed explanation is mentioned here https://stackoverflow.com/a/27994074/6185191
I also came across this issue while doing a pretty basic LeetCode problem. The first code that I submitted used the Java Stream API's Arrays.stream().sum() to compute the array sum, which gave a time of 6 ms.
A classic for loop took just 1 ms to iterate through the same array. That's insane! The Stream API method takes at least 6x the time of a simple for loop. So yeah, always go with the simpler and classic method.
I am learning the concepts of random number generation and multithreading in Java.
The idea is to avoid generating a repeated random number in the range of 1000 within a given millisecond (assuming no more than 50 items are processed per millisecond across the threads), so that the list of random numbers generated at a specific time is unique. Can you give me any idea? I am ending up generating a couple of repeated random numbers (and there is a considerable probability of that) within a given millisecond.
I have tried the following things, where I failed.
Random random = new Random(System.nanoTime());
double randomNum = random.nextInt(999);
//
int min=1; int max=999;
double randomId = (int)Math.abs(Math.random()* (max - min + 1) + min);
//
Random random = new Random(System.nanoTime()); // also tried new Random();
double randomId = (int)Math.abs(random.nextDouble()* (max - min + 1) + min);
As I am appending the timestamp to what is generated, in a multithreaded environment I see the same ids (around 8-10 of them) being generated (2-4 times each) among 5000+ data items that should be unique.
First, you should use new Random(), since it looks like this (details depend on Java version):
public Random() { this(++seedUniquifier + System.nanoTime()); }
private static volatile long seedUniquifier = 8682522807148012L;
I.e. it already makes use of nanoTime() and makes sure different threads with the same nanoTime() result get different seeds, which new Random(System.nanoTime()) doesn't.
(EDIT: Pyranja pointed out this is a bug in Java 6, but it's fixed in Java 7:
public Random() {
this(seedUniquifier() ^ System.nanoTime());
}
private static long seedUniquifier() {
// L'Ecuyer, "Tables of Linear Congruential Generators of
// Different Sizes and Good Lattice Structure", 1999
for (;;) {
long current = seedUniquifier.get();
long next = current * 181783497276652981L;
if (seedUniquifier.compareAndSet(current, next))
return next;
}
}
private static final AtomicLong seedUniquifier
= new AtomicLong(8682522807148012L);
)
Second, if you generate 50 random numbers from 1 to 1000, the probability that some of them will be the same is quite high thanks to the birthday paradox: approximately 1 - e^(-50*49/2000), or about 71%.
Third, if you just want unique ids, you could use an AtomicInteger counter instead of random numbers. Or, if you want a random part, append a counter to the random part to guarantee uniqueness.
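A minimal sketch of that third suggestion (my code, with hypothetical names):
import java.util.concurrent.atomic.AtomicInteger;
// One shared counter: incrementAndGet() is atomic, so no two threads can
// ever receive the same id, no matter how many ids are drawn per millisecond.
class IdGenerator {
    private static final AtomicInteger COUNTER = new AtomicInteger();
    static int nextId() {
        return COUNTER.incrementAndGet();
    }
}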
This class will allow you to get nonrepeating values from a certain range until the whole range has been used. Once the range is used, it will be reinitialized.
Class comes along with a simple test.
If you want to make the class thread safe, just add synchronized to nextInt() declaration.
Then you can use the singleton pattern or just a static variable to access the generator from multiple threads. That way all your threads will use the same object and the same unique id pool.
import java.util.*;
public class NotRepeatingRandom {
int size;
int index;
List<Integer> vals;
Random gen = new Random();
public NotRepeatingRandom(int rangeMax) {
size = rangeMax;
index = rangeMax; // to force initial shuffle
vals = new ArrayList<Integer>(size);
fillBaseList();
}
private void fillBaseList() {
for (int a=0; a<size; a++) {
vals.add(a);
}
}
public int nextInt() {
if (index == vals.size()) {
Collections.shuffle(vals);
index = 0;
}
int val = vals.get(index);
index++;
return val;
}
public static void main(String[] args) {
NotRepeatingRandom gen = new NotRepeatingRandom(10);
for (int a=0; a<30; a++) {
System.out.println(gen.nextInt());
}
}
}
If I understand your question correctly, multiple threads are creating their own instances of the Random class at the same time, and all the threads generate the same random number?
The same number is generated because all the Random instances were created at the same time, i.e. with the same seed.
To fix this, create only one instance of the Random class and share it between all threads, so that all your threads call nextDouble() on the same instance. Random.nextDouble() is thread-safe and will implicitly update its seed with every call.
//create only one Random instance, seed is based on current time
public static final Random generator= new Random();
Now all threads should use the same instance:
double random=generator.nextDouble()
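One more option, added by me rather than part of the answer above: since Java 7, java.util.concurrent.ThreadLocalRandom sidesteps contention on a single shared Random by giving each thread its own generator:
import java.util.concurrent.ThreadLocalRandom;
// Each thread transparently gets its own generator; no shared seed, no contention.
int randomId = ThreadLocalRandom.current().nextInt(1, 1000); // uniform in 1..999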
I'm currently doing a test of binary search's average case. Simply put, all I do is generate a random value and then search for it in arrays of different sizes using binary search. Below is my code:
public static void main(String[] args)
{
//This array keeps track of the times of the binary search
long[] ArrayTimeTaken = new long[18];
//Values of the array lengths that we test for
int[] ArrayAssignValues = {
1000000, 10000000, 20000000, 30000000, 40000000, 50000000,
60000000, 70000000, 80000000, 90000000, 100000000, 110000000,
120000000, 130000000, 140000000, 150000000, 160000000, 170000000
};
//Code that runs the binary search
for (int i = 0; i < ArrayAssignValues.length; i++)
{
float[] arrayExperimentTest = new float[ ArrayAssignValues[i]];
//We fill the array with ascending numbers
for (int j = 0; j < arrayExperimentTest.length; j++)
{
arrayExperimentTest[j] = j;
}
Random Generator = new Random();
int ValuetoSearchfor = (int) Generator.nextInt(ArrayAssignValues[i]);
System.out.println(ValuetoSearchfor);
ValuetoSearchfor = (int) arrayExperimentTest[ValuetoSearchfor];
//Here we perform the binary search
ArrayTimeTaken[i] = BinarySearch(arrayExperimentTest,ValuetoSearchfor);
}
ChartCreate(ArrayTimeTaken);
System.out.println("Done");
}
Here is my code for the binary search:
static long BinarySearch(float[] ArraySearch,int ValueFind)
{
System.gc();
long startTime = System.nanoTime();
int low = 0;
int high = ArraySearch.length-1;
int mid = (low + high) / 2;
while (ArraySearch[mid] != ValueFind )
{
if (ValueFind <ArraySearch[mid])
{
high = mid-1;
}
else
{
low = mid+1;
}
mid = (low+high)/2;
}
long TimeTaken = System.nanoTime() - startTime;
return TimeTaken;
}
Now the problem is that the results don't make sense. Below is a graph of the timings (not reproduced here).
Can someone explain why the first array takes so much time? I've run the code a good few times and it's basically the same graph every time. Does Java somehow cache results? Can anyone explain why the first binary search takes so long relative to the others, even though the array size is tiny compared to the rest?
It looks like you're doing these searches one after another, starting with the smallest arrays. If that's true, the code will be running much slower at first, because the JIT compiler won't have had a chance to warm up yet. Generally, for benchmarking like this, you want to run through all the relevant code first, to give the JIT compiler time to compile and optimize it, before you do the real testing.
For more information on the JIT compiler, read this
You should also see this question to learn more about benchmarking.
Another possible cause of the slowness is that the JVM could still be in the process of starting up, running its own background code while you're timing yours, causing a slowdown.
Benchmarking is not done this way; you should run at least 1000 cycles as a "warmup" and only then start measuring. Benchmarking can be more complicated than it seems: a benchmark should be carefully constructed so that it is not affected by other programs running on the machine at the same time, etc. Here and here you can find some good tips.
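A hand-rolled version of that advice might look like the sketch below (mine; search stands for the operation being measured, with the System.gc() call and internal timing pulled out of the question's BinarySearch):
// Warmup: give the JIT time to compile the search before measuring anything.
for (int i = 0; i < 1000; i++) {
    search(arrayExperimentTest, ValuetoSearchfor); // results discarded
}
// Measure: average over many runs instead of trusting one nanoTime sample.
long t0 = System.nanoTime();
for (int i = 0; i < 1000; i++) {
    search(arrayExperimentTest, ValuetoSearchfor);
}
long avgNanos = (System.nanoTime() - t0) / 1000;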
It is said that pipelining is the better way when many set/get operations are required in Redis, so this is my test code:
public class TestPipeline {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
list.add(si);
ShardedJedis jedis = new ShardedJedis(list);
long startTime = System.currentTimeMillis();
ShardedJedisPipeline pipeline = jedis.pipelined();
for (int i = 0; i < 100000; i++) {
Map<String, String> map = new HashMap<String, String>();
map.put("id", "" + i);
map.put("name", "lyj" + i);
pipeline.hmset("m" + i, map);
}
pipeline.sync();
long endTime = System.currentTimeMillis();
System.out.println(endTime - startTime);
}
}
When I ran it, the program did not respond for quite a while, but when I don't use the pipeline it takes only 20073 ms. I am confused: why is it actually better without the pipeline, and by such a wide gap!
Thanks for answering me. A few questions: how do you calculate the 6 MB of data?
When I send 10K items, the pipeline is always faster than normal mode, but with 100K the pipeline gets no response. I think 100-1000 operations is an advisable choice, as said below. Is this anything to do with the JIT? I don't understand it.
There are a few points you need to consider before writing such a benchmark (and especially a benchmark using the JVM):
on most (physical) machines, Redis is able to process more than 100K ops/s when pipelining is used. Your benchmark only deals with 100K items, so it does not last long enough to produce meaningful results. Furthermore, there is no time for the successive stages of the JIT to kick in.
the absolute time is not a very relevant metric. Displaying the throughput (i.e. the number of operation per second) while keeping the benchmark running for at least 10 seconds would be a better and more stable metric.
your inner loop generates a lot of garbage. If you plan to benchmark Jedis+Redis, then you need to keep the overhead of your own program low.
because you have defined everything inside the main function, your loop will not be compiled by the JIT (depending on the JVM you use); only the inner method calls may be. If you want the JIT to be efficient, make sure to encapsulate your code in methods that the JIT can compile.
optionally, you may want to add a warm-up phase before performing the actual measurement to avoid accounting the overhead of running the first iterations with the bare-bone interpreter, and the cost of the JIT itself.
Now, regarding Redis pipelining, your pipeline is way too long. 100K commands in the pipeline means Jedis has to build a 6MB buffer before sending anything to Redis. It means the socket buffers (on client side, and perhaps server-side) will be saturated, and that Redis will have to deal with 6 MB communication buffers as well.
Furthermore, your benchmark is still synchronous (using a pipeline does not magically make it asynchronous). In other words, Jedis will not start reading replies until the last query of your pipeline has been sent to Redis. When the pipeline is too long, it has the potential to block things.
Consider limiting the size of the pipeline to 100-1000 operations. Of course, it will generate more roundtrips, but the pressure on the communication stack will be reduced to an acceptable level. For instance, consider the following program:
import redis.clients.jedis.*;
import java.util.*;
public class TestPipeline {
/**
* @param args
*/
int i = 0;
Map<String, String> map = new HashMap<String, String>();
ShardedJedis jedis;
// Number of iterations
// Use 1000 to test with the pipeline, 100 otherwise
static final int N = 1000;
public TestPipeline() {
JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
list.add(si);
jedis = new ShardedJedis(list);
}
public void push( int n ) {
ShardedJedisPipeline pipeline = jedis.pipelined();
for ( int k = 0; k < n; k++) {
map.put("id", "" + i);
map.put("name", "lyj" + i);
pipeline.hmset("m" + i, map);
++i;
}
pipeline.sync();
}
public void push2( int n ) {
for ( int k = 0; k < n; k++) {
map.put("id", "" + i);
map.put("name", "lyj" + i);
jedis.hmset("m" + i, map);
++i;
}
}
public static void main(String[] args) {
TestPipeline obj = new TestPipeline();
long startTime = System.currentTimeMillis();
for ( int j=0; j<N; j++ ) {
// Use push2 instead to test without pipeline
obj.push(1000);
// Uncomment to see the acceleration
//System.out.println(obj.i);
}
long endTime = System.currentTimeMillis();
double d = 1000.0 * obj.i;
d /= (double)(endTime - startTime);
System.out.println("Throughput: "+d);
}
}
With this program, you can test with or without pipelining. Be sure to increase the number of iterations (the N parameter) when pipelining is used, so that the test runs for at least 10 seconds. If you uncomment the println in the loop, you will see that the program is slow at the beginning and gets quicker as the JIT starts to optimize things (that's why the program should run for at least several seconds to give a meaningful result).
On my hardware (an old Athlon box), I can get 8-9 times more throughput when the pipeline is used. The program could be further improved by optimizing key/value formatting in the inner loop and adding a warm-up phase.
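For instance, a warm-up phase could be as simple as the sketch below (mine, not part of the program above): run some unmeasured batches first, then reset the counter and start the clock.
// Hypothetical warm-up added to main(): let the JIT compile the hot path first.
TestPipeline obj = new TestPipeline();
for (int j = 0; j < 50; j++) {
    obj.push(1000); // warm-up batches, not measured
}
obj.i = 0; // reset the item counter before timing (keys are simply overwritten)
long startTime = System.currentTimeMillis();
for (int j = 0; j < N; j++) {
    obj.push(1000);
}
long endTime = System.currentTimeMillis();
System.out.println("Throughput: " + (1000.0 * obj.i / (endTime - startTime)));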