Suppose I am given integers a and b that define a range of interest, the integers in [a, b]. The range can span well over 10^9 integers. I want to sum the values of a given function f : N -> N over all integers a <= n <= b. Because the range is very large, I want to do this using multithreading.
Less formally, I want to parallelize the following code:
long sum = 0;
for (long n = a; n <= b; n++)
    sum += f(n);
System.out.println(sum);
Ideally (at least in my mind), the range would be divided equally across the threads available on the processor (suppose f(n) has near-identical complexity and running time for each n in the range). The values are completely independent, and f could really be any function; for example, it could return the digit sum of the number, but that is just an example.
Is there a general way to do exactly that in Java using multithreading?
This particular use case fits a parallel stream very well. See the tutorial by Oracle. Use the java.util.stream.LongStream class for a stream of 64-bit long integers.
You can implement it like this:
long sum = LongStream.rangeClosed(a, b)
        .parallel()
        .map(n -> f(n))
        .sum();
System.out.println(sum);
You probably want to look into the fork/join framework; it's a generalized approach to this principle of splitting up a task into a great many little independent tasks and then combining them back together, and it gives you all the control you'd want over how threads are spun off.
Alternatively, you can use the parallel() method of the stream API, but note that this method doesn't explain or guarantee much about how it works. You have no control over which aspect is parallelized, and no control over the number of threads that will be involved. For trivial cases you might as well use it, but as a rule, if "I had better write this in parallel or it will be too slow" applies, then you need some guarantees and some control. Here's Oracle's tutorial/explanation of parallel streams. For this specific case it does seem like it would more or less do what you want (it gets tricky if, for example, the stream you apply this to is what Files.lines gives you, where the parallelism is potentially stymied by where it is applied versus where the bottleneck is).
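If you do want some say over the number of threads while still using parallel(), one widely used trick (relied on in practice, but not an explicitly documented guarantee) is to run the terminal operation from inside your own ForkJoinPool; the parallel stream then uses that pool instead of the common pool. A minimal sketch, with f as a placeholder:
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class ControlledParallelSum {
    // Placeholder f; substitute your own function.
    static long f(long n) {
        return n;
    }

    public static void main(String[] args) throws Exception {
        long a = 1, b = 1_000_000_000L;
        // Submitting the terminal operation from inside a custom ForkJoinPool
        // makes the parallel stream run in that pool rather than the common pool.
        ForkJoinPool pool = new ForkJoinPool(4); // desired parallelism
        long sum = pool.submit(() ->
                LongStream.rangeClosed(a, b).parallel().map(ControlledParallelSum::f).sum()
        ).get();
        pool.shutdown();
        System.out.println(sum);
    }
}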
RecursiveTask is suitable for such a problem. Threads will be managed by the ForkJoinPool.
The general idea is to decompose the problem into shorter sub-problems, up to the point where a single thread is able to handle a given sub-problem by itself.
import java.util.concurrent.RecursiveTask;

class RecursiveSum extends RecursiveTask<Long> {
    private static final int THLD = 1_000; // below this size, sum the sub-range sequentially
    private int low, high;

    public RecursiveSum(int high) {
        this(0, high);
    }

    public RecursiveSum(int low, int high) {
        this.low = low;
        this.high = high;
    }

    // Sequential sum over [l, h)
    private long sum(int l, int h) {
        long sum = 0;
        for (int i = l; i < h; i++) sum += i;
        return sum;
    }

    @Override
    protected Long compute() {
        int len = high - low;
        if (len <= THLD) return sum(low, high);
        RecursiveSum rhalf = new RecursiveSum(low + len / 2, high);
        rhalf.fork();                                  // right half runs asynchronously
        RecursiveSum lhalf = new RecursiveSum(low, low + len / 2);
        return lhalf.compute() + rhalf.join();         // left half runs in this thread, then join
    }
}
and use it like this:
long r = new RecursiveSum(1_000_000_000).invoke();
System.out.println("Sum="+r);
Create your sum class (replace the f method with your own function):
import java.util.concurrent.Callable;

class SumFun implements Callable<Long> {
    // Replace with your own function
    public long f(long num) {
        return num;
    }

    private long a;
    private long b;

    public SumFun(long a, long b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public Long call() {
        long sum = 0;
        for (long i = a; i <= b; i++) {
            sum += f(i);
        }
        return sum;
    }
}
In main:
public static void main(String[] args) throws InterruptedException, ExecutionException {
    long start = 1;
    long end = 10000000;
    int numOfThreads = 5;
    long step = (end - start + 1) / numOfThreads;
    List<SumFun> tasks = new ArrayList<>();
    // Split [start, end] into chunks of roughly equal size, one task per chunk
    for (long i = start; i <= end; i += step) {
        long tmp = end;
        if (i + step - 1 < end) {
            tmp = i + step - 1;
        }
        tasks.add(new SumFun(i, tmp));
    }
    ExecutorService executor = Executors.newFixedThreadPool(numOfThreads);
    List<Future<Long>> results = executor.invokeAll(tasks);
    long sum = 0;
    for (Future<Long> result : results) {
        sum += result.get();
    }
    System.out.println("sum = " + sum);
    executor.shutdown();
}
I'm trying to calculate e for my assignment using threads in the least amount of time possible, given that the user will pass in a variable with the number of threads to be used for the calculation, but I can't get my multithreading working properly to achieve any kind of result. I was told that a good method for the calculation is the following mathematical expression: e = sum( (3-4k^2)/((2k+1)!) ), where k ranges from 0 to infinity. But so far I only have this basic method:
public class MainClass {
    public static long fact(int x) {
        long p = 1;
        for (int i = 1; i <= x; i++)
            p = p * i;
        return p;
    }

    public static void main(String[] args) {
        double e = 1;
        for (int i = 1; i < 50; i++)
            e = e + 1 / (double) (fact(i));
        System.out.print("e = " + e);
    }
}
You can use Java 8 parallel streams; it's easier and less error-prone than explicitly creating threads.
import org.apache.commons.math.util.MathUtils;
...
public static double computeE() {
    return IntStream.iterate(0, k -> k + 1)
            .limit(100000)
            .mapToDouble(k -> (3 - 4 * k * k) / MathUtils.factorialDouble(2 * k + 1))
            .parallel()
            .sum();
}
On my machine, it uses both cores and finds e = 2.718281828459045 for 10000 iterations, which is a value where every digit is correct.
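If you need to honor a user-supplied thread count while sticking with a parallel stream, one option (not something the answer above relies on) is to set the parallelism of the common ForkJoinPool before any parallel stream runs, via the java.util.concurrent.ForkJoinPool.common.parallelism system property. A sketch, with a plain factorial helper standing in for the Commons Math call:
import java.util.stream.IntStream;

public class ECalc {
    public static void main(String[] args) {
        int threads = Integer.parseInt(args[0]); // user-supplied thread count
        // Must be set before the common ForkJoinPool is first used.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism",
                String.valueOf(threads));

        double e = IntStream.rangeClosed(0, 20)  // later terms no longer change the double sum
                .parallel()
                .mapToDouble(k -> (3.0 - 4.0 * k * k) / factorial(2 * k + 1))
                .sum();
        System.out.println("e = " + e);
    }

    // Plain double factorial, used here instead of the Commons Math dependency
    static double factorial(int n) {
        double p = 1;
        for (int i = 2; i <= n; i++) p *= i;
        return p;
    }
}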
I was given feedback that I need to improve my skills at refactoring and eliminating code smells.
I need short exercises for detecting and improving the most common code smells,
with answers in Java.
example:
public class Calculator {

    public long sum(int min, int max) {
        long result = 0;
        for (int i = min; i <= max; i++)
            result += i;
        return result;
    }

    public long sumOfSquares(int min, int max) {
        long result = 0;
        for (int i = min; i <= max; i++)
            result += i * i;
        return result;
    }
}
and then the best/most convenient solution.
By the way, you could show me right away the best solution for the repetition above, maybe using the lambda operator "->".
Thanks!
You could try merging both methods into one. Since both of them look like
public long sum(long min, long max) {
    long result = 0;
    for (long i = min; i <= max; i++)
        result += someOperation(i);   // someOperation is the part that varies
    return result;
}
you could allow the user to provide the operation that will be applied to i, so it can be i, or i + 2, or i * i.
Such a strategy can be an implementation of the LongUnaryOperator interface, where the user implements its long applyAsLong(long operand) method.
So instead of two methods you can have one which can look like
public static long sum(long min, long max, LongUnaryOperator mapper) {
    long result = 0;
    for (long i = min; i <= max; i++)
        result += mapper.applyAsLong(i);
    return result;
}
and you could use it like
sum(1, 4, i -> i * i) // sum of squares
i -> i * i is a lambda expression which implements the functional interface LongUnaryOperator and provides the implementation of its abstract applyAsLong method that will be used in our code. In other words, it will map i to i * i.
more usage examples:
sum(1, 4, i -> 2 * i) // sum of doubled values
sum(1, 4, i -> i)     // sum of original values
// i -> i can also be expressed with `LongUnaryOperator.identity()`,
// so you can rewrite your code into something like
sum(1, 4, LongUnaryOperator.identity())
You can also rewrite your code using streams like
public static long sum(long min, long max, LongUnaryOperator mapper) {
    return LongStream.rangeClosed(min, max).map(mapper).sum();
}
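Putting the pieces together, the original Calculator could then shrink to something like this (a sketch that keeps the original method names):
import java.util.function.LongUnaryOperator;
import java.util.stream.LongStream;

public class Calculator {

    public long sum(int min, int max) {
        return sum(min, max, LongUnaryOperator.identity());
    }

    public long sumOfSquares(int min, int max) {
        return sum(min, max, i -> i * i);
    }

    // Single shared loop; the varying operation is passed in as a lambda.
    private static long sum(long min, long max, LongUnaryOperator mapper) {
        return LongStream.rangeClosed(min, max).map(mapper).sum();
    }
}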
I am busy with a parallel programming assignment, and I am really stuck. To be honest I am not entirely sure how each method works, but I think I have an idea.
I need to sum an array of consecutive values (in parallel). Seems easy enough, but I get 0 as an answer every time I try. I really don't know why.
class SumThreaded extends RecursiveTask<Integer> {
    static int SEQUENTIAL_THRESHOLD = 10000;

    double lo = 0.0;
    double hi = 0.0;
    long[] arr;

    public SumThreaded(long[] array, double a, double b) {
        arr = array;
        lo = a;
        hi = b;
    }

    public Integer compute() {
        //System.out.println(mid);
        if (hi - lo <= SEQUENTIAL_THRESHOLD) {
            int ans = 0;
            for (int i = (int) lo; i < hi; ++i)
                ans += arr[i];
            return ans;
        }
        else {
            SumThreaded left = new SumThreaded(arr, lo, (hi + lo) / 2.0);
            SumThreaded right = new SumThreaded(arr, (hi + lo) / 2.0, hi);
            left.fork();
            int rightAns = right.compute();
            int leftAns = left.join();
            return leftAns + rightAns;
        }
    }

    public static void main(String args[]) {
        int size = 1000000;
        long[] testArray = new long[size];
        for (int i = 0; i < size; i++) {
            testArray[i] = i + 1;
        }
        SumThreaded t = new SumThreaded(testArray, 0.0, testArray.length);
        ForkJoinPool fjPool = new ForkJoinPool();
        int result = fjPool.invoke(t);
        System.out.println(result);
    }
}
Any help would be greatly appreciated.
Your problem appears to be that you have two separate constructors for SumThreaded, only one of which sets the class's fields. When you feed in the long[] array from the new in sumArray, you throw the array away. You need to pick whether you're using ints or longs (and the sum of a big array is likely to need a long), and then make sure your values are actually getting set. Debugging and setting a breakpoint on compute would have shown you this.
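A minimal corrected sketch along those lines, switching the task to Long, keeping a single constructor that sets every field, and using int indices with a long accumulator (the threshold is just an assumed value):
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumThreadedFixed extends RecursiveTask<Long> {
    static final int SEQUENTIAL_THRESHOLD = 10_000; // assumed leaf size

    final long[] arr;
    final int lo, hi;   // sums arr[lo..hi)

    SumThreadedFixed(long[] array, int lo, int hi) {
        this.arr = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= SEQUENTIAL_THRESHOLD) {
            long ans = 0;                        // long so a big array does not overflow
            for (int i = lo; i < hi; i++)
                ans += arr[i];
            return ans;
        }
        int mid = lo + (hi - lo) / 2;
        SumThreadedFixed left = new SumThreadedFixed(arr, lo, mid);
        SumThreadedFixed right = new SumThreadedFixed(arr, mid, hi);
        left.fork();
        long rightAns = right.compute();
        long leftAns = left.join();
        return leftAns + rightAns;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        long[] testArray = new long[size];
        for (int i = 0; i < size; i++)
            testArray[i] = i + 1;
        long result = new ForkJoinPool().invoke(new SumThreadedFixed(testArray, 0, size));
        System.out.println(result);              // 500000500000
    }
}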
So basically I needed to optimize this piece of code today. It tries to find the longest sequence produced by some function for the first million starting numbers:
public static void main(String[] args) {
    int mostLen = 0;
    int mostInt = 0;
    long currTime = System.currentTimeMillis();
    for (int j = 2; j <= 1000000; j++) {
        long i = j;
        int len = 0;
        while ((i = next(i)) != 1) {
            len++;
        }
        if (len > mostLen) {
            mostLen = len;
            mostInt = j;
        }
    }
    System.out.println(System.currentTimeMillis() - currTime);
    System.out.println("Most len is " + mostLen + " for " + mostInt);
}

static long next(long i) {
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}
My mistake was to try to introduce multithreading:
void doSearch() throws ExecutionException, InterruptedException {
    final int numProc = Runtime.getRuntime().availableProcessors();
    System.out.println("numProc = " + numProc);
    ExecutorService executor = Executors.newFixedThreadPool(numProc);
    long currTime = System.currentTimeMillis();
    List<Future<ValueBean>> list = new ArrayList<Future<ValueBean>>();
    for (int j = 2; j <= 1000000; j++) {
        MyCallable<ValueBean> worker = new MyCallable<ValueBean>();
        worker.setBean(new ValueBean(j, 0));
        Future<ValueBean> f = executor.submit(worker);
        list.add(f);
    }
    System.out.println(System.currentTimeMillis() - currTime);

    int mostLen = 0;
    int mostInt = 0;
    for (Future<ValueBean> f : list) {
        final int len = f.get().getLen();
        if (len > mostLen) {
            mostLen = len;
            mostInt = f.get().getNum();
        }
    }
    executor.shutdown();
    System.out.println(System.currentTimeMillis() - currTime);
    System.out.println("Most len is " + mostLen + " for " + mostInt);
}

public class MyCallable<T> implements Callable<ValueBean> {
    public ValueBean bean;

    public void setBean(ValueBean bean) {
        this.bean = bean;
    }

    public ValueBean call() throws Exception {
        long i = bean.getNum();
        int len = 0;
        while ((i = next(i)) != 1) {
            len++;
        }
        return new ValueBean(bean.getNum(), len);
    }
}

public class ValueBean {
    int num;
    int len;

    public ValueBean(int num, int len) {
        this.num = num;
        this.len = len;
    }

    public int getNum() {
        return num;
    }

    public int getLen() {
        return len;
    }
}

long next(long i) {
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}
Unfortunately, the multithreaded version ran 5 times slower than the single-threaded one on 4 processors (cores).
Then I tried a somewhat cruder approach:
static int mostLen = 0;
static int mostInt = 0;

synchronized static void updateIfMore(int len, int intgr) {
    if (len > mostLen) {
        mostLen = len;
        mostInt = intgr;
    }
}

public static void main(String[] args) throws InterruptedException {
    long currTime = System.currentTimeMillis();
    final int numProc = Runtime.getRuntime().availableProcessors();
    System.out.println("numProc = " + numProc);
    ExecutorService executor = Executors.newFixedThreadPool(numProc);
    for (int i = 2; i <= 1000000; i++) {
        final int j = i;
        executor.execute(new Runnable() {
            public void run() {
                long l = j;
                int len = 0;
                while ((l = next(l)) != 1) {
                    len++;
                }
                updateIfMore(len, j);
            }
        });
    }
    executor.shutdown();
    executor.awaitTermination(30, TimeUnit.SECONDS);
    System.out.println(System.currentTimeMillis() - currTime);
    System.out.println("Most len is " + mostLen + " for " + mostInt);
}

static long next(long i) {
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}
and it ran much faster, but it was still slower than the single-threaded approach.
I hope it's not because I screwed up the way I'm doing multithreading, but rather that this particular calculation/algorithm is not a good fit for parallel computation. If I change the calculation to make it more processor-intensive by replacing the next method with:
long next(long i) {
    Random r = new Random();
    for (int j = 0; j < 10; j++) {
        r.nextLong();
    }
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}
both multithreaded versions start to execute more than twice as fast as the single-threaded version on a 4-core machine.
So clearly there must be some threshold you can use to determine whether it is worth introducing multithreading, and my question is:
What is the basic rule that would help decide if a given calculation is intensive enough to be optimized by running it in parallel (without spending the effort to actually implement it)?
The key to efficiently implementing multithreading is to make sure the cost is not too high. There are no fixed rules as they heavily depend on your hardware.
Starting and stopping threads has a high cost. Of course you already used the executor service, which reduces these costs considerably because it uses a bunch of worker threads to execute your Runnables. However, each Runnable still comes with some overhead. Reducing the number of Runnables and increasing the amount of work each one has to do will improve performance, but you still want to have enough Runnables for the executor service to distribute them efficiently over the worker threads.
You have chosen to create one Runnable for each starting value, so you end up creating 1,000,000 Runnables. You would probably get much better results if you let each Runnable handle a batch of, say, 1,000 start values, which means you only need 1,000 Runnables, greatly reducing the overhead.
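A minimal sketch of that batching idea, reusing the question's next method (the batch size and the per-batch bookkeeping are assumptions of mine):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class BatchedSearch {
    static final int BATCH = 1_000; // assumed batch size; tune it

    public static void main(String[] args) throws Exception {
        int numProc = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(numProc);
        List<Future<int[]>> futures = new ArrayList<>();

        // One task per batch of start values instead of one task per value.
        for (int from = 2; from <= 1_000_000; from += BATCH) {
            final int lo = from;
            final int hi = Math.min(from + BATCH - 1, 1_000_000);
            futures.add(executor.submit(() -> {
                int bestLen = 0, bestInt = lo;
                for (int j = lo; j <= hi; j++) {
                    long i = j;
                    int len = 0;
                    while ((i = next(i)) != 1) len++;
                    if (len > bestLen) { bestLen = len; bestInt = j; }
                }
                return new int[]{bestLen, bestInt};   // local maximum for this batch
            }));
        }

        int mostLen = 0, mostInt = 0;
        for (Future<int[]> f : futures) {
            int[] r = f.get();
            if (r[0] > mostLen) { mostLen = r[0]; mostInt = r[1]; }
        }
        executor.shutdown();
        System.out.println("Most len is " + mostLen + " for " + mostInt);
    }

    static long next(long i) {
        return i % 2 == 0 ? i / 2 : i * 3 + 1;
    }
}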
I think there is another component to this which you are not considering. Parallelization works best when the units of work have no dependence on each other. Running a calculation in parallel is sub-optimal when later calculation results depend on earlier calculation results. The dependence could be strong in the sense of "I need the first value to compute the second value". In that case, the task is completely serial and later values cannot be computed without waiting for earlier computations. There could also be a weaker dependence in the sense of "If I had the first value I could compute the second value faster". In that case, the cost of parallelization is that some work may be duplicated.
This problem lends itself to being optimized without multithreading, because some of the later values can be computed faster if you already have the previous results in hand. Take, for example, j == 4. One pass through the inner loop produces i == 2, but you just computed the result for j == 2 two iterations ago; if you saved the value of len, you can compute it as len(4) = 1 + len(2).
Using an array to store previously computed values of len and a little bit of twiddling in the next method, you can complete the task more than 50x faster.
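A minimal single-threaded sketch of that memoization (the cache size and the treatment of intermediate values that exceed it are my own assumptions):
public class MemoizedSearch {
    static final int LIMIT = 1_000_000;
    // lenCache[n] = chain length starting from n; 0 means "not computed yet"
    // (the only value whose true length is 0 is n == 2, which is cheap to recompute).
    static final int[] lenCache = new int[LIMIT + 1];

    public static void main(String[] args) {
        int mostLen = 0, mostInt = 0;
        for (int j = 2; j <= LIMIT; j++) {
            int len = lengthOf(j);
            if (len > mostLen) { mostLen = len; mostInt = j; }
        }
        System.out.println("Most len is " + mostLen + " for " + mostInt);
    }

    // Counts iterations exactly as the question's while loop does.
    static int lengthOf(long n) {
        if (n <= LIMIT && lenCache[(int) n] != 0) return lenCache[(int) n];
        long m = next(n);
        int len = (m == 1) ? 0 : 1 + lengthOf(m);   // reuse previously computed tails
        if (n <= LIMIT) lenCache[(int) n] = len;    // values above LIMIT are not cached
        return len;
    }

    static long next(long i) {
        return i % 2 == 0 ? i / 2 : i * 3 + 1;
    }
}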
"Will the performance gain be greater than the cost of context switching and thread creation?"
That is a very OS-, language-, and hardware-dependent cost; this question has some discussion about the cost in Java, and has some numbers and some pointers on how to calculate the cost.
You also want to have one thread per CPU, or fewer, for CPU-intensive work. Thanks to David Harkness for the pointer to a thread on how to work out that number.
Estimate the amount of work that a thread can do without interacting with other threads (directly or via common data). If that piece of work can be completed in 1 microsecond or less, the overhead is too high and multithreading is of no use. If it takes 1 millisecond or more, multithreading should work well. If it is in between, experimental testing is required.
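A rough way to estimate that per-unit cost before committing to a parallel design (a crude timing loop without proper benchmark warm-up; the work method is a placeholder):
public class WorkEstimate {
    // Placeholder for the unit of work a single task would perform.
    static long work(long n) {
        return n % 2 == 0 ? n / 2 : n * 3 + 1;
    }

    public static void main(String[] args) {
        final int iterations = 1_000_000;
        long sink = 0;                          // keep the JIT from eliminating the loop
        long start = System.nanoTime();
        for (int i = 1; i <= iterations; i++) {
            sink += work(i);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("sink = " + sink);
        System.out.printf("avg %.1f ns per unit of work%n", (double) elapsed / iterations);
    }
}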
Here's the situation:
The application world consists of hundreds of thousands of states.
Given a state, I can work out a set of 3 or 4 other reachable states. A simple recursion can build a tree of states that gets very large very fast.
I need to perform a DFS to a specific depth in this tree from the root state, to search for the subtree which contains the 'minimal' state (calculating the value of the node is irrelevant to the question).
Using a single thread to perform the DFS works, but is very slow. Covering 15 levels down can take a good few minutes, and I need to improve this atrocious performance. Trying to assign a thread to each subtree created too many threads and caused an OutOfMemoryError. Using a ThreadPoolExecutor wasn't much better.
My question: What's the most efficient way to traverse this large tree?
I don't believe navigating the tree is your problem, as your tree has only about 36 million nodes. Instead, it is more likely that what you are doing with each node is expensive.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class Main {
    public static final int TOP_LEVELS = 2;

    enum BuySell {}

    private static final AtomicLong called = new AtomicLong();

    public static void main(String... args) throws InterruptedException {
        int maxLevels = 15;
        long start = System.nanoTime();
        method(maxLevels);
        long time = System.nanoTime() - start;
        System.out.printf("Took %.3f second to navigate %,d levels called %,d times%n", time / 1e9, maxLevels, called.longValue());
    }

    public static void method(int maxLevels) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            int result = method(service, 0, maxLevels - 1, new int[maxLevels]).call();
        } catch (Exception e) {
            e.printStackTrace();
        }
        service.shutdown();
        service.awaitTermination(10, TimeUnit.MINUTES);
    }

    // single threaded process the highest levels of the tree.
    private static Callable<Integer> method(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Callable<Integer>> callables = new ArrayList<Callable<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            Callable<Integer> callable = level < TOP_LEVELS ?
                    method(service, level + 1, maxLevel, options) :
                    method1(service, level + 1, maxLevel, options);
            callables.add(callable);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Callable<Integer> result : callables) {
                    Integer num = result.call();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // at this level, process the branches in separate threads.
    private static Callable<Integer> method1(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Future<Integer>> futures = new ArrayList<Future<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            final int[] optionsCopy = options.clone();
            Future<Integer> future = service.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return method2(level + 1, maxLevel, optionsCopy);
                }
            });
            futures.add(future);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Future<Integer> result : futures) {
                    Integer num = result.get();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // at these levels each task processes in its own thread.
    private static int method2(int level, int maxLevel, int[] options) {
        if (level == maxLevel) {
            return process(options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            int n = method2(level + 1, maxLevel, options);
            if (min > n)
                min = n;
        }
        return min;
    }

    private static int process(final int[] options) {
        int min = options[0];
        for (int i : options)
            if (min > i)
                min = i;
        called.incrementAndGet();
        return min;
    }
}
prints
Took 1.273 second to navigate 15 levels called 35,831,808 times
I suggest you limit the number of threads and only use separate threads for the highest levels of the tree. How many cores do you have? Once you have enough threads to keep every core busy, you don't need to create more threads as this just adds overhead.
Java has a built-in Stack which is thread-safe; however, I would just use ArrayList, which is more efficient.
You will definitely have to use an iterative method. The simplest way is a stack-based DFS with pseudocode similar to this:
STACK.push(root)
while (STACK.nonempty)
    current = STACK.pop
    if (current.done) continue
    // ... do something with node ...
    current.done = true
    FOREACH (neighbor n of current)
        if (! n.done)
            STACK.push(n)
The time complexity of this is O(n+m), where n (m) denotes the number of nodes (edges) in your graph. Since you have a tree, this is O(n) and should easily handle n > 1,000,000...
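A minimal Java sketch of that iterative DFS, adapted to the depth-limited tree search from the question (the State interface, its successors() method, and value() are placeholders for whatever the application actually uses; since the states form a tree here, no done/visited marking is needed):
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class IterativeDfs {

    // Placeholder for the application's state type.
    interface State {
        List<State> successors(); // the 3 or 4 reachable states
        int value();              // whatever 'minimal' means for your states
    }

    // Depth-limited DFS using an explicit stack instead of recursion.
    static State findMinimal(State root, int maxDepth) {
        Deque<StateAtDepth> stack = new ArrayDeque<>();
        stack.push(new StateAtDepth(root, 0));
        State best = root;
        while (!stack.isEmpty()) {
            StateAtDepth cur = stack.pop();
            if (cur.state.value() < best.value())
                best = cur.state;
            if (cur.depth < maxDepth) {
                for (State next : cur.state.successors())
                    stack.push(new StateAtDepth(next, cur.depth + 1));
            }
        }
        return best;
    }

    // Pairs a state with the depth at which it was reached.
    static final class StateAtDepth {
        final State state;
        final int depth;
        StateAtDepth(State state, int depth) {
            this.state = state;
            this.depth = depth;
        }
    }
}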