I have a rest call api where max count of result return by the api is 1000.start page=1
{
"status": "OK",
"payload": {
"EMPList":[],
count:5665
}
So to get other result I have to change the start page=2 and again hit the service.again will get 1000 results only.
but after first call i want to make it as a parallel call and I want to collect the result and combine it and send it back to calling service in java. Please suggest i am new to java.i tried using callable but it's not working
It seems to me that ideally you should be able to configure your max count to one appropriate for your use case. I'm assuming you aren't able to do that. Here is a simple, lock-less, multi threading scheme that acts as a simple reduction operation for your two network calls:
// online runnable: https://ideone.com/47KsoS
int resultSize = 5;
int[] result = new int[resultSize*2];
Thread pg1 = new Thread(){
public void run(){
System.out.println("Thread 1 Running...");
// write numbers 1-5 to indexes 0-4
for(int i = 0 ; i < resultSize; i ++) {
result[i] = i + 1;
}
System.out.println("Thread 1 Exiting...");
}
};
Thread pg2 = new Thread(){
public void run(){
System.out.println("Thread 2 Running");
// write numbers 5-10 to indexes 5-9
for(int i = 0 ; i < resultSize; i ++) {
result[i + resultSize] = i + 1 + resultSize;
}
System.out.println("Thread 2 Exiting...");
}
};
pg1.start();
pg2.start();
// ensure that pg1 execution finishes
pg1.join();
// ensure that pg2 execution finishes
pg2.join();
// print result of reduction operation
System.out.println(Arrays.toString(result));
There is a very important caveat with this implementation however. You will notice that both of the threads DO NOT overlap in their memory writes. This is very important as if you were to simply change our int[] result to ArrayList<Integer> this could lead to catastrophic failure in our reduction operation between the two threads called a Race Condition (I believe the standard ArrayList implementation in Java is not thread safe). Since we can guarantee how large our result will be I would highly suggest sticking to my usage of an array for this multi-threaded implementation as ArrayLists hide a lot of implementation logic from you that you likely won't understand until you take a basic data-structures course.
I have an input array of basic type int, I would like to process this array using multiple threads and store the results in an output array of same type and size. Is the following code correct in terms of memory visibility?
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ArraySynchronization2
{
final int width = 100;
final int height = 100;
final int[][] img = new int[width][height];
volatile int[][] avg = new int[width][height];
public static void main(String[] args) throws InterruptedException, ExecutionException
{
new ArraySynchronization2().doJob();;
}
private void doJob() throws InterruptedException, ExecutionException
{
final int threadNo = 8;
ExecutorService pool = Executors.newFixedThreadPool(threadNo);
final CountDownLatch countDownLatch = new CountDownLatch(width - 2);
for (int x = 1; x < width - 1; x++)
{
final int col = x;
pool.execute(new Runnable()
{
public void run()
{
for (int y = 0; y < height; y++)
{
avg[col][y] = (img[col - 1][y] + img[col][y] + img[col + 1][y]) / 3;
}
// how can I make the writes to the data in avg[][] visible to other threads? is this ok?
avg = avg;
countDownLatch.countDown();
};
});
}
try
{
// Does this make any memory visibility guarantees?
countDownLatch.await();
}
catch (InterruptedException e)
{
e.printStackTrace();
}
// can I read avg here, will the results be correct?
for (int x = 0; x < width; x++)
{
for (int y = 0; y < height; y++)
{
System.out.println(avg[x][y]);
}
}
pool.shutdown();
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
// now I know tasks are completed and results synchronized (after thread death), but what if I plan to reuse the pool?
}
}
I do not want to synchronize on CountDownLatch. I would like to know how to make the writes to the output array visible to other threads. Let imagine that I have an array (eg. image) that I would like to process, I could do this in multiple separate tasks that process chunks of the input array into the output array, there are no inter-dependencies between the writes to the output. After all computations complete, I would like to have all the results in the output array ready to read. How could I achieve such behaviour? I know that it is achievable by using submit and Future.get() instead of execute, I'd like to know how to properly implement such low-level mechanism? Please also refer to the questions raised in comments near the code.
Hm, just wondering if you actually need a latch. The array itself is a reserved block in memory, with every cell being a dedicated memory address. (btw. marking it volatile does only mark the reference to the array as volatile, not the cells of the array, see here). So you need to coordinate access to the cells only if multiple threads write-access the same cell.
Question is, are you actually doing this? Or the aim should be: avoid coordinating access if possible, because it comes at a cost.
In your algorithm, you operate on rows, so why not parallelize on rows, so that each thread only reads & calculates values of a row-segement of the entire array and ignore the other rows?
i.e.
thread-0 -> rows 0, 8, 15, ...
thread-1 -> rows 1, 9, 16, ...
...
basically this (haven't tested):
for (int n = 0; n < threadNo; n++) { //each n relates to a thread
pool.execute(new Runnable() {
public void run() {
for (int row = n; row < height; row += threadNo) { //proceed to the next row for the thread
for (int col = 1; col < width-1; col++) {
avg[col][row] = (img[col - 1][row] + img[col][row] + img[col + 1][row]) / 3;
}
}
};
});
}
So they can operate on the entire array without having to synchronize at all. An by putting the loop to print out the result after shutting down the pool will ensure all calculate-threads have finished, and the only thread that has to wait is the main thread.
An alternative to this approach is to create an avg-array of size 100/ThreadNo for each thread so that each thread write-operate on it's on array and you merge the arrays afterwards with System.arraycopy() into one array.
If you intend to reuse the pool, you should use submit instead of execute and call get() on the Futures you get from submit.
Set<Future> futures = new HashSet<>();
for(int n = 0; ...) {
futures.add(pool.submit(new Runnable() {...}));
}
for(Future f : futures) {
f.get(); //blocks until the task is completed
}
In case you want to read intermediate states of the array you can either read it directly, if inconsistent data on single cells is acceptable, or use AtomicIntegerArray, as Nicolas Filotto suggested.
-- EDIT --
After the edit for using the width for the latch instead of the original thread number and all the discussion I'd like to add a few words.
As #jameslarge pointed out, it's about how to establish a "happens-before" relationship, or how to guarantee, that operation A (i.e. a write) happens before operation B (i.e. a read). Therefore access between two threads needs to be coordinated. There are several options
volatile keyword - doesn't work on arrays as it marks only the reference and not the values as being volatile
synchronization - pessimistic locking (synchronized modifier or statement)
CAS - optimistic locking, used by quite a few concurrent implementations
However every syncpoint (pessimistic or optimistic) establishes a happens-before relationship. Which one you choose, depends on your requirement.
What you like to achieve is a coordination between the read operation of the main thread and the write operations of the worker threads. How do you implement, is up to you and your requirements. The CountDownLatch counting down the total number of jobs is one way (btw., the latch uses a state property which is a volatile int). A CyclicBarrier may also be a construct worth to consider, especially if you'd like to read consistent intermediate states. Or a future.get(), or...
All boils down to the worker thread having to signal they're done writingso the reader thread can start reading.
However be aware of using sleep instead of synchronization. Sleep does not establish a happens before relationship, and using sleep for synchronization is a typical concurrency bug pattern. I.e. in the worst case, sleep is executed before any of the work has been done.
What you need to use is rather an AtomicIntegerArray instead of a simple volatile int array. Indeed, it is meant to be used in your case to update array element atomically and visible by all threads.
I am trying out the executor service in Java, and wrote the following code to run Fibonacci (yes, the massively recursive version, just to stress out the executor service).
Surprisingly, it will run faster if I set the nThreads to 1. It might be related to the fact that the size of each "task" submitted to the executor service is really small. But still it must be the same number also if I set nThreads to 1.
To see if the access to the shared Atomic variables can cause this issue, I commented out the three lines with the comment "see text", and looked at the system monitor to see how long the execution takes. But the results are the same.
Any idea why this is happening?
BTW, I wanted to compare it with the similar implementation with Fork/Join. It turns out to be way slower than the F/J implementation.
public class MainSimpler {
static int N=35;
static AtomicInteger result = new AtomicInteger(0), pendingTasks = new AtomicInteger(1);
static ExecutorService executor;
public static void main(String[] args) {
int nThreads=2;
System.out.println("Number of threads = "+nThreads);
executor = Executors.newFixedThreadPool(nThreads);
Executable.inQueue = new AtomicInteger(nThreads);
long before = System.currentTimeMillis();
System.out.println("Fibonacci "+N+" is ... ");
executor.submit(new FibSimpler(N));
waitToFinish();
System.out.println(result.get());
long after = System.currentTimeMillis();
System.out.println("Duration: " + (after - before) + " milliseconds\n");
}
private static void waitToFinish() {
while (0 < pendingTasks.get()){
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
executor.shutdown();
}
}
class FibSimpler implements Runnable {
int N;
FibSimpler (int n) { N=n; }
#Override
public void run() {
compute();
MainSimpler.pendingTasks.decrementAndGet(); // see text
}
void compute() {
int n = N;
if (n <= 1) {
MainSimpler.result.addAndGet(n); // see text
return;
}
MainSimpler.executor.submit(new FibSimpler(n-1));
MainSimpler.pendingTasks.incrementAndGet(); // see text
N = n-2;
compute(); // similar to the F/J counterpart
}
}
Runtime (approximately):
1 thread : 11 seconds
2 threads: 19 seconds
4 threads: 19 seconds
Update:
I notice that even if I use one thread inside the executor service, the whole program will use all four cores of my machine (each core around 80% usage on average). This could explain why using more threads inside the executor service slows down the whole process, but now, why does this program use 4 cores if only one thread is active inside the executor service??
It might be related to the fact that the size of each "task" submitted
to the executor service is really small.
This is certainly the case and as a result you are mainly measuring the overhead of context switching. When n == 1, there is no context switching and thus the performance is better.
But still it must be the same number also if I set nThreads to 1.
I'm guessing you meant 'to higher than 1' here.
You are running into the problem of heavy lock contention. When you have multiple threads, the lock on the result is contended all the time. Threads have to wait for each other before they can update the result and that slows them down. When there is only a single thread, the JVM probably detects that and performs lock elision, meaning it doesn't actually perform any locking at all.
You may get better performance if you don't divide the problem into N tasks, but rather divide it into N/nThreads tasks, which can be handled simultaneously by the threads (assuming you choose nThreads to be at most the number of physical cores/threads available). Each thread then does its own work, calculating its own total and only adding that to a grand total when the thread is done. Even then, for fib(35) I expect the costs of thread management to outweigh the benefits. Perhaps try fib(1000).
I'm still in the process of wrapping my brain around how concurrency works in Java. I understand that (if you're subscribing to the OO Java 5 concurrency model) you implement a Task or Callable with a run() or call() method (respectively), and it behooves you to parallelize as much of that implemented method as possible.
But I'm still not understanding something inherent about concurrent programming in Java:
How is a Task's run() method assigned the right amount of concurrent work to be performed?
As a concrete example, what if I have an I/O-bound readMobyDick() method that reads the entire contents of Herman Melville's Moby Dick into memory from a file on the local system. And let's just say I want this readMobyDick() method to be concurrent and handled by 3 threads, where:
Thread #1 reads the first 1/3rd of the book into memory
Thread #2 reads the second 1/3rd of the book into memory
Thread #3 reads the last 1/3rd of the book into memory
Do I need to chunk Moby Dick up into three files and pass them each to their own task, or do I I just call readMobyDick() from inside the implemented run() method and (somehow) the Executor knows how to break the work up amongst the threads.
I am a very visual learner, so any code examples of the right way to approach this are greatly appreciated! Thanks!
You have probably chosen by accident the absolute worst example of parallel activities!
Reading in parallel from a single mechanical disk is actually slower than reading with a single thread, because you are in fact bouncing the mechanical head to different sections of the disk as each thread gets its turn to run. This is best left as a single threaded activity.
Let's take another example, which is similar to yours but can actually offer some benefit: assume I want to search for the occurrences of a certain word in a huge list of words (this list could even have come from a disk file, but like I said, read by a single thread). Assume I can use 3 threads like in your example, each searching on 1/3rd of the huge word list and keeping a local counter of how many times the searched word appeared.
In this case you'd want to partition the list in 3 parts, pass each part to a different object whose type implements Runnable and have the search implemented in the run method.
The runtime itself has no idea how to do the partitioning or anything like that, you have to specify it yourself. There are many other partitioning strategies, each with its own strengths and weaknesses, but we can stick to the static partitioning for now.
Let's see some code:
class SearchTask implements Runnable {
private int localCounter = 0;
private int start; // start index of search
private int end;
private List<String> words;
private String token;
public SearchTask(int start, int end, List<String> words, String token) {
this.start = start;
this.end = end;
this.words = words;
this.token = token;
}
public void run() {
for(int i = start; i < end; i++) {
if(words.get(i).equals(token)) localCounter++;
}
}
public int getCounter() { return localCounter; }
}
// meanwhile in main :)
List<String> words = new ArrayList<String>();
// populate words
// let's assume you have 30000 words
// create tasks
SearchTask task1 = new SearchTask(0, 10000, words, "John");
SearchTask task2 = new SearchTask(10000, 20000, words, "John");
SearchTask task3 = new SearchTask(20000, 30000, words, "John");
// create threads for each task
Thread t1 = new Thread(task1);
Thread t2 = new Thread(task2);
Thread t3 = new Thread(task3);
// start threads
t1.start();
t2.start();
t3.start();
// wait for threads to finish
t1.join();
t2.join();
t3.join();
// collect results
int counter = 0;
counter += task1.getCounter();
counter += task2.getCounter();
counter += task3.getCounter();
This should work nicely. Note that in practical cases you would build a more generic partitioning scheme. You could alternatively use an ExecutorService and implement Callable instead of Runnable if you wish to return a result.
So an alternative example using more advanced constructs:
class SearchTask implements Callable<Integer> {
private int localCounter = 0;
private int start; // start index of search
private int end;
private List<String> words;
private String token;
public SearchTask(int start, int end, List<String> words, String token) {
this.start = start;
this.end = end;
this.words = words;
this.token = token;
}
public Integer call() {
for(int i = start; i < end; i++) {
if(words.get(i).equals(token)) localCounter++;
}
return localCounter;
}
}
// meanwhile in main :)
List<String> words = new ArrayList<String>();
// populate words
// let's assume you have 30000 words
// create tasks
List<Callable> tasks = new ArrayList<Callable>();
tasks.add(new SearchTask(0, 10000, words, "John"));
tasks.add(new SearchTask(10000, 20000, words, "John"));
tasks.add(new SearchTask(20000, 30000, words, "John"));
// create thread pool and start tasks
ExecutorService exec = Executors.newFixedThreadPool(3);
List<Future> results = exec.invokeAll(tasks);
// wait for tasks to finish and collect results
int counter = 0;
for(Future f: results) {
counter += f.get();
}
You picked a bad example, as Tudor was so kind to point out. Spinning disk hardware is subject to physical constraints of moving platters and heads, and the most efficient read implementation is to read each block in order, which reduces the need to move the head or wait for the disk to align.
That said, some operating systems don't always store things continuously on disks, and for those who remember, defragmentation could provide a disk performance boost if you OS / filesystem didn't do the job for you.
As you mentioned wanting a program that would benefit, let me suggest a simple one, matrix addition.
Assuming you made one thread per core, you can trivially divide any two matrices to be added into N (one for each thread) rows. Matrix addition (if you recall) works as such:
A + B = C
or
[ a11, a12, a13 ] [ b11, b12, b13] = [ (a11+b11), (a12+b12), (a13+c13) ]
[ a21, a22, a23 ] + [ b21, b22, b23] = [ (a21+b21), (a22+b22), (a23+c23) ]
[ a31, a32, a33 ] [ b31, b32, b33] = [ (a31+b31), (a32+b32), (a33+c33) ]
So to distribute this across N threads, we simply need to take the row count and modulus divide by the number of threads to get the "thread id" it will be added with.
matrix with 20 rows across 3 threads
row % 3 == 0 (for rows 0, 3, 6, 9, 12, 15, and 18)
row % 3 == 1 (for rows 1, 4, 7, 10, 13, 16, and 19)
row % 3 == 2 (for rows 2, 5, 8, 11, 14, and 17)
// row 20 doesn't exist, because we number rows from 0
Now each thread "knows" which rows it should handle, and the results "per row" can be computed trivially because the results do not cross into other thread's domain of computation.
All that is needed now is a "result" data structure which tracks when the values have been computed, and when last value is set, then the computation is complete. In this "fake" example of a matrix addition result with two threads, computing the answer with two threads takes approximately half the time.
// the following assumes that threads don't get rescheduled to different cores for
// illustrative purposes only. Real Threads are scheduled across cores due to
// availability and attempts to prevent unnecessary core migration of a running thread.
[ done, done, done ] // filled in at about the same time as row 2 (runs on core 3)
[ done, done, done ] // filled in at about the same time as row 1 (runs on core 1)
[ done, done, .... ] // filled in at about the same time as row 4 (runs on core 3)
[ done, ...., .... ] // filled in at about the same time as row 3 (runs on core 1)
More complex problems can be solved by multithreading, and different problems are solved with different techniques. I purposefully picked one of the simplest examples.
you implement a Task or Callable with a run() or call() method
(respectively), and it behooves you to parallelize as much of that
implemented method as possible.
A Task represents a discrete unit of work
Loading a file into memory is a discrete unit of work and can therefore this activity can be delegated to a background thread. I.e. a background thread runs this task of loading the file.
It is a discrete unit of work since it has no other dependencies needed in order to do its job (load the file) and has discrete boundaries.
What you are asking is to further divide this into task. I.e. a thread loads 1/3 of the file while another thread the 2/3 etc.
If you were able to divide the task into further subtasks then it would not be a task in the first place by definition. So loading a file is a single task by itself.
To give you an example:
Let's say that you have a GUI and you need to present to the user data from 5 different files. To present them you need also to prepare some data structures to process the actual data.
All these are separate tasks.
E.g. the loading of files is 5 different tasks so could be done by 5 different threads.
The preparation of the data structures could be done a different thread.
The GUI runs of course in another thread.
All these can happen concurrently
If you system supported high-throughput I/O , here is how you can do it:
How to read a file using multiple threads in Java when a high throughput(3GB/s) file system is available
Here is the solution to read a single file with multiple threads.
Divide the file into N chunks, read each chunk in a thread, then merge them in order. Beware of lines that cross chunk boundaries. It is the basic idea as suggested by user
slaks
Bench-marking below implementation of multiple-threads for a single 20 GB file:
1 Thread : 50 seconds : 400 MB/s
2 Threads: 30 seconds : 666 MB/s
4 Threads: 20 seconds : 1GB/s
8 Threads: 60 seconds : 333 MB/s
Equivalent Java7 readAllLines() : 400 seconds : 50 MB/s
Note: This may only work on systems that are designed to support high-throughput I/O , and not on usual personal computers
Here is the essential nits of the code, for complete details , follow the link
public class FileRead implements Runnable
{
private FileChannel _channel;
private long _startLocation;
private int _size;
int _sequence_number;
public FileRead(long loc, int size, FileChannel chnl, int sequence)
{
_startLocation = loc;
_size = size;
_channel = chnl;
_sequence_number = sequence;
}
#Override
public void run()
{
System.out.println("Reading the channel: " + _startLocation + ":" + _size);
//allocate memory
ByteBuffer buff = ByteBuffer.allocate(_size);
//Read file chunk to RAM
_channel.read(buff, _startLocation);
//chunk to String
String string_chunk = new String(buff.array(), Charset.forName("UTF-8"));
System.out.println("Done Reading the channel: " + _startLocation + ":" + _size);
}
//args[0] is path to read file
//args[1] is the size of thread pool; Need to try different values to fing sweet spot
public static void main(String[] args) throws Exception
{
FileInputStream fileInputStream = new FileInputStream(args[0]);
FileChannel channel = fileInputStream.getChannel();
long remaining_size = channel.size(); //get the total number of bytes in the file
long chunk_size = remaining_size / Integer.parseInt(args[1]); //file_size/threads
//thread pool
ExecutorService executor = Executors.newFixedThreadPool(Integer.parseInt(args[1]));
long start_loc = 0;//file pointer
int i = 0; //loop counter
while (remaining_size >= chunk_size)
{
//launches a new thread
executor.execute(new FileRead(start_loc, toIntExact(chunk_size), channel, i));
remaining_size = remaining_size - chunk_size;
start_loc = start_loc + chunk_size;
i++;
}
//load the last remaining piece
executor.execute(new FileRead(start_loc, toIntExact(remaining_size), channel, i));
//Tear Down
}
}
I have to multithread a method that runs a code in the batches of 1000 . I need to give these batches to different threads .
Currently i have spawn 3 threads but all 3 are picking the 1st batch of 1000 .
I want that the other batches should not pick the same batch instead pick the other batch .
Please help and give suggestions.
I would use an ExecutorService
int numberOfTasks = ....
int batchSize = 1000;
ExecutorService es = Executors.newFixedThreadPool(3);
for (int i = 0; i < numberOfTasks; i += batchSize) {
final int start = i;
final int last = Math.min(i + batchSize, numberOfTasks);
es.submit(new Runnable() {
#Override
public void run() {
for (int j = start; j < last; j++)
System.out.println(j); // do something with j
}
});
}
es.shutdown();
Put the batches in a BlockingQueue and make your worker threads to take the batches from the queue.
Use a lock or a mutex when retrieving the batch. That way, the threads can't access the critical section at the same time and can't accidentally access the same batch.
I'm assuming you're removing a batch once it was picked by a thread.
EDIT: aioobe and jonas' answers are better, use that. This is an alternative. :)
You need to synchronize the access to the list of jobs in the batch. ("Synchronize" essentially means "make sure the threads aware of potential race-conditions". In most scenarios this means "let some method be executed by a single thread at a time".)
This is easiest solved using the java.util.concurrent package. Have a look at the various implementations of BlockingQueue for instance ArrayBlockingQueue or LinkedBlockingQueue.