Consuming a collection concurrently using Reactor - Java

From time to time I have to implement the classic concurrent producer-consumer solution in the projects I'm involved in. Pretty much, the problem reduces to having some collection which gets populated from multiple threads and which is consumed by several consumers.
In a nutshell, the collection is bounded to, say, 10k entities;
once the buffer size is hit, a worker task is submitted to consume these 10k entities. There is a limit on such workers, say it's set to 10, which in the worst-case scenario means I can have up to 10 workers each consuming 10k entities.
I do have to play with some locking here and there and some checks around buffer overflow (the case when producers generate too much data while all workers are busy processing their chunks), so I have to discard new events to avoid an OOM (not the best solution, but stability is P1 ;)).
I was looking these days at Reactor as a way to avoid going low level and doing all the things described above, so the dumb question is: "can Reactor be used for this use case?"
For now, forget about overflow/discarding. How can I achieve the N consumers for a Broadcaster?
I was looking particularly at a Broadcaster with buffer plus a thread-pooled dispatcher:
void test() {
    final Broadcaster<String> sink = Broadcaster.create(Environment.initialize());
    Dispatcher dispatcher = Environment.newDispatcher(2048, 20, DispatcherType.WORK_QUEUE);
    sink
        .buffer(100)
        .consumeOn(dispatcher, this::log);
    for (int i = 0; i < 100000; i++) {
        sink.onNext("element " + i);
        if (i % 1000 == 0) {
            System.out.println("added elements " + i);
        }
    }
}
void log(List<String> values) {
    System.out.print("simulating slow processing....");
    System.out.println("processing: " + Arrays.toString(values.toArray()));
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
My intention here is to have the Broadcaster execute log(...) asynchronously when the buffer size is reached; however, it looks like it always executes log(...) in blocking mode: it processes 100 elements, and only once that is done does it take the next 100, and so on. How can I make it asynchronous?
thanks
vyvalyty

A possible pattern is to use flatMap with publishOn:
Flux.range(1, 1_000_000)
    .buffer(100)
    .flatMap(b -> Flux.just(b)
        .publishOn(SchedulerGroup.io())
        .doOnNext(this::log))
    .consume(...);
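For reference, here is a minimal sketch of the same idea against the newer Reactor 3 API (Flux plus Schedulers; the scheduler and operator names here are assumptions, not taken from the original answer). Buffers of 100 elements are handed off to a bounded-elastic scheduler so log(...) runs off the producing thread, with at most 10 chunks processed concurrently:

import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

public class ReactorBufferDemo {

    public static void main(String[] args) {
        ReactorBufferDemo demo = new ReactorBufferDemo();
        Flux.range(1, 100_000)
            .map(i -> "element " + i)
            .buffer(100)                                 // emit List<String> chunks of 100
            .flatMap(chunk -> Flux.just(chunk)
                .publishOn(Schedulers.boundedElastic())  // hop to a worker thread per chunk
                .doOnNext(demo::log), 10)                // at most 10 chunks in flight
            .blockLast();                                // block only for this demo
    }

    void log(List<String> values) {
        System.out.println(Thread.currentThread().getName()
                + " processing " + values.size() + " values");
    }
}

The flatMap concurrency argument plays the role of the "at most 10 workers" limit from the question.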

Related

Number of threads with subtasks

The optimal number of threads in a pool is case specific, though there is a rule of thumb which says #threads = #CPUs + 1.
However, how does this work with threads spawning other threads and waiting for these 'subthreads' (i.e. blocked until thread.join() succeeds)?
Assume that I have code that requires the execution of a list of tasks (2), each of which has subtasks (2), each of which has subsubtasks (3), and so on. The total number of leaf tasks is 2*2*3 = 12, though 18 threads will be created (because a thread will 'spawn' more subtasks as threads, and the thread spawning more threads will be blocked until they are all done). See below for pseudocode.
I am assuming that for a CPU with N cores there is a rule of thumb that everything can be parallelized if the highest number of active threads (12) is #CPUs + 1. Is this correct?
Pseudocode
outputOfTask = []
for subtask in subTaskList:
    outputOfTask.append(subtask.doCompute())
// wait until all output is finished

In subtask.java (each subtask implements the same interface, but can be different):
outputOfSubtask = []
for task in subsubTaskList:
    // do some magic depending on the type of subtask
    outputOfSubtask.append(task.doCompute())
return outputOfSubtask

In subsubtask.java:
outputOfSubsubtask = []
for task in subsubsubTaskList:
    // do some magic depending on the type of subsubtask
    outputOfSubsubtask.append(task.doCompute())
return outputOfSubsubtask
EDIT:
Dummy Java code. I used this in my original question to check how many threads were active, but I assume the pseudocode above is clearer. Please note: I used Eclipse Collections, which provides the asParallel function and allows for a shorter notation of the code.
@Test
public void testasParallelthreads() {
    // ExecutorService executor = Executors.newWorkStealingPool();
    ExecutorService executor = Executors.newCachedThreadPool();
    MutableList<Double> myMainTask = Lists.mutable.with(1.0, 2.0);
    MutableList<Double> mySubTask = Lists.mutable.with(1.0, 2.0);
    MutableList<Double> mySubSubTask = Lists.mutable.with(1.0, 2.0);
    MutableList<Double> mySubSubSubTask = Lists.mutable.with(1.0, 2.0, 2.0);
    MutableList<Double> a = myMainTask.asParallel(executor, 1)
        .flatCollect(task -> mySubTask.asParallel(executor, 1)
            .flatCollect(subTask -> mySubSubTask.asParallel(executor, 1)
                .flatCollect(subSubTask -> mySubSubSubTask.asParallel(executor, 1)
                    .flatCollect(subSubSubTask ->
                        dummyFunction(task, subTask, subSubTask, subSubSubTask, executor))
                    .toList())
                .toList())
            .toList())
        .toList();
    System.out.println("pool size: " + ((ThreadPoolExecutor) executor).getPoolSize());
    executor.shutdownNow();
}

private MutableList<Double> dummyFunction(double a, double b, double c, double d, ExecutorService ex) {
    System.out.println("ThreadId: " + Thread.currentThread().getId());
    System.out.println("Active threads size: " + ((ThreadPoolExecutor) ex).getActiveCount());
    return Lists.mutable.with(a, b, c, d);
}
I am assuming that for a CPU with N cores there is a rule of thumb that everything can be parallelized if the highest number of active threads (12) is #CPU + 1. Is this correct?
This topic is extremely hard to generalize about. Even with the actual code, the performance of your application is going to be very difficult to determine. Even if you could come up with an estimate, the actual performance may vary wildly between runs, especially considering that the threads interact with each other. The only time we can take the #CPU + 1 number at face value is when the jobs submitted to the thread pool are independent and completely CPU bound.
I'd recommend trying a number of different thread-pool size values under simulated load to find the optimal values for your application. Examining the overall throughput numbers or system load stats should give you the feedback you need.
However, how does this work with threads spawning other threads and waiting (i.e. blocked until thread.join() succeeds) for these 'subthreads'?
Threads will block, and it is up to the OS/JVM to schedule another one if possible. If you have a single-thread pool executor and call join from one of your tasks, the other task won't even get started. With executors that use more threads, the blocking task will block a single thread and the OS/JVM is free to schedule other threads.
These blocked threads should not consume CPU time, because they are blocked. So I am assuming that for a CPU with N cores there is a rule of thumb that everything can be parallelized if the highest number of active threads (24) is #CPU + 1. Is this correct?
Active threads can be blocking. I think you're mixing terms here: #CPU, the number of cores, and the number of virtual cores. If you have N physical cores, then you can run N CPU-bound tasks in parallel. When you have other types of blocking or very short-lived tasks, then you can have more parallel tasks.
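To make the single-thread case above concrete, here is a minimal sketch (names are illustrative, not from the question) showing that a task which submits a subtask to the same single-thread executor and then blocks on it can never make progress; the timeout only exists so the demo terminates:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BlockedParentDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();

        Future<String> parent = single.submit(() -> {
            // The subtask can never start: the only worker thread is busy right here.
            Future<String> child = single.submit(() -> "child result");
            try {
                return child.get(2, TimeUnit.SECONDS); // would block forever without the timeout
            } catch (Exception e) {
                return "child never ran: " + e;
            }
        });

        System.out.println(parent.get());
        single.shutdownNow();
    }
}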

Parse.com query count stability

For testing purposes I put the following code in the onCreate() of an Activity:
// Create 50 objects
for (int i = 0; i < 50; i++) {
    ParseObject obj = new ParseObject("test_obj");
    obj.put("foo", "bar");
    try {
        obj.save();
    } catch (ParseException pe) {
        Log.d("Parsetest", "Failed to save " + pe.toString());
    }
}

// Count them
for (int i = 0; i < 10; i++) {
    ParseQuery<ParseObject> query = ParseQuery.getQuery("test_obj");
    query.countInBackground(new CountCallback() {
        @Override
        public void done(int count, ParseException e) {
            if (e == null) {
                Log.d("Parsetest", "Background found " + count + " objects");
            } else {
                Log.d("Parsetest", "Query issue" + e.toString());
            }
        }
    });
}
I would expect the count to be always fifty, however running this code yields something like:
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 50 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
Can somebody explain this behavior and how to correct it?
Without knowing further details, I'm inclined to believe the inconsistency is due to threading and the mixing of synchronous/asynchronous calls.
For example, calling obj.save(); is synchronous (reference), however, without seeing the rest of your code, it's possible that the synchronous save is being executed on a background thread.
Additionally, query.countInBackground is asynchronous and is being called multiple times in a for loop. This simultaneously starts 10 separate background operations querying Parse for the object count, and depending on how the save is handled there could be race conditions.
Lastly, there are documented limitations on count operations with Parse.
Count queries are rate limited to a maximum of 160 requests per minute. They can also return inaccurate results for classes with more than 1,000 objects. Thus, it is preferable to architect your application to avoid this sort of count operation (by using counters, for example).
From Héctor Ramos on the Parse Developers Google group,
Count queries have always been expensive once you throw some constraints in. If you only care about the total size of the collection, you can run a count query without any constraints and that one should be pretty fast, as getting the total number of records is a different problem than counting how many of these match an arbitrary list of constraints. This is just the reality of working with database systems.
Given the cost of count operations, it is possible that Parse has mechanisms in place to prevent rapid bursts of count operations from a given client.
If you need to perform count operations often, the recommended approach is to use Cloud Code afterSave hooks to increment/decrement a counter as needed.
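As a rough, hedged illustration of the counter idea on the client side (the afterSave-hook variant would do the equivalent in Cloud Code), the "Counter" class and its "name"/"count" fields below are hypothetical:

ParseQuery<ParseObject> counterQuery = ParseQuery.getQuery("Counter");
counterQuery.whereEqualTo("name", "test_obj_count");
counterQuery.getFirstInBackground(new GetCallback<ParseObject>() {
    @Override
    public void done(ParseObject counter, ParseException e) {
        if (e == null) {
            counter.increment("count");   // atomic, server-side increment on save
            counter.saveInBackground();
        } else {
            Log.d("Parsetest", "Counter lookup failed: " + e);
        }
    }
});

Reading the current total then becomes a cheap fetch of the single counter row instead of a count query.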

Should we use Thread.sleep() when doing something with a timeout?

Consider the following two blocks:
// block one
long start = System.currentTimeMillis();
while (System.currentTimeMillis() - start < TIMEOUT) {
    if (SOME_CONDITION_IS_MET) {
        // do something
        break;
    } else {
        Thread.sleep(100);
    }
}

// block two
long start = System.currentTimeMillis();
while (System.currentTimeMillis() - start < TIMEOUT) {
    if (SOME_CONDITION_IS_MET) {
        // do something
        break;
    }
}
The difference between the two is that the first one has a Thread.sleep(), which seemingly reduces the amount of condition checking in the while and if. However, is there any meaningful benefit to having this sleep, assuming the if condition doesn't involve heavy computation? Which one would you recommend for implementing a timeout?
One key difference is that the second method involves busy waiting. If SOME_CONDITION_IS_MET doesn't involve any I/O, the second approach will likely consume an entire CPU core. This is a wasteful thing to do (but could be perfectly reasonable in some -- pretty rare -- circumstances). On the flip side, the second approach has lower latency.
I agree with Boris that, in a general setting, both approaches are basically hacks. A better way would be to use proper synchronization primitives to signal the condition.
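A minimal sketch of that suggestion, assuming whoever makes the condition true can signal it: a CountDownLatch lets the waiter block with a timeout instead of polling (the names conditionMet and TIMEOUT_MS are illustrative):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class TimeoutWaitDemo {
    private static final long TIMEOUT_MS = 5_000;
    private static final CountDownLatch conditionMet = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
        // Some other thread signals the condition when it becomes true.
        new Thread(() -> {
            try {
                Thread.sleep(1_000);      // simulate the work that makes the condition true
            } catch (InterruptedException ignored) { }
            conditionMet.countDown();     // signal instead of being polled
        }).start();

        if (conditionMet.await(TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
            System.out.println("condition met, doing something");
        } else {
            System.out.println("timed out");
        }
    }
}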

Is Java fork/join (divide and conquer) possible when inserting line-by-line into MongoDB from an InputStream?

I list code below which performs batch inserts into MongoDB. This code takes a long time to run: about an hour to insert 20 million Mongo documents.
The time-consuming portion of the code, the scanner.hasNextLine()/nextLine()/insert loop, runs as slowly as 20 seconds per iteration at times, and I note that the slowness becomes pronounced toward the middle of the job. (Answers on this forum indicate that Mongo inserts, batched or regular, can be expensive owing to the conversion of JSON into the binary BSON format.)
I want to speed up this process and would like to run this job in parallel on several cores. Can I do this using fork/join? I ask because I could not see how to apply a divide-and-conquer strategy to this code, with its while loop over an input stream.
Another possibility is to use a ThreadPoolExecutor. Would use of an executor be best? Would an executor distribute the job over several cores?
The code:
Scanner lineScan = new Scanner(inputStream, encoding);
while (lineScan.hasNextLine()) {
    // add to list of DBObjects to be inserted as a batch
    // do batch insert here if object-count threshold is reached
}
Similar code using a ThreadPoolExecutor (see Java Iterator Concurrency and Java: Concurrent reads on an InputStream):
ExecutorService executor = Executors.newCachedThreadPool();
Iterator<Long> i = getUserIDs();
while (i.hasNext()) {
    final Long l = i.next();
    Runnable task = new Runnable() {
        public void run() {
            someObject.doSomething(l);
            anotherObject.doSomething(l);
        }
    };
    executor.submit(task);
}
executor.shutdown();
Any perspectives on which technique might best expedite this loop and insert would be greatly appreciated.
Many thanks in advance!
You should consider the bulk write operations in the 2.12 driver: http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-java-driver/#bulk-operations. Disabling indexes during the inserts will help as well.
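As a hedged sketch of how the two suggestions could be combined, keep the stream read single-threaded, batch the lines, and hand each batch to a fixed pool that performs a bulk insert (the JSON parsing via com.mongodb.util.JSON, the pool size, and the method names are assumptions for illustration, not the asker's code):

import com.mongodb.BulkWriteOperation;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.util.JSON;

import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBatchInsert {
    private static final int BATCH_SIZE = 1000;

    static void insertAll(InputStream inputStream, String encoding, DBCollection coll)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // tune to cores and I/O
        Scanner lineScan = new Scanner(inputStream, encoding);
        List<DBObject> batch = new ArrayList<>(BATCH_SIZE);

        while (lineScan.hasNextLine()) {
            batch.add((DBObject) JSON.parse(lineScan.nextLine()));
            if (batch.size() == BATCH_SIZE) {
                submitBatch(pool, coll, batch);
                batch = new ArrayList<>(BATCH_SIZE); // start a fresh batch for the reader
            }
        }
        if (!batch.isEmpty()) {
            submitBatch(pool, coll, batch);          // flush the tail
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void submitBatch(ExecutorService pool, DBCollection coll, List<DBObject> batch) {
        pool.submit(() -> {
            BulkWriteOperation bulk = coll.initializeUnorderedBulkOperation();
            batch.forEach(bulk::insert);
            bulk.execute();
        });
    }
}

The JSON-to-DBObject parsing could also be moved into the worker tasks if it turns out to dominate the reader thread.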

Why is my threaded sort algorithm slow compared to the non-threaded version?

I have just implemented a threaded version of merge sort. ThreadedMerge.java: http://pastebin.com/5ZEvU6BV
Since merge sort is a divide-and-conquer algorithm, I create a thread for every half of the array. But the number of available threads in the Java VM is limited, so I check that before creating threads:
if (num <= nrOfProcessors) {
    num += 2;
    // create more threads
} else {
    // continue without threading
}
However, the threaded sorting takes about ~6000 ms while the non-threaded version is much faster at just ~2500 ms.
Non-Threaded: http://pastebin.com/7FdhZ4Fw
Why is the threaded version slower and how do I solve that problem?
Update: I now use an AtomicInteger for thread counting and declared a static field for Runtime.getRuntime().availableProcessors(). The sorting takes about ~1400 ms now.
However, creating just one thread in the mergeSort method and letting the current thread do the rest gives no significant performance increase. Why?
Besides, after I call join on a thread and then decrement the number of used threads with
num.set(num.intValue() - 1);
the sorting takes about ~200 ms longer. Here is the update of my algorithm: http://pastebin.com/NTZq5zQp Why does this line of code make it even worse?
First off, your accesses to num are not thread-safe (see http://download.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicInteger.html).
You also create as many threads as there are cores, but you block half of them with the join call. Instead, sort one half in the new thread and the other half in the calling thread:
num += 1;
ThreadedMerge tm1 = new ThreadedMerge(array, startIndex, startIndex + halfLength);
tm1.start();
sortedRightPart = mergeSort(array, startIndex + halfLength, endIndex);
try {
    tm1.join();
    num -= 1;
    sortedLeftPart = tm1.list;
} catch (InterruptedException e) {
}
This doesn't block the calling thread but uses it to sort the right part, while the created thread does the other half; when that thread returns, the slot it occupied can be used by another thread.
Hmm, you should not create a thread for every single step (threads are expensive, and there are lightweight alternatives).
Ideally, you should only create 4 threads if there are 4 CPUs.
So let's say you have 4 CPUs: you create one thread at the first level (now you have 2), and at the second level you also create a new thread, which gives you 4.
The reason why you only create one new thread and not two is that you can use the thread you are currently running on:
Thread t = new Thread(...);
t.start();
// Do half of the job here
t.join(); // Wait for the other half to complete.
If you have, let's say, 5 CPUs (not a power of two), then just create 8 threads.
One simple way to do this in practice is to fall back to the un-threaded version you already wrote once you reach the appropriate depth. That way you avoid cluttering the merge method with if-statements etc.
The call to Runtime.getRuntime().availableProcessors() appears to be taking up a fair amount of extra time. You only need to call it once, so just move it outside of the method and define it as a static field, e.g.:
static int nrOfProcessors = Runtime.getRuntime().availableProcessors();
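As a rough sketch of the "lightweight alternative" idea mentioned above, the fork/join framework handles the thread-per-level bookkeeping for you: one half is forked as a subtask, the other half is computed in the current thread, and small ranges fall back to a sequential sort (the threshold and merge details here are illustrative, not taken from the pastebin code):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkJoinMergeSort extends RecursiveAction {
    private static final int SEQUENTIAL_THRESHOLD = 8_192;
    private final int[] array;
    private final int from, to; // sorts array[from, to)

    ForkJoinMergeSort(int[] array, int from, int to) {
        this.array = array;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= SEQUENTIAL_THRESHOLD) {
            Arrays.sort(array, from, to);                    // sequential fallback at the leaves
            return;
        }
        int mid = (from + to) >>> 1;
        ForkJoinMergeSort left = new ForkJoinMergeSort(array, from, mid);
        left.fork();                                         // one half in a subtask...
        new ForkJoinMergeSort(array, mid, to).compute();     // ...the other in this thread
        left.join();
        merge(mid);
    }

    private void merge(int mid) {
        int[] tmp = new int[to - from];
        int i = from, j = mid, k = 0;
        while (i < mid && j < to) tmp[k++] = array[i] <= array[j] ? array[i++] : array[j++];
        while (i < mid) tmp[k++] = array[i++];
        while (j < to)  tmp[k++] = array[j++];
        System.arraycopy(tmp, 0, array, from, tmp.length);
    }

    public static void main(String[] args) {
        int[] data = new java.util.Random(42).ints(2_000_000).toArray();
        new ForkJoinPool().invoke(new ForkJoinMergeSort(data, 0, data.length));
        System.out.println("first few: " + Arrays.toString(Arrays.copyOf(data, 5)));
    }
}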
