I am trying to experiment with RxJava observable and observer code. My objective is to check how things work when the underlying source receives new data values. My code is as follows:
List<Integer> numbers = new ArrayList<>();
Runnable r = new Runnable() {
@Override
public void run() {
int i = 100;
while(i < 110) {
numbers.add(i);
try {
Thread.sleep(10);
} catch (InterruptedException e) {
e.printStackTrace();
}
i++;
}
}
};
numbers.add(0);
numbers.add(1);
numbers.add(2);
Observable.fromIterable(numbers)
.observeOn(Schedulers.io())
.subscribe(i -> System.out.println("Received "+i+ " on "+ Thread.currentThread().getName()),
e -> e.printStackTrace());
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
Thread t = new Thread(r);
t.start();
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
e.printStackTrace();
}
So I have a list of numbers. I then have a runnable which adds new numbers to this list with a time gap between the additions; I don't start the thread yet. I add 0, 1, 2 to the list and then create an observable from it, scheduling the observer on a thread from a pool, and finally subscribing to the observable. When the subscription happens, the observable emits the values 0, 1, 2 and the observer is invoked (the lambda passed to subscribe is executed). Then I introduce a delay of 1 second on the main thread, spawn a new thread using the runnable I created earlier, and add a final delay so that the application doesn't exit immediately.
What I expect is that as new numbers are added to the list, the observer is invoked, printing the message. But that doesn't happen. Surely I have got something wrong in my understanding. Do I need to put the observable on a scheduler as well?
The Observable.fromIterable() method is a "one time" load of values for an observable each time a subscription is built. What happens "after" the subscription is built has no effect anymore. If you use the subscribe(onNext, onError, onComplete) overload with the onComplete argument, you will see that the subscription has been fully consumed once the three initial values have been printed.
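For example, a small tweak to the question's own subscribe call makes the completion visible:

Observable.fromIterable(numbers)
        .observeOn(Schedulers.io())
        .subscribe(
                i -> System.out.println("Received " + i),
                Throwable::printStackTrace,
                () -> System.out.println("Completed")); // prints right after 0, 1, 2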
You can use a Subject (something like a PublishSubject) whose onNext() method you call to add "new values" while the subscriptions that were built earlier are still active (and not completed). That way you can build the subscriptions first and keep calling onNext() with new values on the subject until you are done and call onComplete().
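A minimal sketch of that, assuming the same RxJava imports as the question:

PublishSubject<Integer> subject = PublishSubject.create();
subject.observeOn(Schedulers.io())
        .subscribe(
                i -> System.out.println("Received " + i + " on " + Thread.currentThread().getName()),
                Throwable::printStackTrace,
                () -> System.out.println("Completed"));

subject.onNext(0);
subject.onNext(1);
subject.onNext(2);
// the runnable can now keep pushing values after the subscription was built:
// subject.onNext(i);
// ...and finish with:
// subject.onComplete();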
Related
I am looking for ways to process list entries in parallel for a task that runs quite long (say 24 hours; I stream data from huge DBs, and each row takes about 1-2 seconds to process). I have an application that has 2 methods, each processing a list of data. My initial idea was to use ForkJoin, which works, but not quite. The simplified dummy code mimicking my app's behaviour is as follows:
@Service
@Slf4j
public class ListProcessing implements Runnable {
@Async
private void processingList() {
// can change to be a 100 or 1000 to speed up the processing,
// but the point is to see the behaviour after the task runs for a long time
// so just using 12.
ForkJoinPool newPool = new ForkJoinPool(12);
newPool.execute(() -> {
List<Integer> testInt = IntStream.rangeClosed(0, 50000)
.boxed().toList();
long start = System.currentTimeMillis();
Map<Integer,DummyModel> output = testInt.stream().parallel()
.map(item -> {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
log.info("I slept at item {} for map",item);
return new DummyModel(UUID.randomUUID(), item); // a model class with 2 fields and no logic save for getters/setters
}).collect(Collectors.toConcurrentMap(DummyModel::getNum, item -> item));
long end = System.currentTimeMillis();
log.info("Processing time {}",(end-start));
log.info("Size is {}",output.size());
});
newPool.shutdown();
}
// method is identical to the one above for simplicity & demo purposes
@Async
private void processingList2() {
ForkJoinPool newPool = new ForkJoinPool(12);
newPool.execute(() -> {
List<Integer> testInt = IntStream.rangeClosed(0, 50000)
.boxed().toList();
long start = System.currentTimeMillis();
Map<Integer,DummyModel> output = testInt.stream().parallel()
.map(item -> {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
log.info("I slept at item {} for map2",item);
return new DummyModel(UUID.randomUUID(), item);
}).collect(Collectors.toConcurrentMap(DummyModel::getNum, item -> item));
long end = System.currentTimeMillis();
log.info("Processing time {}",(end-start));
log.info("Size is {}",output.size());
});
newPool.shutdown();
}
@Override
public void run() {
processingList();
processingList2();
}
}
The class is then being called by my controller which is as follows:
@PostMapping
public void startTest() {
Thread startRun = new Thread(new ListProcessing());
startRun.start();
}
This works perfectly: both methods are executed asynchronously, and I can see that they are using separate pools with 12 worker threads each. However, about an hour into running the app, the number of threads used by each method starts dropping. After some research, I learnt that parallel streams might be the problem (according to this discussion).
Now, I could change my ForkJoinPools to have more worker threads (which would shorten the execution time, solving the problem, but that sounds like a temporary fix, with the problem still there if execution exceeds the 1 hour mark). So I decided to try something else, although I would really like to make ForkJoin work.
Another solution that seems able to do what I want is using CompletableFuture with a custom Executor, as described here. So I removed the Runnable & ForkJoin and implemented CompletableFuture as described in the article, the only difference being that I have a separate pool for each method, and both methods are called by the controller, which now looks like this:
@Autowired
private ListProcessing listProcessing;
@PostMapping
public void startTest() {
listProcessing.processingList();
listProcessing.processingList2();
}
However, the custom Executors never get used, and each testInt entry gets executed synchronously, one by one. I tried to make it work with only one method, but that also didn't work; the custom executor just seems to be ignored. The method looked like this:
private CompletableFuture<List<DummyModel>> processingList() {
List<Integer> testInt = IntStream.rangeClosed(0, 50000)
.boxed().toList();
long start = System.currentTimeMillis();
List<CompletableFuture<DummyModel>> myDummyies = new ArrayList<>();
testInt.forEach(item -> {
myDummyies.add(createDummy(item));
log.info("I slept at item {} for list", item);
});
// waiting for all CompletableFutures to complete and collect them into a list
CompletableFuture<List<DummyModel>> output = CompletableFuture.allOf(myDummyies.toArray(new CompletableFuture[0]))
.thenApply(item -> myDummyies.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList()));
long end = System.currentTimeMillis();
log.info("Processing time {} \n", (end - start));
return output;
}
#Async("myPool")
private CompletableFuture<DummyModel> createDummy(Integer item) {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
return CompletableFuture.completedFuture(new DummyModel(UUID.randomUUID(), item));
}
So my questions are as follows:
Can I somehow set up ForkJoin to replace blocked worker threads with fresh ones, so that the number of worker threads stays the same all the time? Or maybe, after some time, have a blocked thread replaced by a newly created one that continues the work? Or is this simply a limitation of the ForkJoin framework and I should look elsewhere?
If ForkJoin cannot do this, how can I make the CompletableFuture approach work? Where did I go wrong with what I have implemented?
Is there any other way to process a long-running task with a custom number of worker threads running in parallel? What would be the best way to process a lot of data in parallel over a prolonged period of time?
I want to make web calls to 2 different services simultaneously. At the end, I zip the 2 Response objects into one stream. I'm using a Callable, but I'm not sure I'm going about this in the correct way. It seems as though I'm still going to be blocked by the first get() call to the Future, right? Can someone tell me if I'm on the right track? This is what I have so far:
// submit the 2 calls to the thread pool
ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
Future<Mono<Response<ProcessInstance>>> processFuture =
executorService.submit(() -> getProcessInstances(processDefinitionKey, encryptedIacToken));
Future<Mono<Response<Task>>> taskFuture =
executorService.submit(() -> getTaskResponses(processDefinitionKey, encryptedIacToken, 100, 0));
// get the result of the 2 calls
Optional<Tuple2<Response<ProcessInstance>, Response<Task>>> tuple;
try {
Mono<Response<ProcessInstance>> processInstances = processFuture.get();
Mono<Response<Task>> userTasks = taskFuture.get();
tuple = processInstances.zipWith(userTasks).blockOptional();
} catch (InterruptedException e) {
log.error("Exception while processing response", e);
// Restore interrupted state...
Thread.currentThread().interrupt();
return emptyProcessResponseList;
} catch (ExecutionException e) {
log.error("Exception while processing response", e);
return emptyProcessResponseList;
}
Given: You need to wait until both tasks are complete.
If processFuture ends first, you'll immediately fall through and wait until taskFuture ends. If taskFuture ends first you'll block until processFuture ends, but the taskFuture.get() call will return instantly since that task is done. In either case the result is the same.
You could use CompletableFutures instead, and then CompletableFuture.allOf(), but for something this simple what you have works fine. See also Waiting on a list of Future
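If you did switch, a sketch of the CompletableFuture version might look like this (reusing the question's methods and executor; untested, so treat it as an assumption):

CompletableFuture<Mono<Response<ProcessInstance>>> processCf = CompletableFuture.supplyAsync(
        () -> getProcessInstances(processDefinitionKey, encryptedIacToken), executorService);
CompletableFuture<Mono<Response<Task>>> taskCf = CompletableFuture.supplyAsync(
        () -> getTaskResponses(processDefinitionKey, encryptedIacToken, 100, 0), executorService);

// Wait for both to finish, then zip exactly as before.
CompletableFuture.allOf(processCf, taskCf).join();
tuple = processCf.join().zipWith(taskCf.join()).blockOptional();

Note that allOf() is redundant for just two futures (two join() calls behave the same); it mainly pays off when you have a whole list of them.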
Your code will block until the processFuture is finished, then it will block until the taskFuture is finished.
The callables will be processed in parallel, so here you are saving time (assuming thread pool size >= 2).
I have an Observable that at some point has to write things to a cache, and we would like to wait until those writes are done before finishing the whole operation on the observable (for reporting purposes).
For testing purposes, the cache-write Completable looks like this:
Completable.create(
emitter ->
new Thread(
() -> {
try {
Thread.sleep(2000);
doSomething();
emitter.onComplete();
} catch (InterruptedException e) {
e.printStackTrace();
}
})
.start());
Since I have several cache writes, I try to merge them in a container class:
public class CacheInsertionResultsTracker {
private Completable cacheInsertResultsCompletable;
public CacheInsertionResultsTracker() {
this.cacheInsertResultsCompletable = Completable.complete();
}
public synchronized void add(Completable cacheInsertResult) {
this.cacheInsertResultsCompletable = this.cacheInsertResultsCompletable.mergeWith(cacheInsertResult);
}
public Completable getCompletable() {
return this.cacheInsertResultsCompletable;
}
}
And I try to merge it with the Observable in the following way:
CacheInsertionResultsTracker tracker = new ...;
observable
.doOnNext(next->tracker.add(next.writeToCache(...)))
.mergeWith(Completable.defer(()->tracker.getCompletable()))
.subscribe(
// on next
this::logNextElement,
// on error
this::finishWithError,
// on complete
this::finishWithSuccess
);
How can I make sure that doSomething has completed by the time finishWithSuccess is called?
The problem is that the Completable reference is updated every time I add a new one, and it happens after the mergeWith runs...
The solution that seems to work for our use case is to use concatWith + defer:
observable
.doOnNext(next->tracker.add(next.writeToCache(...)))
.concatWith(Completable.defer(()->tracker.getCompletable()))
.subscribe(
// on next
this::logNextElement,
// on error
this::finishWithError,
// on complete
this::finishWithSuccess
);
concatWith ensures that the subscription to the Completable happens only after the Observable is done, and defer postpones obtaining the final Completable until that subscription happens (so all the Completables have already been added to the tracker).
Based on the comments, you could replace the Completable cache with a ReplaySubject<Completable>, apply a timeout to detect inactivity, and have the observable sequence end.
ReplaySubject<Completable> cache = ReplaySubject.create();
cache.onNext(completable);
observable.mergeWith(
cache.flatMapCompletable(v -> v)
.timeout(10, TimeUnit.MILLISECONDS, Completable.complete())
)
Edit:
Your updated example implies you want to run Completables in response to items in the main observable, isolated to that sequence, and wait for all of them to complete. This is a typical use case for flatMap:
observable.flatMap(
next -> next.writeToCache(...).andThen(Observable.just(next))
)
.subscribe(
this::logNextElement,
// on error
this::finishWithError,
// on complete
this::finishWithSuccess
);
I have two threads, both of which access a Vector. t1 adds a random number, while t2 removes and prints the first number. Below are the code and the output. t2 seems to execute only once (before t1 starts) and then terminates forever. Am I missing something here? (PS: tested with an ArrayList as well.)
import java.util.Random;
import java.util.Vector;
public class Main {
public static Vector<Integer> list1 = new Vector<Integer>();
public static void main(String[] args) throws InterruptedException {
System.out.println("Main started!");
Thread t1 = new Thread(new Runnable() {
@Override
public void run() {
System.out.println("writer started! ");
Random rand = new Random();
for(int i=0; i<10; i++) {
int x = rand.nextInt(100);
list1.add(x);
System.out.println("writer: " + x);
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
});
Thread t2 = new Thread(new Runnable() {
@Override
public void run() {
System.out.println("reader started! ");
while(!list1.isEmpty()) {
int x = list1.remove(0);
System.out.println("reader: "+x);
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
});
t2.start();
t1.start();
t1.join();
t2.join();
}
}
Output:
Main started!
reader started!
writer started!
writer: 40
writer: 9
writer: 23
writer: 5
writer: 41
writer: 29
writer: 72
writer: 73
writer: 95
writer: 46
This sounds like a toy example for understanding concurrency, so I didn't mention it before, but I will now (at the top, because it is important).
If this is meant to be production code, don't roll your own. There are plenty of well implemented (debugged) concurrent data structures in java.util.concurrent. Use them.
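For example, a BlockingQueue gives you a consumer that blocks instead of exiting when the list happens to be empty (a minimal sketch, not the poster's code):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        queue.put(42);                    // producer side
        System.out.println(queue.take()); // consumer side; blocks until an element is available
    }
}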
When consuming, you must not shut down your consumer based on "all items consumed". This is due to a race condition where the consumer might "race ahead" of the producer and detect an empty list only because the producer hasn't yet written the items for consumption.
There are a number of ways to accomplish a shutdown of the consumer, but none of them can be done by looking at the data to be consumed in isolation.
My recommendation is that the producer "signals" the consumer when it is done producing. The consumer then stops only when it has both the "signal" that no more data is being produced AND an empty list.
Alternative techniques include creating a "shutdown" item. The producer adds the shutdown item, and the consumer only shuts down when the "shutdown" item is seen. If you have a group of consumers, keep in mind that you shouldn't remove the shutdown item (or only one consumer would shut down).
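A minimal sketch of the shutdown-item technique, using a BlockingQueue and a sentinel value for brevity (the queue and all names here are my assumptions, not the question's Vector):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillDemo {
    private static final Integer POISON = Integer.MIN_VALUE; // sentinel, never produced normally

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Integer item = queue.take();          // blocks; no empty-list race
                    if (item.equals(POISON)) {
                        break; // with several consumers, put the pill back instead
                    }
                    System.out.println("reader: " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        queue.put(1);
        queue.put(2);
        queue.put(POISON); // the producer signals that it is done
        consumer.join();
    }
}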
Also, the consumer could "monitor" the producer, such that if the producer is "alive / existent" and the list is empty, the consumer assumes that more data will become available. Shutdown occurs when the producer is dead / non-existent AND no data is available.
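Applied to the question's code, the monitoring variant could look roughly like this (a sketch; it still spins while the list is momentarily empty):

// inside t2's run(), assuming it holds a reference to the producer thread t1
while (t1.isAlive() || !list1.isEmpty()) {
    if (!list1.isEmpty()) {
        System.out.println("reader: " + list1.remove(0));
    }
}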
Which technique you use will depend on the approach you prefer and the problem you're trying to solve.
I know that people like the elegant solutions, but if your single producer is aware of the single consumer, the first option looks like this:
public class Producer {
    private Consumer consumer; // the single consumer this producer signals

    public void shutdown() {
        addRemainingItems(); // flush anything still pending
        consumer.shutdown();
    }
}
where the Consumer looks like this:
public class Consumer {
    private volatile boolean shuttingDown = false; // volatile so the flag is visible across threads

    public void shutdown() {
        shuttingDown = true;
    }

    public void run() {
        // keep running until the producer has signalled shutdown AND the list is drained
        while (!shuttingDown || !list.isEmpty()) {
            if (!list.isEmpty()) {
                // pull item and process
            }
        }
    }
}
Note that such lack of locking around items on the list is inherently dangerous, but you stated only a single consumer, so there's no contention for reading from the list.
Now, if you have multiple consumers, you need to provide protections to ensure that a single item isn't pulled by two threads at the same time (and you need to communicate in such a manner that all threads shut down).
I think this is a typical producer–consumer problem. Try having a look at Semaphore.
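For reference, a rough sketch of what a Semaphore-based hand-off could look like (assumed code, not part of the original answer):

import java.util.Vector;
import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    static final Vector<Integer> list1 = new Vector<>();
    static final Semaphore items = new Semaphore(0); // counts elements available to the reader

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    items.acquire(); // blocks until the writer releases a permit
                    System.out.println("reader: " + list1.remove(0));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        for (int i = 0; i < 3; i++) {
            list1.add(i);
            items.release(); // one permit per added element
        }
        reader.join();
    }
}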
Update: the issue is gone after changing the while loop in the consumer (reader). Instead of exiting the thread when the list is empty, it now enters the loop but does not do anything. Below is the updated reader thread. Of course, a decent shutdown mechanism can also be added to the code, as Edwin suggested.
public void run() {
System.out.println("reader started! ");
while(true) {
if(!list1.isEmpty()) {
int x = list1.remove(0);
System.out.println("reader: "+x);
try {
Thread.sleep(100);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
Please note, this is not a code snippet taken from a real product, nor will it go into one!
I think comparing take() vs. poll() misses the point; it is more reasonable to compare take() vs. poll(time, unit), as both are provided by BlockingQueue and both block until the queue is non-empty (or, in the case of poll, until the timeout elapses). OK, let's start comparing. I usually use take() with a BlockingQueue, but I was facing issues with:
handling interrupts inside the loop,
waiting until interrupted from the outside,
how to stop looping over the queue (using a "kill bill" / poison pill, or interrupting the thread),
especially when working with Java 8 streams. Then I realized I needed a better way to stop retrieving data from the queue and close it down, so I thought about waiting for some amount of time, after which I would stop retrieving data. That is when I found poll(time, unit), which fits this idea; check the code below:
public static void main(String[] args) throws InterruptedException {
BlockingQueue<Integer> q = new LinkedBlockingQueue<Integer>();
ExecutorService executor = Executors.newCachedThreadPool();
executor.submit(() -> {
IntStream.range(0, 1000).boxed().forEach(i -> {
try {
q.put(i);
} catch (InterruptedException e) {
currentThread().interrupt();
throw new RuntimeException(e);
}
});
});
....
// Take
Future<?> fTake = executor.submit(() -> {
try {
while (!Thread.currentThread().isInterrupted()) {
System.out.println(q.take());
}
} catch (InterruptedException e) {
currentThread().interrupt();
throw new RuntimeException(e);
}
});
// to stop it I have to run the code below (expecting that execution will take about 1 sec)
executor.shutdown();
sleep(1000);
fTake.cancel(true);
....
// with poll there is no need to estimate how long processing will take
Future<?> fPoll = executor.submit(() -> {
try {
Integer i;
while ((i = q.poll(100, TimeUnit.MILLISECONDS)) != null)
System.out.println(i);
} catch (InterruptedException e) {
currentThread().interrupt();
throw new RuntimeException(e);
}
});
executor.shutdown();
}
I think the poll code is cleaner: there is no need to depend on interrupts, and no need to estimate the execution time or write code to determine when to interrupt the thread. What do you think?
Note 1: I'm sure the 2nd solution also has drawbacks, like not getting data until the timeout elapses, but I think you will know what a suitable timeout is for your case.
Note 2: if the use case requires waiting forever and the producer provides data at a low frequency, I think the take() solution is better.