RxJava 2 Polling with multiple subscribers - java

I would like to start using RxJava 2 for polling instead of Observers and Listeners. There is just one problem: I have one polling Observable which should only be started if at least one subscriber is connected. If multiple subscribers are connected, the interval should be shared. That means: one Observable repeats the polling process every n seconds; as long as it has 1..* subscribers, it should keep polling every n seconds and notify all subscribers with the result.
This is how I have done it with Listeners and with my RxJava solution.
My first try was a singleton class that creates a single PublishSubject. Whoever subscribes gets the data in onNext(). My polling Observer is started somewhere else and pushes the data to the subject. This does not work as expected, since it:
is a bad design pattern
does not start only when a subscriber connects, nor stop when no subscribers are left
does not share the data successfully, and requires two classes (one for the subject and one for the repeating observable)
public class SingleTonClass {
    private PublishSubject<List<Data>> subject = PublishSubject.create();

    public PublishSubject<List<Data>> getSubject() {
        return this.subject;
    }

    public void setData(List<Data> data) {
        subject.onNext(data);
    }
}
I would love to avoid Listeners/interfaces for sharing the information around and let RxJava 2 do its job.
After some research I found refCount() and share(), but I'm not sure whether this is the proper way to solve this. In my case it is for a REST service that should poll the server only while at least one subscriber is connected; otherwise it should stop polling, since fetching the data makes no sense then.
I tried to solve it once before, but it did not work as expected:
Polling using RXJava2 / RXAndroid 2 and Retrofit

I would do it like this:
Observable<Data> dataSource = Observable.interval(INTERVAL, TIME_UNIT)
    .observeOn(Schedulers.io()) // make REST requests on IO threads
    .map(n -> requestData())
    .replay(1)
    .refCount();
The replay(1).refCount() combination gives you the share() behaviour (publish() plus refCount()) on top of caching. This makes your observable hot, i.e. all subscribers share a single subscription. It automatically connects (starts a new interval sequence) with the first subscriber, and disconnects (stops the interval) when the last subscriber is gone.
replay(1) also caches the last emitted value, i.e. new subscribers do not have to wait for new data to arrive.
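A minimal, self-contained sketch of this behaviour (requestData() here is a hypothetical stand-in for the real REST call, and the 100 ms interval is just for the demo; the observeOn(Schedulers.io()) hop is omitted for brevity):

```java
import io.reactivex.Observable;
import io.reactivex.disposables.Disposable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedPollingDemo {
    public static final AtomicInteger calls = new AtomicInteger();

    // Hypothetical stand-in for the real REST request.
    static String requestData() {
        return "payload-" + calls.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        Observable<String> dataSource = Observable
                .interval(100, TimeUnit.MILLISECONDS)
                .map(n -> requestData())
                .replay(1)
                .refCount(); // connect with first subscriber, disconnect with last

        // Both subscribers share one interval and one requestData() call per tick.
        Disposable a = dataSource.subscribe(d -> System.out.println("A got " + d));
        Disposable b = dataSource.subscribe(d -> System.out.println("B got " + d));
        Thread.sleep(350);

        a.dispose();
        b.dispose(); // last subscriber gone: the interval is disposed, polling stops

        int afterStop = calls.get();
        Thread.sleep(300);
        System.out.println("polls while unsubscribed: " + (calls.get() - afterStop));
    }
}
```

New subscribers arriving later would immediately get the cached last value from replay(1) before the next tick.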

Related

Since the Publisher and Subscriber reside in the same app and are limited by the same resources, how does backpressure help with load on either side?

I've been trying to build an app using the Java Flow API. The idea of applying backpressure between a publisher and a subscriber whose speeds differ sounds useful, but I'm not sure how it really helps, since the publisher and consumer usually reside within the same application; at least, that's what almost all examples online look like.
For example, in my application, there is a publisher producing messages retrieved from a RabbitMQ and a subscriber processing those messages. So, the message is submitted to the publisher in the RabbitMQ listener like so:
@RabbitListener(queues = "rabbit_queue")
public void rabbitHandler(MessageObject message) {
    // do some stuff to the message and then submit it to the publisher
    publisher.submit(message);
}
// Then the message will be processed in the subscriber
If a publisher is producing faster than the subscriber can process, the subscriber can request a small n via subscription.request(n). But there are two things I am not sure I understand correctly about how request(n) helps:
Since both the publisher and subscriber are in the same Spring application in this case, they pretty much share, and are limited by, the same resources. If the subscriber is going to run out of memory because too many elements were sent to it, we are supposed to reduce the n in request(n). But this means the publisher's buffer will fill up quickly. I can increase the publisher's buffer size, but I am limited by the same resources the subscriber was facing, because both are in the same application using the same set of resources. Then what is the point of all the extra complexity of having a publisher and the request() method?
It seems to me that the publisher usually receives its elements from some sources, and not all of these sources can be throttled. In my case, I have a RabbitMQ listener sending messages to the publisher, but the rate at which the publisher sends those messages to the subscription largely depends on the rate at which rabbitHandler receives messages from the RabbitMQ queue. If RabbitMQ is sending messages faster than the publisher's subscriber can process, the buffering of the messages is still done between the publisher and subscriber within the application, and the problem from the previous point occurs.
I'm pretty sure there is something wrong in my understanding of the process, because it feels like a catch-22 situation to me. It's like I can only hold so many balls in my two hands, and I'm just passing the balls between them and calling it backpressure. Since both the publisher and subscriber are limited by the same resources, what is the benefit of the extra complexity, when I could simply pass the message on to another handler (equally limited by the same resources), like this:
public class RabbitMqListener {
    @RabbitListener(queues = "rabbit_queue")
    public void rabbitHandler(MessageObject message) {
        // do some stuff to the message and then pass it straight to a handler
        MessageProcessor.process(message);
    }
}

public class MessageProcessor {
    public static void process(MessageObject message) {
        System.out.println("processing message...");
    }
}
It will be great if somebody can help me to correct my understanding.
"If RabbitMQ is sending messages faster than the publisher's subscriber can process"
Then you should try to extend the backpressure feedback to the very source of the messages, the RabbitMQ publisher. For this goal, you can create an additional point-to-point connection. If you cannot slow down the RabbitMQ publisher, then you have two choices: drop the messages you are unable to store, or buy more performant hardware.
RabbitMQ has a support library for Project Reactor with backpressure built-in: https://projectreactor.io/docs/rabbitmq/snapshot/reference/#_getting_started . I'm not aware of any Java Flow binding so you'll have to bridge the flow back and forth.
I don't think you can backpressure the RabbitMQ @RabbitListener other than by call-stack blocking. Assuming publisher.submit is SubmissionPublisher::submit, the documentation states:
blocking uninterruptibly while resources for any subscriber are unavailable
hence if the downstream Flow.Subscriber hasn't requested, the method will block the listener thread.
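The request(n) mechanics are easy to observe in a self-contained sketch (pure java.util.concurrent, no RabbitMQ; the tiny buffer and the deliberately slow request(1) subscriber are the assumptions here). With an async executor and a full buffer, the submit() call is exactly where the listener thread would block:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowBackpressureDemo {
    public static final List<Integer> received = Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) throws Exception {
        // Small buffer: once it is full and the subscriber has not requested
        // more, submit() blocks the submitting (i.e. listener) thread.
        SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>(Runnable::run, 2);
        CountDownLatch done = new CountDownLatch(1);

        publisher.subscribe(new Flow.Subscriber<Integer>() {
            private Flow.Subscription subscription;
            @Override public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1);                 // deliberately slow consumer
            }
            @Override public void onNext(Integer item) {
                received.add(item);
                subscription.request(1);      // ask for the next element only now
            }
            @Override public void onError(Throwable t) { done.countDown(); }
            @Override public void onComplete() { done.countDown(); }
        });

        for (int i = 0; i < 5; i++) {
            publisher.submit(i);              // the call that can block the caller
        }
        publisher.close();
        done.await();
        System.out.println("received " + received);
    }
}
```

Here the direct executor (Runnable::run) delivers each element on the submitting thread, so the demo runs to completion; the point is that submit() honours the buffer and the subscriber's request() pace rather than dropping data.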

Delaying Kafka Streams consuming

I'm trying to use Kafka Streams (i.e. not a simple Kafka Consumer) to read from a retry topic with events that have previously failed to process. I wish to consume from the retry topic, and if processing still fails (for example, if an external system is down), I wish to put the event back on the retry topic. Thus I don't want to keep consuming immediately, but instead wait a while before consuming, in order to not flood the systems with messages that are temporarily unprocessable.
Simplified, the code currently does this, and I wish to add a delay to it.
fun createTopology(topic: String): Topology {
    val streamsBuilder = StreamsBuilder()
    streamsBuilder.stream<String, ArchivalData>(topic, Consumed.with(Serdes.String(), ArchivalDataSerde()))
        .peek { key, msg -> logger.info("Received event for key $key : $msg") }
        .map { key, msg -> enrich(msg) }
        .foreach { key, enrichedMsg -> archive(enrichedMsg) }
    return streamsBuilder.build()
}
I have tried to use Window Delay to set this up, but have not managed to get it to work. I could of course do a sleep inside a peek, but that would leave a thread hanging and does not sound like a very clean solution.
The exact details of how the delay would work is not terribly important to my use case. For example, all of these would work fine:
All events on the topic from the past x seconds are consumed at once; after it begins/finishes consuming, the stream waits x seconds before consuming again
Every event is processed x seconds after being put on the topic
The stream consumes messages with a delay of x seconds between every event
I would be very grateful if someone could provide a few lines of Kotlin or Java code that would accomplish any of the above.
You cannot really pause reading from the input topic using Kafka Streams—the only way to "delay" would be to call a "sleep", but as you mentioned, that blocks the whole thread and is not a good solution.
However, what you can do is to use a stateful processor, e.g., process() (with attached state store) instead of foreach(). If the retry fails, you don't put the record back into the input topic, but you put it into the store and also register a punctuation with desired retry delay. If the punctuation fires, you retry and if the retry succeeds, you delete the entry from the store and cancel the punctuation; otherwise, you wait until the punctuation fires again.
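A sketch of that idea using the classic Processor API. ArchivalData is the question's own type, "retry-store" is an assumed store name, and tryArchive() is a placeholder for the real call to the external system; this simplified version also uses one recurring punctuation that sweeps all parked records, rather than one punctuation per record:

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class RetryProcessor implements Processor<String, ArchivalData> {
    private KeyValueStore<String, ArchivalData> retryStore;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        retryStore = (KeyValueStore<String, ArchivalData>) context.getStateStore("retry-store");
        // Re-check parked records every 30 seconds of wall-clock time.
        context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, this::retryPending);
    }

    @Override
    public void process(String key, ArchivalData msg) {
        if (!tryArchive(msg)) {
            retryStore.put(key, msg);             // park the record instead of re-producing it
        }
    }

    private void retryPending(long timestamp) {
        try (KeyValueIterator<String, ArchivalData> it = retryStore.all()) {
            while (it.hasNext()) {
                KeyValue<String, ArchivalData> entry = it.next();
                if (tryArchive(entry.value)) {
                    retryStore.delete(entry.key); // success: stop retrying this record
                }
            }
        }
    }

    private boolean tryArchive(ArchivalData msg) {
        // placeholder: call the external system here and return whether it succeeded
        return false;
    }

    @Override
    public void close() {}
}
```

To wire it in, register a key-value store named "retry-store" on the StreamsBuilder and replace the foreach with .process(() -> RetryProcessor(), "retry-store").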

How to implement chain lock in RxJava

I want to implement lock in my application to let only one chain fragment execute at the time and any other to wait each other.
For example:
val demoDao = DemoDao() // data that must be accessed by only one rx-chain fragment at a time

Observable.range(0, 150)
    .subscribeOn(Schedulers.io())
    .flatMapCompletable {
        dataLockManager.lock("action") { // fragment-start
            demoDao.get()
                .flatMapCompletable { data ->
                    demoDao.set(...)
                }
        } // fragment-end
    }
    .subscribe()

Observable.range(0, 100)
    .subscribeOn(Schedulers.io())
    .flatMapCompletable {
        dataLockManager.lock("action") { // fragment-start
            demoDao.get()
                .flatMapCompletable { data ->
                    demoDao.set(...)
                }
        } // fragment-end
    }
    .subscribe()
I tried to implement it via a custom Completable.create with a CountDownLatch, but that may lead to deadlock.
And I'm stuck at this point. What can you recommend?
To serialize access to demoDao.get(), there are a few ways of achieving this but try hard not to use a lock to do it as that can stuff up a reactive stream with deadlocks for starters (as you have found out).
If you do want to use a lock you should ensure that no lock is held across a stream signal like an emission to downstream or request to upstream. In that situation you can use a lock (shortlived).
One approach is to combine the actions of the two streams into one (with say merge) and do the demoDao stuff on that one stream.
Another approach is to create a serialized PublishSubject via PublishSubject.create().serialized(), do the demoDao.get() work downstream of it, and subscribe to it once only. The two sources you mentioned can then push into it with subject.onNext(x) (e.g. from a doOnNext). It depends on whether each source must know about failure independently, or whether it is acceptable that the single PublishSubject subscription is the only spot where failure is notified.
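A self-contained sketch of the first approach (merging both sources into one stream); DemoDao is replaced here by a plain in-memory list, and concatMapCompletable is what serializes the fragments, since each inner Completable finishes before the next one starts:

```java
import io.reactivex.Completable;
import io.reactivex.Observable;
import java.util.ArrayList;
import java.util.List;

public class SerializedDaoDemo {
    // Stand-in for demoDao: state that must not see interleaved read/write fragments.
    public static final List<Integer> store = new ArrayList<>();

    static Completable readModifyWrite(int value) {
        // Stand-in for demoDao.get().flatMapCompletable { demoDao.set(...) }
        return Completable.fromRunnable(() -> store.add(value));
    }

    public static void main(String[] args) {
        Observable<Integer> sourceA = Observable.range(0, 150);
        Observable<Integer> sourceB = Observable.range(0, 100);

        // merge() funnels both sources into one stream; concatMapCompletable
        // runs one DAO fragment at a time, so no lock is needed.
        Observable.merge(sourceA, sourceB)
                .concatMapCompletable(SerializedDaoDemo::readModifyWrite)
                .blockingAwait();

        System.out.println("operations applied: " + store.size()); // prints 250
    }
}
```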
In the asynchronous world, the use of locks is strongly discouraged. Instead, locking is modelled by serialized execution, via an actor or a serial executor. In turn, an actor can be modelled by an Observer, and a serial executor by Schedulers.single(), though more experienced RxJava programmers may have better advice.

Pattern to continuously listen to AWS SQS messages

I have a simple class named QueueService with some methods that wrap the methods from the AWS SQS SDK for Java. For example:
public ArrayList<Hashtable<String, String>> receiveMessages(String queueURL) {
    List<Message> messages = this.sqsClient.receiveMessage(queueURL).getMessages();
    ArrayList<Hashtable<String, String>> resultList = new ArrayList<>();
    for (Message message : messages) {
        Hashtable<String, String> resultItem = new Hashtable<>();
        resultItem.put("MessageId", message.getMessageId());
        resultItem.put("ReceiptHandle", message.getReceiptHandle());
        resultItem.put("Body", message.getBody());
        resultList.add(resultItem);
    }
    return resultList;
}
I have another class named App that has a main and creates an instance of QueueService.
I am looking for a "pattern" to make the main in App listen for new messages in the queue. Right now I have a while(true) loop where I call the receiveMessages method:
while (true) {
    messages = queueService.receiveMessages(queueURL);
    for (Hashtable<String, String> message : messages) {
        String receiptHandle = message.get("ReceiptHandle");
        String messageBody = message.get("Body"); // key must match what receiveMessages() stores
        System.out.println(messageBody);
        queueService.deleteMessage(queueURL, receiptHandle);
    }
}
Is this the correct way? Should I use the async message receive method in SQS SDK?
To my knowledge, there is no way in Amazon SQS to support an active listener model where Amazon SQS would "push" messages to your listener, or would invoke your message listener when there are messages.
So, you would always have to poll for messages. There are two polling mechanisms - short polling and long polling. Each has its own pros and cons, but long polling is the one you would typically end up using in most cases, although the default is short polling. Long polling is definitely more efficient in terms of network traffic, is more cost efficient (because Amazon charges you by the number of requests made), and is also the preferred mechanism when you want your messages to be processed in a time-sensitive manner (i.e. as soon as possible).
There are more intricacies around long polling and short polling that are worth knowing, and it's somewhat difficult to paraphrase all of that here, but if you like, you can read a lot more detail in the following blog post. It has a few code examples as well that should be helpful.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/
In terms of a while(true) loop, I would say it depends.
If you are using long polling, you can set the wait time to (at most) 20 seconds, so that you do not poll SQS more often than every 20 seconds when there are no messages. If there are messages, you can decide whether to poll frequently (to process messages as soon as they arrive) or to always process them at fixed intervals (say every n seconds).
Another point to note is that you can read up to 10 messages in a single receiveMessages request, which also reduces the number of calls you make to SQS, thereby reducing costs. And as the blog above explains in detail, you may request 10 messages, but the call may return fewer, even if there are that many messages in the queue.
In general though, I would say you need to build appropriate hooks and exception handling to turn off the polling if you wish to at runtime, in case you are using a while(true) kind of a structure.
Another aspect to consider is whether you want to poll SQS on your main application thread or spawn another thread. So another option could be to create a ScheduledThreadPoolExecutor with a single thread in main to poll SQS periodically (every few seconds), in which case you do not need a while(true) structure at all.
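A self-contained sketch of that scheduled-poller idea; the in-memory BlockingQueue stands in for SQS and queueService.receiveMessages(queueURL), and the 2-second period is arbitrary:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ScheduledPollerDemo {
    public static final List<String> processed = Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) throws Exception {
        // Stand-in for SQS; the real code would call queueService.receiveMessages(queueURL).
        BlockingQueue<String> fakeQueue = new LinkedBlockingQueue<>(List.of("m1", "m2", "m3"));

        ScheduledExecutorService poller = new ScheduledThreadPoolExecutor(1);
        poller.scheduleWithFixedDelay(() -> {
            List<String> batch = new ArrayList<>();
            fakeQueue.drainTo(batch, 10);    // SQS also caps one receive call at 10 messages
            for (String message : batch) {
                processed.add(message);      // "process" (and, with SQS, delete) each message
            }
        }, 0, 2, TimeUnit.SECONDS);          // poll every 2 seconds; no while(true) loop

        Thread.sleep(500);                   // let the first polling round run
        poller.shutdown();
        poller.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("processed " + processed);
    }
}
```

Shutting down the executor gives you the runtime on/off hook mentioned above for free.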
There are a few things that you're missing:
Use the receiveMessages(ReceiveMessageRequest) overload and set a wait time to enable long polling.
Wrap your AWS calls in try/catch blocks. In particular, pay attention to OverLimitException, which can be thrown from receiveMessages() if you would have too many in-flight messages.
Wrap the entire body of the while loop in its own try/catch block, logging any exceptions that are caught (there shouldn't be -- this is here to ensure that your application doesn't crash because AWS changed their API or you neglected to handle an expected exception).
See doc for more information about long polling and possible exceptions.
As for using the async client: do you have any particular reason to use it? If not, then don't: a single receiver thread is much easier to manage.
If you want to use SQS and then Lambda to process the requests, you can follow the steps given in the link, or you can always use Lambda instead of SQS and invoke it for every request.
As of 2019 SQS can trigger lambdas:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
I found one solution for actively listening to the queue.
For Node, I have used the following package and resolved my issue:
sqs-consumer
Link
https://www.npmjs.com/package/sqs-consumer

RxJava: restart from the beginning on each subscription

Imagine I have some time consuming task which I want to run only occasionally.
I want to wrap it into an observable and pass it to some component.
That component will subscribe to this observable whenever it wants to retrieve the data, and unsubscribe after it receives it.
I.e. I want an observable which upon subscription would invoke some expensive API call, and this API call can return a different data each time it is called - and then this observable would shut down until next subscription is made.
Is this possible to achieve?
I have seen 'replay()' and 'cache()' operators, but they won't work because from what I understood, they will cache once and then replay cached values which fails my case of changing data.
Also there is 'observable.publish()' but it seems that this will make a hot observable which will stay connected to the source observable all the time...
As I understand your question, you need a cold observable.
Observable<Integer> obs = Observable.just(1, 2, 3, 4);
obs.subscribe(); // will iterate over the values
obs.subscribe(); // will iterate over the values AGAIN
So, if your observable wraps your API call and is cold, just subscribe to it twice to perform two API calls.
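A sketch of that with fromCallable(), which is cold and therefore re-runs the (hypothetical) expensive call for every subscriber, so each subscription can observe different data:

```java
import io.reactivex.Observable;
import java.util.concurrent.atomic.AtomicInteger;

public class DeferDemo {
    public static final AtomicInteger calls = new AtomicInteger();

    // Hypothetical expensive API call that returns different data each time.
    static int expensiveApiCall() {
        return calls.incrementAndGet();
    }

    public static void main(String[] args) {
        // fromCallable() is cold: the callable runs once per subscription,
        // not once when the observable is created.
        Observable<Integer> data = Observable.fromCallable(DeferDemo::expensiveApiCall);

        data.subscribe(v -> System.out.println("first subscription: " + v));  // prints 1
        data.subscribe(v -> System.out.println("second subscription: " + v)); // prints 2
    }
}
```

Observable.defer(() -> ...) does the same for cases where building the whole source, not just one value, must happen per subscription.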
