Flink Trigger when State expires - java

I have an interesting use case which I want to test with Flink. I have an incoming stream of Messages, each of which is either PASS or FAIL. If a message is of type FAIL, a downstream ProcessFunction saves the Message state and then sends pause commands to everything that depends on it. When I receive a PASS message associated with the earlier FAIL (keying by message id), I send resume commands to everything I paused earlier.
Now I plan on using State TTL to expire the stored FAIL state and resume everything after a certain timeout even if I haven't received a PASS message with the same message id. Could this be done with Flink alone or would I need to have some external timer to send timeout messages to my program?
I had something like this in mind to get it working in Flink:
For each Message, add a timestamp and pass it on to a process function which waits until current_ts - timestamp >= timeout before sending it on to resume everything paused by the module. Is there a better way, or do you guys think this is ok?

Seems like it would be more straightforward to use a timer to expire the state (by calling state.clear() in the onTimer method), rather than using state TTL. The same onTimer method can also arrange for things to resume at the same time.
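A minimal sketch of that timer-based approach, assuming a KeyedProcessFunction keyed by message id; Message, Command, and the timeout constant are illustrative stand-ins for the question's own types, not Flink APIs:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Message and Command are assumed application types, not part of Flink.
public class PauseResumeFunction extends KeyedProcessFunction<String, Message, Command> {
    private static final long TIMEOUT_MS = 60_000; // illustrative timeout
    private transient ValueState<Long> pendingTimer; // fire time of the registered timer, if any

    @Override
    public void open(Configuration parameters) {
        pendingTimer = getRuntimeContext().getState(
                new ValueStateDescriptor<>("pending-timer", Long.class));
    }

    @Override
    public void processElement(Message msg, Context ctx, Collector<Command> out) throws Exception {
        if (msg.isFail()) {
            // Remember the FAIL and arm a timeout timer for this key.
            long fireAt = ctx.timerService().currentProcessingTime() + TIMEOUT_MS;
            ctx.timerService().registerProcessingTimeTimer(fireAt);
            pendingTimer.update(fireAt);
            out.collect(Command.pause(ctx.getCurrentKey()));
        } else if (pendingTimer.value() != null) {
            // Matching PASS arrived in time: cancel the timer and clear the state.
            ctx.timerService().deleteProcessingTimeTimer(pendingTimer.value());
            pendingTimer.clear();
            out.collect(Command.resume(ctx.getCurrentKey()));
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Command> out) throws Exception {
        // Timeout hit without a PASS: expire the stored FAIL state and resume anyway.
        pendingTimer.clear();
        out.collect(Command.resume(ctx.getCurrentKey()));
    }
}
```

Cancelling the timer on PASS keeps onTimer from firing a second, spurious resume later.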

How to implement blocking queue-like SQL mechanism

My program is a notification service: it receives HTTP requests (clients send notifications) and forwards them to a device.
I want it to work the following way:
1. receive the client notification request
2. save it to the database (yes, I need this step, it's mandatory)
3. async threads watch for new requests in the database
4. async threads forward them to the destination (device)
In this case the program can send the client a confirmation straight away after step 2, without waiting for the destination to respond (device response time can be too long).
If I stored client notifications in memory I would use a BlockingQueue. But I need to persist my notifications in the DB. Also, I cannot use Message Queues, because clients want REST endpoints to send notifications.
Help me to work out the architecture of such a mechanism.
PS In Java, Postgresql
Here are some ideas that can lead to the solution:
Probably step 2 is mandatory to make sure that the request is persisted so that it can be queried later. So we're talking about some "data model" here.
With this in mind, if you send the confirmation right away after step 2 - what if you later want to do some action with this data (say, send it somewhere) and that action doesn't succeed? Do you store it on disk? What happens if the disk is full?
The most important question is what happens to your data model (in the database) in this case: should the entry still be in the database, or has the whole "logical" action failed? This is something you should figure out; depending on the actual system the answers can be different.
The most "strict" solution would use transactions in the following (schematic) way:
tr = openTransaction();
try {
    saveRequestIntoDB(data);
    forwardToDestination(data);
    tr.commit();
} catch (SomeException ex) {
    tr.rollback();
}
With this design, if something goes wrong during the "saveRequest" step - well, nothing will happen. If the data is stored in db, but then forwardToDestination fails - then the transaction will be rolled back and the record won't be stored in DB.
If all the operations succeed - the transaction will be committed.
Now, it looks like you can still use a messaging system in step 4. Sending a message is fast and won't add any significant overhead to the whole request.
On the other hand, the benefits are obvious:
- Who listens to these "notifications"? If you send something and only one service should receive and process the notification, how do you make sure that the others won't get it? How would you implement the opposite - what if all the services should get the notification and process it independently?
These facilities are already implemented by any decent messaging system.
I can't really understand the statement:
I cannot use Message Queues, because clients want rest endpoints to send notifications.
Since the whole flow is originated by the client's request, I don't see any contradiction here. The code that is called from the REST endpoint (which, after all, is a logic entry point that you implement) can call the database, persist the data and then send the notification...
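The persist-then-ack flow (steps 2-4) can be sketched in plain Java. Here an in-memory queue stands in for the Postgres table, and the names (OutboxDemo, handleClientRequest, pollOnce) are made up for illustration; the point is only that the client's ack does not wait for delivery:

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

public class OutboxDemo {
    // In-memory stand-ins for the Postgres notifications table and the device.
    static final Queue<String> newRows = new ConcurrentLinkedQueue<>();
    static final List<String> delivered = new CopyOnWriteArrayList<>();

    // REST endpoint logic: persist (step 2), then acknowledge immediately.
    static String handleClientRequest(String notification) {
        newRows.add(notification); // INSERT INTO notifications ... status = 'NEW'
        return "ACK";              // the client is not kept waiting for the device
    }

    // Async worker (steps 3-4): polls the table and forwards to the device.
    static void pollOnce() {
        String row;
        while ((row = newRows.poll()) != null) {
            delivered.add(row);    // forwardToDevice(row); then mark the row SENT
        }
    }
}
```

In a real service the worker would run on a ScheduledExecutorService and poll Postgres (e.g. selecting NEW rows), which is exactly the BlockingQueue-like behavior backed by the database.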

Is it OK to share observables between multiple clients using SSE?

I have a service (ServiceA) with an endpoint to which clients can subscribe; after subscription, this service produces data continuously using server-sent events.
If this is important, I am using Project Reactor with Java.
It may be important, so I'll explain what this endpoint does. Every 15 seconds it fetches data from another service (ServiceB) and checks whether anything changed compared to the data it fetched 15 seconds earlier. If there were changes, it produces a new event with this data; if there were no changes, it does not send anything (so the payload to the client is as small as possible).
Now, this application can have multiple clients connected at once and they all ask for the same data - it is not filtered by the user etc.
Is it sensible that this observable producing the output is shared between multiple clients?
Of course it would save us a lot of unnecessary calls to ServiceB, but I wonder if there are any contraindications to this approach - it is the first time I am writing a reactive program on the backend (coming from RxJS) and I don't know if this would cause concurrency problems or any other sort of problems.
The other benefit I can see is that a new client connecting would immediately be served the last received data from the ServiceB (it usually takes about 4s per call to retrieve this data).
I also wonder if it would be possible for this observable to call ServiceB only while there are subscribers - i.e. while there is at least one subscriber, call the service; when there are no subscribers, stop calling it; when a new subscriber subscribes, call it again, but first serve the client the last fetched data (no matter how old or stale it may be).
Your SSE source can certainly be shared, using the following pattern:
source.publish().refCount();
Note that you need to store the return value of that call and return that same instance to subsequent callers in order for the sharing to occur.
Once all subscribers unsubscribe, refCount will also cancel its subscription to the original source. After that the first subscriber to come in will trigger a new subscription to the source, which you should craft so that it fetches the latest data and re-initializes a polling cycle every 15s.
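Putting the pieces together in Project Reactor might look like the sketch below; Data and fetchFromServiceB() are assumptions standing in for your types, and replay(1).refCount() is used instead of bare publish().refCount() so that late subscribers immediately get the last emitted value:

```java
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ServiceAFeed {
    private final Flux<Data> shared;

    public ServiceAFeed() {
        shared = Flux.interval(Duration.ofSeconds(15))
                .startWith(0L)                          // poll immediately on first subscription
                .concatMap(tick -> fetchFromServiceB()) // your ~4 s call to ServiceB
                .distinctUntilChanged()                 // emit only when the data actually changed
                .replay(1)                              // late subscribers get the last value right away
                .refCount();                            // poll only while >= 1 subscriber exists
    }

    // Return the SAME instance to every SSE connection so sharing occurs.
    public Flux<Data> events() {
        return shared;
    }

    private Mono<Data> fetchFromServiceB() {
        return Mono.empty(); // assumed: a WebClient call to ServiceB
    }
}
```

When the last client disconnects, refCount cancels the upstream subscription and the interval stops; the next subscriber restarts the polling cycle.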

How to check if the DELETE API has been called

I have a task to implement a distributed Queuing System something like the Amazon SQS.
If there is a GET Request, I have to deliver the message to the user from the main queue and put the message in the invisible queue. Immediately afterwards a DELETE Request should come, and I should then delete the message from the invisible queue.
In case there is no DELETE Request, I am supposed to increase the redelivery count and send the message back to the main queue. This will happen till the redelivery count becomes 5 after which I will delete the message permanently.
Now my doubt is, how do I know that there has been no DELETE request which means that I should send the message back to the main queue?
My program works for the case where the DELETE Request follows the GET Request. I am using java for this implementation.
First of all, at the design level, the get and the delete should be done in one action. Notice that in the JDK, the poll() operation of Queue retrieves and removes the head in a single step. If you insist on separate actions, at the very least you should support an optional get-and-delete request type.
Now, there is a problem when you want to detect an action that did not happen, because it can forever "maybe happen in the future". So you need to set a window of time after which you decide that the expected action did not happen.
What is usually done is that you attach a "received" timestamp to the request (and also a redelivery count) before putting it in the invisible queue (a better name would be a "pending delete requests" collection). You can wrap the request in a custom Java class that adds these properties.
Actually, I don't think a queue is a good choice of collection here: when a delete request does come, you need random access to the request, so a hash map is probably a better choice.
You will need a Timer that invokes a task every x seconds. The task scans the pendingDeleteRequests map for requests that did not receive a delete within the allowed window of time and removes them from the map.
Last note: some messaging systems have a "dead letter" feature, a destination where notices of failed deliveries are sent. This helps when debugging problems.
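The pending-delete map, redelivery count, and reaper described above can be sketched in plain Java; class and field names are made up, and timestamps are passed in explicitly to keep the logic deterministic:

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueStore {
    static final int MAX_REDELIVERIES = 5;

    // Wrapper class that adds the "received" timestamp and redelivery count.
    static final class Msg {
        final String id, body;
        final int redeliveries;
        final long receivedAt;
        Msg(String id, String body, int redeliveries, long receivedAt) {
            this.id = id; this.body = body;
            this.redeliveries = redeliveries; this.receivedAt = receivedAt;
        }
    }

    final Queue<Msg> mainQueue = new ConcurrentLinkedQueue<>();
    final Map<String, Msg> pendingDelete = new ConcurrentHashMap<>(); // random access by id
    final long visibilityTimeoutMs;

    QueueStore(long visibilityTimeoutMs) { this.visibilityTimeoutMs = visibilityTimeoutMs; }

    // GET: deliver the message and park it until a DELETE arrives.
    Msg get(long now) {
        Msg m = mainQueue.poll();
        if (m != null) pendingDelete.put(m.id, new Msg(m.id, m.body, m.redeliveries, now));
        return m;
    }

    // DELETE: acknowledge by id (this is why a map beats a queue here).
    boolean delete(String id) { return pendingDelete.remove(id) != null; }

    // Reaper task, run every x seconds: redeliver timed-out messages, drop after the limit.
    void reap(long now) {
        for (Msg m : pendingDelete.values()) {
            if (now - m.receivedAt >= visibilityTimeoutMs && pendingDelete.remove(m.id, m)) {
                if (m.redeliveries + 1 < MAX_REDELIVERIES)
                    mainQueue.add(new Msg(m.id, m.body, m.redeliveries + 1, now));
                // else: dropped permanently (a dead-letter store would go here)
            }
        }
    }
}
```

A real implementation would drive reap() from a ScheduledExecutorService and use System.currentTimeMillis() for the timestamps.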

How to check if a message is really delivered in Netty using websocket?

I'm developing a websocket application by using Netty. I'd like to know if a message is really delivered from a source to a destination. In particular, let's assume that a client and a server have an open channel and exchange some messages for a while. At a certain point, the client goes down, but the channel is still active in Netty. I tried to use isReachable() before sending the message, but this method seems to be buggy in some scenarios (e.g. a machine with Win7 is up, but isReachable() returns false). Now, my idea is to implement a mechanism using ACKs, namely the server sends the message and the client sends back an ack. To do that, I need a timeout to see if, after a certain interval, the corresponding ack does not arrive. Is there something similar in Netty?
Regarding isReachable() - it's only a best effort API. The documentation points out that it tries to send an ICMP echo request or create a TCP connection to port 7 on the destination host, both of which are highly likely to be blocked by a firewall. Is this happening in your case?
As for the acknowledgement, there's nothing in Netty that provides this as standard, but it shouldn't be too difficult to implement. Firstly, each message needs to be uniquely identifiable by some sort of identifier - possibly a sequence number, though a globally unique identifier means you can potentially recover across disconnections. Then you want to create a combined handler that implements both ChannelInboundHandler and ChannelOutboundHandler (assuming Netty 4). When a message is sent:
add the message to a map indexed by its id
create a timer associated with the message id. Add it to another map indexed by message id
forward the message
When the ACK is received cancel the timer and remove the timer and message from their respective maps. If the timer fires use the associated id to decide what to do with the timer and message (possibly retransmit and reset the timer).
Netty provides a HashedWheelTimer for efficiently managing lots of timers with a resolution suitable for this kind of activity.
You may also want to consider putting a limit on the number of retries so you can stop and raise an error rather than retransmitting indefinitely.
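The two-maps-plus-timer bookkeeping above can be sketched in plain Java, with ScheduledExecutorService standing in for Netty's HashedWheelTimer; AckTracker and all names here are illustrative, and retry logic is left to the onTimeout callback:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class AckTracker {
    // Message and timer, both indexed by message id.
    private final Map<String, String> inFlight = new ConcurrentHashMap<>();
    private final Map<String, ScheduledFuture<?>> timers = new ConcurrentHashMap<>();
    private final ScheduledExecutorService wheel = Executors.newSingleThreadScheduledExecutor();

    // Called when a message is written out: remember it and arm its timer.
    void onSend(String id, String message, long timeoutMs, Runnable onTimeout) {
        inFlight.put(id, message);
        timers.put(id, wheel.schedule(() -> {
            timers.remove(id);
            if (inFlight.remove(id) != null) onTimeout.run(); // retransmit or give up here
        }, timeoutMs, TimeUnit.MILLISECONDS));
    }

    // Called when the ACK arrives: cancel the timer, forget the message.
    void onAck(String id) {
        ScheduledFuture<?> t = timers.remove(id);
        if (t != null) t.cancel(false);
        inFlight.remove(id);
    }
}
```

In the Netty handler, onSend would be invoked from write() and onAck from channelRead() when an ACK frame is decoded.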

How to know when a server updates my HashMap in java

I have a client end (say Customer) that sends a request (with RequestID 1) to the server end and receives an ack for the sent request. My server end (say SomeStore) processes request 1, sends a response to Customer and receives an ack (or resends up to three times). I have another thread listening at Customer. Upon receiving the response, Customer's listener thread should update the HashMap at key 1. All I need is to wait for and retrieve this updated value at key 1.
I have a thread from a thread pool to send requests and receive acks on both ends. I see that both threads do the sending. I also have a thread pool for the listener. After receiving the ack, if I make my main thread wait in a while loop, I don't see the listener's update. (Here I cannot make it work with wait().) I don't understand this behavior; shouldn't both threads be working?
I tried changing my implementation and created a separate class, synchronized, with this.wait() around myHashMap.get(key) and this.notify() after myHashMap.put(key, value). It works a couple of times but not always. My understanding is that it depends on which thread gets the lock first.
How else do I wait and listen at the same time? Maybe I am overlooking something obvious...
It would be easy to receive a reply instead of an ack, but my requests get lost in the network, hence the acks. I am already using Callable<> for the ack. Any idea is appreciated...
I suspect you are not using thread-safe access to the map.
If it's not a ConcurrentHashMap and you are not using synchronization, there is no guarantee you will ever see a change made to a HashMap by another thread.
Instead of using wait/notify and your own threads, I suggest you use a ConcurrentHashMap and an ExecutorService and add tasks to perform the updates. This will ensure you process and see every update.
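One way to realize this without hand-rolled wait/notify is a ConcurrentHashMap holding one CompletableFuture per request id: the sender blocks on the future (with a timeout), and the listener thread completes it, so the update cannot be missed even if the reply arrives first. ReplyTracker and all names are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class ReplyTracker {
    private final ConcurrentHashMap<Integer, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Register interest in request `id` (idempotent: returns the existing future if any).
    CompletableFuture<String> expect(int id) {
        return pending.computeIfAbsent(id, k -> new CompletableFuture<>());
    }

    // Called by the listener thread when the server's message for `id` arrives.
    void deliver(int id, String value) {
        expect(id).complete(value); // works even if the reply beats the waiter
    }

    // Called by the sender: block (with timeout) until the listener delivers.
    String await(int id, long timeoutMs) throws Exception {
        try {
            return expect(id).get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pending.remove(id); // sketch-level cleanup; a real impl would also expire stale entries
        }
    }
}
```

Compared with wait/notify on the map, there is no lost-notification race: whichever of deliver and await runs first, computeIfAbsent makes them share the same future.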
