I'm working on a stateful processing function where I'm keying on an ID and would like to hold state for roughly 60 seconds.
DataStream<String> keyedStream = stream.keyBy(Events::getAnonymousId)
.process(new KeyedProcessingWithCallBack(Long.parseLong(parameters.get("ttl"))))
.uid("keyed-processing");
keyedStream.sinkTo(sink(parameters))
.uid("kafka-sink");
public class KeyedProcessingWithCallBack extends KeyedProcessFunction<String, Events, String> {
ValueState<Boolean> anonymousIdHasBeenSeen;
private final long stateTtl;
public KeyedProcessingWithCallBack(long stateTtl) {
this.stateTtl = stateTtl;
}
@Override
public void open(Configuration parameters) throws Exception {
ValueStateDescriptor<Boolean> desc = new ValueStateDescriptor<>("anonymousIdHasBeenSeen", Types.BOOLEAN);
// defines the time the state has to be stored in the state backend before it is auto cleared
anonymousIdHasBeenSeen = getRuntimeContext().getState(desc);
}
@Override
public void processElement(Events value, KeyedProcessFunction<String, Events, String>.Context ctx, Collector<String> out) throws Exception {
if (anonymousIdHasBeenSeen.value() == null) {
System.out.println("Value is NULL : " +value.getAnonymousId());
// key is not available in the state
anonymousIdHasBeenSeen.update(true);
System.out.println("TIMER START TIME: " +ctx.timestamp());
ctx.timerService().registerProcessingTimeTimer(ctx.timestamp() + (stateTtl * 1000));
out.collect(value.getEventString());
}
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
throws Exception {
// triggers after ttl has passed
System.out.println("Call back triggered : time : " +timestamp + " value : " +anonymousIdHasBeenSeen.value());
if (anonymousIdHasBeenSeen.value()) {
anonymousIdHasBeenSeen.clear();
}
}
}
I have registered a timer to clear the value from state. As per my logs the timer is firing correctly, but processElement is accepting a value for the same key before the state is cleared in the callback.
I expect records in the output topic to be separated by a one-minute gap, but they are not.
Can someone point out the mistake in my implementation here? I'm spawning multiple threads to pump requests at the same time.
Related
I am trying to figure out how to determine if all async HTTP GET requests I've made have completed, so that I can execute another method. For context, I have something similar to the code below:
public void init() throws IOException {
Map<String, CustomObject> mapOfObjects = new HashMap<String, CustomObject>();
ObjectMapper mapper = new ObjectMapper();
// some code to populate the map
mapOfObjects.forEach((k,v) -> {
HttpClient.asyncGet("https://fakeurl1.com/item/" + k, createCustomCallbackOne(k, mapper));
// HttpClient is just a wrapper class for your standard OkHTTP3 calls,
// e.g. client.newcall(request).enqueue(callback);
HttpClient.asyncGet("https://fakeurl2.com/item/" + k, createCustomCallbackTwo(k, mapper));
});
}
private Callback createCustomCallbackOne(String id, ObjectMapper mapper) {
return new Callback() {
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
try (ResponseBody body = response.body()) {
CustomObject co = mapOfObjects.get(id);
if (co != null) {
co.setFieldOne(mapper.readValue(body.byteStream(), FieldOne.class));
}
} // implicitly closes the response body
}
}
@Override
public void onFailure(Call call, IOException e) {
// log error
}
};
}
// createCustomCallbackTwo does more or less the same thing,
// just sets a different field and then performs another
// async GET in order to set an additional field
So what would be the best/correct way to monitor all these asynchronous calls to ensure they have completed and I can go about performing another method on the Objects stored inside the map?
The simplest way would be to keep a count of how many requests are 'in flight'. Increment it for each request enqueued, decrement it at the end of the callback. When the count reaches 0, all requests are done. Using a semaphore or counting lock you can wait for it to become 0 without polling.
Note that the callbacks run on separate threads, so you must provide some kind of synchronization.
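A minimal sketch of such an in-flight counter, assuming the HttpClient wrapper and callback factories from the question; only the counter class below is new:

public class InFlightCounter {
    private int inFlight;

    public synchronized void increment() {
        inFlight++;
    }

    public synchronized void decrement() {
        if (--inFlight == 0) {
            notifyAll(); // wake up any thread waiting in awaitZero()
        }
    }

    public synchronized void awaitZero() throws InterruptedException {
        while (inFlight > 0) {
            wait(); // no polling; woken by decrement()
        }
    }
}

Call increment() just before each asyncGet, call decrement() as the last statement of both onResponse and onFailure, and after the forEach call awaitZero() before running the follow-up method on the map.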
If you want to create a new callback for every request, you could use something like this:
public class WaitableCallback implements Callback {
private boolean done;
private IOException exception;
private final Object[] signal = new Object[0];
@Override
public void onResponse(Call call, Response response) throws IOException {
...
synchronized (this.signal) {
done = true;
signal.notifyAll();
}
}
@Override
public void onFailure(Call call, IOException e) {
synchronized (signal) {
done = true;
exception = e;
signal.notifyAll();
}
}
public void waitUntilDone() throws InterruptedException {
synchronized (this.signal) {
while (!this.done) {
this.signal.wait();
}
}
}
public boolean isDone() {
synchronized (this.signal) {
return this.done;
}
}
public IOException getException() {
synchronized (this.signal) {
return exception;
}
}
}
Create an instance for every request and put it into e.g. a List<WaitableCallback> pendingRequests.
Then you can just wait for all requests to be done:
for ( WaitableCallback cb : pendingRequests ) {
cb.waitUntilDone();
}
// At this point, all requests have been processed.
However, you probably should not create a new identical callback object for every request. Callback's methods get the Call passed as parameter so that the code can examine it to figure out which request it is processing; and in your case, it seems you don't even need that. So use a single Callback instance for the requests that should be handled identically.
If the function asyncGet calls your function createCustomCallbackOne, then it's easy.
For each key you are calling two URLs, "https://fakeurl1.com/item/" and "https://fakeurl2.com/item/" (leaving out the + k).
So you need a map to track that, and a single callback function is enough.
Use a map with key indicating each call:
static final Map<String, Integer> trackerOfAsyncCalls = new ConcurrentHashMap<>(); // callbacks run on other threads; 0 = pending, 1 = success, -1 = failure
public void init() throws IOException {
Map<String, CustomObject> mapOfObjects = new HashMap<String, CustomObject>();
//need to keep track of the keys in some object
ObjectMapper mapper = new ObjectMapper();
trackerOfAsyncCalls.clear();
// some code to populate the map
mapOfObjects.forEach((k,v) -> {
trackerOfAsyncCalls.put(k + "-1", 0);
HttpClient.asyncGet("https://fakeurl1.com/item/" + k, createCustomCallback(k, 1, mapper));
// HttpClient is just a wrapper class for your standard OkHTTP3 calls,
// e.g. client.newCall(request).enqueue(callback);
trackerOfAsyncCalls.put(k + "-2", 0);
HttpClient.asyncGet("https://fakeurl2.com/item/" + k, createCustomCallback(k, 2, mapper));
});
}
// final parameters are important: the anonymous callback captures them
private Callback createCustomCallback(final String idOuter, final int which, final ObjectMapper mapper) {
return new Callback() {
final String myId = idOuter + "-" + which;
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
trackerOfAsyncCalls.put(myId, 1);
// or put it outside the if, if you don't care whether the call succeeded, failed or was partial
}
}
@Override
public void onFailure(Call call, IOException e) {
trackerOfAsyncCalls.put(myId, -1);
}
};
}
Now set up a thread, or better a scheduler, that is called every 5 seconds and checks all keys in mapOfObjects and trackerOfAsyncCalls to see whether every call has been started and has reached some final success, timeout or error status.
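A rough sketch of such a scheduler, assuming the trackerOfAsyncCalls map above and a hypothetical onAllCallsFinished follow-up step:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

static void watchForCompletion(Runnable onAllCallsFinished) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
        // finished when every tracked call has moved past the "pending" (0) status
        boolean allDone = !trackerOfAsyncCalls.isEmpty()
                && trackerOfAsyncCalls.values().stream().allMatch(status -> status != 0);
        if (allDone) {
            scheduler.shutdown();         // stop checking
            onAllCallsFinished.run();     // run the follow-up work on the map
        }
    }, 5, 5, TimeUnit.SECONDS);
}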
In RxJava and Reactor there is this notion of virtual time to test operators that depend on time. I can't figure out how to do this in Flink. For example, I have put together the following example where I would like to play around with late-arriving events to understand how they are handled. However, I'm not able to understand what such a test would look like. Is there a way to combine Flink and Reactor to make the tests better?
public class PlayWithFlink {
public static void main(String[] args) throws Exception {
final OutputTag<MyEvent> lateOutputTag = new OutputTag<MyEvent>("late-data"){};
// TODO understand how BoundedOutOfOrderness is related to allowedLateness
BoundedOutOfOrdernessTimestampExtractor<MyEvent> eventTimeFunction = new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
@Override
public long extractTimestamp(MyEvent element) {
return element.getEventTime();
}
};
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<MyEvent> events = env.fromCollection(MyEvent.examples())
.assignTimestampsAndWatermarks(eventTimeFunction);
AggregateFunction<MyEvent, MyAggregate, MyAggregate> aggregateFn = new AggregateFunction<MyEvent, MyAggregate, MyAggregate>() {
@Override
public MyAggregate createAccumulator() {
return new MyAggregate();
}
@Override
public MyAggregate add(MyEvent myEvent, MyAggregate myAggregate) {
if (myEvent.getTracingId().equals("trace1")) {
myAggregate.getTrace1().add(myEvent);
return myAggregate;
}
myAggregate.getTrace2().add(myEvent);
return myAggregate;
}
@Override
public MyAggregate getResult(MyAggregate myAggregate) {
return myAggregate;
}
@Override
public MyAggregate merge(MyAggregate myAggregate, MyAggregate acc1) {
acc1.getTrace1().addAll(myAggregate.getTrace1());
acc1.getTrace2().addAll(myAggregate.getTrace2());
return acc1;
}
};
KeySelector<MyEvent, String> keyFn = new KeySelector<MyEvent, String>() {
@Override
public String getKey(MyEvent myEvent) throws Exception {
return myEvent.getTracingId();
}
};
SingleOutputStreamOperator<MyAggregate> result = events
.keyBy(keyFn)
.window(EventTimeSessionWindows.withGap(Time.seconds(10)))
.allowedLateness(Time.seconds(20))
.sideOutputLateData(lateOutputTag)
.aggregate(aggregateFn);
DataStream<MyEvent> lateStream = result.getSideOutput(lateOutputTag);
result.print("SessionData");
lateStream.print("LateData");
env.execute();
}
}
class MyEvent {
private final String tracingId;
private final Integer count;
private final long eventTime;
public MyEvent(String tracingId, Integer count, long eventTime) {
this.tracingId = tracingId;
this.count = count;
this.eventTime = eventTime;
}
public String getTracingId() {
return tracingId;
}
public Integer getCount() {
return count;
}
public long getEventTime() {
return eventTime;
}
public static List<MyEvent> examples() {
long now = System.currentTimeMillis();
MyEvent e1 = new MyEvent("trace1", 1, now);
MyEvent e2 = new MyEvent("trace2", 1, now);
MyEvent e3 = new MyEvent("trace2", 1, now - 1000);
MyEvent e4 = new MyEvent("trace1", 1, now - 200);
MyEvent e5 = new MyEvent("trace1", 1, now - 50000);
return Arrays.asList(e1,e2,e3,e4, e5);
}
@Override
public String toString() {
return "MyEvent{" +
"tracingId='" + tracingId + '\'' +
", count=" + count +
", eventTime=" + eventTime +
'}';
}
}
class MyAggregate {
private final List<MyEvent> trace1 = new ArrayList<>();
private final List<MyEvent> trace2 = new ArrayList<>();
public List<MyEvent> getTrace1() {
return trace1;
}
public List<MyEvent> getTrace2() {
return trace2;
}
#Override
public String toString() {
return "MyAggregate{" +
"trace1=" + trace1 +
", trace2=" + trace2 +
'}';
}
}
The output of running this is:
SessionData:1> MyAggregate{trace1=[], trace2=[MyEvent{tracingId='trace2', count=1, eventTime=1551034666081}, MyEvent{tracingId='trace2', count=1, eventTime=1551034665081}]}
SessionData:3> MyAggregate{trace1=[MyEvent{tracingId='trace1', count=1, eventTime=1551034166081}], trace2=[]}
SessionData:3> MyAggregate{trace1=[MyEvent{tracingId='trace1', count=1, eventTime=1551034666081}, MyEvent{tracingId='trace1', count=1, eventTime=1551034665881}], trace2=[]}
However, I would expect the lateStream to trigger for the e5 event, which is 50 seconds earlier than the first event.
If you modify your watermark assigner to be like this
AssignerWithPunctuatedWatermarks<MyEvent> eventTimeFunction = new AssignerWithPunctuatedWatermarks<MyEvent>() {
long maxTs = 0;
@Override
public long extractTimestamp(MyEvent myEvent, long l) {
long ts = myEvent.getEventTime();
if (ts > maxTs) {
maxTs = ts;
}
return ts;
}
@Override
public Watermark checkAndGetNextWatermark(MyEvent event, long extractedTimestamp) {
return new Watermark(maxTs - 10000);
}
};
then you will get the results you expect. I'm not recommending this -- just using it to illustrate what's going on.
What's happening here is that a BoundedOutOfOrdernessTimestampExtractor is a periodic watermark generator that only inserts a watermark into the stream every 200 msec (by default). Because your job completes long before then, the only watermark your job experiences is the one that Flink injects at the end of every finite stream (with value MAX_WATERMARK). Lateness is relative to watermarks, and the event that you expected to be late is managing to arrive before that watermark.
By switching to punctuated watermarks you can force watermarking to occur more often, or more precisely at specific points in the stream. This is generally unnecessary (and too frequent watermarking causes overhead), but is helpful when you want to have strong control over the sequencing of watermarks.
As for how to write tests, you might take a look at the test harnesses used in Flink's own tests, or at flink-spector.
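To give a feel for the test-harness route, here is a minimal sketch (not tied to the window job above) of how Flink's operator test harness lets you drive event time by hand. It assumes the flink-streaming-java test-jar is on the test classpath; TimerFn is purely illustrative:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.operators.KeyedProcessOperator;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness;
import org.apache.flink.util.Collector;
import org.junit.Test;

public class VirtualTimeTest {

    // Toy function: registers an event-time timer 10 s after each element's timestamp.
    static class TimerFn extends KeyedProcessFunction<String, String, String> {
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 10_000);
        }
        @Override
        public void onTimer(long ts, OnTimerContext ctx, Collector<String> out) {
            out.collect("fired@" + ts);
        }
    }

    @Test
    public void timerFiresWhenWatermarkAdvances() throws Exception {
        KeyedOneInputStreamOperatorTestHarness<String, String, String> harness =
                new KeyedOneInputStreamOperatorTestHarness<>(
                        new KeyedProcessOperator<>(new TimerFn()), value -> value, Types.STRING);
        harness.open();

        harness.processElement("a", 1_000L);               // element at event time 1 s
        harness.processWatermark(new Watermark(5_000));     // watermark too early, timer not yet due
        harness.processWatermark(new Watermark(20_000));    // "virtual time" jumps forward, timer fires

        System.out.println(harness.getOutput());            // contains the "fired@11000" record
        harness.close();
    }
}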
Update:
The time interval associated with the BoundedOutOfOrdernessTimestampExtractor is a specification of how out-of-order the stream is expected to be. Events that arrive within this bound are not considered late, and event time timers won't fire until this delay has elapsed, thereby giving time for out-of-order events to arrive. allowedLateness only applies to the window API, and describes for how long past the normal window firing time the framework keeps window state so that events can still be added to a window and cause late firings. After this additional interval, window state is cleared and subsequent events are sent to the side output (if configured).
So when you use BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) you are not saying "wait 10 seconds after every event in case earlier events might still arrive". But you are saying that your events should be at most 10 seconds out of order. So if you are processing a live, real-time event stream, this means you will wait for at most 10 seconds in case earlier events arrive. (And if you are processing historic data, then you may be able to process 10 seconds of data in 1 second, or not -- knowing you will wait for n seconds of event time to pass says nothing about how long it will actually take.)
For more on this topic, see Event Time and Watermarks.
I've been following the AxonBank example in order to understand the implementation of a Saga in the Axon framework, and I have some code like this for starting and ending the saga:
@Saga
public class MoneyTransferSaga {
@Inject
private transient CommandGateway commandGateway;
private String targetAccount;
private String transferId;
@StartSaga
@SagaEventHandler(associationProperty = "transferId")
public void on(MoneyTransferRequestedEvent event) {
System.out.println("Inside start saga for money transfer event");
targetAccount = event.getTargetAccount();
transferId = event.getTransferId();
SagaLifecycle.associateWith("transactionId", transferId);
System.out.println("## These are the params going into WMC : sourceAccount: " + event.getSourceAccount()
+ " transferID: " + transferId + " event.getAmount: " + event.getAmount());
commandGateway.send(new WithdrawMoneyCommand(event.getSourceAccount(), transferId, event.getAmount()),
new CommandCallback<WithdrawMoneyCommand, Object>() {
@Override
public void onSuccess(CommandMessage<? extends WithdrawMoneyCommand> commandMessage,
Object result) {
}
@Override
public void onFailure(CommandMessage<? extends WithdrawMoneyCommand> commandMessage,
Throwable cause) {
System.out.println("On failure of withdraw money command inside saga ");
System.out.println("###################### Cause of failure = " + cause);
commandGateway.send(new CancelMoneyTransferCommand(event.getTransferId()));
}
});
}
@SagaEventHandler(associationProperty = "transactionId")
public void on(MoneyWithdrawnEvent event) {
System.out.println("Inside saga event handler for monney withdrawnevent");
commandGateway.send(new DepositMoneyCommand(targetAccount, event.getTransactionId(), event.getAmount()),
LoggingCallback.INSTANCE);
}
@SagaEventHandler(associationProperty = "transactionId")
public void on(MoneyDepositedEvent event) {
System.out.println("Inside saga event handler for money deposited event");
commandGateway.send(new CompleteMoneyTransferCommand(transferId), LoggingCallback.INSTANCE);
}
@EndSaga
@SagaEventHandler(associationProperty = "transferId")
public void on(MoneyTransferCompletedEvent event) {
System.out.println("Inside Endsaga for money transfer complete event");
}
@SagaEventHandler(associationProperty = "transferId")
public void on(MoneyTransferCancelledEvent event) {
end();
}
}
After performing the money transfer via the REST API, all this code gets executed: I can see my logs printed to the console and all transactions stored in the account table.
All entries exist in domain_event_entry as well, but the saga_entry and association_value_entry tables remain empty regardless of whether the transaction succeeds or fails.
Initially I thought this might be because of a misconfigured saga store, so I configured the saga store with MongoSagaStore, but the Saga collection still remains empty.
So am I missing something here, or does Axon just delete the data from these tables after the saga is complete?
AxonFramework will automatically remove a Saga entry from its storage, including any associations, when it has ended. So you'll only ever see information about active instances there.
In the sample application, all bus components use the "Simple..." implementation, which basically means all activities are executed in the same thread. Therefore, once you have received the OK or NotOK, all activities by the Saga will have ended as well.
If you were to replace them with Async or Distributed implementations, this is no longer the case. Your OK will be returned before the entire process has finished.
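As a rough illustration of that last point (Axon 3 style, Spring configuration assumed; the config class below is not part of the sample), swapping the default SimpleCommandBus for an asynchronous one moves command handling off the caller's thread:

import java.util.concurrent.Executors;
import org.axonframework.commandhandling.AsynchronousCommandBus;
import org.axonframework.commandhandling.CommandBus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AxonConfig {

    @Bean
    public CommandBus commandBus() {
        // command handlers now run on a thread pool instead of the dispatching thread,
        // so the saga may still be active (and visible in saga_entry) after the call returns
        return new AsynchronousCommandBus(Executors.newFixedThreadPool(4));
    }
}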
I am trying to implement guaranteed message processing, but the ack or fail methods on the Spout are not being called.
I am passing a message ID object with the spout.
I am passing the tuple with each bolt and calling collector.ack(tuple) in each bolt.
Question
Neither ack nor fail is being called, and I cannot work out why.
Here is a shortened code sample.
Spout Code using BaseRichSpout
public void nextTuple() {
for( String usage : usageData ) {
.... further code ....
String msgID = UUID.randomUUID().toString()
+ System.currentTimeMillis();
Values value = new Values(splitUsage[0], splitUsage[1],
splitUsage[2], msgID);
outputCollector.emit(value, msgID);
}
}
@Override
public void ack(Object msgId) {
this.pendingTuples.remove(msgId);
LOG.info("Ack " + msgId);
}
@Override
public void fail(Object msgId) {
// Re-emit the tuple
LOG.info("Fail " + msgId);
this.outputCollector.emit(this.pendingTuples.get(msgId), msgId);
}
Bolt Code using BaseRichBolt
@Override
public void execute(Tuple inputTuple) {
this.outputCollector.emit(inputTuple, new Values(serverData, msgId));
this.outputCollector.ack(inputTuple);
}
Final Bolt
@Override
public void execute(Tuple inputTuple) {
..... Simply reports does not emit .....
this.outputCollector.ack(inputTuple);
}
The reason ack was not being called was the for loop in the spout's nextTuple(). Changing it to emit a single tuple per call and advance an index counter below the emit (see the example and the fuller sketch after it) made it work.
Example
index++;
if (index >= dataset.size()) {
index = 0;
}
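A fuller sketch of what that nextTuple() might look like, assuming the same usageData list, pendingTuples map and message-ID scheme as in the spout above (the comma split is illustrative):

private int index = 0;

@Override
public void nextTuple() {
    if (usageData.isEmpty()) {
        return;                                  // nothing to emit yet
    }
    String usage = usageData.get(index);
    String[] splitUsage = usage.split(",");      // assumption: comma-separated fields
    String msgID = UUID.randomUUID().toString() + System.currentTimeMillis();
    Values value = new Values(splitUsage[0], splitUsage[1], splitUsage[2], msgID);
    pendingTuples.put(msgID, value);             // remembered so fail() can re-emit it
    outputCollector.emit(value, msgID);          // anchor the tuple with its message ID

    index++;                                     // advance; wrap around to keep emitting
    if (index >= usageData.size()) {
        index = 0;
    }
}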
Further to this, thanks to the mailing list for the explanation: the spout runs on a single thread and blocks in the for loop; since nextTuple() never returns, the spout is never able to call the ack method.
I'm trying to use the H2 trigger facility to let clients connected to an H2 database in automatic mixed mode (AUTO_SERVER=TRUE) receive a notification when something changes in the database table
test(id INTEGER NOT NULL AUTO_INCREMENT, message varchar(1024))
So far only the H2 server receives the TRIGGER notification, while the clients cannot receive any notification. Their only way to check for changes to the database is therefore to poll the table with queries, but that makes the TRIGGER itself useless; I could simply have all clients and the server poll the database for changes.
Is there some way to let a trigger notify all connected clients, or call a method inside each client, so that they realize the table has been modified by an insertion (I don't care about the delete or update cases)?
I post my code below which is based on this answer by Thomas Mueller (H2 database creator):
import java.sql.*;
import java.util.concurrent.atomic.AtomicLong;
import org.h2.api.Trigger;
public class TestSimpleDb
{
public static void main(String[] args) throws Exception
{
final String url = "jdbc:h2:test;create=true;AUTO_SERVER=TRUE;multi_threaded=true";
boolean isSender = false;
for (String arg : args)
{
if (arg.contains("receiver"))
{
System.out.println("receiver starting");
isSender = false;
}
else if (arg.contains("sender"))
{
System.out.println("sender starting");
isSender = true;
}
}
if (isSender)
{
Connection conn = DriverManager.getConnection(url);
Statement stat = conn.createStatement();
stat.execute("create table test(id INTEGER NOT NULL AUTO_INCREMENT, message varchar(1024))");
stat.execute("create trigger notifier "
+ "before insert, update, delete, rollback "
+ "on test FOR EACH ROW call \""
+ TestSimpleDb.Notifier.class.getName() + "\"");
Thread.sleep(500);
for (int i = 0; i < 10; i++) {
System.out.println("Sender: I change something...");
stat.execute("insert into test(message) values('my message')");
Thread.sleep(1000);
}
conn.close();
}
else
{
new Thread() {
public void run() {
try {
Connection conn = DriverManager.getConnection(url);
while (true) {
;
//this loop is just to keep the thread alive..
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}.start();
}
}
public static class Notifier implements Trigger
{
@Override
public void init(Connection cnctn, String string, String string1, String string2, boolean bln, int i) throws SQLException {
// Initializing trigger
}
@Override
public void fire(Connection conn, Object[] oldRow, Object[] newRow) throws SQLException {
if (newRow != null) {
System.out.println("Received: " + (String) newRow[1]);
}
}
@Override
public void close() {
// ignore
}
@Override
public void remove() {
// ignore
}
}
}
Like all triggers, this trigger is called on the server, that is, when using the automatic mixed mode (like you do) in the process that opened the database first. Therefore, if I first start the "sender", then I get the following output there:
sender starting
Sender: I change something...
Received: my message
Sender: I change something...
Received: my message
and if I then start the "receiver", I get the following messages there:
Receiver: event received
Receiver: event received
Receiver: event received
If you want the "receiver" to display which rows were changed, you would need a different architecture. For example, you could add a timestamp column to the table (and an index on this column), and then, on the receiver side, query for the rows where the timestamp is newer than the last one seen. This only works for added and changed rows; for removed rows, you might need an extra table that records the rows removed since time x. That table would need to be garbage-collected from time to time so that it doesn't grow forever.
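A rough sketch of that polling approach on the receiver side, in place of the empty keep-alive loop above; the last_modified column and the names used here are illustrative, not part of the original schema:

// on the sender/server side, once:
// stat.execute("ALTER TABLE test ADD COLUMN last_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP()");
// stat.execute("CREATE INDEX idx_test_last_modified ON test(last_modified)");

Connection conn = DriverManager.getConnection(url);
PreparedStatement ps = conn.prepareStatement(
        "SELECT id, message, last_modified FROM test WHERE last_modified > ? ORDER BY last_modified");
Timestamp lastSeen = new Timestamp(0);
while (true) {
    ps.setTimestamp(1, lastSeen);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println("Receiver: new row " + rs.getInt("id") + ": " + rs.getString("message"));
            lastSeen = rs.getTimestamp("last_modified"); // remember the newest timestamp seen
        }
    }
    Thread.sleep(1000); // poll once per second instead of busy-waiting
}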