How to change timestamp of records? - java

I'm using Fluentd (v0.12, the last stable version) to send messages to Kafka. But Fluentd uses an old KafkaProducer, so the record timestamp is always set to -1.
Thus I have to use the WallclockTimestampExtractor to set the timestamp of the record to the point in time when the message arrives in Kafka.
Is there a Kafka Streams-specific solution?
The timestamp I'm really interested in is sent by Fluentd within the message:
"timestamp":"1507885936","host":"V.X.Y.Z."
Record representation in Kafka:
offset = 0, timestamp = -1, key = null, value = {"timestamp":"1507885936","host":"V.X.Y.Z."}
I would like to have a record like this in Kafka:
offset = 0, timestamp = 1507885936, key = null, value = {"timestamp":"1507885936","host":"V.X.Y.Z."}
My workaround would look like this:
write a consumer to extract the timestamp (https://kafka.apache.org/0110/javadoc/org/apache/kafka/streams/processor/TimestampExtractor.html)
write a producer to produce a new record with the timestamp set (ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value))
I would prefer a KafkaStreams solution, if there is one.

You can write a very simple Kafka Streams application like:
KStreamBuilder builder = new KStreamBuilder();
builder.stream("input-topic").to("output-topic");
and configure the application with a custom TimestampExtractor that extracts the timestamp from the record and returns it.
Kafka Streams will use the returned timestamps when writing the records back to Kafka.
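For example, a minimal extractor that parses the timestamp field out of the JSON payload might look like the following sketch (the naive string parsing and the class name are illustrative, not a canonical implementation):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Sketch: pull the epoch-seconds "timestamp" field out of the JSON value.
public class PayloadTimestampExtractor implements TimestampExtractor {
    private static final String FIELD = "\"timestamp\":\"";

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
        String value = String.valueOf(record.value());
        int start = value.indexOf(FIELD);
        if (start < 0) {
            return previousTimestamp; // fall back if the field is missing
        }
        start += FIELD.length();
        int end = value.indexOf('"', start);
        return Long.parseLong(value.substring(start, end));
    }
}

The extractor is registered via the timestamp-extractor config (e.g. default.timestamp.extractor in Kafka 1.0+), so the topology itself needs no change.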
Note: if you have out-of-order data -- i.e., timestamps that are not strictly ordered -- the result will contain out-of-order timestamps, too. Kafka Streams uses the returned timestamps when writing back to Kafka (i.e., whatever the extractor returns is used as the record metadata timestamp). Note that on write, the timestamp of the currently processed input record is used for all generated output records -- this holds for version 1.0 but might change in future releases.
Update:
In general, you can modify timestamps via the Processor API. When calling context.forward(), you can set the output record timestamp by passing To.all().withTimestamp(...) as a parameter to forward().
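A sketch of that approach (To was added in Kafka 2.0; the JSON parsing helper is illustrative):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.To;

// Sketch: re-stamp every record with a timestamp parsed from its value.
public class ReStampProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String key, String value) {
        long ts = extractTimestamp(value); // e.g. the "timestamp" JSON field
        context.forward(key, value, To.all().withTimestamp(ts));
    }

    @Override
    public void close() {}

    // Hypothetical helper: parse epoch seconds from the JSON payload.
    private long extractTimestamp(String value) {
        int start = value.indexOf("\"timestamp\":\"") + "\"timestamp\":\"".length();
        return Long.parseLong(value.substring(start, value.indexOf('"', start)));
    }
}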

Related

Set the database current timestamp while inserting using JOOQ

I am using the following code segment to do the insertion using jOOQ's UpdatableRecord.
public void acknowledgeDisclaimer(AcknowledgeDisclaimerReq acknowledgeDisclaimerReq) {
    DisclaimerRecord disclaimerRecord = dslContext.newRecord(Disclaimer.DISCLAIMER);
    disclaimerRecord.setDisclaimerForId(acknowledgeDisclaimerReq.getDealListingId());
    disclaimerRecord.setDisclaimerForType("DEAL");
    disclaimerRecord.setAcceptedAt(LocalDateTime.now());
    disclaimerRecord.setAcceptedByOwnerId(acknowledgeDisclaimerReq.getLoggedInOwnerId());
    int count = disclaimerRecord.store();
    log.info("Inserted entry for disclaimer for deal: {}, owner: {}, id {}, insertCount: {}",
            disclaimerRecord.getDisclaimerForId(), disclaimerRecord.getAcceptedByOwnerId(),
            disclaimerRecord.getId(), count);
}
When setting the AcceptedAt data, I want to use the database's current timestamp instead of passing the JVM timestamp. Is there any way to do that in jOOQ?
UpdatableRecord.store() can only set Field<T> => T key/values, not Field<T> => Field<T>, so you cannot set an expression in your record. You can obviously run an explicit INSERT / UPDATE / MERGE statement instead.
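For illustration, a sketch of such an explicit INSERT (the generated Disclaimer.DISCLAIMER metadata and column names are assumed from the question's code; DSL.currentLocalDateTime() matches the LocalDateTime column type):

import static org.jooq.impl.DSL.currentLocalDateTime;

// Sketch: let the database compute ACCEPTED_AT instead of passing a JVM time.
dslContext.insertInto(Disclaimer.DISCLAIMER)
          .set(Disclaimer.DISCLAIMER.DISCLAIMER_FOR_ID, acknowledgeDisclaimerReq.getDealListingId())
          .set(Disclaimer.DISCLAIMER.DISCLAIMER_FOR_TYPE, "DEAL")
          .set(Disclaimer.DISCLAIMER.ACCEPTED_AT, currentLocalDateTime())
          .set(Disclaimer.DISCLAIMER.ACCEPTED_BY_OWNER_ID, acknowledgeDisclaimerReq.getLoggedInOwnerId())
          .execute();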
Using triggers
The best way to ensure such a timestamp is set to the database timestamp whenever you run some specific DML on the table is to use a database trigger (you could make the trigger watch for changes in the ACCEPTED_BY_OWNER_ID value).
If you can't do this on the server side (which is the most reliable, because it will behave correctly for all database clients, not just the JDBC/jOOQ based ones), you might have a few client side options in jOOQ:
Using jOOQ 3.17 client side computed columns
jOOQ 3.17 has added support for stored (or virtual) client side computed columns, a special case of which are audit columns (which is almost what you're doing).
Using this, you can specify, for example:
<forcedType>
  <generator><![CDATA[
    ctx -> org.jooq.impl.DSL.currentTimestamp()
  ]]></generator>
  <includeExpression>(?i:ACCEPTED_AT)</includeExpression>
</forcedType>
The above acts like a trigger that sets the ACCEPTED_AT date to the current timestamp every time you write to the table. In your case, it'll be more like:
<forcedType>
  <generator><![CDATA[
    ctx -> org.jooq.impl.DSL
      .when(ACCEPTED_BY_OWNER_ID.isNotNull(), org.jooq.impl.DSL.currentTimestamp())
      .else_(ctx.table().ACCEPTED_AT)
  ]]></generator>
  <includeExpression>(?i:ACCEPTED_AT)</includeExpression>
</forcedType>
See a current limitation of the above here:
https://github.com/jOOQ/jOOQ/issues/13809
See the relevant manual sections here:
Client side computed columns
Audit columns
Should be something like disclaimerRecord.setAcceptedAt(DSL.now());

How to ensure the expiry of a stream data structure in redis is set once only?

I have a function that uses Lettuce to talk to a Redis cluster.
In this function, I insert data into a stream data structure.
import io.lettuce.core.cluster.SlotHash;
...
public void addData(Map<String, String> dataMap) {
    var sync = SlotHash.getSlot(key).sync();
    sync.xadd(key, dataMap);
}
I also want to set the TTL when I insert a record for the first time, because part of the user requirement is to expire the structure after a fixed length of time, in this case 10 hours.
Unfortunately, the XADD command does not accept an extra parameter to set the TTL like the SET command does.
So now I am setting the TTL this way:
public void addData(Map<String, String> dataMap) {
    var sync = SlotHash.getSlot(key).sync();
    sync.xadd(key, dataMap);
    sync.expire(key, 36000 /* 10 hours in seconds */);
}
What is the best way to ensure that I set the expiry time only once (i.e. when the stream structure is first created)? I should not set the TTL on every call, because each xadd would then be followed by an expire, which effectively postpones the expiry indefinitely.
I could always check the number of items in the stream, but that is overhead. I don't want to keep flags on the Java application side, because the app could be restarted and that information would be lost from memory.
You may want to try a Lua script; the sample script below sets the expiry only if it is not already set for the key, and it works with any type of key in Redis.
eval "local ttl = redis.call('ttl', KEYS[1]); if ttl == -1 then redis.call('expire', KEYS[1], ARGV[1]); end; return ttl;" 1 mykey 12
The script also returns the actual remaining expiry time in seconds.
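From Lettuce, the same script can be run atomically on each write; a sketch (the connection type and the TTL constant are assumptions):

import java.util.Map;

import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;

// Sketch: XADD, then run the TTL-guard Lua script, which only sets the
// expiry when none exists yet, so the TTL is never postponed.
public class StreamWriter {
    private static final String TTL_GUARD =
        "local ttl = redis.call('ttl', KEYS[1]); "
      + "if ttl == -1 then redis.call('expire', KEYS[1], ARGV[1]); end; "
      + "return ttl;";

    public void addData(RedisAdvancedClusterCommands<String, String> sync,
                        String key, Map<String, String> dataMap) {
        sync.xadd(key, dataMap);
        // returns the TTL that was in effect before the call (-1 if unset)
        Long previousTtl = sync.eval(TTL_GUARD, ScriptOutputType.INTEGER,
                new String[] { key }, "36000" /* 10 hours */);
    }
}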

Spark writing to Cassandra with varying TTL

In Java Spark, I have a dataframe that has a 'bucket_timestamp' column, which represents the time of the bucket that the row belongs to.
I want to write the dataframe to a Cassandra DB. The data must be written with a TTL that depends on the bucket timestamp: each row's TTL should be calculated as ROW_TTL = CONST_TTL - (CurrentTime - bucket_timestamp), where CONST_TTL is a constant TTL that I configured.
Currently I am writing to Cassandra with Spark using a constant TTL, with the following code:
df.write().format("org.apache.spark.sql.cassandra")
  .options(new HashMap<String, String>() {
      {
          put("keyspace", "key_space_name");
          put("table", "table_name");
          put("spark.cassandra.output.ttl", Long.toString(CONST_TTL)); // should depend on the bucket_timestamp column
      }
  }).mode(SaveMode.Overwrite).save();
One possible way I thought about is, for each possible bucket_timestamp, to filter the data according to the timestamp, calculate the TTL, and write the filtered data to Cassandra. But this seems very inefficient and not the Spark way. Is there a way in Java Spark to provide a Spark column as the TTL option, so that the TTL differs for each row?
The solution should work with Java and Dataset<Row>: I encountered some solutions for doing this with RDDs in Scala, but didn't find one for Java and dataframes.
Thanks!
From the Spark-Cassandra connector options (https://github.com/datastax/spark-cassandra-connector/blob/v2.3.0/spark-cassandra-connector/src/main/java/com/datastax/spark/connector/japi/RDDAndDStreamCommonJavaFunctions.java) you can set the TTL as:
constant value (withConstantTTL)
automatically resolved value (withAutoTTL)
column-based value (withPerRowTTL)
In your case you could try the last option and compute the TTL as a new column of the starting Dataset with the rule you provided in the question, as sketched below.
For a usage example, see the test here: https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/it/scala/com/datastax/spark/connector/writer/TableWriterSpec.scala#L612
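A sketch of that approach through the RDD-based Java API (BucketRow is a hypothetical bean mirroring the table's columns plus a ttl property; keyspace, table, and column names follow the question):

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;

// Sketch: compute a per-row "ttl" column, then write through the RDD API,
// since the DataFrame writer does not support per-row TTL yet (SPARKC-416).
static void writeWithPerRowTtl(Dataset<Row> df, long constTtl) {
    Dataset<Row> withTtl = df.withColumn("ttl",
            functions.lit(constTtl)
                     .minus(functions.unix_timestamp().minus(df.col("bucket_timestamp"))));

    JavaRDD<BucketRow> rdd = withTtl.as(Encoders.bean(BucketRow.class)).javaRDD();

    javaFunctions(rdd)
            .writerBuilder("key_space_name", "table_name", mapToRow(BucketRow.class))
            .withPerRowTTL("ttl") // TTL is read from each row's "ttl" property
            .saveToCassandra();
}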
For the DataFrame API there is no support for such functionality yet. There is a JIRA for it - https://datastax-oss.atlassian.net/browse/SPARKC-416 - which you can watch to get notified when it's implemented.
So the only choice you have is to use the RDD API as described in @bartosz25's answer...

Hazelcast get ttl of key in Imap

I am using set() to put values into an IMap, where I set the TTL.
The problem I am trying to solve: when I read a key from the map, I want to be able to get the corresponding TTL. I am new to Hazelcast and would appreciate some help.
val testMap: IMap[String, String] = hc.getNativeInstance().getMap(testhcMap)
if (!testMap.containsKey(key)) {
  val duration = TimeUnit.HOURS
  val ttlLen: Long = 1
  testMap.set(key: String, event: acp_event, ttlLen: Long, duration: TimeUnit)
  return true
}
The above snippet sets the values. I want to add one more check before inserting data into the IMap: check whether the TTL is less than an hour and take some action based on that.
This should help you out:
IMap<String, String> foo;
// EntryView exposes per-entry statistics, including the absolute expiration
// time in epoch millis; remaining TTL = getExpirationTime() - current time.
foo.getEntryView(key).getExpirationTime();
You cannot access the TTL value directly. You would have to store the deadline (currentTime + timeout = deadline) in either the key or the value before you actually store it in Hazelcast. The easiest way might be to use an envelope-like class that holds the actual value plus the deadline.
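A minimal sketch of that envelope approach (class and field names are illustrative):

import java.io.Serializable;

// Sketch: store the value together with its absolute deadline so the
// remaining TTL can be recomputed on every read.
public class Envelope<V> implements Serializable {
    public final V value;
    public final long deadlineMillis; // currentTime + timeout, fixed on write

    public Envelope(V value, long ttlMillis) {
        this.value = value;
        this.deadlineMillis = System.currentTimeMillis() + ttlMillis;
    }

    public long remainingTtlMillis() {
        return deadlineMillis - System.currentTimeMillis();
    }
}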

Kafka how to know the latest created partition id for a topic in Java

I am creating partitions for a topic dynamically using AdminUtils. I have to get the partition ID of the newly created partition to save it in MySQL for my business logic. How can I achieve this?
My current code:
AdminUtils.addPartitions(zkUtils, KAFKA_PRODUCER_TOPIC, currentPartitionCount + 1, "", true);
List<PartitionInfo> infoList = producer.partitionsFor(KAFKA_PRODUCER_TOPIC);
for (PartitionInfo info : infoList) {
    System.out.println(info.partition());
}
createdPartionId = infoList.get(0).partition();
The problem with the above is that the latest created partition does not show up in the PartitionInfo list, and I don't know why. (producer is the Kafka producer API.)
You can do a TopicMetadataRequest (see the example): issue one before and after adding the partition and compare the results. Otherwise, the new partition always gets added with newPartitionID == previousLastPartitionID + 1.
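Since partition IDs are assigned sequentially starting at 0, a sketch of deriving the new ID without re-fetching metadata (variable names follow the question's code):

// Partitions are numbered 0..N-1, so after growing the topic from
// currentPartitionCount to currentPartitionCount + 1 partitions,
// the newly created partition's ID equals the old count.
AdminUtils.addPartitions(zkUtils, KAFKA_PRODUCER_TOPIC, currentPartitionCount + 1, "", true);
int createdPartitionId = currentPartitionCount; // e.g. old count 3 -> new partition has ID 3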
