Elasticsearch: comparing dates (painless script)

Elasticsearch: comparing dates (painless script) - java

My mapping of createdAt:
"createdAt": {
"type": "date"
},
I insert the dates like this:
POST logs/_doc/_bulk?pretty
{"index":{"_id":1}}
{"createdAt":"2018-05-01T07:30:00Z","value":"on"}
When I request the documents
GET logs/_doc/_search
It shows me the date as I inserted it:
"_source": {
"createdAt": "2018-05-01T07:30:00Z",
"value":"on"
}
Now I'd like to compare this date with the current time:
"map_script": {
long timestampLog = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.S").parse(doc.createdAt.value).getTime();
long timestampNow = new Date().getTime();
if (timestampNow < timestampLog) {
// case 1
} else {
// case 2
}
}
Weird:
doc.createdAt.value returns "2018-05-01T07:30:00.000Z", which includes milliseconds that I never added.
This error occurs while parsing:
Cannot cast org.joda.time.MutableDateTime to java.lang.String
When I replace doc.createdAt.value by the string 2018-05-01T07:30:00.000Z, it works.
Any help is appreciated. Thank you very much!

Elasticsearch index fields with date types are org.joda.time.DateTime in painless. Using the SimpleDateFormat is the source of the error. Try this instead:
long timestampLog = doc['createdAt'].value.getMillis();
long timestampNow = new Date().getTime();
The rest works as is.
Tested on Elasticsearch 6.3.0.

Please remove the big S in the formatter, check Date and Time Patterns
long timestampLog = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss").parse(doc.createdAt.value).getTime();
long timestampNow = new Date().getTime();

Related

Mongo Format for Storing dates (or response format)

Does Mongo store underlying dates in different formats depending on the Date? I've got 2 Date fields that in Robo3T come back as a Date object. However, when running an aggregation query where I had to write a BsonDateAdapter to handle the Document, I can see one has been stored as a $numberLong.
"dateOne": {
"$date": "2021-09-01T04:00:00Z"
},
"dateTwo": {
"$date": {
"$numberLong": "-86400000"
}
},
dateTwo evaluates to 1969-12-31 00:00:00.000Z so I was wondering if it's before 1970 it does that? Or is it because someone manually updated this date incorrectly? And actually I see that Mongo stores all Dates as a 64bit integer from 1970, so maybe it's the mongo shell and how it returns data.
This was my original code in my BsonDateTypeAdapter, and I've added to it to handle the $numberLong format, but I'm still trying to understand why:
#Override
public Date read(final JsonReader in) throws IOException {
if (in.peek() == JsonToken.BEGIN_OBJECT) {
in.beginObject();
String token = in.nextName();
assert "$date".equals(token);
String date = in.nextString();
in.endObject();
return DateFunctions.convertStringToDate(date);
} else {
in.skipValue();
}
return null;
}

How do I convert Google's DateTime object to a long-type value?

I am trying to integrate a database with a web application that extracts event data from Google Calendar API which inputs the data into the database. The following code is identical to the Quickstart class provided by Google.
I basically want 'DateTime start' to be converted to 'long start'. I need the long value for SQL.
import com.google.api.client.util.DateTime;
// ...
DateTime now = new DateTime(System.currentTimeMillis());
Events events = service.events().list(calendarId)
.setTimeMin(now)
.setOrderBy("startTime")
.setSingleEvents(true)
.execute();
List<Event> items = events.getItems();
if (items.isEmpty()) {
System.out.println("No upcoming events found.");
} else {
System.out.println("Upcoming events");
for (Event event : items) {
DateTime start = event.getStart().getDateTime();
DateTime end = event.getEnd().getDateTime();
if (start == null) {
start = event.getStart().getDate();
}
System.out.printf("%s\n", start.toString());

Google has implemented the Rfc3339 parser in Google HTTP Client Library. You can try parsing it first and the use the DateTime.getValue() function to convert it into long.
You may also try using the DatetimeFormatter to format it to the way you want the value.
DateTimeFormatter formatter = DateTimeFormatter
.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
.withZone(ZoneId.of("UTC"));
public void convertDatetime() {
String timeStamp = "2019-05-24T11:32:26.553955473Z";
DateTime dateTime = DateTime.parseRfc3339(timeStamp);
long millis = dateTime.getValue();
String result = formatter.format(new Date(millis).toInstant());

Apache Flink: Wierd FlatMap behaviour

I'm ingesting a stream of data into Flink. For each 'instance' of this data, I have a timestamp. I can detect if the machine I'm getting the data from is 'producing' or 'not producing', this is done via a custom flat map function that's located in it's own static class.
I want to calculate how long the machine has been producing / not producing.
My current approach is collecting the production and non production timestamps in two plain lists. For each 'instance' of the data, I calculate the current production/non-production duration by subtracting the latest timestamp from the earliest timestamp. This is giving me incorrect results, though. When the production state changes from producing to non producing, I clear the timestamp list for producing and vice versa, so that if the production starts again, the duration starts from zero.
I've looked into the two lists I collect the respective timestamps in and I see things I don't understand. My assumption is that, as long as the machine 'produces', the first timestamp in the production timestamp list stays the same, while new timestamps are added to the list per new instance of data.
Apparantly, this assumption is wrong since I get seemingly random timestamps in the lists. They are still correctly ordered, though.
Here's my code for the flatmap function:
public static class ImaginePaperDataConverterRich extends RichFlatMapFunction<ImaginePaperData, String> {
private static final long serialVersionUID = 4736981447434827392L;
private transient ValueState<ProductionState> stateOfProduction;
SimpleDateFormat dateFormat = new SimpleDateFormat("dd.MM.yyyy HH:mm:ss.SS");
DateFormat timeDiffFormat = new SimpleDateFormat("dd HH:mm:ss.SS");
String timeDiffString = "00 00:00:00.000";
List<String> productionTimestamps = new ArrayList<>();
List<String> nonProductionTimestamps = new ArrayList<>();
public String calcProductionTime(List<String> timestamps) {
if (!timestamps.isEmpty()) {
try {
Date firstDate = dateFormat.parse(timestamps.get(0));
Date lastDate = dateFormat.parse(timestamps.get(timestamps.size()-1));
long timeDiff = lastDate.getTime() - firstDate.getTime();
if (timeDiff < 0) {
System.out.println("Something weird happened. Maybe EOF.");
return timeDiffString;
}
timeDiffString = String.format("%02d %02d:%02d:%02d.%02d",
TimeUnit.MILLISECONDS.toDays(timeDiff),
TimeUnit.MILLISECONDS.toHours(timeDiff) % TimeUnit.HOURS.toHours(1),
TimeUnit.MILLISECONDS.toMinutes(timeDiff) % TimeUnit.HOURS.toMinutes(1),
TimeUnit.MILLISECONDS.toSeconds(timeDiff) % TimeUnit.MINUTES.toSeconds(1),
TimeUnit.MILLISECONDS.toMillis(timeDiff) % TimeUnit.SECONDS.toMillis(1));
} catch (ParseException e) {
e.printStackTrace();
}
System.out.println("State duration: " + timeDiffString);
}
return timeDiffString;
}
#Override
public void open(Configuration config) {
ValueStateDescriptor<ProductionState> descriptor = new ValueStateDescriptor<>(
"stateOfProduction",
TypeInformation.of(new TypeHint<ProductionState>() {}),
ProductionState.NOT_PRODUCING);
stateOfProduction = getRuntimeContext().getState(descriptor);
}
#Override
public void flatMap(ImaginePaperData ImaginePaperData, Collector<String> output) throws Exception {
List<String> warnings = new ArrayList<>();
JSONObject jObject = new JSONObject();
String productionTime = "0";
String nonProductionTime = "0";
// Data analysis
if (stateOfProduction == null || stateOfProduction.value() == ProductionState.NOT_PRODUCING && ImaginePaperData.actSpeedCl > 60.0) {
stateOfProduction.update(ProductionState.PRODUCING);
} else if (stateOfProduction.value() == ProductionState.PRODUCING && ImaginePaperData.actSpeedCl < 60.0) {
stateOfProduction.update(ProductionState.NOT_PRODUCING);
}
if(stateOfProduction.value() == ProductionState.PRODUCING) {
if (!nonProductionTimestamps.isEmpty()) {
System.out.println("Production has started again, non production timestamps cleared");
nonProductionTimestamps.clear();
}
productionTimestamps.add(ImaginePaperData.timestamp);
System.out.println(productionTimestamps);
productionTime = calcProductionTime(productionTimestamps);
} else {
if(!productionTimestamps.isEmpty()) {
System.out.println("Production has stopped, production timestamps cleared");
productionTimestamps.clear();
}
nonProductionTimestamps.add(ImaginePaperData.timestamp);
warnings.add("Production has stopped.");
System.out.println(nonProductionTimestamps);
//System.out.println("Production stopped");
nonProductionTime = calcProductionTime(nonProductionTimestamps);
}
// The rest is just JSON stuff
Do I maybe have to hold these two timestamp lists in a ListState?
EDIT: Because another user asked, here is the data I'm getting.
{'szenario': 'machine01', 'timestamp': '31.10.2018 09:18:39.432069', 'data': {1: 100.0, 2: 100.0, 101: 94.0, 102: 120.0, 103: 65.0}}
The behaviour I expect is that my flink program collects the timestamps in the two lists productionTimestamps and nonProductionTimestamps. Then I want my calcProductionTime method to subtract the last timestamp in the list from the first timestamp, to get the duration between when I first detected the machine is "producing" / "not-producing" and the time it stopped "producing" / "not-producing".

I found out that the reason for the 'seemingly random' timestamps is Apache Flink's parallel execution. When the parallelism is set to > 1, the order of events isn't guaranteed anymore.
My quick fix was to set the parallelism of my program to 1, this guarantees the order of events, as far as I know.

Aggregate by Hour in the mongodb-async-driver

How can I aggregate by Hour in the mongodb-async-driver (http://www.allanbank.com/mongodb-async-driver/usage.html)
I have an ISODate-Field in my Collection.
[
{ name = "a", date = ISODate(...)},
{ name = "b", date = ISODate(...)},
...
]
I want to display a graph of how may documents occur per hour.
in the MongoDB-Console. I would do something like this:
db.mycollection.aggregate([{$group : {_id : {day:{ $hour : "$date"}}, count: { $sum: 1 }}}])
but I get stuck at the driver-api:
import static com.allanbank.mongodb.builder.AggregationGroupField.set;
import static com.allanbank.mongodb.builder.AggregationGroupId.id;
Aggregate.Builder builder = new Aggregate.Builder();
builder.group(id().add(???), set("pop").sum("pop"))

You need to make use of the Expressions class. Make use of the group method that takes a Builder and the AggregationGroupField array as input.
public Aggregate.Builder group(AggregationGroupId.Builder id,
AggregationGroupField... aggregations)
Build the hour Expression and pass it as the id.
Builder hour = new Builder();
hour.add(Expressions.set("day",Expressions.hour(Expressions.field("date"))));
Aggregate.Builder builder = Aggregate.builder();
builder.group(
hour,
AggregationGroupField.set("pop").sum("pop")
);
MongoIterator<Document> result = col.aggregate(builder);
while(result.hasNext())
{
System.out.println(result.next());
};

How to sort mongo query result stored in cursor using java?

I am using java and mongo db.
I have stored multple documents in mongodb. I want to fetch only 12 documents whose timestamp is less than timestamp provided to query.
The condition is that query must select 12 documents whose timestamp is closer to given timestamp.
Here is what I did??
BasicDBObject criteria = new BasicDBObject();
BasicDBObject projections = new BasicDBObject();
criteria.put("hostId",ip);
criteria.put("status",0);
projections.put("runtimeMillis",1);
projections.put("cpuUtilization",1);
String json_string="";
DBCursor cur = coll.find(criteria,projections).sort(new BasicDBObject("runtimeMillis",-1)).limit(12);
Object[] row = createOutputRow(new Object[0], outputRowSize);
int index = 0;
String mystring = null;
List list = new ArrayList();
JSONObject result = new JSONObject();
json_string = "[";
while(cur.hasNext() && !isStopped()) {
String json = cur.next().toString();
JSONObject responseObject = new JSONObject(json);
long convert = Long.parseLong(responseObject.getString("runtimeMillis"));
long set_date = convert;
Date dateObj = new Date(set_date);
String date_text = ft.format(dateObj);
int month = 0;
month = Integer.parseInt(new java.text.SimpleDateFormat("MM").format(dateObj));
/json_string +="{x: ("+convert+"),y: "+responseObject.getString("cpuUtilization")+", color: \"red\"},";
}//end of while
This gives me correct output but in descending order.
If I sort documents in ascending order, I got oldest documents.
I want output in ascending order and must be closer to given timestamp(latest documents whose timestamp is less than given timestamp) and are sorted in ascending order.
How do I get this result??

Let me check I understand correctly what you're trying to do. You're looking for the 12 documents with timestamps just before a given time? So, for example, if you had the following data set (I'm using very simplified timestamps for ease of understanding):
{ documentNumber: 1, timestamp: 1002 },
{ documentNumber: 2, timestamp: 1003 },
{ documentNumber: 3, timestamp: 1005 },
{ documentNumber: 4, timestamp: 1007 },
{ documentNumber: 5, timestamp: 10011 },
{ documentNumber: 6, timestamp: 10013 },
{ documentNumber: 7, timestamp: 10017 },
{ documentNumber: 8, timestamp: 10019 },
{ documentNumber: 9, timestamp: 10023 },
{ documentNumber: 10,timestamp: 10031 },
{ documentNumber: 11,timestamp: 10037 },
{ documentNumber: 12,timestamp: 10041 },
{ documentNumber: 13,timestamp: 10053 },
{ documentNumber: 14,timestamp: 10057 },
{ documentNumber: 15,timestamp: 10063 },
{ documentNumber: 16,timestamp: 10065 },
{ documentNumber: 17,timestamp: 10069 },
{ documentNumber: 18,timestamp: 10074 },
{ documentNumber: 19,timestamp: 10079 }
and you searched for the timestamp 10069, you want to find the 12 documents just before that timestamp, but in ascending order. So you want to get documents 4,5,6,7,8,9,10,11,12,13,14,15,16?
Your current code is enormously over-complicated, which is why there's the comment suggesting you check the documentation. However, you're actually partially correct, you can't order it in ascending order and get the values that you want.
I'm not at all sure what your code on the lines after DBCursor cur = ... are for, that messing around with JSON and Dates and stuff looks pretty hairy, and there are simpler ways to do that, but I'll leave you to research that. However, I have written something that should give you more or less what you need, in terms of the query, and I've written a test to prove that this does what you want:
#Test
public void shouldUseASortForLimitCriteriaAndSortArrayInPlace() {
// given
long timestampStartingPoint = 1000;
for (long timestamp = timestampStartingPoint; timestamp < 1100; timestamp++) {
//insert some basic documents into the database with different timestamps
collection.insert(new BasicDBObject("timestamp", timestamp));
}
// when
long timestampToSearchFor = 1050; // halfway through the data set
// this is the query for documents older than a chosen timestamp
BasicDBObject queryForDocumentsOlderThanTimestampToSearchFor = new BasicDBObject("timestamp", new BasicDBObject("$lt", timestampToSearchFor));
// limit selects only 12, you have to sort descending to get the 12 closes to the selected timestamp
List<DBObject> foundItems = collection.find(queryForDocumentsOlderThanTimestampToSearchFor)
.limit(12)
.sort(new BasicDBObject("timestamp", -1))
.toArray();
// now you have to sort the returned array into the order you want
Collections.sort(foundItems, new Comparator<DBObject>() {
#Override
public int compare(final DBObject o1, final DBObject o2) {
return (int) ((Long) o1.get("timestamp") - (Long) o2.get("timestamp"));
}
});
// then
assertThat(foundItems.size(), is(12));
assertThat((Long) foundItems.get(0).get("timestamp"), is(1038L));
assertThat((Long) foundItems.get(11).get("timestamp"), is(1049L));
}
Note that this solution is OK in your case because you're only returning 12 items. If the results are very large, this may not work as the whole list will be in memory. That's why it's usually better to do sorting in the database.
There is another way to achieve this, using the aggregation framework - that will let you set up one sort to use for the limit (you need to sort descending to limit to the correct 12 items) and a second sort to put them into the order you want.
The key points to understand in my solution are:
$lt, to get the timestamps older than a given value
limit to return only a subset of results
Collections.sort and Comparator for sorting an array in Java
I also recommend you check out the documentation for Aggregation.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Elasticsearch: comparing dates (painless script) - java

Please remove the big S in the formatter, check Date and Time Patterns long timestampLog = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss").parse(doc.createdAt.value).getTime(); long timestampNow = new Date().getTime();

Related

Mongo Format for Storing dates (or response format)

How do I convert Google's DateTime object to a long-type value?

Apache Flink: Wierd FlatMap behaviour

Aggregate by Hour in the mongodb-async-driver

How to sort mongo query result stored in cursor using java?

Categories

Resources