Find max trips per day from DataFrame in Java - Spark

Environment: Java 1.8, VM Cloudera Quickstart.
I have data in Hadoop HDFS from a CSV file. Each row represents a bus route.
id vendor start_datetime end_datetime trip_duration_in_sec
17534 A 1/1/2013 12:00 1/1/2013 12:14 840
68346 A 1/1/2013 12:13 1/1/2013 12:18 300
09967 B 1/1/2013 12:34 1/1/2013 12:39 300
09967 B 1/1/2013 12:44 1/1/2013 12:51 420
09967 A 1/1/2013 12:54 1/1/2013 12:56 120
.........
.........
So, for every day, I want to find the hour in which each vendor (A and B) has the most bus routes, using Java and Spark.
A result could be:
1/1/2013 (Day 1) - Vendor A has 3 bus routes in the 12:00-13:00 hour. (In that hour, 12:00-13:00, vendor A had the most bus routes.)
1/1/2013 (Day 1) - Vendor B has 2 bus routes in the 12:00-13:00 hour. (In that hour, 12:00-13:00, vendor B had the most bus routes.)
....
My Java code is:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;
Dataset<Row> ds;
ds.groupBy(functions.window(functions.col("start_datetime"), "1 hour")).count().show();
But I can't find the hour with the maximum number of routes per day.

I'm not so familiar with Java, so I'll explain it in Scala.
The key to finding the hour with the most routes per day per vendor is to count by (vendor, day, hour), then aggregate by (vendor, day) to pick the hour with the maximum count in each group. The day and the hour of each record can be parsed from start_datetime.
val df = spark.createDataset(Seq(
("17534","A","1/1/2013 12:00","1/1/2013 12:14",840),
("68346","A","1/1/2013 12:13","1/1/2013 12:18",300),
("09967","B","1/1/2013 12:34","1/1/2013 12:39",300),
("09967","B","1/1/2013 12:44","1/1/2013 12:51",420),
("09967","A","1/1/2013 12:54","1/1/2013 12:56",120)
)).toDF("id","vendor","start_datetime","end_datetime","trip_duration_in_sec")
df.rdd.map(t => {
val vendor = t(1)
val day = t(2).toString.split(" ")(0)
val hour = t(2).toString.split(" ")(1).split(":")(0)
((vendor, day, hour), 1)
})
// count by key
.aggregateByKey(0)((x: Int, y: Int) =>x+y, (x: Int, y: Int) =>x+y)
.map(t => {
val ((vendor, day, hour), cnt) = t;
((vendor, day), (hour, cnt))
})
// solve the max cnt by key (vendor, day)
.foldByKey(("", 0))((z: (String, Int), i: (String, Int)) => if (i._2 > z._2) i else z)
.foreach(t => println(s"${t._1._2} - Vendor ${t._1._1} has ${t._2._2} bus routes from ${t._2._1}:00 hour."))
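If you want to stay in Java with the DataFrame API, roughly the same logic can be written with a window function. This is only a sketch, not tested against your data: it assumes start_datetime has already been converted to a timestamp column (e.g. via to_timestamp with a matching format) and reuses the column names from the question.
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

// count routes per (vendor, day, hour)
Dataset<Row> counts = ds
    .withColumn("day", to_date(col("start_datetime")))
    .withColumn("hour", hour(col("start_datetime")))
    .groupBy(col("vendor"), col("day"), col("hour"))
    .count();

// for each (vendor, day), keep only the hour with the highest count
WindowSpec byVendorDay = Window.partitionBy(col("vendor"), col("day"))
    .orderBy(col("count").desc());

Dataset<Row> busiestHour = counts
    .withColumn("rn", row_number().over(byVendorDay))
    .filter(col("rn").equalTo(1))
    .drop("rn");

busiestHour.show();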

Related

Time series data aggregation per similar snapshots

I have a cassandra table defined like below:
create table if not exists test(
id int,
readDate timestamp,
totalreadings text,
readings text,
PRIMARY KEY(id, readDate)
) WITH CLUSTERING ORDER BY(readDate desc);
The readings column contains a map of all snapshots of data collected at regular intervals (30 minutes), along with aggregated data for the full day.
The data looks like below:
id=8, readDate=Tue Dec 20 2016, totalreadings=220.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 21 2016, totalreadings=221.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 22 2016, totalreadings=219.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 23 2016, totalreadings=224.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
The Java POJO class looks like below:
public class Test{
private int id;
private Date readDate;
private String totalreadings;
private Map<Integer, Double> readings;
//setters
//getters
}
I am trying to find the aggregated average of all readings per snapshot over the last 4 days. So logically, I have a list of 4 Test objects for the last 4 days, and each of them has a map containing the readings across the intervals.
Is there a simple way to aggregate similar snapshot entries across the 4 days? For example, I want to aggregate specific data snapshots (1, 2, 3, 4, 5, 6, etc.) only, not the total aggregate.
After changing your table structure a little bit, the problem can be solved completely in Cassandra. Mainly, I have put your readings into a map.
create table test(
id int,
readDate timestamp,
totalreadings float,
readings map<int,float>,
PRIMARY KEY(id, readDate)
) WITH CLUSTERING ORDER BY(readDate desc);
Now I entered a bit of your data using CQL:
insert into test (id,readDate,totalReadings, readings ) values (8, '2016-12-20', 220.0, {0:9.0, 1:0.0, 2:9.0, 3:5.0, 4:2.0, 5:7.0, 6:1.0, 7:3.0, 8:9.0, 9:2.0, 10:5.0, 11:1.0, 12:1.0, 13:2.0, 14:4.0, 15:4.0, 16:7.0, 17:7.0, 18:5.0, 19:4.0, 20:9.0, 21:6.0, 22:8.0, 23:4.0, 24:6.0, 25:3.0, 26:5.0, 27:7.0, 28:2.0, 29:0.0, 30:8.0, 31:9.0, 32:1.0, 33:8.0, 34:9.0, 35:2.0, 36:4.0, 37:5.0, 38:4.0, 39:7.0, 40:3.0, 41:2.0, 42:1.0, 43:2.0, 44:4.0, 45:5.0, 46:3.0, 47:1.0});
insert into test (id,readDate,totalReadings, readings ) values (8, '2016-12-21', 221.0,{0:9.0, 1:0.0, 2:9.0, 3:5.0, 4:2.0, 5:7.0, 6:1.0, 7:3.0, 8:9.0, 9:2.0, 10:5.0, 11:1.0, 12:1.0, 13:2.0, 14:4.0, 15:4.0, 16:7.0, 17:7.0, 18:5.0, 19:4.0, 20:9.0, 21:6.0, 22:8.0, 23:4.0, 24:6.0, 25:3.0, 26:5.0, 27:7.0, 28:2.0, 29:0.0, 30:8.0, 31:9.0, 32:1.0, 33:8.0, 34:9.0, 35:2.0, 36:4.0, 37:5.0, 38:4.0, 39:7.0, 40:3.0, 41:2.0, 42:1.0, 43:2.0, 44:4.0, 45:5.0, 46:3.0, 47:1.0});
To extract single values out of the map, I created a user-defined function (UDF). This UDF picks the right value out of your map containing the readings. See the Cassandra docs on UDFs for more. Note that UDFs are disabled in Cassandra by default, so you need to modify cassandra.yaml to include enable_user_defined_functions: true.
create function map_item(readings map<int,float>, idx int) called on null input returns float language java as ' return readings.get(idx);';
After creating the function you can calculate your average as
select avg(map_item(readings, 7)) from test where readDate > '2016-12-20' allow filtering;
which gives me:
system.avg(betterconnect.map_item(readings, 7))
-------------------------------------------------
3
You may want to supply the date for your where-clause and the index (7 in my example) as parameters from your application.
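If you need to run this from Java rather than cqlsh, here is a minimal sketch with the DataStax Java driver (assuming the 3.x API, a local contact point, and the keyspace name betterconnect that appears in the output above; adjust to your setup):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class SnapshotAverage {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("betterconnect")) {
            int slot = 7; // the snapshot index you want to average
            Date since = new Date(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(4)); // last 4 days
            // The slot index is inlined into the statement; the cut-off date is bound as a parameter.
            ResultSet rs = session.execute(
                "select avg(map_item(readings, " + slot + ")) from test where readDate > ? allow filtering",
                since);
            Row row = rs.one();
            System.out.println("Average for slot " + slot + ": " + row.getFloat(0));
        }
    }
}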

What are JDBC-mysql Driver settings for sane handling of DATETIME and TIMESTAMP in UTC?

Having been burned by mysql timezone and Daylight Savings "hour from hell" issues in the past, I decided my next application would store everything in UTC timezone and only interact with the database using UTC times (not even the closely-related GMT).
I soon ran into some mysterious bugs. After pulling my hair out for a while, I came up with this test code:
try(Connection conn = dao.getDataSource().getConnection();
Statement stmt = conn.createStatement()) {
Instant now = Instant.now();
stmt.execute("set time_zone = '+00:00'");
stmt.execute("create temporary table some_times("
+ " dt datetime,"
+ " ts timestamp,"
+ " dt_string datetime,"
+ " ts_string timestamp,"
+ " dt_epoch datetime,"
+ " ts_epoch timestamp,"
+ " dt_auto datetime default current_timestamp(),"
+ " ts_auto timestamp default current_timestamp(),"
+ " dtc char(19) generated always as (cast(dt as character)),"
+ " tsc char(19) generated always as (cast(ts as character)),"
+ " dt_autoc char(19) generated always as (cast(dt_auto as character)),"
+ " ts_autoc char(19) generated always as (cast(ts_auto as character))"
+ ")");
PreparedStatement ps = conn.prepareStatement("insert into some_times "
+ "(dt, ts, dt_string, ts_string, dt_epoch, ts_epoch) values (?,?,?,?,from_unixtime(?),from_unixtime(?))");
DateTimeFormatter dbFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of("UTC"));
ps.setTimestamp(1, new Timestamp(now.toEpochMilli()));
ps.setTimestamp(2, new Timestamp(now.toEpochMilli()));
ps.setString(3, dbFormat.format(now));
ps.setString(4, dbFormat.format(now));
ps.setLong(5, now.getEpochSecond());
ps.setLong(6, now.getEpochSecond());
ps.executeUpdate();
ResultSet rs = stmt.executeQuery("select * from some_times");
ResultSetMetaData md = rs.getMetaData();
while(rs.next()) {
for(int c=1; c <= md.getColumnCount(); ++c) {
Instant inst1 = Instant.ofEpochMilli(rs.getTimestamp(c).getTime());
Instant inst2 = Instant.from(dbFormat.parse(rs.getString(c).replaceAll("\\.0$", "")));
System.out.println(inst1.getEpochSecond() - now.getEpochSecond());
System.out.println(inst2.getEpochSecond() - now.getEpochSecond());
}
}
}
Note how the session timezone is set to UTC, and everything in the Java code is very timezone-aware and forced to UTC. The only thing in this entire environment which is not UTC is the JVM's default timezone.
I expected the output to be a bunch of 0s, but instead I get this:
0
-28800
0
-28800
28800
0
28800
0
28800
0
28800
0
28800
0
28800
0
0
-28800
0
-28800
28800
0
28800
0
Each line of output is just subtracting the time stored from the time retrieved. The result in each row should be 0.
It seems the JDBC driver is performing inappropriate timezone conversions. For an application which interacts fully in UTC although it runs on a VM that's not in UTC, is there any way to completely disable the TZ conversions?
i.e. Can this test be made to output all-zero rows?
UPDATE
Using useLegacyDatetimeCode=false (cacheDefaultTimezone=false makes no difference) changes the output, but it is still not a fix:
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
0
0
0
0
0
0
0
UPDATE2
Checking the console (after changing the test to create a permanent table), I see all the values are STORED correctly:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 27148
Server version: 5.7.12-log MySQL Community Server (GPL)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> set time_zone = '-00:00';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM some_times \G
*************************** 1. row ***************************
dt: 2016-11-18 15:39:51
ts: 2016-11-18 15:39:51
dt_string: 2016-11-18 15:39:51
ts_string: 2016-11-18 15:39:51
dt_epoch: 2016-11-18 15:39:51
ts_epoch: 2016-11-18 15:39:51
dt_auto: 2016-11-18 15:39:51
ts_auto: 2016-11-18 15:39:51
dtc: 2016-11-18 15:39:51
tsc: 2016-11-18 15:39:51
dt_autoc: 2016-11-18 15:39:51
ts_autoc: 2016-11-18 15:39:51
1 row in set (0.00 sec)
mysql>
The solution is to set JDBC connection parameter noDatetimeStringSync=true with useLegacyDatetimeCode=false. As a bonus I also found sessionVariables=time_zone='-00:00' alleviates the need to set time_zone explicitly on every new connection.
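For reference, a connection URL combining these settings might look like the following (host and schema names here are placeholders):
jdbc:mysql://dbhost:3306/mydb?useLegacyDatetimeCode=false&noDatetimeStringSync=true&sessionVariables=time_zone='-00:00'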
There is some "intelligent" timezone conversion code that gets activated deep inside the ResultSet.getString() method when it detects that the column is a TIMESTAMP column.
Alas, this intelligent code has a bug: TimeUtil.fastTimestampCreate(TimeZone tz, int year, int month, int day, int hour, int minute, int seconds, int secondsPart) returns a Timestamp wrongly tagged to the JVM's default timezone, even when the tz parameter is set to something else:
final static Timestamp fastTimestampCreate(TimeZone tz, int year, int month, int day, int hour, int minute, int seconds, int secondsPart) {
Calendar cal = (tz == null) ? new GregorianCalendar() : new GregorianCalendar(tz);
cal.clear();
// why-oh-why is this different than java.util.date, in the year part, but it still keeps the silly '0' for the start month????
cal.set(year, month - 1, day, hour, minute, seconds);
long tsAsMillis = cal.getTimeInMillis();
Timestamp ts = new Timestamp(tsAsMillis);
ts.setNanos(secondsPart);
return ts;
}
The returned ts would be perfectly valid, except that further up the call chain it is converted back to a string using the bare toString() method, which renders the ts as a String representing what a clock would display in the JVM-default timezone, instead of a String representation of the time in UTC. In ResultSetImpl.getStringInternal(int columnIndex, boolean checkDateTypes):
case Types.TIMESTAMP:
Timestamp ts = getTimestampFromString(columnIndex, null, stringVal, this.getDefaultTimeZone(), false);
if (ts == null) {
this.wasNullFlag = true;
return null;
}
this.wasNullFlag = false;
return ts.toString();
Setting noDatetimeStringSync=true disables the entire parse/unparse mess and just returns the string value as-is received from the database.
Test output:
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
The useLegacyDatetimeCode=false is still important because it changes the behaviour of getDefaultTimeZone() to use the database server's TZ.
While chasing this down I also found the documentation for useJDBCCompliantTimezoneShift is incorrect, although it makes no difference: documentation says [This is part of the legacy date-time code, thus the property has an effect only when "useLegacyDatetimeCode=true."], but that's wrong, see ResultSetImpl.getNativeTimestampViaParseConversion(int, Calendar, TimeZone, boolean).

localized duration format for french

What is the Python analog of Time4J's code example:
// duration in seconds normalized to hours, minutes and seconds
Duration<?> dur = Duration.of(337540, ClockUnit.SECONDS).with(Duration.STD_CLOCK_PERIOD);
// custom duration format => hh:mm:ss
String s1 = Duration.Formatter.ofPattern("hh:mm:ss").format(dur);
System.out.println(s1); // output: 93:45:40
// localized duration format for french
String s2 = PrettyTime.of(Locale.FRANCE).print(dur, TextWidth.WIDE);
System.out.println(s2); // output: 93 heures, 45 minutes et 40 secondes
It is easy to get 93:45:40:
#!/usr/bin/env python3
from datetime import timedelta
dur = timedelta(seconds=337540)
print(dur) # -> 3 days, 21:45:40
fields = {}
fields['hours'], seconds = divmod(dur // timedelta(seconds=1), 3600)
fields['minutes'], fields['seconds'] = divmod(seconds, 60)
print("%(hours)02d:%(minutes)02d:%(seconds)02d" % fields) # -> 93:45:40
but how do I emulate PrettyTime.of(Locale.FRANCE).print(dur, TextWidth.WIDE) Java code in Python (without hardcoding the units)?
The babel module allows you to get close to the desired output:
from babel.dates import format_timedelta # $ pip install babel
print(", ".join(format_timedelta(timedelta(**{unit: fields[unit]}),
granularity=unit.rstrip('s'),
threshold=fields[unit] + 1,
locale='fr')
for unit in "hours minutes seconds".split()))
# -> 93 heures, 45 minutes, 40 secondes
It handles locale and plural forms automatically e.g., for dur = timedelta(seconds=1) it produces:
0 heure, 0 minute, 1 seconde
Perhaps a better solution would be to translate the format string manually using standard tools such as gettext.
If you're using Kotlin: I just came across a similar problem with the Kotlin Duration type and localized formatting, and because I couldn't find a good solution, I wrote one myself. It is based on APIs available starting with Android 9 (for localized units), with a fallback to English units on lower Android versions, so it can be used by apps targeting lower API levels.
Here's how it looks on the usage side (see the Kotlin Duration type to understand the 1st line):
val duration = 5.days.plus(3.hours).plus(2.minutes).plus(214.milliseconds)
DurationFormat().format(duration) // "5day 3hour 2min"
DurationFormat(Locale.GERMANY).format(duration) // "5T 3Std. 2Min."
DurationFormat(Locale.forLanguageTag("ar")).format(duration) // "٥يوم ٣ساعة ٢د"
DurationFormat().format(duration, smallestUnit = DurationFormat.Unit.HOUR) // "5day 3hour"
DurationFormat().format(15.minutes) // "15min"
DurationFormat().format(0.hours) // "0sec"
As you can see, you can specify a custom locale for the DurationFormat type; by default it uses Locale.getDefault(). Languages that use different digit symbols than the Western/Latin ones are also supported (via NumberFormat). Also, you can specify a custom smallestUnit; by default it is set to SECOND, so milliseconds will not be shown. Note that any unit with a value of 0 will be ignored, and if the entire duration is 0, the smallest unit will be used with the value 0.
This is the full DurationFormat type, feel free to copy (also available as a GitHub gist incl. unit tests):
import android.icu.text.MeasureFormat
import android.icu.text.NumberFormat
import android.icu.util.MeasureUnit
import android.os.Build
import java.util.Locale
import kotlin.time.Duration
import kotlin.time.ExperimentalTime
import kotlin.time.days
import kotlin.time.hours
import kotlin.time.milliseconds
import kotlin.time.minutes
import kotlin.time.seconds
@ExperimentalTime
data class DurationFormat(val locale: Locale = Locale.getDefault()) {
enum class Unit {
DAY, HOUR, MINUTE, SECOND, MILLISECOND
}
fun format(duration: kotlin.time.Duration, smallestUnit: Unit = Unit.SECOND): String {
var formattedStringComponents = mutableListOf<String>()
var remainder = duration
for (unit in Unit.values()) {
val component = calculateComponent(unit, remainder)
remainder = when (unit) {
Unit.DAY -> remainder - component.days
Unit.HOUR -> remainder - component.hours
Unit.MINUTE -> remainder - component.minutes
Unit.SECOND -> remainder - component.seconds
Unit.MILLISECOND -> remainder - component.milliseconds
}
val unitDisplayName = unitDisplayName(unit)
if (component > 0) {
val formattedComponent = NumberFormat.getInstance(locale).format(component)
formattedStringComponents.add("$formattedComponent$unitDisplayName")
}
if (unit == smallestUnit) {
val formattedZero = NumberFormat.getInstance(locale).format(0)
if (formattedStringComponents.isEmpty()) formattedStringComponents.add("$formattedZero$unitDisplayName")
break
}
}
return formattedStringComponents.joinToString(" ")
}
private fun calculateComponent(unit: Unit, remainder: Duration) = when (unit) {
Unit.DAY -> remainder.inDays.toLong()
Unit.HOUR -> remainder.inHours.toLong()
Unit.MINUTE -> remainder.inMinutes.toLong()
Unit.SECOND -> remainder.inSeconds.toLong()
Unit.MILLISECOND -> remainder.inMilliseconds.toLong()
}
private fun unitDisplayName(unit: Unit) = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
val measureFormat = MeasureFormat.getInstance(locale, MeasureFormat.FormatWidth.NARROW)
when (unit) {
DurationFormat.Unit.DAY -> measureFormat.getUnitDisplayName(MeasureUnit.DAY)
DurationFormat.Unit.HOUR -> measureFormat.getUnitDisplayName(MeasureUnit.HOUR)
DurationFormat.Unit.MINUTE -> measureFormat.getUnitDisplayName(MeasureUnit.MINUTE)
DurationFormat.Unit.SECOND -> measureFormat.getUnitDisplayName(MeasureUnit.SECOND)
DurationFormat.Unit.MILLISECOND -> measureFormat.getUnitDisplayName(MeasureUnit.MILLISECOND)
}
} else {
when (unit) {
Unit.DAY -> "day"
Unit.HOUR -> "hour"
Unit.MINUTE -> "min"
Unit.SECOND -> "sec"
Unit.MILLISECOND -> "msec"
}
}
}
The humanize package may help. It has a French localization, or you can add your own. It supports Python 2.7 and 3.3.
Using pendulum module:
>>> import pendulum
>>> it = pendulum.interval(seconds=337540)
>>> it.in_words(locale='fr_FR')
'3 jours 21 heures 45 minutes 40 secondes'

Processing CQEngine ResultSet with Scala foreach is very slow

I'm trying to process CQEngine's ResultSet using Scala's foreach, but the result is very slow.
Following is the snippet of what I'm trying to do
import collection.JavaConversions._
val query = existsIn(myOtherCollection, REFERENCE, REFERENCE)
val resultSet = myIndexCollection.retrieve(query)
resultSet.foreach(r =>{
//do something here
})
Somehow the .foreach method is very slow. I tried to debug by adding SimonMonitor and changing the .foreach to a while(resultSet.hasNext) loop; surprisingly, every call to the hasNext method takes about 1-2 seconds. That's very slow.
I tried to create the same version using Java, and the Java version is super fast.
Please help
I am not able to reproduce your problem with the below test code. Can you try it on your system and let me know how it runs?
(Uncomment line 38, garages.addIndex(HashIndex.onAttribute(Garage.BRANDS_SERVICED)), to make BOTH the Scala and Java iterators run blazingly fast...)
The output first (times shown in seconds):
Done adding data
Done adding index
============== Scala ==============
Car{carId=4, name='BMW M3', description='2013 model', features=[radio, convertible]}
Time : 3 seconds
Car{carId=1, name='Ford Focus', description='great condition, low mileage', features=[spare tyre, sunroof]}
Time : 1 seconds
Car{carId=2, name='Ford Taurus', description='dirty and unreliable, flat tyre', features=[spare tyre, radio]}
Time : 2 seconds
============== Java ==============
Car{carId=4, name='BMW M3', description='2013 model', features=[radio, convertible]}
Time : 3 seconds
Car{carId=1, name='Ford Focus', description='great condition, low mileage', features=[spare tyre, sunroof]}
Time : 1 seconds
Car{carId=2, name='Ford Taurus', description='dirty and unreliable, flat tyre', features=[spare tyre, radio]}
Time : 2 seconds
Code below:
import collection.JavaConversions._
import com.googlecode.cqengine.query.QueryFactory._
import com.googlecode.cqengine.CQEngine;
import com.googlecode.cqengine.index.hash._;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.query.Query;
import java.util.Arrays.asList;
object CQTest {
def main(args: Array[String]) {
val cars: IndexedCollection[Car] = CQEngine.newInstance();
cars.add(new Car(1, "Ford Focus", "great condition, low mileage", asList("spare tyre", "sunroof")));
cars.add(new Car(2, "Ford Taurus", "dirty and unreliable, flat tyre", asList("spare tyre", "radio")));
cars.add(new Car(3, "Honda Civic", "has a flat tyre and high mileage", asList("radio")));
cars.add(new Car(4, "BMW M3", "2013 model", asList("radio", "convertible")));
// add cruft to try and slow down CQE
for (i <- 1 to 10000) {
cars.add(new Car(i, "BMW2014_" + i, "2014 model", asList("radio", "convertible")))
}
// Create an indexed collection of garages...
val garages: IndexedCollection[Garage] = CQEngine.newInstance();
garages.add(new Garage(1, "Joe's garage", "London", asList("Ford Focus", "Honda Civic")));
garages.add(new Garage(2, "Jane's garage", "Dublin", asList("BMW M3")));
garages.add(new Garage(3, "John's garage", "Dublin", asList("Ford Focus", "Ford Taurus")));
garages.add(new Garage(4, "Jill's garage", "Dublin", asList("Ford Focus")));
// add cruft to try and slow down CQE
for (i <- 1 to 10000) {
garages.add(new Garage(i, "Jill's garage", "Dublin", asList("DONT_MATCH_CARS_BMW2014_" + i)))
}
println("Done adding data")
// cars.addIndex(HashIndex.onAttribute(Car.NAME));
// garages.addIndex(HashIndex.onAttribute(Garage.BRANDS_SERVICED));
println("Done adding index")
val query = existsIn(garages, Car.NAME, Garage.BRANDS_SERVICED, equal(Garage.LOCATION, "Dublin"))
val resultSet = cars.retrieve(query)
var previous = System.currentTimeMillis()
println("============== Scala ============== ")
// Scala version
resultSet.foreach(r => {
println(r);
val t = (System.currentTimeMillis() - previous)
System.out.println("Time : " + t / 1000 + " seconds")
previous = System.currentTimeMillis()
})
println("============== Java ============== ")
previous = System.currentTimeMillis()
// Java version
val i: java.util.Iterator[Car] = resultSet.iterator()
while (i.hasNext) {
val r = i.next()
println(r);
val t = (System.currentTimeMillis() - previous)
System.out.println("Time : " + t / 1000 + " seconds")
previous = System.currentTimeMillis()
}
}
}

TimeZone issue with ibatis ORM and postgres

Please suggest: I am retrieving data from a Postgres database using the iBatis ORM.
When I fire the below query using iBatis, the timestamp "PVS.SCHED_DTE" gets reduced to "2013-06-07 23:30:00.0", whereas in the DB the timestamp is "2013-06-08 05:00:00.0".
However, when the same query is executed using postgresAdminII, it gives the proper PVS.SCHED_DTE.
SELECT PVS.PTNT_VISIT_SCHED_NUM, PVS.PTNT_ID,P.FIRST_NAM AS PTNT_FIRST_NAM,P.PRIM_EMAIL_TXT,P.SCNDRY_EMAIL_TXT, PVS.PTNT_NOTIFIED_FLG,PVS.SCHED_DTE,PVS.VISIT_TYPE_NUM,PVS.DURTN_TYPE_NUM,PVS.ROUTE_TYPE_NUM, C.FIRST_NAM AS CARE_FIRST_NAM,C.MIDDLE_INIT_NAM AS CARE_MIDDLE_NAM,C.LAST_NAM AS CARE_LAST_NAM, C.PHONE_NUM_1,ORGNIZATN.EMAIL_HOST_SERVER_TXT,ORGNIZATN.PORT_NUM, ORGNIZATN.MAIL_SENDER_USER_ID,ORGNIZATN.MAIL_SENDER_PSWD_DESC,ORGNIZATN.SOCKET_FCTRY_PORT_DESC, ORGNIZATN.TIMEZONE_ID FROM IHCAS.PTNT_VISIT_SCHED PVS , IHCAS.PTNT P, IHCAS.ROUTE_LEG RL, IHCAS.ROUTE R, IHCAS.CAREGIVER C,IHCAS.ORG ORGNIZATN WHERE ORGNIZATN.ORG_NUM = ? AND PVS.SCHED_STAT_CDE = '2002' AND PVS.PTNT_NOTIFIED_FLG = FALSE AND **PVS.SCHED_DTE >= (CURRENT_DATE + interval '1 day') AT TIME ZONE ? at TIME ZONE 'UTC'** AND **PVS.SCHED_DTE < (CURRENT_DATE + interval '2 day' - interval '1 sec') AT TIME ZONE ? at TIME ZONE 'UTC'** AND PVS.PTNT_ID = P.PTNT_ID AND PVS.PTNT_VISIT_SCHED_NUM = RL.PTNT_VISIT_SCHED_NUM AND RL.ROUTE_NUM = R.ROUTE_NUM AND C.CAREGIVER_NUM=R.ASGN_TO_CAREGIVER_NUM AND PVS.ORG_NUM= ORGNIZATN.ORG_NUM ORDER BY PVS.SCHED_DTE ASC,PVS.ROUTE_TYPE_NUM ASC,PVS.ORG_NUM ASC
Parameters: **[1, EST, EST]**
Header: [PTNT_VISIT_SCHED_NUM, PTNT_NOTIFIED_FLG, **SCHED_DTE**, VISIT_TYPE_NUM, DURTN_TYPE_NUM, ROUTE_TYPE_NUM, PTNT_ID, PTNT_FIRST_NAM, PRIM_EMAIL_TXT, SCNDRY_EMAIL_TXT, CARE_FIRST_NAM, CARE_MIDDLE_NAM, CARE_LAST_NAM, PHONE_NUM_1, EMAIL_HOST_SERVER_TXT, PORT_NUM, MAIL_SENDER_USER_ID, MAIL_SENDER_PSWD_DESC, SOCKET_FCTRY_PORT_DESC, TIMEZONE_ID]
Result: [1129, false, **2013-06-07 23:30:00.0**, 1002, 1551, 1702, PID146, Ricky, Receiver.test#******.com, , Vikas, S, Jain, 956-**-5**8, 172.17.*.***, 25, Sender.test#******.com, ********123, 465, EST]
To me it seems to be some timezone issue, but what is it and how do I resolve it?
N.B : Local Time zone : IST
Application Time zone : EST
In database, datatype of column is timestamp without timezone.
Any suggestions?
It may be best to set the database timezone to UTC and implement a UTC-preserving MyBatis type handler, as shown in the accepted answer to this question:
What's the right way to handle UTC date-times using Java, iBatis, and Oracle?
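For what it's worth, here is a minimal sketch of such a UTC-preserving handler using the MyBatis 3 BaseTypeHandler API. The class name is made up, and you would still have to register it for your TIMESTAMP columns in the MyBatis configuration:
import java.sql.CallableStatement;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Calendar;
import java.util.TimeZone;

import org.apache.ibatis.type.BaseTypeHandler;
import org.apache.ibatis.type.JdbcType;

// Illustrative type handler: reads and writes timestamps through a UTC Calendar
// so the JVM default timezone never enters the conversion.
public class UtcTimestampTypeHandler extends BaseTypeHandler<Timestamp> {

    private static Calendar utc() {
        return Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    }

    @Override
    public void setNonNullParameter(PreparedStatement ps, int i, Timestamp parameter, JdbcType jdbcType)
            throws SQLException {
        ps.setTimestamp(i, parameter, utc());
    }

    @Override
    public Timestamp getNullableResult(ResultSet rs, String columnName) throws SQLException {
        return rs.getTimestamp(columnName, utc());
    }

    @Override
    public Timestamp getNullableResult(ResultSet rs, int columnIndex) throws SQLException {
        return rs.getTimestamp(columnIndex, utc());
    }

    @Override
    public Timestamp getNullableResult(CallableStatement cs, int columnIndex) throws SQLException {
        return cs.getTimestamp(columnIndex, utc());
    }
}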
