TimeZone issue with ibatis ORM and postgres

TimeZone issue with ibatis ORM and postgres - java

Please suggest, I am retrieving data using ibatis ORM from postgres database.
When I fire the below query using ibatis the timestamp "PVS.SCHED_DTE" get reduced to "2013-06-07 23:30:00.0" where as in db timestamp is "2013-06-08 05:00:00.0".
Though when the same query is executed using postgresAdminII, its gives proper PVS.SCHED_DTE.
SELECT PVS.PTNT_VISIT_SCHED_NUM, PVS.PTNT_ID,P.FIRST_NAM AS PTNT_FIRST_NAM,P.PRIM_EMAIL_TXT,P.SCNDRY_EMAIL_TXT, PVS.PTNT_NOTIFIED_FLG,PVS.SCHED_DTE,PVS.VISIT_TYPE_NUM,PVS.DURTN_TYPE_NUM,PVS.ROUTE_TYPE_NUM, C.FIRST_NAM AS CARE_FIRST_NAM,C.MIDDLE_INIT_NAM AS CARE_MIDDLE_NAM,C.LAST_NAM AS CARE_LAST_NAM, C.PHONE_NUM_1,ORGNIZATN.EMAIL_HOST_SERVER_TXT,ORGNIZATN.PORT_NUM, ORGNIZATN.MAIL_SENDER_USER_ID,ORGNIZATN.MAIL_SENDER_PSWD_DESC,ORGNIZATN.SOCKET_FCTRY_PORT_DESC, ORGNIZATN.TIMEZONE_ID FROM IHCAS.PTNT_VISIT_SCHED PVS , IHCAS.PTNT P, IHCAS.ROUTE_LEG RL, IHCAS.ROUTE R, IHCAS.CAREGIVER C,IHCAS.ORG ORGNIZATN WHERE ORGNIZATN.ORG_NUM = ? AND PVS.SCHED_STAT_CDE = '2002' AND PVS.PTNT_NOTIFIED_FLG = FALSE AND **PVS.SCHED_DTE >= (CURRENT_DATE + interval '1 day') AT TIME ZONE ? at TIME ZONE 'UTC'** AND **PVS.SCHED_DTE < (CURRENT_DATE + interval '2 day' - interval '1 sec') AT TIME ZONE ? at TIME ZONE 'UTC'** AND PVS.PTNT_ID = P.PTNT_ID AND PVS.PTNT_VISIT_SCHED_NUM = RL.PTNT_VISIT_SCHED_NUM AND RL.ROUTE_NUM = R.ROUTE_NUM AND C.CAREGIVER_NUM=R.ASGN_TO_CAREGIVER_NUM AND PVS.ORG_NUM= ORGNIZATN.ORG_NUM ORDER BY PVS.SCHED_DTE ASC,PVS.ROUTE_TYPE_NUM ASC,PVS.ORG_NUM ASC
Parameters: **[1, EST, EST]**
Header: [PTNT_VISIT_SCHED_NUM, PTNT_NOTIFIED_FLG, **SCHED_DTE**, VISIT_TYPE_NUM, DURTN_TYPE_NUM, ROUTE_TYPE_NUM, PTNT_ID, PTNT_FIRST_NAM, PRIM_EMAIL_TXT, SCNDRY_EMAIL_TXT, CARE_FIRST_NAM, CARE_MIDDLE_NAM, CARE_LAST_NAM, PHONE_NUM_1, EMAIL_HOST_SERVER_TXT, PORT_NUM, MAIL_SENDER_USER_ID, MAIL_SENDER_PSWD_DESC, SOCKET_FCTRY_PORT_DESC, TIMEZONE_ID]
Result: [1129, false, **2013-06-07 23:30:00.0**, 1002, 1551, 1702, PID146, Ricky, Receiver.test#******.com, , Vikas, S, Jain, 956-**-5**8, 172.17.*.***, 25, Sender.test#******.com, ********123, 465, EST]
For me.. it seems to be some timezone issue but what and how to resolve.
N.B : Local Time zone : IST
Application Time zone : EST
In database, datatype of column is timestamp without timezone.
Any suggestion.. ??

It may be best to set the database timezone to UTC and implement an UTC-preserving MyBatis type handler as shown in the accepted answer to this:
What's the right way to handle UTC date-times using Java, iBatis, and Oracle?.

Related

Find max trips per day from DataFrame in Java - Spark

Environment: Java 1.8, VM Cloudera Quickstart.
I have data into Hadoop hdfs from a csv file. Each row represents a bus route.
id vendor start_datetime end_datetime trip_duration_in_sec
17534 A 1/1/2013 12:00 1/1/2013 12:14 840
68346 A 1/1/2013 12:13 1/1/2013 12:18 300
09967 B 1/1/2013 12:34 1/1/2013 12:39 300
09967 B 1/1/2013 12:44 1/1/2013 12:51 420
09967 A 1/1/2013 12:54 1/1/2013 12:56 120
.........
.........
So, i want for every day, to find the hour that each vendor (A and B) has the most bus routes. With java and spark.
A result could be:
1/1/2013 (Day 1) - Vendor A has 3 bus routes at 12:00-13:00 hour. (That time 12:00-13:00, vendor A had the most bus routes..)
1/1/2013 (Day 1) - Vendor B has 2 bus routes at 12:00-13:00 hour. (That time 12:00-13:00, vendor B had the most bus routes..)
....
Mu java code is:
import static org.apache.spark.sql.functions;
import static org.apache.spark.sql.Row;
Dataset<Row> ds;
ds.groupBy(functions.window(col("start_datetime"), "1 hour").count().show();
But i cant find in which hour are the max routes per day.

I'm not so familiar in Java so I tried to explain it in Scala.
The key to find out the hour of max routes per day per vendor, is to count by (vendor, day, hour), then aggregate by (vendor, day) to calculate the hour corresponding to maximum cnt of each group. The day and the hour of each record could be parsed by start_datetime.
val df = spark.createDataset(Seq(
("17534","A","1/1/2013 12:00","1/1/2013 12:14",840),
("68346","A","1/1/2013 12:13","1/1/2013 12:18",300),
("09967","B","1/1/2013 12:34","1/1/2013 12:39",300),
("09967","B","1/1/2013 12:44","1/1/2013 12:51",420),
("09967","A","1/1/2013 12:54","1/1/2013 12:56",120)
)).toDF("id","vendor","start_datetime","end_datetime","trip_duration_in_sec")
df.rdd.map(t => {
val vendor = t(1)
val day = t(2).toString.split(" ")(0)
val hour = t(2).toString.split(" ")(1).split(":")(0)
((vendor, day, hour), 1)
})
// count by key
.aggregateByKey(0)((x: Int, y: Int) =>x+y, (x: Int, y: Int) =>x+y)
.map(t => {
val ((vendor, day, hour), cnt) = t;
((vendor, day), (hour, cnt))
})
// solve the max cnt by key (vendor, day)
.foldByKey(("", 0))((z: (String, Int), i: (String, Int)) => if (i._2 > z._2) i else z)
.foreach(t => println(s"${t._1._2} - Vendor ${t._1._1} has ${t._2._2} bus routes from ${t._2._1}:00 hour."))

Dynamically creating a regex from a DateFormat

I need to detect some stuff within a String that contains, among other things, dates. Now, parsing dates using regex is a known question on SO.
However, the dates in this text are localized. And the app needs to be able to adapt to differently localized dates. Luckily, I can figure out the correct date format for the current locale using DateFormat.getDateInstance(SHORT, locale). I can get a date pattern from that. But how do I turn it into a regex, dynamically?
The regex would not need to do in-depth validation of the format (leap years, correct amount of days for a month etc.), I can already be sure that the data is provided in a valid format. The date just needs to be identified (as in, the regex should be able to detect the start and end index of where a date is).
The answers in the linked question all assume the handful of common date formats. But assuming that in this case is a likely cause of getting an edge case that breaks the app in a very non-obvious way. Which is why I'd prefer a dynamically generated regex over a one-fits-all(?) solution.
I can't use DateFormat.parse(...), since I have to actually detect the date first, and can't directly extract it.

Since you're doing getDateInstance(SHORT, locale), with emphasis on Date and SHORT, the patterns are fairly limited, so the following code will do:
public static String dateFormatToRegex(Locale locale) {
StringBuilder regex = new StringBuilder();
String fmt = ((SimpleDateFormat) DateFormat.getDateInstance(DateFormat.SHORT, locale)).toPattern();
for (Matcher m = Pattern.compile("[^a-zA-Z]+|([a-zA-Z])\\1*").matcher(fmt); m.find(); ) {
String part = m.group();
if (m.start(1) == -1) { // Not letter(s): Literal text
regex.append(Pattern.quote(part));
} else {
switch (part.charAt(0)) {
case 'G': // Era designator
regex.append("\\p{L}+");
break;
case 'y': // Year
regex.append("\\d{1,4}");
break;
case 'M': // Month in year
if (part.length() > 2)
throw new UnsupportedOperationException("Date format part: " + part);
regex.append("(?:1[0-2]|0?[1-9])");
break;
case 'd': // Day in month
regex.append("(?:3[01]|[12][0-9]|0?[1-9])");
break;
default:
throw new UnsupportedOperationException("Date format part: " + part);
}
}
}
return regex.toString();
}
To see what regex's you'll get for various locales:
Locale[] locales = Locale.getAvailableLocales();
Arrays.sort(locales, Comparator.comparing(Locale::toLanguageTag));
Map<String, List<String>> fmtLocales = new TreeMap<>();
for (Locale locale : locales) {
String fmt = ((SimpleDateFormat) DateFormat.getDateInstance(DateFormat.SHORT, locale)).toPattern();
fmtLocales.computeIfAbsent(fmt, k -> new ArrayList<>()).add(locale.toLanguageTag());
}
fmtLocales.forEach((k, v) -> System.out.println(dateFormatToRegex(Locale.forLanguageTag(v.get(0))) + " " + v));
Output
\p{L}+\d{1,4}\Q.\E(?:0[1-9]|1[0-2])\Q.\E(?:0[1-9]|[12][0-9]|3[01]) [ja-JP-u-ca-japanese-x-lvariant-JP]
(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01])\Q/\E\d{1,4} [brx, brx-IN, chr, chr-US, ee, ee-GH, ee-TG, en, en-AS, en-BI, en-GU, en-MH, en-MP, en-PR, en-UM, en-US, en-US-POSIX, en-VI, fil, fil-PH, ks, ks-IN, ug, ug-CN, zu, zu-ZA]
(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01])\Q/\E\d{1,4} [es-PA, es-PR]
(?:0[1-9]|[12][0-9]|3[01])\Q-\E(?:0[1-9]|1[0-2])\Q-\E\d{1,4} [or, or-IN]
(?:0[1-9]|[12][0-9]|3[01])\Q. \E(?:0[1-9]|1[0-2])\Q. \E\d{1,4} [ksh, ksh-DE]
(?:0[1-9]|[12][0-9]|3[01])\Q. \E(?:0[1-9]|1[0-2])\Q. \E\d{1,4} [sl, sl-SI]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [fi, fi-FI, he, he-IL, is, is-IS]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [be, be-BY, dsb, dsb-DE, hsb, hsb-DE, sk, sk-SK, sq, sq-AL, sq-MK, sq-XK]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q.\E [bs-Cyrl, bs-Cyrl-BA, sr, sr-CS, sr-Cyrl, sr-Cyrl-BA, sr-Cyrl-ME, sr-Cyrl-RS, sr-Cyrl-XK, sr-Latn, sr-Latn-BA, sr-Latn-ME, sr-Latn-RS, sr-Latn-XK, sr-ME, sr-RS]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [tr, tr-CY, tr-TR]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q 'г'.\E [bg, bg-BG]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [agq, agq-CM, bas, bas-CM, bm, bm-ML, dje, dje-NE, dua, dua-CM, dyo, dyo-SN, en-HK, en-ZW, ewo, ewo-CM, ff, ff-CM, ff-GN, ff-MR, ff-SN, kab, kab-DZ, kea, kea-CV, khq, khq-ML, ksf, ksf-CM, ln, ln-AO, ln-CD, ln-CF, ln-CG, lo, lo-LA, lu, lu-CD, mfe, mfe-MU, mg, mg-MG, mua, mua-CM, nmg, nmg-CM, rn, rn-BI, seh, seh-MZ, ses, ses-ML, sg, sg-CF, shi, shi-Latn, shi-Latn-MA, shi-MA, shi-Tfng, shi-Tfng-MA, sw-CD, twq, twq-NE, yav, yav-CM, zgh, zgh-MA, zh-HK, zh-Hant-HK, zh-Hant-MO]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [ast, ast-ES, bn, bn-BD, bn-IN, ca, ca-AD, ca-ES, ca-ES-VALENCIA, ca-FR, ca-IT, el, el-CY, el-GR, en-AU, en-SG, es, es-419, es-AR, es-BO, es-BR, es-CR, es-CU, es-DO, es-EA, es-EC, es-ES, es-GQ, es-HN, es-IC, es-NI, es-PH, es-PY, es-SV, es-US, es-UY, es-VE, gu, gu-IN, ha, ha-GH, ha-NE, ha-NG, haw, haw-US, hi, hi-IN, km, km-KH, kn, kn-IN, ml, ml-IN, mr, mr-IN, pa, pa-Guru, pa-Guru-IN, pa-IN, pa-PK, ta, ta-IN, ta-LK, ta-MY, ta-SG, th, th-TH, to, to-TO, ur, ur-IN, ur-PK, zh-Hans-HK, zh-Hans-MO]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [th-TH-u-nu-thai-x-lvariant-TH]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [nus, nus-SS]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [en-NZ, es-CO, es-GT, es-PE, fr-BE, ms, ms-BN, ms-MY, ms-SG, nl-BE]
(?:0[1-9]|[12][0-9]|3[01])\Q-\E(?:0[1-9]|1[0-2])\Q-\E\d{1,4} [sv-FI]
(?:0[1-9]|[12][0-9]|3[01])\Q-\E(?:0[1-9]|1[0-2])\Q-\E\d{1,4} [es-CL, fy, fy-NL, my, my-MM, nl, nl-AW, nl-BQ, nl-CW, nl-NL, nl-SR, nl-SX, rm, rm-CH, te, te-IN]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [mk, mk-MK]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [nb, nb-NO, nb-SJ, nn, nn-NO, nn-NO, no, no-NO, pl, pl-PL, ro, ro-MD, ro-RO, tk, tk-TM]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q.\E [hr, hr-BA, hr-HR]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [az, az-AZ, az-Cyrl, az-Cyrl-AZ, az-Latn, az-Latn-AZ, cs, cs-CZ, de, de-AT, de-BE, de-CH, de-DE, de-LI, de-LU, et, et-EE, fo, fo-DK, fo-FO, fr-CH, gsw, gsw-CH, gsw-FR, gsw-LI, hy, hy-AM, it-CH, ka, ka-GE, kk, kk-KZ, ky, ky-KG, lb, lb-LU, lv, lv-LV, os, os-GE, os-RU, ru, ru-BY, ru-KG, ru-KZ, ru-MD, ru-RU, ru-UA, uk, uk-UA]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q.\E [bs, bs-BA, bs-Latn, bs-Latn-BA]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q \E\d{1,4} [kkj, kkj-CM]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [am, am-ET, asa, asa-TZ, bem, bem-ZM, bez, bez-TZ, cgg, cgg-UG, da, da-DK, da-GL, dav, dav-KE, ebu, ebu-KE, en-001, en-150, en-AG, en-AI, en-AT, en-BB, en-BM, en-BS, en-CC, en-CH, en-CK, en-CM, en-CX, en-CY, en-DE, en-DG, en-DK, en-DM, en-ER, en-FI, en-FJ, en-FK, en-FM, en-GB, en-GD, en-GG, en-GH, en-GI, en-GM, en-GY, en-IE, en-IL, en-IM, en-IO, en-JE, en-JM, en-KE, en-KI, en-KN, en-KY, en-LC, en-LR, en-LS, en-MG, en-MO, en-MS, en-MT, en-MU, en-MW, en-MY, en-NA, en-NF, en-NG, en-NL, en-NR, en-NU, en-PG, en-PH, en-PK, en-PN, en-PW, en-RW, en-SB, en-SC, en-SD, en-SH, en-SI, en-SL, en-SS, en-SX, en-SZ, en-TC, en-TK, en-TO, en-TT, en-TV, en-TZ, en-UG, en-VC, en-VG, en-VU, en-WS, en-ZM, fr, fr-BF, fr-BI, fr-BJ, fr-BL, fr-CD, fr-CF, fr-CG, fr-CI, fr-CM, fr-DJ, fr-DZ, fr-FR, fr-GA, fr-GF, fr-GN, fr-GP, fr-GQ, fr-HT, fr-KM, fr-LU, fr-MA, fr-MC, fr-MF, fr-MG, fr-ML, fr-MQ, fr-MR, fr-MU, fr-NC, fr-NE, fr-PF, fr-PM, fr-RE, fr-RW, fr-SC, fr-SN, fr-SY, fr-TD, fr-TG, fr-TN, fr-VU, fr-WF, fr-YT, ga, ga-IE, gd, gd-GB, guz, guz-KE, ig, ig-NG, jmc, jmc-TZ, kam, kam-KE, kde, kde-TZ, ki, ki-KE, kln, kln-KE, ksb, ksb-TZ, lag, lag-TZ, lg, lg-UG, luo, luo-KE, luy, luy-KE, mas, mas-KE, mas-TZ, mer, mer-KE, mgh, mgh-MZ, mt, mt-MT, naq, naq-NA, nd, nd-ZW, nyn, nyn-UG, pa-Arab, pa-Arab-PK, qu, qu-BO, qu-EC, qu-PE, rof, rof-TZ, rwk, rwk-TZ, saq, saq-KE, sbp, sbp-TZ, sn, sn-ZW, sw, sw-KE, sw-TZ, sw-UG, teo, teo-KE, teo-UG, tzm, tzm-MA, vai, vai-LR, vai-Latn, vai-Latn-LR, vai-Vaii, vai-Vaii-LR, vi, vi-VN, vun, vun-TZ, xog, xog-UG, yo, yo-BJ, yo-NG]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [cy, cy-GB, en-BE, en-BW, en-BZ, en-IN, es-MX, fur, fur-IT, gl, gl-ES, id, id-ID, it, it-IT, it-SM, nnh, nnh-CM, om, om-ET, om-KE, pt, pt-AO, pt-BR, pt-CH, pt-CV, pt-GQ, pt-GW, pt-LU, pt-MO, pt-MZ, pt-PT, pt-ST, pt-TL, so, so-DJ, so-ET, so-KE, so-SO, ti, ti-ER, ti-ET, uz, uz-AF, uz-Cyrl, uz-Cyrl-UZ, uz-Latn, uz-Latn-UZ, uz-UZ, yi, yi-001, zh-Hans-SG, zh-SG]
(?:0[1-9]|[12][0-9]|3[01])\Q‏/\E(?:0[1-9]|1[0-2])\Q‏/\E\d{1,4} [ar, ar-001, ar-AE, ar-BH, ar-DJ, ar-DZ, ar-EG, ar-EH, ar-ER, ar-IL, ar-IQ, ar-JO, ar-KM, ar-KW, ar-LB, ar-LY, ar-MA, ar-MR, ar-OM, ar-PS, ar-QA, ar-SA, ar-SD, ar-SO, ar-SS, ar-SY, ar-TD, ar-TN, ar-YE]
\d{1,4}\Q-\E(?:0[1-9]|1[0-2])\Q-\E(?:0[1-9]|[12][0-9]|3[01]) [af, af-NA, af-ZA, as, as-IN, bo, bo-CN, bo-IN, br, br-FR, ce, ce-RU, ckb, ckb-IQ, ckb-IR, cu, cu-RU, dz, dz-BT, en-CA, en-SE, gv, gv-IM, ii, ii-CN, jgo, jgo-CM, kl, kl-GL, kok, kok-IN, kw, kw-GB, lkt, lkt-US, lrc, lrc-IQ, lrc-IR, lt, lt-LT, mgo, mgo-CM, mn, mn-MN, mzn, mzn-IR, ne, ne-IN, ne-NP, prg, prg-001, se, se-FI, se-NO, se-SE, si, si-LK, smn, smn-FI, sv, sv-AX, sv-SE, und, uz-Arab, uz-Arab-AF, vo, vo-001, wae, wae-CH]
\d{1,4}\Q. \E(?:0[1-9]|1[0-2])\Q. \E(?:0[1-9]|[12][0-9]|3[01])\Q.\E [hu, hu-HU]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [fa, fa-AF, fa-IR, ps, ps-AF, yue, yue-HK, zh, zh-CN, zh-Hans, zh-Hans-CN, zh-Hant, zh-Hant-TW, zh-TW]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [en-ZA, eu, eu-ES, ja, ja-JP]
\d{1,4}\Q-\E(?:0[1-9]|1[0-2])\Q-\E(?:0[1-9]|[12][0-9]|3[01]) [eo, eo-001, fr-CA, sr-BA]
\d{1,4}\Q. \E(?:0[1-9]|1[0-2])\Q. \E(?:0[1-9]|[12][0-9]|3[01])\Q.\E [ko, ko-KP, ko-KR]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [sah, sah-RU]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [ak, ak-GH, rw, rw-RW]

What you're asking is really complicated, but it's not impossible — just likely many hundreds of lines of code before you're done. I'm really not sure that this is the route you want to go — honestly, if you already know what format the date is in, you should probably just parse() it — but let's say for the sake of argument that you really do want to turn a date pattern like YYYY-mm-dd HH:mm:ss into a regular expression that can match dates in that format.
There are several steps in the solution: You'll need to lexically analyze the pattern; transform the tokens into correct regex pieces in the current locale; and then mash them all together to make a regex you can use. (Thankfully, you don't need to perform complex parsing on the date-pattern string; lexical analysis is good enough for this.)
Lexical analysis or tokenization is the act of breaking the input string into its component tokens, so that instead of an array of characters, it becomes a sequence of enumerated values or objects: So for the previous example, you'd end up with an array or list like this: [YYYY, Hyphen, mm, Hyphen, dd, Space, HH, Colon, mm, Colon, ss]. This kind of tokenization is often done with a big state machine, and you may be able to find some open-source code somewhere (part of the Android source code, maybe?) that already does it. If not, you'll have to read each letter, count up how many of that letter there is, and choose an appropriate enum value to add to the growing list of tokens.
Once you have the tokenized sequence of elements, the next step is to transform each into a chunk of a regular expression that is valid for the current localization. This is probably a giant switch statement inside a loop over the tokens, and thus would turn a YYYY enum value into the string piece "[0-9]{4}", or the mmm enum value into a big chunk of regex string that matches all of the month names in the current locale ("jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"). This obviously involves you pulling all the data for the given locale, so that you can make regex chunks out of its words.
Finally, you can concatenate all of the regex bits together, wrapping each bit in parentheses to ensure precedence is correct, and then finally Pattern.compile() the whole string. Don't forget to make it use a case-insensitive test.
If you don't know what locale you're in, you'll have to do this many times to produce many regexes for each possible locale, and then test the input against each one of them in turn.
This is a project-and-a-half, but it is something that could be built, if you really really really need it to work exactly like you described.
But again, if I were you, I'd stick with something that already exists: If you already know what locale you're in (or even if you don't), the parse() method already not only does the lexical analysis and input-validation for you — and is not only already written! — but it also produces a usable date object, too!

I still think that parsing from each position in the string and seeing if it succeeds is simpler and easier than first generating a regular expression.
Locale loc = Locale.forLanguageTag("en-AS");
DateTimeFormatter dateFormatter
= DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(loc);
String mixed = "09/03/18Some data06/29/18Some other data04/27/18A third piece of data";
// Check that the string starts with a date
ParsePosition pos = new ParsePosition(0);
LocalDate.from(dateFormatter.parse(mixed, pos));
int dataStartIndex = pos.getIndex();
System.out.println("Date: " + mixed.substring(0, dataStartIndex));
int candidateDateStartIndex = dataStartIndex;
while (candidateDateStartIndex < mixed.length()) {
try {
pos.setIndex(candidateDateStartIndex);
LocalDate.from(dateFormatter.parse(mixed, pos));
// Date found
System.out.println("Data: "
+ mixed.substring(dataStartIndex, candidateDateStartIndex));
dataStartIndex = pos.getIndex();
System.out.println("Date: "
+ mixed.substring(candidateDateStartIndex, dataStartIndex));
candidateDateStartIndex = dataStartIndex;
} catch (DateTimeException dte) {
// No date here; try next
candidateDateStartIndex++;
pos.setErrorIndex(-1); // Clear error
}
}
System.out.println("Data: " + mixed.substring(dataStartIndex, mixed.length()));
The output from this snippet was:
Date: 09/03/18
Data: Some data
Date: 06/29/18
Data: Some other data
Date: 04/27/18
Data: A third piece of data
If you’re happy with the accepted answer, please don’t let me take that away from you. Only please allow me to demonstrate the alternative to anyone reading along.
Exactly because I am presenting this for a broader audience, I am using java.time, the modern Java date and time API. If your data was originally written with a DateFormat, you may want to substitute that class into the above code. I trust you to do that in that case.

Time series data aggregation per similar snapshots

I have a cassandra table defined like below:
create table if not exists test(
id int,
readDate timestamp,
totalreadings text,
readings text,
PRIMARY KEY(meter_id, date)
) WITH CLUSTERING ORDER BY(date desc);
The reading contains the map of all snapshots of data collected at regular intervals (30 minutes) along with aggregated data for full day.
The data would like below :
id=8, readDate=Tue Dec 20 2016, totalreadings=220.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 21 2016, totalreadings=221.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 22 2016, totalreadings=219.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 23 2016, totalreadings=224.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
The java pojo classes look like below:
public class Test{
private int id;
private Date readDate;
private String totalreadings;
private Map<Integer, Double> readings;
//setters
//getters
}
I am trying to find last 4 days aggregated average of all reading per snapshot. So logically, i have 4 list for last 4 days Test object and each of them has a map containing reading across the intervals.
Is there a simple way to find aggregate of a similar snapshot entries across 4 days . For example , i want to aggregate specific data snapshots (1,2,3,4,5,6,etc) only not the total aggregate.

After changing you table-structure a little bit the problem can be solved completely in Cassandra. - Mainly I have put your readings into a map.
create table test(
id int,
readDate timestamp,
totalreadings float,
readings map<int,float>,
PRIMARY KEY(id, readDate)
) WITH CLUSTERING ORDER BY(readDate desc);
Now I entered a bit of your data using CQL:
insert into test (id,readDate,totalReadings, readings ) values (8 '2016-12-20', 220.0, {0:9.0, 1:0.0, 2:9.0, 3:5.0, 4:2.0, 5:7.0, 6:1.0, 7:3.0, 8:9.0, 9:2.0, 10:5.0, 11:1.0, 12:1.0, 13:2.0, 14:4.0, 15:4.0, 16:7.0, 17:7.0, 18:5.0, 19:4.0, 20:9.0, 21:6.0, 22:8.0, 23:4.0, 24:6.0, 25:3.0, 26:5.0, 27:7.0, 28:2.0, 29:0.0, 30:8.0, 31:9.0, 32:1.0, 33:8.0, 34:9.0, 35:2.0, 36:4.0, 37:5.0, 38:4.0, 39:7.0, 40:3.0, 41:2.0, 42:1.0, 43:2.0, 44:4.0, 45:5.0, 46:3.0, 47:1.0});
insert into test (id,readDate,totalReadings, readings ) values (8, '2016-12-21', 221.0,{0:9.0, 1:0.0, 2:9.0, 3:5.0, 4:2.0, 5:7.0, 6:1.0, 7:3.0, 8:9.0, 9:2.0, 10:5.0, 11:1.0, 12:1.0, 13:2.0, 14:4.0, 15:4.0, 16:7.0, 17:7.0, 18:5.0, 19:4.0, 20:9.0, 21:6.0, 22:8.0, 23:4.0, 24:6.0, 25:3.0, 26:5.0, 27:7.0, 28:2.0, 29:0.0, 30:8.0, 31:9.0, 32:1.0, 33:8.0, 34:9.0, 35:2.0, 36:4.0, 37:5.0, 38:4.0, 39:7.0, 40:3.0, 41:2.0, 42:1.0, 43:2.0, 44:4.0, 45:5.0, 46:3.0, 47:1.0});
To extract single values out of the map I created a User defined function (UDF). This UDF picks the right value aut of your map containing the readings. See Cassandra docs on UDF for more on UDFs. Note that UDFs are disabled in cassandra by default so you need to modify cassandra.yaml to include enable_user_defined_functions: true
create function map_item(readings map<int,float>, idx int) called on null input returns float language java as ' return readings.get(idx);';
After creating the function you can calculate your average as
select avg(map_item(readings, 7)) from test where readDate > '2016-12-20' allow filtering;
which gives me:
system.avg(betterconnect.map_item(readings, 7))
-------------------------------------------------
3
You may want to supply the date fort your where-clause and the index (7 in my example) as parameters from your application.

What are JDBC-mysql Driver settings for sane handling of DATETIME and TIMESTAMP in UTC?

Having been burned by mysql timezone and Daylight Savings "hour from hell" issues in the past, I decided my next application would store everything in UTC timezone and only interact with the database using UTC times (not even the closely-related GMT).
I soon ran into some mysterious bugs. After pulling my hair out for a while, I came up with this test code:
try(Connection conn = dao.getDataSource().getConnection();
Statement stmt = conn.createStatement()) {
Instant now = Instant.now();
stmt.execute("set time_zone = '+00:00'");
stmt.execute("create temporary table some_times("
+ " dt datetime,"
+ " ts timestamp,"
+ " dt_string datetime,"
+ " ts_string timestamp,"
+ " dt_epoch datetime,"
+ " ts_epoch timestamp,"
+ " dt_auto datetime default current_timestamp(),"
+ " ts_auto timestamp default current_timestamp(),"
+ " dtc char(19) generated always as (cast(dt as character)),"
+ " tsc char(19) generated always as (cast(ts as character)),"
+ " dt_autoc char(19) generated always as (cast(dt_auto as character)),"
+ " ts_autoc char(19) generated always as (cast(ts_auto as character))"
+ ")");
PreparedStatement ps = conn.prepareStatement("insert into some_times "
+ "(dt, ts, dt_string, ts_string, dt_epoch, ts_epoch) values (?,?,?,?,from_unixtime(?),from_unixtime(?))");
DateTimeFormatter dbFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of("UTC"));
ps.setTimestamp(1, new Timestamp(now.toEpochMilli()));
ps.setTimestamp(2, new Timestamp(now.toEpochMilli()));
ps.setString(3, dbFormat.format(now));
ps.setString(4, dbFormat.format(now));
ps.setLong(5, now.getEpochSecond());
ps.setLong(6, now.getEpochSecond());
ps.executeUpdate();
ResultSet rs = stmt.executeQuery("select * from some_times");
ResultSetMetaData md = rs.getMetaData();
while(rs.next()) {
for(int c=1; c <= md.getColumnCount(); ++c) {
Instant inst1 = Instant.ofEpochMilli(rs.getTimestamp(c).getTime());
Instant inst2 = Instant.from(dbFormat.parse(rs.getString(c).replaceAll("\\.0$", "")));
System.out.println(inst1.getEpochSecond() - now.getEpochSecond());
System.out.println(inst2.getEpochSecond() - now.getEpochSecond());
}
}
}
Note how the session timezone is set to UTC, and everything in the Java code is very timezone-aware and forced to UTC. The only thing in this entire environment which is not UTC is the JVM's default timezone.
I expected the output to be a bunch of 0s but instead I get this
0
-28800
0
-28800
28800
0
28800
0
28800
0
28800
0
28800
0
28800
0
0
-28800
0
-28800
28800
0
28800
0
Each line of output is just subtracting the time stored from the time retrieved. The result in each row should be 0.
It seems the JDBC driver is performing inappropriate timezone conversions. For an application which interacts fully in UTC although it runs on a VM that's not in UTC, is there any way to completely disable the TZ conversions?
i.e. Can this test be made to output all-zero rows?
UPDATE
Using useLegacyDatetimeCode=false (cacheDefaultTimezone=false makes no difference) changes the output but still not a fix:
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
-28800
0
0
0
0
0
0
0
0
UPDATE2
Checking the console (after changing the test to create a permanent table), I see all the values are STORED correctly:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 27148
Server version: 5.7.12-log MySQL Community Server (GPL)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> set time_zone = '-00:00';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM some_times \G
*************************** 1. row ***************************
dt: 2016-11-18 15:39:51
ts: 2016-11-18 15:39:51
dt_string: 2016-11-18 15:39:51
ts_string: 2016-11-18 15:39:51
dt_epoch: 2016-11-18 15:39:51
ts_epoch: 2016-11-18 15:39:51
dt_auto: 2016-11-18 15:39:51
ts_auto: 2016-11-18 15:39:51
dtc: 2016-11-18 15:39:51
tsc: 2016-11-18 15:39:51
dt_autoc: 2016-11-18 15:39:51
ts_autoc: 2016-11-18 15:39:51
1 row in set (0.00 sec)
mysql>

The solution is to set JDBC connection parameter noDatetimeStringSync=true with useLegacyDatetimeCode=false. As a bonus I also found sessionVariables=time_zone='-00:00' alleviates the need to set time_zone explicitly on every new connection.
There is some "intelligent" timezone conversion code that gets activated deep inside the ResultSet.getString() method when it detects that the column is a TIMESTAMP column.
Alas, this intelligent code has a bug: TimeUtil.fastTimestampCreate(TimeZone tz, int year, int month, int day, int hour, int minute, int seconds, int secondsPart) returns a Timestamp wrongly tagged to the JVM's default timezone, even when the tz parameter is set to something else:
final static Timestamp fastTimestampCreate(TimeZone tz, int year, int month, int day, int hour, int minute, int seconds, int secondsPart) {
Calendar cal = (tz == null) ? new GregorianCalendar() : new GregorianCalendar(tz);
cal.clear();
// why-oh-why is this different than java.util.date, in the year part, but it still keeps the silly '0' for the start month????
cal.set(year, month - 1, day, hour, minute, seconds);
long tsAsMillis = cal.getTimeInMillis();
Timestamp ts = new Timestamp(tsAsMillis);
ts.setNanos(secondsPart);
return ts;
}
The return ts would be perfectly valid except when further up in the call chain, it is converted back to a string using the bare toString() method, which renders the ts as a String representing what a clock would display in the JVM-default timezone, instead of a String representation of the time in UTC. In ResultSetImpl.getStringInternal(int columnIndex, boolean checkDateTypes):
case Types.TIMESTAMP:
Timestamp ts = getTimestampFromString(columnIndex, null, stringVal, this.getDefaultTimeZone(), false);
if (ts == null) {
this.wasNullFlag = true;
return null;
}
this.wasNullFlag = false;
return ts.toString();
Setting noDatetimeStringSync=true disables the entire parse/unparse mess and just returns the string value as-is received from the database.
Test output:
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
The useLegacyDatetimeCode=false is still important because it changes the behaviour of getDefaultTimeZone() to use the database server's TZ.
While chasing this down I also found the documentation for useJDBCCompliantTimezoneShift is incorrect, although it makes no difference: documentation says [This is part of the legacy date-time code, thus the property has an effect only when "useLegacyDatetimeCode=true."], but that's wrong, see ResultSetImpl.getNativeTimestampViaParseConversion(int, Calendar, TimeZone, boolean).

java PosixFileAttributes return wrong atime and mtime

My code is like
String path = "/home/user/tmp/file1";
Path p = FileSystems.getDefault().getPath(path);
PosixFileAttributes attrs = Files.readAttributes(p, PosixFileAttributes.class);
System.out.println("Last Modified Time: "+attrs.lastModifiedTime());
System.out.println("Last Access Time: "+attrs.lastAccessTime());
The time returned by lastModifiedTime() and lastAccessTime() are 4 hours difference with the correct one.
The output is
Last Modified Time: 2014-06-25T12:50:31Z
Last Access Time: 2014-06-25T18:26:07Z
stat file1 produce:
Access: 2014-06-25 14:26:07.870281008 -0400
Modify: 2014-06-25 08:50:31.922861913 -0400
Change: 2014-06-25 08:50:31.922861913 -0400
Any one can help me?

A time like
2014-06-25T12:50:31Z
is in UTC (that's the Z at the end), so it may be off according to your time zone.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

TimeZone issue with ibatis ORM and postgres - java

It may be best to set the database timezone to UTC and implement an UTC-preserving MyBatis type handler as shown in the accepted answer to this: What's the right way to handle UTC date-times using Java, iBatis, and Oracle?.

Related

Find max trips per day from DataFrame in Java - Spark

Dynamically creating a regex from a DateFormat

Time series data aggregation per similar snapshots

What are JDBC-mysql Driver settings for sane handling of DATETIME and TIMESTAMP in UTC?

java PosixFileAttributes return wrong atime and mtime

Categories

Resources