Create a DateTimeFormatter with an Optional Section at the Beginning - java

I have timecodes with the structure hh:mm:ss.SSS, for which I have my own class implementing the Temporal interface.
It has a custom field, TimecodeHour, that allows hour values greater than 23.
I want to parse these with DateTimeFormatter. The hour value is optional (it can be omitted, and it can be greater than 23); as a regex: (\d*\d\d:)?\d\d:\d\d\.\d\d\d
For the purpose of this question, my custom field can be replaced with the normal HOUR_OF_DAY field.
My current formatter:
DateTimeFormatter UNLIMITED_HOURS = new DateTimeFormatterBuilder()
.appendValue(ChronoField.HOUR_OF_DAY, 2, 2,SignStyle.NEVER)
.appendLiteral(':')
.parseDefaulting(TimecodeHour.HOUR, 0)
.toFormatter(Locale.ENGLISH);
DateTimeFormatter TIMECODE = new DateTimeFormatterBuilder()
.appendOptional(UNLIMITED_HOURS)
.appendValue(MINUTE_OF_HOUR, 2)
.appendLiteral(':')
.appendValue(SECOND_OF_MINUTE, 2)
.appendFraction(MILLI_OF_SECOND, 3, 3, true)
.toFormatter(Locale.ENGLISH);
Timecodes with an hour value parse as expected, but values with the hours omitted throw an exception:
java.time.format.DateTimeParseException: Text '20:33.123' could not be parsed at index 5
I assume that, as hour and minute have the same pattern, the parser starts at the front and captures the minute value for the optional section.
Is this right, and how can I solve this?

I started to suspect that 20:33.123 wasn’t meant to indicate a time of day between 20 and 21 minutes past midnight. Maybe rather an amount of time, a little longer than 20 minutes. If this is correct, use a Duration for it.
Unfortunately java.time does not include means for parsing and formatting a Duration in other than ISO 8601 format. This leaves us with at least three options:
1. Use a third-party library. Time4J offers an elegant solution, see below. Joda-Time has its PeriodFormatter class. Apache may also offer facilities for parsing and formatting of durations.
2. Convert your string to ISO 8601 format before parsing with Duration.parse().
3. Write your own parser.
I was thinking that we’re too lazy for 3. and that Joda-Time is getting dated, so I want to pursue options 1. and 2. here, option 1. in the Time4J variant.
A regex for adapting to ISO 8601
ISO 8601 format for a duration feels unusual at first, but is straightforward. PT20M33.123S means 20 minutes 33.123 seconds.
public static Duration parse(String timeCodeString) {
String iso8601 = timeCodeString
.replaceFirst("^(\\d{2,}):(\\d{2}):(\\d{2}\\.\\d{3})$", "PT$1H$2M$3S")
.replaceFirst("^(\\d{2}):(\\d{2}\\.\\d{3})$", "PT$1M$2S");
return Duration.parse(iso8601);
}
Let’s try it out:
System.out.println(parse("20:33.123"));
System.out.println(parse("123:20:33.123"));
Output is:
PT20M33.123S
PT123H20M33.123S
My two calls to replaceFirst first handle the case with hours, then the case without hours, so either way a string that matches your regex is converted to ISO 8601 format, which the Duration class then parses. As you can see, Duration also prints ISO 8601 format back. Formatting it differently is not hard, though; search for how.
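For printing a Duration back in the timecode layout, one possible sketch is below. It assumes Java 9 or later (for toMinutesPart() and its siblings) and a non-negative duration; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.util.Locale;

class TimecodeFormat {
    // Formats a non-negative Duration as hh:mm:ss.SSS; the hours may exceed 23.
    static String format(Duration d) {
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d",
                d.toHours(), d.toMinutesPart(), d.toSecondsPart(), d.toMillisPart());
    }
}
```

With this, TimecodeFormat.format(Duration.parse("PT123H20M33.123S")) gives 123:20:33.123. Hours are always printed here; leaving them out when they are 0 would need an extra branch.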
Time4J
The Time4J library offers a really elegant solution, very much along the same line of thought as yours. All we really need is this formatter:
private static final Formatter<ClockUnit> TIME_CODE_PARSER
= Duration.formatter(ClockUnit.class, "[###hh:mm:ss.fff][mm:ss.fff]");
Simply use like this:
System.out.println(TIME_CODE_PARSER.parse("20:33.123"));
System.out.println(TIME_CODE_PARSER.parse("123:20:33.123"));
PT20M33,123000000S
PT123H20M33,123000000S
The Time4J Duration class too prints ISO 8601 format. It appears that it uses comma as decimal separator as is preferred in ISO 8601, and that it prints 9 decimals on the seconds also when some of them are 0.
In the format pattern string ###hh means 2 to 5 digit hours, and fff means three digits of decimal fraction of second.
Anything wrong with your approach?
Was there anything wrong with your approach? ChronoField.HOUR_OF_DAY means that: hour of day. 0 is midnight, 12 is noon and 23 is near the end of the day. This is not what you want, so yes, you are using the wrong means. While you can probably get it to work, anyone maintaining your code after you will find it confusing and will probably have a hard time making modifications in line with your intentions.
Links
Wikipedia article: ISO 8601
Joda-Time PeriodFormatter
Time4J TimeSpanFormatter

Try with two optional parts (one with hours, the other without), as in:
var formatter = new DateTimeFormatterBuilder()
.optionalStart()
.appendValue(HOUR_OF_DAY, 2, 4, SignStyle.NEVER).appendLiteral(":")
.appendValue(MINUTE_OF_HOUR, 2).appendLiteral(":")
.appendValue(SECOND_OF_MINUTE, 2)
.optionalEnd()
.optionalStart()
.parseDefaulting(HOUR_OF_DAY, 0)
.appendValue(MINUTE_OF_HOUR, 2).appendLiteral(":")
.appendValue(SECOND_OF_MINUTE, 2)
.optionalEnd()
.toFormatter(Locale.ENGLISH);
I do not know about TimecodeHour, so I used HOUR_OF_DAY to test (also too lazy to include fractions).
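For completeness, here is a sketch of the same two-section idea with the fraction included. It still uses HOUR_OF_DAY as a stand-in for TimecodeHour, so hours above 23 are not covered; the class name is only for illustration, and NANO_OF_SECOND is my choice for the fraction field.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.util.Locale;
import static java.time.temporal.ChronoField.*;

class TwoOptionalSections {
    static final DateTimeFormatter TIMECODE = new DateTimeFormatterBuilder()
            .optionalStart() // variant with hours: fails as a whole when the second colon is missing
            .appendValue(HOUR_OF_DAY, 2, 4, SignStyle.NEVER).appendLiteral(':')
            .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(NANO_OF_SECOND, 3, 3, true)
            .optionalEnd()
            .optionalStart() // variant without hours
            .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(NANO_OF_SECOND, 3, 3, true)
            .optionalEnd()
            .parseDefaulting(HOUR_OF_DAY, 0) // only applies when no hour was parsed
            .toFormatter(Locale.ENGLISH);
}
```

The point of wrapping the complete hours variant in one optional section is that the parser gives up on the whole section, and rewinds, once it fails to find the second colon. LocalTime.parse("20:33.123", TwoOptionalSections.TIMECODE) should then yield 00:20:33.123.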

I think fundamentally the problem is that it gets stuck going down the wrong path. It sees a field of length 2, which we know is the minutes but it believes is the hours. Once it believes the optional section is present, when we know it's not, the whole thing is destined to fail.
This is provable by changing the minimum hour length to 3.
.appendValue(TimecodeHour.HOUR, 3, 4, SignStyle.NEVER)
It now knows that the "20" cannot be hours, since hours requires at least 3 digits. With this small change, it now parses correctly, whether the optional section is present or not.
So presuming that the hours field really does need to be between 2 and 4 digits, I think you're stuck with having to implement a workaround. For example, count the number of colons in the string and use a different formatter depending on which one you run into. Using a different delimiter besides a colon for the hours would also work.
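The colon-counting workaround could look roughly like this. It is a sketch only: HOUR_OF_DAY stands in for TimecodeHour (so hours above 23 will be rejected here), NANO_OF_SECOND is used for the fraction, and the class and constant names are invented for the example.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.util.Locale;
import static java.time.temporal.ChronoField.*;

class ColonCountParser {
    static final DateTimeFormatter WITH_HOURS = new DateTimeFormatterBuilder()
            .appendValue(HOUR_OF_DAY, 2, 4, SignStyle.NEVER).appendLiteral(':')
            .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(NANO_OF_SECOND, 3, 3, true)
            .toFormatter(Locale.ENGLISH);

    static final DateTimeFormatter WITHOUT_HOURS = new DateTimeFormatterBuilder()
            .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(NANO_OF_SECOND, 3, 3, true)
            .parseDefaulting(HOUR_OF_DAY, 0)
            .toFormatter(Locale.ENGLISH);

    // Picks the formatter by the number of colons: two colons mean the hours are present.
    static LocalTime parse(String text) {
        long colons = text.chars().filter(c -> c == ':').count();
        return LocalTime.parse(text, colons >= 2 ? WITH_HOURS : WITHOUT_HOURS);
    }
}
```

Because each formatter has no optional section, the parser never has to guess whether the first two digits are hours or minutes.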
The parser logic has undergone quite a few bug fixes over the various Java versions since it was introduced - as you can imagine, there are so many potential edge cases - so I was hopeful using a recent version of Java would make this problem disappear. Unfortunately, it seems even in Java 16, the behaviour is still the same.

Related

SimpleDateFormat [0] issue

I have the SimpleDateFormat code below:
Date date = new Date();
DateFormat inpuDateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SS'Z'");
Calendar calendar = Calendar.getInstance();
calendar.setTime(date);
String dateStr = inpuDateFormat.format(calendar.getTime());
It works perfectly on my dev servers, but it fails on sandbox instances with the following error:
org.junit.ComparisonFailure: expected:<...20-08-12T19:06:02.85[0]Z> but was:<...20-08-12T19:06:02.85[]Z>
I've handled it as
dateStr = dateStr.replace("[0]","");
dateStr = dateStr.replace("[]","");
But I still don't understand why my date differs between server instances. Is there any better way to handle it?
java.time
There certainly is a much better way to handle it. Use java.time, the modern Java date and time API, for your date and time work, not Date, DateFormat, SimpleDateFormat nor Calendar.
Instant now = Instant.now();
String dateStr1 = now.toString();
System.out.println(dateStr1);
Output in one run was:
2020-07-24T18:06:07.988093Z
You notice that six decimals on the seconds were output, not two. In other runs you may have three decimals or no fraction at all. Don’t worry, for the majority of purposes you’ll be just fine. The format printed is ISO 8601, and according to ISO 8601 the count of decimals on the seconds, even the presence of seconds at all, is optional. So whatever you need the string for, as long as ISO 8601 format is expected, the string from the above code snippet should be accepted.
I am exploiting the fact that Instant.toString() produces ISO 8601 format, so we don’t need any formatter.
If for some strange reason you do need exactly two decimals on the seconds, use a formatter for specifying so (edit: now outputting Z):
DateTimeFormatter formatter2 = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSX")
.withZone(ZoneOffset.UTC);
String dateStr2 = formatter2.format(now);
System.out.println(dateStr2);
2020-07-24T18:06:07.98Z
To a DateTimeFormatter (opposite a SimpleDateFormat) uppercase S in the format pattern string means fraction of second, and you are free to place from one through nine of them to get from one to nine decimals.
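To see the effect of the number of S letters, here is a small sketch. The helper class and method are hypothetical, made up for this illustration; the pattern is the one from above with the run of S letters swapped in.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class FractionDigits {
    // Formats the instant in UTC with as many decimals as there are S letters in fractionLetters.
    static String format(Instant instant, String fractionLetters) {
        return DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss." + fractionLetters + "X")
                .withZone(ZoneOffset.UTC)
                .format(instant);
    }
}
```

For Instant.parse("2020-07-24T18:06:07.988093Z"), passing "SS" gives 2020-07-24T18:06:07.98Z and passing "SSSSSS" gives 2020-07-24T18:06:07.988093Z. Excess digits are cut off, not rounded.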
What went wrong in your code?
First, the message that you got from your JUnit test was:
org.junit.ComparisonFailure: expected:<...20-08-12T19:06:02.85[0]Z> but was:<...20-08-12T19:06:02.85[]Z>
The square brackets are JUnit’s way of drawing our attention to the difference between the expected and the actual value. So they are not part of those values. What JUnit tells us is that the value was expected to end in .850Z but instead ended in just .85Z. So a zero was missing. Your test is probably too strict since, as I said, it shouldn’t matter whether there are two or three decimals. And 02.85 and 02.850 are just different ways of denoting the exact same value.
This role of the square brackets also explains why replacing [0] and [] in the string didn’t help: the square brackets were never in the strings, so the replacements never made any change to the strings.
Second, to SimpleDateFormat (opposite DateTimeFormatter) format pattern letter uppercase S means millisecond. So putting any other number than three of them makes no sense and gives you incorrect results. In your code you put two. In nine of ten cases the millisecond value is in the interval 100 through 999, and in this case SimpleDateFormat prints all three digits in spite of the only two pattern letters S. This probably explains why your unit test passed in your development environment. On your sandbox, incidentally, the time ended in 2.085 seconds. The correct ways to render this include 02.08 and 02.085. Your SimpleDateFormat chose neither. To it the millisecond value of 85 was to be rendered in two positions, so it produced 02.85, which is the wrong value, 765 milliseconds later. And your unit test objected because this once there were only two decimals, not three.
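The behaviour can be reproduced in isolation. This sketch uses the old classes only to demonstrate the problem, not as a recommendation; the class name is invented, and the time zone is fixed to UTC so that the seconds are predictable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TwoLetterMillis {
    // Formats the seconds and milliseconds of the given epoch millis in UTC using the pattern "ss.SS".
    static String format(long epochMillis) {
        SimpleDateFormat formatter = new SimpleDateFormat("ss.SS", Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(new Date(epochMillis));
    }
}
```

TwoLetterMillis.format(2850L) gives 02.850 (three digits in spite of the two letters), while TwoLetterMillis.format(2085L) gives 02.85, which reads as 2.850 seconds, not 2.085.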
Third, not what you asked, but no matter if using the troublesome SimpleDateFormat or the modern DateTimeFormatter you must never hardcode Z as a literal in the format pattern string. The trailing Z means UTC or offset zero from UTC. It needs to be printed (and parsed if that were the case) as an offset, or you get wrong results. The way to make sure you get a Z and not for example an offset of +02:00 is to make sure that an offset of 0 is specified. This was why I put .withZone(ZoneOffset.UTC) on my formatter.
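To illustrate what can go wrong with a literal Z, the sketch below formats the same instant with both variants while the formatter's zone is deliberately set to offset +02:00. The class and constant names are mine; the patterns are the two discussed in this answer.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class LiteralZ {
    // Hardcoded 'Z': printed regardless of the offset actually used for the time of day.
    static final DateTimeFormatter LITERAL_Z = DateTimeFormatter
            .ofPattern("uuuu-MM-dd'T'HH:mm:ss.SS'Z'")
            .withZone(ZoneOffset.ofHours(2));

    // Pattern letter X: prints the real offset, and Z only when the offset is zero.
    static final DateTimeFormatter REAL_OFFSET = DateTimeFormatter
            .ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSX")
            .withZone(ZoneOffset.ofHours(2));
}
```

For Instant.parse("2020-07-24T18:06:07.98Z") the first formatter prints 2020-07-24T20:06:07.98Z, which claims to be UTC but is two hours off, while the second prints 2020-07-24T20:06:07.98+02, which is still the correct point in time.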
Links
Oracle tutorial: Date Time explaining how to use java.time.
Wikipedia article: ISO 8601
Try to remove the quotes around the 'Z', as 'Z' is a constant whilst without quotes it means 'time zone':
DateFormat inpuDateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
(By the way, in most cases you want to use three decimal places for milliseconds: "SSS".)

Is there a way in Java to convert strings to dates that are either MMddyyyy or Mddyyyy?

I've been trying to convert strings to dates. Some of them show up like this: 1011970 (as in January 1, 1970) and some show up like this: 10011970 (as in October 1, 1970). The fact that the month is at the beginning has created a big problem for me.
I have already come up with the solution that I can just check how many digits the number has and use separate formatters, but I would prefer to use something a little more elegant. I have been trying to use the DateTimeFormatterBuilder to create a 'one size fits all' formatter.
Here's an example of something I've tried and the output I've gotten.
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendValue(ChronoField.MONTH_OF_YEAR, 1, 2, SignStyle.NORMAL)
.appendPattern("ddyyyy")
.toFormatter();
System.out.println(LocalDate.parse("10011970", formatter));
System.out.println(LocalDate.parse("1011970", formatter));
Date: 1970-10-01
Exception in thread "main" java.time.format.DateTimeParseException: Text '1011970' could not be parsed at index 4
at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
at java.time.LocalDate.parse(LocalDate.java:400)
at Main.main(Main.java:36)
So as you can see the above solution works for the first formatted date, but not the second.
Please let me know if you have any ideas.
Thanks in advance!
James
You were on the right track. This works:
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendValue(ChronoField.MONTH_OF_YEAR)
.appendValue(ChronoField.DAY_OF_MONTH, 2)
.appendValue(ChronoField.YEAR, 4)
.toFormatter();
System.out.println(LocalDate.parse("10011970", formatter));
System.out.println(LocalDate.parse("1011970", formatter));
And the output is:
1970-10-01
1970-01-01
I don’t know why it doesn’t work when specifying day of month and year through a format pattern, but it doesn’t, I have seen that before.
Other than that, the rule of thumb for adjacent value parsing (parsing of numeric date-time fields with no separator between them) is that you need to specify the exact width of each field except the first. Then the formatter calculates widths from the back end of the string to find out how many digits to use for the first value (here the month). So your use case fits nicely.
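The same rule of thumb applies to other fields too. As a further illustration (my own example, not from the question), here is a time of day written as Hmmss, where only the first field, the hour, has variable width:

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

class AdjacentValues {
    static final DateTimeFormatter HMMSS = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.HOUR_OF_DAY)         // variable width: gets whatever digits are left over
            .appendValue(ChronoField.MINUTE_OF_HOUR, 2)   // fixed width
            .appendValue(ChronoField.SECOND_OF_MINUTE, 2) // fixed width
            .toFormatter();
}
```

LocalTime.parse("93015", AdjacentValues.HMMSS) gives 09:30:15 and LocalTime.parse("123015", AdjacentValues.HMMSS) gives 12:30:15, since four digits are reserved for minute and second in both cases.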

How do I parse an ISO-8601 formatted string that contains no punctuation in Java 8?

How could I parse the following String to a LocalDateTime-Object?
20200203092315000000
I always get the following exception but I didn't understand it:
java.time.format.DateTimeParseException: Text '20200203092315000000' could not be parsed at index 0
at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
at java.time.LocalDateTime.parse(LocalDateTime.java:492)
at de.x.struct.type.LocalDateTimeStructField.setBytesValue(LocalDateTimeStructField.java:44)
at de.x.struct.Struct.bytesToStruct(Struct.java:110)
at de.x.struct.StructTest.testStringToStruct(StructTest.java:60)
My application code looks like:
LocalDateTime ldt = LocalDateTime.parse("20200203092315000000", DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSSSSS"));
This looks like a known issue:
bug_id=JDK-8031085
bug_id=JDK-8138676
Workaround:
DateTimeFormatter dtf = new DateTimeFormatterBuilder()
.appendPattern("yyyyMMddHHmmss")
.appendValue(ChronoField.MILLI_OF_SECOND, 3)
.toFormatter();
or
CUSTOMER SUBMITTED WORKAROUND : use the following format (mind the '.'): "yyyyMMddHHmmss.SSS"
LocalDateTime.parse("20150910121314987",
DateTimeFormatter.ofPattern("yyyyMMddHHmmss.SSS"))
or alternatively use the Joda-Time library.
I have been using the following code for close to three years and it has served me well.
ISO 8601 allows two formats: basic and extended. The basic format does not contain any punctuation while the extended format does. For example, 2017-12-10T09:47:00-04:00 in extended format is equivalent to 20171210T094700-0400 in basic format.
Since the behavior in Java 8 is to only accept the extended format, I typically use a new DateTimeFormatter that accepts both.
This format is not an accurate representation of what ISO 8601 requires. The rule is that a textual representation of a timestamp either be completely basic (i.e., no punctuation) or completely extended (i.e., all punctuation present). It is not valid to have just some punctuation but I have found that some libraries do not honor this rule. Specifically, Gson produces timestamps with the colon missing in the time zone specifier. Since I wish to be liberal in what I accept, I completely ignore all punctuation.
Java 8 also fails to accurately handle years especially when the punctuation is missing. ISO 8601 requires exactly four digits in the year but allows extra digits if both parties agree on an explicit number of digits. In this case, the year MUST be preceded by a sign. However, Java 8 does not enforce the requirement to have a sign and accepts from four to nine digits. While this approach is more liberal and in keeping with what I am usually trying to accomplish, it makes it impossible to parse a year since I cannot know how many digits should be present. I tighten the format to honor ISO 8601 in this case although an alternative approach is available if needed. For example, you could pre-parse the text knowing how many digits you expect and add the sign if there are too many digits.
I recommend only using this formatter when parsing and not when serializing since I prefer to be strict in what I produce.
The code below should accept your format, with and without a timezone offset. It deviates from your format only in that it accepts up to nine digits for the fractional seconds instead of six. You can adjust it if needed.
// somewhere above
import java.time.format.SignStyle;
import static java.time.temporal.ChronoField.*;
static DateTimeFormatter ISO_8601_LENIENT_FORMAT = new DateTimeFormatterBuilder()
.parseCaseInsensitive()
.appendValue(YEAR, 4, 4, SignStyle.EXCEEDS_PAD)
.optionalStart().appendLiteral('-').optionalEnd() // Basic format has no punctuation
.appendValue(MONTH_OF_YEAR, 2)
.optionalStart().appendLiteral('-').optionalEnd()
.appendValue(DAY_OF_MONTH, 2)
.optionalStart().appendLiteral('T').optionalEnd() // permitted to omit the 'T' character by mutual agreement
.appendValue(HOUR_OF_DAY, 2)
.optionalStart().appendLiteral(':').optionalEnd()
.appendValue(MINUTE_OF_HOUR, 2)
.optionalStart() // seconds are optional
.optionalStart().appendLiteral(':').optionalEnd()
.appendValue(SECOND_OF_MINUTE, 2)
.optionalStart().appendFraction(NANO_OF_SECOND, 0, 9, false).optionalEnd()
.optionalEnd()
.optionalStart().appendOffset("+HH:MM", "Z").optionalEnd()
.optionalStart().appendOffset("+HHMM", "Z").optionalEnd()
.optionalStart().appendOffset("+HH", "Z").optionalEnd()
.toFormatter()
;
A more strict version that rejects all punctuation, assumes non-negative years, limits fractional seconds to six digits and omits timezone information follows.
static DateTimeFormatter ISO_8601_NO_PUNCTUATION = new DateTimeFormatterBuilder()
.appendValue(YEAR, 4, 4, SignStyle.NEVER)
.appendValue(MONTH_OF_YEAR, 2)
.appendValue(DAY_OF_MONTH, 2)
.appendValue(HOUR_OF_DAY, 2)
.appendValue(MINUTE_OF_HOUR, 2)
.appendValue(SECOND_OF_MINUTE, 2)
.appendFraction(NANO_OF_SECOND, 6, 6, false)
.toFormatter()
;
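Applied to the string from the question, the strict formatter can be used as in the following sketch. The formatter is repeated so that the example is self-contained; the wrapper class and method names are only for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import static java.time.temporal.ChronoField.*;

class NoPunctuationDemo {
    // Same fields and widths as ISO_8601_NO_PUNCTUATION above.
    static final DateTimeFormatter ISO_8601_NO_PUNCTUATION = new DateTimeFormatterBuilder()
            .appendValue(YEAR, 4, 4, SignStyle.NEVER)
            .appendValue(MONTH_OF_YEAR, 2)
            .appendValue(DAY_OF_MONTH, 2)
            .appendValue(HOUR_OF_DAY, 2)
            .appendValue(MINUTE_OF_HOUR, 2)
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(NANO_OF_SECOND, 6, 6, false)
            .toFormatter();

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, ISO_8601_NO_PUNCTUATION);
    }
}
```

NoPunctuationDemo.parse("20200203092315000000") should give 2020-02-03T09:23:15. Since every field has a fixed width, the parser has no freedom to assign digits to the wrong field.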
Your input is not valid and it is out of range.
Use this code when your input is correct:
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(yourInput * 1000L); // yourInput: placeholder for your value, taken as seconds since the epoch
System.out.println(cal.getTime());

Parse java.time trying multiple patterns

We have a library where users can pass in dates in multiple formats. They follow the ISO but are abbreviated at times.
So we get things like "19-3-12" and "2019-03-12T13:12:45.1234" where the fractional seconds can be 1 - 7 digits long. It's a very large number of combinations.
DateTimeFormatter.parseBest doesn't work because it won't accept "yy-m-d" for a local date. The solutions here won't work because it assumes we know the pattern - we don't.
And telling people to get their string formats "correct" won't work as there's a ton of existing data (these are mostly in XML & JSON files).
My question is, how can I parse strings coming in in these various patterns without having to try 15 different explicit patterns?
Or even better, is there some way to parse a string and it will try everything possible and return a Temporal object if the string makes sense for any date[time]?
Without a full specification it is hard to give a precise recommendation. The techniques generally used for variable formats include:
Trying a number of known formats in turn.
Optional parts in the format pattern.
DateTimeFormatterBuilder.parseDefaulting() for parts that may be absent from the parsed string.
As you are aware, parseBest.
I am assuming that y-M-d always come in this order (never M-d-y or d-M-y, for example). 19-3-12 conflicts with ISO 8601 since the standard requires (at least) 4 digit year and 2 digit month. A challenge with 2-digit year is guessing the century: is this 1919 or 2019 or might it be 2119?
The good news: presence and absence of seconds and varying number of fractional digits are all built-in and pose no problem.
From what you have told us it seems to me that the following is a fair shot.
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendPattern("[uuuu][uu]-M-d")
.optionalStart()
.appendLiteral('T')
.append(DateTimeFormatter.ISO_LOCAL_TIME)
.optionalEnd()
.toFormatter();
TemporalAccessor dt = formatter.parseBest("19-3-12", LocalDateTime::from, LocalDate::from);
System.out.println(dt.getClass());
System.out.println(dt);
Output:
class java.time.LocalDate
2019-03-12
I figure that it should work with the variations of formats that you describe. Let’s just try your other example:
dt = formatter.parseBest( "2019-03-12T13:12:45.1234", LocalDateTime::from, LocalDate::from);
System.out.println(dt.getClass());
System.out.println(dt);
class java.time.LocalDateTime
2019-03-12T13:12:45.123400
To control the interpretation of 2-digit year you may use one of the overloaded variants of DateTimeFormatterBuilder.appendValueReduced(). I recommend that you consider a range check on top of it.
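A sketch of how that could look: the base date of 1950-01-01 (so that two-digit years land in 1950 through 2049) and the range check bounds are arbitrary assumptions of mine, and the class and method names are made up. Adapt them to your data.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

class TwoDigitYear {
    // Two-digit years are interpreted as the years 1950 through 2049.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendValueReduced(ChronoField.YEAR, 2, 2, LocalDate.of(1950, 1, 1))
            .appendPattern("-M-d")
            .toFormatter();

    // Parses the text and rejects dates outside the range the caller considers plausible.
    static LocalDate parse(String text, LocalDate earliest, LocalDate latest) {
        LocalDate date = LocalDate.parse(text, FORMATTER);
        if (date.isBefore(earliest) || date.isAfter(latest)) {
            throw new DateTimeException("Date out of expected range: " + date);
        }
        return date;
    }
}
```

With these settings 19-3-12 is parsed as 2019-03-12 and 75-3-12 as 1975-03-12. To support four-digit years as well, this builder would have to be combined with the optional sections shown above.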
Trying all the possible formats would perform worse than trying only 15.
You can try to "normalize" to a single format but then you would be doing the work those 15 formats are supposed to do.
I think the best approach is the one described by @JB Nizet: try only the patterns that match the string length.
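import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;

public TemporalAccessor parse(String openFormat) {
    String[] formats = {"uuuu-MM-dd"}; // fallback when no length matches
    switch (openFormat.length()) {
        case 24: // 2019-03-12T13:12:45.1234
            formats = new String[] {"uuuu-MM-dd'T'HH:mm:ss.SSSS"}; // all the formats for length 24
            break;
        // ... one case for each further length
        case 6: // 19-3-1, 1-3-19
            formats = new String[] {"uu-M-d", "d-M-uu"}; // all the formats for length 6
            break;
    }
    // now try the reduced number of formats, possibly only 1 or 2
    for (String format : formats) {
        try {
            return DateTimeFormatter.ofPattern(format)
                    .parseBest(openFormat, LocalDateTime::from, LocalDate::from);
        } catch (DateTimeParseException e) {
            // this format did not match, try the next one
        }
    }
    throw new DateTimeParseException("No known format matched", openFormat, 0);
}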

How to parse datetime like this

I understand that it sounds weird, but I have the datetime 2018-04-04 12:59:575Z.
Let's assume it is real and not a mistake. I can't find any standard for parsing this.
Is there a way to parse it in Java? What could the 3 digits 575 at the end mean?
edit:
There is strong doubt that the date-times in my samples are correct. I'm going to report a bug to the creator. Thanks, everyone, for the good advice.
It probably means there's a bug in wherever this string came from. I would investigate there if I had access to that code or report a bug to its owner.
There's no point parsing buggy data and guessing what the numbers mean. Bad data is worse than no data.
What 3 numbers 575 at the end could mean?
My guesses include:
It’s 12:59:57.5 (the .5 signifying half a second; assuming that the decimal point has been left out from the format).
575 is the millisecond of the second, and the seconds have been forgotten. So it’s 12:59:ss.575 where we don’t know what ss should have been.
It’s 59,575 minutes past 12 o’clock (the same as 12:59:34.5). In defense of this option, ISO 8601 does allow decimals on the minutes, but then the decimal “point” should be either a comma or a period, not a colon.
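The arithmetic behind the last guess, decimal minutes converted to minutes and seconds, can be checked with a few lines. This is only a sketch for verifying the numbers, with an invented helper name; it takes the decimal minutes with a period as decimal separator.

```java
import java.math.BigDecimal;
import java.time.Duration;

class DecimalMinutes {
    // Converts a decimal number of minutes, given as text, to a Duration.
    static Duration fromMinutes(String decimalMinutes) {
        long millis = new BigDecimal(decimalMinutes)
                .multiply(BigDecimal.valueOf(60_000)) // milliseconds per minute
                .longValueExact();                    // throws ArithmeticException if finer than a millisecond
        return Duration.ofMillis(millis);
    }
}
```

DecimalMinutes.fromMinutes("59.575") is PT59M34.5S, that is, 59 minutes 34.5 seconds.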
I can't find any standard for parsing this.
I am pretty convinced that there isn’t any.
Is there a way to parse it in Java?
No, sorry, not as long as we don’t know what the string means.
You can use the Joda-Time API to parse the input string like below:
String strDate="2018-04-04 12:59:575Z";
DateTimeFormatter formatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:SSSZ");
DateTime dt = formatter.parseDateTime(strDate);
System.out.println(dt); //2018-04-04T08:59:00.575-04:00
575 in your input string is milliseconds.
But you need to find out what the point of millisecond precision is if you are not including seconds.
There are two possible meanings for the 59: it could be minutes or seconds. I think that minutes is more likely, because the seconds may not be of value. 575 is definitely milliseconds.
DateTimeFormatter dfMinutes = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:SSS");
DateTimeFormatter dfSeconds = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:ss:SSS");
P.S. It could be yyyy-dd-MM instead of yyyy-MM-dd. But as I can see, we're in same locale.
