Two letter year, shall it be allowed? - java

Recently, after a library upgrade of Apache POI, I upgraded some other APIs as well. The other library I use reads all cell contents as String, so I have to parse each string into a Date.
The problem occurred when users started entering dates as dd-mm-yy: the year appeared as 00yy AD.
As per documentation of SimpleDateFormat
For parsing, if the number of pattern letters is more than 2, the year
is interpreted literally, regardless of the number of digits. So using
the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.
So the question is: is it better to require a four-digit year rather than a two-digit year?
The other question is: what is the best way to predict the year if it is in two-digit format?
The issue comes up when parsing years like the ones below:
Bond Start Date : 12-Jan-98 (1998)
Bond End Date : 12-Jan-70 (2070)
Regards,
Hanumant

It is not clear what you are asking.
If you are asking how to specify a date format that accepts 2 digit years (only) and interprets them conventionally, then you should use "dd-MM-yy" (uppercase MM for the month; lowercase mm means minutes in SimpleDateFormat).
If you are asking how to specify a date format that accepts 2 digit years and interprets them conventionally, AND ALSO handles 4 (or more) digit years, then a "yyyy" pattern will not do it. As the javadoc says, if you use "dd-MM-yyyy", 2 digit years are interpreted as years in the first century AD.
One possible solution is to use TWO formats. First attempt to parse using "dd-MM-yy", and if that fails, try "dd-MM-yyyy".
But this is a hack ... and problematic if the user might actually need to enter a historical date.
If you are asking what you should do, then I'd recommend moving away from ambiguous ad-hoc formats that force you to (effectively) guess what the user means.
If the user has to enter dates / times in a character-based form, require them to use one of the ISO 8601 formats, and be strict when parsing the user-supplied date/time strings.
Otherwise, provide the user with a date picker widget.
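For what it's worth, strict ISO 8601 parsing needs very little code with java.time. This is only a minimal sketch: the class name StrictIsoDate and the helper parseIsoDate are made up for illustration, and it assumes Java 8 or later.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class StrictIsoDate {
    // Hypothetical helper: ISO_LOCAL_DATE expects a year of at least four
    // digits and resolves strictly, so impossible dates are rejected too.
    static Optional<LocalDate> parseIsoDate(String text) {
        try {
            return Optional.of(LocalDate.parse(text, DateTimeFormatter.ISO_LOCAL_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIsoDate("2070-01-12")); // Optional[2070-01-12]
        System.out.println(parseIsoDate("70-01-12"));   // Optional.empty (two-digit year)
        System.out.println(parseIsoDate("2023-02-30")); // Optional.empty (no such day)
    }
}
```

Returning an Optional keeps the "reject and ask again" decision with the caller instead of guessing a century.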
The other question is: what is the best way to predict the year if it is in two-digit format?
Well this is the nub of the problem isn't it! In the 20th century, we all knew what a 2 digit year meant. You just slapped 19 on the front. (Ah ... those were the good old days!)
Now it is necessary to use a different heuristic. And the heuristic that SimpleDateFormat uses is described by the javadoc thus:
"For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964."
The heuristic is 80 years before to 20 years after "now". So actually 12-Jan-98 is in 1998 and 12-Jan-70 is in 1970 ... if you parse using a SimpleDateFormat with a "yy" format.
If you need the dates to mean something else, then you will need to use a different date parser. For example, if you use the Joda-time libraries, it is possible to specify the "pivot year"; i.e. the middle year of the century in which 2-digit years fall.
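If switching libraries is not an option, SimpleDateFormat itself also lets you move the 100-year window with set2DigitYearStart. A rough sketch, assuming the cells look like "12-Jan-98" with English month names; the class name, the resolveYear helper and the window start of 1971 are my own choices for the example, not anything from the question.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class TwoDigitYearWindow {
    // Hypothetical helper: parses dd-MMM-yy text and returns the resolved year.
    // Two-digit years are placed in the 100 years starting on January 1 of
    // windowStartYear.
    static int resolveYear(String text, int windowStartYear) {
        SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH);
        format.setLenient(false);
        format.set2DigitYearStart(
                new GregorianCalendar(windowStartYear, Calendar.JANUARY, 1).getTime());
        try {
            GregorianCalendar parsed = new GregorianCalendar();
            parsed.setTime(format.parse(text));
            return parsed.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a dd-MMM-yy date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // A window of 1971..2070 gives the readings asked for in the question.
        System.out.println(resolveYear("12-Jan-98", 1971)); // 1998
        System.out.println(resolveYear("12-Jan-70", 1971)); // 2070
    }
}
```

The window still has to be picked by someone who knows the data; it only replaces one guess with a different, explicit one.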
Reference:
Joda-time Freaky Formatters

Related

Elastic Search and Y10k (years with more than 4 digits)

I discovered this issue in connection with Elastic Search queries, but since the ES date format documentation links to the API documentation for the java.time.format.DateTimeFormatter class, the problem is not really ES specific.
Short summary: We are having problems with dates beyond year 9999, more exactly, years with more than 4 digits.
The documents stored in ES have a date field, which the index descriptor defines with format "date", corresponding to "yyyy-MM-dd" in the DateTimeFormatter pattern language. We take user input, validate it using org.apache.commons.validator.DateValidator.isValid, also with the pattern "yyyy-MM-dd", and if it is valid, we create an ES query from it. This fails with an exception if the user inputs something like 20202-12-03. The search term is probably not intentional, but the expected behaviour would be to find nothing, not for the software to cough up an exception.
The problem is that org.apache.commons.validator.DateValidator is internally using the older SimpleDateFormat class to verify if the input conforms to the pattern and the meaning of "yyyy" as interpreted by SimpleDateFormat is something like: Use at least 4 digits, but allow more digits if required. Creating a SimpleDateFormat with pattern "yyyy-MM-dd" will thus both parse an input like "20202-07-14" and similarly format a Date object with a year beyond 9999.
The new DateTimeFormatter class is much stricter and takes "yyyy" to mean exactly four digits. It will fail to parse an input string like "20202-07-14" and also fail to format a Temporal object with a year beyond 9999. It is worth noting that DateTimeFormatter is itself capable of handling variable-length fields. The constant DateTimeFormatter.ISO_LOCAL_DATE, for example, is not equivalent to "yyyy-MM-dd": in conformance with ISO 8601, it allows years with more than four digits while using at least four. This constant is created programmatically with a DateTimeFormatterBuilder, not from a pattern string.
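The parsing mismatch between the two APIs can be reproduced outside ES with a few lines. This is just a sketch to illustrate the point; the class and method names are invented, and Locale.ROOT is only there to keep the legacy formatter independent of the machine's default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class YearWidthComparison {
    static final String PATTERN = "yyyy-MM-dd";

    // Legacy API: when parsing, "yyyy" does not cap the number of year digits.
    static boolean legacyAccepts(String text) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setLenient(false);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // java.time: the same pattern rejects an unsigned five-digit year.
    static boolean modernAccepts(String text) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ofPattern(PATTERN, Locale.ROOT));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyAccepts("20202-07-14")); // true
        System.out.println(modernAccepts("20202-07-14")); // false
        System.out.println(modernAccepts("2020-07-14"));  // true
    }
}
```

So any validator built on SimpleDateFormat will let through input that a DateTimeFormatter with the same pattern string then refuses.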
ES can't be configured to use the constants defined in DateTimeFormatter like ISO_LOCAL_DATE, but only with a pattern string. ES also knows a list of predefined patterns, occasionally the ISO standard is also referred to in the documentation, but they seem to be mistaken and ignore that a valid ISO date string can contain five digit years.
I can configure ES with a list of multiple allowed date patterns, e.g "yyyy-MM-dd||yyyyy-MM-dd". That will allow both four and five digits in the year, but fail for a six digit year. I can support six digit years by adding yet another allowed pattern: "yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd", but then it fails for seven digit years and so on.
Am I overlooking something, or is it really not possible to configure ES (or a DateTimeFormatter instance using a pattern string) to have a year field with at least four digits (but potentially more) as used by the ISO standard?
Edit
ISO 8601
Since your requirement is to conform with ISO 8601, let’s first see what ISO 8601 says (quoted from the link at the bottom):
To represent years before 0000 or after 9999, the standard also
permits the expansion of the year representation but only by prior
agreement between the sender and the receiver. An expanded year
representation [±YYYYY] must have an agreed-upon number of extra year
digits beyond the four-digit minimum, and it must be prefixed with a +
or − sign instead of the more common AD/BC (or CE/BCE) notation; …
So 20202-12-03 is not a valid date in ISO 8601. If you explicitly inform your users that you accept, say, up to 6 digit years, then +20202-12-03 and -20202-12-03 are valid, and only with the + or - sign.
Accepting more than 4 digits
The format pattern uuuu-MM-dd formats and parses dates in accordance with ISO 8601, also years with more than four digits. For example:
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("uuuu-MM-dd");
LocalDate date = LocalDate.parse("+20202-12-03", dateFormatter);
System.out.println("Parsed: " + date);
System.out.println("Formatted back: " + date.format(dateFormatter));
Output:
Parsed: +20202-12-03
Formatted back: +20202-12-03
It works quite similarly for a prefixed minus instead of the plus sign.
Accepting more than 4 digits without sign
yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd||yyyyyyy-MM-dd||yyyyyyyy-MM-dd||yyyyyyyyy-MM-dd
As I said, this disagrees with ISO 8601. I also agree with you that it isn’t nice. And obviously it will fail for 10 or more digits, but that would fail for a different reason anyway: java.time handles years in the interval -999 999 999 through +999 999 999. So trying yyyyyyyyyy-MM-dd (10 digit year) would get you into serious trouble except in the corner case where the user enters a year with a leading zero.
I am sorry, this is as good as it gets. DateTimeFormatter format patterns do not support all of what you are asking for. There is no (single) pattern that will give you four digit years in the range 0000 through 9999 and more digits for years after that.
The documentation of DateTimeFormatter says about formatting and parsing years:
Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a
reduced two digit form is used. For printing, this outputs the
rightmost two digits. For parsing, this will parse using the base
value of 2000, resulting in a year within the range 2000 to 2099
inclusive. If the count of letters is less than four (but not two),
then the sign is only output for negative years as per
SignStyle.NORMAL. Otherwise, the sign is output if the pad width is
exceeded, as per SignStyle.EXCEEDS_PAD.
So no matter which count of pattern letters you go for, you will be unable to parse years with more digits without sign, and years with fewer digits will be formatted with this many digits with leading zeroes.
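Outside ES, where you are free to build the formatter in code, a DateTimeFormatterBuilder can express what the pattern letters cannot. The following is a sketch under my own assumptions (class name, a 9-digit upper bound to stay within what LocalDate supports, and no sign on positive years, which is the non-ISO behaviour asked for):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

class UnsignedWideYear {
    // Year of 4 to 9 digits. SignStyle.NORMAL means no '+' is accepted on
    // input or written on output; only negative years carry a sign.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 9, SignStyle.NORMAL)
            .appendPattern("-MM-dd")
            .toFormatter();

    public static void main(String[] args) {
        System.out.println(LocalDate.parse("2020-07-14", FORMATTER));     // 2020-07-14
        System.out.println(LocalDate.parse("20202-12-03", FORMATTER));    // +20202-12-03
        System.out.println(LocalDate.of(20202, 12, 3).format(FORMATTER)); // 20202-12-03
        System.out.println(LocalDate.of(999, 1, 2).format(FORMATTER));    // 0999-01-02
    }
}
```

The "+" in the second output line comes from LocalDate.toString(), not from the formatter. This does not help with ES configuration, which only takes pattern strings.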
Original answer
You can probably get away with the pattern u-MM-dd. Demonstration:
String formatPattern = "u-MM-dd";
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern(formatPattern);
LocalDate normalDate = LocalDate.parse("2020-07-14", dateFormatter);
String formattedAgain = normalDate.format(dateFormatter);
System.out.format("LocalDate: %s. String: %s.%n", normalDate, formattedAgain);
LocalDate largeDate = LocalDate.parse("20202-07-14", dateFormatter);
String largeFormattedAgain = largeDate.format(dateFormatter);
System.out.format("LocalDate: %s. String: %s.%n", largeDate, largeFormattedAgain);
Output:
LocalDate: 2020-07-14. String: 2020-07-14.
LocalDate: +20202-07-14. String: 20202-07-14.
Counterintuitively but very practically, one format letter does not mean 1 digit but rather as many digits as it takes. So the flip side of the above is that years before year 1000 will be formatted with fewer than 4 digits. Which, as you say, disagrees with ISO 8601.
For the difference between pattern letter y and u for year see the link at the bottom.
You might also consider one M and/or one d to accept 2020-007-014, but again, this will cause formatting into just 1 digit for numbers less than 10, like 2020-7-14, which probably isn’t what you want and again disagrees with ISO.
Links
Years section of Wikipedia article: ISO 8601
Documentation of DateTimeFormatter
uuuu versus yyyy in DateTimeFormatter formatting pattern codes in Java?
Maybe this will work:
[uuuu][uuuuu][...]-MM-dd
Format specifiers placed between square brackets are optional parts. Format specifiers inside brackets can be repeated to allow for multiple options to be accepted.
This pattern will allow a year number of either four or five digits, but rejects all other cases.
Here is this pattern in action. Note that this pattern is useful for parsing a string into a LocalDate. However, to format a LocalDate instance into a string, the pattern should be uuuu-MM-dd. That is because the two optional year parts cause the year number to be printed twice.
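A small sketch of the parse/format split described above, assuming Java 8 or later; the class and constant names are only for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class OptionalYearWidths {
    // Parsing: a 4-digit or a 5-digit year, via two optional sections.
    static final DateTimeFormatter PARSER = DateTimeFormatter.ofPattern("[uuuu][uuuuu]-MM-dd");
    // Formatting: a single year section, so the year is printed only once.
    static final DateTimeFormatter PRINTER = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    static boolean parses(String text) {
        try {
            LocalDate.parse(text, PARSER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(LocalDate.parse("2017-10-20", PARSER));  // 2017-10-20
        System.out.println(LocalDate.parse("20171-10-20", PARSER)); // +20171-10-20
        System.out.println(parses("201712-10-20"));                 // false
        System.out.println(LocalDate.of(2017, 10, 20).format(PRINTER)); // 2017-10-20
    }
}
```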
Repeating all possible year number digit counts, is the closest you can get in order to make it work the way you expect it to work.
The problem with the current implementation of DateTimeFormatter is that when you specify 4 or more u or y letters, the resolver will try to consume exactly that number of year digits. However, with fewer than 4, the resolver will try to consume as many as possible. I do not know whether this behavior is intentional.
So the intended behavior can be achieved with a formatter builder, but not with a pattern string. As JodaStephen once pointed out, "patterns are a subset of the possible formatters".
Maybe the characters #, { and }, which are reserved for future use, will be useful in this regard.
Update
You can use DateTimeFormatterBuilder#appendValueReduced to restrict the number of digits in a year in the range of 4-9 digits.
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
public class Main {
    public static void main(String[] args) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendValueReduced(ChronoField.YEAR, 4, 9, 1000)
                .appendPattern("-MM-dd")
                .toFormatter();
        String[] dateStrArr = { "2017-10-20", "20171-10-20", "201712-10-20", "2017123-10-20" };
        for (String dateStr : dateStrArr) {
            System.out.println(LocalDate.parse(dateStr, formatter));
        }
    }
}
Output:
2017-10-20
+20171-10-20
+201712-10-20
+2017123-10-20
Original answer
You can use the pattern [uuuu][u]-MM-dd where [uuuu] conforms to a 4-digit year and [u] can cater to the requirement of any number of digits allowed for a year.
Demo:
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
public class Main {
    public static void main(String[] args) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("[uuuu][u]-MM-dd");
        String[] dateStrArr = { "2017-10-20", "20171-10-20", "201712-10-20", "2017123-10-20" };
        for (String dateStr : dateStrArr) {
            System.out.println(LocalDate.parse(dateStr, formatter));
        }
    }
}
Output:
2017-10-20
+20171-10-20
+201712-10-20
+2017123-10-20

How to Parse Date Strings with 🎌 Japanese Numbers in Java DateTime API

After asking [How to parse 🎌 Japanese Era Date string values into LocalDate & LocalDateTime], I was curious about the following case;
明治二十三年十一月二十九日
Is there a way to parse Japanese numbers on top of Japanese calendar characters, essentially a pure Japanese date, into LocalDate, using only the Java DateTime API? I don't want to modify the input String values; I just want the API to handle the recognition.
For anyone reading along, your example date string holds an era designator, year of era of 23 (in this case corresponding to 1890 CE Gregorian), month 11 and day of month 29. Months and days are the same as in the Gregorian calendar.
Since Japanese numbers are not entirely positional (like Arabic numbers, for example), a DateTimeFormatter doesn’t parse them on its own. So we help it by supplying how the numbers look in Japanese (and Chinese). DateTimeFormatterBuilder has an overloaded appendText method that accepts a map holding all the possible numbers as text. My code example is not complete, but should get you started.
Locale japaneseJapan = Locale.forLanguageTag("ja-JP");
Map<Long, String> numbers = Map.ofEntries(
Map.entry(1L, "\u4e00"),
Map.entry(2L, "\u4e8c"),
Map.entry(3L, "\u4e09"),
Map.entry(4L, "\u56db"),
Map.entry(5L, "\u4e94"),
Map.entry(6L, "\u516d"),
Map.entry(7L, "\u4e03"),
Map.entry(8L, "\u516b"),
Map.entry(9L, "\u4e5d"),
Map.entry(10L, "\u5341"),
Map.entry(11L, "\u5341\u4e00"),
Map.entry(12L, "\u5341\u4e8c"),
Map.entry(13L, "\u5341\u4e09"),
Map.entry(14L, "\u5341\u56db"),
Map.entry(15L, "\u5341\u4e94"),
Map.entry(16L, "\u5341\u516d"),
Map.entry(17L, "\u5341\u4e03"),
Map.entry(18L, "\u5341\u516b"),
Map.entry(19L, "\u5341\u4e5d"),
Map.entry(20L, "\u4e8c\u5341"),
Map.entry(21L, "\u4e8c\u5341\u4e00"),
Map.entry(22L, "\u4e8c\u5341\u4e8c"),
Map.entry(23L, "\u4e8c\u5341\u4e09"),
Map.entry(24L, "\u4e8c\u5341\u56db"),
Map.entry(25L, "\u4e8c\u5341\u4e94"),
Map.entry(26L, "\u4e8c\u5341\u516d"),
Map.entry(27L, "\u4e8c\u5341\u4e03"),
Map.entry(28L, "\u4e8c\u5341\u516b"),
Map.entry(29L, "\u4e8c\u5341\u4e5d"),
Map.entry(30L, "\u4e09\u5341"));
DateTimeFormatter japaneseformatter = new DateTimeFormatterBuilder()
.appendPattern("GGGG")
.appendText(ChronoField.YEAR_OF_ERA, numbers)
.appendLiteral('\u5e74')
.appendText(ChronoField.MONTH_OF_YEAR, numbers)
.appendLiteral('\u6708')
.appendText(ChronoField.DAY_OF_MONTH, numbers)
.appendLiteral('\u65e5')
.toFormatter(japaneseJapan)
.withChronology(JapaneseChronology.INSTANCE);
String dateString = "明治二十三年十一月二十九日";
System.out.println(dateString + " is parsed into " + LocalDate.parse(dateString, japaneseformatter));
The output from this example is:
明治二十三年十一月二十九日 is parsed into 1890-11-29
Assuming that an era can be longer than 30 years, you need to supply yet more numbers to the map. You can do that a lot better than I can (and can also check my numbers for bugs). It’s probably best (less error-prone) to use a couple of nested loops for filling the map, but I wasn’t sure I could do it correctly, so I am leaving that part to you.
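One way the nested loops could look, as a sketch only: it assumes the plain kanji forms (an optional tens digit, the character for ten, an optional units digit) and does not cover the special 元 notation for the first year of an era. The class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KanjiNumerals {
    // Index 0 is empty so that a zero units digit contributes nothing.
    private static final String[] DIGITS = {
        "", "\u4e00", "\u4e8c", "\u4e09", "\u56db", "\u4e94",
        "\u516d", "\u4e03", "\u516b", "\u4e5d"
    };
    private static final String TEN = "\u5341";

    // Builds the texts for 1..99, suitable as the map argument of
    // DateTimeFormatterBuilder.appendText(TemporalField, Map<Long, String>).
    static Map<Long, String> upTo99() {
        Map<Long, String> numbers = new LinkedHashMap<>();
        for (int tens = 0; tens <= 9; tens++) {
            for (int units = 0; units <= 9; units++) {
                int value = tens * 10 + units;
                if (value == 0) {
                    continue; // there is no entry for zero
                }
                // 10..19 are written without a leading "one".
                String tensText = tens == 0 ? "" : (tens == 1 ? "" : DIGITS[tens]) + TEN;
                numbers.put((long) value, tensText + DIGITS[units]);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        Map<Long, String> numbers = upTo99();
        System.out.println(numbers.get(23L)); // 二十三
        System.out.println(numbers.get(30L)); // 三十
    }
}
```

The same map can be passed for year of era, month and day of month; the formatter only looks up the values that are valid for each field.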
Today I learned something about Japanese numerals.
Some links I used
Japanese numerals
Unicode characters for Chinese and Japanese numbers
Late answer, but the accepted answer is somewhat lengthy and not so easy to complete, so I think my proposal is a good and powerful alternative.
Use my lib Time4J which supports Japanese numerals out of the box and then use the embedded Japanese calendar:
String input = "明治二十三年十一月二十九日";
ChronoFormatter<JapaneseCalendar> f =
ChronoFormatter.ofPattern(
"GGGGy年M月d日",
PatternType.CLDR,
Locale.JAPANESE,
JapaneseCalendar.axis()
).with(Attributes.NUMBER_SYSTEM, NumberSystem.JAPANESE);
JapaneseCalendar jcal = f.parse(input);
LocalDate gregorian = jcal.transform(PlainDate.axis()).toTemporalAccessor();
System.out.println(gregorian); // 1890-11-29
This solution is not just shorter but even works for historic Japanese dates before Meiji 6 (based on the old lunisolar calendar in those ancient times). Furthermore, the gannen-notation for the first year of an era (actually we have such a year) is much better supported than in standard java (where you have to apply again a lengthy workaround using a customized map).

Java MM/dd/yy simple date DateTimeFormatter not working for greater than 2037

I found a problem with the MM/dd/yy date format:
If I enter a year of 37 or greater, the year is interpreted as 19xx; for example, 37 becomes 1937.
That is, if I enter 02/05/37 and then print the date to the console, it shows as 02/05/1937.
If the entered date is earlier than 02/05/37, it works fine.
Date startDate = new SimpleDateFormat("dd/MM/yy").parse("01/01/47");
System.out.println(startDate);
Assuming you're using SimpleDateFormat: it conforms to the specification that 02/05/37 is parsed as 02/05/1937. At least for the next year or so...
Java's SimpleDateFormat has to decide in which century your date should be. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. 2037 is more than 20 years after the current date (2016), so it falls outside the window and the parser uses 1937, which is within the 80 years before.
The other answers are correct. You need to understand SimpleDateFormat behavior for assuming your intended century.
You are using old outmoded classes. The new recommended classes have a different behavior on this issue.
java.time
The java.time framework built into Java 8 and later supplants the old java.util.Date & SimpleDateFormat classes.
The behavior about assuming century is different. In the DateTimeFormatter class, a two-digit year is interpreted as being in the 21st century, resulting in a year within the range 2000 to 2099 inclusive.
The java.time classes include LocalDate for representing a date-only value without time-of-day and without time zone.
String input = "02/01/47";
DateTimeFormatter formatter = DateTimeFormatter.ofPattern( "dd/MM/yy" );
LocalDate localDate = LocalDate.parse( input , formatter );
2047-01-02
By the way, a tip: Avoid two-digit years if at all possible. The confusion and trouble induced is not worth the savings of two bytes and a smudge of toner.
If you don't supply century info, then it has to make an assumption and it quite reasonably assumes that you are going to want mostly dates in the past, with some scope for future dates, but not too far, as it's more likely that you'll want prior dates, such as birth dates, etc. And people quite commonly live up to about 80 years of age. So far more dates will be in the past for any given current date, than future ones, based on this assumption.
From the spec...
For parsing with the abbreviated year pattern ("y" or "yy"),
SimpleDateFormat must interpret the abbreviated year relative to some
century. It does this by adjusting dates to be within 80 years before
and 20 years after the time the SimpleDateFormat instance is created.
For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat
instance created on Jan 1, 1997, the string "01/11/12" would be
interpreted as Jan 11, 2012 while the string "05/04/64" would be
interpreted as May 4, 1964. During parsing, only strings consisting of
exactly two digits, as defined by Character.isDigit(char), will be
parsed into the default century. Any other numeric string, such as a
one digit string, a three or more digit string, or a two digit string
that isn't all digits (for example, "-1"), is interpreted literally.
So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan
2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.
Otherwise, calendar system specific forms are applied. For both
formatting and parsing, if the number of pattern letters is 4 or more,
a calendar specific long form is used. Otherwise, a calendar specific
short or abbreviated form is used.
So, if you want to do something about this, you'll need to check whether the parsed date is prior to today's date (or some other cut-off that you choose) and add 100 years to it, if you wish to have only future dates or a cut-off different from the default one.
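With java.time the cut-off can be stated up front instead of patched afterwards, using DateTimeFormatterBuilder.appendValueReduced. A minimal sketch; the class name, the parse helper and the baseYear parameter are assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

class ReducedYearBase {
    // Hypothetical helper: two-digit years resolve into the 100 years
    // starting at baseYear, i.e. baseYear .. baseYear + 99.
    static LocalDate parse(String text, int baseYear) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendPattern("MM/dd/")
                .appendValueReduced(ChronoField.YEAR, 2, 2, baseYear)
                .toFormatter();
        return LocalDate.parse(text, formatter);
    }

    public static void main(String[] args) {
        // With a base of 2016, every parsed year is 2016 or later.
        System.out.println(parse("02/05/37", 2016)); // 2037-02-05
        System.out.println(parse("02/05/16", 2016)); // 2016-02-05
        System.out.println(parse("02/05/15", 2016)); // 2115-02-05
    }
}
```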

Parsing "1/1/00" gives 1/1/0001 for the "mm/dd/yy" format in java. How to workaround?

I have to parse date strings in formats that allow both "yyyy" and "yy" and java supports this case. I just have to add "20" or "19" to the year after calling to SimpleDateFormat.parse():
Initial String    After parse()    After adding year prefix
"1/1/01"    ->    "1/1/0001"  ->   "1/1/2001"
"1/1/96"    ->    "1/1/0096"  ->   "1/1/1996"
This works fine for me, except for the problem of "2000": the string "1/1/00" gives "0001" year instead of "0000". How can I detect that the year is "00", not "01"? Thanks!
To work around the issue with the "00" year, strict parse mode can be used (format.setLenient(false)). In this case the parser won't accept the "00" value and will throw an exception. So, for the year 2000 the user will be forced to enter four digits and will not get the confusing "01" year. Hope it helps someone.
It is explained in the documentation at https://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits, as defined by Character.isDigit(char), will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that isn't all digits (for example, "-1"), is interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.
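Going by the documentation quoted above, a "yy" pattern should cover both input styles without any manual prefixing: exactly two digits get the century window, anything else is taken literally. A sketch to try this out; the class name, the yearOf helper and the fixed window start of 1950 (chosen so the result does not depend on when the code runs) are my own assumptions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class MixedYearWidths {
    // Hypothetical helper: returns the year SimpleDateFormat resolves for M/d/yy input.
    static int yearOf(String text) {
        SimpleDateFormat format = new SimpleDateFormat("M/d/yy", Locale.ROOT);
        format.setLenient(false);
        // Two-digit years land in 1950..2049 for this demo.
        format.set2DigitYearStart(new GregorianCalendar(1950, Calendar.JANUARY, 1).getTime());
        try {
            GregorianCalendar parsed = new GregorianCalendar();
            parsed.setTime(format.parse(text));
            return parsed.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an M/d/yy date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(yearOf("1/1/00"));   // 2000
        System.out.println(yearOf("1/1/96"));   // 1996
        System.out.println(yearOf("1/1/2001")); // 2001
    }
}
```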

How to parse four digit year only (with Joda Time)?

Is there a way to force Joda time to parse dates only when they contain four digit years? For example:
2009-11-11 - should parse
09-11-11 - should not parse
Tried the following code:
DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
DateTimeFormatter formatter = builder.appendYear(4, 4).appendLiteral('-').appendMonthOfYear(1).appendLiteral('-').appendDayOfMonth(1).toFormatter();
formatter.parseDateTime("09-11-11");
Parses into 0009-11-11. Apparently minDigits in the method appendYear are only used for formatting when printing out the date.
The result is the same if I use appendYearOfEra(). If I use appendYearOfCentury(), it parses the year into 1909 instead.
We are implementing a general data parser, which will recognize various types of input. The example is also a shortened form of the real deal (for simplicity). Real-life scenarios parse dates which can have weekdays, months as words, time, zone and different characters separating month, day and year. Therefore, writing a RegEx or checking the content/length of the string can prove rather difficult.
Some real examples could look like this:
2009-11-11
Wednesday 2009-11-11T15:00:00
2009/11/11 15:00
and many more...
DateTimeFormatterBuilder#appendFixedDecimal() may well do what you need.
Alternatively, you could implement the DateTimeParser interface to create whatever parser you want and pass that into the DateTimeFormatterBuilder.
You can check the length of the date string.
You can build extremely specific parsers and formatters using DateTimeFormatterBuilder. There's generally no need to use this class directly, since most common formats are more easily available elsewhere in the API, but this is the builder class they all use under the covers.
What do you want to get from a user who enters '0001-01-01' as the date (that is, they entered 4 digits for the year, but the first three were zeroes)? What about '0999-12-31'? And '999-12-31'? And what about '10000-01-01' - the infamous Y10K problem [1]?
If that is a legitimate value, then you are stuck with discovering the length of what the user typed as the year portion of the date (probably after any other parsing has been done), and making sure it is at least (or is it exactly?) four digits.
If that is not a legitimate value, then you are stuck with checking the year value after the date is parsed.
Or you can take the code and modify it so it includes your preferred definition of valid year.
[1] I do not plan to start working on fixing the Y10K problem before 5000-01-02.
