After asking [How to parse 🎌 Japanese Era Date string values into LocalDate & LocalDateTime], I was curious about the following case;
明治二十三年十一月二十九日
Is there a way to parse Japanese numerals combined with Japanese calendar characters, essentially a purely Japanese date, into a LocalDate, using only the Java date-time API? I don't want to modify the input String values; I just want the API to handle the recognition.
For anyone reading along, your example date string holds an era designator, a year of era of 23 (in this case corresponding to 1890 CE Gregorian), month 11 and day of month 29. Months and days are the same as in the Gregorian calendar.
Since Japanese numbers are not entirely positional (unlike Arabic numerals, for example), a DateTimeFormatter doesn’t parse them on its own. So we help it by supplying how the numbers look in Japanese (and Chinese). DateTimeFormatterBuilder has an overloaded appendText method that accepts a map holding all the possible numbers as text. My code example is not complete, but should get you started.
Locale japaneseJapan = Locale.forLanguageTag("ja-JP");
Map<Long, String> numbers = Map.ofEntries(
Map.entry(1L, "\u4e00"),
Map.entry(2L, "\u4e8c"),
Map.entry(3L, "\u4e09"),
Map.entry(4L, "\u56db"),
Map.entry(5L, "\u4e94"),
Map.entry(6L, "\u516d"),
Map.entry(7L, "\u4e03"),
Map.entry(8L, "\u516b"),
Map.entry(9L, "\u4e5d"),
Map.entry(10L, "\u5341"),
Map.entry(11L, "\u5341\u4e00"),
Map.entry(12L, "\u5341\u4e8c"),
Map.entry(13L, "\u5341\u4e09"),
Map.entry(14L, "\u5341\u56db"),
Map.entry(15L, "\u5341\u4e94"),
Map.entry(16L, "\u5341\u516d"),
Map.entry(17L, "\u5341\u4e03"),
Map.entry(18L, "\u5341\u516b"),
Map.entry(19L, "\u5341\u4e5d"),
Map.entry(20L, "\u4e8c\u5341"),
Map.entry(21L, "\u4e8c\u5341\u4e00"),
Map.entry(22L, "\u4e8c\u5341\u4e8c"),
Map.entry(23L, "\u4e8c\u5341\u4e09"),
Map.entry(24L, "\u4e8c\u5341\u56db"),
Map.entry(25L, "\u4e8c\u5341\u4e94"),
Map.entry(26L, "\u4e8c\u5341\u516d"),
Map.entry(27L, "\u4e8c\u5341\u4e03"),
Map.entry(28L, "\u4e8c\u5341\u516b"),
Map.entry(29L, "\u4e8c\u5341\u4e5d"),
Map.entry(30L, "\u4e09\u5341"));
DateTimeFormatter japaneseformatter = new DateTimeFormatterBuilder()
.appendPattern("GGGG")
.appendText(ChronoField.YEAR_OF_ERA, numbers)
.appendLiteral('\u5e74')
.appendText(ChronoField.MONTH_OF_YEAR, numbers)
.appendLiteral('\u6708')
.appendText(ChronoField.DAY_OF_MONTH, numbers)
.appendLiteral('\u65e5')
.toFormatter(japaneseJapan)
.withChronology(JapaneseChronology.INSTANCE);
String dateString = "明治二十三年十一月二十九日";
System.out.println(dateString + " is parsed into " + LocalDate.parse(dateString, japaneseformatter));
The output from this example is:
明治二十三年十一月二十九日 is parsed into 1890-11-29
Assuming that an era can be longer than 30 years, you need to supply yet more numbers to the map. You can do that a lot better than I can (and can also check my numbers for bugs). It’s probably best (less error-prone) to use a couple of nested loops for filling the map, but I wasn’t sure I could do it correctly, so I am leaving that part to you.
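For what it is worth, the nested-loop idea could be sketched roughly as follows. This is only a sketch under assumptions: the class and method names (JapaneseNumerals, upTo99) are made up for illustration, it covers 1 to 99 only, and it does not handle the gannen notation (元年) for the first year of an era. The resulting map has the same shape as the hand-written one, so it should be usable with appendText in the same way.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of java.time.
final class JapaneseNumerals {
    // Kanji digits 1..9; index 0 is an empty string because no digit is written for zero.
    private static final String[] DIGITS = {
        "", "\u4e00", "\u4e8c", "\u4e09", "\u56db", "\u4e94",
        "\u516d", "\u4e03", "\u516b", "\u4e5d"
    };
    private static final String TEN = "\u5341";

    /** Builds the numeral text for every value from 1 to 99. */
    static Map<Long, String> upTo99() {
        Map<Long, String> numbers = new HashMap<>();
        for (int tens = 0; tens <= 9; tens++) {
            for (int ones = 0; ones <= 9; ones++) {
                int value = tens * 10 + ones;
                if (value == 0) {
                    continue; // zero is not part of the map
                }
                StringBuilder text = new StringBuilder();
                if (tens >= 2) {
                    text.append(DIGITS[tens]); // 20 is "two ten"; 10 is just "ten"
                }
                if (tens >= 1) {
                    text.append(TEN);
                }
                text.append(DIGITS[ones]); // empty for round tens
                numbers.put((long) value, text.toString());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        Map<Long, String> numbers = upTo99();
        System.out.println(numbers.get(23L)); // 二十三
        System.out.println(numbers.get(30L)); // 三十
    }
}
```

Generating the map also avoids the kind of typo that is easy to make when typing 30 entries by hand.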
Today I learned something about Japanese numerals.
Some links I used
Japanese numerals
Unicode characters for Chinese and Japanese numbers
Late answer, but the accepted answer is somewhat lengthy and not so easy to complete, so I think my proposal is a good and powerful alternative.
Use my lib Time4J which supports Japanese numerals out of the box and then use the embedded Japanese calendar:
String input = "明治二十三年十一月二十九日";
ChronoFormatter<JapaneseCalendar> f =
ChronoFormatter.ofPattern(
"GGGGy年M月d日",
PatternType.CLDR,
Locale.JAPANESE,
JapaneseCalendar.axis()
).with(Attributes.NUMBER_SYSTEM, NumberSystem.JAPANESE);
JapaneseCalendar jcal = f.parse(input);
LocalDate gregorian = jcal.transform(PlainDate.axis()).toTemporalAccessor();
System.out.println(gregorian); // 1890-11-29
This solution is not just shorter but even works for historic Japanese dates before Meiji 6 (based on the old lunisolar calendar in those ancient times). Furthermore, the gannen-notation for the first year of an era (actually we have such a year) is much better supported than in standard java (where you have to apply again a lengthy workaround using a customized map).
Related
We have a library where users can pass in dates in multiple formats. They follow the ISO but are abbreviated at times.
So we get things like "19-3-12" and "2019-03-12T13:12:45.1234" where the fractional seconds can be 1 - 7 digits long. It's a very large number of combinations.
DateTimeFormatter.parseBest doesn't work because it won't accept "yy-m-d" for a local date. The solutions here won't work because they assume we know the pattern - we don't.
And telling people to get their string formats "correct" won't work as there's a ton of existing data (these are mostly in XML & JSON files).
My question is, how can I parse strings coming in in these various patterns without having to try 15 different explicit patterns?
Or even better, is there some way to parse a string and it will try everything possible and return a Temporal object if the string makes sense for any date[time]?
Without a full specification it is hard to give a precise recommendation. The techniques generally used for variable formats include:
Trying a number of known formats in turn.
Optional parts in the format pattern.
DateTimeFormatterBuilder.parseDefaulting() for parts that may be absent from the parsed string.
As you are aware, parseBest.
I am assuming that y-M-d always come in this order (never M-d-y or d-M-y, for example). 19-3-12 conflicts with ISO 8601 since the standard requires (at least) 4 digit year and 2 digit month. A challenge with 2-digit year is guessing the century: is this 1919 or 2019 or might it be 2119?
The good news: presence and absence of seconds and varying number of fractional digits are all built-in and pose no problem.
From what you have told us it seems to me that the following is a fair shot.
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendPattern("[uuuu][uu]-M-d")
.optionalStart()
.appendLiteral('T')
.append(DateTimeFormatter.ISO_LOCAL_TIME)
.optionalEnd()
.toFormatter();
TemporalAccessor dt = formatter.parseBest("19-3-12", LocalDateTime::from, LocalDate::from);
System.out.println(dt.getClass());
System.out.println(dt);
Output:
class java.time.LocalDate
2019-03-12
I figure that it should work with the variations of formats that you describe. Let’s just try your other example:
dt = formatter.parseBest( "2019-03-12T13:12:45.1234", LocalDateTime::from, LocalDate::from);
System.out.println(dt.getClass());
System.out.println(dt);
class java.time.LocalDateTime
2019-03-12T13:12:45.123400
To control the interpretation of 2-digit year you may use one of the overloaded variants of DateTimeFormatterBuilder.appendValueReduced(). I recommend that you consider a range check on top of it.
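As a rough illustration of that last point, here is a sketch that handles only the 2-digit-year form. The class name, the method names and the choice of base year are assumptions made for the example; appendValueReduced(field, width, maxWidth, baseValue) maps a 2-digit year into the 100-year window that starts at the base value.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper for illustration only.
final class TwoDigitYearDemo {
    // Parses yy-M-d, placing the year in the window baseYear .. baseYear + 99.
    static LocalDate parse(String text, int baseYear) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendValueReduced(ChronoField.YEAR, 2, 2, baseYear)
                .appendPattern("-M-d")
                .toFormatter();
        return LocalDate.parse(text, formatter);
    }

    // Same as parse, plus a range check against bounds supplied by the caller.
    static LocalDate parseChecked(String text, int baseYear, LocalDate earliest, LocalDate latest) {
        LocalDate date = parse(text, baseYear);
        if (date.isBefore(earliest) || date.isAfter(latest)) {
            throw new IllegalArgumentException("Date out of expected range: " + date);
        }
        return date;
    }

    public static void main(String[] args) {
        System.out.println(parse("19-3-12", 1950)); // 2019-03-12
        System.out.println(parse("70-1-12", 1950)); // 1970-01-12
        System.out.println(parse("70-1-12", 2000)); // 2070-01-12
    }
}
```

Which base year is right depends entirely on the data, which is why the range check is worth having.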
Trying all the possible formats would perform worse than trying only 15.
You can try to "normalize" to a single format but then you would be doing the work those 15 formats are supposed to do.
I think the best approach is the one described by @JB Nizet, to try only patterns that match string length.
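// Uses LocalDate, LocalDateTime (java.time), DateTimeFormatter, DateTimeParseException
// (java.time.format) and TemporalAccessor (java.time.temporal).
public static TemporalAccessor parse(String openFormat) {
    // Select only the patterns that can match a string of this length.
    String[] formats;
    switch (openFormat.length()) {
        case 24: // 2019-03-12T13:12:45.1234
            formats = new String[] {"uuuu-MM-dd'T'HH:mm:ss.SSSS"}; // all the formats for length 24
            break;
        // ... one case per supported length
        case 10: // 2019-03-12, 12-03-2019
            formats = new String[] {"uuuu-MM-dd", "dd-MM-uuuu"}; // all the formats for length 10
            break;
        default:
            formats = new String[0];
    }
    // Now try the reduced number of formats, possibly only 1 or 2.
    for (String format : formats) {
        try {
            return DateTimeFormatter.ofPattern(format)
                    .parseBest(openFormat, LocalDateTime::from, LocalDate::from);
        } catch (DateTimeParseException e) {
            // This pattern did not match; try the next one.
        }
    }
    throw new IllegalArgumentException("Unparseable date: " + openFormat);
}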
I have the following String representing a date range which I need to parse:
2018-10-20:2019-10-20
It consists of 2 ISO date strings separated by :
The string can get more complex by having repeated date ranges mixed with other text. This can be done by a Regex.
However, given that the latest Java has Date/Time support that most coders here and elsewhere are ecstatic about, is it possible to use, say, LocalDate's parser or a custom DateTimeFormatter in order to identify the bits in my String which are candidates for ISO-date and capture them?
Better yet, how can I extract the validation regex from a DateTimeFormatter (the regex which identifies an ISO-date, assuming there is one) and merge/compile it with my own regex for the rest of the String.
I just do not feel comfortable coding yet another ISO-date regex in my code when possibly there is already a regex in Java which does that and I just re-use it.
Please note that I am not asking for a regex. I can do that.
Please also note that my example String can contain other date/time formats, e.g. with timezones and milliseconds and all the whistles.
Actually, DateTimeFormatter doesn't have an internal regex. It uses a CompositePrinterParser, which in turn uses an array of DateTimePrinterParser instances (which is an inner interface of DateTimeFormatterBuilder), where each instance is responsible for parsing/formatting a specific field.
IMO, regex is not the best approach here. If you know that all dates are separated by :, why not simply split the string and try to parse the parts individually? Something like this:
String dates = "2018-10-20:2019-10-20"; // big string with dates separated by :
DateTimeFormatter parser = DateTimeFormatter.ISO_LOCAL_DATE; // or a formatter for your own patterns
for (String s : dates.split(":")) {
parser.parse(s); // if "s" is invalid, it throws exception
}
If you just want to validate the strings, calling parse as above is enough - it'll throw an exception if the string is invalid.
To support multiple formats, you can use DateTimeFormatterBuilder::appendOptional. Example:
DateTimeFormatter parser = new DateTimeFormatterBuilder()
// full ISO8601 with date/time and UTC offset (ex: 2011-12-03T10:15:30+01:00)
.appendOptional(DateTimeFormatter.ISO_OFFSET_DATE_TIME)
// date/time without UTC offset (ex: 2011-12-03T10:15:30)
.appendOptional(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
// just date (ex: 2011-12-03)
.appendOptional(DateTimeFormatter.ISO_LOCAL_DATE)
// some custom format (day/month/year)
.appendOptional(DateTimeFormatter.ofPattern("dd/MM/yyyy"))
// ... add as many you need
// create formatter
.toFormatter();
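One possible way to use such a formatter, sketched under assumptions: the class and method names are made up, and parseBest is asked for the richest type first, so a string with an offset becomes an OffsetDateTime, one with only date and time a LocalDateTime, and a plain date a LocalDate.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.TemporalAccessor;

// Hypothetical wrapper for illustration only.
final class MultiFormatParser {
    private static final DateTimeFormatter PARSER = new DateTimeFormatterBuilder()
            .appendOptional(DateTimeFormatter.ISO_OFFSET_DATE_TIME)
            .appendOptional(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
            .appendOptional(DateTimeFormatter.ISO_LOCAL_DATE)
            .appendOptional(DateTimeFormatter.ofPattern("dd/MM/yyyy"))
            .toFormatter();

    // Tries the target types in order and returns the first one the parsed fields can build.
    static TemporalAccessor parse(String text) {
        return PARSER.parseBest(text, OffsetDateTime::from, LocalDateTime::from, LocalDate::from);
    }

    public static void main(String[] args) {
        String[] samples = {
            "2011-12-03T10:15:30+01:00", "2011-12-03T10:15:30", "2011-12-03", "03/12/2011"
        };
        for (String s : samples) {
            TemporalAccessor parsed = parse(s);
            System.out.println(parsed.getClass().getSimpleName() + " " + parsed);
        }
    }
}
```

Note that the optional sections are tried in the order they were appended, so the most specific format should come first.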
A regex that supports multiple formats (as you said, "other date/time formats, e.g. with timezones and milliseconds and all the whistles") is possible, but a regex is a poor tool for validating dates: it would have to deal with day zero, day 31 being valid only in some months, February 29th in non-leap years, minutes above 59, and so on.
A DateTimeFormatter will validate all these tricky details, while a regex will only guarantee that you have numbers and separators in the correct position and it won't validate the values. So regardless of the regex, you'll have to parse the dates anyway (which, IMHO, makes the use of regex pretty useless in this case).
Regex + Date Parser is the right option.
You have to write regex yourself, since the date parser is not using regex.
It is your choice whether the regex is simple, e.g. \d{2} for the month, leaving the date parser to validate the number range, or stricter, e.g. (?:0[1-9]|1[0-2]) (01 - 12). Range checks like 28 vs 30 vs 31 days should not be done in regex; let the date parser handle that. And since the date parser handles some value ranges anyway, you might as well let it handle them all, i.e. a simple regex is perfectly fine.
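A minimal sketch of that combination, assuming only plain ISO dates need to be found (the class name, method name and the regex itself are illustrative): the regex locates candidates by shape, and LocalDate.parse decides whether each one is a real date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class DateExtractor {
    // Deliberately loose: only the shape is checked here, value ranges are left to the parser.
    private static final Pattern CANDIDATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    static List<LocalDate> extract(String text) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher matcher = CANDIDATE.matcher(text);
        while (matcher.find()) {
            try {
                dates.add(LocalDate.parse(matcher.group())); // uses ISO_LOCAL_DATE
            } catch (DateTimeParseException e) {
                // Right shape but impossible value (for example 2019-02-30): skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        // Prints [2018-10-20, 2019-10-20]; the impossible date is dropped.
        System.out.println(extract("range 2018-10-20:2019-10-20, bogus 2019-02-30"));
    }
}
```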
I am trying to parse a date into an appropriate format, but I keep getting the error
Unparseable date
Can anyone tell me what the mistake is?
try {
System.out.println(new SimpleDateFormat("d-MMM-Y").parse("05-03-2018").toString());
} catch (ParseException e) {
e.printStackTrace();
}
I want the date to have this format:
05-Mar-18
Since you want to change the format, first parse the date String, using a pattern that matches your input, into a Date object. Then format that Date object into the desired format using a second SimpleDateFormat.
The error in your code is with the MMM and the Y. MMM is the month as text, while your input has a numeric month. In addition, uppercase Y in SimpleDateFormat is the week year, not the calendar year; use lowercase y instead (yy for a two-digit year).
Here is code that fixes your problem.
SimpleDateFormat dateFormat = new SimpleDateFormat("d-MM-yyyy");
Date date = dateFormat.parse("05-03-2018");
dateFormat = new SimpleDateFormat("dd-MMM-yy");
System.out.println(dateFormat.format(date));
I hope this is what you're looking for.
There are some concepts about dates you should be aware of.
There's a difference between a date and a text that represents a date.
Example: today's date is March 9th 2018. That date is just a concept, an idea of "a specific point in our calendar system".
The same date, though, can be represented in many formats. It can be "graphical", in the form of a circle around a number in a piece of paper with lots of other numbers in some specific order, or it can be in plain text, such as:
09/03/2018 (day/month/year)
03/09/2018 (month/day/year)
2018-03-09 (ISO8601 format)
March, 9th 2018
9 de março de 2018 (in Portuguese)
2018年3月9日 (in Japanese)
and so on...
Note that the text representations are different, but all of them represent the same date (the same value).
With that in mind, let's see how Java works with these concepts.
a text is represented by a String. This class contains a sequence of characters, nothing more. These characters can represent anything; in this case, it's a date
a date was initially represented by java.util.Date, and then by java.util.Calendar, but those classes are full of problems and you should avoid them if possible. Today we have a better API for that.
With the java.time API (or the respective backport for versions lower than 8), you have easier and more reliable tools to deal with dates.
In your case, you have a String (a text representing a date) and you want to convert it to another format. You must do it in 2 steps:
convert the String to some date-type (transform the text to numerical day/month/year values) - that's called parsing
convert this date-type value to some format (transform the numerical values to text in a specific format) - that's called formatting
For step 1, you can use a LocalDate, a type that represents a date (day, month and year, without hours and without timezone), because that's what your input is:
String input = "05-03-2018";
DateTimeFormatter inputParser = DateTimeFormatter.ofPattern("dd-MM-yyyy");
// parse the input
LocalDate date = LocalDate.parse(input, inputParser);
That's more reliable than SimpleDateFormat because it solves lots of strange bugs and problems of the old API.
Now that we have our LocalDate object, we can do step 2:
// convert to another format
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.ENGLISH);
String output = date.format(formatter);
Note that I used a java.util.Locale. That's because the output you want has a month name in English, and if you don't specify a locale, it'll use the JVM's default (and who guarantees it'll always be English? it's better to tell the API which language you're using instead of relying on the default configs, because those can be changed anytime, even by other applications running in the same JVM).
And how do I know which letters must be used in DateTimeFormatter? Well, I've just read the javadoc. Many developers ignore the documentation, but we should make a habit of checking it, especially the javadoc, which tells you things like the difference between uppercase Y and lowercase y in SimpleDateFormat.
System.out.printf("Time: %d-%d %02d:%02d",
calendar.get(Calendar.DAY_OF_MONTH),
calendar.get(Calendar.MONTH),
calendar.get(Calendar.HOUR_OF_DAY),
calendar.get(Calendar.MINUTE));
That is the code a friend showed me, but how do I get the date to appear in a Format like November 1?
This is how to do it:
DateFormat dateFormat = new SimpleDateFormat( "MMMMM d" );
Calendar calendar = new GregorianCalendar(); // The date you want to format
Date dateToFormat = calendar.getTime();
String formattedDate = dateFormat.format( dateToFormat );
System.out.println( formattedDate );
Date d = new Date();
System.out.printf("%s %tB %<td", "Today", d);
// output :
// Today november 01
%tB for Locale-specific full month name, e.g. "January", "February".
%<td for the day of month, formatted as two digits with a leading zero as necessary; the < reuses the argument of the previous specifier.
The DateFormat answer is the way to do this. The printf answer is also good although does not provide locale-specific formats (it provides language-specific names but does not use e.g. the day/month/year ordering that the current locale uses).
You asked in a comment:
Can I do it with the calendar.get(Calendar.MONTH) etc method? Or do I have to use date format?
You don't have to use the other methods here, but if you want to use the Calendar fields, it is up to you to convert the numeric values they provide to strings like "Tuesday" or "November". For that you can use the built-in DateFormatSymbols, which provides internationalized strings for dates in the form of String arrays that you can index into with the Calendar fields. See How can I convert an Integer to localized month name in Java? for example.
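A small sketch of the DateFormatSymbols route, with a made-up class and method name; the locale is passed explicitly so the result does not depend on the machine's default.

```java
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class MonthNameDemo {
    static String monthAndDay(Calendar calendar, Locale locale) {
        String[] months = new DateFormatSymbols(locale).getMonths();
        // Calendar.MONTH is zero-based (January is 0), which matches the array index.
        return months[calendar.get(Calendar.MONTH)] + " " + calendar.get(Calendar.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        Calendar calendar = new GregorianCalendar(2011, Calendar.NOVEMBER, 1);
        System.out.println(monthAndDay(calendar, Locale.ENGLISH)); // November 1
    }
}
```

The field ordering ("month day") is still hard-coded here, which is exactly the part a DateFormat would otherwise take care of.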
Note you can use DateFormat.getDateInstance() to retrieve a pre-made format for the current locale (see the rest of those docs, there are also methods for getting pre-made time-only or date+time formats).
Basically you have the following options:
DateFormat (SimpleDateFormat for custom formats)
Locale-specific format (e.g. day/month/year ordering): Yes
Language-specific names (e.g. English "November" vs. Spanish "Noviembre"): Yes
Does the work for you: Yes. This is the best way and will provide a format that the user is used to working with, with no logic needed on your end.
printf date fields
Locale-specific format: No
Language-specific names: Yes
Does the work for you: Partly (up to you to determine field ordering)
Calendar fields with DateFormatSymbols
Locale-specific format: No
Language-specific names: Yes
Does the work for you: No
Calendar fields with your own string conversions (like a big switch statement):
Locale-specific format: No
Language-specific names: No
Does the work for you: No
Another advantage of DateFormat-based formats vs printf date fields is you can still define your own field ordering and formats with the SimpleDateFormat (just like printf) but you can stick to the DateFormat interface which makes it easier to pass around and combine with stock date formats like DateFormat.getDateInstance(DateFormat.MEDIUM).
Check out the documentation for DateFormat for info on the things you can do with it. Check out the documentation for SimpleDateFormat for info on creating custom date formats. Check out this nice example of date formats (archive) for some example output if you want instant gratification.
There's a direct way how to do it using printf, but it's a pain, too:
String.format("Time: %1$td-%1$tm %1$tH:%1$tM", new Date());
One problem with it is that it uses 4 format specifiers with the same object, so each needs the 1$ prefix to always access the first argument. The other is that I can never remember what letter means what (but maybe that's just me).
Speed could actually be another problem, if you care.
This is documented in the underlying class Formatter.
My preferred way would be something like
myFormatter.format("Time: [d-m HH:MM]", new Date())
where the brackets would save us from repeating 1$ and make clear where the argument ends.
Recently, after a library upgrade of Apache POI, I upgraded some other APIs as well. The other library I used reads all cell contents as String, and I then had to parse this string into a Date.
The problem occurred when users started entering dates as dd-mm-yy: the year appeared as 00yy AD.
As per documentation of SimpleDateFormat
For parsing, if the number of pattern letters is more than 2, the year
is interpreted literally, regardless of the number of digits. So using
the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.
So the question is: is it better to enter a four-digit year than a two-digit year?
The other question is: what is the best way to predict the year if it is in two-digit format?
The issue comes up when parsing years like the ones below:
Bond Start Date : 12-Jan-98 (1998)
Bond End Date : 12-Jan-70 (2070)
Regards,
Hanumant
It is not clear what you are asking.
If you are asking how to specify a date format that accepts 2 digit years (only) and interprets them conventionally, then you should use "dd-MM-yy" (uppercase MM is the month; lowercase mm means minutes in SimpleDateFormat).
If you are asking how to specify a date format that accepts 2 digit years and interprets them conventionally, AND ALSO handles 4 (or more) digit years, then you can't. As the javadoc says, if you use "dd-MM-yyyy", 2 digit years are interpreted as years in the first century AD.
One possible solution is to use TWO formats. First attempt to parse using "dd-MM-yy", and if that fails, try "dd-MM-yyyy".
But this is a hack ... and problematic if the user might actually need to enter a historical date.
If you are asking what you should do, then I'd recommend moving away from ambiguous ad-hoc formats that force you to (effectively) guess what the user means.
If the user has to enter dates / times in a character-based form, require them to use one of the ISO 8601 formats, and be strict when parsing the user-supplied date/time strings.
Otherwise, provide the user with a date picker widget.
The other question is: what is the best way to predict the year if it is in two-digit format?
Well this is the nub of the problem isn't it! In the 20th century, we all knew what a 2 digit year meant. You just slapped 19 on the front. (Ah ... those were the good old days!)
Now it is necessary to use a different heuristic. And the heuristic that SimpleDateFormat uses is described by the javadoc thus:
"For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964."
The heuristic is 80 years before to 20 years after "now". So actually 12-Jan-98 is in 1998 and 12-Jan-70 is in 1970 ... if you parse using a SimpleDateFormat with a "yy" format.
If you need the dates to mean something else, then you will need to use a different date parser. For example, if you use the Joda-time libraries, it is possible to specify the "pivot year"; i.e. the middle year of the century in which 2-digit years fall.
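For completeness, SimpleDateFormat itself also lets you move the 100-year window with set2DigitYearStart, which may be enough if switching libraries is not an option. The sketch below is only an illustration: the class name, method name and the choice of 1971 as the window start are assumptions, picked so that 98 lands in 1998 and 70 lands in 2070 as in the question.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class PivotYearDemo {
    // Parses dd-MMM-yy and returns the year, with 2-digit years mapped into
    // the 100 years starting on January 1 of windowStartYear.
    static int parseYear(String text, int windowStartYear) {
        SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH);
        format.set2DigitYearStart(
                new GregorianCalendar(windowStartYear, Calendar.JANUARY, 1).getTime());
        try {
            Calendar parsed = Calendar.getInstance();
            parsed.setTime(format.parse(text));
            return parsed.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseYear("12-Jan-98", 1971)); // 1998
        System.out.println(parseYear("12-Jan-70", 1971)); // 2070
    }
}
```

Whatever window you choose is still a guess about the data, so the earlier advice about requiring 4-digit years where possible still applies.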
Reference:
Joda-time Freaky Formatters