I want to parse dates in Java, but the format is not known in advance and could be one of many (any ISO-8601 format, which is already a lot, a Unix timestamp in any unit, and more).
Here are some samples:
1970-01-01T00:00:00.00Z
1234567890
1234567890000
1234567890000000
2021-09-20T17:27:00.000Z+02:00
Perfect parsing might be impossible because of ambiguous cases, but a solution that parses most common dates with some logic might be achievable (for example, timestamps are interpreted as seconds / milli / micro / nano so as to give a date close to the 2000 era, and dates like '08/07/2021' could have a default for the month and day distinction).
I didn't find any easy way to do it in Java, while in Python it is kind of possible (not working on all my samples, but at least some of them) using infer_datetime_format of the pandas function to_datetime (https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html).
Is there an easy approach in Java?
Well, first of all, I agree with rzwitserloot here that date parsing in free format is extremely difficult and full of ambiguities. So you are skating on thin ice and will eventually run into trouble if you just assume that a user input will be correctly parsed the way you think it will.
Nevertheless, we could make it work if I assume either of the following:
You simply don't care if it will be parsed incorrectly; or
You are doing this for fun or for learning purposes; or
You have a banner, saying:
If the parsing goes wrong, it's your fault. Don't blame us.
Anyway, the DateTimeFormatterBuilder can build a DateTimeFormatter that parses a lot of different patterns. Since a formatter supports optional parsing, it can be instructed to try to parse a certain value, or skip that part if no valid value is found.
For instance, this builder is able to parse a fairly wide range of ISO-like dates, with many optional parts:
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-M-d")
    .optionalStart()                       // optional time part
        .optionalStart().appendLiteral(' ').optionalEnd()
        .optionalStart().appendLiteral('T').optionalEnd()
        .appendValue(ChronoField.HOUR_OF_DAY)
        .optionalStart()                   // optional minutes
            .appendLiteral(':')
            .appendValue(ChronoField.MINUTE_OF_HOUR)
            .optionalStart()               // optional seconds
                .appendLiteral(':')
                .appendValue(ChronoField.SECOND_OF_MINUTE)
                .optionalStart()           // optional fraction of a second
                    .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
                .optionalEnd()
            .optionalEnd()
        .optionalEnd()
        .appendPattern("[XXXXX][XXXX][XXX][XX][X]")   // optional offset in several notations
    .optionalEnd();
DateTimeFormatter formatter = builder.toFormatter(Locale.ROOT);
All of the strings below can be successfully parsed by this formatter.
Stream.of(
        "2021-09-28",
        "2021-07-04T14",
        "2021-07-04T14:06",
        "2001-09-11 00:00:15",
        "1970-01-01T00:00:15.446-08:00",
        "2021-07-04T14:06:15.2017323Z",
        "2021-09-20T17:27:00.000+02:00"
).forEach(testcase -> System.out.println(formatter.parse(testcase)));
As you can see, optionalStart() and optionalEnd() let you define optional portions of the format.
There are many more patterns you probably want to parse. You could add those patterns to the builder above. Alternatively, the appendOptional(DateTimeFormatter) method can be used to include several alternative formatters.
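As a minimal sketch of the appendOptional route, the formatter below chains three alternative date patterns. The class name OptionalFormats and the choice of patterns are assumptions made for illustration, and the day-first reading of slash-separated dates is a guess that will be wrong for some inputs.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical helper: combines several alternative patterns in one formatter.
final class OptionalFormats {
    // Each optional section is attempted in order; a section that does not
    // match is skipped without consuming any input.
    static final DateTimeFormatter DATE = new DateTimeFormatterBuilder()
            .appendOptional(DateTimeFormatter.ofPattern("uuuu-M-d"))
            .appendOptional(DateTimeFormatter.ofPattern("d/M/uuuu")) // assumption: day first
            .appendOptional(DateTimeFormatter.ofPattern("d.M.uuuu"))
            .toFormatter(Locale.ROOT);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, DATE);
    }

    public static void main(String[] args) {
        System.out.println(parse("2021-09-28"));
        System.out.println(parse("08/07/2021"));
        System.out.println(parse("28.09.2021"));
    }
}
```

Because every section is optional, the order of the appendOptional calls is what decides between competing readings, so it is worth keeping the list short and deliberate.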
Perfect parsing might be impossible because of ambiguous cases, but a solution that parses most common dates with some logic might be achievable
Sure, and such wide-ranging guesswork should most definitely not be part of a standard java.* API. I think you're also wildly underestimating the ambiguity. 1234567890? It's just flat out incorrect to say that this can reasonably be parsed.
You are running into many, many problems here:
Java in general prefers throwing an error instead of guessing. This is inherent in the language: Java has few optional syntax constructs; semicolons aren't optional, () for method invocations are not optional, and Java intentionally does not have 'truthy/falsy', i.e. if (foo) is only valid if foo is an expression of the boolean type, unlike e.g. Python, where you can stick anything in there and there's a big list of what counts as falsy, with the rest being considered truthy. When in Rome, do as the Romans do: if this tenet annoys you, well, either learn to love it, begrudgingly accept it, or program in another language. This idea is endemic in the entire ecosystem. For what it is worth, given that debugging tends to take far longer than typing the optional constructs, Java is objectively correct or at least making rational decisions for being like this.
Either you can't bring in the notion that 'hey, this number is larger than 12, therefore it cannot possibly be the month', or you have to accept that whether a certain date format parses properly depends on whether the day-of-month value is above or below 12. I would strongly advocate that you avoid a library that fails this rule like the plague. What possible point is there, in the end? "My app will parse your date correctly, but only for about 3/5ths of all dates?" So, given that you can't/should not take that into account, 1234567890, is that seconds-since-1970? Milliseconds-since-1970? Is that the 12th of the 34th month of the year 5678, the 90th hour, and assumed zeroes for minutes, seconds, and millis? If a library guesses, that library is wrong, because you should not guess unless you're 95%+ sure.
The obvious and perennial "do not guess" example is, of course, 101112. Is that November 10th, 2012 (European style)? Is that October 11th, 2012 (American style)? Or is that November 12th, 2010 (ISO style)? These are all reasonable guesses and therefore guessing is just wrong here. Do. Not. Guess. Unless you're really sure. Given that this is a somewhat common way to enter dates: guessing at all costs is objectively silly (see above), and guessing only when it's pretty clear and erroring out otherwise is mostly useless, given that ambiguity is so easy to introduce.
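To make the 101112 case concrete, here is a small sketch that parses the same six digits with three explicit patterns. The class name AmbiguousDate is made up for illustration; the two-digit year relies on java.time's default of mapping "uu" into the years 2000 to 2099.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo: one input, three equally plausible readings.
final class AmbiguousDate {
    static LocalDate parse(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
    }

    public static void main(String[] args) {
        String input = "101112";
        System.out.println(parse(input, "ddMMuu")); // day-month-year reading
        System.out.println(parse(input, "MMdduu")); // month-day-year reading
        System.out.println(parse(input, "uuMMdd")); // year-month-day reading
    }
}
```

All three calls succeed and return different dates, which is exactly why the input string alone cannot settle the question.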
The concept of guessing may be defensible, but only with a lot more information. For example, if you give me the input '101112100000', there's no way it's correct to guess here. But if you also tell me that a human entered this input, and that human is clearly clued into, say, the German locale, then I can see the need to be able to turn that into '10th of November 2012, 10 o'clock in the morning': interpreting it as seconds or millis since some epoch is precluded by the human factor, and the day-month-year order is settled by the locale.
You asked:
Is there an easy approach in Java?
This entire question is incorrect. The in Java part needs to be stripped from this question, and then the answer is a simple: No. There is no simple way to parse strings into date/times without a lot more information than just the input string. If another library says they can do that, they are lying, or at least, operating under a list of cultural and source assumptions as long as my leg, and you should not be using that library.
I don't know of any standard library with this functionality, but you can always use the DateTimeFormatter class and guess the format by looping over a list of predefined formats, or by using the ones provided by this class.
This is a typical approximation of what you want to achieve.
Here you can see an old implementation: https://balusc.omnifaces.org/2007/09/dateutil.html
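A rough sketch of that loop could look like the following. Everything here is an assumption chosen for illustration: the class name GuessingParser, the candidate patterns, the rule that 9 to 18 plain digits mean an epoch value, the magnitude thresholds used to pick the unit, and the choice of UTC for dates without a time.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical best-effort parser; the order of the candidates encodes the guess.
final class GuessingParser {
    private static final List<DateTimeFormatter> DATE_ONLY = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,
            DateTimeFormatter.ofPattern("dd/MM/uuuu", Locale.ROOT),  // assumption: day first
            DateTimeFormatter.ofPattern("dd.MM.uuuu", Locale.ROOT));

    static Optional<Instant> parse(String text) {
        String s = text.trim();
        if (s.matches("\\d{9,18}")) {                 // plain digits: treat as epoch value
            return Optional.of(fromEpoch(Long.parseLong(s)));
        }
        try {                                         // ISO-8601 date-time with offset
            return Optional.of(OffsetDateTime.parse(s).toInstant());
        } catch (DateTimeParseException ignored) {
            // fall through to the date-only candidates
        }
        for (DateTimeFormatter f : DATE_ONLY) {
            try {
                return Optional.of(LocalDate.parse(s, f).atStartOfDay(ZoneOffset.UTC).toInstant());
            } catch (DateTimeParseException ignored) {
                // try the next candidate
            }
        }
        return Optional.empty();                      // refuse rather than guess further
    }

    // Picks the unit by magnitude: below 1e11 seconds, below 1e14 millis,
    // below 1e17 micros, otherwise nanos. Early-1970s millis would be misread.
    static Instant fromEpoch(long value) {
        if (value < 100_000_000_000L) return Instant.ofEpochSecond(value);
        if (value < 100_000_000_000_000L) return Instant.ofEpochMilli(value);
        if (value < 100_000_000_000_000_000L) {
            return Instant.ofEpochSecond(value / 1_000_000, (value % 1_000_000) * 1_000);
        }
        return Instant.ofEpochSecond(value / 1_000_000_000, value % 1_000_000_000);
    }
}
```

With these assumptions, 1234567890, 1234567890000 and 1234567890000000 all resolve to the same instant, while a malformed input such as "2021-09-20T17:27:00.000Z+02:00" yields an empty result instead of a guess.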
FTA (https://github.com/tsegall/fta) is designed to solve exactly this problem (among others). It currently parses thousands of formats and does not do it via a predefined set, so typically runs extremely quickly. In this example we explicitly set the DateResolutionMode, however, it will default to something intelligent based on the Locale. Here is an example:
import java.util.Locale;

import com.cobber.fta.dates.DateTimeParser;
import com.cobber.fta.dates.DateTimeParser.DateResolutionMode;

public abstract class Simple {
    public static void main(final String[] args) {
        final String[] samples = { "1970-01-01T00:00:00.00Z", "2021-09-20T17:27:00.000Z+02:00", "08/07/2021" };
        final DateTimeParser dtp = new DateTimeParser().withDateResolutionMode(DateResolutionMode.MonthFirst).withLocale(Locale.ENGLISH);
        for (final String sample : samples)
            System.err.printf("Format is: '%s'%n", dtp.determineFormatString(sample));
    }
}
Which will give the following output:
Format is: 'yyyy-MM-dd'T'HH:mm:ss.SSX'
Format is: 'yyyy-MM-dd'T'HH:mm:ss.SSSX'
Format is: 'MM/dd/yyyy'
I am aware of
NumberFormat nf = NumberFormat.getInstance(Locale.getDefault());
But I want all the numbers shown in my app to be formatted according to the locale, thus I don't think it will be a good way to format them one by one using the above method.
So is there some global setting/variable/configuration that I have to change in order to do that?
Locale-aware formatting requires more than just translating e.g. month names from one language to another. In Java that's handled by separate classes apart from the ones that actually hold the values, e.g. NumberFormat, DateFormat. So there's no way around using them like you already do.
What you could try is to create some wrappers or convenience methods (like formatDate(Date)) to simplify things for you. Also put format strings into Android Resources (res/values).
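A minimal sketch of such convenience methods, assuming a small utility class of your own (the name Formats and its method names are made up here), could look like this:

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical utility class: one place that decides how numbers are shown.
final class Formats {
    private Formats() {
    }

    // Formats a whole number for the given locale.
    static String number(long value, Locale locale) {
        return NumberFormat.getIntegerInstance(locale).format(value);
    }

    // Formats a fractional number for the given locale.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Convenience overload that uses the current default formatting locale.
    static String number(long value) {
        return number(value, Locale.getDefault(Locale.Category.FORMAT));
    }

    public static void main(String[] args) {
        System.out.println(number(1234567L, Locale.US));
        System.out.println(number(1234567L, Locale.GERMANY));
        System.out.println(number(1234567L));
    }
}
```

Routing every displayed number through a class like this does not make formatting global, but it keeps the locale decision in one place.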
I'm new to coding in Android Studio, and when I set an integer in a text view like this:
textview.setText(String.format("%d",1));
This code gives me a warning:
Implicitly using the default locale is a common source of bugs: Use String.format (Locale,...)
What is the correct code to put an integer in a .setText?
I found other questions on Stack Overflow, but they don't apply to this.
What is the correct code to put an integer in a .setText?
You simply need to convert your int to a String; you can use Integer.toString(int) for this purpose.
Your code should then be:
textview.setText(Integer.toString(myInt));
If you want to set a fixed value simply use the corresponding String literal.
So here your code could simply be:
textview.setText("1");
You get this warning because String.format(String format, Object... args) uses the default locale of your Java Virtual Machine instance, so the result can change from one device to another whenever the format string contains a locale-dependent conversion.
For example, if you simply add a comma to your format to include grouping characters, the result becomes locale dependent, as you can see in this example:
System.out.println(String.format(Locale.FRANCE, "10000 for FR is %,d", 10_000));
System.out.println(String.format(Locale.US, "10000 for US is %,d", 10_000));
Output:
10000 for FR is 10 000
10000 for US is 10,000
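If you do want to keep String.format, passing the locale explicitly is what the warning asks for. The sketch below is only one way to split the two cases; the class and method names are made up, and using Locale.ROOT for locale-neutral output is an assumption about what you want to show.

```java
import java.util.Locale;

// Hypothetical helpers showing the two intents behind the lint warning.
final class LocaleAwareFormat {
    // Locale-neutral output, e.g. for identifiers or values parsed by code.
    static String neutral(int value) {
        return String.format(Locale.ROOT, "%d", value);
    }

    // Output meant for a person, with grouping for the given locale.
    static String forDisplay(int value, Locale locale) {
        return String.format(locale, "%,d", value);
    }

    public static void main(String[] args) {
        System.out.println(neutral(10_000));
        System.out.println(forDisplay(10_000, Locale.US));
        System.out.println(forDisplay(10_000, Locale.getDefault(Locale.Category.FORMAT)));
    }
}
```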
I'm trying to get the number format according to current locale but I have a problem with the currency symbol.
This is my method:
import java.util.Locale;
import java.text.NumberFormat;
public void i18nCurrency(Locale currentLocale) {
Double price = 9876543.21;
NumberFormat currencyFormatter =
NumberFormat.getCurrencyInstance(currentLocale);
System.out.println(currencyFormatter.format(price));
}
It prints: ¤ 9 876 543,21 for uk and ¤9.876.543,21 for german. The number format is correct, but I need to get the currency symbol as well. Why can't I get the symbol?
The symbol you're getting is a universal currency placeholder. It is displayed when currency is unknown.
You probably wonder why it is unknown. Well, from your description you simply called the method passing something like Locale.GERMAN. If you did, there is no way of knowing what currency to use, because:
The Euro is the currency of Germany and Austria
The Swiss Franc (SFr.) is the currency of Switzerland
Each of these countries has German as at least one of their official languages. In order to resolve the problem, you always need to pass a country, i.e. call the method with Locale.GERMANY as a parameter.
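The difference between a language-only locale and a locale with a country can be checked with java.util.Currency, which refuses to pick a currency when no country is present. This is only a sketch; the class name LocaleCurrency and the Optional-returning helper are assumptions made for illustration.

```java
import java.util.Currency;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: resolves a currency code only when the locale names a country.
final class LocaleCurrency {
    static Optional<String> currencyCode(Locale locale) {
        try {
            // May return null for territories that have no currency.
            Currency currency = Currency.getInstance(locale);
            return Optional.ofNullable(currency).map(Currency::getCurrencyCode);
        } catch (IllegalArgumentException e) {
            // Thrown when the locale has no supported country, e.g. Locale.GERMAN.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(currencyCode(Locale.GERMAN));                 // language only
        System.out.println(currencyCode(Locale.GERMANY));                // German in Germany
        System.out.println(currencyCode(Locale.forLanguageTag("de-CH"))); // German in Switzerland
    }
}
```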
Now, the harder part. It is all fairly easy when you are working with a desktop application. All you have to do is detect the current OS locale like this:
Locale currentLocale = Locale.getDefault(Locale.Category.FORMAT);
However, this method won't work with web applications, and I suspect that is your case. The Locale that web browsers give you might not be suitable for formatting currencies, as it may lack information about the country.
The recommended way to solve this issue is to create a user profile and ask users to select the Locale (separately for UI translations and formatting purposes).
I still have to point out one important thing, because I don't want you to run into problems. When you have some monetary value in your application (usually it should be an instance of BigDecimal class, as double is not suitable for this purpose), it represents some value in a specific currency. Be it Euro, British Pound, or a Dollar, but the value is specific. It doesn't really make sense to format this value for specific country currency, as you should first change the amount (I believe you understand why).
What you probably need instead, is overriding the currency symbol or currency code to match your currency. The format and the symbol placement should obviously stay intact.
Please consider this example:
Currency dollar = Currency.getInstance("USD");
NumberFormat fmt = NumberFormat.getCurrencyInstance(Locale.GERMANY); //this gets € as currency symbol
BigDecimal monetaryAmount = BigDecimal.valueOf(12.34d);
String originalEuros = fmt.format(monetaryAmount);
System.out.println(originalEuros);
fmt.setCurrency(dollar); // change the currency symbol to $
String modifiedDollars = fmt.format(monetaryAmount);
System.out.println(modifiedDollars);
This prints:
12,34 €
12,34 USD
Wait, why? The answer to your question lies in this subtle code snippet:
System.out.println(dollar.getSymbol(Locale.GERMANY));
System.out.println(dollar.getSymbol(Locale.US));
The result:
USD
$
What gets printed depends on a Locale. It is probably better this way, I cannot tell...
I believe that, unless you are creating an Internet currency exchange application, you should stick to my example.
I'm working with DecimalFormat, I want to be able to read and write decimals with as much precision as given (I'm converting to BigDecimal).
Essentially, I want a DecimalFormat which enforces the following pattern "\d+(\.\d+)?" i.e. "at least one digit then, optionally, a decimal separator followed by at least one digit".
I'm struggling to implement this using DecimalFormat. I've tried several patterns, but they all seem to enforce a fixed number of digits.
I'm open to alternative ways of achieving this too.
Edit:
For a little more background, I'm parsing user-supplied data in which decimals could be formatted in any way, and possibly not in the locale format. I'm hoping to let them supply a decimal format string which I can use to parse the data.
Since you noted in a comment that you need Locale support:
Locale locale = //get this from somewhere else
DecimalFormat df = new DecimalFormat();
df.setDecimalFormatSymbols(new DecimalFormatSymbols(locale));
df.setMaximumFractionDigits(Integer.MAX_VALUE);
df.setMinimumFractionDigits(1);
df.setParseBigDecimal(true);
And then parse.
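The parse step itself might look like the sketch below. The class name LocaleDecimalParser is made up, and rejecting input with trailing characters via ParsePosition is an extra assumption, since DecimalFormat.parse(String) on its own stops silently at the first character it cannot read.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper: parses a decimal using the separators of the given locale.
final class LocaleDecimalParser {
    static BigDecimal parse(String text, Locale locale) {
        DecimalFormat df = new DecimalFormat();
        df.setDecimalFormatSymbols(new DecimalFormatSymbols(locale));
        df.setParseBigDecimal(true);          // keep every digit, no double rounding

        ParsePosition pos = new ParsePosition(0);
        Number parsed = df.parse(text, pos);
        // Reject failures, trailing garbage, and special values such as NaN.
        if (!(parsed instanceof BigDecimal) || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException("Not a decimal for " + locale + ": " + text);
        }
        return (BigDecimal) parsed;
    }

    public static void main(String[] args) {
        System.out.println(parse("3,14159", Locale.GERMANY));
        System.out.println(parse("3.14159", Locale.US));
    }
}
```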
This seems to work fine:
public static void main(String[] args) throws Exception {
    DecimalFormat f = new DecimalFormat("0.#");
    f.setParseBigDecimal(true);
    f.setDecimalFormatSymbols(new DecimalFormatSymbols(Locale.US)); // if required
    System.out.println(f.parse("1.0"));   // 1.0
    System.out.println(f.parse("1"));     // 1
    System.out.println(f.parse("1.1"));   // 1.1
    System.out.println(f.parse("1.123")); // 1.123
    System.out.println(f.parse("1."));    // 1
    System.out.println(f.parse(".01"));   // 0.01
}
Except for the last two that violate your "at least one digit" requirement. You may have to check that separately using a regex if it's really important.
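One possible shape for that separate check, using the pattern from the question, is sketched below. The class name StrictDecimal is made up, and the sketch assumes a plain '.' separator, so it bypasses DecimalFormat and hands the validated text straight to BigDecimal.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: enforces "digits, then optionally '.' and more digits".
final class StrictDecimal {
    private static final Pattern DECIMAL = Pattern.compile("\\d+(\\.\\d+)?");

    static Optional<BigDecimal> parse(String text) {
        // matches() requires the whole input to fit the pattern.
        return DECIMAL.matcher(text).matches()
                ? Optional.of(new BigDecimal(text))
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("1.123"));
        System.out.println(parse("1."));
        System.out.println(parse(".01"));
    }
}
```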