Parsing simple times in hh:mmaa format - java

DateFormat formatter = new SimpleDateFormat("hh:mmaa");
formatter.parse("01:20pm");
I'm trying to parse times in the format of 01:20pm. If I run the above code, I get the following exception:
java.text.ParseException: Unparseable date: "01:20pm"
at java.text.DateFormat.parse(DateFormat.java:366)
As far as I can tell, the format I passed to the SimpleDateFormat constructor is correct. What went wrong here?

Your system's default locale probably does not use AM/PM as its AM/PM markers. Pass a Locale that does, for example:
DateFormat formatter = new SimpleDateFormat("hh:mmaa", Locale.US);
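Or, in Java 8+, use the java.time API. Note that java.time allows only a single a pattern letter and matches AM/PM text case-sensitively by default, so build a case-insensitive formatter:
LocalTime lt = LocalTime.parse("01:20pm",
        new DateTimeFormatterBuilder()
                .parseCaseInsensitive()   // accept "pm" as well as "PM"
                .appendPattern("hh:mma")  // "aa" would throw IllegalArgumentException
                .toFormatter(Locale.US));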
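Both suggestions can be checked with a small self-contained program. This is a sketch, not the answerer's code: the class name TwelveHourTimes is made up, and the java.time formatter uses parseCaseInsensitive() because java.time matches "PM" case-sensitively by default and rejects a doubled aa pattern letter.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Date;
import java.util.Locale;

class TwelveHourTimes {
    // java.time accepts a single 'a' only, and matches "PM" case-sensitively
    // unless parseCaseInsensitive() is applied before the pattern.
    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("hh:mma")
            .toFormatter(Locale.US);

    // Legacy API: Locale.US makes SimpleDateFormat expect "AM"/"PM" as markers.
    static Date parseLegacy(String text) throws ParseException {
        return new SimpleDateFormat("hh:mmaa", Locale.US).parse(text);
    }

    static LocalTime parseModern(String text) {
        return LocalTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseModern("01:20pm")); // 13:20
        // Format the legacy Date back in 24-hour form (same default time zone as the parse)
        System.out.println(new SimpleDateFormat("HH:mm", Locale.US).format(parseLegacy("01:20pm"))); // 13:20
    }
}
```

The legacy parser needs no case-insensitivity option because SimpleDateFormat already compares AM/PM text without regard to case.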

Number and date parsing in Java uses the Locale to specify, well, locale-specific symbols. In this case, it is mostly the pm value that is being rejected.
To confirm this, here is a piece of code to exercise all available locales in the VM.
For the locales that fail, I was curious to see why, so instead of parsing a time, I format a valid one. I had to enable UTF-8 output, but the result is interesting.
The really interesting part is that all Spanish (es) locales except the United States variant (es_US) work fine. Hmm...
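import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
public class AmPmLocales {
    public static void main(String[] args) throws ParseException {
        Set<String> good = new TreeSet<>();
        Set<String> bad = new TreeSet<>();
        // Try to parse "01:20pm" with every locale available in the VM
        for (Locale locale : Locale.getAvailableLocales()) {
            try {
                new SimpleDateFormat("hh:mmaa", locale).parse("01:20pm");
                good.add(locale.toLanguageTag());
            } catch (ParseException e) {
                bad.add(locale.toLanguageTag());
            }
        }
        System.out.println("Good locales: " + good);
        // For the failing locales, format a known-good time to show what they expect
        Date time = new SimpleDateFormat("hh:mmaa", Locale.ENGLISH).parse("01:20pm");
        System.out.println("Bad locales:");
        for (String languageTag : bad)
            System.out.printf(" %-5s: %s%n", languageTag,
                    new SimpleDateFormat("hh:mmaa", Locale.forLanguageTag(languageTag)).format(time));
    }
}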
OUTPUT
Good locales: [be, be-BY, bg, bg-BG, ca, ca-ES, da, da-DK, de, de-AT, de-CH, de-DE, de-GR, de-LU, en, en-AU, en-CA, en-GB, en-IE, en-IN, en-MT, en-NZ, en-PH, en-SG, en-US, en-ZA, es, es-AR, es-BO, es-CL, es-CO, es-CR, es-CU, es-DO, es-EC, es-ES, es-GT, es-HN, es-MX, es-NI, es-PA, es-PE, es-PR, es-PY, es-SV, es-UY, es-VE, et, et-EE, fr, fr-BE, fr-CA, fr-CH, fr-FR, fr-LU, he, he-IL, hi, hr, hr-HR, id, id-ID, is, is-IS, it, it-CH, it-IT, lt, lt-LT, lv, lv-LV, mk, mk-MK, ms, ms-MY, nl, nl-BE, nl-NL, nn-NO, no, no-NO, pl, pl-PL, pt, pt-BR, pt-PT, ro, ro-RO, ru, ru-RU, sk, sk-SK, sl, sl-SI, sr, sr-BA, sr-CS, sr-Latn, sr-Latn-BA, sr-Latn-ME, sr-Latn-RS, sr-ME, sr-RS, tr, tr-TR, uk, uk-UA, und]
Bad locales:
ar : 01:20م
ar-AE: 01:20م
ar-BH: 01:20م
ar-DZ: 01:20م
ar-EG: 01:20م
ar-IQ: 01:20م
ar-JO: 01:20م
ar-KW: 01:20م
ar-LB: 01:20م
ar-LY: 01:20م
ar-MA: 01:20م
ar-OM: 01:20م
ar-QA: 01:20م
ar-SA: 01:20م
ar-SD: 01:20م
ar-SY: 01:20م
ar-TN: 01:20م
ar-YE: 01:20م
cs : 01:20odp.
cs-CZ: 01:20odp.
el : 01:20μμ
el-CY: 01:20ΜΜ
el-GR: 01:20μμ
es-US: 01:20p.m.
fi : 01:20ip.
fi-FI: 01:20ip.
ga : 01:20p.m.
ga-IE: 01:20p.m.
hi-IN: ०१:२०अपराह्न
hu : 01:20DU
hu-HU: 01:20DU
ja : 01:20午後
ja-JP: 01:20午後
ja-JP-u-ca-japanese-x-lvariant-JP: 01:20午後
ko : 01:20오후
ko-KR: 01:20오후
mt : 01:20WN
mt-MT: 01:20WN
sq : 01:20MD
sq-AL: 01:20MD
sv : 01:20em
sv-SE: 01:20em
th : 01:20หลังเที่ยง
th-TH: 01:20หลังเที่ยง
th-TH-u-nu-thai-x-lvariant-TH: ๐๑:๒๐หลังเที่ยง
vi : 01:20CH
vi-VN: 01:20CH
zh : 01:20下午
zh-CN: 01:20下午
zh-HK: 01:20下午
zh-SG: 01:20下午
zh-TW: 01:20下午
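SimpleDateFormat takes the AM/PM text it accepts from DateFormatSymbols, so a locale's markers can also be inspected directly instead of trial-parsing. A minimal sketch; the class and method names are illustrative only:

```java
import java.text.DateFormatSymbols;
import java.util.Arrays;
import java.util.Locale;

class AmPmMarkers {
    // Returns the AM/PM strings that date formatting and parsing use for the locale.
    static String[] amPmStrings(Locale locale) {
        return DateFormatSymbols.getInstance(locale).getAmPmStrings();
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.US, Locale.KOREA, Locale.CHINA}) {
            System.out.println(locale + ": " + Arrays.toString(amPmStrings(locale)));
        }
    }
}
```

SimpleDateFormat compares these markers case-insensitively, which is why "pm" parses under Locale.US but not under a locale whose marker is, say, "p.m.".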

Related

Convert language code three characters (ISO 639-2) to two-character code (ISO 639-1)

I'm developing an Android app that uses a Text-to-Speech (TTS) engine. The TTS component returns the available languages as a list of Locale objects.
But for each Locale object, both Locale::getLanguage and Locale::getISO3Language return the same 3-letter code (ISO 639-2). Usually getLanguage() returns the language code in 2-letter format (ISO 639-1), but on one particular device the code has three letters. The same goes for the country code. However, I need the language code in 2-letter format (ISO 639-1) and the country code in 2-letter format (ISO 3166-1 alpha-2).
Does anyone know a way to do the conversion? Please note, I need a corresponding Locale object with both the language and country codes in two-letter format.
tl;dr
As a workaround, make your own Map< Locale , String> mapping each known Locale to its 2-letter language code per ISO 639-1.
new LocaleLookup().lookupTwoLetterLanguageCode( Locale.CANADA_FRENCH )
fr
Or maybe just parse the text of Locale::toString.
Locale
.CANADA_FRENCH
.toString() // fr_CA
.split( "_" ) // Array: { "fr" , "CA" }
[ 0 ] // Grab first element in array, "fr".
fr
For the two-letter country code, use the second part of that split string: index 1 instead of 0.
Locale
.CANADA_FRENCH
.toString() // fr_CA
.split( "_" ) // Array: { "fr" , "CA" }
[ 1 ] // Grab second element in array, "CA".
CA
Bug?
It seems to be a bug that Locale::getLanguage returns a 3-letter code. The Javadoc uses a 2-letter code in its code example, but unfortunately it does not state explicitly whether to expect 2 or 3 letters. I suggest you file a request with the OpenJDK project to clarify this Javadoc.
Workaround
As a workaround, perhaps you could call Locale.getISOLanguages to get an array of all known languages as 2-letter codes, then loop over them. For each, use the idiom shown in the Javadoc, passing the 2-letter code to construct a Locale object for comparison:
if (locale.getLanguage().equals(new Locale("he").getLanguage()))
From this, build your own Map between each Locale and its 2-letter code.
Example class
Here is my first stab at such a workaround map.
In the constructor, we get a list of all known locales, and all known 2-letter ISO 639-1 language codes.
Next we do a nested loop. For each locale, we loop through all the 2-letter language codes until we find a match. Notice that we do not compare against the raw code string. The Javadoc warns us that the ISO 639 standard is not stable; some codes have changed. Quoting:
Note: ISO 639 is not a stable standard— some languages' codes have changed. Locale's constructor recognizes both the new and the old codes for the languages whose codes have changed, but this function always returns the old code. If you want to check for a specific language whose code has changed, don't do
if (locale.getLanguage().equals("he")) // BAD!
Instead, do
if (locale.getLanguage().equals(new Locale("he").getLanguage())) // GOOD.
So our inner loop takes each known 2-letter language code and gets a Locale object for that language. Then our if statement compares the output of getLanguage for (a) our outer loop’s Locale, and (b) the language-only Locale generated by our inner loop from the 2-letter code. In your case, you say some device returns a 3-letter code from getLanguage. But whether it is 2 or 3 letters does not matter; we are just looking for a match.
Once instantiated, we can ask our LocaleLookup instance for the two-letter code matching a particular Locale by calling the lookupTwoLetterLanguageCode method.
LocaleLookup localeLookup = new LocaleLookup();
Locale locale = Locale.CANADA_FRENCH;
String code = localeLookup.lookupTwoLetterLanguageCode( locale );
System.out.println( "Locale: " + locale.toString() + " " + locale.getDisplayName( Locale.getDefault() ) + " | ISO 639-1 code: " + code );
Locale: fr_CA French (Canada) | ISO 639-1 code: fr
I'm just guessing at all this. I have not thought it through, nor have I tested any of this. So buyer beware: this solution is worth every penny you paid for it. Good luck.
Here is the entire class, with a public static void main method as a demonstration.
package work.basil.example;
import java.util.*;
public class LocaleLookup
{
    private Map < Locale, String > mapLocaleToTwoLetterLangCode;
    public LocaleLookup ( )
    {
        this.mapLocaleToTwoLetterLangCode = new HashMap <>( Locale.getAvailableLocales().length );
        this.makeMaps();
        System.out.println( "mapLocaleToTwoLetterLangCode = " + mapLocaleToTwoLetterLangCode );
    }
    private void makeMaps ( )
    {
        // Get all locales.
        Set < Locale > locales = Set.of( Locale.getAvailableLocales() );
        // Get all languages, per 2-letter code.
        Set < String > twoLetterLanguageCodes = Set.of( Locale.getISOLanguages() ); // Returns: An array of ISO 639 two-letter language codes.
        for ( Locale locale : locales )
        {
            for ( String twoLetterLanguageCode : twoLetterLanguageCodes )
            {
                if ( locale.getLanguage().equals( new Locale( twoLetterLanguageCode ).getLanguage() ) )
                {
                    this.mapLocaleToTwoLetterLangCode.put( locale , twoLetterLanguageCode );
                    break;
                }
            }
        }
        // System.out.println( "locales = " + locales );
        // System.out.println( "twoLetterLanguageCodes = " + twoLetterLanguageCodes );
    }
    public String lookupTwoLetterLanguageCode ( final Locale locale )
    {
        String code = this.mapLocaleToTwoLetterLangCode.get( locale );
        Objects.requireNonNull( code );
        return code;
    }
    public static void main ( String[] args )
    {
        LocaleLookup localeLookup = new LocaleLookup();
        Locale locale = Locale.CANADA_FRENCH;
        String code = localeLookup.lookupTwoLetterLanguageCode( locale );
        System.out.println( "Locale: " + locale.toString() + " " + locale.getDisplayName( Locale.getDefault() ) + " | ISO 639-1 code: " + code );
    }
}
And here is the map I produce in a pre-release version of Java 15. Note this may be incorrect, as I have seen some goofiness with locales in the pre-release version.
mapLocaleToTwoLetterLangCode = {nn=nn, ar_JO=ar, bg=bg, zu=zu, am_ET=am, fr_DZ=fr, ti_ET=ti, bo_CN=bo, qu_EC=qu, ta_SG=ta, lv=lv, en_NU=en, en_MS=en, zh_SG_#Hans=zh, ff_LR_#Adlm=ff, en_GG=en, en_JM=en, vo=vo, sd__#Arab=sd, sv_SE=sv, sr_ME=sr, dz_BT=dz, es_BO=es, en_ZM=en, fr_ML=fr, br=br, ha_NG=ha, fa_AF=fa, ar_SA=ar, sk=sk, os_GE=os, ml=ml, en_MT=en, en_LR=en, ar_TD=ar, en_GH=en, en_IL=en, sv=sv, cs=cs, el=el, af=af, ff_MR_#Latn=ff, sw_UG=sw, tk_TM=tk, sr_ME_#Cyrl=sr, ar_EG=ar, sd__#Deva=sd, ji_001=yi, yo_NG=yo, se_NO=se, ku=ku, sw_CD=sw, vo_001=vo, en_PW=en, pl_PL=pl, ff_MR_#Adlm=ff, it_VA=it, sr_CS=sr, ne_IN=ne, es_PH=es, es_ES=es, es_CO=es, bg_BG=bg, ji=yi, ar_EH=ar, bs_BA_#Latn=bs, en_VC=en, nb_SJ=nb, es_US=es, en_US_POSIX=en, en_150=en, ar_SD=ar, en_KN=en, ha_NE=ha, pt_MO=pt, ro_RO=ro, zh__#Hans=zh, lb_LU=lb, sr_ME_#Latn=sr, es_GT=es, so_KE=so, ff_LR_#Latn=ff, ff_GH_#Latn=ff, fr_PM=fr, ar_KM=ar, no_NO_NY=no, fr_MG=fr, es_CL=es, mn=mn, tr_TR=tr, eu=eu, fa_IR=fa, en_MO=en, wo=wo, en_BZ=en, sq_AL=sq, ar_MR=ar, es_DO=es, ru=ru, az=az, su__#Latn=su, fa=fa, kl_GL=kl, en_NR=en, nd=nd, kk=kk, en_MP=en, az__#Cyrl=az, en_GD=en, tk=tk, hy=hy, en_BW=en, en_AU=en, en_CY=en, ta_MY=ta, ti_ER=ti, en_RW=en, sv_FI=sv, nd_ZW=nd, lb=lb, ne=ne, su=su, zh_SG=zh, en_IE=en, ln_CD=ln, en_KI=en, om_ET=om, no=no, ja_JP=ja, my=my, ka=ka, ar_IL=ar, ff_GH_#Adlm=ff, or_IN=or, fr_MF=fr, ms_ID=ms, kl=kl, en_SZ=en, zh=zh, es_PE=es, ta=ta, az__#Latn=az, en_GB=en, zh_HK_#Hant=zh, ar_SY=ar, bo=bo, kk_KZ=kk, tt_RU=tt, es_PA=es, om_KE=om, ar_PS=ar, fr_VU=fr, en_AS=en, zh_TW=zh, sd_IN=sd, fr_MC=fr, kw=kw, fr_NE=fr, pt_MZ=pt, ur_IN=ur, ln=ln, en_JE=en, ln_CF=ln, en_CX=en, pt=pt, en_AT=en, gl=gl, sr__#Cyrl=sr, es_GQ=es, kn_IN=kn, ff__#Adlm=ff, ar_YE=ar, en_SX=en, to=to, ga=ga, qu=qu, ru_KZ=ru, en_TZ=en, et=et, en_PR=en, jv=jv, ko_KP=ko, in=in, sn=sn, ps=ps, nl_SR=nl, en_BS=en, km=km, fr_NC=fr, be=be, gv=gv, es=es, gd_GB=gd, nl_BQ=nl, ff_GN_#Adlm=ff, fr_CM=fr, uz_UZ_#Cyrl=uz, pa_IN_#Guru=pa, en_KE=en, 
ja=ja, fr_SN=fr, or=or, fr_MA=fr, pt_LU=pt, ff_GM_#Adlm=ff, fr_BL=fr, en_NL=en, ln_CG=ln, te=te, sl=sl, ha=ha, mr_IN=mr, ko_KR=ko, el_CY=el, ku_TR=ku, es_MX=es, es_HN=es, hu_HU=hu, ff_SN=ff, sq_MK=sq, sr_BA_#Cyrl=sr, fi=fi, bs__#Cyrl=bs, uz=uz, et_EE=et, sr__#Latn=sr, en_SS=en, bo_IN=bo, sw=sw, fy_NL=fy, ar_OM=ar, tr_CY=tr, rm=rm, fr_BI=fr, en_MG=en, uz_UZ_#Latn=uz, bn=bn, de_IT=de, kn=kn, fr_TN=fr, sr_RS=sr, bn_BD=bn, de_CH=de, fr_PF=fr, gu=gu, pt_GQ=pt, en_ZA=en, en_TV=en, lo=lo, fr_FR=fr, en_PN=en, fr_BJ=fr, en_MH=en, zh__#Hant=zh, zh_HK_#Hans=zh, cu_RU=cu, nl_NL=nl, en_GY=en, ps_AF=ps, bs__#Latn=bs, ky=ky, os=os, bs_BA_#Cyrl=bs, nl_CW=nl, ar_DZ=ar, sk_SK=sk, pt_CH=pt, fr_GQ=fr, xh=xh, ki_KE=ki, am=am, fr_CI=fr, en_NG=en, ia_001=ia, en_PK=en, zh_CN=zh, en_LC=en, rw=rw, ff_BF_#Adlm=ff, wo_SN=wo, gv_IM=gv, iw=iw, en_TT=en, mk_MK=mk, sl_SI=sl, fr_HT=fr, te_IN=te, nl_SX=nl, ce=ce, fr_CG=fr, xh_ZA=xh, fr_BE=fr, ff_NE_#Adlm=ff, es_VE=es, mt_MT=mt, mr=mr, mg=mg, ko=ko, en_BM=en, nb_NO=nb, ak=ak, dz=dz, vi_VN=vi, en_VU=en, ia=ia, en_US=en, ff_SL_#Latn=ff, to_TO=to, ff_SN_#Adlm=ff, fr_BF=fr, pa__#Guru=pa, it_SM=it, su_ID=su, fr_YT=fr, gu_IN=gu, ii_CN=ii, ff_CM_#Latn=ff, pa_PK_#Arab=pa, fr_RE=fr, fi_FI=fi, ca_FR=ca, sr_BA_#Latn=sr, bn_IN=bn, fr_GP=fr, pa=pa, tg=tg, fr_DJ=fr, rn=rn, uk_UA=uk, ks__#Arab=ks, hu=hu, fr_CH=fr, en_NF=en, ff_GW_#Adlm=ff, ha_GH=ha, sr_XK_#Cyrl=sr, bm=bm, ar_SS=ar, en_GU=en, nl_AW=nl, de_BE=de, en_AI=en, en_CM=en, cs_CZ=cs, ca_ES=ca, tr=tr, ff_GW_#Latn=ff, rm_CH=rm, ru_MD=ru, ms_MY=ms, ta_LK=ta, en_TO=en, ff_SN_#Latn=ff, ff_SL_#Adlm=ff, cy=cy, en_PG=en, fr_CF=fr, pt_TL=pt, sq=sq, tg_TJ=tg, fr=fr, en_ER=en, qu_PE=qu, sr_BA=sr, es_PY=es, de=de, es_EC=es, ff_CM_#Adlm=ff, lg_UG=lg, ff_NE_#Latn=ff, zu_ZA=zu, fr_TG=fr, su_ID_#Latn=su, sr_XK_#Latn=sr, en_PH=en, ig_NG=ig, fr_GN=fr, zh_MO_#Hans=zh, lg=lg, ru_RU=ru, se_FI=se, ff=ff, en_DM=en, en_CK=en, sd=sd, ar_MA=ar, ga_IE=ga, en_BI=en, en_AG=en, fr_TD=fr, fr_LU=fr, en_WS=en, fr_CD=fr, so=so, rn_BI=rn, 
en_NA=en, mi_NZ=mi, ar_ER=ar, ms=ms, sn_ZW=sn, iw_IL=iw, ug=ug, es_EA=es, ga_GB=ga, th_TH_TH_#u-nu-thai=th, hi=hi, fr_SC=fr, ca_IT=ca, ff_NG_#Latn=ff, en_SL=en, no_NO=no, ca_AD=ca, ff_NG_#Adlm=ff, zh_MO_#Hant=zh, en_SH=en, qu_BO=qu, vi=vi, sd_PK_#Arab=sd, fr_CA=fr, de_LU=de, sq_XK=sq, en_KY=en, mi=mi, mt=mt, it_CH=it, de_DE=de, si_LK=si, en_AE=en, en_DK=en, so_DJ=so, eo=eo, lt_LT=lt, it_IT=it, en_ZW=en, ar_SO=ar, ro=ro, en_UM=en, ps_PK=ps, eo_001=eo, ee=ee, fr_MU=fr, nn_NO=nn, se_SE=se, pl=pl, en_TK=en, en_SI=en, ur=ur, uz__#Arab=uz, pt_GW=pt, se=se, lo_LA=lo, af_ZA=af, ar_LB=ar, ms_SG=ms, ee_TG=ee, ln_AO=ln, be_BY=be, ff_GN=ff, in_ID=in, es_BZ=es, ar_AE=ar, hr_HR=hr, as=as, it=it, pt_CV=pt, ks_IN=ks, uk=uk, my_MM=my, mn_MN=mn, ur_PK=ur, en_FM=en, da_DK=da, es_PR=es, en_BE=en, ii=ii, fr_WF=fr, tt=tt, ru_BY=ru, fo_DK=fo, ee_GH=ee, en_SG=en, ar_BH=ar, ff_GM_#Latn=ff, om=om, en_CH=en, hi_IN=hi, fo_FO=fo, yo_BJ=yo, fr_KM=fr, fr_MQ=fr, ff_GN_#Latn=ff, en_SD=en, es_AR=es, ff__#Latn=ff, en_MY=en, ja_JP_JP_#u-ca-japanese=ja, es_SV=es, pt_BR=pt, ml_IN=ml, en_FK=en, uz__#Cyrl=uz, is_IS=is, hy_AM=hy, en_GM=en, en_DG=en, fo=fo, ne_NP=ne, pt_ST=pt, hr=hr, ak_GH=ak, lt=lt, uz_AF_#Arab=uz, ta_IN=ta, fr_GF=fr, en_SE=en, zh_CN_#Hans=zh, es_419=es, is=is, pt_AO=pt, si=si, en_001=en, jv_ID=jv, en=en, es_IC=es, fr_MR=fr, ca=ca, ru_KG=ru, ar_TN=ar, ks=ks, zh_TW_#Hant=zh, ff_BF_#Latn=ff, bm_ML=bm, kw_GB=kw, ug_CN=ug, as_IN=as, es_BR=es, zh_HK=zh, sw_KE=sw, en_SB=en, th_TH=th, rw_RW=rw, ar_IQ=ar, en_MW=en, mk=mk, en_IO=en, pa__#Arab=pa, en_DE=en, ar_QA=ar, en_CC=en, ro_MD=ro, en_FI=en, bs=bs, pt_PT=pt, fy=fy, az_AZ_#Cyrl=az, th=th, es_CU=es, ar=ar, en_SC=en, en_VI=en, eu_ES=eu, en_UG=en, en_NZ=en, es_UY=es, sg_CF=sg, ru_UA=ru, sg=sg, uz__#Latn=uz, el_GR=el, da_GL=da, en_FJ=en, de_LI=de, en_BB=en, km_KH=km, hr_BA=hr, de_AT=de, nl=nl, lu_CD=lu, ca_ES_VALENCIA=ca, ar_001=ar, so_SO=so, lv_LV=lv, sd_IN_#Deva=sd, es_CR=es, ar_KW=ar, fr_GA=fr, ar_LY=ar, sr=sr, sr_RS_#Cyrl=sr, en_MU=en, da=da, 
gl_ES=gl, az_AZ_#Latn=az, en_IM=en, en_LS=en, ig=ig, en_HK=en, en_GI=en, ce_RU=ce, gd=gd, en_CA=en, ka_GE=ka, fr_SY=fr, sw_TZ=sw, so_ET=so, fr_RW=fr, nl_BE=nl, ar_DJ=ar, mg_MG=mg, en_VG=en, cy_GB=cy, cu=cu, sr_RS_#Latn=sr, os_RU=os, en_TC=en, sv_AX=sv, ky_KG=ky, af_NA=af, lu=lu, en_IN=en, yo=yo, ki=ki, es_NI=es, nb=nb, sd_PK=sd, ti=ti, ms_BN=ms, br_FR=br}
Substring of Locale.toString?
Now, after having done all that work, I notice that the toString representation of the locale name starts with the two-letter language code! 🤦
If this is always the case for all Locale objects, we can simply parse that string.
String twoLetterLanguageCode = Locale.CANADA_FRENCH.toString().split( "_" )[ 0 ];
twoLetterLanguageCode = fr
For the country code, do the same, but pull the second part: use an index value of 1 instead of 0.
String twoLetterCountryCode = Locale.CANADA_FRENCH.toString().split( "_" )[ 1 ];
In this quick check on my pre-release Java 15, it does seem that every Locale object’s toString text starts with its language code, the same value that getLanguage returns. But I do not know if you can count on that always being the case, in past or future versions.
System.out.println( Locale.getAvailableLocales().length );
ArrayList < Locale > problemLocales = new ArrayList <>( Locale.getAvailableLocales().length );
for ( Locale locale : Locale.getAvailableLocales() )
{
    String parsed = locale.toString().split( "_" )[ 0 ];
    if ( ! parsed.equalsIgnoreCase( locale.getLanguage() ) )
    {
        problemLocales.add( locale );
    }
}
System.out.println( "problemLocales = " + problemLocales );
problemLocales = []
Or, vice-versa:
System.out.println( "Locale.getAvailableLocales().length: " + Locale.getAvailableLocales().length );
ArrayList < Locale > matchingLocales = new ArrayList <>( Locale.getAvailableLocales().length );
for ( Locale locale : Locale.getAvailableLocales() )
{
    String parsed = locale.toString().split( "_" )[ 0 ];
    if ( parsed.equalsIgnoreCase( locale.getLanguage() ) )
    {
        matchingLocales.add( locale );
    }
}
System.out.println( "matchingLocales.size: " + matchingLocales.size() );
System.out.println( "matchingLocales = " + matchingLocales );
Locale.getAvailableLocales().length: 810
matchingLocales.size: 810
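Note that both approaches above depend on the value getLanguage() returns. If a device's Locale objects really carry 3-letter codes such as "eng", the equality check in LocaleLookup would not find a match (new Locale("en").getLanguage() is "en", not "eng"), and splitting toString() would yield "eng" as well. A different route, sketched here under the assumption that the device uses the ISO 639-2/T and ISO 3166-1 alpha-3 codes that Locale.getISO3Language() and Locale.getISO3Country() produce, is to build reverse maps from the JDK's own code lists. The class name Iso3LocaleConverter and its methods are hypothetical, and bibliographic ISO 639-2 codes such as "fre" or "ger" would not be mapped.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

// Hypothetical helper: maps 3-letter language/country codes back to 2-letter codes.
class Iso3LocaleConverter {
    private static final Map<String, String> LANG3_TO_LANG2 = new HashMap<>();
    private static final Map<String, String> COUNTRY3_TO_COUNTRY2 = new HashMap<>();

    static {
        for (String lang2 : Locale.getISOLanguages()) {
            try {
                String lang3 = new Locale.Builder().setLanguage(lang2).build().getISO3Language();
                LANG3_TO_LANG2.putIfAbsent(lang3, lang2);
            } catch (MissingResourceException e) {
                // No 3-letter code known for this language; skip it.
            }
        }
        for (String country2 : Locale.getISOCountries()) {
            try {
                String country3 = new Locale.Builder().setRegion(country2).build().getISO3Country();
                COUNTRY3_TO_COUNTRY2.putIfAbsent(country3, country2);
            } catch (MissingResourceException e) {
                // No 3-letter code known for this country; skip it.
            }
        }
    }

    // Returns the 2-letter code for a 3-letter language code, or the input unchanged.
    static String languageToTwoLetter(String language) {
        return LANG3_TO_LANG2.getOrDefault(language.toLowerCase(Locale.ROOT), language);
    }

    // Returns the 2-letter code for a 3-letter country code, or the input unchanged.
    static String countryToTwoLetter(String country) {
        return COUNTRY3_TO_COUNTRY2.getOrDefault(country.toUpperCase(Locale.ROOT), country);
    }

    // Builds a Locale using 2-letter codes wherever a mapping exists.
    static Locale toTwoLetter(Locale locale) {
        return new Locale(languageToTwoLetter(locale.getLanguage()),
                countryToTwoLetter(locale.getCountry()));
    }
}
```

Codes that are already two letters pass through unchanged, since the maps are keyed only by 3-letter codes.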

Supported Locales - ga_IE

While setting the locale with the Google Sheets API, I get the following error:
Invalid requests[0].updateSpreadsheetProperties: Unsupported locale: ga_IE", "status" : "INVALID_ARGUMENT"
Reviewing the API docs, it seems that not all locales are supported:
The locale of the spreadsheet in one of the following formats:
an ISO 639-1 language code such as en
an ISO 639-2 language code such as fil, if no 639-1 code exists
a combination of the ISO language code and country code, such as en_US
Note: when updating this field, not all locales/languages are supported.
Where can I find the list of supported locales?
As the Spreadsheet Properties documentation you quoted says, ISO 639-1 codes are preferred in the first instance, ISO 639-2 codes are used when no ISO 639-1 code exists, and if no code exists for a given language in either standard, the language_COUNTRY combination is used. This last case varies depending on the context. I assume that your code falls within ISO 639-1 or 639-2, so here are the full lists:
ISO 639-1
Language 639-1 code
Abkhazian ab
Afar aa
Afrikaans af
Akan ak
Albanian sq
Amharic am
Arabic ar
Aragonese an
Armenian hy
Assamese as
Avaric av
Avestan ae
Aymara ay
Azerbaijani az
Bambara bm
Bashkir ba
Basque eu
Belarusian be
Bengali bn
Bihari languages bh
Bislama bi
Bosnian bs
Breton br
Bulgarian bg
Burmese my
Catalan, Valencian ca
Chamorro ch
Chechen ce
Chichewa, Chewa, Nyanja ny
Chinese zh
Chuvash cv
Cornish kw
Corsican co
Cree cr
Croatian hr
Czech cs
Danish da
Divehi, Dhivehi, Maldivian dv
Dutch, Flemish nl
Dzongkha dz
English en
Esperanto eo
Estonian et
Ewe ee
Faroese fo
Fijian fj
Finnish fi
French fr
Fulah ff
Galician gl
Georgian ka
German de
Greek, Modern (1453-) el
Guarani gn
Gujarati gu
Haitian, Haitian Creole ht
Hausa ha
Hebrew he
Herero hz
Hindi hi
Hiri Motu ho
Hungarian hu
Interlingua (International Auxiliary Language Association) ia
Indonesian id
Interlingue, Occidental ie
Irish ga
Igbo ig
Inupiaq ik
Ido io
Icelandic is
Italian it
Inuktitut iu
Japanese ja
Javanese jv
Kalaallisut, Greenlandic kl
Kannada kn
Kanuri kr
Kashmiri ks
Kazakh kk
Central Khmer km
Kikuyu, Gikuyu ki
Kinyarwanda rw
Kirghiz, Kyrgyz ky
Komi kv
Kongo kg
Korean ko
Kurdish ku
Kuanyama, Kwanyama kj
Latin la
Luxembourgish, Letzeburgesch lb
Ganda lg
Limburgan, Limburger, Limburgish li
Lingala ln
Lao lo
Lithuanian lt
Luba-Katanga lu
Latvian lv
Manx gv
Macedonian mk
Malagasy mg
Malay ms
Malayalam ml
Maltese mt
Maori mi
Marathi mr
Marshallese mh
Mongolian mn
Nauru na
Navajo, Navaho nv
North Ndebele nd
Nepali ne
Ndonga ng
Norwegian Bokmål nb
Norwegian Nynorsk nn
Norwegian no
Sichuan Yi, Nuosu ii
South Ndebele nr
Occitan oc
Ojibwa oj
Church Slavic, Old Slavonic, Church Slavonic, Old Bulgarian, Old Church Slavonic cu
Oromo om
Oriya or
Ossetian, Ossetic os
Punjabi, Panjabi pa
Pali pi
Persian fa
Polish pl
Pashto, Pushto ps
Portuguese pt
Quechua qu
Romansh rm
Rundi rn
Romanian, Moldavian, Moldovan ro
Russian ru
Sanskrit sa
Sardinian sc
Sindhi sd
Northern Sami se
Samoan sm
Sango sg
Serbian sr
Gaelic, Scottish Gaelic gd
Shona sn
Sinhala, Sinhalese si
Slovak sk
Slovenian sl
Somali so
Southern Sotho st
Spanish, Castilian es
Sundanese su
Swahili sw
Swati ss
Swedish sv
Tamil ta
Telugu te
Tajik tg
Thai th
Tigrinya ti
Tibetan bo
Turkmen tk
Tagalog tl
Tswana tn
Tonga (Tonga Islands) to
Turkish tr
Tsonga ts
Tatar tt
Twi tw
Tahitian ty
Uighur, Uyghur ug
Ukrainian uk
Urdu ur
Uzbek uz
Venda ve
Vietnamese vi
Volapük vo
Walloon wa
Welsh cy
Wolof wo
Western Frisian fy
Xhosa xh
Yiddish yi
Yoruba yo
Zhuang, Chuang za
Zulu zu
ISO 639-2 not covered by ISO 639-1
Language ISO 639-2
Achinese ace
Acoli ach
Adangme ada
Adyghe; Adygei ady
Afro-Asiatic languages afa
Afrihili afh
Ainu ain
Akkadian akk
Aleut ale
Algonquian languages alg
Southern Altai alt
English, Old (ca. 450–1100) ang
Angika anp
Apache languages apa
Official Aramaic (700–300 BCE);Imperial Aramaic (700–300 BCE) arc
Mapudungun;Mapuche arn
Arapaho arp
Artificial languages art
Arawak arw
Asturian;Bable;Leonese;Asturleonese ast
Athapascan languages ath
Australian languages aus
Awadhi awa
Banda languages bad
Bamileke languages bai
Baluchi bal
Balinese ban
Basa bas
Baltic languages bat
Beja;Bedawiyet bej
Bemba bem
Berber languages ber
Bhojpuri bho
Bikol bik
Bini;Edo bin
Siksika bla
Bantu (Other) bnt
Braj bra
Batak languages btk
Buriat bua
Buginese bug
Blin; Bilin byn
Caddo cad
Central American Indian languages cai
Galibi Carib car
Caucasian languages cau
Cebuano ceb
Celtic languages cel
Chibcha chb
Chagatai chg
Chuukese chk
Mari chm
Chinook jargon chn
Choctaw cho
Chipewyan;Dene Suline chp
Cherokee chr
Cheyenne chy
Chamic languages cmc
Montenegrin cnr
Coptic cop
Creoles and pidgins, English-based cpe
Creoles and pidgins, French-based cpf
Creoles and pidgins, Portuguese-based cpp
Crimean Tatar;Crimean Turkish crh
Creoles and pidgins crp
Kashubian csb
Cushitic languages cus
Dakota dak
Dargwa dar
Land Dayak languages day
Delaware del
Slave (Athapascan) den
Dogrib dgr
Dinka din
Dogri doi
Dravidian languages dra
Lower Sorbian dsb
Duala dua
Dutch, Middle (ca. 1050–1350) dum
Dyula dyu
Efik efi
Egyptian (Ancient) egy
Ekajuk eka
Elamite elx
English, Middle (1100–1500) enm
Ewondo ewo
Fang fan
Fanti fat
Filipino;Pilipino fil
Finno-Ugrian languages fiu
Fon fon
French, Middle (ca. 1400–1600) frm
French, Old (842–ca. 1400) fro
Northern Frisian frr
Eastern Frisian frs
Friulian fur
Ga gaa
Gayo gay
Gbaya gba
Germanic languages gem
Geez gez
Gilbertese gil
German, Middle High (ca. 1050–1500) gmh
German, Old High (ca. 750–1050) goh
Gondi gon
Gorontalo gor
Gothic got
Grebo grb
Greek, Ancient (to 1453) grc
Swiss German;Alemannic;Alsatian gsw
Gwich'in gwi
Haida hai
Hawaiian haw
Hiligaynon hil
Himachali languages; Pahari languages him
Hittite hit
Hmong;Mong hmn
Upper Sorbian hsb
Hupa hup
Iban iba
Ijo languages ijo
Iloko ilo
Indic languages inc
Indo-European languages ine
Ingush inh
Iranian languages ira
Iroquoian languages iro
Lojban jbo
Judeo-Persian jpr
Judeo-Arabic jrb
Kara-Kalpak kaa
Kabyle kab
Kachin;Jingpho kac
Kamba kam
Karen languages kar
Kawi kaw
Kabardian kbd
Khasi kha
Khoisan languages khi
Khotanese;Sakan kho
Kimbundu kmb
Konkani kok
Kosraean kos
Kpelle kpe
Karachay-Balkar krc
Karelian krl
Kru languages kro
Kurukh kru
Kumyk kum
Kutenai kut
Ladino lad
Lahnda lah
Lamba lam
Lezghian lez
Mongo lol
Lozi loz
Luba-Lulua lua
Luiseno lui
Lunda lun
Luo (Kenya and Tanzania) luo
Lushai lus
Madurese mad
Magahi mag
Maithili mai
Makasar mak
Mandingo man
Austronesian languages map
Masai mas
Moksha mdf
Mandar mdr
Mende men
Irish, Middle (900–1200) mga
Mi'kmaq;Micmac mic
Minangkabau min
Uncoded languages mis
Mon-Khmer languages mkh
Manchu mnc
Manipuri mni
Manobo languages mno
Mohawk moh
Mossi mos
Multiple languages mul
Munda languages mun
Creek mus
Mirandese mwl
Marwari mwr
Mayan languages myn
Erzya myv
Nahuatl languages nah
North American Indian languages nai
Neapolitan nap
Low German; Low Saxon; German, Low; Saxon, Low nds
Nepal Bhasa;Newari new
Nias nia
Niger-Kordofanian languages nic
Niuean niu
Nogai nog
Norse, Old non
N'Ko nqo
Pedi;Sepedi;Northern Sotho nso
Nubian languages nub
Classical Newari;Old Newari;Classical Nepal Bhasa nwc
Nyamwezi nym
Nyankole nyn
Nyoro nyo
Nzima nzi
Osage osa
Turkish, Ottoman (1500–1928) ota
Otomian languages oto
Papuan languages paa
Pangasinan pag
Pahlavi pal
Pampanga;Kapampangan pam
Papiamento pap
Palauan pau
Persian, Old (ca. 600–400 B.C.) peo
Philippine languages phi
Phoenician phn
Pohnpeian pon
Prakrit languages pra
Provençal, Old (to 1500);Old Occitan (to 1500) pro
Reserved for local use qaa-qtz
Rajasthani raj
Rapanui rap
Rarotongan;Cook Islands Maori rar
Romance languages roa
Romany rom
Aromanian;Arumanian;Macedo-Romanian rup
Sandawe sad
Yakut sah
South American Indian (Other) sai
Salishan languages sal
Samaritan Aramaic sam
Sasak sas
Santali sat
Sicilian scn
Scots sco
Selkup sel
Semitic languages sem
Irish, Old (to 900) sga
Sign Languages sgn
Shan shn
Sidamo sid
Siouan languages sio
Sino-Tibetan languages sit
Slavic languages sla
Southern Sami sma
Sami languages smi
Lule Sami smj
Inari Sami smn
Skolt Sami sms
Soninke snk
Sogdian sog
Songhai languages son
Sranan Tongo srn
Serer srr
Nilo-Saharan languages ssa
Sukuma suk
Susu sus
Sumerian sux
Classical Syriac syc
Syriac syr
Tai languages tai
Timne tem
Tereno ter
Tetum tet
Tigre tig
Tiv tiv
Tokelau tkl
Klingon;tlhIngan-Hol tlh
Tlingit tli
Tamashek tmh
Tonga (Nyasa) tog
Tok Pisin tpi
Tsimshian tsi
Tumbuka tum
Tupi languages tup
Altaic languages tut
Tuvalu tvl
Tuvinian tyv
Udmurt udm
Ugaritic uga
Umbundu umb
Undetermined und
Vai vai
Votic vot
Wakashan languages wak
Walamo wal
Waray war
Washo was
Sorbian languages wen
Kalmyk;Oirat xal
Yao yao
Yapese yap
Yupik languages ypk
Zapotec zap
Blissymbols;Blissymbolics;Bliss zbl
Zenaga zen
Standard Moroccan Tamazight zgh
Zande languages znd
Zuni zun
No linguistic content; Not applicable zxx
Zaza;Dimili;Dimli;Kirdki;Kirmanjki;Zazaki zza
Just for clarification: you can see the full list of ISO 639 codes on the Library of Congress website. As I said before, I assume that your language is in one of the lists above. If that is not the case, please ask for further help so I can assist you better.
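If you would rather generate the ISO 639-1 table than copy it, the JDK can produce one: Locale.getISOLanguages() returns the two-letter codes the JDK knows, and getDisplayLanguage gives an English name for each. This is only a sketch; the class name IsoLanguageTable is hypothetical, the JDK's list and names may differ slightly from the Library of Congress list depending on the Java version, and, as the Sheets documentation itself notes, not every language is necessarily supported by the Sheets API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class IsoLanguageTable {
    // Maps each ISO 639-1 code known to this JDK to its English display name.
    static Map<String, String> iso6391Languages() {
        Map<String, String> table = new TreeMap<>();
        for (String code : Locale.getISOLanguages()) {
            Locale language = new Locale.Builder().setLanguage(code).build();
            table.put(code, language.getDisplayLanguage(Locale.ENGLISH));
        }
        return table;
    }

    public static void main(String[] args) {
        // Print in the same "Language code" layout as the list above, sorted by code.
        iso6391Languages().forEach((code, name) -> System.out.println(name + " " + code));
    }
}
```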

Dynamically creating a regex from a DateFormat

I need to detect some stuff within a String that contains, among other things, dates. Now, parsing dates using regex is a known question on SO.
However, the dates in this text are localized. And the app needs to be able to adapt to differently localized dates. Luckily, I can figure out the correct date format for the current locale using DateFormat.getDateInstance(SHORT, locale). I can get a date pattern from that. But how do I turn it into a regex, dynamically?
The regex would not need to do in-depth validation of the format (leap years, correct amount of days for a month etc.), I can already be sure that the data is provided in a valid format. The date just needs to be identified (as in, the regex should be able to detect the start and end index of where a date is).
The answers in the linked question all assume a handful of common date formats. But making that assumption here is likely to produce an edge case that breaks the app in a very non-obvious way, which is why I'd prefer a dynamically generated regex over a one-size-fits-all solution.
I can't use DateFormat.parse(...), since I have to actually detect the date first, and can't directly extract it.
Since you're doing getDateInstance(SHORT, locale), with emphasis on Date and SHORT, the patterns are fairly limited, so the following code will do:
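import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DateRegex {
    // A pattern token is quoted literal text ('...', where '' is an escaped quote),
    // unquoted literal text, or a run of one repeated pattern letter.
    private static final Pattern TOKEN = Pattern.compile("'(?:[^']|'')*'|[^a-zA-Z']+|([a-zA-Z])\\1*");
    public static String dateFormatToRegex(Locale locale) {
        StringBuilder regex = new StringBuilder();
        String fmt = ((SimpleDateFormat) DateFormat.getDateInstance(DateFormat.SHORT, locale)).toPattern();
        for (Matcher m = TOKEN.matcher(fmt); m.find(); ) {
            String part = m.group();
            if (part.startsWith("'")) { // Quoted literal: drop the quotes, unescape ''
                String literal = part.equals("''") ? "'" : part.substring(1, part.length() - 1).replace("''", "'");
                regex.append(Pattern.quote(literal));
            } else if (m.start(1) == -1) { // Not letter(s): Literal text
                regex.append(Pattern.quote(part));
            } else {
                switch (part.charAt(0)) {
                    case 'G': // Era designator
                        regex.append("\\p{L}+");
                        break;
                    case 'y': // Year
                        regex.append("\\d{1,4}");
                        break;
                    case 'M': // Month in year
                        if (part.length() > 2)
                            throw new UnsupportedOperationException("Date format part: " + part);
                        regex.append("(?:1[0-2]|0?[1-9])");
                        break;
                    case 'd': // Day in month
                        regex.append("(?:3[01]|[12][0-9]|0?[1-9])");
                        break;
                    default:
                        throw new UnsupportedOperationException("Date format part: " + part);
                }
            }
        }
        return regex.toString();
    }
}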
To see which regexes you'll get for various locales:
Locale[] locales = Locale.getAvailableLocales();
Arrays.sort(locales, Comparator.comparing(Locale::toLanguageTag));
Map<String, List<String>> fmtLocales = new TreeMap<>();
for (Locale locale : locales) {
    String fmt = ((SimpleDateFormat) DateFormat.getDateInstance(DateFormat.SHORT, locale)).toPattern();
    fmtLocales.computeIfAbsent(fmt, k -> new ArrayList<>()).add(locale.toLanguageTag());
}
fmtLocales.forEach((k, v) -> System.out.println(dateFormatToRegex(Locale.forLanguageTag(v.get(0))) + " " + v));
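Output (note that it shows two-digit month and day groups such as (?:0[1-9]|1[0-2]), so it appears to have been produced by a slightly different version of the code above)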
\p{L}+\d{1,4}\Q.\E(?:0[1-9]|1[0-2])\Q.\E(?:0[1-9]|[12][0-9]|3[01]) [ja-JP-u-ca-japanese-x-lvariant-JP]
(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01])\Q/\E\d{1,4} [brx, brx-IN, chr, chr-US, ee, ee-GH, ee-TG, en, en-AS, en-BI, en-GU, en-MH, en-MP, en-PR, en-UM, en-US, en-US-POSIX, en-VI, fil, fil-PH, ks, ks-IN, ug, ug-CN, zu, zu-ZA]
(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01])\Q/\E\d{1,4} [es-PA, es-PR]
(?:0[1-9]|[12][0-9]|3[01])\Q-\E(?:0[1-9]|1[0-2])\Q-\E\d{1,4} [or, or-IN]
(?:0[1-9]|[12][0-9]|3[01])\Q. \E(?:0[1-9]|1[0-2])\Q. \E\d{1,4} [ksh, ksh-DE]
(?:0[1-9]|[12][0-9]|3[01])\Q. \E(?:0[1-9]|1[0-2])\Q. \E\d{1,4} [sl, sl-SI]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [fi, fi-FI, he, he-IL, is, is-IS]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [be, be-BY, dsb, dsb-DE, hsb, hsb-DE, sk, sk-SK, sq, sq-AL, sq-MK, sq-XK]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q.\E [bs-Cyrl, bs-Cyrl-BA, sr, sr-CS, sr-Cyrl, sr-Cyrl-BA, sr-Cyrl-ME, sr-Cyrl-RS, sr-Cyrl-XK, sr-Latn, sr-Latn-BA, sr-Latn-ME, sr-Latn-RS, sr-Latn-XK, sr-ME, sr-RS]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [tr, tr-CY, tr-TR]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q 'г'.\E [bg, bg-BG]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [agq, agq-CM, bas, bas-CM, bm, bm-ML, dje, dje-NE, dua, dua-CM, dyo, dyo-SN, en-HK, en-ZW, ewo, ewo-CM, ff, ff-CM, ff-GN, ff-MR, ff-SN, kab, kab-DZ, kea, kea-CV, khq, khq-ML, ksf, ksf-CM, ln, ln-AO, ln-CD, ln-CF, ln-CG, lo, lo-LA, lu, lu-CD, mfe, mfe-MU, mg, mg-MG, mua, mua-CM, nmg, nmg-CM, rn, rn-BI, seh, seh-MZ, ses, ses-ML, sg, sg-CF, shi, shi-Latn, shi-Latn-MA, shi-MA, shi-Tfng, shi-Tfng-MA, sw-CD, twq, twq-NE, yav, yav-CM, zgh, zgh-MA, zh-HK, zh-Hant-HK, zh-Hant-MO]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [ast, ast-ES, bn, bn-BD, bn-IN, ca, ca-AD, ca-ES, ca-ES-VALENCIA, ca-FR, ca-IT, el, el-CY, el-GR, en-AU, en-SG, es, es-419, es-AR, es-BO, es-BR, es-CR, es-CU, es-DO, es-EA, es-EC, es-ES, es-GQ, es-HN, es-IC, es-NI, es-PH, es-PY, es-SV, es-US, es-UY, es-VE, gu, gu-IN, ha, ha-GH, ha-NE, ha-NG, haw, haw-US, hi, hi-IN, km, km-KH, kn, kn-IN, ml, ml-IN, mr, mr-IN, pa, pa-Guru, pa-Guru-IN, pa-IN, pa-PK, ta, ta-IN, ta-LK, ta-MY, ta-SG, th, th-TH, to, to-TO, ur, ur-IN, ur-PK, zh-Hans-HK, zh-Hans-MO]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [th-TH-u-nu-thai-x-lvariant-TH]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [nus, nus-SS]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [en-NZ, es-CO, es-GT, es-PE, fr-BE, ms, ms-BN, ms-MY, ms-SG, nl-BE]
(?:0[1-9]|[12][0-9]|3[01])\Q-\E(?:0[1-9]|1[0-2])\Q-\E\d{1,4} [sv-FI]
(?:0[1-9]|[12][0-9]|3[01])\Q-\E(?:0[1-9]|1[0-2])\Q-\E\d{1,4} [es-CL, fy, fy-NL, my, my-MM, nl, nl-AW, nl-BQ, nl-CW, nl-NL, nl-SR, nl-SX, rm, rm-CH, te, te-IN]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [mk, mk-MK]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [nb, nb-NO, nb-SJ, nn, nn-NO, nn-NO, no, no-NO, pl, pl-PL, ro, ro-MD, ro-RO, tk, tk-TM]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q.\E [hr, hr-BA, hr-HR]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4} [az, az-AZ, az-Cyrl, az-Cyrl-AZ, az-Latn, az-Latn-AZ, cs, cs-CZ, de, de-AT, de-BE, de-CH, de-DE, de-LI, de-LU, et, et-EE, fo, fo-DK, fo-FO, fr-CH, gsw, gsw-CH, gsw-FR, gsw-LI, hy, hy-AM, it-CH, ka, ka-GE, kk, kk-KZ, ky, ky-KG, lb, lb-LU, lv, lv-LV, os, os-GE, os-RU, ru, ru-BY, ru-KG, ru-KZ, ru-MD, ru-RU, ru-UA, uk, uk-UA]
(?:0[1-9]|[12][0-9]|3[01])\Q.\E(?:0[1-9]|1[0-2])\Q.\E\d{1,4}\Q.\E [bs, bs-BA, bs-Latn, bs-Latn-BA]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q \E\d{1,4} [kkj, kkj-CM]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [am, am-ET, asa, asa-TZ, bem, bem-ZM, bez, bez-TZ, cgg, cgg-UG, da, da-DK, da-GL, dav, dav-KE, ebu, ebu-KE, en-001, en-150, en-AG, en-AI, en-AT, en-BB, en-BM, en-BS, en-CC, en-CH, en-CK, en-CM, en-CX, en-CY, en-DE, en-DG, en-DK, en-DM, en-ER, en-FI, en-FJ, en-FK, en-FM, en-GB, en-GD, en-GG, en-GH, en-GI, en-GM, en-GY, en-IE, en-IL, en-IM, en-IO, en-JE, en-JM, en-KE, en-KI, en-KN, en-KY, en-LC, en-LR, en-LS, en-MG, en-MO, en-MS, en-MT, en-MU, en-MW, en-MY, en-NA, en-NF, en-NG, en-NL, en-NR, en-NU, en-PG, en-PH, en-PK, en-PN, en-PW, en-RW, en-SB, en-SC, en-SD, en-SH, en-SI, en-SL, en-SS, en-SX, en-SZ, en-TC, en-TK, en-TO, en-TT, en-TV, en-TZ, en-UG, en-VC, en-VG, en-VU, en-WS, en-ZM, fr, fr-BF, fr-BI, fr-BJ, fr-BL, fr-CD, fr-CF, fr-CG, fr-CI, fr-CM, fr-DJ, fr-DZ, fr-FR, fr-GA, fr-GF, fr-GN, fr-GP, fr-GQ, fr-HT, fr-KM, fr-LU, fr-MA, fr-MC, fr-MF, fr-MG, fr-ML, fr-MQ, fr-MR, fr-MU, fr-NC, fr-NE, fr-PF, fr-PM, fr-RE, fr-RW, fr-SC, fr-SN, fr-SY, fr-TD, fr-TG, fr-TN, fr-VU, fr-WF, fr-YT, ga, ga-IE, gd, gd-GB, guz, guz-KE, ig, ig-NG, jmc, jmc-TZ, kam, kam-KE, kde, kde-TZ, ki, ki-KE, kln, kln-KE, ksb, ksb-TZ, lag, lag-TZ, lg, lg-UG, luo, luo-KE, luy, luy-KE, mas, mas-KE, mas-TZ, mer, mer-KE, mgh, mgh-MZ, mt, mt-MT, naq, naq-NA, nd, nd-ZW, nyn, nyn-UG, pa-Arab, pa-Arab-PK, qu, qu-BO, qu-EC, qu-PE, rof, rof-TZ, rwk, rwk-TZ, saq, saq-KE, sbp, sbp-TZ, sn, sn-ZW, sw, sw-KE, sw-TZ, sw-UG, teo, teo-KE, teo-UG, tzm, tzm-MA, vai, vai-LR, vai-Latn, vai-Latn-LR, vai-Vaii, vai-Vaii-LR, vi, vi-VN, vun, vun-TZ, xog, xog-UG, yo, yo-BJ, yo-NG]
(?:0[1-9]|[12][0-9]|3[01])\Q/\E(?:0[1-9]|1[0-2])\Q/\E\d{1,4} [cy, cy-GB, en-BE, en-BW, en-BZ, en-IN, es-MX, fur, fur-IT, gl, gl-ES, id, id-ID, it, it-IT, it-SM, nnh, nnh-CM, om, om-ET, om-KE, pt, pt-AO, pt-BR, pt-CH, pt-CV, pt-GQ, pt-GW, pt-LU, pt-MO, pt-MZ, pt-PT, pt-ST, pt-TL, so, so-DJ, so-ET, so-KE, so-SO, ti, ti-ER, ti-ET, uz, uz-AF, uz-Cyrl, uz-Cyrl-UZ, uz-Latn, uz-Latn-UZ, uz-UZ, yi, yi-001, zh-Hans-SG, zh-SG]
(?:0[1-9]|[12][0-9]|3[01])\Q‏/\E(?:0[1-9]|1[0-2])\Q‏/\E\d{1,4} [ar, ar-001, ar-AE, ar-BH, ar-DJ, ar-DZ, ar-EG, ar-EH, ar-ER, ar-IL, ar-IQ, ar-JO, ar-KM, ar-KW, ar-LB, ar-LY, ar-MA, ar-MR, ar-OM, ar-PS, ar-QA, ar-SA, ar-SD, ar-SO, ar-SS, ar-SY, ar-TD, ar-TN, ar-YE]
\d{1,4}\Q-\E(?:0[1-9]|1[0-2])\Q-\E(?:0[1-9]|[12][0-9]|3[01]) [af, af-NA, af-ZA, as, as-IN, bo, bo-CN, bo-IN, br, br-FR, ce, ce-RU, ckb, ckb-IQ, ckb-IR, cu, cu-RU, dz, dz-BT, en-CA, en-SE, gv, gv-IM, ii, ii-CN, jgo, jgo-CM, kl, kl-GL, kok, kok-IN, kw, kw-GB, lkt, lkt-US, lrc, lrc-IQ, lrc-IR, lt, lt-LT, mgo, mgo-CM, mn, mn-MN, mzn, mzn-IR, ne, ne-IN, ne-NP, prg, prg-001, se, se-FI, se-NO, se-SE, si, si-LK, smn, smn-FI, sv, sv-AX, sv-SE, und, uz-Arab, uz-Arab-AF, vo, vo-001, wae, wae-CH]
\d{1,4}\Q. \E(?:0[1-9]|1[0-2])\Q. \E(?:0[1-9]|[12][0-9]|3[01])\Q.\E [hu, hu-HU]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [fa, fa-AF, fa-IR, ps, ps-AF, yue, yue-HK, zh, zh-CN, zh-Hans, zh-Hans-CN, zh-Hant, zh-Hant-TW, zh-TW]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [en-ZA, eu, eu-ES, ja, ja-JP]
\d{1,4}\Q-\E(?:0[1-9]|1[0-2])\Q-\E(?:0[1-9]|[12][0-9]|3[01]) [eo, eo-001, fr-CA, sr-BA]
\d{1,4}\Q. \E(?:0[1-9]|1[0-2])\Q. \E(?:0[1-9]|[12][0-9]|3[01])\Q.\E [ko, ko-KP, ko-KR]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [sah, sah-RU]
\d{1,4}\Q/\E(?:0[1-9]|1[0-2])\Q/\E(?:0[1-9]|[12][0-9]|3[01]) [ak, ak-GH, rw, rw-RW]
What you're asking is really complicated, but it's not impossible; it's just likely to take many hundreds of lines of code before you're done. I'm really not sure that this is the route you want to go (honestly, if you already know what format the date is in, you should probably just parse() it), but let's say for the sake of argument that you really do want to turn a date pattern like yyyy-MM-dd HH:mm:ss into a regular expression that can match dates in that format.
There are several steps in the solution: You'll need to lexically analyze the pattern; transform the tokens into correct regex pieces in the current locale; and then mash them all together to make a regex you can use. (Thankfully, you don't need to perform complex parsing on the date-pattern string; lexical analysis is good enough for this.)
Lexical analysis, or tokenization, is the act of breaking the input string into its component tokens, so that instead of an array of characters it becomes a sequence of enumerated values or objects. For the previous example, you'd end up with an array or list like this: [yyyy, Hyphen, MM, Hyphen, dd, Space, HH, Colon, mm, Colon, ss]. This kind of tokenization is often done with a big state machine, and you may be able to find some open-source code somewhere (part of the Android source code, maybe?) that already does it. If not, you'll have to read each letter, count how many times that letter repeats, and choose an appropriate enum value to add to the growing list of tokens.
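A minimal sketch of that letter-counting approach is below. It simplifies on purpose: tokens are plain strings rather than enum values, and quoted literals such as 'T' are not handled. The class and method names (PatternTokenizer, tokenize) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PatternTokenizer {

    // Splits a date pattern such as "yyyy-MM-dd HH:mm:ss" into tokens.
    // A run of the same ASCII letter becomes one token ("yyyy", "MM");
    // every other character becomes a single-character literal token.
    // Assumption: quoted literals ('T', '') are not handled in this sketch.
    static List<String> tokenize(String pattern) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            int start = i;
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                // Count how many times the same letter repeats
                while (i < pattern.length() && pattern.charAt(i) == c) {
                    i++;
                }
            } else {
                i++; // separator: one character per token
            }
            tokens.add(pattern.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("yyyy-MM-dd HH:mm:ss"));
        // prints [yyyy, -, MM, -, dd,  , HH, :, mm, :, ss]
    }
}
```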
Once you have the tokenized sequence of elements, the next step is to transform each into a chunk of a regular expression that is valid for the current locale. This is probably a giant switch statement inside a loop over the tokens, which would turn a yyyy enum value into the string piece "[0-9]{4}", or the MMM enum value into a big chunk of regex that matches all of the abbreviated month names in the current locale ("jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"). This obviously involves pulling all the data for the given locale, so that you can make regex chunks out of its words.
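Here is a hedged sketch of that switch, covering only a handful of pattern letters. It pulls the locale's words from java.text.DateFormatSymbols (getShortMonths, getMonths, getAmPmStrings) and quotes each word with Pattern.quote instead of writing the lowercase alternation by hand. The names TokenToRegex, regexFor and alternation are hypothetical, and the numeric ranges are one possible choice.

```java
import java.text.DateFormatSymbols;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TokenToRegex {

    // Joins locale words into one alternation, quoting each word so that
    // characters such as '.' in abbreviated month names match literally.
    static String alternation(String[] words) {
        return Arrays.stream(words)
                .filter(w -> !w.isEmpty()) // month arrays carry an empty 13th entry
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
    }

    // Translates one pattern token into a regex chunk for the given locale.
    // Only a small, illustrative subset of pattern letters is supported.
    static String regexFor(String token, Locale locale) {
        DateFormatSymbols symbols = DateFormatSymbols.getInstance(locale);
        switch (token) {
            case "yyyy": return "[0-9]{4}";
            case "yy":   return "[0-9]{2}";
            case "MM":   return "(?:0[1-9]|1[0-2])";
            case "dd":   return "(?:0[1-9]|[12][0-9]|3[01])";
            case "HH":   return "(?:[01][0-9]|2[0-3])";
            case "hh":   return "(?:0[1-9]|1[0-2])";
            case "mm":
            case "ss":   return "[0-5][0-9]";
            case "MMM":  return alternation(symbols.getShortMonths());
            case "MMMM": return alternation(symbols.getMonths());
            case "a":
            case "aa":   return alternation(symbols.getAmPmStrings());
            default:
                if (Character.isLetter(token.charAt(0))) {
                    throw new IllegalArgumentException("Unsupported token: " + token);
                }
                // Anything else is a separator such as '-', ':' or ' '
                return Pattern.quote(token);
        }
    }

    public static void main(String[] args) {
        String monthRegex = regexFor("MMM", Locale.US);
        System.out.println(monthRegex);
        System.out.println(Pattern.compile(monthRegex, Pattern.CASE_INSENSITIVE)
                .matcher("jan").matches()); // true
    }
}
```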
Finally, you can concatenate all of the regex bits together, wrapping each bit in parentheses to ensure precedence is correct, and then Pattern.compile() the whole string. Don't forget to compile it with Pattern.CASE_INSENSITIVE so that month names match regardless of case.
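A small sketch of this last step follows, assuming the chunks have already been produced by a translation step like the one described above (here they are hard-coded for illustration). It uses non-capturing groups (?:...) rather than plain parentheses, so the wrapping does not add capturing groups; UNICODE_CASE is added so case-insensitive matching also works for non-ASCII month names. RegexAssembler and assemble are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegexAssembler {

    // Wraps each chunk in a non-capturing group so an alternation inside one
    // chunk (e.g. month names) cannot swallow its neighbours, then compiles
    // the whole expression case-insensitively.
    static Pattern assemble(List<String> chunks) {
        StringBuilder regex = new StringBuilder();
        for (String chunk : chunks) {
            regex.append("(?:").append(chunk).append(')');
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    public static void main(String[] args) {
        // Chunks as they might come out of the translation step for "yyyy-MMM-dd"
        Pattern p = assemble(List.of("[0-9]{4}", "\\Q-\\E", "jan|feb|mar", "\\Q-\\E", "[0-9]{2}"));
        System.out.println(p.matcher("2018-FEB-03").matches()); // true
        System.out.println(p.matcher("2018-Apr-03").matches()); // false
    }
}
```

Without the wrapping, the unparenthesized "jan|feb|mar" chunk would split the whole expression at each |, so the text "mar-03" on its own would match.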
If you don't know what locale you're in, you'll have to do this once per possible locale, producing one regex for each, and then test the input against each one of them in turn.
This is a project-and-a-half, but it is something that could be built, if you really really really need it to work exactly like you described.
But again, if I were you, I'd stick with something that already exists: If you already know what locale you're in (or even if you don't), the parse() method not only does the lexical analysis and input validation for you (and is already written!), but it also produces a usable date object, too!
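For comparison, a minimal sketch of the "just parse it" route with java.time, assuming the pattern is known up front. Note the letters: yyyy is the year and MM the month; YYYY would be the week-based year and mm the minute. ParseInstead and parseOrNull are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class ParseInstead {

    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns the parsed value, or null if the text does not match the pattern.
    // The formatter does the tokenizing and validation (e.g. month 13 is rejected).
    static LocalDateTime parseOrNull(String text) {
        try {
            return LocalDateTime.parse(text, FORMAT);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("2018-09-03 14:05:00")); // 2018-09-03T14:05
        System.out.println(parseOrNull("2018-13-03 14:05:00")); // null
    }
}
```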
I still think that parsing from each position in the string and seeing if it succeeds is simpler and easier than first generating a regular expression.
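import java.text.ParsePosition;
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

Locale loc = Locale.forLanguageTag("en-AS");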
DateTimeFormatter dateFormatter
= DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(loc);
String mixed = "09/03/18Some data06/29/18Some other data04/27/18A third piece of data";
// Check that the string starts with a date
ParsePosition pos = new ParsePosition(0);
LocalDate.from(dateFormatter.parse(mixed, pos));
int dataStartIndex = pos.getIndex();
System.out.println("Date: " + mixed.substring(0, dataStartIndex));
int candidateDateStartIndex = dataStartIndex;
while (candidateDateStartIndex < mixed.length()) {
try {
pos.setIndex(candidateDateStartIndex);
LocalDate.from(dateFormatter.parse(mixed, pos));
// Date found
System.out.println("Data: "
+ mixed.substring(dataStartIndex, candidateDateStartIndex));
dataStartIndex = pos.getIndex();
System.out.println("Date: "
+ mixed.substring(candidateDateStartIndex, dataStartIndex));
candidateDateStartIndex = dataStartIndex;
} catch (DateTimeException dte) {
// No date here; try next
candidateDateStartIndex++;
pos.setErrorIndex(-1); // Clear error
}
}
System.out.println("Data: " + mixed.substring(dataStartIndex, mixed.length()));
The output from this snippet was:
Date: 09/03/18
Data: Some data
Date: 06/29/18
Data: Some other data
Date: 04/27/18
Data: A third piece of data
If you’re happy with the accepted answer, please don’t let me take that away from you. I would just like to demonstrate the alternative for anyone reading along.
Precisely because I am presenting this for a broader audience, I am using java.time, the modern Java date and time API. If your data was originally written with a DateFormat, you may want to substitute that class into the code above; I trust you to do that in that case.
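For anyone who does need the DateFormat variant, here is a hedged sketch of the same scanning loop. The main difference is that DateFormat.parse(String, ParsePosition) signals failure by returning null (and setting the error index) instead of throwing. DateFormatScan and split are hypothetical names, and the US short date style is assumed to match the sample data.

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;

public class DateFormatScan {

    // Splits text into alternating "Date: ..." and "Data: ..." pieces.
    static List<String> split(String mixed, DateFormat format) {
        List<String> pieces = new ArrayList<>();
        ParsePosition pos = new ParsePosition(0);
        int dataStart = 0;
        int candidate = 0;
        while (candidate < mixed.length()) {
            pos.setIndex(candidate);
            pos.setErrorIndex(-1); // clear any error left by the previous attempt
            Date date = format.parse(mixed, pos); // null means no date here
            if (date != null && pos.getIndex() > candidate) {
                // Date found: emit any data preceding it, then the date itself
                if (candidate > dataStart) {
                    pieces.add("Data: " + mixed.substring(dataStart, candidate));
                }
                pieces.add("Date: " + mixed.substring(candidate, pos.getIndex()));
                dataStart = pos.getIndex();
                candidate = dataStart;
            } else {
                candidate++; // no date here; try the next position
            }
        }
        if (dataStart < mixed.length()) {
            pieces.add("Data: " + mixed.substring(dataStart));
        }
        return pieces;
    }

    public static void main(String[] args) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
        format.setLenient(false);
        String mixed = "09/03/18Some data06/29/18Some other data04/27/18A third piece of data";
        split(mixed, format).forEach(System.out::println);
    }
}
```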

java PosixFileAttributes returns wrong atime and mtime

My code looks like this:
String path = "/home/user/tmp/file1";
Path p = FileSystems.getDefault().getPath(path);
PosixFileAttributes attrs = Files.readAttributes(p, PosixFileAttributes.class);
System.out.println("Last Modified Time: "+attrs.lastModifiedTime());
System.out.println("Last Access Time: "+attrs.lastAccessTime());
The times returned by lastModifiedTime() and lastAccessTime() differ from the correct ones by 4 hours.
The output is:
Last Modified Time: 2014-06-25T12:50:31Z
Last Access Time: 2014-06-25T18:26:07Z
Running stat file1 produces:
Access: 2014-06-25 14:26:07.870281008 -0400
Modify: 2014-06-25 08:50:31.922861913 -0400
Change: 2014-06-25 08:50:31.922861913 -0400
Can anyone help me?
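A time like
2014-06-25T12:50:31Z
is in UTC (that's what the Z at the end means), so it will look off when you compare it with a local time. Your stat output uses the -0400 offset, and 12:50:31 UTC is 08:50:31 -0400, exactly the Modify time stat reports; likewise 18:26:07 UTC is the Access time 14:26:07 -0400. The values are correct; they are just printed in a different time zone.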
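To see the values in the same zone as stat, you can convert the FileTime's Instant to a zone. A minimal sketch follows; the class and method names are hypothetical, and the fixed -04:00 offset is taken from the stat output in the question (in real code you would more likely pass ZoneId.systemDefault()).

```java
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class FileTimeInZone {

    // FileTime.toString() always prints UTC (the trailing Z); converting the
    // underlying Instant to a zone shows the same moment as local wall-clock time.
    static ZonedDateTime inZone(FileTime time, ZoneId zone) {
        return time.toInstant().atZone(zone);
    }

    public static void main(String[] args) {
        FileTime modified = FileTime.from(Instant.parse("2014-06-25T12:50:31Z"));
        // -04:00 matches the offset printed by stat in the question
        System.out.println(inZone(modified, ZoneOffset.ofHours(-4)));
        // prints 2014-06-25T08:50:31-04:00
        // With a real file: inZone(attrs.lastModifiedTime(), ZoneId.systemDefault())
    }
}
```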

Sort Japanese data in java

I need to sort a list of Japanese strings. Currently I am using Java's Collator API, which works fine for all the other languages, but for Japanese it does not give the expected results. How can I achieve this?
Collator collator = Collator.getInstance(Locale.JAPAN);
collator.setStrength(Collator.PRIMARY);
Collections.sort(Words, collator);
Here Words is the list of Japanese String.
Collator has three locales for Japanese (ja, ja_JP and ja_JP_JP_#u-ca-japanese, as the listing below shows), so I would make sure you are using the right one. (They could be aliases for the same Collator.)
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

final Locale[] availableLocales = Collator.getAvailableLocales();
// Sort by string form so related locales (ja, ja_JP, ...) end up next to each other
Arrays.sort(availableLocales, Comparator.comparing(Locale::toString));
for (Locale locale : availableLocales)
    System.out.println(locale);
prints:
ar
ar_AE
ar_BH
ar_DZ
ar_EG
ar_IQ
ar_JO
ar_KW
ar_LB
ar_LY
ar_MA
ar_OM
ar_QA
ar_SA
ar_SD
ar_SY
ar_TN
ar_YE
be
be_BY
bg
bg_BG
ca
ca_ES
cs
cs_CZ
da
da_DK
de
de_AT
de_CH
de_DE
de_LU
el
el_CY
el_GR
en
en_AU
en_CA
en_GB
en_IE
en_IN
en_MT
en_NZ
en_PH
en_SG
en_US
en_ZA
es
es_AR
es_BO
es_CL
es_CO
es_CR
es_DO
es_EC
es_ES
es_GT
es_HN
es_MX
es_NI
es_PA
es_PE
es_PR
es_PY
es_SV
es_US
es_UY
es_VE
et
et_EE
fi
fi_FI
fr
fr_BE
fr_CA
fr_CH
fr_FR
fr_LU
ga
ga_IE
hi_IN
hr
hr_HR
hu
hu_HU
in
in_ID
is
is_IS
it
it_CH
it_IT
iw
iw_IL
ja
ja_JP
ja_JP_JP_#u-ca-japanese
ko
ko_KR
lt
lt_LT
lv
lv_LV
mk
mk_MK
ms
ms_MY
mt
mt_MT
nl
nl_BE
nl_NL
no
no_NO
no_NO_NY
pl
pl_PL
pt
pt_BR
pt_PT
ro
ro_RO
ru
ru_RU
sk
sk_SK
sl
sl_SI
sq
sq_AL
sr
sr_BA
sr_BA_#Latn
sr_CS
sr_ME
sr_ME_#Latn
sr_RS
sr_RS_#Latn
sr__#Latn
sv
sv_SE
th
th_TH
th_TH_TH_#u-nu-thai
tr
tr_TR
uk
uk_UA
vi
vi_VN
zh
zh_CN
zh_HK
zh_SG
zh_TW
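Since the three Japanese entries might be aliases, one hedged way to check is to sort the same sample with each of them and compare the results. The sketch below mirrors the question's PRIMARY-strength setup; the sample words and the names JapaneseCollatorCheck and sortWith are illustrative assumptions, and no particular ordering is claimed for them.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class JapaneseCollatorCheck {

    // Sorts a copy of the words with a PRIMARY-strength Collator for the locale,
    // mirroring the setup in the question.
    static List<String> sortWith(Locale locale, List<String> words) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative sample only; substitute the real list of words
        List<String> words = Arrays.asList("さくら", "サクラ", "桜", "あめ", "アメ");
        // Try every available Collator locale whose language is Japanese
        for (Locale locale : Collator.getAvailableLocales()) {
            if (locale.getLanguage().equals("ja")) {
                System.out.println(locale + " -> " + sortWith(locale, words));
            }
        }
    }
}
```

If all three lines print the same order, the locales are effectively aliases and the locale choice is not the cause of the unexpected results.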
