Sort Japanese data in Java
I need to sort a list of Japanese Strings. As of now I am using Java's Collator API, which works fine for all other languages, but for Japanese it does not give the expected results. How can I achieve this?

Collator collator = Collator.getInstance(Locale.JAPAN);
collator.setStrength(Collator.PRIMARY);
Collections.sort(Words, collator);

Here Words is the list of Japanese Strings.
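For reference, here is the snippet from the question as a minimal self-contained program (the word list is invented for illustration; in the question, Words is the caller's own list):

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class SortJapaneseWords {
    public static void main(String[] args) {
        // Invented sample data standing in for the question's "Words" list.
        List<String> words = new ArrayList<>(List.of("さくら", "カメラ", "あおい"));
        Collator collator = Collator.getInstance(Locale.JAPAN);
        collator.setStrength(Collator.PRIMARY);
        Collections.sort(words, collator);
        System.out.println(words);
    }
}
```

With the Japanese tailoring, kana sorts in gojūon order, so あおい comes before カメラ (ka) and さくら (sa).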
There are three locales for Japan, so I would make sure you are using the right one (they could be aliases for the same Collator).
final Locale[] availableLocales = Collator.getAvailableLocales();
Arrays.sort(availableLocales, new Comparator<Locale>() {
    @Override
    public int compare(Locale o1, Locale o2) {
        return o1.toString().compareTo(o2.toString());
    }
});
for (Locale locale : availableLocales)
    System.out.println(locale);
prints
ar
ar_AE
ar_BH
ar_DZ
ar_EG
ar_IQ
ar_JO
ar_KW
ar_LB
ar_LY
ar_MA
ar_OM
ar_QA
ar_SA
ar_SD
ar_SY
ar_TN
ar_YE
be
be_BY
bg
bg_BG
ca
ca_ES
cs
cs_CZ
da
da_DK
de
de_AT
de_CH
de_DE
de_LU
el
el_CY
el_GR
en
en_AU
en_CA
en_GB
en_IE
en_IN
en_MT
en_NZ
en_PH
en_SG
en_US
en_ZA
es
es_AR
es_BO
es_CL
es_CO
es_CR
es_DO
es_EC
es_ES
es_GT
es_HN
es_MX
es_NI
es_PA
es_PE
es_PR
es_PY
es_SV
es_US
es_UY
es_VE
et
et_EE
fi
fi_FI
fr
fr_BE
fr_CA
fr_CH
fr_FR
fr_LU
ga
ga_IE
hi_IN
hr
hr_HR
hu
hu_HU
in
in_ID
is
is_IS
it
it_CH
it_IT
iw
iw_IL
ja
ja_JP
ja_JP_JP_#u-ca-japanese
ko
ko_KR
lt
lt_LT
lv
lv_LV
mk
mk_MK
ms
ms_MY
mt
mt_MT
nl
nl_BE
nl_NL
no
no_NO
no_NO_NY
pl
pl_PL
pt
pt_BR
pt_PT
ro
ro_RO
ru
ru_RU
sk
sk_SK
sl
sl_SI
sq
sq_AL
sr
sr_BA
sr_BA_#Latn
sr_CS
sr_ME
sr_ME_#Latn
sr_RS
sr_RS_#Latn
sr__#Latn
sv
sv_SE
th
th_TH
th_TH_TH_#u-nu-thai
tr
tr_TR
uk
uk_UA
vi
vi_VN
zh
zh_CN
zh_HK
zh_SG
zh_TW
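To check whether the three Japanese entries in that listing actually behave differently, a small sketch (the sample words are invented; any difference in tailoring would show up in the printed order):

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CompareJapaneseCollators {
    public static void main(String[] args) {
        // The three Japanese entries from the listing above.
        Locale[] candidates = {
                new Locale("ja"),
                new Locale("ja", "JP"),
                new Locale("ja", "JP", "JP") // legacy locale, prints as ja_JP_JP_#u-ca-japanese
        };
        List<String> words = List.of("カメラ", "さくら", "あおい"); // invented sample
        for (Locale locale : candidates) {
            Collator collator = Collator.getInstance(locale);
            List<String> sorted = new ArrayList<>(words);
            sorted.sort(collator);
            System.out.println(locale + " -> " + sorted);
        }
    }
}
```

If all three print the same order, they are indeed aliases for the same tailoring and the problem lies elsewhere (for example, in the PRIMARY strength collapsing distinctions you care about).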
Related
Convert three-character language code (ISO 639-2) to two-character code (ISO 639-1)
I'm developing an Android app using a Text-to-Speech (TTS) engine. The TTS component returns the list of available languages as a list of Locale objects. But both methods Locale::getLanguage and Locale::getISO3Language of each Locale object return the same 3-character code (ISO 639-2). Usually getLanguage() returns the language code in 2-character format (ISO 639-1), but for a particular device the code is three characters. The same goes for the country code. However, I need the language and country codes in two-character format (ISO 639-1). Does anyone know a way to make the conversion? Please note, I need a corresponding Locale object with both language and country codes in two-letter format.
tl;dr

As a workaround, make your own Map< Locale , String > mapping each known Locale to its 2-letter language code per ISO 639-1:

new LocaleLookup().lookupTwoLetterLanguageCode( Locale.CANADA_FRENCH )

fr

Or maybe just parse the text of Locale::toString:

Locale
.CANADA_FRENCH
.toString()    // fr_CA
.split( "_" )  // Array: { "fr" , "CA" }
[ 0 ]          // Grab first element in array, "fr".

fr

For the two-letter country code, use the second part of that split string, with an index of 1 instead of 0:

Locale
.CANADA_FRENCH
.toString()    // fr_CA
.split( "_" )  // Array: { "fr" , "CA" }
[ 1 ]          // Grab second element in array, "CA".

CA

Bug?

It seems to be a bug that Locale::getLanguage would return a 3-letter code. The Javadoc uses a 2-letter code in its code example, but unfortunately the Javadoc fails to specify explicitly 2 or 3 letters. I suggest you file a request with the OpenJDK project to clarify this Javadoc.

Workaround

As a workaround, perhaps you could call Locale.getISOLanguages to get an array of all known languages as 2-letter codes. Then loop those. For each, use the code seen in the Javadoc, passing the 2-letter code to construct a Locale object for comparison:

if (locale.getLanguage().equals(new Locale("he").getLanguage()))

From this, build your own map between locale and 2-letter code.

Example class

Here is my first stab at such a workaround map. In the constructor, we get a list of all known locales, and all known 2-letter ISO 639-1 language codes. Next we do a nested loop: for each locale, we loop over the 2-letter language codes until we find a match.

Notice that we do not do a string match. The Javadoc warns us that the ISO 639 standard is not stable; the codes are changing. Quoting:

Note: ISO 639 is not a stable standard— some languages' codes have changed. Locale's constructor recognizes both the new and the old codes for the languages whose codes have changed, but this function always returns the old code.
If you want to check for a specific language whose code has changed, don't do:

if (locale.getLanguage().equals("he")) // BAD!

Instead, do:

if (locale.getLanguage().equals(new Locale("he").getLanguage())) // GOOD.

So our inner loop looks at each known 2-letter language code and gets a Locale object for that language. Then our if statement compares the output of getLanguage for (a) our outer loop's Locale, and (b) our inner loop's generated language-only Locale (generated from our 2-letter code). In your case, you claim some device is outputting a 3-letter value for your call to getLanguage. But whether 2 or 3 letters does not matter; we are just looking for a match.

Once instantiated, we can ask our LocaleLookup instance for the two-letter code matching a particular Locale by calling the lookupTwoLetterLanguageCode method:

LocaleLookup localeLookup = new LocaleLookup();
Locale locale = Locale.CANADA_FRENCH;
String code = localeLookup.lookupTwoLetterLanguageCode( locale );
System.out.println( "Locale: " + locale.toString() + " " + locale.getDisplayName( Locale.getDefault() ) + " | ISO 639-1 code: " + code );

Locale: fr_CA French (Canada) | ISO 639-1 code: fr

I'm just guessing at all this. I have not thought it through, nor have I tested any of this. So buyer beware: this solution is worth every penny you paid for it. Good luck.

Here is the entire class, with a public static void main to use as demonstration.

package work.basil.example;

import java.util.*;

public class LocaleLookup
{
    private Map < Locale, String > mapLocaleToTwoLetterLangCode;

    public LocaleLookup ( )
    {
        this.mapLocaleToTwoLetterLangCode = new HashMap <>( Locale.getAvailableLocales().length );
        this.makeMaps();
        System.out.println( "mapLocaleToTwoLetterLangCode = " + mapLocaleToTwoLetterLangCode );
    }

    private void makeMaps ( )
    {
        // Get all locales.
        Set < Locale > locales = Set.of( Locale.getAvailableLocales() );
        // Get all languages, per 2-letter code.
        Set < String > twoLetterLanguageCodes = Set.of( Locale.getISOLanguages() ); // Returns: An array of ISO 639 two-letter language codes.
        for ( Locale locale : locales )
        {
            for ( String twoLetterLanguageCode : twoLetterLanguageCodes )
            {
                if ( locale.getLanguage().equals( new Locale( twoLetterLanguageCode ).getLanguage() ) )
                {
                    this.mapLocaleToTwoLetterLangCode.put( locale , twoLetterLanguageCode );
                    break;
                }
            }
        }
        // System.out.println( "locales = " + locales );
        // System.out.println( "twoLetterLanguageCodes = " + twoLetterLanguageCodes );
    }

    public String lookupTwoLetterLanguageCode ( final Locale locale )
    {
        String code = this.mapLocaleToTwoLetterLangCode.get( locale );
        Objects.requireNonNull( code );
        return code;
    }

    public static void main ( String[] args )
    {
        LocaleLookup localeLookup = new LocaleLookup();
        Locale locale = Locale.CANADA_FRENCH;
        String code = localeLookup.lookupTwoLetterLanguageCode( locale );
        System.out.println( "Locale: " + locale.toString() + " " + locale.getDisplayName( Locale.getDefault() ) + " | ISO 639-1 code: " + code );
    }
}

And here is the map I produced in a pre-release version of Java 15. Note this may be incorrect, as I have seen some goofiness with locales in the pre-release version.
mapLocaleToTwoLetterLangCode = {nn=nn, ar_JO=ar, bg=bg, zu=zu, am_ET=am, fr_DZ=fr, ti_ET=ti, bo_CN=bo, qu_EC=qu, ta_SG=ta, lv=lv, en_NU=en, en_MS=en, zh_SG_#Hans=zh, ff_LR_#Adlm=ff, en_GG=en, en_JM=en, vo=vo, sd__#Arab=sd, sv_SE=sv, sr_ME=sr, dz_BT=dz, es_BO=es, en_ZM=en, fr_ML=fr, br=br, ha_NG=ha, fa_AF=fa, ar_SA=ar, sk=sk, os_GE=os, ml=ml, en_MT=en, en_LR=en, ar_TD=ar, en_GH=en, en_IL=en, sv=sv, cs=cs, el=el, af=af, ff_MR_#Latn=ff, sw_UG=sw, tk_TM=tk, sr_ME_#Cyrl=sr, ar_EG=ar, sd__#Deva=sd, ji_001=yi, yo_NG=yo, se_NO=se, ku=ku, sw_CD=sw, vo_001=vo, en_PW=en, pl_PL=pl, ff_MR_#Adlm=ff, it_VA=it, sr_CS=sr, ne_IN=ne, es_PH=es, es_ES=es, es_CO=es, bg_BG=bg, ji=yi, ar_EH=ar, bs_BA_#Latn=bs, en_VC=en, nb_SJ=nb, es_US=es, en_US_POSIX=en, en_150=en, ar_SD=ar, en_KN=en, ha_NE=ha, pt_MO=pt, ro_RO=ro, zh__#Hans=zh, lb_LU=lb, sr_ME_#Latn=sr, es_GT=es, so_KE=so, ff_LR_#Latn=ff, ff_GH_#Latn=ff, fr_PM=fr, ar_KM=ar, no_NO_NY=no, fr_MG=fr, es_CL=es, mn=mn, tr_TR=tr, eu=eu, fa_IR=fa, en_MO=en, wo=wo, en_BZ=en, sq_AL=sq, ar_MR=ar, es_DO=es, ru=ru, az=az, su__#Latn=su, fa=fa, kl_GL=kl, en_NR=en, nd=nd, kk=kk, en_MP=en, az__#Cyrl=az, en_GD=en, tk=tk, hy=hy, en_BW=en, en_AU=en, en_CY=en, ta_MY=ta, ti_ER=ti, en_RW=en, sv_FI=sv, nd_ZW=nd, lb=lb, ne=ne, su=su, zh_SG=zh, en_IE=en, ln_CD=ln, en_KI=en, om_ET=om, no=no, ja_JP=ja, my=my, ka=ka, ar_IL=ar, ff_GH_#Adlm=ff, or_IN=or, fr_MF=fr, ms_ID=ms, kl=kl, en_SZ=en, zh=zh, es_PE=es, ta=ta, az__#Latn=az, en_GB=en, zh_HK_#Hant=zh, ar_SY=ar, bo=bo, kk_KZ=kk, tt_RU=tt, es_PA=es, om_KE=om, ar_PS=ar, fr_VU=fr, en_AS=en, zh_TW=zh, sd_IN=sd, fr_MC=fr, kw=kw, fr_NE=fr, pt_MZ=pt, ur_IN=ur, ln=ln, en_JE=en, ln_CF=ln, en_CX=en, pt=pt, en_AT=en, gl=gl, sr__#Cyrl=sr, es_GQ=es, kn_IN=kn, ff__#Adlm=ff, ar_YE=ar, en_SX=en, to=to, ga=ga, qu=qu, ru_KZ=ru, en_TZ=en, et=et, en_PR=en, jv=jv, ko_KP=ko, in=in, sn=sn, ps=ps, nl_SR=nl, en_BS=en, km=km, fr_NC=fr, be=be, gv=gv, es=es, gd_GB=gd, nl_BQ=nl, ff_GN_#Adlm=ff, fr_CM=fr, uz_UZ_#Cyrl=uz, pa_IN_#Guru=pa, en_KE=en, 
ja=ja, fr_SN=fr, or=or, fr_MA=fr, pt_LU=pt, ff_GM_#Adlm=ff, fr_BL=fr, en_NL=en, ln_CG=ln, te=te, sl=sl, ha=ha, mr_IN=mr, ko_KR=ko, el_CY=el, ku_TR=ku, es_MX=es, es_HN=es, hu_HU=hu, ff_SN=ff, sq_MK=sq, sr_BA_#Cyrl=sr, fi=fi, bs__#Cyrl=bs, uz=uz, et_EE=et, sr__#Latn=sr, en_SS=en, bo_IN=bo, sw=sw, fy_NL=fy, ar_OM=ar, tr_CY=tr, rm=rm, fr_BI=fr, en_MG=en, uz_UZ_#Latn=uz, bn=bn, de_IT=de, kn=kn, fr_TN=fr, sr_RS=sr, bn_BD=bn, de_CH=de, fr_PF=fr, gu=gu, pt_GQ=pt, en_ZA=en, en_TV=en, lo=lo, fr_FR=fr, en_PN=en, fr_BJ=fr, en_MH=en, zh__#Hant=zh, zh_HK_#Hans=zh, cu_RU=cu, nl_NL=nl, en_GY=en, ps_AF=ps, bs__#Latn=bs, ky=ky, os=os, bs_BA_#Cyrl=bs, nl_CW=nl, ar_DZ=ar, sk_SK=sk, pt_CH=pt, fr_GQ=fr, xh=xh, ki_KE=ki, am=am, fr_CI=fr, en_NG=en, ia_001=ia, en_PK=en, zh_CN=zh, en_LC=en, rw=rw, ff_BF_#Adlm=ff, wo_SN=wo, gv_IM=gv, iw=iw, en_TT=en, mk_MK=mk, sl_SI=sl, fr_HT=fr, te_IN=te, nl_SX=nl, ce=ce, fr_CG=fr, xh_ZA=xh, fr_BE=fr, ff_NE_#Adlm=ff, es_VE=es, mt_MT=mt, mr=mr, mg=mg, ko=ko, en_BM=en, nb_NO=nb, ak=ak, dz=dz, vi_VN=vi, en_VU=en, ia=ia, en_US=en, ff_SL_#Latn=ff, to_TO=to, ff_SN_#Adlm=ff, fr_BF=fr, pa__#Guru=pa, it_SM=it, su_ID=su, fr_YT=fr, gu_IN=gu, ii_CN=ii, ff_CM_#Latn=ff, pa_PK_#Arab=pa, fr_RE=fr, fi_FI=fi, ca_FR=ca, sr_BA_#Latn=sr, bn_IN=bn, fr_GP=fr, pa=pa, tg=tg, fr_DJ=fr, rn=rn, uk_UA=uk, ks__#Arab=ks, hu=hu, fr_CH=fr, en_NF=en, ff_GW_#Adlm=ff, ha_GH=ha, sr_XK_#Cyrl=sr, bm=bm, ar_SS=ar, en_GU=en, nl_AW=nl, de_BE=de, en_AI=en, en_CM=en, cs_CZ=cs, ca_ES=ca, tr=tr, ff_GW_#Latn=ff, rm_CH=rm, ru_MD=ru, ms_MY=ms, ta_LK=ta, en_TO=en, ff_SN_#Latn=ff, ff_SL_#Adlm=ff, cy=cy, en_PG=en, fr_CF=fr, pt_TL=pt, sq=sq, tg_TJ=tg, fr=fr, en_ER=en, qu_PE=qu, sr_BA=sr, es_PY=es, de=de, es_EC=es, ff_CM_#Adlm=ff, lg_UG=lg, ff_NE_#Latn=ff, zu_ZA=zu, fr_TG=fr, su_ID_#Latn=su, sr_XK_#Latn=sr, en_PH=en, ig_NG=ig, fr_GN=fr, zh_MO_#Hans=zh, lg=lg, ru_RU=ru, se_FI=se, ff=ff, en_DM=en, en_CK=en, sd=sd, ar_MA=ar, ga_IE=ga, en_BI=en, en_AG=en, fr_TD=fr, fr_LU=fr, en_WS=en, fr_CD=fr, so=so, rn_BI=rn, 
en_NA=en, mi_NZ=mi, ar_ER=ar, ms=ms, sn_ZW=sn, iw_IL=iw, ug=ug, es_EA=es, ga_GB=ga, th_TH_TH_#u-nu-thai=th, hi=hi, fr_SC=fr, ca_IT=ca, ff_NG_#Latn=ff, en_SL=en, no_NO=no, ca_AD=ca, ff_NG_#Adlm=ff, zh_MO_#Hant=zh, en_SH=en, qu_BO=qu, vi=vi, sd_PK_#Arab=sd, fr_CA=fr, de_LU=de, sq_XK=sq, en_KY=en, mi=mi, mt=mt, it_CH=it, de_DE=de, si_LK=si, en_AE=en, en_DK=en, so_DJ=so, eo=eo, lt_LT=lt, it_IT=it, en_ZW=en, ar_SO=ar, ro=ro, en_UM=en, ps_PK=ps, eo_001=eo, ee=ee, fr_MU=fr, nn_NO=nn, se_SE=se, pl=pl, en_TK=en, en_SI=en, ur=ur, uz__#Arab=uz, pt_GW=pt, se=se, lo_LA=lo, af_ZA=af, ar_LB=ar, ms_SG=ms, ee_TG=ee, ln_AO=ln, be_BY=be, ff_GN=ff, in_ID=in, es_BZ=es, ar_AE=ar, hr_HR=hr, as=as, it=it, pt_CV=pt, ks_IN=ks, uk=uk, my_MM=my, mn_MN=mn, ur_PK=ur, en_FM=en, da_DK=da, es_PR=es, en_BE=en, ii=ii, fr_WF=fr, tt=tt, ru_BY=ru, fo_DK=fo, ee_GH=ee, en_SG=en, ar_BH=ar, ff_GM_#Latn=ff, om=om, en_CH=en, hi_IN=hi, fo_FO=fo, yo_BJ=yo, fr_KM=fr, fr_MQ=fr, ff_GN_#Latn=ff, en_SD=en, es_AR=es, ff__#Latn=ff, en_MY=en, ja_JP_JP_#u-ca-japanese=ja, es_SV=es, pt_BR=pt, ml_IN=ml, en_FK=en, uz__#Cyrl=uz, is_IS=is, hy_AM=hy, en_GM=en, en_DG=en, fo=fo, ne_NP=ne, pt_ST=pt, hr=hr, ak_GH=ak, lt=lt, uz_AF_#Arab=uz, ta_IN=ta, fr_GF=fr, en_SE=en, zh_CN_#Hans=zh, es_419=es, is=is, pt_AO=pt, si=si, en_001=en, jv_ID=jv, en=en, es_IC=es, fr_MR=fr, ca=ca, ru_KG=ru, ar_TN=ar, ks=ks, zh_TW_#Hant=zh, ff_BF_#Latn=ff, bm_ML=bm, kw_GB=kw, ug_CN=ug, as_IN=as, es_BR=es, zh_HK=zh, sw_KE=sw, en_SB=en, th_TH=th, rw_RW=rw, ar_IQ=ar, en_MW=en, mk=mk, en_IO=en, pa__#Arab=pa, en_DE=en, ar_QA=ar, en_CC=en, ro_MD=ro, en_FI=en, bs=bs, pt_PT=pt, fy=fy, az_AZ_#Cyrl=az, th=th, es_CU=es, ar=ar, en_SC=en, en_VI=en, eu_ES=eu, en_UG=en, en_NZ=en, es_UY=es, sg_CF=sg, ru_UA=ru, sg=sg, uz__#Latn=uz, el_GR=el, da_GL=da, en_FJ=en, de_LI=de, en_BB=en, km_KH=km, hr_BA=hr, de_AT=de, nl=nl, lu_CD=lu, ca_ES_VALENCIA=ca, ar_001=ar, so_SO=so, lv_LV=lv, sd_IN_#Deva=sd, es_CR=es, ar_KW=ar, fr_GA=fr, ar_LY=ar, sr=sr, sr_RS_#Cyrl=sr, en_MU=en, da=da, 
gl_ES=gl, az_AZ_#Latn=az, en_IM=en, en_LS=en, ig=ig, en_HK=en, en_GI=en, ce_RU=ce, gd=gd, en_CA=en, ka_GE=ka, fr_SY=fr, sw_TZ=sw, so_ET=so, fr_RW=fr, nl_BE=nl, ar_DJ=ar, mg_MG=mg, en_VG=en, cy_GB=cy, cu=cu, sr_RS_#Latn=sr, os_RU=os, en_TC=en, sv_AX=sv, ky_KG=ky, af_NA=af, lu=lu, en_IN=en, yo=yo, ki=ki, es_NI=es, nb=nb, sd_PK=sd, ti=ti, ms_BN=ms, br_FR=br}

Substring of Locale.toString?

Now, after having done all that work, I notice that the toString representation of the locale name starts with the two-letter language code! 🤦 If this is always the case for all Locale objects, we can simply parse that string:

String twoLetterLanguageCode = Locale.CANADA_FRENCH.toString().split( "_" )[ 0 ];

twoLetterLanguageCode = fr

For the country code, do the same, but pull the second part. Use an index value of 1 instead of 0:

String twoLetterCountryCode = Locale.CANADA_FRENCH.toString().split( "_" )[ 1 ];

For this quick check on my pre-release Java 15, it does seem to be the case that every Locale object's toString text starts with the 2-letter language code. But I do not know if you can count on that always being the case, in the past and in the future.

System.out.println( Locale.getAvailableLocales().length );
ArrayList < Locale > problemLocales = new ArrayList <>( Locale.getAvailableLocales().length );
for ( Locale locale : Locale.getAvailableLocales() )
{
    String parsed = locale.toString().split( "_" )[ 0 ];
    if ( ! parsed.equalsIgnoreCase( locale.getLanguage() ) )
    {
        problemLocales.add( locale );
    }
}
System.out.println( "problemLocales = " + problemLocales );

problemLocales = []

Or, vice versa:

System.out.println( "Locale.getAvailableLocales().length: " + Locale.getAvailableLocales().length );
ArrayList < Locale > matchingLocales = new ArrayList <>( Locale.getAvailableLocales().length );
for ( Locale locale : Locale.getAvailableLocales() )
{
    String parsed = locale.toString().split( "_" )[ 0 ];
    if ( parsed.equalsIgnoreCase( locale.getLanguage() ) )
    {
        matchingLocales.add( locale );
    }
}
System.out.println( "matchingLocales.size: " + matchingLocales.size() );
System.out.println( "matchingLocales = " + matchingLocales );

Locale.getAvailableLocales().length: 810
matchingLocales.size: 810
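For what it's worth, the same nested-loop lookup can be sketched more compactly with streams (same caveats as the class above: untested against devices that report 3-letter codes, and locales with no matching 2-letter code simply fall back to whatever getLanguage returns):

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class LocaleLookupCompact {
    public static void main(String[] args) {
        // Known 2-letter ISO 639-1 codes.
        Set<String> twoLetterCodes = Set.of(Locale.getISOLanguages());
        // Map each available Locale to a matching 2-letter code, comparing via
        // new Locale(code) so old/new code aliases (iw/he, in/id, ji/yi) still match.
        Map<Locale, String> map = Arrays.stream(Locale.getAvailableLocales())
                .collect(Collectors.toMap(
                        locale -> locale,
                        locale -> twoLetterCodes.stream()
                                .filter(code -> locale.getLanguage()
                                        .equals(new Locale(code).getLanguage()))
                                .findFirst()
                                .orElse(locale.getLanguage()),
                        (a, b) -> a));
        System.out.println(map.get(Locale.CANADA_FRENCH)); // fr
    }
}
```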
Supported Locales - ga_IE
While setting the locale for the Google Sheets API, it throws the following error: Invalid requests[0].updateSpreadsheetProperties: Unsupported locale: ga_IE", "status" : "INVALID_ARGUMENT". Reviewing the API doc, it seems that not all locales are supported: The locale of the spreadsheet in one of the following formats: an ISO 639-1 language code such as en; an ISO 639-2 language code such as fil, if no 639-1 code exists; a combination of the ISO language code and country code, such as en_US. Note: when updating this field, not all locales/languages are supported. Where can I find the list of supported locales?
As you quoted from Spreadsheet Properties, ISO 639-1 codes are preferred in the first instance, ISO 639-2 codes are used when no ISO 639-1 code exists, and, if no code exists for a given language in those ISOs, the combination language_COUNTRY is used. This latter case varies depending on the context. I assume that your code lies in one of ISO 639-1/2, so here you have the full lists:

ISO 639-1

Language 639-1 code
Abkhazian ab Afar aa Afrikaans af Akan ak Albanian sq Amharic am Arabic ar Aragonese an Armenian hy Assamese as Avaric av Avestan ae Aymara ay Azerbaijani az Bambara bm Bashkir ba Basque eu Belarusian be Bengali bn Bihari languages bh Bislama bi Bosnian bs Breton br Bulgarian bg Burmese my Catalan, Valencian ca Chamorro ch Chechen ce Chichewa, Chewa, Nyanja ny Chinese zh Chuvash cv Cornish kw Corsican co Cree cr Croatian hr Czech cs Danish da Divehi, Dhivehi, Maldivian dv Dutch, Flemish nl Dzongkha dz English en Esperanto eo Estonian et Ewe ee Faroese fo Fijian fj Finnish fi French fr Fulah ff Galician gl Georgian ka German de Greek, Modern (1453-) el Guarani gn Gujarati gu Haitian, Haitian Creole ht Hausa ha Hebrew he Herero hz Hindi hi Hiri Motu ho Hungarian hu Interlingua(International Auxiliary Language Association) ia Indonesian id Interlingue, Occidental ie Irish ga Igbo ig Inupiaq ik Ido io Icelandic is Italian it Inuktitut iu Japanese ja Javanese jv Kalaallisut, Greenlandic kl Kannada kn Kanuri kr Kashmiri ks Kazakh kk Central Khmer km Kikuyu, Gikuyu ki Kinyarwanda rw Kirghiz, Kyrgyz ky Komi kv Kongo kg Korean ko Kurdish ku Kuanyama, Kwanyama kj Latin la Luxembourgish, Letzeburgesch lb Ganda lg Limburgan, Limburger, Limburgish li Lingala ln Lao lo Lithuanian lt Luba-Katanga lu Latvian lv Manx gv Macedonian mk Malagasy mg Malay ms Malayalam ml Maltese mt Maori mi Marathi mr Marshallese mh Mongolian mn Nauru na Navajo, Navaho nv North Ndebele nd Nepali ne Ndonga ng Norwegian Bokmål nb Norwegian Nynorsk nn Norwegian no Sichuan Yi, Nuosu ii South Ndebele nr
Occitan oc Ojibwa oj Church Slavic, Old Slavonic, Church Slavonic, Old Bulgarian,Old Church Slavonic cu Oromo om Oriya or Ossetian, Ossetic os Punjabi, Panjabi pa Pali pi Persian fa Polish pl Pashto, Pushto ps Portuguese pt Quechua qu Romansh rm Rundi rn Romanian, Moldavian, Moldovan ro Russian ru Sanskrit sa Sardinian sc Sindhi sd Northern Sami se Samoan sm Sango sg Serbian sr Gaelic, Scottish Gaelic gd Shona sn Sinhala, Sinhalese si Slovak sk Slovenian sl Somali so Southern Sotho st Spanish, Castilian es Sundanese su Swahili sw Swati ss Swedish sv Tamil ta Telugu te Tajik tg Thai th Tigrinya ti Tibetan bo Turkmen tk Tagalog tl Tswana tn Tonga(Tonga Islands) to Turkish tr Tsonga ts Tatar tt Twi tw Tahitian ty Uighur, Uyghur ug Ukrainian uk Urdu ur Uzbek uz Venda ve Vietnamese vi Volapük vo Walloon wa Welsh cy Wolof wo Western Frisian fy Xhosa xh Yiddish yi Yoruba yo Zhuang, Chuang za Zulu zu ISO 639-2 not covered by ISO 639-1 Language ISO 639-2 Achinese ace Acoli ach Adangme ada Adyghe; Adygei ady Afro-Asiatic languages afa Afrihili afh Ainu ain Akkadian akk Aleut ale Algonquian languages alg Southern Altai alt English, Old(ca.450–1100) ang Angika anp Apache languages apa Official Aramaic(700–300 BCE);Imperial Aramaic(700–300 BCE) arc Mapudungun;Mapuche arn Arapaho arp Artificial languages art Arawak arw Asturian;Bable;Leonese;Asturleonese ast Athapascan languages ath Australian languages aus Awadhi awa Banda languages bad Bamileke languages bai Baluchi bal Balinese ban Basa bas Baltic languages bat Beja;Bedawiyet bej Bemba bem Berber languages ber Bhojpuri bho Bikol bik Bini;Edo bin Siksika bla Bantu (Other) bnt Braj bra Batak languages btk Buriat bua Buginese bug Blin; Bilin byn Caddo cad Central American Indian languages cai Galibi Carib car Caucasian languages cau Cebuano ceb Celtic languages cel Chibcha chb Chagatai chg Chuukese chk Mari chm Chinook jargon chn Choctaw cho Chipewyan;Dene Suline chp Cherokee chr Cheyenne chy Chamic languages cmc Montenegrin cnr 
Coptic cop Creolesandpidgins, English based cpe Creolesand pidgins, French-based cpf Creolesand pidgins, Portuguese-based cpp Crimean Tatar;Crimean Turkish crh Creolesandpidgins crp Kashubian csb Cushitic languages cus Dakota dak Dargwa dar Land Dayak languages day Delaware del Slave (Athapascan) den Dogrib dgr Dinka din Dogri doi Dravidian languages dra Lower Sorbian dsb Duala dua Dutch, Middle(ca. 1050–1350) dum Dyula dyu Efik efi Egyptian (Ancient) egy Ekajuk eka Elamite elx English, Middle(1100–1500) enm Ewondo ewo Fang fan Fanti fat Filipino;Pilipino fil Finno-Ugrian languages fiu Fon fon French, Middle(ca. 1400–1600) frm French, Old(842–ca. 1400) fro Northern Frisian frr Eastern Frisian frs Friulian fur Ga gaa Gayo gay Gbaya gba Germanic languages gem Geez gez Gilbertese gil German, Middle High(ca. 1050–1500) gmh German, Old High(ca. 750–1050) goh Gondi gon Gorontalo gor Gothic got Grebo grb Greek, Ancient(to 1453) grc Swiss German;Alemannic;Alsatian gsw Gwich'in gwi Haida hai Hawaiian haw Hiligaynon hil Himachali languages; Pahari languages him Hittite hit Hmong;Mong hmn Upper Sorbian hsb Hupa hup Iban iba Ijo languages ijo Iloko ilo Indic languages inc Indo-European languages ine Ingush inh Iranian languages ira Iroquoian languages iro Lojban jbo Judeo-Persian jpr Judeo-Arabic jrb Kara-Kalpak kaa Kabyle kab Kachin;Jingpho kac Kamba kam Karen languages kar Kawi kaw Kabardian kbd Khasi kha Khoisan languages khi Khotanese;Sakan kho Kimbundu kmb Konkani kok Kosraean kos Kpelle kpe Karachay-Balkar krc Karelian krl Kru languages kro Kurukh kru Kumyk kum Kutenai kut Ladino lad Lahnda lah Lamba lam Lezghian lez Mongo lol Lozi loz Luba-Lulua lua Luiseno lui Lunda lun Luo (Kenya and Tanzania) luo Lushai lus Madurese mad Magahi mag Maithili mai Makasar mak Mandingo man Austronesian languages map Masai mas Moksha mdf Mandar mdr Mende men Irish, Middle(900–1200) mga Mi'kmaq;Micmac mic Minangkabau min Uncoded languages mis Mon-Khmer languages mkh Manchu mnc Manipuri mni 
Manobo languages mno Mohawk moh Mossi mos Multiple languages mul Munda languages mun Creek mus Mirandese mwl Marwari mwr Mayan languages myn Erzya myv Nahuatl languages nah North American Indian languages nai Neapolitan nap Low German; Low Saxon; German, Low; Saxon, Low nds Nepal Bhasa;Newari new Nias nia Niger-Kordofanian languages nic Niuean niu Nogai nog Norse, Old non N'Ko nqo Pedi;Sepedi;Northern Sotho nso Nubian languages nub Classical Newari;Old Newari;Classical Nepal Bhasa nwc Nyamwezi nym Nyankole nyn Nyoro nyo Nzima nzi Osage osa Turkish, Ottoman(1500–1928) ota Otomian languages oto Papuan languages paa Pangasinan pag Pahlavi pal Pampanga;Kapampangan pam Papiamento pap Palauan pau Persian, Old(ca. 600–400 B.C.) peo Philippine languages phi Phoenician phn Pohnpeian pon Prakrit languages pra Provençal, Old(to 1500);Old Occitan (to 1500) pro Reserved for local use qaa-qtz Rajasthani raj Rapanui rap Rarotongan;Cook Islands Maori rar Romance languages roa Romany rom Aromanian;Arumanian;Macedo-Romanian rup Sandawe sad Yakut sah South American Indian (Other) sai Salishan languages sal Samaritan Aramaic sam Sasak sas Santali sat Sicilian scn Scots sco Selkup sel Semitic languages sem Irish, Old(to 900) sga Sign Languages sgn Shan shn Sidamo sid Siouan languages sio Sino-Tibetan languages sit Slavic languages sla Southern Sami sma Sami languages smi Lule Sami smj Inari Sami smn Skolt Sami sms Soninke snk Sogdian sog Songhai languages son Sranan Tongo srn Serer srr Nilo-Saharan languages ssa Sukuma suk Susu sus Sumerian sux Classical Syriac syc Syriac syr Tai languages tai Timne tem Tereno ter Tetum tet Tigre tig Tiv tiv Tokelau tkl Klingon;tlhIngan-Hol tlh Tlingit tli Tamashek tmh Tonga (Nyasa) tog Tok Pisin tpi Tsimshian tsi Tumbuka tum Tupi languages tup Altaic languages tut Tuvalu tvl Tuvinian tyv Udmurt udm Ugaritic uga Umbundu umb Undetermined und Vai vai Votic vot Wakashan languages wak Walamo wal Waray war Washo was Sorbian languages wen Kalmyk;Oirat xal 
Yao yao Yapese yap Yupik languages ypk Zapotec zap Blissymbols;Blissymbolics;Bliss zbl Zenaga zen Standard Moroccan Tamazight zgh Zande languages znd Zuni zun No linguistic content; Not applicable zxx Zaza;Dimili;Dimli;Kirdki;Kirmanjki;Zazaki zza Just for clarification: you can see the full list of ISO 639 codes on the Library of Congress. As I said before, I assume that your language is one of the former. If that is not the case, please ask for further help so I can assist you better.
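As a local sanity check, you can at least verify which of the two ISO lists a code falls into using only the JDK. The hasIso6391Language helper below is hypothetical (my own name, not a Sheets API call), and note that passing this check does not guarantee the Sheets API accepts the locale, as your ga_IE example shows:

```java
import java.util.Arrays;
import java.util.Locale;

public class SheetsLocalePrecheck {
    // Hypothetical helper: checks only that the language part of a locale code
    // is a 2-letter ISO 639-1 code known to the JDK. It does NOT query the
    // Sheets API, whose supported set is narrower.
    static boolean hasIso6391Language(String localeCode) {
        String lang = localeCode.split("_")[0];
        return Arrays.asList(Locale.getISOLanguages()).contains(lang);
    }

    public static void main(String[] args) {
        System.out.println(hasIso6391Language("ga_IE")); // true: 'ga' is a valid 639-1 code
        System.out.println(hasIso6391Language("fil"));   // false: Filipino only has a 639-2 code
    }
}
```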
Cannot identify text in Spanish with LingPipe
Some days ago I started developing a Java server to keep a bunch of data and identify its language, so I decided to use LingPipe for that task. But I am facing an issue: after training the code and evaluating it with two languages (English and Spanish), I found that I can't identify Spanish text, although I got a successful result with English and French. The tutorial that I followed in order to complete this task is: http://alias-i.com/lingpipe/demos/tutorial/langid/read-me.html

These are the steps I followed to train a language classifier:

~1. First, place and unpack the English and Spanish metadata inside a folder named leipzig, as follows (Note: metadata and sentences are provided by http://wortschatz.uni-leipzig.de/en/download):

leipzig //Main folder
    1M sentences //Folder with data of the last trial
        eng_news_2015_1M
        eng_news_2015_1M.tar.gz
        spa-hn_web_2015_1M
        spa-hn_web_2015_1M.tar.gz
    ClassifyLang.java //Custom program to try the trained code
    dist //Folder
        eng_news_2015_300K.tar.gz //unpackaged english sentences
        spa-hn_web_2015_300K.tar.gz //unpackaged spanish sentences
    EvalLanguageId.java
    langid-leipzig.classifier //trained code
    lingpipe-4.1.2.jar
    munged //Folder
        eng //folder containing the sentences.txt for english
            sentences.txt
        spa //folder containing the sentences.txt for spanish
            sentences.txt
    Munge.java
    TrainLanguageId.java
    unpacked //Folder
        eng_news_2015_300K //Folder with the english metadata
            eng_news_2015_300K-co_n.txt
            eng_news_2015_300K-co_s.txt
            eng_news_2015_300K-import.sql
            eng_news_2015_300K-inv_so.txt
            eng_news_2015_300K-inv_w.txt
            eng_news_2015_300K-sources.txt
            eng_news_2015_300K-words.txt
            sentences.txt
        spa-hn_web_2015_300K //Folder with the spanish metadata
            sentences.txt
            spa-hn_web_2015_300K-co_n.txt
            spa-hn_web_2015_300K-co_s.txt
            spa-hn_web_2015_300K-import.sql
            spa-hn_web_2015_300K-inv_so.txt
            spa-hn_web_2015_300K-inv_w.txt
            spa-hn_web_2015_300K-sources.txt
            spa-hn_web_2015_300K-words.txt

~2. Second, unpack the language metadata into the unpacked folder (the same eng_news_2015_300K and spa-hn_web_2015_300K contents listed above).

~3. Then munge the sentences of each one in order to remove the line numbers and tabs and to replace line breaks with single space characters. The output is uniformly written using the UTF-8 Unicode encoding (Note: the Munge.java at the LingPipe site).

Command line:
javac -cp lingpipe-4.1.2.jar: Munge.java
java -cp lingpipe-4.1.2.jar: Munge /home/samuel/leipzig/unpacked /home/samuel/leipzig/munged

Results:
spa
reading from=/home/samuel/leipzig/unpacked/spa-hn_web_2015_300K/sentences.txt charset=iso-8859-1
writing to=/home/samuel/leipzig/munged/spa/spa.txt charset=utf-8
total length=43267166
eng
reading from=/home/samuel/leipzig/unpacked/eng_news_2015_300K/sentences.txt charset=iso-8859-1
writing to=/home/samuel/leipzig/munged/eng/eng.txt charset=utf-8
total length=35847257

Resulting folder:
munged //Folder
    eng //folder containing the sentences.txt for english
        sentences.txt
    spa //folder containing the sentences.txt for spanish
        sentences.txt

~4. Next we start by training the language (Note: the TrainLanguageId.java at Lingpipe
LanguageId tutorial).

Command line:
javac -cp lingpipe-4.1.2.jar: TrainLanguageId.java
java -cp lingpipe-4.1.2.jar: TrainLanguageId /home/samuel/leipzig/munged /home/samuel/leipzig/langid-leipzig.classifier 100000 5

Results:
nGram=100000 numChars=5
Training category=eng
Training category=spa
Compiling model to file=/home/samuel/leipzig/langid-leipzig.classifier

~5. We evaluated our trained code with the following result, having some issues in the confusion matrix (Note: the EvalLanguageId.java at Lingpipe LanguageId tutorial).

Command line:
javac -cp lingpipe-4.1.2.jar: EvalLanguageId.java
java -cp lingpipe-4.1.2.jar: EvalLanguageId /home/samuel/leipzig/munged /home/samuel/leipzig/langid-leipzig.classifier 100000 50 1000

Results:
Reading classifier from file=/home/samuel/leipzig/langid-leipzig.classifier
Evaluating category=eng
Evaluating category=spa

TEST RESULTS

BASE CLASSIFIER EVALUATION
Categories=[eng, spa]
Total Count=2000
Total Correct=1000
Total Accuracy=0.5
95% Confidence Interval=0.5 +/- 0.02191346617949794

Confusion Matrix
reference \ response ,eng,spa
eng,1000,0 <---------- not diagonal sampling
spa,1000,0

Macro-averaged Precision=NaN
Macro-averaged Recall=0.5
Macro-averaged F=NaN

Micro-averaged Results
the following symmetries are expected: TP=TN, FN=FP PosRef=PosResp=NegRef=NegResp Acc=Prec=Rec=F
Total=4000
True Positive=1000
False Negative=1000
False Positive=1000
True Negative=1000
Positive Reference=2000
Positive Response=2000
Negative Reference=2000
Negative Response=2000
Accuracy=0.5
Recall=0.5
Precision=0.5
Rejection Recall=0.5
Rejection Precision=0.5
F(1)=0.5
Fowlkes-Mallows=2000.0
Jaccard Coefficient=0.3333333333333333
Yule's Q=0.0
Yule's Y=0.0
Reference Likelihood=0.5
Response Likelihood=0.5
Random Accuracy=0.5
Random Accuracy Unbiased=0.5
kappa=0.0
kappa Unbiased=0.0
kappa No Prevalence=0.0
chi Squared=0.0
phi Squared=0.0
Accuracy Deviation=0.007905694150420948
Random Accuracy=0.5
Random Accuracy Unbiased=0.625
kappa=0.0
kappa Unbiased=-0.3333333333333333
kappa No Prevalence =0.0
Reference Entropy=1.0
Response Entropy=NaN
Cross Entropy=Infinity
Joint Entropy=1.0
Conditional Entropy=0.0
Mutual Information=0.0
Kullback-Liebler Divergence=Infinity
chi Squared=NaN
chi-Squared Degrees of Freedom=1
phi Squared=NaN
Cramer's V=NaN
lambda A=0.0
lambda B=NaN

ONE VERSUS ALL EVALUATIONS BY CATEGORY

CATEGORY[0]=eng VERSUS ALL
First-Best Precision/Recall Evaluation
Total=2000
True Positive=1000
False Negative=0
False Positive=1000
True Negative=0
Positive Reference=1000
Positive Response=2000
Negative Reference=1000
Negative Response=0
Accuracy=0.5
Recall=1.0
Precision=0.5
Rejection Recall=0.0
Rejection Precision=NaN
F(1)=0.6666666666666666
Fowlkes-Mallows=1414.2135623730949
Jaccard Coefficient=0.5
Yule's Q=NaN
Yule's Y=NaN
Reference Likelihood=0.5
Response Likelihood=1.0
Random Accuracy=0.5
Random Accuracy Unbiased=0.625
kappa=0.0
kappa Unbiased=-0.3333333333333333
kappa No Prevalence=0.0
chi Squared=NaN
phi Squared=NaN
Accuracy Deviation=0.011180339887498949

CATEGORY[1]=spa VERSUS ALL
First-Best Precision/Recall Evaluation
Total=2000
True Positive=0
False Negative=1000
False Positive=0
True Negative=1000
Positive Reference=1000
Positive Response=0
Negative Reference=1000
Negative Response=2000
Accuracy=0.5
Recall=0.0
Precision=NaN
Rejection Recall=1.0
Rejection Precision=0.5
F(1)=NaN
Fowlkes-Mallows=NaN
Jaccard Coefficient=0.0
Yule's Q=NaN
Yule's Y=NaN
Reference Likelihood=0.5
Response Likelihood=0.0
Random Accuracy=0.5
Random Accuracy Unbiased=0.625
kappa=0.0
kappa Unbiased=-0.3333333333333333
kappa No Prevalence=0.0
chi Squared=NaN
phi Squared=NaN
Accuracy Deviation=0.011180339887498949
~6. Then we tried to make a real evaluation with Spanish text:

/-------------------Command line----------------------------------/
javac -cp lingpipe-4.1.2.jar: ClassifyLang.java
java -cp lingpipe-4.1.2.jar: ClassifyLang
/-----------------------------------------------------------------/

Result:

Text: Yo soy una persona increíble y muy inteligente, me admiro a mi mismo lo que me hace sentir ansiedad de lo que viene, por que es algo grandioso lleno de cosas buenas y de ahora en adelante estaré enfocado y optimista aunque tengo que aclarar que no lo haré por querer algo, sino por que es mi pasión.
Best Language: eng   <------------- Wrong result

Code for ClassifyLang.java:

import java.io.File;
import java.io.IOException;

import com.aliasi.classify.Classification;
import com.aliasi.classify.LMClassifier;
import com.aliasi.util.AbstractExternalizable;

public class ClassifyLang {

    public static String text = "Yo soy una persona increíble y muy inteligente, me admiro a mi mismo"
            + " estoy ansioso de lo que viene, por que es algo grandioso lleno de cosas buenas"
            + " y de ahora en adelante estaré enfocado y optimista"
            + " aunque tengo que aclarar que no lo haré por querer algo, sino por que no es difícil serlo.";

    private static File MODEL_DIR = new File("/home/samuel/leipzig/langid-leipzig.classifier");

    public static void main(String[] args) throws ClassNotFoundException, IOException {
        System.out.println("Text: " + text);
        LMClassifier classifier = null;
        try {
            classifier = (LMClassifier) AbstractExternalizable.readObject(MODEL_DIR);
        } catch (IOException | ClassNotFoundException ex) {
            System.out.println("Problem with the model");
        }
        Classification classification = classifier.classify(text);
        String bestCategory = classification.bestCategory();
        System.out.println("Best Language: " + bestCategory);
    }
}

~7. I also tried with a 1-million-sentence metadata file, and with different n-gram sizes, but got the same result. I will be very thankful for your help.
Well, after days working on natural language processing I found a way to determine the language of a text using OpenNLP. Here is the sample code: https://github.com/samuelchapas/languagePredictionOpenNLP/tree/master/TrainingLanguageDecOpenNLP and over here is the training corpus for the model created to make language predictions. I decided to use OpenNLP for the issue described in this question; this library really has a complete stack of functionality. Here is the sample for model training: https://mega.nz/#F!HHYHGJ4Q!PY2qfbZr-e0w8tg3cUgAXg
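Both LingPipe's language-model classifier and OpenNLP's language detector ultimately work from character n-gram statistics. As a rough self-contained illustration of that idea (not the actual OpenNLP API), here is a toy trigram-overlap guesser; the training snippets and the detect() helper are invented for this sketch:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy character-trigram language guesser. Real libraries use smoothed
// n-gram language models; this sketch just counts shared trigrams.
public class TinyLangId {

    // Collect the set of character trigrams occurring in a text.
    static Set<String> trigrams(String text) {
        Set<String> grams = new HashSet<>();
        String s = text.toLowerCase();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Pick the language whose sample shares the most trigrams with the text.
    static String detect(String text, Map<String, String> samples) {
        Set<String> target = trigrams(text);
        String best = null;
        int bestOverlap = -1;
        for (Map.Entry<String, String> e : samples.entrySet()) {
            Set<String> overlap = trigrams(e.getValue());
            overlap.retainAll(target);
            if (overlap.size() > bestOverlap) {
                bestOverlap = overlap.size();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> samples = new HashMap<>();
        samples.put("eng", "an english sentence with very common and simple words");
        samples.put("spa", "esta es una muestra de entrenamiento con palabras muy comunes en español");
        System.out.println(detect("una persona increíble y muy inteligente", samples)); // spa
        System.out.println(detect("an incredible and very intelligent person", samples)); // eng
    }
}
```

With tiny training samples like these the decision is fragile, which is exactly why the question's results improved with a larger corpus and a proper library.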
Is there any way to paint my plate numbers to black using JavaCV?
Hi, I started using JavaCV two days ago. I'm trying to build an open-source ANPR system (Java SE and Maven; the code is in my Git repository). I have detected the plate rectangle, and now I'm trying to prepare a good image for OCR reading. Starting from the original image, I have obtained the result shown above. Is there any way to turn my plate numbers black using JavaCV? I don't have the slightest idea how to do so with JavaCV functions. Here are the methods that produce this result. First I call this after a blur:

public void toB_A_W(JLabel jLabel) {
    Mat rgbImage = Imgcodecs.imread(original);
    Mat destination = new Mat(rgbImage.rows(), rgbImage.cols(), rgbImage.type());
    // the goal is to correct the errors of the black-and-white transformation
    int dilation_size = 2;
    // dilation kernel: we dilate with a rectangular shape (Imgproc.MORPH_RECT)
    Mat element1 = Imgproc.getStructuringElement(Imgproc.MORPH_RECT,
            new Size(dilation_size + 1, dilation_size + 1));
    // dilate the image
    Imgproc.dilate(rgbImage, destination, element1);
    Mat labImage = new Mat();
    cvtColor(destination, labImage, Imgproc.COLOR_BGR2GRAY);
    Imgcodecs.imwrite(ocrReadFrom, labImage);
    jLabel.setIcon(new ImageIcon(ocrReadFrom));
    JOptionPane.showConfirmDialog(null, "");
}

Then I call this:

public void toB_W(JLabel jLabelBlackAndWhiteImage) {
    // this is the image for the OCR
    smouthedImage = opencv_imgcodecs.cvLoadImage(ocrReadFrom);
    blackAndWhiteImageOCR = opencv_core.IplImage.create(smouthedImage.width(),
            smouthedImage.height(), IPL_DEPTH_8U, 1);
    // the function that performs the black-and-white transformation
    System.out.println("0");
    //cvAdaptiveThreshold(smouthedImage, smouthedImage, 255, CV_ADAPTIVE_THRESH_GAUSSIAN_C, opencv_imgproc.CV_THRESH_MASK, 15, -2);
    opencv_imgproc.cvSmooth(smouthedImage, smouthedImage);
    System.out.println("1");
    cvCvtColor(smouthedImage, blackAndWhiteImageOCR, CV_BGR2GRAY);
    System.out.println("2");
    cvAdaptiveThreshold(blackAndWhiteImageOCR, blackAndWhiteImageOCR, 255,
            CV_ADAPTIVE_THRESH_GAUSSIAN_C, CV_THRESH_BINARY_INV, 17, -4);
    System.out.println("3");
    opencv_imgproc.cvSmooth(blackAndWhiteImageOCR, blackAndWhiteImageOCR);
    // end of the transformation
    cvSaveImage(ocrReadFrom, blackAndWhiteImageOCR);
    ...
}

Thanks
If you want to fill the numbers, consider performing a binary threshold rather than an adaptive threshold. I chose a threshold level of 40 to make the numbers distinct.
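A binary threshold maps every pixel above a fixed cutoff to pure white and everything else to pure black (in OpenCV's Java binding that is Imgproc.threshold(src, dst, 40, 255, Imgproc.THRESH_BINARY)). The pure-Java sketch below shows the same operation on a grayscale array so it runs without native libraries; the pixel values are made up for illustration:

```java
// Pure-Java sketch of a binary threshold, i.e. what OpenCV's
// Imgproc.threshold(src, dst, 40, 255, Imgproc.THRESH_BINARY) computes.
// Pixels brighter than the cutoff become white (255), the rest black (0).
public class BinaryThreshold {

    static int[][] threshold(int[][] gray, int cutoff) {
        int[][] out = new int[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            out[y] = new int[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                out[y][x] = gray[y][x] > cutoff ? 255 : 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny fake grayscale "plate" row: dark background, bright digits.
        int[][] img = { { 10, 30, 200, 180, 25, 220 } };
        int[][] bw = threshold(img, 40); // cutoff of 40, as suggested above
        for (int v : bw[0]) System.out.print(v + " ");
        // prints: 0 0 255 255 0 255
    }
}
```

Unlike cvAdaptiveThreshold, which computes a local cutoff per neighborhood and tends to hollow out thick strokes, a single global cutoff keeps the digit interiors solid, which is what OCR engines prefer.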
Parsing simple times in hh:mmaa format
DateFormat formatter = new SimpleDateFormat("hh:mmaa");
formatter.parse("01:20pm");

I'm trying to parse times in the format 01:20pm. If I run the above code, I get the following exception:

java.text.ParseException: Unparseable date: "01:20pm"
    at java.text.DateFormat.parse(DateFormat.java:366)

As far as the format I pass to the SimpleDateFormat constructor goes, I don't see anything wrong. What went wrong here?
Your default locale most likely does not use the "am"/"pm" markers. Use a Locale that does, for example:

DateFormat formatter = new SimpleDateFormat("hh:mmaa", Locale.US);

Or, in Java 8+, use the java.time API instead. Two caveats there: java.time only accepts a single "a" pattern letter (a doubled "aa" throws IllegalArgumentException), and it parses the AM/PM text case-sensitively by default, so lowercase "pm" needs a formatter built with DateTimeFormatterBuilder and parseCaseInsensitive().
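A runnable sketch of the java.time route, with both gotchas handled (single "a" pattern letter, case-insensitive parsing for the lowercase "pm"):

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

public class ParseAmPm {
    public static void main(String[] args) {
        DateTimeFormatter fmt = new DateTimeFormatterBuilder()
                .parseCaseInsensitive()      // accept "pm" as well as "PM"
                .appendPattern("hh:mma")     // single "a"; "aa" is rejected by java.time
                .toFormatter(Locale.US);     // pin the locale so the AM/PM text is English
        LocalTime lt = LocalTime.parse("01:20pm", fmt);
        System.out.println(lt);              // prints 13:20
    }
}
```

Pinning the locale explicitly also makes the code immune to the default-locale problem that caused the original ParseException.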
Number and date parsing in Java uses the Locale to supply, well, locale-specific symbols. In this case it is mostly the "pm" token that is being rejected. To confirm this, here is a piece of code that exercises all available locales in the VM. For the locales that don't work, I was curious to see why, so instead of parsing a time I format a valid time instead (I had to enable UTF-8 output, but it's interesting to see). The really interesting part is that all Spanish (es) locales work fine, except the United States variant (es-US). Hmmm...

Set<String> good = new TreeSet<>();
Set<String> bad = new TreeSet<>();
for (Locale locale : Locale.getAvailableLocales()) {
    try {
        new SimpleDateFormat("hh:mmaa", locale).parse("01:20pm");
        good.add(locale.toLanguageTag());
    } catch (ParseException e) {
        bad.add(locale.toLanguageTag());
    }
}
System.out.println("Good locales: " + good);

Date time = new SimpleDateFormat("hh:mmaa", Locale.ENGLISH).parse("01:20pm");
System.out.println("Bad locales:");
for (String languageTag : bad)
    System.out.printf("  %-5s: %s%n", languageTag,
            new SimpleDateFormat("hh:mmaa", Locale.forLanguageTag(languageTag)).format(time));

OUTPUT

Good locales: [be, be-BY, bg, bg-BG, ca, ca-ES, da, da-DK, de, de-AT, de-CH, de-DE, de-GR, de-LU, en, en-AU, en-CA, en-GB, en-IE, en-IN, en-MT, en-NZ, en-PH, en-SG, en-US, en-ZA, es, es-AR, es-BO, es-CL, es-CO, es-CR, es-CU, es-DO, es-EC, es-ES, es-GT, es-HN, es-MX, es-NI, es-PA, es-PE, es-PR, es-PY, es-SV, es-UY, es-VE, et, et-EE, fr, fr-BE, fr-CA, fr-CH, fr-FR, fr-LU, he, he-IL, hi, hr, hr-HR, id, id-ID, is, is-IS, it, it-CH, it-IT, lt, lt-LT, lv, lv-LV, mk, mk-MK, ms, ms-MY, nl, nl-BE, nl-NL, nn-NO, no, no-NO, pl, pl-PL, pt, pt-BR, pt-PT, ro, ro-RO, ru, ru-RU, sk, sk-SK, sl, sl-SI, sr, sr-BA, sr-CS, sr-Latn, sr-Latn-BA, sr-Latn-ME, sr-Latn-RS, sr-ME, sr-RS, tr, tr-TR, uk, uk-UA, und]

Bad locales:
  ar   : 01:20م
  ar-AE: 01:20م
  ar-BH: 01:20م
  ar-DZ: 01:20م
  ar-EG: 01:20م
  ar-IQ: 01:20م
  ar-JO: 01:20م
  ar-KW: 01:20م
  ar-LB: 01:20م
  ar-LY: 01:20م
  ar-MA: 01:20م
  ar-OM: 01:20م
  ar-QA: 01:20م
  ar-SA: 01:20م
  ar-SD: 01:20م
  ar-SY: 01:20م
  ar-TN: 01:20م
  ar-YE: 01:20م
  cs   : 01:20odp.
  cs-CZ: 01:20odp.
  el   : 01:20μμ
  el-CY: 01:20ΜΜ
  el-GR: 01:20μμ
  es-US: 01:20p.m.
  fi   : 01:20ip.
  fi-FI: 01:20ip.
  ga   : 01:20p.m.
  ga-IE: 01:20p.m.
  hi-IN: ०१:२०अपराह्न
  hu   : 01:20DU
  hu-HU: 01:20DU
  ja   : 01:20午後
  ja-JP: 01:20午後
  ja-JP-u-ca-japanese-x-lvariant-JP: 01:20午後
  ko   : 01:20오후
  ko-KR: 01:20오후
  mt   : 01:20WN
  mt-MT: 01:20WN
  sq   : 01:20MD
  sq-AL: 01:20MD
  sv   : 01:20em
  sv-SE: 01:20em
  th   : 01:20หลังเที่ยง
  th-TH: 01:20หลังเที่ยง
  th-TH-u-nu-thai-x-lvariant-TH: ๐๑:๒๐หลังเที่ยง
  vi   : 01:20CH
  vi-VN: 01:20CH
  zh   : 01:20下午
  zh-CN: 01:20下午
  zh-HK: 01:20下午
  zh-SG: 01:20下午
  zh-TW: 01:20下午