Have the NFC Normalization semantics changed between Java 6 and 7?

The Unicode character U+FA8E (CJK COMPATIBILITY IDEOGRAPH-FA8E) is a compatibility character mapped to U+641C in the CJK Unified Ideographs block. In Java 6, NFC normalization leaves it as U+FA8E, while in Java 7 it is converted to U+641C.
When running this small snippet:
String fancyChar = "\uFA8E";
String normalized = Normalizer.normalize(fancyChar, Normalizer.Form.NFC);
System.out.printf("%04x == %04x\n", (int)(fancyChar.charAt(0)), (int)(normalized.charAt(0)));
System.out.println(fancyChar.equals(normalized));
In Java 6 (latest versions of both Sun/Oracle and OpenJDK):
fa8e == fa8e
true
In Java 7 (latest versions of both Sun/Oracle and OpenJDK):
fa8e == 641c
false
So my question is: why has this changed?
Reading Unicode Normalization Forms (UAX #15), it seems that NFC should not decompose characters that have a compatibility mapping.
But the fact that both Oracle and OpenJDK have switched this for Java 7 makes me wonder.

The character U+FA8E has a canonical mapping to U+641C, not a compatibility mapping. The authoritative reference on this is the UnicodeData.txt file in the Unicode Character Database. Thus, the correct NFC form of U+FA8E is U+641C.
So this is apparently a bug fix. It seems to affect other characters in the same group, too.
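To make the canonical vs. compatibility distinction concrete, here is a small sketch using java.text.Normalizer. The class and helper names are made up for illustration, and the U+FA8E result assumes Java 7 or later, as reported in the question. U+FB01 (the "fi" ligature) is used as a contrast because it has only a compatibility mapping, so NFC leaves it alone while NFKC replaces it.

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // Canonical composition: applies canonical mappings only.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Compatibility composition: applies compatibility mappings as well.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Formats each UTF-16 unit as four hex digits, e.g. "641c".
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("%04x ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Canonical singleton mapping: replaced even under NFC (Java 7+).
        System.out.println("NFC  of fa8e: " + hex(nfc("\uFA8E")));
        // Compatibility-only mapping: untouched by NFC, expanded by NFKC.
        System.out.println("NFC  of fb01: " + hex(nfc("\uFB01")));
        System.out.println("NFKC of fb01: " + hex(nfkc("\uFB01")));
    }
}
```

The name "compatibility ideograph" is what makes this confusing: the block name does not determine the mapping type, the decomposition field in UnicodeData.txt does.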

Related

Idiomatic way to remove country code from currency format?

Somewhere between Java 11 and 17, currency formatting changed so that this:
NumberFormat.getCurrencyInstance(Locale.CANADA_FRENCH).format(100.00)
would print 100,00 $ CA instead of 100,00 $.
Is there a better way than this to remove the country code CA?
var currencyFormat = NumberFormat.getCurrencyInstance(Locale.CANADA_FRENCH);
if (currencyFormat instanceof DecimalFormat decimalFormat) {
var symbols = DecimalFormatSymbols.getInstance(Locale.CANADA_FRENCH);
symbols.setCurrencySymbol("$");
decimalFormat.setDecimalFormatSymbols(symbols);
}
Seems a bit much just to get back something that was the default behavior up until recently.

I dug into this a bit. The JDK locale data comes from Unicode CLDR by default, and it seems CLDR reverted from $ CA to $ back in August; see CLDR-14862 and this commit (expand common/main/fr_CA.xml and then go to lines 5914/5923).
That change was part of CLDR v40, released in October, so it came too late for JDK 17. The JDK 17 documentation says it uses CLDR v35.1 (which was introduced in Java 13), but it seems the data was actually updated to v39 in April 2021 and the release note was forgotten (JDK 16 appears to have been upgraded to v38 already). CLDR v40 is planned for JDK 19.
You may want to run your application using the COMPAT locales first, with
-Djava.locale.providers=COMPAT,CLDR,SPI
(found here but see also LocaleServiceProvider)
This will use the locales compatible with Java 8, where this issue is not present.
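If changing the provider order for the whole JVM is not an option, the override from the question can be wrapped in a helper so that the result no longer depends on which CLDR version the JDK ships. This is only a sketch: the class and method names are made up, and it assumes the returned format is a DecimalFormat (if it is not, the format is returned unchanged).

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

public class PlainDollarFormat {
    // Hypothetical helper: fr-CA currency format with the symbol forced to "$".
    static NumberFormat plainDollarFormat() {
        NumberFormat format = NumberFormat.getCurrencyInstance(Locale.CANADA_FRENCH);
        if (format instanceof DecimalFormat) {
            DecimalFormat decimalFormat = (DecimalFormat) format;
            // getDecimalFormatSymbols() returns a copy, so it must be set back.
            DecimalFormatSymbols symbols = decimalFormat.getDecimalFormatSymbols();
            symbols.setCurrencySymbol("$");
            decimalFormat.setDecimalFormatSymbols(symbols);
        }
        return format;
    }

    public static void main(String[] args) {
        // Shows which provider order is active (null means the JDK default).
        System.out.println("providers: " + System.getProperty("java.locale.providers"));
        System.out.println(plainDollarFormat().format(100.00));
    }
}
```

Running this with and without -Djava.locale.providers=COMPAT,CLDR,SPI is a quick way to confirm that the output is the same under both settings.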

Check if a Java version is greater than a certain iteration in Java?

I wish to check if a user's Java version is at least 1.8.0_171. I mean that specific iteration or higher, meaning 1.8.0_151, for instance, would not work.
I originally planned to use org.apache.commons.lang3.SystemUtils' isJavaVersionAtLeast(JavaVersion requiredVersion) method, but it seems that you cannot specify the iteration number.
Given this, and the way Java's version number format has changed (e.g. 1.8, then 9), what is the best way to check the user's Java version from within a Java program?
Edit:
This was marked as a duplicate of this question; however, I think it is different in that it asks how to compare the java version with a certain version given the changes in format of how the java version is shown.

Even with the versioning change, I think the solution is still as simple as using the following boolean expression:
"1.8.0_171".compareTo(System.getProperty("java.version")) <= 0
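If the user's java.version property sorts lexicographically before 1.8.0_171, the expression returns false, and vice versa. It also gives the right answer when java.version is "9" or "10". Be aware, though, that String.compareTo compares character by character rather than numerically, so it gives the wrong answer when the update numbers have a different number of digits: "1.8.0_20" compares as greater than "1.8.0_171".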
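A numeric, component-wise comparison avoids the digit-count problem. The following is a minimal sketch, not a complete version parser: the class and method names are made up, and it assumes version strings consist of numeric components separated by "." or "_", optionally followed by a suffix starting with "-" or "+" (such as "-ea" or "+9"), which is dropped.

```java
import java.util.Arrays;

public class JavaVersionCheck {
    // Turns "1.8.0_171-b11" into [1, 8, 0, 171] and "17-ea" into [17].
    static int[] parse(String version) {
        String core = version.split("[-+]", 2)[0]; // drop build or pre-release suffix
        return Arrays.stream(core.split("[._]"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Compares component by component; missing components count as 0.
    static boolean isAtLeast(String actual, String required) {
        int[] a = parse(actual);
        int[] r = parse(required);
        for (int i = 0; i < Math.max(a.length, r.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < r.length ? r[i] : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true; // all components equal
    }

    public static void main(String[] args) {
        String current = System.getProperty("java.version");
        System.out.println(current + " >= 1.8.0_171: " + isAtLeast(current, "1.8.0_171"));
    }
}
```

Because old-style versions start with 1 and new-style versions start with 9 or higher, the same comparison works across the format change without special cases.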

Which keywords are reserved in JavaScript but not in Java?

Which keywords are reserved in JavaScript but not in Java?
One example is debugger, but there are more.
By reserved I mean reserved words as well as future reserved words (in both strict and non-strict mode) and special tokens like null, true and false.
I'm interested in ECMAScript 5.1 as well as current 6 vs. Java 5-8 (not sure if there were new keywords since Java 5).
Update
For those who are interested in the reasons for wanting to know this:
I know many Java developers switching from Java to JavaScript (my story). Knowing the delta in keywords is helpful.
Language history.
My very specific reason for asking: I'm building Java/JavaScript code generation tools (quasi cross-language). Which reserved keywords should I add to the Java code generator so that it produces JavaScript-compatible identifiers in the cross-language case?

This is what I've found out so far.
There seem to be no new keywords in Java since 5.0 (which added enum).
Reserved in ECMAScript 5.1 but not in Java:
debugger
delete
function
in
typeof
var
with
export
let
yield
Reserved in ECMAScript 6 (Rev 36, Release Candidate 3) but not in Java:
all of the above
await
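For the code generation use case, the list above could be turned into a lookup that the Java generator consults before emitting an identifier. This is only a sketch: the class name, the method name and the choice of appending an underscore are all assumptions, and the word list is simply the one from this answer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class JsReservedWords {
    // Words from the answer above: reserved in ES5.1/ES6 but legal Java identifiers.
    static final Set<String> JS_ONLY_RESERVED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "debugger", "delete", "function", "in", "typeof",
                    "var", "with", "export", "let", "yield", "await")));

    // Hypothetical escaping rule: append "_" when the name would clash in JavaScript.
    static String safeIdentifier(String name) {
        return JS_ONLY_RESERVED.contains(name) ? name + "_" : name;
    }

    public static void main(String[] args) {
        System.out.println(safeIdentifier("delete"));
        System.out.println(safeIdentifier("count"));
    }
}
```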

Internal character encoding of Java 7

As far as I know, when the JRE executes a Java application,
strings are represented internally as UCS-2 byte arrays.
On Wikipedia, the following can be found:
Java originally used UCS-2, and added UTF-16 supplementary character support in J2SE 5.0.
With the new release of Java (Java 7),
what is its internal character encoding?
Is there any possibility that Java will start to use UCS-4 internally?

Java 7 still uses UTF-16 internally (read the last section of the Charset Javadoc), and it's very unlikely that this will change to UCS-4. I'll give you two reasons for that:
Changing from UTF-16 to UCS-4 would most likely mean changing the char primitive from a 16-bit type to a 32-bit type. Given how highly Sun/Oracle have valued backwards compatibility in the past, a change like this is very unlikely.
A UCS-4 string takes a lot more memory than a UTF-16 encoded string for most use cases.

Q: So far as I know, when the JRE executes a Java application, the string will be seen as a (16-bit Unicode) byte array.
A: Yes.
Q: With the new release version of Java (Java 7), what is its internal character encoding?
A: Same.
Q: Is there any possibility that Java will start to use UCS-4 internally?
A: I haven't heard anything of the kind.
However, you can use code points to work with UTF-32 characters in Java 5 and higher:
http://www.ibm.com/developerworks/java/library/j-unicode/
http://jcp.org/en/jsr/detail?id=204
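As a small illustration of the code point API mentioned above, the following sketch builds a string from a supplementary character (U+1D11E, MUSICAL SYMBOL G CLEF, chosen arbitrarily) and shows that it occupies two char values but counts as one code point. The class and helper names are made up; the String and Character methods used have been available since Java 5.

```java
public class CodePointDemo {
    // Number of Unicode code points, as opposed to UTF-16 units.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // outside the 16-bit range
        String s = new String(Character.toChars(clef));

        System.out.println("char units:  " + s.length());
        System.out.println("code points: " + countCodePoints(s));
        System.out.println("first code point: U+"
                + Integer.toHexString(s.codePointAt(0)).toUpperCase());

        // Iterate by code point rather than by char.
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            System.out.println("at index " + i + ": " + s.codePointAt(i));
        }
    }
}
```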

OSGI Valid Version Ranges

Currently I'm trying to implement OSGi version ranges (for a different topic, but I like the way they define version ranges). However, I'm having a hard time finding the specific definition of a version range in OSGi.
Unfortunately, the OSGi API does contain a Version class but not a VersionRange class. It seems like all OSGi containers come up with their own interpretation of the (somewhat unfindable) version range definition.
Therefore I have several questions:
If I used versionRange=1.4.0, would this map to Version >= 1.4.0?
Is this a valid version range: versionRange=[1.0.0,0]: I'd say yes (version 1.0.0 up to any version), Eclipse implementation accepts it as a version but does not handle it correctly.
Would this be a valid versionRange as well: versionRange=[1.0.0,0)?
Where is the actual source of truth for all those questions? I seem to be unable to find it.

So, to answer your concrete questions in order:
If I used versionRange=1.4.0, would this map to Version >= 1.4.0?
Yes. This is exactly the way spec says it should be interpreted (see below).
Is this a valid version range: versionRange=[1.0.0,0]
Yes, it is a valid range, but it will not evaluate to what you seem to be expecting.
It effectively evaluates to an empty set of versions, so no version will match this expression.
Would this be a valid versionRange as well: versionRange=[1.0.0,0)?
Same as above -- it is a valid version range, but it will evaluate to an empty set.
Where is the actual source of truth for all those questions? I seem to be unable to find it
The specs are available on OSGi Alliance's home page from:
http://www.osgi.org/Release4/Download (for R4 specs)
Below is an excerpt from the OSGi R4 core specification that covers the version ranges:
Version Ranges
A version range describes a range of versions using a mathematical interval notation. See [31] Mathematical Convention for Interval Notation.
The syntax of a version range is:
version-range ::= interval | atleast
interval ::= ( '[' | '(' ) floor ',' ceiling ( ']' | ')' )
atleast ::= version
floor ::= version
ceiling ::= version
If a version range is specified as a single version, it must be interpreted as the range [version,∞). The default for a non-specified version range is 0, which maps to [0.0.0,∞).
Note that the use of a comma in the version range requires it to be enclosed in double quotes. For example:
Import-Package: com.acme.foo;version="[1.23, 2)",
com.acme.bar;version="[4.0, 5.0)"
In the following table, for each specified range in the left-hand column, a version x is considered to be a member of the range if the predicate in the right-hand column is true.
[1.2.3, 4.5.6) | 1.2.3 <= x < 4.5.6
[1.2.3, 4.5.6] | 1.2.3 <= x <= 4.5.6
(1.2.3, 4.5.6) | 1.2.3 < x < 4.5.6
(1.2.3, 4.5.6] | 1.2.3 < x <= 4.5.6
1.2.3 | 1.2.3 <= x
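The predicates in the table can be sketched in a few lines of Java. This is an illustration of the rules quoted above, not the OSGi API: the class and method names are made up, versions are reduced to numeric major.minor.micro (any qualifier is ignored, missing parts count as 0), and no input validation is done.

```java
public class VersionRangeSketch {
    // Parses "1.2.3", "1.2" or "1" into {major, minor, micro}.
    static int[] parseVersion(String s) {
        String[] parts = s.trim().split("\\.");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            v[i] = Integer.parseInt(parts[i]);
        }
        return v;
    }

    // Negative, zero or positive, like Comparable.compareTo.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    // True if version x satisfies the predicate for the given range string.
    static boolean includes(String range, String version) {
        String r = range.trim();
        int[] x = parseVersion(version);
        char first = r.charAt(0);
        if (first != '[' && first != '(') {
            // "atleast" form: a single version means [version, infinity).
            return compare(x, parseVersion(r)) >= 0;
        }
        char last = r.charAt(r.length() - 1);
        String[] bounds = r.substring(1, r.length() - 1).split(",");
        int vsFloor = compare(x, parseVersion(bounds[0]));
        int vsCeiling = compare(x, parseVersion(bounds[1]));
        boolean floorOk = (first == '[') ? vsFloor >= 0 : vsFloor > 0;
        boolean ceilingOk = (last == ']') ? vsCeiling <= 0 : vsCeiling < 0;
        return floorOk && ceilingOk;
    }

    public static void main(String[] args) {
        System.out.println(includes("[1.2.3, 4.5.6)", "1.2.3"));
        System.out.println(includes("1.4.0", "2.0.0"));
        // Floor above ceiling: parses fine, but no version can match.
        System.out.println(includes("[1.0.0,0]", "1.0.0"));
    }
}
```

With these rules, [1.0.0,0] needs no special handling: it parses like any other interval and simply never matches, which is the empty-set behaviour described above.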

Version ranges are precisely defined in section 3.2.6 of the OSGi Core Specification. You're correct that there is no VersionRange class in the current API, though there will be in the next specification release.
OSGi framework implementations do not come up with their own interpretation of ranges; if you find a case where a specific framework interprets a range differently from section 3.2.6 of the Core Spec then please raise a bug against that framework.
To address your specific questions:
Yes, version=1.4.0 on an Import-Package (or bundle-version=1.4.0 on a Require-Bundle) does map informally to "version >= 1.4.0".
I believe that both of these version ranges are valid, BUT they will never match any version. E.g. the first example will match only a version x where x >= 1.0.0 and x <= 0. There is no value of x that can satisfy both of these requirements. So it sounds like Eclipse is behaving correctly: it should successfully parse the range string but never return any results.
As already mentioned, the "source for truth" is section 3.2.6 of the OSGi Core Specification.... page 29 if you are reading the R4.3 version of the document.

1) versionRange=1.4.0 is equivalent to [1.4.0, infinity)
2) I'd say it isn't valid, since the floor should be lower than the ceiling.
3) The next OSGi spec will define a VersionRange class, I believe.

See RFC 175 in http://www.osgi.org/Download/File?url=/download/osgi-early-draft-2011-09.pdf. It defines an update to the version definition and also introduces a VersionRange class.
Version ranges can be empty such as your example in the second bullet. An empty version range includes no versions.
