I have to apply toUpperCase on a name that may contain accents ("é", "à", etc.).
Problem:
with JUnit, "é".toUpperCase converts to "E", the accent is removed
in my application (a Spring REST API), "é".toUpperCase converts to "É". The input comes from an Ember frontend, but the encoding is the same (UTF-8)
JUnit tests and Spring application use the same characters set (UTF-8) and the locale is French. Both running on Oracle Java 8, on the same machine (Jenkins CI on Debian, but I can reproduce this behavior on my computer: Windows 7).
I tried to specify the locale toUpperCase(Locale.FRANCE), but it doesn't solve my problem.
Are you aware of something that may explain this difference?
As discussed in the conversation with @JonathanLermitage, this is not a Java problem; it is related to the embedded database (H2) used in the unit tests, which is not correctly configured.
I'm using Java 8, no particular configuration.
@Test
public void test()
{
    String a = "àòùìèé";
    String b = a.toUpperCase();
    System.out.println(b);
    System.out.println(Locale.getDefault());
    assertEquals("ÀÒÙÌÈÉ", b);
}
Returns
ÀÒÙÌÈÉ
en_US
I had the same problem once and it was fixed for me by setting the default Locale:
Locale.setDefault(new Locale("fr", "FR"));
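For example, a minimal check (toUpperCase() without arguments uses the default locale, so this only affects calls made after the default has been set; Locale.FRANCE is equivalent to new Locale("fr", "FR")):
Locale.setDefault(Locale.FRANCE);
System.out.println("àòùìèé".toUpperCase()); // expected: ÀÒÙÌÈÉ
System.out.println(Locale.getDefault());    // fr_FR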
I have a simple Java program that, when run, is supposed to traverse the whole directory on a Unix server and log all files on the fileserver that contain the Norwegian letters "å, ø, æ".
This is how it looks on the fileserver using WinSCP:
In the end the logs.log file should look like this:
2022-10-25 14:27:02 INFO Logger:99 - File: 'DN_Oppmålings.pdf'
2022-10-25 14:27:02 INFO Logger:99 - File: 'Salg_av_gærden.pdf'
However, this is how it ends up in the log file: all Norwegian letters are represented by a square.
I can't seem to figure out why this happens. It probably has something to do with the encodings, because when I run it locally on Windows, everything runs as expected and I get the result I need. But when I build the project as an executable jar and run it on the server, the output is wrong.
Here is the code I am using.
public static void renameFiles3(File[] files) throws IOException {
    for (File filename : files) {
        if (filename.isDirectory()) {
            renameFiles3(filename.listFiles());
        } else {
            String fileNameString = filename.getName();
            if (fileNameString.contains("å") || fileNameString.contains("ø") || fileNameString.contains("æ")) {
                logger.info("File: '" + filename.getName() + "'");
            }
        }
    }
}
public static void main(String[] args) {
    File[] files = new File(path).listFiles();
    try {
        renamer.renameFiles3(files);
    } catch (IOException ex) {
        logger.error(ex.toString());
    }
}
Someone pointed out that the encoding should be specified, but I am not sure how that is done. If I run "locale" command on the Unix server this is what I get as output.
[e1111111#ilt repository]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
I use PuTTY to run the jar file. Here are the configs.
Stacktrace of the error I get when running the jar:
java.nio.file.NoSuchFileException: ./documentRepository/DN_Oppm�lings.pdf
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:430)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at com.example.fixfilenamesonfileserver.Renamer.renameFiles2(Renamer.java:105)
at com.example.fixfilenamesonfileserver.Renamer.renameFiles2(Renamer.java:89)
at com.example.fixfilenamesonfileserver.Renamer.renameFiles2(Renamer.java:89)
at com.example.fixfilenamesonfileserver.Renamer.renameFiles2(Renamer.java:89)
at com.example.fixfilenamesonfileserver.Renamer.renameFiles2(Renamer.java:89)
at com.example.fixfilenamesonfileserver.Renamer.renameFiles2(Renamer.java:89)
at com.example.fixfilenamesonfileserver.Renamer.renameFiles2(Renamer.java:89)
at com.example.fixfilenamesonfileserver.Renamer.main(Renamer.java:154)
What makes it even stranger is that if I create, for instance, a folder with mkdir containing Norwegian letters in the name, it is displayed correctly, and a file with Norwegian letters that I create myself is also logged correctly.
Some time ago I wrote an answer for a very similar problem.
As stated in the aforementioned solution, the problem could be related to the use of different charsets on your local Windows laptop (probably cp-1252 or some variant) and on your server.
As suggested, please consider reviewing the charset in place in the JVM in every environment, and review and adapt the value of the file.encoding system property on your laptop and in the server environment if necessary; it may help you solve the problem.
Running your jar with a proper value for the file.encoding JVM property will probably make the application work properly:
java -Dfile.encoding=UTF-8 -jar your_app.jar
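If it helps, here is a minimal sketch (the class name is just for illustration) that prints the charset and locale each JVM is actually using, so the values can be compared between the Windows laptop and the server, with and without -Dfile.encoding=UTF-8:
import java.nio.charset.Charset;
import java.util.Locale;

public class CharsetCheck {
    public static void main(String[] args) {
        // Default charset used for byte/char conversions in this JVM
        System.out.println("Default charset: " + Charset.defaultCharset());
        // The system property that -Dfile.encoding overrides
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // Default locale, for completeness
        System.out.println("Default locale:  " + Locale.getDefault());
    }
}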
I suspect there is no problem with your Java nor your file.
The problem is likely with the app you use to view that text. That app is using a font that lacks a glyph for those characters.
Edit your Question to note the app and OS if you want further assistance.
Assuming you are printing the letters to a terminal, the problem is most likely the terminal you use. If you are printing the characters to a terminal, make sure its code page is set with chcp 65001, and use a font that fully supports displaying Norwegian letters. I have encountered similar problems while trying to display multilingual text, due to the limited support for multiple languages in a single font.
So, to summarize: first set the terminal code-page encoding with chcp 65001, then change the font of the terminal to one that fully supports Norwegian letters, and then run the jar file from the terminal like: java -jar <jarname>.jar
In my Spring Boot application, when I ran Fortify to check for vulnerabilities, I got some issues related to Locale Dependent Comparison. Fortify flags a number of files where I need to fix the Locale Dependent Comparison issue.
I have 3 options:
Option 1: go through all those files and change them to use Locale.US, like below.
from
switch (strngvariable.toLowerCase())
to
switch (strngvariable.toLowerCase(Locale.US))
Option 2: during Spring Boot initialization, set the default locale:
public static void main(String[] args)
{
Locale.setDefault(new Locale("en","US"));
SpringApplication.run(OWDNApplication.class, args);
}
Option 3: in the first REST call, set the default locale:
@GetMapping("/applogin")
public ResponseEntity getSignonDetails(@RequestParam("uname") String uname)
{
    Locale.setDefault(new Locale("en", "US"));
    // ... existing sign-on logic and return value
}
I know the first option will work. I want to know whether options 2 and 3 will work, and which would be the best choice.
I could try this on my own by running a Fortify scan from my Eclipse, but unfortunately the Fortify plugin does not show me all the issues locally.
I would have to deploy onto the server and run the software there, which makes testing the changes time-consuming.
I am trying to parse a German number from a string. Here is a minimal example:
String stringValue = "1,00";
double value = NumberFormat.getInstance(Locale.GERMANY).parse(stringValue).doubleValue();
// value = 100.0 (local machine from cmd, server JDK)
// value = 1.0 (local machine from cmd, local JDK)
Somehow the format is permanently stuck at an English formatter. I debugged the line and had a look at symbols.decimalSeparator, and it is still a dot.
I must admit that I can only reproduce this behaviour if I run the application with a JDK from a server in our company. Running it with my local JDK, everything works fine and as expected. So maybe the default Locale on the server is English, but how can this override my hardcoded Locale.GERMANY?
We are using Java 8.
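For reference, a small diagnostic sketch (standard library only; the class name is just for illustration) that prints the default locales and the separator the German formatter reports, which may help show where the server JDK diverges:
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleDiagnostics {
    public static void main(String[] args) throws Exception {
        // What the JVM thinks the default locales are
        System.out.println("default locale:  " + Locale.getDefault());
        System.out.println("FORMAT category: " + Locale.getDefault(Locale.Category.FORMAT));

        // The separator the German symbols should report (expected: ',')
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.GERMANY);
        System.out.println("decimalSeparator: " + symbols.getDecimalSeparator());

        // The parse from the question (expected: 1.0)
        double value = NumberFormat.getInstance(Locale.GERMANY).parse("1,00").doubleValue();
        System.out.println("parsed: " + value);
    }
}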
I am developing a Java program in Eclipse using a proprietary API, and it throws the following exception at run-time:
java.io.UnsupportedEncodingException:
at java.lang.StringCoding.encode(StringCoding.java:287)
at java.lang.String.getBytes(String.java:954)...
my code:
private static String SERVER = "localhost";
private static int PORT = 80;
private static String DFT="";
private static String USER = "xx";
private static String pwd = "xx";
public static void main(String[] args) {
LLValue entInfo = new LLValue();
LLSession session = new LLSession(SERVER, PORT, DFT, USER, pwd);
try {
LAPI_DOCUMENTS doc = new LAPI_DOCUMENTS(session);
doc.AccessPersonalWS(entInfo);
} catch (Exception e) {
e.printStackTrace();
}
}
The session appears to open with no errors, but the encoding exception is thrown at doc.AccessEnterpriseWS(entInfo)
Through researching this error I have tried using the -encoding option of the compiler, changing the encoding of my editor, etc.
My questions are:
how can I find out the encoding of the .class files I am trying to use?
should I be matching the encoding of my new program to the encoding of the API?
If Java is machine independent, why isn't there a standard encoding?
I have read this stack trace and this guide already --
Any suggestions will be appreciated!
Cheers
Run it in your debugger with a breakpoint on String.getBytes() or StringCoding.encode(). Both classes are in the JDK, so you have access to them and should be able to see what the third party is passing in.
The character encoding is used to specify how to interpret the raw binary. The default encoding on English Windows systems is CP1252. Other languages and systems may use a different default encoding. As a quick test, you might try specifying UTF-8 to see if the problem magically disappears.
As noted in this question, the JVM uses the default encoding of the OS, although you can override this default.
Without knowing more about the third party API you are trying to use, it's hard to say what encoding they might be using. Unfortunately from looking at the implementation of StringCoding.encode() it appears there are a couple different ways you could get an UnsupportedEncodingException. Stepping through with a debugger should help narrow things down.
It looks to me as if something in the proprietary API is calling String.getBytes with an empty string for the character set.
I compiled the following class
public class Test2 {
public static void main(String[] args) throws Exception {
"test".getBytes("");
}
}
and when I ran it, I got the following stacktrace:
Exception in thread "main" java.io.UnsupportedEncodingException:
at java.lang.StringCoding.encode(StringCoding.java:286)
at java.lang.String.getBytes(String.java:954)
at Test2.main(Test2.java:3)
I would be surprised if this is anything to do with the encoding in which the class files are written. It looks to me like this is a problem with code, not a problem you can fix by changing file encodings or compiler/JVM switches.
I don't know anything about what this proprietary API is supposed to do or how it works. Perhaps it is expecting to be run inside a Java EE or web application container? Perhaps it has a bug? Perhaps it needs more configuration before it can run without throwing exceptions? Given that it's proprietary, can you get any support from the vendor?
I have a test that compares a large blob of expected XML with the actual XML received. If the XML is significantly different, the actual XML is written to disk for analysis and the test fails.
I would prefer to use assertEquals so that I can compare the XML more easily in Eclipse - but this could lead to very large JUnit and CruiseControl logs.
Is there a way I can change a JUnit test's behaviour depending on whether it's running through Eclipse or through Ant?
Here are 2 solutions.
Use system properties
boolean isEclipse() {
return System.getProperty("java.class.path").contains("eclipse");
}
Use stacktrace
boolean isEclipse() {
Throwable t = new Throwable();
StackTraceElement[] trace = t.getStackTrace();
return trace[trace.length - 1].getClassName().startsWith("org.eclipse");
}
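A usage sketch of how either check could gate the comparison (the helper names loadExpectedXml and buildActualXml are placeholders, not part of the original test):
@Test
public void xmlMatchesExpected() throws Exception {
    String expected = loadExpectedXml();   // placeholder for the test's own setup
    String actual = buildActualXml();      // placeholder
    if (isEclipse()) {
        // In Eclipse: assertEquals, so the IDE's comparison view can be used
        assertEquals(expected, actual);
    } else {
        // Under Ant/CruiseControl: keep the log small; the existing
        // write-to-disk-and-fail behaviour would go here instead
        assertTrue("actual XML differs from expected", expected.equals(actual));
    }
}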
Yes - you can test whether certain OSGi properties are set (System.getProperty("osgi.instance.area"), for instance). They will be empty if JUnit is started through Ant outside of Eclipse.
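A minimal sketch of that check (assuming the property is simply absent when the tests run outside Eclipse):
boolean isEclipse() {
    // osgi.instance.area is set by the Eclipse/OSGi runtime and should be
    // missing when JUnit is launched by Ant outside of Eclipse
    return System.getProperty("osgi.instance.area") != null;
}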
Maybe the "java.class.path" approach can be weak if you include some eclipse jar in the path.
An alternative approch could be to test "sun.java.command" instead:
On my machine (openjdk-8):
sun.java.command org.eclipse.jdt.internal.junit.runner.RemoteTestRunner ...
A possible test:
boolean isEclipse() {
return System.getProperty("sun.java.command")
.startsWith("org.eclipse.jdt.internal.junit.runner.RemoteTestRunner");
}
Usually, the system properties are different in different environments. Try to look for a system property which is only set by Eclipse or by Ant.
BTW: the output in Eclipse is the same; it's just that the Eclipse console renders the output in a more readable form.
Personally, I wouldn't worry about the size of the logs. Generally you don't need to keep them very long and disk space is cheap.
With Java 1.6+, it looks like the result of System.console() makes a difference between running from Eclipse or from a terminal:
boolean isRealTerminal()
{
// Java 1.6+
return System.console() != null;
}