Typing special characters (using Alt Gr) in NetBeans / Java

I'm using a Serbian Latin keyboard layout on CentOS 6.1. When I press Alt Gr + N I get }. This works everywhere except in NetBeans.
Also, I'm unable to type any of the brackets []{} or \|. Has anyone come across a solution to this?
Switching the keyboard layout for every brace or other symbol is not an option.

The solution was to install Sun/Oracle Java and reinstall NetBeans.

Actually, it is an X11/distro bug.
KDE and GTK apps use their own keyboard mechanism, so they don't show the problem,
but Java (and thus NetBeans) uses the X11 keyboard mechanism for input.
The problem is in how X11 handles your locale: if it is set up properly, it works; if not, it doesn't.
X11 doesn't have any "default" rule; if your locale isn't known to X11, you get nothing.
X11's locale support also isn't updated very often.
To allow proper AltGr/compose rules, X11 has to load a proper "Compose" file.
It loads it (or not) depending on the locale: in a /usr/share/X11/locale/compose.dir file (your path may vary) there are lines like:
en_US.UTF-8/Compose en_US.UTF-8
en_US.UTF-8/Compose sr_CS.UTF-8
en_US.UTF-8/Compose: en_US.UTF-8
en_US.UTF-8/Compose: sr_CS.UTF-8
etc.
(Yes, two lines per locale, with and without a colon; one is used by old programs, the other by new ones, but I don't remember which is which.)
There must be a line for the locale you use (shown by the "locale" command).
Note that if the system uses locale names like "en_US.utf8", there must be an alias
(in the locale.alias file), something like:
sr_CS.utf8 sr_CS.UTF-8
...
sr_CS.utf8: sr_CS.UTF-8
(again, duplicate with and without colon)
To solve your problem, you can either set LC_ALL=en_US.UTF-8 before launching Java programs, or edit the compose.dir (and locale.dir and/or locale.alias) files: copy the en_US.UTF-8 lines and adapt them to your locale. Editing requires root, and has to be redone after each X11 update.
You can also report this to your distro so they patch those .dir/.alias files to work properly for all locales the distro provides.
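To check whether your locale is covered before editing anything, the lookup described above can be sketched in Java. This is only a rough sketch: the class name ComposeDirCheck, the default path and the default locale are assumptions for illustration, and the parsing assumes the two-column layout shown in the lines above (Compose file, then locale name, with or without a colon).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class ComposeDirCheck {

    /**
     * Returns true if any non-comment line maps a Compose file to the given
     * locale, e.g. "en_US.UTF-8/Compose sr_CS.UTF-8" or the colon variant.
     */
    static boolean hasComposeEntry(List<String> lines, String locale) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] fields = trimmed.split("\\s+");
            // fields[0] is the Compose file (with or without a trailing colon),
            // fields[1] is the locale name it applies to
            if (fields.length >= 2 && fields[1].equals(locale)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions; pass your own path and locale as arguments.
        Path composeDir = Paths.get(args.length > 0 ? args[0] : "/usr/share/X11/locale/compose.dir");
        String locale = args.length > 1 ? args[1] : "sr_CS.UTF-8";
        // ISO-8859-1 never fails on unexpected bytes, and the entries themselves are ASCII
        List<String> lines = Files.readAllLines(composeDir, StandardCharsets.ISO_8859_1);
        System.out.println(locale + (hasComposeEntry(lines, locale) ? " has an" : " has no")
                + " entry in " + composeDir);
    }
}
```

If it reports no entry for the locale printed by the "locale" command, that would match the symptom described here.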

Related

finding out gracefully the version of makeindex

I wrote something like a make tool for LaTeX documentation (in fact a Maven plugin, https://github.com/Reissner/maven-latex-plugin).
Since it just invokes a bunch of command-line tools,
and I frequently have problems when new versions of those tools come out, I decided to provide a way to check the versions (in fact a goal).
E.g. I let my tool invoke pdflatex -v to get the version of pdflatex.
This works fine for all tools except makeindex,
because makeindex offers no regular way to get the version and nothing but that.
To find out by hand, I just type makeindex and then Ctrl-D in my bash shell.
The result is then
This is makeindex, version 2.15 [TeX Live 2020] (kpathsea + Thai support).
Scanning input file stdin...done (0 entries accepted, 0 rejected).
Nothing written in stdout.
Transcript written in stderr.
The keystroke Ctrl-D seems to be tied to Unicode 0004.
So from the bash shell, this works just fine.
Is there a way to do that when executing from Java?
I tried to invoke (as a Java string)
"makeindex \u0004" and related variants.
No success: makeindex just quits with an error.
To be honest, I don't know why, because from the terminal all is fine.
The last problem to solve: the procedure has to be platform independent.
For now I have tried a hack that provides a dummy file.
But of course this is the opposite of elegant.

Assuming you use TeX Live, you could use
tlmgr info --only-installed --data="localrev" makeindex
which returns something like
52851
This is the revision information from the makeindex package you have installed. This is not the version number, but should be enough to detect if the installed version changes.
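If you do want the banner itself, another option may be to reproduce what Ctrl-D does. In a terminal, Ctrl-D is not passed to the program as a character; the terminal turns it into end-of-file on standard input. That would also explain why "makeindex \u0004" fails: there the character is handed over as a command-line argument. A rough Java sketch of the idea follows. The class and method names are made up for illustration, and the regular expression only assumes the banner format quoted in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MakeindexVersion {

    // Assumes a banner like "This is makeindex, version 2.15 [...]".
    private static final Pattern VERSION = Pattern.compile("version\\s+(\\d+(?:\\.\\d+)*)");

    /** Extracts e.g. "2.15" from the banner text, or returns null if nothing matches. */
    static String parseVersion(String banner) {
        Matcher m = VERSION.matcher(banner);
        return m.find() ? m.group(1) : null;
    }

    /** Starts the command, closes its stdin right away, and returns everything it printed. */
    static String runWithClosedStdin(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout, in case the banner goes to stderr
        Process p = pb.start();
        p.getOutputStream().close(); // end-of-file on the child's stdin, like Ctrl-D in a terminal
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Requires makeindex on the PATH.
        System.out.println(parseVersion(runWithClosedStdin("makeindex")));
    }
}
```

Since this uses only ProcessBuilder and no shell, it should not depend on the platform, as long as makeindex is on the PATH.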

Encoding issue on filename with Java 7 on OSX with jnlp/webstart

This problem has been dropped on me, and I have spent a couple of days on unsuccessful searches and workaround attempts.
I have an internal Java Swing program, distributed via JNLP/Web Start to OS X and Windows computers, that among other things downloads some files from WebDAV.
Recently, on a test machine with OS X 10.8 and Java 7, accented characters in file names and directory names started being replaced by question marks.
There is no problem on OS X with versions of Java before 7.
Example:
XXXYYY_è_ABCD/
becomes
XXXYYY_?_ABCD/
Using java.text.Normalizer (NFD, NFC, NFKD, NFKC) on the original string, the result is different but still wrong:
XXXYYY_e?_ABCD/
or
XXXYYY_e_ABCD/
I know, from correspondence between [andrew.brygin at oracle.com] and [mik3hall at gmail.com] that
Yes, file.encoding is set based on the locale that the jvm is running
on, and if you run your java vm in xxxx.UTF-8 locale, the
file.encoding should be UTF-8, set to MacRoman will be problematic.
So I believe Oracle/OpenJDK7 behaves correctly. That said, as Andrew
Thompson pointed out, if all previous Apple JDK releases use MacRoman
as the file.encoding for english/UTF-8 locale, there is a
"compatibility" concern here, it might worth putting something in the
release note to give Oracle/OpenJDK MacOS user a heads up.
original mail
From Joni Salonen's blog (java-and-file-names-with-invalid-characters) I know that:
You probably know that Java uses a “default character encoding” to
convert binary data to Strings. To read or write text using another
encoding you can use an InputStreamReader or OutputStreamWriter. But
for data-to-text conversions deep in the API you have no choice but to
change the default encoding.
and
What about file.encoding?
The file.encoding system property can also be used to set the default
character encoding that Java uses for I/O. Unfortunately it seems to
have no effect on how file names are decoded into Strings.
Executing locale from inside the JNLP invariably prints
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
The most similar problem on Stack Overflow with a solution is this:
encoding-issues-on-java-7-file-names-in-os-x
but the solution is to wrap the execution of the Java program in a script with
#!/bin/bash
export LC_CTYPE="UTF-8" # Try other options if this doesn't work
exec java your.program.Here
but I don't think this option is available to me because of Web Start, and I haven't found any way to set the LC_CTYPE environment variable from within the program.
Any solutions or workarounds?
P.S.:
If we run the program directly from a shell, it writes the file/directory names correctly even on OS X 10 + Java 7.
The problem appears only with the combination of JNLP + OS X + Java 7.

I take it that it's acceptable to have a maximally ASCII representation of the file name, which works in virtually any encoding.
First, you want to use NFKD specifically, so that maximum information is retained in the ASCII form. For example, "2⁵" becomes "25" rather than just "2", and "ﬁ" becomes "fi" rather than "", once the non-ASCII and control characters are filtered out.
import java.text.Normalizer;

String str = "XXXYYY_è_ABCD/";
// NFKD splits "è" into "e" plus a combining grave accent (U+0300)
str = Normalizer.normalize(str, Normalizer.Form.NFKD);
// Drop everything outside printable ASCII, including the combining accent
str = str.replaceAll("[^\\x20-\\x7E]", "");
// str is now "XXXYYY_e_ABCD/" no matter what the system encoding is
You would then always pass file names through this filter to get their file system name. The only thing you lose is some uniqueness, i.e. the file asdé.txt is the same as asde.txt, and in this system they cannot be differentiated.

EDIT: After experimenting with OS X some more I realized my answer was totally wrong, so I'm redoing it.
If your JVM supports -Dfile.encoding=UTF-8 on the JVM command line, that might fix the issue. I believe that is a standard property but I'm not certain about that.
HFS Plus, like other POSIX-compliant file systems, stores filenames as bytes. But unlike Linux's ext3 filesystem, it forces filenames to be valid decomposed UTF-8. This can be seen here with the Python interpreter on my OS X system, starting in an empty directory.
$ python
Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
>>> import os
>>> os.mkdir('\xc3\xa8')
>>> os.mkdir('e\xcc\x80')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 17] File exists: 'e\xcc\x80'
>>> os.mkdir('\x8f')
>>> os.listdir('.')
['%8F', 'e\xcc\x80']
>>> ^D
$ ls
%8F è
This proves that the directory name on your filesystem cannot be Mac-Roman encoded (i.e. with byte value 8F where the è is seen), as long as it's an HFS Plus filesystem. But of course, the JVM is not assured of an HFS Plus filesystem, and SMB and NFS do not have the same encoding guarantees, so the JVM should not assume this scheme.
Therefore, you have to convince the JVM to interpret file and directory names with UTF-8 encoding, in order to read the names as java.lang.String objects correctly.
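To see what the JVM actually picked up in a given launch mode (shell versus Web Start, for example), a small diagnostic along these lines may help. It is only a sketch: the class name EncodingReport is made up, and sun.jnu.encoding is an internal, undocumented property of OpenJDK-based JVMs that, as far as I know, relates to how platform strings such as file names are converted; it may be absent elsewhere.

```java
import java.nio.charset.Charset;

class EncodingReport {

    /** Collects the settings that commonly affect text and file name handling. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("default charset  = ").append(Charset.defaultCharset()).append('\n');
        sb.append("file.encoding    = ").append(System.getProperty("file.encoding")).append('\n');
        // Internal, implementation-specific property; prints "null" if the JVM does not define it
        sb.append("sun.jnu.encoding = ").append(System.getProperty("sun.jnu.encoding")).append('\n');
        // Locale-related environment variables as seen by this JVM process
        for (String name : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            sb.append(name).append(" = ").append(System.getenv(name)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Comparing the output of a launch from the shell with one from Web Start should show whether the two environments really differ in these values.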

Shot in the dark: the file encoding does not influence how file names are created, only how the content gets written into the file. See this post: http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters/
Here is a short entry from Apple: http://developer.apple.com/library/mac/#qa/qa1173/_index.html
Comparing this to http://docs.oracle.com/javase/tutorial/i18n/text/normalizerapi.html I would assume you want to use
normalized_string = Normalizer.normalize(target_chars, Normalizer.Form.NFD);
to normalize the file names before you pass them to the File constructor. Does this help?
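For reference, here is a small sketch of what the two canonical forms do to the "è" from the question. The class and helper names are made up; only java.text.Normalizer is the real API.

```java
import java.text.Normalizer;

class NormalizationForms {

    /** Composed form: base letter and accent merged into one code point where possible. */
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Decomposed form: base letter followed by separate combining marks. */
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = "\u00E8";        // "è" as a single code point
        String decomposed = nfd(composed); // "e" followed by U+0300 (combining grave accent)
        System.out.println(composed.length() + " " + decomposed.length()); // prints "1 2"
        System.out.println(composed.equals(decomposed));                   // prints "false"
        System.out.println(composed.equals(nfc(decomposed)));              // prints "true"
    }
}
```

The two strings look identical on screen but are different char sequences, which is why comparing a name read from the file system with one typed in source code can fail unless both are normalized to the same form first.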

I don't think there is a real solution to this problem right now.
In the meantime I came to the conclusion that the "C" locale variables printed from inside the program come from the Java Web Start sandbox, and (by design, apparently) you can't influence them through the JNLP file.
The accepted (as in accepted by the company) workaround/compromise was to launch the JNLP using javaws from a bash script.
Apparently, launching the JNLP from a browser or from Finder creates a new sandbox environment in which LANG is not set (so it falls back to "C", which amounts to ASCII).
Launching the JNLP from the command line instead picks up the right LANG from the system default, inheriting it from the shell.
This at least preserves the auto-update feature of the JNLP and its dependencies.
Anyway, we sent a bug report to Oracle, but personally I don't expect it to be resolved anytime soon, if ever.

It's a bug in the old-school java.io.File API, maybe just on a Mac? Anyway, the new java.nio API works much better. I had several files with Unicode characters in their names and content that failed to load using java.io.File and related classes. After converting all my code to use java.nio.file.Path, EVERYTHING started working. I also replaced org.apache.commons.io.FileUtils (which has the same problem) with java.nio.file.Files...
...and be sure to read and write the content of the file using an appropriate charset, for example:
Files.readAllLines(myPath, StandardCharsets.UTF_8)
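A slightly fuller sketch of the same idea, writing and reading with an explicit charset on both sides. The class name, method name and file name are made up for illustration. The file name is kept ASCII on purpose, because whether a non-ASCII name can be created still depends on the platform encoding the JVM was started with.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class NioRoundTrip {

    /** Writes the lines as UTF-8 into dir/fileName and reads them back as UTF-8. */
    static List<String> roundTrip(Path dir, String fileName, List<String> lines) throws IOException {
        Path file = dir.resolve(fileName);
        Files.write(file, lines, StandardCharsets.UTF_8);        // explicit charset on write
        return Files.readAllLines(file, StandardCharsets.UTF_8); // and the same one on read
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nio-demo");
        // Non-ASCII content, written with escapes so the source file stays ASCII
        List<String> lines = Arrays.asList("XXXYYY_\u00E8_ABCD", "caf\u00E9");
        System.out.println(roundTrip(dir, "names.txt", lines).equals(lines)); // prints "true"
    }
}
```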

JTable won't display Unicode correctly when the application is executed from the command line or a jar file. It works fine in eclipse, though

I'm writing an application that reads a text file containing a list of vocabulary words in both English and Chinese. These are then displayed in a JTable. When I run or debug the app in Eclipse, everything displays fine. I can see and read the characters and the English. However, when I execute the app from the command line or from an executable jar, it's all wrong. The characters show up as either squares or as gibberish.
I also have a text box that when I type Chinese into it, it displays correctly.
My first thought was that it was a font problem. I was using a font installed on my system. Since I can't guarantee that the person using this app will have that font, I moved it to a resource folder and load the font from a file. The font appears as though it's been loaded so I'm convinced it's not a font issue.
I found another question that suggested using -Dfile.encoding=utf-8. I've tried this and it did not work.
Would the brilliant folks at Stack Overflow have any advice on how to make this work?

I'm writing this on a non-chinese version of Windows.
Well then you won't ever be able to get a Java program to produce Chinese command-line output.
Java, like almost all languages, uses the C standard library which has byte-based IO. The Windows command prompt interprets byte-based IO using the locale-specific default code page. That's never a UTF, so Unicode characters outside of the current locale's default code page just won't work.
(In theory you should be able to get it to work by changing your console fonts and using chcp 65001 (UTF-8) together with -Dfile.encoding=UTF-8, but in practice it doesn't work reliably due to bugs in the C runtime. Unicode on the command prompt is a long-standing sore point.)
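For what it's worth, the Java side of that experiment can be sketched without touching file.encoding, by wrapping the output stream in a PrintStream with an explicit encoding. The class and method names below are made up, and this only controls which bytes Java emits; whether the command prompt then renders them still depends on its code page and font, as described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

    /** Wraps a byte stream so that text printed to it is always encoded as UTF-8. */
    static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Two Chinese characters, written with escapes so the source file stays ASCII.
        // Whether the console shows them correctly is outside Java's control.
        utf8(System.out).println("\u4E2D\u6587");

        // The emitted bytes can be checked independently of any console.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        utf8(buffer).print("\u4E2D");
        System.out.println(buffer.size()); // prints "3": U+4E2D is three bytes in UTF-8
    }
}
```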

English characters don't show up when entering text with Indic input method in Swing

I'm working on an application that accepts text in English and performs transliteration with a custom 3rd party API into an Indic language (one of several that are supported). The application is targeted at Windows XP/7 and Ubuntu.
We use a custom input method that loads the required Indic font and uses it to render text. Also, the user can correct the transliterated text by typing in English and pressing space (similar to how Google Transliterate works).
The problem is that with certain Indic fonts, typing in English shows empty box characters (even though the actual characters typed are detected and transliterated accordingly).
I have used the ttf-indic-fonts-0.5.0 font pack that comes with Ubuntu, and was able to substitute some of them. For the others, if I copy the corresponding font from Windows (I'm developing this on Windows 7) then all problems are solved.
However, we cannot redistribute Windows fonts with our application and want an open alternative.
Other than trying to find a substitute font, is there anything else that we're doing wrong, or need to check, to make sure that English characters can be typed in a JTextField when an Indic font is being used?
As an example - from the Ubuntu package mentioned above, the lohit_pa.ttf font for Punjabi has this problem. If I copy the default Windows font for Punjabi (raavi.ttf) then it works fine.

I found a solution to the problem. I installed the fonts on Windows (open the font, click install) and browsed the contents using Character Map. These fonts only have glyphs for the language in question, not English.
I used FontForge to merge the 2 fonts. (Fontforge's author doesn't provide binaries anymore - but if you have access to Ubuntu you can install it from the repository. No such luck if you only have Windows).
After this, I'm able to type in English again.

hebrew appears as question marks in netbeans

I am using NetBeans 6.1 on 2 computers.
On one of them the program:
public static void main(String argv[])
{
System.out.println("שלום");
}
prints normally, and on the other it prints question marks.
What can be the difference between the 2 environments?
Edit:
On both computers
Control Panel \ Regional and Language Options \ Advanced
is set to Hebrew.
edit:
Thank you, Michael Burr,
but the value of the encoding is already UTF-8.
Maybe this is something with the JVM?
Edit:
I have installed Eclipse and the problem occurs there as well.
I also tried reading the Hebrew from a file, with the same result.
edit:
System.getProperty("file.encoding");
returns "Cp1252"
I tried
System.setProperty("file.encoding","UTF-8")
but the question marks remain.
Thanks,
Ido

Make sure that NetBeans is set up with an encoding that supports Hebrew characters. From the NetBeans Wiki:
To change the language encoding for a project:
Right-click a project node in the Projects windows and choose Properties.
Under Sources, select an encoding value from the Encoding drop-down field.

You can't set the "file.encoding" property with System.setProperty(); it has to be set on the command line when the JVM is started with -Dfile.encoding=UTF-8. The value of this property is read during JVM initialization and cached. By the time your main method is invoked, the value has been cached and changes to the property are ignored.
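A quick way to observe this is to compare Charset.defaultCharset() before and after changing the property at run time. The sketch below is an illustration only (the class and method names are made up); on the JDKs I know of, the default charset stays unchanged.

```java
import java.nio.charset.Charset;

class FileEncodingProbe {

    /** Returns true if changing file.encoding at run time left the default charset as it was. */
    static boolean defaultCharsetSurvivesSetProperty(String newEncoding) {
        Charset before = Charset.defaultCharset();
        String saved = System.getProperty("file.encoding");
        try {
            System.setProperty("file.encoding", newEncoding);
            return Charset.defaultCharset().equals(before);
        } finally {
            // Put the property back so the probe has no lasting side effect
            if (saved != null) {
                System.setProperty("file.encoding", saved);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        // Pick an encoding that differs from the current default
        String other = Charset.defaultCharset().name().equals("UTF-8") ? "ISO-8859-1" : "UTF-8";
        System.out.println(defaultCharsetSurvivesSetProperty(other));
    }
}
```

If this prints true, the only ways left are the -Dfile.encoding option at JVM startup, or passing an explicit charset to the reader, writer or PrintStream in question.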

Is Hebrew installed by default? Could be that a language pack isn't installed?
Control Panel > Regional and Language Options > Languages. Select the 'Install files for complex script and right-to-left languages (including Thai)' option. This will install support for Hebrew. You'll probably need an OS disc.

How exactly are you running the program? Where does it print its output? It could be as simple as netbeans or the console using different fonts, one of which does not include Hebrew characters.
To eliminate encoding problems during compilation, try replacing the Hebrew characters with their unicode escape sequences and see if the result is different.
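To produce those escape sequences without looking up each character, a small helper along these lines can be used. The class and method names are made up for illustration.

```java
class UnicodeEscapes {

    /** Replaces every non-ASCII UTF-16 code unit with its backslash-u escape. */
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c); // plain ASCII is copied unchanged
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Hebrew word from the question, written with escapes so that
        // this source file itself is pure ASCII
        String hebrew = "\u05E9\u05DC\u05D5\u05DD";
        System.out.println(toUnicodeEscapes(hebrew));
        // If question marks still appear for this line, the source file encoding
        // is not the cause; the output side (console encoding or font) is.
        System.out.println(hebrew);
    }
}
```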

I think I misunderstood your problem (I thought that the characters were not being displayed properly in the NetBeans editor). The exact steps to solve your problem might depend on the version of the OS you're running on. Win2K, WinXP, and Vista all have slightly different dialogs and wording, unfortunately.
Take a look at this help page for the JVM:
http://java.com/en/download/help/locale.xml
It sounds like you've already configured the system like it should be, but the devil is in the details - there are several different 'locale' settings on a system that might affect this (and for all I know the JVM might throw in one or two on its own).
http://www.siao2.com/2005/02/01/364707.aspx

Usually it's the default encoding on:
Control Panel \ Regional and Language Options \ Advanced
(Select Hebrew on the combo)
You'll have to restart after changing this setting.

What helped me is this (on Win7):

None of the answers above worked for me.
I spent about an hour before I figured out that the problem lies not in the String encoding but in the default encoding that the IDE uses from start-up.
So, to get Hebrew, Arabic, Russian etc. symbols in the NetBeans console output, you need to modify netbeans.conf.
Search for the key netbeans_default_options and add -J-Dfile.encoding=UTF-8 inside the quotes.
