Extracting Unicode text from MySQL to Java

I have a table 'test' in my database with a column 'msg' that stores strings in various languages like English, Hindi, Telugu, etc.
These strings are displayed properly in the database.
But if I extract them in my Java code and print them to the console using System.out.println, it shows ????????? instead of the actual string.
Is this because the font used in the Eclipse console does not support these scripts? If so, how can I change the font to something else?

You cannot print UTF-8 characters to the console with its default settings. Make sure the console you use to display the output is also set to UTF-8.
In Eclipse, for example, go to Run Configuration > Common > Encoding and select UTF-8 from the dropdown.
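As a quick check, you can also wrap System.out in a PrintStream that encodes explicitly in UTF-8. This is only a minimal sketch: the class name and the Telugu sample string are just illustrative, and whether the characters actually render still depends on the console's own encoding and font.
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Wrap System.out so this stream encodes its output as UTF-8,
        // regardless of the platform default encoding
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("నమస్కారం"); // Telugu sample text
    }
}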

Try adding the characterEncoding=UTF-8 parameter to the end of your JDBC connection URL, and set the table and column character set to UTF-8 as well. This article explains how to do that.
Also change the Eclipse console output encoding under Run Configuration > Common > Encoding.
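For reference, a minimal sketch of such a connection URL, assuming MySQL Connector/J with placeholder host, database name and credentials; the class name is illustrative, while the table 'test' and column 'msg' are the ones from the question.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReadMessages {
    public static void main(String[] args) throws Exception {
        // Placeholder host, database name, and credentials; adjust to your setup
        String url = "jdbc:mysql://localhost:3306/testdb"
                + "?useUnicode=true&characterEncoding=UTF-8";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement("SELECT msg FROM test");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString("msg"));
            }
        }
    }
}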

Related

Select Japanese characters from SQLite database

I created a database from the EDICT files with Java, using SQLite.
SQLite by default encodes strings in UTF-8.
Here is a sample of the database: sample
If I do
SELECT * FROM entry
in Java, I get the Japanese words in their "correct" form (graphical representation at least).
But if I try to do
SELECT * FROM entry WHERE wordJP LIKE '食べる'
I get nothing, which makes it very hard to find the definition of a word.
Can someone explain why this is occurring, and how to solve it?
I understand that it is an encoding problem, but I don't understand where it happens and why.
So I managed to solve this:
Using iconv on Linux to convert the file from EUC-JP to UTF-8 (a Java version of this conversion is sketched below)
Setting SQLite to UTF-8
Java strings are natively Unicode, but Eclipse defaults the source file encoding to some ISO-xxx encoding, so you need to change that by right-clicking on your project > Properties > Text file encoding > Other (scroll the list)
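A minimal sketch of that first conversion step done in Java instead of iconv, assuming the JDK ships the EUC-JP charset (standard JDKs do); the input and output file names are placeholders.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EdictToUtf8 {
    public static void main(String[] args) throws IOException {
        // Read the EDICT file as EUC-JP and write it back out as UTF-8
        try (BufferedReader in = Files.newBufferedReader(
                     Paths.get("edict"), Charset.forName("EUC-JP"));
             BufferedWriter out = Files.newBufferedWriter(
                     Paths.get("edict-utf8.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }
}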
From your link,
[EDICT] is a plain text document in EUC-JP coding.
If your query strings are encoded in UTF-8, matching will fail.
You should probably convert the data to UTF-8 when you fill in your SQLite database.
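Once the stored data really is UTF-8, the lookup from Java is just a bound parameter. A minimal sketch, assuming the xerial sqlite-jdbc driver and a placeholder database file name; the table and column names are the ones from the question.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LookupWord {
    public static void main(String[] args) throws Exception {
        // Placeholder path to the SQLite file; adjust to your setup
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:edict.db");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM entry WHERE wordJP LIKE ?")) {
            ps.setString(1, "食べる"); // bound as a Java String, i.e. Unicode
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("wordJP"));
                }
            }
        }
    }
}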

I am trying to read in .properties files with Chinese characters encoded in utf-8

I am trying to read .properties files containing Chinese characters. When I read them using keys, the values print like ??????. I am writing a JSF application that needs to be translated into Chinese. On the UI, JSF shows all characters correctly, as it should. But in my Java code they show up like ?????. I do not know why. I also tried "手提電話" and "\u88dc\u7fd2\u500b\u6848" and printed them to the console from a main method, and they print correctly as Chinese characters. My properties file is encoded in UTF-8.
To clarify: I am able to display these characters on the console like this:
String str = "Алексей";
String str = "\u88dc\u7fd2\u500b\u6848";
System.out.println("direct output: " + str);
This works fine in a main method, but after reading from the properties file it shows ???.
Hope it is clear now.
Also, my database is receiving ?? in place of the actual Chinese characters.
Please help. If any other clarification is required, please let me know and I will update my question.
Here is my code for reading the properties file, which returns the bundle for the relevant locale:
FacesContext context = FacesContext.getCurrentInstance();
bundle = context.getApplication().getResourceBundle(context, "hardvalue");
After this, I call the following to access the value:
bundle.getString("tutorsearch.header"); // results in ?????
If you need any other code, please let me know.
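As a side check, the .properties file can be read with an explicit UTF-8 reader, independent of JSF's resource bundle handling. By default, java.util.Properties.load(InputStream) and pre-Java-9 PropertyResourceBundle assume ISO-8859-1 for .properties files, which is a common source of this kind of corruption. A minimal sketch; the file name is a placeholder based on the bundle name above.
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class CheckProperties {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder file name; Properties.load(Reader) honours the reader's charset,
        // unlike Properties.load(InputStream), which always assumes ISO-8859-1
        try (Reader reader = Files.newBufferedReader(
                Paths.get("hardvalue_zh.properties"), StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        System.out.println(props.getProperty("tutorsearch.header"));
    }
}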
Multiple question marks usually come from:
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
For Chinese (or Emoji), you need to use MySQL's utf8mb4.
The cure (for future INSERTs):
utf8-encoded data (good)
mysqli_set_charset('utf8mb4') (or whatever your client needs to establish the CHARACTER SET; for JDBC, see the sketch below)
check that the column(s) and/or table default are CHARACTER SET utf8mb4
If you are displaying on a web page, <meta...charset=utf-8> should be near the top. (Note different spelling.)
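A rough sketch of what those "cure" steps can look like from Java. The connection URL, credentials and table name are placeholders, and whether the connection actually ends up on utf8mb4 depends on your Connector/J version and server settings, so verify it rather than take this as-is.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FixCharset {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials. Whether characterEncoding=UTF-8 puts the
        // connection on utf8 or utf8mb4 depends on the Connector/J version and server
        // settings; check with SHOW VARIABLES LIKE 'character_set%'.
        String url = "jdbc:mysql://localhost:3306/mydb"
                + "?useUnicode=true&characterEncoding=UTF-8";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement()) {
            // Placeholder table name: convert the table and its columns to utf8mb4
            st.executeUpdate("ALTER TABLE messages "
                    + "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci");
        }
    }
}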
Adding the tag below (in the Maven pom.xml) resolved my issue.
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
Thanks to All

UTF-8 issue in Linux

String departmentName = request.getParameter("dept_name");
departmentName = new String(departmentName.getBytes(Charset.forName("UTF8")),"UTF8");
System.out.println(departmentName);//O/p: composés
On Windows, the displayed output is what I expect, and it also fetches the record matching the department name.
But on Linux it returns "compos??s", so my DB query fails.
Can anyone give me a solution?
Maybe because the Charset UTF8 doesn't exist. You must use UTF-8. See the javadoc.
First of all, Unicode output via System.out.println is not a good indicator, since you are at the mercy of the console encoding. Write the value to a file with an OutputStreamWriter, explicitly setting the encoding to UTF-8; then you can tell whether the request parameter is encoded correctly or not (see the sketch below).
Second, there may be a database connection encoding issue. For MySQL you need to explicitly specify the encoding in the connection string; for other databases, the default system encoding may be used when none is specified.
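A minimal sketch of that diagnostic step; the output file name and the helper class are just illustrative.
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class DumpParameter {
    // Write the raw value to a UTF-8 file so it can be inspected with a known
    // encoding, independently of whatever the console happens to use
    static void dump(String departmentName) throws Exception {
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream("dept_name_dump.txt"), StandardCharsets.UTF_8)) {
            w.write(departmentName);
        }
    }

    public static void main(String[] args) throws Exception {
        dump("composés"); // sample value; in the real app this comes from request.getParameter
    }
}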
First of all, try to figure out the encoding you have in every particular place.
For example, the string might already have the proper encoding if your Linux system is running with a UTF-8 charset; that re-encoding hack may only have been needed on Windows.
Last but not least, how do you know it is incorrect? Are you sure it is not your viewer that is at fault? What character set does your console or log viewer use?
You need to perform a number of checks to find out where exactly the encoding differs from what is expected at that point of your application.

Output a number raised to a power as a string

I am new to Java and I'm not quite sure how to output an integer raised to a power as a string output. I know that
Math.pow(double, double)
will actually compute the value of a double raised to a power. But if I wanted to output "2^6" (except with the 6 as a superscript and not with the caret), how do I do that?
EDIT: This is for an Android app. I'm passing in the integer raised to the power as a string and I would like to know how to convert this to superscript in the UI for the phone.
Unicode does have superscript versions of the digits 0 to 9: http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
This should print 2⁶:
System.out.println("2⁶");
System.out.println("2\u2076");
If you're outputting the text to the GUI then you can use HTML formatting and the <sup> tag to get a superscript. Otherwise, you'll have to use Unicode characters to get the other superscripts. Wikipedia has a nice article on superscripts and subscripts in Unicode:
http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
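If you need arbitrary exponents rather than a hard-coded literal, a small mapping from the ASCII digits to their Unicode superscript forms is enough. A minimal sketch; the class and method names are just illustrative.
public class Superscript {
    // Unicode superscript forms of the digits 0-9
    private static final char[] SUP = {
        '\u2070', '\u00B9', '\u00B2', '\u00B3', '\u2074',
        '\u2075', '\u2076', '\u2077', '\u2078', '\u2079'
    };

    static String power(int base, int exponent) {
        StringBuilder sb = new StringBuilder().append(base);
        for (char c : String.valueOf(exponent).toCharArray()) {
            sb.append(c == '-' ? '\u207B' : SUP[c - '0']); // U+207B is superscript minus
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(power(2, 6)); // prints 2⁶ (if the console encoding and font allow it)
    }
}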
This answer only applies if you are using Eclipse. Out of the box, Eclipse only displays certain Unicode symbols: the superscripts ¹, ² and ³ work, but for anything else you have to adjust Eclipse's settings, which isn't too hard.
The default file encoding in Eclipse is usually Windows-1252, so that is what the console tries to interpret your output as, and it cannot cover the full Unicode character set since it is a one-byte encoding. This isn't a problem with Java, then; it's a problem with Eclipse, which means you need to configure Eclipse to use UTF-8 instead.
You can do this in several places. If you just want the Unicode characters to be displayed when running this one file, right-click the file and go to Properties > Resource > Text file encoding, then change from default to Other: UTF-8. If you want to do this for your whole workspace, go to Window > Preferences > General > Workspace > Text file encoding. You can also do this project-wide or even package-wide following similar steps, depending on what you're going for.

Encoding problems exporting file

I'm trying to find out what has happened in an integration project. We just can't get the encoding right at the end.
A Lithuanian file was imported to the AS/400, where text is stored in EBCDIC. The data is exported to an ANSI file and then read as windows-1257. ASCII characters work fine and so does some of the Lithuanian, but the rest looks like garbage, with characters like ~, ¶ and ].
Example string going through the pipeline:
Start file
Tuskulënö
as400
Tuskulënö
EAA9A9596
34224335A
exported file (after conversion to windows-1257)
Tuskulėnö
expected result for exported file
Tuskulėnų
Any ideas?
Regards,
Karl
EBCDIC isn't a single encoding, it's a family of encodings (in this case called codepages), similar to how ISO-8859-* is a family of encodings: the encodings within the families share about half the codes for "basic" letters (roughly what is present in ASCII) and differ on the other half.
So if you say that it's stored in EBCDIC, you need to tell us which codepage is used.
A similar problem exists with ANSI: when used as the name of an encoding, it refers to the Windows default encoding. Unfortunately the default encoding of a Windows installation can vary based on the configured locale.
So again: you need to find out which actual encoding is used here (these are usually from the Windows-* family; the "normal" English one is Windows-1252).
Once you actually know what encoding you have and want at each point, you can go towards the second step: fixing it.
My personal preference for this kind of problem is: have only one step where encodings are converted. Take whatever the initial tool produces and convert it to UTF-8 in the first step, and from then on always use UTF-8 to handle that data. If necessary, convert UTF-8 to some other encoding in the last step (but avoid this if possible).
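A minimal sketch of that single conversion step in Java. The codepage name is an assumption: CCSID 1112 (EBCDIC Baltic) is used purely as an example, so substitute whichever codepage the AS/400 actually reports and confirm your JDK ships a charset for it; the file names are placeholders too.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Ebcdic2Utf8 {
    public static void main(String[] args) throws Exception {
        // "Cp1112" (EBCDIC Baltic) is an assumption; replace with the codepage the
        // AS/400 reports, and check that this JDK actually provides it
        String ebcdicCodepage = "Cp1112";
        if (!Charset.isSupported(ebcdicCodepage)) {
            throw new IllegalStateException("Codepage not available: " + ebcdicCodepage);
        }
        // Placeholder file names: decode the EBCDIC bytes, write the text out as UTF-8
        try (BufferedReader in = Files.newBufferedReader(
                     Paths.get("export.ebcdic"), Charset.forName(ebcdicCodepage));
             BufferedWriter out = Files.newBufferedWriter(
                     Paths.get("export-utf8.txt"), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        }
    }
}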
