I am trying to read a UTF-8 string from my MySql database, which I create using:
CREATE DATABASE april
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
I make the table of interest using:
DROP TABLE IF EXISTS `article`;
CREATE TABLE `article` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text` longtext NOT NULL,
`date_created` timestamp DEFAULT NOW(),
PRIMARY KEY (`id`)
) CHARACTER SET utf8;
If I select * from article in the MySql command line util, I get:
OIL sands output at Nexen’s Long Lake project dropped in February.
However, when I do
ResultSet rs = st.executeQuery(QUERY);
long id = -1;
String text = null;
Timestamp date = null;
while (rs.next()) {
text = rs.getString("text");
LOGGER.debug("text=" + text);
}
the output I get is:
text=OIL sands output at Nexenâ€™s Long Lake project dropped in February.
I get my Connection via:
DriverManager.getConnection("jdbc:" + this.dbms + "://" + this.serverHost + ":" + this.serverPort + "/" + this.dbName + "?useUnicode&user=" + this.username + "&password=" + this.password);
I've also tried, instead of the useUnicode parameter:
characterEncoding=UTF-8
and
characterEncoding=utf8
I also tried, instead of the line text = rs.getString("text")
byte[] temp = rs.getBytes("text");
String[] encodings = new String[]{"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16", "Latin1"};
for (String encoding : encodings) {
text = new String(temp, encoding);
LOGGER.debug(encoding + ": " + text);
}
// Which outputted:
US-ASCII: OIL sands output at Nexen��������s Long Lake project dropped in February.
ISO-8859-1: OIL sands output at Nexenââ¬â¢s Long Lake project dropped in February.
UTF-8: OIL sands output at Nexenâ€™s Long Lake project dropped in February.
UTF-16BE: 佉䰠獡湤猠潵瑰畴琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴牯灰敤渠䙥扲畡特�
UTF-16LE: 䥏⁌慳摮畯灴瑵愠⁴敎數썮겂蓢玢䰠湯慌敫瀠潲敪瑣搠潲灰摥椠敆牢慵祲�
UTF-16: 佉䰠獡湤猠潵瑰畴琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴牯灰敤渠䙥扲畡特�
Latin1: OIL sands output at Nexenââ¬â¢s Long Lake project dropped in February.
I load the strings into the DB using some pre-defined sql in a file. This file is UTF-8 encoded.
mysql -u april -p -D april < insert_articles.sql
This file includes the line:
INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");
When I print out that file within my application using:
BufferedReader reader = new BufferedReader(new FileReader(new File("/home/path/to/file/sql_article_inserts.sql")));
String str;
while((str = reader.readLine()) != null) {
LOGGER.debug("LINE: " + str);
}
I get the correct, expected output:
LINE: INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");
Any help would be much appreciated.
Some System Details:
I am running on linux (Ubuntu)
Edits:
* Edited to specify OS
* Edited to detail output of reading sql input file.
* Edited to specify more about how the data is inserted into the DB.
* Edited to fix typo in code, and clarify example.
Is it possible you're reading the log file using the incorrect encoding? windows-1252, I am guessing.
UTF-8: OIL sands output at Nexenâ€™s Long Lake project dropped in February.
If this is appearing in the log, do a hex dump of the log file. If the data is UTF-8, you would expect the sequence Nexen’s to become 4E 65 78 65 6E E2 80 99 73. If some other application reads this as a native ANSI encoding, it'll decode it as Nexenâ€™s.
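A minimal sketch of that mix-up in Java, assuming the program reading the log uses windows-1252 (the class and method names are made up for illustration). It prints the UTF-8 bytes of the string and then what those same bytes look like when decoded with the wrong charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Hex dump of a byte array, e.g. "E2 80 99".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode as UTF-8, then decode the same bytes with the wrong charset.
    static String misread(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        // U+2019 (right single quotation mark) is written as an escape
        // so that this source file stays pure ASCII.
        String original = "Nexen\u2019s";
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));
        // prints 4E 65 78 65 6E E2 80 99 73
        System.out.println(misread(original, Charset.forName("windows-1252")));
    }
}
```

What the second line looks like on screen depends on the console's own encoding, which is exactly the kind of ambiguity the hex dump avoids.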
To confirm, you can also dump the individual characters of the return value to see if they are correct in UTF-16:
//untested
for(char ch : text.toCharArray()) {
System.out.printf("%04x%n", (int) ch);
}
I'm assuming all data is in the BMP, so you can just look up the results in the Unicode charts.
Try setting the database itself to UTF-8. When creating the DB:
CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
Also see MySQL reference on connection charsets and MySQL reference on configuring charsets for applications
Parameters in the JDBC URL only define how the driver should communicate with the server. If the server does not use UTF8 by default these parameters won't change it either.
Have you tried executing the following SQL query after connecting? (This should switch the current connection to UTF8 on the server-side too):
SET names utf8
There are several character encodings involved.
The terminal/cmd window in which the mysql command line tool is running (PuTTY?).
The environment of the shell (bash) where you are running your stuff (LC_CTYPE).
MySQL internal (used in tables): you have defined this as UTF-8.
The JVM internal (always UTF-16).
The character encoding used by the writers the logger uses: the default (system property), or perhaps one defined in the logging framework's configuration.
The terminal/cmd/editor that you read the logs with (PuTTY/bash?).
If the terminal settings are wrong, you might have inserted corrupted data in mysql. (If your terminal is iso-8859-1 and you read a file that is UTF-8, for instance) Assuming linux, mysql should look at the env LC_CTYPE (but I am not 100% sure that it does.)
The JDBC driver is responsible for converting the database character encoding to the JVM's internal format (UTF-16), so that should not be a problem. But you can test this with a simple Java program that inserts a hard-coded string and reads it back. Print the original and the received string; they should be identical. But if both are wrong, you have a problem with the terminal's character set definition.
Use a string like "HejÅÄÖ" for some drama...
Also, write a small program that prints the same string to a file using a PrintWriter that converts to UTF-8, and verify that the tool you use for reading the log displays that file correctly. If not, the terminal's settings are to be suspected, again.
String test = "Test HEJ \u00C5\u00C4\u00D6 ÅÄÖ";
// here's how to define what character set to use when writing to a fileOutputStream
PrintWriter pw = new PrintWriter("test.txt","UTF8");
pw.println(test);
pw.flush();
pw.close();
System.out.println(test);
output -> Test HEJ ÅÄÖ ÅÄÖ
The contents of the file test.txt should look the same.
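The same check can be done without looking at a terminal at all. The following is only a sketch (the class and method names are hypothetical): it writes the test string to a temporary file as UTF-8 and reads the bytes back, once with the matching charset and once with ISO-8859-1, so that any difference is caused by the charset choice alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileRoundTrip {
    // Write text as UTF-8, then read the same bytes back with the given charset.
    static String writeUtf8ThenRead(String text, Charset readAs) {
        try {
            Path file = Files.createTempFile("charset-test", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), readAs);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String test = "Test HEJ \u00C5\u00C4\u00D6";
        // Matching charsets on both sides: the string survives.
        System.out.println(test.equals(writeUtf8ThenRead(test, StandardCharsets.UTF_8)));      // true
        // Reading UTF-8 bytes as ISO-8859-1: each accented letter becomes two characters.
        System.out.println(test.equals(writeUtf8ThenRead(test, StandardCharsets.ISO_8859_1))); // false
    }
}
```

Note that a plain FileReader or a PrintWriter created without a charset argument may fall back to the platform default, so naming the charset explicitly on both sides keeps the test independent of the environment.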
Related
I have the following string value: "walmart obama 👽💔"
I am using MySQL and Java.
I am getting the following exception: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F...'
Here is the variable I am trying to insert into:
var1 varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
My Java code that is trying to insert "walmart obama 👽💔" is a preparedStatement. So I am using the setString() method.
It looks like the problem is the encoding of the values 👽💔. How can I fix this? Previously I was using Derby SQL and the values 👽💔 just ended up being two squares (I think this is the representation of the null character)
All help is greatly appreciated!
What you have is EXTRATERRESTRIAL ALIEN (U+1F47D) and BROKEN HEART (U+1F494) which
are not in the basic multilingual plane. They cannot be even represented in java as one char, "👽💔".length() == 4. They are definitely not null characters and one will see squares if you are not using fonts that support them.
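A short sketch to see those numbers for yourself (the class name is made up; the two characters are written as surrogate-pair escapes so the source stays ASCII):

```java
import java.nio.charset.StandardCharsets;

final class SupplementaryDemo {
    // Number of Unicode code points, as opposed to UTF-16 code units.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string needs in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F47D followed by U+1F494, each as a surrogate pair.
        String s = "\uD83D\uDC7D\uD83D\uDC94";
        System.out.println(s.length());    // 4 UTF-16 code units
        System.out.println(codePoints(s)); // 2 code points
        System.out.println(utf8Length(s)); // 8 bytes, four per character
    }
}
```

The first four of those UTF-8 bytes are F0 9F 91 BD, which is the sequence quoted in the exception message.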
MySQL's utf8 only supports basic multilingual plane, and you need to use utf8mb4 instead:
For a supplementary character, utf8 cannot store the character at all,
while utf8mb4 requires four bytes to store it. Since utf8 cannot store
the character at all, you do not have any supplementary characters in
utf8 columns and you need not worry about converting characters or
losing data when upgrading utf8 data from older versions of MySQL.
So to support these characters, your MySQL needs to be 5.5+ and you need to use utf8mb4 everywhere. Connection encoding needs to be utf8mb4, character set needs to be utf8mb4 and collation needs to be utf8mb4. For java it's still just "utf-8", but MySQL needs a distinction.
I don't know what driver you are using but a driver agnostic way to set connection charset is to send the query:
SET NAMES 'utf8mb4'
Right after making the connection.
See also this for Connector/J:
14.14: How can I use 4-byte UTF8, utf8mb4 with Connector/J?
To use 4-byte UTF8 with Connector/J configure the MySQL server with
character_set_server=utf8mb4. Connector/J will then use that setting
as long as characterEncoding has not been set in the connection
string. This is equivalent to autodetection of the character set.
Adjust your columns and database as well:
var1 varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL
Again, your MySQL version needs to be relatively up-to-date for utf8mb4 support.
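If you cannot migrate everything at once, it may help to know which values would be rejected by a utf8 column. A minimal sketch, assuming Java 8 or later; the helper name needsUtf8mb4 is hypothetical and not part of any driver API:

```java
final class Utf8mb4Check {
    // True if the string contains a character outside the BMP,
    // i.e. one that takes four bytes in UTF-8.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(needsUtf8mb4("walmart obama"));                          // false
        System.out.println(needsUtf8mb4("walmart obama \uD83D\uDC7D\uD83D\uDC94")); // true
    }
}
```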
Weirdly, I found that REMOVING &characterEncoding=UTF-8 from the JDBC url did the trick for me with similar issues.
Based on my properties,
jdbc_url=jdbc:mysql://localhost:3306/dbName?useUnicode=true
I think this supports what @Esailija has said above, i.e. my MySQL, which is indeed 5.5, is figuring out its own favorite flavor of UTF-8 encoding.
(Note, I'm also specifying the InputStream I'm reading from as UTF-8 in the java code, which probably doesn't hurt)...
All in all, to save symbols that require 4 bytes you need to update the character set and collation to utf8mb4:
1. database table/column:
alter table <some_table> convert to character set utf8mb4 collate utf8mb4_unicode_ci
2. database server connection (see)
In my development environment, for #2 I prefer to set the parameters on the command line when starting the server:
mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
btw, pay attention to Connector/J behavior with SET NAMES 'utf8mb4':
Do not issue the query set names with Connector/J, as the driver will not detect that the character set has changed, and will continue to use the character set detected during the initial connection setup.
And avoid setting characterEncoding parameter in connection url as it will override configured server encoding:
To override the automatically detected encoding on the client side, use the characterEncoding property in the URL used to connect to the server.
How I solved my problem.
I had
?useUnicode=true&characterEncoding=UTF-8
In my hibernate jdbc connection url and I changed the string datatype to longtext in database, which was varchar before.
Append useUnicode=true&characterEncoding=UTF-8 to your JDBC URL.
In your case the data is not being sent using UTF-8 encoding.
I faced the same issue and solved it by setting the Collation to utf8_general_ci for each column.
I guess MySQL doesn't believe this to be valid UTF8 text. I tried an insert on a test table with the same column definition (mysql client connection was also UTF8) and although it did the insert, the data I retrieved with the MySQL CLI client as well as JDBC didn't retrieve the values correctly. To be sure UTF8 did work correctly, I inserted an "ö" instead of an "o" for obama:
johan@maiden:~$ mysql -vvv test < insert.sql
--------------
insert into utf8_test values(_utf8 "walmart öbama 👽💔")
--------------
Query OK, 1 row affected, 1 warning (0.12 sec)
johan@maiden:~$ file insert.sql
insert.sql: UTF-8 Unicode text
Small java application to test with:
package test.sql;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
public class Test
{
public static void main(String[] args)
{
System.out.println("test string=" + "walmart öbama 👽💔");
String url = "jdbc:mysql://hostname/test?useUnicode=true&characterEncoding=UTF-8";
try
{
Class.forName("com.mysql.jdbc.Driver").newInstance();
Connection c = DriverManager.getConnection(url, "username", "password");
PreparedStatement p = c.prepareStatement("select * from utf8_test");
p.execute();
ResultSet rs = p.getResultSet();
while (rs.next())
{
String retrieved = rs.getString(1);
System.out.println("retrieved=\"" + retrieved + "\"");
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
Output:
johan@appel:~/workspaces/java/javatest/bin$ java test.sql.Test
test string=walmart öbama 👽💔
retrieved="walmart öbama "
Also, I've tried the same insert with the JDBC connection and it threw the same exception you are getting.
I believe this to be a MySQL bug. Maybe there's a bug report about such a situation already..
I had kind of the same problem, and after carefully going through all the charsets and finding that they were all right, I realized that the bugged property I had in my class was annotated as @Column instead of @JoinColumn (javax.persistence; Hibernate), and it was breaking everything.
Execute
show VARIABLES like "%char%";
and check whether character_set_server is utf8mb4.
If it is not, set it in your my.cnf:
vim /etc/my.cnf
Add one line:
character_set_server = utf8mb4
Finally, restart MySQL.
This setting useOldUTF8Behavior=true worked fine for me. It gave no incorrect string errors but it converted special characters like à into multiple characters and saved in the database.
To avoid such situations, I removed this property from the JDBC parameter and instead converted the datatype of my column to BLOB. This worked perfect.
Besides, the data type can be blob instead of varchar or text.
I have several CSV files I am loading into MySQL using Java. In the Description field I have several special characters that are causing the load to fail. I am using LOAD DATA INFILE as seen in the code block below. This is nested in a for-each loop which parses an array of filenames / tables and runs through each combination until it is finished with all the files.
Here is my jdbc connection string where I am passing a definitive collation param/value for UTF8 collation
static String url = "jdbc:mysql://localhost:3306/iber_stage?verifyServerCertificate=false&characterEncoding=UTF8";
other connection parameters and parsing an array of filenames/tablenames
final String sql1 = ("TRUNCATE TABLE " + tableName);
final String sql2 = ("LOAD DATA INFILE " + filetoEat + " INTO TABLE staging." + tableName + " CHARACTER SET UTF8 FIELDS TERMINATED BY ',' ENCLOSED BY '\"\' LINES TERMINATED BY '\n' IGNORE 1 LINES");
try {
Class.forName("com.mysql.jdbc.Driver");
con = DriverManager.getConnection(url, username, password);
st = con.createStatement();
st.executeUpdate(sql1);
rs = st.executeQuery(sql2);
if (rs.toString() != null) {
returnMsg = rs.toString();
System.out.println(returnMsg);
updFlag = 0;
String strRecs = returnMsg.substring(40);
updateControlTable(updFlag, strRecs);
}
} catch (SQLException ex) {
Logger lgr = Logger.getLogger(update.class.getName());
lgr.log(Level.SEVERE, ex.getMessage(), ex);
updFlag = 1;
} catch (ClassNotFoundException e) {
Logger lgr = Logger.getLogger(update.class.getName());
lgr.log(Level.SEVERE, e.getMessage(), e);
e.printStackTrace();
updFlag = 1;
}
The code is working fine until it comes across a special character like a degree symbol or micro symbol µ within a Material Description . At that point it throws an Exception
Invalid utf8 character string: 'LUG'
The string LUG is followed by a µ symbol. The DB is set to utf8 - utf8_unicode_ci and the column in question is a VARCHAR(60) that holds material descriptions.
I have tried using ESCAPED BY '\\' but I can't seem to get it working correctly. I have also tried CHARACTER SET UTF8. I have also tried different collation ie, utf8_general_ci to no avail.
Any insight is greatly appreciated
Have you tried adding
CHARACTER SET UTF8
to the LOAD DATA INFILE instruction?
Full doc: http://dev.mysql.com/doc/refman/5.7/en/load-data.html
Can you check with the database collation set to utf8_general_ci and the character set to utf8? It may work for you,
as it applies Unicode normalization using language-specific rules.
I figured that I would answer this now that I found the solution. Because I am using Java to run the LOAD DATA INFILE via JDBC the JDBC driver seems to be checking the collation at the DB and not the actual table being loaded as it is parsing the file. So you can't have the DB set to UTF-8 and have a Latin collated table as you would be able to do with an INSERT statement. I had tried to set the Table collation as Latin and even had the field in question Latin, but until I changed the entire DB to Latin it was failing. The CSV files are large so checking every char in question is not easy, but I was catching the Exceptions in Java and was able to determine that the error was generated by the JDBC driver and was complaining that "Character at line xx is not a UTF-8 character" Running in Debug allowed me to see more details.
I then concluded it must not be looking at the Latin collated table it would be filling, but was looking at the DB which was still set to UTF-8. Changing the DB to Latin was all I needed to do.
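For large CSV files, finding the offending lines by eye is impractical. The following sketch is not part of the original solution: it assumes the file may contain bytes from a single-byte Latin encoding (for example µ stored as B5) and reports the first line that is not well-formed UTF-8. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Validator {
    // True if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // 1-based number of the first line (split on '\n') that is not
    // valid UTF-8, or -1 if every line is valid.
    static int firstInvalidLine(byte[] file) {
        int line = 1;
        int start = 0;
        for (int i = 0; i <= file.length; i++) {
            if (i == file.length || file[i] == '\n') {
                if (!isValidUtf8(Arrays.copyOfRange(file, start, i))) {
                    return line;
                }
                line++;
                start = i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String csv = "id,desc\n1,LUG \u00B5\n"; // U+00B5 is the micro sign
        System.out.println(firstInvalidLine(csv.getBytes(StandardCharsets.ISO_8859_1))); // 2
        System.out.println(firstInvalidLine(csv.getBytes(StandardCharsets.UTF_8)));      // -1
    }
}
```

For a real file the byte array could come from Files.readAllBytes; for very large files a streaming variant would be preferable, since this sketch holds the whole file in memory.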
I hope this will help others in the future.
Pat
I'm trying to save some values in MySQL database by using Hibernate, but most Lithuanian characters won't get saved, including ąĄ čČ ęĘ ėĖ įĮ ųŲ ūŪ(they are saved as ?), however, šŠ žŽ do get saved.
If I do inserts manually, then those values are properly saved, so the problem is most likely in Hibernate configuration.
What I have tried so far:
hibernate.charset=UTF-8
hibernate.character_encoding=UTF-8
hibernate.use_unicode=true
---------
properties.put(PROPERTY_NAME_HIBERNATE_USE_UNICODE,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_USE_UNICODE));
properties.put(PROPERTY_NAME_HIBERNATE_CHARSET,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARSET));
properties
.put(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING));
---------
private void registerCharachterEncodingFilter(ServletContext aContext) {
CharacterEncodingFilter cef = new CharacterEncodingFilter();
cef.setForceEncoding(true);
cef.setEncoding("UTF-8");
aContext.addFilter("charachterEncodingFilter", cef)
.addMappingForUrlPatterns(null, true, "/*");
}
As described here
I tried adding ?useUnicode=true&characterEncoding=utf-8 to db connection url.
As described here
I ensured that my db is set to UTF-8 charset. phpmyadmin > information_schema > schemata
def db_name utf8 utf8_lithuanian_ci NULL
This is how I save into db:
//Controller
buildingService.addBuildings(schema.getBuildings());
List<Building> buildings = buildingService.getBuildings();
System.out.println("-----------");
for (Building b : schema.getBuildings()) {
System.out.println(b.toString());
}
System.out.println("-----------");
for (Building b : buildings) {
System.out.println(b.toString());
}
System.out.println("-----------");
//Service:
@Override
public void addBuildings(List<Building> buildings) {
for (Building b : buildings) {
getCurrentSession().saveOrUpdate(b);
}
}
First set of println contains all Lithuanian characters, while second replaces most with ?
EDIT: Added details
insert into buildings values (11,'ąĄčČęĘ', 'asda');
select short, hex(short) from buildings;
//Šalt. was inserted via hibernate
//letters are properly displayed:
ąĄčČęĘ | C485C484C48DC48CC499C498
MIF Šalt. | 4D494620C5A0616C742E
select address, hex(address) from buildings;
Šaltini? <...> | C5A0616C74696E693F20672E2031412C2056696C6E697573
//should contain "ų"
--------
show create table buildings;
buildings | CREATE TABLE `buildings` (
`id` int(11) NOT NULL,
`short` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
`address` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_lithuanian_ci
EDIT:
I did not find a proper solution, so I came up with a workaround. I ended up escaping/unescaping characters, storing them like this: \uXXXX.
Let's verify that they were stored correctly... Please do SELECT col, HEX(col) ... to fetch some cell with Lithuanian characters. A correctly stored ą will show C485. The others should show various hex values of C4xx or C5xx. 3F is ?.
But, more importantly, 4 characters do show. Š should be C5A0 if properly stored as utf8. However, I suspect, you will see 8A, implying that the column in the table is really declared as CHARACTER SET latin1. (The 4 characters show up in the first column of my charset blog ).
Do SHOW CREATE TABLE to see how the column is defined. If it says latin1, then the problem is with the table definition, and you probably ought to start over.
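To know which hex values to expect from HEX(col), you can compute them on the Java side. This is a sketch with a made-up helper; windows-1252 is used here as a stand-in for MySQL's latin1, which is an assumption about how the two line up for these letters:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HexCheck {
    // Hex string of the bytes the text occupies in the given charset.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // U+0105 is a with ogonek, U+0160 is S with caron.
        System.out.println(hex("\u0105", StandardCharsets.UTF_8)); // C485
        System.out.println(hex("\u0160", StandardCharsets.UTF_8)); // C5A0
        System.out.println(hex("\u0160", cp1252));                 // 8A
    }
}
```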
You have to ensure that every component taking part in data entry uses UTF-8 encoding explicitly.
If you enter the values via a browser, make sure that the
page displaying the results is served with the following header:
Content-Type: text/html; charset=utf-8.
The input form should be defined as follows:
<form action="submit" accept-charset="UTF-8">...</form>.
If you are creating String objects from a byte array, make sure you
explicitly state the Charset in the constructor.
If your input comes from a text file, that file has to be UTF-8
encoded.
If it is hardcoded directly in your code, then the source has to be
UTF-8 encoded.
The fact that your DB holds correct UTF-8 (two or more bytes for a special letter) is reassuring.
If you get a single ? for a special letter, a conversion from UTF-8 to some encoding that does not contain that letter was attempted. And that seems to be the case. The letters that are converted correctly are in the ISO-8859-1 or Windows-1252 range. The others are not.
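This is easy to check from Java. The sketch below uses a hypothetical helper name and assumes Windows-1252 is the encoding involved somewhere in the chain; it lists which of the Lithuanian letters from the question cannot be represented in that charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EncodableCheck {
    // Returns the characters of s that the charset cannot represent.
    static String unencodable(String s, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (!encoder.canEncode(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // The letters from the question, written as escapes to keep the source ASCII:
        // the fourteen that were lost, followed by the four that survived.
        String lost = "\u0105\u0104\u010D\u010C\u0119\u0118\u0117\u0116\u012F\u012E\u0173\u0172\u016B\u016A";
        String kept = "\u0161\u0160\u017E\u017D";
        System.out.println(unencodable(lost + kept, cp1252).equals(lost)); // true
        // String.getBytes silently substitutes '?' for anything unencodable.
        System.out.println(new String("\u0173".getBytes(cp1252), cp1252)); // ?
    }
}
```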
Now ISO-8859-1, aka Latin-1, is the default HTTP encoding, and the default in Java EE servers. You might like to do this before writing:
response.setCharacterEncoding("UTF-8");
Now one problem with System.out.println is that it uses the system default encoding. Logging to a file with a logger is more interesting. Or debugging and inspecting the String and its char array.
That the schema does seemingly work, may be that the schema Strings stem immediately from a Java source, and the editor encoding and javac compiler encoding differ. This can be checked by u-escaping the string literals in java: "\u0105" instead of "ą".
Make a unit test that writes and reads from the database.
In Java I pass a String to PHP.
In PHP I take that String and do a search for it with a MySQL query.
Here is php code:
$query = $database->escape_value(trim($_POST['query']));
$result = mysqli_query($dbconnection, Data::getSearchQuery($query));
while ($row = mysqli_fetch_assoc($result)) {
$output[] = $row;
}
print(json_encode($output));
mysqli_close($dbconnection);
public static function getSearchQuery($item_query) {
$query = "
SELECT i.item, i.item_id, c.category, c.cat_id
FROM items as i
LEFT JOIN master_cat AS c
ON (c.cat_id = i.cat_id)
WHERE i.item LIKE '%{$item_query}%'
ORDER BY i.item ASC;";
return $query;
}
This always works if I use regular characters on my U.S. keyboard. But the moment I start using irregular characters, the search turns empty.
I can verify that MySQL stores the data AS THE USER ENTERS IT. So if they typed Beyoncè, that is how database stores it.
But when I search for Beyoncè (or whatever) in the above code, it returns empty.
How should I handle the char. encoding here?
Three points to think of:
1) The $item_query variable could come in wrong encoding.
2) >>I can verify that MySQL stores the data AS THE USER ENTERS IT
This can get tricky. If one writes an iso8859-1 encoded string to an utf-8 database, the string is obviously stored incorrectly. If that string is read with a client (i. e. phpmyadmin or mysql command line tool) configured to iso8859-1, the string is correctly returned - although its representation in the database is clearly wrong.
3) The MySql settings:
Have you set utf-8 for the connection itself? What about charsets and collations for the database/the table?
https://dev.mysql.com/doc/refman/5.5/en/charset-syntax.html
UPDATE:
I assume you want everything to be UTF-8. Kind of quick hack to test:
Beyoncé has 7 characters (see MySQL CHAR_LENGTH function)
in UTF-8, it occupies 8 bytes (see MySQL LENGTH function). The eight bytes, represented in a one-byte-per-character encoding like windows-1252, look something like BeyoncÃ©.
This leads to the following diagnostic tests ...
The PHP-issued SQL command
"SELECT CHAR_LENGTH('$item_query'), LENGTH('$item_query');"
should then return a result of (7, 8) to show us that the $item_query variable is probably correctly encoded and the database likes UTF-8. (7, 7) would mean $item_query wasn't UTF-8, and (8, 8) would mean the database doesn't want to deal with UTF-8 yet. If the latter is the case, then perhaps issue a SET NAMES 'UTF8'; before the query.
Similarly, the PHP-issued SQL command
SELECT CHAR_LENGTH('Beyoncé'), LENGTH('Beyoncé');
should return the result (7, 8) to show us that your PHP editor is configured to edit UTF-8 php files.
Repeat the previous step with phpmyadmin (or any SQL client) to be sure that this client uses UTF-8, too.
No table was involved yet! The SQL command
SELECT CHAR_LENGTH(somecolumn), LENGTH(somecolumn) FROM sometable;
(with sometable having UTF-8 character encoding and somecolumn containing some diacritical characters) should tell you if UTF-8 was used when storing values to the table.
If all previous tests passed, test again with LIKE. Even 'Beyoncé' LIKE 'Beyonce' should work then. For more information, google MySQL collation.
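Since the string originates in Java, the same (7, 8) check can be run there before anything is sent to PHP. A sketch with hypothetical helper names that mirror CHAR_LENGTH and LENGTH:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LengthCheck {
    // Counterpart of MySQL CHAR_LENGTH: number of characters (code points).
    static int charLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Counterpart of MySQL LENGTH: number of bytes in the given encoding.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String name = "Beyonc\u00E9"; // U+00E9 is e with acute accent
        System.out.println(charLength(name));                              // 7
        System.out.println(byteLength(name, StandardCharsets.UTF_8));      // 8
        System.out.println(byteLength(name, StandardCharsets.ISO_8859_1)); // 7
        // The UTF-8 bytes read back as windows-1252 give 8 characters.
        String misread = new String(name.getBytes(StandardCharsets.UTF_8),
                Charset.forName("windows-1252"));
        System.out.println(charLength(misread));                           // 8
    }
}
```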
I have POST data that contains the Japanese string AKB48 ネ申テレビ シーズン3, defined in jQuery as data.
$("#some_div").load("someurl", { data : "AKB48 ネ申テレビ シーズン3"})
The post data is sent to Java Servlet:
String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");
My program saves it to MySQL, but after the data is saved to the database it becomes:
AKB48 u30CDu7533u30C6u30ECu30D3 u30B7u30FCu30BAu30F33
What should I do if I want to save it as it is in UTF-8? All my files are in UTF-8.
MySQL encoding is utf8 and here is the code
String sql = "INSERT INTO Inventory (uid, item_id, item_data, ctime) VALUES ("
+ inventory.getUid() + ",'"
+ inventory.getItemId() + "','"
+ StringEscapeUtils.escapeJava(inventory.getItemData()) + "',CURRENT_TIMESTAMP)";
Statement stmt = con.createStatement();
int cnt = stmt.executeUpdate(sql);
From your example above, I can verify that the Japanese string is getting saved to your MySQL database correctly, but as escaped Unicode.
I would check these items in order:
Are your tables and columns all set to have a character set and collation for utf8? I.e.,
CHARACTER SET utf8 COLLATE utf8_general_ci
Are you explicitly setting the character set encoding before reading the POST data? request.setCharacterEncoding("UTF-8");
Are you setting the character encoding for your db connections? I.e., jdbc:mysql://localhost:3306/YOURDB?useUnicode=true&characterEncoding=UTF8
As the others have pointed out, you should not use that getBytes trick. It will surely mess up the POSTed values.
EDIT
Do not use StringEscapeUtils.escapeJava, since that will turn your string into escaped Unicode. That is what is transforming AKB48 ネ申テレビ シーズン3 into AKB48 u30CDu7533u30C6u30ECu30D3 u30B7u30FCu30BAu30F33.
Why don't you just extract the value of the parameter with this.request.getParameter("data")?
Your data is sent correctly using URL encoding where each unicode character is replaced by its code. Then you have to get the value of the parameter. When you are requesting bytes using ISO-8859-1 you are actually corrupting your data because the string is represented as a sequence of codes in textual form.
Java strings are stored in UTF-16. So, this code:
String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");
encodes a UTF-16 string (which has already been converted from UTF-8 in the HTTP protocol) into a byte array using the ISO-8859-1 charset, and then decodes that byte array using the UTF-8 charset. This is almost certainly not what you want.
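A small sketch (hypothetical class name) shows both outcomes of that conversion. It only "works" when the container has already mis-decoded the UTF-8 request bytes as ISO-8859-1; applied to a correctly decoded string, it destroys the text:

```java
import java.nio.charset.StandardCharsets;

final class GetBytesTrick {
    // The conversion from the question, using Charset constants
    // instead of charset names.
    static String trick(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u30CD\u7533"; // first two characters of the Japanese title

        // Case 1: the parameter was wrongly decoded as ISO-8859-1 upstream.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(trick(misdecoded).equals(original)); // true

        // Case 2: the parameter was decoded correctly upstream.
        // ISO-8859-1 cannot encode these characters, so each becomes '?'.
        System.out.println(trick(original)); // ??
    }
}
```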
What happens when you use this?
String data = this.request.getParameter("data");
System.out.println(data);
If the second line generates bad data, then your problem is likely in jQuery. Determine that you are indeed getting unicode in your jQuery request:
System.out.println(this.request.getHeader("Content-Type"));
If it does not generate bad data, but the data doesn't get stored correctly in mySQL, your problem is at the database level. Make sure your column type supports unicode strings.
What's the point of the line
String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");
You're transforming chinese (or at least non-occidental) characters into bytes using the ISO-8859-1 encoding. Of course this can't work, since chinese characters are not supported by the ISO-8859-1 encoding. And then you're constructing a new String from bytes that are supposed to represent ISO-8859-1-encoded characters, using the UTF-8 encoding. This, once again, doesn't make any sense. UTF-8 and ISO-8859-1 are not the same thing, and only a small set of chars have the same encoding in both formats.
Just use
String data = this.request.getParameter("data");
and everything should be OK, provided that the column in the MySQL table uses an encoding that supports these characters.
EDIT:
now that you've shown us the code used to insert the data in database, I know where all this comes from (the preceding points are still valid, though). You're doing
StringEscapeUtils.escapeJava(inventory.getItemData())
What's the point? escapeJava is used to take a String and escape special characters in order to make it a valid Java String literal. It has nothing to do with SQL. Use a prepared statement:
String sql = "INSERT INTO Inventory (uid, item_id, item_data, ctime) VALUES (?, ?, ?, CURRENT_TIMESTAMP)";
PreparedStatement stmt = con.prepareStatement(sql);
stmt.setInt(1, inventory.getUid()); // or setLong, depending on the type
stmt.setString(2, inventory.getItemId());
stmt.setString(3, inventory.getItemData());
int cnt = stmt.executeUpdate();
The PreparedStatement will take care of escaping special SQL characters correctly. Prepared statements are the best tool against SQL injection attacks, and should always be used when a query has parameters, especially if the parameters come from the end user. See http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html.