I'm using MySQL with Hibernate and having problems with every language other than English. I'm getting an exception saying that the text is not utf8, even though it is utf8 (Hebrew).
I ran show variables like '%character%'; and this is what I got:
I think maybe character_set_server is the problem? It is latin1 and I can't change it to utf8. How do I do that? I'm using Amazon RDS, and in the parameter group there I see utf8 for character_set_server, so I don't understand why it's not utf8 above.
On the other hand, maybe that's not the problem at all. Any other suggestions are welcome.
EDIT:
I managed to change all the values in the attached image to utf8, but I'm still getting the following exception:
2016-02-21 08:46:05 DEBUG SqlExceptionHelper:139 - could not execute statement [n/a]
java.sql.SQLException: Incorrect string value: '\xD7\xAA\xD7\xA9\xD7\x95...' for column 'text' at row 1
...
...
...
2016-02-21 08:46:05 WARN SqlExceptionHelper:144 - SQL Error: 1366, SQLState: HY000
2016-02-21 08:46:05 ERROR SqlExceptionHelper:146 - Incorrect string value: '\xD7\xAA\xD7\xA9\xD7\x95...' for column 'text' at row 1
2016-02-21 08:46:05 INFO AbstractBatchImpl:208 - HHH000010: On release of batch it still contained JDBC statements
2016-02-21 08:46:05 DEBUG SqlExceptionHelper:225 - SQL Warning
java.sql.SQLWarning: Incorrect string value: '\xD7\xAA\xD7\xA9\xD7\x95...' for column 'text' at row 1
EDIT 2:
So I managed also to fix the exception. It is now saved fine in the DB.
I fixed it by calling the following command for each column:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;
My problem now is that the results are returned from DB with question marks.
I still see empty value for character_set_results when calling show variables like '%character%';
I don't think character_set_server is the problem.
\xD7\xAA\xD7\xA9\xD7\x95 is hex for the utf8 encoding of 'תשו'. If it were interpreted as latin1, it would be '×ª×©×•'.
A strange setting is the empty value (empty string? NULL?) for character_set_results, which controls transliteration during SELECT.
Please provide the output of SELECT col, HEX(col) FROM ... -- If you get hex of D7AAD7A9D795 for that Hebrew string, then the data is stored correctly, and we should look at the output side. If not, then the data is stored incorrectly. Or that ALTER messed things up.
Hebrew, in utf8, displayed as hex, is mostly 'D7xx'.
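The hex check can also be reproduced on the client side. Below is a minimal Java sketch (the class and method names are made up for illustration) that prints the utf8 hex of a string in the same form as MySQL's HEX(col), and shows what the same bytes look like when misread as windows-1252, which is assumed here to be the closest Java equivalent of MySQL's latin1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexCheck {
    // Upper-case hex of the UTF-8 encoding, comparable with the output of HEX(col).
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the UTF-8 bytes as windows-1252 (assumption: stands in for MySQL latin1).
    public static String misreadAsCp1252(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String hebrew = "\u05EA\u05E9\u05D5"; // the three Hebrew letters from the error message
        System.out.println(utf8Hex(hebrew));          // every letter starts with D7
        System.out.println(misreadAsCp1252(hebrew));  // the mojibake form
    }
}
```

If HEX(col) in the database matches utf8Hex for the same text, the stored bytes are fine and the problem is on the output side.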
You need utf8 in several places:
The bytes you are inserting need to be encoded in utf8.
The connection needs to be in utf8. <property name="url" value="jdbc:mysql://...&characterSetResults=utf8&characterEncoding=utf-8"/>
The table definition needs to say CHARACTER SET utf8 (or utf8mb4). Do SHOW CREATE TABLE.
The HTML output needs <meta charset="utf-8" />.
If data is coming from an HTML form: <form accept-charset="UTF-8">
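For the connection item in the list above, here is a rough Java sketch, assuming MySQL Connector/J is on the classpath. The host and database names are placeholders, and the class and method names are invented for this example. It builds a URL with the two parameters shown above and includes a helper that prints the character_set variables as seen through the application's own connection, which is the session that actually matters.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Utf8Connection {
    // Builds a JDBC URL carrying the same two parameters as the example above.
    public static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?characterSetResults=utf8&characterEncoding=utf-8";
    }

    // Prints the session character set variables for an already-open connection.
    public static void printSessionCharsets(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }

    public static void main(String[] args) {
        // "db.example" and "myApp" are hypothetical values.
        System.out.println(buildUrl("db.example", 3306, "myApp"));
    }
}
```

When the URL lives in an XML file, as in the property element above, each & in it has to be written as &amp;.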
I have a problem when I try to execute an insert in a MySQL database in a Java application using jdbc and the value of a varchar column has special characters (accented in Portuguese). Ex. Words with Ç, Ã, Â, À, Á, etc.
I get a java.sql.SQLException: Incorrect string value: '\xEF\xBF\xBD\xC7\xB6\xEF...' for column 'cdesc' at row 1
However, when I log into MySQL on the server and run the command on the MySQL client, the command is executed successfully.
I tried several solutions proposed here, but all without success.
Insert:
INSERT INTO DICTIONARY VALUES('Test','T','ÇÃÁÀÂ','S')
Error:
java.sql.SQLException: Incorrect string value: '\xEF\xBF\xBD\xC7\xB6\xEF...' for column 'cdesc' at row 1
I tried setting utf8mb4 in mysqld.conf and changing the encoding of the table I'm writing to utf8mb4, but it didn't work.
[mysql]
default-character-set=utf8mb4
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
Important information:
MySQL version: Server version: 8.0.28-0ubuntu0.20.04.3 (Ubuntu) (it works in version 5.7)
JDBC driver version: mysql-connector-java-8.0.28
Has anyone experienced this before, do you have any ideas that could help me?
Thanks
I have problems when I insert into tables words with accent marks. So I think that I have to "activate" UTF-8 to fix that error.
I'm not using Class.forName. This is my code:
miInitialContext = new InitialContext();
miDS = (DataSource) miInitialContext.lookup(InformacionProperties.getStrDataSource());
Connection conexion = miDS.getConnection();
Statement myStatement = conexion.createStatement();
myStatement.executeUpdate("INSERT INTO table values ......");
How can I "activate" that UTF8 with my code?
Use SET NAMES to set your connection charset. However, that won't matter much if the columns/tables/databases you interact with aren't configured with a compatible charset. For instance, with a latin1 column testcol, inserting utf8 data will result in an error like:
INSERT INTO `test`.`table` (`testcol`) VALUES ('test_val'), ('Ídata');
ERROR 1366 (HY000): Incorrect string value: '\xE2\x88\x9A\xC3\xA7d...' for column 'testcol' at row 2
So you'll need to update the table structure
ALTER TABLE t MODIFY testcol CHAR(50) CHARACTER SET utf8;
Which then fixes the issue:
INSERT INTO `test`.`table` (`testcol`) VALUES ('test_val'), ('Ídata');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
(See the mysql docs for details).
This post has good documentation on finding the character sets of various structures.
(Thanks to Shadow for the SET NAMES part)
I have a MySQL database with the default encoding and server encoding all set to utf8. I have CSV files coming in with various encodings, which I have to load into the database using JDBC. But when the incoming file is ANSI-encoded, LOAD DATA INFILE fails with:
java.sql.SQLException: Invalid utf8 character string: '1080'
I am creating a table table_abc based on csv headers and then using the below query to load the csv file into database
LOAD DATA LOCAL INFILE 'XXX.csv' INTO TABLE table_abc CHARACTER SET UTF8 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES
Here is my DB definition
character_set_client utf8
character_set_connection utf8
character_set_database utf8
character_set_filesystem binary
character_set_results utf8
character_set_server utf8
character_set_system utf8
character_sets_dir C:\Program Files\MySQL\MySQL Server 5.7\share\charsets\
What should I do now?
Should I convert all files to utf8 before uploading? If yes, how do I do that in Java?
Should I have tables with different encodings for differently encoded files? If yes, how do I detect the encoding of an incoming file in Java?
P.S. I don't mind losing non-utf8 characters while loading into the table; my only goal is to load the file into the DB without errors, irrespective of its encoding.
Thanks
If you mean that some columns are utf8 and some columns are, say, latin1, then it gets a bit complicated, but still possible.
Create a "staging" table to put the data into from the LOAD. But have all the VARCHAR columns be VARBINARY and TEXT be BLOB. This way the data bytes will be loaded unchanged.
Then ALTER that table to convert the binary/blob columns to the suitable varchar/text types:
ALTER ...
MODIFY COLUMN col1 VARCHAR(111) CHARACTER SET ... COLLATE ...,
MODIFY COLUMN col2 TEXT CHARACTER SET ... COLLATE ...,
...;
Then copy the data over to your 'real' table (unless this table is sufficient).
If one column has a mixture of encodings, you are SOOL.
Identifying a charset
Provide a sample or two of the HEX of non-English characters in the column; I can usually spot what it is. This gives some clues of how to recognize a charset from hex samples.
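On the question of converting files to utf8 in Java before loading them, one possible approach is sketched below. It treats a file as utf8 if its bytes decode strictly as utf8, and otherwise re-encodes it from a fallback charset. The fallback (windows-1252, on the assumption that "ANSI" means a Western Windows code page) and the class and method names are guesses made for this example, not something the thread confirms.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvToUtf8 {
    // Returns true only if the bytes are well-formed UTF-8 (strict decoding, no replacement).
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Leaves valid UTF-8 untouched; otherwise decodes with the fallback and re-encodes as UTF-8.
    public static byte[] toUtf8(byte[] data, Charset fallback) {
        if (isValidUtf8(data)) {
            return data;
        }
        return new String(data, fallback).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: CsvToUtf8 <input.csv> <output.csv>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // Assumption: non-UTF-8 input is windows-1252.
        Files.write(Paths.get(args[1]), toUtf8(raw, Charset.forName("windows-1252")));
    }
}
```

The converted file can then be loaded with CHARACTER SET UTF8 as in the original statement. Strict decoding cannot tell which non-utf8 encoding a file uses, so the fallback remains a guess.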
I'm using a MySQL DB hosted on Amazon RDS.
When I execute this query: show global variables like '%character%';
I get the following result:
But when I execute this query: show variables like '%character%'; I get this result:
As you can see character_set_results is empty. I tried following queries and nothing changes the empty value:
ALTER DATABASE myDB CHARACTER SET utf8;
ALTER DATABASE myDB CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER DATABASE myDB DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
SET CHARACTER_SET_RESULTS=UTF8;
SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SET session CHARACTER_SET_RESULTS = utf8;
As I understand the second query returns session parameters. Might this affect the results I'm getting?
I have two machines: a development machine, where I run the application from Eclipse (it is a Java app using Hibernate), and a deployment machine. Everything works fine on the development machine, but on the deployment machine I sometimes get ???? or other strange characters when reading data from the DB.
Both machines connect to the same DB and data is stored fine in the db itself.
Also the connection url is jdbc:mysql://myDB:3306/myApp?autoReconnect=true&useUnicode=true&createDatabaseIfNotExist=true&characterEncoding=utf-8.
Any ideas what can cause this?
Multiple question marks usually come from:
You are INSERTing Chinese (or any non-western-Europe text).
You said SET NAMES utf8 to declare that the client bytes are utf8-encoded (correct).
But the table columns are declared CHARSET latin1. <-- This is the problem.
Since there is no way to convert Chinese characters into latin1, '?' is stored.
Please provide SHOW CREATE TABLE to confirm the above hypothesis.
If you are getting "other strange characters", please provide:
SELECT col, HEX(col) FROM ...
to show an example of what is stored for the "strange characters".
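The question-mark case has a close analogue in plain Java, which may help in recognising it. This is only an illustration of the same kind of lossy conversion, not of MySQL's internals, and the class name is invented. Encoding text into a charset that has no mapping for it replaces each unmappable character with '?', and the original characters cannot be recovered afterwards.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarks {
    // Pushes a string through ISO-8859-1 and back; unmappable characters come out as '?'.
    public static String throughLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Five Chinese characters, written as escapes to avoid source-encoding issues.
        String city = "\u4E4C\u9C81\u6728\u9F50\u5E02";
        System.out.println(throughLatin1(city));
        // Western European text survives, because latin1 can represent it.
        System.out.println(throughLatin1("caf\u00E9"));
    }
}
```

This is why HEX(col) showing 3F for every character means the text was lost at INSERT time rather than garbled on the way out.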
I've been stuck working with Java and MySQL these days.
The problem is that I have a MySQL database with a table column that holds Chinese city names. A colleague changed the DB to utf8 for every character setting (connection, db, results, server and system). As a consequence, the data stored before the change no longer displays correctly unless I set the %character% variables back to latin1. With either character set I can retrieve only half of the data correctly. Could you please help me solve this?
I've tried to solve the problem in Java, but it doesn't work.
String sql = "SELECT * FROM customer_addresses";
ResultSet result = query.executeQuery(sql);
while (result.next()) {
    // read the raw column bytes and decode them as UTF-8
    byte[] b = result.getBytes("city");
    String c = new String(b, "UTF-8");
}
For example: there is one city in db like this 乌é²æœ¨é½å¸‚
the java print: 乌�?木�?市
it should be:乌鲁木齐市
Thanks in advance
Default charset of your MySQL server is probably not UTF8. Try to execute the following SQL queries before getting data from the database:
SET NAMES utf8
and
SET CHARACTER SET utf8
Add characterEncoding=UTF-8 to the connection string, where you connect to the database. For example:
"jdbc:mysql://servername:3306/databasename?characterEncoding=UTF-8"
Incidentally, the data in the database appears to be broken. If you want the database to store 乌鲁木齐市, that's what should be in the table, not 乌é²æœ¨é½å¸.
Update: The problem with how the data is stored in the database is easier to solve using the database's own tools, not Java. For each table that stores text, do this:
ALTER TABLE tablename CONVERT TO CHARACTER SET binary;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8;
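To see how mojibake like the city name above can arise, and why it is reversible as long as no bytes were lost, here is a small Java illustration. It is not a replacement for the ALTER statements. It assumes the garbling came from utf8 bytes being read as a single-byte charset, and it uses ISO-8859-1 because that charset maps every byte to a character, which keeps the round trip lossless. The class and method names are invented.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncoding {
    // Simulates the damage: UTF-8 bytes decoded as if they were a single-byte charset.
    public static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses it: recover the original bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String city = "\u4E4C\u9C81\u6728\u9F50\u5E02"; // the five-character city name
        String garbled = garble(city);
        System.out.println(garbled.length());             // one char per UTF-8 byte
        System.out.println(repair(garbled).equals(city)); // round trip restores the text
    }
}
```

Rows written after the change to utf8 are presumably stored correctly already, so any repair should be limited to the older rows; checking HEX(city) on a few rows of each kind first is a cheap way to confirm which is which.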