Incorrect String value without "?characterEncoding=UTF-8" - java

I am trying to insert some text in a field in my database and I have a problem with emojis. What happens is that if I don't set my connection url to jdbc:mysql://localhost:3306/MyDatabase?characterEncoding=UTF-8 then the server will store the emojis just fine but will also store non-latin characters as question marks.
Now if I set my connection url like that then the server will not like the emojis and will output the error:
Incorrect string value: '\xF0\x9F\x98\xB1\xF0\x9F...' for column 'fullTweet' at row 1
I have done all the necessary steps for utf8 compatibility on my local server:
I added the line character-set-server=utf8mb4 to my.ini
The query show variables like 'character_set_server' returns utf8mb4
I create my database with the query CREATE DATABASE twitter DEFAULT CHARACTER SET utf8mb4
and by default all of my tables and fields of my tables are using utf8mb4_general_ci (as I can see in phpmyadmin)
What is missing? I am sure I did all the necessary steps and I still can't get this to work, either it will store only latin characters or it won't store emoji.
Additional information from a previous question of mine:
I can manually enter the emoji in the database and they show exactly as they show in the debugger (as boxes). I run this query:
INSERT INTO `tweets`(`id`, `createdAt`, `screenName`, `fullTweet`, `editedTweet`) VALUES (450,"1994-12-19","john",_utf8mb4 x'F09F98B1',_utf8mb4 x'F09F98B1')
and this is what the row in the table looks like:
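For reference, the hex literal x'F09F98B1' in the query above can be decoded on the Java side to see which character it denotes. This is only a sketch; the class and helper names (HexLiteralDemo, fromHex) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class HexLiteralDemo {
    // Turns a hex string such as "F09F98B1" into the bytes it denotes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // Decode the four bytes from the INSERT statement as UTF-8.
        String s = new String(fromHex("F09F98B1"), StandardCharsets.UTF_8);
        System.out.println("chars=" + s.length());                          // chars=2
        System.out.println("codePoints=" + s.codePointCount(0, s.length())); // codePoints=1
        System.out.printf("U+%X%n", s.codePointAt(0));                      // U+1F631
    }
}
```

The four bytes form a single supplementary code point, which is why the column and the connection both have to handle 4-byte UTF-8.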

This is explained in detail in How to support full Unicode in MySQL databases · Mathias Bynens
Basically, I think you are missing this (from Step 5):
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
Then, you should remove characterEncoding=UTF-8 from your URL.
Also check you have MySQL Connector/J 5.1.13+ (see JDBC url for MySQL configuration to use utf8 character encoding)
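If the URL is assembled in code, dropping the parameter could look roughly like the sketch below. The helper withoutCharacterEncoding is hypothetical (not part of Connector/J) and only does plain string handling on the query part of the URL.

```java
final class JdbcUrlDemo {
    // Returns the URL with any characterEncoding=... parameter removed.
    // Hypothetical helper for illustration; it does not talk to the driver.
    static String withoutCharacterEncoding(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no parameters at all
        }
        StringBuilder kept = new StringBuilder();
        for (String param : url.substring(q + 1).split("&")) {
            if (param.isEmpty() || param.startsWith("characterEncoding=")) {
                continue; // drop the parameter that overrides the detected charset
            }
            kept.append(kept.length() == 0 ? '?' : '&').append(param);
        }
        return url.substring(0, q) + kept;
    }

    public static void main(String[] args) {
        System.out.println(withoutCharacterEncoding(
                "jdbc:mysql://localhost:3306/MyDatabase?characterEncoding=UTF-8"));
    }
}
```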

Try this-->
convert('string' using utf8)
or
convert('string',char)
or
convert('string' using latin1)
or use whatever the character encoding is that you have set in the table.
The functions shown above are MySQL functions.

Related

Emoji values not saved in MariaDB

In a Java application that processes long text messages with emoji characters and some metadata, each represented as a JSON string, serialized via org.json.simple.JSONObject.toJSONString(), Hibernate produces errors like
Incorrect string value: '\xF0\x9F\x98\x8A\x0A<...' for column 'message' at row 1
while saving the messages to an AWS RDS instance of a MariaDB database.
The table is created as
CREATE TABLE messages (
`message` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
Using other collations (e.g., utf8mb4_unicode_ci) does not make any difference.
Hibernate configuration properties include the following values:
hibernate.connection.url --> jdbc:mariadb://localhost:3306/null?useUnicode=true&character_set_server=utf8mb4
hibernate.connection.characterEncoding --> utf8mb4
hibernate.connection.driver_class --> org.mariadb.jdbc.Driver
hibernate.dialect --> org.hibernate.dialect.MariaDBDialect
hibernate.hikari.dataSource.url --> jdbc:mariadb://localhost:3306/null?useUnicode=true&character_set_server=utf8mb4
hibernate.connection.CharSet --> utf8mb4
hibernate.connection.useUnicode --> true
The code is run via a command-line tool in a Kubernetes pod. The first run of the tool completed fine. Every subsequent run on the same data, even after dropping and recreating the table, produces the Incorrect string value... error, always for the same messages.
Interestingly, running the tool via IntelliJ IDEA does not produce any error, everything else (code, configuration and data) being the same as the command line tool.
Any ideas?
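The question does not say what differs between the two runs. One thing that might be worth ruling out (an assumption, not a diagnosis) is the JVM default charset, which can differ between an IDE launch and a container. A small sketch that prints it; the class name DefaultCharsetDemo is made up:

```java
import java.nio.charset.Charset;

final class DefaultCharsetDemo {
    // Summarises the charset settings the JVM was started with.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this in both environments shows whether they start with the same defaults.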

How to restrict ‘ this special character in textarea using regular expression in java servlet [duplicate]

I have the following string value: "walmart obama 👽💔"
I am using MySQL and Java.
I am getting the following exception: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F...'
Here is the variable I am trying to insert into:
var1 varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
My Java code that is trying to insert "walmart obama 👽💔" is a preparedStatement. So I am using the setString() method.
It looks like the problem is the encoding of the values 👽💔. How can I fix this? Previously I was using Derby SQL and the values 👽💔 just ended up being two squares (I think this is the representation of the null character)
All help is greatly appreciated!
What you have is EXTRATERRESTRIAL ALIEN (U+1F47D) and BROKEN HEART (U+1F494) which
are not in the basic multilingual plane. They cannot be even represented in java as one char, "👽💔".length() == 4. They are definitely not null characters and one will see squares if you are not using fonts that support them.
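A quick way to see this from Java is sketched below. The class and helper names are made up, and the string is written with surrogate escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

final class SupplementaryDemo {
    // Upper-case hex of the UTF-8 encoding of s.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F47D and U+1F494 written as surrogate pairs.
        String s = "\uD83D\uDC7D\uD83D\uDC94";
        System.out.println(s.length());                      // 4 (UTF-16 chars)
        System.out.println(s.codePointCount(0, s.length())); // 2 (actual characters)
        System.out.println(utf8Hex(s));                      // F09F91BDF09F9294
    }
}
```

The first four bytes, F0 9F 91 BD, are the ones quoted in the exception message.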
MySQL's utf8 only supports basic multilingual plane, and you need to use utf8mb4 instead:
For a supplementary character, utf8 cannot store the character at all,
while utf8mb4 requires four bytes to store it. Since utf8 cannot store
the character at all, you do not have any supplementary characters in
utf8 columns and you need not worry about converting characters or
losing data when upgrading utf8 data from older versions of MySQL.
So to support these characters, your MySQL needs to be 5.5+ and you need to use utf8mb4 everywhere. Connection encoding needs to be utf8mb4, character set needs to be utf8mb4 and collation needs to be utf8mb4. For Java it's still just "utf-8", but MySQL needs a distinction.
I don't know what driver you are using but a driver agnostic way to set connection charset is to send the query:
SET NAMES 'utf8mb4'
Right after making the connection.
See also this for Connector/J:
14.14: How can I use 4-byte UTF8, utf8mb4 with Connector/J?
To use 4-byte UTF8 with Connector/J configure the MySQL server with
character_set_server=utf8mb4. Connector/J will then use that setting
as long as characterEncoding has not been set in the connection
string. This is equivalent to autodetection of the character set.
Adjust your columns and database as well:
var1 varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL
Again, your MySQL version needs to be relatively up-to-date for utf8mb4 support.
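Until every column has been converted, the application could check up front whether a value contains characters outside the basic multilingual plane. A minimal sketch, assuming Java 8+; the name fitsInThreeByteUtf8 is hypothetical:

```java
final class Utf8mb3Guard {
    // True if no code point lies outside the Basic Multilingual Plane,
    // i.e. every character needs at most three bytes in UTF-8.
    static boolean fitsInThreeByteUtf8(String s) {
        return s.codePoints().noneMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(fitsInThreeByteUtf8("walmart obama"));               // true
        System.out.println(fitsInThreeByteUtf8("walmart obama \uD83D\uDC7D")); // false
    }
}
```

This only detects the problem; storing such values still requires utf8mb4 as described above.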
Weirdly, I found that REMOVING &characterEncoding=UTF-8 from the JDBC url did the trick for me with similar issues.
Based on my properties,
jdbc_url=jdbc:mysql://localhost:3306/dbName?useUnicode=true
I think this supports what @Esailija has said above, i.e. my MySQL, which is indeed 5.5, is figuring out its own favorite flavor of UTF-8 encoding.
(Note, I'm also specifying the InputStream I'm reading from as UTF-8 in the java code, which probably doesn't hurt)...
All in all, to save symbols that require 4 bytes you need to update the character set and collation to utf8mb4 in two places:
1. database table/column:
alter table <some_table> convert to character set utf8mb4 collate utf8mb4_unicode_ci
2. database server connection (see)
On my development environment, for #2 I prefer to set the parameters on the command line when starting the server:
mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
btw, pay attention to Connector/J behavior with SET NAMES 'utf8mb4':
Do not issue the query set names with Connector/J, as the driver will not detect that the character set has changed, and will continue to use the character set detected during the initial connection setup.
And avoid setting characterEncoding parameter in connection url as it will override configured server encoding:
To override the automatically detected encoding on the client side, use the characterEncoding property in the URL used to connect to the server.
How I solved my problem.
I had
?useUnicode=true&characterEncoding=UTF-8
In my hibernate jdbc connection url and I changed the string datatype to longtext in database, which was varchar before.
Append the line useUnicode=true&characterEncoding=UTF-8 to your jdbc url.
In your case the data is not being sent using UTF-8 encoding.
I faced the same issue and solved it by setting the Collation to utf8_general_ci for each column.
I guess MySQL doesn't believe this to be valid UTF8 text. I tried an insert on a test table with the same column definition (mysql client connection was also UTF8) and although it did the insert, the data I retrieved with the MySQL CLI client as well as JDBC didn't retrieve the values correctly. To be sure UTF8 did work correctly, I inserted an "ö" instead of an "o" for obama:
johan@maiden:~$ mysql -vvv test < insert.sql
--------------
insert into utf8_test values(_utf8 "walmart öbama 👽💔")
--------------
Query OK, 1 row affected, 1 warning (0.12 sec)
johan@maiden:~$ file insert.sql
insert.sql: UTF-8 Unicode text
Small java application to test with:
package test.sql;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
public class Test
{
    public static void main(String[] args)
    {
        System.out.println("test string=" + "walmart öbama 👽💔");
        String url = "jdbc:mysql://hostname/test?useUnicode=true&characterEncoding=UTF-8";
        try
        {
            // Load the driver class (only needed for drivers that do not self-register)
            Class.forName("com.mysql.jdbc.Driver");
            Connection c = DriverManager.getConnection(url, "username", "password");
            PreparedStatement p = c.prepareStatement("select * from utf8_test");
            ResultSet rs = p.executeQuery();
            // rs.next() returns false when there are no more rows, so an empty table is handled too
            while (rs.next())
            {
                String retrieved = rs.getString(1);
                System.out.println("retrieved=\"" + retrieved + "\"");
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
Output:
johan@appel:~/workspaces/java/javatest/bin$ java test.sql.Test
test string=walmart öbama 👽💔
retrieved="walmart öbama "
Also, I've tried the same insert with the JDBC connection and it threw the same exception you are getting.
I believe this to be a MySQL bug. Maybe there's a bug report about such a situation already..
I had kind of the same problem, and after carefully going through all the charsets and finding that they were all right, I realized that the bugged property in my class was annotated as @Column instead of @JoinColumn (javax.persistence; Hibernate), and that was breaking everything.
Execute
show VARIABLES like '%char%';
and check whether character_set_server is utf8mb4. If it is not, set it in your my.cnf, for example:
vim /etc/my.cnf
Add one line:
character_set_server = utf8mb4
Finally, restart MySQL.
This setting useOldUTF8Behavior=true worked fine for me. It gave no incorrect string errors but it converted special characters like à into multiple characters and saved in the database.
To avoid such situations, I removed this property from the JDBC parameters and instead converted the datatype of my column to BLOB. This worked perfectly.
Besides, the column data type can be BLOB instead of VARCHAR or TEXT.

UTF-8 won't persist on Hibernate + MySQL

I'm trying to save some values in a MySQL database using Hibernate, but most Lithuanian characters won't get saved, including ąĄ čČ ęĘ ėĖ įĮ ųŲ ūŪ (they are saved as ?); however, šŠ žŽ do get saved.
If I do inserts manually, then those values are properly saved, so the problem is most likely in Hibernate configuration.
What I have tried so far:
hibernate.charset=UTF-8
hibernate.character_encoding=UTF-8
hibernate.use_unicode=true
---------
properties.put(PROPERTY_NAME_HIBERNATE_USE_UNICODE,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_USE_UNICODE));
properties.put(PROPERTY_NAME_HIBERNATE_CHARSET,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARSET));
properties
.put(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING));
---------
private void registerCharachterEncodingFilter(ServletContext aContext) {
CharacterEncodingFilter cef = new CharacterEncodingFilter();
cef.setForceEncoding(true);
cef.setEncoding("UTF-8");
aContext.addFilter("charachterEncodingFilter", cef)
.addMappingForUrlPatterns(null, true, "/*");
}
As described here
I tried adding ?useUnicode=true&characterEncoding=utf-8 to db connection url.
As described here
I ensured that my db is set to UTF-8 charset. phpmyadmin > information_schema > schemata
def db_name utf8 utf8_lithuanian_ci NULL
This is how I save into db:
//Controller
buildingService.addBuildings(schema.getBuildings());
List<Building> buildings = buildingService.getBuildings();
System.out.println("-----------");
for (Building b : schema.getBuildings()) {
System.out.println(b.toString());
}
System.out.println("-----------");
for (Building b : buildings) {
System.out.println(b.toString());
}
System.out.println("-----------");
//Service:
@Override
public void addBuildings(List<Building> buildings) {
for (Building b : buildings) {
getCurrentSession().saveOrUpdate(b);
}
}
First set of println contains all Lithuanian characters, while second replaces most with ?
EDIT: Added details
insert into buildings values (11,'ąĄčČęĘ', 'asda');
select short, hex(short) from buildings;
//Šalt. was inserted via hibernate
//letters are properly displayed:
ąĄčČęĘ | C485C484C48DC48CC499C498
MIF Šalt. | 4D494620C5A0616C742E
select address, hex(address) from buildings;
Šaltini? <...> | C5A0616C74696E693F20672E2031412C2056696C6E697573
//should contain "ų"
--------
show create table buildings;
buildings | CREATE TABLE `buildings` (
`id` int(11) NOT NULL,
`short` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
`address` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_lithuanian_ci
EDIT:
I did not find a proper solution, so I came up with a workaround. I ended up escaping/unescaping characters, storing them like this: \uXXXX.
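The asker's actual code is not shown. A workaround of that kind could be sketched as below; the class UnicodeEscapeDemo and its escape/unescape helpers are hypothetical, and unescape is only meant to read back what escape produced.

```java
final class UnicodeEscapeDemo {
    // Replaces every non-ASCII char (and the backslash itself) with a
    // six-character backslash-u escape, so the result is plain ASCII.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80 && c != '\\') {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses escape(); assumes the input was produced by escape().
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length()) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(s.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u0160altini\u0173";
        String stored = escape(original);
        System.out.println(stored);
        System.out.println(unescape(stored).equals(original)); // true
    }
}
```

The obvious cost is that the stored text can no longer be searched or sorted meaningfully in SQL.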
Let's verify that they were stored correctly... Please do SELECT col, HEX(col) ... to fetch some cell with Lithuanian characters. A correctly stored ą will show C485. The others should show various hex values of C4xx or C5xx. 3F is ?.
But, more importantly, 4 characters do show. Š should be C5A0 if properly stored as utf8. However, I suspect, you will see 8A, implying that the column in the table is really declared as CHARACTER SET latin1. (The 4 characters show up in the first column of my charset blog ).
Do SHOW CREATE TABLE to see how the column is defined. If it says latin1, then the problem is with the table definition, and you probably ought to start over.
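To compare the HEX() output with what Java would send, the expected bytes can be computed on the client side. A small sketch; the class and method names are made up:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HexCheckDemo {
    // Upper-case hex of s encoded with the given charset,
    // comparable to the output of MySQL's HEX(col).
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u0105", StandardCharsets.UTF_8));      // C485
        System.out.println(hex("\u0160", StandardCharsets.UTF_8));      // C5A0
        // ISO-8859-1 has no mapping for this letter, so it becomes '?'
        System.out.println(hex("\u0105", StandardCharsets.ISO_8859_1)); // 3F
    }
}
```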
You have to ensure that every component taking part in data entry uses UTF-8 encoding explicitly.
If you enter the values via a browser, make sure that the page displaying the results is served with the following header
Content-Type: text/html; charset=utf-8.
and that the input form is defined as follows
<form action="submit" accept-charset="UTF-8">...</form>.
If you are creating String objects from a byte array, make sure you explicitly state the Charset in the constructor.
If your input comes from a text file, that file has to be UTF-8 encoded.
If it is hardcoded directly in your code, then the source file has to be UTF-8 encoded.
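For the byte-array case in the list above, the difference between the right and the wrong charset can be seen in a few lines. A sketch with made-up class and method names:

```java
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // Decodes raw bytes with an explicitly stated charset instead of the platform default.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC4, (byte) 0x85}; // UTF-8 bytes of U+0105
        String right = decodeUtf8(raw);
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(right.length()); // 1: one Lithuanian letter
        System.out.println(wrong.length()); // 2: two unrelated Latin-1 chars
    }
}
```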
The fact that your DB holds correct UTF-8 (two or more bytes for a special letter) is reassuring.
If you get a single ? for a special letter, something attempted to convert the text to an encoding that does not contain that letter. That seems to be the case here: the letters that are converted correctly are in the ISO-8859-1 or Windows-1252 range, and the others are not.
Now ISO-8859-1, aka Latin-1, is the default HTTP encoding and the default in Java EE servers. You might want to call this before writing:
response.setCharacterEncoding("UTF-8");
Now one problem with System.out.println is that it uses the system default encoding. Logging to a file with a logger is more interesting. Or debugging and inspecting the String and its char array.
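If console output is still wanted, the encoding can be stated explicitly by wrapping System.out. A minimal sketch; printAs is a hypothetical helper that captures the bytes so the effect of the charset is visible:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class ExplicitPrintStreamDemo {
    // Prints text through a PrintStream with the named charset and returns the bytes written.
    static byte[] printAs(String text, String charsetName) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buf, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Same idea applied to the real console: force UTF-8 instead of the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("\u0105\u0104 \u010D\u010C");
        System.out.println(printAs("\u0105", "UTF-8").length);      // 2 bytes
        System.out.println(printAs("\u0105", "ISO-8859-1").length); // 1 byte, a '?'
    }
}
```

Whether the terminal then renders the bytes correctly still depends on the terminal's own encoding.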
That the schema seemingly works may be because the schema Strings come directly from Java source, where the editor encoding and the javac compiler encoding differ. This can be checked by u-escaping the string literals in Java: "\u0105" instead of "ą".
Make a unit test that writes and reads from the database.

Error with two byte UTF-8 character in UPDATE statement for MySQL database

An update-statement seems to work only with one or three byte long UTF-8 characters.
My test code
def sql = Sql.newInstance('jdbc:mysql://.../...?useUnicode=true&characterEncoding=UTF-8',
'...', '...', 'com.mysql.jdbc.Driver')
String value = 'β'
sql.execute('UPDATE Kldb_SynonymVersion SET synonyms=? WHERE id=11940', [value])
fails with
com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'synonyms' at row 1
in com.mysql.jdbc.MysqlIO.checkErrorPacket.
It works with value="a" or value = '€'.
I am using
java 1.6.0_20
mysql 5.0.26
mysql-connector 5.1.13
The character-set of the table is set to utf8.
I know that I can disable the truncation, but then I only avoid the exception and get an invalid character ('?') in the database.
Are you sure that the encoding of the MySQL column is UTF-8?
The MySQL driver is able to write Unicode characters to ASCII columns, and even read them back correctly, so the problem can go unnoticed for a long time.
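For reference, the three test values from the question fall into the one-, two- and three-byte classes of UTF-8. The sketch below only demonstrates that classification and does not explain the truncation error; the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthDemo {
    // Number of bytes s occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));      // 1
        System.out.println(utf8Length("\u03B2")); // 2 (Greek small beta)
        System.out.println(utf8Length("\u20AC")); // 3 (euro sign)
    }
}
```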

Spring 3 MVC + MySQL: cannot store € character

I have Spring 3 MVC set up with Hibernate and MySQL 5. In a web form, I enter a single character into a field, € (i.e. just the one character). When I then attempt to save the data, I get the following exception:
java.sql.BatchUpdateException: Data truncation: Data truncated for column 'name' at row 1
'name' is a String on my model object. The 'name' column is of datatype VARCHAR(80) in MySQL. I have also tried entering a € into a TEXT column, with the same result.
I have configured a CharacterEncodingFilter for my webapp and my DB connection string looks like this:
jdbc:mysql://localhost/baseApp?zeroDateTimeBehavior=convertToNull&useUnicode=true&characterEncoding=utf8
Any ideas what the problem might be?
Update:
I don't think MySQL has anything to do with this issue. I have intercepted the HTTP POST before the properties of my model object are set and the € is properly encoded as %80. When I interrogate the properties of my model object, however, €'s are simply ?'s.
Any thoughts?
Are you sure the MySQL database supports UTF-8? I think the default install settings use latin1. You also need to make sure that the 'default-character-set' for [mysql] and [mysqld] in the my.ini configuration file is set to 'utf8'. Furthermore, make sure the table was built with UTF-8 settings.
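One more thing that can be checked from Java: a UTF-8 form submission encodes € as %E2%82%AC, while %80 is the Windows-1252 byte for that character and is not valid UTF-8 on its own. A small sketch with hypothetical helper names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class EuroUrlEncodingDemo {
    // Percent-encodes s using UTF-8.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Percent-decodes s, interpreting the bytes as UTF-8.
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeUtf8("\u20AC"));                   // %E2%82%AC
        // A lone 0x80 byte is malformed UTF-8 and decodes to U+FFFD.
        System.out.println(decodeUtf8("%80").equals("\uFFFD"));     // true
    }
}
```

So a POST body containing %80 suggests the browser submitted the form in a Windows-1252 or similar encoding, which would be worth checking before looking at the database.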
