MySQL does not store some UTF-8 characters properly (Java)

I'm using MySQL to store pages from Wikipedia. I've set the character set to UTF-8 (the Wikipedia encoding) in my my.cnf file with the directive:
[mysqld]
character_set_server = utf8
And created my database with the 'character set utf8' property definition.
I've also changed the character encoding for the MySQL client by:
inserting the 'charSet=utf8' property when initializing my JDBC driver,
running the query 'set names utf8'.
However, I've noticed that the MySQL server replaces some characters with others.
For example, it replaces á with a.
UPDATE
I've run the command show variables like '%char%' and confirmed that both character_set_client and character_set_server are utf8.
How can I store the correct chars in my db? Thanks!

Try specifying the encoding in the DB URL:
url="jdbc:mysql://localhost:port/DBNAME?characterEncoding=UTF-8"
Here's some more information regarding my answer.
The following is taken from the MySQL documentation (http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html):
All strings sent from the JDBC driver to the server are converted
automatically from native Java Unicode form to the client character
encoding, including all queries sent using Statement.execute(),
Statement.executeUpdate(), Statement.executeQuery() as well as all
PreparedStatement and CallableStatement parameters with the exclusion
of parameters set using setBytes(), setBinaryStream(),
setAsciiStream(), setUnicodeStream() and setBlob().
Setting the Character Encoding
The character encoding between client
and server is automatically detected upon connection. You specify the
encoding on the server using the character_set_server for server
versions 4.1.0 and newer. The driver automatically uses the encoding
specified by the server. To override the automatically detected
encoding on the client side, use the characterEncoding property in the
URL used to connect to the server. To allow multiple character sets
to be sent from the client, use the UTF-8 encoding, either by
configuring utf8 as the default server character set, or by
configuring the JDBC driver to use UTF-8 through the characterEncoding
property.
I encountered a similar problem a few months ago. I checked the default value of character_set_server on my MySQL server (using the "mysqld --verbose --help" command).
It was latin1.
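A minimal sketch of the suggestion above, in case it helps. It assumes Connector/J is on the classpath when you actually connect; the host, port, database name and the names `MysqlUrlSketch`, `buildUrl` and `printCharsetVariables` are placeholders invented for this example. The second method simply runs the same SHOW VARIABLES check mentioned in the question over an open connection, so you can see what the session really negotiated.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class MysqlUrlSketch {

    // Builds a Connector/J URL and appends the connection properties as a
    // query string. Host, port and database are placeholder values.
    static String buildUrl(String host, int port, String database, Map<String, String> props) {
        String base = "jdbc:mysql://" + host + ":" + port + "/" + database;
        if (props.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : props.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return base + "?" + query;
    }

    // Prints the character set variables as seen by this connection.
    static void printCharsetVariables(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("characterEncoding", "UTF-8");
        String url = buildUrl("localhost", 3306, "DBNAME", props);
        System.out.println(url);
        // With the driver and a running server available (assumption):
        // try (Connection conn = DriverManager.getConnection(url, user, password)) {
        //     printCharsetVariables(conn);
        // }
    }
}
```

If character_set_client or character_set_connection still shows latin1 after connecting this way, the URL property is probably not reaching the driver.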

Related

Cannot read special characters with JDBC

I'm having a problem reading UTF-8 data from a MySQL database using MySQL Connector v. 8.0.19. Scandinavian letters, such as "äö", are replaced with unknown characters. I already made sure the database and its tables and columns are using utf8mb4. Then I added useUnicode=true&characterEncoding=UTF-8 to the JDBC connection string, but the outcome is still unexpected. I'm running MySQL CE v. 8 in a Docker container. I can see the Scandinavian letters fine when I run the SELECT queries on the command line.
I solved this problem by passing --default-character-set=utf8mb4 to the MySQL command-line client before creating the schema from a separate file. I could also add this option to the MySQL server configuration as a default.

How MySQL connector jar version can affect query performance?

Currently, in my application with mysql-connector-java:5.1.36, everything works fine. But when I upgrade the connector to mysql-connector-java:5.1.47, a query starts to take anywhere from minutes to two hours to execute. If I run the same query directly from the terminal, or from the application with v5.1.36, it takes no more than a few seconds.
How MySQL connector jar version can affect query performance?
I found the reason. There is a change in mysql-connector-java:5.1.47 and above:
When UTF-8 is used for characterEncoding in the connection string, it maps to the MySQL character set name utf8mb4, while for mysql-connector-java:5.1.46 and below it corresponds to utf8 (or, more precisely, utf8mb3).
Link: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-charsets.html
My database's character encoding is set to utf8 (or, more precisely, utf8mb3). Because of this mismatch, the index was not being used in the query. After setting the server property
character_set_server : utf8
it is working fine.
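As background for the utf8 versus utf8mb4 distinction mentioned above: MySQL's utf8 (utf8mb3) holds at most three bytes per character, while utf8mb4 also holds four-byte UTF-8 sequences. The sketch below only counts UTF-8 bytes per code point in plain Java; the class and method names are made up for illustration, and nothing here talks to MySQL or reproduces the index problem itself.

```java
import java.nio.charset.StandardCharsets;

final class Utf8WidthSketch {

    // Number of bytes a single code point occupies when encoded as UTF-8.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every code point needs at most three UTF-8 bytes, which is
    // the per-character limit of MySQL's utf8 (utf8mb3) character set.
    static boolean fitsInUtf8mb3(String text) {
        return text.codePoints().allMatch(cp -> utf8Length(cp) <= 3);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x00E1));   // U+00E1 (a with acute): 2
        System.out.println(utf8Length(0x20AC));   // U+20AC (euro sign): 3
        System.out.println(utf8Length(0x1F600));  // U+1F600 (emoji): 4
        System.out.println(fitsInUtf8mb3("\u00e1\u20ac"));
        System.out.println(fitsInUtf8mb3(new String(Character.toChars(0x1F600))));
    }
}
```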

Jaybird / JDBC + National characters in path

I'm working with a Firebird database in Java. Everything works fine, but I have a problem connecting to my database if the database file path contains national characters, e.g. "á" or "č".
Sample exception:
org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544344. I/O error during "CreateFile (open)" operation for file "Z:/testing/á/sample.fdb"
The path is correct and the database exists. Jaybird / JDBC has a problem with the "á" character in the path.
Any ideas how to fix it or where the problem is? Thanks for all responses.
OS: Windows 7 Pro 64 bit
JDK: 1.7.0.25
Jaybird: 2.2.3
Starting with Jaybird 3, the database filename will always be sent as UTF-8 if the server is Firebird 2.5 or higher.
Jaybird 2.2 and earlier have limited support for special characters in the database name. There are several options you can use to work around this problem, but whether they actually work depends heavily on the version of Firebird and the default character set of the OS (where Firebird is running).
Option 1: use connection property filename_charset=<name of charset> where <name of charset> is the default character set of the operating system where the Firebird server is running.
For example:
jdbc:firebirdsql://myserver/mydatabase?filename_charset=Cp1252
Option 2 (Firebird 2.5 or higher, with Jaybird 2.2): use the workaround described in JDBC-251:
Start your Java application with -Dfile.encoding=UTF8 and include utf8_filename=1 in the connection URL:
jdbc:firebirdsql://myserver/mydatabase?utf8_filename=1
When using this option, make sure that you already specify a connection characterset using connection property charSet, localEncoding or local_encoding (for Java character set names), and/or encoding or lc_ctype (for Firebird character set names). If not, you are using Firebird character set NONE which uses the JVM default character set, and you will need to set the charSet to the 'normal' default encoding of your JVM to prevent character set conversion issues because of the changed value of file.encoding (in some cases - in addition to specifying charSet - you may also explicitly need to set encoding to NONE).
Option 3: Define an alias with only ASCII characters in aliases.conf of the firebird server for the database and connect using this alias instead:
jdbc:firebirdsql://myserver/thealias
Disclosure: I am one of the Jaybird developers

MySQL works in Latin1 - how to insert UTF-8 encoded data?

I have a web app in JSF 2.0 on Tomcat 6 with MySQL. I'm using special characters, for example "ąśżźć". I'm able to read data from MySQL and show it properly on the web page. The problem occurs when a user fills in a form with any special character: in MySQL the characters are then replaced with '?'. But if I do the insert into the DB with the MySQL Workbench tool, everything works and the special characters are shown properly on the web page.
My Configuration:
JSF - UTF-8 (for response and request character encoding)
MySQL columns are configured to use UTF-8, example: DESCRIPTION varchar(100) CHARACTER SET utf8 null
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
And character_set_server = latin1 causes me big problems.
When it is UTF-8 everything is OK; with latin1 it is not.
My hosting company will not change the MySQL setting, so the latin1 stays :-(.
How can I work around that?
I have already tried many solutions, like creating a filter or converting with Charset and String.getBytes(). I'm running out of ideas.
For connecting I'm using JDBC via org.apache.tomcat.jdbc.pool.DataSource.
The MySQL JDBC driver by default uses the platform default character encoding, as obtained by Charset#defaultCharset(), which is not necessarily UTF-8. You need to set it as a parameter in the JDBC URL:
jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
See also:
Unicode - How to get the characters right?
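To illustrate why the '?' characters appear, here is a small sketch in plain Java with no database involved. It prints the platform default charset and shows what happens to "ąśżźć" when the text is pushed through ISO-8859-1 (latin1), which cannot represent those letters. The class and method names are invented for the example, and the string is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetLossSketch {

    // Encodes the text with the given charset, then decodes the bytes
    // again with the same charset. Unmappable characters are replaced
    // by the charset's replacement byte during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String polish = "\u0105\u015b\u017c\u017a\u0107"; // the letters from the question

        // This is the encoding used whenever no charset is given explicitly.
        System.out.println("platform default: " + Charset.defaultCharset());

        // latin1 has no mapping for these letters, so each becomes '?'.
        System.out.println(roundTrip(polish, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(polish, StandardCharsets.UTF_8).equals(polish));
    }
}
```

If the first line printed on the server is not UTF-8, that is a strong hint the JDBC URL parameter above is needed.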

UTF-8 encoding problem in unix machine

I'm exporting a set of data to Excel in Java. The data contains certain non-ASCII characters. When exporting on a Windows machine, the data comes out correctly in UTF-8 encoded format. But when I deploy my code on a Unix machine, the UTF-8 encoding does not work properly.
I'm using a Tomcat 5.5 server. I have also included the URIencoding="UTF_8" parameter in server.xml, but on the Unix machine it still does not work properly.
Running Tomcat with a
-Dfile.encoding=UTF8
option will force the VM to adopt UTF8 as its default encoding regardless of your environment. I suspect that's your problem (and it's good practice nonetheless).
When you are working with UTF-8 data it can be very fragile. Every step in the chain needs to specify utf8.
(1) In the database, be sure the table has UTF8 encoding and collation. "show table status" will tell you this. Google for "mysql alter table utf8" for more info.
(2) The JDBC connection needs to specify the UTF8 encoding. For Connector/J this will be something similar to:
useUnicode=true&characterEncoding=UTF-8
(3) Every stream read/write and every string-to-byte conversion in the code needs to specify UTF-8 as the encoding. It's easy to miss one. If it's missed, Java will use the system default, which will vary from server to server.
(4) Not applicable here, but for user submitted data from a form, you need to set the request character encoding. (I do this in a servlet filter)
request.setCharacterEncoding("UTF-8");
(5) Probably not applicable here, but if you output HTML/XML/text from your servlet, set the HTTP character encoding header. (This might apply if you are generating the Excel file as an XML file).
response.setContentType("text/html; charset=UTF-8");
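Step (3) in the list above could look roughly like this. It is only a sketch using in-memory streams in place of real files or sockets, and the class and method names are made up; the point is that both the writer and the reader name UTF-8 explicitly instead of relying on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetIo {

    // Writes text through a Writer that is explicitly bound to UTF-8.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads bytes back through a Reader that is explicitly bound to UTF-8.
    static String readUtf8(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e4\u00f6"; // two non-ASCII letters
        byte[] bytes = writeUtf8(text);
        System.out.println(bytes.length);                 // 4: two bytes per letter
        System.out.println(readUtf8(bytes).equals(text)); // true
    }
}
```

The same idea applies to String.getBytes and new String(byte[], ...): pass StandardCharsets.UTF_8 rather than using the no-argument forms.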
