UTF-8 encoding problem in unix machine - java

I'm exporting a set of data to Excel in Java. The data contains some non-ASCII characters. When I export on a Windows machine, the data comes out correctly in UTF-8. But when I deploy my code on a Unix machine, the UTF-8 encoding does not work properly.
I'm using Tomcat 5.5. I have also included the URIencoding="UTF_8" parameter in server.xml, but it still does not work properly on the Unix machine.

Running Tomcat with a
-Dfile.encoding=UTF8
option will force the VM to adopt UTF8 as its default encoding regardless of your environment. I suspect that's your problem (and it's good practice nonetheless).
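
To confirm whether the option actually took effect, a small check like the one below can be run with the same JVM flags as Tomcat. This is only a minimal sketch; the class name DefaultCharsetCheck is made up for illustration and is not part of Tomcat.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Returns the name of the charset the JVM falls back to
    // whenever code does not pass an encoding explicitly.
    public static String effectiveDefault() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The raw system property, as set by -Dfile.encoding or by the OS locale.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset the JVM really uses as its default.
        System.out.println("default charset = " + effectiveDefault());
    }
}
```

If the two machines print different values here, any code path that relies on the default will behave differently on them.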

When you are working with UTF-8 data it can be very fragile. Every step in the chain needs to specify utf8.
(1) In the database, be sure the table has UTF8 encoding and collation. "show table status" will tell you this. Google for "mysql alter table utf8" for more info.
(2) The JDBC connection needs to specify the UTF8 encoding. For Connector/J this will be something similar to:
useUnicode=true&characterEncoding=UTF-8
(3) Every stream read/write and every string-to-byte conversion in the code needs to specify UTF-8 as the encoding. It's easy to miss one. If one is missed, Java will use the system default, which varies from server to server.
(4) Not applicable here, but for user submitted data from a form, you need to set the request character encoding. (I do this in a servlet filter)
request.setCharacterEncoding("UTF-8");
(5) Probably not applicable here, but if you output HTML/XML/text from your servlet, set the HTTP character encoding header. (This might apply if you are generating the Excel file as an XML file).
response.setContentType("text/html; charset=UTF-8");
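
As a rough illustration of point (3), the sketch below names UTF-8 explicitly on both the write side and the read side, so the result does not depend on the server's default. The class and method names are hypothetical, and an in-memory stream stands in for whatever file or response stream the export really uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitUtf8 {
    // Writes text through a Writer that is told to use UTF-8,
    // instead of relying on the platform default encoding.
    public static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decodes bytes back into a String, again naming the charset.
    public static String readUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9"; // "cafe" with an acute accent on the e
        byte[] bytes = writeUtf8(original);
        System.out.println(bytes.length);                      // 5: the accented e takes two bytes
        System.out.println(readUtf8(bytes).equals(original));  // true
    }
}
```

The same idea applies to String.getBytes(...), new String(...), InputStreamReader and FileReader/FileWriter replacements: wherever an overload without a charset exists, prefer the one that takes it.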

Related

Invalid character found in the request target Tomcat 9

While migrating a project that works correctly under Tomcat 7 to Tomcat 9, I get the error Invalid character found in the request target when words with accents are passed in the request.
I modified the server.xml file by adding URIEncoding="UTF-8" to the connector, and I also added the clause relaxedQueryChars= áÁéÉíÍóÓúÚ, but I still get the same error. I can't touch the actual code of the project.
log traces show this: java.lang.IllegalArgumentException: Invalid character found in the request target [/sahab/lupaDem.do?filtro=LupaDem&param1=1&param2=29527&param3=OBSERVACI0xd3N%20ANTIRR0xc1BICA
HTTP does not provide a way to specify an encoding for the requested path. In the past, servers therefore fell back on OS settings, which was confusing, especially for applications used worldwide.
The convention that emerged is to always encode requests as UTF-8, and on top of that there is URL encoding, which avoids UTF-8 problems by %-escaping any special characters.
In a nutshell, ensure your requests are properly encoded. Previous versions of Tomcat may not have raised an error for this. The requests are coming from the client, not the server itself.
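
For reference, this is roughly what a properly encoded value looks like when a Java client builds the query string. It is a sketch only: the class name is invented, and the sample value is just the visible part of param3 from the log above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeQueryParam {
    // Percent-encodes a single query parameter value as UTF-8.
    // URLEncoder uses form encoding, so spaces become '+'.
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two accented capitals are written as Unicode escapes (U+00D3, U+00C1).
        String param3 = "OBSERVACI\u00D3N ANTIRR\u00C1BICA";
        System.out.println(encode(param3)); // OBSERVACI%C3%93N+ANTIRR%C3%81BICA
    }
}
```

A request built this way contains only ASCII characters in the request target, so Tomcat 9 has nothing to reject.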

How MySQL connector jar version can affect query performance?

Currently, in my application with mysql-connector-java:5.1.36, everything works fine. But when I upgrade the connector to mysql-connector-java:5.1.47, a query starts taking anywhere from minutes to hours to execute. If I run the same query directly from the terminal, or from the application with v5.1.36, it takes no more than a few seconds.
How can the MySQL connector jar version affect query performance?
I found the reason. There is a change in mysql-connector-java:5.1.47 and above:
When UTF-8 is used for characterEncoding in the connection string, it maps to the MySQL character set name utf8mb4, whereas for mysql-connector-java:5.1.46 and below it corresponds to utf8 (or utf8mb3, more appropriately).
Link: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-charsets.html
My database's character encoding is set to utf8 (or utf8mb3, more appropriately). Because of this mismatch, the index was not being used in the query. After setting the server property
character_set_server : utf8
it is working fine.
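
As background on the two names: MySQL's utf8 (utf8mb3) stores at most three bytes per character, while utf8mb4 allows four. The sketch below, with a made-up class name, only shows which characters fall into which group on the Java side; it does not talk to MySQL.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteLengths {
    // Number of bytes the given text occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("\u00E1"));       // 2: a with acute accent
        System.out.println(utf8Length("\u20AC"));       // 3: euro sign, still fits utf8mb3
        System.out.println(utf8Length("\uD83D\uDE00")); // 4: U+1F600, needs utf8mb4
    }
}
```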

Jaybird / JDBC + National characters in path

I'm working with a Firebird database in Java. Everything works fine, but I cannot connect to my database if the database file path contains national characters, e.g. "á" or "č".
Sample exception:
org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544344. I/O error during "CreateFile (open)" operation for file "Z:/testing/á/sample.fdb"
The path is correct and the database exists. Jaybird / JDBC has a problem with the "á" character in the path.
Any ideas how to fix it or where the problem is? Thanks for all responses.
OS: Windows 7 Pro 64 bit
JDK: 1.7.0.25
Jaybird: 2.2.3
Starting with Jaybird 3, database filename will always be sent as UTF-8, if the server is Firebird 2.5 or higher.
Jaybird 2.2 and earlier have limited support for special characters in the database name. There are several options you can use to work around this problem, but whether they actually work depends heavily on the version of Firebird and the default character set of the OS (where Firebird is running).
Option 1: use connection property filename_charset=<name of charset> where <name of charset> is the default character set of the operating system where the Firebird server is running.
For example:
jdbc:firebirdsql://myserver/mydatabase?filename_charset=Cp1252
Option 2 (Firebird 2.5 or higher, with Jaybird 2.2): use the workaround described in JDBC-251:
Start your Java application with -Dfile.encoding=UTF8 and include utf8_filename=1 in the connection URL:
jdbc:firebirdsql://myserver/mydatabase?utf8_filename=1
When using this option, make sure that you already specify a connection character set using connection property charSet, localEncoding or local_encoding (for Java character set names), and/or encoding or lc_ctype (for Firebird character set names). If not, you are using Firebird character set NONE, which uses the JVM default character set, and you will need to set charSet to the 'normal' default encoding of your JVM to prevent character set conversion issues caused by the changed value of file.encoding (in some cases, in addition to specifying charSet, you may also need to explicitly set encoding to NONE).
Option 3: Define an alias with only ASCII characters in aliases.conf of the Firebird server for the database and connect using this alias instead:
jdbc:firebirdsql://myserver/thealias
Disclosure: I am one of the Jaybird developers

Issue with microsoft word url encoding during webdav protocol

We have :
a webdav server running on Linux (java application)
a client on Windows 7, using ms-word 2010
The URLs to open our files end with the files' names, and are encoded in UTF-8 before being sent to the UI:
server.com/path/my_file_name.doc
It works perfectly with file names without special characters but with an ugly url like
server.com/path/En-tête de lettre + capital 1 050 000 €.doc
, our server cannot access the file.
In the stack trace, we can see that the url received by the server is
server.com/path/En-tête de lettre + capital 1 050 000 �.doc
, but the error message that MS Word displays contains the right URL, so I think that the original URL is right.
And last but not least: it works when the server is running on a Windows platform.
I suppose that MS Word re-encodes the URL before transmitting it to the server, but I can't work out how to decode it.
All suggestions are welcome ^^
I'm the author of http://milton.io (Java WebDAV server lib) and I've seen an issue where MS clients incorrectly encode some URLs, and milton has some workarounds for that. What WebDAV framework/server are you using?
However, the example given looks more like mangling, as suggested by Marc B. Your server is probably outputting the PROPFIND response in UTF-8, but Windows is interpreting it as win-1252.
So you should look at the response headers, see which character encoding is declared for the response, and check that it matches the actual encoding used in the PROPFIND response.
Note that earlier versions of milton had a problem where they would use the server default encoding but always report UTF-8, so this problem would occur on any server not using UTF-8 as the default character encoding.
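
To see what this kind of mangling looks like in general, the sketch below encodes text as UTF-8 and then decodes the same bytes as windows-1252. It illustrates the mismatch described above, not necessarily the exact failure in the question. The class name is made up, and it assumes the JVM ships the windows-1252 charset, which mainstream JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes with the wrong charset,
    // which is what a client does when it misreads the declared encoding.
    public static String utf8ReadAsWindows1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // \u00EA is e with circumflex, as in the file name from the question.
        String mangled = utf8ReadAsWindows1252("En-t\u00EAte");
        // The single accented letter has turned into two characters (U+00C3, U+00AA).
        System.out.println(mangled.length()); // 8 instead of 7
    }
}
```

If the file names your server receives look like this, with each accented letter replaced by two odd characters, the declared and actual encodings of the response are the first thing to compare.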

MySQL does not store properly some UTF8 chars

I'm using the mysql dbms to store pages from Wikipedia. I've set the character-set encoding to utf-8 (wikipedia encoding) in my.cnf file with the directive:
[mysqld]
character_set_server = utf8
And created my database with the 'character set utf8' property definition.
I've also changed the charset-encoding for mysqld client by:
inserting the 'charSet=utf8' property when initializing my jdbc driver.
doing a query to 'set names utf8'
However I've noticed that mysql server replaces some characters with others.
For example it replaces á with a.
UPDATE
I've run the command show variables like '%char%' ensuring that both character_set_client and character_set_set are utf8.
How can I store the correct chars in my db? Thanks!
Try to specify the encoding in the DB URL :
url="jdbc:mysql://localhost:port/DBNAME?characterEncoding=UTF-8"
Here's some more information regarding my answer :
The following is taken from the MySQL documentation (http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html) :
All strings sent from the JDBC driver to the server are converted
automatically from native Java Unicode form to the client character
encoding, including all queries sent using Statement.execute(),
Statement.executeUpdate(), Statement.executeQuery() as well as all
PreparedStatement and CallableStatement parameters with the exclusion
of parameters set using setBytes(), setBinaryStream(),
setAsciiStream(), setUnicodeStream() and setBlob().
Setting the Character Encoding
The character encoding between client
and server is automatically detected upon connection. You specify the
encoding on the server using the character_set_server for server
versions 4.1.0 and newer. The driver automatically uses the encoding
specified by the server. To override the automatically detected
encoding on the client side, use the characterEncoding property in the
URL used to connect to the server. To allow multiple character sets
to be sent from the client, use the UTF-8 encoding, either by
configuring utf8 as the default server character set, or by
configuring the JDBC driver to use UTF-8 through the characterEncoding
property.
I encountered a similar problem a few months ago. I checked the default value of character_set_server on my MySQL (using the "mysqld --verbose --help" command).
It was latin1.
