We are having a character set issue and have not been able to figure it out. We have a server in a data center in Poland being used by some people in Italy. The Italian users FTP the data to Poland as a flat file, which is read by a Java program and inserted into an MS SQL Server database. The data is then displayed on the web using an IBM IHS web server fronting an IBM WebSphere server. The batch, database, web and app servers are all Windows boxes in Poland.
We are getting some instances of character substitution. Specifically, the à (small letter A with a grave) is getting displayed on the web as an ŕ (small letter R with an acute). We can see that à in the CP1252 Western European character set and ŕ in the CP1250 Central/Eastern European character set occupy the same code position (see http://www.kreativekorp.com/charset/), so we believe this is a character set issue.
The fields in the database are all nvarchar. We have tried various settings for the field collation to no avail. We tried setting the character set on the WebSphere app server JVM, but that did not help either. The Poland server will be hosting sites for multiple countries in Europe, so changing the default language and character set in Windows is not really a good option.
Any clues would be greatly appreciated!
Does the data get messed up only in the front end, or is it also altered in the database? It would be worth dividing the problem to identify the point at which the data gets changed.
You can see a discussion about the JVM charset here.
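The exact substitution described in the question can be reproduced in a few lines of Java, which also gives a quick way to test each hop of the pipeline in isolation (flat-file read, database insert, web render). This is just an illustrative sketch of the CP1252/CP1250 mismatch, not the poster's code:

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    public static void main(String[] args) {
        // "à" (U+00E0) encoded with the Western European code page is byte 0xE0
        byte[] bytes = "\u00E0".getBytes(Charset.forName("windows-1252"));

        // Decoding that same byte with the Central European code page
        // yields "ŕ" (U+0155), exactly the substitution seen on the web pages
        String decoded = new String(bytes, Charset.forName("windows-1250"));
        System.out.println(decoded); // prints ŕ
    }
}
```

If running this against bytes captured at each stage (after FTP, after the batch insert, after the app server reads them back) shows the flip happening at one particular hop, that hop's default charset is the culprit.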
Currently, I am collecting tweets based on emotions and doing the analysis. I have tweets with emojis, but during collection the emojis simply come back as question marks.
For example:
Original tweet (in Twitter):
lipton ice tea💛
After collection (in MongoDB):
lipton ice tea?
I am using Twitter 4j Java package with MongoDB.
MongoDB uses UTF-8 by default, so unless you configured it not to, it is perfectly capable of storing the emojis.
This one time I spent a whole week banging my head against the wall because MongoDB wouldn't store Latin special characters. It turned out MongoDB worked just fine; it was Log4j that wasn't configured to print logs using UTF-8, so all I saw in the logs was ???? instead of ñáçÜ.
If you connect to your MongoDB instance using Mongo Shell (<mongo installation dir>/bin/mongo.exe in Windows), as I did, and query your data, you should be able to see the emojis. Here's a quick reference for the Mongo Shell.
Your problem lies in your JSON viewer, or in the encoding of the strings you're sending to MongoDB.
In Java, you might want to set the file.encoding system property to UTF-8, to make sure your program uses the right encoding when reading from files, input streams, etc.
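A more robust alternative to setting file.encoding is to pass the charset explicitly wherever text is read, so the platform default never matters. Here is a small sketch of that idea (the temp file and its content are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    public static void main(String[] args) throws IOException {
        // Hypothetical temp file standing in for the collected tweet data
        Path p = Files.createTempFile("tweets", ".txt");
        Files.write(p, "lipton ice tea \uD83D\uDC9B".getBytes(StandardCharsets.UTF_8));

        // Passing the charset explicitly means file.encoding no longer matters
        String line;
        try (BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
            line = r.readLine();
        }
        System.out.println(line); // the emoji survives the round trip
        Files.delete(p);
    }
}
```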
If you're using Robomongo, this is a Robomongo problem: it displays a ? instead of emojis in table mode.
Project is based on: Postgres 9.3.5, Java 7, and Hibernate (org.hibernate hibernate-core 3.6.10.Final).
Problem:
I have two separate systems running the same web application. On one of the systems everything is persisted correctly; on the other, Strings sent to the Postgres database that contain Unicode characters, such as 'nnés', are persisted as 'nns' or 'nnés-2'. The only difference I noticed between the two systems is that one displays UNICODE and the other UTF8 as the client encoding when running SHOW client_encoding; in the console. The one running UNICODE works correctly; the other does not.
My questions are:
Is it possible that the client encoding got stuck/hardcoded somehow and is not being selected based on the real client encoding, which would mean the strings sent in Unicode aren't converted to UTF8 but just persisted as-is?
What could be the reasons for such behavior?
Try request.setCharacterEncoding("UTF-8");
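To see why that call matters: servlet containers default to ISO-8859-1 when decoding request bodies, so UTF-8 form data gets mangled before it ever reaches Hibernate. In a real servlet the damage happens inside request.getParameter(), but the effect can be sketched with URLDecoder as a stand-in (the percent-encoded string is a made-up example):

```java
import java.net.URLDecoder;

public class FormEncodingDemo {
    public static void main(String[] args) throws Exception {
        // "nnés" sent by a browser as UTF-8 percent-encoded form data
        String raw = "nn%C3%A9s";

        // Decoded with the servlet default of ISO-8859-1: the é is mangled
        System.out.println(URLDecoder.decode(raw, "ISO-8859-1")); // nnÃ©s

        // Decoded as UTF-8, as request.setCharacterEncoding("UTF-8") would do
        System.out.println(URLDecoder.decode(raw, "UTF-8"));      // nnés
    }
}
```

Note that setCharacterEncoding must be called before the first call to getParameter(), or it has no effect.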
I have an application running on Jelastic. The Java based web application is running on Glassfish and the database server is MySql.
I developed the project on Netbeans and there was no character problem when running the project on the local machine (Turkish Windows 8).
When running on Jelastic, there is no character problem related to the web pages. However, there is problem when form based interactions are called.
Some Turkish characters are not processed when a search query or customer registration task is executed. The missing characters (recorded in MySQL as ?) are the ones that differ from the Latin alphabet. For example, "ö", which is also used in German, is not a problem.
Problematic characters: http://en.wikipedia.org/wiki/Wikipedia:Turkish_characters
As I said previously, I don't have this problem when working on the local Glassfish rolled in Netbeans.
I checked phpMyAdmin on the server, and I think that some values left at their defaults (such as latin1_swedish_ci) might be the cause of the loss of Turkish characters.
I tried to change the values, but they are reset to the defaults when the server is restarted. Could this be the source of my problem? If so, how could I set them permanently?
Your kind support will be greatly appreciated.☺
Where exactly did you apply the changes?
As far as I know, these settings can be changed in etc/my.cnf via the Jelastic dashboard for MySQL.
Concerning the character set, this link may help:
Change MySQL default character set to UTF-8 in my.cnf?
If the problem persists ask your hosting provider for help with this issue.
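As a sketch of what the linked answer suggests, the my.cnf change would look roughly like the fragment below. The exact section names and the availability of utf8mb4 depend on the MySQL version (utf8mb4 requires MySQL 5.5.3+; on older servers use utf8), so treat this as an assumption to verify against your Jelastic environment:

```ini
[client]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server     = utf8mb4_unicode_ci
```

Because Jelastic manages the container, the edit should be made through the dashboard's config editor rather than directly on disk, so it survives restarts.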
I am trying to get Chinese characters from a SQL Server 2005 database with my web application hosted on a JBoss server on a Linux box (RHEL). The issue is that the Chinese characters never come back correctly from the database; square boxes are shown instead. I have tried both the jTDS drivers and the SqlJdbc drivers from Microsoft. Interestingly, the same combination of database and drivers works fine in a Windows environment, with the Chinese characters returned in a string from the result set.
Any help on the issue would be greatly appreciated.
There's not really enough info about what you're doing with the data between the time it comes out of the database and the time it gets displayed in the view. It might be a good idea to print some debug information on both Linux and Windows to see how certain system properties differ; for example, if you output System.getProperty("file.encoding") in both scenarios, what do you get?
You might want to try using JAVA_OPTS=-Dfile.encoding=UTF-8.
Perhaps the discussion at the link below might help.
https://community.jboss.org/thread/155260?_sscc=t
It doesn't sound like this is a database/driver related problem.
Unicode characters from a Rails app appear as ??? in the MySQL database (i.e., when I view it through PuTTY or a Linux console), but the Rails app reads them properly and shows them as intended. I have another Java application which reads from the Rails database, stores the values in its own database, and tries to show them from its database. But in the web page, they appear as ??? instead of the Unicode characters.
How is it that the Rails application is able to show them properly but the Java application is not? Do I need to specify an encoding in the Java application?
You really need to find out whether it's the Java app that's wrong, the Rails app that's wrong, or both. Using PuTTY or a Linux console isn't a great way of checking this, as they may well not support the relevant Unicode characters. Ideally, find a GUI you can connect to the database with, and use that to check the values. Alternatively, find some MySQL functions which will return the Unicode code points directly.
It's quite possible that the Rails app is doing the wrong thing, but in a reversible way (possibly not always reversible - you may just be lucky at the moment). I've seen this before, where a developer has consistently used the same incorrect code page when both encoding and decoding text, and managed to get the right results out without actually storing the correct data. Obviously that screws up any other system trying to get at the same data.
You may want to check the connection parameters: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
I guess your Java application may be using the wrong encoding when reading from the Rails database, the wrong encoding for its own database, or the wrong encoding in its connections to them.
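Concretely, on the Java side the connection charset can be forced through MySQL Connector/J URL properties rather than relying on server defaults. The host and database names below are placeholders; only the useUnicode and characterEncoding properties are the point:

```java
public class JdbcUrlExample {
    public static void main(String[] args) {
        // useUnicode/characterEncoding tell MySQL Connector/J which charset
        // to use on the client connection, overriding the server defaults
        String url = "jdbc:mysql://localhost:3306/railsdb"
                   + "?useUnicode=true&characterEncoding=UTF-8";
        System.out.println(url);
        // DriverManager.getConnection(url, user, password) would then
        // exchange text with the server as UTF-8
    }
}
```

Setting this on both the connection reading from the Rails database and the connection to the Java app's own database rules out the "wrong encoding in the connection" case.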