Reading Unicode characters from an Access database via JDBC-ODBC

I have some non-standard characters in my Access 2010 database. When I read them via
Connection con = null;
try{
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
java.util.Properties prop = new java.util.Properties();
prop.put("charSet", "UTF8");
String database = "jdbc:odbc:Lb";
con = DriverManager.getConnection(database, prop);
} catch (Exception ex) {
System.out.println("Error");
}
Statement stm = con.createStatement();
ResultSet rs = stm.executeQuery("SELECT distinct forename, surname from PSN where isValid");
while (rs.next()) {
String forename = rs.getString("forename");
}
I receive question marks (?) where the character should be. Why is this?

I had question marks when the DB contained Polish characters. It was fixed when I set the character encoding to windows-1250.
def establish(dbFile: File): Connection = {
val fileName = dbFile.getAbsolutePath
val database = s"${driver}DBQ=${fileName.trim};DriverID=22;READONLY=true}"
val props = new Properties()
props.put("charSet", "Cp1250")
val connection= DriverManager.getConnection(database,props)
connection
}

I expect your JDBC driver to handle reading and writing characters to your database transparently. Java's internal string representation is UTF-16.
Java(UTF-16) --JDBC--> Database(DbEncoding)
Database(DbEncoding) --JDBC--> Java(UTF-16)
Perhaps the problem is that you are trying to force reading them with UTF8 and the database uses another internal representation?
Also, how do you verify that you receive '?'?
If System.out is involved, you should take into consideration that this PrintStream converts in-memory Strings to the Charset that it uses. IIRC this Charset can be found with Charset.defaultCharset() and is a property of the JVM that runs the program.
It is preferable to inspect the hexadecimal value of the char and look up a Unicode table to be sure that information has been lost while reading from the database.
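A minimal sketch of that check, assuming Java 8 or later. The names CodePointDump and toHex are made up for illustration, and the sample string stands in for whatever rs.getString(...) returns:

```java
import java.nio.charset.Charset;
import java.util.stream.Collectors;

public class CodePointDump {
    // Render each code point as U+XXXX so the result does not depend on the console encoding.
    static String toHex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The charset that System.out will use when encoding Strings for the console.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Stand-in value (assumption); in real code this would come from the ResultSet.
        String forename = "J\u00f3zef";
        System.out.println(toHex(forename));
    }
}
```

If the dump shows U+003F where a non-ASCII character was expected, the character was already lost before it reached System.out.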
Hope this helps a bit.

It's neither "utf8" nor "Cp1250"!
One must use ISO-8859-1:
java.util.Properties prop = new java.util.Properties();
prop.put("charSet", "ISO-8859-1");
String connURL = "jdbc:odbc:DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=" + accessFileName + ";uid=''; pwd='';";
sql = "SELECT * FROM enq_horaires;";
con = DriverManager.getConnection(connURL, prop);
stmt = con.createStatement();
ResultSet rset = stmt.executeQuery(sql);

This is a long-standing interoperability issue between the Access ODBC driver and the JDBC-ODBC Bridge. Access stores Unicode characters using a variation of UTF-16LE encoding (not UTF-8) and the JDBC-ODBC bridge is unable to retrieve them.
(Note that this is not a problem with the Access ODBC driver per se because other tools like pyodbc for Python can retrieve the Unicode characters correctly. It is a compatibility issue between the JDBC-ODBC Bridge and the Access ODBC driver.)
A bug report was filed with Sun in November 2005 outlining the issue. That report was closed as "Won't Fix" in April 2013 with the comment
The bridge has been removed from Java SE 8 and is not supported
If you need to work with arbitrary Unicode characters in an Access database you should consider using UCanAccess. For more information, see
Manipulating an Access database from Java without ODBC
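As a rough sketch of what the UCanAccess route might look like, assuming the UCanAccess JAR and its dependencies are on the classpath. The class name, the buildUrl helper, and the file path passed on the command line are hypothetical; the query is the one from the question:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UcanaccessRead {
    // UCanAccess URLs take the form jdbc:ucanaccess://<path to .mdb/.accdb file>
    static String buildUrl(String accessFilePath) {
        return "jdbc:ucanaccess://" + accessFilePath;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println("usage: java UcanaccessRead <path to .accdb file>");
            return;
        }
        String url = buildUrl(args[0]);
        // No ODBC DSN and no charSet property: the file is read directly by the driver.
        try (Connection con = DriverManager.getConnection(url);
             Statement stm = con.createStatement();
             ResultSet rs = stm.executeQuery(
                     "SELECT DISTINCT forename, surname FROM PSN WHERE isValid")) {
            while (rs.next()) {
                System.out.println(rs.getString("forename"));
            }
        }
    }
}
```

Because this is a pure Java driver, the JDBC-ODBC Bridge is not involved at all.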

Related

JDBC Clob and NClob

Is there any difference between java.sql.Clob and java.sql.NClob? There is no new method for java.sql.NClob interface. I tried the following:
The setup SQL:
create table tab(id number(2), clobcol clob, nclobcol nclob)
insert into tab values (1, to_clob('你好'), to_nclob('你好'))
JDBC code:
conn = getConnection();
stmt = conn.createStatement();
rs = stmt.executeQuery("select * from tab");
rs.next();
Clob c = rs.getClob(2);
NClob nc = rs.getNClob(3);
InputStream inputStream1 = c.getAsciiStream();
InputStream inputStream2 = nc.getAsciiStream();
System.out.println(inputStream1.available());
System.out.println(inputStream2.available());
c.free();
nc.free();
I have also tried some other methods, and it looks like there is no difference in the output. Is there a specific case where I can see a difference?
Added the supported character set in the database:
SELECT parameter, value
FROM v$nls_parameters
WHERE parameter LIKE '%CHARACTERSET';
PARAMETER                VALUE
------------------------ --------------------
NLS_CHARACTERSET         AL32UTF8
NLS_NCHAR_CHARACTERSET   AL16UTF16

In the old days (80s) many Databases were created using US7ASCII (in the US) or ISOLATIN1 (in Europe) as the character set. For these Databases that still exist today (after many upgrades), the only way to store non-ASCII character String data is to use the special types NVARCHAR or NCLOB. These Nxxx types are not used by newer Databases that were created directly using UTF8 (now the default in Oracle) as the encoding.

Platform independent Connection String for EXCEL in java

Is there a PLATFORM INDEPENDENT connection string for EXCEL file in java. jdbc:odbc is platform dependent. Is there anything else ??

In this post you could see an example of using a connection string without the ODBC. I suppose this is what you are looking for...
Class.forName("com.hxtt.sql.excel.ExcelDriver").newInstance();
String url = "jdbc:Excel:///E:/JavaWithExcel/Feedback.xlsx";
String sql = "select * from [Sheet1]";
Connection con = DriverManager.getConnection(url, "", "");
Statement stmt = con.createStatement();
System.out.println(con);
System.out.println(stmt);
stmt.close();
con.close();
Anyway, you have to understand that the connection string is always specific to the underlying environment. It could depend on the operating system, on the file system or something else... I would just use different configuration files for each environment.
Hope I helped!

Getting Unicode Data from MS Access output will be " ???????????????????"

I am using MS Access and MySQL. In Access I entered this text:
کوردستان ی عیراق (it's Kurdish, using Unicode)
My code is:
try{
String path ="src\\Database.accdb";
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
Connection c = DriverManager.getConnection(""
+ "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ="+path);
Statement s = c.createStatement();
ResultSet rs = s.executeQuery("select * from mytable");
rs.next();
jTextArea1.setText(rs.getString(1));
}catch(Exception ex){
JOptionPane.showMessageDialog(null, ex.getMessage());
}
With Access the output is ??????????????
but with MySQL the output is کوردستان ی عیراق
Why?
thanks

If you are just trying to get data from an MS Access db and you don't need to run complex queries, you might want to check out the Jackcess project, which is a native, cross-platform Java API for opening MS Access files. It doesn't currently have support for running SQL queries, but it does give you access to all the data without going through the (flaky) JDBC-ODBC bridge. It also has support for looking up data using indexes (via an IndexCursor).
(Disclaimer: I am the primary author.)

You should set an appropriate charset in the properties when you try to establish the connection, e.g.:
java.util.Properties prop = new java.util.Properties();
prop.put("charSet", "UTF8"); // Not tested..
Connection c = DriverManager.getConnection(
"jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ="+path, prop);
Look into the documentation for the JDBC-ODBC Bridge for further details.

From microsoft forums:
If your connection character set is utf8, then you should run this query right
after connecting to DB:
SET NAMES 'utf8';
Also your DB and tables and columns should have utf8_general_ci or other type of
utf8 collation.
Hope this helps

Setting the mysql collation correctly

I'm trying to insert Hebrew values into my table but the result is always "??????".
Which collation should I use? I tried hebrew_bin and hebrew_general_ci and the result was the same.
The reason I used the Java tag is that my code is written in java and as far as I know in web development you have to specify the collation in the web scripts also.
So maybe I have to do that in the java code also?
EDIT
Here is the code:
private String url = "jdbc:mysql://ip:port/";
private String dbName = "dbname";
private String driver = "com.mysql.jdbc.Driver";
private String userName = "user";
private String password = "password";
....
stmt = conn.prepareStatement("INSERT INTO locations(location) VALUES (?)");
stmt.setString(1, "hebrew sentence");
stmt.execute();
Now how do I change my code into the example you showed?

As per a possible duplicate you might try this:
Posted by: Sai Ye Yan Naing Aye:
Set UTF-8 in your code. See the following:
Connection con = DriverManager.getConnection("jdbc:mysql://localhost/embeddedChat?" +
"user=site_access&password=XXXXXXXX&useUnicode=true&characterEncoding=UTF-8");
Personalized to your connection of course
See the following:
Connecting to MySQL Using the JDBC DriverManager Interface
JDBC Basics - Establishing a Connection

Set the table you need, and all the columns in it, to utf8,
then add this to the connection script: mysql_query("SET NAMES 'utf8'");
This will help you, I guess.

Can't store UTF-8 Content in MySQL Using Java PreparedStatement

For some strange reason I can't seem to add UTF-8 data to my MySQL database. When I enter a non-latin character, it's stored as ?????. Everything else is stored fine. So for example, "this is an example®™" is stored fine, but "和英辞典" is stored as "????".
The connection url is fine:
private DataSource getDB() throws PropertyVetoException {
ComboPooledDataSource db = new ComboPooledDataSource();
db.setDriverClass("com.mysql.jdbc.Driver");
db.setJdbcUrl("jdbc:mysql://domain.com:3306/db?useUnicode=true&characterEncoding=UTF-8");
db.setUser("...");
db.setPassword("...");
return db;
}
I'm using PreparedStatement as you would expect, I even tried entering "set names utf8" as someone suggested.
Connection conn = null;
PreparedStatement stmt = null;
ResultSet rs = null;
try {
conn = db.getConnection();
stmt = conn.prepareStatement("set names utf8");
stmt.execute();
stmt = conn.prepareStatement("set character set utf8");
stmt.execute();
... set title...
stmt = conn.prepareStatement("INSERT INTO Table (title) VALUES (?)");
stmt.setString(1,title);
stmt.execute();
} catch (final SQLException e) {
...
The table itself seems to be fine.
Default Character Set: utf8
Default Collation: utf8_general_ci
...
Field title:
Type text
Character Set: utf8
Collation: utf8_unicode_ci
I tested it by entering in Unicode ("和英辞典" specifically) through a GUI editor and then selecting from the table -- and it was returned just fine. So this seems to be an issue with JDBC.
What am I missing?

On your JDBC connection string, you just need to set the charset encoding like this:
jdbc:mysql://localhost:3306/dbname?characterEncoding=utf8

There are two places on the MySQL server to check in order to set the UTF-8 charset correctly.
Database Level
This is set when the database is created:
CREATE DATABASE `db` CHARACTER SET utf8;
Table Level
All of the tables need to be in UTF-8 also (which seems to be the case for you)
CREATE TABLE `Table1` (
[...]
) DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
The important part being DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
Finally, if your code is not handling UTF-8 correctly, you can force the JVM to use UTF-8 by setting it on startup:
java -Dfile.encoding=UTF-8 [...]
or by setting the environment variable
JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF-8
or programmatically by using:
System.setProperty("file.encoding" , "UTF-8");
(this last one may not have the desired effect since the JVM caches the default character encoding on startup)
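To see that caching in action, here is a small sketch. The class and method names are invented for illustration, and the behaviour is what I would expect on common JDKs rather than something guaranteed by the specification:

```java
import java.nio.charset.Charset;

public class DefaultCharsetCache {
    // Returns true only if changing file.encoding at runtime altered the default charset.
    static boolean changedBySetProperty() {
        Charset before = Charset.defaultCharset(); // resolved (and cached) by now
        String original = System.getProperty("file.encoding");

        // Pick a value that differs from the current default.
        String other = before.name().equals("UTF-8") ? "ISO-8859-1" : "UTF-8";
        System.setProperty("file.encoding", other);
        Charset after = Charset.defaultCharset();

        // Put the property back so the rest of the program is unaffected.
        if (original == null) {
            System.clearProperty("file.encoding");
        } else {
            System.setProperty("file.encoding", original);
        }
        return !before.equals(after);
    }

    public static void main(String[] args) {
        System.out.println("Default charset changed at runtime: " + changedBySetProperty());
    }
}
```

This is why the command-line flag or the environment variable is the more reliable option of the three.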
Hope that helped.

Use stmt.setNString(...) instead of stmt.setString(...).
Also don't forget to check the column collation on the database side.

If you log in to your mysql database and run show variables like 'character%';
this might provide some insight.
Since you're getting a one-to-one ratio of multi-byte characters to question marks then it's likely that the connection is doing a character set conversion and replacing the Chinese characters with the replacement character for the single-byte set.
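The effect can be reproduced without a database. This sketch encodes the string from the question into a single-byte charset and decodes it again; ISO-8859-1 is only a stand-in for whatever single-byte set the connection might be using, and the class name is made up:

```java
import java.nio.charset.StandardCharsets;

public class LossyConversion {
    // Encode to a single-byte charset and decode again, as a misconfigured connection might.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1); // unmappable chars become '?'
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The four characters of "和英辞典", written as escapes so the source file encoding does not matter.
        String title = "\u548C\u82F1\u8F9E\u5178";
        System.out.println(roundTrip(title)); // prints ????
    }
}
```

Each character that has no mapping in the target charset is replaced by exactly one '?', which matches the one-to-one ratio described above.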

Also check locale -a. By default Ubuntu works with the en_US locale and doesn't have other locales installed.
You must specify characterEncoding=utf8 when connecting through JDBC.

Add characterEncoding=utf8 at the end of your DB connection URL (nothing else is needed),
e.g.:
spring.datasource.url = jdbc:mysql://localhost:3306/dbname?characterEncoding=utf8
