Insert/update to sql_ascii encoding postgreSQL - java

I have a PostgreSQL database with server encoding SQL_ASCII. When I read data, I must call convert_to(column1, 'SQL_ASCII') in the SELECT and then use new String(value1, "GBK") in Java to get the right value.
But when I send data with INSERT/UPDATE, the value stored in the database is always wrong. Can anyone tell me how to send SQL containing Chinese or other non-ASCII characters from Java?
Apache DBCP config:
driverClassName=org.postgresql.Driver
url=jdbc:postgresql://127.0.0.1:5432/fxk_db_sql_ascii
username=test
password=test
initialSize=10
maxTotal=10
maxIdle=10
minIdle=5
maxWaitMillis=1000
removeAbandonedOnMaintenance=true
removeAbandonedOnBorrow=true
removeAbandonedTimeout=1
connectionProperties=useUnicode=true;characterEncoding=SQL_ASCII;allowEncodingChanges=true
SQL query in java:
String sql = "select user_id, first_name as first_name, convert_to(first_name, 'sql_ascii') as first_name1, last_name as last_name, convert_to(last_name, 'sql_ascii') as last_name1 from public.tbl_users";
ResultSet rs = stmt.executeQuery(sql);
List<Map<String, Object>> list = new ArrayList<Map<String, Object>>();
ResultSetMetaData md = rs.getMetaData();
int columnCount = md.getColumnCount();
while (rs.next()) {
Map<String, Object> rowData = new HashMap<String, Object>();
for (int i = 1; i <= columnCount; i++) {
rowData.put(md.getColumnName(i), rs.getObject(i)==null?"":new String(rs.getBytes(i),"GBK"));
}
list.add(rowData);
}
rs.close();
But what should I do for insert/update?

Avoid SQL_ASCII
You should be using a server encoding of UTF8 rather than SQL_ASCII.
The documentation is quite clear about this matter, and even includes a warning to not do what you are doing. To quote (emphasis mine):
The SQL_ASCII setting behaves considerably differently from the other settings. When the server character set is SQL_ASCII, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is SQL_ASCII. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.
Use UTF8
Use an encoding of UTF8, meaning UTF-8. This can handle the characters for any language including Chinese.
And the UTF8 encoding allows Postgres to make use of the new support for International Components for Unicode (ICU) in Postgres 10 and later.
Java also uses Unicode encoding. Just let your JDBC driver handle the marshaling of text between Java and the database.
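As a minimal sketch of that advice, assuming the database has been recreated with UTF8 encoding and that tbl_users has the three columns used in the question (with user_id as an integer), an insert needs no manual byte conversion. The class name, the survives() helper, and the insertUser() method are illustrative assumptions, not something from the question or the PostgreSQL documentation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Utf8InsertSketch {

    // Returns true when the text survives an encode/decode round trip in the given charset.
    static boolean survives(String text, Charset charset) {
        return text.equals(new String(text.getBytes(charset), charset));
    }

    // Hypothetical insert against a UTF8-encoded database. The JDBC driver converts
    // the Java String to the server encoding, so neither convert_to() nor
    // new String(bytes, "GBK") is needed. Column types are assumed.
    static int insertUser(Connection con, int userId, String firstName, String lastName)
            throws SQLException {
        String sql = "INSERT INTO public.tbl_users (user_id, first_name, last_name) VALUES (?, ?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, userId);
            ps.setString(2, firstName);
            ps.setString(3, lastName);
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        System.out.println(survives(chinese, StandardCharsets.UTF_8));    // true
        System.out.println(survives(chinese, StandardCharsets.US_ASCII)); // false
    }
}
```

The round-trip check only illustrates why a Unicode encoding is the safe choice for mixed-language text; the insert itself relies entirely on the driver.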

UCanAccess in Java returning wrong order with ORDER BY clause in column with special characters

Using the Microsoft Access Database (2007-2013) with a .mdb file, I created a simple test_table table with only one text column "name" and inserted the following test values:
óbito,
fanatico,
orbita,
fanático,
fanta,
órbita,
fantástico,
obito,
obituario,
orbitando
When I execute the query SELECT * FROM test_table ORDER BY name using MS Access query design, the following ordered result is returned:
fanatico,
fanático,
fanta,
fantástico,
obito,
óbito,
obituario,
orbita,
órbita,
orbitando
This order is totally correct and expected.
Now, I need to retrieve and use these values in my Java software. In order to do this, I am using the UCanAccess JDBC driver on version 5.0.0 to connect to the database. The connection itself is successfully being opened, but, when I execute the same query above, it returns the following:
fanatico
fanta
fantástico
fanático
obito
obituario
orbita
orbitando
óbito
órbita
And this is NOT the correct order (for instance, óbito should come immediately after obito). The desired order should treat accented words as if they were the same as the equivalent unaccented word.
It doesn't matter if óbito comes before or after obito, but they must be together.
I tried using COLLATE, tried changing the charset, etc, but nothing worked. Has anyone gone through something similar and could you help me solve this issue? Thanks in advance.

The driver is sorting by their binary representation and/or the individual ASCII characters. Both provide the sort order you provided at the bottom. This is entirely a problem created by the driver, and "fixes" are going to be limited.
There is a workaround posted in the JDBC driver changelog, under the 2.0.9.3 Release notes: WORKAROUND suggested: if you want the same behaviour of Access: select * from table2 order by orderJet( COLUMN1).
If that doesn't work, then you need to either
a) subvert the driver's sorting by creating and maintaining a SORTORDER column in the original database that holds the same word with all accented characters stripped, or
b) find a way to change the sort after it arrives from the driver.
Neither of these is ideal, so I hope the workaround provided by the developer is sufficient.
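For option a), the value for such a SORTORDER column could be computed along these lines. This is only a sketch using the JDK's java.text.Normalizer; the class and method names are made up for illustration, and the accented words are written with Unicode escapes so the source file encoding does not matter.

```java
import java.text.Normalizer;

public class SortKeySketch {

    // Decomposes accented letters into base letter + combining mark (NFD),
    // then removes the combining marks, leaving the unaccented word.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("\u00f3bito"));      // obito
        System.out.println(stripAccents("fant\u00e1stico")); // fantastico
    }
}
```

Sorting by the stripped value places óbito and obito next to each other, which is the grouping the question asks for.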

By default, Java does not perform locale-sensitive String comparison.
Using your example, I ran the following program with the default (natural) sort order:
List<String> strings = Arrays.asList(new String[]{"óbito",
"fanatico",
"orbita",
"fanático",
"fanta",
"órbita",
"fantástico",
"obito",
"obituario",
"orbitando"});
Collections.sort(strings);
System.out.println("Output = " + strings);
the output is
Output = [fanatico, fanta, fantástico, fanático, obito, obituario, orbita, orbitando, óbito, órbita]
Now, just by replacing the sort call with the line below
Collections.sort(strings, Collator.getInstance(Locale.US));
I am getting output which you are expecting
Output = [fanatico, fanático, fanta, fantástico, obito, óbito, obituario, orbita, órbita, orbitando]
The example above shows the difference that locale-sensitive string comparison makes. You can handle this either in your code or through database configuration.
You can check here for example
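If the requirement is only that accented and unaccented spellings end up together, one possible variation is to lower the collator strength to PRIMARY, at which accent differences are ignored (case differences are ignored as well). The sketch below assumes the rows have already been read into a List; the class and method names are illustrative.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AccentInsensitiveSort {

    // A collator that compares base letters only, so "obito" and "\u00f3bito" are equal.
    static Collator primaryCollator() {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(primaryCollator());
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("\u00f3bito", "orbita", "obito", "obituario");
        System.out.println(sorted(names));
    }
}
```

Because the sort is stable, words that compare as equal keep their incoming relative order, which matches the "it doesn't matter which comes first" requirement.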

Did you try enforcing a CharSet?
Have a look here CharSet for MS Access '97 DB using UCanAccess
class DatabaseOpener : JackcessOpenerInterface {
override fun open(fl: File, pwd: String?): Database {
return DatabaseBuilder.open(fl).apply {
this.charset = charset("Cp1252")
}
}
}
// URL
"jdbc:ucanaccess://<path-to-mdb-file>;memory=false;jackcessOpener=${DatabaseOpener::class.qualifiedName!!}"
When using plain JDBC connection you could try adding a connection parameter:
private static java.sql.ResultSet executeDataTable(String sql) throws Exception {
Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
String conStr = "jdbc:ucanaccess://" + dataDir + "ABC.mdb";
Properties props = new java.util.Properties();
props.put("charSet", "Cp1252");
java.sql.Connection con = java.sql.DriverManager.getConnection(conStr, props);
java.sql.Statement stmt = con.createStatement();
return stmt.executeQuery(sql);
}
You need to check what your charset might be. So potentially replace 'Cp1252'.

How to restrict ‘ this special character in textarea using regular expression in java servlet [duplicate]

I have the following string value: "walmart obama 👽💔"
I am using MySQL and Java.
I am getting the following exception: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F...'
Here is the column I am trying to insert into:
var1 varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
My Java code that is trying to insert "walmart obama 👽💔" is a preparedStatement. So I am using the setString() method.
It looks like the problem is the encoding of the values 👽💔. How can I fix this? Previously I was using Derby SQL and the values 👽💔 just ended up being two squares (I think this is the representation of the null character).
All help is greatly appreciated!

What you have is EXTRATERRESTRIAL ALIEN (U+1F47D) and BROKEN HEART (U+1F494), which are not in the basic multilingual plane. They cannot even be represented in Java as one char each: "👽💔".length() == 4. They are definitely not null characters, and you will see squares if you are not using fonts that support them.
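A small sketch that makes the numbers concrete, using only JDK methods. The class name and the hasSupplementary() helper are made up for illustration; the two emoji are written as surrogate-pair escapes so the file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class SupplementarySketch {

    // True when the text contains at least one code point outside the
    // Basic Multilingual Plane, i.e. one that needs four bytes in UTF-8.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDC7D\uD83D\uDC94"; // U+1F47D followed by U+1F494
        System.out.println(emoji.length());                                // 4 (UTF-16 code units)
        System.out.println(emoji.codePointCount(0, emoji.length()));       // 2 (actual characters)
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length); // 8 (four bytes each)
        System.out.println(hasSupplementary("walmart obama"));             // false
    }
}
```

A check like hasSupplementary() could be used to detect input that a three-byte utf8 column will reject.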
MySQL's utf8 only supports basic multilingual plane, and you need to use utf8mb4 instead:
For a supplementary character, utf8 cannot store the character at all,
while utf8mb4 requires four bytes to store it. Since utf8 cannot store
the character at all, you do not have any supplementary characters in
utf8 columns and you need not worry about converting characters or
losing data when upgrading utf8 data from older versions of MySQL.
So to support these characters, your MySQL needs to be 5.5+ and you need to use utf8mb4 everywhere. The connection encoding, the character set, and the collation all need to be utf8mb4. For Java it's still just "utf-8", but MySQL needs the distinction.
I don't know what driver you are using but a driver agnostic way to set connection charset is to send the query:
SET NAMES 'utf8mb4'
Right after making the connection.
See also this for Connector/J:
14.14: How can I use 4-byte UTF8, utf8mb4 with Connector/J?
To use 4-byte UTF8 with Connector/J configure the MySQL server with
character_set_server=utf8mb4. Connector/J will then use that setting
as long as characterEncoding has not been set in the connection
string. This is equivalent to autodetection of the character set.
Adjust your columns and database as well:
var1 varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL
Again, your MySQL version needs to be relatively up-to-date for utf8mb4 support.

Weirdly, I found that REMOVING &characterEncoding=UTF-8 from the JDBC url did the trick for me with similar issues.
Based on my properties,
jdbc_url=jdbc:mysql://localhost:3306/dbName?useUnicode=true
I think this supports what @Esailija has said above, i.e. my MySQL, which is indeed 5.5, is figuring out its own favorite flavor of UTF-8 encoding.
(Note, I'm also specifying the InputStream I'm reading from as UTF-8 in the java code, which probably doesn't hurt)...

All in all, to save symbols that require 4 bytes, you need to update the character set and collation to utf8mb4 in two places:
1. database table/column:
alter table <some_table> convert to character set utf8mb4 collate utf8mb4_unicode_ci
2. database server connection (see)
In my development environment, for #2 I prefer to set the parameters on the command line when starting the server:
mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
btw, pay attention to Connector/J behavior with SET NAMES 'utf8mb4':
Do not issue the query set names with Connector/J, as the driver will not detect that the character set has changed, and will continue to use the character set detected during the initial connection setup.
And avoid setting characterEncoding parameter in connection url as it will override configured server encoding:
To override the automatically detected encoding on the client side, use the characterEncoding property in the URL used to connect to the server.

How I solved my problem.
I had
?useUnicode=true&characterEncoding=UTF-8
In my hibernate jdbc connection url and I changed the string datatype to longtext in database, which was varchar before.

Append the line useUnicode=true&characterEncoding=UTF-8 to your jdbc url.
In your case the data is not being sent using UTF-8 encoding.

I faced the same issue and solved it by setting the Collation to utf8_general_ci for each column.

I guess MySQL doesn't believe this to be valid UTF8 text. I tried an insert on a test table with the same column definition (mysql client connection was also UTF8) and although it did the insert, the data I retrieved with the MySQL CLI client as well as JDBC didn't retrieve the values correctly. To be sure UTF8 did work correctly, I inserted an "ö" instead of an "o" for obama:
johan@maiden:~$ mysql -vvv test < insert.sql
--------------
insert into utf8_test values(_utf8 "walmart öbama 👽💔")
--------------
Query OK, 1 row affected, 1 warning (0.12 sec)
johan@maiden:~$ file insert.sql
insert.sql: UTF-8 Unicode text
Small java application to test with:
package test.sql;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
public class Test
{
public static void main(String[] args)
{
System.out.println("test string=" + "walmart öbama 👽💔");
String url = "jdbc:mysql://hostname/test?useUnicode=true&characterEncoding=UTF-8";
try
{
Class.forName("com.mysql.jdbc.Driver").newInstance();
Connection c = DriverManager.getConnection(url, "username", "password");
PreparedStatement p = c.prepareStatement("select * from utf8_test");
p.execute();
ResultSet rs = p.getResultSet();
while (rs.next())
{
String retrieved = rs.getString(1);
System.out.println("retrieved=\"" + retrieved + "\"");
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
Output:
johan@appel:~/workspaces/java/javatest/bin$ java test.sql.Test
test string=walmart öbama 👽💔
retrieved="walmart öbama "
Also, I've tried the same insert with the JDBC connection and it threw the same exception you are getting.
I believe this to be a MySQL bug. Maybe there's a bug report about such a situation already..

I had more or less the same problem. After carefully checking all the charsets and finding that they were all right, I realized that the faulty property in my class was annotated as @Column instead of @JoinColumn (javax.persistence; Hibernate), and that was what broke everything.

Execute
show VARIABLES like "%char%";
and check whether character_set_server is utf8mb4.
If it is not, set it in your my.cnf, for example:
vim /etc/my.cnf
Add one line:
character_set_server = utf8mb4
Finally, restart MySQL.

This setting useOldUTF8Behavior=true worked fine for me. It gave no incorrect string errors but it converted special characters like Ã into multiple characters and saved in the database.
To avoid such situations, I removed this property from the JDBC parameter and instead converted the datatype of my column to BLOB. This worked perfect.

Alternatively, the column's data type can be BLOB instead of VARCHAR or TEXT.

MySQL Query Parsing using java

I need to parse a query that a user enters, say in a text box, and encrypt all the values in it while leaving the SQL keywords untouched. The goal is to convert it into an equivalent query that can be run against an encrypted database.
Such as,
select name from employee where salary = 10000
I need an equivalent query as,
select name_enc from employee_enc where salary_enc = 10000_enc
where name_enc, employee_enc, salary_enc and 10000_enc are the encrypted values of name, employee, salary and 10000. I need to do this in Java, and the database I'm using is MySQL Server, where the table Employee is already encrypted.
Please provide any necessary help. Thanks in advance.

You may want to consider using code from Alibaba's Druid project. Although designed as a sophisticated connection pooling library, this project supports a very advanced parser and AST for ANSI SQL and non-ANSI dialects such as MySQL, Oracle, SQL Server, etc. The project is open source and bears the very liberal Apache License Version 2.0.
The main entry point into this part of the library is SQLUtils.java. You can use the values returned from SQLUtils.parseStatements to access a typed model of the statements:
List<SQLStatement> statements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
for (SQLStatement statement : statements) {
if (statement instanceof SQLSelectStatement) {
SQLSelectStatement selectStatement = (SQLSelectStatement) statement;
// Use methods like: selectStatement.getSelect().getQuery().
}
}

If you don't need to do it manually, use SQL's included encryption and encoding operations.
If you need to do it manually, split your SQL query string on whitespace and skip SQL keywords as you loop to encrypt. Remember to encode your cipher results with Base64 or hex to ensure data integrity.
private String encryptSQLQuery(String plainSQLQuery){
StringBuilder cipherQuery = new StringBuilder();
String plainQuery = plainSQLQuery;
String[] splitQuery = plainQuery.split("\\s+");
for(String queryWord : splitQuery){
if(!isSQLKeyWord(queryWord))
queryWord = cryptoObject.encryptAndEncode(queryWord);
cipherQuery.append(queryWord);
cipherQuery.append(" ");
}
return cipherQuery.toString();
}
Note that you will have to implement the isSQLKeyWord() and CryptoObject.encryptAndEncode() methods.
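As a rough idea of what those two helpers might look like: the sketch below uses a tiny, deliberately incomplete keyword set and a stand-in that only Base64-encodes the token. Both are assumptions for illustration. In particular, encodeOnly() performs NO encryption; a real encryptAndEncode() would encrypt with the same cipher and key used for the database before encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class QueryTokenSketch {

    // Illustrative subset only; a real implementation would cover the full
    // MySQL reserved-word list and also handle operators and punctuation.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "select", "from", "where", "and", "or", "order", "by",
            "insert", "into", "values", "update", "set", "delete"));

    // Case-insensitive keyword lookup.
    static boolean isSQLKeyWord(String token) {
        return KEYWORDS.contains(token.toLowerCase(Locale.ROOT));
    }

    // Placeholder for encryptAndEncode(): Base64 encoding only, no encryption.
    static String encodeOnly(String token) {
        return Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(isSQLKeyWord("SELECT")); // true
        System.out.println(isSQLKeyWord("salary")); // false
        System.out.println(encodeOnly("name"));     // bmFtZQ==
    }
}
```

Note that a whitespace split also produces tokens such as "=", so the keyword check alone is not enough to decide what should be left untouched.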

Handling character encoding from Java to PHP to MySQL

In Java I pass a String to PHP.
In PHP I take that String and do a search for it with a MySQL query.
Here is php code:
$query = $database->escape_value(trim($_POST['query']));
$result = mysqli_query($dbconnection, Data::getSearchQuery($query));
while ($row = mysqli_fetch_assoc($result)) {
$output[] = $row;
}
print(json_encode($output));
mysqli_close($dbconnection);
public static function getSearchQuery($item_query) {
$query = "
SELECT i.item, i.item_id, c.category, c.cat_id
FROM items as i
LEFT JOIN master_cat AS c
ON (c.cat_id = i.cat_id)
WHERE i.item LIKE '%{$item_query}%'
ORDER BY i.item ASC;";
return $query;
}
This always works if I use regular characters on my U.S. keyboard. But the moment I start using irregular characters, the search turns empty.
I can verify that MySQL stores the data AS THE USER ENTERS IT. So if they typed Beyoncè, that is how database stores it.
But when I search for Beyoncè (or whatever) in the above code, it returns empty.
How should I handle the char. encoding here?

Three points to think of:
1) The $item_query variable could come in wrong encoding.
2) >>I can verify that MySQL stores the data AS THE USER ENTERS IT
This can get tricky. If one writes an iso8859-1 encoded string to an utf-8 database, the string is obviously stored incorrectly. If that string is read with a client (i. e. phpmyadmin or mysql command line tool) configured to iso8859-1, the string is correctly returned - although its representation in the database is clearly wrong.
3) The MySql settings:
Have you set utf-8 for the connection itself? What about the charsets and collations for the database and the table?
https://dev.mysql.com/doc/refman/5.5/en/charset-syntax.html
UPDATE:
I assume you want everything to be UTF-8. Kind of quick hack to test:
Beyoncé has 7 characters (see MySQL CHAR_LENGTH function)
in UTF-8, it occupies 8 bytes (see MySQL LENGTH function). The eight bytes are, represented in a one-byte-per-character encoding like windows-1252, something like BeyoncÃ©.
This leads to the following diagnostic tests ...
The PHP-issued SQL command (note the quotes around the string value)
"SELECT CHAR_LENGTH('$item_query'), LENGTH('$item_query');"
should then return a result of (7, 8) to show us that the $item_query variable is probably correctly encoded and the database likes UTF-8. (7, 7) would mean $item_query wasn't UTF-8, and (8, 8) would mean the database doesn't want to deal with UTF-8 yet. If the latter is the case, then perhaps issue a SET NAMES 'UTF8'; before the query.
Similarly, the PHP-issued SQL command
SELECT CHAR_LENGTH('Beyoncé'), LENGTH('Beyoncé');
should return the result (7, 8) to show us that your PHP editor is configured to edit UTF-8 php files.
Repeat the previous step with phpmyadmin (or any SQL client) to be sure that this client uses UTF-8, too.
No table was involved yet! The SQL command
SELECT CHAR_LENGTH(somecolumn), LENGTH(somecolumn) FROM sometable;
(with sometable having UTF-8 character encoding and somecolumn containing some diacritical characters) should tell you if UTF-8 was used when storing values to the table.
If all previous tests passed, test again with LIKE. Even 'Beyoncé' LIKE 'Beyonce' should work then. For more information, google MySQL collation.
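Since the string originates in Java, the same character-count versus byte-count check can be done on the sending side. This is only a sketch with made-up names; it uses ISO-8859-1 as the one-byte-per-character encoding because it is guaranteed to be available in every JDK and maps the two bytes in question the same way windows-1252 does.

```java
import java.nio.charset.StandardCharsets;

public class LengthCheckSketch {

    // Counterpart of MySQL CHAR_LENGTH: number of characters (code points).
    static int charLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Counterpart of MySQL LENGTH for a UTF-8 string: number of bytes.
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // What the UTF-8 bytes look like when misread one byte per character.
    static String misreadAsSingleByte(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "Beyonc\u00e9"; // "Beyoncé" with the accented e as an escape
        System.out.println(charLength(name)); // 7
        System.out.println(byteLength(name)); // 8
        System.out.println(misreadAsSingleByte(name).length()); // 8
    }
}
```

If the Java side reports (7, 8) but the database reports something else for the same value, the encoding is being changed somewhere between the two.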

Java POST data to mySQL UTF-8 encoding issue

I have POST data that contains the Japanese string AKB48 ネ申テレビ シーズン3, defined in jQuery as data.
$("#some_div").load("someurl", { data : "AKB48 ネ申テレビ シーズン3"})
The post data is sent to Java Servlet:
String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");
My program saves it to MySQL, but after the data is saved to the database it becomes:
AKB48 u30CDu7533u30C6u30ECu30D3 u30B7u30FCu30BAu30F33
What should I do if I want to save it as it is in UTF-8? All my files are in UTF-8.
MySQL encoding is utf8 and here is the code
String sql = "INSERT INTO Inventory (uid, item_id, item_data, ctime) VALUES ("
+ inventory.getUid() + ",'"
+ inventory.getItemId() + "','"
+ StringEscapeUtils.escapeJava(inventory.getItemData()) + "',CURRENT_TIMESTAMP)";
Statement stmt = con.createStatement();
int cnt = stmt.executeUpdate(sql);

From your example above, I can verify that the Japanese string is getting saved to your MySQL database correctly, but as escaped Unicode.
I would check these items in order:
Are your tables and columns all set to have a character set and collation for utf8? I.e.,
CHARACTER SET utf8 COLLATE utf8_general_ci
Are you explicitly setting the character encoding before reading the POST parameters? request.setCharacterEncoding("UTF-8");
Are you setting the character encoding for your db connections? I.e., jdbc:mysql://localhost:3306/YOURDB?useUnicode=true&characterEncoding=UTF8
As the others have pointed out, you should not use that getBytes trick. It will surely mess up the POSTed values.
EDIT
Do not use StringEscapeUtils.escapeJava, since that will turn your string into escaped Unicode. That is what is transforming AKB48 ネ申テレビ シーズン3 into AKB48 u30CDu7533u30C6u30ECu30D3 u30B7u30FCu30BAu30F33.

Why don't you just extract the value of the parameter with this.request.getParameter("data")?
Your data is sent correctly using URL encoding, where each Unicode character is replaced by its code. All you then have to do is get the value of the parameter. When you request bytes using ISO-8859-1, you are actually corrupting your data, because the string is represented as a sequence of codes in textual form.

Java strings are stored in UTF-16. So, this code:
String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");
encodes the UTF-16 string (which the servlet container has already decoded from the HTTP request) into a byte array using the ISO-8859-1 charset, and then decodes that byte array as UTF-8. This is almost certainly not what you want.
What happens when you use this?
String data = this.request.getParameter("data");
System.out.println(data);
If the second line prints bad data, then your problem is likely on the jQuery/request side. Check which character encoding the request declares:
System.out.println(this.request.getHeader("Content-Type"));
If it does not generate bad data, but the data doesn't get stored correctly in mySQL, your problem is at the database level. Make sure your column type supports unicode strings.
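To see what the getBytes conversion does in each situation, here is a small self-contained sketch. It does not use the servlet API; misdecode() merely simulates a container that wrongly decoded UTF-8 request bytes as ISO-8859-1, and all names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class GetBytesTrickSketch {

    // The conversion from the question, applied to whatever string the container returned.
    static String trick(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Simulates a container that decoded UTF-8 bytes as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "AKB48 " plus the first two Japanese characters from the question, as escapes.
        String original = "AKB48 \u30CD\u7533";

        // Applied to a correctly decoded string, the trick replaces the Japanese text with '?'.
        System.out.println(trick(original).equals(original)); // false

        // Applied to a mis-decoded string, the trick restores the original text.
        System.out.println(trick(misdecode(original)).equals(original)); // true
    }
}
```

So the conversion only helps when the parameter was decoded with the wrong charset in the first place; setting the request encoding correctly makes it unnecessary.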

What's the point of the line
String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");
You're transforming Chinese (or at least non-occidental) characters into bytes using the ISO-8859-1 encoding. Of course this can't work, since those characters are not supported by the ISO-8859-1 encoding. And then you're constructing a new String from bytes that are supposed to represent ISO-8859-1-encoded characters, using the UTF-8 encoding. This, once again, doesn't make any sense. UTF-8 and ISO-8859-1 are not the same thing, and only a small set of chars have the same encoding in both formats.
Just use
String data = this.request.getParameter("data");
and everything should be OK, provided that the column in the MySQL table uses an encoding that supports these characters.
EDIT:
now that you've shown us the code used to insert the data in database, I know where all this comes from (the preceding points are still valid, though). You're doing
StringEscapeUtils.escapeJava(inventory.getItemData())
What's the point? escapeJava is used to take a String and escape special characters in order to make it a valid Java String literal. It has nothing to do with SQL. Use a prepared statement:
String sql = "INSERT INTO Inventory (uid, item_id, item_data, ctime) VALUES (?, ?, ?, CURRENT_TIMESTAMP)";
PreparedStatement stmt = con.prepareStatement(sql);
stmt.setInt(1, inventory.getUid()); // or setLong, depending on the type
stmt.setString(2, inventory.getItemId());
stmt.setString(3, inventory.getItemData());
int cnt = stmt.executeUpdate();
The PreparedStatement will take care of escaping special SQL characters correctly. Prepared statements are the best tool against SQL injection attacks and should always be used when a query has parameters, especially if the parameters come from the end user. See http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html.
