csv character set load infile issue - java

I have a MySQL database whose database and server encodings are all set to utf8. CSV files arrive in multiple encodings, and I have to load them into the database using JDBC. But when the incoming file is ANSI-encoded, LOAD DATA INFILE fails with:
java.sql.SQLException: Invalid utf8 character string: '1080'
I am creating a table table_abc based on csv headers and then using the below query to load the csv file into database
LOAD DATA LOCAL INFILE 'XXX.csv' INTO TABLE table_abc CHARACTER SET UTF8 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES
Here is my DB definition
character_set_client utf8
character_set_connection utf8
character_set_database utf8
character_set_filesystem binary
character_set_results utf8
character_set_server utf8
character_set_system utf8
character_sets_dir C:\Program Files\MySQL\MySQL Server 5.7\share\charsets\
What should I do now?
Should I convert all files to utf8 before uploading? If yes, then how in Java?
Should I have multiple encoded tables for multiple encoded files? If yes, then how do I detect the encoding of an incoming file in Java?
P.S. I have no issue with losing non-utf8 characters while loading into the table; my only goal is a successful upload of the file into the DB without errors, irrespective of encoding.
Thanks

If you mean that some columns are utf8 and some columns are, say, latin1, then it gets a bit complicated, but still possible.
Create a "staging" table to put the data into from the LOAD. But have all the VARCHAR columns be VARBINARY and TEXT be BLOB. This way the data bytes will be loaded unchanged.
Then ALTER that table to convert the binary/blob columns to the suitable varchar/text types:
ALTER ...
MODIFY COLUMN col1 VARCHAR(111) CHARACTER SET ... COLLATE ...,
MODIFY COLUMN col2 TEXT CHARACTER SET ... COLLATE ...,
...;
Then copy the data over to your 'real' table (unless this table is sufficient).
If one column has a mixture of encodings, you are out of luck.
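A minimal Java sketch of the staging-table step, assuming the table is generated from the CSV header as in the question. The class name `StagingTableDdl`, its method names, and the column length passed in are hypothetical choices for illustration, not anything prescribed by the answer.

```java
import java.util.List;
import java.util.stream.Collectors;

final class StagingTableDdl {
    // Quote an identifier with backticks, doubling any embedded backtick.
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Build a CREATE TABLE statement in which every column is VARBINARY,
    // so that LOAD DATA stores the file's bytes without charset conversion.
    // maxBytes is an assumed upper bound on field length.
    static String createStaging(String table, List<String> headers, int maxBytes) {
        String columns = headers.stream()
                .map(h -> quote(h.trim()) + " VARBINARY(" + maxBytes + ")")
                .collect(Collectors.joining(", "));
        return "CREATE TABLE " + quote(table) + " (" + columns + ")";
    }
}
```

The returned string can be passed to `Statement.executeUpdate` before running the LOAD DATA statement; the later ALTER to VARCHAR/TEXT stays as described above.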
Identifying a charset
Provide a sample or two of the HEX of non-English characters in the column; I can usually spot what it is. This gives some clues of how to recognize a charset from hex samples.
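On the question of converting files in Java before loading: a minimal sketch, assuming each file is either valid UTF-8 or in one known single-byte fallback charset. The class `CsvTranscoder` and its methods are hypothetical names, and the choice of fallback is an assumption the caller has to make ("ANSI" on Western Windows systems usually means windows-1252, but that is not guaranteed). The sketch only distinguishes "decodes as UTF-8" from "does not"; it is not a general encoding detector.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CsvTranscoder {
    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Return UTF-8 bytes: unchanged if already valid UTF-8, otherwise
    // decoded with the assumed fallback charset and re-encoded.
    static byte[] toUtf8(byte[] data, Charset fallback) {
        if (isValidUtf8(data)) {
            return data;
        }
        return new String(data, fallback).getBytes(StandardCharsets.UTF_8);
    }
}
```

A file read with `Files.readAllBytes` can be passed through `toUtf8` and written back out before running LOAD DATA with CHARACTER SET utf8. Pure ASCII files pass the UTF-8 check and are left untouched.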

Related

How to "activate" UTF8 with Mysql - java

I have problems when I insert into tables words with accent marks. So I think that I have to "activate" UTF-8 to fix that error.
I'm not using Class for name. That's my code:
miInitialContext = new InitialContext();
miDS = (DataSource) miInitialContext.lookup(InformacionProperties.getStrDataSource());
Connection conexion = miDS.getConnection();
Statement myStatement = conexion.createStatement();
myStatement.executeUpdate("INSERT INTO table values ......");
How can I "activate" that UTF8 with my code?
Use SET NAMES to set your connection charset. However, that won't matter much if the columns/tables/databases you interact with aren't configured with a compatible charset. For instance, with a latin1 column testcol, inserting utf8 data will result in an error like
INSERT INTO `test`.`table` (`testcol`) VALUES ('test_val'), ('Ídata');
ERROR 1366 (HY000): Incorrect string value: '\xE2\x88\x9A\xC3\xA7d...' for column 'testcol' at row 2
So you'll need to update the table structure
ALTER TABLE t MODIFY testcol CHAR(50) CHARACTER SET utf8;
Which then fixes the issue:
INSERT INTO `test`.`table` (`testcol`) VALUES ('test_val'), ('Ídata');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
(See the mysql docs for details).
This post has good documentation on finding the character sets of various structures.
(Thanks to Shadow for the SET NAMES part)

mysql character_set_results is not changed

I'm using MySQL with Hibernate and having problems with all languages other than English. I'm getting an exception saying that the text is not utf8, though it is utf8 (Hebrew).
I ran show variables like '%character%'; and this is what I got:
I think maybe character_set_server is the problem? it is latin1 and I can't change it to utf8, how do I do it? I'm using amazon RDS and there under parameters group I see utf character_set_server, so I don't understand why its not utf8 above.
On the other hand maybe it's not the problem at all. Any Other suggestions are welcome.
EDIT:
I managed to change the attached image values to utf8 for everything, but I'm still getting the following exception:
2016-02-21 08:46:05 DEBUG SqlExceptionHelper:139 - could not execute statement [n/a]
java.sql.SQLException: Incorrect string value: '\xD7\xAA\xD7\xA9\xD7\x95...' for column 'text' at row 1
...
...
...
2016-02-21 08:46:05 WARN SqlExceptionHelper:144 - SQL Error: 1366, SQLState: HY000
2016-02-21 08:46:05 ERROR SqlExceptionHelper:146 - Incorrect string value: '\xD7\xAA\xD7\xA9\xD7\x95...' for column 'text' at row 1
2016-02-21 08:46:05 INFO AbstractBatchImpl:208 - HHH000010: On release of batch it still contained JDBC statements
2016-02-21 08:46:05 DEBUG SqlExceptionHelper:225 - SQL Warning
java.sql.SQLWarning: Incorrect string value: '\xD7\xAA\xD7\xA9\xD7\x95...' for column 'text' at row 1
EDIT 2:
So I managed also to fix the exception. It is now saved fine in the DB.
I fixed it by calling the following command for each column:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;
My problem now is that the results are returned from DB with question marks.
I still see empty value for character_set_results when calling show variables like '%character%';
I don't think character_set_server is the problem.
\xD7\xAA\xD7\xA9\xD7\x95 is hex for the utf8-encoding for 'תשו'. If it were interpreted as latin1, it would be '×ª×©×•'.
A strange setting is the empty value (empty string? NULL?) for character_set_results, which controls transliteration during SELECT.
Please provide the output of SELECT col, HEX(col) FROM ... -- If you get hex of D7AAD7A9D795 for that Hebrew string, then the data is stored correctly, and we should look at the output side. If not, then the data is stored incorrectly. Or that ALTER messed things up.
Hebrew, in utf8, displayed as hex, is mostly 'D7xx'.
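To compare what the application holds against the output of `SELECT HEX(col)`, the expected hex can be computed on the Java side. This is a small sketch; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Hex {
    // Upper-case hex of the UTF-8 encoding of the text, in the same
    // shape that MySQL's HEX() prints for a utf8 column.
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The three Hebrew letters from the error message, written as escapes.
        System.out.println(utf8Hex("\u05EA\u05E9\u05D5")); // prints D7AAD7A9D795
    }
}
```

If the hex stored in the table matches this value, the data went in correctly and the problem is on the output side.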
You need utf8 in several places:
The bytes you are inserting need to be encoded in utf8.
The connection needs to be in utf8. <property name="url" value="jdbc:mysql://...&characterSetResults=utf8&characterEncoding=utf-8"/>
The table definition needs to say CHARACTER SET utf8 (or utf8mb4). Do SHOW CREATE TABLE.
The html output needs <meta charset="utf-8" />.
If data is coming from an HTML form: <form accept-charset="UTF-8">

character_set_results is empty in mysql

I'm using a MySQL DB. The DB is an Amazon RDS DB.
When I execute this query: show global variables like '%character%';
I get the following result:
But when I execute this query: show variables like '%character%'; I get this result:
As you can see character_set_results is empty. I tried following queries and nothing changes the empty value:
ALTER DATABASE myDB CHARACTER SET utf8;
ALTER DATABASE myDB CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER DATABASE myDB DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
SET CHARACTER_SET_RESULTS=UTF8;
SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SET session CHARACTER_SET_RESULTS = utf8;
As I understand the second query returns session parameters. Might this affect the results I'm getting?
I have two machines. Development machine where I run the application with eclipse (this is a java app with hibernate) and another one, the deployment machine. Everything works fine on the development machine but on the deployment machine sometimes I'm getting ???? or other strange characters when getting data from DB.
Both machines connect to the same DB and data is stored fine in the db itself.
Also the connection url is jdbc:mysql://myDB:3306/myApp?autoReconnect=true&useUnicode=true&createDatabaseIfNotExist=true&characterEncoding=utf-8.
Any ideas what can cause this?
Multiple question marks usually come from:
You are INSERTing Chinese (or any non-western-Europe text).
You said SET NAMES utf8 to declare that the client bytes are utf8-encoded (correct).
But the table columns are declared CHARSET latin1. <-- This is the problem.
Since there is no way to convert Chinese characters into latin1, '?' stored.
Please provide SHOW CREATE TABLE to confirm the above hypothesis.
If you are getting "other strange characters", please provide:
SELECT col, HEX(col) FROM ...
to show an example of what is stored for the "strange characters".
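The lossy step can be reproduced in plain Java as an analogy. This sketch shows Java's own encoder substituting '?' for characters that latin1 cannot represent; it illustrates the same kind of conversion, not MySQL's internal code path. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {
    // Encode to latin1 and decode again. Characters with no latin1
    // representation are replaced by '?' during encoding.
    static String throughLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(throughLatin1("\u4E2D\u6587")); // two Chinese characters
        System.out.println(throughLatin1("caf\u00E9"));    // Western European text
    }
}
```

Once the '?' has been stored, the original characters cannot be recovered from the column, which is why the column charset has to be fixed before the data is inserted.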

Java insert bit data to Mysql [duplicate]

I have an unnormalized events-diary CSV from a client that I'm trying to load into a MySQL table so that I can refactor into a sane format. I created a table called 'CSVImport' that has one field for every column of the CSV file. The CSV contains 99 columns , so this was a hard enough task in itself:
CREATE TABLE `CSVImport` (id INT);
ALTER TABLE CSVImport ADD COLUMN Title VARCHAR(256);
ALTER TABLE CSVImport ADD COLUMN Company VARCHAR(256);
ALTER TABLE CSVImport ADD COLUMN NumTickets VARCHAR(256);
...
ALTER TABLE CSVImport ADD COLUMN Date49 VARCHAR(256);
ALTER TABLE CSVImport ADD COLUMN Date50 VARCHAR(256);
No constraints are on the table, and all the fields hold VARCHAR(256) values, except the columns which contain counts (represented by INT), yes/no (represented by BIT), prices (represented by DECIMAL), and text blurbs (represented by TEXT).
I tried to load data into the table:
LOAD DATA INFILE '/home/paul/clientdata.csv' INTO TABLE CSVImport;
Query OK, 2023 rows affected, 65535 warnings (0.08 sec)
Records: 2023 Deleted: 0 Skipped: 0 Warnings: 198256
SELECT * FROM CSVImport;
| NULL | NULL | NULL | NULL | NULL |
...
The whole table is filled with NULL.
I think the problem is that the text blurbs contain more than one line, and MySQL is parsing the file as if each new line corresponded to one database row. I can load the file into OpenOffice without a problem.
The clientdata.csv file contains 2593 lines, and 570 records. The first line contains column names. I think it is comma delimited, and text is apparently delimited with doublequote.
UPDATE:
When in doubt, read the manual: http://dev.mysql.com/doc/refman/5.0/en/load-data.html
I added some information to the LOAD DATA statement that OpenOffice was smart enough to infer, and now it loads the correct number of records:
LOAD DATA INFILE "/home/paul/clientdata.csv"
INTO TABLE CSVImport
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
But still there are lots of completely NULL records, and none of the data that got loaded seems to be in the right place.
Use mysqlimport to load a table into the database:
mysqlimport --ignore-lines=1 \
--fields-terminated-by=, \
--local -u root \
-p Database \
TableName.csv
I found it at http://chriseiffel.com/everything-linux/how-to-import-a-large-csv-file-to-mysql/
To make the delimiter a tab, use --fields-terminated-by='\t'
The core of your problem seems to be matching the columns in the CSV file to those in the table.
Many graphical mySQL clients have very nice import dialogs for this kind of thing.
My favourite for the job is Windows based HeidiSQL. It gives you a graphical interface to build the LOAD DATA command; you can re-use it programmatically later.
Screenshot: "Import textfile" dialog
To open the "Import textfile" dialog, go to Tools > Import CSV file:
The simplest way, which I have used to import 200+ rows, is the command below in the phpMyAdmin SQL window.
I have a simple country table with two columns:
CountryId,CountryName
Here is the .csv data:
Here is the command:
LOAD DATA INFILE 'c:/country.csv'
INTO TABLE country
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
Keep one thing in mind: a comma must never appear in the second column, otherwise your import will stop.
I used this method to import more than 100K records (~5MB) in 0.046sec
Here's how you do it:
LOAD DATA LOCAL INFILE
'c:/temp/some-file.csv'
INTO TABLE your_awesome_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(field_1,field_2 , field_3);
It is very important to include the last line if you have more than one field; otherwise it normally skips the last field (MySQL 5.6.17):
LINES TERMINATED BY '\n'
(field_1,field_2 , field_3);
Then, assuming you have the first row as the title for your fields, you might want to include this line also
IGNORE 1 ROWS
This is what it looks like if your file has a header row.
LOAD DATA LOCAL INFILE
'c:/temp/some-file.csv'
INTO TABLE your_awesome_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(field_1,field_2 , field_3);
phpMyAdmin can handle CSV import. Here are the steps:
Prepare the CSV file to have the fields in the same order as the MySQL table fields.
Remove the header row from the CSV (if any), so that only the data is in the file.
Go to the phpMyAdmin interface.
Select the table in the left menu.
Click the import button at the top.
Browse to the CSV file.
Select the option "CSV using LOAD DATA".
Enter "," in the "fields terminated by".
Enter the column names in the same order as they are in the database table.
Click the go button and you are done.
This is a note that I prepared for my future use, and sharing here if someone else can benefit.
If you are using MySQL Workbench (currently 6.3 version) you can do this by:
Right click on "Tables";
Chose Table Data Import Wizard;
Chose your csv file and follow the instructions (JSON also could be used);
The good thing is that you can create a new table based on the csv file you want to import or load data to an existing table
You can fix this by listing the columns in your LOAD DATA statement. From the manual:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);
...so in your case you need to list the 99 columns in the order in which they appear in the csv file.
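Typing 99 column names by hand is error-prone, so the list can be generated from the CSV header line. A minimal sketch, assuming the header contains plain, unquoted names separated by commas and that the table columns carry the same names. `LoadDataColumns` and `columnList` are hypothetical names.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class LoadDataColumns {
    // Turn "Title, Company,NumTickets" into "(`Title`, `Company`, `NumTickets`)".
    // Backticks inside a name are doubled so the identifier stays quoted.
    static String columnList(String headerLine) {
        return Arrays.stream(headerLine.split(","))
                .map(name -> "`" + name.trim().replace("`", "``") + "`")
                .collect(Collectors.joining(", ", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(columnList("Title, Company,NumTickets"));
    }
}
```

The result is appended to the end of the LOAD DATA statement, after the IGNORE 1 LINES clause.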
Try this, it worked for me
LOAD DATA LOCAL INFILE 'filename.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' ENCLOSED BY '"' IGNORE 1 ROWS;
IGNORE 1 ROWS here ignores the first row which contains the fieldnames. Note that for the filename you must type the absolute path of the file.
I see something strange. You are using the same character for ESCAPING that you use for ENCLOSING. So the engine does not know what to do when it finds a '"', and I think that is why nothing seems to be in the right place.
I think that if you remove the ESCAPED BY line, it should run fine. Like:
LOAD DATA INFILE "/home/paul/clientdata.csv"
INTO TABLE CSVImport
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
Unless you analyze (manually, visually, ...) your CSV and find which character it uses for escaping. Sometimes it is '\'. But if the file does not use one, do not specify it.
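One way to analyze the file is to count its logical records while honoring double-quoted fields, and compare that with the physical line count (2593 lines versus 570 records in the question). A rough sketch, assuming '"' is the enclosing character, a doubled '"' inside a field stands for a literal quote, and records end with '\n'. The class name is hypothetical, and blank lines are counted as records.

```java
final class CsvRecordCounter {
    // Count records, treating newlines inside double-quoted fields as data.
    static int countRecords(String csv) {
        boolean inQuotes = false;
        boolean pending = false; // true once the current record has any content
        int records = 0;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // A doubled quote toggles twice, so the state ends up unchanged.
                inQuotes = !inQuotes;
                pending = true;
            } else if (c == '\n' && !inQuotes) {
                records++;
                pending = false;
            } else {
                pending = true;
            }
        }
        // Count a final record that is not terminated by a newline.
        return pending ? records + 1 : records;
    }
}
```

If the count matches the expected number of records only when quotes are honored, the file relies on enclosing quotes and the ENCLOSED BY clause is required.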
The mysql command line is prone to too many problems on import. Here is how you do it:
use excel to edit the header names to have no spaces
save as .csv
use free Navicat Lite Sql Browser to import and auto create a new table (give it a name)
open the new table insert a primary auto number column for ID
change the type of the columns as desired.
done!
Yet another solution is to use csvsql tool from amazing csvkit suite.
Usage example:
csvsql --db mysql://$user:$password@localhost/$database --insert --tables $tablename $file
This tool can automatically infer the data types (default behavior), create table and insert the data into the created table. --overwrite option can be used to drop table if it already exists. --insert option — to populate the table from the file.
To install the suite
pip install csvkit
Prerequisites: python-dev, libmysqlclient-dev, MySQL-python
apt-get install python-dev libmysqlclient-dev
pip install MySQL-python
In case you are using IntelliJ:
https://www.jetbrains.com/datagrip/features/importexport.html
I use mysql workbench to do the same job.
create new schema
open newly created schema
right click on "Tables" and select "Table Data Import Wizard"
give the csv file path and table name, and finally configure your column types, because the wizard sets default column types based on the values.
Note: take a look at mysql workbench's log file for any errors by using "tail -f [mysqlworkbenchpath]/log/wb*.log"
How to import csv files to sql tables
Example file: Overseas_trade_index data CSV File
Steps:
Create a table for the overseas trade index data.
Create columns that correspond to the CSV file.
SQL Query:
CREATE TABLE trade_index
( id int not null primary key auto_increment,
series_reference varchar (60),
period varchar (60),
data_value decimal(60,0),
status varchar (60),
units varchar (60),
magnitude int(60),
subject text(60),
`group` text(60),
series_title_1 varchar (60),
series_title_2 varchar (60),
series_title_3 varchar (60),
series_title_4 varchar (60),
series_title_5 varchar (60)
);
Need to connect mysql database in terminal.
=>show databases;
=>use database;
=>show tables;
Please enter this command to import the csv data to mysql tables.
load data infile '/home/desktop/Documents/overseas.csv' into table trade_index fields terminated by ',' lines terminated by '\n' (series_reference,period,data_value,status,units,magnitude,subject,series_title_1,series_title_2,series_title_3,series_title_4,series_title_5);
Find this overseas trade index data on sqldatabase:
select * from trade_index;
If you are using a windows machine with Excel spreadsheet loaded, the new mySql plugin to Excel is phenomenal. The folks at Oracle really did a nice job on that software. You can make the database connection directly from Excel. That plugin will analyse your data, and set up the tables for you in a format consistent with the data. I had some monster big csv files of data to convert. This tool was a big time saver.
http://dev.mysql.com/downloads/windows/excel/
You can make updates from within Excel that will populate to the database online. This worked exceedingly well with mySql files created on ultra inexpensive GoDaddy shared hosting. (Note when you create the table at GoDaddy, you have to select some off-standard settings to enable off site access of the database...)
With this plugin you have pure interactivity between your XL spreadsheet and online mySql data storage.
I know that my answer is late, but I'd like to mention a few other ways to do it.
The easiest one is using command line. The steps will be the following:
Accessing the MySQL CLI by entering the below command:
mysql -u my_user_name -p
Creating a table in the database
use new_schema;
CREATE TABLE employee_details (
id INTEGER,
employee_name VARCHAR(100),
employee_age INTEGER,
PRIMARY KEY (id)
);
Importing the CSV file into a table. We can either mention the file path or store the file in the default directory of the MySQL server.
LOAD DATA INFILE 'Path to the exported csv file'
INTO TABLE employee_details
FIELDS TERMINATED BY ','
IGNORE 1 ROWS;
It's only one of many solutions; I found it in this tutorial.
If loading CSV files into a MySQL database is your daily task, then it is better to automate the process. In this case you can use third-party tools that let you load data on a schedule.
PHP Query for import csv file to mysql database
$query = <<<EOF
LOAD DATA LOCAL INFILE '$file'
INTO TABLE users
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(name,mobile,email)
EOF;
if (!$result = mysqli_query($this->db, $query))
{
exit(mysqli_error($this->db));
}
Sample CSV file data
name,mobile,email
Christopher Gritton,570-686-3439,ChristopherKGritton@inbound.plus
Brandon Wilson,541-309-5149,BrandonMWilson@inbound.plus
Craig White,516-795-8065,CraigJWhite@inbound.plus
David Whitney,713-214-3966,DavidCWhitney@inbound.plus
Here is sample excel file screen shot:
Save as and choose .csv.
The .csv data will then look as shown in the screenshot below if you open it with Notepad++ or any other text editor.
Make sure you remove the header and that the column order in the .csv matches the MySQL table.
Replace folder_name with your folder name
LOAD DATA LOCAL INFILE
'D:/folder_name/myfilename.csv'
INTO TABLE mail
FIELDS TERMINATED BY ','
(fname,lname ,email, phone);
If the file is big, you can grab a coffee while it loads.
That's all you need.
Change the servername, username, password, dbname, the path of your file, the table name, and the fields in your database that you want to insert into
<?php
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "bd_dashboard";
//For create connection
$conn = new mysqli($servername, $username, $password, $dbname);
$query = "LOAD DATA LOCAL INFILE
'C:/Users/lenovo/Desktop/my_data.csv'
INTO TABLE test_tab
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(name,mob)";
if (!$result = mysqli_query($conn, $query)){
echo '<script>alert("Oops... Some Error occurred.");</script>';
exit();
//exit(mysqli_error($conn));
}else{
echo '<script>alert("Data Inserted Successfully.");</script>';
}
?>
I did it in a simple way using phpMyAdmin. I followed the steps by @Farhan, but all the data was entered into a single column.
How I did:
Created a CSV file and deleted the header row with column names. Kept only data.
I created a table with column names matching the csv columns.
Remember to assign appropriate types to each column.
I just selected the import and went to import tab.
In browse I selected the CSV file and kept all options as it is.
To my surprise all the data got imported successfully in their appropriate columns.
When executing MySQL Query to import CSV I was getting error
'Error Code: 1290. The MySQL server is running with the --secure-file-priv option so it cannot execute this statement'
So I moved file to secure file location
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/Orders.csv'
INTO TABLE orderdetails.orders
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
The file location is 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/Orders.csv' because I moved my CSV file to the 'secure_file_priv' location; otherwise I was getting the error above.
You can get your secure_file_priv using query SHOW VARIABLES LIKE "secure_file_priv";
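When the import is driven from Java on the same machine as the MySQL server, the path can be checked against the secure_file_priv value before issuing LOAD DATA INFILE. This is a rough sketch under that assumption; the class and method names are hypothetical, and the directory value is whatever the SHOW VARIABLES query returned.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SecureFilePrivCheck {
    // True if csvPath, after resolving "." and ".." segments, lies inside
    // the directory reported by secure_file_priv.
    static boolean isInside(String secureDir, String csvPath) {
        Path dir = Paths.get(secureDir).normalize();
        Path file = Paths.get(csvPath).normalize();
        return file.startsWith(dir);
    }
}
```

The check only applies to LOAD DATA INFILE without LOCAL, since in that form the server itself reads the file.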
Source: Import CSV file to MySQL (Query or using Workbench)

String Converting Issue

I've been stuck working with Java and MySQL these days.
The problem is this: I have a MySQL database. One table has a column that holds Chinese city names. A colleague changed the database to utf8 for every character setting (connection, db, results, server and system). As a consequence, the data stored before the change no longer displays correctly unless I set the %character% variables back to latin1. With either character set I can only retrieve half the data correctly. Could you please help me solve the problem?
I've tried to use Java to solve the problem, but it doesn't work.
String sql = "SELECT * FROM customer_addresses";
ResultSet result = query.executeQuery(sql);
while (result.next()) {
// Read the raw bytes once and decode them as UTF-8.
byte[] b = result.getBytes("city");
String c = new String(b, "UTF-8");
}
For example: there is one city in db like this 乌é²æœ¨é½å¸‚
the java print: 乌�?木�?市
it should be:乌鲁木齐市
Thanks in advance
Default charset of your MySQL server is probably not UTF8. Try to execute the following SQL queries before getting data from the database:
SET NAMES utf8
and
SET CHARACTER SET utf8
Add characterEncoding=UTF-8 to the connection string, where you connect to the database. For example:
"jdbc:mysql://servername:3306/databasename?characterEncoding=UTF-8"
Incidentally, the data in the database appears to be broken. If you want the database to store 乌鲁木齐市, that's what should be in the table, not 乌é²æœ¨é½å¸.
Update: The problem with how the data is stored in the database is easier to solve using the database's own tools, not Java. For each table that stores text, do this:
ALTER TABLE tablename CONVERT TO CHARACTER SET binary;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8;
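The kind of damage involved can be reproduced in plain Java: UTF-8 bytes read as a single-byte charset turn into the garbled form, and reversing the steps restores the text. The sketch uses ISO-8859-1 because every byte maps to a character there, so the round trip is lossless; with a charset that leaves some byte values undefined, characters would be lost along the way. The class name is hypothetical, and this illustrates the mechanism rather than replacing the ALTER TABLE statements.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Simulate UTF-8 bytes being interpreted as a single-byte charset.
    static String misreadAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse the damage: recover the original bytes, then decode them as UTF-8.
    static String repair(String mojibake) {
        return new String(mojibake.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String city = "\u4E4C\u9C81\u6728\u9F50\u5E02"; // the city name from the question
        String garbled = misreadAsLatin1(city);
        System.out.println(garbled.length());            // 15: one char per UTF-8 byte
        System.out.println(repair(garbled).equals(city)); // true
    }
}
```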
