We are trying to save a DataFrame to a Hive table using the saveAsTable() method, but we are getting the exception below. We are trying to store the data as TextInputFormat.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Saving data in the Hive serde table `cdx_network`.`inv_devices_incr` is not supported yet. Please use the insertInto() API as an alternative..;
reducedFN.write().mode(SaveMode.Append).saveAsTable("cdx_network.alert_pas_incr");
I tried insertInto() together with enableHiveSupport(), and that works. But I want to use saveAsTable().
I want to understand why saveAsTable() does not work. I tried going through the documentation and also the code, but did not get much understanding. It is supposed to work. I have seen issues raised by people who are using the Parquet format, but for TextInputFormat I did not see any issues.
Table definition
CREATE TABLE `cdx_network.alert_pas_incr`(
`alertid` string,
`alerttype` string,
`alert_pas_documentid` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'maprfs:/apps/cdx-dev/alert_pas_incr'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1524121971')
Looks like this is a bug. I did a little research and found the issue SPARK-19152. The fix version is 2.2.0. Unfortunately I can't verify it, because my company's cluster uses version 2.1.0.
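In the meantime, the insertInto() call that the exception message suggests (and which you confirmed works) is the usual workaround on 2.1.x. A minimal sketch, assuming the SparkSession was created with enableHiveSupport() and the table already exists with the definition above:
// insertInto() writes into the existing Hive serde table; note that it
// resolves columns by position, so the DataFrame's column order must
// match the table definition
reducedFN.write()
        .mode(SaveMode.Append)
        .insertInto("cdx_network.alert_pas_incr");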
I have the following string value: "walmart obama 👽💔"
I am using MySQL and Java.
I am getting the following exception: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F...'
Here is the variable I am trying to insert into:
var1 varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
My Java code that is trying to insert "walmart obama 👽💔" uses a PreparedStatement, so I am using the setString() method.
It looks like the problem is the encoding of the values 👽💔. How can I fix this? Previously I was using Derby SQL, and the values 👽💔 just ended up as two squares (I think this is the representation of the null character).
All help is greatly appreciated!
What you have is EXTRATERRESTRIAL ALIEN (U+1F47D) and BROKEN HEART (U+1F494), which are not in the Basic Multilingual Plane. They cannot even be represented in Java as one char; "👽💔".length() == 4. They are definitely not null characters, and one will see squares if you are not using fonts that support them.
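You can see this from Java itself; a small sketch:
String s = "👽💔";
// two code points, but four UTF-16 code units (two surrogate pairs)
System.out.println(s.length()); // 4
System.out.println(s.codePointCount(0, s.length())); // 2
System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f47d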
MySQL's utf8 only supports the Basic Multilingual Plane, and you need to use utf8mb4 instead:
For a supplementary character, utf8 cannot store the character at all,
while utf8mb4 requires four bytes to store it. Since utf8 cannot store
the character at all, you do not have any supplementary characters in
utf8 columns and you need not worry about converting characters or
losing data when upgrading utf8 data from older versions of MySQL.
So to support these characters, your MySQL needs to be 5.5+ and you need to use utf8mb4 everywhere: the connection encoding needs to be utf8mb4, the character set needs to be utf8mb4, and the collation needs to be utf8mb4. For Java it's still just "utf-8", but MySQL needs the distinction.
I don't know what driver you are using, but a driver-agnostic way to set the connection charset is to send the query
SET NAMES 'utf8mb4'
right after making the connection.
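A minimal sketch of doing that over plain JDBC (URL and credentials are placeholders; note the Connector/J caveat about SET NAMES quoted in a later answer):
Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/dbName", "user", "password");
try (Statement st = conn.createStatement()) {
    // switch this connection to the 4-byte UTF-8 character set
    st.execute("SET NAMES 'utf8mb4'");
}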
See also this for Connector/J:
14.14: How can I use 4-byte UTF8, utf8mb4 with Connector/J?
To use 4-byte UTF8 with Connector/J configure the MySQL server with
character_set_server=utf8mb4. Connector/J will then use that setting
as long as characterEncoding has not been set in the connection
string. This is equivalent to autodetection of the character set.
Adjust your columns and database as well:
var1 varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL
Again, your MySQL version needs to be relatively up-to-date for utf8mb4 support.
Weirdly, I found that REMOVING &characterEncoding=UTF-8 from the JDBC URL did the trick for me with similar issues.
Based on my properties,
jdbc_url=jdbc:mysql://localhost:3306/dbName?useUnicode=true
I think this supports what @Esailija has said above, i.e. my MySQL, which is indeed 5.5, is figuring out its own favorite flavor of UTF-8 encoding.
(Note: I'm also specifying the InputStream I'm reading from as UTF-8 in the Java code, which probably doesn't hurt.)
All in all, to save symbols that require 4 bytes you need to update the character set and collation to utf8mb4:
database table/column:
alter table <some_table> convert to character set utf8mb4 collate utf8mb4_unicode_ci
database server connection (see the settings below)
On my development environment, for #2 I prefer to set the parameters on the command line when starting the server:
mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
By the way, pay attention to Connector/J's behavior with SET NAMES 'utf8mb4':
Do not issue the query set names with Connector/J, as the driver will not detect that the character set has changed, and will continue to use the character set detected during the initial connection setup.
And avoid setting the characterEncoding parameter in the connection URL, as it will override the configured server encoding:
To override the automatically detected encoding on the client side, use the characterEncoding property in the URL used to connect to the server.
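Putting those two caveats together, a sketch of a connection that relies on server-side autodetection (URL and credentials are placeholders; assumes character_set_server=utf8mb4 is configured as above):
// no characterEncoding parameter and no SET NAMES query:
// Connector/J then autodetects utf8mb4 from the server setting
String url = "jdbc:mysql://localhost:3306/dbName?useUnicode=true";
Connection conn = DriverManager.getConnection(url, "user", "password");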
How I solved my problem: I had
?useUnicode=true&characterEncoding=UTF-8
in my Hibernate JDBC connection URL, and I changed the column's datatype from varchar to longtext in the database.
Append useUnicode=true&characterEncoding=UTF-8 to your JDBC URL.
In your case the data is not being sent using UTF-8 encoding.
I faced the same issue and solved it by setting the Collation to utf8_general_ci for each column.
I guess MySQL doesn't believe this to be valid UTF-8 text. I tried an insert on a test table with the same column definition (the mysql client connection was also UTF-8), and although the insert succeeded, neither the MySQL CLI client nor JDBC retrieved the values correctly. To be sure UTF-8 itself was working, I inserted an "ö" instead of the "o" in obama:
johan@maiden:~$ mysql -vvv test < insert.sql
--------------
insert into utf8_test values(_utf8 "walmart öbama 👽💔")
--------------
Query OK, 1 row affected, 1 warning (0.12 sec)
johan@maiden:~$ file insert.sql
insert.sql: UTF-8 Unicode text
A small Java application to test with:
package test.sql;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class Test
{
    public static void main(String[] args)
    {
        System.out.println("test string=" + "walmart öbama 👽💔");
        String url = "jdbc:mysql://hostname/test?useUnicode=true&characterEncoding=UTF-8";
        try
        {
            Class.forName("com.mysql.jdbc.Driver");
            Connection c = DriverManager.getConnection(url, "username", "password");
            PreparedStatement p = c.prepareStatement("select * from utf8_test");
            ResultSet rs = p.executeQuery();
            // next() returns false once the rows are exhausted, so this
            // also terminates correctly on an empty result set
            while (rs.next())
            {
                String retrieved = rs.getString(1);
                System.out.println("retrieved=\"" + retrieved + "\"");
            }
            rs.close();
            p.close();
            c.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
Output:
johan@appel:~/workspaces/java/javatest/bin$ java test.sql.Test
test string=walmart öbama 👽💔
retrieved="walmart öbama "
Also, I've tried the same insert over the JDBC connection, and it threw the same exception you are getting.
I believe this to be a MySQL bug. Maybe there's already a bug report about this situation.
I had kind of the same problem, and after going carefully through all the charsets and finding that they were all right, I realized that the buggy property in my class was annotated as @Column instead of @JoinColumn (javax.persistence; Hibernate), and that was breaking everything.
Execute
show VARIABLES like "%char%";
and find character_set_server. If it is not utf8mb4, set it in your my.cnf:
vim /etc/my.cnf
Add the line
character_set_server = utf8mb4
and finally restart MySQL.
The setting useOldUTF8Behavior=true gave me no incorrect-string errors, but it converted special characters like à into multiple characters and saved them to the database.
To avoid such situations, I removed this property from the JDBC parameters and instead converted the datatype of my column to BLOB. That worked perfectly.
Besides, the data type can be blob instead of varchar or text.
I am trying to use jOOQ to do an INSERT into a PostgreSQL database. The query fails with SQL state code 42601 (SYNTAX ERROR) if the String includes a backslash character.
jOOQ: 3.4.4
postgresql driver: 8.4-702.jdbc4
PostgreSQL: "PostgreSQL
8.4.20 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit"
JDK 1.8.0_25
Spring Tool Suite 3.6.0.RELEASE
Database:
CREATE TABLE datahub.test (
body TEXT NOT NULL
);
jOOQ code generated using Maven:
jooq-codegen-maven version 3.4.4
generator.name: org.jooq.util.DefaultGenerator
generator.database.name: org.jooq.util.postgres.PostgresDatabase
Unit test
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = {"/spring-config.xml"})
public class BatchExceptionJooqTest {

    private static Logger log = LogManager.getLogger(BatchExceptionJooqTest.class);

    @Autowired
    private DSLContext db;

    @Test
    public void runBasicJooqTest() {
        try {
            // TEST and TestRecord come from the jOOQ generated code
            final List<InsertQuery<TestRecord>> batchUpdate = Lists.newLinkedList();
            InsertQuery<TestRecord> insertQuery = db.insertQuery(TEST);
            insertQuery.addValue(TEST.BODY, "It's a bit more complicated than just doing copy and paste... :\\");
            batchUpdate.add(insertQuery);
            db.batch(batchUpdate).execute();
        } catch (Exception e) {
            log.error(e);
        }
    }
}
Problem
The test fails with an exception:
2014-12-26 17:11:16,490 [main] ERROR BatchExceptionJooqTest:36 :runBasicJooqTest - org.jooq.exception.DataAccessException: SQL [null]; Batch entry 0 insert into "datahub"."test" ("body") values ('It''s a bit more complicated than just doing copy and paste... :\') was aborted. Call getNextException to see the cause.
The test passes if, instead of the String "It's a bit more complicated than just doing copy and paste... :\\", I use the String "It's a bit more complicated than just doing copy and paste... :\\\\". This seems a bit inconsistent compared to what happens to the single quote during the operation: it is correctly doubled so that it gets through the SQL parser. Not so with the backslash.
I read somewhere that escaping a backslash with another backslash is not part of the SQL standard and that Postgres has changed its default behavior lately. However, I am not clear on the meaning of section 4.1.2.2 of the manual; it seems to indicate that doubled backslashes should work, and there is not really any reason for jOOQ not to do it.
So, could someone please explain whether the described situation in jOOQ:
Is the desired behavior, with no workaround besides doubling all incoming backslashes my application processes?
Is the desired behavior, but with a configuration change I can make so that jOOQ processes backslashes the same way it processes single quotes?
Is a bug?
Or the result of something I am doing incorrectly?
Thank you
You are using PostgreSQL 8.x. In that version, the system defaulted to accepting backslash-escaped string literals even without the preceding E.
To avoid this, set the server configuration variable standard_conforming_strings to on.
It is, of course, strongly recommended that you migrate to a version of PostgreSQL higher than 8.x, as the 8.x versions have reached end-of-life and are no longer supported.
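For a quick check or a session-scoped fix from JDBC, something like this should work (a sketch; conn is assumed to be an open java.sql.Connection, and the durable fix is still setting the variable in postgresql.conf):
try (java.sql.Statement st = conn.createStatement()) {
    // session scope only; put standard_conforming_strings = on
    // in postgresql.conf to change the server default
    st.execute("SET standard_conforming_strings = on");
}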
jOOQ 3.5 has introduced org.jooq.conf.Settings.backslashEscaping (https://github.com/jOOQ/jOOQ/issues/3000). This was mainly introduced for MySQL, which still today defaults to non-standards compliant string literal escaping using backslashes.
Note that this setting affects only inlined bind values, so it will not escape backslashes when binding values to a PreparedStatement.
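A sketch of applying the setting when constructing a DSLContext, assuming jOOQ 3.5+ (Settings and BackslashEscaping live in org.jooq.conf, DSL in org.jooq.impl); ON tells jOOQ to escape backslashes in inlined string literals, which matches a server that still treats backslashes as escape characters:
Settings settings = new Settings()
        .withBackslashEscaping(BackslashEscaping.ON); // double backslashes in inlined literals
DSLContext db = DSL.using(connection, SQLDialect.POSTGRES, settings);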
I agree with RealSkeptic's answer, which suggests you change the database behaviour or upgrade to a newer PostgreSQL version.
I use a Sybase database and am trying to update some values in it.
While trying to run this, it throws an exception:
com.sybase.jdbc2.jdbc.SybSQLException: The identifier that starts with 'WeeklyStudentEventClassArchiv' is too long. Maximum length is 30.
This table is in another database, so I have to use the database name along with the table name, as shown below:
StudActive..WeeklyStudentEventClassArchiv, which apparently exceeds 30 characters.
I have to use databasename..tablename in the stored procedure, but it's throwing an exception.
This happens even if I physically embed the SQL in the Java code.
How can this be solved?
The stored procedure is shown below:
create proc dbo.sp_getStudentList(
    @stDate int,
    @endDate int
)
as
begin
    set nocount on
    select distinct studCode
    from StudActive..WeeklyStudentEventClassArchive
    where studCode > 0
    and courseStartDate between @stDate and @endDate
end
StudActive..WeeklyStudentEventClassArchiv which apparently exceeds 30
characters.
Yes - I count 41.
Rename the table and/or the stored proc and you should be fine. It sounds like a limitation of either the JDBC driver or the database.
Your JDBC driver is out of date; updating to a later version might solve your problem.
First download a more recent jConnect driver from the Sybase website. Then update your code to use the new driver package, as the package name changes for each new version of the specification. (The current package is com.sybase.jdbcx...)
Take a look at the programmer's reference for more information.
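For instance, after swapping in the newer jar, registering the driver might look like this (a sketch; the class name com.sybase.jdbc4.jdbc.SybDriver and the URL are assumptions based on jConnect 7's conventions, so check the documentation of the version you download):
// the package name changes between jConnect versions;
// com.sybase.jdbc2.jdbc.SybDriver was the old one
Class.forName("com.sybase.jdbc4.jdbc.SybDriver");
Connection conn = DriverManager.getConnection(
        "jdbc:sybase:Tds:hostname:5000/StudActive", "user", "password");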
I'm trying to use a SQL Array type with PostgreSQL 8.4 and the JDBC4 driver.
My column is defined as follows:
nicknames CHARACTER VARYING(255)[] NOT NULL
and I'm trying to update it thusly:
row.updateArray("nicknames",
connection.createArrayOf("CHARACTER VARYING", p.getNicknames().toArray()));
(p.getNicknames() returns a List<String>)
but I'm seeing:
org.postgresql.util.PSQLException: Unable to find server array type for provided name CHARACTER VARYING.
at org.postgresql.jdbc4.AbstractJdbc4Connection.createArrayOf(AbstractJdbc4Connection.java:67)
at org.postgresql.jdbc4.Jdbc4Connection.createArrayOf(Jdbc4Connection.java:21)
Unfortunately, the Array types don't seem to be well documented - I've not found mention of exactly how to do this for PostgreSQL anywhere :(
Any ideas?
Change "CHARACTER VARYING" to "varchar". The command-line psql client accepts the type name "CHARACTER VARYING", but the JDBC driver does not.
The source for org.postgresql.jdbc2.TypeInfoCache contains a list of accepted type names.
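With that, the call from the question should work; a sketch using the question's own variables (row, connection, p):
// "varchar" is one of the names TypeInfoCache accepts;
// "CHARACTER VARYING" is not
row.updateArray("nicknames",
        connection.createArrayOf("varchar", p.getNicknames().toArray()));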
Consider part of the ambiguously-worded contract for createArrayOf():
The typeName is a database-specific name which may be the name of a built-in type, a user-defined type or a standard SQL type supported by this database.
I always assumed driver implementors interpret the phrases "database-specific name" and "supported by this database" to mean "accept whatever you want". But maybe you could file this as a bug against the Postgres JDBC driver.
Good luck.