MySQL geometry type on Spark / Java - java

I have a MySQL table that I load into Spark. The table contains a column of geometry type.
When I load the table into Spark, the geometry column shows up as binary type in the DataFrame.
My questions are:
Why does the MySQL geometry type become binary type in Spark?
Is there any way to fix that?
I need your help!
Thank you!

Geometry is a special data type.
Before using it, you should convert it to text or binary.
Conversion info: https://dev.mysql.com/doc/refman/5.6/en/gis-format-conversion-functions.html
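For example, you can push the conversion down to MySQL in the JDBC query so the DataFrame receives WKT text instead of raw geometry bytes. A minimal Java sketch, assuming Spark 2.x and placeholder database/table/column names (mydb, my_table, geom, id); on MySQL 5.6 the conversion function is AsText rather than ST_AsText:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("geometry-to-wkt").getOrCreate();

// Wrap the conversion in a subquery so MySQL returns WKT text, not geometry bytes.
Dataset<Row> df = spark.read()
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/mydb")
    .option("dbtable", "(SELECT id, ST_AsText(geom) AS geom_wkt FROM my_table) AS t")
    .option("user", "user")
    .option("password", "password")
    .load();

df.show();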
Or you can use GeoSpark:
var spatialDf = sparkSession.sql(
  """
    |SELECT ST_GeomFromWKT(_c0) AS countyshape, _c1, _c2
    |FROM rawdf
  """.stripMargin)
spatialDf.createOrReplaceTempView("spatialdf")
spatialDf.show()
Full tutorial below:
https://datasystemslab.github.io/GeoSpark/tutorial/sql/

Codec not found for requested operation: [frozen<ynapanalyticsteam.ynapnestedmap> <-> java.util.Map<java.lang.String, java.lang.String>]

I'm working on the data retrieval part of Cassandra using the Java driver.
I have a custom data type
CREATE TYPE ynapanalyticsteam.ynapnestedmap (
so_nestedmap map<text, text>
);
And the column type is mapped as below
order_line map<text, frozen<ynapnestedmap>>
I am trying to retrieve the value of this column using TypeToken as below.
row.getMap("order_line", TypeToken.of(String.class), new TypeToken<Map<String,String>>() {});
But I am still getting a CodecNotFoundException.
You need to define a codec for your nested user-defined type, not for Map<String, String> - they are different types...
The documentation for the Java driver has a good description of this process.
The code that you are trying to use would work for a column defined like:
order_line map<text, frozen<map<text, text>>>
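If you don't want to write a full custom codec, one workaround is to read the nested UDT generically as UDTValue and unwrap the inner map yourself. A minimal sketch, assuming driver 3.x and the column/field names from the question:
import com.datastax.driver.core.Row;
import com.datastax.driver.core.UDTValue;
import com.google.common.reflect.TypeToken;
import java.util.Map;

// Read the outer map with UDTValue as the value type (no custom codec needed).
Map<String, UDTValue> orderLine = row.getMap(
    "order_line", TypeToken.of(String.class), TypeToken.of(UDTValue.class));

for (Map.Entry<String, UDTValue> entry : orderLine.entrySet()) {
    // Unwrap the map<text, text> field declared inside the UDT.
    Map<String, String> nested = entry.getValue().getMap(
        "so_nestedmap", String.class, String.class);
    System.out.println(entry.getKey() + " -> " + nested);
}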

convert byte array to java.sql.Clob

Is there any way to convert a byte array into java.sql.Clob?
I am having this type of issue...
getHibernateTemplate().save(object)
where object has a field private Clob docData; and that field is mapped to an Oracle table column of type CLOB.
This docData Clob is created elsewhere in my Java code via Hibernate.createClob(someString).
I tried to save it with type="clob" but I get: cann't cast com.sun.proxy$Proxy124 to oracle.sql.CLOB. I have tried many ways to get rid of this proxy but finally failed.
So I decided to do byte[] data = IOUtils.toByteArray(docData.getCharacterStream()) (or byte[] data = IOUtils.toByteArray(docData.getAsciiStream())) and save it as type="binary", but then I get: Caused by: java.sql.BatchUpdateException: ORA-01461: can bind a LONG value only for insert into a LONG column.
So now I want to create a Clob from the byte[].
Any help welcome.
Note: earlier I was using Hibernate 3.3 and it worked fine without any such byte array conversion; I have now upgraded to Hibernate 3.6.10 and am getting this issue.
I'm using this method to create Blobs:
org.hibernate.engine.jdbc.NonContextualLobCreator.INSTANCE.createBlob( buffer )
where buffer is an array of bytes.
There are 2 similar methods for creating CLOBs:
NonContextualLobCreator.INSTANCE.createClob( reader, length )
NonContextualLobCreator.INSTANCE.createClob( string )
Pick the one that fits better with your data.
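Putting that together for the byte array case: a minimal sketch, assuming org.hibernate.engine.jdbc.NonContextualLobCreator (Hibernate 3.6+) is on the classpath and the bytes hold UTF-8 text:
import java.nio.charset.StandardCharsets;
import java.sql.Clob;
import org.hibernate.engine.jdbc.NonContextualLobCreator;

public class ClobUtil {
    // Decode the byte array as UTF-8 text and wrap it in a non-contextual Clob.
    public static Clob toClob(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return NonContextualLobCreator.INSTANCE.createClob(text);
    }
}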
Your error message says
cann't cast com.sun.proxy$Proxy124 to oracle.sql.CLOB
In the rest of your text you refer to java.sql.Clob. Check your imports; you might be using the CLOB from the oracle.sql package instead of the java.sql package somewhere.
Well, the issue is resolved. I kept the Java data type as Clob and changed the Hibernate mapping to type="string". This works because my digital-signature data never exceeds 2 MB, which fits within a Java String.

Add a null value column in Spark Data Frame using Java

I have a DataFrame and want to add a column of type String with null values.
How can this be done using the Spark Java API?
I used the lit function, but got an error when I tried writing the DataFrame with saveAsTable.
I was able to solve this by using the lit function to create the null column and then casting that column to String type.
df.withColumn("col_name", functions.lit(null))
  .withColumn("col_name", functions.col("col_name").cast(DataTypes.StringType));
df.withColumn("col_name", lit(null).cast("string"))
or
import org.apache.spark.sql.types.StringType
df.withColumn("col_name", lit(null).cast(StringType))

Can Cassandra nest Sets of UDTs?

I have a Cassandra schema with a table that has a column that is a SET of a user defined type (UDT). That UDT itself has a column that is a SET of another UDT.
I can create the types and table in cqlsh but when I try to use this schema in my Java (actually Scala) code I get a "missing codec error".
Does anyone know if the Datastax java driver supports this?
CREATE TYPE testname(firstname text, lastname text);
CREATE TYPE testuser(testname FROZEN<SET<FROZEN<testname>>>);
CREATE TABLE testobjects(
    simplename text,
    testusers SET<FROZEN<testuser>>
) WITH CLUSTERING ORDER BY (simplename DESC);
I've registered codecs for the two UDT types but when I try to bind a prepared statement I get the error:
can't find codec for:
cqlType: frozen<set<frozen<testname>>>
javaType: TestNameUDT
Because while there is a codec mapping testname to TestNameUDT, there is no codec mapping a set of testnames to a TestNameUDT.
So, I'm wondering if anyone knows if the Java driver supports this...has anyone created nested sets of UDTs? Thanks.
Datastax has acknowledged that this is a Cassandra defect and does not currently work.
With Spring Data Cassandra, yes, but the nested UDT must be declared without @CassandraType:
https://jira.spring.io/browse/DATACASS-506
Hope it helps.
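For reference, a minimal sketch of what that Spring Data Cassandra mapping could look like (assumption: spring-data-cassandra 2.x annotations; class and field names mirror the question's schema):
// TestName.java
import org.springframework.data.cassandra.core.mapping.UserDefinedType;

@UserDefinedType("testname")
public class TestName {
    private String firstname;
    private String lastname;
    // getters and setters omitted
}

// TestUser.java
import java.util.Set;
import org.springframework.data.cassandra.core.mapping.UserDefinedType;

@UserDefinedType("testuser")
public class TestUser {
    // Note: no @CassandraType on the nested UDT field, per the workaround above.
    private Set<TestName> testname;
    // getters and setters omitted
}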

Converting cassandra blob type to string

I have an old column family which has a column named "value" which was defined as a blob data type. This column usually holds two numbers separated with an underscore, like "421_2".
When I'm using the Python DataStax driver and execute the query, the results come back with that field parsed as a string:
In [21]: session.execute(q)
Out[21]:
[Row(column1=4776015, value='145_0'),
Row(column1=4891778, value='114_0'),
Row(column1=4891780, value='195_0'),
Row(column1=4893662, value='105_0'),
Row(column1=4893664, value='115_0'),
Row(column1=4898493, value='168_0'),
Row(column1=4945162, value='148_0'),
Row(column1=4945163, value='131_0'),
Row(column1=4945168, value='125_0'),
Row(column1=4945169, value='211_0'),
Row(column1=4998426, value='463_0')]
When I use the Java driver I get a com.datastax.driver.core.Row object back. When I try to read the value field with, for example, row.getString("value") I get the expected InvalidTypeException: Column value is of type blob. It seems the only way to read the field is via row.getBytes("value"), and then I get back a java.nio.HeapByteBuffer object.
The problem is, I can't seem to convert this object to a string in an easy fashion. Googling yielded two answers from 2012 that suggest the following:
String string_value = new String(result.getBytes("value"), "UTF-8");
But such a String constructor doesn't seem to exist anymore.
So, my questions are:
How do I convert HeapByteBuffer into string?
How come the python driver converted the blob easily and the java one did not?
Side Note:
I could debug the Python driver, but currently that seems like too much work for something that should be trivial. (And the fact that no one has asked about it suggests I'm missing something simple here...)
An easier way is to change the CQL statement:
select column1, blobastext(value) from YourTable where key = xxx
The second column will come back as type String.
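A small usage sketch of that approach from the Java driver (table and key names are placeholders from the question):
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

// With blobastext(...) the driver returns the column as ordinary text.
ResultSet rs = session.execute(
    "SELECT column1, blobastext(value) AS value_text FROM YourTable WHERE key = ?", key);
for (Row r : rs) {
    System.out.println(r.getString("value_text"));
}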
You can also get direct access to the Java driver's serializers. This way you don't have to deal with low-level details, and it also works for other types.
Driver 2.0.x:
String s = (String)DataType.text().deserialize(byteBuffer);
Driver 2.1.x:
ProtocolVersion protocolVersion = cluster.getConfiguration().getProtocolOptions().getProtocolVersion();
String s = (String)DataType.text().deserialize(byteBuffer, protocolVersion);
Driver 2.2.x:
ProtocolVersion protocolVersion = cluster.getConfiguration().getProtocolOptions().getProtocolVersion();
String s = TypeCodec.VarcharCodec.instance.deserialize(byteBuffer, protocolVersion);
For version 3.1.4 of the DataStax Java driver, the following will convert a blob to a string:
ProtocolVersion proto = cluster.getConfiguration().getProtocolOptions().getProtocolVersion();
String deserialize = TypeCodec.varchar().deserialize(row.getBytes(i), proto);
1.) Converting from a byte buffer to a String in Java is discussed in this answer.
2.) Assuming you're using Python 2, it's coming back as a string in Python because str is the binary type.
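To make 1.) concrete: a minimal sketch of decoding the ByteBuffer returned by row.getBytes("value") as UTF-8, without assuming the buffer has an accessible backing array:
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

ByteBuffer buf = row.getBytes("value");
// duplicate() so decoding does not disturb the original buffer's position.
String value = StandardCharsets.UTF_8.decode(buf.duplicate()).toString();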
