I'm trying to access a remote Cassandra cluster using Spark in Java. However, when I try to execute an aggregation function (count), I get the following error:
Exception in thread "main" com.datastax.driver.core.exceptions.TransportException: [/192.168.1.103:9042] Connection has been closed
at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:38)
at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:24)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
I already set the timeout in cassandra.yaml to a large value.
Here is my code:
SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("local[*]");
conf.set("spark.cassandra.connection.host", "host");
Spark app = new Spark(conf);
app.run();
.
.
.
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
// Prepare the schema
try (Session session = connector.openSession()) {
    session.execute("USE keyspace0");
    ResultSet results = session.execute("SELECT count(*) FROM table0");
}
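For what it's worth, a minimal sketch of the connector-side timeouts (assuming the standard Spark Cassandra Connector properties; "host" is the same placeholder as above). The session opened through CassandraConnector takes its timeouts from SparkConf, not from the server's cassandra.yaml, so a long-running count(*) may still hit the default client-side read timeout:

SparkConf conf = new SparkConf()
        .setAppName("Test")
        .setMaster("local[*]")
        .set("spark.cassandra.connection.host", "host")              // placeholder host
        .set("spark.cassandra.connection.timeout_ms", "30000")       // connect timeout
        .set("spark.cassandra.read.timeout_ms", "300000");           // per-request read timeout

Depending on the connector version, running the count through Spark itself (for example CassandraJavaUtil.javaFunctions(sc).cassandraTable("keyspace0", "table0").cassandraCount()) also distributes the scan instead of asking a single coordinator to count the whole table.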
I'm trying to run a Spark SQL test against a Hive table using the Spark Java API. The problem I am having is with Kerberos. Whenever I attempt to run the program I get this error message:
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS];
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at tester.SparkSample.lambda$0(SparkSample.java:62)
... 5 more
on this line of code:
ss.sql("select count(*) from entps_pma.baraccount").show();
Now when I run the code, I log into Kerberos just fine and get this message:
18/05/01 11:21:03 INFO security.UserGroupInformation: Login successful for user <kerberos user> using keytab file /root/hdfs.keytab
I even connect to the Hive Metastore:
18/05/01 11:21:06 INFO hive.metastore: Trying to connect to metastore with URI thrift://<hiveserver>:9083
18/05/01 11:21:06 INFO hive.metastore: Connected to metastore.
But right after that I get the error. I'd appreciate any direction here. Here is my code:
public static void runSample(String fullPrincipal) throws IOException {
    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "true");
    System.setProperty("hive.metastore.kerberos.principal", fullPrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    System.setProperty("hadoop.security.authentication", "kerberos");

    Configuration conf = setSecurity(fullPrincipal);
    loginUser = UserGroupInformation.getLoginUser();

    loginUser.doAs((PrivilegedAction<Void>) () -> {
        SparkConf sparkConf = new SparkConf().setMaster("local");
        sparkConf.set("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse");
        sparkConf.set("hive.metastore.uris", "thrift://<hive server>:9083");
        sparkConf.set("hadoop.security.authentication", "kerberos");
        sparkConf.set("hadoop.rpc.protection", "privacy");
        sparkConf.set("spark.driver.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.executor.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.eventLog.enabled", "false");

        SparkSession ss = SparkSession
                .builder()
                .enableHiveSupport()
                .config(sparkConf)
                .appName("Jim Test Spark App")
                .getOrCreate();

        ss.sparkContext()
                .hadoopConfiguration()
                .addResource(conf);

        ss.sql("select count(*) from entps_pma.baraccount").show();
        return null;
    });
}
I guess you are running Spark on YARN. You need to specify the spark.yarn.principal and spark.yarn.keytab parameters. Please check the Running Spark on YARN documentation.
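For illustration, a minimal sketch of what that could look like in the code above (reusing fullPrincipal and the keytab path from the login message; treat both values as placeholders for your environment):

SparkConf sparkConf = new SparkConf()
        .setMaster("yarn")
        // same principal/keytab that UserGroupInformation logs in with above
        .set("spark.yarn.principal", fullPrincipal)
        .set("spark.yarn.keytab", "/root/hdfs.keytab");

The equivalent spark-submit flags are --principal and --keytab.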
How can I write a Spark dataset to OrientDB using Java?
I already read data from OrientDB using the OrientDB Java JDBC driver, but I am unable to persist the same dataset back into OrientDB.
Code:
public void readAndWriteData(SparkSession spark, Map<String, String> dbProps, Properties destinationDb) {
    Dataset<Row> tableDataset = spark.read().format("jdbc").options(dbProps).load();
    tableDataset.show();

    tableDataset.createOrReplaceTempView("TEMP_V");
    Dataset<Row> tableDataset1 = spark.sql("SELECT NAME FROM TEMP_V");
    tableDataset1.show();

    tableDataset1.write().format("org.apache.spark.orientdb.documents")
            .option("dburl", "jdbc:orient:REMOTE:localhost/testdb")
            .option("user", "root")
            .option("password", "root")
            .option("class", "Test")
            .mode(SaveMode.Append).save();
}
Here I am getting the following error:
Exception in thread "main" java.lang.RuntimeException: Connection Exception Occurred: Error on opening database 'jdbc:orient:REMOTE:localhost/testdb'
Can you please help me with this error? I have used the same DB connection for reading and persisting.
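I don't have a confirmed fix, but one thing worth checking (a sketch under assumptions, not a definitive answer): the reading path goes through the JDBC driver, while org.apache.spark.orientdb.documents is a separate data source, and as far as I recall it expects the native OrientDB URL form (remote:host/db) rather than the jdbc:orient: form. Verifying the URL and credentials with the plain document API first separates connection problems from connector problems; the database name and credentials below are the ones from your snippet:

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;

// Assumption: native URL form, not the JDBC one used for reading
ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/testdb");
try {
    db.open("root", "root");   // same credentials as in the Spark writer
    System.out.println("Connected to " + db.getName());
} finally {
    db.close();
}

If that connects, try the same URL in the writer: .option("dburl", "remote:localhost/testdb").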
My code in Java:
public static void main(String[] args) {
    Cluster cluster;
    Session session;

    cluster = Cluster.builder()
            .addContactPoint("127.0.0.1")
            .withPort(9042)
            .build();
    session = cluster.connect("Bundesliga");

    session.execute("INSERT INTO test(c1,c2,c3,c4,c5) VALUES(0,0,0,0,0)");
}
Error Message:
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (null))
    at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196)
    at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:80)
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1145)
    at com.datastax.driver.core.Cluster.init(Cluster.java:149)
    at com.datastax.driver.core.Cluster.connect(Cluster.java:225)
    at com.datastax.driver.core.Cluster.connect(Cluster.java:258)
    at cassandra.cassandra_main.main(cassandra_main.java:19)
I have already looked in cassandra.yaml:
start_native_transport: true
native_transport_port: 9042
I fixed it.
The problem was that the version of cassandra-driver-core was not compatible with the Cassandra version.
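For illustration only (an assumption, since the exact cluster and driver versions aren't stated here): one way such a mismatch shows up is protocol negotiation failing during Cluster.init(). With a 3.x cassandra-driver-core you can pin the protocol version explicitly while diagnosing, and then align the driver artifact with the cluster version:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;
import com.datastax.driver.core.Session;

// Assumption: cassandra-driver-core 3.x on the classpath.
// V3 is spoken by Cassandra 2.1/2.2, V4 by Cassandra 2.2 and 3.x.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withPort(9042)
        .withProtocolVersion(ProtocolVersion.V3)   // pin while testing for the mismatch
        .build();
Session session = cluster.connect("Bundesliga");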
I have 2 Cassandra nodes with replication_factor=2. I am trying to run select().all() from my code, and I used setFetchSize(50000). When I start iterating over the result, after some time it throws a ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded). Could anyone please give me some suggestions?
I am creating the cluster using the code below:
PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setCoreConnectionsPerHost(HostDistance.LOCAL, 52)
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 80)
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 500);

SocketOptions socketOption = new SocketOptions();
socketOption.setReadTimeoutMillis(600000)
        .setReceiveBufferSize(1024 * 512)
        .setSendBufferSize(1024 * 512)
        .setKeepAlive(true)
        .setConnectTimeoutMillis(1800000);

cluster = Cluster.builder()
        .addContactPoints(cassandraHosts.get("HOST_1"), cassandraHosts.get("HOST_2"))
        .withPoolingOptions(poolingOptions)
        .withPort(cassandraPort)
        .withSocketOptions(socketOption)
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();
Session session = cluster.connect(cassandraDB);
Cassandra version: 2.2.1
Java 7
Is there any other way to execute a select-all query without hitting the read timeout exception?
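Not a definitive answer, but a common mitigation sketch (the keyspace and table names below are placeholders, and process() stands for your own row handling): with setFetchSize(50000), every page asks the coordinator for 50,000 rows in a single round trip, which can easily exceed the server's read_request_timeout_in_ms. A much smaller fetch size keeps each request cheap while the driver still pages through the entire table as you iterate:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

// "my_keyspace" / "my_table" are placeholders
Statement stmt = QueryBuilder.select().all()
        .from("my_keyspace", "my_table")
        .setFetchSize(1000);   // small pages: each round trip stays well below the read timeout

ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    process(row);   // the driver fetches the next page transparently while iterating
}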
I recently updated to MongoDB 2.6.3 via Ubuntu debs and also switched to the Mongo Java client library 2.12.2; when I now execute
final MongoClient m = new MongoClient( "localhost" );
DB db = m.getDB( "test" );
System.out.println( db.getName( ) );
System.out.println( db.collectionExists( "Customer" ) );
then the "test" sysout is written, but during the collectionExists() method a timeout occurs:
Exception in thread "main" com.mongodb.MongoTimeoutException: Timed out while waiting to connect after 4996 ms
at com.mongodb.BaseCluster.getDescription(BaseCluster.java:114)
at com.mongodb.DBTCPConnector.getClusterDescription(DBTCPConnector.java:396)
at com.mongodb.DBTCPConnector.getMaxBsonObjectSize(DBTCPConnector.java:641)
at com.mongodb.Mongo.getMaxBsonObjectSize(Mongo.java:641)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:81)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66)
at com.mongodb.DB.getCollectionNames(DB.java:510)
at com.mongodb.DB.collectionExists(DB.java:553)
at com.apiomat.backend.persistence.MongoFacade.main(MongoFacade.java:342)
I can connect to MongoDB via the command line client tool and query what I want without problems.
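I don't have a confirmed fix, but as a diagnostic sketch (assuming the 2.12.x Java driver API; the timeout values are arbitrary, not recommendations): building the client with an explicit ServerAddress and longer timeouts at least rules out option defaults and makes the target address unambiguous:

import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

// Timeout values chosen only for illustration
MongoClientOptions options = MongoClientOptions.builder()
        .connectTimeout(15000)   // ms to establish a connection
        .socketTimeout(15000)    // ms for socket reads
        .build();

MongoClient m = new MongoClient(new ServerAddress("127.0.0.1", 27017), options);
DB db = m.getDB("test");
System.out.println(db.collectionExists("Customer"));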