I'm trying to access a remote Cassandra cluster using Spark in Java. However, when I try to execute an aggregation function (count), I get the following error:
Exception in thread "main" com.datastax.driver.core.exceptions.TransportException: [/192.168.1.103:9042] Connection has been closed
at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:38)
at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:24)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
I already set the timeout in cassandra.yaml to a large value.
Here is my code:
SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("local[*]");
conf.set("spark.cassandra.connection.host", "host");
Spark app = new Spark(conf);
app.run();
.
.
.
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
// Prepare the schema
try (Session session = connector.openSession()) {
    session.execute("USE keyspace0");
    ResultSet results = session.execute("SELECT count(*) FROM table0");
}
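For what it's worth, a minimal sketch of the connector-side timeouts (assuming the standard Spark Cassandra Connector properties; "host" is the same placeholder as above). The session opened through CassandraConnector takes its timeouts from SparkConf, not from the server's cassandra.yaml, so a long-running count(*) may still hit the default client-side read timeout:

SparkConf conf = new SparkConf()
        .setAppName("Test")
        .setMaster("local[*]")
        .set("spark.cassandra.connection.host", "host")              // placeholder host
        .set("spark.cassandra.connection.timeout_ms", "30000")       // connect timeout
        .set("spark.cassandra.read.timeout_ms", "300000");           // per-request read timeout

Depending on the connector version, running the count through Spark itself (for example CassandraJavaUtil.javaFunctions(sc).cassandraTable("keyspace0", "table0").cassandraCount()) also distributes the scan instead of asking a single coordinator to count the whole table.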
I'm trying to run a Spark SQL test against a Hive table using the Spark Java API. The problem I am having is with Kerberos. Whenever I attempt to run the program I get this error message:
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS];
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at tester.SparkSample.lambda$0(SparkSample.java:62)
... 5 more
on this line of code:
ss.sql("select count(*) from entps_pma.baraccount").show();
Now when I run the code, I log into Kerberos just fine and get this message:
18/05/01 11:21:03 INFO security.UserGroupInformation: Login successful for user <kerberos user> using keytab file /root/hdfs.keytab
I even connect to the Hive Metastore:
18/05/01 11:21:06 INFO hive.metastore: Trying to connect to metastore with URI thrift://<hiveserver>:9083
18/05/01 11:21:06 INFO hive.metastore: Connected to metastore.
But right after that I get the error. I'd appreciate any direction here. Here is my code:
public static void runSample(String fullPrincipal) throws IOException {
    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "true");
    System.setProperty("hive.metastore.kerberos.principal", fullPrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    System.setProperty("hadoop.security.authentication", "kerberos");

    Configuration conf = setSecurity(fullPrincipal);
    loginUser = UserGroupInformation.getLoginUser();

    loginUser.doAs((PrivilegedAction<Void>) () -> {
        SparkConf sparkConf = new SparkConf().setMaster("local");
        sparkConf.set("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse");
        sparkConf.set("hive.metastore.uris", "thrift://<hive server>:9083");
        sparkConf.set("hadoop.security.authentication", "kerberos");
        sparkConf.set("hadoop.rpc.protection", "privacy");
        sparkConf.set("spark.driver.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.executor.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.eventLog.enabled", "false");

        SparkSession ss = SparkSession
                .builder()
                .enableHiveSupport()
                .config(sparkConf)
                .appName("Jim Test Spark App")
                .getOrCreate();

        ss.sparkContext()
                .hadoopConfiguration()
                .addResource(conf);

        ss.sql("select count(*) from entps_pma.baraccount").show();
        return null;
    });
}
I guess you are running Spark on YARN. You need to specify the spark.yarn.principal and spark.yarn.keytab parameters. Please check the Running Spark on YARN documentation.
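For illustration, a minimal sketch of what that could look like in the code above (reusing fullPrincipal and the keytab path from the login message; treat both values as placeholders for your environment):

SparkConf sparkConf = new SparkConf()
        .setMaster("yarn")
        // same principal/keytab that UserGroupInformation logs in with above
        .set("spark.yarn.principal", fullPrincipal)
        .set("spark.yarn.keytab", "/root/hdfs.keytab");

The equivalent spark-submit flags are --principal and --keytab.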
How can I write a Spark dataset to OrientDB using Java?
I already read data from OrientDB using the OrientDB Java JDBC driver, but I am unable to persist the same dataset back into OrientDB.
Code:
public void readAndWriteData(SparkSession spark, Map<String, String> dbProps, Properties destinationDb) {
    Dataset<Row> tableDataset = spark.read().format("jdbc").options(dbProps).load();
    tableDataset.show();

    tableDataset.createOrReplaceTempView("TEMP_V");
    Dataset<Row> tableDataset1 = spark.sql("SELECT NAME FROM TEMP_V");
    tableDataset1.show();

    tableDataset1.write().format("org.apache.spark.orientdb.documents")
            .option("dburl", "jdbc:orient:REMOTE:localhost/testdb")
            .option("user", "root")
            .option("password", "root")
            .option("class", "Test")
            .mode(SaveMode.Append).save();
}
Here I am getting the following error:
Exception in thread "main" java.lang.RuntimeException: Connection Exception Occurred: Error on opening database 'jdbc:orient:REMOTE:localhost/testdb'
Can you please help me with this error? I have used the same DB connection for reading and persisting.
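I don't have a confirmed fix, but one thing worth checking (a sketch under assumptions, not a definitive answer): the reading path goes through the JDBC driver, while org.apache.spark.orientdb.documents is a separate data source, and as far as I recall it expects the native OrientDB URL form (remote:host/db) rather than the jdbc:orient: form. Verifying the URL and credentials with the plain document API first separates connection problems from connector problems; the database name and credentials below are the ones from your snippet:

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;

// Assumption: native URL form, not the JDBC one used for reading
ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/testdb");
try {
    db.open("root", "root");   // same credentials as in the Spark writer
    System.out.println("Connected to " + db.getName());
} finally {
    db.close();
}

If that connects, try the same URL in the writer: .option("dburl", "remote:localhost/testdb").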
My code in Java:
public static void main(String[] args) {
    Cluster cluster;
    Session session;

    cluster = Cluster.builder()
            .addContactPoint("127.0.0.1")
            .withPort(9042)
            .build();
    session = cluster.connect("Bundesliga");

    session.execute("INSERT INTO test(c1,c2,c3,c4,c5) VALUES(0,0,0,0,0)");
}
Error Message:
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (null))
    at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196)
    at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:80)
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1145)
    at com.datastax.driver.core.Cluster.init(Cluster.java:149)
    at com.datastax.driver.core.Cluster.connect(Cluster.java:225)
    at com.datastax.driver.core.Cluster.connect(Cluster.java:258)
    at cassandra.cassandra_main.main(cassandra_main.java:19)
I have already looked in cassandra.yaml:
start_native_transport: true
native_transport_port: 9042
I fixed it.
The problem was that the version of cassandra-driver-core was not compatible with the Cassandra version.
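For illustration only (an assumption, since the exact cluster and driver versions aren't stated here): one way such a mismatch shows up is protocol negotiation failing during Cluster.init(). With a 3.x cassandra-driver-core you can pin the protocol version explicitly while diagnosing, and then align the driver artifact with the cluster version:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;
import com.datastax.driver.core.Session;

// Assumption: cassandra-driver-core 3.x on the classpath.
// V3 is spoken by Cassandra 2.1/2.2, V4 by Cassandra 2.2 and 3.x.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withPort(9042)
        .withProtocolVersion(ProtocolVersion.V3)   // pin while testing for the mismatch
        .build();
Session session = cluster.connect("Bundesliga");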
I have 2 Cassandra nodes with replication_factor=2. I am trying to run select().all() from my code, and I used setFetchSize(50000). When I start iterating over the result, after some time it throws a ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded). Could anyone please give me some suggestions?
I am creating the cluster using the code below:
PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setCoreConnectionsPerHost(HostDistance.LOCAL, 52)
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 80)
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 500);

SocketOptions socketOption = new SocketOptions();
socketOption.setReadTimeoutMillis(600000)
        .setReceiveBufferSize(1024 * 512)
        .setSendBufferSize(1024 * 512)
        .setKeepAlive(true)
        .setConnectTimeoutMillis(1800000);

cluster = Cluster.builder()
        .addContactPoints(cassandraHosts.get("HOST_1"), cassandraHosts.get("HOST_2"))
        .withPoolingOptions(poolingOptions)
        .withPort(cassandraPort)
        .withSocketOptions(socketOption)
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();
Session session = cluster.connect(cassandraDB);
Cassandra version: 2.2.1
Java 7
Is there any other way to execute a select-all query without hitting the read timeout exception?
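Not a definitive answer, but a common mitigation sketch (the keyspace and table names below are placeholders, and process() stands for your own row handling): with setFetchSize(50000), every page asks the coordinator for 50,000 rows in a single round trip, which can easily exceed the server's read_request_timeout_in_ms. A much smaller fetch size keeps each request cheap while the driver still pages through the entire table as you iterate:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

// "my_keyspace" / "my_table" are placeholders
Statement stmt = QueryBuilder.select().all()
        .from("my_keyspace", "my_table")
        .setFetchSize(1000);   // small pages: each round trip stays well below the read timeout

ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    process(row);   // the driver fetches the next page transparently while iterating
}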
I recently updated to MongoDB 2.6.3 via Ubuntu debs and also switched to the Mongo Java client library 2.12.2; when I now execute
final MongoClient m = new MongoClient( "localhost" );
DB db = m.getDB( "test" );
System.out.println( db.getName( ) );
System.out.println( db.collectionExists( "Customer" ) );
then the "test" sysout is written, but during the collectionExists() method a timeout occurs:
Exception in thread "main" com.mongodb.MongoTimeoutException: Timed out while waiting to connect after 4996 ms
at com.mongodb.BaseCluster.getDescription(BaseCluster.java:114)
at com.mongodb.DBTCPConnector.getClusterDescription(DBTCPConnector.java:396)
at com.mongodb.DBTCPConnector.getMaxBsonObjectSize(DBTCPConnector.java:641)
at com.mongodb.Mongo.getMaxBsonObjectSize(Mongo.java:641)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:81)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66)
at com.mongodb.DB.getCollectionNames(DB.java:510)
at com.mongodb.DB.collectionExists(DB.java:553)
at com.apiomat.backend.persistence.MongoFacade.main(MongoFacade.java:342)
I can connect to MongoDB via the command line client tool and query what I want without problems.
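I don't have a confirmed fix, but as a diagnostic sketch (assuming the 2.12.x Java driver API; the timeout values are arbitrary, not recommendations): building the client with an explicit ServerAddress and longer timeouts at least rules out option defaults and makes the target address unambiguous:

import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

// Timeout values chosen only for illustration
MongoClientOptions options = MongoClientOptions.builder()
        .connectTimeout(15000)   // ms to establish a connection
        .socketTimeout(15000)    // ms for socket reads
        .build();

MongoClient m = new MongoClient(new ServerAddress("127.0.0.1", 27017), options);
DB db = m.getDB("test");
System.out.println(db.collectionExists("Customer"));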