I'm trying to run a Spark SQL test against a Hive table using the Spark Java API. The problem I am having is with Kerberos. Whenever I attempt to run the program I get this error message:
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS];
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at tester.SparkSample.lambda$0(SparkSample.java:62)
... 5 more
on this line of code:
ss.sql("select count(*) from entps_pma.baraccount").show();
Now when I run the code, I log in to Kerberos just fine and get this message:
18/05/01 11:21:03 INFO security.UserGroupInformation: Login successful for user <kerberos user> using keytab file /root/hdfs.keytab
I even connect to the Hive Metastore:
18/05/01 11:21:06 INFO hive.metastore: Trying to connect to metastore with URI thrift://<hiveserver>:9083
18/05/01 11:21:06 INFO hive.metastore: Connected to metastore.
But right after that I get the error. Appreciate any direction here. Here is my code:
public static void runSample(String fullPrincipal) throws IOException {
    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "true");
    System.setProperty("hive.metastore.kerberos.principal", fullPrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    System.setProperty("hadoop.security.authentication", "kerberos");

    Configuration conf = setSecurity(fullPrincipal);
    loginUser = UserGroupInformation.getLoginUser();

    loginUser.doAs((PrivilegedAction<Void>) () -> {
        SparkConf sparkConf = new SparkConf().setMaster("local");
        sparkConf.set("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse");
        sparkConf.set("hive.metastore.uris", "thrift://<hive server>:9083");
        sparkConf.set("hadoop.security.authentication", "kerberos");
        sparkConf.set("hadoop.rpc.protection", "privacy");
        sparkConf.set("spark.driver.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.executor.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.eventLog.enabled", "false");

        SparkSession ss = SparkSession
                .builder()
                .enableHiveSupport()
                .config(sparkConf)
                .appName("Jim Test Spark App")
                .getOrCreate();

        ss.sparkContext()
                .hadoopConfiguration()
                .addResource(conf);

        ss.sql("select count(*) from entps_pma.baraccount").show();
        return null;
    });
}
I guess you are running Spark on YARN. In that case you need to specify the spark.yarn.principal and spark.yarn.keytab parameters. Please check the Running Spark on YARN documentation.
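As a minimal sketch, assuming the job is actually submitted to YARN and reusing the principal and /root/hdfs.keytab path from the question, the two parameters could be set on the SparkConf before the session is built (they can equally be passed as --principal and --keytab to spark-submit):

// Hedged sketch: principal and keytab path are the ones shown in the question's logs.
sparkConf.set("spark.yarn.principal", fullPrincipal);
sparkConf.set("spark.yarn.keytab", "/root/hdfs.keytab");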
After following this instruction, I am able to access the S3 bucket via an access point + VPC endpoint perfectly fine from the AWS CLI.
Basically I use
s3://arn:aws:s3:ap-southeast-1:<account number>:accesspoint/<bucket name>
the same way as I use
s3://<bucket name>
All aws s3 ... commands work great.
However, that's not the case for my Java-based Flink project code. The code works great with s3://<bucket name>, but it seems that it does not recognize the new S3 URI.
Here is how the sink is defined in my code:
final FileSink<ConsumerRecordPOJO<CacheInfo>> sink = FileSink //
        .<ConsumerRecordPOJO<CacheInfo>>forRowFormat(new Path(s3Url),
                new Encoder<ConsumerRecordPOJO<CacheInfo>>() {
                    @Override
                    public void encode(ConsumerRecordPOJO<CacheInfo> record, OutputStream stream)
                            throws IOException {
                        GzipParameters params = new GzipParameters();
                        params.setCompressionLevel(Deflater.BEST_COMPRESSION);
                        GzipCompressorOutputStream out = new GzipCompressorOutputStream(stream, params);
                        OBJECT_MAPPER.writeValue(out, record);
                        out.finish();
                    }
                }) //
        // (some extra configuration omitted here)
        .build();
After passing s3://arn:aws:s3:ap-southeast-1:<account number>:accesspoint/<bucket name> to the s3Url param, the job execution failed with
2021-11-26 22:14:34,085 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: kafka -> Filter -> Map -> Sink file (1/1)#3 (c654160d3fab026c4544ca8a64644796) switched from INITIALIZING to FAILED with failure cause: org.apache.flink.util.FlinkRuntimeException: Could not create writer state serializer.
at org.apache.flink.connector.file.sink.FileSink.getWriterStateSerializer(FileSink.java:135)
at org.apache.flink.streaming.runtime.operators.sink.SinkOperatorFactory.createStreamOperator(SinkOperatorFactory.java:63)
at org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperator(OperatorChain.java:712)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:686)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:626)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:676)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:626)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:676)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:626)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:187)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.<init>(RegularOperatorChain.java:63)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: null uri host.
at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:162)
at org.apache.flink.core.fs.PluginFileSystemFactory.create(PluginFileSystemFactory.java:62)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:508)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409)
at org.apache.flink.connector.file.sink.FileSink$RowFormatBuilder.createBucketWriter(FileSink.java:326)
at org.apache.flink.connector.file.sink.FileSink$RowFormatBuilder.getWriterStateSerializer(FileSink.java:307)
at org.apache.flink.connector.file.sink.FileSink.getWriterStateSerializer(FileSink.java:130)
... 18 more
Caused by: java.lang.NullPointerException: null uri host.
at java.util.Objects.requireNonNull(Objects.java:228)
at org.apache.hadoop.fs.s3native.S3xLoginHelper.buildFSURI(S3xLoginHelper.java:71)
at org.apache.hadoop.fs.s3a.S3AFileSystem.setUri(S3AFileSystem.java:486)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:246)
at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:123)
... 24 more
It turns out I could use the S3 access point alias, which works perfectly for Flink.
See https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points-alias.html
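As a minimal sketch (the alias below is hypothetical; the real one is shown on the access point's properties page in the S3 console), the only change is the URI passed to the sink:

// Hypothetical access point alias, used in place of the ARN-style URI.
final String s3Url = "s3://my-accesspoint-abcdefgh12345678-s3alias/output";
// The FileSink definition from the question stays unchanged; only s3Url differs.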
I'm trying to access a remote Cassandra cluster using Spark in Java. However, when I try to execute an aggregation function (count), I get the following error:
Exception in thread "main" com.datastax.driver.core.exceptions.TransportException: [/192.168.1.103:9042] Connection has been closed
at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:38)
at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:24)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
I have already set the timeouts in cassandra.yaml to a large value.
Here is my code:
SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("local[*]");
conf.set("spark.cassandra.connection.host", "host");
Spark app = new Spark(conf);
app.run();
.
.
.
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
// Prepare the schema
try (Session session = connector.openSession()) {
    session.execute("USE keyspace0");
    ResultSet results = session.execute("SELECT count(*) FROM table0");
I have a Hadoop cluster on an internal network (IP range 192.168.0.0/24), and I want to connect to HBase using the Java library (org.apache.hadoop.hbase.client)
from a Windows 7 development computer on a different network (an external IP, 203.252.x.x). However, I couldn't connect to HBase.
I have a few questions:
Is my code wrong?
Is it possible using the Java library (org.apache.hadoop.hbase.client), or should I use the Thrift protocol? (I don't want to use Thrift.)
Do you have any ideas or comments?
Thank you.
This is my code for connecting to HBase:
public class TestBase {
    public static void main(String[] args) throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.master", "203.252.x.x"); // master info
        configuration.set("hbase.master.port", "6000");
        configuration.set("hbase.zookeeper.quorum", "203.252.x.x");
        configuration.set("hbase.zookeeper.property.clientPort", "2181");
        configuration.set("zookeeper.znode.parent", "/hbase-unsecure");

        HBaseAdmin.checkHBaseAvailable(configuration);

        HTable table = null;
        table = new HTable(configuration, "weatherData");

        Scan scan = new Scan();
        scan.setTimeRange(1L, 1435633313526L);

        ResultScanner scanner = null;
        scanner = table.getScanner(scan);
        for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
            System.out.println(Bytes.toString(rr.getRow())
                    + " => "
                    + Bytes.toString(rr.getValue(Bytes.toBytes("temp"),
                            Bytes.toBytes("max"))));
        }

        table.close();
        scanner.close();
    }
}
And this is the error output in Eclipse:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: datanode2
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1661)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1687)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1904)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isMasterRunning(ConnectionManager.java:932)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2409)
at TestBase.main(TestBase.java:28)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: datanode2
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1739)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1777)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1698)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1607)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1633)
... 5 more
Caused by: java.net.UnknownHostException: unknown host: datanode2
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:501)
at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:325)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1614)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1494)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1724)
... 10 more
There might be a problem with mapping the HBase master's DNS name to the IP address set in hbase.master. Make sure you either have a DNS server set up, or try something similar to what worked on my GNU/Linux machine: configure "/etc/hostname" (set the name of the HBase master node) and "/etc/hosts" on the machine that tries to connect to the master node.
Hopefully you can set up this on your Windows machine somehow.
Here is a helpful link for the GNU/Linux way:
http://sujee.net/2012/03/08/getting-dns-right-for-hadoop-hbase-clusters/#.XULnEZNKhTZ
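A minimal sketch of such a hosts-file mapping, with purely hypothetical entries (on Windows the file lives at C:\Windows\System32\drivers\etc\hosts rather than /etc/hosts):

# Hypothetical entries: map each cluster host name the master reports
# (e.g. datanode2 from the stack trace) to an address reachable from the client.
<reachable-ip-of-master>     hbasemaster
<reachable-ip-of-datanode2>  datanode2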
You are unable to reach the nodes of the cluster. Check the firewall and network settings, and make sure the required ports are open.
This is the error in your stack trace:
Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: datanode2
Also, you don't need to specify the HBase cluster properties in your code. Put hbase-site.xml on the classpath of your Java application and just instantiate the connection.
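A minimal sketch of that approach, assuming an HBase 1.x+ client and that hbase-site.xml is on the classpath (the table name is the one from the question):

// hbase-site.xml on the classpath supplies the quorum, client port and znode parent.
Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf("weatherData"))) {
    // use the table as usual, e.g. table.getScanner(new Scan())
}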
I am trying to use the HBase Java APIs to write data into HBase. I installed Hadoop/HBase through Ambari.
Here is how the configuration is currently set up:
final Configuration CONFIGURATION = HBaseConfiguration.create();
final HBaseAdmin HBASE_ADMIN;
HBASE_ADMIN = new HBaseAdmin(CONFIGURATION);
When I try to write to HBase, I check to make sure that the table exists
!HBASE_ADMIN.tableExists(tableName)
If not, I create a new one. However, it appears that exceptions are being thrown when checking whether the table exists.
I'm wondering if I'm not correctly connected to HBase. Is there any good way to verify that the configuration is correct and that I am connecting to HBase? The exception I'm getting is below.
Thanks.
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:209)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)
at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:597)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:802)
at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:359)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:287)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:301)
at com.business.project.hbase.HBaseMessageWriter.getTable(HBaseMessageWriter.java:40)
at com.business.project.hbase.HBaseMessageWriter.write(HBaseMessageWriter.java:59)
at com.business.project.hbase.HBaseMessageWriter.write(HBaseMessageWriter.java:54)
at com.business.project.storm.bolt.package.exampleBolt.execute(exampleBolt.java:19)
at backtype.storm.daemon.executor$fn__5697$tuple_action_fn__5699.invoke(executor.clj:659)
at backtype.storm.daemon.executor$mk_task_receiver$fn__5620.invoke(executor.clj:415)
at backtype.storm.disruptor$clojure_handler$reify__1741.onEvent(disruptor.clj:58)
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
at backtype.storm.daemon.executor$fn__5697$fn__5710$fn__5761.invoke(executor.clj:794)
at backtype.storm.util$async_loop$fn__452.invoke(util.clj:465)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.getMetaReplicaNodes(ZooKeeperWatcher.java:269)
at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:241)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:62)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1203)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1164)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:294)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:130)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:55)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:201)
In addition to the configuration parameters suggested by Yosr, specifying
conf.set("zookeeper.znode.parent", "VALUE")
would help resolve the issue.
The property below resolved my issue
For Hortonworks:
hconfig.set("zookeeper.znode.parent", "/hbase-unsecure")
For Cloudera:
hconfig.set("zookeeper.znode.parent", "/hbase")
You can use HBaseAdmin.checkHBaseAvailable(conf);
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.master", "ip_address:60000");
conf.set("hbase.zookeeper.quorum","ip_address");
conf.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin admin = new HBaseAdmin(conf);
boolean bool = admin.tableExists("table_name");
System.out.println(bool);
ip_address: this is the IP address of your HBase cluster; change the ZooKeeper port (2181) if it is not the same in your configuration files.
I am trying to test a website on mobile using Perfecto Mobile in Eclipse. I am getting an initialization error while running the code. It seems like an HTTP request error. I am wondering if there is a proxy I can try with Eclipse. I am able to go to this URL directly in my browser.
Here is the code to initialize
public class MobileTest {
    public static void main(String[] args) {
        String deviceID = "1F297702";
        // Initializing
        MobileDriver driver = new MobileDriver();
        try {
            // code area
            driver.getDevice(deviceID);
            sleep(13000);
            MobileDeviceOpenOptions open = new MobileDeviceOpenOptions();
            driver.getDevice(deviceID).open(open);
And the error console output:
Error:
Run started
Starting Mobile Driver
12:16:55.103 [main] INFO c.p.selenium.MobileDriver - Creating mobile driver
12:16:55.109 [main] INFO c.p.selenium.MobileDriver - Starting execution
12:16:55.142 [main] INFO c.p.httpclient.HttpClient - Processing request Request[_requestType=START_EXECUTION,_itemId=<null>,_parameters=[ParameterValue[_name=responseFormat,_value=xml]],_stringParameters=<null>,_encoding=<null>]
Exception in thread "main" java.lang.RuntimeException: Failed to start play
at com.perfectomobile.selenium.MobileDriver.initWithEclipseParams(MobileDriver.java:86)
at com.perfectomobile.selenium.MobileDriver.<init>(MobileDriver.java:39)
at MobileTest.main(MobileTest.java:41)
Caused by: com.perfectomobile.httpclient.HttpClientException: Error while processing HTTP request for URL in https & username & password
at com.perfectomobile.httpclient.HttpClient.sendTextRequest(HttpClient.java:195)
at com.perfectomobile.httpclient.HttpClient.sendTextRequest(HttpClient.java:143)
at com.perfectomobile.httpclient.HttpClient.sendValuesRequest(HttpClient.java:56)
at com.perfectomobile.httpclient.execution.ExecutionsHttpClient.startPlay(ExecutionsHttpClient.java:217)
at com.perfectomobile.selenium.MobileDriver.initWithEclipseParams(MobileDriver.java:76)
Yes, Perfecto Mobile supports a proxy.
See the attached code:
// Setting up the proxy
MobileProxy proxy = new MobileProxy("name", 8080, "XXX", "XXX");
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability(CapabilityType.PROXY, proxy);
MobileDriver connector = new MobileDriver(capabilities);
System.out.println("Script started");
For more code examples you can go to:
https://github.com/perfectomobile/examples