Cannot use storm command line interface to kill topology - java

I have a storm cluster, my storm version is 0.10.1, and it run well. I can kill topology using ui in browser which is good. But sometime I want to just kill topology by shell. When I run this: bin/storm kill topology_name, I get errors.
Exception in thread "main" NotAliveException(msg:com_adsame_mydmp is not alive)
at backtype.storm.generated.Nimbus$killTopologyWithOpts_result$killTopologyWithOpts_resultStandardScheme.read(Nimbus.java:7482)
at backtype.storm.generated.Nimbus$killTopologyWithOpts_result$killTopologyWithOpts_resultStandardScheme.read(Nimbus.java:7468)
at backtype.storm.generated.Nimbus$killTopologyWithOpts_result.read(Nimbus.java:7410)
at org.apache.thrift7.TServiceClient.receiveBase(TServiceClient.java:78)
at backtype.storm.generated.Nimbus$Client.recv_killTopologyWithOpts(Nimbus.java:283)
at backtype.storm.generated.Nimbus$Client.killTopologyWithOpts(Nimbus.java:269)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
at backtype.storm.command.kill_topology$_main.doInvoke(kill_topology.clj:27)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at backtype.storm.command.kill_topology.main(Unknown Source)
But I can see this topology activate when I execute bin/storm list.
Topology_name Status Num_tasks Num_workers Uptime_secs
-------------------------------------------------------------------
topology_name ACTIVE 73 4 423894
Any idea ? Thanks.

Related

File not found when running Spark Job with input from Google Storage Bucket

I am running a job on a Google Cloud Dataproc cluster which takes one parameter -- path to the input file. This file is stored in a Google Cloud Storage bucket. I get a FileNotFoundException (trace below). Why would that be?
gcloud dataproc jobs submit spark --cluster cluster-1 --class MST.ComputeMST \
--jars gs://dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/ComputeMST.jar \
-- gs:///dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/input.txt
Job [8b193fcd-1350-462b-ae11-373333e868fe] submitted.
Waiting for job output...
17/05/16 05:06:02 INFO com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.1-hadoop2
number of runs = 0
Exception in thread "main" java.io.FileNotFoundException: gs:/dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/input.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at java.io.FileReader.<init>(FileReader.java:58)
at MST.ComputeMST.main(ComputeMST.java:670)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.spark) Job [8b193fcd-1350-462b-ae11-373333e868fe] entered state [ERROR] while waiting for [DONE].
Even though GCS connector is installed by default on Cloud Dataproc cluster, you can not use it from your job through java.io.FileReader interface.
To access GCS objects through GCS connector you need to use Hadoop's FileSystem interface.

Unable to kill YARN job

I have a simple Samza job, which I submit to our YARN cluster. The job allocates one single container and runs without any issues.
When trying to kill the job, however, both the AM and job containers are left running on the NM, even though the RM claims that the job was killed successfully:
$ yarn application -kill application_1461969364354_5761
Killing application application_1461969364354_5761
16/05/19 06:48:49 INFO impl.YarnClientImpl: Killed application application_1461969364354_5761
From the NM log, I'm able to see:
2016-05-19 06:48:50,051 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e34_1461969364354_5761_01_000002 transitioned from RUNNING to KILLING
2016-05-19 06:48:50,051 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e34_1461969364354_5761_01_000002
2016-05-19 06:48:50,060 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1461969364354_5761 transitioned from RUNNING to FINISHING_CONTAI
NERS_WAIT
The status never transitions out of FINISHING_CONTAINERS_WAIT and I had to kill -9 the container process.
I'm using Samza version 0.10.0 and YARN version Hadoop 2.6.0-cdh5.4.9.
Any idea?
UPDATE:
After some digging, I'm able to see this:
2016-05-20 03:14:59,497 INFO org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking finishApplicationMaster of class ApplicationMasterProtocolPBClientImpl over rm157 after 2326 fail over attempts. Trying to fail over immediately.
org.apache.hadoop.security.token.SecretManager$InvalidToken: appattempt_1463512986427_0017_000001 not found in AMRMTokenSecretManager.
at sun.reflect.GeneratedConstructorAccessor13.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:94)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.finishApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.unregisterApplicationMaster(AMRMClientImpl.java:378)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:157)
at org.apache.samza.job.yarn.SamzaAppMasterLifecycle.onShutdown(SamzaAppMasterLifecycle.scala:63)
at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$5.apply(SamzaAppMaster.scala:133)
at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$5.apply(SamzaAppMaster.scala:132)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.samza.job.yarn.SamzaAppMaster$.run(SamzaAppMaster.scala:132)
at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:104)
at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): appattempt_1463512986427_0017_000001 not found in AMRMTokenSecretManager.
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy15.finishApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
... 15 more

Can't run jstack -l against my java application

My application got stuck, then I wanted to check the thread status. But I couldn't take a thread dump via jstack -l 33822 from my application..
Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
Then I used -F to attempt to take a thread dump. I got an error as following:
Attaching to process ID 33822, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 23.21-b01
Deadlock Detection:
sun.jvm.hotspot.debugger.UnmappedAddressException: 3780320
at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208)
at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63)
at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:217)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:482)
at sun.jvm.hotspot.debugger.DebuggerBase.readCompOopAddressValue(DebuggerBase.java:459)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCompOopHandle(LinuxDebuggerLocal.java:442)
at sun.jvm.hotspot.debugger.linux.LinuxAddress.getCompOopHandleAt(LinuxAddress.java:125)
at sun.jvm.hotspot.oops.Oop.getKlassForOopHandle(Oop.java:231)
at sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:380)
at sun.jvm.hotspot.runtime.JavaThread.getThreadObj(JavaThread.java:331)
at sun.jvm.hotspot.runtime.JavaThread.getCurrentParkBlocker(JavaThread.java:383)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:82)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
at sun.tools.jstack.JStack.main(JStack.java:102)
sun.jvm.hotspot.debugger.UnmappedAddressException: 10
at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208)
at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63)
at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:217)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:482)
at sun.jvm.hotspot.debugger.DebuggerBase.readCompOopAddressValue(DebuggerBase.java:459)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCompOopHandle(LinuxDebuggerLocal.java:442)
at sun.jvm.hotspot.debugger.linux.LinuxAddress.getCompOopHandleAt(LinuxAddress.java:125)
at sun.jvm.hotspot.oops.Oop.getKlassForOopHandle(Oop.java:231)
at sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:356)
at sun.jvm.hotspot.oops.NarrowOopField.getValue(NarrowOopField.java:44)
at sun.jvm.hotspot.oops.OopUtilities.threadOopGetParkBlocker(OopUtilities.java:294)
at sun.jvm.hotspot.runtime.JavaThread.getCurrentParkBlocker(JavaThread.java:385)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:82)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
at sun.tools.jstack.JStack.main(JStack.java:102)
Can't print deadlocks:10
Is it a defect of JVM?
are you sure the threaddump isn't happening anyway? We get this error but we still get a thread dump after a small wait

hadoop running application- ERROR security.UserGroupInformation: PriviledgedActionException

I have written WordCount code of hadoop as an java application in eclipse to test hadoop for running applications, but when I try to run it as hdfs user, this error will appear:
./hadoop jar /home/masi/eclipse_workspace/WordCount_apacheSample/bin/test2.jar WordCountApacheSample /user/hdfs/wordCountInput /user/hdfs/wordCountOutput
13/10/02 17:14:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/10/02 17:14:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/10/02 17:14:50 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.net.ConnectException: Call From virtual-machine/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Exception in thread "main" java.net.ConnectException: Call From virtual-machine/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:780)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:727)
at org.apache.hadoop.ipc.Client.call(Client.java:1239)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:630)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1559)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:811)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1345)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:140)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:418)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:333)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
at WordCountApacheSample.main(WordCountApacheSample.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:597)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:526)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:490)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:508)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:603)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:253)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1288)
at org.apache.hadoop.ipc.Client.call(Client.java:1206)
... 29 more
although I have tested input and output paths with hdfs://localhost:9000/ , there is no difference!
BTW, I have studied many posts related to my problem but they were not useful
any help is appreciated.
thanks.
finally I solved the problem by myself and decide to tell the reason here to help others :) the reason sounds somehow silly but the problem was this : the hadoop daemons were stop! my VM shut down suddenly and after restarting the VM, I had forgotten to start daemons(datanode, namenode,...) again! so the reason of this problem is this: datanode and namenode and other daemons are not running.
if you discover that your hdfs is corrupt then you can do following:
sudo -su hdfs
hadoop fsck /
hadoop dfsadmin -safemode leave
... - then delete the corrupted files if any -using following:
hadoop fs -rmr -skipTrash <folder with your files>
hadoop fsck -files delete /
check status :
hadoop fsck /
status should be HEALTHY after this - then manually restart everything in Ambari
I tried this on a small cluster and managed to get it up and running again after having similar error as mentioned above

Solaris - Why is java.lang.UNIXProcess.forkAndExec(Native Method) hanging

I have a java application running on Solaris. This application regularily launches external processes using Runtime.exec. It seems that after a while, having successfully launched such processes many time over, a launching of a process will hang. A thread dump taken at this point (and several minutes later) reveals that java.lang.UNIXProcess.forkAndExec is "stuck". Following is the top of the relevant stack trace taken from the thread dump:
"Thread-85305" prio=3 tid=0x0000000102aae800 nid=0x21499 runnable [0x7fffffff2a3fe000]
java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(Unknown Source)
at java.lang.ProcessImpl.start(Unknown Source)
at java.lang.ProcessBuilder.start(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
I have read through some forums where others have experienced forAndExec throwing an IOException due to not enough space or not enough memory, but I'm not getting this error here. I'm now waiting to get the results of pstack in the hope that it will reveal more information.
Does anyone have any idea on how to resolve this issue?
thanks,
Mike
Installing the Sun Alert Patch Cluster will do the trick.
As the thread live as long as your executable is running, maybe it's only your external executable which is hanging.
You should try to find the parameters passed to your executable and try to launch it manually.
Is stdout and stderr from the process being consumed? You might be looking at the results of a full buffer.
You can test this by adding output redirection to a tempfile for the command that is spawned.

Categories

Resources