storm topology unable to find stormconf.ser file - java

I am running a Storm topology on my cluster, and I can see that the topology is submitted successfully. But it is not reading messages from Kafka. When I checked the topology logs in Storm, I found that there is an issue with the stormconf.ser file:
it cannot find this file at a path like /archive/hadoop/storm/supervisor/stormdist/FieldsGrouping_Topology-2-1441255014/stormconf.ser
Here is the full stack trace:
2015-09-03 00:39:06 b.s.d.worker [ERROR] Error on initialization of server mk-worker
java.io.FileNotFoundException: File '/archive/hadoop/storm/supervisor/stormdist/FieldsGrouping_Topology-2-1441255014/stormconf.ser' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:214) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.worker$fn__6019$exec_fn__1142__auto____6020.invoke(worker.clj:382) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
Please help me with this.
Thanks.

Related

Flume twitter stream

I am trying to run Flume to get data from the Twitter stream, but I received this error while executing it.
[ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:140)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I am a beginner with Flume and am working in the Cloudera QuickStart VM. While searching for solutions, I read that I should install Maven and then build the Flume snapshot jar with it, but I don't know how to install Maven in the Cloudera QuickStart VM. Any help on how to correct this error, please? I have been stuck here for a week.
Found the solution:
The conflict is caused by the twitter4j jars and the Flume snapshot jars. So I renamed the twitter4j jars by changing their file extension to .jarx, so they are no longer loaded. Another thing I did, after reading about it in this article, was to put the Flume snapshot jar in the following hierarchy (see the layout sketch below):
/usr/lib/flume-ng/lib/plugins.d/flumesnapshot (following the same pattern in the var directory)
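For reference, Flume's documented plugins.d convention gives each plugin its own directory with lib, libext, and native subdirectories; a sketch of that layout (the flumesnapshot directory name is taken from the path above):
plugins.d/
  flumesnapshot/
    lib/       # the plugin's own jar(s)
    libext/    # the plugin's dependency jars
    native/    # native libraries, if any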

How to run MapReduce Program from local IDE on remote cluster

I have a simple MapReduce program that I want to run on a remote cluster. I can do this from the command line by simply running:
hadoop jar myjar.jar input output
but when I run a method in my JUnit TestCase class from my IDE that invokes the MR job, I get the following warnings:
WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
although I have this line set before submitting the MR job:
job.setJarByClass(MyJob.class);
and hence the job fails because it cannot find the classes it needs (like MyMapKey, my mapper's key class).
Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :java.lang.RuntimeException: java.lang.ClassNotFoundException: Class MyMapKey not found
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Any thoughts on this?
First, you should add the remote Hadoop cluster's config files (i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, ssl-client.xml) as resources to your Configuration object. Then follow the steps in the above link to see how to manually add the job jar to the classpath on the remote cluster. Note that job.setJarByClass(...) can only locate a jar if the class was actually loaded from one; when you run from an IDE, classes are loaded from a build output directory, so no jar is found and you have to point the job at one explicitly.
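A minimal sketch of both steps, assuming you have local copies of the cluster's config files and a pre-built job jar (all paths here are hypothetical):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

// Point the client at the remote cluster by loading its config files.
Configuration conf = new Configuration();
conf.addResource(new Path("conf/core-site.xml"));
conf.addResource(new Path("conf/hdfs-site.xml"));
conf.addResource(new Path("conf/mapred-site.xml"));
conf.addResource(new Path("conf/yarn-site.xml"));

Job job = Job.getInstance(conf, "MyJob");
// setJarByClass only works when the class was loaded from a jar, so from
// the IDE set the jar explicitly (build it first, e.g. with mvn package):
job.setJar("target/myjar.jar");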

Apache Spark job failed with FileNotFoundException

I have a Spark cluster consisting of 5 nodes, and a Spark job written in Java that reads a set of files from a directory and sends their content to Kafka.
When I was testing the job locally, everything was working fine.
When I tried to submit the job to the cluster, it failed with a FileNotFoundException.
The files to be processed exist in a directory mounted on all 5 nodes, so I am sure the file path that appears in the exception exists.
Here is the exception that appears while submitting the job:
java.io.FileNotFoundException: File file:/home/me/shared/input_1.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The directory /home/me/shared/ is mounted on all 5 nodes.
EDIT:
Here is the command I am using to submit the job:
bin$ ./spark-submit --total-executor-cores 20 --executor-memory 5G --class org.company.java.FileMigration.FileSparkMigrator --master spark://spark-master:7077 /home/me/FileMigrator-0.1.1-jar-with-dependencies.jar /home/me/shared kafka01,kafka02,kafka03,kafka04,kafka05 kafka_topic
I noticed some weird behavior. I submitted the job while the directory contained only one file; the exception was thrown on the driver, but the file was processed successfully. Then I added another file, and the same thing happened. But once I added a third file, the exception was thrown and the job failed.
EDIT 2
After some attempts, we discovered that there was a problem with the mounted directory that caused this weird behavior.
Spark uses HDFS by default. This looks like an NFS file, so try to access it with: file:///home/me/shared/input_1.txt
Yes, three slashes!
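A minimal sketch in Spark's Java API (the class name is taken from the spark-submit command above; the path is the one from the exception):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setAppName("FileSparkMigrator");
JavaSparkContext sc = new JavaSparkContext(conf);
// The file:// scheme (two slashes for the scheme plus one for the absolute
// path, hence three) forces Spark to read from the local/NFS filesystem
// instead of the configured default filesystem (often HDFS).
JavaRDD<String> lines = sc.textFile("file:///home/me/shared/input_1.txt");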
Here is what solved the problem for me. It is weird, and I have no idea what the actual problem was.
I simply asked the sysadmin to mount another directory instead of the one I was using. After that, everything worked fine.
It seems there was an issue with the old mounted directory, but I never found out what it actually was.

Can anyone tell us what the ! means? "java.io.FileNotFoundException: JAR entry WEB-INF/lib/antlr-runtime-3.5.2.jar!/ not found"

I'm trying to run tapestry-atmosphere to write a chat application for a school project. In doing so I ran into an error that even my professors don't understand.
java.io.FileNotFoundException: JAR entry WEB-INF/lib/antlr-runtime-3.5.2.jar!/ not found
Nobody knows what the "!" stands for. If anyone knows, please let all of us know so that this error can be taught to students, present and future.
This is being done on the demo framework from UkLance: tapestry-atmosphere
https://github.com/uklance/tapestry-atmosphere
Below is the error log (from IntelliJ):
2016-10-27 10:26:26.980:WARN:oejw.WebAppContext:Scanner-1: Failed startup of context o.e.j.w.WebAppContext#71a39e83{/tapestry-atmosphere-demo,jar:file:///C:("locationToRoot")/tapestry-atmosphere-master/tapestry-atmosphere-demo/target/tapestry-atmosphere-demo.war!/,null}{C:\("locationToRoot")\tapestry-atmosphere-master\tapestry-atmosphere-demo\target\tapestry-atmosphere-demo.war}
java.io.FileNotFoundException: JAR entry WEB-INF/lib/antlr-runtime-3.5.2.jar!/ not found in C:/("locationToRoot")\tapestry-atmosphere-master\tapestry-atmosphere-demo\target\tapestry-atmosphere-demo.war
at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:142)
at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
at org.eclipse.jetty.webapp.MetaInfConfiguration.getTlds(MetaInfConfiguration.java:409)
at org.eclipse.jetty.webapp.MetaInfConfiguration.scanForTlds(MetaInfConfiguration.java:326)
at org.eclipse.jetty.webapp.MetaInfConfiguration.scanJars(MetaInfConfiguration.java:143)
at org.eclipse.jetty.webapp.MetaInfConfiguration.preConfigure(MetaInfConfiguration.java:94)
at org.eclipse.jetty.webapp.WebAppContext.preConfigure(WebAppContext.java:483)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:519)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)
at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner$1.run(Scanner.java:329)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
[2016-10-27 10:26:27,252] Artifact tapestry-atmosphere-demo:war: Artifact is deployed successfully
[2016-10-27 10:26:27,252] Artifact tapestry-atmosphere-demo:war: Deploy took 2,079 milliseconds
Disconnected from the target VM, address: '127.0.0.1:57612', transport: 'socket'
Thank you for taking the time to read and think about this question.
xxx.jar!/yyy means you are trying to load a resource /yyy from xxx.jar. It's an absolute path to the resource.
Usually something like MyClass.class.getResource("/yyy") is being called.
In this particular case you are loading the resource "" (the empty string) from your jar, and that resource obviously does not exist. Probably the resource name is not being propagated properly.
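For illustration, a minimal sketch of how such a URL arises (Foo and /yyy.txt are hypothetical names):
import java.net.URL;

// getResource returns a jar: URL in which "!/" separates the path of the
// jar file from the path of the entry inside it.
URL url = Foo.class.getResource("/yyy.txt");
System.out.println(url); // e.g. jar:file:/path/to/app.jar!/yyy.txt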

Error when trying to run STORM_TEST on yahoo streaming-benchmark

When I try to test Storm using the Yahoo streaming-benchmark, I get the errors below. I tried changing the port to 2080 instead of the default 2181 in ZooKeeper's zoo.cfg file and in Kafka's server.properties (the exact change is sketched after the log below), but I still get the same error. Any help would be much appreciated. Thanks in advance. :-)
2792 [main] INFO o.a.s.s.o.a.z.s.NIOServerCnxnFactory - binding to port 0.0.0.0/0.0.0.0:2181
2796 [main] ERROR o.a.s.s.o.a.z.s.NIOServerCnxnFactory - Thread Thread[main,5,main] died
java.lang.RuntimeException: No port is available to launch an inprocess zookeeper.
at org.apache.storm.zookeeper$mk_inprocess_zookeeper$fn__2124$fn__2126.invoke(zookeeper.clj:223) ~[storm-core-1.0.1.jar:1.0.1]
at org.apache.storm.zookeeper$mk_inprocess_zookeeper$fn__2124.invoke(zookeeper.clj:219) ~[storm-core-1.0.1.jar:1.0.1]
at org.apache.storm.zookeeper$mk_inprocess_zookeeper.doInvoke(zookeeper.clj:217) ~[storm-core-1.0.1.jar:1.0.1]
at clojure.lang.RestFn.invoke(RestFn.java:439) ~[clojure-1.7.0.jar:?]
at org.apache.storm.command.dev_zookeeper$_main.doInvoke(dev_zookeeper.clj:25) ~[storm-core-1.0.1.jar:1.0.1]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.7.0.jar:?]
at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.7.0.jar:?]
at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.7.0.jar:?]
at org.apache.storm.command.dev_zookeeper.main(Unknown Source) ~[storm-core-1.0.1.jar:1.0.1]
Redis is already running...
WARNING: send already refers to: #'clojure.core/send in namespace: setup.core, being replaced by: #'clj-kafka.new.producer/send
{:redis-host localhost, :kakfa-brokers localhost:9092}
Writing campaigns data to Redis.
Error: Could not find or load main class .home.eranga.Software.kafka-0.10.0.1.config.server.properties
Unrecognized option: --create
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
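For reference, the port change described above amounts to edits like these (these are the standard config keys; file locations vary by installation):
# zoo.cfg
clientPort=2080
# server.properties
zookeeper.connect=localhost:2080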
I described the answer in my first and second comments.
