Find topology jar version running in Apache Storm - java

I have a running Storm topology started from a packaged jar. I'm trying to find the version of the jar the topology is running from. As far as I can tell, Storm will only show the version of Storm that is running, not the version of the topology itself.
Running the "storm version" command only gives the version of Storm that is running, and I don't see anything in the topology section of the Storm UI that indicates the topology version.
Is there any way to have Storm report this, or is my best bet to set the version in a properties file? Ideally, this would be done automatically from either the pom.xml version or a git commit hash. Another solution I'd be happy with would be to have Storm report the name of the jar file used to start the topology.

One way could be to list the
$STORM_HOME/storm-local/supervisor/stormdist/name-of-your-running-topology
directory while the topology is running and look at the stormjar.jar file. This is the uber jar Storm uses when submitting the topology. Compare its size with the uber jar produced by your Java project's build command; if they are identical, that gives you a hint about which jar version is deployed.
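If your build writes version information into the jar manifest (for example, an Implementation-Version entry, which the Maven jar or shade plugins can be configured to add), you can also read it straight out of the deployed jar. A minimal sketch, assuming the directory name above and standard command-line tools:
cd $STORM_HOME/storm-local/supervisor/stormdist/name-of-your-running-topology
md5sum stormjar.jar                         # compare against a checksum of your locally built uber jar
unzip -p stormjar.jar META-INF/MANIFEST.MF  # shows Implementation-Version if your build records it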

Related

NoSuchMethodError: org/apache/hadoop/mapreduce/util/MRJobConfUtil.setTaskLogProgressDeltaThresholds

I am getting the following error while executing a MapReduce job in my Hadoop cluster (a distributed cluster).
I found the error below in the YARN application logs, where the mapper fails.
java.lang.NoSuchMethodError: org/apache/hadoop/mapreduce/util/MRJobConfUtil.setTaskLogProgressDeltaThresholds(Lorg/apache/hadoop/conf/Configuration;)V (loaded from file:/data/hadoop/yarn/usercache/hdfs-user/appcache/application_1671477750397_2609/filecache/11/job.jar/job.jar by sun.misc.Launcher$AppClassLoader#8bf41861) called from class org.apache.hadoop.mapred.TaskAttemptListenerImpl
The hadoop version is Hadoop 3.3.0
Okay, that method exists in that version, but not in 3.0.0.
Therefore, you need to use the hadoop-client dependency of that version, not 3.0.0-cdh6.... Also, use compileOnly with it, not implementation. That way it does not conflict with what YARN already has on its classpath.
Similarly, spark-core would have the same problem; if you already have Spark in your app, use it rather than the MapReduce APIs.
Run gradle dependencies, then search for the Hadoop libraries and ensure they are all 3.3.0.
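A quick way to check that, assuming the project uses the Gradle wrapper (the configuration names may differ for your build):
./gradlew dependencies --configuration compileClasspath | grep -i hadoop   # hadoop-client should resolve to 3.3.0
./gradlew dependencies --configuration runtimeClasspath | grep -i hadoop   # nothing here should drag in 3.0.0-cdh6 artifacts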

Upgrade apache storm (1.0.0 to 1.2.3)

I need to update Apache Storm 1.0.0 to the latest 1.x.x version (1.2.3). What steps are needed? What changes are there in the Storm configuration file? Is there a migration guide that would help?
Thank you so much.
I don't believe (but could be wrong) that there are any breaking changes in the configuration file between 1.0.0 and 1.2.3. Migration should follow the usual steps for Storm upgrades:
Upgrade your Storm hosts one at a time by
Downloading the 1.2.3 binary release
Shutting down the Storm services running on the host (e.g. workers, supervisor, Nimbus)
Extracting the 1.2.3 release, and making any adjustments to the configuration files you had in the old install
Restarting your Storm services on that host, using the new install instead of the old install
You may also want to
Upgrade your topologies to build against the version of Storm you're upgrading to (optional, but can help catch any compile-time errors in non-core modules that may have changed within the major version, such as storm-kafka-client)
This way you can upgrade with no downtime. Of course if you don't care about downtime you can just shut down the whole cluster, do the steps outlined above, then restart the cluster using the new installs.
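At the command level, the upgrade of a single host might look roughly like this; a sketch assuming a tarball install under /opt, with the download location and paths being illustrative:
wget https://archive.apache.org/dist/storm/apache-storm-1.2.3/apache-storm-1.2.3.tar.gz
tar -xzf apache-storm-1.2.3.tar.gz -C /opt
cp /opt/apache-storm-1.0.0/conf/storm.yaml /opt/apache-storm-1.2.3/conf/storm.yaml   # then re-apply any local config adjustments
# stop the old supervisor/nimbus/UI processes on this host, then start them from the new install, e.g.
/opt/apache-storm-1.2.3/bin/storm supervisor &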
Edit: You're asking about the full diff of configuration options between 1.2.3 and 2.0.0. You can find every configuration option for 1.2.3 here, and the full list of configuration options for 2.0.0 here and here.

Apache Nutch 1.9 in local Eclipse to run on Amazon EMR remotely

I am on Windows 8 32 bit, running Eclipse Juno.
I have just started working on Amazon EMR. So far, I have been able to connect to EMR remotely from my local machine using SSH and from inside Eclipse. I could run my custom JAR on EMR remotely by creating an AWS project in Eclipse and using the Custom JAR execution on EMR commands.
I am now trying to run Apache Nutch 1.9 from inside Eclipse. I did an Ant build to create the Nutch Eclipse project and was able to bring it into my Eclipse workspace successfully. Now, when I run the Injector I get the following error:
Injector: starting at 2015-04-20 00:56:08
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Kajari_G\mapred\staging\Kajari_G881485826\.staging to 0700
I found out this is due to Hadoop permission issues. After a lot of searching online I realized this is a common issue on Windows. I ran it via Cygwin as Admin and still couldn't fix it.
So, now I want to still run the Injector code, but I want to run it on my remote EMR cluster, instead of in my local.
Can you please guide me on how to tell my Apache Nutch Eclipse project to run on Amazon EMR and not locally? I don't want to create a JAR and run it. I want to run it with the usual Run As --> option in Eclipse.
Is this possible to do at all? I did search this online, but couldn't find any working solution.
Thanks!
As far as I know you cannot run Nutch in distributed mode from Eclipse. In order to run Nutch on a Hadoop cluster you have to follow these steps:
Apply your required configuration in nutch-site.xml and the other config files (according to the selected plugins)
Build Nutch using ant runtime
Look in the runtime/deploy directory to find the Nutch Hadoop job file
Run the following command:
hadoop jar nutch-${version}.job ${your_main_class} ${class_parameters}
For example, suppose your main crawler class is org.apache.nutch.crawl.crawler; in that case the command would be:
hadoop jar nutch-${version}.job org.apache.nutch.crawl.crawler urls -dir crawl -depth 2 -topN 1000
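Since the step that failed locally was the Injector, the same pattern applies there too. A sketch, assuming the standard Nutch 1.x Injector class with its usual <crawldb> <url_dir> arguments; the .job file name depends on your build:
hadoop jar runtime/deploy/apache-nutch-1.9.job org.apache.nutch.crawl.Injector crawl/crawldb urls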

Apache Samza does not run

I am trying to set up an Apache Samza and Kafka environment. I am running into some problems when trying to run the modules.
I have Kafka working correctly but I cannot make Samza work. I have installed two Debian Jessie AMD64 boxes and followed the instructions in the Samza documentation:
apt-get install openjdk-7-jdk openjdk-7-jre git maven
git clone http://git-wip-us.apache.org/repos/asf/samza.git
cd samza
./gradlew clean build
When I try to launch the YARN ApplicationMaster with the script provided with Samza:
/opt/samza/samza-shell/src/main/bash/run-am.sh
I get this error:
Error: Main class org.apache.samza.job.yarn.SamzaAppMaster has not been found or loaded
If I try to run a test job with the run-job.sh script
./run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
I get a similar error referencing the org.apache.samza.job.JobRunner class.
I am thinking that I have a java configuration issue, but I am not able to find much help or reference.
Does anyone know what I am doing wrong?
Still not working, but I have made some progress. The Samza shell scripts expect to be located in a bin/ folder, with a sibling lib/ folder containing all the Samza .jar files.
I am still having some dependencies issues, but different ones.
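For what it's worth, the deploy/samza path in the run-job.sh command above matches the layout the hello-samza tutorial project produces when its dist tarball is extracted. A sketch, assuming that project and its default artifact naming:
mvn clean package
mkdir -p deploy/samza
tar -xzf target/hello-samza-*-dist.tar.gz -C deploy/samza   # artifact name varies with the project version
ls deploy/samza/bin deploy/samza/lib                        # shell scripts land in bin/, Samza and job jars in lib/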

Hadoop library conflict at mapreduce time

I have a jar that uses the Hadoop API to launch various remote MapReduce jobs (i.e., I'm not using the command line to initiate the jobs). The service jar that executes the various jobs is built with Maven's "jar-with-dependencies".
My jobs all run fine except one that uses commons-codec 1.7, I get:
FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeAsString([B)Ljava/lang/String;
I think this is because my jar includes commons-codec 1.7, whereas my Hadoop install's lib has commons-codec 1.4 ...
Is there any way to instruct Hadoop to use the distributed commons-codec 1.7 (I assume this is distributed as a job dependency) rather than the commons-codec 1.4 in the Hadoop 1.0.3 core lib?
Many thanks!
Note: Removing commons-codec-1.4.jar from my Hadoop library folder does solve the problem, but doesn't seem too sane. Hopefully there is a better alternative.
Two approaches:
You should be able to exclude commons-codec from within the hadoop dependency and add another explicit dependency for commons-codec
Try setting the scope to provided so that none of the hadoop jars get included. This assumes that those jars would be in the runtime class path.
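Either way, it is worth confirming which commons-codec version Maven actually resolves into the uber jar. A quick check, assuming a standard Maven project (the includes pattern filters by groupId):
mvn dependency:tree -Dincludes=commons-codec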
