I am following coursework on Apache Storm from Udacity. The version of Storm being used is 0.9.3.
One of the exercises is to run a topology that contains a bolt written in Python. Briefly, here are the steps I followed. For the purpose of this exercise, my source directory is src and my package is udacity.storm.
Create a directory called resources/ under udacity/storm. Place two Python scripts there: splitsentence.py and storm.py.
Create a bolt SplitSentence under the package udacity.storm. The SplitSentence bolt derives from ShellBolt and implements the IRichBolt interface (a sketch of such a bolt is shown after these steps).
Build the topology using Maven. During the build, also package the resources/ directory within the JAR file.
Submit the topology to storm using the command storm jar target/mytopology.jar udacity.storm.MyTopology.
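For reference, a SplitSentence bolt of this kind typically looks like the following. This is a sketch based on the standard storm-starter example; the output field name "word" and the constructor arguments are illustrative and may differ slightly from my exact code.

import java.util.Map;

import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;

// Multilang bolt: Storm launches the Python script that is packaged
// under resources/ inside the topology jar.
public class SplitSentence extends ShellBolt implements IRichBolt {

    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word")); // illustrative field name
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}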
The topology loads up and dies immediately, and I see the following error on the console:
The storm client can only be run from within a release. You appear to
be trying to run the client from a checkout of Storm's source code.
I took a look at the storm.py code and figured out that this would happen if the lib/ directory is not present in the directory from which the Python script is executing. After putting in some debug statements, I identified that the Python script runs from the following location:
/tmp/06380be9-d413-4ae5-b387-fafe3acf3e65/supervisor/stormdist/tweet-word-count-1-1449502750
I navigate to this directory and find that the lib/ folder is absent.
The Storm Multilang page does not give much information that would help beginners debug this problem.
Any help to solve this problem is greatly appreciated.
As the error says, you are trying to run from within the source code. Just download the binary release (https://storm.apache.org/downloads.html) and follow the setup instructions (https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html).
Afterwards, you can prepare your jar file and submit it to the cluster via bin/storm jar yourJarFile.jar (see https://storm.apache.org/documentation/Command-line-client.html).
There is no need (as long as you don't want to work on Storm itself) to download the source code manually. Just include the corresponding jar files from the binary release in your project. If you use Maven (and only run in local mode), just include the corresponding Maven dependency (see https://storm.apache.org/documentation/Maven.html); there is no need to download the binary release manually in that case.
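For example, a minimal dependency block might look like this (version 0.9.3 is taken from the question; adjust it to your Storm version):

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.9.3</version>
    <!-- add <scope>provided</scope> if you later submit the jar to a real cluster -->
</dependency>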
I found the problem after a fair amount of looking around. The problem is not with the instructions themselves; the storm.py file I had included in my resources directory was an older or incorrect version - I had obtained the URL via a Google search and probably ended up with the wrong one.
The storm.py to be downloaded is from this Github link. I am now able to run the exercises successfully.
Thank you all for your help. I will ensure that I post this up in Udacity forums so that people are aware of the confusion.
In case anyone else experiences this problem:
I had the same issue. However, I couldn't resolve it by copying storm.py from the binary release to my resources directory.
My initial error was "AttributeError: 'module' object has no attribute 'BasicBolt'"
You can add the correct Maven dependency to your pom.xml, which will copy the correct dependencies into your JAR. Add the artifact "multilang-python" with groupId "org.apache.storm" and a version matching your Storm version, then run the clean and package goals to produce the updated JAR file.
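A sketch of the dependency block (the version placeholder is an assumption; it should match the Storm release you are running):

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>multilang-python</artifactId>
    <!-- replace with the version of Storm you are running -->
    <version>${storm.version}</version>
</dependency>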
I'm trying to run a Java project that uses Apache Spark. The project is cloned from git: https://github.com/ONSdigital/address-index-data. I am new to both Spark and Java, which isn't helping me. I can't quite get to the solution using answers to similar questions, e.g. here
If I run the code, as is, from IntelliJ (with correct local Elasticsearch settings in application.conf), then everything works fine - IntelliJ seems to download the required jar files and link them at run time. However, I need to configure the project such that I can run it from the command line. This seems to be a known issue listed in the github project, with no solution offered.
If I run
sbt clean assembly
as in the instructions, it successfully makes a complete JAR file. However, then using
java -Dconfig.file=application.conf -jar batch/target/scala-2.11/ons-ai-batch-assembly-version.jar
this happens:
20/06/16 17:06:41 WARN Utils: Your hostname, MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.163 instead (on interface en0)
20/06/16 17:06:41 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/16 17:06:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/16 17:06:44 WARN Persistence: Error creating validator of type org.datanucleus.properties.CorePropertyValidator
ClassLoaderResolver for class "" gave error on creation : {1}
org.datanucleus.exceptions.NucleusUserException: ClassLoaderResolver for class "" gave error on creation : {1}
at org.datanucleus.NucleusContext.getClassLoaderResolver(NucleusContext.java:1087)
at org.datanucleus.PersistenceConfiguration.validatePropertyValue(PersistenceConfiguration.java:797)
at org.datanucleus.PersistenceConfiguration.setProperty(PersistenceConfiguration.java:714)
From previous posts, e.g. , I think this is because sbt is merging the jar files and information is lost. However, I cannot see how to either:
Merge correctly, or
Collate all the JAR files necessary (including Scala libraries) with a build script that builds the classpath and executes the JAR file with a java command.
How can I proceed? Please keep instructions explicit, as I am really unsure about xml configs etc. And thanks!
So after a long time hitting my head against a wall, I finally managed to solve this one. The answer is mostly in two other stackoverflow solutions (here and here) (huge thanks to those authors!) but I'll add more detail as I still needed more pointers.
As Oscar Korz said, the problem is that "the DataNucleus core tries to load modules as OSGi bundles, even when it is not running in an OSGi container. This works fine as long as the jars are not merged", which I need to do. So, when running "sbt clean assembly", the merged jar wrongly merged the datanucleus plugin files and didn't add the additional OSGi part in MANIFEST.MF.
I will give explicit details (and some tips) as to how I fixed the "fat jar".
To get the bulk of the "fat jar", I run
sbt clean assembly
but I made sure that I had also added a plugin.xml case within assemblyMergeStrategy in build.sbt (using MergeStrategy.first or MergeStrategy.last, so we keep plugin.xml):
assemblyMergeStrategy in assembly := {
...
case "plugin.xml" => MergeStrategy.first
...
}
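For reference, a fuller sketch of that block, keeping the plugin-supplied default strategy for everything else (this follows the usual sbt-assembly pattern; adapt the other cases to your build):

assemblyMergeStrategy in assembly := {
  case "plugin.xml" => MergeStrategy.first
  case x =>
    // fall back to whatever sbt-assembly would have done by default
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}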
This gives a "fat jar" (that still won't work) in the batch/target/scala-XXX folder, where XXX is the scala version used.
Copy the resulting jar file into a separate directory and then unpack it using:
jar xvf your-jar-assembly-0.1.jar
Within the unpacked folder, edit the META-INF/MANIFEST.MF file by adding to the end:
Bundle-SymbolicName: org.datanucleus;singleton:=true
Premain-Class: org.datanucleus.enhancer.DataNucleusClassFileTransformer
Now we need to fix the plugin.xml by merging the 3 datanucleus files. Find and then unpack the original datanucleus jar files (as above) and separate out each plugin.xml (they are different). Anebril's solution in the Stack Overflow answer gives a good start for merging these three files. But I will add a tip to help:
Pipe the contents of the 3 datanucleus plugin.xml files through this command; it will tell you which extension points need merging:
cat plugin_core.xml plugin_rdbms.xml plugin_api.xml | grep -h "extension point" | tr -d "[:blank:]" | sort | uniq -d
You will still need to manually manage the merge of the elements highlighted as duplicates.
Within the unpacked your-jar-assembly-0.1.jar folder, replace the original plugin.xml with your newly merged plugin.xml.
Jar the file up again (but include the manifest!):
jar cmvf META-INF/MANIFEST.MF your-jar-assembly-0.1.jar *
Copy this jar file back into the batch/target/scala-XXX folder (replacing the original).
You can then use
java -Dconfig.file=application.conf -jar batch/target/scala-2.XXX/your-jar-assembly-0.1.jar
to run the fat jar. Voila!
I'm stuck on installation.
I downloaded Maven, but I'm not sure which file within the metadata-extractor folder (that I downloaded from the repo) to use as the target.
I tried all the files at the top level.
All attempts have failed, e.g.
java -jar metadata-extractor-2.13.0.jar build.gradle
com.drew.imaging.ImageProcessingException: File format could not be determined
I am using v2.13.0 of the metadata-extractor.
I've just started to code and took an interest in this project, but also had issues using it. Instead of using Maven, I just downloaded the source code from GitHub and threw the 'com' folder into my myproject/src folder. Then I downloaded the xmpcore-6.1.10.jar library and added it to the Build Path. Got it working that way. I hope someone will give you a better solution, but if you just want to do something right away, you can try this.
Try downloading IntelliJ and creating a "new project from version control". Use the URL given on the GitHub page (under download/code). That will save you a lot of problems.
I've been applying the solutions from other similar questions.
I was getting an image from the res folder using this line:
shell.setImage(new Image(display, ClassLoader.getSystemClassLoader().getResourceAsStream("icon_128.png")));
The file is inside "res" folder in the project.
It worked perfectly until I uploaded my project to a Git repo on Bitbucket. After cloning the project and importing it, my project now crashes because getResourceAsStream("icon_128.png") returns null.
It's frustrating because it works perfectly in the other project, which is not versioned in Git, but crashes only in the newly cloned project directory.
In both versions of the project the file is inside the "res" folder.
What could be happening with this?
Git has nothing to do with it. You didn't give enough detail to be sure what's going on, but two obvious problems come to mind:
[1] getResourceAsStream looks for the named file in the same place that Java looks for class files: the classpath. You're running this code either from an editor, or with java on the command line (where you're either running a jar file with the -jar switch and a build tool added a Class-Path entry to that jar, or you're specifying the classpath on the command line), or with a build tool (in which case it supplies the classpath). icon_128.png needs to be in the root of one of the entries on the classpath, and now it isn't. The fix is to, well, fix that. Maven, for example, copies all resources found in /src/main/resources into any jars it makes. Your icon_128.png should be there.
[2] This isn't the right way to do it. The right way is ClassThisCodeIsIn.class.getResourceAsStream("/icon_128.png") (note the leading slash; it is important). Your version has various somewhat exotic failure cases which this version skips. This version will look specifically in the classpath which produced your class file, and cannot NPE; your version will fail or throw NullPointerExceptions in various cases.
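A minimal sketch of that approach for your case (the class name MainWindow is hypothetical; use whichever class actually contains this code):

import java.io.IOException;
import java.io.InputStream;

import org.eclipse.swt.graphics.Image;
import org.eclipse.swt.widgets.Display;

public class MainWindow {
    // Resolve icon_128.png from the root of the classpath that produced
    // MainWindow.class; the leading slash makes the path absolute.
    static Image loadIcon(Display display) {
        try (InputStream in = MainWindow.class.getResourceAsStream("/icon_128.png")) {
            if (in == null) {
                throw new IllegalStateException("icon_128.png is not on the classpath");
            }
            return new Image(display, in);
        } catch (IOException e) {
            throw new RuntimeException("Failed to read icon_128.png", e);
        }
    }
}

The original call then becomes shell.setImage(loadIcon(display)).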
NB: When you cloned and re-built, the 'build' directories were effectively wiped because you don't check those into source control. That's why it worked before and doesn't now. git isn't to blame; you, or your IDE, copied icon_128.png to the build dir, and that step needs to be repeated every time you clone your git repo. A build tool automates this step and ensures that you can just do a fresh checkout from source control, and then invoke the build tool and all will be well after it finishes.
I am trying to integrate my application with Stardog. The application already accesses other RDF repositories in Java via the Sesame RemoteRepository interface.
Stardog states at http://docs.stardog.com/#_using_sesame that this can be achieved using StardogRepository() - but without saying which libs to include.
After a little searching inside the Stardog 4.1.3 installation, I found and included the following in Eclipse's WEB-INF/lib:
stardog-api-4.1.3.jar
stardog-sesame-core-4.1.3.jar
After that, the Eclipse Java compiler shows no errors in the code, but there is one error in the project build path:
The project was not built since its build path is incomplete. Cannot
find the class file for
org.openrdf.repository.base.AbstractRepository. Fix the build path
then try building this project
After cleaning the project, this problem remains.
The class org.openrdf.repository.base.AbstractRepository is defined in eclipse-rdf4j-2.0.1.jar, which is also present in the WEB-INF/lib folder (the problem is unchanged with or without eclipse-rdf4j-2.0.1.jar).
Which lib shall I include in order to have these 2 lines of code from http://docs.stardog.com/# compile?
Thanks a lot in advance for any hint.
The solution is: include openrdf-sesame-4.1.2-onejar.jar and the Stardog code will compile.
Some weeks ago at work I took over a Java-based back-end web application written using Eclipse. The nature of the application is that it cannot be adequately tested locally, and instead changes need to be tested on our testing network (which involves pushing the changes to an AWS Micro server that we connect to via SSH).
Until now, I pushed changes in the same way as my predecessor: compile the program using Eclipse's Export to Runnable JAR File option, then scp the jar to the remote server and run it. However, this process has a huge problem. While compilation takes only seconds, the jar is well over 30MB, and pushing the entire thing from the office to the remote server over our fairly ordinary internet connection takes well over 10 minutes. If I'm having a particularly bad day and, say, introduce several minor bugs to the code and then discover them one by one, I can easily end up losing an hour or more in total twiddling my thumbs while pushing the whole jar over and over for a series of one-line changes.
Clearly, a saner solution than scping the entire jar for every change would be to simply remotely pull only the changed .java files from source control, and then compile the new version entirely remotely. However, I'm quite new to Java (and indeed programming generally) and all my Java work has been on existing Eclipse projects that I've taken over partway through development. As such, I know very little about compiling Java, and I have found the tutorials about this online are mostly either opaque or completely fail to address the question of how to compile code that uses external libraries.
I will relate here what information about the project's dependencies I can find from Eclipse, and my questions are these: what do I need to copy to the remote server, and where do I need to put it, to be able to compile remotely? What tools, if any, do I need to install on the remote server to be able to compile there? And once I've got everything set up, what do I actually type at the command line to get it to compile?
Anyway, here's what I know about the dependencies and directory structure (I've anonymised our application name by calling it “bunnies”):
The application source code is located in bunnies/src
We compile to bunnies/bin/main.jar
bunnies/dependencies contains three jars of external libraries that we use.
Right-clicking on the project in Eclipse, going to the Java Build Path section, and selecting the Libraries tab, I see
the three libraries above
(appearing in the form, e.g. “json-simple-1.1.1.jar - /home/mark/workspace/bunnies/dependencies”)
a fourth jar file in another location
(“M2_REPO/com/google/guava/guava/r09/guava-r09.jar - /home/mark/.m2/repository/com/google/guava/guava/r09/guava-r09.jar”)
JRE System Library [java-6-openjdk-i386]
But there's more! We also use two libraries, mahout-core and mahout-integration, that are included as separate projects in the same workspace rather than as jar files in the dependencies folder. They appear by name on the Projects tab of the Java Build Path section of the bunnies project, and are located at /home/mark/workspace/mahout-core and /home/mark/workspace/mahout-integration respectively.
Since I am not a Java whiz, perhaps there are also some other hidden dependencies I'm missing, that don't appear in any of the places I've looked so far?
If anyone can walk me through the steps of compiling this huge mess from the command line, without needing to use the Export option in Eclipse, so that I can ultimately compile it all remotely, I would be highly appreciative.
Look into Apache Ant. It's a build tool for Java, sort of like an XML-based Makefile system.
I have a Java system running on a remote server. I have a directory structure separated into /src and /build. I then just scp the .java files from my local machine to the /src folder and build using ant.
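A minimal build.xml sketch for a layout like the one described in the question (directory names taken from there; this is only a starting point - the two Mahout projects and the guava jar from the local Maven repository would also need to be built into jars and placed in dependencies/):

<project name="bunnies" default="jar" basedir=".">
    <!-- All third-party jars from the dependencies/ folder go on the classpath. -->
    <path id="deps">
        <fileset dir="dependencies" includes="*.jar"/>
    </path>

    <target name="compile">
        <mkdir dir="bin/classes"/>
        <javac srcdir="src" destdir="bin/classes"
               classpathref="deps" includeantruntime="false"/>
    </target>

    <!-- Package the compiled classes into bin/main.jar, as in the Eclipse setup. -->
    <target name="jar" depends="compile">
        <jar destfile="bin/main.jar" basedir="bin/classes"/>
    </target>
</project>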