Apache spark project with single executable JAR with DataNucleus - java

I'm trying to run a Java project that uses Apache Spark. The project is cloned from git: https://github.com/ONSdigital/address-index-data. I am new to both Spark and Java, which isn't helping me. I can't quite get to a solution using the answers to similar questions.
If I run the code, as is, from IntelliJ (with correct local Elasticsearch settings in application.conf), then everything works fine - IntelliJ seems to download the required jar files and link them at run time. However, I need to configure the project such that I can run it from the command line. This seems to be a known issue listed in the github project, with no solution offered.
If I run
sbt clean assembly
as in the instructions, it successfully makes a complete JAR file. However, then using
java -Dconfig.file=application.conf -jar batch/target/scala-2.11/ons-ai-batch-assembly-version.jar
this happens:
20/06/16 17:06:41 WARN Utils: Your hostname, MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.163 instead (on interface en0)
20/06/16 17:06:41 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/16 17:06:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/16 17:06:44 WARN Persistence: Error creating validator of type org.datanucleus.properties.CorePropertyValidator
ClassLoaderResolver for class "" gave error on creation : {1}
org.datanucleus.exceptions.NucleusUserException: ClassLoaderResolver for class "" gave error on creation : {1}
at org.datanucleus.NucleusContext.getClassLoaderResolver(NucleusContext.java:1087)
at org.datanucleus.PersistenceConfiguration.validatePropertyValue(PersistenceConfiguration.java:797)
at org.datanucleus.PersistenceConfiguration.setProperty(PersistenceConfiguration.java:714)
From previous posts, I think this is because sbt is merging the jar files and information is lost. However, I cannot see how to either
Merge correctly, or
Collate all the JAR files necessary (including Scala libraries) with a build script that builds the classpath and executes the JAR file with a java command.
How can I proceed? Please keep instructions explicit, as I am really unsure about xml configs etc. And thanks!

So after a long time hitting my head against a wall, I finally managed to solve this one. The answer is mostly in two other Stack Overflow answers (here and here) (huge thanks to those authors!), but I'll add more detail as I still needed more pointers.
As Oscar Korz said, the problem is that "the DataNucleus core tries to load modules as OSGi bundles, even when it is not running in an OSGi container. This works fine as long as the jars are not merged", and merging them is exactly what I need to do. So, when running "sbt clean assembly", the DataNucleus plugin.xml files were merged incorrectly and the additional OSGi entries were not added to MANIFEST.MF.
I will give explicit details (and some tips) as to how I fixed the "fat jar".
To get the bulk of the "fat jar", I run
sbt clean assembly
but I made sure that I had also added the plugin.xml within assemblyMergeStrategy in build.sbt (using first or last, so we keep the plugin.xml):
assemblyMergeStrategy in assembly := {
...
case "plugin.xml" => MergeStrategy.first
...
}
This gives a "fat jar" (that still won't work) in the batch/target/scala-XXX folder, where XXX is the scala version used.
Copy the resulting jar file into a separate directory and then unpack it using:
jar xvf your-jar-assembly-0.1.jar
Within the unpacked folder, edit the META-INF/MANIFEST.MF file by adding to the end:
Bundle-SymbolicName: org.datanucleus;singleton:=true
Premain-Class: org.datanucleus.enhancer.DataNucleusClassFileTransformer
Now we need to fix the plugin.xml by merging the 3 DataNucleus plugin.xml files. Find and then unpack the original DataNucleus jar files (as above) and separate out each plugin.xml (they are different). Anebril's answer on Stack Overflow gives a good starting point for merging these three files, but I will add a tip to help:
Pipe the contents of the 3 DataNucleus files through this command, which will tell you which extensions need merging:
cat plugin_core.xml plugin_rdbms.xml plugin_api.xml | grep -h "extension point" | tr -d "[:blank:]"| sort | uniq -d
You will still need to manually manage the merge of the elements highlighted as duplicates.
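The same duplicate check can be sketched in Java for anyone without a Unix shell. This is a minimal illustration, not part of the original fix: the class and method names are made up, and unlike the `uniq -d` pipeline it only reports extension points that appear in more than one file (it ignores repeats inside a single file).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names are illustrative only.
final class DuplicateExtensionPoints {
    // Matches <extension point="some.id" ...> and captures the id.
    private static final Pattern EXTENSION =
            Pattern.compile("<extension\\s+point\\s*=\\s*\"([^\"]+)\"");

    /** Returns, sorted, the extension-point ids found in more than one plugin.xml text. */
    static Set<String> sharedPoints(List<String> pluginXmlTexts) {
        Set<String> seenInEarlierFiles = new HashSet<>();
        Set<String> shared = new TreeSet<>();
        for (String xml : pluginXmlTexts) {
            // Collect the ids of this file first so in-file repeats are not counted.
            Set<String> inThisFile = new HashSet<>();
            Matcher m = EXTENSION.matcher(xml);
            while (m.find()) {
                inThisFile.add(m.group(1));
            }
            for (String id : inThisFile) {
                if (!seenInEarlierFiles.add(id)) {
                    shared.add(id);
                }
            }
        }
        return shared;
    }
}
```

Passing the contents of plugin_core.xml, plugin_rdbms.xml and plugin_api.xml to sharedPoints should list the ids whose extension elements need to be combined by hand.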
Within the unpacked your-jar-assembly-0.1.jar folder, replace the original plugin.xml with your newly merged plugin.xml.
Repack the jar file (making sure to include the edited manifest!):
jar cmvf META-INF/MANIFEST.MF your-jar-assembly-0.1.jar *
Copy this jar file back into the batch/target/scala-XXX folder (replacing the original).
You can then use
java -Dconfig.file=application.conf -jar batch/target/scala-XXX/your-jar-assembly-0.1.jar
to run the fat jar. Voila!
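Before repacking, it can help to confirm that the edited manifest still parses and carries the two added entries. The following is a minimal sketch using the standard java.util.jar.Manifest class; the class and method names are hypothetical and this check was not part of the original fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: names are illustrative only.
final class ManifestCheck {
    /** True when the manifest carries both entries added by hand in the steps above. */
    static boolean hasDataNucleusEntries(InputStream manifestStream) throws IOException {
        // Manifest parses the "Name: value" lines of the main section.
        Attributes main = new Manifest(manifestStream).getMainAttributes();
        String bundle = main.getValue("Bundle-SymbolicName");
        String premain = main.getValue("Premain-Class");
        return bundle != null
                && bundle.startsWith("org.datanucleus")
                && "org.datanucleus.enhancer.DataNucleusClassFileTransformer".equals(premain);
    }
}
```

Calling it with a FileInputStream opened on META-INF/MANIFEST.MF in the unpacked folder should return true once both lines are in place. Note that the manifest must end with a newline, otherwise the last line may be ignored.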

Related

Include library for JAXB into JAR file without using Maven

I have a Java project in Eclipse that uses javax.xml.bind.JAXB classes.
Starting the application from inside Eclipse works perfectly.
However, when I export the project as a (runnable) jar file and run it using java -jar myfile.jar, it terminates with a java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException.
Also playing around with the three options for Library handling in Eclipse Runnable JAR File Specification (extract, package, sub-folder) does not solve the problem - in fact, no libraries are exported in any case.
It seems that the library for JAXB (it seems to be rt.jar) is not considered as required library to be included into the jar file. However, when running the jar file, it is not found nevertheless.
I have read that the library must be added to the classpath but this seems strange to me as rt.jar is part of the standard libraries. Is there something special about this library?
Currently, I do not use Maven or something similar for dependency and build management and if possible I want to avoid it for the future. I think, there also must be a way without Maven.
I found several posts here on SO and in Google but was not able to work out a solution for me.
Thank you very much!
As remarked in the comments, Eclipse probably uses a different Java version than your system (by default). The JAXB API and implementation is not available in JRE 11.
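To see which situation applies, a small probe can be run both from Eclipse and from the command line. This is only a sketch with made-up names: it reports the running Java version and whether the javax.xml.bind or jakarta.xml.bind API classes can be found on the classpath.

```java
// Hypothetical helper: names are illustrative only.
final class JaxbProbe {
    /** True when the named class can be located by the application class loader. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, JaxbProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** One-line summary of the Java version and which JAXB API packages resolve. */
    static String report() {
        return "java.version=" + System.getProperty("java.version")
                + " javax.xml.bind=" + isAvailable("javax.xml.bind.JAXBException")
                + " jakarta.xml.bind=" + isAvailable("jakarta.xml.bind.JAXBException");
    }
}
```

Printing JaxbProbe.report() in both environments should make a version mismatch visible: if javax.xml.bind resolves in Eclipse but not on the command line, the two are using different Java versions or classpaths.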
To work on all versions of Java, your best option is:
Download the JAXB RI distribution. Nowadays I'd choose version 3.0 (which is binary incompatible with the one in Java 8, since it uses jakarta.xml instead of javax.xml for the package names), as in Juliano's answer:
https://repo1.maven.org/maven2/com/sun/xml/bind/jaxb-ri/3.0.0/jaxb-ri-3.0.0.zip
Copy the 4 files jakarta.activation.jar, jakarta.xml.bind-api.jar, jaxb-core.jar and jaxb-impl.jar from the mod folder into the library folder of your project (let's say lib).
Add the 4 libraries to the project's "Build Path".
Make sure you use JAXB 3.0 throughout your code (the packages of the annotations and classes start with jakarta.xml).
Run the application once in Eclipse, so it updates the Run Configuration (or update the classpath of the Run Configuration yourself).
Export the project to a JAR file.
Among the three export options proposed by Eclipse:
"extract required libraries" will create a so-called fat jar (everything in one JAR file). It works, but it deletes the licence notices in the JAXB jars (so it cannot be distributed).
"copy required libraries" is your best option, but then you have to move the jar file together with the subfolder.
"package required libraries" will not work, since jars in a jar are not read by the JVM (unlike JARs in a WAR package).
Edit by the author of the question:
The above worked well for me, except that I experienced small differences in how the two libraries (javax.xml in Java 8 and jakarta.xml in version 3.0) handle @XmlAttribute annotations. In javax.xml, I could place an annotation without further arguments on the public getter method, e.g.
@XmlAttribute
public String getDescription() {
    return "";
}
And this worked when the attribute name in the xml file is description. However, with jakarta.xml I had to add the name of the attribute:
@XmlAttribute(name="description")
public String getDescription() {
    return "";
}
I mention this in case others experience the same problem.
I thought about this myself too, since I am new to Java.
The Java tutorials (SE) describe an Extension Mechanism, but it is no longer used since Oracle deprecated it. For context, see: https://docs.oracle.com/javase/tutorial/ext/index.html
In a nutshell, the Extension Mechanism let you drop your jar files inside the JDK lib directory, after which you could use the import keyword in all your classes to use the new jar file.
However, others had to do the same thing on their computers to run a class that imported your addition to the JDK.
Maven does something similar. It reads the pom file to find which other jar files it should include when you build an application. Hence, the result may run anywhere.
Another approach is the one described in the other answer, which you should try.
A clunkier way of doing what Maven does, without its pom structure, is to create a new folder inside your src folder and copy jakarta.xml.bind-api.jar into it, just as you would when you create an object (a JavaBean) and need to use it in another class.
The file you need to include in your library is available at:
https://repo1.maven.org/maven2/com/sun/xml/bind/jaxb-ri/3.0.0/jaxb-ri-3.0.0.zip
Finally, extract the classes inside this newly created folder and use the import keyword in the classes that depend on it just like when you create your own classes.
Another thing you should try is to use the manifest file when making your jar.
https://docs.oracle.com/javase/tutorial/deployment/jar/manifestindex.html
This tutorial shows how to include a classpath to the files you need to run as a dependency. Make sure that everything you need is inside the newly created jar file.
Also, set the entry point in the manifest, so your application can run just using
java -jar MyJar.jar
in the command line.
The easiest way is to use JDK 8 (or an older JDK), which bundles the required JAXB library. The harder way requires setting your CLASSPATH variable to point to each required JAXB jar file.
According to the release documentation at https://javaee.github.io/jaxb-v2/doc/user-guide/release-documentation.html#a-2-3-0, the following jars are required when using Java 11 or above:
jaxb-api.jar
jaxb-core.jar
jaxb-impl.jar
A good article on this question is https://www.jesperdj.com/2018/09/30/jaxb-on-java-9-10-11-and-beyond/

ClassLoader.getSystemClassLoader().getResourceAsStream returns null after cloning Project

I've been applying the solutions from other similar questions.
I was getting an image from the res folder using this line:
shell.setImage(new Image(display, ClassLoader.getSystemClassLoader().getResourceAsStream("icon_128.png")));
The file is inside "res" folder in the project.
It worked perfectly until I uploaded my project to a Git repo in Bitbucket. After cloning the project and importing it, my project now crashes because getResourceAsStream("icon_128.png") returns null.
It's frustrating because it works perfectly in the other project, which is not versioned in Git, but crashes only in my newly cloned project directory.
In both versions of the project the file is inside the "res" folder.
What could be happening with this?
git has nothing to do with it. You didn't give enough detail to be sure about what's going on, but 2 obvious problems come to mind:
[1] getResourceAsStream looks for the named file in the same place that Java looks for class files: the classpath. You're running this code from an editor, with java on the command line, or with a build tool. With java -jar, the classpath is the jar plus any Class-Path entry a build tool added to its manifest; without -jar, you specify the classpath on the command line; a build tool supplies the classpath itself. In every case, icon_128.png needs to be in the root of one of the entries on the classpath, and now it isn't. The fix is to, well, fix that. Maven, for example, copies all resources found in /src/main/resources into any jars it makes. Your icon_128.png should be there.
[2] This isn't the right way to do it. The right way is ClassThisCodeIsIn.class.getResourceAsStream("/icon_128.png") (note: The starting slash; it is important). Your version has various somewhat exotic fail cases which this version skips. This version will look specifically in the classpath which produced your class file, and cannot NPE; your version will fail or throw NullPointerExceptions in various cases.
NB: When you cloned and re-built, the 'build' directories were effectively wiped because you don't check those into source control. That's why it worked before and doesn't now. git isn't to blame; you, or your IDE, copied icon_128.png to the build dir, and that step needs to be repeated every time you clone your git repo. A build tool automates this step and ensures that you can just do a fresh checkout from source control, and then invoke the build tool and all will be well after it finishes.
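Point [2] can be sketched as a small helper. The class and method names here are made up for illustration. Note that Class.getResourceAsStream still returns null when the resource is not on the classpath, so the sketch turns that into an explicit exception instead of letting a null reach new Image(...).

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical helper: names are illustrative only.
final class Resources {
    /**
     * Opens a classpath resource relative to the classpath root.
     * absolutePath must start with "/", e.g. "/icon_128.png".
     */
    static InputStream open(Class<?> anchor, String absolutePath) throws FileNotFoundException {
        InputStream in = anchor.getResourceAsStream(absolutePath);
        if (in == null) {
            // Fail with a clear message instead of a later NullPointerException.
            throw new FileNotFoundException("Not on the classpath: " + absolutePath);
        }
        return in;
    }
}
```

In the question's code this would be used as new Image(display, Resources.open(ClassThisCodeIsIn.class, "/icon_128.png")), which fails with a readable message when the icon was not copied to the build output.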

java.lang.ClassNotFoundException when running in IntelliJ IDEA

I am creating a program to work with databases and I am getting the following error when compiling in IntelliJ IDEA. Does anyone know why this is happening and how I could solve it?
The error that you get occurs not on compilation, but when you try to run your application. It happens because Java was not able to find the Table.class file inside the db subdirectory of the project output directory (classpath).
It can happen for multiple reasons:
wrong main class selected in the run/debug configuration
Table.java is excluded from compilation (by accident or intentionally because it contained errors and you wanted to skip it while working on other code)
class not compiled because the Build step is excluded from the Before launch steps in the Run/Debug configuration
project is misconfigured and there is no Source root defined for the directory containing db subdirectory
Table.java has incorrect package statement or is located/moved to a different package
project path contains a colon : on Mac/Linux or semicolon ; on Windows, it's used to separate the classpath and will render the classpath invalid. See this thread for details. Note that Finder on Mac may display colons in the path as slashes.
the jar may not execute if one of the dependent jars is digitally signed since the new artifact will include the partial signature of the dependency. See this answer for more details.
In project structure make sure you have the right Java version for compile.
there is a known bug that sometimes a Java project created from the Command Line template doesn't work because .idea/modules.xml file references invalid module file named untitled104.iml. Fix the module name manually or create a project from scratch and don't use a template.
on Windows "Beta: Use Unicode UTF-8 for worldwide language support" Region Setting is enabled. See IDEA-247837 for more details and workarounds.
When IntelliJ IDEA is configured to store module dependencies in Eclipse format source root configuration is lost due to a known bug. Configure the module to use IntelliJ IDEA format dependencies as a workaround.
In a properly configured project and with the correct run/debug configuration everything works just fine:
the jar may not execute if one of the dependent jars is digitally signed since the new artifact will include the partial signature of the dependency. See this answer for more details.
I must again emphasize the point CrazyCoder makes here.
The (Oracle) JVM used to throw a SecurityException when you tried to run a Jar-File containing broken signatures. This made sense from a "What's wrong"-Point of view.
That is no longer the case. They are indeed throwing ClassNotFoundExceptions now - even if the class is right there in the file (no matter if it is in the default package/toplevel or way down in a nested package structure).
Here's what worked for me:
I deleted the .idea folder, the .iml file, and all other files auto-generated by IntelliJ, then restarted my IDE. I was asked whether I wanted to make my project run with Maven, and that's it.
Obviously I said yes :)
This is a known bug in IntelliJ IDEA.
To fix this I just deleted the .iml file and the .idea folder and restarted the IDE.
It works in most cases.
Edit: The files will be in the project directories.
In my case the default console app template works only if the project folder path does not contain underscore (_) in it. Underscore brings the error
Error: Could not find or load main class com.company.Main
Caused by: java.lang.ClassNotFoundException: com.company.Main
IntelliJ IDEA 2021.3.1 (Ultimate Edition)
Build #IU-213.6461.79, built on December 28, 2021
If you've tried everything else that others have suggested (deleting the .idea folder, rebuilding, etc.), there's another place to check, especially if you've built an artifact jar. When you first build an artifact jar, IntelliJ adds a folder, META-INF, to the src directory. In it is a single file, MANIFEST.MF, which has info pointing to the Main-Class for Java to find. If you've refactored your project package, unfortunately IntelliJ does not update this file with the new changes. My MANIFEST.MF has the following correct content:
Manifest-Version: 1.0
Main-Class: org.umoja4life.fatashibackend.MainKt
Where "org.umoja4life.fatashibackend" is the package name, and "MainKt" is IntelliJ's constructed name for a (pseudo) "Main Class" because fun main() has been defined in file "main.kt" in the package directory.
Newbies: btw, This will be confusing for you because there should be no actual "class Main {}" definition despite the error message stating there should be.
Before I discovered this file and after trying everyone else's suggestions, I found it quickest to just have IntelliJ start a project (with correct package name!), initialize it with a trivial main.kt having:
fun main() { println("hello world!") }
run and test that; then, I added back in all my other files, rebuilt, ran, and tested it. Apparently IntelliJ has some secret state information stored somewhere which doesn't get correctly updated if you refactor your package name for an already running project and jar.

Error while running Storm MultiLang using Python

I am following a course work on Apache Storm from Udacity. The version of storm being used is 0.9.3
One of the exercises there is to run a topology which contains a bolt written in Python. Briefly here are the steps followed. For the purpose of this exercise my source directory is src and my package is udacity.storm
Create directory called resources/ under udacity/storm. Place two python scripts there - splitsentence.py and storm.py.
Create a bolt SplitSentence under the package udacity.storm. SplitSentence bolt derives from ShellBolt and implements the IRichBolt interface.
Build the topology using maven. During the process also package the resources/ directory within the JAR file.
Submit the topology to storm using the command storm jar target/mytopology.jar udacity.storm.MyTopology.
The topology loads up and dies immediately and I see the following error on the console
The storm client can only be run from within a release. You appear to
be trying to run the client from a checkout of Storm's source code.
I took a look at the storm.py code and figured out that this would happen if the lib/ directory is not present in the directory from where the python script is executing. After putting in some debug statements I identified that the python script runs from the following location :
/tmp/06380be9-d413-4ae5-b387-fafe3acf3e65/supervisor/stormdist/tweet-word-count-1-1449502750
I navigate to this directory and find that the lib/ folder is absent.
The Storm Multilang page does not give much information that would be helpful for beginners to debug the problem being faced.
Any help to solve this problem is greatly appreciated.
As the error says, you are trying to run from within the source code. Just download the binary release https://storm.apache.org/downloads.html and follow the setup instructions https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html
Afterwards, you can prepare your jar file and submit to the cluster via bin/storm jar yourJarFile.jar (see https://storm.apache.org/documentation/Command-line-client.html)
There is no need (as long as you don't want to work on Storm itself) to download the source code manually. Just include the corresponding jar files from the binary release to your project. If you use maven (and only run in local mode), just include the corresponding maven dependency (see https://storm.apache.org/documentation/Maven.html); there is no need to download the binary release manually for this case.
I got the problem after some fair amount of looking around. Actually, the problem is not within the instructions themselves but because the storm.py file I had included in my resources directory was of an older or incorrect version - I had obtained the URL via a Google search and probably ended up with an incorrect one.
The storm.py to be downloaded is from this Github link. I am now able to run the exercises successfully.
Thank you all for your help. I will ensure that I post this up in Udacity forums so that people are aware of the confusion.
In case anyone else experiences this problem:
I had the same issue. However, I couldn't resolve it by copying storm.py from the binary release to my resources directory.
My initial error was "AttributeError: 'module' object has no attribute 'BasicBolt'"
You can add the correct Maven dependency to your pom.xml which will copy the correct dependencies into your JAR. Add the artifact "multilang-python", groupId "org.apache.storm" with version matching your Storm version, then run the clean and package goals to produce the updated JAR file.

Class-Path setting in executable jar doesn't seem to work for plugins

I've looked here and the wider web to find a solution to this. There's related material, but I've been unable to find anything useful about my specific question.
I'm working on some Java software that needs to accept plugins. I don't want to use a fancy framework like OSGi, and ServiceLoader seems to offer the right level of support. I've basically got it working but am having a problem with classpaths. My directory structure is as follows:
progfolder
|___________ plugintest.jar
|___________/plugins
|________ plugin1.jar
|________ plugin2.jar
If I run plugintest.jar with java -jar plugintest.jar then it doesn't find the plugins even if I add ./plugins/ (or variations of this) to the Class-Path: in the manifest. Reading suggests that this only works for classes, not jars, so I've tried putting the classes from the two plugins inside plugins both directly and within their full path of directories, but with no success.
I'm not allowed to add -cp plugins/* to add the plugins folder to the classpath if I'm using the -jar option. To get round this, I can run using java -cp plugintest.jar;plugins/* com.plugin.test.Main and this works as expected - the two plugins are detected and accessible via code, but the command line is a bit clunky, although I could live with it, if it's the best option.
I found another solution where I create a classloader for jars found in plugins, which works in this simple case, but reading suggests I might run into security issues in a more complex application.
Is there a way to fix things so I can simply run with java -jar plugintest.jar without having to do my own class loading or is this just the way it is?
Ok, so at least a partial answer, following more experimentation. Putting the class files in the plugins directory does work, after all, but you have to remember to include the META-INF directory and META-INF/services. The file in the services directory has to include references to all the plugins.
It would be nice if there was a solution that allowed the plugin jar files to be used directly, but creating a class loader seems to be the only way to do this (that I've found, at least), and this may cause security issues, as previously noted.
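For reference, the class-loader approach mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the Greeter interface stands in for whatever service interface the plugins implement, the class and method names are hypothetical, and no security checks are applied to the jars found.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper: names are illustrative only.
final class PluginLoader {
    /** Stand-in for the real plugin service interface (an assumption for this sketch). */
    interface Greeter {
        String greet();
    }

    /** Collects a URL for every *.jar file directly inside dir. */
    static URL[] jarUrls(Path dir) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    /** Loads every provider of the service declared in META-INF/services of the jars in dir. */
    static <S> List<S> loadPlugins(Class<S> service, Path dir) throws IOException {
        // The loader is deliberately left open: plugin classes may load lazily later on.
        URLClassLoader loader = new URLClassLoader(jarUrls(dir), service.getClassLoader());
        List<S> plugins = new ArrayList<>();
        for (S plugin : ServiceLoader.load(service, loader)) {
            plugins.add(plugin);
        }
        return plugins;
    }
}
```

With this in place the application can still be started with java -jar plugintest.jar, calling loadPlugins with the service interface and the plugins directory at startup. Each plugin jar keeps its own META-INF/services file, so nothing has to be merged.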
I faced a similar problem recently [1] and found the answer in the Java documentation [2]:
Note: The Class-Path header points to classes or JAR files on the local network, not JAR files within the JAR file or classes accessible over Internet protocols. To load classes in JAR files within a JAR file into the class path, you must write custom code to load those classes. For example, if MyJar.jar contains another JAR file called MyUtils.jar, you cannot use the Class-Path header in MyJar.jar's manifest to load classes in MyUtils.jar into the class path.
[1] https://github.com/narvi-blog/01-exec-jar#dependency-jar-files-within-an-executable-jar-are-not-so-easy
[2] https://docs.oracle.com/javase/tutorial/deployment/jar/downman.html
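Since the Class-Path header accepts JAR files, one possible workaround for the plugin layout in the question is to list each plugin jar explicitly, separated by spaces and relative to the main jar. The sketch below builds such a manifest with java.util.jar.Manifest; the class and method names are hypothetical, and the obvious limitation is that the plugin file names must be known when the jar is built.

```java
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: names are illustrative only.
final class ClassPathManifest {
    /** Builds a manifest whose Class-Path lists the given jars, relative to the main jar. */
    static Manifest build(String mainClass, List<String> relativeJarPaths) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise the manifest is written out empty.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        // Class-Path entries are separated by single spaces.
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", relativeJarPaths));
        return manifest;
    }
}
```

For the layout in the question, build("com.plugin.test.Main", Arrays.asList("plugins/plugin1.jar", "plugins/plugin2.jar")) yields a Class-Path of "plugins/plugin1.jar plugins/plugin2.jar", and the resulting manifest can be passed to a JarOutputStream when the main jar is created.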
