How to use external jars in Cloudera Hadoop? - java

I have Cloudera Hadoop version 4 installed on my cluster.
It comes packaged with the Google protobuf jar, version 2.4.
In my application code I use protobuf classes compiled with protobuf version 2.5.
This causes unresolved compilation problems at run time.
Is there a way to run the MapReduce jobs with an external jar, or am I stuck until Cloudera upgrades their service?
Thanks.

Yes, you can run MR jobs with external jars.
Be sure to add any dependencies to both HADOOP_CLASSPATH and -libjars when submitting a job, as in the following examples.
You can use the following to add all the jar dependencies from the current and lib directories:
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$(echo *.jar lib/*.jar | sed 's/ /:/g')"
Bear in mind that when starting a job through hadoop jar you'll need to also pass it the jars of any dependencies through use of -libjars. I like to use:
hadoop jar <jar> <class> -libjars `echo ./lib/*.jar | sed 's/ /,/g'` [args...]
NOTE: The sed commands require a different delimiter character; the HADOOP_CLASSPATH is : separated and the -libjars need to be , separated.
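The delimiter difference in the note above can also be sketched in Java, for cases where a launcher program builds the two values instead of a shell one-liner. This is only an illustration: the class name JarListJoiner and the jar names are hypothetical, and the ':' separator assumes a Unix-like system.

```java
import java.util.Arrays;
import java.util.List;

public class JarListJoiner {
    // Joins jar paths with ':' as HADOOP_CLASSPATH expects on Unix-like systems.
    public static String toClasspath(List<String> jars) {
        return String.join(":", jars);
    }

    // Joins the same jar paths with ',' as the -libjars option expects.
    public static String toLibJars(List<String> jars) {
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        // Hypothetical jar names, used only for illustration.
        List<String> jars = Arrays.asList("lib/a.jar", "lib/b.jar");
        System.out.println(toClasspath(jars)); // lib/a.jar:lib/b.jar
        System.out.println(toLibJars(jars));   // lib/a.jar,lib/b.jar
    }
}
```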
EDIT: If you need your classpath to be interpreted first to ensure your jar (and not the pre-packaged jar) is the one that gets used, you can set the following:
export HADOOP_USER_CLASSPATH_FIRST=true
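To check which jar actually wins, one option is to print where a class was loaded from. The following is a minimal sketch, not part of Hadoop: the class name ClassOrigin is hypothetical, and it relies only on the standard java.security.CodeSource API. Run it (or call originOf from your job's setup code) with the fully qualified name of one of the conflicting classes.

```java
import java.security.CodeSource;

public class ClassOrigin {
    // Returns the jar or directory a class was loaded from, or a marker for
    // classes that have no code source (for example core JDK classes).
    public static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "<bootstrap>" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to this class itself.
        String name = args.length > 0 ? args[0] : ClassOrigin.class.getName();
        System.out.println(name + " -> " + originOf(Class.forName(name)));
    }
}
```

If the printed location is still the pre-packaged jar, the user classpath is not being consulted first in that JVM.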

Related

Hadoop jar command - override hadoop jar with custom jar

I am trying to submit my jar with the hadoop jar command, and this is what I found. The version of Hadoop I have uses guava-11.0.jar, and I require guava-20.0.jar or newer to run my code.
When I submit the jar using the hadoop jar command, it picks up guava-11.0.jar and fails with java.lang.NoSuchMethodError, because certain methods don't exist in version 11.
How can I tell Hadoop to use the guava-20.0.jar from the fat jar I am submitting?
I tried setting the following environment variable as well, but I get the same error.
export HADOOP_USER_CLASSPATH_FIRST=true

Executing a runnable JAR on PCs without a JDK

I have an executable JAR. Of course I have the JDK installed on my machine, and I use the following commands to run the executable JAR from the command prompt.
1. Using the JRE:
C:\Users\userName\Desktop\Utility\latest>"C:\Program Files\Java\jre1.8.0_161\bin\java.exe" -jar Utility.jar
2. Using the JDK:
C:\Users\userName\Desktop\Utility\latest>"C:\Program Files\Java\jdk1.8.0_161\bin\javaw.exe" -jar Utility.jar
Both work on my desktop, but if I try #1 to run the executable JAR on a different machine that has only a JRE (version 1.8 onwards), it does not open.
I tried the following links, but some of them say to download a few installers, which I do not want to do. Is there any other way, or is there an issue with my executable JAR?
How can I make my executable JAR not need JDK to run
Run a JAR file using a specific JRE
Manifest-Version: 1.0
Rsrc-Class-Path: ./ commons-collections4-4.3.jar poi-3.17.jar poi-ooxm
l-3.17.jar xmlbeans-3.0.1.jar curvesapi-1.06.jar poi-ooxml-schemas-3.
17.jar poi-examples-3.17.jar poi-excelant-3.17.jar poi-scratchpad-3.1
7.jar commons-codec-1.10.jar commons-collections4-4.1.jar commons-log
ging-1.2.jar curvesapi-1.04.jar junit-4.12.jar log4j-1.2.17.jar xmlbe
ans-2.6.0.jar ooxml-schemas-1.3.jar
Class-Path: ./ commons-collections4-4.3.jar poi-3.17.jar poi-ooxml-3.17.jar
xmlbeans-3.0.1.jar curvesapi-1.06.jar poi-ooxml-schemas-3.17.jar poi-examples-3.17.
jar poi-excelant-3.17.jar poi-scratchpad-3.17.jar commons-codec-1.10.jar
commons-collections4-4.1.jar commons-logging-1.2.jar curvesapi-1.04.jar
junit-4.12.jar log4j-1.2.17.jar xmlbeans-2.6.0.jar ooxml-schemas-1.3.jar
Rsrc-Main-Class: DataProcessor.DataProcessor.App
Main-Class: org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoade
It sounds like the issue you are having is more than not having the JDK or not knowing where the JRE on the target system is located; you also didn't include the dependencies that your code needs.
The jar file you have includes details in the manifest file that tell the JVM the classpath and the main class to load. If you look at the Rsrc-Class-Path, it adds the jars from the current directory. This is fine if you are sending the entire folder structure that includes all the jars in the expected location, but it doesn't work with just the jar.
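To see what a manifest expects to find next to the jar, its classpath attributes can be read with the standard java.util.jar.Manifest class. The sketch below is illustrative only: the class name ManifestClassPath is hypothetical, the manifest text is a shortened stand-in for the one in the question, and org.example.Loader is a placeholder main class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Manifest;

public class ManifestClassPath {
    // Parses manifest text and returns the whitespace-separated entries of the
    // named main attribute; returns an empty list if the attribute is absent.
    public static List<String> entries(String manifestText, String attribute) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            String value = manifest.getMainAttributes().getValue(attribute);
            if (value == null || value.trim().isEmpty()) {
                return Collections.emptyList();
            }
            return Arrays.asList(value.trim().split("\\s+"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical manifest; every line ends with a newline.
        String text = "Manifest-Version: 1.0\n"
                + "Rsrc-Class-Path: ./ poi-3.17.jar xmlbeans-3.0.1.jar\n"
                + "Main-Class: org.example.Loader\n";
        System.out.println(entries(text, "Rsrc-Class-Path"));
    }
}
```

For a real jar, the manifest can be obtained with new java.util.jar.JarFile(path).getManifest() instead of parsing text. Every entry listed this way has to be resolvable on the target machine, or be packed inside the jar.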
In order to make a single jar that runs without any additional jars, you need to repackage the jars. There are two common ways to do this:
UberJar - where the classes of your project are combined with the classes extracted from all of your dependencies into a single jar
JarInJar - where your jar and all the dependency jars are put into a jar and a custom classloader is used to load the classes from the jars inside the jar.
I'm not sure what build tool you're using, but for Maven the Shade Plugin will create an UberJar.
I personally recommend the JarInJar option. I believe the Spring Boot Maven Plugin is the easiest way to build one.

What are auto executable jars?

I was going through the spring-boot-maven-plugin documentation and came across the term auto-executable jar.
Could someone please explain what an auto-executable jar is, how it differs from a normal jar file, and how it is auto-executed?
The spring-boot-maven-plugin documentation mentions the term but does not explain it further:
repackage: create a jar or war file that is auto-executable. It can replace the regular artifact or can be attached to the build lifecycle with a separate classifier.
Could someone please explain me what is an auto executable jar
A fully executable jar can be executed like any other executable
binary or it can be registered with init.d or systemd. This makes it
very easy to install and manage Spring Boot applications in common
production environments.
So, in conclusion, a fully executable jar can be used like any other executable.
how is it different then normal jar files and how they are auto executed?
Well, a normal jar file has to be run with java -jar.
From Spring Docs
The Maven build of a Spring Boot application first builds your own application and packs it into a JAR file.
In the second stage (repackage), it wraps that jar, together with all the jar files from the dependency tree, into a new wrapper jar archive. It also generates a manifest file (inside the wrapper jar) that defines the application's main class.
After mvn package you can also see 2 jar files in your target directory. The original file and the wrapped jar file.
You can start a Springboot application with a simple command like:
java -jar my-springboot-app.jar
I would suggest that auto-executable means that you supplied a main method so that the jar can be launched with java -jar; otherwise it may be just a Java library.
Here is a quote from https://docs.spring.io/spring-boot/docs/current/maven-plugin/repackage-mojo.html
Repackages existing JAR and WAR archives so that they can be executed from the command line using java -jar. With layout=NONE can also be used simply to package a JAR with nested dependencies (and no main class, so not executable).
Executable jar - one that has a main class declared in its manifest and can be run with the java -jar yourJarFile.jar command
Other jars - jars without a declared main class. They can be anything: an application, a library, etc. You can still run an application from one by providing a fully.qualified.class.name as the entry point, like java -cp yourJarFile.jar my.bootstrap.BootstrapClass
Autoexecutable jars - never heard about it :)
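The distinction in the list above comes down to whether the manifest's main section has a Main-Class attribute. A minimal sketch of that check using the standard java.util.jar API follows; the class name ExecutableJarCheck is hypothetical, and the two manifests are built in memory for illustration rather than read from real jars.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ExecutableJarCheck {
    // "java -jar" needs a Main-Class attribute in the manifest's main section.
    public static boolean hasMainClass(Manifest manifest) {
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS) != null;
    }

    public static void main(String[] args) {
        // A library-style manifest: version only, no entry point.
        Manifest library = new Manifest();
        library.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // An application-style manifest that names an entry point.
        Manifest application = new Manifest();
        application.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        application.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "my.bootstrap.BootstrapClass");

        System.out.println(hasMainClass(library));     // false
        System.out.println(hasMainClass(application)); // true
    }
}
```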

How to add external library to Hadoop map-reduce task

I have MyClass.java to define the map-reduce task. MyClass.java contains the definition of the mapper, the reducer and main. It works properly, but if I try to use an external jar, I get a ClassNotFoundException.
To compile I use the command:
javac -classpath hadoop_library_path:my_library_path -sourcepath code_path/ -d class_path/ path/MyClass.java
I create the jar, and then I run the task:
hadoop jar maclass.jar MyClass input output -target target
Does the external jar also need to be added to the "hadoop jar" command?
I tried with the -libjars option with no result. Any idea?
As I commented, I see two options (there could be more):
Use Eclipse and generate a runnable jar (I am not sure about NetBeans or IntelliJ).
Use maven and its shade plugin to generate an uber jar. You should add all the external libraries that you use as dependencies.
I recommend the latter option.

maven dependency is not available in classpath while running java -jar command [duplicate]

I'm trying to get a maven managed project to run on the command line.
I have a set of dependencies in the pom.xml which are subsequently downloaded and installed in the ~/.m2/repository/. I've included the necessary config in my pom to add the classpath to the jar manifest.
Now the problem is that I'm attempting to run the jar like this: java -jar project-SNAPSHOT.jar.
Java can't find the downloaded dependencies (I'm assuming because they are listed without paths in the manifest?), but I'm not sure how best to get this running.
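Option 1:
The jar created does not contain the dependent jar files, so you need to tell java the classpath where all the dependent jars are. Note that java ignores -cp when -jar is used, so put the project jar on the classpath as well and name the main class explicitly:
java -cp project-SNAPSHOT.jar:/location/of/dependency1.jar:/location/of/dependency2.jar:/location/of/dependency3.jar <main-class>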
Option 2:
The easier and much better solution is to use the AppAssembler plugin. It packages your jar in a directory structure that contains:
dependent jars
the created jar
shell/windows scripts to execute it
Have a look here: http://www.mojohaus.org/appassembler/appassembler-maven-plugin/
Option 3:
If you do not want all the baggage and just want a single jar with dependencies, you may want to refer to: How can I create an executable JAR with dependencies using Maven?
This will contain all the dependent jars within it.
Edit 1: For Option 1, Brad M mentioned that you can get a list of all your project's deps using the dependency plugin. dependency:build-classpath
mvn exec:java -Dexec.mainClass="com.vineetmanohar.module.Main" -Dexec.classpathScope=runtime
You can find more examples here: 3 ways to run Java main from Maven.
