Hadoop library conflict at mapreduce time - java

I have a jar that uses the Hadoop API to launch various remote MapReduce jobs (i.e., I'm not using the command line to initiate the jobs). The service jar that executes the various jobs is built with Maven's "jar-with-dependencies" assembly.
My jobs all run fine except one that uses commons-codec 1.7; for that one I get:
FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeAsString([B)Ljava/lang/String;
I think this is because my jar includes commons-codec 1.7 whereas my Hadoop installation's lib folder has commons-codec 1.4 ...
Is there any way to instruct Hadoop to use the distributed commons-codec 1.7 (I assume this is distributed as a job dependency) rather than the commons-codec 1.4 in the Hadoop 1.0.3 core lib?
Many thanks!
Note: Removing commons-codec-1.4.jar from my Hadoop library folder does solve the problem, but doesn't seem too sane. Hopefully there is a better alternative.

Two approaches:
You should be able to exclude commons-codec from within the hadoop dependency and add an explicit dependency on commons-codec 1.7.
Try setting the scope to provided so that none of the Hadoop jars get included. This assumes that those jars are on the runtime classpath. (A POM sketch of both approaches follows below.)
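A minimal pom.xml sketch of both ideas, assuming the Hadoop 1.0.3 dependency in question is org.apache.hadoop:hadoop-core (adjust to whatever artifact you actually depend on):
<dependencies>
  <!-- Approach 2: mark the Hadoop jar as provided so it is not packed into
       the jar-with-dependencies; the cluster supplies it at runtime. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.3</version>
    <scope>provided</scope>
    <!-- Approach 1: keep Hadoop's transitive commons-codec out of the build ... -->
    <exclusions>
      <exclusion>
        <groupId>commons-codec</groupId>
        <artifactId>commons-codec</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- ... and declare the version the job actually needs. -->
  <dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.7</version>
  </dependency>
</dependencies>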

Related

NoSuchMethodError: org/apache/hadoop/mapreduce/util/MRJobConfUtil.setTaskLogProgressDeltaThresholds

I am getting the following error while executing a MapReduce job on my Hadoop cluster (a distributed cluster).
The error below appears in the YARN application logs where the mapper fails.
java.lang.NoSuchMethodError: org/apache/hadoop/mapreduce/util/MRJobConfUtil.setTaskLogProgressDeltaThresholds(Lorg/apache/hadoop/conf/Configuration;)V (loaded from file:/data/hadoop/yarn/usercache/hdfs-user/appcache/application_1671477750397_2609/filecache/11/job.jar/job.jar by sun.misc.Launcher$AppClassLoader#8bf41861) called from class org.apache.hadoop.mapred.TaskAttemptListenerImpl
The Hadoop version is 3.3.0.
Okay, that method exists in that version, but not in 3.0.0.
Therefore, you need to use the hadoop-client dependency of that version, not 3.0.0-cdh6.... Also, use compileOnly with it, not implementation. This way it does not conflict with what YARN already has on its classpath.
Similarly, spark-core would have the same problem; if you have Spark in your app anyway, use it rather than the MapReduce functions.
Run gradle dependencies, then search for the Hadoop libraries and ensure they are 3.3.0.
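For a Maven build, a hedged equivalent of the same advice would be to pin hadoop-client to 3.3.0 with provided scope (provided plays roughly the role of Gradle's compileOnly here: you compile against it, but it is not bundled, so it cannot shadow what YARN puts on the classpath):
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.3.0</version>
  <scope>provided</scope>
</dependency>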

Is Spark's "Hadoop Free" distribution completely compatible with all Hadoop versions?

There is a distribution of Spark that doesn't bundle the Hadoop libraries. It requires setting the SPARK_DIST_CLASSPATH variable to point to the Hadoop libraries provided on the cluster.
Apart from this, the "Building Spark" documentation also notes an incompatibility between different versions of HDFS:
Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the hadoop.version property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions.
Do I understand correctly that this refers only to Spark distributions that bundle a specific version of Hadoop? And that the "Hadoop free" build can run on any Hadoop version, as long as the jars available at runtime contain the classes and methods that Spark uses in its source code? So can I safely compile Spark against hadoop-client 2.6 and run it on Hadoop 2.7+?

ant-junit 1.5.3 jar is not available

I'm working on a very old system: we have Ant 1.5.3 running and we need to add unit tests to the environment. As far as I have researched, there is no ant-junit jar available for version 1.5.3. Did it have a different name before ant-junit-1.7.0? My build says that JUnitTask is not available when compiling (because ant and ant-junit.jar should be of the same version).
For Ant 1.5.x, the classes for optional tasks such as <junit> were contained in optional.jar, under org/apache/tools/ant/taskdefs/optional/junit to be precise.
From Ant 1.6.x onwards, optional.jar was split into multiple jars, one of them being ant-junit-1.6.*.jar.
So I doubt ant-junit-1.5.3.jar ever existed.
Read delegating-classloader-1.6 for more on this.

Difference between Apache CXF Maven distribution and CXF distribution

I need to use Apache CXF and Maven in my current project.
When I downloaded CXF from the Apache site, I noticed a set of jars in the distribution.
But when I added the cxf-rt-frontend-jaxws dependency to pom.xml and issued an mvn package command, the lib folder had some different files, for example:
cxf-api-2.2.7.jar
What is the difference between the 2 distributions (if there is one)?
In general, the downloaded distribution is not needed at all if you are using Maven. The downloaded distro does provide some useful examples and the like that can be helpful, but to build applications you don't really need it.
The lib dir in the download, however, uses the "cxf-bundle" jar (renamed to just cxf-VERSION.jar) instead of the individual jars. If you use Maven, you will likely use the individual modules (like cxf-rt-frontend-jaxws). You can delete lib/cxf-VERSION.jar and copy all the jars from modules/*.jar to lib and accomplish pretty much the same thing with the download. That's just a BUNCH of jars, though.
Note: you really should use CXF 2.6.2, not 2.2.7. 2.2.x is unsupported and has various security issues that are fixed in the newer versions.
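For reference, a minimal pom.xml sketch of the Maven route, pinned to the 2.6.2 version recommended above (cxf-rt-transports-http is added here on the assumption that the services are published or consumed over HTTP):
<dependency>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-rt-frontend-jaxws</artifactId>
  <version>2.6.2</version>
</dependency>
<dependency>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-rt-transports-http</artifactId>
  <version>2.6.2</version>
</dependency>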

Maven Project to build additional jar compiled by different java version

My main project uses Java 1.6 and I need to provide a client jar to a system that can only run on Java 1.5. The client jar is a separate module, so I am able to specify the Java version in the maven-compiler-plugin. However, the client jar depends on a core jar, which is on 1.6.
One approach I am considering: I have used the "test-jar" goal in maven-jar-plugin to generate a test jar for other modules to use. I am hoping to do something similar and use it in my client module with the following dependency:
<dependency>
  <groupId>org.mygroup</groupId>
  <artifactId>module-core</artifactId>
  <classifier>java1_5</classifier>
</dependency>
Why does your client project depend on core?
If it uses code from core, you apparently need to compile the core JAR for 1.5 as well. You have several options here:
1. Set the target globally to 1.5 and make sure you are not using JDK 1.6 features in your code (at least in the part of the code invoked by the client on JDK 1.5).
2. Use profiles + classifiers to generate artifacts for different JDKs (see this question). You have to run the build multiple times, though. Actually, each build will compile everything using the same -target version, so this approach is only a small improvement over 1), allowing you to publish your artifact for multiple JDK versions. (A POM sketch follows this list.)
3. If the client code actually does not use core (for example, it uses only WSDLs from core or some other non-Java material), you can remove this dependency by moving that material into a separate "shared" module.
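A minimal sketch of option 2 for the core module's pom.xml, assuming a hypothetical jdk15 profile id and the java1_5 classifier used in the question (activated with something like mvn -Pjdk15 install):
<profiles>
  <profile>
    <id>jdk15</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <configuration>
            <!-- compile the 1.5-compatible variant of core -->
            <source>1.5</source>
            <target>1.5</target>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-jar-plugin</artifactId>
          <configuration>
            <!-- attach the jar under the classifier the client module references -->
            <classifier>java1_5</classifier>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>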
