I have a simple MapReduce program that I want to run on a remote cluster. I can do this from the command line by simply running
hadoop jar myjar.jar input output
but when I run a function in my JUnit TestCase class from my IDE that invokes the MR job, I get the following warnings:
WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
even though I set this line before submitting the MR job:
job.setJarByClass(MyJob.class);
and hence the job fails because it cannot find the classes it needs (such as MyMapKey, the mapper key class):
Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :java.lang.RuntimeException: java.lang.ClassNotFoundException: Class MyMapKey not found
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Any thoughts on this?
First you should add the remote Hadoop cluster's config files (i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, ssl-client.xml) as resources to your Configuration object. Then follow the steps in the above link to see how to manually add the job jar to the classpath on the remote cluster.
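For example, a minimal sketch of such a driver setup (the site-file paths and the jar location are placeholders, not values from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class RemoteJobLauncher {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the remote cluster by loading its site files.
        conf.addResource(new Path("/path/to/remote/core-site.xml"));
        conf.addResource(new Path("/path/to/remote/hdfs-site.xml"));
        conf.addResource(new Path("/path/to/remote/mapred-site.xml"));
        conf.addResource(new Path("/path/to/remote/yarn-site.xml"));

        Job job = Job.getInstance(conf, "my job");
        // setJarByClass() cannot find a jar when the classes are loaded from the
        // IDE's build directory, so point at the built jar explicitly instead.
        job.setJar("/path/to/myjar.jar");
        // ... set mapper, reducer, key/value classes and input/output paths ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}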
I am starting spark-shell (Spark 2.2) and added a bunch of jars on the spark-shell command line (from the Ignite 2.1 directory).
I still get the error:
Can't load log handler "org.apache.ignite.logger.java.JavaLoggerFileHandler"
I also followed the recommendation from here:
https://apacheignite.readme.io/v1.2/docs/installation--deployment
# Optionally set IGNITE_HOME here.
# IGNITE_HOME=/path/to/ignite
IGNITE_LIBS="${IGNITE_HOME}/libs/*"
for file in ${IGNITE_HOME}/libs/*
do
if [ -d ${file} ] && [ "${file}" != "${IGNITE_HOME}"/libs/optional ]; then
IGNITE_LIBS=${IGNITE_LIBS}:${file}/*
fi
done
export SPARK_CLASSPATH=$IGNITE_LIBS
I also set logging to ERROR only, but I still get the error:
Can't load log handler "org.apache.ignite.logger.java.JavaLoggerFileHandler"
java.lang.ClassNotFoundException: org.apache.ignite.logger.java.JavaLoggerFileHandler
java.lang.ClassNotFoundException: org.apache.ignite.logger.java.JavaLoggerFileHandler
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.util.logging.LogManager$5.run(LogManager.java:965)
at java.security.AccessController.doPrivileged(Native Method)
It looks like you are following the documentation for the old Ignite version 1.2 while you are actually using Ignite 2.1. Check the documentation for the latest version here: https://apacheignite-fs.readme.io/v2.2/docs/installation-deployment
Also, please make sure that you have configured IGNITE_HOME in your environment. JavaLoggerFileHandler lives in the ignite-core module, so it looks like the Spark classpath does not see any Ignite libs at all.
The documentation describes the issue here:
https://apacheignite-fs.readme.io/v2.2/docs/troubleshooting
This issue appears when you do not have any loggers included in the classpath and Ignite tries to use standard Java logging. By default, Spark loads all user jar files using a separate class loader. The Java logging framework, on the other hand, uses the application class loader to initialize log handlers. To resolve this, you can either add the ignite-log4j module to the list of used jars so that Ignite uses Log4j as its logging subsystem, or alter the default Spark classpath as described on that page.
I'm getting a confusing ClassNotFoundException when I try to run ExportSnapshot from my HBase master node. hbase shell and other commands work just fine, and my cluster is fully operational.
This feels like a classpath issue, but I don't know what I'm missing.
$ /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ambarismoketest-snapshot -copy-to hdfs://10.0.1.90/apps/hbase/data -mappers 16
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2015-10-13 20:05:02,339 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-10-13 20:05:04,351 INFO [main] util.FSVisitor: No logs under directory:hdfs://cinco-de-nameservice/apps/hbase/data/.hbase-snapshot/impression_event_production_hbase-transfer-to-staging-20151013/WALs
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/Job
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.runCopyJob(ExportSnapshot.java:529)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:646)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:697)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:701)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 5 more
Problem
It turns out this is because the mapreduce2 JARs are not available in the classpath. The classpath was properly set up, but I did not have the mapreduce2 client installed on that node. HBase's ExportSnapshot apparently depends on those client JARs when exporting snapshots to another cluster because it writes to HDFS.
Fix
If you use Ambari:
Load Ambari UI
Pull up node where you were running the ExportSnapshot from and getting the above error
Under "components", click "Add"
Click "Mapreduce 2 client"
Background
There's a ticket here, https://issues.apache.org/jira/browse/HBASE-9687, titled "ClassNotFoundException is thrown when ExportSnapshot runs against hadoop cluster where HBase is not installed on the same node as resourcemanager". The title implies that installing the ResourceManager is the fix, and that may work; however, the crux is that you need the Hadoop mapreduce2 jars in the classpath, and you can do that by simply installing the mapreduce2 client.
For us, specifically, the reason our snapshot exports were working one day and broken the next is that our HBase master failed over because of another issue we had. Our backup HBase master did not have the mapreduce2 client JARs, but the original primary master did.
I am trying to run a custom Flume agent from the terminal on Linux. I am working on the Cloudera VM. The command that runs Flume looks like:
flume-ng agent --conf . -f spoolDirLocal2hdfs_memoryChannel.conf -Dflume.root.logger=DEBUG,console -n Agent5
The source with the interceptor looks like:
Agent5.sources.spooldir-source.interceptors = i1
Agent5.sources.spooldir-source.interceptors.i1.type = org.flumefiles.flume.HtmlInterceptor$Buider
I've placed my jar file into both /usr/lib/hadoop/lib/ and /usr/lib/flume-ng/lib/. I've also created plugins.d at /usr/lib/flume-ng/plugins.d/ and placed the jar there. But when running the Flume agent I get an error:
15/02/18 06:10:46 ERROR channel.ChannelProcessor: Builder class not found. Exception follows.
java.lang.ClassNotFoundException: org.intropro.flume.HtmlInterceptor$Buider
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
.....
Where should I place my jar file so that Flume can find the builder?
Place it into FLUME_HOME/lib and then restart Flume.
If that doesn't work, make sure your interceptor actually provides a builder that implements the Interceptor.Builder interface. That might be another cause.
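For reference, a minimal sketch of the expected shape (class and package names are taken from the question's config; the body is illustrative only). Note that both the config and the error reference HtmlInterceptor$Buider, which may simply be a typo for Builder:

package org.flumefiles.flume;

import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class HtmlInterceptor implements Interceptor {
    @Override public void initialize() { }

    @Override
    public Event intercept(Event event) {
        // ... inspect or modify the event here ...
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    @Override public void close() { }

    // Flume instantiates this nested class by reflection, so it must be public
    // and static, and the config must reference it as HtmlInterceptor$Builder.
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new HtmlInterceptor();
        }

        @Override
        public void configure(Context context) { }
    }
}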
I am trying to configure the mapper/reducer memory during a MapReduce job submission as below:
hadoop jar Word-0.0.1-SNAPSHOT.jar -Dmapreduce.map.memory.mb=5120 com.test.Word.App /tmp/ilango/input /tmp/ilango/output/
Is there anything wrong in the command above? I am getting the following exception. Do I need to package the JAR differently, or do I need to configure something else to use the -D option in Hadoop? Thanks in advance.
Exception in thread "main" java.lang.ClassNotFoundException: -Dmapreduce.map.memory.mb=5120
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
The command to run an MR job is
hadoop jar jarname classname input output
As per your command,
hadoop jar jarname -Dmapreduce.map.memory.mb=5120 classname input output
Hadoop treats "-Dmapreduce.map.memory.mb=5120" as the name of your driver class.
That's why it throws a java.lang.ClassNotFoundException.
The -D option should be supplied after your driver class.
Try the command below.
hadoop jar Word-0.0.1-SNAPSHOT.jar com.test.Word.App -D mapreduce.map.memory.mb=5120 /tmp/ilango/input /tmp/ilango/output/
Hope this solves your issue.
It looks like you are missing a space after -D.
Try -D mapreduce.map.memory.mb=5120 instead.
There is a difference between -Dproperty=value and -D property=value. The first one sets a JVM system property, whereas the second one sets a Hadoop configuration property.
Quoting from the book Hadoop: The Definitive Guide:
-D property=value — Sets the given Hadoop configuration property to the given value. Overrides any default or site properties in the configuration, and any properties set via the -conf option.
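The -D property=value form is parsed by Hadoop's GenericOptionsParser, which typically runs when the driver is launched through ToolRunner. A minimal sketch of such a driver (the class name WordDriver and the job details are illustrative, not the asker's actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D property=value overrides parsed
        // by ToolRunner/GenericOptionsParser before run() is called.
        Job job = Job.getInstance(getConf(), "word");
        job.setJarByClass(WordDriver.class);
        // ... set mapper/reducer and key/value classes here ...
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordDriver(), args));
    }
}

With a driver like this, generic options such as -D mapreduce.map.memory.mb=5120 are stripped out before args[0] and args[1] are read, so they can appear right after the class name.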
If you're using Maven and have added the main class to the jar's manifest (in this case com.test.Word.App), then your -D mapreduce.map.memory.mb=5120 argument will be passed to that class as input.
So just remove com.test.Word.App from the command line.
I am running a MapReduce job. My code consists of only one class that does a simple calculation. It runs successfully on a single-node setup of Hadoop 1.0.3.
When I run it on EMR, I get the following error:
java.io.IOException: File already exists: s3n://<bucketname>/output/part-r-00002
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:647)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:557)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:538)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:445)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:583)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
You need to configure your job to write your results to a different output directory each time it is run.
It is complaining because a file already exists at that location, most likely because you have run this job more than once.
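A common way to handle this (a sketch, assuming a Job-based driver; the path and job name are placeholders) is to generate a unique output directory per run, or to delete the previous one before submitting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputPathExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Option 1: a unique output directory per run (bucket name is a placeholder).
        Path output = new Path("s3n://<bucketname>/output-" + System.currentTimeMillis());

        // Option 2: delete the previous output before submitting (use with care,
        // since it removes earlier results).
        // FileSystem fs = output.getFileSystem(conf);
        // if (fs.exists(output)) {
        //     fs.delete(output, true);
        // }

        Job job = Job.getInstance(conf, "simple calculation");
        // ... set jar, mapper, reducer and input path as in the existing driver ...
        FileOutputFormat.setOutputPath(job, output);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}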