test JNI on Hadoop using MapReduce - java

I am trying to run a JNI program on Hadoop using MapReduce. Here is the command:
bin/hadoop jar /Users/ming/Desktop/mctest/mctest.jar -files /Users/ming/Desktop/mctest/libGenerateRandom.jnilib mc hdfs://localhost:9000/Users/ming/seeds_shuffle.txt hdfs://localhost:9000/Users/ming/output
The .jnilib (the Mac OS X counterpart of a .so file on Linux) should be shipped to the task nodes along with the jar file, but I got an error.
Can anyone help?
Thanks.

Instead use:
bin/hadoop jar /Users/ming/Desktop/mctest/mctest.jar \
<main-class> \
-files /Users/ming/Desktop/mctest/libGenerateRandom.jnilib \
mc \
hdfs://localhost:9000/Users/ming/seeds_shuffle.txt \
hdfs://localhost:9000/Users/ming/output
Where <main-class> should be the fully qualified name of your main class, e.g. com.you.MainRunner.
That's because hadoop jar expects the main class to appear before any additional arguments such as -files.
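If the library is shipped correctly but still fails to load at run time, one extra knob that sometimes helps is pointing java.library.path at the task's working directory, which is where -files places (a symlink to) the shipped file. A hedged sketch, reusing the paths from the question and assuming the pre-YARN mapred.child.java.opts property applies to your setup:
# same command as above, with java.library.path pointed at the task working directory
bin/hadoop jar /Users/ming/Desktop/mctest/mctest.jar \
com.you.MainRunner \
-files /Users/ming/Desktop/mctest/libGenerateRandom.jnilib \
-D mapred.child.java.opts=-Djava.library.path=. \
mc \
hdfs://localhost:9000/Users/ming/seeds_shuffle.txt \
hdfs://localhost:9000/Users/ming/output
Keep in mind that setting mapred.child.java.opts this way replaces the default child JVM options (such as the default heap size), so include any other options you still need.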

Related

Why does spark-submit fail with “Error executing Jupyter command”?

When trying to run Spark locally on my Mac (which used to work) ...
/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home/bin/java \
-cp /usr/local/Cellar/apache-spark/2.4.0/libexec/conf/:/usr/local/Cellar/apache-spark/2.4.0/libexec/jars/* \
-Xmx1g org.apache.spark.deploy.SparkSubmit \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0 \
/Users/crump/main.py
I'm now getting the following error:
Error executing Jupyter command '/Users/crump/main.py': [Errno 2] No such file or directory
The file is there. Since I know this used to work, I must have installed something recently that changed a library, sdk, etc.
Ok, I found the answer finally: PYSPARK_DRIVER_PYTHON=jupyter in my environment. I set this up to launch Jupyter/Spark notebooks with just the pyspark command, but it causes spark-submit to fail.
The solution is to set the variable to use python, not jupyter: PYSPARK_DRIVER_PYTHON=python.
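In other words, either change the variable permanently in your shell profile or override it just for one run (variable name from the answer above, script path from the question):
# permanent: put this in ~/.bash_profile (or your shell's rc file) and re-source it
export PYSPARK_DRIVER_PYTHON=python
# one-off: override the variable only for this invocation of spark-submit
PYSPARK_DRIVER_PYTHON=python spark-submit /Users/crump/main.py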

Can't execute the basic Hadoop Mapreduce Wordcount example

I am trying to run the WordCount example. But I am facing issues with compiling the program.
I get the error:
error: package org.apache.hadoop.mapred does not exist
after executing:
javac -classpath /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar -d wordcount_classes WordCount.java
I set up Hadoop using this tutorial. I also looked this up on Stack Overflow (in this question) and executed the bin/hadoop classpath command in /usr/local/hadoop. This is the output I obtained:
/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
But I don't know what to make of it or what my next step should be! Please help!
You're trying to compile the source code against just one of the many Hadoop dependency jars (hadoop-common-x.x.x.jar). The jar that contains the org.apache.hadoop.mapred package named in the error message is hadoop-mapreduce-client-core.
I suggest you use a build tool such as Maven or Gradle to build your source code as it will manage transitive dependencies for you.
Alternatively, to proceed with your manual invocation of javac, try something like this (untested). Note that javac only uses the last -cp option it is given, so all of the directories have to go into a single colon-separated classpath:
javac -cp '/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*' \
-d wordcount_classes WordCount.java
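Another option, since you already ran bin/hadoop classpath, is to feed its output straight to javac instead of spelling the directories out by hand. A small sketch, assuming the same /usr/local/hadoop install as in the question:
# let the hadoop script assemble the classpath for you
javac -cp "$(/usr/local/hadoop/bin/hadoop classpath)" -d wordcount_classes WordCount.java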

How to distribute Java application 7.0 and above on Mac 10 and above?

I need to run my Java application on a Mac. I found this tutorial on using Xcode to bundle the application. The tutorial asks readers to have access to the Jar Bundler application, but I could not find it in the /Developer/Applications/Java Tools/ folder.
After that I came across this answer, which seems to offer a good way to do it.
However, I am wondering if there is a better way to get the job done than the one mentioned there.
The Mac OS X utilities Jar Bundler, Icon Composer, and PackageMaker are all deprecated. Even the various AppBundler projects out there seem destined to fade away.
The way forward looks to be javapackager, which is included in the JDK.
The -deploy -native pkg options will convert a Java application (Executable JAR) into a native macOS installer.
Example commands:
$ jar cmf MainClass.txt ShowTime.jar *.class
$ javapackager -deploy -native pkg -srcfiles ShowTime.jar \
-appclass ShowTime -name ShowTime \
-outdir deploy -outfile ShowTime -v
Output: deploy/bundles/ShowTime-1.0.pkg
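For reference, the MainClass.txt file passed to jar cmf above is just a one-line manifest fragment naming the entry point, i.e. Main-Class: ShowTime (the trailing newline matters), which is what makes the jar executable.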
I posted a detailed tutorial at: centerkey.com/mac/java
For better or worse, javapackager bundles the JRE and the resulting .pkg file is over 60MB.
As Dem says, I use javapackager like this:
Open a terminal, go to your working folder (e.g. cd Desktop/), and type:
javapackager \
-deploy \
-title AppTitle \
-name AppName \
-appclass your.app.class \
-native dmg \
-outdir ~/YourOutputDir \
-outfile out \
-srcfiles your.jar

How to compile JavaMail Mbox Store on Linux?

I need to store locally the emails downloaded via POP3, so I'm trying to use the JavaMail Mbox Store, which is part of the JavaMail source code but is not shipped pre-compiled.
https://java.net/projects/javamail/pages/MboxStore
I've followed the instructions at the end of that page, but with no luck. Here is what the instructions say:
export MACH=`uname -p`
export JAVA_HOME=/usr/java
cd mbox
mvn
cd native
mvn
I've changed the JAVA_HOME variable according to my environment. I get no errors until the last command. The docs say that by default these are the options used by Maven:
mvn -Dcompiler.name=c89 \
-Dcompiler.start.options='-Xa -xO2 -v -D_REENTRANT -I${env.JAVA_HOME}/include -I${env.JAVA_HOME}/include/solaris' \
-Dlinker.name=c89 \
-Dlinker.start.options='-G' \
-Dlinker.end.options='-L${env.JAVA_HOME}/jre/lib/${env.MACH} -lmail -ljava -lc'
I've changed the compiler name to gcc and removed some options gcc does not recognize (-Xa and -xO2). Unfortunately, it now complains about a missing maillock.h.
Do you know where I can find a complete list of dependencies? Am I doing something wrong with options? I've tried to look for any pre-compiled version, but I had no luck.
I'm trying to compile on Slackware 14.1.
On Ubuntu/Debian/Mint you need the liblockfile-dev package.
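That package provides the maillock.h header the build is complaining about; on an apt-based system something like this should pull it in:
sudo apt-get install liblockfile-dev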
To build on Debian Wheezy I had to manually set the architecture and then add the -shared option to stop the "undefined reference to main" error (assuming -shared is the Linux equivalent of -G on Solaris). Also add the additional library path for libjvm, which is under the server directory:
export MACH=amd64
mvn -Dcompiler.name=c89 \
-Dcompiler.start.options='-v -D_REENTRANT -I${env.JAVA_HOME}/include -I${env.JAVA_HOME}/include/linux' \
-Dlinker.name=c89 \
-Dlinker.start.options='-shared' \
-Dlinker.end.options='-L${env.JAVA_HOME}/jre/lib/${env.MACH} -L${env.JAVA_HOME}/jre/lib/${env.MACH}/server -llockfile -ljava -lverify -ljvm -lc'

How to create and configure Hadoop client script?

There is a running Hadoop cluster.
And I have downloaded a Hadoop distribution (in this case 0.20.205.0).
I need to create a shell script (bash/zsh/perl) that is capable of calling Hadoop on that cluster. Ideally it should be able to be called from the Sqoop script this way:
exec ${HADOOP_HOME}/bin/hadoop com.cloudera.sqoop.Sqoop "$@"
How can I call Hadoop and provide namenode/jobtracker URIs?
How do I provide extra libs with Sqoop and DB drivers?
This should be simple enough using the Hadoop generic options - I'm assuming you've configured the contents of ${HADOOP_HOME}/conf for your cluster (namely core-site.xml and mapred-site.xml):
exec ${HADOOP_HOME}/bin/hadoop com.cloudera.sqoop.Sqoop \
-libjars myjar1.jar,myjar2.jar "$@"
Here you pass the jars to be placed on the classpath via the -libjars option.
If you have multiple clusters you want to target, you'll need to either create a different conf folder for each cluster and set the HADOOP_CONF_DIR environment variable before calling the hadoop script (see the sketch after the command below), or use the -Dkey=value generic arguments to set fs.default.name and mapred.job.tracker appropriately:
exec ${HADOOP_HOME}/bin/hadoop com.cloudera.sqoop.Sqoop \
-libjars myjar1.jar,myjar2.jar \
-Dfs.default.name=hdfs://namenode-servername:9000 \
-Dmapred.job.tracker=jobtracker-servername:9001 \
"$@"
My problem actually was to run Sqoop.
So I solved it by simply supplying the -fs and -jt parameters as the first arguments to the Sqoop command (e.g. sqoop-import):
sqoop-import \
-fs $HADOOP_FILESYSTEM -jt $HADOOP_JOB_TRACKER \
--connect $DB_CONNECTION_STRING --username $DB_USER -P \
--outdir /home/user/sqoop/generated_code \
"$#" # <- other parameters
