When trying to run Spark locally on my Mac (which used to work) ...
/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home/bin/java \
-cp /usr/local/Cellar/apache-spark/2.4.0/libexec/conf/:/usr/local/Cellar/apache-spark/2.4.0/libexec/jars/* \
-Xmx1g org.apache.spark.deploy.SparkSubmit \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0 \
/Users/crump/main.py
I'm now getting the following error:
Error executing Jupyter command '/Users/crump/main.py': [Errno 2] No such file or directory
The file is there. Since I know this used to work, I must have recently installed something that changed a library, SDK, etc.
OK, I finally found the answer: PYSPARK_DRIVER_PYTHON=jupyter was set in my environment. I set this up to launch Jupyter/Spark notebooks with just the pyspark command, but it causes spark-submit to fail.
The solution is to set the variable to python, not jupyter: PYSPARK_DRIVER_PYTHON=python.
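For example, a minimal sketch (the variable can also be overridden for a single invocation without touching the environment):

export PYSPARK_DRIVER_PYTHON=python   # plain Python driver for spark-submit; keep jupyter only for interactive pyspark
spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0 /Users/crump/main.py

# or as a one-off override:
PYSPARK_DRIVER_PYTHON=python spark-submit /Users/crump/main.py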
I need to use Maven with a settings file in a specific location. Normally you can set the MAVEN_OPTS environment variable, but its contents are passed to the JVM, so the following fails:
$ MAVEN_OPTS="-s /settings.xml"
$ mvn clean
Unrecognized option: -s
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
I searched a lot and found two keys, org.apache.maven.user-settings and org.apache.maven.global-settings (explained here), but they seem to work with Maven 2 only. Aliasing mvn to mvn -s /settings.xml would probably work, but I don't like that approach.
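For reference, this is roughly how those keys would be supplied (shown only as an illustration of what I tried; it did not work for me with Maven 3):

MAVEN_OPTS="-Dorg.apache.maven.user-settings=/settings.xml" mvn clean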
From the mvn shell script:
# -----------------------------------------------------------------------------
# Apache Maven Startup Script
#
# Environment Variable Prerequisites
#
# JAVA_HOME Must point at your Java Development Kit installation.
# MAVEN_OPTS (Optional) Java runtime options used when Maven is executed.
# MAVEN_SKIP_RC (Optional) Flag to disable loading of mavenrc files.
# -----------------------------------------------------------------------------
so MAVEN_OPTS contains JVM arguments, not Maven arguments (which is consistent with the error message indicating the JVM doesn't like your arguments).
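For illustration, MAVEN_OPTS is meant for JVM flags such as heap size, so something along these lines is accepted:

export MAVEN_OPTS="-Xmx1g"
mvn clean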
The actual invocation is
exec "$JAVACMD" \
$MAVEN_OPTS \
$MAVEN_DEBUG_OPTS \
-classpath "${CLASSWORLDS_JAR}" \
"-Dclassworlds.conf=${MAVEN_HOME}/bin/m2.conf" \
"-Dmaven.home=${MAVEN_HOME}" \
"-Dlibrary.jansi.path=${MAVEN_HOME}/lib/jansi-native" \
"-Dmaven.multiModuleProjectDirectory=${MAVEN_PROJECTBASEDIR}" \
${CLASSWORLDS_LAUNCHER} "$@"
so there is nowhere to put it. I would therefore suggest that you write your own mvn script which in turn calls the real Maven command with the arguments you like (in my experience scripts are more robust than aliases). Additionally, I have recently found that Java versions later than 8 have... interesting issues... so I really need to have mvn8, mvn11 (and perhaps more) commands anyway.
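A minimal sketch of such a wrapper, assuming the real binary lives at /usr/local/bin/mvn (adjust the path for your system):

#!/bin/sh
# ~/bin/mvn -- forward everything to the real mvn, always adding the settings file
exec /usr/local/bin/mvn -s /settings.xml "$@"

Make it executable with chmod +x and put its directory ahead of the real mvn on $PATH.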
Another approach that I only started using recently is the Maven wrapper (https://github.com/takari/maven-wrapper) where a ./mvnw command is placed in your project which then downloads Maven when needed. This is very useful. To get started use
mvn -N io.takari:maven:wrapper
after which ./mvnw should be directly usable instead of mvn. The interesting part here is that the generated Maven command looks like
exec "$JAVACMD" \
$MAVEN_OPTS \
-classpath "$MAVEN_PROJECTBASEDIR/.mvn/wrapper/maven-wrapper.jar" \
"-Dmaven.home=${M2_HOME}" "-Dmaven.multiModuleProjectDirectory=${MAVEN_PROJECTBASEDIR}" \
${WRAPPER_LAUNCHER} $MAVEN_CONFIG "$@"
and MAVEN_CONFIG is not set earlier in the script. So for mvnw you can set MAVEN_CONFIG to your "-s /settings.xml" string.
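For example (a sketch; the wrapper expands $MAVEN_CONFIG unquoted, so the two tokens are passed as separate arguments):

MAVEN_CONFIG="-s /settings.xml" ./mvnw clean install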
Maven 4
The MAVEN_ARGS environment variable is supported and can be used.
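For example:

export MAVEN_ARGS="-s /settings.xml"
mvn clean    # behaves as if -s /settings.xml had been passed on the command line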
Maven 3
There was a feature request MNG-5824: Support MAVEN_ARGS environment variable as a way of supplying default command line arguments. It was closed unimplemented, with a suggestion to use the .mvn/maven.config file in the project directory instead.
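A minimal sketch of that approach (note that recent Maven releases expect one argument per line in the file):

mkdir -p .mvn
printf -- '-s\n/settings.xml\n' > .mvn/maven.config
mvn clean    # now picks up the settings file automatically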
I feel I must apologize for such a basic question, but I am getting an error simply trying to run BehaviorSpace experiments in headless mode. I tried running my own model experiments from the command line, but got an error. So I then tried following the exact instructions on the BehaviorSpace documentation. To do this, I created a BehaviorSpace experiment in the Fire.nlogo model called "experiment1" (see screen shot) and then tried to execute commands to run experiment1 from the command line. The screen shot of the terminal shows that I first set the directory where I have NetLogo 5.3 installed, and then tried to run the commands from the BehaviorSpace documentation. The screen shot of the terminal also shows the Java error I am getting. I have never used the terminal before and am not sure what I am doing wrong, but I am sure I am missing something simple.
I am using Mac OS X and NetLogo 5.3. Thank you for your time.
It seems you're not working in the correct directory.
You need to cd into the netlogo directory:
For me:
netlogo_directory = "/Applications/NetLogo 5.2"
so
cd /Applications/NetLogo\ 5.2
Then you can execute your command:
java -Xmx2048m -Dfile.encoding=UTF-8 -cp ./NetLogo.jar org.nlogo.headless.Main --model /path/to/your/file/name/filename.nlogo --experiment experimentname --table /path/to/log/with/filename.csv --spreadsheet /path/to/spreadsheet/with/filename.csv
The problem is that the Java folder that comes with NetLogo is where NetLogo.jar and the lib folder are located. Hence, simply adding Java/ to the classpath in the code below allows all the files to be found.
java -Xmx1024m -Dfile.encoding=UTF-8 -cp Java/NetLogo.jar \
org.nlogo.headless.Main \
--model Fire.nlogo \
--experiment experiment1 \
--table mytable.csv
I’m trying to upgrade a Spark project, written in Scala, from Spark 1.2.1 to 1.3.0, so I changed my build.sbt like so:
-libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
+libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
then built an assembly jar and submitted it:
HADOOP_CONF_DIR=/etc/hadoop/conf \
spark-submit \
--driver-class-path=/etc/hbase/conf \
--conf spark.hadoop.validateOutputSpecs=false \
--conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--deploy-mode=cluster \
--master=yarn \
--class=TestObject \
--num-executors=54 \
target/scala-2.11/myapp-assembly-1.2.jar
The job fails to submit, with the following exception in the terminal:
15/03/19 10:30:07 INFO yarn.Client:
15/03/19 10:20:03 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1420225286501_4698 failed 2 times due to AM
Container for appattempt_1420225286501_4698_000002 exited with exitCode: 127
due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Finally, I go and check the YARN app master’s web interface (since the job is there, I know it at least made it that far), and the only logs it shows are these:
Log Type: stderr
Log Length: 61
/bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory
Log Type: stdout
Log Length: 0
I’m not sure how to interpret that. Is {{JAVA_HOME}} a literal (brackets included) that’s somehow making it into a script? Is this coming from the worker nodes or the driver? Anything I can do to experiment and troubleshoot?
I do have JAVA_HOME set in the hadoop config files on all the nodes of the cluster:
% grep JAVA_HOME /etc/hadoop/conf/*.sh
/etc/hadoop/conf/hadoop-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
/etc/hadoop/conf/yarn-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
Has this behavior changed in 1.3.0 since 1.2.1? Using 1.2.1 and making no other changes, the job completes fine.
[Note: I originally posted this on the Spark mailing list, I'll update both places if/when I find a solution.]
Have you tried setting JAVA_HOME in the etc/hadoop/yarn-env.sh file? It's possible that your JAVA_HOME environment variable is not available to the YARN containers that are running your job.
It has happened to me before that certain environment variables set in .bashrc on the nodes were not being picked up by the YARN workers spawned on the cluster.
There is a chance that the error is unrelated to the version upgrade but instead related to YARN environment configuration.
Okay, so I got some other people in the office to help work on this, and we figured out a solution. I'm not sure how much of this is specific to the file layouts of Hortonworks HDP 2.0.6 on CentOS, which is what we're running on our cluster.
The fix is to manually copy some directories from one of the cluster machines (or any machine that can successfully use the Hadoop client) to your local machine. Let's call that machine $GOOD.
Set up Hadoop config files:
cd /etc
sudo mkdir hbase hadoop
sudo scp -r $GOOD:/etc/hbase/conf hbase
sudo scp -r $GOOD:/etc/hadoop/conf hadoop
Set up Hadoop libraries & executables:
mkdir ~/my-hadoop
scp -r $GOOD:/usr/lib/hadoop\* ~/my-hadoop
cd /usr/lib
sudo ln -s ~/my-hadoop/* .
path+=(/usr/lib/hadoop*/bin) # Add to $PATH (this syntax is for zsh)
Set up the Spark libraries & executables:
cd ~/Downloads
wget http://apache.mirrors.lucidnetworks.net/spark/spark-1.4.1/spark-1.4.1-bin-without-hadoop.tgz
tar -zxvf spark-1.4.1-bin-without-hadoop.tgz
cd spark-1.4.1-bin-without-hadoop
path+=(`pwd`/bin)
hdfs dfs -copyFromLocal lib/spark-assembly-*.jar /apps/local/
Set some environment variables:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_DIST_CLASSPATH=$(hadoop --config $HADOOP_CONF_DIR classpath)
`grep 'export HADOOP_LIBEXEC_DIR' $HADOOP_CONF_DIR/yarn-env.sh`
export SPOPTS="--driver-java-options=-Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib"
export SPOPTS="$SPOPTS --conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.4.1-hadoop2.2.0.jar"
Now the various spark shells can be run like so:
sparkR --master yarn $SPOPTS
spark-shell --master yarn $SPOPTS
pyspark --master yarn $SPOPTS
Some remarks:
The JAVA_HOME setting is the same as I've had all along; it's included here just for completeness. All the focus on JAVA_HOME turned out to be a red herring.
The --driver-java-options=-Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib was necessary because I was getting errors about java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path. The jnilib file is the correct choice for OS X.
The --conf spark.yarn.jar piece is just to save time, avoiding re-copying the assembly file to the cluster every time you fire up the shell or submit a job.
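With that in place, a cluster submission would look something like this (a sketch reusing the class and jar names from the original question):

spark-submit --master yarn --deploy-mode cluster $SPOPTS \
  --class TestObject \
  target/scala-2.11/myapp-assembly-1.2.jar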
Well, to start off, I would recommend moving to Java 7. However, that is not what you are asking about here.
For setting JAVA_HOME, I would recommend setting it in your .bashrc rather than in multiple files. Moreover, I would recommend installing Java with alternatives so that /usr/bin/java points at the JDK you want.
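A sketch of both suggestions (the JDK path is hypothetical; on Debian-based systems the command is update-alternatives):

# in ~/.bashrc
export JAVA_HOME=/usr/java/jdk1.7.0_71
export PATH=$JAVA_HOME/bin:$PATH

# register the JDK with alternatives so /usr/bin/java resolves to it
sudo alternatives --install /usr/bin/java java $JAVA_HOME/bin/java 2
sudo alternatives --config java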
I need to store emails downloaded via POP3 locally, so I'm trying to use the JavaMail Mbox Store, which is part of the JavaMail source code but is not compiled by default.
https://java.net/projects/javamail/pages/MboxStore
I've followed the instructions at the end of that page, but with no luck. Here is what the instructions say:
export MACH=`uname -p`
export JAVA_HOME=/usr/java
cd mbox
mvn
cd native
mvn
I've changed the JAVA_HOME variable according to my environment. I get no error until the last command. The docs say that by default these are the options used by Maven:
mvn -Dcompiler.name=c89 \
-Dcompiler.start.options='-Xa -xO2 -v -D_REENTRANT -I${env.JAVA_HOME}/include -I${env.JAVA_HOME}/include/solaris' \
-Dlinker.name=c89 \
-Dlinker.start.options='-G' \
-Dlinker.end.options='-L${env.JAVA_HOME}/jre/lib/${env.MACH} -lmail -ljava -lc'
I've changed the compiler name to gcc and removed some options unrecognized by gcc (-Xa and -xO2). Unfortunately, it now complains about a missing maillock.h.
Do you know where I can find a complete list of dependencies? Am I doing something wrong with options? I've tried to look for any pre-compiled version, but I had no luck.
I'm trying to compile on Slackware 14.1.
On Ubuntu/Debian/Mint you need the liblockfile-dev package.
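For example:

sudo apt-get install liblockfile-dev    # provides maillock.h and the lockfile library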
To build on Debian Wheezy I had to manually set the architecture and then add the -shared option to stop the "undefined reference to main" error (assuming it is the Linux equivalent of -G on Solaris). Also add the additional library path for libjvm, which is under the server directory:
export MACH=amd64
mvn -Dcompiler.name=c89 \
-Dcompiler.start.options='-v -D_REENTRANT -I${env.JAVA_HOME}/include -I${env.JAVA_HOME}/include/linux' \
-Dlinker.name=c89 \
-Dlinker.start.options='-shared' \
-Dlinker.end.options='-L${env.JAVA_HOME}/jre/lib/${env.MACH} -L${env.JAVA_HOME}/jre/lib/${env.MACH}/server -llockfile -ljava -lverify -ljvm -lc'
I'm trying to run Equinox and antRunner in "headless" mode with a custom eclipse.ini file. The "--launcher.ini" option should work according to:
http://wiki.eclipse.org/Equinox_Launcher .
However, when I run the following command line:
java.exe \
-cp "C:\Program Files\eclipse\plugins\org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar" \
org.eclipse.core.launcher.Main \
--launcher.ini "C:\ini\my_eclipse.ini" \
-application org.eclipse.ant.core.antRunner \
-console \
-data "c:\my_workspace" \
-file "c:\my_buildfiles\build.xml" \
I get the following error message:
osgi> Unknown argument: --launcher.ini
Unknown target: C:\ini\my_eclipse.ini
Buildfile: .\build.xml
How can I load a custom eclipse.ini when starting Eclipse with Equinox from the command line?
The problem is that you are trying to launch using only the Java part of the launcher, while the wiki page describes the arguments for the native part of the launcher (eclipse.exe or any name you want).
The launcher.ini describes how to set up the Java process (memory size, VM location, arguments to the VM, etc.). So it makes sense that you pass the reference to the launcher.ini to the native launcher.
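In other words, a sketch of the same build invoked through the native launcher, which does understand --launcher.ini (paths are taken from the question):

eclipse.exe -nosplash --launcher.ini "C:\ini\my_eclipse.ini" -application org.eclipse.ant.core.antRunner -data "c:\my_workspace" -file "c:\my_buildfiles\build.xml"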