How does Apache Spark's CoarseGrainedScheduler get started?

I'm trying to instrument a Spark (v 1.6.1) application with an APM (Application Performance Management System). To do so with the APM of choice, I must instrument the JVM startup string(s) with a -javaagent flag, pointing to my APM along with any pertinent APM options (e.g., unique-host-id).
On the Spark Master server, I have done this successfully, and on the various Spark Worker servers, I have successfully instrumented the spark worker process as well.
However, when I look at the Java processes that are running, I see one additional process that does not contain my startup string: the CoarseGrainedScheduler (its main class is CoarseGrainedExecutorBackend). This is the actual Spark executor, which runs the application code submitted to the Worker by the Master.
I cannot determine from where the CoarseGrainedScheduler is invoked.
For more context, here's how I've instrumented the Spark Worker startup string: in {SPARK_HOME}/conf/spark-env.sh, I added the following environment variable:
SPARK_DAEMON_JAVA_OPTS="<java-agent-startup-string>"
This gets carried through to the eventual invocation of {SPARK_HOME}/bin/spark-class, which is the root of all Spark invocations; that is, all Spark commands that emanate from {SPARK_HOME}/bin or {SPARK_HOME}/sbin eventually delegate to spark-class.
However, this does not seem to be where the CoarseGrainedScheduler is invoked from, even though this document says it is:
$ ./bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend <opts>
Why, then, is my startup string not being picked up? Working from the assumption/instruction that CoarseGrainedExecutorBackend is invoked via spark-class, I actually edited that file to add my startup string when running any java command, and that also fails to add my startup string to the CoarseGrainedExecutorBackend, although it does add it to the Spark Worker process itself. So again, it seems as though CoarseGrainedExecutorBackend is not started via spark-class, even though the linked document says it is.
Can anyone help me find the root of the CoarseGrainedExecutorBackend process and how it's invoked? If I can provide any additional details, just let me know.
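One way to confirm from inside a JVM whether the agent flag actually reached it is to inspect the JVM's own input arguments. The following is only a diagnostic sketch: the class name AgentFlagCheck and the example agent path are hypothetical, and it says nothing about how Spark launches the executor.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Spark or of any APM product.
class AgentFlagCheck {

    // Returns the JVM options that start with "-javaagent".
    static List<String> agentFlags(List<String> jvmArgs) {
        List<String> found = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-javaagent")) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // getInputArguments() lists the options given to this JVM,
        // not the arguments passed to the main class.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("javaagent flags seen by this JVM: " + agentFlags(jvmArgs));
    }
}
```

Logging the same list from code that runs inside the executor would show which options that process really received.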

Related

Spark (Kafka) Streaming Memory Issue

I am testing my first Spark Streaming pipeline, which processes messages from Kafka. However, after several test runs, I got the following error message:
There is insufficient memory for the Java Runtime Environment to continue.
My test data is really small, so this should not happen. After looking into the running processes, I suspect that previously submitted Spark jobs were not removed completely.
I am using Spark 2.2.1 and usually submit jobs like this:
/usr/local/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 ~/script/to/spark_streaming.py
I stop it using Ctrl+C.
The last few lines of the script look like this:
ssc.start()
ssc.awaitTermination()
Update
After changing the way I submit the Spark Streaming job (see the command below), I still run into the same issue: after killing the job, the memory is not released. I only started Hadoop and Spark on those 4 EC2 nodes.
/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --py-files ~/config.py --master spark://<master_IP>:7077 --deploy-mode client ~/spark_kafka.py

When you press Ctrl-C, only the submitter process is interrupted; the job itself continues to run. Eventually your system runs out of memory, so no new JVM can be started.
Furthermore, even if you restart the cluster, all previously running jobs will be restarted again.
Read how to stop a running Spark application properly.

The problem might be a bunch of driver processes (spark-app-driver) still running on the host you use to submit the Spark job. Try something like
ps aux --forest
or similar, depending on your platform, to see which processes are running at the moment. You can also have a look at the answers to the Stack Overflow question "Spark Streaming with Actor Never Terminates"; they might give you a clue about what is happening.

Why does net.exe start <servicename> report a failure when the service starts?

I have a Java application that uses the Apache Daemon service installer to register it as a Windows service. I am using Puppet to run an exec{} block to register the service, which works, and then chains a service{} block to start the service. Puppet uses "net.exe start" to run the service, but that command reports an error, even though the service starts correctly.
The output from running the command in a PowerShell session is:
PS C:\ProgramData\PuppetLabs\puppet\etc\modules> net start myservice
The myservice_descriptive_name service is starting.....
The myservice_descriptive_name service could not be started.
More help is available by typing NET HELPMSG 3523.
As I refresh the Windows service panel while this command is running, I see the state change from:
blank field -> starting -> started
Is this a problem caused by the Apache wrapper, which starts a JVM in a separate shell, or is it some other side effect? And, more importantly, can I get around this problem in Puppet while still using the service{} block? Is it possible to substitute sc.exe, which does not suffer from the same problem, short of using an exec{} block?

To take the questions in order:
The net start command reports failure because the service appears to have hung.
Yes, the problem is caused by the Apache wrapper.
Specifically, the wrapper is telling Windows that it will reach the first checkpoint within two seconds. Since there does not appear to be any way for the Java code to implement a checkpoint, or to change the wait hint, this means that the service must start within two seconds to be compliant with the Windows service specification.
(In principle, Windows is entitled to terminate your service at this point. So far as I know, no current versions of Windows do so, though they may log error messages.)
Short of modifying Puppet or (preferably) the Apache wrapper, the only obvious workaround is to ensure that your service "starts" immediately, rather than waiting for initialization to complete.
This is less than ideal, since it means that the service can't provide feedback to Puppet if it really does fail to initialize, but no worse than your suggestion of using sc start instead of net start.

JPBlanc's answer explains why net.exe times out waiting for the service to start, even though the service does end up starting. You can definitely try swapping out net.exe calls for sc.exe (Service Control) instead.
I've created a ticket to address this - https://tickets.puppetlabs.com/browse/PUP-5475
If you find that sc.exe doesn't also time out while waiting, please comment and/or file a pull request containing the change. At any rate, using something better than net.exe would be preferred.

The explanation is that the service takes too long to start and does not communicate correctly with the starter.
When you write a service that initiates communications or DB connections, you have to tell the Service Control Manager (SCM) that you are still starting. If you send this kind of "I am still starting" message, the SCM will wait as much time as you need to start. But many service writers, and many tools that wrap exe files as services, ignore that, so the SCM returns "service could not be started". In Win32 this is handled by the SetServiceStatus function; you will find more details in its documentation.

Start/stop java application from an external script

I have a stand-alone Java application. At the moment I run it using a start script, i.e. startApplication.bat on Windows and startApplication.sh on Linux, which sets up the classpath and then executes: java -classpath .
Now I have to add stopApplication.bat and stopApplication.sh scripts. The stop script has to shut down this Java application gracefully.
To achieve this I am planning to take the following steps:
1. When my Java application starts, it will store its process ID in a known file, myapplication.pid.
It looks like the ManagementFactory.getRuntimeMXBean().getName() call will work on both Linux and Windows to get the process ID (on common JVMs it returns a string of the form pid@hostname, so the part before the @ is the process ID). I shall collect the process ID this way and store it in myapplication.pid.
2. The stop script will then issue a "kill" request to the process ID found in myapplication.pid.
On Windows I shall run the "taskkill" command to stop the application; on Linux the "kill" command will serve that purpose.
In my Java code I shall register a shutdown hook with addShutdownHook, which will perform the graceful shutdown operations I want, i.e. persist whatever needs to be saved before the program stops.
http://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#addShutdownHook%28java.lang.Thread%29
Now I would like to do a sanity check to make sure this is a proper way to do it, or whether there is a better way. Any suggestion is appreciated. Thanks in advance.
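The plan above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name PidFileExample is made up for illustration, and the pid@hostname format of the runtime name is a common JVM convention rather than something the API guarantees.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name; only the file name myapplication.pid comes from the question.
class PidFileExample {

    // Assumes the "pid@hostname" convention; falls back to the whole string otherwise.
    static String parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        return at > 0 ? runtimeName.substring(0, at) : runtimeName;
    }

    static void writePidFile(Path file, String pid) throws IOException {
        Files.write(file, pid.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path pidFile = Paths.get("myapplication.pid");
        writePidFile(pidFile, parsePid(ManagementFactory.getRuntimeMXBean().getName()));

        // Runs on normal exit and on SIGTERM (a plain "kill"); it does not run on a forced kill.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // Persist application state here, then remove the PID file.
                Files.deleteIfExists(pidFile);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }));

        // A real application would keep running here; this demo simply returns,
        // which triggers the hook when the JVM exits.
    }
}
```

Note that shutdown hooks are skipped when the process is terminated forcibly (kill -9 on Linux, taskkill /F on Windows), so the stop scripts would need to send a normal termination request. On Java 9 and later, ProcessHandle.current().pid() returns the process ID directly.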

If you're wanting a "graceful" shutdown, it may be more practical (and easier cross-platform) to open a socket in your long-running process and have your "stop" script connect to it and issue a shutdown command; this might even be practical through JMX, depending on how your application overall is structured. Approaches that are "inline" rather than requiring interaction with the OS are generally easier to reason about and test.
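A minimal sketch of the socket approach, assuming a plain-text SHUTDOWN command on a loopback port. The class name, the command string and the port handling are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical class: listens on the loopback interface for a one-line stop command.
class ShutdownListener {
    static final String COMMAND = "SHUTDOWN"; // assumed protocol, not a standard

    private final ServerSocket server;
    private final CountDownLatch stopRequested = new CountDownLatch(1);

    ShutdownListener(int port) {
        try {
            // Loopback only, so remote hosts cannot stop the application; port 0 picks a free port.
            server = new ServerSocket(port, 1, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread t = new Thread(this::listen, "shutdown-listener");
        t.setDaemon(true);
        t.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void listen() {
        while (stopRequested.getCount() > 0) {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                if (COMMAND.equals(in.readLine())) {
                    stopRequested.countDown();
                }
            } catch (IOException e) {
                if (server.isClosed()) {
                    return;
                }
            }
        }
        try {
            server.close();
        } catch (IOException ignored) {
            // nothing useful to do while shutting down
        }
    }

    // Waits up to timeoutMillis; returns true once the stop command has been received.
    boolean awaitStop(long timeoutMillis) {
        try {
            return stopRequested.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // What a small helper invoked by the stop script could do.
    static void requestStop(int port) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.getOutputStream().write((COMMAND + "\n").getBytes(StandardCharsets.UTF_8));
            s.getOutputStream().flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ShutdownListener listener = new ShutdownListener(0);
        requestStop(listener.port()); // normally sent by the separate stop script
        System.out.println("stop requested: " + listener.awaitStop(5000));
    }
}
```

In a real application the main thread would do its work and call awaitStop (or poll it) before running its cleanup; the stop script would need to know the port, for example from a fixed value in configuration.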

This looks like a Daemon.
The easiest way to run a daemon with start/stop functionality without resorting to a lot of scripting is with jsvc. This allows your code to implement an interface with four methods:
void init(String[] arguments): Here open configuration files, create a trace file, create ServerSockets, Threads
void start(): Start the Thread, accept incoming connections
void stop(): Inform the Thread to terminate the run(), close the ServerSockets
void destroy(): Destroy any object created in init()
You then have platform specific binaries that deal with keeping track of the process and stopping it when requested to do so.
The most useful thing is that jsvc can start a process as a superuser (root on Unix) and then drop to an unprivileged user for the actual running of the process.
This is how Tomcat (for example) works: it starts as root and performs privileged actions such as binding to port 80. It then drops down to an unprivileged user called tomcat for security reasons.
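A class with the four methods listed above might look like the sketch below. The class name MyDaemon and the worker loop are placeholders, and the code does not depend on the commons-daemon jar; it only mirrors the method shapes described in this answer.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical daemon class. For real use with jsvc it would most likely need to be
// declared public in its own file so that the launcher can instantiate it.
class MyDaemon {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private Thread worker;

    // Open configuration files, create sockets and threads here, but do not start them yet.
    public void init(String[] arguments) {
        worker = new Thread(this::run, "my-daemon-worker");
    }

    // Start the thread and begin accepting work.
    public void start() {
        running.set(true);
        worker.start();
    }

    // Ask the thread to leave run() and wait briefly for it to finish.
    public void stop() {
        running.set(false);
        worker.interrupt();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Release anything created in init().
    public void destroy() {
        worker = null;
    }

    boolean isRunning() {
        return worker != null && worker.isAlive();
    }

    private void run() {
        while (running.get()) {
            try {
                Thread.sleep(100); // placeholder for real work
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Drives the lifecycle by hand, in the same order the launcher would.
        MyDaemon d = new MyDaemon();
        d.init(args);
        d.start();
        Thread.sleep(300);
        d.stop();
        d.destroy();
    }
}
```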

Invoke Threads using JMX

How can I use JMX to start a thread from jConsole or jManage?
I want to initially create 5 threads and let them run. Then, when one of them gets stuck, I want to create a new thread to continue operations.
I do not want to kill the process until all the data has been processed, or until it is really required.

Your question seems a little bit vague; in general a thread always runs some logic, so you will have to do some development here.
Basically, JMX provides a way to install a component (called an MBean) and run it along with the JVM process.
Java allows you to start a JMX server along with the JVM process; to do that, you supply some properties to the process.
Then you can use this server to install your own MBean, which can do whatever you want, including starting a thread.
Once your MBean is deployed and your JVM process is up and running, you can open jConsole and you should see your MBean among the others.
Then just call the method.
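As a rough illustration of the steps above, the sketch below registers a standard MBean whose startWorker operation creates a new thread. All names here (JmxWorkerDemo, WorkerControl, the com.example object name) are invented for the example, and the worker body is a placeholder.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical demo class; the nested types are public because standard MBean
// interfaces have to be public on current JDKs.
class JmxWorkerDemo {

    // Standard MBean convention: the interface name is the class name plus "MBean".
    public interface WorkerControlMBean {
        int getStartedCount(); // shown as a read-only attribute
        void startWorker();    // shown as an operation that can be invoked from jConsole
    }

    public static class WorkerControl implements WorkerControlMBean {
        private final AtomicInteger started = new AtomicInteger();

        @Override
        public int getStartedCount() {
            return started.get();
        }

        @Override
        public void startWorker() {
            int id = started.incrementAndGet();
            Thread t = new Thread(
                    () -> System.out.println("worker-" + id + " running"), // placeholder work
                    "worker-" + id);
            t.setDaemon(true);
            t.start();
        }
    }

    // Registers the bean with the JVM's built-in (platform) MBean server.
    static ObjectName register(WorkerControl control, String name) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(control, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("could not register MBean " + name, e);
        }
    }

    public static void main(String[] args) {
        WorkerControl control = new WorkerControl();
        register(control, "com.example:type=WorkerControl"); // assumed object name
        for (int i = 0; i < 5; i++) {
            control.startWorker(); // the five initial threads from the question
        }
        System.out.println("workers started: " + control.getStartedCount());
        // A real application would keep the JVM alive here so that jConsole can attach.
    }
}
```

With the process still running, invoking startWorker from the MBeans tab in jConsole would start a replacement thread; detecting that a thread is stuck is a separate problem that this sketch does not address.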
There is a really good tutorial here
Hope this helps

How can I save a process resource from proc_open in order to check the status later on?

I'm running a shell script that starts a Java process from PHP on an Ubuntu server. I'm using proc_open to run the process. Usually the workflow goes like this:
request a page ->
script runs (until it's finished) ->
result page.
In my case the script runs in parallel, so the server does not wait until the script is finished (it sometimes takes hours, so it can't). I therefore need to save that resource somehow so I can follow up on it later (check the status of the process or just stop it).
The resource type is "process"; I used the get_resource_type function to find that out.
Serialize won't work in this case, because resources are an exception to it (see the Parameters box at http://il2.php.net/manual/en/function.serialize.php).
My goal is good process handling. Does anyone know how I can use the resource, or what other approach you would take for process handling?

You can't store resource types for later use in PHP. What you need to do is implement some form of asynchronous communication - maybe a file, where one writes status information and the other one reads, a shared memory, a named pipe, ...
I would look into the pcntl extension. Hint: Forking is not possible from within a web-server environment for security reasons.

In my case the script runs in parallel so the server won't wait until the script is finished (it takes hours sometimes so it can't) ..
That shouldn't be a problem on its own. You can easily have a long-running PHP process, as long as it's not initiated from a web server. If you need to initiate the process from a web application, I would suggest that you insert an entry in a database table and then have a cron job run a script that checks this queue and does the processing.
