Spark: Send debug text to driver from worker - java

I would like to diagnose an error. I don't think I need to describe the whole scenario to get a good answer to my question. What I want is to produce some debug information on the workers and display it on the driver, ideally in real time.
I read somewhere that issuing a System.out.println("DEBUG: ...") on a worker produces a line in the executor log, but at the moment I'm having trouble retrieving those logs. Aside from that, it would still be useful to see some debug output on the driver while the computation runs.
(I also figured out a workaround, but I don't know whether I should use it: at the end of each worker task I could append elements to a sequence file, and I could monitor that file or check it at the end.)

One way of doing this is to (ab)use a custom accumulator to send messages from the workers to the driver. It will get arbitrary String messages from the workers to the driver, where you print the accumulated contents to collect the info. It's not as real-time as wished for, since delivery depends on the program execution.
import org.apache.spark.AccumulatorParam

// Merges messages by joining them with newlines. Note: the original snippet
// defined this object under one name and used another; the names must match.
object DebugInfoAccumulatorParam extends AccumulatorParam[String] {
  def zero(initialValue: String): String = initialValue
  def addInPlace(s1: String, s2: String): String = s1 + "\n" + s2
}

val debugInfo = sparkContext.accumulator("", "debug info")(DebugInfoAccumulatorParam)

rdd.map { elem =>
  ...
  // this happens on each worker
  debugInfo += "something happened here"
}

// this happens on the driver, after an action has run
println(debugInfo.value)
Not sure why you cannot access the worker logs, by the way - that would be the most straightforward solution.
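On that note: if the cluster runs on YARN, executor logs can usually be fetched after the application finishes with yarn logs -applicationId <application id>, or per executor through the Executors tab of the web UI while it runs; in standalone mode they are written under each worker's work/ directory. The exact locations depend on your deployment.
Also note that AccumulatorParam is the Spark 1.x API and was deprecated in Spark 2.0. Since the question mentions Java, here is a hedged sketch of the same idea using the built-in CollectionAccumulator from the Spark 2.x Java API (jsc and rdd are assumed to be an existing JavaSparkContext and JavaRDD<String>):
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.CollectionAccumulator;

// Register a driver-side accumulator that collects one list entry per add().
CollectionAccumulator<String> debugInfo = jsc.sc().collectionAccumulator("debug info");

// This runs on the workers; each task's additions are merged back to the driver.
rdd.foreach(x -> debugInfo.add("something happened here for " + x));

// This runs on the driver, after the action has completed.
debugInfo.value().forEach(System.out::println);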

Related

Is this the correct waiting strategy when sending commands with testcontainers?

I am using Testcontainers DockerComposeContainer and sending shell commands using the execInContainer method once my containers are up and running:
@ClassRule
public static DockerComposeContainer<?> environment =
    new DockerComposeContainer<>(new File("docker-compose.yml"))
        .withExposedService(DB_1, DB_PORT)
        .waitingFor(SERVICE_1, Wait.defaultWaitStrategy())
        .withLocalCompose(true);
One of the commands simply moves a file, which is then processed, and I want to wait until the process inside the container has consumed it before checking the results in my test.
service.execInContainer("cp", "my-file.zip", "/home/user/dir");
The way I'm checking whether the process has consumed my-file.zip once it has been moved is to inspect the logs:
String log = "";
while (!log.contains("Moving my-file.zip file to /home/user/dir")) {
    // "service" is the same container handle used above
    // (the original snippet used a different variable name here)
    ExecResult cat = service.execInContainer("cat", "/my-service/logs/service.log");
    log = cat.getStdout();
}
This works, but I don't like the constant polling inside the while loop very much, and I was wondering if there is a better way to achieve this.
I've been looking into the Testcontainers internals, and it uses the Java docker API, so I wondered if there is a better way to do this via that API, or if I could do the waiting using a library like Awaitility (sketched below).
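For reference, the Awaitility variant I have in mind would look roughly like this (untested sketch; assumes Awaitility 4.x, and the same service handle and log path as above):
import static org.awaitility.Awaitility.await;
import java.time.Duration;

await()
    .atMost(Duration.ofMinutes(2))        // fail the test instead of looping forever
    .pollInterval(Duration.ofSeconds(1))  // poll once per second instead of spinning
    .until(() -> service
        .execInContainer("cat", "/my-service/logs/service.log")
        .getStdout()
        .contains("Moving my-file.zip file to /home/user/dir"));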
Thanks for any suggestions

Is it possible to execute a command on all workers within Apache Spark?

I have a situation where I want to execute a system process on each worker within Spark. I want this process to be run on each machine exactly once. Specifically, this process starts a daemon which needs to be running before the rest of my program executes. Ideally this should execute before I've read any data in.
I'm on Spark 2.0.2 and using dynamic allocation.
You may be able to achieve this with a combination of a lazy val and a Spark broadcast variable. It will be something like the code below. (I have not compiled this, so you may have to change a few things.)
object ProcessManager extends Serializable {
  // A lazy val's initializer runs at most once per JVM, the first time it is accessed.
  lazy val start: Unit = {
    // start your process here
  }
}
You can broadcast this object at the start of your application, before you do any transformations. (Note that the object has to be serializable for the broadcast to work.)
val pm = sc.broadcast(ProcessManager)
Now, you can access this object inside your transformations like any other broadcast variable and invoke the lazy val:
rdd.mapPartitions { itr =>
  pm.value.start
  // Other stuff here.
  itr
}
An object with static initialization which invokes your system process should do the trick.
object SparkStandIn extends App {
  object invokeSystemProcess {
    import sys.process._
    val errorCode = "echo Whatever you put in this object should be executed once per jvm".!

    def doIt(): Unit = {
      // This object is constructed once per JVM, but Scala objects are initialized
      // lazily, so calling doIt() forces the instantiation. One way to make sure the
      // instantiation succeeded is to check that errorCode does not represent an error.
    }
  }
  invokeSystemProcess.doIt()
  invokeSystemProcess.doIt() // even if doIt() is invoked multiple times, the static initialization happens once
}
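Since the surrounding discussion is Java-flavored, here is a hedged Java analog of the same trick: a class whose static initializer starts the process, because a static initializer runs exactly once per JVM (i.e., once per executor). The daemon path is a made-up placeholder:
import java.io.IOException;

public final class DaemonStarter {
    static {
        try {
            // Runs once, the first time this class is loaded in the executor JVM.
            new ProcessBuilder("/opt/myapp/start-daemon.sh").start();
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private DaemonStarter() {}

    // Calling this at the top of e.g. a mapPartitions function forces class loading.
    public static void ensureStarted() {}
}
The same caveat as in the next answer applies: Spark does not guarantee that a stage schedules a task on every node, so this runs once per executor JVM that actually receives work.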
A specific answer for a specific use case: I have a cluster with 50 nodes and I wanted to know which ones have the CET timezone set:
(1 until 100).toSeq.toDS.
  mapPartitions(itr => {
    sys.process.Process(
      Seq("bash", "-c", "echo $(hostname && date)")
    ).
    lines.
    toIterator
  }).
  collect().
  filter(_.contains(" CET ")).
  distinct.
  sorted.
  foreach(println)
Notice that I don't think it's guaranteed you'll get a partition on every node, so the command might not run on every node, even when using a 100-element Dataset on a cluster with 50 nodes as in this example.

How to automatically collapse repetitive log output in log4j

Every once in a while, a server or database error causes thousands of copies of the same stack trace in the server log files. It might be a different error/stack trace today than a month ago, but it causes the log files to rotate completely, and I no longer have visibility into what happened before. (Also, I don't want to run out of disk space, which for reasons outside my control is limited right now; I'm addressing that issue separately.) At any rate, I don't need thousands of copies of the same stack trace; a dozen or so should be enough.
I would like log4j/log4j2/another system to automatically collapse repetitive errors so that they don't fill up the log files. For example, a threshold of maybe 10 or 100 exceptions from the same place might trigger log4j to just start counting, wait until they stop coming, and then output a count of how many more times they appeared.
What pre-made solutions exist (a quick survey with links is best)? If this is something I should implement myself, what is a good pattern to start with and what should I watch out for?
Thanks!
Will the BurstFilter do what you want? If not, please create a Jira issue with the algorithm that would work for you, and the Log4j team would be happy to consider it. Better yet, if you can provide a patch, it is much more likely to be incorporated.
Log4j's BurstFilter will certainly help prevent you from filling your disks. Remember to configure it so that it applies to as limited a section of code as you can, or you'll filter out messages you might want to keep (that is, don't put it on your appender, but on a particular logger that you isolate in your code), roughly as in the configuration sketch below.
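For reference, a sketch of what that might look like in log4j2 XML configuration, based on the BurstFilter documentation (the logger name and appender ref are placeholders; rate is the average number of events allowed per second, and maxBurst is the spike tolerance before filtering kicks in):
<!-- Attach the filter to one noisy logger, not to the appender. -->
<Logger name="com.example.noisy.Service" level="info" additivity="false">
  <BurstFilter level="WARN" rate="16" maxBurst="100"/>
  <AppenderRef ref="RollingFile"/>
</Logger>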
I wrote a simple utility class at one point that wrapped a logger and filtered based on n messages within a given Duration. I used instances of it around most of my warning and error logs to guard against the off chance that I'd run into problems like you did. It worked pretty well for my situation, especially because it was easy to adapt quickly for different situations.
Something like:
...
// Fields elided in the original post, but evident from usage: logger, throttleDuration,
// maxMessagesInPeriod, lastInvocationTime, throttledInDurationCount,
// numMessagesSentInCurrentPeriod, totalMessageCount, formatter, emptyOptional.
public DurationThrottledLogger(Logger logger, Duration throttleDuration, int maxMessagesInPeriod) {
    ...
}

public void info(String msg) {
    getMsgAddendumIfNotThrottled().ifPresent(addendum -> logger.info(msg + addendum));
}

private synchronized Optional<String> getMsgAddendumIfNotThrottled() {
    LocalDateTime now = LocalDateTime.now();
    String msgAddendum;
    if (throttleDuration.compareTo(Duration.between(lastInvocationTime, now)) <= 0) {
        // last one was sent longer than throttleDuration ago - send it and reset everything
        if (throttledInDurationCount == 0) {
            msgAddendum = " [will throttle future msgs within throttle period]";
        } else {
            msgAddendum = String.format(" [previously throttled %d msgs received before %s]",
                throttledInDurationCount, lastInvocationTime.plus(throttleDuration).format(formatter));
        }
        totalMessageCount++;
        throttledInDurationCount = 0;
        numMessagesSentInCurrentPeriod = 1;
        lastInvocationTime = now;
        return Optional.of(msgAddendum);
    } else if (numMessagesSentInCurrentPeriod < maxMessagesInPeriod) {
        msgAddendum = String.format(" [message %d of %d within throttle period]",
            numMessagesSentInCurrentPeriod + 1, maxMessagesInPeriod);
        // within throttle period, but haven't sent max messages yet - send it
        totalMessageCount++;
        numMessagesSentInCurrentPeriod++;
        return Optional.of(msgAddendum);
    } else {
        // throttle it
        totalMessageCount++;
        throttledInDurationCount++;
        return emptyOptional;
    }
}
I'm pulling this from an old version of the code, unfortunately, but the gist is there. I wrote a bunch of static factory methods that I mainly used because they let me write a single line of code to create one of these for a single log message:
} catch (IOException e) {
    DurationThrottledLogger.error(logger, Duration.ofSeconds(1), "Received IO Exception. Exiting current reader loop iteration.", e);
}
This probably won't be as important in your case; for us, we were using a somewhat underpowered Graylog instance that we could hose down fairly easily.

jBPM 6: Asynchronous work item and retry

Let me come directly to the use case.
I have a number of work items in my process, say A, B, C. They start in A -> B -> C order.
In my case, B is a call to a third-party web service. C should run only if B succeeds. If the call to the web service fails, the system should retry after 5 minutes, and the number of retries is limited to 3.
How can I achieve this using jBPM 6?
Some options that I understand from the docs are:
1) I can use a work item handler. Inside the work item, I start another thread which does the retries and finally calls the completeWorkItem() method. But in this case my process engine thread will wait unnecessarily for the completeWorkItem() call.
2) I can use a command for the retry. But if I call a command, it will execute in another thread while the process thread moves on to C, which is not desirable.
How can I create a process so that B executes in the background and notifies the engine that it can continue with C?
Please advise.
Thanks in advance.
Please comment if my question is not clear enough to answer.
Your question is not completely clear; however, I'll provide an answer that hopefully adds some clarity:
For asynchronous execution, you should follow the guidelines in the documentation: jBPM 6.0 Async Documentation
Given your process flow defined as A -> B -> C: if you implement B as a Command, C will not start until the command completes.
To have commands run in parallel, you use parallel branches. In the referenced diagram (image not included here), Script1 and Script2 sit on parallel branches: if they were commands, they would execute in parallel, and Email would only execute once both scripts complete.
A command signals completion simply by returning from its execute method:
public ExecutionResults execute(CommandContext ctx) throws Exception {
    // Set results if they exist; otherwise, return an empty ExecutionResults.
    ExecutionResults results = new ExecutionResults();
    // This would match the name of an output parameter for the work item:
    // results.setData("result", "result data");
    logger.info("Command finished execution: " + this.getClass());
    logger.debug("Results of executing command: {}", results);
    return results;
}
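On the retry requirement specifically: if you run B through the jBPM executor, the executor supports retrying failed commands. A hedged sketch of scheduling a command with a retry count follows (API names are from the jBPM 6 executor; whether and how a per-retry delay such as your 5 minutes can be configured depends on the exact jBPM version, so verify that against your release):
import org.kie.internal.executor.api.CommandContext;
import org.kie.internal.executor.api.ExecutorService;

// executorService is an already-configured jBPM ExecutorService instance.
CommandContext ctx = new CommandContext();
ctx.setData("retries", 3);  // give up after three failed attempts
ctx.setData("businessKey", "call-web-service-b");  // hypothetical key for lookup
executorService.scheduleRequest("com.sample.CallWebServiceCommand", ctx);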
Alternatively, model the retry in the process itself. Add an XOR gateway after node B, and add a script to node B that sets the status and a retry_count for the web-service call (if it succeeded, status_b = true; if it failed, status_b = false and retry_count++).
The XOR gateway goes to C if retry_count >= 3 or status_b == true;
else it goes back to B.

ProgramCallDocument connecting to AS400 from Groovy Hangs

This question is specifically related to the JT400 class ProgramCallDocument and its method callProgram(String programName).
I've tried wrapping the call in a try/catch, but it's not throwing an exception; the debugger goes into the callProgram method and just sits there indefinitely.
A small amount of specific information about the API is available here:
http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.toolbox.doc/rzahhxpcmlusing.htm
Here's the code that I'm running:
AS400 as400System = AS400Factory.getAS400System()
ProgramCallDocument programCallDocument = new ProgramCallDocument(as400System, "com.sample.xpcml.Sample.xpcml")
programCallDocument.setStringValue("sampleProgramName.value", sampleValue)
Boolean didProgramCallDocumentRunSuccessfullyOnTheAS400 = programCallDocument.callProgram("sampleProgramName")
The last line of that snippet is the one that just sits there. I left out the try/catch for brevity.
The XPCML file that the ProgramCallDocument constructor uses is just a proprietary IBM XML format for specifying the parameter lengths and types of a program call. I can come back and add it if that would be helpful, but the ProgramCallDocument constructor runs validation on the XML, and it didn't come up with any validation errors. I'm not familiar with JT400 or how it does program calls, so any assistance would be wonderful.
As a further note, while doing some more digging on a related issue today, I also found this SO post:
Monitor and handle MSGW messages on a job on an IBM i-series (AS/400) from Java
I think it's relevant to this question because it's about ways to trap MSGW status on the Java/Groovy side.
It's very likely the called program went into MSGW (message wait) status, which indicates an error.
Check WRKACTJOB JOB(QZRCSRVS) to find the program-call job, see its status, and review the job log.
It may be easier to call a native program using the CommandCall class or as a JDBC stored procedure.
Here's an example of CommandCall usage in Groovy:
sys = AS400Factory.AS400System
cmd = new CommandCall(sys)
// parentheses added around run's argument; Groovy does not allow a
// parenthesis-free method call inside a larger expression like this negation
if (!cmd.run("CALL MYLIB.MYPGM PARM('${sampleValue}')")) {
    println cmd.messageList
}
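And a hedged sketch of the JDBC stored-procedure route mentioned above, using the JT400 JDBC driver (the system name, credentials, and MYLIB.MYPGM are placeholders matching the CommandCall example; sampleValue is the same variable as in the question's snippet):
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

// The jt400 jar provides com.ibm.as400.access.AS400JDBCDriver.
try (Connection conn = DriverManager.getConnection(
         "jdbc:as400://mysystem", "myuser", "mypassword");
     CallableStatement cs = conn.prepareCall("CALL MYLIB.MYPGM(?)")) {
    cs.setString(1, sampleValue);
    cs.execute();
}
This route avoids the XPCML layer entirely, at the cost of declaring the parameters in the CALL statement instead.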
