I want to periodically (for instance, every 10 minutes) run a method that exports a .txt file from inside a recursive, resultless ForkJoinTask class.
So I initialized a ForkJoinPool sized to the available processors to run the ForkJoinTasks, a ScheduledExecutorService to run the command after a given delay, and a writer to output the file, like this:
WriterTXT writer = new WriterTXT();
ScheduledExecutorService exec = Executors.newScheduledThreadPool(1);
ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
Then, inside the compute() method of the recursive ForkJoinTask class, I create a runnable task like this:
protected void compute() {
    if (condition) {
        run();
        Runnable task = () -> {
            try {
                writer.writetxt();
            } catch (IOException e) {
                e.printStackTrace();
            }
        };
        exec.scheduleWithFixedDelay(task, 0, 10, TimeUnit.MINUTES);
    } else {
        doSomethingElse();
    }
}
I understand that in scheduleWithFixedDelay, 0 is the initial delay and 10 minutes is the delay between runs. However, in my case, since I use multiple processors, the file is exported continuously and the delays are not respected.
Below is an example of running the application on 2 processors with an initial delay of 0 minutes and a delay of 5 minutes:
exec.scheduleWithFixedDelay(task, 0, 5, TimeUnit.MINUTES);
2022-08-29 17:40:22.669 [main ] INFO - Number of CPU: 2
2022-08-29 17:40:23.661 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:23.951 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:23.992 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:24.075 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:24.191 [worker-2] INFO - Queued Tasks = 33725 / 33731
2022-08-29 17:40:24.191 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:24.456 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:24.498 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:24.597 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:24.881 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:25.182 [worker-2] INFO - Queued Tasks = 33720 / 33731
2022-08-29 17:40:25.182 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:25.481 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:25.634 [thread-1] INFO - Writing partial Solutions
2022-08-29 17:40:25.735 [thread-1] INFO - Writing partial Solutions
So, is there a way to make this work?
Thank you in advance.
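For comparison, here is a minimal sketch in which the export is scheduled exactly once, outside compute(). scheduleWithFixedDelay registers a new periodic job every time it is called, so calling it from every compute() invocation creates many concurrent jobs. MyRecursiveAction is a hypothetical stand-in for the recursive resultless ForkJoinTask; WriterTXT and writetxt() are the names used in the question.
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExportScheduling {
    public static void main(String[] args) {
        WriterTXT writer = new WriterTXT();
        ScheduledExecutorService exec = Executors.newScheduledThreadPool(1);
        ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());

        // Registered once: a single periodic job, no matter how many subtasks fork.
        exec.scheduleWithFixedDelay(() -> {
            try {
                writer.writetxt();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, 0, 10, TimeUnit.MINUTES);

        pool.invoke(new MyRecursiveAction()); // hypothetical recursive task, forks freely
        exec.shutdown();                      // stop the periodic export when the work is done
        pool.shutdown();
    }
}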
Related
Please consider the following diagram
MyProcess.bpmn
<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:activiti="http://activiti.org/bpmn" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:omgdc="http://www.omg.org/spec/DD/20100524/DC" xmlns:omgdi="http://www.omg.org/spec/DD/20100524/DI" typeLanguage="http://www.w3.org/2001/XMLSchema" expressionLanguage="http://www.w3.org/1999/XPath" targetNamespace="http://www.activiti.org/test">
<process id="myProcess" name="My process" isExecutable="true">
<startEvent id="startevent1" name="Start"></startEvent>
<userTask id="evl" name="Evaluation"></userTask>
<boundaryEvent id="timer_event_autocomplete" name="Timer" attachedToRef="evl" cancelActivity="false">
<timerEventDefinition>
<timeDate>PT2S</timeDate>
</timerEventDefinition>
</boundaryEvent>
<serviceTask id="timer_service" name="Timed Autocomplete" activiti:async="true" activiti:class="com.example.service.TimerService"></serviceTask>
<serviceTask id="store_docs_service" name="Store Documents" activiti:async="true" activiti:class="com.example.service.StoreDocsService"></serviceTask>
<sequenceFlow id="flow1" sourceRef="startevent1" targetRef="evl"></sequenceFlow>
<sequenceFlow id="flow2" sourceRef="timer_event_autocomplete" targetRef="timer_service"></sequenceFlow>
<sequenceFlow id="flow3" sourceRef="evl" targetRef="store_docs_service"></sequenceFlow>
<sequenceFlow id="flow4" sourceRef="store_docs_service" targetRef="endevent1"></sequenceFlow>
<endEvent id="endevent1" name="End"></endEvent>
</process>
</definitions>
To describe it in words: there is one user task (Evaluation) with a timer attached to it (configured to trigger in 2 seconds). When the timer triggers, the Timed Autocomplete async service task, through its Java delegate TimerService, tries to complete the user task (Evaluation). Once the user task (Evaluation) is completed, the flow moves to the other async service task (Store Documents), calls its Java delegate, StoreDocsService, and ends.
TimerService.java
public class TimerService implements JavaDelegate {
Logger LOGGER = LoggerFactory.getLogger(TimerService.class);
@Override
public void execute(DelegateExecution execution) throws Exception {
LOGGER.info("*** Executing Timer autocomplete ***");
Task task = execution.getEngineServices().getTaskService().createTaskQuery().active().singleResult();
execution.getEngineServices().getTaskService().complete(task.getId());
LOGGER.info("*** Task: {} autocompleted by timer ***", task.getId());
}
}
StoreDocsService.java
public class StoreDocsService implements JavaDelegate {
Logger LOGGER = LoggerFactory.getLogger(StoreDocsService.class);
@Override
public void execute(DelegateExecution execution) throws Exception {
LOGGER.info("*** Executing Store Documents ***");
}
}
App.java
public class App
{
public static void main( String[] args ) throws Exception
{
// DefaultAsyncJobExecutor demoAsyncJobExecutor = new DefaultAsyncJobExecutor();
// demoAsyncJobExecutor.setCorePoolSize(10);
// demoAsyncJobExecutor.setMaxPoolSize(50);
// demoAsyncJobExecutor.setKeepAliveTime(10000);
// demoAsyncJobExecutor.setMaxAsyncJobsDuePerAcquisition(50);
ProcessEngineConfiguration cfg = new StandaloneProcessEngineConfiguration()
.setJdbcUrl("jdbc:h2:mem:activiti;DB_CLOSE_DELAY=1000")
.setJdbcUsername("sa")
.setJdbcPassword("")
.setJdbcDriver("org.h2.Driver")
.setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE)
// .setAsyncExecutorActivate(true)
// .setAsyncExecutorEnabled(true)
// .setAsyncExecutor(demoAsyncJobExecutor)
.setJobExecutorActivate(true)
;
ProcessEngine processEngine = cfg.buildProcessEngine();
String pName = processEngine.getName();
String ver = ProcessEngine.VERSION;
System.out.println("ProcessEngine [" + pName + "] Version: [" + ver + "]");
RepositoryService repositoryService = processEngine.getRepositoryService();
Deployment deployment = repositoryService.createDeployment()
.addClasspathResource("MyProcess.bpmn").deploy();
ProcessDefinition processDefinition = repositoryService.createProcessDefinitionQuery()
.deploymentId(deployment.getId()).singleResult();
System.out.println(
"Found process definition ["
+ processDefinition.getName() + "] with id ["
+ processDefinition.getId() + "]");
final Map<String, Object> variables = new HashMap<String, Object>();
final RuntimeService runtimeService = processEngine.getRuntimeService();
ProcessInstance id = runtimeService.startProcessInstanceByKey("myProcess", variables);
System.out.println("Started Process Id: "+id.getId());
try {
final TaskService taskService = processEngine.getTaskService();
// List<Task> tasks = taskService.createTaskQuery().active().list();
// while (!tasks.isEmpty()) {
// Task task = tasks.get(0);
// taskService.complete(task.getId());
// tasks = taskService.createTaskQuery().active().list();
// }
} catch (Exception e) {
System.out.println(e.getMessage());
} finally {
}
while(!runtimeService.createExecutionQuery().list().isEmpty()) {
}
processEngine.close();
}
}
Activiti 5.15
When the timer triggers, the above diagram executes as described. We use Activiti's DefaultJobExecutor.
As we can see in the logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.jobexecutor.JobExecutor - Starting up the JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor].
[Thread-1] INFO org.activiti.engine.impl.jobexecutor.AcquireJobsRunnable - JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor] starting to acquire jobs
ProcessEngine [default] Version: [5.15]
[main] INFO org.activiti.engine.impl.bpmn.deployer.BpmnDeployer - Processing resource MyProcess.bpmn
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[pool-1-thread-1] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[pool-1-thread-1] INFO com.example.service.TimerService - *** Task: 9 autocompleted by timer ***
[pool-1-thread-1] INFO com.example.service.StoreDocsService - *** Executing Store Documents ***
[main] INFO org.activiti.engine.impl.jobexecutor.JobExecutor - Shutting down the JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor].
[Thread-1] INFO org.activiti.engine.impl.jobexecutor.AcquireJobsRunnable - JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor] stopped job acquisition
Activiti >= 5.17
Changing only Activiti's version in pom.xml to 5.17.0 or later (tested up to 5.22.0) and executing the same code, the flow executes the timer's Java delegate, TimerService, which completes the user task (Evaluation), but the Store Documents Java delegate, StoreDocsService, is never called. What's more, the flow never seems to end and the execution remains stuck at the Store Documents async service task.
Logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.jobexecutor.JobExecutor - Starting up the JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor].
[Thread-1] INFO org.activiti.engine.impl.jobexecutor.AcquireJobsRunnableImpl - JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor] starting to acquire jobs
ProcessEngine [default] Version: [5.17.0.2]
[main] INFO org.activiti.engine.impl.bpmn.deployer.BpmnDeployer - Processing resource MyProcess.bpmn
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[pool-1-thread-2] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[pool-1-thread-2] INFO com.example.service.TimerService - *** Task: 9 autocompleted by timer ***
Changing to the async job executor: one feature of the 5.17 release was the new async job executor (however, the non-async job executor remains configured as the default). So we try to enable the async executor in App.java with the following lines:
DefaultAsyncJobExecutor demoAsyncJobExecutor = new DefaultAsyncJobExecutor();
demoAsyncJobExecutor.setCorePoolSize(10);
demoAsyncJobExecutor.setMaxPoolSize(50);
demoAsyncJobExecutor.setKeepAliveTime(10000);
demoAsyncJobExecutor.setMaxAsyncJobsDuePerAcquisition(50);
ProcessEngineConfiguration cfg = new StandaloneProcessEngineConfiguration()
.setJdbcUrl("jdbc:h2:mem:activiti;DB_CLOSE_DELAY=1000")
.setJdbcUsername("sa")
.setJdbcPassword("")
.setJdbcDriver("org.h2.Driver")
.setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE)
.setAsyncExecutorActivate(true)
.setAsyncExecutorEnabled(true)
.setAsyncExecutor(demoAsyncJobExecutor)
;
The flow seems to execute correctly, and StoreDocsService is called after TimerService, but the process never ends (the while(!runtimeService.createExecutionQuery().list().isEmpty()) statement in App.java is always true)!
Logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Starting up the default async job executor [org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor].
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating thread pool queue of size 100
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating executor service with corePoolSize 10, maxPoolSize 50 and keepAliveTime 10000
[Thread-1] INFO org.activiti.engine.impl.asyncexecutor.AcquireTimerJobsRunnable - {} starting to acquire async jobs due
[Thread-2] INFO org.activiti.engine.impl.asyncexecutor.AcquireAsyncJobsDueRunnable - {} starting to acquire async jobs due
ProcessEngine [default] Version: [5.17.0.2]
[main] INFO org.activiti.engine.impl.bpmn.deployer.BpmnDeployer - Processing resource MyProcess.bpmn
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[pool-1-thread-2] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[pool-1-thread-2] INFO com.example.service.TimerService - *** Task: 9 autocompleted by timer ***
[pool-1-thread-3] INFO com.example.service.StoreDocsService - *** Executing Store Documents ***
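At this point, one way to see what the engine is still holding on to is to dump the remaining executions and jobs before closing the engine. A small diagnostic sketch, using only the runtimeService and processEngine objects already defined in App.java:
// Diagnostic only: list whatever executions and jobs remain once the process appears stuck.
for (org.activiti.engine.runtime.Execution e : runtimeService.createExecutionQuery().list()) {
    System.out.println("Execution still alive: " + e.getId());
}
for (org.activiti.engine.runtime.Job job : processEngine.getManagementService().createJobQuery().list()) {
    System.out.println("Job still pending: " + job.getId()
            + ", exception: " + job.getExceptionMessage());
}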
!!!! UPDATE !!!
Activiti 6.0.0
Tried the same scenario but with Activiti version 6.0.0.
Changes were needed in TimerService, since the EngineServices can no longer be obtained from the DelegateExecution:
public class TimerService implements JavaDelegate {
Logger LOGGER = LoggerFactory.getLogger(TimerService.class);
@Override
public void execute(DelegateExecution execution) {
LOGGER.info("*** Executing Timer autocomplete ***");
Task task = Context.getProcessEngineConfiguration().getTaskService().createTaskQuery().active().singleResult();
Context.getProcessEngineConfiguration().getTaskService().complete(task.getId());
// Task task = execution.getEngineServices().getTaskService().createTaskQuery().active().singleResult();
// execution.getEngineServices().getTaskService().complete(task.getId());
LOGGER.info("*** Task: {} autocompleted by timer ***", task.getId());
}
}
This version has only the async executor, so the ProcessEngineConfiguration in App.java changes to:
ProcessEngineConfiguration cfg = new StandaloneProcessEngineConfiguration()
.setJdbcUrl("jdbc:h2:mem:activiti;DB_CLOSE_DELAY=1000")
.setJdbcUsername("sa")
.setJdbcPassword("")
.setJdbcDriver("org.h2.Driver")
.setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE)
.setAsyncExecutorActivate(true)
// .setAsyncExecutorEnabled(true)
// .setAsyncExecutor(demoAsyncJobExecutor)
// .setJobExecutorActivate(true)
;
With version 6.0.0 and the async executor, the process completes successfully, as we can see in the logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Starting up the default async job executor [org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor].
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating thread pool queue of size 100
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating executor service with corePoolSize 2, maxPoolSize 10 and keepAliveTime 5000
[Thread-1] INFO org.activiti.engine.impl.asyncexecutor.AcquireAsyncJobsDueRunnable - {} starting to acquire async jobs due
[Thread-2] INFO org.activiti.engine.impl.asyncexecutor.AcquireTimerJobsRunnable - {} starting to acquire async jobs due
[Thread-3] INFO org.activiti.engine.impl.asyncexecutor.ResetExpiredJobsRunnable - {} starting to reset expired jobs
ProcessEngine [default] Version: [6.0.0.4]
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[activiti-async-job-executor-thread-2] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[activiti-async-job-executor-thread-2] INFO com.example.service.TimerService - *** Task: 10 autocompleted by timer ***
[activiti-async-job-executor-thread-2] INFO com.example.service.StoreDocsService - *** Executing Store Documents ***
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Shutting down the default async job executor [org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor].
[activiti-reset-expired-jobs] INFO org.activiti.engine.impl.asyncexecutor.ResetExpiredJobsRunnable - {} stopped resetting expired jobs
[activiti-acquire-timer-jobs] INFO org.activiti.engine.impl.asyncexecutor.AcquireTimerJobsRunnable - {} stopped async job due acquisition
[activiti-acquire-async-jobs] INFO org.activiti.engine.impl.asyncexecutor.AcquireAsyncJobsDueRunnable - {} stopped async job due acquisition
Process finished with exit code 0
2 Questions:
We have upgraded from Activiti 5.15 to 5.22.0 and we do not use the async job executor. Is there any way to keep this piece of the diagram behaving as it did in 5.15?
If switching to the async job executor is inevitable, then what are we missing in order to make this process complete successfully?
A sample project of the above can be found at: https://github.com/pleft/DemoActiviti
Without answering your question explicitly, which would require setting up your environment and debugging, I would recommend that you at the very least move to Activiti 6.
The 5.x branch of Activiti hasn't been maintained for over 5 years and is effectively dead.
Even the 6.x line has pretty much been abandoned as the core developers have all moved to the "Flowable" project.
If you choose to stay with Activiti 5.x, your options are:
Maintain the codebase yourself (and hopefully contribute any changes/enhancements back to the project).
Contract Activiti support services. There are a couple of vendors offering such services.
I am trying tranquility with Druid 0.11 and Kafka. When tranquility receives new data, it throws the following exception:
2018-01-12 18:27:34,010 [Curator-ServiceCache-0] INFO c.m.c.s.net.finagle.DiscoResolver - Updating instances for service[firehose:druid:overlord:flow-018-0000-0000] to Set(ServiceInstance{name='firehose:druid:overlord:flow-018-0000-0000', id='ea85b248-0c53-4ec1-94a6-517525f72e31', address='druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local', port=8100, sslPort=-1, payload=null, registrationTimeUTC=1515781653895, serviceType=DYNAMIC, uriSpec=null})
Jan 12, 2018 6:27:37 PM com.twitter.finagle.netty3.channel.ChannelStatsHandler exceptionCaught
WARNING: ChannelStatsHandler caught an exception
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
at org.jboss.netty.channel.SimpleChannelHandler.connectRequested(SimpleChannelHandler.java:306)
The worker was created by the middle manager:
2018-01-12T18:27:25,704 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Submitting runnable for task[index_realtime_flow_2018-01-12T18:00:00.000Z_0_0]
2018-01-12T18:27:25,719 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Affirmative. Running task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0]
And tranquility seems to talk with the overlord fine, judging by the following logs:
2018-01-12T18:27:25,268 INFO [qtp271944754-62] io.druid.indexing.overlord.TaskLockbox - Adding task[index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] to activeTasks
2018-01-12T18:27:25,272 INFO [TaskQueue-Manager] io.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_realtime_flow_2018-01-12T18:00:00.000Z_0_0
2018-01-12T18:27:25,272 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_realtime_flow_2018-01-12T18:00:00.000Z_0_0
2018-01-12T18:27:25,279 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - No worker selection strategy set. Using default of [EqualDistributionWorkerSelectStrategy]
2018-01-12T18:27:25,294 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091] to add task[index_realtime_flow_2018-01-12T18:00:00.000Z_0_0]
2018-01-12T18:27:25,334 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - Task index_realtime_flow_2018-01-12T18:00:00.000Z_0_0 switched from pending to running (on [druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091])
2018-01-12T18:27:25,336 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] status changed to [RUNNING].
2018-01-12T18:27:25,747 INFO [Curator-PathChildrenCache-1] io.druid.indexing.overlord.RemoteTaskRunner - Worker[druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091] wrote RUNNING status for task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] on [TaskLocation{host='null', port=-1, tlsPort=-1}]
2018-01-12T18:27:25,829 INFO [Curator-PathChildrenCache-1] io.druid.indexing.overlord.RemoteTaskRunner - Worker[druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091] wrote RUNNING status for task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] on [TaskLocation{host='druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local', port=8100, tlsPort=-1}]
2018-01-12T18:27:25,829 INFO [Curator-PathChildrenCache-1] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] location changed to [TaskLocation{host='druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local', port=8100, tlsPort=-1}].
What's wrong? I tried a thousand things and nothing solves it ...
Thanks a lot
UnresolvedAddressException being hit by Druid broker
You have to have all the Druid cluster information set on the servers running tranquility.
This is because you only get the DNS names of your Druid cluster from ZooKeeper, not the IPs.
For example, on a Linux server, save your cluster information in /etc/hosts.
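As an illustration, the /etc/hosts entry for the worker host seen in the logs above could look like the line below; 10.0.0.12 is only a placeholder, use the actual IP address at which the middle manager / peon is reachable from the tranquility server.
# /etc/hosts on the machine running tranquility (10.0.0.12 is a placeholder IP)
10.0.0.12   druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local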
As per the Spark documentation, only RDD actions can trigger a Spark job; the transformations are lazily evaluated until an action is called on them.
I see that the sortBy transformation function is applied immediately, and it is shown as a job trigger in the Spark UI. Why?
sortBy is implemented using sortByKey, which depends on a RangePartitioner (JVM) or a partitioning function (Python). When you call sortBy / sortByKey, the partitioner (partitioning function) is initialized eagerly and samples the input RDD to compute partition boundaries. The job you see corresponds to this process.
The actual sorting is performed only if you execute an action on the newly created RDD or its descendants.
As per the Spark documentation, only an action triggers a job in Spark; the transformations are lazily evaluated when an action is called on them.
In general you're right, but as you've just experienced, there are a few exceptions, and sortBy is among them (along with zipWithIndex).
As a matter of fact, it was reported in Spark's JIRA and closed with a Won't Fix resolution. See SPARK-1021 sortByKey() launches a cluster job when it shouldn't.
You can see the job running with DAGScheduler logging enabled (and later in web UI):
scala> sc.parallelize(0 to 8).sortBy(identity)
INFO DAGScheduler: Got job 1 (sortBy at <console>:25) with 8 output partitions
INFO DAGScheduler: Final stage: ResultStage 1 (sortBy at <console>:25)
INFO DAGScheduler: Parents of final stage: List()
INFO DAGScheduler: Missing parents: List()
DEBUG DAGScheduler: submitStage(ResultStage 1)
DEBUG DAGScheduler: missing: List()
INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at sortBy at <console>:25), which has no missing parents
DEBUG DAGScheduler: submitMissingTasks(ResultStage 1)
INFO DAGScheduler: Submitting 8 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at sortBy at <console>:25)
DEBUG DAGScheduler: New pending partitions: Set(0, 1, 5, 2, 6, 3, 7, 4)
INFO DAGScheduler: ResultStage 1 (sortBy at <console>:25) finished in 0.013 s
DEBUG DAGScheduler: After removal of stage 1, remaining stages = 0
INFO DAGScheduler: Job 1 finished: sortBy at <console>:25, took 0.019755 s
res1: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[6] at sortBy at <console>:25
I'm starting to learn Apache Camel and have run into a problem.
I need to read an XML file from the file system, parse it, and transfer the files specified in that XML to another location.
This is an example of the XML, located in "C:/Users/JuISe/Desktop/jms":
<file>
<from>C:/Users/JuISe/Desktop/from</from>
<to>C:/Users/JuISe/Desktop/to</to>
</file>
It means: transfer all files from the "C:/Users/JuISe/Desktop/from" directory to "C:/Users/JuISe/Desktop/to".
Here is my code:
public class FileShifter {
public static void main(String args[]) {
CamelContext context = new DefaultCamelContext();
try {
context.addRoutes(new MyRouteBuilder());
context.start();
Thread.sleep(10000);
context.stop();
}catch (Exception ex) {
ex.printStackTrace();
}
}
}
class MyRouteBuilder extends RouteBuilder {
private String from;
private String to;
public void configure(){
from("file:C:/Users/JuISe/Desktop/jms?noop=true")
.setHeader("from", xpath("file/from/text()").stringResult())
.setHeader("to", xpath("file/to/text()").stringResult())
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
from = exchange.getIn().getHeader("from").toString();
to = exchange.getIn().getHeader("to").toString();
}
})
.pollEnrich("file:" + from)
.to("file:" + to);
}
}
It doesn't work.
Here are the logs:
[main] INFO org.apache.camel.impl.converter.DefaultTypeConverter - Loaded 216 type converters
[main] INFO org.apache.camel.impl.DefaultRuntimeEndpointRegistry - Runtime endpoint registry is in extended mode gathering usage statistics of all incoming and outgoing endpoints (cache limit: 1000)
[main] INFO org.apache.camel.impl.DefaultCamelContext - AllowUseOriginalMessage is enabled. If access to the original message is not needed, then its recommended to turn this option off as it may improve performance.
[main] INFO org.apache.camel.impl.DefaultCamelContext - StreamCaching is not in use. If using streams then its recommended to enable stream caching. See more details at http://camel.apache.org/stream-caching.html
[main] INFO org.apache.camel.component.file.FileEndpoint - Endpoint is configured with noop=true so forcing endpoint to be idempotent as well
[main] INFO org.apache.camel.component.file.FileEndpoint - Using default memory based idempotent repository with cache max size: 1000
[main] INFO org.apache.camel.impl.DefaultCamelContext - Route: route1 started and consuming from: Endpoint[file://C:/Users/JuISe/Desktop/jms?noop=true]
[main] INFO org.apache.camel.impl.DefaultCamelContext - Total 1 routes, of which 1 is started.
[main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.16.1 (CamelContext: camel-1) started in 1.033 seconds
[Camel (camel-1) thread #0 - file://C:/Users/JuISe/Desktop/jms] WARN org.apache.camel.component.file.strategy.MarkerFileExclusiveReadLockStrategy - Deleting orphaned lock file: C:\Users\JuISe\Desktop\jms\message.xml.camelLock
[Camel (camel-1) thread #0 - file://C:/Users/JuISe/Desktop/jms] INFO org.apache.camel.builder.xml.XPathBuilder - Created default XPathFactory com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl#2308d4c8
[main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.16.1 (CamelContext: camel-1) is shutting down
[main] INFO org.apache.camel.impl.DefaultShutdownStrategy - Starting to graceful shutdown 1 routes (timeout 300 seconds)
[Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 300 seconds. Inflights per route: [route1 = 2]
[Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 299 seconds. Inflights per route: [route1 = 2]
[Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 298 seconds. Inflights per route: [route1 = 2]
[Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 297 seconds. Inflights per route: [route1 = 2]
[Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 296 seconds. Inflights per route: [route1 = 2]
[Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 295 seconds. Inflights per route: [route1 = 2]
[Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 294 seconds. Inflights per route: [route1 = 2]
Thanks for the help!
Try using a bean with a producer and consumer template; file endpoint directories cannot be dynamic.
from("file:/Users/smunirat/apps/destination/jms?noop=true")
.setHeader("from", xpath("file/from/text()").stringResult())
.setHeader("to", xpath("file/to/text()").stringResult())
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
from = exchange.getIn().getHeader("from").toString();
to = exchange.getIn().getHeader("to").toString();
exchange.getOut().setHeader("from", from);
exchange.getOut().setHeader("to", to);
}
})
.to("log:Sundar?showAll=true&multiline=true")
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
ConsumerTemplate createConsumerTemplate = exchange.getContext().createConsumerTemplate();
ProducerTemplate createProducerTemplate = exchange.getContext().createProducerTemplate();
Exchange receive = createConsumerTemplate.receive("file://"+exchange.getIn().getHeader("from"));
createProducerTemplate.sendBody("file://"+exchange.getIn().getHeader("to"),receive.getIn().getMandatoryBody());
}
})
.log("Message");
This might require a little tweaking to change the file name and to delete the original file from the from location.
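One possible version of that tweaking, as a sketch: consume with delete=true so the original file is removed from the from directory, and carry the original file name over to the to directory via the CamelFileName header (Exchange.FILE_NAME). This would replace the body of the second Processor above and assumes the same from/to headers:
// inside process(Exchange exchange)
ConsumerTemplate consumer = exchange.getContext().createConsumerTemplate();
ProducerTemplate producer = exchange.getContext().createProducerTemplate();

// delete=true removes the source file once it has been consumed
Exchange received = consumer.receive("file://" + exchange.getIn().getHeader("from") + "?delete=true");

// keep the original file name when writing to the target directory
producer.sendBodyAndHeader(
        "file://" + exchange.getIn().getHeader("to"),
        received.getIn().getMandatoryBody(),
        Exchange.FILE_NAME,
        received.getIn().getHeader(Exchange.FILE_NAME));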
I've a Camel process (that I run from command line) which route is similar to this one:
public class ProfilerRoute extends RouteBuilder {
@Override
public void configure() {
from("kestrel://my_queue?concurrentConsumers=10&waitTimeMs=500")
.unmarshal().json(JsonLibrary.Jackson, MyClass.class)
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
/* Do the real processing [...] */
exchange.getIn().setBody(null);
}
})
.filter(body().isNotNull())
.to("file://nowhere");
}
}
Note that I'm trashing every message after having processed it, since this is a pure consumer process.
The process runs on its own: no other process is writing to the queue, and the queue is empty.
However, when I try to kill the process, it does not die.
From the logs I see the following lines (indented for readability):
[ Thread-1] MainSupport$HangupInterceptor INFO
Received hang up - stopping the main instance.
[ Thread-1] MainSupport INFO
Apache Camel stopping
[ Thread-1] GuiceCamelContext INFO
Apache Camel 2.11.1 (CamelContext: camel-1)
is shutting down
[ Thread-1] DefaultShutdownStrategy INFO
Starting to graceful shutdown 1 routes
(timeout 300 seconds)
[l-1) thread #12 - ShutdownTask] DefaultShutdownStrategy INFO
Waiting as there are still 10 inflight and
pending exchanges to complete,
timeout in 300 seconds.
And so on, with a decreasing timeout. At the end of the timeout I get the following in the logs:
[l-1) thread #12 - ShutdownTask] DefaultShutdownStrategy INFO
Waiting as there are still 10 inflight and
pending exchanges to complete,
timeout in 1 seconds.
[ Thread-1] DefaultShutdownStrategy WARN
Timeout occurred.
Now forcing the routes to be shutdown now.
[l-1) thread #12 - ShutdownTask] DefaultShutdownStrategy WARN
Interrupted while waiting during graceful
shutdown, will force shutdown now.
[ Thread-1] KestrelConsumer INFO
Stopping consumer for
kestrel://localhost:22133/my_queue?concurrentConsumers=10&waitTimeMs=500
But the process will not die anyway (even if I try to kill it at this point).
I would have expected that after the waiting time all the threads would realise that a shutdown is going on and stop.
I've read the "Graceful Shutdown" document, however I could not find something that explains the behaviour I'm facing.
As you can see from logs I'm using the 2.11.1 version of Apache Camel.
UPDATE: According to Claus Ibsen, it might be a problem with the camel-kestrel component. I filed an issue on ASF Jira for Camel: CAMEL-6632
This is a bug in camel-kestrel, and a JIRA ticket has been logged to fix this: https://issues.apache.org/jira/browse/CAMEL-6632