Using the DirectPipelineRunner, I'd like to run my pipeline locally for debugging purposes. I'm using SDK 1.9.0 with Java 8.
My pipeline reads a table from BigQuery, transforms some fields, and writes back to BigQuery.
Running on GCP, i.e. using the DataflowPipelineRunner runner works absolutely fine. However, when I use the DirectPipelineRunner is just keeps spitting out the following log info, and does nothing else:
19:45:05,470 21866 [main] INFO com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner - Executing pipeline using the DirectPipelineRunner.
19:45:18,594 34990 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_c88ee6741e434aabbf50e73d4e6733d1-extract found.
19:45:27,344 43740 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_012dca76d75e461480fe75897b5fa7ba-extract found.
19:45:38,150 54546 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_3548a0ee373a417e8e7570ae90aef78d-extract found.
19:45:47,912 64308 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_db0b957250ef41279a639bdc113c5493-extract found.
19:45:56,685 73081 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_3773e0643ec14475aaa140bcf46ea7af-extract found.
19:46:45,958 122354 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_27af9a1163944cb19e520242de98d899-extract found.
19:46:55,766 132162 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_5473e6702b3544118c7da8877c900f7a-extract found.
19:47:04,015 140411 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_40f47d35aa154708a6fc684c8ffb0ba4-extract found.
19:47:11,913 148309 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_6dce34301c97498884d7344b85a1b07e-extract found.
19:47:35,809 172205 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_4f7c26d372974095a24ac58b547c13d6-extract found.
19:47:45,136 181532 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_a7c33e75bfdb41a6990dd66810a0d44a-extract found.
19:47:55,802 192198 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_a1d7422ca42a4b1d96205bf8c6dada9d-extract found.
The log message is coming from here:
#VisibleForTesting
public Job getJob(JobReference jobRef, Sleeper sleeper, BackOff backoff)
throws IOException, InterruptedException {
String jobId = jobRef.getJobId();
Exception lastException;
do {
try {
return client.jobs().get(jobRef.getProjectId(), jobId).execute();
} catch (GoogleJsonResponseException e) {
if (errorExtractor.itemNotFound(e)) {
LOG.info("No BigQuery job with job id {} found.", jobId);
return null;
}....
Eventually, the JVM runs out of memory:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.String.toLowerCase(String.java:2647)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:847)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:472)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:781)
at com.google.api.client.json.JsonParser.parseArray(JsonParser.java:648)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:740)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:472)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:781)
at com.google.api.client.json.JsonParser.parseArray(JsonParser.java:648)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:740)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:472)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:781)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:382)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:355)
at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:87)
at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:81)
at com.google.api.client.http.HttpResponse.parseAs(HttpResponse.java:459)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator.executeWithBackOff(BigQueryTableRowIterator.java:497)
at com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator.advance(BigQueryTableRowIterator.java:180)
at com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl$BigQueryJsonReaderImpl.advance(BigQueryServicesImpl.java:555)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$BigQuerySourceBase$BigQueryReader.advance(BigQueryIO.java:1331)
at com.google.cloud.dataflow.sdk.io.Read$Bounded$1.evaluateReadHelper(Read.java:180)
at com.google.cloud.dataflow.sdk.io.Read$Bounded$1.evaluate(Read.java:168)
at com.google.cloud.dataflow.sdk.io.Read$Bounded$1.evaluate(Read.java:164)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:858)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:221)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:103)
The table in BigQuery only has 100 rows (it's just for debugging).
What is the problem here?
I believe the BigQuery message is a red-herring; the stack trace of the OOM indicates that data is being read directly from the table and not via an export job.
The DirectPipelineRunner is not at all optimized for memory utilization; try using the newer InProcessPipelineRunner. Additionally, it may be worth using standard Java heap profiling tools to see where the memory is being used.
Related
I have a camel route in MyRouteBuilder.java file which is consuming messages from ActiveMQ:
from("activemq:queue:myQueue" )
.process(consumeDroppedMessage)
.log(">>> I am here");
I wrote a test case for the following like this :
#Override
public RouteBuilder createRouteBuilder() throws Exception {
return new MyRouteBuilder();
}
#Test
void testMyTest() throws Exception {
String queueInputMessage = "My Msg";
template.sendBody("activemq:queue:myQueue", queueInputMessage);
assertMockEndpointsSatisfied();
}
When I run the unit test case I get this strange error:
7:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Route: route1 >>> Route[activemq://queue:null -> null]
17:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Starting consumer (order: 1000) on route: route1
17:53:26.175 [main] DEBUG org.apache.camel.support.DefaultConsumer - Build consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Init consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Starting consumer: Consumer[activemq://queue:null]
17:53:26.213 [main] DEBUG org.apache.activemq.thread.TaskRunnerFactory - Initialized TaskRunnerFactory[ActiveMQ Task] using ExecutorService: java.util.concurrent.ThreadPoolExecutor#3fffff43[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
17:53:26.215 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Reconnect was triggered but transport is not started yet. Wait for start to connect the transport.
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Started unconnected
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Waking up reconnect task
17:53:26.335 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
17:53:26.339 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Established shared JMS Connection
17:53:26.340 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Resumed paused task: org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker#58c34bb3
17:53:26.372 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Attempting 0th connect to: tcp://localhost:61616
17:53:28.393 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Connect fail to: tcp://localhost:61616, reason: {}
I am especially stumped to see these messages:
Route: route1 >>> Route[activemq://queue:null -> null]
and
urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
Why is the queue coming up as null though I have a proper queue name? Also why is the broker url tcp://localhost:61616?
I want to run this unit test case so that it runs properly in all environments like: local, DIT , SIT, PROD etc. So, for that I cannot afford the broker url to be: tcp://localhost:61616.
Any ideas as to what I am doing wrong here and what I should be doing?
EDIT 1:
One of the issues that I am seeing is even before the test class is called, the MyRouteBuilder() inside createRouteBuilder() is invoked, leading to the issues that I see in the log.
The "activemq:queue:.." is telling Camel to use the auto-configure magic behind the scenes (which uses default url) and your use case is beyond that.
You need to configure a connection factory (ActiveMQConnectionFactory) and configure a camel-jms component to use that connection factory.
The connection factory allows you to specify url, userName, password, default connection settings and setup SSL.
A best practice is to externalize the url, userName, password and queue to a properties file so you can change those across the environments-- local, DIT, SIT and prod, etc.
NOTE: Use org.apache.camel/camel-jms component, and not the org.apache.activemq/activemq-camel component. activemq-camel is deprecated and being removed in ActiveMQ 5.17.x.
Instead of setting up an explicit active mq broker , I started using a VM broker .
#Override
protected RoutesBuilder createRouteBuilder() throws Exception {
return new RouteBuilder() {
#Override
public void configure() {
ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
ActiveMQComponent activeMQComponent = new ActiveMQComponent();
activeMQComponent.setConnectionFactory(connectionFactory);
context.addComponent("activemq", activeMQComponent);
from("activemq:queue:myQueue").to("mock:collector");
}
};
}
Also , I mistook camel junit as a traditional junit . We don't need to call explicitly the actual route builder class . Instead after setting up my activeMq component up above , I was able to write my test methods, mock my end points for queue and send messages and assert them . Camel is truly versatile . Requires a lot of study though .
Please consider the following diagram
MyProcess.bpmn
<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:activiti="http://activiti.org/bpmn" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:omgdc="http://www.omg.org/spec/DD/20100524/DC" xmlns:omgdi="http://www.omg.org/spec/DD/20100524/DI" typeLanguage="http://www.w3.org/2001/XMLSchema" expressionLanguage="http://www.w3.org/1999/XPath" targetNamespace="http://www.activiti.org/test">
<process id="myProcess" name="My process" isExecutable="true">
<startEvent id="startevent1" name="Start"></startEvent>
<userTask id="evl" name="Evaluation"></userTask>
<boundaryEvent id="timer_event_autocomplete" name="Timer" attachedToRef="evl" cancelActivity="false">
<timerEventDefinition>
<timeDate>PT2S</timeDate>
</timerEventDefinition>
</boundaryEvent>
<serviceTask id="timer_service" name="Timed Autocomplete" activiti:async="true" activiti:class="com.example.service.TimerService"></serviceTask>
<serviceTask id="store_docs_service" name="Store Documents" activiti:async="true" activiti:class="com.example.service.StoreDocsService"></serviceTask>
<sequenceFlow id="flow1" sourceRef="startevent1" targetRef="evl"></sequenceFlow>
<sequenceFlow id="flow2" sourceRef="timer_event_autocomplete" targetRef="timer_service"></sequenceFlow>
<sequenceFlow id="flow3" sourceRef="evl" targetRef="store_docs_service"></sequenceFlow>
<sequenceFlow id="flow4" sourceRef="store_docs_service" targetRef="endevent1"></sequenceFlow>
<endEvent id="endevent1" name="End"></endEvent>
</process>
</definitions>
To describe it in words, there is one user task (Evaluation) and a timer attached to it (configured to trigger in 2 seconds). Upon triggering the timer, the Timed Autocomplete async service task in its Java Delegate, TimerService, tries to complete the user task (Evaluation). Completing the user task (Evaluation) the flow moves to the other async service task (Store Documents), it calls its Java Delegate, StoreDocsService, and the flow ends.
TimerService.java
public class TimerService implements JavaDelegate {
Logger LOGGER = LoggerFactory.getLogger(TimerService.class);
#Override
public void execute(DelegateExecution execution) throws Exception {
LOGGER.info("*** Executing Timer autocomplete ***");
Task task = execution.getEngineServices().getTaskService().createTaskQuery().active().singleResult();
execution.getEngineServices().getTaskService().complete(task.getId());
LOGGER.info("*** Task: {} autocompleted by timer ***", task.getId());
}
}
StoreDocsService.java
public class StoreDocsService implements JavaDelegate {
Logger LOGGER = LoggerFactory.getLogger(StoreDocsService.class);
#Override
public void execute(DelegateExecution execution) throws Exception {
LOGGER.info("*** Executing Store Documents ***");
}
}
App.java
public class App
{
public static void main( String[] args ) throws Exception
{
// DefaultAsyncJobExecutor demoAsyncJobExecutor = new DefaultAsyncJobExecutor();
// demoAsyncJobExecutor.setCorePoolSize(10);
// demoAsyncJobExecutor.setMaxPoolSize(50);
// demoAsyncJobExecutor.setKeepAliveTime(10000);
// demoAsyncJobExecutor.setMaxAsyncJobsDuePerAcquisition(50);
ProcessEngineConfiguration cfg = new StandaloneProcessEngineConfiguration()
.setJdbcUrl("jdbc:h2:mem:activiti;DB_CLOSE_DELAY=1000")
.setJdbcUsername("sa")
.setJdbcPassword("")
.setJdbcDriver("org.h2.Driver")
.setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE)
// .setAsyncExecutorActivate(true)
// .setAsyncExecutorEnabled(true)
// .setAsyncExecutor(demoAsyncJobExecutor)
.setJobExecutorActivate(true)
;
ProcessEngine processEngine = cfg.buildProcessEngine();
String pName = processEngine.getName();
String ver = ProcessEngine.VERSION;
System.out.println("ProcessEngine [" + pName + "] Version: [" + ver + "]");
RepositoryService repositoryService = processEngine.getRepositoryService();
Deployment deployment = repositoryService.createDeployment()
.addClasspathResource("MyProcess.bpmn").deploy();
ProcessDefinition processDefinition = repositoryService.createProcessDefinitionQuery()
.deploymentId(deployment.getId()).singleResult();
System.out.println(
"Found process definition ["
+ processDefinition.getName() + "] with id ["
+ processDefinition.getId() + "]");
final Map<String, Object> variables = new HashMap<String, Object>();
final RuntimeService runtimeService = processEngine.getRuntimeService();
ProcessInstance id = runtimeService.startProcessInstanceByKey("myProcess", variables);
System.out.println("Started Process Id: "+id.getId());
try {
final TaskService taskService = processEngine.getTaskService();
// List<Task> tasks = taskService.createTaskQuery().active().list();
// while (!tasks.isEmpty()) {
// Task task = tasks.get(0);
// taskService.complete(task.getId());
// tasks = taskService.createTaskQuery().active().list();
// }
} catch (Exception e) {
System.out.println(e.getMessage());
} finally {
}
while(!runtimeService.createExecutionQuery().list().isEmpty()) {
}
processEngine.close();
}
}
Activiti 5.15
When the timer triggers, the above diagram executes as described. We use Activiti's DefaultJobExecutor
As we can see in the logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.jobexecutor.JobExecutor - Starting up the JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor].
[Thread-1] INFO org.activiti.engine.impl.jobexecutor.AcquireJobsRunnable - JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor] starting to acquire jobs
ProcessEngine [default] Version: [5.15]
[main] INFO org.activiti.engine.impl.bpmn.deployer.BpmnDeployer - Processing resource MyProcess.bpmn
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[pool-1-thread-1] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[pool-1-thread-1] INFO com.example.service.TimerService - *** Task: 9 autocompleted by timer ***
[pool-1-thread-1] INFO com.example.service.StoreDocsService - *** Executing Store Documents ***
[main] INFO org.activiti.engine.impl.jobexecutor.JobExecutor - Shutting down the JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor].
[Thread-1] INFO org.activiti.engine.impl.jobexecutor.AcquireJobsRunnable - JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor] stopped job acquisition
Activiti >= 5.17
Changing only the activiti's version in pom.xml to 5.17.0 and up (tested till 5.22.0) and executing the same code, the flow executes the timer's Java Delegate, TimerService, which completes the user task (Evaluation) but Store Documents Java Delegate, StoreDocsService is never called. To add more, it seems that the flow never ends and the execution remains stuck at Store Documents async service task.
Logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.jobexecutor.JobExecutor - Starting up the JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor].
[Thread-1] INFO org.activiti.engine.impl.jobexecutor.AcquireJobsRunnableImpl - JobExecutor[org.activiti.engine.impl.jobexecutor.DefaultJobExecutor] starting to acquire jobs
ProcessEngine [default] Version: [5.17.0.2]
[main] INFO org.activiti.engine.impl.bpmn.deployer.BpmnDeployer - Processing resource MyProcess.bpmn
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[pool-1-thread-2] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[pool-1-thread-2] INFO com.example.service.TimerService - *** Task: 9 autocompleted by timer ***
Changing to Async Job Executor. One feature of 5.17 release was the new async job executor (however the default non-async executor remains configured as default). So trying to enable the async executor in App.java by the following lines:
DefaultAsyncJobExecutor demoAsyncJobExecutor = new DefaultAsyncJobExecutor();
demoAsyncJobExecutor.setCorePoolSize(10);
demoAsyncJobExecutor.setMaxPoolSize(50);
demoAsyncJobExecutor.setKeepAliveTime(10000);
demoAsyncJobExecutor.setMaxAsyncJobsDuePerAcquisition(50);
ProcessEngineConfiguration cfg = new StandaloneProcessEngineConfiguration()
.setJdbcUrl("jdbc:h2:mem:activiti;DB_CLOSE_DELAY=1000")
.setJdbcUsername("sa")
.setJdbcPassword("")
.setJdbcDriver("org.h2.Driver")
.setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE)
.setAsyncExecutorActivate(true)
.setAsyncExecutorEnabled(true)
.setAsyncExecutor(demoAsyncJobExecutor)
;
The flow seems to execute correctly, StoreDocsService is called after TimerService, but it never ends (the while(!runtimeService.createExecutionQuery().list().isEmpty()) statement in App.java is always true)!
Logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Starting up the default async job executor [org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor].
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating thread pool queue of size 100
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating executor service with corePoolSize 10, maxPoolSize 50 and keepAliveTime 10000
[Thread-1] INFO org.activiti.engine.impl.asyncexecutor.AcquireTimerJobsRunnable - {} starting to acquire async jobs due
[Thread-2] INFO org.activiti.engine.impl.asyncexecutor.AcquireAsyncJobsDueRunnable - {} starting to acquire async jobs due
ProcessEngine [default] Version: [5.17.0.2]
[main] INFO org.activiti.engine.impl.bpmn.deployer.BpmnDeployer - Processing resource MyProcess.bpmn
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[pool-1-thread-2] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[pool-1-thread-2] INFO com.example.service.TimerService - *** Task: 9 autocompleted by timer ***
[pool-1-thread-3] INFO com.example.service.StoreDocsService - *** Executing Store Documents ***
!!!! UPDATE !!!
Activiti 6.0.0
Tried the same scenario but with Activiti version 6.0.0.
Changes needed in TimerService, cannot get the EngineServices from DelegateExecution:
public class TimerService implements JavaDelegate {
Logger LOGGER = LoggerFactory.getLogger(TimerService.class);
#Override
public void execute(DelegateExecution execution) {
LOGGER.info("*** Executing Timer autocomplete ***");
Task task = Context.getProcessEngineConfiguration().getTaskService().createTaskQuery().active().singleResult();
Context.getProcessEngineConfiguration().getTaskService().complete(task.getId());
// Task task = execution.getEngineServices().getTaskService().createTaskQuery().active().singleResult();
// execution.getEngineServices().getTaskService().complete(task.getId());
LOGGER.info("*** Task: {} autocompleted by timer ***", task.getId());
}
}
and this version has only the async executor so the ProcessEngineConfiguration in App.java changes to:
ProcessEngineConfiguration cfg = new StandaloneProcessEngineConfiguration()
.setJdbcUrl("jdbc:h2:mem:activiti;DB_CLOSE_DELAY=1000")
.setJdbcUsername("sa")
.setJdbcPassword("")
.setJdbcDriver("org.h2.Driver")
.setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE)
.setAsyncExecutorActivate(true)
// .setAsyncExecutorEnabled(true)
// .setAsyncExecutor(demoAsyncJobExecutor)
// .setJobExecutorActivate(true)
;
With 6.0.0 version and async executor the process completes successfully as we can see in the logs:
[main] INFO org.activiti.engine.impl.ProcessEngineImpl - ProcessEngine default created
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Starting up the default async job executor [org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor].
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating thread pool queue of size 100
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Creating executor service with corePoolSize 2, maxPoolSize 10 and keepAliveTime 5000
[Thread-1] INFO org.activiti.engine.impl.asyncexecutor.AcquireAsyncJobsDueRunnable - {} starting to acquire async jobs due
[Thread-2] INFO org.activiti.engine.impl.asyncexecutor.AcquireTimerJobsRunnable - {} starting to acquire async jobs due
[Thread-3] INFO org.activiti.engine.impl.asyncexecutor.ResetExpiredJobsRunnable - {} starting to reset expired jobs
ProcessEngine [default] Version: [6.0.0.4]
Found process definition [My process] with id [myProcess:1:3]
Started Process Id: 4
[activiti-async-job-executor-thread-2] INFO com.example.service.TimerService - *** Executing Timer autocomplete ***
[activiti-async-job-executor-thread-2] INFO com.example.service.TimerService - *** Task: 10 autocompleted by timer ***
[activiti-async-job-executor-thread-2] INFO com.example.service.StoreDocsService - *** Executing Store Documents ***
[main] INFO org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor - Shutting down the default async job executor [org.activiti.engine.impl.asyncexecutor.DefaultAsyncJobExecutor].
[activiti-reset-expired-jobs] INFO org.activiti.engine.impl.asyncexecutor.ResetExpiredJobsRunnable - {} stopped resetting expired jobs
[activiti-acquire-timer-jobs] INFO org.activiti.engine.impl.asyncexecutor.AcquireTimerJobsRunnable - {} stopped async job due acquisition
[activiti-acquire-async-jobs] INFO org.activiti.engine.impl.asyncexecutor.AcquireAsyncJobsDueRunnable - {} stopped async job due acquisition
Process finished with exit code 0
2 Questions:
We have upgraded from Activiti 5.15 to 5.22.0 and we do not use the async job executor. Is there any way to keep the functionality of this piece of diagram to behave as it was behaving in 5.15?
If switching to the async job executor is inevitable, then what are we missing in order to make this process complete successfully?
A sample project of the above can be found at: https://github.com/pleft/DemoActiviti
Without answering your question explicitly which would require setting up your environment and debugging, I would recommend you at the very least move to Activiti 6.
The 5.x branch of Activiti hasn't been maintained for over 5 years and is effectively dead.
Even the 6.x line has pretty much been abandoned as the core developers have all moved to the "Flowable" project.
If you choose to stay with Activiti 5.x, your options are:
Maintain the codebase yourself (and hopefully contribute any changes/enhancements back to the project).
Contract Activiti support services. There are a couple of vendors offering such services.
I am trying tranquility with Druid 0.11 and Kafka. When tranquility receive new data it throw the following exception:
2018-01-12 18:27:34,010 [Curator-ServiceCache-0] INFO c.m.c.s.net.finagle.DiscoResolver - Updating instances for service[firehose:druid:overlord:flow-018-0000-0000] to Set(ServiceInstance{name='firehose:druid:overlord:flow-018-0000-0000', id='ea85b248-0c53-4ec1-94a6-517525f72e31', address='druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local', port=8100, sslPort=-1, payload=null, registrationTimeUTC=1515781653895, serviceType=DYNAMIC, uriSpec=null})
Jan 12, 2018 6:27:37 PM com.twitter.finagle.netty3.channel.ChannelStatsHandler exceptionCaught
WARNING: ChannelStatsHandler caught an exception
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
at org.jboss.netty.channel.SimpleChannelHandler.connectRequested(SimpleChannelHandler.java:306)
The worker was created by middle Manager:
2018-01-12T18:27:25,704 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Submitting runnable for task[index_realtime_flow_2018-01-12T18:00:00.000Z_0_0]
2018-01-12T18:27:25,719 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Affirmative. Running task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0]
And tranquility talk with overlord fine... I think by the following logs:
2018-01-12T18:27:25,268 INFO [qtp271944754-62] io.druid.indexing.overlord.TaskLockbox - Adding task[index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] to activeTasks
2018-01-12T18:27:25,272 INFO [TaskQueue-Manager] io.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_realtime_flow_2018-01-12T18:00:00.000Z_0_0
2018-01-12T18:27:25,272 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_realtime_flow_2018-01-12T18:00:00.000Z_0_0
2018-01-12T18:27:25,279 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - No worker selection strategy set. Using default of [EqualDistributionWorkerSelectStrategy]
2018-01-12T18:27:25,294 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091] to add task[index_realtime_flow_2018-01-12T18:00:00.000Z_0_0]
2018-01-12T18:27:25,334 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - Task index_realtime_flow_2018-01-12T18:00:00.000Z_0_0 switched from pending to running (on [druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091])
2018-01-12T18:27:25,336 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] status changed to [RUNNING].
2018-01-12T18:27:25,747 INFO [Curator-PathChildrenCache-1] io.druid.indexing.overlord.RemoteTaskRunner - Worker[druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091] wrote RUNNING status for task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] on [TaskLocation{host='null', port=-1, tlsPort=-1}]
2018-01-12T18:27:25,829 INFO [Curator-PathChildrenCache-1] io.druid.indexing.overlord.RemoteTaskRunner - Worker[druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local:8091] wrote RUNNING status for task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] on [TaskLocation{host='druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local', port=8100, tlsPort=-1}]
2018-01-12T18:27:25,829 INFO [Curator-PathChildrenCache-1] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_flow_2018-01-12T18:00:00.000Z_0_0] location changed to [TaskLocation{host='druid-md-deployment-7877777bf7-tmmvh.druid-md-hs.default.svc.cluster.local', port=8100, tlsPort=-1}].
What's wrong? I tried a thousand things and nothing solves it ...
Thanks a lot
UnresolvedAddressException being hit by Druid broker
You have to have all the druid cluster information set in you servers running tranquility.
It's because you only get DNS of you druid cluster from zookeeper, not the IP.
For example, on linux server, save you cluster information in /etc/hosts.
I have a few tables in Hive in HDFS, how do I read them into dataframe from spark? How does HiveContext know where my hive warehouse is?
My current code, it throws out of memory error for some reason, these tables are tiny, 30k rows max, 3-5 columns.
SparkConf sparkConf = new SparkConf().setAppName("Hive Test").setMaster("local[*]");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx);
// Queries are expressed in HiveQL.
DataFrame results = sqlContext.sql("SELECT * FROM departments");
results.show();
$ hdfs dfs -ls /user/hive/warehouse
Found 6 items
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:09 /user/hive/warehouse/categories
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:13 /user/hive/warehouse/customers
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:09 /user/hive/warehouse/departments
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:11 /user/hive/warehouse/order_items
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:16 /user/hive/warehouse/orders
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:09 /user/hive/warehouse/products
16/10/30 14:38:29 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
16/10/30 14:38:29 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/10/30 14:38:30 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/10/30 14:38:30 INFO ObjectStore: ObjectStore, initialize called
16/10/30 14:38:30 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/10/30 14:38:30 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/10/30 14:38:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/10/30 14:38:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/10/30 14:38:31 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
Using scala and spark, i try to build simple apps. To avoid too many logs, i have setLevel Log to Level.ERROR, but it seems all log still appear. Here're my codes :
import org.apache.log4j.{BasicConfigurator, Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
/**
* Created by hduser on 16/08/16.
*/
object ALS_Test {
def main(command : Array[String]): Unit =
{
Logger.getRootLogger
Logger.getLogger(this.getClass).setLevel(Level.ERROR)
Logger.getLogger("org.spark_project").setLevel(Level.ERROR)
BasicConfigurator.configure()
val sparkConf = new SparkConf().setAppName("AppName").setMaster("local[4]")
val sc = new SparkContext(sparkConf)
println("test 123")
}
}
The Output when running still have many confused logs :
0 [main] INFO org.apache.spark.SparkContext - Running Spark version 2.0.0
163 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
175 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
176 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[GetGroups], valueName=Time)
177 [main] DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
555 [main] DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.