I am able to run an Apache Beam job successfully using the DirectRunner, with the following arguments:
java -jar my-jar.jar --commonConfigFile=comJobConfig.yml
--configFile=relJobConfig.yml
--jobName=my-job
--stagingLocation=gs://my-bucket/staging/
--gcpTempLocation=gs://my-bucket/tmp/
--tempLocation=gs://my-bucket/tmp/
--runner=DirectRunner
--bucket=my-bucket
--project=my-project
--region=us-west1
--subnetwork=my-subnetwork
--serviceAccount=my-svc-account@my-project.iam.gserviceaccount.com
--usePublicIps=false
--workerMachineType=e2-standard-2
--maxNumWorkers=20 --numWorkers=2
--autoscalingAlgorithm=THROUGHPUT_BASED
However, when trying to run on Google Dataflow (simply changing to --runner=DataflowRunner), I receive the following message (GetWork timed out, retrying) in the workers.
I have checked the logs generated by the Dataflow process and found
[2023-01-28 20:49:41,600] [main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler:91 2023-01-28T20:49:39.386Z: Autoscaling: Raised the number of workers to 2 so that the pipeline can catch up with its backlog and keep up with its input rate.
[2023-01-28 20:50:26,911] [main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler:91 2023-01-28T20:50:26.779Z: Workers have started successfully.
and I see no indication that the workers have failed. Moreover, I do not see any relevant logs indicating that the process is working (in my case, reading from the appropriate Pub/Sub topic for notifications). Let me know if there is any further documentation on this log message, as I have not been able to find any.
It turns out I had forgotten to include the --enableStreamingEngine flag. Adding it solved my problem.
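For anyone hitting the same symptom, this is a sketch of the corrected invocation - only the runner flag changes and the missing flag is added; everything else stays as in the DirectRunner command above:

```shell
java -jar my-jar.jar \
  --commonConfigFile=comJobConfig.yml \
  --configFile=relJobConfig.yml \
  --runner=DataflowRunner \
  --enableStreamingEngine
  # ...plus the remaining flags from the DirectRunner command above
```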
I'm executing this code: https://github.com/IBM/blockchain-application-using-fabric-java-sdk. When I execute CreateChannel I get this error:
Send transactions failed. Reason: timeout
I checked the log of the orderer.example.com Docker container and there seems to be no communication. How can I solve this problem?
The channel create command times out when the orderer takes long enough (>5s) to respond to the transaction. You can add --timeout <duration> to increase the default value. I faced a similar issue while creating a channel through the command line - https://hyperledger-fabric.readthedocs.io/en/release-1.3/commands/peerchannel.html#peer-channel-create
You can check whether the Java SDK provides an equivalent configuration in the channel APIs for peers.
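For the CLI route, a minimal sketch of raising the timeout (the orderer address is taken from the question; the channel name and tx file path are illustrative placeholders):

```shell
peer channel create -o orderer.example.com:7050 \
  -c mychannel -f ./channel-artifacts/channel.tx \
  --timeout 30s
```

The Java SDK side is configured differently, so check the channel/orderer options in the SDK rather than assuming this flag carries over.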
We are running a LocalCluster of Apache Storm as a Java process, i.e., via nohup.
We are running a simple Topology with following configuration.
Config config = new Config();
config.setMessageTimeoutSecs(120);
config.setNumWorkers(1);
config.setDebug(false);
config.setMaxSpoutPending(1);
We are submitting the Topology to LocalCluster. Our shutdown hook is the default one found across sources.
Runtime.getRuntime().addShutdownHook(new Thread() {
    @Override
    public void run() {
        cluster.killTopology(TOPOLOGY_NAME);
        cluster.shutdown();
    }
});
Lately we were facing Java heap issues, which may have been solved by increasing Xms/Xmx and using MarkSweepGC.
However, we are now running into a new problem: the spout logs stop being written to after some time, with no trace of any Storm-related Exception/Error.
The main problem is that the java process (via nohup) still shows up in ps -ef. What could be happening?
You can try enabling debug logging with config.setDebug(true);, which might help you tell what is happening.
Also next time your topology hangs, you should be able to tell what it's doing by either using jstack or sending the Java process a SIGQUIT (kill -3). This will cause the process to dump stack traces for each thread in the JVM, which should let you figure out why it's hanging.
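As a sketch, assuming the topology's main class is named MyTopology (a hypothetical name - substitute your own), the two options look like:

```shell
# Option 1: jstack writes the thread dump wherever you redirect it
PID=$(pgrep -f MyTopology)      # MyTopology is a hypothetical class name
jstack "$PID" > thread-dump.txt

# Option 2: SIGQUIT makes the JVM print the dump on its own stdout,
# which for a nohup'd process usually means nohup.out
kill -3 "$PID"
```

Look for threads blocked in your spout/bolt code or waiting on locks; that usually points at why the topology has stalled.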
As an aside, in case you're doing it: please don't use LocalCluster in production. It's intended for testing.
I'm getting this Kafka exception on the consumer:
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'correlation_id': java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
at org.apache.kafka.common.requests.ResponseHeader.parse(ResponseHeader.java:53)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:435)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorKnown(AbstractCoordinator.java:184)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:886)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
No client-server version mismatch.
Be sure your client connects to a real Kafka port!
This specific error happens while parsing (one of?) the first header fields of the expected Kafka message, as shown by the invocation of ResponseHeader.parse in the stack trace.
So it can occur if you target a listening port that has nothing to do with a Kafka server - just a one-minute check!
Otherwise, you should check for a client-server version mismatch.
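To see why a response from a non-Kafka port surfaces as exactly this exception, here is a small stand-alone sketch (plain java.nio, not Kafka code): the client tries to read the 4-byte correlation_id as an int32, and a too-short or unrelated response underflows the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.BufferUnderflowException;

public class UnderflowDemo {
    public static void main(String[] args) {
        // Pretend this is the raw response from the socket: only 2 bytes,
        // while the client wants to read a 4-byte correlation_id.
        ByteBuffer response = ByteBuffer.wrap(new byte[] {0x00, 0x01});
        try {
            int correlationId = response.getInt(); // needs 4 bytes -> throws
            System.out.println("correlation_id = " + correlationId);
        } catch (BufferUnderflowException e) {
            System.out.println("BufferUnderflowException: response too short");
        }
    }
}
```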
For me, it was a unit test failing with the above exception. When I inspected the port (9092) being used on my local machine, it was bound to an already running process. It is worth checking whether a Kafka process is running locally; if you are sure you are not expecting it to be running, find its PID and kill it.
(Don't try this on production, though :P)
lsof -i:9092
kill -9 <PID_FROM_ABOVE_IF_ANY>
I was running through the tutorial here: http://kafka.apache.org/documentation.html#introduction
When I get to "Step 7: Use Kafka Connect to import/export data" and attempt to start two connectors, I get the following errors:
ERROR Failed to flush WorkerSourceTask{id=local-file-source-0}, timed out while waiting for producer to flush outstanding messages, 1 left
ERROR Failed to commit offsets for WorkerSourceTask
Here is the portion of the tutorial:
Next, we'll start two connectors running in standalone mode, which means they run in a single, local, dedicated process. We provide three configuration files as parameters. The first is always the configuration for the Kafka Connect process, containing common configuration such as the Kafka brokers to connect to and the serialization format for data. The remaining configuration files each specify a connector to create. These files include a unique connector name, the connector class to instantiate, and any other configuration required by the connector.
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
I have spent some time looking for a solution, but was unable to find anything useful. Any help is appreciated.
Thanks!
The reason I was getting this error was that the first server I created using config/server.properties was not running. I assume that because it is the leader for the topic, the messages could not be flushed and the offsets could not be committed.
Once I started the Kafka server using those server properties (config/server.properties), the issue was resolved.
You need to start the Kafka server and ZooKeeper before running Kafka Connect.
You need to execute the commands in "Step 2: Start the server" first:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
from here:https://mail-archives.apache.org/mod_mbox/kafka-users/201601.mbox/%3CCAK0BMEpgWmL93wgm2jVCKbUT5rAZiawzOroTFc_A6Q=GaXQgfQ#mail.gmail.com%3E
You need to start ZooKeeper and the Kafka server(s) before running that line.
start zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
start multiple kafka servers
bin/kafka-server-start.sh config/server.properties
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
start connectors
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
Then you will see some lines written into test.sink.txt:
foo
bar
And you can start the consumer to check it:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
If you configure your Kafka broker with a hostname such as my.sandbox.com, make sure that you modify config/connect-standalone.properties accordingly:
bootstrap.servers=my.sandbox.com:9092
On Hortonworks HDP the default port is 6667, hence the setting is:
bootstrap.servers=my.sandbox.com:6667
If Kerberos is enabled you will need the following settings as well (without SSL):
security.protocol=PLAINTEXTSASL
producer.security.protocol=PLAINTEXTSASL
producer.sasl.kerberos.service.name=kafka
consumer.security.protocol=PLAINTEXTSASL
consumer.sasl.kerberos.service.name=kafka
I'm using Java SDK 1.7.5 and the HRD datastore with the following task queue setup:
<queue>
<name>surveyAssembly</name>
<rate>5/s</rate>
<bucket-size>20</bucket-size>
<max-concurrent-requests>10</max-concurrent-requests>
</queue>
I'm getting an HTTP 404 when triggering the task. There are no errors in the logs; it is just failing silently.
It seems similar to this issue: Tasks queue up, nothing happens on retry (no log), but I had no luck after purging the queue.
Any ideas on how to diagnose the cause?
I was also getting the same error. After debugging, I found that I had forgotten to deploy the backend version from Eclipse. So you have to confirm that both the backend and the frontend have the same updated code.
Try this code
//backends.xml
<backends>
<backend name="mailback">
</backend>
</backends>
// Queue code
Queue surveyAssemblyQueue = QueueFactory.getQueue("surveyAssembly");
surveyAssemblyQueue.add(withUrl("/taskloop")
        .param("type", type)
        .header("Host", BackendServiceFactory.getBackendService().getBackendAddress("mailback", 0)));
Note: the instance id should be "0" because I have created only one backend instance.