At times, I get the following error while deploying code through NetBeans.
I'm not sure why this is happening. I tried increasing the timeout to 15000 in standalone.xml, but I still face this error.
Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'deploy' at address '[("deployment" => "server-1.0-SNAPSHOT.war")]'
Deploy of deployment "server-1.0-SNAPSHOT.war" was rolled back with the following failure message: "JBAS013487: Operation timed out awaiting service container stability"
ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 1) JBAS013413: Timeout after [5000] seconds waiting for service container stability while finalizing an operation.
Process must be restarted. Step that first updated the service container was 'deploy' at address '[("deployment" => "server-1.0-SNAPSHOT.war")]'
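For reference, the two timeouts that are usually involved here can be sketched like this (values are illustrative; the scanner name "default" and the paths assume a standard standalone layout):
# Deployment scanner timeout in seconds (this is the value normally edited in standalone.xml):
$JBOSS_HOME/bin/jboss-cli.sh --connect \
  "/subsystem=deployment-scanner/scanner=default:write-attribute(name=deployment-timeout, value=600)"
# Overall service-container stability timeout in seconds, passed at startup:
$JBOSS_HOME/bin/standalone.sh -Djboss.as.management.blocking.timeout=600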
I'm using the Apache Flink Kubernetes operator to deploy a standalone job on an Application cluster setup.
I have set up the following files using the official Flink documentation - Link
jobmanager-application-non-ha.yaml
taskmanager-job-deployment.yaml
flink-configuration-configmap.yaml
jobmanager-service.yaml
I have not changed any of the configurations in these files and am trying to run a simple WordCount example from the Flink examples using the Apache Flink Operator.
After running the kubectl commands to set up the job manager and the task manager, the job manager goes into a NotReady state while the task manager goes into a CrashLoopBackOff loop.
NAME READY STATUS RESTARTS AGE
flink-jobmanager-28k4b 1/2 NotReady 2 (4m24s ago) 16m
flink-kubernetes-operator-6585dddd97-9hjp4 2/2 Running 0 10d
flink-taskmanager-6bb88468d7-ggx8t 1/2 CrashLoopBackOff 9 (2m21s ago) 15m
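For reference, the resources above were created with the standard commands from the linked guide, roughly like this (file names as listed above):
kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-application-non-ha.yaml
kubectl create -f taskmanager-job-deployment.yaml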
The job manager logs look like this
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) ~[flink-dist-1.16.0.jar:1.16.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:453) ~[flink-rpc-akka_be40712e-8b2e-47cd-baaf-f0149cf2604d.jar:1.16.0]
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_be40712e-8b2e-47cd-baaf-f0149cf2604d.jar:1.16.0]
The task manager, it seems, cannot connect to the job manager:
2023-01-28 19:21:47,647 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2023-01-28 19:21:57,766 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2023-01-28 19:22:08,036 INFO akka.remote.transport.ProtocolStateActor [] - No response from remote for outbound association. Associate timed out after [20000 ms].
2023-01-28 19:22:08,057 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]
2023-01-28 19:22:08,069 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2023-01-28 19:22:08,308 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager/100.127.18.9:6123
The flink-configuration-configmap.yaml looks like this
flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 2
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1600m
    taskmanager.memory.process.size: 1728m
    parallelism.default: 2
This is what the pom.xml looks like - Link
You deployed the Kubernetes Operator in the namespace, but you did not create the custom resources the Operator requires; instead, you tried to create a standalone Flink Kubernetes cluster.
The Flink Operator makes it a lot easier to deploy your Flink jobs: you only need to deploy the operator itself and a FlinkDeployment/FlinkSessionJob resource, and the operator will then manage your deployment for you.
Please use this documentation for the Kubernetes Operator: Link
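As a rough sketch (version numbers and URLs are taken from the operator quickstart current at the time and may have moved or changed), the operator-based flow replaces the standalone JobManager/TaskManager manifests with a single FlinkDeployment resource:
# Install cert-manager (used by the operator's webhook) and the operator itself:
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
# Submit a job as a FlinkDeployment custom resource (the basic example ships with the operator repo):
kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.3/examples/basic.yaml
kubectl get flinkdeployment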
We have a 3-node Cassandra cluster running the following version:
[cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
Node1 stopped communicating with the rest of the cluster this morning; the logs showed this:
ERROR [CompactionExecutor:242] 2020-09-15 19:24:48,753 CassandraDaemon.java:235 - Exception in thread Thread[CompactionExecutor:242,1,main]
ERROR [MutationStage-2] 2020-09-15 19:24:54,749 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[MutationStage-2,5,main]
ERROR [MutationStage-2] 2020-09-15 19:24:54,771 StorageService.java:466 - Stopping gossiper
ERROR [MutationStage-2] 2020-09-15 19:24:56,791 StorageService.java:476 - Stopping native transport
ERROR [CompactionExecutor:242] 2020-09-15 19:24:58,541 LogTransaction.java:277 - Transaction log [md_txn_compaction_c2dbca00-f780-11ea-95eb-cf88b1cae05a.log in /mnt/cass-a/data/system/local-7ad54392bcdd35a684174e047860b377] indicates txn was not completed, trying to abort it now
ERROR [CompactionExecutor:242] 2020-09-15 19:24:58,545 LogTransaction.java:280 - Failed to abort transaction log [md_txn_compaction_c2dbca00-f780-11ea-95eb-cf88b1cae05a.log in /mnt/cass-a/data/system/local-7ad54392bcdd35a684174e047860b377]
ERROR [CompactionExecutor:242] 2020-09-15 19:24:58,566 LogTransaction.java:225 - Unable to delete /mnt/cass-a/data/system/local-7ad54392bcdd35a684174e047860b377/md_txn_compaction_c2dbca00-f780-11ea-95eb-cf88b1cae05a.log as it does not exist, see debug log file for stack trace
Cassandra starts up fine on the "broken node", but refuses to rejoin the cluster.
When I do a nodetool status I get this:
Error: The node does not have system_traces yet, probably still bootstrapping
Gossip is not running; I've tried disabling and re-enabling it, with no joy.
I've also tried both a repair and a rebuild, both came back with no errors at all.
Any and all help would be appreciated.
Thanks.
The symptoms you describe indicate to me that the node had some form of hardware failure and the data/ disk is possibly inaccessible.
In instances like this, the disk failure policy in cassandra.yaml has kicked in:
disk_failure_policy: stop
This would explain why gossip is unavailable (on default port 7000) and the node would not be accepting any client connections either (on default CQL port 9042).
If there is an impending hardware failure, there's a good chance the disk/volume is mounted as read-only. There's also the possibility that the disk is full. Check the operating system logs for clues and you will likely need to escalate the issue to your sysadmin team. Cheers!
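A few checks that can confirm this on the affected node (the mount point is taken from the log above; adjust to your layout):
# Is the data volume full or remounted read-only?
df -h /mnt/cass-a
mount | grep cass-a        # look for "ro" in the mount options
touch /mnt/cass-a/.rw-test && rm /mnt/cass-a/.rw-test
# Kernel-level disk errors usually show up here:
dmesg -T | grep -iE 'error|i/o|read-only'
# Cassandra's own view after the failure policy has fired:
nodetool statusgossip
nodetool statusbinary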
I am trying to set up MySQL for my Keycloak instance and couldn't get it working due to the following exception:
Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
("core-service" => "management"),
("management-interface" => "http-interface")
]'
14:34:42,546 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0190: Step handler org.jboss.as.server.DeployerChainAddHandler$FinalRuntimeStepHandler@65d34517 for operation add-deployer-chains at address [] failed handling operation rollback -- java.util.concurrent.TimeoutException: java.util.concurrent.TimeoutException
at org.jboss.as.controller.OperationContextImpl.waitForRemovals(OperationContextImpl.java:522)
at org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1518)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1472)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1445)
at org.jboss.as.controller.AbstractOperationContext$Step.access$400(AbstractOperationContext.java:1319)
at org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:876)
at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:726)
at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:467)
at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1412)
at org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:521)
at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:472)
at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:434)
at org.jboss.as.server.ServerService.boot(ServerService.java:435)
at org.jboss.as.server.ServerService.boot(ServerService.java:394)
at org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:374)
at java.lang.Thread.run(Thread.java:748)
14:34:42,549 ERROR [org.jboss.as.controller.client] (Controller Boot Thread) WFLYCTL0190: Step handler org.jboss.as.server.DeployerChainAddHandler$FinalRuntimeStepHandler@65d34517 for operation add-deployer-chains at address [] failed handling operation rollback -- java.util.concurrent.TimeoutException.
In this thread (https://developer.jboss.org/thread/272010) they had a similar exception and got it solved after changing the timeout from 300 to 600,
but where can I change this setting?
By the way, I changed the transaction timeout as shown below, but I am still facing the issue.
<coordinator-environment default-timeout="7200" />
As most people have suggested, adding this parameter should fix it for you:
-Djboss.as.management.blocking.timeout=3600
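To make it permanent, a sketch assuming a default WildFly layout is to append it to JAVA_OPTS in bin/standalone.conf, or to pass it directly at startup:
# bin/standalone.conf (Linux/macOS):
JAVA_OPTS="$JAVA_OPTS -Djboss.as.management.blocking.timeout=3600"
# or one-off on the command line:
./standalone.sh -Djboss.as.management.blocking.timeout=3600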
I use Ubuntu 20.04 LTS on both my desktop and laptop. I noticed that MySQL running in a Docker container on the laptop was extremely slow, while everything was fine on the desktop; see MySQL bug #46959. If you're having a problem like mine, that solution might work.
I am deploying a WebApplication with several separate war and ear-files to a wildfly 10.1 server.
What I do is this: I clean the deployments, data/content and tmp folder, then copy all necessary war and ear files into the deployments-folder.
Then, I start the Server either in Debug-Mode via Spring Tool Suite 3.8.1 (basically Eclipse Neon) or directly via standalone.sh.
The server starts up normally, deploys all projects and publishes this message:
[org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 10.1.0.Final (WildFly Core 2.2.0.Final) started in 326001ms - Started 5819 of 6193 services (642 services are lazy, passive or on-demand)
The next message that follows, roughly half a second later, is the first Unregistered web context-message, followed by a general deployment stop.
There is no message in the log connected to this: no ERROR, WARNING, SEVERE or FATAL in sight.
After stopping all Deployments, the server still runs, but no context is reachable.
When deploying the applications one by one, the server accepts and deploys them and keeps running, which leads me to believe that there is something wrong with the server itself.
The memory-relevant VM arguments are: -Xms1024m -Xmx4096m.
The server does not run into a timeout, as that has been thoroughly tested and produces error messages.
Okay, long story short: the comment was the solution.
The deployment-timeout parameter in standalone.xml and the jboss.as.management.blocking.timeout command-line parameter work together on the same thing, with the command-line parameter seemingly taking precedence.
So if both values are set and the management blocking timeout is higher than the deployment-timeout, the following happens:
Deployment is started
Deployment timeout triggers
Management timeout still has time left and does not trigger to stop the server
Applications deploy within the management timeout period.
The After-Deployment Scanner checks the deployments directory for failures
Deployment-timeout marked all deployments as failures
All Deployments are being undeployed as they are faulty
So: no error message, deployment as usual, followed by instant undeployment. Should anyone else see this, check those values.
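A sketch of checking and aligning the two values (illustrative numbers; the scanner is usually named "default"):
# Read the scanner's deployment-timeout (seconds):
$JBOSS_HOME/bin/jboss-cli.sh --connect \
  "/subsystem=deployment-scanner/scanner=default:read-attribute(name=deployment-timeout)"
# Raise it so it is not shorter than the blocking timeout the server is started with:
$JBOSS_HOME/bin/jboss-cli.sh --connect \
  "/subsystem=deployment-scanner/scanner=default:write-attribute(name=deployment-timeout, value=900)"
./standalone.sh -Djboss.as.management.blocking.timeout=900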
Tomcat version: 8.0.47
Tomcat is running as a container using: "FROM tomcat:8.0.47" in the Dockerfile.
Tomcat starts up and is able to serve my web application with no problems.
After approximately 7-10 minutes, I get a 502 server error:
Error: Server Error
The server encountered a temporary error and could not complete your
request.
Please try again in 30 seconds.
When I check the container with ps -ax, I still see the Tomcat process running.
When I check the Catalina logs, the last log line is: INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 72786 ms
When I check application-specific logs, there are no errors.
My question: how do I figure out what is wrong and why it is dying silently while the process is still active? The first 7 minutes are proof that the application and Tomcat were deployed correctly, since everything works with no problems during that initial start-up period.
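For reference, the checks described above were along these lines (the container name and context path are placeholders), plus probing Tomcat directly to rule the upstream proxy in or out:
docker exec -it <container> ps -ax | grep java      # Tomcat JVM still running?
docker logs --tail 100 <container>                  # Catalina / stdout output
docker exec -it <container> curl -v http://localhost:8080/<app-context>/   # if curl is available in the image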