Docker container disappeared and job too slow

I have a multi-threaded data-processing job that completes in about 5 hours on an EC2 instance. When the same code runs in a Docker container (which I configured with 7 GB of RAM before creating it), the job runs slowly for 12+ hours and then the container disappears. How can we fix this? Why would the job be so slow in the container? CPU processing was very slow, not just network I/O. Slow network I/O is acceptable, but I am wondering what could make CPU processing so much slower than on the EC2 instance. Also, where can I find a detailed trace of what happened on the host operating system to cause the container to die?
**docker logs <container_id>**
19-Feb-2019 22:49:42.098 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
19-Feb-2019 22:49:42.105 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
19-Feb-2019 22:49:42.106 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 27468 ms
19-Feb-2019 22:55:12.122 INFO [localhost-startStop-2] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/logging]
19-Feb-2019 22:55:12.154 INFO [localhost-startStop-2] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/logging] has finished in [32] ms
searchResourcePath=[null], isSearchResourceAvailable=[false]
knowledgeCommonResourcePath=[null], isKnowledgeCommonResourceAvailable=[false]
Load language resource fail...
blah blah blah some application log
bash: line 1: 10 Killed /usr/local/tomcat/bin/catalina.sh run
Error in Tomcat run: 137 ... failed!
On running dmesg -T | grep docker, this is what I see. What do the -500 values next to dockerd and docker-proxy mean? How should I interpret the output below?
[Tue Feb 19 14:49:04 2019] docker0: port 1(vethc30f313) entered blocking state
[Tue Feb 19 14:49:04 2019] docker0: port 1(vethc30f313) entered forwarding state
[Tue Feb 19 14:49:04 2019] docker0: port 1(vethc30f313) entered disabled state
[Tue Feb 19 14:49:07 2019] docker0: port 1(vethc30f313) entered blocking state
[Tue Feb 19 14:49:07 2019] docker0: port 1(vethc30f313) entered forwarding state
**[Wed Feb 20 04:09:23 2019] [10510] 0 10510 197835 12301 111 0 -500 dockerd
[Wed Feb 20 04:09:23 2019] [11241] 0 11241 84733 5434 53 0 0 docker
[Wed Feb 20 04:09:23 2019] [11297] 0 11297 29279 292 18 0 -500 docker-proxy**
[Wed Feb 20 04:09:30 2019] docker0: port 1(vethc30f313) entered disabled state
[Wed Feb 20 04:09:30 2019] docker0: port 1(vethc30f313) entered disabled state
[Wed Feb 20 04:09:30 2019] docker0: port 1(vethc30f313) entered disabled state
At 04:09:23 the output above shows -500 for dockerd and docker-proxy, and at 04:09:24 the output below shows Java process 11369 being killed with a score. What does this mean? Did it kill the Java process running inside the container rather than the docker process?
dmesg -T | grep java
[Wed Feb 20 04:09:23 2019] [ 3281] 503 3281 654479 38824 145 0 0 java
[Wed Feb 20 04:09:23 2019] [11369] 0 11369 3253416 1757772 4385 0 0 java
[Wed Feb 20 04:09:24 2019] Out of memory: Kill process 11369 (java) score 914 or sacrifice child
[Wed Feb 20 04:09:24 2019] Killed process 11369 (java) total-vm:13013664kB, anon-rss:7031088kB, file-rss:0kB, shmem-rss:0kB

TL;DR: you need to increase the memory on your VM/host, or reduce the memory usage of your application.
The OS killed the Java process running inside the container because the host ran out of memory. When the main process inside a container dies, the container itself goes into an exited state. You can see these non-running containers with docker ps -a.
By default, Docker does not limit the CPU or memory of a container. You can add these limits to a container, and if the container exceeds its memory limit, it will be killed. That result is visible as an OOM-killed status when you inspect the stopped container.
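To see what the JVM inside the container believes it has available, a minimal sketch like the following can be run in the same image. The class name JvmResources is hypothetical. Depending on the JVM version, these values may be derived from the host rather than from any container limit, so treat the output as a diagnostic hint only.

```java
/** Sketch: print the heap ceiling and CPU count the running JVM has detected. */
final class JvmResources {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Number of processors the JVM thinks it can schedule threads on. */
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Available processors: " + cpus());
    }
}
```

If the reported heap ceiling is close to or above the memory actually available to the container, an explicit -Xmx setting is one way to keep the process within bounds.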
The -500 you see on the docker processes is their OOM score adjustment (oom_score_adj). It is set to make the OS less likely to kill Docker itself when the host runs out of memory. Instead, the process inside the container gets killed, and you can configure a restart policy in Docker to restart that container.
You can read more about memory limits, and configuring the OOM score for container processes at: https://docs.docker.com/engine/reference/run/
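As a rough aid for reading the dmesg rows in the question, here is a sketch that decodes one row of the OOM killer's process table. It assumes the column order [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name (the layout varies between kernel versions) and 4 KiB pages. The class OomTableRow is hypothetical.

```java
/** Sketch: decode one row of the kernel OOM-killer process table from dmesg. */
final class OomTableRow {
    // Assumption: 4 KiB pages; total_vm and rss are reported in pages.
    static final long PAGE_SIZE_KB = 4;

    final int pid;
    final long totalVmPages;
    final long rssPages;
    final int oomScoreAdj;
    final String name;

    private OomTableRow(int pid, long totalVmPages, long rssPages, int oomScoreAdj, String name) {
        this.pid = pid;
        this.totalVmPages = totalVmPages;
        this.rssPages = rssPages;
        this.oomScoreAdj = oomScoreAdj;
        this.name = name;
    }

    /**
     * Parses the part of the line after the timestamp, for example
     * "[11369] 0 11369 3253416 1757772 4385 0 0 java".
     */
    static OomTableRow parse(String row) {
        String[] f = row.replace("[", " ").replace("]", " ").trim().split("\\s+");
        if (f.length != 9) {
            throw new IllegalArgumentException("Unexpected column count: " + f.length);
        }
        // f: pid, uid, tgid, total_vm, rss, nr_ptes, swapents, oom_score_adj, name
        return new OomTableRow(Integer.parseInt(f[0]), Long.parseLong(f[3]),
                Long.parseLong(f[4]), Integer.parseInt(f[7]), f[8]);
    }

    long totalVmKb() { return totalVmPages * PAGE_SIZE_KB; }

    long rssKb() { return rssPages * PAGE_SIZE_KB; }

    public static void main(String[] args) {
        OomTableRow killed = parse("[11369] 0 11369 3253416 1757772 4385 0 0 java");
        // Prints: java rss=7031088 kB total-vm=13013664 kB adj=0
        System.out.println(killed.name + " rss=" + killed.rssKb() + " kB total-vm="
                + killed.totalVmKb() + " kB adj=" + killed.oomScoreAdj);
    }
}
```

With these assumptions, the rss column for process 11369 (1757772 pages) works out to 7031088 kB, which matches the anon-rss value in the "Killed process" line. The -500 on dockerd and docker-proxy falls in the oom_score_adj column.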

Related

Kubernetes pod (Java) restarts with 137 TERMINATED

We are running Kubernetes (1.18) with Docker 1.19 & systemd on an on-prem deployment with 3 masters and 3 workers. OS is RedHat 7.8.
Container is a Java 13 based spring boot app (using base image as openjdk:13-alpine) and below are the memory settings.
Pod:
memory - min 448M and max 2500M
cpu - min 0.1
Container:
Xms: 256M, Xmx: 512M
When traffic is sent for a longer time, the container suddenly restarts, and in Prometheus I can see that the Pod memory is below the max level (only around 1300MB).
In the pod events I can see warnings for the liveness and readiness probes, and the pod getting restarted.
State: Running
Started: Sun, 23 Aug 2020 15:39:13 +0530
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Sun, 23 Aug 2020 15:23:03 +0530
Finished: Sun, 23 Aug 2020 15:39:12 +0530
Ready: True
Restart Count: 14
Which logs can I refer to in order to figure out why a restart was triggered? The application log is not helping at all: after the last log line of the running app, the next line is simply the first line of the startup log.
What are the recommended approaches to troubleshoot this?
Thanks

137 means 128 + 9 (so it was killed with kill -9)
https://tldp.org/LDP/abs/html/exitcodes.html
Have a look at the pod and application logs.
Maybe the container needs more resources to start?
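The arithmetic above can be sketched as a small helper. The class name ExitCodes is hypothetical. It follows the shell convention that an exit status above 128 means the process was terminated by signal (status - 128).

```java
/** Sketch: map a shell-style exit status to the signal that terminated the process. */
final class ExitCodes {

    /** Returns the signal number for statuses 129..255, or -1 if no signal is implied. */
    static int signalOf(int exitCode) {
        return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        // 137 - 128 = 9, which is SIGKILL
        System.out.println("Exit code 137 -> signal " + signalOf(137));
    }
}
```

A status of 137 only says that something sent SIGKILL. It does not say who sent it, so the pod events and node logs are still needed to tell an out-of-memory kill apart from other causes.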

Initial EJB RMI works but with exception

I'm working through an EJB tutorial where my client program invokes a method via remote stateless EJB to add a book. Upon exit the client retrieves and prints all the books from the EJB (I know it's not a good idea to store data in a list within a stateless EJB). All of this works fine, except the initial RMI also returns the following exception (I've included the full output from the client test as well).
Client output:
Nov 29, 2016 11:34:29 PM org.jboss.ejb.client.EJBClient <clinit>
INFO: JBoss EJB Client version 2.1.4.Final
**********************
Welcome to Book Store
**********************
Options
1. Add Book
2. Exit
Enter Choice: 1
Enter book name: Some book
Nov 29, 2016 11:34:44 PM org.xnio.Xnio <clinit>
INFO: XNIO version 3.4.0.Final
Nov 29, 2016 11:34:44 PM org.xnio.nio.NioXnio <clinit>
INFO: XNIO NIO Implementation Version 3.4.0.Final
Nov 29, 2016 11:34:44 PM org.jboss.remoting3.EndpointImpl <clinit>
INFO: JBoss Remoting version 4.0.21.Final
Nov 29, 2016 11:34:45 PM org.jboss.ejb.client.remoting.VersionReceiver handleMessage
INFO: EJBCLIENT000017: Received server version 2 and marshalling strategies [river]
Nov 29, 2016 11:34:45 PM org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver associate
INFO: EJBCLIENT000013: Successful version handshake completed for receiver context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext#4f7d0008, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection#271053e1,channel=jboss.ejb,nodename=slave01:server01]} on channel Channel ID 87a6ebda (outbound) of Remoting connection 64bfbc86 to /127.0.0.1:8133 of endpoint "client-endpoint" <64bf3bbf>
Nov 29, 2016 11:34:45 PM org.jboss.ejb.client.remoting.RemotingConnectionClusterNodeManager getEJBReceiver
INFO: Could not create a connection for cluster node ClusterNode{clusterName='ejb', nodeName='slave01:server01', clientMappings=[ClientMapping{sourceNetworkAddress=/0:0:0:0:0:0:0:0, sourceNetworkMaskBits=0, destinationAddress='0.0.0.0', destinationPort=8080}], resolvedDestination=[Destination address=0.0.0.0, destination port=8080]} in cluster ejb
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.xnio.nio.WorkerThread$ConnectHandle.handleReady(WorkerThread.java:321)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:567)
at ...asynchronous invocation...(Unknown Source)
at org.jboss.remoting3.EndpointImpl.doConnect(EndpointImpl.java:294)
at org.jboss.remoting3.EndpointImpl.connect(EndpointImpl.java:430)
at org.jboss.ejb.client.remoting.NetworkUtil.connect(NetworkUtil.java:153)
at org.jboss.ejb.client.remoting.NetworkUtil.connect(NetworkUtil.java:133)
at org.jboss.ejb.client.remoting.ConnectionPool.getConnection(ConnectionPool.java:78)
at org.jboss.ejb.client.remoting.RemotingConnectionManager.getConnection(RemotingConnectionManager.java:51)
at org.jboss.ejb.client.remoting.RemotingConnectionClusterNodeManager.getEJBReceiver(RemotingConnectionClusterNodeManager.java:79)
at org.jboss.ejb.client.ClusterContext$EJBReceiverAssociationTask.call(ClusterContext.java:469)
at org.jboss.ejb.client.ClusterContext$EJBReceiverAssociationTask.call(ClusterContext.java:443)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
**********************
Welcome to Book Store
**********************
Options
1. Add Book
2. Exit
Enter Choice: 2
Book(s) entered so far: 2
1. test1
2. Some book
***Using second lookup to get library stateless object***
Book(s) entered so far: 2
1. test1
2. Some book
So everything with the client, other than the exception, appears to work correctly. I suspect this issue has something to do with the zero'd out node addresses, but I'm not certain. The client properties file is below (in case that configuration is incorrect).
jboss-ejb-clients.properties:
endpoint.name=client-endpoint
remote.connectionprovider.create.options.org.xnio.Options.SSL_ENABLED=false
invocation.timeout=3000
reconnect.tasks.timeout=2000
# User Credentials
username=user
password=pass
# Remote Connections
remote.connections=h1,h2
remote.connection.h1.host=127.0.0.1
remote.connection.h1.port=8133
remote.connection.h1.username=user
remote.connection.h1.password=pass
remote.connection.h2.host=127.0.0.1
remote.connection.h2.port=8134
remote.connection.h2.username=user
remote.connection.h2.password=pass
# Cluster
remote.clusters=ejb
remote.cluster.ejb.connect.timeout=2500
remote.cluster.ejb.max-allowed-connected-nodes=2
remote.cluster.ejb.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.cluster.ejb.connect.options.org.xnio.Options.SSL_ENABLED=false
remote.cluster.ejb.username=user
remote.cluster.ejb.password=pass

After extensive research (and a good amount of trial and error with test code), I found a book on Safari (Java EE 7 Development with WildFly) that led me in the right direction. I had to drop the jboss-ejb-clients.properties file and add the ejb-client configuration found in the answer here to my main client class.

Connecting Java client to Hazelcast-Kubernetes fails

I'm running a kubernetes cluster in which I am deploying a "cloud native hazelcast" following the instructions on the kubernetes-hazelcast github page. Once I have a number of hazelcast instances running, I try to connect a java client to one of the instances but for some reason the connection fails.
Some background
Using a kubernetes external endpoint I can connect to hazelcast from outside the kubernetes cluster. When I do a REST call with curl kubernetes-master:32469/hazelcast/rest/cluster, I get a correct response from hazelcast with its cluster information. So I know my endpoint works.
The hazelcast-kubernetes deployment uses the hazelcast-kubernetes-bootstrapper which allows some configuration by setting environment variables with the replication controller, but I'm using all defaults. So my group and password are "someGroup" and "someSecret".
The java client
My Java client code is really straightforward:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptLimit(0);
clientConfig.getNetworkConfig().setConnectionTimeout(10000);
clientConfig.getNetworkConfig().setConnectionAttemptPeriod(2000);
clientConfig.getNetworkConfig().addAddress("kubernetes-master:32469");
clientConfig.getGroupConfig().setName("someGroup");
clientConfig.getGroupConfig().setPassword("someSecret");
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
When I start my client, this is the log output of the hazelcast container
2016-07-05 12:54:38.143 INFO 5 --- [thread-Acceptor] com.hazelcast.nio.tcp.SocketAcceptor : [172.16.15.4]:5701 [someGroup] [3.5.2] Accepting socket connection from /172.16.29.0:54333
2016-07-05 12:54:38.143 INFO 5 --- [ cached4] c.h.nio.tcp.TcpIpConnectionManager : [172.16.15.4]:5701 [someGroup] [3.5.2] Established socket connection between /172.16.15.4:5701
2016-07-05 12:54:38.157 INFO 5 --- [.IO.thread-in-1] c.h.nio.tcp.SocketClientMessageReader : [172.16.15.4]:5701 [someGroup] [3.5.2] Unknown client type: <
And the console output of the client
jul 05, 2016 2:54:37 PM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_someGroup][3.6.2] is STARTING
jul 05, 2016 2:54:38 PM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_someGroup][3.6.2] is STARTED
jul 05, 2016 2:54:48 PM com.hazelcast.client.spi.impl.ClusterListenerSupport
WARNING: Unable to get alive cluster connection, try in 0 ms later, attempt 1 of 2147483647.
jul 05, 2016 2:54:58 PM com.hazelcast.client.spi.impl.ClusterListenerSupport
WARNING: Unable to get alive cluster connection, try in 0 ms later, attempt 2 of 2147483647.
jul 05, 2016 2:55:08 PM com.hazelcast.client.spi.impl.ClusterListenerSupport
etc...
The client just keeps trying to connect but no connection is ever established.
What am I missing?
So why won't my client connect to the hazelcast instance? Is it some configuration part I'm missing?

Not sure about the official kubernetes support, however Hazelcast has a kubernetes discovery plugin (based on the new discovery spi) that works on both, client and nodes: https://github.com/noctarius/hazelcast-kubernetes-discovery

Looking at the console logs, the Hazelcast versions differ between the node (3.5.2) and the client (3.6.2). Can you either update both to 3.6.4, the latest, or change the cluster to 3.6.2 to match the client? 3.6.x has many configuration changes and many bug fixes as well.
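A mismatch like this is easy to flag before connecting. The sketch below is a hypothetical helper, not part of the Hazelcast API. It only compares the major.minor part of two version strings, here taken from the bracketed values in the logs. Whether a given pair of versions is actually compatible is something the Hazelcast release notes have to answer.

```java
/** Sketch: flag a major.minor mismatch between two dotted version strings. */
final class VersionCheck {

    /** Returns the "major.minor" prefix of a version such as "3.6.2". */
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True when both versions share the same major.minor line. */
    static boolean sameMinorLine(String a, String b) {
        return majorMinor(a).equals(majorMinor(b));
    }

    public static void main(String[] args) {
        String member = "3.5.2"; // from the container log
        String client = "3.6.2"; // from the client console output
        System.out.println("Same minor line: " + sameMinorLine(member, client));
    }
}
```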

Disable Tomcat JAAS?

I have active/passive nodes running Tomcat with heartbeat. When I shut down the active node, Tomcat on the passive node starts. This is a piece of the startup trace:
INFO: Initializing Coyote HTTP/1.1 on http-8080
May 22, 2014 7:37:43 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 366 ms
May 22, 2014 7:37:43 PM org.apache.catalina.realm.JAASRealm setContainer
INFO: Set JAAS app name VentusProxy
May 22, 2014 7:37:59 PM org.apache.catalina.mbeans.JmxRemoteLifecycleListener createServer
It takes 15 seconds to initialize JAAS, that means 15 seconds more without service.
I don't use JAAS at all in my application, so I'd like to disable it or, at least, try to reduce these 15 seconds.
Can anybody tell me if this is possible and how?

Tomcat port 80 - corrupted websocket messages

I am currently trying to write a small web application that makes use of websockets. The application broadcasts to all connected clients. This works quite well as long as the Tomcat container is running on a port other than port 80.
For this scenario think of the web application broadcasting messages all the time.
The working behaviour is the following (i.e. running on port different from 80):
Client (Browser) connects to the server successfully
Client immediately receives messages (i.e. websocket callback function is invoked)
As soon as I configure it to run on port 80 the following behaviour could be observed:
Client (Browser) connects to the server successfully
From the start no invocation of callback can be observed via console.log(...)
After some time the debug output from the onmessage callback is shown all at once having all the same timestamp, though the timespan between 5 broadcasts is definitely more than a second
Console Log:
Event data: {"aktuell":50,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":55,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":60,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":65,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":70,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":75,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":80,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":85,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":90,"total":788,"msg":"Indexiere Artikel"} at Mon Mar 03 2014 14:24:22 GMT+0100
Event data: {"aktuell":95,"total":784{"aktuell":50,"total":788, at Mon Mar 03 2014 14:24:22 GMT+0100
This behaviour is shown using Tomcat 7.0.42, 7.0.52 and Tomcat 8.0.3. For the client side IE 10, Firefox 21 and Chrome 33 have been used.
It seems to me that the websockets' content is somehow buffered to a size of about 510 bytes (observed when stripping debug messages down to the message content only). Even if I change the JSON message structure it will be a total of 510 bytes.
Is there anything that I missed that is different when working with port 80?
Just as an additional information, on the server side I use
session.getBasicRemote().sendText(message)
to send the message and on the client side I use
ws = new WebSocket(url); // Open connection
ws.onmessage = function(event) {
    console.log(event.data); // Stripped down version
};
to handle any incoming event.
