I tried to deploy some applications in Spring Cloud Data Flow.
Routinely, each deployment takes a few minutes and either passes successfully or fails.
But this time the deployment took longer than usual, and at one point I pressed "undeploy" because the system was not responding.
Under Streams, everything flickers with UNKNOWN status.
It is not possible to redeploy.
When I try to deploy again, I get the error Failed to upload the package. Package [test-orders:1.0.0] in Repository [local] already exists. from the UI.
When I check the status of the pods, I see 2 pods in CrashLoopBackOff status.
I restarted all the pods: kubectl -n **** rollout restart deploy
I tried to run dataflow:>stream undeploy --name test-orders
I deleted the new Docker image from EKS.
I changed skipper_status from FAILED to DELETED.
The problem still exists.
I'm really at a loss.
OK, I seem to have solved the problem.
Based on the CrashLoopBackOff status, I realized that the system was either unable to pull the image or the image was corrupt.
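For anyone hitting the same thing, this is roughly how the pods can be inspected (the namespace and pod names below are placeholders, not my real ones):
kubectl -n my-namespace get pods                                # spot the CrashLoopBackOff pods
kubectl -n my-namespace describe pod test-orders-v1-abc123      # image pull errors show up under Events
kubectl -n my-namespace logs test-orders-v1-abc123 --previous   # logs of the last crashed container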
I overwrote all the images in EKS that are associated with the project.
I changed the problematic skipper_status.status_code to DELETED (update skipper_status set status_code = 'DELETED' where id = ***).
In the skipper_release table I added:
backoffLimit: 6
completions: 1
parallelism: 1
This way, a crash of the system after several attempts results in the end of the run rather than endless retries.
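backoffLimit, completions, and parallelism are standard fields of a Kubernetes batch/v1 Job spec; a quick sketch for verifying they actually landed on the Job (namespace and job name are placeholders):
kubectl -n my-namespace get job test-orders -o yaml | grep -E 'backoffLimit|completions|parallelism'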
I restarted all the pods.
Then, in the UI, I pressed the undeploy button.
Edit 1
I noticed that there were leftover pods that did not shut down.
I removed them like this:
kubectl -n foobar delete deployment foo-bar-v1
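In case it helps anyone, a quick sketch for spotting other leftovers before deleting (same foobar namespace as above; your deployment names will vary):
kubectl -n foobar get deployments,pods   # anything the stream left behind shows up here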
Related
I'm running the AEM perspective in Eclipse with my remote server running. I then run a Remote Java Application debug configuration on port 5402. The first time I run the debug config, it says it failed. The second time I run it, it says "Failed to connect to VM. Connection refused."
None of the breakpoints I've set get hit, nor do they show the icon they're meant to have while a debug session is running. None of the buttons in the debug view are lit up indicating that debugging is actually working. However, Eclipse shows as the task holding the debug port when I run netstat -ano | findstr :5402 on the command line.
I have restarted Eclipse and AEM about 10 times today trying to get around this, as no one on my team is experiencing this issue. I'm new to the team, though, and all of them have had their environments running for years, so they don't know what step I could be missing that keeps my environment from working like theirs.
This is preventing me from getting work done and is very frustrating. Does anyone know why Eclipse would behave this way?
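For reference, Eclipse can only attach if the remote JVM was started with the JDWP agent; this is roughly what I understand the server launch has to include (the jar name is a placeholder, port 5402 as in my setup):
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5402 -jar your-server.jar
netstat -ano | findstr :5402   # on Windows, confirms something is listening on the port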
So, I have had a working Tomcat JDBC session storage solution for some time. I deployed what I consider to be an unrelated change (and I rolled back to the previous WAR, but the new issue still exists).
The problem
I can reproduce the problem like this
systemctl restart tomcat
wait for wars to load, manager to respond etc
access a web app page that fetches the session; syserr shows:
java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.catalina.session.JDBCStore.open(JDBCStore.java:955)
(this part of Apache's JDBCStore implementation tries to load the DB driver by class name)
I keep accessing the web page -> same error to syserr
I keep doing this for a minute or so
The error suddenly stops appearing and session storage works as it should (it persists new sessions all the way to the DB correctly and fetches sessions from before the previous restart correctly)
What I have tried
I have checked that context.xml (which has the JDBCStore configuration) has not been changed by updates (and surely it is OK, since the app does work eventually)
I have double-checked that the JDBC driver is in the tomcat/lib folder, has the right permissions, etc. (and surely it does, since the app does work eventually)
I have tried to think of any change on the server that might cause this, but I'm drawing a blank
Killing Tomcat with kill -9 <pid> instead of systemctl restart does not make a difference
Stopping Tomcat, waiting for ~3 minutes and starting up: no difference, still have to wait ~1 minute before session storage works
What I suspect is that somehow JDBCStore (which is "internal Tomcat stuff", not part of my app) cannot load the driver with Class.forName, but some other app/connection pool that I have running on the same Tomcat succeeds, and after that JDBCStore also works. What I don't know is why this has come up now and not, say, a month ago. In any case, any hints would be nice.
Tomcat8
openjdk version "1.8.0_265"
RHEL 7.9
If Tomcat cannot open a connection to the database, JDBCStore throws this NPE; check the commit that fixes it. Update to a Tomcat release that includes this patch and you won't get the NPE.
But also check why you are not able to connect to the database in the first place.
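A quick sketch for ruling out the basics from the Tomcat host (the DB host and port below are placeholders):
nc -z -v db.example.com 3306                              # placeholder host/port: is the database reachable over TCP?
ls -l tomcat/lib | grep -iE 'jdbc|mysql|postgres|ojdbc'   # is the driver jar really on Tomcat's classpath?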
I have a Java Spring app that runs just fine locally with Gradle, but when I run the image version of it, I get a strange error that I am not sure how to resolve or debug.
docker run -it --rm myregistry.azurecr.io/my-app:latest
is the command that produces the following error:
I/O exception (java.net.SocketException) caught when processing request to {s}->https://rt.services.visualstudio.com:443: Invalid argument or cannot assign requested address
I've found out through googling that this URL, https://rt.services.visualstudio.com:443, is associated with Application Insights (Azure logging), which is one of my dependencies. Does this mean something is wrong with Logback?
The thing is, when I run with Gradle, I see trace logs in the App Insights instance I am trying to log to, so I know that the connection works outside of Docker.
One thing to note is that in the deployed dockerized instance I see the same logs on startup as in the successful local Gradle run, but then it dies after it sets the profile, which may mean it is dying during Tomcat initialization.
What should I do or try here?
This ended up being due to a web proxy issue; switching networks or creating a proxy rule are both solutions.
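For anyone else hitting this, a hedged sketch of the proxy-rule variant (the proxy host and port are placeholders; https.proxyHost and https.proxyPort are standard JVM properties, and JAVA_TOOL_OPTIONS is picked up by the JVM at startup):
docker run -it --rm \
  -e JAVA_TOOL_OPTIONS="-Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080" \
  myregistry.azurecr.io/my-app:latest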
I cannot figure out why EARs are being undeployed automatically in jboss-as-7.1.1.Final.
I can see these logs:
ERROR org.apache.tomcat.util.net.JIoEndpoint$Acceptor [run] Socket accept failed: java.net.SocketException: Too many open files
WARN com.kpn.tie.ejbs.dao.webservice.tt.WebServiceProcessor [invoke] WebService unavailable. The request could not be completed due to technical problems. ; nested exception is: java.net.SocketException: Too many open files
Can somebody tell me the root cause of this behavior and also suggest a solution?
As a workaround, would restarting JBoss at a particular time interval resolve this issue?
The reason could be that the application is overloaded or the file descriptor limit is too low. Because of this, the JVM cannot open any new file handles, so you are getting "Socket accept failed" for incoming requests.
After a while the deployment scanner comes into play (5 seconds is the default) and tries to check the deployments folder, which is not possible because it cannot open any file handles. So it gets confused and stops the deployed apps.
First solution could be:
Deactivate the scanner so that it only checks once during boot, or remove the deployment-scanner subsystem and use only the CLI to deploy.
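A sketch with the AS7 CLI (attribute names are from the deployment-scanner subsystem; double-check them against your exact version):
$JBOSS_HOME/bin/jboss-cli.sh --connect
/subsystem=deployment-scanner/scanner=default:write-attribute(name=scan-interval,value=0)   # 0 = scan only once at startup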
Second solution could be:
Increase the file descriptor limit (the open-files limit).
java.net.SocketException: Too many open files
On Linux you can increase the number of concurrently open files with
ulimit -n 2048
This would allow 2048 files to be open at the same time in the current session. The command should be inserted either in the shell session configuration (e.g. .bashrc or similar, depending on the shell you use) or in the JBoss start script.
To show the current limit you can use
ulimit -n
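To see how many descriptors the JBoss JVM is actually holding, something like this should work (the pgrep pattern is a sketch; AS7 normally runs via jboss-modules.jar):
JBOSS_PID=$(pgrep -f jboss-modules.jar | head -n 1)
ls /proc/$JBOSS_PID/fd | wc -l   # compare this against the ulimit -n value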
I have setup hue to use it to create oozie work flow on HDP2.0
I start Hue using /etc/init.d/hue start, log in to the Hue web server, and work for a few minutes, and then the server suddenly dies.
I have everything configured correctly, but I don't know why the Hue service keeps crashing. Is Hue an unstable product?
Where can I find the logs for the service while it's running?
sudo /etc/init.d/hue status
supervisor is stopped
Hue is used by hundreds of organizations and is available in the main Hadoop distributions.
The crash could be caused by an underlying Hadoop service. Could you check the Hue logs in order to find out more? (The default location is normally /var/log/hue.)
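A quick sketch for checking them (the exact file names vary a bit between Hue versions):
ls /var/log/hue
tail -n 100 /var/log/hue/supervisor.log /var/log/hue/runcpserver.log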
Which Hue version are you using? The latest one as of today is 3.5. You should also post your logs/questions on the Hue user list, which is pretty active.
The first thing to check is the permissions of the keytab hue.service.keytab inside /etc/security/keytabs.
They should be as follows:
-r--r----- 1 root hadoop 340 Jun 17 03:29 hue.service.keytab
Note: the above keytab should not have write permission.
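A sketch for bringing the keytab in line with that listing (440 corresponds to -r--r-----; owner and group as shown above):
chown root:hadoop /etc/security/keytabs/hue.service.keytab
chmod 440 /etc/security/keytabs/hue.service.keytab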
Also, the kerberos section in the hue.ini file, located at /etc/hue/conf, should be properly configured according to your needs.