We are seeing an intermittent issue in production. CPU usage randomly gets pegged at 50% (on a 2-core machine, so one core is fully busy) and never recovers. The only option is to restart the server.
This is how CPU appears from Dynatrace
This is how the thread dump looks when we analyzed through dynatrace.
From my research, this appears to match a known JDK defect:
Calling 'java.util.zip.Deflater.finish()' prematurely hangs the application.
The application is spinning consuming one cpu
https://bugs.openjdk.java.net/browse/JDK-8060193
It only happens randomly, apparently when multiple filters are involved.
I was able to reproduce this using the test class from the above JIRA ticket on a CentOS VM running JDK 1.8.0_201.
That was surprising because, according to the docs and the ticket, this has been fixed.
On further research, I found a similar defect that had been opened again against the JDK:
https://bugs.openjdk.java.net/browse/JDK-8193682
However, the team is not willing to work on it unless someone can reproduce it.
Since it happens randomly in production, I am not sure how to reproduce it. The test class from https://bugs.openjdk.java.net/browse/JDK-8060193 still fails for me. Is this even a valid test case?
If it is valid, then there will be problems every time we send compressed data.
Our runtime is JDK 1.8.
Compression is done in Tomcat, not at the load balancer.
Any pointers as to why this is happening and how we can solve it?
Update:
One of the libraries we use was throwing an exception:
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xfd)
LastName, First’Name
As we can see, this is not a regular apostrophe. It can get in by copying and pasting from Word, which autocorrects a regular apostrophe to this typographic one.
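To make the difference between the two apostrophes concrete, here is a minimal sketch that prints their UTF-8 bytes. The class and method names are made up for illustration. It does not explain the exact bytes in the error message above (0xfd followed by 0x00). It only shows that the typographic apostrophe (U+2019) is a multi-byte sequence, which is the kind of input that breaks code assuming one byte per character or decoding with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library mentioned in the question.
class ApostropheBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "E2 80 99". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Hex dump of the UTF-8 encoding of the given text. */
    static String utf8Hex(String text) {
        return hex(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Simulates a reader that wrongly decodes UTF-8 bytes as ISO-8859-1. */
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }
}
```

The regular apostrophe encodes to the single byte 27, while "\u2019" encodes to the three bytes E2 80 99, and decoding those bytes with the wrong charset yields three characters instead of one.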
Our reproducer did throw an error, but the CPU did not get stuck. I think that only happens under high volume and traffic.
EDIT 4 Oct 2022
It seems that the problem has been fixed and that the fix has been applied to OpenJDK 11 and 17: https://bugs.openjdk.org/browse/JDK-8193682
Original answer
As I said in a comment before, we are facing this problem when we try to generate Zip files which are being written in the OutputStream of the HttpServletResponse through a ZipOutputStream.
The cores run at 100% because of three loops that, under certain conditions, never terminate: one in ZipOutputStream (closeEntry()) and two in DeflaterOutputStream (write() and finish()).
These infinite loops look like this:
    while (!def.finished()) {
        deflate();
    }
Where def is a java.util.zip.Deflater.
If I understand correctly, this is the problem described in JDK-8193682. There is a workaround class there which overrides the deflate method of ZipOutputStream.
I am going to try to use a class based on that workaround, which accepts a timeout to be checked in the deflate method. I hope not to produce resource leaks with this approach.
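A minimal sketch of what such a timeout-guarded class could look like follows. It is not the workaround class from JDK-8193682 and it is not tested against the actual bug. The class name, the constructor parameter and the "no progress" heuristic are assumptions made for illustration. The idea is that the loops above call the protected deflate() method, so an override can throw an IOException when the Deflater neither consumes input nor produces output for longer than the timeout, which turns a spinning thread into a failed request.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical class: aborts with an IOException instead of spinning forever.
class StallGuardZipOutputStream extends ZipOutputStream {
    private final long timeoutNanos;
    private long stallStartNanos = -1; // -1 means "not currently stalled"

    StallGuardZipOutputStream(OutputStream out, long timeoutMillis) {
        super(out);
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    @Override
    protected void deflate() throws IOException {
        // 'def' is the protected Deflater field inherited from DeflaterOutputStream.
        long readBefore = def.getBytesRead();
        long writtenBefore = def.getBytesWritten();

        super.deflate();

        // Assumption: a healthy call consumes input, produces output, or finishes.
        boolean progressed = def.getBytesRead() != readBefore
                || def.getBytesWritten() != writtenBefore
                || def.finished();
        if (progressed) {
            stallStartNanos = -1;
            return;
        }

        long now = System.nanoTime();
        if (stallStartNanos < 0) {
            stallStartNanos = now; // first call without progress: start the clock
        } else if (now - stallStartNanos > timeoutNanos) {
            throw new IOException("Deflater made no progress within the timeout");
        }
    }
}
```

The caller still has to close the stream (for example with try-with-resources) so that the Deflater's native memory is released after the exception. One limitation of this sketch: two isolated calls without progress that are far apart in time would also trip the timeout, so a production version would probably want a stricter stall definition.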
Related question: Thread locking when flushing jsp file
I want to post an update on this problem, which has bugged us for years. We had an initiative underway to migrate static content to a CDN. After the CDN was implemented and all static resources were served from a different server, the ZipStream problem went away. Since our research suggested that the problem was related to dynamic content rather than static content, I am not sure how this solved it. Maybe someone reading this answer can explain to me why this fixed it.
Related
When a Java VM crashes with an EXCEPTION_ACCESS_VIOLATION and produces an hs_err_pidXXX.log file, what does that indicate? The error itself is basically a null pointer exception. Is it always caused by a bug in the JVM, or are there other causes like malfunctioning hardware or software conflicts?
Edit: there is a native component, this is an SWT application on win32.
Most of the time this is a bug in the VM.
But it can be caused by any native code (e.g. JNI calls).
The hs_err_pidXXX.log file should contain some information about where the problem happened.
You can also check the "Heap" section inside the file. Many VM bugs are caused by garbage collection (especially in older VMs). This section should show you whether the garbage collector was running at the time of the crash. It also shows how full the individual sections of the heap are (the percentage numbers).
The VM is also much more likely to crash in a low memory situation than otherwise.
Answer found!
I had the same error and noticed that others who provided the contents of the pid log file were running 64-bit Windows, just like me. The end of the log file includes the PATH statement. There I could see that C:\Windows\SysWOW64 was incorrectly listed ahead of %SystemRoot%\system32. Once I corrected that, the exception disappeared.
First thing you should do is upgrade your JVM to the latest you can.
Can you repeat the issue? Or does it seem to happen randomly? We recently had a problem where our JVM was crashing all over the place, at random times. Turns out it was a hardware problem. We put the drives in a new server and it completely went away.
Bottom line: the JVM should never crash. As the poster above mentioned, if you're not doing any JNI, then my gut feeling is that you have a hardware problem.
The cause of the problem will be documented in the hs_err* file, if you know what to look for. Take a look, and if it still isn't clear, consider posting the first 5 or 10 lines of the stack trace and other pertinent info (don't post the whole thing, there's tons of info in there that won't help - but you have to figure out which 1% is important :-) )
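As a starting point for finding the important 1%, here is a small sketch that pulls out the line following the "# Problematic frame:" marker in the header of an hs_err file. The class and method names are hypothetical. The first letter of that frame line tells you where the crash happened: C is native code, V is VM code, J is compiled Java code and j is interpreted Java code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for skimming an hs_err_pid*.log file.
class HsErrSummary {
    /**
     * Returns the frame line that follows "# Problematic frame:",
     * with the leading "#" removed, or empty if the marker is missing.
     */
    static Optional<String> problematicFrame(List<String> lines) {
        for (int i = 0; i < lines.size() - 1; i++) {
            if (lines.get(i).startsWith("# Problematic frame:")) {
                String frame = lines.get(i + 1).replaceFirst("^#\\s*", "").trim();
                return Optional.of(frame);
            }
        }
        return Optional.empty();
    }
}
```

The lines can be loaded with java.nio.file.Files.readAllLines. If the frame points into a DLL or shared library rather than the JVM itself, native code (for example SWT or a JNI library) is the first suspect.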
Are you using a Browser widget and executing JavaScript in it? If so, there are bugs in some versions of SWT that cause the JVM to crash in native code, in various Windows libraries.
Two examples (that I opened) are bug 217306 and bug 127960. These two bug reports are not the only bug reports of the JVM crashing in SWT, however.
If you aren't using the Browser widget then these suggestions won't help you. In that case, you can search for a list of SWT bugs causing a JVM crash. If none of those are your issue, then I highly recommend that you open a bug report with SWT.
I have the same problem with a JNLP application that I have been using for a long time and is pretty reliable. The problem started immediately after I upgraded from Windows 7 to Windows 10. According to my investigation, it is most likely a bug in Win 10.
The following is not a solution, but an ugly workaround. In the jre/bin directory, there is javaws.exe. After I right-clicked it, opened Properties > Compatibility, and ticked "Run this program as an administrator", the JNLP app started to work.
Please be aware that this approach could cause security issues. Use it only if you have no other option and know exactly what you are doing.
Today in the company where I work we received a report about one of our webapps not working.
The first thing we did was look at the hardware utilization:
Processor: 5%;
Memory: 68%;
Disk IO capacity: 4%;
Network: 1Mbps/1Gbps;
After some tests we saw that, indeed, the webapp's pages weren't loading, and after some time the requests timed out.
Other webapps in exactly the same Tomcat instance were working fine and as fast as ever, with no problems. We tried to restart/reload the webapp, but that did not get it working. Finally we restarted Tomcat, which corrected the issue for now.
There were no restarts or redeploys since yesterday when it was working fine. We believe this may be a periodic bug of some sort so we want to correct it soon.
Does anyone know of any steps we may take to investigate what it might have been?
It seems related to a lock on I/O.
If you are using Linux, the first thing to do is to check the open files with the lsof command. If you see a lot of files opened by your app, check in the code that every InputStream/OutputStream is closed (even in exception-handling code).
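A minimal sketch of closing streams reliably with try-with-resources, which closes both streams even when an exception is thrown in the middle of the copy. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; the point is the try-with-resources block.
class SafeCopy {
    /** Copies source to target and returns the number of bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        long total = 0;
        // Both streams are closed automatically, in reverse order,
        // whether the block exits normally or with an exception.
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Code written before Java 7, or code that opens a stream in one method and closes it in another, is where leaked file handles usually hide, so those are the places worth reviewing first.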
Another common source of issues is thread starvation, see http://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Stuck_Thread_Detection_Valve
I've been working on a Java project for a year. My code had been working fine for months. A few days ago I upgraded the Java SDK to the newest version, 1.6.0_26, on my Mac (Snow Leopard 10.6.8). After the upgrade, something very weird started happening. When I run some of the classes, I get this error:
Invalid memory access of location 0x202 rip=0x202
But if I run them with -Xint (interpreted mode), they work: slowly, but fine. I get the problem in classes where I use bitwise operators (bitboards for the game Othello). I can't post any code here because I don't get an error, an exception or anything similar. I just get that annoying message.
Is it normal that the code doesn't run without -Xint but it works with it? What should I do?
Thanks in advance
When a JVM starts crashing like that, it is a sign that something has broken the JVM's execution model.
Does your application include any native code? Does it use any 3rd-party libraries with native code components? If neither is true, then the chances are that this is a bug in the Apple port of the JVM. It could be a JIT compiler bug, or a bug in some JVM native code library.
What can you do about a bug like that?
Not a lot.
Reduce your application by progressively chopping out bits until you have a small testcase that exhibits the problem.
Based on the testcase, see if there's some empirical way to avoid the problem.
Submit a bug report to Apple with the testcase.
I just came across this situation and it turned out to be related to a piece of code that was serializing a JSON object with a cyclic reference to itself. I removed the cycle and the error went away. I suspect this is related to a memory overflow error that is now handled differently by newer JVMs on Mac OSX. In this case, I was running Mac OSX 10.7.
For completeness the errors I was receiving were:
Invalid access of stack red zone 0x10e586d30 rip=0x10daabba6
Bus error: 10
And:
Invalid memory access of location 0x10b655890 rip=0x10a8baba6
Segmentation fault: 11
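To check for this kind of problem before serializing, a cycle check along these lines may help. This is only a sketch with a made-up Node type. A real object graph would have to be walked via reflection or via the serializer's own hooks. It tracks the objects on the current path by identity and reports a cycle when one of them is reached again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical example: Node stands in for whatever object is being serialized.
class CycleCheck {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Returns true if following child references from root leads back to an ancestor. */
    static boolean hasCycle(Node root) {
        // Identity-based set: compares with ==, not equals().
        Set<Node> onPath = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return visit(root, onPath);
    }

    private static boolean visit(Node node, Set<Node> onPath) {
        if (!onPath.add(node)) {
            return true; // node is already on the current path: cycle found
        }
        for (Node child : node.children) {
            if (visit(child, onPath)) {
                return true;
            }
        }
        onPath.remove(node); // backtrack so shared (non-cyclic) nodes are allowed
        return false;
    }
}
```

A naive recursive serializer on a graph where hasCycle returns true recurses until the stack is exhausted, which is consistent with the "stack red zone" message above.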
Also verify that you are building the GUI on the event dispatch thread and never updating a GUI component from any other thread.
Related errors are notoriously hard to reproduce, but the change associated with altered timing is suggestive.
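Assuming the GUI is built with Swing (the question does not say which toolkit is used), a minimal sketch of forcing work onto the event dispatch thread could look like this. The helper name is invented for the example.

```java
import java.lang.reflect.InvocationTargetException;
import javax.swing.SwingUtilities;

// Hypothetical helper: runs GUI work on the Swing event dispatch thread (EDT).
class EdtRunner {
    /** Runs the task on the EDT and waits until it has completed. */
    static void runOnEdt(Runnable task)
            throws InterruptedException, InvocationTargetException {
        if (SwingUtilities.isEventDispatchThread()) {
            task.run(); // already on the EDT; invokeAndWait would throw here
        } else {
            SwingUtilities.invokeAndWait(task);
        }
    }
}
```

Typical use is to wrap the whole window construction in main, or a component update coming from a worker thread, in a call such as EdtRunner.runOnEdt(() -> label.setText("done")), where label is whatever component needs updating.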
Please check whether /etc/hosts is empty, and verify that it includes these entries:
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
I'm developing a web application (using JBoss and Icefaces) which uses a servlet to create an Excel or PDF file and send it to the browser.
But I'm experiencing performance problems after the servlet has been called, resulting in constantly high (~50%) CPU consumption by the browser. Tested in Firefox and Internet Explorer on different machines. The high CPU usage starts after a short delay (about half a minute) once the servlet has been called and stays high until I close the browser or reload the page.
Whether I download the created file or open it directly makes no difference.
I'm clueless how this can happen...
Is there a way to analyze/debug the browser to find out, what is causing the cpu consumption?
UPDATE:
I've found out that it is definitely related to the Icefaces webapp. When I replace my direct HTML link to the servlet with a JavaScript call which opens the servlet in a new window (with window.open), I can download the created file without problems.
Also when I logout inside my application, the CPU usage goes back to normal again!
UPDATE:
OK, Firebug helped me get further: after the servlet has been called, I can see in the Firebug console that XMLHttpRequests are being sent every millisecond. Now I can understand the CPU usage!
POST http://localhost/webapp/block/receive-updated-views
keeps being sent over and over. I have to look into this...
UPDATE:
OK, I found an Icefaces thread (with the corresponding Icefaces JIRA bug), but the bug tracker states this should already be fixed... somehow not in my case!
Have you tried FindBugs or other tools for static code analysis?
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis#Java
If the CPU consumption is really coming from the browser, then you can't do anything about it: it is either the PDF viewer or the Excel viewer.
If the server is on the same machine and the CPU load comes from the servlet, then you have to optimize it. Give us the code for that.
Browsers don't run servlets. Browsers display the output which is produced by the servlet. It's the output which is causing the high load. Based on the little information given so far, it's hard to tell what the problem with the output is. Firebug can give a lot of insight into what's going on in the web browser.
Usually, an extraordinarily large HTML table or an inefficient piece of JavaScript code can consume a lot of CPU/memory resources. But with a binary file download like XLS/PDF, this should not happen. The cause is then probably deeper. Do you see a lot of resource usage when you do a file-to-file copy on the hard disk? If so, then it may be a bad hard disk, a bad hard disk driver, or DMA being turned off for the hard disk.
Update: as per your investigation with the help of Firebug, it looks like you're using Icefaces' ice:commandButton or ice:commandLink to download the file. Replace them with the normal JSF h:commandButton or h:commandLink so that no JS code is unnecessarily generated, since that code is what's responsible for the requests.
Problem solved (actually it's more of a workaround).
It's an Icefaces problem, which should be fixed according to the bug tracker.
But as it is actually still present, I could only fix it by calling the servlet in a new window created by JavaScript (as already mentioned in my edited question).
It's really not a nice solution, and it has the drawback that the browser must not block popups.
But it's also a solution I've seen several times on the net (like here).
    public void exportToExcel(ActionEvent e) {
        // Workaround: open the download servlet in a new window via JavaScript.
        JavascriptContext.addJavascriptCall(FacesContext.getCurrentInstance(),
                "window.open(\"downloadServlet.dl?contentType=excel\", \"report\")");
    }