I've been working on a Java project for a year. My code had been working fine for months. A few days ago I upgraded the Java SDK to the newest version, 1.6.0_26, on my Mac (Snow Leopard 10.6.8). After the upgrade, something very weird happens. When I run some of the classes, I get this error:
Invalid memory access of location 0x202 rip=0x202
But if I run them with -Xint (interpreted mode) they work, slowly but correctly. The problem shows up in classes where I use bitwise operators (bitboards for the game Othello). I can't point to any specific code here because I don't get an exception or anything similar. I just get that annoying message.
Is it normal that the code doesn't run without -Xint but it works with it? What should I do?
Thanks in advance
When a JVM starts crashing like that, it is a sign that something has broken the JVM's execution model.
Does your application include any native code? Does it use any 3rd-party libraries with native code components? If neither is true, then the chances are that this is a bug in the Apple port of the JVM. It could be a JIT compiler bug, or a bug in some JVM native code library.
What can you do about a bug like that?
Not a lot.
Reduce your application by progressively chopping out bits until you have a small testcase that exhibits the problem.
Based on the testcase, see if there's some empirical way to avoid the problem.
Submit a bug report to Apple with the testcase.
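As a rough illustration of what such a reduced testcase might look like for the bitboard case in the question, here is a minimal sketch. The class name, the `shiftEast` operation and the board layout (bit 0 = a1, bit 7 = h1) are all assumptions made for the example; they are not taken from the asker's code. The idea is only to run one bitwise operation in a hot loop so the JIT compiler gets a chance to compile it, and to print a checksum that can be compared between a normal run and a run with -Xint.

```java
// Hypothetical reduced testcase: one bitboard operation in a hot loop.
final class BitboardRepro {
    // Clears the A-file; assumes bit 0 = a1 and bit 7 = h1 (an assumption for this sketch).
    static final long NOT_A_FILE = 0xFEFEFEFEFEFEFEFEL;

    // Shifts every disc one square east; discs on the H-file fall off the board.
    static long shiftEast(long board) {
        return (board << 1) & NOT_A_FILE;
    }

    // Deterministic checksum over many calls, so two runs can be compared.
    static long checksum(int iterations) {
        long board = 0x0000001818000000L; // the four centre squares
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            board = Long.rotateLeft(board, 9) ^ i; // vary the input each round
            sum = 31 * sum + shiftEast(board);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Enough iterations that the loop is likely to be JIT-compiled.
        System.out.println(Long.toHexString(checksum(5_000_000)));
    }
}
```

If `java BitboardRepro` crashes or prints a different value than `java -Xint BitboardRepro`, that small class is the kind of thing worth attaching to a bug report.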
I just came across this situation and it turned out to be related to a piece of code that was serializing a JSON object with a cyclic reference to itself. I removed the cycle and the error went away. I suspect this is related to a memory overflow error that is now handled differently by newer JVMs on Mac OSX. In this case, I was running Mac OSX 10.7.
For completeness the errors I was receiving were:
Invalid access of stack red zone 0x10e586d30 rip=0x10daabba6
Bus error: 10
And:
Invalid memory access of location 0x10b655890 rip=0x10a8baba6
Segmentation fault: 11
Also verify that you are building the GUI on the event dispatch thread and never updating a GUI component from any other thread.
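A minimal sketch of that rule, assuming a Swing application (the helper name `runOnEdt` is made up for this example): every GUI update is funnelled through one method that hands the work to the event dispatch thread unless the caller is already on it.

```java
import javax.swing.SwingUtilities;

// Hypothetical helper: routes GUI updates to the event dispatch thread (EDT).
final class EdtGuard {
    static void runOnEdt(Runnable update) {
        if (SwingUtilities.isEventDispatchThread()) {
            update.run();                      // already on the EDT, run directly
        } else {
            SwingUtilities.invokeLater(update); // queue it for the EDT
        }
    }
}
```

A worker thread would then call something like `EdtGuard.runOnEdt(() -> label.setText("done"))` instead of touching the component directly.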
Threading errors of this kind are notoriously hard to reproduce, but the fact that the behavior changes when the timing changes (as it does with -Xint) is suggestive.
Please check whether /etc/hosts is empty, and verify that it includes these entries:
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
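One quick way to see whether the JVM itself can resolve localhost is a small check like the following. This is only a sketch; the class and method names are made up for the example, and it uses nothing beyond `java.net.InetAddress`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical check: does "localhost" resolve to a loopback address from inside the JVM?
final class LocalhostCheck {
    static boolean localhostIsLoopback() {
        try {
            return InetAddress.getByName("localhost").isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false; // name could not be resolved at all
        }
    }

    public static void main(String[] args) {
        System.out.println("localhost resolves to loopback: " + localhostIsLoopback());
    }
}
```

If this prints false, the hosts file (or the resolver configuration) is the first thing to look at.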
We are seeing this intermittent issue in production. The CPU randomly gets pegged at 50% (on a 2-core CPU) and never comes back down. The only option is to restart the server.
This is how CPU appears from Dynatrace
This is how the thread dump looks when we analyzed through dynatrace.
From my research, it appears there was a JDK defect:
Calling 'java.util.zip.Deflater.finish()' prematurely hangs the application.
The application is spinning consuming one cpu
https://bugs.openjdk.java.net/browse/JDK-8060193
It only happens randomly, when multiple filters are involved.
I was able to reproduce this using the test class from the above JIRA ticket on a CentOS VM running JDK "1.8.0_201".
That was surprising because, according to the docs and the ticket, this has been fixed.
On further research, I found a similar defect opened again in the JDK.
https://bugs.openjdk.java.net/browse/JDK-8193682
Now the team is not willing to work on it unless someone can reproduce it.
Since it is happening randomly in production, I am not sure how to reproduce it. The test class from https://bugs.openjdk.java.net/browse/JDK-8060193 still has issues. Is this even a valid test case?
If it is valid, then there will be problems every time we send compressed data.
Our runtime JRE is JDK 1.8.
Compression is done at Tomcat, not at the load balancer.
Any pointers as to why this is happening and how we can solve it?
Update:
One of the libraries we are using was throwing an exception:
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xfd)
LastName, First’Name
As we can see, this is not a regular apostrophe. It can get in by copy-pasting from Word, which autocorrects a regular apostrophe to this typographic one.
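To check input like this before it reaches the library, a strict UTF-8 validation along the following lines can help. This is a sketch with made-up names; it does not reproduce whatever the library does internally. It assumes the bad data arrives as bytes in some other encoding, for example windows-1252, where the typographic apostrophe is the single byte 0x92.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical validator: rejects byte sequences that are not well-formed UTF-8.
final class Utf8Check {
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

The apostrophe itself (U+2019) is fine when it is properly encoded as UTF-8; the check fails only when the bytes were produced with a different encoding.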
Our reproducer did throw an error, but the CPU was not getting stuck. I think it happens only under high volume and traffic.
EDIT 4 Oct 2022
It seems that the problem has been fixed and applied to OpenJDK 11 and 17: https://bugs.openjdk.org/browse/JDK-8193682
Original answer
As I said in a comment before, we are facing this problem when we try to generate Zip files which are written to the OutputStream of the HttpServletResponse through a ZipOutputStream.
The cores run at 100% because of three loops that become infinite under certain conditions: one in ZipOutputStream (closeEntry()) and two in DeflaterOutputStream (write() and finish()).
These infinite loops look like this:
while (!def.finished()) {
deflate();
}
Where def is a java.util.zip.Deflater.
If I understand correctly, this is the problem in JDK-8193682. There is a workaround class there which overrides the deflate method of ZipOutputStream.
I am going to try to use a class based on that workaround, which accepts a timeout to be checked in the deflate method. I hope not to produce resource leaks with this approach.
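A minimal sketch of that idea follows. It is not the workaround class from the JDK ticket; the class name and the constructor parameter are made up for this example. It relies only on the fact that `deflate()` is a protected method of `DeflaterOutputStream`, which `ZipOutputStream` extends, so every pass through the loops above goes through the override.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: a ZipOutputStream that gives up once a time budget is exceeded.
final class TimeLimitedZipOutputStream extends ZipOutputStream {
    private final long deadlineNanos;

    TimeLimitedZipOutputStream(OutputStream out, long timeoutMillis) {
        super(out);
        // The budget covers the whole lifetime of the stream, starting now.
        this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
    }

    @Override
    protected void deflate() throws IOException {
        // Checked on every loop iteration, so a spinning loop ends with an exception.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IOException("deflate exceeded the time budget");
        }
        super.deflate();
    }
}
```

Whether the underlying Deflater is released after such an exception depends on how the stream is closed afterwards, so the concern about resource leaks above still needs to be checked for the JDK version in use.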
Related question: Thread locking when flushing jsp file
I want to post an update on this problem, which has bugged us for years. We had an initiative underway to migrate static content to a CDN. After the CDN was implemented and all static resources were served from a different server, the ZipStream problem went away. Since the research showed that the problem was more about dynamic content than static content, I am not sure how the problem got solved. Maybe someone reading this answer can explain to me how this got fixed.
When a Java VM crashes with an EXCEPTION_ACCESS_VIOLATION and produces an hs_err_pidXXX.log file, what does that indicate? The error itself is basically a null pointer exception. Is it always caused by a bug in the JVM, or are there other causes like malfunctioning hardware or software conflicts?
Edit: there is a native component, this is an SWT application on win32.
Most of the time this is a bug in the VM.
But it can be caused by any native code (e.g. JNI calls).
The hs_err_pidXXX.log file should contain some information about where the problem happened.
You can also check the "Heap" section inside the file. Many VM bugs are caused by garbage collection (especially in older VMs). This section should show you whether the garbage collector was running at the time of the crash. It also shows whether some sections of the heap are full (the percentage numbers).
The VM is also much more likely to crash in a low memory situation than otherwise.
Answer found!
I had the same error and noticed that others who provided the contents of the pid log file were running 64-bit Windows, just like me. The end of the log file included the PATH variable. There I could see that C:\Windows\SysWOW64 was incorrectly listed ahead of %SystemRoot%\system32. Once I corrected it, the exception disappeared.
First thing you should do is upgrade your JVM to the latest you can.
Can you repeat the issue? Or does it seem to happen randomly? We recently had a problem where our JVM was crashing all over the place, at random times. Turns out it was a hardware problem. We put the drives in a new server and it completely went away.
Bottom line: the JVM should never crash. As the poster above mentioned, if you're not doing any JNI then my gut feeling is that you have a hardware problem.
The cause of the problem will be documented in the hs_err* file, if you know what to look for. Take a look, and if it still isn't clear, consider posting the first 5 or 10 lines of the stack trace and other pertinent info (don't post the whole thing, there's tons of info in there that won't help - but you have to figure out which 1% is important :-) )
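As a small aid for pulling out the part worth posting, the sketch below collects the leading '#'-prefixed header block of an hs_err file, which is where the error type and the "Problematic frame" line normally appear. The class and method names are made up for this example, and it assumes the header is one contiguous run of lines starting with '#'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts the header block of an hs_err_pidXXX.log file.
final class HsErrSummary {
    static List<String> headerLines(Path log, int maxLines) throws IOException {
        List<String> out = new ArrayList<>();
        // ISO-8859-1 never fails on odd bytes, which is handy for crash logs.
        try (BufferedReader r = Files.newBufferedReader(log, StandardCharsets.ISO_8859_1)) {
            String line;
            while (out.size() < maxLines && (line = r.readLine()) != null) {
                if (!line.startsWith("#")) {
                    break; // header block is over
                }
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String line : headerLines(Paths.get(args[0]), 30)) {
            System.out.println(line);
        }
    }
}
```

Run it with the path of the log file as the only argument.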
Are you using a Browser widget and executing JavaScript in the Browser widget? If so, then there are bugs in some versions of SWT that cause the JVM to crash in native code, in various Windows libraries.
Two examples (that I opened) are bug 217306 and bug 127960. These two bug reports are not the only bug reports of the JVM crashing in SWT, however.
If you aren't using the Browser widget then these suggestions won't help you. In that case, you can search for a list of SWT bugs causing a JVM crash. If none of those are your issue, then I highly recommend that you open a bug report with SWT.
I have the same problem with a JNLP application that I have been using for a long time and is pretty reliable. The problem started immediately after I upgraded from Windows 7 to Windows 10. According to my investigation, it is most likely a bug in Win 10.
The following is not a solution, but an ugly workaround. In the jre/bin directory there is javaws.exe. When I right-clicked it, opened Properties > Compatibility and ticked "Run this program as an administrator", the JNLP app started to work.
Please be aware that this approach could cause security issues; use it only if you have no other option and know exactly what you are doing.
I am running a simple script in Groovy on an Ubuntu 11.10 machine, which takes key/value pairs and adds them to a JDBM map in a loop. Every ~3 minutes the script hangs for a couple of minutes and then resumes. When I look at the resource monitor I see that there is no CPU or Memory activity and the process is in futex_wait_queue_me().
Please suggest ways to overcome this. By the way, on a Windows machine the application runs without the hangs.
Could this be an OS issue? (I found many threads about similar futex_wait_queue_me() problems in Ubuntu.)
Thanks
Please check the version of the kernel. I ran into a similar problem (Java and other multithreaded applications) on CentOS 6, and upgrading the kernel to version 2.6.32-504.16.2.el6.x86_64 solved the issue.
See the centos bug report: https://bugs.centos.org/view.php?id=8703 which contains this pointer to an explanation of the problem:
https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0
My stacktrace was:
cat /proc/23199/stack
[<ffffffff810b226a>] futex_wait_queue_me+0xba/0xf0
[<ffffffff810b33a0>] futex_wait+0x1c0/0x310
[<ffffffff810b4c91>] do_futex+0x121/0xae0
[<ffffffff810b56cb>] sys_futex+0x7b/0x170
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
For anyone interested I used those parameters when running java:
-Xms16384M -Xmx16384M
You can find additional GC optimization tips at http://randomlyrr.blogspot.it/2012/03/java-tuning-in-nutshell-part-1.html
Yesterday evening I left the office with a Java program of mine still running. It was supposed to insert a lot of records into our company database (Oracle) using a JDBC connection. This morning when I came back to work I saw this error (caught by a try-catch):
java.sql.SQLRecoverableException: I/O Exception: Connection reset
The program wrote almost all records before hitting this problem, but what if it happens early (just minutes after I leave the office in the evening)? I cannot understand what happened. I contacted my database admin and he said there was no particular issue on the database.
Any idea what happened and what I can do to avoid it?
The error occurs on some RedHat distributions. The only thing you need to do is run your application with the parameter java.security.egd=file:///dev/urandom:
java -Djava.security.egd=file:///dev/urandom [your command]
I want to add a complementary answer to nacho-soriano's solution ...
I recently had to solve a problem where an application written in Java (a Talend© ELT job, in fact) connects to an Oracle database (11g and later) and then randomly fails. The OS is both RedHat Enterprise and CentOS. The job runs very quickly (no more than half a minute) and runs very often (approximately one run every 5 minutes).
Sometimes, at night as well as during working hours, under intensive database usage as well as light usage, in a word randomly, the connection fails with this message:
Exception in component tOracleConnection_1
java.sql.SQLRecoverableException: Io exception: Connection reset
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:101)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:229)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:458)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:411)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:490)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:202)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:33)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:465)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
and the stack trace continues ...
##Problem explanation:##
As detailed here
An Oracle connection needs some random numbers to ensure a good level of security. The Linux random number generator produces numbers based on keyboard and mouse activity (among other sources) and places them in a pool. You will grant me that on a server there is not much of that kind of activity. So it can happen that software consumes random numbers faster than the generator can produce them.
When the pool is empty, reads from /dev/random will block until additional environmental noise is gathered. The Oracle connection then times out (60 seconds by default).
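To see whether a given machine is affected, a small diagnostic such as the following can be run on it. This is a sketch with made-up names; it does not show what the Oracle driver does internally. It prints the two settings that control the JVM's seed source and times a seed request, on the assumption that seeding is the step that waits on /dev/random.

```java
import java.security.SecureRandom;
import java.security.Security;

// Hypothetical diagnostic: which entropy source is configured, and how slow is seeding?
final class EntropySourceCheck {
    static String describeSource() {
        return "securerandom.source=" + Security.getProperty("securerandom.source")
                + ", java.security.egd=" + System.getProperty("java.security.egd");
    }

    // May take a long time on a machine whose entropy pool is drained.
    static long millisToGenerateSeed(int numBytes) {
        long start = System.nanoTime();
        new SecureRandom().generateSeed(numBytes);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println(describeSource());
        System.out.println("seed of 32 bytes took " + millisToGenerateSeed(32) + " ms");
    }
}
```

A seed time of many seconds on the server, compared with a few milliseconds on a desktop machine, would be consistent with the explanation above.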
##Solution 1 - Specific for one app solution##
The solution is to pass two additional parameters to the JVM at startup:
-Djava.security.egd=file:/dev/./urandom
-Dsecurerandom.source=file:/dev/./urandom
Note: the '/./' is important, do not drop it!
So the launch command line could be:
java -Djava.security.egd=file:/dev/./urandom -Dsecurerandom.source=file:/dev/./urandom -cp <classpath directives> appMainClass <app options and parameters>
One drawback of this solution is that the numbers generated are a little less secure, as randomness is impacted. If you don't work in a military or secrecy-related industry, this solution can be yours.
##Solution 2 - General Java JVM solution##
As explained here
Both directives given in solution 1 can be put in the Java security settings file.
Take a look at $JAVA_HOME/jre/lib/security/java.security
Change the line
securerandom.source=file:/dev/random
to
securerandom.source=file:/dev/urandom
The change takes effect immediately for newly started applications.
As with solution #1, one drawback of this solution is that the numbers generated are a little less secure, as randomness is impacted. This time the impact is global to the JVM. As with solution #1, if you don't work in a military or secrecy-related industry, this solution can be yours.
Ideally we should use "file:/dev/./urandom" on Java 5 and later, because with the plain path above the JVM will still end up reading /dev/random.
Reported Bug : https://bugs.openjdk.java.net/browse/JDK-6202721
##Solution 3 - Hardware solution##
Disclaimer: I'm not affiliated with any hardware vendor or product ...
If you need a high level of randomness quality, you can replace the Linux software random number generator with a piece of hardware.
Some information is available here.
Regards
Thomas
This simply means that something in the backend (the DBMS) decided to stop working, due to unavailability of resources etc.
It has nothing to do with your code or the number of inserts.
You can read more about similar problems here:
http://kr.forums.oracle.com/forums/thread.jspa?threadID=941911
http://forums.oracle.com/forums/thread.jspa?messageID=3800354
This may not answer your question, but you will get an idea of why it might be happening. You could further discuss with your DBA and see if there is something specific in your case.
Solution
Change the setup for your application so that you pass this parameter [-Djava.security.egd=file:/dev/../dev/urandom] right after the java command:
java -Djava.security.egd=file:/dev/../dev/urandom [your command]
Ref :- https://community.oracle.com/thread/943911
We experienced these errors intermittently after upgrading from 11g to 12c, while our Java was on 1.6.
The fix for us was to upgrade Java and JDBC from 6 to 7:
export JAVA_HOME='/usr/java1.7'
export CLASSPATH=/u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc7.jar:$CLASSPATH
Several days later, still intermittent connection resets.
We ended up removing all of the Java 7 changes above. Java 6 was fine. The problem was fixed by adding the line below to our user's bash_profile.
Our Groovy scripts that were experiencing the error were using /dev/random on our batch VM server. The line below forced Java and Groovy to use /dev/urandom.
export JAVA_OPTS=" $JAVA_OPTS -Djava.security.egd=file:///dev/urandom "
Your exception says it all: "Connection reset".
The connection between your Java process and the DB server was lost, which could have happened for almost any reason (network issues, for example). The SQLRecoverableException just means that it's recoverable, but the root cause is the connection reset.
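Since the exception is meant to be recoverable, a long-running batch can wrap each unit of work in a retry. The sketch below shows one way to do that; the class name, the `SqlTask` interface and the backoff policy are made up for this example. The task passed in is assumed to open a fresh connection on every attempt, because the connection that was reset cannot be reused.

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;

// Hypothetical helper: retries a unit of JDBC work when the connection is reset.
final class RecoverableRetry {
    @FunctionalInterface
    interface SqlTask<T> {
        // Should obtain a new connection, do the work, and close the connection.
        T run() throws SQLException;
    }

    static <T> T withRetry(SqlTask<T> task, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLRecoverableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.run();
            } catch (SQLRecoverableException e) {
                last = e; // only recoverable errors are retried; others propagate
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // simple linear backoff
                }
            }
        }
        throw last;
    }
}
```

For an overnight insert job this means a single reset costs one retried batch instead of the rest of the run, provided each batch is committed as its own transaction.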
I had a similar situation when reading from Oracle in a Spark job. This connection reset error was caused by an incompatibility between the Oracle server and the JDBC driver used. Worth checking it.
Add the java.security.egd property to your run command:
java -jar -Djava.security.egd="file:///dev/urandom" yourjarfilename.jar