I have a Java application running on Solaris. This application regularly launches external processes using Runtime.exec. After a while, having successfully launched such processes many times over, a launch will hang. A thread dump taken at this point (and another taken several minutes later) reveals that java.lang.UNIXProcess.forkAndExec is "stuck". Following is the top of the relevant stack trace taken from the thread dump:
"Thread-85305" prio=3 tid=0x0000000102aae800 nid=0x21499 runnable [0x7fffffff2a3fe000]
java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(Unknown Source)
at java.lang.ProcessImpl.start(Unknown Source)
at java.lang.ProcessBuilder.start(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
I have read through some forums where others have seen forkAndExec throw an IOException due to not enough space or not enough memory, but I'm not getting that error here. I'm now waiting for the results of pstack in the hope that it will reveal more information.
Does anyone have any idea on how to resolve this issue?
thanks,
Mike
Installing the Sun Alert Patch Cluster will do the trick.
Since the thread lives as long as your executable is running, it may simply be your external executable that is hanging.
Try to find the parameters passed to your executable and launch it manually.
Are stdout and stderr from the process being consumed? You might be looking at the result of a full pipe buffer.
You can test this by adding output redirection to a tempfile for the command that is spawned.
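A minimal sketch of that test done from the Java side, assuming Java 7 or later so that ProcessBuilder.redirectOutput is available (the stack trace shows Runtime.exec going through ProcessBuilder anyway). The class name, method name and temp-file prefix are made up for this example:

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch: launch an external command with stdout and stderr sent to a temp
// file, so the child cannot block on a full pipe buffer.
final class RedirectedLauncher {

    // Runs the command, waits for it to exit, and returns the file holding its output.
    static File runToTempFile(List<String> command) throws IOException, InterruptedException {
        File log = File.createTempFile("child-output-", ".log"); // arbitrary prefix
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        pb.redirectOutput(log);       // write the merged stream to the file
        Process p = pb.start();
        p.waitFor();
        return log;
    }

    public static void main(String[] args) throws Exception {
        File log = runToTempFile(Arrays.asList("java", "-version"));
        System.out.println("Output captured in " + log.getAbsolutePath());
    }
}
```

If the hang disappears with the redirection in place, unread output was probably the cause. If forkAndExec still hangs, the problem is before the child produces any output.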
My application uses JavaFX 11.0.1 and is shipped bundled with a jlinked version of OpenJDK 11. It runs fine for the vast majority of users, but a few of them are getting this stack trace:
Exception in thread "WindowsNativeRunloopThread" java.lang.NoSuchMethodError: <init>
at com.sun.glass.ui.win.WinApplication.staticScreen_getScreens(Native Method)
at com.sun.glass.ui.Screen.initScreens(Screen.java:412)
at com.sun.glass.ui.Application.lambda$run$1(Application.java:152)
at com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
at com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(WinApplication.java:174)
at java.base/java.lang.Thread.run(Unknown Source)
Exception in thread "JavaFX Application Thread" java.lang.NullPointerException
at com.sun.javafx.tk.quantum.QuantumToolkit.assignScreensAdapters(QuantumToolkit.java:695)
at com.sun.javafx.tk.quantum.QuantumToolkit.runToolkit(QuantumToolkit.java:313)
at com.sun.javafx.tk.quantum.QuantumToolkit.lambda$startup$10(QuantumToolkit.java:258)
at com.sun.glass.ui.Application.lambda$run$1(Application.java:153)
at com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
at com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(WinApplication.java:174)
at java.base/java.lang.Thread.run(Unknown Source)
I found some discussions of the same exception in a Maven+Eclipse context, here and here. The issue is very similar: the users reporting it have other Java installations, and uninstalling them solves the issue. In other words, when my bundled OpenJDK is the only option, the application starts, but if another Java is installed on the system, the wrong .dll is picked up and the application crashes with the above stack trace.
I tried the suggested java.library.path workaround, but users say it doesn't solve the problem. Unfortunately I cannot reproduce it myself. Any idea how to solve it, or what to ask the users reporting it?
EDIT: we fixed the exe generated by launch4j here and the bash script here. The idea is basically to restrict/change the PATH env variable to avoid the wrong dll being picked up.
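The actual fix lives in the launch4j exe and the bash script, which are not shown here. Purely as an illustration of the same PATH idea, here is a sketch of a Java-side launcher that puts the bundled runtime's bin directory first on PATH before starting the application. The class name, the runtime directory, the classpath and the main class are all placeholders, not taken from the real project:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Map;

// Sketch only: start the application with the bundled runtime's bin directory
// at the front of PATH, so its DLLs are found before those of other Java installs.
final class RestrictedPathLauncher {

    static ProcessBuilder build(File bundledRuntime, String classpath, String mainClass) {
        File bin = new File(bundledRuntime, "bin");
        ProcessBuilder pb = new ProcessBuilder(
                new File(bin, "java").getPath(), "-cp", classpath, mainClass);

        Map<String, String> env = pb.environment();
        // Windows usually spells the variable "Path"; remove every spelling
        // and remember the old value so it can be appended afterwards.
        String oldPath = null;
        for (String key : new ArrayList<>(env.keySet())) {
            if (key.equalsIgnoreCase("PATH")) {
                oldPath = env.remove(key);
            }
        }
        String binPath = bin.getAbsolutePath();
        env.put("PATH", oldPath == null ? binPath : binPath + File.pathSeparator + oldPath);
        return pb.inheritIO();
    }
}
```

A stricter variant would drop the old PATH value entirely instead of appending it. Whether that is safe depends on what else the application needs from PATH.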
This is a follow-up to "JPA/Hibernate hangs on production during EntityManagerFactory creation".
I have managed to get a thread dump during the "hanging" state and discovered that the problem is related to a file system listing operation.
In general, the process can hang in this state during application bootstrap for quite a while (~30 minutes). Are there any workarounds or fixes for this?
"main" #1 prio=5 os_prio=0 tid=0x00000000010c9000 nid=0x2c73 runnable [0x00007f4c928f5000]
java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.list(Native Method)
at java.io.File.list(File.java:1122)
at java.io.File.listFiles(File.java:1207)
at org.hibernate.boot.archive.internal.ExplodedArchiveDescriptor.processDirectory(ExplodedArchiveDescriptor.java:105)
at org.hibernate.boot.archive.internal.ExplodedArchiveDescriptor.processDirectory(ExplodedArchiveDescriptor.java:118)
at org.hibernate.boot.archive.internal.ExplodedArchiveDescriptor.visitArchive(ExplodedArchiveDescriptor.java:54)
at org.hibernate.boot.archive.scan.spi.AbstractScannerImpl.scan(AbstractScannerImpl.java:47)
at org.hibernate.boot.model.process.internal.ScanningCoordinator.coordinateScan(ScanningCoordinator.java:75)
at org.hibernate.boot.model.process.spi.MetadataBuildingProcess.prepare(MetadataBuildingProcess.java:98)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.<init>(EntityManagerFactoryBuilderImpl.java:227)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.<init>(EntityManagerFactoryBuilderImpl.java:169)
at org.hibernate.jpa.boot.spi.Bootstrap.getEntityManagerFactoryBuilder(Bootstrap.java:36)
at org.hibernate.jpa.HibernatePersistenceProvider.getEntityManagerFactoryBuilder(HibernatePersistenceProvider.java:181)
at org.hibernate.jpa.HibernatePersistenceProvider.getEntityManagerFactoryBuilderOrNull(HibernatePersistenceProvider.java:129)
at org.hibernate.jpa.HibernatePersistenceProvider.getEntityManagerFactoryBuilderOrNull(HibernatePersistenceProvider.java:71)
at org.hibernate.jpa.HibernatePersistenceProvider.createEntityManagerFactory(HibernatePersistenceProvider.java:52)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:55)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:39)
The problem for me, as for marc82ch, was an enormous number of (external) files on the application classpath. For instance, in my case a log directory containing tons of files was included in the classpath.
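To spot such a directory quickly, something like the following sketch can be run with the same classpath as the application. It assumes Java 8 or later; the class and method names are invented for this example and are not part of Hibernate:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch: print how many regular files sit under each directory on the classpath.
final class ClasspathDirectoryAudit {

    // Counts regular files below a directory, recursively.
    static long countFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        String classpath = System.getProperty("java.class.path");
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            Path p = Paths.get(entry);
            if (Files.isDirectory(p)) { // jar entries are skipped
                System.out.println(countFiles(p) + " files under " + p.toAbsolutePath());
            }
        }
    }
}
```

A directory with an unexpectedly large count (a log or upload folder, for instance) is a good candidate to move off the classpath.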
I've got a program that's packaged as a .jar I need to run for school. In essence the program acts as an interface between a user and a DC motor to control speed, angle, etc.
This program (which required MS C++ to install) runs well on everyone's machines running Windows 7 or 8, but not on my XP x64 machine. When opened from the Start menu, it spawns multiple javaw.exe processes, but no application window is created. Run from the command line, it prints this:
C:\Program Files\Quanser\QICii_USB\bin>java -jar usbQICii.jar
Exception in thread "main" java.lang.UnsatisfiedLinkError: Can't load library: C:\Program Files\Quanser\QICii_USB\bin\lib\usbQICii_jni.dll
at java.lang.ClassLoader.loadLibrary(Unknown Source)
at java.lang.Runtime.load0(Unknown Source)
at java.lang.System.load(Unknown Source)
at com.quanser.raskin.QIC_USB.<clinit>(Unknown Source)
at com.quanser.conduit.pic.PICSource.<init>(Unknown Source)
at com.quanser.raskin.RaskinFrame.<init>(Unknown Source)
at com.quanser.raskin.Raskin.<init>(Unknown Source)
at com.quanser.raskin.Raskin.main(Unknown Source)
I've so far been unable to locate the requested .dll on my system. Two primary questions: is there something obvious I've missed? And if I could find the .dll on someone else's machine (so far a no-go), could I grab it and use it on mine (x64 compatibility pending, of course)?
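The path in the error message looks like it was built from the directory the JVM was started in plus lib\usbQICii_jni.dll. That is only a guess from the message, since the Quanser code is not available. Under that assumption, a small sketch like this (all names invented for the example) shows which file a given working directory leads to and whether it exists:

```java
import java.io.File;

// Sketch: show where a relative native-library path ends up for a given
// working directory. Assumes the application resolves "lib\<name>.dll"
// against the directory it was started from.
final class NativeLibraryPathCheck {

    // Resolves a relative library path against a working directory.
    static File resolve(File workingDir, String relativeLibrary) {
        return new File(workingDir, relativeLibrary).getAbsoluteFile();
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        File dll = resolve(cwd, "lib" + File.separator + "usbQICii_jni.dll");
        System.out.println(dll + (dll.isFile() ? " exists" : " is missing"));
        // A 32-bit DLL cannot be loaded into a 64-bit JVM, so the
        // architecture this JVM reports is worth checking too.
        System.out.println("os.arch reported by this JVM: " + System.getProperty("os.arch"));
    }
}
```

Running it once from the bin directory and once from its parent shows whether the working directory is what makes the difference.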
Check where usbQICii_jni.dll is located.
My guess is that it is in
"C:\Program Files\Quanser\QICii_USB\lib"
If I'm right, go one directory up, and from
"C:\Program Files\Quanser\QICii_USB"
execute
"java -jar bin\usbQICii.jar"
I am running a Hadoop cluster with an Ubuntu host acting as master and slave, and a virtual machine running on it as another slave (a 2-node cluster).
The solution to the problem described in "No data nodes are started" is not working for me. I tried both of the solutions explained there.
When I manually set the namespace IDs of the affected datanodes to match the namenode's and start the cluster (solution 2 in the linked post), I still get the same error (DataStreamer Exception).
Next, the log of one of the datanodes shows the same Incompatible namespaceIDs error, but the datanode namespace ID shown in the log is different from the one in my tmp/dfs/data/current/version file (which has not changed and is the same as the one in tmp/dfs/name/current/version).
After many hours of debugging I am still clueless :(.
PS:
There is no connection problem between my host and the slave.
Just to clarify: when I start the cluster using start-dfs.sh, the datanodes on both nodes start, which is normal.
I get this error when I copy a file from local to HDFS.
I performed a simple test after all this:
Deleted the tmp/dfs/data and tmp/dfs/name folders on the master
Deleted tmp/dfs/data on the slave
Formatted the namenode using hadoop namenode -format
Started the cluster using start-dfs.sh on all nodes
It started normally on the master, and the datanode is also up on the slave
Then ran the copyFromLocal command, and it gave me the same error as below
But this time there is no namespace mismatch error in any of the datanode logs, master or slave
14/05/04 04:12:54 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/dsingh/mysample could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
14/05/04 04:12:54 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/05/04 04:12:54 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/dsingh/mysample" - Aborting...
put: java.io.IOException: File /user/dsingh/mysample could only be replicated to 0 nodes, instead of 1
14/05/04 04:12:54 ERROR hdfs.DFSClient: Failed to close file /user/dsingh/mysample
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/dsingh/mysample could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
Any clue will help me.
After working on this issue for a few hours, I finally gave up; as far as my knowledge goes, it is still unresolved.
But the good thing is that instead of using a VirtualBox VM as a slave on the same machine, I connected another Ubuntu machine to my master and everything worked like a charm :)
I guess the problem could be related to the limited space allocated for storage in the virtual machine (less than 500 MB in my case); I have read somewhere that each node in the cluster should have at least 10 GB of free space to keep HDFS happy.
My takeaway: if possible, try the Hadoop cluster on two separate machines rather than using a virtual machine on the same host.
After you ran -copyFromLocal, it seems the DataNode was up to receive the request to write the file, but it was not able to allocate the blocks needed for the file. Please check the DataNode log to see exactly what happened. Also, run "hdfs dfsadmin -report" to make sure you have enough space on the DataNode.
I have faced the same problem. It is all about the lack of space dedicated to HDFS.
I have 10 virtual machine (VMware) nodes, each with an average of 3.5 GB of storage for HDFS. I am using Hadoop 2.6.
You can decrease the replication factor by editing the "dfs.replication" property in the "_hadoop_location/etc/hadoop/hdfs-site.xml" configuration file (for Hadoop 2.6). Decrease it to a small number (like 1 or 2) and then try to keep the file smaller than your total space.
If the same problem appears, try a file smaller than the one you used last time, or recreate the machine with a larger disk.
This may be late, but it may help others who face the same problem :)
Thank you.
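As a rough back-of-the-envelope check of the replication advice above: every block is stored dfs.replication times, so a file needs about file size times replication in raw space. The sketch below only does that arithmetic; the class and method names are invented, it does not talk to Hadoop, and the NameNode's own placement checks are stricter, so treat a "true" result as necessary rather than sufficient:

```java
// Sketch: estimate the raw space a file needs in HDFS for a given replication factor.
final class HdfsSpaceEstimate {

    // Raw bytes stored for one file: each block is kept `replication` times.
    static long rawBytesNeeded(long fileBytes, int replication) {
        return Math.multiplyExact(fileBytes, (long) replication);
    }

    // True if the total free space is at least the raw requirement.
    // This is a lower bound only; it ignores reserved space and per-node limits.
    static boolean mightFit(long fileBytes, int replication, long totalFreeBytes) {
        return rawBytesNeeded(fileBytes, replication) <= totalFreeBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        System.out.println("2 GiB file, replication 3, 5 GiB free: "
                + mightFit(2 * gib, 3, 5 * gib));
        System.out.println("2 GiB file, replication 1, 5 GiB free: "
                + mightFit(2 * gib, 1, 5 * gib));
    }
}
```

With the numbers in main, the first case needs 6 GiB and does not fit, while lowering replication to 1 brings it down to 2 GiB.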
I have a basic Java server app that has 100 worker threads that do simple HEAD requests on URLs. I'm using HttpClient 4.x for this.
A few minutes into the run my program just freezes for a couple of minutes and I cannot figure out why. Check out the screenshot of what the VisualVM monitor reports. You can see it flatline. During this time I'm unable to get a good thread dump, and VisualVM itself freezes until the program is unblocked. Does anyone have any ideas on what I can do to start debugging this?
Visual VM: http://tinypic.com/view.php?pic=2i915bs&s=7
Here is the output when I tried to take a jstack dump while it was frozen:
jstack -F 4325
Attaching to process ID 4325, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 16.3-b01
Deadlock Detection:
No deadlocks found.
Thread 4557: (state = BLOCKED)
Error occurred during stack walking:
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:152)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.getThreadIntegerRegisterSet(LinuxDebuggerLocal.java:466)
at sun.jvm.hotspot.debugger.linux.LinuxThread.getContext(LinuxThread.java:65)
at sun.jvm.hotspot.runtime.linux_amd64.LinuxAMD64JavaThreadPDAccess.getCurrentFrameGuess(LinuxAMD64JavaThreadPDAccess.java:92)
at sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:256)
at sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:218)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:76)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
at sun.tools.jstack.JStack.main(JStack.java:84)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.getThreadIntegerRegisterSet0(Native Method)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.access$800(LinuxDebuggerLocal.java:51)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1GetThreadIntegerRegisterSetTask.doit(LinuxDebuggerLocal.java:460)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:127)
I've seen several bug reports about jstack on Linux with a similar trace:
JVM Bug Id: 6494722 (is supposed to be fixed)
Ubuntu Bug #597098 (this one is not)
Do you get the same result with a kill -3 <pid>?
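If external tools keep failing, another option is to have the application dump its own threads, for example from a watchdog thread that logs every few seconds. This is only a sketch with invented names, and it has a clear limit: if the whole JVM is paused (for instance in a long GC), this code is paused too, so it shows the state shortly before the freeze rather than during it.

```java
import java.util.Map;

// Sketch: build a thread dump from inside the JVM using Thread.getAllStackTraces().
final class InProcessThreadDump {

    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```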
Very likely due to too much memory usage causing GC. Add these parameters to the java command:
-verbose:gc -XX:+PrintGCDetails
and see if you notice anything obvious in the output/logs.
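The same suspicion can also be checked from inside the application through the standard java.lang.management API. A minimal sketch, with an invented class name: log the cumulative GC time periodically and compare the values before and after a freeze.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: sum the time the JVM reports having spent in garbage collection.
final class GcTimeProbe {

    // Total milliseconds spent in GC so far, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("GC time so far: " + totalGcMillis() + " ms");
    }
}
```

If the total jumps by roughly the length of the freeze, GC is the likely culprit; if it barely moves, the pause has another cause.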
What worked for me was running jstack as the process owner without -F.