I'm writing a program in Java, in which I use some error-printing statements for debugging.
My program spawns about 2000 threads. The program runs fine until the moment when a large number of threads hit this statement:
System.err.println("Some error message");
When this happens, one of my threads successfully gets into the println function, while the other threads have the status:
State in JVM: Waiting for synchronized block
Digging deeper into the debugging statement, I noticed that the thread which managed to get into the println function is stopped at this function:
private native void writeBytes(byte b[], int off, int len, boolean append) throws IOException;
and it has the following stack trace:
java.io.FileOutputStream.write(FileOutputStream.java:327)
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
java.io.PrintStream.write(PrintStream.java:482)
sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
java.io.PrintStream.write(PrintStream.java:527)
java.io.PrintStream.print(PrintStream.java:669)
java.io.PrintStream.println(PrintStream.java:806)
fetcher.responseHandler.ExtendedResponseHandler500.handleResponse(ExtendedResponseHandler500.java:20)
fetcher.FetchWorker.run(FetchWorker.java:79)
java.lang.Thread.run(Thread.java:745)
While the other threads are stopped at the first line of the println function (inside the java core code):
synchronized(this)
Is this problem caused by me, or is it related to the JVM? Can I do anything about this issue?
The most likely cause is that the output streams of the process aren't being consumed by the parent process, so the pipe buffer fills up and then the next call to System.err.println just hangs forever.
This is common when one process is used to launch another, but doesn't set up "flushing" threads to drain the child's stdout and stderr streams.
Note that this doesn't have anything in particular to do with "threading" - but launching many threads can certainly increase the rate at which errors are generated (and perhaps cause more total errors if something else fails due to contention downstream) which means your output buffer fills up faster and hangs earlier.
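If your program really is being launched by a parent process, a minimal sketch of what the parent side would need to do is below (the class name and the worker.jar command line are made up for illustration; the drain-on-a-separate-thread pattern is the point):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class DrainChild {
    // Consume a child stream on a daemon thread so its pipe buffer never fills up.
    static void drain(final InputStream in, final String name) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        System.out.println("[child " + name + "] " + line);
                    }
                } catch (IOException ignored) {
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws Exception {
        Process child = new ProcessBuilder("java", "-jar", "worker.jar").start();
        drain(child.getInputStream(), "stdout");  // the child's stdout
        drain(child.getErrorStream(), "stderr");  // the child's stderr
        child.waitFor();
    }
}

Alternatively, ProcessBuilder.inheritIO() or redirecting the child's output to files avoids the need for drainer threads altogether.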
It is perfectly normal for 2000 threads to be waiting to acquire the lock on the println call.
Your stack trace shows that you are getting some HTTP 500 errors. Probably the majority of your threads got this error and are now all queued up to report it on standard error. What you are seeing is a consequence of your problem, not its cause.
2000 threads is an insane number; it will not improve performance in just about any reasonable scenario and will most likely degrade it. Start with something like 4 and see whether incrementally doubling that value gives you any improvement. The JVM can handle this number of threads (so this is NOT the source of your problem), but it is simply useless. Using more JVMs will not fix the problem either (which is probably just plain HTTP 500s and/or network timeouts).
Also check the server side logs.
If you need to maximize performance (in a genuinely high-concurrency scenario), consider asynchronous NIO, but normal blocking I/O is extremely good for all common cases and seems fine here.
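If you do want to keep the work parallel while capping the number of threads, a minimal sketch with a fixed pool could look like this (the pool size, the URL list and the fetch() body are placeholders, not your actual code):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedFetch {
    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList("http://example.com/a", "http://example.com/b"); // placeholder input
        ExecutorService pool = Executors.newFixedThreadPool(4); // start small, then double and measure
        for (final String url : urls) {
            pool.submit(new Runnable() {
                public void run() {
                    fetch(url); // stands in for your FetchWorker / response handler logic
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void fetch(String url) {
        // placeholder for the real request/response handling
    }
}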
Update:
My hypothesis is this:
1. a thread makes a remote call
2. it gets a 500 error
3. it joins the queue to report the error behind 2000 other threads
4. it succeeds and goes back to step 1
Step 3 may take a lot of time, so even if you take multiple thread dumps you will see the very same thread apparently always locked (I mean that you would need to be very lucky to catch Thread-1352 during step 1 or 2). I'm assuming that you checked the thread name and that the locked thread we are discussing is always the same.
Do you see any logging while the program "freezes" (it does freeze, right?), or is everything still? How many thread dumps did you take, and how much time was there between them?
Summary
From my studies, I don't remember a concept such as an "uninterruptible block" existing, and I did not find one with a quick Google search either.
Expected answer
yes, it does exist, and the proper term for it is ... (in this case, it would be nice if someone could explain to me why it does not exist in Java)
no, it does not exist, because ...
Definition
By "uninterruptible block", I mean a section of code, in a multi-threading context, which, once starts execution, cannot be interrupted by other threads. I.e., the CPU (or the JVM), won't run any other thread at all, until the "atomic block" is left.
Note that this is not the same as a section guarded by a lock/mutex/etc., because such a section only excludes the other threads that try to acquire the same lock or mutex; any other thread can still preempt it.
EDIT, in response to comments: It would also be fine if it affected only the threads of the current process.
Re. multiple cores: I would say yes, the other cores should stop as well, and we accept the performance hit (or, if it is exclusive only to the current process, then the other cores could still run threads of other processes).
Background
First of all, it is clear that, at least in Java, this concept does not exist:
Atomic as in uninterruptible: once the block starts, it can't be interrupted, even by task switching.
...
[this] cannot be guaranteed in Java - it doesn't provide access to the "critical sections" primitives required for uninterruptibility.
However, it would have come in handy in the following case: a system sends a request and receives response A. After receiving the response, it has at most 3 seconds to send request B. Now, if multiple threads are running and doing this, it can happen that after receiving response A the thread is interrupted, and one or more other threads run before the original thread gets the chance to send out request B, so it misses the 3-second deadline. The more threads are running, the bigger the risk that this happens. By marking the "receive A to send B" section "uninterruptible", this could be avoided.
Note that locking this section would not solve the issue. (It would not prevent the JVM from, e.g., scheduling 10 new threads at the "send request A" phase right after our thread received response A.)
EDIT: Re. a global mutex: that would not solve the issue either. Basically, I want the threads to make request A (and do some other work) simultaneously, but I want them to stop when another thread has received response A and is about to make request B.
Now, I know that this would not be a 100% solution either, because the threads that don't get scheduled right after receiving response A could still miss the deadline. But at least those that do would be sure to send out the second request in time.
Some further speculation
The classic concurrency problem of a++ could be solved simply by uninterruptible { a++; }, without the need for locks (which can cause deadlock and would, in any case, probably be more expensive in terms of performance than simply executing the three instructions required by a++ with a simple flag saying they must not be interrupted).
EDIT re. CAS: of course, that's another solution too. However, it involves retrying until the write succeeds, and it is also slightly more complex to use (at least in Java, we have to use the AtomicXXX classes instead of the primitive types for that).
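(For reference, a minimal sketch contrasting the lock-based and the CAS-based increment; the counter itself is just an example:)

import java.util.concurrent.atomic.AtomicInteger;

public class Counters {
    // Lock-based: excludes other threads contending for this lock, but the scheduler can still preempt it.
    private int a = 0;
    public synchronized void incrementLocked() {
        a++;
    }

    // CAS-based: incrementAndGet() retries internally until its compare-and-swap succeeds.
    private final AtomicInteger atomicA = new AtomicInteger();
    public void incrementAtomic() {
        atomicA.incrementAndGet();
    }
}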
I know, of course, that this could easily be abused by marking large blocks of code as uninterruptible, but that is true of many concurrency primitives as well. (What's more, I also know that my original use case would itself be kind of an "abuse", since I'd be doing I/O in an uninterruptible block; still, it would have been worth at least a try if such a concept did exist in Java.)
I am experimenting with a game mechanic in which players can run scripts on in-game computers. Script execution will be resource-limited at a gameplay level to some number of instructions per tick.
The following proof of concept demonstrates a basic level of sandboxing and throttling of arbitrary user code. It successfully runs ~250 instructions of poorly crafted 'user input' and then discards the coroutine. Unfortunately, the Java process never terminates. A little investigation shows that the LuaThread created by LuaJ for the coroutine hangs around forever.
SandboxTest.java:
import org.luaj.vm2.Globals;
import org.luaj.vm2.LuaValue;
import org.luaj.vm2.lib.jse.JsePlatform;
public class SandboxTest {
    public static void main(String[] args) {
        Globals globals = JsePlatform.debugGlobals();
        LuaValue chunk = globals.loadfile("res/test.lua");
        chunk.call();
    }
}
res/test.lua:
function sandbox(fn)
    -- read script and set the environment
    f = loadfile(fn, "t")
    debug.setupvalue(f, 1, {print = print})
    -- create a coroutine and have it yield every 50 instructions
    local co = coroutine.create(f)
    debug.sethook(co, coroutine.yield, "", 50)
    -- demonstrate stepped execution, 5 'ticks'
    for i = 1, 5 do
        print("tick")
        coroutine.resume(co)
    end
end

sandbox("res/badfile.lua")
res/badfile.lua:
while 1 do
    print("", "badfile")
end
The docs suggest that a coroutine that is considered unresumable will be garbage collected and an OrphanedThread exception will be thrown, signalling the LuaThread to end - but this is never happening. My question is in two parts:
Am I doing something fundamentally wrong to cause this behaviour?
If not, how should I handle this situation? From the source it appears that if I can get a reference to the LuaThread in Java I may be able to forcibly abandon it by issuing an interrupt(). Is this a good idea?
Reference: Lua / Java / LuaJ - Handling or Interrupting Infinite Loops and Threads
EDIT: I have posted a bug report over at the LuaJ SourceForge. It discusses the underlying issue (threads not being garbage collected as in the Lua spec) and suggests some ways to work around it.
It seems to be a limitation of LuaJ. I submitted a ticket earlier this year on Sourceforge as I see you've also done. The LuaThread class doesn't store references to the Java threads it creates, so you can't interrupt() those threads without modifying the LuaJ core to expose them:
new Thread(this, "Coroutine-"+(++coroutine_count)).start();
It may be dangerous to interrupt those threads without adding appropriate cleanup code to LuaJ.
Documentation that you provided for OrphanedThread also tells us that scope is the defining condition:
"Error sublcass that indicates a lua thread that is no longer referenced has been detected. The java thread in which this is thrown should correspond to a LuaThread being used as a coroutine that could not possibly be resumed again because there are no more references to the LuaThread with which it is associated. Rather than locking up resources forever, this error is thrown, and should fall through all the way to the thread's Thread.run() method."
Your code example doesn't cause all LuaThread references to disappear, so you shouldn't expect an exception to be thrown. The CoroutineLib documentation indicates: "Coroutines that are yielded but never resumed to complete their execution may not be collected by the garbage collector", so an OutOfMemoryError should actually be expected from the code you listed on SourceForge, if I'm not mistaken. LuaThread:52 also specifies: "Applications should not catch OrphanedThread, because it can break the thread safety of luaj", which is yet another obstacle.
There also seem to be differences between empty and non-empty while loops in Lua/LuaJ. IIRC, empty loops (while true do end) don't obey all of the coroutine hook/tick rules. Because no actions occur in an empty loop, there's no opportunity for certain hooks to fire (I need to test this again, so please correct me otherwise!).
A forked version of LuaJ with the functionality we're looking for is used in the ComputerCraft mod for Minecraft, though it's designed only for the mod and isn't open source.
I've encountered a problem with Java threads, and I am not sure whether it's related to my approach or whether thread pooling will resolve what I am trying to achieve.
for (int i = 0; i < 100; i++) {
    verifier[i] = new Thread();
    verifier[i].start();
}
I initialize 100 threads and start them. In the threads the code that gets executed is just
con = (HttpURLConnection) website.openConnection(url);
// gets only the header
con.setRequestMethod("HEAD");
con.setConnectTimeout(2000); // set timeout to 2 seconds
These threads repeat the process above over a long list of URLs/data.
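In outline, each worker does something like this (names and the URL list are simplified, not my exact code):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

class Verifier implements Runnable {
    private final List<String> urls;

    Verifier(List<String> urls) {
        this.urls = urls;
    }

    public void run() {
        for (String u : urls) {
            HttpURLConnection con = null;
            try {
                con = (HttpURLConnection) new URL(u).openConnection();
                con.setRequestMethod("HEAD");     // gets only the header
                con.setConnectTimeout(2000);      // set timeout to 2 seconds
                int code = con.getResponseCode(); // actually performs the request
                // ... record the result ...
            } catch (Exception e) {
                // ... record the failure ...
            } finally {
                if (con != null) {
                    con.disconnect();
                }
            }
        }
    }
}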
The first 50 threads execute almost instantly, then they just stop for 60 seconds or so, then there is another spike of execution in which 20 or so of them finish at the same time, and so on. The same deadlock occurs even when there are only 4 of them.
My first guess was a deadlock. I am not sure how to resolve the issue and maintain a constant execution pace, without deadlocks and stops.
I am looking for an explanation of why this occurs and how it can be resolved.
By deadlock I refer to the Java Virtual Machine and how it handles threads, not a deadlock caused by my own threads.
SCREENSHOT OF THREAD EXECUTION:
It looks like the threads are dying for no reason and I don't know why?!
It could be that the operating system's configurable limit on TCP/IP connections is being hit, which causes the JVM to block waiting for a new TCP/IP connection to be created; that will only happen once a connection already in use gets closed.
This could help to find what is going on:
Profile the run with VisualVM, which ships with the JDK itself (run it from the command line with jvisualvm). It should indicate how many threads are created and why they are blocked, whether there are deadlocks, etc.
Wait for it to block and take thread dumps of the JVM process to check for deadlocks in the thread stack traces, using jstack or VisualVM; search for the deadlock keyword.
Check the state of your TCP connections with netstat -nao to see whether the operating system limit is getting hit, i.e. whether there are many connections in CLOSE_WAIT at the times the blocking occurs.
If you are behind a corporate proxy/firewall, you could be hitting some other sort of security limit that prevents you from opening more TCP connections, not necessarily the limit of the operating system.
If none of this helps, you can always edit the question with further findings, but based on the description of the code it sounds like other limits are being hit that at first glance don't seem related to JVM thread deadlocks. Hope this helps.
I'm really new to programming and am having performance problems with my software. Basically, I get some data and run a 100-iteration loop over it (i = 0; i < 100; i++), and during that loop my program makes 1 of 3 decisions: keep the data it's working on, discard it, or send a version of it back to the queue for further processing. The individual work each thread does is very small, but there's a lot of it (which is why I'm using a queue server to scale horizontally).
My problem is that it never comes close to using my entire CPU; my program runs at around 40% per core. After profiling, it seems the majority of the time is spent sending/receiving data from the queue (approx. 64% in a part called com.rabbitmq.client.impl.Frame.readFrom(DataInputStream) and com.rabbitmq.client.impl.SocketFrameHandler.readFrame(); approx. 17% is getting it into the format for the queue, which I brought down from 40%; and the rest is spent on my program's logic). Obviously I want my work to be done faster, and I don't want it to spend so much time in the queue, so I'm wondering if there's a better design I can use.
My code is actually quite large, but here's an overview of what it does:
I create a connection to the queue server(rabbitmq and java)
I fork as many threads as I have cpu cores(using the same connection)
The data flow in each thread is:
each thread creates its own channel to the queue server using the shared connection.
There's a while loop that polls the server and gets X number of messages without acknowledging them
Once I get a message, I use thread executor to send an acknowledge while my job is running
I parse the message and run my loop
If data is sent back to the queue, I send it to a thread executor that sends it back so my program can proceed with the next data set.
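Roughly, each worker's polling loop looks something like this (the queue name "work" and handleMessage() are placeholders, and the acknowledgements are done inline here rather than via the executor, for brevity):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.GetResponse;

class Worker implements Runnable {
    private final Connection connection; // the single shared connection

    Worker(Connection connection) {
        this.connection = connection;
    }

    public void run() {
        try {
            Channel channel = connection.createChannel(); // one channel per thread
            while (true) {
                GetResponse msg = channel.basicGet("work", false); // false = manual ack
                if (msg == null) {
                    Thread.sleep(10); // queue empty, back off briefly
                    continue;
                }
                handleMessage(msg.getBody()); // parse the message and run the 100-iteration loop
                channel.basicAck(msg.getEnvelope().getDeliveryTag(), false);
            }
        } catch (Exception e) {
            // real code would log and decide whether to reconnect
        }
    }

    void handleMessage(byte[] body) {
        // placeholder for the keep/discard/requeue logic
    }
}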
One weird thing I did: although I use a thread executor for acknowledgements and for sending data back to the queue, my main worker threads are just plain forked threads (overriding public void run()). Because my program is dedicated to this single process, I did that to make sure there were always X threads ready to work (with no shutting down/respawning of them). The rest is in executor tasks because I figured that work could wait/be queued while my main program runs.
I'm not sure how to design it better so it spends less time gathering/sending data. Are there any designs, RabbitMQ features, or Java techniques I can use to help?
If it's not IO wait, then I suspect that it's down to some locking going on inside those methods.
It looks to me like your threads are spending a significant amount of time waiting for those calls to return. Somewhat counter-intuitively, you may well be able to increase your performance by cutting down on the number of threads, since they'll spend less time tripping over each other and more time actively doing something.
Give it a try and see what effect it has on the profile.
I have a server-side application that opens a socket thread for each connected client. I have a DataInputStream in each thread, which calls read(byte[] array) to read data. I also set the socket timeout to a few minutes. The main code is something like this:
while (dataInputStream.read(array) != -1) { do something... }
However, after several hours of running, in JConsole with the TopThreads plugin I can see several client threads each using around 20% CPU. If I click on one, the call stack shows the thread is blocked on the line above, in the read() function.
I know the read() function will normally block to wait for data, and that when blocked it consumes few CPU cycles. Now each of these threads is using around 20%, and my server runs slower and slower as more threads develop the same problem. My server gets about 5 connection requests per second, and this happens really rarely: in several hours only 5 threads have had the problem.
I am really confused. Can someone help me?
When the JVM is waiting to read data from a socket, there are a lot more activities the system needs to constantly do.
I don't have the exact technique used, but this link should give some idea.
Why don't you try using a BufferedInputStream or one of the stream Reader classes? These classes would help with performance.
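For example, assuming the stream comes straight off the socket, a buffered wrapper would look roughly like this:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

class Streams {
    // Wrap the raw socket stream so small reads are served from an in-memory buffer.
    static DataInputStream buffered(Socket socket) throws IOException {
        return new DataInputStream(new BufferedInputStream(socket.getInputStream()));
    }
}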
You could also try using classes from the java.util.concurrent package to improve thread handling (creating a thread pool would help reduce the total memory consumed, thereby helping overall system performance). I'm not sure if you are doing this already.
while (dataInputStream.read(array) != -1) { do something... }
This code is wrong anyway. You need to store the return value of read() in a variable so you know how many bytes were returned. The rest of your application can't possibly be working reliably without this, so worrying about timing at this stage is premature.
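A corrected loop stores the count; process() here is just a placeholder for whatever you do with the bytes:

byte[] array = new byte[8192];
int count;
while ((count = dataInputStream.read(array)) != -1) {
    // only array[0 .. count-1] contains fresh data on this iteration
    process(array, 0, count);
}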
However, unless the array is exceptionally small, I doubt you are really using 20% CPU in here. More likely 20% of the elapsed time is spent here. Blocking on a network read doesn't use any CPU.