Java DataInputStream.read() causing 20% constant CPU usage while being blocked. - java

I have a server-side application that opens a socket thread for each connected client. Each thread has a DataInputStream and calls read(byte[] array) to read data. I also set the socket timeout to a few minutes. The main code is something like this:
while (dataInputStream.read(array) != -1) { do something... }
However, after several hours of running, jconsole with the TopThreads plugin shows several client threads using around 20% CPU each. If I click on one, the call stack shows the thread is blocked on the line above, inside read().
I know read() normally blocks waiting for data, and while blocked it should consume almost no CPU. Yet these threads are each using around 20%, and my server runs slower and slower as more threads develop the same problem. The server gets about 5 connection requests per second, and the problem is rare: over several hours only 5 threads have shown it.
I am really confused. Can someone help me?

When the JVM is waiting to read data from a socket, the system still has some work to do behind the scenes. I don't know the exact mechanism involved, but this link should give some idea.
Why don't you try using a BufferedInputStream, or one of the stream reader classes? They would help with performance.
You could also try the classes in the java.util.concurrent package to improve thread handling; creating a thread pool would help reduce the total memory consumed, which helps overall system performance. You may already be doing this.
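For instance, here is a minimal sketch of that suggestion (my own illustration, not from the original post; the class name, port, pool size, and handleClient stub are all hypothetical), accepting connections and handing each client to a fixed-size pool instead of creating an unbounded number of threads:

import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledSocketServer {
    public static void main(String[] args) throws Exception {
        // A fixed-size pool caps the number of live handler threads.
        ExecutorService pool = Executors.newFixedThreadPool(50);
        try (ServerSocket server = new ServerSocket(9000)) { // port is illustrative
            while (true) {
                Socket client = server.accept();
                client.setSoTimeout(5 * 60 * 1000); // "a few minutes", as in the question
                pool.execute(() -> handleClient(client)); // reuse pooled threads
            }
        }
    }

    static void handleClient(Socket client) {
        // the existing per-client DataInputStream read loop would go here
    }
}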

while (dataInputStream.read(array) != -1) { do something... }
This code is wrong anyway. You need to store the return value of read() in a variable so you know how many bytes were actually read. The rest of your application can't be working reliably without this, so worrying about timing at this stage is premature.
However, unless the array is exceptionally small, I doubt you are really using 20% CPU here. More likely 20% of elapsed time is spent here. Blocking on a network read doesn't use any CPU.
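As a minimal sketch of the corrected loop described above (the buffer size and the process method are hypothetical, not from the original answer):

byte[] array = new byte[4096];
int count;
while ((count = dataInputStream.read(array)) != -1) {
    // Only the first 'count' bytes of 'array' are valid on this pass.
    process(array, 0, count); // 'process' is a hypothetical handler
}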

Related

How to deal with a slow consumer in traditional Java NIO?

So, I've been brushing up my understanding of the traditional Java non-blocking API. I'm a bit confused by a few aspects of the API that seem to force me to handle backpressure manually.
For example, the documentation on WritableByteChannel.write(ByteBuffer) says the following:
Unless otherwise specified, a write operation will return only after writing all of the requested bytes. Some types of channels, depending upon their state, may write only some of the bytes or possibly none at all. A socket channel in non-blocking mode, for example, cannot write any more bytes than are free in the socket's output buffer.
Now, consider this example taken from Ron Hitchens' book, Java NIO.
In the piece of code below, Ron is trying to demonstrate how we could implement an echo response in a non-blocking socket application (for context, here's a gist with the full example).
// Use the same byte buffer for all channels. A single thread is
// servicing all the channels, so no danger of concurrent access.
private ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

protected void readDataFromSocket(SelectionKey key) throws Exception {
    var channel = (SocketChannel) key.channel();
    buffer.clear(); // empty buffer
    int count;
    while ((count = channel.read(buffer)) > 0) {
        buffer.flip(); // make buffer readable
        // Send data; don't assume it goes all at once
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        // WARNING: the above loop is evil. Because
        // it's writing back to the same nonblocking
        // channel it read the data from, this code
        // can potentially spin in a busy loop. In real life
        // you'd do something more useful than this.
        buffer.clear(); // empty buffer
    }
    if (count < 0) {
        // Close channel on EOF, invalidates the key
        channel.close();
    }
}
My confusion is about the while loop writing into the output channel:
// Send data; don't assume it goes all at once
while (buffer.hasRemaining()) {
    channel.write(buffer);
}
It really confuses me how NIO is helping me here. Certainly the code may not be blocking, as per the description of WritableByteChannel.write(ByteBuffer): if the output channel cannot accept any more bytes because its buffer is full, this write operation does not block; it just writes nothing, returns, and the buffer remains unchanged. But --at least in this example-- there is no easy way to use the current thread for something more useful while we wait for the client to process those bytes. For that matter, if I only had one thread, the other requests would be piling up in the selector while this while loop wastes precious CPU cycles “waiting” for the client buffer to open some space. There is no obvious way to register for readiness on the output channel. Or is there?
So, assuming that instead of an echo server I was trying to implement a response that needed to send a large number of bytes back to the client (e.g. a file download), and assuming that the client has very low bandwidth or the output buffer is really small compared to the server buffer, sending this file could take a long time. It seems as if we ought to spend our precious CPU cycles attending to other clients while our slow client is chewing through the file download bytes.
If we have readiness notifications on the input channel, but not on the output channel, it seems this thread could be burning precious CPU cycles for nothing. It is not blocked, but it might as well be, since the thread is useless for indeterminate periods of time, doing insignificant CPU-bound work.
To deal with this, Hitchens' solution is to move this code to a new thread --which just moves the problem to another place. Then I wonder: if we have to open a thread every time we need to process a long-running request, how is Java NIO better than regular IO when it comes to processing this sort of request?
It is not yet clear to me how I could use traditional Java NIO to deal with these scenarios. It is as if the promise of doing more with fewer resources breaks down in a case like this. What if I were implementing an HTTP server and could not know how long it would take to service a response to the client?
It appears as if this example is deeply flawed, and a good design should also listen for readiness on the output channel, e.g.:
registerChannel(selector, channel, SelectionKey.OP_WRITE);
But what would that solution look like? I've been trying to come up with one, but I don't know how to achieve it properly.
I'm not looking for other frameworks like Netty; my intention is to understand the core Java APIs. I appreciate any insights anyone could share on the proper way to deal with this backpressure scenario using just traditional Java NIO.
NIO's non-blocking mode enables a thread to request reading data from a channel, and only get what is currently available, or nothing at all, if no data is currently available. Rather than remain blocked until data becomes available for reading, the thread can go on with something else.
The same is true for non-blocking writing. A thread can request that some data be written to a channel, but not wait for it to be fully written. The thread can then go on and do something else in the meantime.
What threads spend their time on when not blocked in IO calls is usually performing IO on other channels in the meantime. That is, a single thread can now manage multiple channels of input and output.
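Here is a minimal sketch of how write backpressure is typically handled with a selector (my own illustration, not from Hitchens' book or the original answer; the class name, method name, and the 'pending' buffer are hypothetical): if write() cannot drain the buffer, register interest in OP_WRITE and come back when the selector reports the socket writable.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

final class WriteBackpressure {
    // 'pending' would be a per-connection buffer, e.g. stored in key.attachment().
    static void writeOrAwaitWritable(SelectionKey key, ByteBuffer pending) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        channel.write(pending); // writes only what fits in the socket's send buffer
        if (pending.hasRemaining()) {
            // Not fully drained: also listen for write readiness.
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            // Drained: clear OP_WRITE, otherwise the selector fires continuously.
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}

In the selector loop, when key.isWritable(), you would call the method again with the stashed buffer; the single thread stays free to service other ready channels instead of spinning on one slow client.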
So I think you need to address this in the design of your solution, perhaps using a design pattern such as Task or Strategy; depending on the framework or application you are using, you can decide on the approach.
But in most cases you don't need to implement it yourself, as it's already implemented in Tomcat, Jetty, etc.
Reference: Non-blocking IO

Check if ObjectInputStream has anything to read without blocking?

I am building a server in Java that communicates with several clients at the same time. The initial approach we had is that the server listens for connections from the clients; once a connection is received and a socket is created, a new thread is spawned to handle the communication with that client: read the request with an ObjectInputStream, do the desired operation (fetch data from the DB, update it, etc.), and send back a response to the client (if needed), while the server itself goes back to listening for more connections.
This works fine for the time being; however, this approach is not really scalable. It works great for a small number of clients connected at the same time, but since every client spawns another thread, what will happen when too many clients are connected at once?
So my next idea was to maintain a list of sorts that holds all connected clients (the socket object and some extra info), use a ThreadPool to iterate through them and read anything they sent; if a message was received, put it in a queue for execution by another ThreadPool of worker threads, and once the worker has finished its task, send the response if one is required.
The two latter steps are pretty trivial to implement. The problem is that with the original thread-per-client implementation I use ObjectInputStream.readObject() to read the message, and this method blocks until there is something to read, which is fine for that approach, but I can't use the same thing for the new approach, since if I block on every socket I will never get to the ones further down the list.
So I need a way to check whether I have anything to read before I call readObject(). So far I have tried the following solutions:
Solution 1:
Use ObjectInputStream.available() to check if there is anything available to read. This approach failed since the method seems to always return 0, regardless of whether there is an object in the stream or not, so it does not help at all.
Solution 2:
Use a PushbackInputStream to check for the existence of the first unread byte in the stream; if it exists, push it back and read the object using the ObjectInputStream, and if it doesn't, move on:
boolean available;
int b = pushbackinput.read();
if (b == -1) {
    available = false;
} else {
    pushbackinput.unread(b);
    available = true;
}
if (available) {
    Object message = objectinput.readObject();
    // continue with what you need to do with that object
}
This turned out to be useless too, since read() also blocks if there is no input to read. It seems to return -1 only if the stream was closed; if the stream is still open but empty, it just blocks, so this is no different from simply calling ObjectInputStream.readObject().
Can anyone suggest an approach that will actually work?
This is a good question, and you've done some homework... but it involves going through some history to get things right. Note that your issue actually has more to do with socket-level communication than with the ObjectInputStream:
The easiest way to do things in the past was to have a separate thread per socket. This was scalable to a point but threads were expensive and slow to create.
In response, for large systems, people created thread pools and would service the sockets on threads when there was work to do. This was complicated.
The Java platform was then changed with the java.nio package, which introduced the Selector together with non-blocking IO. This created a reliable (although sometimes confusing) way to service multiple sockets with fewer threads. In your case, though, it would not help much, because you want to know when a full Object is ready to be read, not merely when there is 'some' data available.
In the interim the 'landscape' changed, and Java is now able to more efficiently create and manage threads. 'Current' thinking is that it is better/faster and easier to allocate a single thread per socket again.... see Java thread per connection model vs NIO
In your case, I would suggest that you stick with the thread-per-socket model. Java can scale to handle more threads than you will have sockets, so you'll be fine.
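As a minimal sketch of that model (my own illustration; the class name, port, and dispatch step are hypothetical, not from the original post), each accepted socket simply gets a thread that blocks in readObject():

import java.io.ObjectInputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerClientServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000)) { // port is illustrative
            while (true) {
                Socket client = server.accept();
                new Thread(() -> serve(client)).start(); // one thread per connected client
            }
        }
    }

    static void serve(Socket client) {
        try (ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
            while (true) {
                Object message = in.readObject(); // blocks until a full object has arrived
                // dispatch 'message' to a worker / DB layer here
            }
        } catch (Exception e) {
            // client disconnected or sent something unreadable; drop the connection
        } finally {
            try { client.close(); } catch (Exception ignored) {}
        }
    }
}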

Performance issue designing threaded consumer of queue

I'm really new to programming and having performance problems with my software. Basically I get some data and run a 100-iteration loop on it (i = 0; i < 100; i++), and during that loop my program makes one of three decisions: keep the data it's working on, discard it, or send a version of it back to the queue to process. The individual work each thread does is very small, but there's a lot of it (which is why I'm using a queue server to scale horizontally).
My problem is that it never comes close to using my entire CPU; my program runs at around 40% per core. After profiling, it seems the majority of the time is spent sending/receiving data from the queue (approx. 64% in a part called com.rabbitmq.client.impl.Frame.readFrom(DataInputStream) and com.rabbitmq.client.impl.SocketFrameHandler.readFrame(); approx. 17% is getting it into the format for the queue, which I brought down from 40%; and the rest is spent on my program's logic). Obviously I want my work to be done faster and want to spend less time in the queue, and I'm wondering if there's a better design I can use.
My code is actually quite large but here's a overview of what it does:
I create a connection to the queue server (RabbitMQ, in Java).
I fork as many threads as I have CPU cores (using the same connection).
The data flow in each thread is:
Each thread creates its own channel to the queue server using the shared connection.
There's a while loop that polls the server and gets X number of messages without acknowledgments.
Once I get a message, I use a thread executor to send an acknowledgment while my job is running.
I parse the message and run my loop.
If data is to be sent back to the queue, I hand it to a thread executor that sends it back, so my program can proceed with the next data set.
One weird thing I did: although I use a thread executor for acknowledgments and for sending to the queue, my main worker threads are just forked threads (using public void run()). Because my program is dedicated to this single process, I did that to make sure there were always X threads ready to work (with no shutting down/respawning of them). The rest is in executor threads because I figured that work could wait or be queued while my main program runs.
I'm not sure how to design it better so it spends less time gathering/sending data. Are there any designs, RabbitMQ features, or Java techniques I can use to help?
If it's not IO wait, then I suspect that it's down to some locking going on inside those methods.
It looks to me like your threads are spending a significant amount of time waiting for those calls to return. Somewhat counter-intuitively, you might well be able to increase your performance by cutting down on the number of threads, since they'll spend less time tripping over each other and more time actively doing something.
Give it a try and see what effect it has on the profile.
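If you also want to cut the time spent fetching from the broker, one option (a different technique from the answer above, sketched here assuming the standard RabbitMQ amqp-client push API; the host, queue name, and prefetch value are hypothetical) is to let the broker push messages ahead of time with a prefetch window and acknowledge them from the consumer callback:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class PrefetchConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // hypothetical broker address
        Connection connection = factory.newConnection(); // one shared connection, as in the question

        int workers = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < workers; i++) {
            Channel channel = connection.createChannel(); // one channel per worker
            channel.basicQos(50); // broker may push up to 50 unacked messages ahead of time
            DeliverCallback onDeliver = (tag, delivery) -> {
                // run the 100-iteration business loop on delivery.getBody() here
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            };
            channel.basicConsume("work-queue", false, onDeliver, tag -> {}); // queue name is hypothetical
        }
    }
}

With a prefetch window the client library should keep a batch of messages buffered locally, so worker threads spend less time blocked in readFrame() waiting for the next delivery.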

Chat system in Java

Is there a way to immediately print a message received from the client without using an infinite loop to check whether the input stream is empty or not?
I found that using an infinite loop consumes a lot of system resources, which makes the program run slowly. And we would also have to do the same (an infinite loop) on the client side to print the message on the screen in real time.
I'm using Java.
You should be dealing with the input stream in a separate Thread - and let it block waiting for input. It will not use any resources while it blocks. If you're seeing excessive resource usage while doing this sort of thing, you're doing it wrong.
I think you can just put your loop in a different thread and have it sleep a bit (maybe for half a second?) between iterations. It would still be an infinite loop, but it would not consume nearly as many resources.
Why don't you change your architecture a little to accommodate WebSockets? Check out Socket.IO. It is a cross-browser WebSockets enabler.
You will have to write controllers (servlets, for example, in Java) that push data to the client. This does not follow the request-response architecture.
You can also architect it so that a "push servlet" triggers a "request" from the client to obtain the "response".
Since your question talks about Java, and if you are interested in WebSockets, check this link out.
If you're using sockets, which you should be for any networking, then you can wrap the stream returned by socket.getInputStream() in a DataInputStream and do the following:
public DataInputStream streamIn;
public Socket soc;

// after initializing the socket:
streamIn = new DataInputStream(soc.getInputStream()); // wrap the raw stream

public String getInput() throws IOException {
    return streamIn.readUTF(); // readUTF() already returns a String, no cast needed
}
streamIn.readUTF() blocks until data is available, meaning you don't have to poll in a loop, and threading lets you do other processing while you wait for data.
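Building on the snippet above, a minimal sketch of running that blocking call on its own thread (printMessage is a hypothetical UI callback, not from the original answer):

// Run the blocking read loop on its own thread so the rest of the client stays responsive.
Thread reader = new Thread(() -> {
    try {
        while (true) {
            String message = streamIn.readUTF(); // blocks cheaply until data arrives
            printMessage(message); // hypothetical callback that updates the UI
        }
    } catch (IOException e) {
        // socket closed; let the reader thread exit
    }
});
reader.setDaemon(true);
reader.start();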
Look here for more information on DataInputStream and what you can do with it: http://docs.oracle.com/javase/6/docs/api/java/io/DataInputStream.html
An approach that does not require threads would involve subclassing the input stream and adding a notification method. When called, this method would alert any interested objects (i.e. objects that would have to change state due to the additions to the stream) that changes have been made. These interested objects could then respond in any way desired.
Objects writing to the buffer would do their normal writing and afterward call the notify() method on the input stream, informing all interested objects of the change.
Edit: This might require subclassing more than a couple of classes and so could involve a lot of code changes. Without knowing more about your design you would have to decide if the implementation is worth the effort.
There are two approaches that avoid busy loops / sleeps.
Use a thread for each client connection, and simply have each thread call read. This blocks the thread until the client sends some data, but that's no problem because it doesn't block the threads handling other clients.
Use Java NIO channel selectors. These allow a thread to wait until one of set of channels (in this case sockets) has data to be read. There is a section of the Oracle Java Tutorials on this.
Of these two approaches, the second one is most efficient in terms of overall resource usage. (The thread-per-client approach uses a lot of memory on thread stacks, and CPU on thread switching overheads.)
Busy loops that repeatedly call (say) InputStream.available() to see if there is any input are horribly inefficient. You can make them less inefficient by slowing down the polling with Thread.sleep(...) calls, but this has the side effect of making the service less responsive. For instance, if you add a 1 second sleep between each set of polls, the effect that each client will see is that the server typically delays 1 second before processing each request. Assuming that those requests are keystrokes and the responses echo them, the net result is a horribly laggy service.

Best way to configure a Threadpool for a Java RIA client app

I have a Java client which accesses our server side over HTTP, making several small requests to load each new page of data. We maintain a thread pool to handle all non-UI processing, so any background client-side tasks and any tasks which want to make a connection to the server. I've been looking into some performance issues and I'm not certain we've got our thread pool set up as well as possible. Currently we use a ThreadPoolExecutor with a core pool size of 8 and a LinkedBlockingQueue for the work queue, so the max pool size is ignored. No doubt there's no simple "do this one thing in all situations" answer, but are there any best practices? My thinking at the moment is:
1) I'll switch to using a SynchronousQueue instead of a LinkedBlockingQueue so the pool can grow to the max pool size figure.
2) I'll set the max pool size to be unlimited.
Basically my current fear is that occasional performance issues on the server side are causing unrelated client-side processing to halt due to the upper limit on the thread pool size. My fear with unbounding it is the additional cost of managing those threads on the client; possibly it's just the lesser of two evils.
Any suggestions, best practices or useful references?
Cheers,
Robin
It sounds like you'd probably be better off limiting the queue size: does your application still behave properly when there are many requests queued (is it acceptable for all tasks to be queued for a long time, or are some more important than others)? What happens if there are still queued tasks left and the user quits the application? If the queue grows very large, is there a chance that the server will catch up (soon enough) to hide the problem completely from the user?
I'd say create one queue for requests whose response is needed to update the user interface, and keep its queue very small. If this queue gets too big, notify the user.
For real background tasks keep a separate pool, with a longer queue, but not infinite. Define graceful behavior for this pool when it grows or when the user wants to quit but there are tasks left, what should happen?
In general, network latencies are easily orders of magnitude higher than anything that can be happening with regard to memory allocation or thread management on the client side. So, as a general rule, if you are running into a performance bottleneck, look first and foremost at the networking link.
If the issue is that your server simply cannot keep up with the requests from the clients, bumping up the number of threads on the client side is not going to help matters: you'll simply progress from having 8 threads waiting for a response to having more threads waiting (and you may even aggravate the server-side issues by increasing its load due to the higher number of connections it is managing).
Both of the concurrent queues in the JDK are high performers; the choice really boils down to usage semantics. If you have non-blocking plumbing, then it is more natural to use the non-blocking queue. If you don't, then using the blocking queues makes more sense. (You can always specify Integer.MAX_VALUE as the limit.) If FIFO processing is not a requirement, make sure you do not specify fair ordering, as that entails a substantial performance hit.
As alphazero said, if you've got a bottleneck, your number of client side waiting jobs will continue to grow regardless of what approach you use.
The real question is how you want to deal with the bottleneck. Or more correctly, how you want your users to deal with the bottleneck.
If you use an unbounded queue, then you don't get feedback that the bottleneck has occurred. And in some applications, this is fine: if the user is kicking off asynchronous tasks, then there's no need to report a backlog (assuming it eventually clears). However, if the user needs to wait for a response before doing the next client-side task, this is very bad.
If you use LinkedBlockingQueue.offer() on a bounded queue, then you'll immediately get a response that says the queue is full, and can take action such as disabling certain application features, popping a dialog, whatever. This will, however, require more work on your part, particularly if requests can be submitted from multiple places. I'd suggest, if you don't have it already, you create a GUI-aware layer over the server queue to provide common behavior.
And, of course, never ever call LinkedBlockingQueue.put() from the event thread (unless you don't mind a hung client, that is).
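To make the bounded-queue feedback above concrete, here is a minimal sketch (the queue size, pool sizes, loadNextPage task, and showServerBusyDialog hook are all hypothetical) of offering work to a bounded LinkedBlockingQueue and reacting when it is full:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

LinkedBlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>(32); // bounded backlog
ThreadPoolExecutor pool = new ThreadPoolExecutor(8, 8, 60L, TimeUnit.SECONDS, workQueue);
pool.prestartAllCoreThreads(); // workers must already exist, since we bypass execute() below

boolean accepted = workQueue.offer(() -> loadNextPage()); // hypothetical server request task
if (!accepted) {
    showServerBusyDialog(); // hypothetical GUI hook: the backlog is full, tell the user
}

Prestarting the core threads matters here because tasks offered directly to the queue are only picked up by worker threads that already exist.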
Why not create an unbounded queue, but reject tasks (and maybe even inform the user that the server is busy; app dependent!) when the queue reaches a certain size? You can then log this event and find out what happened on the server side for the backlog to occur. Additionally, unless you are connecting to multiple remote servers, there is probably not much point having more than a couple of threads in the pool, although this does depend on your app, what it does, and who it talks to.
Having an unbounded pool is usually dangerous as it generally doesn't degrade gracefully. Better to log the problem, raise an alert, prevent further actions from being queued, and figure out how to scale the server side (if the problem is there) to prevent this happening again.
