I'm building a high-performance network application that needs to pump out hundreds of megabits per second. It is UDP-based, and I'm using a DatagramChannel to send the data.
My application is separated into two parts, with one thread doing each part. One part is reading from something, doing some processing and putting the result in a queue to be sent. The second part is reading from the queue and writing to the DatagramChannel.
The thread doing the writing is falling behind badly (sometimes causing out-of-memory errors because of how much data piles up in the queue), and I'm wondering if there are ways to speed up the DatagramChannel write operation.
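A minimal sketch of the two-thread setup described above, with a bounded queue so the producer blocks instead of exhausting memory (the capacity and target address are placeholder assumptions):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Sender implements Runnable {
    // Bounded queue: put() blocks when full, applying backpressure to the
    // producer instead of growing the heap until OutOfMemoryError.
    private final BlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<ByteBuffer>(10000);
    private final DatagramChannel channel;

    public Sender(InetSocketAddress target) throws Exception {
        channel = DatagramChannel.open();
        channel.connect(target); // a connected channel lets us call write() instead of send()
    }

    public void enqueue(ByteBuffer packet) throws InterruptedException {
        queue.put(packet); // blocks the producer while the writer is behind
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                channel.write(queue.take()); // one datagram per buffer
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}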
I'm trying to understand how Java sockets operate. My question is: what can you do simultaneously when using the Java socket API, and what happens if we send and read data with some delay?
READ & WRITE at once. If one socket client is connected to one socket server, can they BOTH read and write at the same time? As far as I understand, TCP is full-duplex, so in theory a socket should be able to read and write at once, but we have to create two threads for both the client and the server. Am I right?
WRITE to N clients at once. If several socket clients are connected to one socket server, can the server read from several clients at the same moment, and can it write to several clients at the same moment?
If the maximum physical speed of the network card is 1 kbyte/sec and 5 clients are connected, at what speed can the server write to one client?
How can I implement sequential sending of data in both directions? I mean, I want to send N bytes from server to client, then M bytes from client to server, then N bytes from server to client, etc. The problem is that once either side has written something to the channel, the other side only sees the end of that data (read() == -1) when the channel is closed, which means we cannot reuse the connection and have to open another one. Or maybe we should place readers and writers in different threads, each calling read() and write() until the connection is closed?
Imagine there is a delay between calling write(); flush() on one side and calling read() on the other. During the delay, where would the written data be stored? Would it be transmitted? What is the maximum size of that "delayed" data that can be stored somewhere "in between"?
Correct. If you're using blocking I/O, you'll need a reader thread and a writer thread for each Socket connection.
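A minimal sketch of that arrangement, with a hypothetical host and port:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class DuplexDemo {
    public static void main(String[] args) throws IOException {
        final Socket sock = new Socket("example.com", 9000); // placeholder host/port

        // Reader thread: consumes incoming lines independently of writes.
        new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("got: " + line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }).start();

        // Writer thread: TCP is full-duplex, so it can send at any time.
        new Thread(() -> {
            try {
                PrintWriter out = new PrintWriter(sock.getOutputStream(), true);
                out.println("hello");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }).start();
    }
}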
You could use a single thread to write to N clients at once, but you run the risk of blocking on a write. I won't address writing speeds in detail here, as they depend on several things, but obviously the cumulative writing speed to all clients would be under 1 kbyte/sec (the card's stated limit).
Yes, you'll need two threads; you can't do this with a single thread (or you could, but as you said yourself, you'd need to constantly open and close connections).
It would be stored in a buffer somewhere. Depending on your code, it could be in a buffered stream or in the socket's own buffer. I believe the default buffer size of BufferedOutputStream is 8K, and the socket's own buffer depends on the environment. It shouldn't really matter, though: the streaming nature of TCP removes the need to think about buffers unless you really need to do fine-tuning.
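For what it's worth, the socket-level buffer can be inspected and tuned; a small sketch with a placeholder host and port:

import java.net.Socket;

public class BufferPeek {
    public static void main(String[] args) throws Exception {
        Socket sock = new Socket("example.com", 9000); // placeholder host/port
        // The OS-level send buffer is where "delayed" outgoing data sits.
        System.out.println("send buffer: " + sock.getSendBufferSize() + " bytes");
        sock.setSendBufferSize(64 * 1024); // a request to the OS, not a guarantee
        sock.close();
    }
}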
I am studying the structure for my server-client communications in my multiplayer game.
I came to the conclusion that UDP is the best choice because of its "fire and forget" style of use, which will not block the application if a packet is lost.
I will also use TCP to send packets that need reliability, such as during login procedures and exchanges of information like server changes, map changes, updates, etc. It will also run an IRC-based chat (all the commands are actually IRC-style custom messages).
I was wondering what the best way is to send the interaction messages (moves, spells, objects, actions, etc.) between server and clients.
Reading some documentation, I came across MulticastSocket.
My question is:
Is it better to send a continuous flow of information to all the clients, starting a thread for each player (as I do for TCP communications), where each DatagramSocket listens to a queue and sends each new message to its client? This would mean that all the maps and all the movements (supposing there can be 50 players spread over the maps) are sent to all the players, and each packet has to be larger to carry all that information.
Or is it better to use a thread for each map, active only if some players are inside that specific map, using multicast communication to send a message only to the players inside that map, each listening with a MulticastSocket?
I have read about problems with firewalls and routers when using multicast, but I can't figure out what those problems might be (beyond those of normal UDP).
The application should be usable by anyone with minimal configuration trouble.
Looking at your scenario above, you need to decide whether your application absolutely needs TCP, as a TCP connection requires one thread per connection, no exceptions (unless you use NIO).
Now to target the UDP section of the program, you have two basic choices:
a) You spawn one thread for receiving datagram packets for all players.
In this case, all players send their datagram packets to a single receiver, which then decides what to do with the data. The data may be handed to various queues for other threads to process. Data can be sent back to all players using a single thread or multiple threads (per player). A sketch of this setup follows the PROS/CONS list below.
PROS:
Low resource usage
Low program (synchronization) overhead.
CONS:
Possible network slowness (due to masses of packets going towards the same socket)
Higher chance of packet drop (again due to masses of packets going to the same socket)
Serial processing
Disconnect events are messy and hard to deal with
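A minimal sketch of option (a), assuming a placeholder port (4445) and packet size:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Receiver implements Runnable {
    // Worker threads take() from this queue and decide what to do with each packet.
    private final BlockingQueue<DatagramPacket> inbound = new LinkedBlockingQueue<DatagramPacket>();

    public void run() {
        try (DatagramSocket socket = new DatagramSocket(4445)) {
            while (true) {
                byte[] buf = new byte[1024];
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet); // blocks until a datagram arrives
                inbound.put(packet);    // hand off to the processing queue
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}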
b) You spawn one thread per player and listen on a different port per player.
In this case, all players get their own handler threads, which may process the data directly or send it to a central processing queue. By doing this, data can be processed in parallel, allowing faster processing at the cost of higher resource usage. Synchronization will also require special attention; atomics and reentrant read/write locks may be needed. Writing back out to the socket should generally occur on another per-player thread.
PROS:
Parallel Processing
Modular (have all the handling code per player in one thread, start thread on player join)
Disconnects are easier to handle and don't cause problems with other players.
Fast network response, concurrent packet receiving.
CONS:
High resource usage (a lot more objects)
High synchronization overhead
High thread count (possibly as many as 2 to 4 threads per player)
In either case, by using TCP you will need at least one thread per player. The question is whether you are willing to use a lot more resources for a smoother, swifter response from the server.
I'm the main developer of an online game.
Players use a specific client software that connects to the game server with TCP/IP (TCP, not UDP)
At the moment, the architecture of the server is a classic multithreaded server with one thread per connection.
But in peak hours, when there are often 300 or 400 connected people, the server is getting more and more laggy.
I was wondering whether switching to a java.nio.* asynchronous I/O model, with a few threads managing many connections, would improve performance.
Finding example code on the web that covers the basics of such a server architecture is very easy. However, after hours of googling, I didn't find answers to some more advanced questions:
1 - The protocol is text-based, not binary. The clients and the server exchange lines of text encoded in UTF-8. A single line of text represents a single command, and each line is properly terminated by \n or \r\n.
For the classic multithreaded server, I have this kind of code:
public Connection(Socket sock) throws IOException {
    this.in = new BufferedReader(new InputStreamReader(sock.getInputStream(), "UTF-8"));
    this.out = new BufferedWriter(new OutputStreamWriter(sock.getOutputStream(), "UTF-8"));
    new Thread(this).start();
}
And then in run(), data is read line by line with readLine().
In the docs, I found a utility class, Channels, that can create a Reader out of a SocketChannel. But it is said that the produced Reader won't work if the Channel is in non-blocking mode, which contradicts the fact that non-blocking mode is mandatory for the high-performance channel selection API I want to use. So I suspect it isn't the right solution for what I would like to do.
The first question is therefore: if I can't use that, how do I efficiently and properly handle line breaks and convert native Java strings from/to UTF-8-encoded data with the NIO API, using buffers and channels?
Do I have to play with get/put, or poke around inside the wrapped byte array by hand? How do I go from a ByteBuffer to strings encoded in UTF-8? I admit I don't understand very well how to use the classes in the charset package for this.
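For instance, is something like this the right direction? (A sketch using CharsetDecoder; the buffer size is arbitrary, and the caller is assumed to flip() the input buffer before each call.)

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class Utf8Reader {
    private final CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
    private final CharBuffer chars = CharBuffer.allocate(8192);

    // If a multi-byte character is split across reads, decode() stops and
    // leaves the partial bytes in the input buffer; compact() keeps them
    // at the front so the next read can complete the character.
    public String decode(ByteBuffer bytes) {
        decoder.decode(bytes, chars, false); // false = more input may follow
        bytes.compact();                     // preserve a partial trailing character
        chars.flip();
        String text = chars.toString();      // may end mid-line; buffer until '\n'
        chars.clear();
        return text;
    }
}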
2 - In the asynchronous/non-blocking I/O world, how do you handle consecutive reads/writes that by nature must be executed sequentially, one after the other?
For example, the login procedure, which is typically challenge-response based: the server sends a question (a particular computation), the client sends the response, and then the server checks the response given by the client.
The answer, I think, is certainly not to make the whole login process a single task sent to the worker threads, as it is quite long and risks freezing worker threads for too much time. (Imagine this scenario: 10 pool threads, 10 players trying to connect at the same time; tasks related to players already online would be delayed until a thread is ready again.)
3 - What happens if two different threads simultaneously call Channel.write(ByteBuffer) on the same Channel?
Might the client receive mixed-up lines? For example, if one thread sends "aaaaa" and another sends "bbbbb", could the client receive "aaabbbbbaa", or am I guaranteed that everything is sent in a consistent order? Am I allowed to modify the buffer right after the call returns?
Or asked differently, do I need additional synchronization to avoid this sort of situation?
If I need additional synchronization, how do I know when to release the locks and so on, once a write finishes?
I'm afraid the answer isn't as simple as registering for OP_WRITE in the selector. When I tried that, I noticed that I got the write-ready event all the time, and always for all clients, so Selector.select exited early mostly for nothing, since there are only 3 or 4 messages to send per second per client while the selection loop runs hundreds of times per second. So there is potentially a busy wait in prospect, which is very bad.
4 - Can multiple threads call Selector.select on the same selector simultaneously without any concurrency problems such as missing an event, scheduling it twice, etc.?
5 - In fact, is NIO as good as it is said to be? Would it be worthwhile to stay with the classic multithreaded model but, instead of creating a thread per connection, use fewer threads and loop over the connections, checking for data availability with InputStream.available()? Is that idea stupid and/or inefficient?
1) Yes. I think that you need to write your own nonblocking readLine method. Note also that a nonblocking read may be signaled when there are several lines in the buffer, or when there is an incomplete line:
Example: (first read)
USER foo
PASS
(second read)
bar
You will need to store (see 2) the data that was not consumed, until enough information is ready to process it.
// channel was selected for OP_READ
read data from the channel
prepend the data left over from the previous read
split off the complete lines
save the incomplete line
execute the commands
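A minimal sketch of that pseudocode, assuming the input has already been decoded into a String chunk:

import java.util.ArrayList;
import java.util.List;

public class LineAssembler {
    private final StringBuilder leftover = new StringBuilder();

    // Appends a decoded chunk, returns the complete lines, and keeps the
    // incomplete tail buffered for the next read.
    public List<String> feed(String chunk) {
        leftover.append(chunk);
        List<String> lines = new ArrayList<String>();
        int nl;
        while ((nl = leftover.indexOf("\n")) >= 0) {
            String line = leftover.substring(0, nl);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);
            }
            lines.add(line);            // a complete command, ready to execute
            leftover.delete(0, nl + 1); // drop the consumed line
        }
        return lines;
    }
}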
2) You will need to keep the state of each client.
Map<SocketChannel,State> clients = new HashMap<SocketChannel,State>();
when a channel is connected, put a fresh state into the map
clients.put(channel,new State());
Or store the current state as the attached object of the SelectionKey.
Then, when executing each command, update the state. You may write it as one monolithic method, or do something fancier such as polymorphic implementations of State, where each state knows how to deal with certain commands (e.g. LoginState expects USER and PASS, after which you change the state into a new AuthorizedState).
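A tiny sketch of that polymorphic approach (the method name and fields are illustrative):

interface State {
    State handle(String command); // returns the next state
}

class LoginState implements State {
    private String user;

    public State handle(String command) {
        if (command.startsWith("USER ")) {
            user = command.substring(5);
            return this;                      // still waiting for PASS
        }
        if (command.startsWith("PASS ") && user != null) {
            return new AuthorizedState(user); // login complete
        }
        return this;                          // ignore anything else
    }
}

class AuthorizedState implements State {
    private final String user;

    AuthorizedState(String user) {
        this.user = user;
    }

    public State handle(String command) {
        // execute in-game commands on behalf of this user...
        return this;
    }
}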
3) I don't recall using NIO with many asynchronous writers per channel, but the documentation says it is thread-safe (I won't elaborate, since I have no proof of this). About OP_WRITE, note that it signals when the write buffer is not full. In other words, as said here: OP_WRITE is almost always ready (i.e. except when the socket send buffer is full), so you will just cause your Selector.select() method to spin mindlessly.
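The usual remedy (a standard idiom, not something from the quoted documentation) is to register interest in OP_WRITE only while output is actually pending; here each key's attachment is assumed to be a Queue<ByteBuffer> of unsent data:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.util.Queue;

public class WriteHelper {
    @SuppressWarnings("unchecked")
    public static void queueWrite(SelectionKey key, ByteBuffer data) {
        ((Queue<ByteBuffer>) key.attachment()).add(data);
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        key.selector().wakeup(); // in case another thread is blocked in select()
    }

    @SuppressWarnings("unchecked")
    public static void onWritable(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        Queue<ByteBuffer> pending = (Queue<ByteBuffer>) key.attachment();
        while (!pending.isEmpty()) {
            ByteBuffer buf = pending.peek();
            ch.write(buf);
            if (buf.hasRemaining()) {
                return; // send buffer full; keep OP_WRITE registered
            }
            pending.poll();
        }
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE); // nothing left to send
    }
}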
4) Yes. Selectors are themselves safe for use by multiple concurrent threads; Selector.select() performs a blocking selection operation.
5) I think the most difficult part is switching from a thread-per-client architecture to a design where reads and writes are decoupled from processing. Once you have done that, it is easier to work with channels than to hand-roll your own scheme with blocking streams.
I'm really new to programming and having performance problems with my software. Basically, I get some data and run a 100-iteration loop on it (i = 0; i < 100; i++), and during that loop my program makes 1 of 3 decisions: keep the data it's working on, discard it, or send a version of it back to the queue to process. The individual work each thread does is very small, but there's a lot of it (which is why I'm using a queue server to scale horizontally).
My problem is that it never comes close to using my entire CPU; my program runs at around 40% per core. After profiling, it seems the majority of the time is spent sending/receiving data from the queue (approx. 64% in com.rabbitmq.client.impl.Frame.readFrom(DataInputStream) and com.rabbitmq.client.impl.SocketFrameHandler.readFrame(), approx. 17% getting data into the format for the queue (which I brought down from 40%), and the rest on my program's logic). Obviously, I want my work to be done faster and to spend less time in the queue, and I'm wondering if there's a better design I can use.
My code is actually quite large, but here's an overview of what it does:
I create a connection to the queue server (RabbitMQ, from Java).
I fork as many threads as I have CPU cores (all using the same connection).
Each thread's data flow is as follows (see the sketch after this list):
Each thread creates its own channel to the queue server using the shared connection.
A while loop polls the server and gets X messages without acknowledging them.
Once I get a message, I use a thread executor to send the acknowledgement while my job is running.
I parse the message and run my loop.
If data is sent back to the queue, I hand it to a thread executor that sends it back, so my program can proceed with the next data set.
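A minimal sketch of that setup, assuming the standard RabbitMQ Java client; the queue name and prefetch count are placeholders, and it uses a push consumer rather than a polling loop:

import java.io.IOException;
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class Worker implements Runnable {
    private final Connection conn; // shared across all worker threads

    Worker(Connection conn) {
        this.conn = conn;
    }

    public void run() {
        try {
            final Channel channel = conn.createChannel(); // one channel per thread
            channel.basicQos(50);                         // prefetch X messages at a time
            channel.basicConsume("work-queue", false, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties properties, byte[] body)
                        throws IOException {
                    process(body); // parse the message and run the 100-iteration loop
                    channel.basicAck(envelope.getDeliveryTag(), false);
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void process(byte[] body) {
        // keep, discard, or republish a version of the data
    }
}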
One weird thing I did: although I use a thread executor for acknowledgements and for sending back to the queue, each main worker is just a forked thread (using public void run()). Because my program is dedicated to this single process, I did that to make sure there were always X threads ready to work (with none of them being shut down and respawned). The rest is in executor threads because I figured that work could wait or be queued while my main program runs.
I'm not sure how to design it better so it spends less time gathering/sending data. Are there any designs, RabbitMQ features, or Java techniques I can use to help?
If it's not IO wait, then I suspect that it's down to some locking going on inside those methods.
It looks to me like your threads are spending a significant amount of time waiting for those calls to return. Somewhat counter-intuitively, you might well be able to increase your performance by cutting down on the number of threads, since they'll spend less time tripping over each other and more time actively doing something.
Give it a try and see what effect it has on the profile.
At the moment we have an architecture where a server streams data to the client. We're finding instances where the client cannot process the data quickly enough, the buffer overflows, and the client is disconnected. Node.js has a pump pattern whereby a stream can be paused if data is not fully flushed and then resumed once the stream is drained. How would I do the equivalent pause/resume cycle in Java?
It's not exactly the same thing, but it sounds like a variation on the producer/consumer theme to me. Put a blocking queue between the two. If the consumer can't keep up, a bounded blocking queue accumulates messages from the producer until it fills, at which point it blocks the producer until the consumer catches up.
Or maybe you mean this.
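If it's the blocking-queue approach, a minimal sketch (the capacity is an arbitrary choice):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Pump {
    // Bounded queue: put() blocks when full (the "pause"), and unblocks
    // as the consumer drains the queue (the "resume").
    private final BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<byte[]>(1024);

    public void produce(byte[] chunk) throws InterruptedException {
        buffer.put(chunk); // blocks the producer if the consumer lags
    }

    public byte[] consume() throws InterruptedException {
        return buffer.take(); // blocks until data is available
    }
}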