Java client peer-to-multipeer using Netty

I'm writing a process which must connect to (and keep connections alive with) several hundred remote peers and manage messaging/control over them.
I have made two versions of this software: the first with the classic "thread-per-connection" model, the second using standard Java NIO and selectors (to reduce thread allocation, but it has problems). Then, looking around, I found that Netty can give a big boost in most cases, and I started a third version using it. My goal is to keep resource usage quite low while keeping it fast.
Once I had written the pipeline factory with custom events and dynamic handler switching, I got stuck on the most superficial part: its allocation.
All the examples I have read use a single client with a single connection, so I have a doubt: I set up a ChannelFactory and a PipelineFactory, so every (new ClientBootstrap(factory)).connect(address) creates a new channel with a new pipeline. Is it possible to share a pipeline and defer the business logic to a thread pool?
If so, how?
Using standard Java NIO I managed to use two small thread pools (threads < remote peers) by taking advantage of selectors; I had, however, trouble recycling channels being listened to so they could also be written to.
Communication should happen through a single channel which can receive timed messages from the remote peer or perform a 3-way control exchange (command-ack-ok).
On the other hand: once an event has reached the last handler, what happens? Is that where I extract it, or can I extract a message at any point in the pipeline?

You should only have one bootstrap (i.e. one ChannelFactory and one ChannelPipelineFactory). Pipelines, or even individual channel handlers, may be shared, but they are usually created unique per channel.
You can add an ExecutionHandler to your pipeline to transfer execution from the I/O worker threads to a thread pool.
But why don't you read the exhaustive documentation at http://netty.io/wiki/ ? You'll find answers to every question of yours there.
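A minimal sketch of that layout (Netty 3 API; MyBusinessLogicHandler, the pool sizes, and method names are illustrative assumptions): one ChannelFactory, one ExecutionHandler backed by a shared thread pool, and one ClientBootstrap reused for every connect(), so each channel gets its own lightweight pipeline while business logic runs off the I/O threads:
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import org.jboss.netty.bootstrap.ClientBootstrap;
import org.jboss.netty.channel.*;
import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;
import org.jboss.netty.handler.execution.ExecutionHandler;
import org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor;

public class MultiPeerClient {
    // One ChannelFactory, one ExecutionHandler, one bootstrap shared by all peers.
    private final ChannelFactory channelFactory = new NioClientSocketChannelFactory(
            Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
    private final ExecutionHandler executionHandler = new ExecutionHandler(
            new OrderedMemoryAwareThreadPoolExecutor(16, 1048576, 1048576));
    private final ClientBootstrap bootstrap = new ClientBootstrap(channelFactory);

    public MultiPeerClient() {
        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            public ChannelPipeline getPipeline() {
                ChannelPipeline pipeline = Channels.pipeline();
                // The ExecutionHandler instance is shared; handlers after it run on its pool.
                pipeline.addLast("executor", executionHandler);
                pipeline.addLast("handler", new MyBusinessLogicHandler());
                return pipeline;
            }
        });
    }

    public ChannelFuture connectTo(String host, int port) {
        // Every connect() creates a new channel (and pipeline) but reuses the same
        // bootstrap, channel factory, and thread pools.
        return bootstrap.connect(new InetSocketAddress(host, port));
    }

    // Hypothetical per-channel business handler.
    static class MyBusinessLogicHandler extends SimpleChannelUpstreamHandler {
        @Override
        public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
            // ... handle decoded messages here, off the I/O threads ...
        }
    }
}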

Related

Slow Loris in Netty as a client or server

I'm still pretty new to Netty, so please bear with me. There seem to be plenty of questions asking why a specific Netty implementation is slow and how to make it faster. But my use case is a bit different. I want to avoid low-level socket implementations (hence Netty), but I also know that blocking the event group is bad. I know I can dynamically manage the pipeline. I'm not sure I know enough about Netty to know whether this is possible, and I haven't tried much that I don't already know is bad (Thread.sleep, for example). The protocol is HTTP, but I also need it to be useful for other protocols.
But what I don't know is, for a single connection on a shared port, how do I slow down the response of the server to the client, and vice versa? Or put more aptly: where, and what, would I implement to add the required slowness? My guess is the encoder for the "where"; but because of Netty's approach, I haven't the foggiest about the "what".
You say that you know that Thread.sleep is "bad" but it really depends on what you're trying to achieve and where you put the sleep. I believe that the best way to build this would be to use a DefaultEventExecutorGroup to offload the processing of your slow-down ChannelHandler onto non-event-loop threads and then call Thread.sleep in your handler.
From the ChannelPipeline javadoc, under the "Building a pipeline" section:
https://netty.io/4.1/api/io/netty/channel/ChannelPipeline.html
A user is supposed to have one or more ChannelHandlers in a pipeline to receive I/O events (e.g. read) and to request I/O operations (e.g. write and close). For example, a typical server will have the following handlers in each channel's pipeline, but your mileage may vary depending on the complexity and characteristics of the protocol and business logic:
Protocol Decoder - translates binary data (e.g. ByteBuf) into a Java object.
Protocol Encoder - translates a Java object into binary data.
Business Logic Handler - performs the actual business logic (e.g. database access).
and it could be represented as shown in the following example:
static final EventExecutorGroup group = new DefaultEventExecutorGroup(16);
...
ChannelPipeline pipeline = ch.pipeline();
pipeline.addLast("decoder", new MyProtocolDecoder());
pipeline.addLast("encoder", new MyProtocolEncoder());
// Tell the pipeline to run MyBusinessLogicHandler's event handler methods
// in a different thread than an I/O thread so that the I/O thread is not blocked by
// a time-consuming task.
// If your business logic is fully asynchronous or finished very quickly, you don't
// need to specify a group.
pipeline.addLast(group, "handler", new MyBusinessLogicHandler());
Be aware that while using DefaultEventLoopGroup will offload the operation from the EventLoop, it will still process tasks in a serial fashion per ChannelHandlerContext and so guarantee ordering. Due to the ordering it may still become a bottleneck. If ordering is not a requirement for your use case, you may want to consider using UnorderedThreadPoolEventExecutor to maximize the parallelism of the task execution.
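A minimal sketch of that suggestion against Netty 4.1; LatencyInjectionHandler and the 500 ms delay are hypothetical names and values. The delay handler is registered with a DefaultEventExecutorGroup, so the Thread.sleep never runs on an event-loop thread:
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.util.concurrent.DefaultEventExecutorGroup;

public class SlowLorisInitializer extends ChannelInitializer<SocketChannel> {
    private static final DefaultEventExecutorGroup SLOW_GROUP = new DefaultEventExecutorGroup(16);

    @Override
    protected void initChannel(SocketChannel ch) {
        // The protocol codec stays on the event loop.
        ch.pipeline().addLast(new HttpServerCodec());
        // The delay handler is bound to SLOW_GROUP, so its sleep never blocks I/O threads.
        ch.pipeline().addLast(SLOW_GROUP, "latency", new LatencyInjectionHandler(500));
    }

    // Hypothetical handler: delays every inbound message by a fixed number of milliseconds.
    static final class LatencyInjectionHandler extends ChannelInboundHandlerAdapter {
        private final long delayMillis;

        LatencyInjectionHandler(long delayMillis) {
            this.delayMillis = delayMillis;
        }

        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
            Thread.sleep(delayMillis);   // safe here: we are on an executor thread, not the event loop
            ctx.fireChannelRead(msg);    // pass the message along to the next handler
        }
    }
}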
I hope someone can post a better (more explanatory) answer than this, but basically all that's needed is to use a ChannelTrafficShapingHandler with small enough values.
For instance, with a 2 KB response, a read and write limit of 512 bytes, a maxTime of 6000 ms, and a checkInterval of 1000 ms, the ChannelTrafficShapingHandler forces the response to take 4000 ms, versus 50 ms without it, when running both client and server locally. I expect those times to increase dramatically on a real network.
// ChannelTrafficShapingHandler(writeLimit, readLimit, checkInterval, maxTime)
final ChannelTrafficShapingHandler channelTrafficShapingHandler = new ChannelTrafficShapingHandler(
        getRateInBytesPerSecond(), getRateInBytesPerSecond(), getCheckInterval(), getMaxTime());
ch.pipeline().addLast(channelTrafficShapingHandler);

From classic multithreaded to java.nio asynchronous/non-blocking server

I'm the main developer of an online game.
Players use a specific client software that connects to the game server with TCP/IP (TCP, not UDP)
At the moment, the architecture of the server is a classic multithreaded server with one thread per connection.
But at peak hours, when there are often 300 or 400 people connected, the server gets more and more laggy.
I was wondering whether, by switching to a java.nio.* asynchronous I/O model with a few threads managing many connections, performance would be better.
Finding example codes on the web that cover the basics of such a server architecture is very easy. However, after hours of googling, I didn't find the answers to some more advanced questions:
1 - The protocol is text-based, not binary. The clients and the server exchange lines of text encoded in UTF-8. A single line of text represents a single command; each line is properly terminated by \n or \r\n.
For the classic multithreaded server, I have that kind of code :
public Connection(Socket sock) throws IOException {
    this.in  = new BufferedReader(new InputStreamReader(sock.getInputStream(), "UTF-8"));
    this.out = new BufferedWriter(new OutputStreamWriter(sock.getOutputStream(), "UTF-8"));
    new Thread(this).start();
}
And then in run, data is read line by line with readLine.
In the docs I found a utility class, Channels, that can create a Reader out of a SocketChannel. But it is said that the produced Reader won't work if the channel is in non-blocking mode, which contradicts the fact that non-blocking mode is mandatory for the high-performance channel selection API I want to use. So I suspect it isn't the right solution for what I would like to do.
The first question is therefore the following: if I can't use that, how do I efficiently and properly take care of breaking lines and converting native Java strings from/to UTF-8 encoded data in the NIO API, with buffers and channels?
Do I have to play with get/put, or work inside the wrapped byte array by hand? How do I go from a ByteBuffer to strings encoded in UTF-8? I admit I don't understand very well how to use the classes in the charset package to do that.
2 - In the asynchronous/non-blocking I/O world, what about handling consecutive reads and writes that by nature have to be executed sequentially, one after the other?
For example, the login procedure, which is typically challenge-response based: the server sends a question (a particular computation), the client sends the response, and then the server checks the response given by the client.
The answer, I think, is certainly not to make the whole login process a single task sent to worker threads, as it is quite long, with the risk of freezing worker threads for too much time (imagine this scenario: 10 pool threads, 10 players trying to connect at the same time; tasks related to players already online are delayed until a thread is free again).
3 - What happens if two different threads simultaneously call Channel.write(ByteBuffer) on the same Channel?
Might the client receive mixed-up lines? For example, if one thread sends "aaaaa" and another sends "bbbbb", could the client receive "aaabbbbbaa", or am I ensured that everything is sent in a consistent order? Am I allowed to modify the buffer used right after the call returns?
Or, asked differently, do I need additional synchronization to avoid this sort of situation?
If I need additional synchronization, how do I know when to release locks and so on, when a write finishes?
I'm afraid the answer isn't as simple as registering for OP_WRITE in the selector. By trying that, I noticed that I get the write-ready event all the time, and always for all clients, so Selector.select exits early mostly for nothing, since there are only 3 or 4 messages to send per second per client while the selection loop is performed hundreds of times per second. So, potentially, there is a busy-wait in prospect, which is very bad.
4 - Can multiple threads call Selector.select on the same selector simultaneously without any concurrency problems such as missing an event, scheduling it twice, etc.?
5 - In fact, is NIO as good as it is said to be? Would it be interesting to stay with the classic multithreaded model, but instead of creating a thread per connection, use fewer threads and loop over the connections looking for data availability with InputStream.available()? Is that idea stupid and/or inefficient?
1) Yes. I think that you need to write your own nonblocking readLine method. Note also that a nonblocking read may be signaled when there are several lines in the buffer, or when there is an incomplete line:
Example: (first read)
USER foo
PASS
(second read)
bar
You will need to store (see 2) the data that was not consumed, until enough information is ready to process it.
// channel was selected for OP_READ
read data from channel
prepend data left over from the previous read
split complete lines
save the incomplete line
execute commands
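A minimal sketch of that idea for point 1, assuming one such object is kept per client (for example inside the State attached to its SelectionKey); it decodes with java.nio.charset and splits on \n / \r\n, carrying the incomplete tail over to the next read:
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// One LineReader per client, kept in its State object / SelectionKey attachment.
class LineReader {
    private final ByteBuffer readBuffer = ByteBuffer.allocate(4096);
    private final StringBuilder leftover = new StringBuilder();

    // Called when the channel is selected for OP_READ; returns the complete lines read so far.
    List<String> readLines(SocketChannel channel) throws IOException {
        List<String> lines = new ArrayList<>();
        readBuffer.clear();
        if (channel.read(readBuffer) == -1) {
            throw new EOFException("peer closed the connection");
        }
        readBuffer.flip();
        // Decode what we have; a UTF-8 sequence split across two reads is a corner case
        // this sketch ignores (a full version should also carry over undecoded bytes).
        leftover.append(StandardCharsets.UTF_8.decode(readBuffer));
        int newlineIndex;
        while ((newlineIndex = leftover.indexOf("\n")) >= 0) {
            String line = leftover.substring(0, newlineIndex);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);   // handle \r\n terminators
            }
            lines.add(line);
            leftover.delete(0, newlineIndex + 1);              // keep the incomplete tail
        }
        return lines;
    }
}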
2) You will need to keep the state of each client.
Map<SocketChannel,State> clients = new HashMap<SocketChannel,State>();
when a channel is connected, put a fresh state into the map
clients.put(channel,new State());
Or store the current state as the attached object of the SelectionKey.
Then, when executing each command, update the state. You may write it as a monolithic method, or do something more fancy such as polymorphic implementations of State, where each state knows how to deal with some commands (e.g. LoginState expects USER and PASS, then you change the state into a new AuthorizedState).
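For the SelectionKey variant, a small sketch (State is the hypothetical per-client class described above; this method would live in your server's accept loop):
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Accept a new client and attach a fresh State to its SelectionKey.
static void acceptClient(ServerSocketChannel serverChannel, Selector selector) throws IOException {
    SocketChannel clientChannel = serverChannel.accept();
    clientChannel.configureBlocking(false);
    SelectionKey key = clientChannel.register(selector, SelectionKey.OP_READ);
    key.attach(new State());   // later retrieved with (State) key.attachment()
}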
3) I don't recall using NIO with many asynchronous writers per channel, but the documentation says it is thread safe (I won't elaborate, since I have no proof of this). About OP_WRITE, note that it signals when the write buffer is not full. In other words, as said here: OP_WRITE is almost always ready, i.e. except when the socket send buffer is full, so you will just cause your Selector.select() method to spin mindlessly.
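One common way around both the mixed-up concurrent writes and the OP_WRITE spinning is a per-client write queue drained by the selector thread, with OP_WRITE interest switched on only while the queue is non-empty. A minimal sketch (class and method names are illustrative):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Queue;

// Per-client outbound queue; OP_WRITE interest is enabled only while data is pending.
class OutboundQueue {
    private final Queue<ByteBuffer> pending = new ArrayDeque<>();

    // Called from any thread that wants to send; the selector thread does the actual write.
    synchronized void enqueue(SelectionKey key, ByteBuffer data) {
        pending.add(data);
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        key.selector().wakeup();   // wake the selector so it picks up the new interest set
    }

    // Called by the selector thread when the key reports it is writable.
    synchronized void flush(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        while (!pending.isEmpty()) {
            ByteBuffer head = pending.peek();
            channel.write(head);
            if (head.hasRemaining()) {
                return;            // socket send buffer is full again; keep OP_WRITE and retry later
            }
            pending.remove();
        }
        // Nothing left to send: drop OP_WRITE so select() does not spin.
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
    }
}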
4) Yes. Selector.select() performs a blocking selection operation.
5) I think that the most difficult part is switching from a thread-per-client architecture, to a different design where reads and writes are decoupled from processing. Once you have done that, it is easier to work with channels than working your own way with blocking streams.

JMS Client Session Usage

I'm attempting to utilize the .NET Kaazing client in order to interact with a JMS back-end via web sockets. I'm struggling to understand the correct usage of sessions. Initially, I had a single session shared across all threads, but I noticed that this was not supported:
A Session object is a single-threaded context for producing and consuming messages. Although it may allocate provider resources outside the Java virtual machine (JVM), it is considered a lightweight JMS object.
The reason I had a single session was just because I thought that would yield better performance. Since the documentation claimed sessions were lightweight, I had no hesitation switching my code over to use a session per "operation". By "operation" I mean either sending a single message, or subscribing to a queue/topic. In the former case, the session is short-lived and closed immediately after the message is sent. In the latter case, the session needs to live as long as the subscription is active.
When I tried creating multiple sessions I got an error:
System.NotSupportedException: Only one non-transacted session can be active at a time
Googling this error was fruitless, so I tried switching over to transacted sessions. But when attempting to create a consumer I get a different error:
System.NotSupportedException: This operation is not supported in transacted sessions
So it seems I'm stuck between a rock and a hard place. The only possible options I see are to share my session across threads or to have a single, non-transacted session used to create consumers, and multiple transacted sessions for everything else. Both these approaches seem a little against the grain to me.
Can anyone shed some light on the correct way for me to handle sessions in my client?
There are several ways to add concurrency to your application. You could use multiple Connections, but that is probably not desirable due to an increase in network overhead. Better would be to implement a simple mechanism for handling the concurrency in the Message Listener by dispatching Tasks or by delivering messages via ConcurrentQueues. Here are some choices for implementation strategy:
The Task based approach would use a TaskScheduler. In the MessageListener, a task would be scheduled to handle the work and return immediately. You might schedule a new Task per message, for instance. At this point, the MessageListener would return and the next message would be immediately available. This approach would be fine for low throughput applications - e.g. a few messages per second - but where you need concurrency perhaps because some messages may take a long time to process.
Another approach would be to use a data structure of pending messages (a ConcurrentQueue). When the MessageListener is invoked, each Message would be added to the ConcurrentQueue and the listener would return immediately. Then a separate set of threads/tasks can pull the messages from that ConcurrentQueue using an appropriate strategy for your application. This would work for a higher-performance application.
A variation of this approach would be to have a ConcurrentQueue for each Thread processing inbound messages. Here the MessageListener would not manage its own ConcurrentQueue, but instead it would deliver the messages to the ConcurrentQueue associated with each thread. For instance, if you have inbound messages representing stock feeds and also news feeds, one thread (or set of threads) could process the stock feed messages, and another could process inbound news items separately.
Note that if you are using JMS Queues, each message will be acknowledged implicitly when your MessageListener returns. This may or may not be the behavior you want for your application.
For higher performance applications, you should consider approaches 2 and 3.
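Sketched here in Java JMS terms (the concepts map directly onto the Kaazing client's interfaces), a minimal version of the second approach: the listener only enqueues, and a small pool of worker threads does the processing. The pool size and the process method are illustrative assumptions.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import javax.jms.Message;
import javax.jms.MessageListener;

// Sketch of approach 2: the listener only hands messages off; workers do the real processing.
class QueueingListener implements MessageListener {
    private final BlockingQueue<Message> inbound = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    QueueingListener() {
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        Message message = inbound.take();   // blocks until work arrives
                        process(message);                    // hypothetical business logic
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    @Override
    public void onMessage(Message message) {
        // Return immediately so the session thread is never blocked.
        // Note the acknowledgment caveat above: the message is acked when this method returns.
        inbound.add(message);
    }

    private void process(Message message) {
        // ... application-specific handling ...
    }
}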

Is MQ publish/subscribe domain-specific interface generally faster than point-to-point?

I'm working on an existing application that uses a transport layer with point-to-point MQ communication.
For each of the given list of accounts we need to retrieve some information.
Currently we have something like this to communicate with MQ:
responseObject getInfo(requestObject){
code to send message to MQ
code to retrieve message from MQ
}
As you can see we wait until it finishes completely before proceeding to the next account.
Due to performance issues we need to rework it.
There are 2 possible scenarios that I can think of at the moment.
1) Within the application, create a bunch of threads that execute the transport adapter for each account, then get the data from each task. I prefer this method, but some team members argue that the transport layer is a better place for such a change and that we should place the extra load on MQ instead of our application.
2) Rework transport layer to use publish/subscribe model.
Ideally I want something like this:
void send (requestObject){
code to send message to MQ
}
responseObject receive()
{
code to retrieve message from MQ
}
Then I will just send requests in a loop, and later retrieve the data in a loop. The idea is that while the first request is being processed by the back-end system we don't have to wait for the response, but can instead send the next request.
My question is: is this going to be a lot faster than the current sequential retrieval?
The question title frames this as a choice between P2P and pub/sub but the question body frames it as a choice between threaded and pipelined processing. These are two completely different things.
Either code snippet provided could just as easily use P2P or pub/sub to put and get messages. The decision should not be based on speed but rather whether the interface in question requires a single message to be delivered to multiple receivers. If the answer is no then you probably want to stick with point-to-point, regardless of your application's threading model.
And, incidentally, the answer to the question posed in the title is "no." When you use the point-to-point model your messages resolve immediately to a destination or transmit queue and WebSphere MQ routes them from there. With pub/sub your message is handed off to an internal broker process that resolves zero to many possible destinations. Only after this step does the published message get put on a queue where, for the remainder of its journey, it is handled like any other point-to-point message. Although pub/sub is not normally noticeably slower than point-to-point, the code path is longer and therefore, all other things being equal, it will add a bit more latency.
The other part of the question is about parallelism. You proposed either spinning up many threads or breaking the app up so that requests and replies are handled separately. A third option is to have multiple application instances running. You can combine any or all of these in your design. For example, you can spin up multiple request threads and multiple reply threads and then have application instances processing against multiple queue managers.
The key to this question is whether the messages have affinity to each other, to order dependencies or to the application instance or thread which created them. For example, if I am responding to an HTTP request with a request/reply then the thread attached to the HTTP session probably needs to be the one to receive the reply. But if the reply is truly asynchronous and all I need to do is update a database with the response data then having separate request and reply threads is helpful.
In either case, the ability to dynamically spin up or down the number of instances is helpful in managing peak workloads. If this is accomplished with threading alone then your performance scalability is bound to the upper limit of a single server. If this is accomplished by spinning up new application instances on the same or different server/QMgr then you get both scalability and workload balancing.
Please see the following article for more thoughts on these subjects: Mission:Messaging: Migration, failover, and scaling in a WebSphere MQ cluster
Also, go to the WebSphere MQ SupportPacs page and look for the Performance SupportPac for your platform and WMQ version. These are the ones with names beginning with MP**. These will show you the performance characteristics as the number of connected application instances varies.
It doesn't sound like you're thinking about this the right way. Regardless of the model you use (point-to-point or publish/subscribe), if your performance is bounded by a slow back-end system, neither will help speed up the process. If, however, you could theoretically issue more than one request at a time against the back-end system and expect to see a speed up, then you still don't really care if you do point-to-point or publish/subscribe. What you really care about is synchronous vs. asynchronous.
Your current approach for retrieving the data is clearly synchronous: you send the request message, and wait for the corresponding response message. You could do your communication asynchronously if you simply sent all the request messages in a row (perhaps in a loop) in one method, and then had a separate method (preferably on a different thread) monitoring the incoming topic for responses. This would ensure that your code would no longer block on individual requests. (This roughly corresponds to option 2, though without pub/sub.)
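A minimal sketch of that split, expressed in JMS terms (the actual transport adapter may differ; queue names, the correlation scheme, and the timeout are illustrative). Two sessions are used because a JMS Session is a single-threaded context and sending and receiving happen on different threads:
import javax.jms.Connection;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

class AsyncRequester {
    private final Session sendSession;
    private final Session receiveSession;
    private final MessageProducer producer;
    private final MessageConsumer consumer;

    AsyncRequester(Connection connection) throws JMSException {
        // Separate sessions so the sending and receiving threads never share one.
        sendSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        receiveSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        producer = sendSession.createProducer(sendSession.createQueue("ACCOUNT.REQUEST"));
        consumer = receiveSession.createConsumer(receiveSession.createQueue("ACCOUNT.REPLY"));
    }

    // Fire off all requests without waiting for any reply.
    void sendAll(Iterable<String> accountIds) throws JMSException {
        for (String accountId : accountIds) {
            TextMessage request = sendSession.createTextMessage(accountId);
            request.setJMSCorrelationID(accountId);   // lets each reply be matched to its request
            producer.send(request);
        }
    }

    // On a separate thread, collect the replies as they arrive.
    void receiveAll(int expectedReplies) throws JMSException {
        for (int i = 0; i < expectedReplies; i++) {
            Message reply = consumer.receive(30000);   // per-reply timeout in milliseconds
            if (reply == null) {
                break;   // timed out; handle missing replies as appropriate
            }
            // ... update the result store keyed by reply.getJMSCorrelationID() ...
        }
    }
}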
I think option 1 could get pretty unwieldy, depending on how many requests you actually have to make, though it, too, could be implemented without switching to a pub/sub channel.
The reworked approach will use fewer threads. Whether that makes the application faster depends on whether the overhead of managing a lot of threads is currently slowing you down. If you have fewer than 1000 threads (this is a very, very rough order-of-magnitude estimate!), I would guess it probably isn't. If you have more than that, it might well be.

Stateless Blocking Server Design

A little help please.
I am designing a stateless server that will have the following functionality:
Client submits a job to the server.
Client is blocked while the server tries to perform the job.
The server will spawn one or multiple threads to perform the job.
The job either finishes, times out or fails.
The appropriate response (based on the outcome) is created, the client is unblocked and the response is handed off to the client.
Here is what I have thought of so far.
Client submits a job to the server.
The server assigns an ID to the job, places the job on a queue and then places the client on another queue (where it will be blocked).
Have a thread pool that will execute the job, fetch the result and appropriately create the response.
Based on ID, pick the client out of the queue (thereby unblocking it), give it the response and send it off.
Steps 1, 3 and 4 seem quite straightforward; however, I'd welcome any ideas about how to put the client in a queue and then block it. Also, any pointers that would help me design this puppy would be appreciated.
Cheers
Why do you need to block the client? Seems like it would be easier to return (almost) immediately (after performing initial validation, if any) and give client a unique ID for a given job. Client would then be able to either poll using said ID or, perhaps, provide a callback.
Blocking means you're holding on to a socket which obviously limits the upper number of clients you can serve simultaneously. If that's not a concern for your scenario and you absolutely need to block (perhaps you have no control over client code and can't make them poll?), there's little sense in spawning threads to perform the job unless you can actually separate it into parallel tasks. The only "queue" in that case would be the one held by common thread pool. The workflow would basically be:
1. Create a thread pool (such as ThreadPoolExecutor).
2. For each client request:
   2.1. If you have any parts of the job that you can execute in parallel, delegate them to the pool.
   2.2. And/or do them in the current thread.
   2.3. Wait until pooled job parts complete (if applicable).
   2.4. Return results to the client.
3. Shut down the thread pool.
No IDs are needed per se; though you may need to use some sort of latch for 2.1 / 2.3 above.
Timeouts may be a tad tricky. If you need them to be more or less precise, you'll have to keep your main thread (the one that received the client request) free from work, have it signal the submitted job parts (by flipping a flag) when the timeout is reached, and return immediately. You'll have to check said flag periodically and terminate your execution once it's flipped; the pool will then reclaim the thread.
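A minimal sketch of steps 1 through 3 using Futures with a deadline instead of a hand-rolled flag (pool size, status strings, and the method shape are illustrative assumptions):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class JobRunner {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);   // step 1

    // Steps 2.1-2.4: split the job, run the parts in the pool, wait with a deadline,
    // and build the response while the client's own thread stays blocked on this call.
    String runJob(List<Runnable> parallelParts, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable part : parallelParts) {
            futures.add(pool.submit(part));
        }
        try {
            for (Future<?> future : futures) {
                long remaining = deadline - System.currentTimeMillis();
                future.get(Math.max(remaining, 0), TimeUnit.MILLISECONDS);
            }
            return "OK";
        } catch (TimeoutException e) {
            futures.forEach(f -> f.cancel(true));   // interrupt parts that are still running
            return "TIMEOUT";
        } catch (Exception e) {
            return "FAILED: " + e.getMessage();
        }
    }

    void shutdown() {
        pool.shutdown();   // step 3
    }
}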
How are you communicating with the client?
I recommend you create an object to represent each job which holds job parameters and the socket (or other communication mechanism) to reach the client. The thread pool will then send the response to unblock the client at the end of job processing.
The timeouts will be somewhat tricky and have hidden gotchas, but the basic design seems straightforward: write a class that takes a Socket in its constructor. On socket.accept you just instantiate a new socket-processing object (with foresight and planning if scalability matters, or quickly if this is a bench-test experiment). The socket-processing class then runs the data-processing code, and when it returns you have some sort of boolean or numeric status (a handy place for null, by the way) and either writes the success to the socket's OutputStream or informs the client of a timeout, or whatever your business needs require.
If you have to have a scalable, effective design for long-running heavy haulers, go directly to NIO. Hand-coded one-off solutions like the one I describe probably won't scale well, but they provide a fundamental conceptual basis for a correct NIO design.
(Sorry folks, I think directly in code; design patterns are applied to the code after it is working. Whatever does not hold up gets reworked then, not before.)
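A minimal sketch of that hand-coded, thread-per-connection approach (the port number, class names, and the process method are all illustrative placeholders):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// One handler object per accepted socket; the client stays blocked until a response is written.
class SocketJobHandler implements Runnable {
    private final Socket socket;

    SocketJobHandler(Socket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String jobRequest = in.readLine();      // client stays blocked until we answer
            String result = process(jobRequest);    // hypothetical business logic
            out.println(result);                    // success, timeout, or failure message
        } catch (IOException e) {
            // connection problems end this client's job
        }
    }

    private String process(String jobRequest) {
        return "OK";   // placeholder
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket serverSocket = new ServerSocket(9000)) {
            while (true) {
                Socket client = serverSocket.accept();
                new Thread(new SocketJobHandler(client)).start();
            }
        }
    }
}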

Categories

Resources