I'm trying to figure out the best way to go about writing data transfer code for a client/server system that handles multiple clients at once.
I'm already keeping a List of the clients who connect (I'm using the non-blocking NIO framework, by the way).
Isn't it costly, performance-wise, to iterate through every client on each read/write pass and write the buffer data to each channel? Is there a better/more efficient way of doing it?
I've been thinking about dividing up the buffer size based on the number of clients. Is that a viable solution?
Using selectors (as you seem to be doing) really pays off when you're handling a very large number of clients (and why optimize for the case in which you don't have a large number of clients? ;)
The bottleneck in such a system is rarely the CPU doing the iteration, but the I/O, so I wouldn't worry if I were you.
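For illustration, a minimal sketch of a selector-driven write pass with the standard java.nio API, assuming each client channel has been registered with the selector for OP_WRITE (the broadcast method name is just for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class Broadcaster {
    private final Selector selector;

    public Broadcaster(Selector selector) {
        this.selector = selector;
    }

    // Write the same buffer to every client channel that is ready for
    // writing; channels that aren't ready are skipped until the next pass.
    public void broadcast(ByteBuffer data) throws IOException {
        selector.selectNow(); // non-blocking: only collects ready channels
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid() && key.isWritable()) {
                SocketChannel client = (SocketChannel) key.channel();
                client.write(data.duplicate()); // duplicate() keeps an independent position per client
            }
        }
    }
}
```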
I have to build a file synchronizer: an application that essentially synchronizes, 24/7, a large amount of data files from many external systems to my local system, using mainly FTP, SFTP and NFS.
There are more than twenty streams; the logic is slightly different for each of them, and it must be configurable.
One of the requirements is that if one of the streams goes down for some reason, it must be possible to bring it back up without restarting the entire system.
Another requirement is that the transfer rate is balanced. In other words, there must not be one stream fully synchronized while another stream is 10 hours behind.
I have some doubts about the architecture to build: if I build a single multithreaded system, I would have a very high thread count (more than 100, I would say), and fulfilling the two requirements outlined above would be complicated.
I was thinking of running several processes, or several instances of the same process, even if it seems a little "ugly". That way some load balancing would be done by the operating system, it would be simpler to kill or start an individual flow, and performance might even be better, as several processes could use much more RAM. Does anyone have any tips/advice? Thanks a lot. Gian
As @kayaman said, 100 threads is not a lot. But if that means 100 threads per unit of work, and you will have many units of work (implying an increase of many orders of magnitude in thread count), I would suggest having a look at fibers.
As long as you don't block the fibers, you can have 100,000+ fibers running over a couple of threads (typically one per CPU core). Each fiber would then just wait for a callback from the process before continuing.
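The answer presumably refers to a fiber library such as Quasar; on Java 21+ the same model ships in the JDK as virtual threads. A minimal sketch under that assumption (syncStream is a hypothetical per-stream task):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SyncLauncher {
    public static void main(String[] args) {
        // One lightweight task per stream; the JVM multiplexes them over
        // a small pool of carrier threads (roughly one per CPU core).
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 25; i++) {
                final int streamId = i;
                executor.submit(() -> syncStream(streamId));
            }
        } // close() waits for the submitted tasks to finish
    }

    // Hypothetical per-stream logic: blocking I/O is fine here, because a
    // blocked virtual thread releases its carrier thread instead of pinning it.
    static void syncStream(int streamId) {
        System.out.println("synchronizing stream " + streamId);
    }
}
```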
To access your endpoints and handle them in similar ways, have a look at Apache Camel: it will let you stream from FTP, SFTP, etc., and handle each as just another endpoint (in theory you should be able to plug email in as well and stream packets that are emailed to the endpoint).
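A minimal sketch of a Camel route, assuming the camel-ftp component is on the classpath; host, credentials and directories are placeholders:

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FtpSyncRoute {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Poll a remote FTP directory and drop each file locally;
                // swapping "ftp" for "sftp" changes only the endpoint URI.
                from("ftp://user@example.com/outgoing?password=secret&binary=true")
                    .to("file:data/inbox");
            }
        });
        context.start();
        Thread.sleep(60_000); // let the route poll for a while, then shut down
        context.stop();
    }
}
```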
Regarding balancing the streams, this is business logic you need to implement. If one stream is receiving packets faster than another, you should be able to limit the rate by not requesting more packets under certain conditions. I'd need some more information on how you retrieve the packets and which libraries you are using to be of more assistance here.
I have 2 Java applications. The first I may edit as much as I wish, but I will compile it to machine code later on. The second one I am not able to edit, but I may write an addon for it. I need to make that addon able to talk to the first application; generally, simply sending strings to each other. The input and output streams of a process are not an option for me. I am thinking of using a TCP socket client/server, or a file which will act as a buffer. But both ways seem a little bit ugly to me; could anyone propose a better idea?
It depends on what kind of data you wish to transfer.
If it is only Strings, then:
if the number of processes = 2 and you are sure of it, then stdin & stdout are the best way forward. You can create a Process using ProcessBuilder and then get its streams to communicate; the other process can just write to System.out to transfer messages (see the sketch after this list). This is preferred to a Socket, because you don't have to handle graceful closing of the socket, etc. (if that fails and the port is not unbound successfully, it can be big trouble)
if the number of processes is > 2 but less than, say, 10, you can probably use Sockets and communicate over them. This should work well, though extra effort goes into gracefully managing the sockets.
if the number of processes is large, then JMS should be used. It does a lot of things you don't need to handle yourself, but it is too big a tool if the number of processes is small.
So in your case, a Process with its streams is the best way forward.
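A minimal sketch of the ProcessBuilder approach mentioned above; "Child" is a hypothetical class that reads lines from System.in and answers on System.out:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Parent {
    public static void main(String[] args) throws Exception {
        // Launch the second application (hypothetical class name "Child")
        Process child = new ProcessBuilder("java", "Child")
                .redirectErrorStream(true)
                .start();

        try (BufferedWriter toChild = new BufferedWriter(
                     new OutputStreamWriter(child.getOutputStream()));
             BufferedReader fromChild = new BufferedReader(
                     new InputStreamReader(child.getInputStream()))) {
            toChild.write("hello");   // becomes the child's System.in
            toChild.newLine();
            toChild.flush();
            System.out.println("child said: " + fromChild.readLine());
        }
        child.waitFor();
    }
}
```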
If the data you wish to transfer can even be Objects, RMI can be used, given the number of processes is small. If more, use JMS again.
Edit: For all of the above, there is a lot of dirty work involved. For a change, if you are looking at something new and exciting, I would advise Akka. It is an actor-based model in which actors communicate with each other using messages.
The beauty is, the actors can be on the same JVM or another (very little config) and Akka takes care of the rest for you. I haven't seen a cleaner way of doing this :)
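A minimal sketch using Akka's classic (untyped) Java actor API; the actor and system names are illustrative:

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class AkkaDemo {
    // An actor that reacts to plain String messages
    static class Greeter extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(String.class, msg -> System.out.println("got: " + msg))
                    .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ActorRef greeter = system.actorOf(Props.create(Greeter.class), "greeter");
        greeter.tell("hello", ActorRef.noSender()); // fire-and-forget message
        system.terminate();
    }
}
```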
What about using JMS?
Depending on your needs, you can use either the Publish/Subscribe or the Point-to-Point model.
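For example, a minimal point-to-point sender using the javax.jms API, assuming ActiveMQ as the provider and a broker on the default port (queue name is illustrative):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsSender {
    public static void main(String[] args) throws Exception {
        // Assumes an ActiveMQ broker listening on the default port
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("app.messages");
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("hello"));
            // the broker holds the message until a consumer reads it
        } finally {
            connection.close();
        }
    }
}
```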
Another approach is to have a DB table to store your data: one process can insert, and the other process can read it whenever required. When you are using JMS there is some likelihood of losing data, but storing in a DB would be failsafe and future-proof.
I have a Java application that requires communication between different processes. A process could run in the same JVM or a different JVM, but it runs on the same machine.
My application needs to submit "messages" to another process (same or different JVM) and forget about them, similar to a messaging queue like IBM MQ, but simpler, and using only memory, with no I/O to the hard disk, for performance gains.
I'm not sure what the best approach is from a performance perspective.
I wonder if RMI is efficient in terms of performance; I think it requires some overhead.
What about a TCP/IP socket using localhost?
Any other thoughts?
I wonder if RMI is efficient in terms of performance; I think it requires some overhead.
RMI is efficient for what it does. It does much more than most people need, but it is usually more than fast enough. You should be able to get on the order of 1-3K messages per second, with a latency around 1 millisecond.
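For reference, a minimal RMI round trip with the standard JDK API (the interface and names are illustrative); all the stub generation and socket handling happens behind these few calls:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiDemo {
    // The remote contract both JVMs share
    public interface MessageService extends Remote {
        void submit(String message) throws RemoteException;
    }

    static class MessageServiceImpl implements MessageService {
        public void submit(String message) { System.out.println("received: " + message); }
    }

    public static void main(String[] args) throws Exception {
        // Server side: export the object and register it under a name
        MessageService stub = (MessageService)
                UnicastRemoteObject.exportObject(new MessageServiceImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("messages", stub);

        // Client side (normally a different JVM): look up and call
        MessageService service = (MessageService)
                LocateRegistry.getRegistry("localhost", 1099).lookup("messages");
        service.submit("hello");
    }
}
```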
What about a TCP/IP socket using localhost?
That is always an option, but with plain Java serialization it will not be a lot faster than using RMI. How you do the serialization and deserialization is critical for high performance.
An important note is that much of the time is spent serializing and deserializing the message, something most transports don't help you with, so if you want maximum performance you have to consider an efficient marshalling strategy. Most transport protocols only benchmark raw bytes.
Ironically, if you are willing to use disk, it can be faster than TCP or UDP (as used by ZeroMQ), plus you get persistence for "free".
This library (I am the author) can handle millions of messages per second between processes, with latency as low as 100 nanoseconds (350x lower than ZeroMQ): https://github.com/peter-lawrey/Java-Chronicle. Its advantages are:
ultra-fast serialization and deserialization, something most transport benchmarks avoid including, as it often takes much longer than the transport itself;
you can monitor what is happening between the queues at any time after a message was sent;
you can replay all messages;
the producer can be any amount of data ahead of the consumer, to handle micro-bursts gracefully, up to the size of your disk space (e.g. the consumer can be TBs behind);
it supports replication over TCP;
a restart of either the consumer or the producer is largely transparent.
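A minimal sketch of the usage pattern, assuming the current Chronicle Queue API (which has evolved since the repository linked above):

```java
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class ChronicleDemo {
    public static void main(String[] args) {
        // The queue is a memory-mapped directory, shared between processes
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("queue-dir").build()) {
            ExcerptAppender appender = queue.acquireAppender();
            appender.writeText("hello");           // producer side

            ExcerptTailer tailer = queue.createTailer();
            System.out.println(tailer.readText()); // consumer side (normally another process)
        }
    }
}
```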
If you are developing a server application, consider ZeroMQ. It has great performance, makes it easier to build inter-process communication, and allows you to build asynchronous APIs.
ZeroMQ claims fantastic performance for inter-process communication, even better than TCP, which sounds great. We are considering this solution for our clustering scheme.
Pieter Hintjens gives a great answer comparing the performance of different message brokers.
This may not be possible, but I thought I might just give it a try. I have some work that processes data; it makes 3 decisions for each piece of data it processes: keep, discard, or modify/reprocess (because it is unsure whether to keep or discard it). This generates a very large amount of data, because the reprocessing may break the data into many different parts.
My initial method was to send it to my ExecutorService that was processing the data, but because the number of items to process was large, I would run out of memory very quickly. Then I decided to offload the queue to a messaging server (RabbitMQ), which works fine, but now I'm bound by network I/O. What I like about RabbitMQ is that it keeps messages in memory up to a certain level and then dumps old messages to the local drive, so if I have 8 GB of memory on my server I can still have a 100 GB message queue.
So my question is: is there any library with a similar feature in Java? Something I can use as a non-blocking queue that keeps only X items in the queue (either by number of items or by size) and writes the rest to the local drive.
Note: right now I'm only asking for this to be used on one server. In the future I might add more servers, but because each server is self-generating data, I would try to take messages from one queue and push them to another if one server's queue is empty. The library would not need network access, but I would need to access the queue from another Java process. I know this is a long shot, but I thought if anyone knew, it would be SO.
Not sure if it is the approach you are looking for, but why not use a lightweight database like HSQLDB and a persistence layer like Hibernate? You can keep your messages in memory, then commit them to the DB to save them to disk, and later query them with a convenient SQL query.
Actually, as Cuevas wrote, HSQLDB could be a solution. If you use the "cached table" feature it provides, you can specify the maximum amount of memory used; data exceeding it will be sent to the hard drive.
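A minimal sketch of the cached-table idea over plain JDBC; the file path and table layout are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HsqldbQueueDemo {
    public static void main(String[] args) throws Exception {
        // File-mode HSQLDB: rows of a CACHED table live mostly on disk,
        // with only a bounded portion held in memory.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hsqldb:file:queuedb", "SA", "");
             Statement st = conn.createStatement()) {
            st.execute("CREATE CACHED TABLE messages ("
                     + "id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, "
                     + "body VARCHAR(10000))");
            st.execute("INSERT INTO messages (body) VALUES ('hello')");
            try (ResultSet rs = st.executeQuery(
                    "SELECT body FROM messages ORDER BY id LIMIT 1")) {
                if (rs.next()) System.out.println(rs.getString("body"));
            }
        }
    }
}
```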
Use the filesystem. It's old-school, yet so many engineers get bitten by libraries because they are lazy. True, HSQLDB provides lots of value-added features, but in the context of staying lightweight...
I'm creating a client/server pair in Java that, for now, only supports interlaced text communication via PrintWriters and BufferedReaders wrapped around both the server's and the client's I/O streams.
I would like to implement a function that uses Image[Input/Output]Stream to send a BufferedImage from the server to the client at a set interval.
The problem is that I want the BufferedImages to be sent/received in separate threads so that the client/server can still send/receive text commands.
Can I create multiple streams or sockets? If so, is that the best way?
One way to accomplish this with a single socket is to multiplex the individual streams over a single byte stream connected to the socket; a good implementation of this is BEEP.
Yes, sure, you can create as many threads and sockets as you need. Just be careful: do not forget to close your sockets, and keep the creation of threads under control: too many threads do not improve performance and may even cause your system to halt.
You should probably use a thread pool, but it depends on your application. Take a look at the java.util.concurrent package.
If you have more specific questions do not hesitate to ask them.
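For example, a minimal sketch of the two-socket approach: the client keeps its existing text socket and reads length-prefixed images on a second socket in a dedicated thread (host and port are placeholders):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import javax.imageio.ImageIO;

public class ImageReceiver implements Runnable {
    private final String host;
    private final int imagePort;

    public ImageReceiver(String host, int imagePort) {
        this.host = host;
        this.imagePort = imagePort;
    }

    @Override
    public void run() {
        // Dedicated socket for images, so text commands on the main socket
        // are never blocked behind a large image transfer. Each image is
        // sent length-prefixed so frame boundaries are unambiguous.
        try (Socket socket = new Socket(host, imagePort);
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            while (!Thread.currentThread().isInterrupted()) {
                int length = in.readInt();          // 4-byte frame length
                byte[] png = new byte[length];
                in.readFully(png);                  // the encoded image bytes
                BufferedImage frame = ImageIO.read(new ByteArrayInputStream(png));
                System.out.println("received " + frame.getWidth() + "x" + frame.getHeight());
            }
        } catch (IOException e) {
            // an EOF here just means the server closed the image stream
        }
    }
}
```

The client would start it with, e.g., `new Thread(new ImageReceiver("localhost", 9001)).start();` while the text protocol keeps using the original socket.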
A multiplexing stream should maintain multiple buffers.
A reader should be given its own buffer by the multiplexing stream. The multiplexing stream should grow every buffer during a write operation, and shrink the desired buffer during a read operation.
A single rewind buffer is harder to manage, since the readers need to be stateful, but it is generally more scalable, if not as performant.
The specific connection protocol used is an implementation detail. Network sockets are just buffers, and can be used to implement a multiplexing stream. The network becomes the bottleneck in this case.
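A minimal sketch of the writing side of such a multiplexing stream, assuming a simple [channel id, length, payload] frame format (a protocol like BEEP standardizes this, and much more):

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class MultiplexingStream {
    private final DataOutputStream out;

    public MultiplexingStream(OutputStream socketStream) {
        this.out = new DataOutputStream(socketStream);
    }

    // Frame format: 1-byte channel id, 4-byte payload length, payload bytes.
    // Channel 0 could carry text commands, channel 1 image data, and so on;
    // the demultiplexer on the far side routes each frame to its own buffer.
    public synchronized void write(int channelId, byte[] payload) throws IOException {
        out.writeByte(channelId);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }
}
```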