I'm working on an online game and I've hit a little snag while working on the server side of things.
When using nonblocking sockets in Java, what is the best course of action to handle complete packet data sets that cannot be processed until all the data is available? For example, sending a large 2D tiled map over a socket.
I can think of two ways to handle it:
Allocate a ByteBuffer large enough to hold the complete data set (the large 2D tiled map from my example). Keep adding read data to the buffer until it has all been received, then process it from there.
Use a smaller ByteBuffer (perhaps 1500 bytes) and append each read to a file until the complete data set is available, then process it from the file. This avoids having large ByteBuffers, but degrades performance because of the disk I/O.
I'm using a dedicated ByteBuffer for every SocketChannel so that I can keep reading in data until the message is complete and ready for processing. The problem is that if my 2D tiled map amounts to 2MB, is it really wise to use 1000 2MB ByteBuffers (assuming 1000 is the client connection limit and they are all in use)? There must be a better way that I'm not thinking of.
I'd prefer to keep things simple, but I'm open to any suggestions and appreciate the help. Thanks!
Probably, the best solution for now is to use the full 2MB ByteBuffer and let the OS take care of paging to disk (virtual memory) if that's necessary. You probably won't have 1000 concurrent users right away, and when you do, you can optimize. You may be surprised what your real performance issues are.
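A minimal sketch of that per-connection approach; the 2MB constant and the processMap handler are assumptions for illustration, not code from the question:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    // Hypothetical per-connection state: one full-size buffer per SocketChannel.
    class ClientConnection {
        private static final int MAP_MESSAGE_SIZE = 2 * 1024 * 1024; // assumed 2MB payload
        private final SocketChannel channel;
        private final ByteBuffer buffer = ByteBuffer.allocate(MAP_MESSAGE_SIZE);

        ClientConnection(SocketChannel channel) {
            this.channel = channel;
        }

        // Called whenever the selector reports the channel as readable.
        void onReadable() throws IOException {
            int read = channel.read(buffer);   // may deliver only part of the message
            if (read == -1) {
                channel.close();               // client went away
                return;
            }
            if (!buffer.hasRemaining()) {      // the complete data set has arrived
                buffer.flip();
                processMap(buffer);            // hypothetical handler for the 2D map
                buffer.clear();
            }
        }

        private void processMap(ByteBuffer mapData) {
            // decode the tiled map here
        }
    }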
I decided the best course of action was to simply reduce the size of my massive dataset and send tile updates instead of an entire map update. That way I can simply send a list of tiles that have changed on a map instead of the entire map over again. This reduces the need for such a large buffer and I'm back on track. Thanks.
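For what it's worth, a rough sketch of what such a tile-update message could look like (the field layout is purely illustrative):

    import java.nio.ByteBuffer;
    import java.util.List;

    // Hypothetical delta encoding: only changed tiles are sent, not the whole map.
    final class TileUpdate {
        final short x, y;    // tile coordinates
        final byte tileId;   // new tile value

        TileUpdate(short x, short y, byte tileId) {
            this.x = x;
            this.y = y;
            this.tileId = tileId;
        }
    }

    final class TileUpdateEncoder {
        // A 4-byte count followed by 5 bytes per changed tile -- tiny next to a full 2MB map.
        static ByteBuffer encode(List<TileUpdate> changes) {
            ByteBuffer buf = ByteBuffer.allocate(4 + changes.size() * 5);
            buf.putInt(changes.size());
            for (TileUpdate t : changes) {
                buf.putShort(t.x).putShort(t.y).put(t.tileId);
            }
            buf.flip();
            return buf;
        }
    }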
This will be a bit of an abstract question, since I don't even know if any developments like this exist.
Given we have an application which tries to deliver text data from point A to B.
A and B are quite far apart, so the size of the data has a significant effect on all the metrics we want to optimize for (speed, latency and throughput). The first thing that comes to mind is compression, but compression is not very effective when we have to compress many, many small messages; it is very effective when the amount of data being compressed is significant.
I have no experience with compression algorithms, but my understanding is that the bigger the input, the better the compression ratio can be, since there is a higher likelihood of repeated chunks and other things that can be optimized.
Another way we could go is batching: by waiting for some period of time N, collecting all the tiny messages, and creating one big compressed message, we could get a good compression ratio, but we would sacrifice latency; the message that arrives first suffers an unnecessary delay of up to N.
The solution I'm looking for is something like this: when a compression algorithm traverses the data set, it presumably builds some dictionary of things it knows can be optimized. This dictionary is thrown away every time we finish compressing, and it is effectively sent along with every message to B.
rawMsg -> [dictionary|compressedPayload] -> send to B
However, if this dictionary could be maintained in memory and sent only when it changes, we could efficiently compress even small messages and avoid sending the dictionary to the other end every time...
rawMsg -> compress(existingDictionaryOfSomeVersion, rawMsg) -> [dictionaryVersion|compressedPayload] -> send to B
Now, obviously, the assumption here is that B also keeps an instance of the dictionary and updates it whenever a newer version arrives.
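To make the idea concrete (purely an illustration, not the off-the-shelf technology being asked about): zlib already supports a preset dictionary, so if both ends agreed on the dictionary out of band, only the compressed payload would need to travel. The version bookkeeping is left out here:

    import java.util.Arrays;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    // Sketch: compress a small message against a dictionary both ends already hold.
    // The dictionary contents and any version bookkeeping are assumptions for illustration.
    class DictionaryCompression {
        static byte[] compress(byte[] dictionary, byte[] rawMsg) {
            Deflater deflater = new Deflater();
            deflater.setDictionary(dictionary);       // shared knowledge, never sent on the wire
            deflater.setInput(rawMsg);
            deflater.finish();
            byte[] out = new byte[rawMsg.length + 64];
            int n = deflater.deflate(out);
            deflater.end();
            return Arrays.copyOf(out, n);
        }

        static byte[] decompress(byte[] dictionary, byte[] payload, int maxSize) throws DataFormatException {
            Inflater inflater = new Inflater();
            inflater.setInput(payload);
            byte[] out = new byte[maxSize];
            int n = inflater.inflate(out);
            if (n == 0 && inflater.needsDictionary()) {
                inflater.setDictionary(dictionary);   // B applies its copy of the same dictionary
                n = inflater.inflate(out);
            }
            inflater.end();
            return Arrays.copyOf(out, n);
        }
    }

The dictionaryVersion tag in the scheme above would then only need to tell B which agreed-upon dictionary to hand to Inflater.setDictionary.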
Note that exactly this already happens with protocols like protobuf or FIX (in financial applications).
With any message you have a schema (dictionary) that is available on both ends, and then you just send raw binary data; it is efficient and fast, but your schema is fixed and unchanged.
I'm looking for something that can be used for free form text.
Is there any technology that allows this (without some fixed schema)?
You can simply send the many small messages in a single compressed stream. Then they will be able to take advantage of the previous history of small messages. With zlib you can flush out each message, which will avoid having to wait for a whole block to be built up before transmitting. This will degrade compression, but not nearly as much as trying to compress each string individually (which will likely just end up expanding them). In the case of zlib, your dictionary is always the last 32K of messages that you have sent.
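In Java terms, a rough sketch of that single-stream-with-flush idea (Deflater.SYNC_FLUSH needs Java 7+; the 64K output buffer is an assumed per-message upper bound):

    import java.util.Arrays;
    import java.util.zip.Deflater;

    // One long-lived Deflater per connection: every message benefits from the
    // sliding window (the last 32K) of everything sent before it.
    class MessageStreamCompressor {
        private final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        private final byte[] out = new byte[64 * 1024];   // assumed per-message upper bound

        // Compress one small message and flush it so it can be transmitted immediately.
        byte[] compressMessage(byte[] rawMsg) {
            deflater.setInput(rawMsg);
            int n = deflater.deflate(out, 0, out.length, Deflater.SYNC_FLUSH);
            return Arrays.copyOf(out, n);
        }
    }

On B's side, a single long-lived Inflater consumes the same stream in order, so it always has the 32K of history needed to decode each flushed message.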
I want to stream data over the network continuously. The source gives me a byte array that I want to store in a data structure which serves as a buffer to compensate for any network lag.
What is the most efficient data structure to store the bytes in a queue fashion? Think of it as a pipe where one thread pumps in the data and the other reads it and sends it over the network, while the pipe itself is long enough to contain multiple frames of the input data.
Is Queue efficient enough?
A Queue would not be efficient if you put bytes in one at a time. It would eat lots of memory, create GC pressure, and slow things down.
You could make the overhead of Queues reasonable if you put reasonably-sized (say 64kB) byte[]s or ByteBuffers in them. That buffer size could be tunable and changed based on performance experiments or perhaps even be adaptive at runtime.
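A minimal sketch of such a pipe, assuming an ArrayBlockingQueue of 64kB chunks (the capacity and chunk size are just starting points to tune):

    import java.nio.ByteBuffer;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Bounded pipe between the producer (source) and the consumer (network sender).
    class ChunkPipe {
        // 256 chunks of 64kB is roughly 16MB of buffering; tune both numbers.
        private final BlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<>(256);

        // Producer thread: blocks if the pipe is full (natural backpressure).
        void write(byte[] chunk) throws InterruptedException {
            queue.put(ByteBuffer.wrap(chunk));
        }

        // Consumer thread: blocks until a chunk is available, then sends it over the network.
        ByteBuffer read() throws InterruptedException {
            return queue.take();
        }
    }

The bounded capacity also gives you backpressure: if the network stalls for longer than the pipe can absorb, the producer blocks instead of exhausting memory.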
TCP already compensates for network lags. If you are using UDP then you will need to handle congestion properly or things will go badly. In practice using TCP or UDP directly creates a lot of extra work and reinvention of wheels.
ZeroMQ (or the pure Java JeroMQ) is a good library option with an efficient wire protocol (good enough for realtime stock trading platforms). It handles the queueing transparently and gives you a lot of options for different client models, including things like PUB/SUB, which would help if you have lots of clients on a broadcast. Within a process, ZeroMQ can manage the queueing of data between producers and consumers. You could even use it to efficiently broadcast the same bytes to workers that do independent things with the same stream (e.g. one doing usage metering and another doing transcoding).
There are other libraries that may also work. I think Netty handles things like this efficiently for example.
You should look into the Okio library.
I want to ask this as a general question. If we have a program which reads data from outside the program, should we first put the data into a container and then manipulate it, or should we work directly on the stream, assuming the language's stream API is powerful enough?
For example, I am writing a program which reads from a text file. Should I first put the data into a string and then manipulate it, rather than working directly on the stream? I am using Java, and let's say its stream classes are powerful enough for my needs.
Stream processing is generally preferable to accumulating data in memory.
Why? One obvious reason is that the file you are reading might not even fit into memory. You might not even know the size of the data before you've read it completely (imagine that you are reading from a socket or a pipe rather than a file).
It is also more efficient, especially when the size isn't known ahead of time: allocating large chunks of memory and moving data around between them can be taxing. Things like processing and concatenating large strings aren't free either.
If the I/O is slow (ever tried reading from a tape?) or if the data is being produced in real time by a peer process (socket/pipe), your processing of the data can, at least in part, happen in parallel with reading, which will speed things up.
Stream processing is inherently easier to scale and parallelize if necessary, because your logic is forced to depend only on the current element being processed and is free from state. If the amount of data becomes too large to process sequentially, you can trivially scale your app by adding more readers and splitting the stream between them.
You might argue that none of this matters because the file you are reading is only 300 bytes. Indeed, for small amounts of data this is not crucial (you may also bubble sort it while you are at it), but adopting good patterns and practices makes you a better programmer and will help when it does matter. There is no disadvantage to it. No, it does not make your code more complicated. It might seem so at first, but that's simply because you are not used to stream processing. Once you get into the right mindset and it becomes natural to you, you'll see that, if anything, code that deals with one small piece of data at a time, without caring about indexes, pointers and positions, is simpler than the alternative.
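For example, in Java the streaming version is no more code than the accumulating one; a minimal sketch (the file name and the search string are made up):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class LineCount {
        public static void main(String[] args) throws IOException {
            Path file = Paths.get("input.txt");   // hypothetical input file

            // Stream processing: one line is held in memory at a time.
            try (Stream<String> lines = Files.lines(file)) {
                long matches = lines.filter(line -> line.contains("ERROR")).count();
                System.out.println(matches);
            }

            // The accumulating alternative: the whole file must fit in memory first.
            // List<String> all = Files.readAllLines(file);
        }
    }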
All of the above applies to sequential processing, though. You read the stream once, processing the data immediately as it comes in, and discard it (or perhaps write it out to the next stream in the pipeline).
You mentioned RandomAccessFile ... that's a completely different beast. If you need random access, and the data fits in memory, put it in memory. Seeking the file back and forth is the same thing conceptually, only much slower. There is no benefit to it other than saving memory.
You should certainly process it as you receive it. The other way adds latency and doesn't scale.
I have an image upload servlet which receives images uploaded via HTTP POST; they are high-resolution images with sizes varying from 5 MB to 75 MB. The image data is read from the request input stream and saved onto a local disk. I am looking for an efficient mechanism to generate thumbnails in parallel (or part-sequentially if not fully parallel) in several sizes (4-5 different sizes, of which the largest is the web image, 1024x768) from the request input stream, along with saving the stream to disk as the original uploaded file.
What I have come up with so far is:
Save the original stream as an image file to disk.
Generate webimage (1024x768) which is the largest among the lot of thumbnails.
Then use this to generate subsequent smaller images as it would be faster.
Could someone please suggest a better, more efficient way? The most desired approach would be to do this synchronously, but async is also fine if it's very efficient.
Any help in this regard will be much appreciated, preferably in Java.
This is quite an interesting question, as it has a number of points of optimisation.
Your idea of generating a smaller image and then generating the thumbnails from that is probably a good one, but the first thing I would say is that if you have a 75MB image, then it is clearly far bigger than 1024x768 - most likely some multiple of it - in which case you want to make certain you scale the image using SCALE_FAST (Image). What you want is for the scaling to chop the image down to a smaller size by discarding pixels, rather than trying to do anything nicer-looking (and much more expensive) like area averaging. You may even be able to get it to go faster by grabbing the int[] for the image and sampling every Nth element to create a new int[] for a new image, scaled down by some factor.
At that point you will have a smaller image, say roughly 2000 by 2000. You can then take that image and scale it using something nicer looking for the actual thumbnails like SCALE_SMOOTH.
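A rough sketch of that two-stage approach; the ImageIcon trick to force the lazily produced scaled image to finish loading, and the RGB image type, are my own assumptions:

    import java.awt.Graphics2D;
    import java.awt.Image;
    import java.awt.image.BufferedImage;
    import javax.swing.ImageIcon;

    class TwoStageScaler {
        // Stage 1: cheap reduction of the huge original down to an intermediate size.
        static BufferedImage fastIntermediate(BufferedImage original, int w, int h) {
            return toBuffered(original.getScaledInstance(w, h, Image.SCALE_FAST), w, h);
        }

        // Stage 2: nicer-looking scaling from the intermediate image to each thumbnail size.
        static BufferedImage smoothThumbnail(BufferedImage intermediate, int w, int h) {
            return toBuffered(intermediate.getScaledInstance(w, h, Image.SCALE_SMOOTH), w, h);
        }

        private static BufferedImage toBuffered(Image scaled, int w, int h) {
            Image loaded = new ImageIcon(scaled).getImage();   // forces the lazy scaled image to finish
            BufferedImage result = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = result.createGraphics();
            g.drawImage(loaded, 0, 0, null);
            g.dispose();
            return result;
        }
    }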
I would say that you should not write to disk if at all possible (during processing, anyway). If you can do the operation in memory it will be far faster, and that is doubly important where there is parallelism. Unless your server is running an SSD, having two disk-heavy operations running at the same time (like two of these images being rescaled at the same time, or one image being rescaled to two different sizes) will force your disk to thrash (since the spindle can only read one stream at a time). You will then be at the mercy of your seek time, and you'll quickly find that serialising all the operations is far faster than doing several at once.
I would say rescale them in memory, then write them (synchronized) to an ArrayList, and have another thread read those images sequentially and store them. If you're not sure what I mean, have a look at my answer to another question here:
Producer Consumer solution in Java
This way you parallelise where it's useful (CPU operations) and do the file writes sequentially (avoiding thrashing).
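A condensed sketch of that split, with a made-up Job type and a single writer thread draining a BlockingQueue:

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import javax.imageio.ImageIO;

    // CPU-bound scalers produce; one disk-bound writer consumes, so writes never overlap.
    class ThumbnailWriter implements Runnable {
        static final class Job {
            final BufferedImage image;
            final File target;
            Job(BufferedImage image, File target) { this.image = image; this.target = target; }
        }

        private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();

        void submit(BufferedImage image, File target) throws InterruptedException {
            queue.put(new Job(image, target));   // called by the scaling threads
        }

        @Override
        public void run() {
            try {
                while (true) {
                    Job job = queue.take();                        // wait for the next finished thumbnail
                    ImageIO.write(job.image, "jpg", job.target);   // sequential disk write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();                // stop when interrupted
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }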
Having said that, you need to ask yourself whether parallelising is going to benefit you at all. Does your server have multiple CPUs/cores? If not, then this is all moot and you should not bother threading anything, because it will only lose you time.
Further to this, if you are expecting a lot of these images to be uploaded at once, then you may not need to parallelise the processing of each image, since most of the time you will end up with multiple web server threads each processing one image, and that will give you good CPU utilisation across more than one core anyway. For example, if you expect that at any one time there will be 4 images being uploaded constantly, then this would utilise 4 cores just fine without further parallelisation.
One last note: when you are rescaling the images, once you have the intermediate image you can set the reference to the original to null to facilitate garbage collection, meaning that when you generate the thumbnails you will only have the intermediate image in memory, not the original full-size one.
Let me see if I got this right,
You have one large image and want to perform different operations on it at the same time. Some operations involve disk IO.
Option 1,
Start 1 thread to save the original hi res image to disk. This will take a long time compared to other operations because disk writing is slow.
Start other threads to create thumbnails of the desired sizes. You need to resize the original image. I believe this can be done by cloning the bytes of the original image (in Java I assume a BufferedImage). You can then resize the clones to the sizes you wish. The resizing operation is fast compared to writing to disk.
If you have 1 thread per thumbnail, you can use these threads to save their thumbnails to disk. The problem is that you will finish making the thumbnails quickly, and all these threads will then be writing to disk almost at once. They may also be writing to different disk locations, instead of being grouped in the same physical area of the disk (a locality issue). The result is that the disk writes will be even slower than not doing this in parallel, because the disk has to seek to the new location and write a bit of data, then the CPU context-switches to another thread which writes to another part of the disk (so another seek), and so on. So this idea is slow.
Note: use an ExecutorService, which has a thread pool, instead of individual threads. In my example I used 1 thread per thumbnail because it makes it easier to explain.
Option 2,
Another way you can do it is to designate one thread to do the disk writing, and a few other worker threads to do the resizing. Cache all thumbnails into a list, from which the thread that writes to disk takes them one by one and writes them out.
Option 3,
Lastly, if you have multiple disks, you can give each thread a disk to write to, then all writes will be in parallel (more or less).
If you have RAID, the writes will be faster, but not as fast as I just mentioned above, because the files are written in series, not in parallel. RAID parallelizes writing parts of the same file (to different disks at once).
I need a byte buffer class in Java for single-threaded use. The buffer should resize when it's full, rather than throw an exception or something. Very important issue for me is performance.
What would you recommend?
ADDED:
At the moment I use ByteBuffer, but it cannot resize. I need one that can resize.
Any reason not to use the boring normal ByteArrayOutputStream?
As mentioned by miku above, Evan Jones gives a review of different types and shows that it is very application dependent. So without knowing further details it is hard to speculate.
I would start with ByteArrayOutputStream, and only if profiling shows it is your performance bottleneck move to something else. Often when you believe the buffer code is the bottleneck, it will actually be network or other IO - wait until profiling shows you need an optimisation before wasting time finding a replacement.
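For reference, ByteArrayOutputStream already gives you the resize-on-demand behaviour you asked for; a trivial example:

    import java.io.ByteArrayOutputStream;

    public class GrowableBufferDemo {
        public static void main(String[] args) {
            // Starts small and grows automatically; no exception when the initial capacity is exceeded.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(1024);
            byte[] chunk = new byte[4096];
            for (int i = 0; i < 100; i++) {
                buffer.write(chunk, 0, chunk.length);   // write(byte[], int, int) throws no checked exception
            }
            byte[] contents = buffer.toByteArray();     // copy out the accumulated bytes
            System.out.println(contents.length);        // 409600
        }
    }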
If you are moving to something else, then other factors you will need to think about:
You have said this is for single-threaded use, so BAOS's synchronization is not needed.
What is the buffer being filled by and fed into? If either end is already wired to use Java NIO, then using a direct ByteBuffer is very efficient.
Are you using a circular buffer or a plain linear buffer? If circular, the Ostermiller Utils are pretty efficient, and GPL'd.
You can use a direct ByteBuffer. Direct memory uses virtual memory to start with and is only allocated to the application when it is used, i.e. the amount of main memory it uses resizes automagically.
Create a direct ByteBuffer larger than you need and it will only consume what you use.
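A minimal illustration of that suggestion (the 64MB figure is arbitrary):

    import java.nio.ByteBuffer;

    public class DirectBufferDemo {
        public static void main(String[] args) {
            // Allocate a direct buffer larger than currently needed, as suggested above.
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            buffer.put("hello".getBytes());
            System.out.println(buffer.position());   // 5
        }
    }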
You can also write manual code that checks the buffer's remaining capacity and, if it is full, allocates a new buffer of greater size and shifts all the data into that new buffer.
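A sketch of that manual grow-and-copy approach, doubling the capacity when the buffer is full (the initial size and growth policy are arbitrary choices):

    import java.nio.ByteBuffer;

    // Grow-by-copy wrapper: when the current buffer cannot take the next write,
    // allocate a larger one and shift the existing data across.
    class ResizableBuffer {
        private ByteBuffer buffer = ByteBuffer.allocate(4096);

        void append(byte[] data) {
            if (buffer.remaining() < data.length) {
                int newCapacity = Math.max(buffer.capacity() * 2, buffer.position() + data.length);
                ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
                buffer.flip();        // switch the old buffer to read mode
                bigger.put(buffer);   // copy all existing data into the new buffer
                buffer = bigger;
            }
            buffer.put(data);
        }

        ByteBuffer contents() {
            ByteBuffer view = buffer.duplicate();
            view.flip();              // readable view of everything appended so far
            return view;
        }
    }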