Assume a multithreaded application scenario, in which every thread acquires some data (one or more files) from the network, performs some processing and then saves the results on the hard disk of the hosting machine.
In such a scenario, there is always the possibility that the disk space is exhausted, leading to unexpected service behavior (e.g., a system crash).
To avoid such a case, it would be helpful if Java provided a means of reserving hard disk space. However, as verified in an earlier question, no such option is available, and even if it were, it could lead to inefficient allocation (e.g., in the case of a decompressing application, which does not know beforehand the total size of the decompressed data).
So, an alternative could be to make "virtual disk space reservations", e.g. by keeping in memory a static registry of the free space and having each thread request capacity from the registry before proceeding.
Are there any better alternatives, or improvements to this approach?
Is there any (preferably open source) Java library that implements such functionality?
An abstract way to implement this might be to take a constant or user-supplied value for how much disk space the multithreaded application is allowed to use, store it in a variable, and provide synchronized methods around it: a thread requests space and receives as much as it needs (but no more than is available), and that amount is subtracted from the total, so other threads see a decreased 'disk space'; once a thread has finished and its data has been deleted, the amount is added back, so the freed 'disk space' becomes usable by other threads.
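A minimal sketch of such a registry, assuming a fixed byte budget (the class and method names are hypothetical, not an existing library):

public final class DiskSpaceRegistry {

    private long freeBytes;

    public DiskSpaceRegistry(long budgetBytes) {
        this.freeBytes = budgetBytes;
    }

    // Tries to reserve the requested number of bytes; returns false if
    // not enough of the budget is left.
    public synchronized boolean tryReserve(long bytes) {
        if (bytes > freeBytes) {
            return false;
        }
        freeBytes -= bytes;
        return true;
    }

    // Returns a previously reserved amount to the pool, e.g. after the
    // thread's files have been deleted.
    public synchronized void release(long bytes) {
        freeBytes += bytes;
    }
}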
EDIT:
it could lead to inefficient allocation (e.g., in the case of a decompressing application, which does not know beforehand the total size of the decompressed data).
If this occurs and the thread 'sees' (through a constant check while extracting the file) that it has reached its limit of 'disk space', it could then request more space if available, or be put back into a queue until the needed space has been freed up by other threads.
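That wait-until-freed behaviour falls out of a standard java.util.concurrent.Semaphore if the budget is expressed in coarse units (megabytes here, since permits are ints); again a sketch of the idea, not an existing library:

import java.util.concurrent.Semaphore;

// Sketch: the disk budget as a semaphore, one permit per megabyte.
// reserve() blocks until other threads have released enough space.
public final class DiskBudget {

    private final Semaphore permits;

    public DiskBudget(int budgetMb) {
        this.permits = new Semaphore(budgetMb, true); // fair, so waiters queue up
    }

    // Blocks until the requested number of megabytes is available; a thread
    // that hits its limit mid-extraction can simply call this again.
    public void reserve(int mb) throws InterruptedException {
        permits.acquire(mb);
    }

    // Called after the thread's data has been deleted or moved.
    public void release(int mb) {
        permits.release(mb);
    }
}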
Couldn't you also simply have an alert that triggers once more than XX% of the given partition is used? That way your admin has time to go in and remove or copy data off, or add additional storage to that mount point.
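A simple check along those lines, using the standard java.io.File free-space queries (the 90% threshold is an arbitrary example):

import java.io.File;

// Sketch: returns true once more than 90% of the partition holding `path` is used.
public final class DiskAlert {
    static boolean partitionAlmostFull(String path) {
        File mount = new File(path);
        long total = mount.getTotalSpace();   // size of the partition in bytes
        long usable = mount.getUsableSpace(); // bytes currently available to this JVM
        return usable < total * 0.10;         // less than 10% left
    }
}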
Related
In other words -- with 1MB stacks (-Xss1m), do you actually bump your RSS by 1M every time you create a thread, or do you just consume 1MB of VSZ, plus a few actual pages top and/or bottom?
In other, other words, on 64b systems, and assuming it does the right thing (just map), is there any real downside to large (say, 10MB) "just-in-case" stacks?
As this answer already says: The Java VM will allocate the memory for the whole stack every time you create a new thread.
That means it depends on your OS and its virtual memory subsystem what happens next.
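For experimenting with this, the standard Thread constructor that takes an explicit stackSize can be used while watching RSS and VSZ externally (e.g. with ps); note the javadoc says the value is only a hint, which some platforms ignore:

// Sketch: request a 10 MB stack per thread. Whether this bumps RSS or only
// VSZ is up to the OS's virtual memory subsystem, as discussed above.
public final class BigStackThreads {
    static Thread withBigStack(Runnable task) {
        long tenMegabytes = 10L * 1024 * 1024;
        return new Thread(null, task, "big-stack-thread", tenMegabytes);
    }
}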
I have an application that produces large results objects and puts them in a queue. Multiple worker threads create the results objects and queue them, and a single writer thread de-queues the objects, converts them to CSV, and writes them to disk. Due to both I/O and the size of the results objects, writing the results takes far longer than generating them. This application is not a server, it is simply a command-line app that runs through a large batch of requests and finishes.
I would like to decrease the overall memory footprint of the application. Using a heap analysis tool (IBM HeapAnalyzer), I am finding that just before the program terminates, most of the large results objects are still on the heap, even though they were de-queued and have no other references to them. That is, they are all root objects. They take up the majority of the heap space.
To me, this means that they made it into tenured heap space while they were still in the queue. As no full GC is ever triggered during the run, that is where they remain. I realize that they should be tenured, otherwise I'd be copying them back and forth within the Eden spaces while they are still in the queue, but at the same time I wish there was something I could do to facilitate getting rid of them after de-queueing, short of calling System.gc().
I realize one way of getting rid of them would be to simply shrink the maximum heap size and trigger a full GC. However the inputs to this program vary considerably in size and I would prefer to have one -Xmx setting for all runs.
Added for Clarification: this is all an issue because there is also a large memory overhead in Eden for actually writing the object out (mostly String instances, which also appear as roots in the heap analysis). There are frequent minor GC's in Eden as a result. These would be less frequent if the result objects were not hanging around in the tenured space. The argument could be made that my real problem is the output overhead in Eden, and I am working on that, but wanted to pursue this tenured issue at the same time.
As I research this, are there any particular garbage collector settings or programmatic approaches I should be focusing on? Note I am using JDK 1.8.
Answer Update: #maaartinus made some great suggestions that helped me avoid queueing (and thus tenuring) the large objects in the first place. He also suggested bounding the queue, which would surely cut down on the tenuring of what I am now queueing instead (the CSV byte[] representations of the results objects). The right mix of thread count and queue bounds will definitely help, though I have not tried this as the problem basically disappeared by finding a way to not tenure the big objects in the first place.
I'm sceptical concerning a GC-related solution, but it looks like you're creating a problem you needn't have:
Multiple worker threads create the results objects and queue them, and a single writer...
... writing the results takes far longer than generating them ...
So it looks like it should actually be the other way round: a single producer and many consumers, to keep the game even.
Multiple writers might not give you much of a speedup, but I'd try it, if possible. The number of producers doesn't matter much as long as you use a bounded queue for their results (I'm assuming they have no substantially sized input, as you haven't mentioned it). This bounded queue would also ensure that the objects never get too old.
In any case, you can use multiple to-CSV converters, effectively replacing a big object by a big String, byte[], or ByteBuffer (assuming you want to do the conversion in memory). The nice thing about the buffer is that you can recycle it, so the fact that it gets tenured is no longer a problem.
You could also use some unmanaged memory, but I really don't believe it's necessary. Simply bounding the queue should be enough, unless I'm missing something.
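A sketch of that arrangement under the stated assumptions (Result and toCsvBytes stand in for the asker's own types): workers convert each result to its CSV bytes before queueing, so the big object becomes garbage immediately, and the bounded queue provides the backpressure:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class CsvPipeline {

    // Bounded: producers block here instead of piling up old, tenured objects.
    private final BlockingQueue<byte[]> csvQueue = new ArrayBlockingQueue<>(100);

    void produce(Result result) throws InterruptedException {
        byte[] csv = toCsvBytes(result); // the big object is garbage right away
        csvQueue.put(csv);               // blocks while the queue is full
    }

    byte[] nextForWriter() throws InterruptedException {
        return csvQueue.take();
    }

    private byte[] toCsvBytes(Result r) { /* conversion omitted */ return new byte[0]; }
}

final class Result { /* the asker's fields omitted */ }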
And by the way, quite often the cheapest solution is to buy more RAM. Really, one hour of work is worth a couple of gigabytes.
Update
how much should I be worried about contention between multiple writer threads, since they would all be sharing one thread-safe Writer?
I can imagine two kinds of problems:
Atomicity: While synchronization ensures that each executed operation happens atomically, it doesn't mean that the output makes any sense. Imagine multiple writers, each of them generating a single CSV, where the resulting file should contain all the CSVs (in any order). Using a PrintWriter would keep each line intact, but it'd intermix lines from different CSVs (a sketch of one fix follows these points).
Concurrency: For example, a FileWriter performs the conversion from chars to bytes, which may in this context end up in a synchronized block. This could reduce parallelism a bit, but as the I/O seems to be the bottleneck, I guess it doesn't matter.
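One way to address the atomicity point (a sketch, not the asker's code): hold the lock on the shared Writer for one complete CSV record rather than one line, so records from different threads cannot interleave:

import java.io.IOException;
import java.io.Writer;

public final class CsvWriting {
    static void writeWholeCsv(Writer sharedWriter, String wholeCsv) throws IOException {
        synchronized (sharedWriter) {  // one record in, one record out, atomically
            sharedWriter.write(wholeCsv);
            sharedWriter.flush();
        }
    }
}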
As the title says, in my module I use a BlockingQueue to deliver my data. The server can produce a large amount of logging information. In order to avoid affecting the server's performance, I wrote multithreaded clients to consume this data and persist it in data caches. Because so much data can be produced per minute, I am confused about what size to give the queue when I initialize it. I know I can set a queue policy so that, if more data is produced than fits, the overflow part is omitted. But what size should I create the queue with, in order to hold as much of this data as I can?
Could you give me some suggestions? As far as I know, it is related to my server's JVM heap size and the size of a single logging record in my JVM?
Make it "as large as is reasonable". For example, if you are OK with it consuming up to 1Gb of memory, then allocate its size to be 1Gb divided by the average number of bytes of the objects in the queue.
If I had to pick a "reasonable" number, I would start with 10000. The reason is that if it grows larger than that, making it larger isn't going to help much, because clearly the logging requirement is outpacing your ability to log, so it's time to back off the clients.
"Tuning" through experimentation is usually the best approach, as it depends on the profile of your application:
If there are highs and lows in your application's activity, then a larger queue will help "smooth out" the load on your server
If your application has a relatively steady load, then a smaller queue is appropriate, as a larger queue only delays the inevitable point when clients are blocked - you would do better to make it smaller and dedicate more resources (a couple more logging threads) to consuming the work.
Note also that a very large queue may impact garbage collection responsiveness to freeing up memory, as it has to traverse a much larger heap (all the objects in the queue) each time it runs, increasing the load on both CPU and memory.
You want to make the size as small as you can without impacting throughput and responsiveness too much. To assess this, you'll need to set up a test server and hit it with a typical load to see what happens. Note that you'll probably need to hit it from multiple machines to put a realistic load on the server, as hitting it from one machine can limit the load due to the number of CPU cores and other resources on the test client machine.
To be frank, I'd just make the size 10000 and tune the number of worker threads rather than the queue size.
Sequential writes to disk are reasonably fast (easily 20 MB per second). Instead of storing data in RAM, you might be better off writing it to disk without worrying about memory requirements. Your clients can then read data from files instead of RAM.
To find the size of a Java object, you can use any Java profiler. YourKit is my favorite.
I think the real problem is not size of queue but what you want to do when things exceed your planned capacity. ArrayBlockingQueue will simply block your threads, which may or may not be the right thing to do. Your options typically are:
1) Block the threads (use ArrayBlockingQueue) based on the memory committed for this purpose.
2) Return an error to the "layer above" and let that layer decide what to do - maybe send an error to the client.
3) Throw away some data - say, whatever was enqueued long ago (sketched below).
4) Start writing to disk once you overflow RAM capacity.
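A minimal sketch of option 3, using only standard BlockingQueue methods (the helper name is made up):

import java.util.concurrent.BlockingQueue;

public final class DropOldestPolicy {
    // When the queue is full, evict the oldest entry instead of blocking
    // the producing thread.
    static <T> void offerDroppingOldest(BlockingQueue<T> queue, T entry) {
        while (!queue.offer(entry)) { // offer() returns false when full, never blocks
            queue.poll();             // drop the head (oldest) element and retry
        }
    }
}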
We have a Swing-based application that does complex processing on data. One of the prerequisites for our software is that any given column cannot have too many unique values. If a column is numeric, the user would need to discretize the data before they could use our tool.
Unfortunately, the algorithms we are using are combinatorially expensive in memory depending on the number of unique values per column. Right now, with the wrong dataset, the app runs out of memory very quickly. Before doing one of these operations that would run out of memory, we should be able to calculate roughly how much memory the operation will need. It would be nice if we could check how much memory the app is currently using, estimate whether the app is going to run out of memory, and show an error message accordingly rather than running out of memory. Using java.lang.Runtime, we can find the free memory, total memory, and max memory, but is this really helpful? Even if it appears we won't have enough heap space, it could be that if we wait 30 milliseconds the garbage collector will run and suddenly we have more than enough heap space for our operation. Is there any way to really predict whether we are going to run out of memory?
I have done something similar for a database application where the number of rows to be loaded could not be estimated. So in the loop that processes the result set, I call a "MemoryWatcher" method that checks how much memory is free.
If the available memory goes under a certain threshold, the watcher forces a garbage collection and re-checks. If there still isn't enough memory, the watcher method signals this to the caller with an exception. The caller can gracefully recover from that exception - as opposed to the OutOfMemoryError, which sometimes leaves Swing totally unstable.
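A minimal sketch of such a watcher under the assumptions above (LowMemoryException is a made-up application exception; the threshold is the caller's choice):

public final class MemoryWatcher {

    private final long thresholdBytes;

    public MemoryWatcher(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    public void check() throws LowMemoryException {
        if (estimatedFree() < thresholdBytes) {
            System.gc(); // request a collection, then re-check
            if (estimatedFree() < thresholdBytes) {
                throw new LowMemoryException("free heap below threshold");
            }
        }
    }

    private long estimatedFree() {
        Runtime rt = Runtime.getRuntime();
        // free heap right now, plus heap the VM could still grow into
        return rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
    }
}

class LowMemoryException extends Exception {
    LowMemoryException(String message) { super(message); }
}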
I don't have expertise in this, but I feel you could take the extra step of bytecode analysis using ASM to preempt bugs like null pointer exceptions, out-of-memory conditions, etc.
Unless you run your application with the maximum amount of memory you need from the outset (using -Xms) I don't think you can achieve anything useful, since other applications will be able to consume memory before your app needs it.
Have you considered using Soft/WeakReferences, and letting garbage collection reap objects that you could possibly recalculate/regenerate on the fly?
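A minimal sketch of that idea, assuming the cached value can be regenerated by a supplied function:

import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Sketch: keep a recomputable value behind a SoftReference so the GC may
// reclaim it under memory pressure and it gets rebuilt on the next access.
public final class SoftCache<T> {

    private final Supplier<T> recompute;
    private SoftReference<T> ref = new SoftReference<>(null);

    public SoftCache(Supplier<T> recompute) {
        this.recompute = recompute;
    }

    public synchronized T get() {
        T value = ref.get();
        if (value == null) { // cleared by the GC, or never computed yet
            value = recompute.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }
}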
I have a loop that spawns a lot of threads. These threads contain, among other things, 2 (massive) StringBuilder objects. These threads then run and do their thing.
However, I noticed that after a certain number of threads, I get strange crashes. I know this is because of these StringBuilders, because when I reduce their initial capacity, I can start a lot more threads. Now, these StringBuilders are created like this in the constructor of the thread object:
StringBuilder a = new StringBuilder(30000);
StringBuilder b = new StringBuilder(30000);
The point where it generally crashes is around 550 threads, which results in a little more than 62 MB. Combined with the rest of the program, the memory in use is most likely 64 MB, which I read online somewhere was the default size of the JVM memory allocation pool. I don't know whether this is true or not.
Now, is there something I am doing wrong - is the design somehow making me allocate memory the wrong way? Or is this the only way, and should I tell the JVM to increase its memory pool? Or something else entirely?
Also, please do not tell me to set a lower capacity; I know these StringBuilders automatically increase their capacity when needed, but I would like a solution to this problem.
Use the -Xmx JVM option to increase the Java maximum heap size.
Before J2SE 5.0, the default maximum heap size was 64MB. You can override this default using the -Xmx command-line option.
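For example, to let the heap grow to 256 MB (the figure is arbitrary):

java -Xmx256m MyApplication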
If you are storing a lot of information in a StringBuilder, are you going to reference back to it at some point? If not then just write it to another medium (DB, File etc). The program has a finite amount of resources, it can't hold all of the state of an entire system at once. -Xmx will give you more space for storage in the memory, however it won't make your storage ability infinite.
Consider using a ThreadPoolExecutor, and set the pool size to the number of CPUs on your machine. Creating more threads than CPUs is just adding overhead.
int cpuCount = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(cpuCount);
Also, you can reduce memory usage by writing your strings to files instead of keeping them in-memory with StringBuilders.
Assume 1MB per thread. That's the RAM cost of creating each one, over and above the memory allocated by its process.
As Gregory said, give the JVM some options like -Xmx.
Also consider using a ThreadPool or Executor to ensure that only a given number of threads run simultaneously. That way the amount of memory can be kept limited without a slowdown (your processor is not capable of running 550 threads at the same time anyway).
And when you're using an Executor don't create the StringBuilders in the constructor, but in the run method.
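Putting those two suggestions together, a sketch (pool sized to the CPU count, builders created inside the task rather than up front):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class PooledWork {
    public static void main(String[] args) {
        int cpuCount = Runtime.getRuntime().availableProcessors();
        ExecutorService service = Executors.newFixedThreadPool(cpuCount);
        for (int i = 0; i < 550; i++) {
            // At most cpuCount tasks run at once, so at most cpuCount
            // pairs of StringBuilders are alive at any moment.
            service.execute(() -> {
                StringBuilder a = new StringBuilder(30000);
                StringBuilder b = new StringBuilder(30000);
                // ... the task's real work with a and b goes here ...
            });
        }
        service.shutdown();
    }
}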
You can use a FileWriter to output text to a file, then pull it back in with a FileReader. That way you'll only need to store the filename in memory, rather than the entire contents of the string.
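A minimal sketch of that (the file-naming scheme is up to you):

import java.io.FileWriter;
import java.io.IOException;

public final class Spill {
    // Writes the builder's contents to a file and returns only the name.
    static String spillToFile(StringBuilder sb, String fileName) throws IOException {
        try (FileWriter out = new FileWriter(fileName)) {
            out.write(sb.toString());
        }
        return fileName; // only this small string needs to stay in memory
    }
}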
To cut down on threads you can use an ExecutorService, or simply use a few threads that read out of a queue.
My guess is that with a little tinkering you can probably get your program down to not needing much memory at all.