Creating multiple threads for writing files to Disk, Java

Creating multiple threads for writing files to Disk, Java - java

I have an piece of code which writes an object to disk as an when the object is put into the LinkedBlockingQueue.
As of now, this is Single threaded. I need to make it multi threaded as the contents are being written to the different files on disk.and therefore, there is no harm in writing them independently.
I am not sure if i can use ThreadPool here as i dont know when the object will be placed on the queue!! now if i decide to have a fixedThreadPool of 5 threads, how do i distribute it among multiple objects?
Any suggestions are highly appreciated.
here is my existing code. I want to Spawn a new thread as and when i get a new object in the queue.

how do i distribute it among multiple objects?
Well, you don't have to worry about task distribution. All you need to do is submit a runnable or callable(which describes your task) and it will be handed to an idle thread in the pool or if all the threads are busy processing, this new task will wait in queue.
Below is what you can try...
1) Create a thread pool that suits your need best.
ExecutorService es = Executors.newFixedThreadPool(desiredNoOfThreads);
2) As you already have the queue in place -
while (true) {
//peek or poll the queue and check for non null value
//if not null then create a runnable or callable and submit
//it to the ExecutorService
}

Generally speaking, if your files are on the same physical device you will get no performance benefit since storage devices work synchronously on read/write operations. So your will get your threads blocked on I/O, which can lead to poorer speed, and definitely will waste threads that could be doing useful work.

Related

Java Thread storing

So, I have a loop where I create thousands of threads which process my data.
I checked and storing a Thread slows down my app.
It's from my loop:
Record r = new Record(id, data, outPath, debug);
//r.start();
threads.add(r);
//id is 4 digits
//data is something like 500 chars long
It stop my for loop for a while (it takes a second or more for one run, too much!).
Only init > duration: 0:00:06.369
With adding thread to ArrayList > duration: 0:00:07.348
Questions:
what is the best way of storing Threads?
how to make Threads faster?
should I create Threads and run them with special executor, means for example 10 at once, then next 10 etc.? (if yes, then how?)

Consider that having a number of threads that is very high is not very useful.
At least you can execute at the same time a number of threads equals to the number of core of your cpu.
The best is to reuse existing threads. To do that you can use the Executor framework.
For example to create an Executor that handle internally at most 10 threads you can do the followig:
List<Record> records = ...;
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Record r : records) {
executor.submit(r);
}
// At the end stop the executor
executor.shutdown();
With a code similar to this one you can submit also many thousands of commands (Runnable implementations) but no more than 10 threads will be created.

I'm guessing that it is not the .add method that is really slowing you down. My guess is that the hundreds of Threads running in parallel is what really is the problem. Of course a simple command like "add" will be queued in the pipeline and can take long to be executed, even if the execution itself is fast. Also it is possible that your data-structure has an add method that is in O(n).
Possible solutions for this:
* Find a real wait-free solution for this. E.g. prioritising threads.
* Add them all to your data-structure before executing them
While it is possible to work like this it is strongly discouraged to create more than some Threads for stuff like this. You should use the Thread Executor as David Lorenzo already pointed out.

I have a loop where I create thousands of threads...
That's a bad sign right there. Creating threads is expensive.
Presumeably your program creates thousands of threads because it has thousands of tasks to perform. The trick is, to de-couple the threads from the tasks. Create just a few threads, and re-use them.
That's what a thread pool does for you.
Learn about the java.util.concurrent.ThreadPoolExecutor class and related classes (e.g., Future). It implements a thread pool, and chances are very likely that it provides all of the features that you need.
If your needs are simple enough, you can use one of the static methdods in java.util.concurrent.Executors to create and configure a thread pool. (e.g., Executors.newFixedThreadPool(N) will create a new thread pool with exactly N threads.)
If your tasks are all compute bound, then there's no reason to have any more threads than the number of CPUs in the machine. If your tasks spend time waiting for something (e.g., waiting for commands from a network client), then the decision of how many threads to create becomes more complicated: It depends on how much of what resources those threads use. You may need to experiment to find the right number.

Difference between these two snippets using Thread and Executor for quick Java threading? [duplicate]

This question already has answers here:
When should we use Java's Thread over Executor?
(7 answers)
Closed 7 years ago.
In Java, both of the following code snippets can be used to quickly spawn a new thread for running some task-
This one using Thread-
new Thread(new Runnable() {
#Override
public void run() {
// TODO: Code goes here
}
}).start();
And this one using Executor-
Executors.newSingleThreadExecutor().execute(new Runnable(){
#Override
public void run() {
// TODO: Code goes here
}
});
Internally, what is the difference between this two codes and which one is a better approach?
Just in case, I'm developing for Android.
Now I think, I was actually looking for use-cases of newSingleThreadExecutor(). Exactly this was asked in this question and answered-
Examples of when it is convenient to use Executors.newSingleThreadExecutor()

Your second example is strange, creating an executor just to run one task is not a good usage. The point of having the executor is so that you can keep it around for the duration of your application and submit tasks to it. It will work but you're not getting the benefits of having the executor.
The executor can keep a pool of threads handy that it can reuse for incoming tasks, so that each task doesn't have to spin up a new thread, or if you pick the singleThread one it can enforce that the tasks are done in sequence and not overlap. With the executor you can better separate the individual tasks being performed from the technical implementation of how the work is done.
With the first approach where you create a thread, if something goes wrong with your task in some cases the thread can get leaked; it gets hung up on something, never finishes its task, and the thread is lost to the application and anything else using that JVM. Using an executor can put an upper bound on the number of threads you lose to this kind of error, so at least your application degrades gracefully and doesn't impair other applications using the same JVM.
Also with the thread approach each thread you create has to be kept track of separately (so that for instance you can interrupt them once it's time to shutdown the application), with the executor you can shut the executor down once and let it handle its threads itself.

The second using an ExecutorService is definitely the best approach.
ExecutorService determines how you want your tasks to run concurrently. It decouples the Runnables (or Callables) from their execution.
When using Thread, you couple the tasks with how you want them to be executed, giving you much less flexibility.
Also, ExecutorService gives you a better way of tracking your tasks and getting a return value with Future while the start method from Thread just run without giving any information. Thread therefore encourages you to code side-effects in the Runnable which may make the overall execution harder to understand and debug.
Also Thread is a costly resource and ExecutorService can handle their lifecycle, reusing Thread to run a new tasks or creating new ones depending on the strategy you defined. For instance: Executors.newSingleThreadExecutor(); creates a ThreadPoolExecutor with only one thread that can sequentially execute the tasks passed to it while Executors.newFixedThreadPool(8)creates a ThreadPoolExecutor with 8 thread allowing to run a maximum of 8 tasks in parallel.

You already have three answers, but I think this question deserves one more because none of the others talk about thread pools and the problem that they are meant to solve.
A thread pool (e.g., java.util.concurrent.ThreadPoolExecutor) is meant to reduce the number of threads that are created and destroyed by a program.
Some programs need to continually create and destroy new tasks that will run in separate threads. One example is a server that accepts connections from many clients, and spawns a new task to serve each one.
Creating a new thread for each new task is expensive; In many programs, the cost of creating the thread can be significantly higher than the cost of performing the task. Instead of letting a thread die after it has finished one task, wouldn't it be better to use the same thread over again to perform the next one?
That's what a thread pool does: It manages and re-uses a controlled number of worker threads, to perform your program's tasks.
Your two examples show two different ways of creating a single thread that will perform a single task, but there's no context. How much work will that task perform? How long will it take?
The first example is a perfectly acceptable way to create a thread that will run for a long time---a thread that must exist for the entire lifetime of the program, or a thread that performs a task so big that the cost of creating and destroying the thread is not significant.
Your second example makes no sense though because it creates a thread pool just to execute one Runnable. Creating a thread pool for one Runnable (or worse, for each new task) completely defeats the purpose of the thread-pool which is to re-use threads.
P.S.: If you are writing code that will become part of some larger system, and you are worried about the "right way" to create threads, then you probably should also learn what problem the java.util.concurrent.ThreadFactory interface was meant to solve.
Google is your friend.

According to documentation of ThreadPoolExecutor
Thread pools address two different problems: they usually provide
improved performance when executing large numbers of asynchronous
tasks, due to reduced per-task invocation overhead, and they provide a
means of bounding and managing the resources, including threads,
consumed when executing a collection of tasks. Each ThreadPoolExecutor
also maintains some basic statistics, such as the number of completed
tasks.
First approach is suitable for me if I want to spawn single background processing and for small applications.
I will prefer second approach for controlled thread execution environment. If I use ThreadPoolExecutor, I am sure that 1 thread will be running at time , even If I submit more threads to executor. Such cases are tend to happen if you consider large enterprise application, where threading logic is not exposed to other modules. In large enterprise application , you want to control the number of concurrent running threads. So second approach is more pereferable if you are designing enterprise or large scale applications.

How do I reuse Threads with different ExecutorService objects?

Is it possible to have one thread pool for my whole program so that the threads are reused, or do I need to make the ExecutorService global/ pass it to all objects using it.
To be more precise I have multiple tasks that run in my program but they do not run extremely often.
ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1);
I believe that it would be unnecessary to have a full thread running all the time for every single task but it might also be costly to restart the thread every single time when a task is executed.
Is there a better alternative to making the Thread pool global?

How do I reuse Threads with different ExecutorService objects?
It is not possible to re-use threads across different ExecutorService thread-pools. You can certainly submit vastly different types of Runnable classes to a common thread-pool however.
Is there a better alternative to making the Thread pool global?
I don't see a problem with a "global" thread-pool in your application. Someone needs to know when to call shutdown() on it of course but that's the only problem I see with it. If you have a lot of disparate classes which are submitting tasks, they all could access this set (or 1) of common background threads.
You may find however that different tasks may want to use a cached thread pool while others need a fixed sized pool so that multiple pools are still necessary.
I believe that it would be unnecessary to have a full thread running all the time for every single task but it might also be costly to restart the thread every single time when a task is executed.
In general, unless you are forking tons and tons of threads, the relative cost of starting one up every so often is relatively small. Unless you have evidence from a profiler or some other source, this may be premature optimization.

With Java 8 there is a new solution.
The fork join global thread pool:
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html#commonPool--

Examples of when it is convenient to use Executors.newSingleThreadExecutor()

Could please somebody tell me a real life example where it's convenient to use this factory method rather than others?
newSingleThreadExecutor
public static ExecutorService newSingleThreadExecutor()
Creates an Executor that uses a single worker thread operating off an
unbounded queue. (Note however that if this single thread terminates
due to a failure during execution prior to shutdown, a new one will
take its place if needed to execute subsequent tasks.) Tasks are
guaranteed to execute sequentially, and no more than one task will be
active at any given time. Unlike the otherwise equivalent
newFixedThreadPool(1) the returned executor is guaranteed not to be
reconfigurable to use additional threads.
Thanks in advance.

Could please somebody tell me a real life example where it's convenient to use [the newSingleThreadExecutor() factory method] rather than others?
I assume you are asking about when you use a single-threaded thread-pool as opposed to a fixed or cached thread pool.
I use a single threaded executor when I have many tasks to run but I only want one thread to do it. This is the same as using a fixed thread pool of 1 of course. Often this is because we don't need them to run in parallel, they are background tasks, and we don't want to take too many system resources (CPU, memory, IO). I want to deal with the various tasks as Callable or Runnable objects so an ExecutorService is optimal but all I need is a single thread to run them.
For example, I have a number of timer tasks that I spring inject. I have two kinds of tasks and my "short-run" tasks run in a single thread pool. There is only one thread that executes them all even though there are a couple of hundred in my system. They do routine tasks such as checking for disk space, cleaning up logs, dumping statistics, etc.. For the tasks that are time critical, I run in a cached thread pool.
Another example is that we have a series of partner integration tasks. They don't take very long and they run rather infrequently and we don't want them to compete with other system threads so they run in a single threaded executor.
A third example is that we have a finite state machine where each of the state mutators takes the job from one state to another and is registered as a Runnable in a single thread-pool. Even though we have hundreds of mutators, only one task is valid at any one point in time so it makes no sense to allocate more than one thread for the task.

Apart from the reasons already mentioned, you would want to use a single threaded executor when you want ordering guarantees, i.e you need to make sure that whatever tasks are being submitted will always happen in the order they were submitted.

The difference between Executors.newSingleThreadExecutor() and Executors.newFixedThreadPool(1) is small but can be helpful when designing a library API. If you expose the returned ExecutorService to users of your library and the library works correctly only when the executor uses a single thread (tasks are not thread safe), it is preferable to use Executors.newSingleThreadExecutor(). Otherwise the user of your library could break it by doing this:
ExecutorService e = myLibrary.getBackgroundTaskExecutor();
((ThreadPoolExecutor)e).setCorePoolSize(10);
, which is not possible for Executors.newSingleThreadExecutor().

It is helpful when you need a lightweight service which only makes it convenient to defer task execution, and you want to ensure only one thread is used for the job.

ThreadPoolExecutor - Specifying Which Thread Handles a Given Task

Is there a good way to implement an execution policy that determines which Thread will handle a given task based on some identification scheme? or is this even a good approach?
I have a requirement to process 1-many files, which I will receive in interleaved chunks. as the chunks arrive I want to make a task out of processing that chunk. The catch is that I do no have the luxury of making the processing code thread-safe, so once a thread in the pool has processed a chunk from a file, i need that same thread to process the rest of that file. I don't care if a thread is processing several files at once, but I cannot have more than one thread from a pool processing the same file at once.
the book "Java Concurrency in Practice" states that you can use execution policies to determine "in what thread will a task be executed?", but I do not grasp how.
Thanks

Well, you could write your own ThreadPoolExecutor - but in general there's no way of doing this. The whole point of a thread pool is that you just throw work at it, without caring which thread gets which task. It sounds like you'll need to manage the threads yourself in this case, keeping a map of which thread is handling which file.
Do you know when a file has been finished? If not, you're going to potentially have problems with an ever-growing map...

A good idea might be a Thread per file:
HashMap<String, MyThreadImplementer> fileToThreadMap...
class MyThreadImplementer implements Runnable {
int maxNumParts;
private List<FileChunk> chunkList...
private List<FileChunk> doneChunks...
public MyThreadImplementer(int maxNumberOfParts) {
maxNumParts=maxNumberOfParts;
}
public void run() {
while( doneChunks.size() < maxNumParts ) {
Thread.sleep(...)
if ( !chunkList.isEmpty() ) {
process each chunk in list and mvoe to done chunks
}
}
}
}
But you'd need to be careful you don't process 1000 files, and thereby create 1000 threads.

You say that you "do no have the luxury of making the processing code thread-safe", but this does not imply that you need to map files to specific threads. It just means that you can't start processing the next chunk from a file until the last chunk from that file has finished processing.
Taking advantage of java.util.concurrent, you could maintain a Map<String, LinkedBlockingQueue<FileChunk>> (assuming filename as key) in the main thread and assign each chunk to the queue for the respective file as chunks come in. Then have one Runnable blocking on each queue.
That way, only one thread at a time would be processing any given file. And you wouldn't need to directly mess with threads or maintain multiple thread pools.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.