I have some Java code that reads an email inbox.
Once this service is turned on, it spawns a parent thread that checks every 5 seconds whether new mail has arrived. If no mail has arrived, it sleeps for 5 seconds. If new mail has arrived, then depending on the size of the burst it spawns up to 10 worker threads to parse those emails. Once all the mails are parsed and no new mail has arrived, the worker threads are killed after 5 seconds of inactivity. The parent thread keeps polling ALWAYS.
There are 7-8 such services, reading different inboxes, that keep running on my AWS machine, which has a 4-core CPU. These services are eating up to 350% of my CPU, which I can see via the "top" command.
I want to know if there is a way to stop these threads from eating CPU all the time. This is slowing down all other processes, because they are starved of CPU time by the contention.
This is the code in parent thread
@Override
public void run() {
try {
while (!this.isThreadKillRequested()) {
if (this.getMessageCount() > 0) {
WorkerThread worker = getWorkerThread();
if (worker != null && !worker.isAlive()) {
worker.start();
}
} else {
if(this.isAllThreadIdle()){
//ModelUtil.printOutput("all idle. nothing to do");
}
}
Thread.sleep(MESSAGE_PROCESSOR_SLEEP_TIME);
}
} catch (InterruptedException e) {
ErrorLogger.logError("InterruptedException exception in monitoring thread",
GlobalConfig.msgException + e.getMessage());
e.printStackTrace();
}
}
private WorkerThread getWorkerThread() {
WorkerThread worker = null;
for (Map.Entry<String, ThreadPerformance> entry : this.threadPool.entrySet()) {
ThreadPerformance p = entry.getValue();
if (p.isThreadIdle()) {
worker = p.getThisThread();
break;
}
}
if (worker == null && this.threadPool.size() < MAX_POOL_SIZE) {
double overallThroughput = 0.00;
//some logic to calculate throughput
if (overallThroughput < MIN_THROUGHPUT) {
worker = new WorkerThread(this, this.getUniqueThreadId());
//add in pool
}
System.out.println("Overall Throughput - " + overallThroughput);
System.out.println("Pool Size - " + this.threadPool.size());
}
return worker;
}
The easiest way to limit the threads' CPU consumption is to slow down the refresh cycle and consolidate threads. Try polling every 20 seconds and limiting the number of worker threads to 4. Unless all of the threads and a fast refresh rate are really needed, they put unnecessary pressure on the CPU, especially when 8 mailboxes are being checked.
The probability of all mailboxes receiving a large amount of new mail at the same time is low. Thus, a total thread count can be fixed and the threads distributed among the mailboxes in proportion to each mailbox's share of new items. This will increase the throughput for the mailbox that needs it the most while limiting the number of threads to a manageable amount.
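As a rough illustration of the shared-pool idea, here is a minimal sketch. It assumes a hypothetical `Mailbox` interface and a placeholder `parse` method, neither of which is in the original code. One scheduler thread polls every mailbox, and a single fixed-size pool does the parsing, so a busy mailbox automatically gets a larger share of the workers because it submits more tasks.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical abstraction over one inbox; not part of the original code.
interface Mailbox {
    /** Returns the next unread message, or null when the inbox is empty. */
    String poll();
}

final class SharedMailPoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    // One bounded pool shared by every mailbox, instead of up to 10 threads per inbox.
    private final ExecutorService workers;
    private final List<Mailbox> mailboxes;
    private final AtomicInteger parsed = new AtomicInteger();

    SharedMailPoller(List<Mailbox> mailboxes, int totalWorkers) {
        this.mailboxes = mailboxes;
        this.workers = Executors.newFixedThreadPool(totalWorkers);
    }

    /** Polls all mailboxes once per period; idle workers block without using CPU. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, period, unit);
    }

    /** Drains every mailbox and hands each message to the shared pool. */
    void pollOnce() {
        for (Mailbox box : mailboxes) {
            String message;
            while ((message = box.poll()) != null) {
                final String m = message;
                workers.execute(() -> parse(m));
            }
        }
    }

    // Placeholder for the real parsing logic.
    private void parse(String message) {
        parsed.incrementAndGet();
    }

    /** Stops polling, waits for queued messages to be parsed, returns the parse count. */
    int stopAndCount() {
        scheduler.shutdownNow();
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parsed.get();
    }
}
```

With the numbers suggested above this would be started as `poller.start(20, TimeUnit.SECONDS)` with `totalWorkers` set to 4.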
I am reading the source code of ThreadPoolExecutor.java, for the execute method below:
public void execute(Runnable command) {
if (command == null)
throw new NullPointerException();
/*
* Proceed in 3 steps:
*
* 1. If fewer than corePoolSize threads are running, try to
* start a new thread with the given command as its first
* task. The call to addWorker atomically checks runState and
* workerCount, and so prevents false alarms that would add
* threads when it shouldn't, by returning false.
*
* 2. If a task can be successfully queued, then we still need
* to double-check whether we should have added a thread
* (because existing ones died since last checking) or that
* the pool shut down since entry into this method. So we
* recheck state and if necessary roll back the enqueuing if
* stopped, or start a new thread if there are none.
*
* 3. If we cannot queue task, then we try to add a new
* thread. If it fails, we know we are shut down or saturated
* and so reject the task.
*/
int c = ctl.get();
if (workerCountOf(c) < corePoolSize) {
if (addWorker(command, true))
return;
c = ctl.get();
}
if (isRunning(c) && workQueue.offer(command)) {
int recheck = ctl.get();
if (! isRunning(recheck) && remove(command))
reject(command);
else if (workerCountOf(recheck) == 0)
addWorker(null, false);
}
else if (!addWorker(command, false))
reject(command);
}
Assume the thread pool has 2 core threads and a max pool size of 4.
I can understand the code if (workerCountOf(c) < corePoolSize) { addWorker(..) }: if the current worker count is less than the core pool size, it just creates a new thread to handle the runnable command.
What I cannot understand is this: say we have already called execute(runnable) twice, and each task needs a long time to complete, so both threads are still busy, and now we call it a third time.
What will the code do? I think it goes to if (isRunning(c) && workQueue.offer(command)) {, so the command gets added to the work queue. However, I don't understand which thread will execute this 3rd command. Given the code else if (workerCountOf(recheck) == 0), I think the worker count should be 2, because we have already added two workers.
So my question is: when will the 3rd worker be added?
--Edit--
My testing code:
public class ThreadPoolExecutorTest {
public static void main(String[] args) {
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(
2,
4,
60,
TimeUnit.SECONDS,
new ArrayBlockingQueue<>(4)
);
threadPoolExecutor.execute(new Command("A"));
threadPoolExecutor.execute(new Command("B"));
threadPoolExecutor.execute(new Command("C"));
}
static class Command implements Runnable {
private String task;
Command(String task) {
this.task = task;
}
@Override
public void run() {
try {
Thread.sleep(1000 * 10);
System.out.println(new Date() + " - " + Thread.currentThread().getName() + " : " + task);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
It prints:
Thu Jun 13 17:44:30 CST 2019 - pool-1-thread-1 : A
Thu Jun 13 17:44:30 CST 2019 - pool-1-thread-2 : B
Thu Jun 13 17:44:40 CST 2019 - pool-1-thread-1 : C
With the testing code I expect the core workers to stay busy for 10 seconds, so when I call execute for "C" I want to hit the case 'core workers are busy and the 3rd worker will be added', but it seems that no 3rd worker is created. What's wrong?
Thanks.
I want to hit the case 'core workers are busy and the 3rd worker will be added'
Then you also have to fill up the queue.
Javadoc says:
When a new task is submitted in method execute(java.lang.Runnable), and fewer than corePoolSize threads are running, a new thread is created to handle the request, even if other worker threads are idle. If there are more than corePoolSize but less than maximumPoolSize threads running, a new thread will be created only if the queue is full.
Suppose
N = number of threads currently in the pool.
C = core size of the pool.
M = maximum size of the pool.
BQ = bounded blocking queue (with a predefined capacity).
UQ = unbounded blocking queue (without a predefined capacity).
DHQ = direct hand-off queue.
Then
1. If BQ
A. If N < C, a new thread is always created when a task is submitted,
whether or not an idle thread is present in the pool.
B. Once the core pool size is reached (N = C), the executor puts each
new task in the queue.
Idle threads pick up their next task from the queue.
C. When BQ is full, the executor starts creating new threads again, until
N reaches M.
So once N = C, new threads are created only when the queue
is full.
D. Once N = M and BQ is also full, the executor does not accept any more
tasks. By default it throws RejectedExecutionException.
2. If UQ
A. Same as above.
B. Same as above.
C. Not applicable. Why? Because the queue is unbounded.
(UQ capacity is Integer.MAX_VALUE.)
D. M has no effect. Why?
New threads beyond C are created only after the queue is full, but
a UQ is never full.
So once N = C, no new thread is ever created for a newly submitted task.
This means the number of threads in the pool stays at C (N = C always)
with a UQ, whatever the value of M.
3. If DHQ
A. A direct hand-off queue never holds tasks. It hands the task
straight to an idle thread if there is one, and otherwise the executor
creates a new thread (the number of queued tasks is always 0).
B. C matters little with this queue. Threads are created until
N reaches M.
C. Once N reaches M (N = M) and no thread is idle, a submitted task is
rejected.
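The bounded-queue case can be checked with a small experiment. The sketch below reuses the configuration from the question (core 2, max 4, `ArrayBlockingQueue` of capacity 4); the class name `PoolGrowthDemo` and the latch that keeps workers busy are illustrative additions. It submits a number of blocking tasks and reports `getPoolSize()`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolGrowthDemo {
    /** Submits `tasks` blocking tasks (core=2, max=4, queue=4) and returns the pool size. */
    static int poolSizeAfter(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(4));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();   // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();           // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(poolSizeAfter(3));  // 2: third task waits in the queue
        System.out.println(poolSizeAfter(6));  // 2: queue now holds 4 tasks
        System.out.println(poolSizeAfter(7));  // 3: queue full, first extra thread
        System.out.println(poolSizeAfter(8));  // 4: pool at maximum size
    }
}
```

A ninth blocking task would be rejected, because both the pool and the queue are full at that point.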
I am using Java's concurrency library ExecutorService to run my tasks. The threshold for writing to the database is 200 QPS, however, this program can only reach 20 QPS with 15 threads. I tried 5, 10, 20, 30 threads, and they were even slower than 15 threads. Here is the code:
ExecutorService executor = Executors.newFixedThreadPool(15);
List<Callable<Object>> todos = new ArrayList<>();
for (final int id : ids) {
todos.add(Executors.callable(() -> {
try {
TestObject test = testServiceClient.callRemoteService();
SaveToDatabase();
} catch (Exception ex) {}
}));
}
try {
executor.invokeAll(todos);
} catch (InterruptedException ex) {}
executor.shutdown();
1) I checked the CPU usage of the linux server on which this program is running, and the usage was 90% and 60% (it has 4 CPUs). The memory usage was only 20%. So the CPU & memory were still fine. The database server's CPU usage was low (around 20%). What could prevent the speed from reaching 200 QPS? Maybe this service call: testServiceClient.callRemoteService()? I checked the server configuration for that call and it allows high number of calls per seconds.
2) If the count of id in ids is more than 50000, is it a good idea to use invokeAll? Should we split it to smaller batches, such as 5000 each batch?
There is nothing in this code which prevents this query rate, except that creating and destroying a thread pool repeatedly is very expensive. I suggest using the Streams API, which is not only simpler but also reuses a built-in thread pool:
int[] ids = ....
IntStream.of(ids).parallel()
.forEach(id -> testServiceClient.callRemoteService(id));
Here is a benchmark using a trivial service. The main overhead is the latency in creating the connection.
public static void main(String[] args) throws IOException {
ServerSocket ss = new ServerSocket(0);
Thread service = new Thread(() -> {
try {
for (; ; ) {
try (Socket s = ss.accept()) {
s.getOutputStream().write(s.getInputStream().read());
}
}
} catch (Throwable t) {
t.printStackTrace();
}
});
service.setDaemon(true);
service.start();
for (int t = 0; t < 5; t++) {
long start = System.nanoTime();
int[] ids = new int[5000];
IntStream.of(ids).parallel().forEach(id -> {
try {
Socket s = new Socket("localhost", ss.getLocalPort());
s.getOutputStream().write(id);
s.getInputStream().read();
} catch (IOException e) {
e.printStackTrace();
}
});
long time = System.nanoTime() - start;
System.out.println("Throughput " + (int) (ids.length * 1e9 / time) + " connects/sec");
}
}
prints
Throughput 12491 connects/sec
Throughput 13138 connects/sec
Throughput 15148 connects/sec
Throughput 14602 connects/sec
Throughput 15807 connects/sec
Using an ExecutorService would be better, as @grzegorz-piwowarek mentions.
ExecutorService es = Executors.newFixedThreadPool(8);
for (int t = 0; t < 5; t++) {
long start = System.nanoTime();
int[] ids = new int[5000];
List<Future<?>> futures = new ArrayList<>(ids.length);
for (int id : ids) {
futures.add(es.submit(() -> {
try {
Socket s = new Socket("localhost", ss.getLocalPort());
s.getOutputStream().write(id);
s.getInputStream().read();
} catch (IOException e) {
e.printStackTrace();
}
}));
}
for (Future<?> future : futures) {
future.get();
}
long time = System.nanoTime() - start;
System.out.println("Throughput " + (int) (ids.length * 1e9 / time) + " connects/sec");
}
es.shutdown();
In this case it produces much the same results.
Why do you restrict yourself to such a low number of threads?
You're missing performance opportunities this way. It seems that your tasks are really not CPU-bound. The network operations (remote service + database query) may take up the majority of time for each task to finish. During these times, where a single task/thread needs to wait for some event (network,...), another thread can use the CPU. The more threads you make available to the system, the more threads may be waiting for their network I/O to complete while still having some threads use the CPU at the same time.
I suggest you drastically ramp up the number of threads for the executor. As you say that both remote servers are rather under-utilized, I assume the host your program runs at is the bottleneck at the moment. Try to increase (double?) the number of threads until either your CPU utilization approaches 100% or memory or the remote side become the bottleneck.
By the way, you shut down the executor, but do you actually wait for the tasks to terminate? How do you measure the "QPS"?
One more thing comes to my mind: How are DB connections handled? I.e. how are SaveToDatabase()s synchronized? Do all threads share (and compete for) a single connection? Or, worse, will each thread create a new connection to the DB, do its thing, and then close the connection again? This may be a serious bottleneck because establishing a TCP connection and doing the authentication handshake may take up as much time as running a simple SQL statement.
If the count of id in ids is more than 50000, is it a good idea to use
invokeAll? Should we split it to smaller batches, such as 5000 each
batch?
As @Vaclav Stengl already wrote, the Executors have internal queues in which they enqueue tasks and from which they process them, so there is no need to worry about that. You can also just call submit for each single task as soon as you have created it. This allows the first tasks to start executing while you're still creating/preparing later tasks, which makes sense especially when each task creation takes comparatively long, but won't hurt in all other cases. Think of invokeAll as a convenience method for cases where you already have a collection of tasks. If you create the tasks successively yourself and you already have access to the ExecutorService to run them on, just submit() them a.s.a.p.
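A minimal sketch of that submit-as-you-go approach follows. The `process` method is only a stand-in for the remote call plus the database write, and the class name is made up for illustration. It also waits for the tasks and surfaces their failures, which the original snippet with its empty catch blocks did not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SubmitAsYouGo {
    /** Stand-in for callRemoteService() followed by the database write. */
    static int process(int id) {
        return id * 2;
    }

    /** Runs process() for every id on nThreads threads and returns the sum of the results. */
    static long runAll(int[] ids, int nThreads) {
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        List<Future<Integer>> results = new ArrayList<>(ids.length);
        for (int id : ids) {
            // Each task may start as soon as a thread is free,
            // while this loop is still creating later tasks.
            results.add(executor.submit(() -> process(id)));
        }
        executor.shutdown();   // accept no new tasks; submitted ones still run
        long sum = 0;
        try {
            for (Future<Integer> f : results) {
                sum += f.get();   // rethrows a task failure instead of swallowing it
            }
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
        return sum;
    }
}
```

Measuring elapsed time around `runAll` gives a throughput figure that includes task completion, not just submission.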
About batch splitting:
ExecutorService has an internal queue for storing tasks. In your case, ExecutorService executor = Executors.newFixedThreadPool(15); has 15 threads, so at most 15 tasks will run concurrently and the others will be stored in the queue. The size of the queue can be configured; by default it can grow up to Integer.MAX_VALUE. invokeAll calls the execute method internally, and that method places tasks in the queue when all threads are busy.
IMHO there are 2 possible reasons why the CPU is not at 100%:
the thread pool is too small; try enlarging it
threads are waiting for testServiceClient.callRemoteService() to
complete, and meanwhile the CPU is starving
The QPS problem may be caused by a bandwidth limit or by transaction execution (which locks the table or row), so simply increasing the pool size will not work. Additionally, you can try the producer-consumer pattern.
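For reference, a bare-bones producer-consumer sketch using a bounded `BlockingQueue` could look like this. The integer items, the `POISON` end marker and the class name are assumptions made for the example; in the real program the producer side would be the remote call and the consumer side the database write.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ProducerConsumerDemo {
    private static final int POISON = -1;   // marker telling the consumer to stop

    /** Produces `count` items on this thread, consumes them on another; returns the consumed count. */
    static int run(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(100);
        int[] consumed = {0};
        Thread consumer = new Thread(() -> {
            try {
                while (queue.take() != POISON) {
                    consumed[0]++;            // stand-in for the database write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < count; i++) {
                queue.put(i);                 // stand-in for the remote call; blocks when the queue is full
            }
            queue.put(POISON);
            consumer.join();                  // after join, consumed[0] is safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed[0];
    }
}
```

The bounded queue provides back-pressure: if the writer is the slow side, the producer blocks in `put` instead of piling up work in memory.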
I need to execute a load test using Java in which one of the test strategies requires x threads to be fired off every y period of time for z minutes, and thereafter a constant totalThread number of threads running for the duration of the load test. For example, with a total of 100 threads, start 10 threads at 5 second intervals until all 100 threads have started, then keep all 100 threads running (once a thread has finished execution it should restart) for the specified duration of the test, say one hour.
I have attempted to use TimerTask, but it seems limiting. Would a scheduled thread pool be a better option? What would be the best approach?
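public class MyTask extends TimerTask {
    // Fields and constructor were omitted in the original snippet; assumed here.
    private final int maxIterations;
    private int counter = 1;

    public MyTask(int maxIterations) {
        this.maxIterations = maxIterations;
    }

    @Override
    public void run() {
        System.out.println("STARTING THREAD " + counter + " " + new Date());
        // execute test
        counter++;
        if (counter > maxIterations) {
            cancel(); // stop further executions of this task
        }
    }
}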
List<TimerTask> MyTaskList = new ArrayList<TimerTask>();
for (int i = 1 ; i <= threadsPerIteration ; i++) {
TimerTask MyTimerTask = new MyTask(NumberOfIterations);
MyTaskList.add(MyTimerTask);
timer.schedule(MyTimerTask, initialDelayMilli, threadDelayMilli);
}
Thank You
Don't use a TimerTask for each thread. Instead, use a single TimerTask, that fires once per interval, with your example numbers once every 5 seconds.
Each of the first 10 times the TimerTask fires, it spawns off 10 threads. On each subsequent firing, it checks for the number of active threads, and spawns off enough new threads to bring the total to 100, until the end of your test.
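A sketch of this single-timer approach, using a `ScheduledExecutorService` in place of `Timer`, might look as follows. The class name, the `executeTest` placeholder and the counters are assumptions made for illustration. On each tick the target grows by `step` until it reaches `total`, and the tick then starts enough workers to bring the active count up to the target, which also replaces workers that have finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RampUpLoadTest {
    private final AtomicInteger active = new AtomicInteger();     // workers running right now
    private final AtomicInteger completed = new AtomicInteger();  // finished test executions

    /** Ramps up by `step` workers per interval to `total`; returns the number of completed executions. */
    int run(int total, int step, long intervalMillis, long durationMillis) {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newCachedThreadPool();
        AtomicInteger target = new AtomicInteger();
        ticker.scheduleAtFixedRate(() -> {
            // Raise the target by `step` per tick until it reaches `total`.
            int goal = target.updateAndGet(t -> Math.min(total, t + step));
            // Top up: replace finished workers and add the newly allowed ones.
            while (active.get() < goal) {
                active.incrementAndGet();
                workers.execute(this::runOnce);
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(durationMillis);
            ticker.shutdown();                              // cancels the periodic tick
            ticker.awaitTermination(5, TimeUnit.SECONDS);
            workers.shutdown();
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    private void runOnce() {
        try {
            executeTest();
        } finally {
            active.decrementAndGet();
            completed.incrementAndGet();
        }
    }

    // Placeholder for one execution of the real load-test request.
    private void executeTest() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With the numbers from the question this would be `new RampUpLoadTest().run(100, 10, 5_000, 3_600_000)`. Note that finished workers are only replaced at the next tick; a shorter interval after the ramp-up phase would keep the count closer to 100.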
Thanks for the help. I decided to use the thread pool executor together with the TimerTask class as follows:
I used the Executors.newScheduledThreadPool(int x) method to control the number of threads able to run concurrently, together with a timer task that is set to increase the thread pool size every y amount of time:
TimerTask DelayTimerTask = new TimerTask() { //task to increase threadpool size
public void run() {
MyExecutor.setCorePoolSize(i * incrementAmount); //timer task increments threadpool size by threadPoolIncrement
i++;
}
};
timer.scheduleAtFixedRate(DelayTimerTask,0,intervalLength);
In this way the number of concurrent threads will increase by incrementAmount every intervalLength.
I am trying out the executor service in Java, and wrote the following code to run Fibonacci (yes, the massively recursive version, just to stress out the executor service).
Surprisingly, it will run faster if I set the nThreads to 1. It might be related to the fact that the size of each "task" submitted to the executor service is really small. But still it must be the same number also if I set nThreads to 1.
To see if the access to the shared Atomic variables can cause this issue, I commented out the three lines with the comment "see text", and looked at the system monitor to see how long the execution takes. But the results are the same.
Any idea why this is happening?
BTW, I wanted to compare it with the similar implementation with Fork/Join. It turns out to be way slower than the F/J implementation.
public class MainSimpler {
static int N=35;
static AtomicInteger result = new AtomicInteger(0), pendingTasks = new AtomicInteger(1);
static ExecutorService executor;
public static void main(String[] args) {
int nThreads=2;
System.out.println("Number of threads = "+nThreads);
executor = Executors.newFixedThreadPool(nThreads);
Executable.inQueue = new AtomicInteger(nThreads);
long before = System.currentTimeMillis();
System.out.println("Fibonacci "+N+" is ... ");
executor.submit(new FibSimpler(N));
waitToFinish();
System.out.println(result.get());
long after = System.currentTimeMillis();
System.out.println("Duration: " + (after - before) + " milliseconds\n");
}
private static void waitToFinish() {
while (0 < pendingTasks.get()){
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
executor.shutdown();
}
}
class FibSimpler implements Runnable {
int N;
FibSimpler (int n) { N=n; }
@Override
public void run() {
compute();
MainSimpler.pendingTasks.decrementAndGet(); // see text
}
void compute() {
int n = N;
if (n <= 1) {
MainSimpler.result.addAndGet(n); // see text
return;
}
MainSimpler.executor.submit(new FibSimpler(n-1));
MainSimpler.pendingTasks.incrementAndGet(); // see text
N = n-2;
compute(); // similar to the F/J counterpart
}
}
Runtime (approximately):
1 thread : 11 seconds
2 threads: 19 seconds
4 threads: 19 seconds
Update:
I notice that even if I use one thread inside the executor service, the whole program will use all four cores of my machine (each core around 80% usage on average). This could explain why using more threads inside the executor service slows down the whole process, but now, why does this program use 4 cores if only one thread is active inside the executor service??
It might be related to the fact that the size of each "task" submitted
to the executor service is really small.
This is certainly the case, and as a result you are mainly measuring the overhead of context switching. When nThreads == 1, there is no context switching between workers and thus the performance is better.
But still it must be the same number also if I set nThreads to 1.
I'm guessing you meant 'to higher than 1' here.
You are running into the problem of heavy contention. When you have multiple threads, the shared state (the atomic result and counter, and the executor's single work queue that every submit and every worker goes through) is contended all the time. Threads have to wait for each other, or retry, before they can update it, and that slows them down. When there is only a single thread, these operations are uncontended and therefore cheap.
You may get better performance if you don't divide the problem into N tasks, but rather divide it into N/nThreads tasks, which can be handled simultaneously by the threads (assuming you choose nThreads to be at most the number of physical cores/threads available). Each thread then does its own work, calculating its own total and only adding that to a grand total when the thread is done. Even then, for fib(35) I expect the costs of thread management to outweigh the benefits. Perhaps try fib(1000).
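One way to sketch the coarse-grained idea: split the recursion only down to a cutoff, run each remaining subproblem sequentially inside one task, and add up the per-task totals at the end. The class name, the `cutoff` parameter and the use of `Future` are illustrative choices, not the original poster's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CoarseFib {
    /** Plain recursive Fibonacci, run entirely inside one thread. */
    static long fibSeq(int n) {
        return n <= 1 ? n : fibSeq(n - 1) + fibSeq(n - 2);
    }

    /** Expands fib(n) into subproblems no larger than cutoff (cutoff must be >= 1). */
    static void split(int n, int cutoff, List<Integer> out) {
        if (n <= cutoff) {
            out.add(n);
        } else {
            split(n - 1, cutoff, out);
            split(n - 2, cutoff, out);
        }
    }

    /** Computes fib(n) as the sum of a modest number of independent sequential tasks. */
    static long fib(int n, int cutoff, int nThreads) {
        List<Integer> parts = new ArrayList<>();
        split(n, cutoff, parts);
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int part : parts) {
                // Each task keeps its own total; nothing shared is touched while it runs.
                futures.add(executor.submit(() -> fibSeq(part)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

For example, `CoarseFib.fib(35, 25, 4)` creates on the order of a hundred tasks rather than millions, so the queue and the shared counters are no longer the hot spot.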
I am writing code for my homework, and I am not very familiar with writing multi-threaded applications. I learned how to create a thread and start it. I'd better show the code.
for (int i = 0; i < a.length; i++) {
download(host, port, a[i]);
scan.next();
}
My code above connects to a server and opens multiple parallel requests, a.length times. In other words, download opens a[i] connections to get the same content on each iteration. However, I want download to complete for i = 0 before the next iteration i = 1 starts, that is, when the threads that download has opened have completed. I used scan.next() to pause it by hand, but obviously that is not a nice solution. How can I do that?
Edit:
public static long download(String host, int port) {
new java.io.File("Folder_" + N).mkdir();
N--;
int totalLength = length(host, port);
long result = 0;
ArrayList<HTTPThread> list = new ArrayList<HTTPThread>();
for (int i = 0; i < totalLength; i = i + N + 1) {
HTTPThread t;
if (i + N > totalLength) {
t = (new HTTPThread(host, port, i, totalLength - 1));
} else {
t = new HTTPThread(host, port, i, i + N);
}
list.add(t);
}
for (HTTPThread t : list) {
t.start();
}
return result;
}
And in my HTTPThread:
public void run() {
init(host, port);
downloadData(low, high);
close();
}
Note: Our test web server is a modified web server. It accepts a Range: i-j header, and the response contains the contents of files i through j.
You will need to call the join() method of the thread that is doing the downloading. This will cause the current thread to wait until the download thread is finished. This is a good post on how to use join.
If you'd like to post your download method, you will probably get a more complete solution.
EDIT:
Ok, so after you start your threads you will need to join them like so:
for (HTTPThread t : list) {
t.start();
}
for (HTTPThread t : list) {
t.join();
}
This will stop the method from returning until all HTTPThreads have completed.
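One detail the snippet above leaves out is that `join()` throws the checked `InterruptedException`, so the calling method has to handle or declare it. A self-contained sketch of the start-then-join pattern, with plain `Thread` objects standing in for `HTTPThread` and a made-up class name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class JoinDemo {
    /** Starts n threads and returns only after all of them have finished. */
    static int runAndWait(int n) {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // Stand-in for the download work done in HTTPThread.run().
            threads.add(new Thread(() -> finished.incrementAndGet()));
        }
        for (Thread t : threads) {
            t.start();               // start everything first so the threads run in parallel
        }
        for (Thread t : threads) {
            try {
                t.join();            // blocks until this thread has terminated
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt and stop waiting
                break;
            }
        }
        return finished.get();
    }
}
```

Starting all threads before joining any of them matters: calling `start()` and `join()` in the same loop would run the downloads one after another.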
It's probably not a great idea to create an unbounded number of threads to do an unbounded number of parallel HTTP requests. (Both network sockets and threads are operating system resources that require some bookkeeping overhead, and are therefore subject to quotas in many operating systems. In addition, the web server you are reading from might not like thousands of concurrent connections, because its network sockets are finite, too!)
You can easily control the number of concurrent connections using an ExecutorService:
List<DownloadTask> tasks = new ArrayList<DownloadTask>();
for (int i = 0; i < length; i++) {
tasks.add(new DownloadTask(i));
}
ExecutorService executor = Executors.newFixedThreadPool(N);
executor.invokeAll(tasks);
executor.shutdown();
This is both shorter and better than your homegrown concurrency limit, because your limit will delay starting the next batch until all threads from the current batch have completed. With an ExecutorService, a new task is begun whenever an old task has completed (and there are still tasks left). That is, your solution will have 1 to N concurrent requests until all tasks have been started, whereas the ExecutorService will always have N concurrent requests.