My understanding of the JVM multi-threading model is that when a thread executes an IO call, the thread is BLOCKED and put into a waiting queue by the JVM/OS until data is available.
I am trying to emulate this behavior in my code and running a benchmark with various thread sizes, using JMH and CompletableFuture.
However, the results are not what I expected. I was expecting a constant execution time (with thread/context switching overhead) irrespective of the number of threads (with memory limitations), since the tasks are IO bound and not CPU bound.
My CPU is a 4-core/8-thread laptop processor, and even with 1 or 2 threads the results deviate from the expected behavior.
I'm trying to read a 5MB file (separate file for each thread) in the async task. At the start of each iteration, I create a FixedThreadPool with the required number of threads.
@Benchmark
public void readAsyncIO(Blackhole blackhole) throws ExecutionException, InterruptedException {
List<CompletableFuture<Void>> readers = new ArrayList<>();
for (int i =0; i< threadSize; i++) {
int finalI = i;
readers.add(CompletableFuture.runAsync(() -> readFile(finalI), threadPool));
}
Object result = CompletableFuture
.allOf(readers.toArray(new CompletableFuture[0]))
.get();
blackhole.consume(result);
}
@Setup(Level.Iteration)
public void setup() throws IOException {
threadPool = Executors.newFixedThreadPool(threadSize);
}
@TearDown(Level.Iteration)
public void tearDown() {
threadPool.shutdownNow();
}
public byte[] readFile(int i) {
try {
File file = new File(filePath + "/" + fileName + i);
byte[] bytesRead = new byte[(int)file.length()];
InputStream inputStream = new FileInputStream(file);
inputStream.read(bytesRead);
return bytesRead;
} catch (Exception e) {
throw new CompletionException(e);
}
}
And the JMH config:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3)
@Fork(value=1)
@Measurement(iterations = 3)
public class SimpleTest {
@Param({ "1", "2", "4", "8", "16", "32", "50", "100" })
public int threadSize;
.....
}
Any idea what I'm doing wrong? Or are my assumptions incorrect?
It seems reasonable. With a single thread you see that one file takes about 2 ms to handle. Adding more threads leads to a longer average time per thread, because each read(bytesRead) of such a large size is likely to need multiple disk reads, so there is more opportunity for IO blocking and thread context switching, plus, depending on the disks, more seek time.
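On a related note, a single InputStream.read(byte[]) call is not guaranteed to fill the whole array, and the readFile method in the question never closes its stream. Below is a minimal sketch of reading a file completely. The class name FileReads and the helper readFully are hypothetical names chosen for illustration, not part of the original benchmark.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class FileReads {
    // Hypothetical helper: keeps reading until the whole file is in memory,
    // because InputStream.read(byte[], int, int) may return fewer bytes than requested.
    static byte[] readFully(Path path) throws IOException {
        byte[] bytes = new byte[(int) Files.size(path)];
        // try-with-resources closes the stream even if a read fails.
        try (InputStream in = Files.newInputStream(path)) {
            int offset = 0;
            while (offset < bytes.length) {
                int n = in.read(bytes, offset, bytes.length - offset);
                if (n == -1) {
                    throw new IOException("Unexpected end of file: " + path);
                }
                offset += n;
            }
        }
        return bytes;
    }
}
```

If the loop is not needed for anything else, Files.readAllBytes(path) does the same job in one call.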
I have multi-threaded Spring Boot application in which I am reading data from table in batches (the table contains around 1 million records).
I am getting into Java heap memory issues, and I am unable to find a workaround. Below is the code sample.
I call the Spring Boot REST API which then calls this code. Here I am reading from db in the main thread in batches, then passing the batches to thread pool executorService and then finally processing the result in another thread pool resultProcessor.
The Worker class implements Callable<WorkerResult>
ExecutorService executorService = Executors.newFixedThreadPool(15);
Long workerCount = 0L;
ExecutorService resultProcessor = Executors.newFixedThreadPool(10);
List<CompletableFuture<WorkerResult>> futures = new ArrayList<>();
while (workerCount < totalData) {
List<Model> dbRecords = repo.getData(workerCount,workerCount+rp,date);
workerCount += rp + 1;
try {
futures.add(CompletableFuture.supplyAsync(() -> {
try {
return new Worker(dbRecords).call(); // Here for each record third party api is called
} catch (Exception ex) {
throw new CompletionException(ex);
}
// Or return default value
}, executorService).thenApplyAsync(result -> {
service.resultReceived(result); // update the results into db
return result;
}, resultProcessor));
} catch (RejectedExecutionException e) {
logData("Can't submit anymore tasks %s ", e.getMessage());
}
}
}
Outside the while loop once I have read all data from DB, then I call the CompletableFuture.allOf method to finish any remaining tasks.
Below is the code for that:
try {
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
executorService.shutdown();
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
resultProcessor.shutdown();
resultProcessor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
Here, if I do not call CompletableFuture.allOf, the method returns before all queued tasks have completed.
Instead of calling CompletableFuture.allOf, I have also tried futures.forEach(CompletableFuture::join), but that didn't resolve my issue either.
Currently I have assigned 1 GB of RAM to the Tomcat server, and I get a heap space error after roughly 100 thousand records have been processed successfully.
What can I do to get rid of this error and also improve the code's efficiency? The solution should work on Java 8 rather than only on the latest versions, if possible.
I don't know how much data there will be in production; this is test-environment data.
I'm testing the processing of a large file (10.000.100 rows) with Java.
I wrote a piece of code which reads from the file and spawns a specified number of Threads (at most equal to the cores of the CPU) which, then, print the content of the rows of the file to the standard output.
The Main class is like the following:
public class Main
{
public static void main(String[] args) throws Exception // Thread.sleep and the file IO below throw checked exceptions
{
int maxThread;
ArrayList<String> linesForWorker = new ArrayList<String>();
if ("MAX".equals(args[1]))
maxThread = Runtime.getRuntime().availableProcessors();
else
maxThread = Integer.parseInt(args[1]);
ExecutorService executor = Executors.newFixedThreadPool(maxThread);
String readLine;
Thread.sleep(1000L);
long startTime = System.nanoTime();
BufferedReader br = new BufferedReader(new FileReader(args[0]));
do
{
readLine= br.readLine();
if ("X".equals(readLine))
{
executor.execute(new WorkerThread((ArrayList) linesForWorker.clone()));
linesForWorker.clear(); // Wrote to avoid storing a list with ALL the lines of the file in memory
}
else
{
linesForWorker.add(readLine);
}
}
while (readLine!= null);
executor.shutdown();
br.close();
if (executor.awaitTermination(1L, TimeUnit.HOURS))
System.out.println("END\n\n");
long endTime = System.nanoTime();
long durationInNano = endTime - startTime;
System.out.println("Duration in hours:" + TimeUnit.NANOSECONDS.toHours(durationInNano));
System.out.println("Duration in minutes:" + TimeUnit.NANOSECONDS.toMinutes(durationInNano));
System.out.println("Duration in seconds:" + TimeUnit.NANOSECONDS.toSeconds(durationInNano));
System.out.println("Duration in milliseconds:" + TimeUnit.NANOSECONDS.toMillis(durationInNano));
}
}
And then the WorkerThread class is structured as following:
class WorkerThread implements Runnable
{
private List<String> linesToPrint;
public WorkerThread(List<String> linesToPrint) { this.linesToPrint = linesToPrint; }
public void run()
{
for (String lineToPrint : this.linesToPrint)
{
System.out.println(String.valueOf(Thread.currentThread().getName()) + ": " + lineToPrint);
}
this.linesToPrint = null; // Wrote to help garbage collector know I don't need the object anymore
}
}
I ran the application specifying 1 and "MAX" (i.e. the number of CPU cores, which is 4 in my case) as the maximum number of threads for the FixedThreadPool, and I observed:
An execution time of about 40 minutes with a single thread in the FixedThreadPool.
An execution time of about 44 minutes with 4 threads in the FixedThreadPool.
Could someone explain this strange (at least to me) behaviour? Why didn't multithreading help here?
P.S. I have an SSD in my machine.
EDIT: I modified the code so that the threads now each create a file and write their set of lines to that file on the SSD. The execution time has dropped to about 5 s, but the 1-thread version of the program still runs in about 5292 ms, while the multithreaded (4 threads) version runs in about 5773 ms.
Why does the multithreaded version still take longer? Maybe every thread, even when writing its own file, has to wait for the other threads to release the SSD before it can access it and write?
I'm trying to read a string from a file, do an HTTP request with that string, and if the request returns a 200 then do another HTTP request with it.
I thought a good model for this would be the producer-consumer model, but I'm totally stuck. The whole process just stops at a certain point and I have no idea why.
public static void main(String[] args) throws InterruptedException, IOException {
ArrayBlockingQueue<String> subQueue = new ArrayBlockingQueue<>(3000000);
ThreadPoolExecutor consumers = new ThreadPoolExecutor(100, 100, 10000, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(10));
ThreadPoolExecutor producers = new ThreadPoolExecutor(100, 100, 10000, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(10000000));
consumers.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
String fileName = "test";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
String line;
while ((line = br.readLine()) != null) {
String address = new JSONObject(line).getString("Address");
producers.submit(new Thread(() -> {
if (requestReturn200(address)) {
try {
subQueue.put(address);
} catch (InterruptedException e) {
System.out.println("Error producing.");
}
}
}));
}
producers.shutdown();
}
while (subQueue.size() != 0 || !producers.isShutdown()) {
String address = subQueue.poll(1, TimeUnit.SECONDS);
if (address != null) {
consumers.submit(new Thread(() -> {
try {
System.out.println("Doing..." + address);
doOtherHTTPReqeust(address);
} catch (Exception e) {
System.out.println("Fatal error consuming");
}
}));
} else {
System.out.println("Null");
}
}
consumers.shutdown();
}
Any and all help would be greatly appreciated.
while (subQueue.size() != 0 || !producers.isShutdown()) {
First of all, !producers.isShutdown() will always evaluate to false here, because it is checked after producers.shutdown() has been called. isShutdown() does not tell you whether tasks in the pool are still running; it only tells you whether the pool has been shut down and therefore no longer accepts new tasks.
Second, subQueue.size() != 0. Your consumer-creating loop takes data from the queue much faster than the producers can provide it, so in the middle of the "producing" process the queue may well be emptied, which makes subQueue.size() != 0 false. That breaks the loop and stops any further consumer tasks from being submitted.
You should stop using queue.size() and rely on the blocking behaviour of BlockingQueue instead: queue.take() blocks until a new element is available.
So the overall flow should look like this:
1. Start a pool of producer tasks, as you are doing right now.
2. Let the producers put data into the blocking queue (you already do this).
3. Start a fixed number of consumers.
4. Let the consumers call queue.take() to get data from the queue. This makes the consumers wait automatically for new data and take it as soon as it becomes available.
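The flow above can be sketched roughly as follows. This is a minimal illustration, not the asker's program: the class name TakeBasedPipeline, the POISON_PILL sentinel, and the queue capacity are assumptions, the HTTP calls are replaced by a stand-in, and a single producing thread stands in for the producer pool.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class TakeBasedPipeline {
    // Hypothetical sentinel telling a consumer that no more work will arrive.
    private static final String POISON_PILL = "__DONE__";

    static List<String> process(List<String> addresses, int consumerCount)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        List<String> handled = new CopyOnWriteArrayList<>();
        ExecutorService consumers = Executors.newFixedThreadPool(consumerCount);

        // A fixed number of long-running consumers, each blocking on take().
        for (int i = 0; i < consumerCount; i++) {
            consumers.submit(() -> {
                try {
                    while (true) {
                        String address = queue.take(); // blocks until an element is available
                        if (POISON_PILL.equals(address)) {
                            return; // no more work for this consumer
                        }
                        handled.add(address); // stand-in for the second HTTP request
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer side: put() blocks when the queue is full.
        for (String address : addresses) {
            queue.put(address);
        }
        // One sentinel per consumer so that every consumer terminates.
        for (int i = 0; i < consumerCount; i++) {
            queue.put(POISON_PILL);
        }

        consumers.shutdown();
        consumers.awaitTermination(1, TimeUnit.MINUTES);
        return handled;
    }
}
```

Because the consumers never look at queue.size(), a temporarily empty queue no longer ends the processing; only the sentinel does.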
I will put aside the point that creating 200 threads is excessive and defeats the purpose of multithreaded consumers/producers/task pools, at least in your case, IMHO. The idea is to use a small number of threads, since they are heavyweight, to work through plenty of queued tasks. But that is a discussion for another time.
I'm creating a ScheduledThreadPoolExecutor that executes a task at a given period. I implemented the following code to check for a memory leak or a GC overhead exception. When running this application with the JVM parameters -Xms4m -Xmx10m, it executes the finally block without completing the statements in the try block.
I read an article about canceling ScheduledFutures (memory leak) and tried to replicate it.
public class ThreadScheduling {
public static void main(String... args) throws Exception {
new Scheduling().run();
}
}
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import java.util.Date;
import java.util.concurrent.*;
public class Scheduling extends Thread {
public void run() {
start();
}
/**
* https://blog.kapsi.de/blog/canceling-scheduledfutures-memory-leak
*
* @param nameFormat
* @param useDaemonThreads
* @param poolSize
* @return
**/
public ScheduledThreadPoolExecutor scheduledExecutor(String nameFormat, boolean useDaemonThreads, int poolSize) {
final RejectedExecutionHandler handler = new ThreadPoolExecutor.AbortPolicy();
final ThreadFactory threadFactory = new ThreadFactoryBuilder()
.setNameFormat(nameFormat)
.setDaemon(useDaemonThreads)
.build();
final ScheduledThreadPoolExecutor executor =
new ScheduledThreadPoolExecutor(poolSize, threadFactory, handler);
return executor;
}
public void start() {
final ScheduledThreadPoolExecutor executor =
scheduledExecutor(" executor %s", false, 1);
executor.scheduleAtFixedRate(new Runnable() {
public void run() {
try {
System.out.println(new Date() + " Going to execute service.");
String s = new String();
for(int i = 0; i < 100000; i++ )
{
s += "ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss" +
"ssssssssssssssssssssssssssssssssssss";
}
System.out.println(new Date() + " service completed.");
} catch (Exception ex) {
System.out.println(new Date() + " Error during executing services");
} finally {
executor.purge();
System.out.println(new Date() + " Finalize service.");
}
}
}, 2, 10, TimeUnit.SECONDS);
}
}
Output:
Mon Nov 27 19:22:55 IST 2017 Going to execute service.
Mon Nov 27 19:23:18 IST 2017 Finalize service.
How ScheduledThreadPoolExecutor execute run method without completing the CPU bound task?
GC automatically manage required memory, if the heap is less than the required memory.
Is heap memory automatically free by ScheduledThreadPoolExecutor in every periodic execution of thread?
The problem is that you are running out of memory, which triggers an OutOfMemoryError, and that is not an Exception. This means all you see is the output of the finally block.
NOTE: In this code
executor.scheduleAtFixedRate(new Runnable() {
A Future object is returned which captures any uncaught Exception or Error. If you discard this you won't see the Error.
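To illustrate the note above, here is a small sketch of keeping the returned future and asking it what went wrong. The class name ScheduledFailureDemo and the helper failureOf are hypothetical, and the OutOfMemoryError is thrown by hand so the demo does not depend on a tiny heap.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class ScheduledFailureDemo {
    // Hypothetical helper: returns the Throwable that ended the periodic task.
    static Throwable failureOf(Runnable task) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> future =
                    executor.scheduleAtFixedRate(task, 0, 10, TimeUnit.MILLISECONDS);
            // A periodic task never completes normally, so get() only returns
            // by throwing once a run fails (or the task is cancelled).
            future.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the Exception or Error thrown inside run()
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated Error, standing in for the real heap exhaustion.
        Throwable cause = failureOf(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println("Task failed with: " + cause);
    }
}
```

Catching Throwable inside run() and logging it is another option when nothing waits on the future.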
How ScheduledThreadPoolExecutor execute run method without completing the CPU bound task?
If the task throws an Error or Exception it completes without finishing its task.
GC automatically manage required memory, if the heap is less than the required memory.
And it keeps below the 10 MB limit you set, which means it will throw an OutOfMemoryError if you try to exceed this. (Or if you get too close in some cases)
Is heap memory automatically free by ScheduledThreadPoolExecutor in every periodic execution of thread?
Heap memory is freed when the GC runs and any object without a strong reference can be cleaned up.
In your case, you don't have enough memory to run the task once so it errors and is not run again.
NOTE: you can tell memory was freed, otherwise you wouldn't see the log message in the finally block as this uses memory to print. This won't work if there is no free memory.
BTW
executor.purge();
Only removes cancelled tasks in the queue. Your task is neither cancelled nor in the queue when it is running.
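A short sketch of what purge() actually affects, assuming the executor's default remove-on-cancel policy (off). The class name PurgeDemo and the helper queueSizes are made up for this illustration.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PurgeDemo {
    // Returns {queue size after cancel, queue size after purge}.
    static int[] queueSizes() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        try {
            // Schedule far in the future so the task is still queued when we cancel it.
            ScheduledFuture<?> future = executor.schedule(() -> { }, 1, TimeUnit.HOURS);
            future.cancel(false);
            // With the default policy the cancelled task stays in the work queue.
            int afterCancel = executor.getQueue().size();
            executor.purge(); // drops cancelled tasks from the queue
            int afterPurge = executor.getQueue().size();
            return new int[] { afterCancel, afterPurge };
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Calling executor.setRemoveOnCancelPolicy(true) makes cancelled tasks leave the queue immediately, which is usually simpler than calling purge() from inside the task.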
you can try with RuntimeException.
catch (Exception ex)
System.out.println(new Date() + " Error during executing services");
catch (RuntimeException ex)
System.out.println(new Date() + " Error during executing services");
I just copied a working example of data transfer from this forum and used it with small changes in my program, but I can't see what is wrong with its speed. I tested the original example and it transfers about 1 MB in less than 30 ms; both read and write speeds are very good. But when I use it in my case, the same amount of data takes more than 400 ms to transfer! Writing remains efficient, but reading is somehow problematic.
Here the data is written. For now, I'm not trying to speed up the first part, i.e. the serialization of the object. My question is about the second part.
private static void writeObject(OutputStream out, Object obj) throws IOException {
long t1 = System.currentTimeMillis();
ByteArrayOutputStream bArr = new ByteArrayOutputStream();
ObjectOutputStream ojs = new ObjectOutputStream(bArr);
ojs.writeObject(obj);
ojs.close();
long t2 = System.currentTimeMillis();
byte[] arr = bArr.toByteArray();
int len = arr.length;
for (int i = 0; i < arr.length; i += BUFFER_SIZE)
out.write(arr, i, Math.min(len - i, BUFFER_SIZE));
out.close();
long t3 = System.currentTimeMillis();
System.out.println(t3 - t2);
}
Well, this is not that bad! t3 - t2 prints some 30ms.
The problem is here, in readObject(): not in its second part, where the object is deserialized (at least not for now), but in the first part, where t2 - t1 turns out to be more than 400ms, as I mentioned.
private static Object readObject(InputStream in) throws IOException, ClassNotFoundException {
long t1 = System.currentTimeMillis();
ByteArrayOutputStream bao = new ByteArrayOutputStream();
byte[] buff = new byte[BUFFER_SIZE];
int read;
while ((read = in.read(buff)) != -1) {
bao.write(buff, 0, read);
}
in.close();
long t2 = System.currentTimeMillis();
ByteArrayInputStream bArr = new ByteArrayInputStream(bao.toByteArray());
Object o = new ObjectInputStream(new BufferedInputStream(bArr)).readObject();
long t3 = System.currentTimeMillis();
System.out.println(t2 - t1);
return o;
}
And here is the main():
final static int BUFFER_SIZE = 64 * 1024;
public static void main(String[] args) throws Exception {
final String largeFile1 = "store.aad";
final Table t = (Table) new ObjectInputStream(new FileInputStream(largeFile1)).readObject();
new Thread(new Runnable() {
public void run() {
try {
ServerSocket serverSocket = new ServerSocket(12345);
Socket clientSocket = serverSocket.accept();
readObject(clientSocket.getInputStream());
} catch (Exception e) {
}
}
}).start();
new Thread(new Runnable() {
public void run() {
try {
Thread.sleep(1000);
Socket socket = new Socket("localhost", 12345);
OutputStream socketOutputStream = socket.getOutputStream();
writeObject(socketOutputStream, t);
} catch (Exception e) {
}
}
}).start();
}
Where am I going wrong?!
(An obligatory comment about the difficulty of getting Java benchmarks correct).
It seems that you are launching two threads, a reader thread and a writer thread. It is entirely feasible that things proceed in the following order:
The reader thread starts.
The reader thread records t1.
The reader thread calls read(), but is blocked because no data is available yet.
The writer thread starts.
The writer thread sleeps for a second.
The writer thread calls write().
The writer thread exits.
The reader thread's read() call returns.
The reader thread records t2, etc., and exits.
Now, if you are seeing ~400ms for t2 - t1, this is probably not what is happening: it seems probable that the writer thread's call to sleep() must be happening before t1 is recorded. But the short answer is that it seems unclear what t2 - t1 is measuring. In particular, it seems incorrect to expect it to measure simply the time read() takes doing work (as opposed to waiting for the data to read).
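To make the distinction between waiting and reading visible, here is a rough sketch that times the first read() separately from the rest over a loopback socket. The class name ReadTimingDemo, the helper measure, and the use of an ephemeral port are assumptions made for this illustration; it is not the asker's program.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class ReadTimingDemo {
    // Returns {millis blocked waiting for the first bytes, millis reading the remainder}.
    static long[] measure(long writerDelayMillis, int payloadSize) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free port
            int port = server.getLocalPort();
            Thread writer = new Thread(() -> {
                try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
                     OutputStream out = socket.getOutputStream()) {
                    // Stands in for whatever the writer does before it sends data.
                    Thread.sleep(writerDelayMillis);
                    out.write(new byte[payloadSize]);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            writer.start();
            try (Socket client = server.accept();
                 InputStream in = client.getInputStream()) {
                byte[] buff = new byte[64 * 1024];
                long t1 = System.nanoTime();
                int read = in.read(buff); // blocks while the writer has not sent anything yet
                long firstBytes = System.nanoTime();
                while (read != -1) {
                    read = in.read(buff);
                }
                long t2 = System.nanoTime();
                writer.join();
                return new long[] {
                    (firstBytes - t1) / 1_000_000,
                    (t2 - firstBytes) / 1_000_000
                };
            }
        }
    }
}
```

With a writer delay of a few hundred milliseconds, most of the elapsed time should show up in the first number, which is time spent waiting rather than time spent moving data.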
If you want to read and write with buffered IO, you can use BufferedInputStream and BufferedOutputStream to read and write respectively. And, you could use a try-with-resources Statement to close. To write, something like
private static final int BUFFER_SIZE = 32 * 1024;
private static void writeObject(OutputStream out, Object obj) //
throws IOException {
try (ObjectOutputStream ojs = new ObjectOutputStream(//
new BufferedOutputStream(out, BUFFER_SIZE))) {
ojs.writeObject(obj);
}
}
and to read like
private static Object readObject(InputStream in) throws IOException,//
ClassNotFoundException {
try (ObjectInputStream ois = new ObjectInputStream(//
new BufferedInputStream(in, BUFFER_SIZE))) {
return ois.readObject();
}
}
When you are performing a micro-benchmark, I suggest you ignore all the results you get for at least the first 2 seconds of CPU time, to give your JVM a chance to warm up.
I would write this without using sleep.
For the purpose of your test, writing the object is irrelevant. You just need to write a new byte[size] and see how long it takes.
For testing short latencies, I would use System.nanoTime()
I would start by writing a small message first and looking at the round trip time. i.e. client sends a packet to the server and the server sends it back again.
Last but not least, you will get better performance by using NIO which was added in Java 1.4 (2004)
Here is some code I wrote earlier, EchoServerMain and EchoClientMain, which produces results like this.
On a E5-2650 v2 over loopback
Throughput was 2880.4 MB/s
Loop back echo latency was 5.8/6.2 9.6/19.4 23.2us for 50/90 99/99.9 99.99%tile
Note: These timings are the full round-trip time to send the packet to the server and back again. The timings are in microseconds.
I just copied a working example ...
No you didn't. You made up something completely different.
int len = arr.length;
for (int i = 0; i < arr.length; i += BUFFER_SIZE)
out.write(arr, i, Math.min(len - i, BUFFER_SIZE));
This loop is complete nonsense. It can be replaced completely by
out.write(arr, 0, len);
However you're just adding latency and wasting space with all this.
private static void writeObject(ObjectOutputStream out, Object obj) throws IOException {
long t1 = System.currentTimeMillis();
out.writeObject(obj);
out.close();
long t2 = System.currentTimeMillis();
System.out.println(t2-t1);
}
There is no point in the ByteArrayOutputStream, and writing more bytes than are in it to the real ObjectOutputStream is simply invalid. There's no point in benchmarking operations that should never sanely take place.
Why you're closing the ObjectOutputStream is another mystery. And presumably you have similar code at the receiving side: replace it all with ObjectInputStream.readObject().