CompletableFuture for a large dataset

I have a multi-threaded Spring Boot application in which I read data from a table in batches (the table contains around 1 million records).
I am running into Java heap memory issues, and I have not been able to find a workaround. Below is the code sample.
I call a Spring Boot REST API, which then calls this code. I read from the DB in batches on the main thread, pass each batch to the executorService thread pool, and finally process the result in another thread pool, resultProcessor.
The Worker class implements Callable<WorkerResult>.
ExecutorService executorService = Executors.newFixedThreadPool(15);
Long workerCount = 0L;
ExecutorService resultProcessor = Executors.newFixedThreadPool(10);
List<CompletableFuture<WorkerResult>> futures = new ArrayList<>();
while (workerCount < totalData) {
    List<Model> dbRecords = repo.getData(workerCount, workerCount + rp, date);
    workerCount += rp + 1;
    try {
        futures.add(CompletableFuture.supplyAsync(() -> {
            try {
                return new Worker(dbRecords).call(); // Here a third-party API is called for each record
            } catch (Exception ex) {
                throw new CompletionException(ex); // Or return a default value
            }
        }, executorService).thenApplyAsync(result -> {
            service.resultReceived(result); // update the results in the DB
            return result;
        }, resultProcessor));
    } catch (RejectedExecutionException e) {
        logData("Can't submit any more tasks %s ", e.getMessage());
    }
}
Outside the while loop, once I have read all the data from the DB, I call CompletableFuture.allOf to wait for any remaining tasks to finish.
Below is the code for that:
try {
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    executorService.shutdown();
    executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    resultProcessor.shutdown();
    resultProcessor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
}
If I omit the CompletableFuture.allOf call, the method returns before all tasks in the queues have completed.
Instead of calling CompletableFuture.allOf, I tried futures.forEach(CompletableFuture::join), but that did not resolve my issue either.
Currently I have assigned 1 GB of RAM to the Tomcat server, and I get a heap space error after about 100 thousand records have been processed successfully.
What can I do to get rid of this error and also improve the code's efficiency? If possible, the solution should work on Java 8 rather than the latest versions.
I don't know how much data there will be in production; this is test-environment data.
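One commonly suggested direction, sketched here under stated assumptions rather than as a confirmed fix: the loop above probably runs out of heap because it reads batches faster than the workers finish them (Executors.newFixedThreadPool queues pending tasks in an unbounded queue, and each queued task holds its dbRecords), and because every completed future in `futures` keeps its WorkerResult reachable until the method returns. A java.util.concurrent.Semaphore can cap the number of batches in flight, which works on Java 8. In the sketch, `fetchBatch` is a hypothetical stand-in for `repo.getData`, the two lambdas stand in for `Worker.call()` and `service.resultReceived`, and the pool size and permit count are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

class BoundedBatchPipeline {

    // Hypothetical stand-in for repo.getData(...): returns the row ids in [from, to).
    static List<Long> fetchBatch(long from, long to) {
        List<Long> batch = new ArrayList<>();
        for (long id = from; id < to; id++) {
            batch.add(id);
        }
        return batch;
    }

    // Reads totalRows rows in batches of batchSize, never holding more than maxInFlight
    // batches (or their results) at once. Returns the number of rows processed.
    static long run(long totalRows, int batchSize, int maxInFlight) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        Semaphore inFlight = new Semaphore(maxInFlight);
        AtomicLong processed = new AtomicLong();
        try {
            for (long offset = 0; offset < totalRows; offset += batchSize) {
                inFlight.acquire(); // blocks the reader until an earlier batch has finished
                List<Long> batch = fetchBatch(offset, Math.min(offset + batchSize, totalRows));
                CompletableFuture
                        .supplyAsync(() -> batch.size(), workers)  // stands in for new Worker(batch).call()
                        .thenAccept(n -> processed.addAndGet(n))   // stands in for service.resultReceived(result)
                        .whenComplete((ignored, error) -> inFlight.release()); // release even on failure
            }
            inFlight.acquire(maxInFlight); // holding every permit means every batch has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batches", e);
        } finally {
            workers.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000, 1_000, 8)); // 1000000
    }
}
```

Because the reading thread blocks in acquire(), at most maxInFlight batches are in memory at a time and no list of futures is kept; the final acquire(maxInFlight) takes the place of allOf(...).join(). To keep the separate result pool from the question, thenAccept could become thenAcceptAsync(..., resultProcessor).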

Related

Future.get(5, TimeUnit.SECONDS) doesn't time out after 5 seconds if native methods are used in Java

I am using the Executor framework in my Java code. I am facing an issue and need clarification about it.
Below is my Java code:
ExecutorService executorObj = Executors.newFixedThreadPool(10);
String name = "default";
Future<String> futRes = executorObj.submit(new Callable<String>() {
    @Override
    public String call() {
        computePropertyPage("");
        return "Hello";
    }
});
try {
    System.out.println("waiting for name for 5 seconds maximum...");
    return futRes.get(5, TimeUnit.SECONDS);
} catch (Exception e) {
    System.out.println("Exception occurred : " + e);
    return name;
}
In the above code, computePropertyPage() is a native method. It's properly linked with the Java code, but the call to the function never completes; it's stuck indefinitely. If the call is stuck for more than 5 seconds, I expect a TimeoutException after 5 seconds, but I am not receiving it.
If, instead of the native method call, I just add a sleep of 10 seconds as below,
try {
    Thread.sleep(10000);
} catch (InterruptedException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
I do get a TimeoutException.
I just want to know whether this is a limitation on the Java side, i.e. Java has no control over native methods, and that is why futRes.get(5, TimeUnit.SECONDS) cannot throw a TimeoutException.

Your method computePropertyPage completes in less than 5 seconds and returns a response. Since you aren't calling shutdown on the ExecutorService, it isn't terminating. Try calling executorObj.shutdown();
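A minimal sketch for checking the waiting side in isolation, under the assumption that a stuck native call can be modelled as a loop that never checks for interrupts: Future.get(timeout, unit) times out on the caller's clock, independently of what the worker thread is doing. What it cannot do is stop the worker, and cancel(true) only sets the worker's interrupt flag. `busyIgnoringInterrupts` and `callWithTimeout` are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutDemo {

    // Assumption: models a native call that is stuck and never checks for interrupts.
    static void busyIgnoringInterrupts(long millis) {
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        while (System.nanoTime() < end) {
            // spin without ever calling Thread.interrupted()
        }
    }

    static String callWithTimeout(long taskMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(() -> {
            busyIgnoringInterrupts(taskMillis);
            return "Hello";
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only sets the worker's interrupt flag; the spin loop keeps running
            return "default";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "default";
        } catch (ExecutionException e) {
            return "default";
        } finally {
            executor.shutdown(); // the worker thread can exit once the task finally returns
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(1_000, 100)); // default (after about 100 ms)
        System.out.println(callWithTimeout(10, 2_000));  // Hello
    }
}
```

After a timeout the worker thread keeps running until the task returns, and the pool's threads from Executors are non-daemon, which is why calling shutdown() matters for letting the JVM exit.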

Creating observables that do IO work

I have several resources in my app that I need to load and dump into my database on first launch. I want to do this in parallel.
So I created an observable wrapper around reading a file:
@Override
public Observable<List<T>> loadDataFromFile() {
    return Observable.create(new Observable.OnSubscribe<List<T>>() {
        @Override
        public void call(Subscriber<? super List<T>> subscriber) {
            LOG.info(String.format("Starting load from file for %s ON THREAD %d", type, Thread.currentThread().getId()));
            InputStream inputStream = null;
            try {
                Gson gson = JsonConverter.getExplicitGson();
                inputStream = resourceWrapper.openRawResource(resourceId);
                InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
                List<T> tList = gson.fromJson(inputStreamReader, type);
                subscriber.onNext(tList);
                subscriber.onCompleted();
                LOG.info("Completed load from file for " + type);
            } catch (Exception e) {
                LOG.error("An error occurred loading the file");
                subscriber.onError(e);
            } finally {
                if (inputStream != null) {
                    try {
                        inputStream.close();
                    } catch (IOException e) {
                        // ignore: nothing useful to do if close fails
                    }
                }
            }
        }
    });
}
However, it's not asynchronous. I see two approaches to making it asynchronous:
1) Do the asynchrony inside the observable: spawn a new thread or use a callback-based file-reading API.
2) Use a scheduler to do the work on an I/O thread.
Likewise, for the DB I have to create my own observable that wraps the database's API, which has both a synchronous version and an asynchronous version with a callback.
So what is the correct way to create observables that do I/O work?
Second, how can I chain these observables so that all the files are read in parallel and the contents of each are then stored in the DB? I want to receive an onCompleted event when the entire process is complete for all my reference data.

One good thing about Rx is that you can control which thread your "work" is done on. You can use
subscribeOn(Schedulers.io())
If you want to load resources in parallel I suggest using the merge (or mergeDelayError) operator.
Assuming you have a function
Observable<List<T>> loadDataFromresource(int resID)
to load one resource, you could first create a list of observables, one for each resource,
List<Observable<List<T>>> obsList = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    obsList.add(loadDataFromresource(i + 1).subscribeOn(Schedulers.io()));
}
associating a scheduler with each observable. Merge the observables using
Observable<List<T>> mergedObs = Observable.merge(obsList);
Subscribing to the resulting observable should then load the resources in parallel. If you'd like to delay errors until the end of the merged observable then use
Observable<List<T>> mergedObs = Observable.mergeDelayError(obsList);
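The answer covers the parallel reads; the question also asks how to store each result and get a single completion signal. Running RxJava requires the library on the classpath, so the sketch below instead shows the same fan-out/fan-in shape with plain Java 8 CompletableFuture, a swapped-in technique rather than Rx: each resource is loaded on a pool that plays the role of Schedulers.io(), stored as soon as it arrives, and allOf(...) completes once every chain is done, much like onCompleted on a merged observable. In RxJava terms this roughly corresponds to flatMap-ing each loaded list into the DB-writing observable and subscribing once. `load`, `store`, and the in-memory map standing in for the database are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelLoadAndStore {

    // Hypothetical loader standing in for loadDataFromFile(): resource n yields n rows.
    static List<String> load(int resourceId) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < resourceId; i++) {
            rows.add("resource" + resourceId + "-row" + i);
        }
        return rows;
    }

    // Hypothetical DB write; the map records how many rows were stored per resource.
    static void store(Map<Integer, Integer> db, int resourceId, List<String> rows) {
        db.put(resourceId, rows.size());
    }

    static Map<Integer, Integer> loadAll(int resourceCount) {
        ExecutorService io = Executors.newFixedThreadPool(4); // plays the role of Schedulers.io()
        Map<Integer, Integer> db = new ConcurrentHashMap<>();
        try {
            List<CompletableFuture<Void>> chains = new ArrayList<>();
            for (int id = 1; id <= resourceCount; id++) {
                final int resourceId = id;
                chains.add(CompletableFuture
                        .supplyAsync(() -> load(resourceId), io)           // all loads run in parallel
                        .thenAccept(rows -> store(db, resourceId, rows))); // each result is stored as it arrives
            }
            // Completes only when every load-and-store chain has finished, much like onCompleted.
            CompletableFuture.allOf(chains.toArray(new CompletableFuture[0])).join();
        } finally {
            io.shutdown();
        }
        return db;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> db = loadAll(10);
        System.out.println(db.size()); // 10
        System.out.println(db.get(3)); // 3
    }
}
```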

I'm not a Java developer, but in C# this is basically how this kind of code should be structured:
public IObservable<string> LoadDataFromFile()
{
    return
        Observable.Using(
            () => new FileStream("path", FileMode.Open),
            fs =>
                Observable.Using(
                    () => new StreamReader(fs),
                    sr => Observable.Start(() => sr.ReadLine())));
}
Hopefully you can adapt from that.

Using CountDownLatch and ExecutorService classes to insert 100K records in DB

I have to insert values into a DB from an ArrayList with more than 100K records. I am using the CountDownLatch and ExecutorService classes, as shown below, to run 10 threads at a time to improve insertion performance. I call a stored procedure to insert the Employee details into 2 different tables after some processing of the details. Is this the correct approach for my requirement?
public static void writeData(List<Employee> empList) throws SQLException {
    Connection con = null;
    try {
        Class.forName("oracle.jdbc.driver.OracleDriver");
        con = DriverManager.getConnection("jdbc:oracle:thin:@localhost:1521:xe", "system", "oracle");
        final CountDownLatch latch = new CountDownLatch(empList.size());
        ExecutorService taskExecutor = Executors.newFixedThreadPool(10);
        final CallableStatement cstmt = con.prepareCall("{Call Prc_Insert_Employee(?,?,?)}");
        for (int i = 0; i < empList.size(); i++) {
            final Employee emp = empList.get(i);
            Thread worker = new Thread() {
                public void run() {
                    try {
                        cstmt.setString(1, emp.getId());
                        cstmt.setString(2, emp.getName());
                        cstmt.setString(3, emp.getAge());
                        cstmt.executeUpdate();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    } finally {
                        latch.countDown();
                    }
                }
            };
            taskExecutor.execute(worker);
        }
        taskExecutor.shutdown();
        latch.await();
    } catch (Exception e) {
        System.out.println(e);
    } finally {
        if (con != null) {
            con.close();
        }
    }
}

Here are all the remarks that I have regarding your code:
You should consider using addBatch() instead of executeUpdate() to reduce the total number of round trips between your database and your application. That alone should help a lot in terms of performance, especially with a remote database; with this approach you may not even need several threads anymore.
I don't believe it is good practice to share your CallableStatement. I don't think it is meant to be thread-safe; you should use a dedicated Connection and CallableStatement for each thread.
You need to call connection.setAutoCommit(false) to disable auto-commit mode, which is not meant to be used to load a lot of data. This means you will need to call connection.commit() explicitly every x stored records.
In your code you should use Runnable instead of Thread, since Runnable is what the ExecutorService expects. Creating Thread instances here is unnecessary: the ExecutorService treats them as plain Runnable objects, so you will still have only 10 threads even if you submit more than 10 Runnable objects.
The CountDownLatch is not needed: call shutdown() followed by awaitTermination(...) instead. The javadoc of shutdown() says:
Initiates an orderly shutdown in which previously submitted tasks are executed, but no new tasks will be accepted.
Note that shutdown() itself does not wait for those tasks to finish; awaitTermination() is what blocks until they have completed.
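The remarks above can be combined into one sketch. It is an illustration under assumptions, not a tested Oracle loader: each task gets its own Connection and CallableStatement, auto-commit is off, rows are sent with addBatch()/executeBatch() and committed once per chunk, and the pool is drained with shutdown() plus waiting on each Future instead of a CountDownLatch (Future.get() also surfaces a failed chunk's SQLException). Employee is a minimal hypothetical stand-in, `ChunkWriter`, `jdbcWriter`, and `insertAll` are hypothetical names, the chunk size is arbitrary, and whether a batched CallableStatement really saves round trips depends on the JDBC driver. The demo in `main` uses a fake writer so it runs without a database.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ChunkedInsert {

    // Hypothetical minimal Employee; the real class comes from the question.
    static final class Employee {
        final String id, name, age;
        Employee(String id, String name, String age) { this.id = id; this.name = name; this.age = age; }
    }

    // Writes one chunk of rows; an interface so the threading can be tried without a database.
    interface ChunkWriter<T> {
        void write(List<T> chunk) throws SQLException;
    }

    // Each chunk gets its own Connection and CallableStatement: no JDBC object is shared between threads.
    static ChunkWriter<Employee> jdbcWriter(String url, String user, String password) {
        return chunk -> {
            try (Connection con = DriverManager.getConnection(url, user, password);
                 CallableStatement cstmt = con.prepareCall("{call Prc_Insert_Employee(?,?,?)}")) {
                con.setAutoCommit(false);
                try {
                    for (Employee emp : chunk) {
                        cstmt.setString(1, emp.id);
                        cstmt.setString(2, emp.name);
                        cstmt.setString(3, emp.age);
                        cstmt.addBatch();
                    }
                    cstmt.executeBatch();
                    con.commit(); // one commit per chunk instead of one per row
                } catch (SQLException e) {
                    con.rollback(); // don't leave a half-written chunk behind
                    throw e;
                }
            }
        };
    }

    // Splits rows into chunks, writes them on a fixed pool, waits, and returns the number of chunks.
    static <T> int insertAll(List<T> rows, int threads, int chunkSize, ChunkWriter<T> writer)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> results = new ArrayList<>();
        try {
            for (int from = 0; from < rows.size(); from += chunkSize) {
                List<T> chunk = rows.subList(from, Math.min(from + chunkSize, rows.size()));
                results.add(pool.submit(() -> {
                    writer.write(chunk);
                    return null;
                }));
            }
        } finally {
            pool.shutdown(); // no new tasks; already-submitted chunks still run
        }
        for (Future<?> f : results) {
            f.get(); // waits for the chunk; a failure arrives wrapped in ExecutionException
        }
        return results.size();
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) rows.add(i);
        AtomicInteger written = new AtomicInteger();
        // Fake writer for the demo; jdbcWriter(url, user, password) would be passed for a real database.
        int chunks = insertAll(rows, 4, 1_000, chunk -> written.addAndGet(chunk.size()));
        System.out.println(chunks + " chunks, " + written.get() + " rows"); // 3 chunks, 2500 rows
    }
}
```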

How to avoid Thread Interrupted Exception

I'm indexing PDF files with Apache Lucene using threads. Some threads take more than 15 minutes, and after 15 minutes of execution a thread throws an InterruptedException. Is there a way to increase the time limit to avoid this issue?
I got this exception when only a single thread was running, after it had indexed nearly 76% of its PDF files.
The application server is Glassfish.
List<Thread> threads = new ArrayList<Thread>();
Thread worker;
for (int a = 1; a <= count; a++) {
    IndexManualRunnable indexManualRunnable =
            new IndexManualRunnable(indexfileLocation, collections, manual, processId);
    worker = new Thread(indexManualRunnable);
    worker.setName(manual.getName());
    worker.setPriority(Thread.MAX_PRIORITY);
    worker.start();
    threads.add(worker);
}
for (Thread thread : threads) {
    try {
        thread.join();
    } catch (InterruptedException interruptedException) {
        saveReport("", "", "Interrupted Exception", 1, processId, Category.INDEXING, thread.getName());
        interruptedException.printStackTrace();
    }
}

UPDATE:
I see that you are using Glassfish and that this interrupt occurs every time at 15 minutes. It appears Glassfish's request timeout is set to 900 seconds (15 minutes) by default, and hitting it causes the InterruptedException.
Since your application legitimately needs to process for longer than 15 minutes, update the following server config to whatever time limit you see fit:
http.request-timeout-seconds
Here is an example asadmin command to update the property but I have not tested it:
# asadmin set server-config.network-config.protocols.protocol.<listener-name>.http.request-timeout-seconds=-1
- NOTE: <listener-name> is the name of the listener you wish to disable the timeout on.
- (-1) means unlimited
To deal with an interrupted thread, you can catch and handle the exception yourself. If you want the thread to keep executing regardless, you could in theory do nothing, but to keep the code clean and correct I would implement it as below:
public Task getNextTask(BlockingQueue<Task> queue) {
    boolean interrupted = false;
    try {
        while (true) {
            try {
                return queue.take();
            } catch (InterruptedException e) {
                interrupted = true;
                // fall through and retry
            }
        }
    } finally {
        if (interrupted)
            Thread.currentThread().interrupt();
    }
}
I like this example because it does not just leave an empty catch clause; instead it records in a boolean that an interrupt happened. When the loop finally completes, the finally block checks that flag and, if it was set, respects it by re-interrupting the current thread.
For more information and where the example came from look into the following article:
http://www.ibm.com/developerworks/library/j-jtp05236/
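A small self-contained run of the same pattern, assuming a LinkedBlockingQueue: its take() acquires a lock interruptibly, so it throws at once if the thread is already interrupted. The demo interrupts the current thread before the call; the first take() throws (which clears the flag), the loop retries and gets the element, and the finally block restores the flag so the caller can still see it. `takeUninterruptibly` and the queue contents are illustrative names, not taken from the article.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class RestoreInterruptDemo {

    // Keep retrying take(), remember any interrupt, and restore it before returning.
    static <T> T takeUninterruptibly(BlockingQueue<T> queue) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return queue.take();
                } catch (InterruptedException e) {
                    interrupted = true; // take() cleared the flag when it threw; remember it
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // let callers see the interrupt
            }
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("pdf-1");
        Thread.currentThread().interrupt(); // simulate an interrupt arriving before take()
        String item = takeUninterruptibly(queue);
        System.out.println(item);                 // pdf-1
        System.out.println(Thread.interrupted()); // true (and clears the flag again)
    }
}
```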

Change domain.xml in Glassfish:
<thread-pools>
<thread-pool name="http-thread-pool" idle-thread-timeout-seconds="1800" />
<thread-pool max-thread-pool-size="200" name="thread-pool-1" idle-thread-timeout-seconds="1800" />
</thread-pools>
Increase the idle-thread-timeout-seconds value.

Android UI is freezing when service executes several high loaded threads

My Android application implements data protection and works with the cloud.
The application consists of a UI and a standalone service (running in its own process).
I'm using IPC (Messages and Handlers) to communicate between the UI and the service.
I have the following situation: before doing some work with the data, I need to know the data size and the item count (I have to enumerate contacts, photos, etc. and collect totals for the progress indicators).
The problem:
When enumeration starts on the service side (it uses 4 running threads in a thread pool), my UI freezes for several seconds (depending on the total data size).
Does anybody know a way to keep the UI responsive, without freezing at this moment?
Update:
Here is the ThreadPoolExecutor wrapper that I use in the service to execute the estimate tasks (created as new ThreadPoolWorker(4, 4, 10)):
public class ThreadPoolWorker {
    private Object threadPoolLock = new Object();
    private ThreadPoolExecutor threadPool = null;
    private ArrayBlockingQueue<Runnable> queue = null;
    private List<Future<?>> futures = null;
    public ThreadPoolWorker(int poolSize, int maxPoolSize, int keepAliveTime) {
        queue = new ArrayBlockingQueue<Runnable>(5);
        threadPool = new ThreadPoolExecutor(poolSize, maxPoolSize, keepAliveTime, TimeUnit.SECONDS, queue);
        threadPool.prestartAllCoreThreads();
    }
    public void runTask(Runnable task) {
        try {
            synchronized (threadPoolLock) {
                if (futures == null) {
                    futures = new ArrayList<Future<?>>();
                }
                futures.add(threadPool.submit(task));
            }
        } catch (Exception e) {
            log.error("runTask failed. " + e.getMessage() + " Stack: " + OperationsHelper.StringOperations.getStackToString(e.getStackTrace()));
        }
    }
    public void shutDown() {
        synchronized (threadPoolLock) {
            threadPool.shutdown();
        }
    }
    public void joinAll() throws Exception {
        synchronized (threadPoolLock) {
            try {
                if (futures == null || (futures != null && futures.size() <= 0)) {
                    return;
                }
                for (Future<?> f : futures) {
                    f.get();
                }
            } catch (ExecutionException e) {
                log.error("ExecutionException Error: " + e.getMessage() + " Stack: " + OperationsHelper.StringOperations.getStackToString(e.getStackTrace()));
                throw e;
            } catch (InterruptedException e) {
                log.error("InterruptedException Error: " + e.getMessage() + " Stack: " + OperationsHelper.StringOperations.getStackToString(e.getStackTrace()));
                throw e;
            }
        }
    }
}
Here is how I start the enumeration tasks:
estimateExecutor.runTask(contactsEstimate);

I must say you did not provide enough information (the part of the code you suspect is the cause),
but from my knowledge and experience I can make an educated guess:
you are probably running code on the UI thread (main thread) that takes a while to execute. I would also guess that this code is querying the contacts/gallery providers for all the data.
In case you don't know: Service callback methods are also executed on the main thread (the UI thread) unless you explicitly run the work in an AsyncTask or another thread. Querying content providers and processing the returned cursor can also be a heavy operation that needs to run on another thread so it does not block the main UI thread.
After moving the code that performs these expensive queries to another thread, there should be no reason for the UI to freeze.
