In my application we have a number of client databases, and every hour new data arrives in those databases for processing.
There is a cron job that checks these databases, picks up the data, creates a thread pool, and starts executing 30 threads in parallel; the remaining tasks are kept in a queue.
It takes several hours to process all of these tasks.
So while the cron is executing, newly arrived data has to wait, because the cron will not pick up this new data until its current execution has finished.
Sometimes we have priority data to process, but in this situation those clients also have to wait several hours for their data to be processed.
Please give me a suggestion on how to avoid this wait for newly arrived data.
(I am working with Java 1.7, Tomcat 7 and SQL Server 2012.)
Thank you in advance.
Please let me know if anything is unclear or more information is needed.
Each of your threads should process data in bulk (for example 100/1000 records), and the records should be selected from the DB by priority: each time you select new records for processing, the data with the highest priority goes first.
I can't create comment yet :(
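One way to sketch the "highest priority first" idea on the pool side (a minimal, illustrative example - the task names and priority values are made up): backing a ThreadPoolExecutor with a PriorityBlockingQueue makes the pool always drain the highest-priority pending task first.

```java
import java.util.List;
import java.util.concurrent.*;

public class PriorityPoolDemo {
    /** A task with an explicit priority; higher values run first. */
    static class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;
        final String name;
        final List<String> log;
        PrioritizedTask(int priority, String name, List<String> log) {
            this.priority = priority; this.name = name; this.log = log;
        }
        @Override public void run() { log.add(name); }
        @Override public int compareTo(PrioritizedTask o) {
            return Integer.compare(o.priority, this.priority); // descending order
        }
    }

    public static List<String> runDemo() throws InterruptedException {
        final List<String> order = new CopyOnWriteArrayList<String>();
        final CountDownLatch gate = new CountDownLatch(1);
        // Single worker backed by a priority queue, so queued tasks are
        // dequeued by priority rather than insertion order.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<Runnable>());
        // First task occupies the worker so the next two have to queue up.
        pool.execute(new PrioritizedTask(0, "blocker", order) {
            @Override public void run() {
                try { gate.await(); } catch (InterruptedException e) { }
            }
        });
        pool.execute(new PrioritizedTask(1, "normal", order));
        pool.execute(new PrioritizedTask(10, "priority", order));
        gate.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return order; // the high-priority task is processed first
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

Note that this only works with execute() and tasks that implement Comparable; submit() wraps tasks in a FutureTask, which is not comparable.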
For this problem we are considering two solutions:
1. Create more than one thread pool, one for processing normal-priority data and one for high-priority data.
2. Create more than one Tomcat instance with the same code, one for normal and one for priority data.
But I don't understand which solution is best for my case, 1 or 2.
Please give me suggestions about these solutions so that I can make a decision.
You can use an ExecutorService created with Executors.newCachedThreadPool().
Benefits of using a cached thread pool :
The pool creates new threads if needed but reuses previously constructed threads if they are available.
Only if no threads are available for reuse will a new thread be created and added to the pool.
Threads that have not been used for more than sixty seconds are terminated and removed from the cache. Hence a pool that remains idle for long enough will not consume any resources.
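A minimal sketch of using a cached pool (anonymous Runnable, since the question targets Java 1.7; the job itself is a placeholder):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedPoolDemo {
    public static int runJobs(int n) throws InterruptedException {
        // Grows on demand, re-uses idle threads, reclaims them after 60s.
        ExecutorService pool = Executors.newCachedThreadPool();
        final AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.execute(new Runnable() {       // Java 1.7: no lambdas
                @Override public void run() {
                    done.incrementAndGet();     // your per-record work here
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runJobs(50)); // 50
    }
}
```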
I am trying to implement pagination in LDAP using VLV, following the documentation at https://docs.ldap.com/ldap-sdk/docs/javadoc/com/unboundid/ldap/sdk/controls/VirtualListViewRequestControl.html
It works fine with a single thread, and with up to 5 concurrent threads, but as the number of threads increases, only 5 threads run successfully; the excess threads fail with the error message below:
LDAPException(resultCode=51 (busy), numEntries=0, numReferences=0, diagnosticMessage='Other sort requests already in progress', ldapSDKVersion=5.1.1..
I am using OpenLDAP, with the UnboundID API for the connection from Java. The data size is around 100k entries.
I have tried a single connection and multiple connections (with multiple concurrent threads) and get the same error in both cases.
I also tried synchronizing the block that fetches data, and, on exception, making the thread wait and try again.
None of the above worked; the threads still cannot fetch data from LDAP.
After closing and reconnecting the connection as described in https://www.openldap.org/lists/openldap-technical/201107/msg00006.html
a failed thread can fetch data, but only after a lot of retries; in my case a thread retried about 2k times before it started fetching data.
Is there a better solution? Retrying 2k times to get a result is not a good option.
From my experience in Java, it is better to use thread pools, which shifts your solution from "how do I manage threads" to a more robust, task-oriented one.
To the point of your use case: you may want to define a thread pool with a fixed number of threads. The pool will manage all incoming load by re-using the threads in it. This is efficient because more threads does not equal more performance; a mechanism that re-uses threads beats just opening and closing threads and using too many of them.
You may start with something similar to this:
ExecutorService executorService = Executors.newFixedThreadPool(10);
Future<SearchResult> task1 = executorService.submit(() -> {
    // your search logic goes here
    return searchResult;
});
SearchResult result = task1.get(); // blocks until the task completes
This is an oversimplified piece of code, but you can clearly see that:
Tasks may be submitted dynamically.
Results can be fetched through the returned Future (you grab results only when they are ready - no polling needed).
The thread pool manages the load, so you can tweak your configuration and boost performance without changing your code (perfect for various environments that may want to configure your solution to suit their hardware profile).
I think you should give it a try... after all, retrying 2000 times before success is really not that ideal 🙃
I have a question regarding performance tuning.
I'm using a 64-bit Linux server, Java 1.8 with WildFly 10.0.0.Final. I developed a web service which uses a thread factory and managed executor service through the WildFly configuration.
The purpose of my web service is to receive a request with a large amount of data, save the data, create a new thread to process it, and then return the response. This way the web service can return the response quickly, without waiting for the data processing to finish.
The configured managed-executor-service holds a thread pool config specifically for this purpose.
My understanding of the configuration is that core-thread defines how many threads are kept alive in the thread pool. When the core threads are all busy, new requests are put in the queue; when the queue is full, new threads are created up to max-thread, but these newly created threads are terminated again after some idle time.
I'm trying to figure out the best combination of settings for the thread pool. These are my concerns:
If core-thread is set too small (like 5), response times may be long because only 5 active threads are processing data while the rest wait in the queue until it fills up; the response time won't look good under heavy load.
If I set core-thread big (like 100 maybe), then even when the system is not busy there will still be 100 live threads in the pool, and I don't see any configuration that allows these threads to be terminated. I'm concerned that is too many idle threads.
Does anyone have any suggestions on how to set the parameters to handle both heavy-load and light-load situations without too many idle threads left in the pool? I'm actually not familiar with this area, e.g. how many idle threads is too many, or how to measure it.
The following is the configuration for the thread factory and managed-executor-service:
<managed-thread-factory name="UploadThreadFactory" jndi-name="java:jboss/ee/concurrency/factory/uploadThreadFactory"/>
<managed-executor-service name="UploadManagedExecutor" jndi-name="java:jboss/ee/concurrency/executor/uploadManagedExecutor" context-service="default" thread-factory="UploadThreadFactory" hung-task-threshold="60000" core-thread="5" max-thread="100" keep-alive-time="5000" queue-length="500"/>
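For what it's worth, these settings map onto the semantics of a plain java.util.concurrent.ThreadPoolExecutor. A minimal sketch of the equivalent pool (values copied from the config above) also shows allowCoreThreadTimeOut, which is the standard JDK way to let even the core threads time out when idle:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UploadPoolSketch {
    public static ThreadPoolExecutor buildPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5,                              // core-thread
                100,                            // max-thread
                5000, TimeUnit.MILLISECONDS,    // keep-alive-time
                new LinkedBlockingQueue<Runnable>(500)); // queue-length
        // Let idle *core* threads die too, so a light load does not leave
        // 5 (or 100) threads sitting idle forever.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = buildPool();
        System.out.println(pool.getCorePoolSize()); // 5
        pool.shutdown();
    }
}
```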
Thanks a lot for your help,
Helen
We have a Servlet-based application which uploads files.
Uploaded files are saved on the server, then delegated for file processing and later inserted into our MongoDB.
Files are large (greater than 50 MB), and processing takes 30 minutes to 1 hour depending on server load.
The problem happens when multiple files are processed at the same time, in separate threads of course: it eventually slows the system down, and finally one of the threads gets aborted, which we can never trace.
So we are now planning a multiple-producer / single-consumer approach, where file jobs are queued one by one and the consumer consumes them from the queue one by one, sequentially.
We also need clustering capability implemented in the application later on.
For this approach we are planning to implement the process below:
When a file job comes in, we put it in a Mongo collection with status NEW.
Next, the consumer thread is invoked immediately.
The consumer checks whether there is already a task with status RUNNING.
If there is no RUNNING task, it starts the task.
Upon completion, before ending, the consumer checks the collection again; if there are any tasks with status NEW, it takes them in FIFO order by checking the timestamp, and the process continues.
If there is a currently RUNNING task, it simply inserts the new task into the DB; since there is already a running consumer, that thread will take care of the newly inserted job while the current process ends.
This way we can also ensure that it will run smoothly in a clustered environment without any additional configuration.
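The steps above could be sketched roughly like this (JobStore and its methods are hypothetical stand-ins for the Mongo calls; in a real cluster the NEW -> RUNNING transition would need an atomic update on the database side):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FileJobConsumer {
    /** Hypothetical abstraction over the Mongo collection. */
    public interface JobStore {
        String claimOldestNew();   // atomically NEW -> RUNNING, null if none
        void markDone(String jobId);
    }

    private final JobStore store;
    private final AtomicBoolean running = new AtomicBoolean(false);

    public FileJobConsumer(JobStore store) { this.store = store; }

    /** Called after every insert; only one drain loop runs at a time. */
    public void poke() {
        if (!running.compareAndSet(false, true)) {
            return; // a consumer is already draining the queue
        }
        try {
            String jobId;
            // FIFO: claimOldestNew() returns the oldest NEW job by timestamp.
            while ((jobId = store.claimOldestNew()) != null) {
                process(jobId);
                store.markDone(jobId);
            }
        } finally {
            running.set(false);
        }
    }

    protected void process(String jobId) {
        // file processing goes here
    }
}
```

One caveat: there is a small window between the last claimOldestNew() returning null and the flag being released; a job inserted exactly in that window is only picked up on the next poke, so a periodic safety poke (e.g. a scheduled task) is a good idea.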
There are message-queue-based solutions with RabbitMQ or ActiveMQ, but we need to minimize additional component configuration.
Let me know if our approach is correct, or whether there is a better solution out there.
Thanks,
I have the following scenario:
The server sends a lot of information over a socket, so I need to read this information and validate it. The idea is to use 20 threads and batches: each time a batch reaches a size of 20, a thread must send the information to the database and keep reading from the socket, waiting for more.
I don't know the best way to do this; I was thinking:
Create a socket that reads the information.
Create an executor (Executors.newFixedThreadPool(20)), validate the information, add each line to a list, and when its size reaches 20, execute the Runnable class that sends the information to the database.
Thanks in advance for your help.
You don't want to do this with a whole bunch of threads. You're better off using a producer-consumer model with just two threads.
The producer thread reads records from the socket and places them on a queue. That's all it does: read record, add to queue, read next record. Lather, rinse, repeat.
The consumer thread reads a record from the queue, validates it, and writes it to the database. If you want to batch the items so that you write 20 at a time to the database, then you can have the consumer add the record to a list and when the list gets to 20, do the database update.
You probably want to look up information on using the Java BlockingQueue in producer-consumer programs.
You said that you might get a million records a day from the socket. That's only 12 records per second. Unless your validation is hugely processor intensive, a single thread could probably handle 1,200 records per second with no problem.
In any case, your major bottleneck is going to be the database updates, which probably won't benefit from multiple threads.
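A minimal sketch of that two-thread pipeline (the socket read is simulated by a loop, and the database write is a stand-in that just collects the batches; the poison-pill marker is my own convention for signalling end of stream):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SocketBatcher {
    private static final String POISON = "\u0000EOF"; // end-of-stream marker
    private static final int BATCH_SIZE = 20;

    /** Consumer: drain the queue, flushing every BATCH_SIZE valid records. */
    public static List<List<String>> consume(BlockingQueue<String> queue)
            throws InterruptedException {
        List<List<String>> flushed = new ArrayList<List<String>>();
        List<String> batch = new ArrayList<String>(BATCH_SIZE);
        while (true) {
            String record = queue.take();      // blocks until a record arrives
            if (POISON.equals(record)) break;  // producer is done
            if (isValid(record)) batch.add(record);
            if (batch.size() == BATCH_SIZE) {
                flushed.add(batch);            // stand-in for the DB write
                batch = new ArrayList<String>(BATCH_SIZE);
            }
        }
        if (!batch.isEmpty()) flushed.add(batch); // flush the remainder
        return flushed;
    }

    static boolean isValid(String record) { return !record.isEmpty(); }

    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1000);
        // Producer: in real code this loop reads lines from the socket.
        Thread producer = new Thread(new Runnable() {
            @Override public void run() {
                try {
                    for (int i = 0; i < 45; i++) queue.put("record-" + i);
                    queue.put(POISON);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        List<List<String>> batches = consume(queue);
        System.out.println(batches.size()); // 3 batches: 20 + 20 + 5
    }
}
```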
I am building a system which has a basic producer-consumer flavor to it, but the producer part needs to run in a transaction.
Here is my exact scenario:
Poller thread -
[Transaction START]
* Polls the DB for, say, 10 records
* Sets the status in the DB for those records to IN-PROGRESS
* Puts those 10 records in a LinkedBlockingQueue - the workqueue
[Transaction END]
Worker thread pool of 5 -
* Polls the workqueue for tasks, does some lookups, and updates the same records in the DB with the looked-up values.
Now my problem is with part 1 of the process: if for some reason extracting and updating from the DB succeeds but inserting into the queue fails for one record, I can roll back the whole transaction, so all my records in the DB will be back in state NOT PROCESSED; but some elements may already have been inserted into the workqueue, and my worker thread pool can pick them up and start processing, which should not happen.
I am trying to find out whether there is a way to make the writes to the blocking queue transactional.
I am thinking of adding some writeLock()/readLock() mechanism, so I can stop the worker threads from reading while something is being written to the queue.
Any thoughts on a better approach?
Thanks,
Consider the worst case scenario: the unplug case (database connection lost) and the crash case (program out of memory). How would you recover from that?
Some hints:
If you can think of a reason why inserting into the queue would fail (e.g. the queue is full), don't start the transaction; just skip one poll.
First commit the in-progress transaction, then add all records to the workqueue. Or use one transaction per record, so you can add the records to the workqueue one by one.
Maintain an in-memory HashSet of the IDs of all records being processed. If an ID is in the set but the record is not in-progress, or vice versa, something is very wrong (e.g. the task for a record crashed or did not complete).
Set a timestamp when in-progress is set, and have another background process check for records that have been in-progress for too long. Reset the in-progress state if the ID is not in the in-progress HashSet, and your normal process will retry the operation.
Make your tasks idempotent: see if you can find a way for tasks to recognize that the work for a record has already been done. This might be a relatively expensive operation, but it gives you the guarantee that the work is done only once in case of retries.
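The commit-first ordering and the in-memory ID set could be sketched like this (the Db interface and its method are hypothetical placeholders for the real transaction/JDBC code):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class TransactionalPoller {
    /** Hypothetical DB facade: poll + mark IN-PROGRESS in one transaction
     *  that is committed before the method returns. */
    public interface Db {
        List<String> pollAndMarkInProgress(int max);
    }

    private final Db db;
    private final BlockingQueue<String> workqueue =
            new LinkedBlockingQueue<String>(100);       // bounded on purpose
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    public TransactionalPoller(Db db) { this.db = db; }

    /** One poll cycle: the DB commit happens first, the enqueue second. */
    public void pollOnce() throws InterruptedException {
        if (workqueue.remainingCapacity() < 10) {
            return; // if the enqueue could fail, skip this poll entirely
        }
        List<String> ids = db.pollAndMarkInProgress(10); // already committed
        for (String id : ids) {
            inFlight.add(id);  // in-memory record of work in flight
            workqueue.put(id); // safe now: DB state is durable
        }
    }

    public BlockingQueue<String> workqueue() { return workqueue; }
    public Set<String> inFlight() { return inFlight; }
}
```

With this ordering a crash between commit and enqueue leaves records stuck in IN-PROGRESS, which is exactly what the timestamp/watchdog hint above recovers from.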