I have the following scenario:
The server sends a lot of information through a socket, so I need to read this information and validate it. The idea is to use 20 threads and batches: each time the batch size reaches 20, a thread must send the information to the database and keep reading from the socket, waiting for more.
I don't know what the best way to do this would be. I was thinking:
Create a socket that will read the information.
Create an executor (Executors.newFixedThreadPool(20)), validate the information, and add each line to a list; when the size reaches 20, execute the Runnable class that will send the information to the database.
Thanks in advance for your help.
You don't want to do this with a whole bunch of threads. You're better off using a producer-consumer model with just two threads.
The producer thread reads records from the socket and places them on a queue. That's all it does: read record, add to queue, read next record. Lather, rinse, repeat.
The consumer thread reads a record from the queue, validates it, and writes it to the database. If you want to batch the items so that you write 20 at a time to the database, then you can have the consumer add the record to a list and when the list gets to 20, do the database update.
You probably want to look up information on using the Java BlockingQueue in producer-consumer programs.
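A minimal sketch of that layout, assuming String records; readRecord() and writeBatch() are placeholders for your own socket-reading and JDBC code:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    private static final int BATCH_SIZE = 20;
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

    void start() {
        Thread producer = new Thread(() -> {
            String record;
            while ((record = readRecord()) != null) {    // read from the socket
                try {
                    queue.put(record);                   // blocks if the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        Thread consumer = new Thread(() -> {
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            while (true) {
                try {
                    batch.add(queue.take());             // blocks until a record arrives
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (batch.size() == BATCH_SIZE) {
                    writeBatch(batch);                   // validate + one JDBC batch insert
                    batch.clear();
                }
            }
        });

        producer.start();
        consumer.start();
    }

    private String readRecord() { return null; }         // placeholder: socket read
    private void writeBatch(List<String> batch) { }      // placeholder: JDBC batch update
}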
You said that you might get a million records a day from the socket. That's only 12 records per second. Unless your validation is hugely processor intensive, a single thread could probably handle 1,200 records per second with no problem.
In any case, your major bottleneck is going to be the database updates, which probably won't benefit from multiple threads.
Related
In my application we have a number of client databases, and every hour we get new data for processing in those databases.
A cron job checks the data in these databases and picks it up, then creates a thread pool that starts executing 30 threads in parallel, with the remaining tasks held in a queue.
It takes several hours to process all these threads.
So while execution is in progress, newly arrived data has to wait, because the cron will not pick it up until the current execution has finished.
Sometimes we have priority data for processing, but because of this those clients also have to wait several hours for their data to be processed.
Please give me a suggestion to avoid this wait for newly arrived data.
(I am working with Java 1.7, Tomcat 7, and SQL Server 2012.)
Thank you in advance.
Please let me know if any of this is unclear and you need more information.
Each of your threads should process data in bulk (for example, 100-1000 records), and these records should be selected from the DB by priority: each time you select new records for processing, the data with the highest priority goes first.
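A sketch of what that selection could look like with plain JDBC; the table and column names are made up, and TOP is used since the question mentions SQL Server:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PriorityFetcher {
    // Hypothetical schema; adjust to your own tables.
    // SQL Server syntax, so TOP instead of LIMIT.
    private static final String SQL =
            "SELECT TOP 1000 id, payload FROM pending_data " +
            "WHERE processed = 0 ORDER BY priority DESC, created_at ASC";

    List<String> fetchNextBatch(Connection conn) throws SQLException {
        List<String> batch = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                batch.add(rs.getString("payload"));   // hand this batch to one worker
            }
        }
        return batch;
    }
}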
I can't create a comment yet :(
For this problem we are thinking about two solutions:
1. Create more than one thread pool, to process normal and high-priority data separately.
2. Create more than one Tomcat instance running the same code, to process normal and priority data.
But I don't understand which solution is best for my case, 1 or 2.
Please give me suggestions about these solutions so that I can make a decision.
You can use Executors.newCachedThreadPool().
Benefits of using a cached thread pool:
The pool creates new threads if needed but reuses previously constructed threads if they are available.
Only if no threads are available for reuse will a new thread be created and added to the pool.
Threads that have not been used for more than sixty seconds are terminated and removed from the cache. Hence a pool which has not been used long enough will not consume any resources.
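A minimal usage sketch:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CachedPoolExample {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println("task " + taskId
                    + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}

One caveat: a cached pool has no upper bound on thread count, so if your tasks are long-running database jobs it can create a very large number of threads; a bounded pool (or separate pools for normal and priority data) may still be the safer choice.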
I'm writing an app for Android that processes real-time data.
My app reads binary data from a data bus (CAN), parses it, and displays it on the screen.
The app reads data in a background thread. I need to transfer data rapidly from one thread to another, and the displayed data should be as current as possible.
I've found a nice Java queue that almost implements the required behavior: LinkedBlockingQueue. I plan to set a hard limit on this queue (about 100 messages).
The consumer thread should read data from the queue with the take() method. But the producer thread can't wait for the consumer, so it can't use the standard put() method (because it blocks).
So I plan to put messages into my queue using the following construction:
// evict the oldest message until there is room for the new one
while (!messageQueue.offer(message)) {
    messageQueue.poll();
}
That is, the oldest message is removed from the queue to make room for the new, more current data.
Is this good practice? Or have I missed some important details?
I can't see anything wrong with it. You know what you are doing (losing the head record). This isn't really a question of practice; it's your call to use the API as you want. I personally prefer ArrayBlockingQueue, though (fewer temporary objects).
This should be what you're looking for: Size-limited queue that holds last N elements in Java
The top answer refers to an Apache library queue which will drop elements.
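If you go the Apache route, a rough sketch assuming commons-collections4's CircularFifoQueue, which silently evicts the eldest element when full. Note it is not thread-safe by itself, and you lose the blocking take(), so the consumer has to poll:

import org.apache.commons.collections4.queue.CircularFifoQueue;

public class LatestMessages {
    private final CircularFifoQueue<byte[]> queue = new CircularFifoQueue<>(100);

    // producer thread
    synchronized void add(byte[] message) {
        queue.add(message);   // evicts the oldest entry once 100 are queued
    }

    // consumer thread
    synchronized byte[] poll() {
        return queue.poll();  // null when empty
    }
}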
I have a Java main application which reads a file line by line. Each line represents subscriber data:
name, email, mobile, ...
A subscriber object is created for each line being processed, and this object is then persisted in the database using JDBC.
PS: The input file has around 15 million subscriber records, and the application takes around 10-12 hours to process them. I need to reduce this to around 2-3 hours, because this task is a migration activity and the downtime we get is around 4-5 hours.
I know I need to use multiple threads / a thread pool, maybe Java's native ExecutorService, but I am asked to do a batch update as well: say, a thread pool of 50 or 100 worker threads and batch updates of 500-1000 subscribers.
I am familiar with using ExecutorService but can't see an approach that would also accommodate the batch update logic.
My overall application code looks like:
while (null != (line = getNextLine())) {
    Subscriber sub = getSub(line); // creates a subscriber object by parsing the line
    persistSub(sub);               // JDBC - PreparedStatement insert query executed
}
I need an approach that lets me process this faster with multiple threads and batch updates, or any existing framework or Java API that can be used for such cases.
persistSub(sub) should not access the database immediately. Instead, it should store sub in an array of length 500-1000, and only when the array is full (or the input file has ended) wrap it in a Runnable and submit it to a thread pool. The Runnable then accesses the database via JDBC as described in JDBC Batching with PreparedStatement Object.
UPDATE
If writing to the database is slow and reading the input file is fast, many arrays of data can pile up waiting to be written to the database, and the system can run out of memory. So persistSub(sub) should keep track of the number of allocated arrays. The easiest way is to use a Semaphore initialized with the allowed number of arrays. Before a new array is allocated, persistSub(sub) calls semaphore.acquire(). Each Runnable task, just before it finishes, calls semaphore.release().
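A sketch of that flow; the Subscriber type, pool size, batch size, and insertBatch() JDBC details are placeholders to adapt:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class BatchingPersister {
    public static class Subscriber { }        // placeholder for your parsed line

    private static final int BATCH_SIZE = 1000;
    private static final int MAX_PENDING_BATCHES = 10;   // bounds memory use

    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final Semaphore pending = new Semaphore(MAX_PENDING_BATCHES);
    private List<Subscriber> current = new ArrayList<>(BATCH_SIZE);

    // called from the single file-reading loop
    void persistSub(Subscriber sub) throws InterruptedException {
        current.add(sub);
        if (current.size() == BATCH_SIZE) {
            flush();                          // also call flush() once at end of file
        }
    }

    void flush() throws InterruptedException {
        if (current.isEmpty()) return;
        final List<Subscriber> batch = current;
        current = new ArrayList<>(BATCH_SIZE);
        pending.acquire();                    // reader waits if too many batches queued
        pool.submit(() -> {
            try {
                insertBatch(batch);           // PreparedStatement.addBatch()/executeBatch()
            } finally {
                pending.release();            // free a slot for the reader
            }
        });
    }

    private void insertBatch(List<Subscriber> batch) { /* JDBC batch insert */ }
}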
I read a huge file (almost 5 million lines). Each line contains a date and a request; I must parse the requests between specific dates. I use a BufferedReader to read the file until the start date and then start parsing the lines. Can I use threads for parsing the lines, since it takes a lot of time?
It isn't entirely clear from your question, but it sounds like you are reparsing your 5 million-line file every time a client requests data. You certainly can solve the problem by throwing more threads and more CPU cores at it, but a better solution would be to improve the efficiency of your application by eliminating duplicate work.
If this is the case, you should redesign your application to avoid reparsing the entire file on every request. Ideally you should store data in a database or in-memory instead of processing a flat text file on every request. Then on a request, look up the information in the database or in-memory data structure.
If you cannot eliminate the 5 million-line file entirely, you can periodically recheck the large file for changes, skip/seek to the end of the last record that was parsed, then parse only new records and update the database or in-memory data structure. This can all optionally be done in a separate thread.
Firstly, 5 million lines of 1,000 characters is only 5 GB, which is not necessarily prohibitive for a JVM. If this is actually a critical use case with lots of hits, then buying more memory is almost certainly the right thing to do.
Secondly, if that is not possible, most likely the right thing to do is to build an ordered map keyed on the date. So every date is a key in the map and points to a list of the line numbers which contain the requests for that date. You can then go directly to the relevant line numbers.
Something of the form
    TreeMap<Date, List<Long>>()
would do nicely. That should have a memory usage on the order of 5,000,000 * 32 / 8 bytes = 20 MB, which should be fine.
You could also use the FileChannel class to keep the I/O handle open as you jump from one line to another. This also allows memory mapping.
See http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html
and http://en.wikipedia.org/wiki/Memory-mapped_file
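A small sketch of memory-mapping a file with FileChannel; the file name and offset are made up for illustration:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        // "requests.log" is a made-up name; use your own file
        try (FileChannel channel = FileChannel.open(
                Paths.get("requests.log"), StandardOpenOption.READ)) {
            // Map the whole file read-only; files over 2 GB must be mapped in chunks
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int offset = 0;          // byte offset of a line, taken from your date index
            buffer.position(offset); // jump straight there, no sequential re-read
            // ... decode bytes from this position
        }
    }
}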
A good way to parallelize a lot of small tasks is to wrap the processing of each task in a FutureTask and then pass each task to a ThreadPoolExecutor to run them. The executor should be initialized with the number of CPU cores your system has available.
When you call executor.execute(future), the future is queued for background processing. To avoid creating and destroying too many threads, the pool creates only as many threads as you specified and executes the futures one after another.
To retrieve the result of a future, call future.get(). If the future hasn't completed yet (or hasn't even started), this method blocks until it has. Other futures are executed in the background while you wait.
Remember to call executor.shutdown() when you don't need it anymore, to make sure it terminates its background threads, which it otherwise keeps around until the keepalive time has expired or it is garbage-collected.
tl;dr pseudocode:
create executor
for each line in file
    create new FutureTask which parses that line
    pass future task to executor
    add future task to a list
for each entry in task list
    call entry.get() to retrieve result
executor.shutdown()
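In Java, that outline might look roughly like this; submit() wraps each task in a Future for you, equivalent to FutureTask plus execute() (readLines(), parseLine(), and Request are placeholders for your own code):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelParser {
    static class Request { }                                   // placeholder result type

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(cores);

        List<Future<Request>> futures = new ArrayList<>();
        for (String line : readLines()) {                      // your file-reading code
            futures.add(executor.submit(() -> parseLine(line)));
        }

        for (Future<Request> future : futures) {
            Request result = future.get();                     // blocks until that task is done
            // ... use result
        }
        executor.shutdown();
    }

    static List<String> readLines() { return new ArrayList<>(); }    // placeholder
    static Request parseLine(String line) { return new Request(); }  // placeholder
}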
I have an application that listens on a port for UDP datagrams. I use a UDP inbound channel adapter to listen on this port. My UDP channel adapter is configured to use a ThreadPoolTaskExecutor to dispatch incoming UDP datagrams. After the UDP channel adapter I use a direct channel. My channel has only one subscriber i.e. a service activator.
The service adds the incoming messages to a synchronized list stored in memory. Then, I have a single thread that retrieves the content of the list every 5 seconds and does a batch update to a MySQL database.
My problem:
A first bulk of messages arrives. The threads of my ThreadPoolTaskExecutor get the incoming messages from the UDP channel adapter and add them to the synchronized list. Let's say 10,000 messages have been received and inserted.
The background thread retrieves the 10,000 messages and does a batch update (JdbcTemplate.batchUpdate(String[])).
At this point, the background thread waits for the response from the database. But because it takes the database time to execute the 10,000 INSERTs, 20,000 more messages have been received and are now in the list.
The background thread receives a response from the database. It then retrieves the 20,000 messages and does another batch update (JdbcTemplate.batchUpdate(String[])).
It takes the database even more time to execute these INSERTs, and during this time 35,000 messages have been received and stored in the list.
The heap grows constantly and, after a while, causes an out-of-memory exception.
I'm trying to find a solution to improve the performance of my application.
Thanks
Storing 10,000 records every 5 seconds is quite a lot for any database to sustain.
You need to consider other options:
use a different data store, e.g. a NoSQL data store or a flat file
ensure you have good write performance on your disks, e.g. by using a write cache
use a disk subsystem with multiple disks or an SSD drive
Suggestions:
a. Do you really need a single synchronized list? Can't you have a group of lists and divide the work between them, say by running hashCode on a key of the data?
b. Can you use a thread pool of threads that read information from the list (I would use a queue here, by the way)? That way, when one thread is "stuck" in a heavy batch insertion, other threads can still take "jobs" from the queue and perform them (see the sketch after this list).
c. Is your database co-hosted on the same machine as the application? This can improve performance.
d. Can you post your insert query? Maybe someone can offer you a way to optimize it.
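For suggestion b, a rough sketch: replace the synchronized list with a bounded BlockingQueue and let several writer threads drain fixed-size batches from it, so a slow INSERT on one thread does not stall intake, and the bound pushes back on the producers instead of growing the heap. The batch size, queue capacity, and batchInsert() details are placeholders:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchWriters {
    private static final int BATCH_SIZE = 1000;
    // The capacity bound makes producers block instead of exhausting the heap.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(50_000);

    // called by the service activator threads
    void onMessage(String msg) throws InterruptedException {
        queue.put(msg);
    }

    void startWriters(int count) {
        for (int i = 0; i < count; i++) {
            new Thread(this::drainLoop, "db-writer-" + i).start();
        }
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        while (true) {
            try {
                batch.add(queue.take());                 // wait for the first message
                queue.drainTo(batch, BATCH_SIZE - 1);    // then grab up to a full batch
                batchInsert(batch);
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void batchInsert(List<String> batch) { /* e.g. JdbcTemplate.batchUpdate(...) */ }
}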
Use a Database Connection pool so that you don't have to wait on the commit on any one thread. Just grab the next available connection and do parallel inserts.
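A sketch of that idea, assuming HikariCP as the pool (any JDBC connection pool works the same way; the URL and table are made up):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class PooledInserts {
    private final HikariDataSource dataSource;

    PooledInserts() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");  // made-up URL
        config.setUsername("user");
        config.setPassword("secret");
        config.setMaximumPoolSize(8);         // roughly one connection per writer thread
        dataSource = new HikariDataSource(config);
    }

    // Each writer thread borrows its own connection, so commits run in parallel.
    void insertBatch(List<String> messages) throws Exception {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO messages (payload) VALUES (?)")) {  // made-up table
            for (String m : messages) {
                ps.setString(1, m);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}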
I get 5,000 inserts per second sustained on a SQL Server table, but that required quite a few optimizations. I did not use all of the tips below; some might be of use to you.
Check the insert speed tips in the MySQL documentation:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
Parallelize the insert process
Aggregate messages if possible. Instead of storing every message, insert a row with information about the messages received in a timeframe, of a certain type, etc.
Change the table to have no indexes or foreign keys except for the primary key.
Switch to writing to a text file (and import it during the night with a LOAD DATA bulk insert if you really want it in the database; see the sketch after this list).
Use a separate database instance to serve only your table.
...
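For the text-file route mentioned above, a rough sketch; the file path and table are made up, and LOAD DATA INFILE is MySQL-specific (it may require FILE privileges or LOCAL mode):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.sql.Connection;
import java.sql.Statement;

public class FileThenBulkLoad {
    // During the day: append messages to a flat file (no DB round-trips at all).
    static void appendMessages(Iterable<String> messages) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("/data/messages.csv"),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String m : messages) {
                out.write(m);
                out.newLine();
            }
        }
    }

    // At night: bulk-load the whole file in one statement.
    static void bulkLoad(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.execute("LOAD DATA INFILE '/data/messages.csv' "
                    + "INTO TABLE messages FIELDS TERMINATED BY ','");
        }
    }
}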