I am writing a Java program using multiple threads.
I have more than 5000 threads, and each thread accesses the same table to insert or select data (not to update).
I use HSQLDB (file mode) with Hibernate/Spring.
The reason I use multiple threads is to reduce execution time, but the table is accessed by one thread at a time.
I configured hsqldb.tx=mvcc for multi-version concurrency control, but nothing changed.
Does anyone know how to allow multiple threads to access the same table at the same time?
Using more than one thread to SELECT from a table improves performance because the threads can read the same database table at the same time.
When multiple threads perform INSERT into a table, the INSERT statements must be executed one at a time by the database, because PRIMARY KEY or UNIQUE constraints may have to be checked, and those checks are queued to prevent inconsistencies in the database.
In any case, the computer can only run as many threads simultaneously as it has CPU cores. If you have more threads than that, they are queued by the OS.
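A minimal sketch of that advice for the HSQLDB setup in the question: a pool sized to the CPU cores, where each task opens its own connection. The URL, credentials, and table name here are assumptions, not taken from the question.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelSelect {
    // Hypothetical file-mode URL; hsqldb.tx=mvcc enables MVCC as in the question.
    private static final String URL = "jdbc:hsqldb:file:data/mydb;hsqldb.tx=mvcc";

    public static void main(String[] args) throws Exception {
        // Size the pool to the CPU cores; more threads would just be queued by the OS.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) {
            pool.submit(() -> {
                // Each task opens its own connection; connections are not shared.
                try (Connection con = DriverManager.getConnection(URL, "SA", "");
                     Statement st = con.createStatement();
                     ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM my_table")) {
                    while (rs.next()) {
                        System.out.println(Thread.currentThread().getName() + ": " + rs.getLong(1));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}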
I suspect that HSQLDB's file, hsql, and res modes do not support multithreading, or more than one user connection at the same time. I am still trying to find the mode required to perform these tasks simultaneously.
Related
When developing a Java application that shares a single Connection between multiple threads, the problem of concurrency arises.
If thread A updates record 1 in table T, and simultaneously thread B issues a SELECT on record 1 in table T, how do I ensure thread B reads the updated values of thread A?
java.sql.Connection offers transactions with begin(), commit(), and rollback(), but does this mechanism also guarantee data correctness?
I think I'm missing something.
Two points:
You shouldn't share a jdbc.Connection between threads, at least in any serious production code, see here. For demo purposes, I think, sharing a Connection is OK;
If a thread reads from the DB after the relevant DB transaction has committed, it will see the data written by the other thread.
For your second question
will thread B timeout until the first transaction has commit() or rollback()
-- B will block until A's transaction is finished (either by commit or rollback) if:
B tries to update/delete the same table row which is being updated by A, and ...
A updates that row under a DB-level lock, using SELECT ... FOR UPDATE.
You can observe this behavior using two consoles (for example, with PostgreSQL's psql), where each console stands for a thread:
In the A console, type the following:
BEGIN;
SELECT some_col FROM some_tbl WHERE some_col = some_val FOR UPDATE;
Now in the B console, type:
BEGIN;
UPDATE some_tbl SET some_col = new_val WHERE some_col = some_val;
You should see that the UPDATE blocks until, in A, you issue either COMMIT or ROLLBACK.
The explanation above uses separate DB connections, just like a Java JDBC connection pool does. When you share a single connection between Java threads, I think any interaction with the DB will block while the connection is being used by some other thread.
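The same demonstration can be reproduced from Java with two separate connections, one per thread. This is a hedged sketch assuming a local PostgreSQL database with a hypothetical some_tbl holding a numeric some_col, and the PostgreSQL JDBC driver on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RowLockDemo {
    // Hypothetical database and table, mirroring the psql session above.
    private static final String URL = "jdbc:postgresql://localhost/testdb";

    public static void main(String[] args) throws Exception {
        try (Connection a = DriverManager.getConnection(URL, "test", "test");
             Connection b = DriverManager.getConnection(URL, "test", "test")) {
            a.setAutoCommit(false);                       // BEGIN on connection A
            try (Statement stA = a.createStatement()) {
                // A takes a row-level lock, like the FOR UPDATE in console A.
                stA.executeQuery("SELECT some_col FROM some_tbl WHERE some_col = 1 FOR UPDATE");
            }
            Thread writer = new Thread(() -> {
                try (Statement stB = b.createStatement()) {
                    // This UPDATE blocks until connection A commits or rolls back.
                    stB.executeUpdate("UPDATE some_tbl SET some_col = 2 WHERE some_col = 1");
                    System.out.println("B: update finished");
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            writer.start();
            Thread.sleep(2000);                            // B stays blocked during this pause
            a.commit();                                    // releases the lock; B proceeds
            writer.join();
        }
    }
}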
JDBC is a standard that is broadly adopted, but with uneven levels of adherence, so it is probably not a good idea to make sweeping statements about what is safe.
I would not expect anything to keep statement executions, commits, and rollbacks made from multiple threads from getting interleaved. In the best case, only one thread can use the connection at a time while the others block, making multithreading useless.
If you don't want to provide a connection to each thread, you could have the threads submit work items to a queue that is consumed by a single worker thread handling all the JDBC work, as in the sketch below. But introducing a connection pool probably has less impact on existing code.
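A minimal sketch of that queue approach, assuming a hypothetical items table. Producer threads call submit() while a single thread runs the worker and owns the only Connection, so no JDBC-level synchronization is needed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class JdbcWorker implements Runnable {
    // The worker owns the only reference to the Connection, so no sharing occurs.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final String url;

    JdbcWorker(String url) { this.url = url; }

    // Producer threads call this instead of touching JDBC themselves.
    public void submit(String value) throws InterruptedException {
        queue.put(value);
    }

    @Override
    public void run() {
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = con.prepareStatement("INSERT INTO items(name) VALUES (?)")) {
            while (!Thread.currentThread().isInterrupted()) {
                String value = queue.take();   // blocks until a work item arrives
                ps.setString(1, value);
                ps.executeUpdate();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shut down cleanly
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}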
In general, if you have concurrent updates and reads, they happen in the order in which they happen. Locking and isolation levels provide consistency guarantees for concurrent transactions, but they don't apply to a transaction that hasn't started yet. You could put a status flag, version number, or timestamp on each row to indicate when an update occurred.
If you have a lot of updates, it can be better to collect them in a flat file and execute a bulk copy; that can be much faster than going through JDBC. Then, with the updates out of the way, execute the selects over JDBC.
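A sketch of the flat-file approach, assuming MySQL (where the bulk-copy command is LOAD DATA INFILE) and a hypothetical users table; other databases have their own bulk loaders, such as PostgreSQL's COPY.

import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        List<String[]> updates = List.of(new String[]{"1", "alice"}, new String[]{"2", "bob"});

        // 1. Collect the rows in a flat CSV file instead of row-by-row JDBC calls.
        Path csv = Files.createTempFile("rows", ".csv");
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(csv))) {
            for (String[] row : updates) {
                out.println(String.join(",", row));
            }
        }

        // 2. Bulk-copy the file; LOAD DATA needs allowLoadLocalInfile=true in Connector/J.
        String url = "jdbc:mysql://localhost/testdb?allowLoadLocalInfile=true";
        String path = csv.toAbsolutePath().toString().replace('\\', '/'); // forward slashes for SQL
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             Statement st = con.createStatement()) {
            st.execute("LOAD DATA LOCAL INFILE '" + path
                     + "' INTO TABLE users FIELDS TERMINATED BY ','");
        }
    }
}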
I have a multi threaded Java program where each thread gets one username for some processing which takes about 10 minutes or so.
Right now it gets the usernames with an SQL query that returns one username at random, and the problem is that the same username can be handed to more than one thread at a time.
I don't want a username that is being processed by a thread, to be fetched again by another thread. What is a simple and easy way to achieve this goal?
Step-by-step solution:
Create a threads table where you store each thread's state. Among other columns, you need to store the owning user's id there as well.
When a thread is associated with a user, create a record, storing the owner along with all the other juicy stuff.
When a thread is no longer associated with a user, set its owner to null.
When a thread finishes its job, remove its record.
When you pick a random user for a thread, filter out all the users who are already associated with at least one thread (see the query sketch after this list). This way you know that any user left after the filtering is threadless.
Make sure everything stays consistent: if, while working, thread records were created that should have been removed, or should have been detached from their owner, then do so.
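A sketch of the filtering query from the randomization step, assuming hypothetical users and threads tables with an owner_id column, and MySQL's RAND() for the randomization.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class FreeUserPicker {
    // Picks one random user that no thread currently owns.
    static String pickFreeUser(Connection con) throws Exception {
        String sql =
            "SELECT u.username FROM users u " +
            "WHERE u.id NOT IN (SELECT t.owner_id FROM threads t WHERE t.owner_id IS NOT NULL) " +
            "ORDER BY RAND() LIMIT 1";      // MySQL syntax; other DBs use RANDOM()
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            return rs.next() ? rs.getString(1) : null;   // null when every user is taken
        }
    }
}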
There are a lot of ways to do this... I can think of three solutions to this problem:
1) A singleton class with a collection that contains all the users already in use (sketched after this list). Be sure that access to the collection is synchronized, and remove users from it once they are no longer in use.
2) A flag in the user table that contains a unique id referencing the thread that is using it. You then have to manage when the flag is removed from the table.
-> As an alternative, why not check whether a pool of connections shared by all the threads could solve your problem?
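A sketch of option 1, assuming nothing beyond the JDK: a singleton holding a thread-safe set of usernames in use. A worker thread would only process a username if tryClaim() returns true, and call release() in a finally block.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Option 1 as a sketch: a process-wide registry of usernames already in use.
public final class UsersInUse {
    private static final UsersInUse INSTANCE = new UsersInUse();
    private final Set<String> inUse = ConcurrentHashMap.newKeySet(); // thread-safe set

    private UsersInUse() {}

    public static UsersInUse getInstance() { return INSTANCE; }

    // Returns true only for the one thread that claims the username first.
    public boolean tryClaim(String username) {
        return inUse.add(username);
    }

    // Called when a thread is done, so the username can be handed out again.
    public void release(String username) {
        inUse.remove(username);
    }
}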
You could do one batch query that returns all of the usernames you want from the database and store them in a List (or some other collection).
Then ensure synchronized access to this list so that two threads cannot take the same username at the same time: use a synchronized collection, or a synchronized method, to remove a username from the list, as in the sketch below.
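A minimal sketch of that idea, using a BlockingQueue so that removal is atomic and no two threads can receive the same username; the users table and column are assumptions.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class UsernameSource {
    private final BlockingQueue<String> usernames = new LinkedBlockingQueue<>();

    // One batch query up front; after this, threads never touch the table for usernames.
    public void load(Connection con) throws Exception {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT username FROM users")) {
            while (rs.next()) {
                usernames.add(rs.getString(1));
            }
        }
    }

    // poll() hands each username to exactly one thread; returns null when drained.
    public String next() {
        return usernames.poll();
    }
}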
One way to do it is to add another column to your users table. This column is a simple flag that shows whether a user has an assigned thread or not.
When you query the DB, you have to wrap the query in a transaction: you begin the transaction, first select a user that doesn't have a thread, then update the flag column, and finally commit or roll back (see the sketch below).
Since the queries are wrapped in a transaction, the DB handles all the issues that arise in scenarios like this.
With this solution there is no need to implement synchronization mechanisms in your code, since the database does it for you.
If you still have problems after doing this, I think you have to look at the isolation levels configured on your DB server.
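A hedged JDBC sketch of that transaction, assuming MySQL syntax and a hypothetical assigned flag column; SELECT ... FOR UPDATE locks the chosen row so two threads cannot claim the same user.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class FlaggedUserPicker {
    // Claims one unassigned user inside a single transaction.
    static String claimUser(Connection con) throws Exception {
        con.setAutoCommit(false);               // begin the transaction
        try (Statement st = con.createStatement()) {
            // FOR UPDATE locks the row so two threads cannot select the same user.
            ResultSet rs = st.executeQuery(
                "SELECT id, username FROM users WHERE assigned = 0 LIMIT 1 FOR UPDATE");
            if (!rs.next()) {
                con.rollback();
                return null;                    // no free user right now
            }
            long id = rs.getLong(1);
            String name = rs.getString(2);
            st.executeUpdate("UPDATE users SET assigned = 1 WHERE id = " + id);
            con.commit();
            return name;
        } catch (Exception e) {
            con.rollback();                     // release the lock on failure
            throw e;
        }
    }
}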
You appear to want a work queue system. Don't reinvent the wheel - use a well established existing work queue.
Robust, reliable concurrent work queuing is unfortunately tricky with relational databases. Most "solutions" land up:
Failing to cope with work items not being completed due to a worker restart or crash;
Actually serializing all work on a lock, so all but one worker are just waiting; and/or
Allowing a work item to be processed more than once
PostgreSQL 9.5's new FOR UPDATE SKIP LOCKED feature will make it easier to do what you want in the database. For now, use a canned reliable task/work/message queue engine.
If you must do this yourself, you'll want to have a table of active work items where you record the active process ID / thread ID of the worker that's processing a row. You will need a cleanup process that runs periodically, on thread crash, and on program startup that removes entries for failed jobs (where the worker process no longer exists) so they can be re-tried.
Note that unless the work the workers do is committed to the database in the same transaction that marks the work queue item as done, you will have timing issues: the work can be completed but the DB entry for it not yet marked as done, leading to work being repeated. To absolutely prevent that, you must either commit the work to the DB in the same transaction as the change that marks the work as done, or use two-phase commit and an external transaction manager.
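For illustration, a sketch of claiming one work item with SKIP LOCKED, assuming PostgreSQL 9.5+ and a hypothetical work_items table; per the note above, the caller is expected to do the work and commit in this same transaction, so a crash simply releases the row for a retry.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class WorkQueueClaim {
    // Claims one pending work item without blocking on rows other workers hold.
    static Long claimItem(Connection con, String workerId) throws Exception {
        con.setAutoCommit(false);
        try (Statement st = con.createStatement()) {
            // SKIP LOCKED makes concurrent workers pick different rows (PostgreSQL 9.5+).
            ResultSet rs = st.executeQuery(
                "SELECT id FROM work_items WHERE status = 'pending' " +
                "LIMIT 1 FOR UPDATE SKIP LOCKED");
            if (!rs.next()) {
                con.rollback();
                return null;                       // nothing to claim right now
            }
            long id = rs.getLong(1);
            st.executeUpdate("UPDATE work_items SET status = 'active', worker = '"
                           + workerId + "' WHERE id = " + id);
            // The caller does the work, marks it done, and commits this transaction.
            return id;
        }
    }
}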
I have a J2EE server, currently running only one thread (the problem arises even within one single request), to save its internal model of data to MySQL/InnoDB tables.
The basic idea is to read data from flat files, do a lot of calculation, and then write the result to MySQL. Then read another set of flat files for the next day and repeat from step 1. As only a minor part of the rows change, I use a recordset of already-written rows, compare it to the current result in memory, and then update/insert correspondingly (no deletes, just setting a deletedFlag).
Problem: despite a purely sequential process, I get lock timeout errors (#1204), and the InnoDB status dump shows record locks (though I do not know how to figure out the details). To complicate things, on my Windows machine everything works, while the production system (where I can't install innotop) shows some record locks.
The critical code:
1) Read data and calculate (works)
2) Get a Connection from the Tomcat pool and set autocommit=false
3) Use a Statement to issue "LOCK TABLES order WRITE"
4) Open an updatable ResultSet on table order
5) For each row in the ResultSet: if there is a difference, update it from the in-memory object
6) For objects not yet in the database: insert the data
7) Commit the Connection, close the Connection
Steps 5 and 6 have a commit counter, so the rows are committed after every 500 changes (to avoid having 50,000 rows uncommitted). In the first run (i.e., without any locks) this takes at most 30 seconds per table.
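For reference, a JDBC sketch of steps 2-7 as described, with a hypothetical qty column standing in for the real data and the insert step omitted; whether LOCK TABLES is the right tool for InnoDB is a separate question.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class OrderSync {
    // Sketch of steps 2-7 above; the `order` table and qty column are simplified stand-ins.
    static void sync(Connection con) throws Exception {
        con.setAutoCommit(false);                              // step 2
        try (Statement lock = con.createStatement()) {
            lock.execute("LOCK TABLES `order` WRITE");         // step 3
            try (Statement st = con.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
                 ResultSet rs = st.executeQuery("SELECT id, qty FROM `order`")) {
                int changes = 0;
                while (rs.next()) {                            // steps 4-5
                    int newQty = computeQty(rs.getLong("id")); // in-memory value
                    if (newQty != rs.getInt("qty")) {
                        rs.updateInt("qty", newQty);
                        rs.updateRow();
                        if (++changes % 500 == 0) {
                            con.commit();                      // the commit counter
                        }
                    }
                }
            }   // ResultSet closed here, not left to finalize() (see the fix below)
            con.commit();                                      // step 7
            lock.execute("UNLOCK TABLES");
        }
    }

    private static int computeQty(long id) { return 0; /* placeholder */ }
}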
As stated above, right now I avoid any other interaction with the database, but in the future other processes (user requests) might read data or even write some fields. I would not mind those processes reading either old or new data, and waiting a couple of minutes on a lock for the changes to be saved to the DB.
I would be happy about any recommendation on how to do better than that.
Summary: complex code calculates in-memory objects which are to be synchronized with the database. This sync currently seems to lock itself, despite the fact that it sequentially locks, changes, and unlocks the tables without any exceptions thrown. But for some reason row locks seem to remain.
Additional information:
MySQL: SHOW PROCESSLIST lists no active connections (all asleep, or alternatively waiting for table locks on table order), while SHOW ENGINE INNODB STATUS reports a number of row locks (unfortunately I can't tell which transaction is meant, as the output is quite cryptic).
Solved: I had wrongly declared a ResultSet as updatable. The ResultSet was closed only in a finalize() method via the garbage collector, which was not fast enough: before that happened, I reopened the ResultSet and therefore tried to acquire a lock on an already-locked table.
Yet it was odd that innotop showed another query of mine hanging on a completely different table. But as it works for me now, I don't care about the oddities :-)
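For completeness, the fix amounts to closing the ResultSet deterministically instead of relying on finalize(); a sketch using try-with-resources (Java 7+), with the processing logic stubbed out.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class SafeRead {
    // try-with-resources closes the ResultSet immediately, instead of waiting for
    // the garbage collector to run finalize() while the table lock is still held.
    static void readOrders(Connection con) throws Exception {
        try (Statement st = con.createStatement();           // read-only is enough here
             ResultSet rs = st.executeQuery("SELECT id FROM `order`")) {
            while (rs.next()) {
                process(rs.getLong(1));
            }
        }   // rs and st are guaranteed closed here, so the locks can be released
    }

    private static void process(long id) { /* placeholder */ }
}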
I have a MySQL DB table with about 100k - 200k rows. The table rows are created, updated, deleted frequently every 5-10 seconds. Currently a single JVM app instance reads all the rows and processes all the entities in the table which are found to be in a specific state. The processing task is scheduled every 1-2 minutes and processing itself takes about 1-2 minutes. Now I want to make this JVM app a clustered fault tolerant service. One option is to have multiple instances acquire a distributed Hazelcast lock, and whichever instance acquires the lock processes the entities. However ideally I would want all the JVM instances partaking in the service to be processing some of the rows but at the same time ensure that each row is processed at least once in a given 5 minute interval.
Is there a way I can use Hazelcast to shard the responsibility for subsets of the table rows among multiple node instances?
PS: Replacing MySQL is not an option; open-source alternatives to Hazelcast are an option.
I think solving this problem with a Hazelcast distributed lock is unnecessary.
Quartz is the best option for your problem.
http://quartz-scheduler.org/
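A minimal Quartz 2.x sketch of a scheduled processing job; the two-minute interval mirrors the question, while running it clustered across JVMs additionally requires configuring the JDBC job store (org.quartz.jobStore.isClustered=true) in quartz.properties, which is not shown here.

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class RowProcessingJob implements Job {
    @Override
    public void execute(JobExecutionContext ctx) {
        // read and process the rows that are in the target state
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(RowProcessingJob.class)
                .withIdentity("rowProcessing").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInMinutes(2).repeatForever())
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}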
I have 20 factory machines, all processing tasks. I have a database adapter that talks to a MySQL database in which the machines store all their info. Is it safe for each factory machine's panel to have its own database adapter and updater thread? The updater thread just continuously checks whether the panel's current taskID is the same as the current one in the database, and if not, it repopulates the panel with information about the new task.
I'm not certain whether having that many connections will add overhead.
RDBMSs are designed to be accessed by multiple clients at a time; it is one of their core purposes.
So I don't think 20, or even a thousand, simultaneous connections will cause any problem at all.
Rather than having many connections all doing the same task, create one process which maintains the list of taskIDs (if they differ between machines) and checks the current taskID in the database. If it has changed, send a message to the machines whose taskID changed, telling them to update their panels. This avoids unnecessary load on the database and also handles an increase in the number of machines without any impact.
The number of connections in MySQL is controlled by the max_connections system variable (the total number of allowed simultaneous connections) and max_user_connections (the maximum number of simultaneous connections per user). Take a look at your server settings and change them if needed; the defaults are definitely bigger than 20, though.
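You can check the current values from JDBC as well; a small sketch, with hypothetical connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectionLimitCheck {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost/factory", "user", "pass");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SHOW VARIABLES LIKE 'max%connections'")) {
            // Prints max_connections and max_user_connections with their current values.
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}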