I wish to execute a LOAD DATA LOW_PRIORITY INFILE statement from Java.
I am dealing only with MyISAM engine.
I would like to know whether statement.execute("LOAD DATA LOW_PRIORITY INFILE ...") will execute this query asynchronously, or whether it will block until the statement has completed.
I am asking because I have SQL operations after this statement that depend on the loaded data, but I still want any read operations on this table that are executed concurrently to have higher priority than the LOAD DATA statement.
LOAD DATA LOW_PRIORITY INFILE ... blocks until completion on the command line, so I assume your code will block too.
If you want concurrent transactions to be able to read from your table during the import, then you want to use the CONCURRENT option instead of LOW_PRIORITY.
As stated in the manual:
If you specify CONCURRENT with a MyISAM table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other threads can retrieve data from the table while LOAD DATA is executing.
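For reference, a minimal sketch of what the call could look like from JDBC, assuming a DataSource and made-up file and table names; with MyISAM and CONCURRENT, readers can keep querying the table while the (blocking) execute() call runs:

import java.sql.Connection;
import java.sql.Statement;
import javax.sql.DataSource;

public class CsvLoader {
    // dataSource, table name and file path are placeholders for this sketch
    public void load(DataSource dataSource) throws Exception {
        try (Connection con = dataSource.getConnection();
             Statement st = con.createStatement()) {
            // execute() returns only after the server has finished loading the file;
            // CONCURRENT lets other sessions read the MyISAM table in the meantime
            st.execute("LOAD DATA CONCURRENT INFILE '/tmp/orders.csv' "
                     + "INTO TABLE orders "
                     + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
            // SQL issued here can rely on the loaded data being present
        }
    }
}

Note that the file path refers to the server's filesystem unless you add LOCAL.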
I want to delete entries from multiple tables in a PostgreSQL DB.
The tables have foreign key constraints, so I need to delete them in particular order only (otherwise delete will fail).
I am thinking of adding them to a batch and running executeBatch().
I understand that executeBatch submits all the statements together to the driver, but how are the statements executed? Will the order of deletion be maintained as per the order of adding to the batch? I can't find this mentioned in the API doc.
The JDBC 4.3 specification explicitly specifies the behaviour of a batch execution in section 14.1.2 Successful Execution:
Batch commands are executed serially (at least logically) in the order in which they were added to the batch.
and
The entries in the array are ordered according to the order in which the commands were processed (which, again, is the same as the order in which the commands were originally added to the batch).
The "at least logically" gives databases some leeway to reorder things as an optimization, as long as the resulting behaviour is the same as if the batch was executed in the specified order. Execution in-order is also necessary to ensure the returned update counts match, and for exception behaviour.
They are executed in order.
The purpose of "batching" is to collect the SQL statements and transmit them as a block, a sequence of statements, in order to reduce the network overhead of communicating with the database server.
A full "send SQL, wait for response" takes time, so by sending multiple requests together, a lot of waiting time can be eliminated.
I am fetching records (a large data set, around 1 million records) from MariaDB in batches of size 500 (by using LIMIT).
For each fetch iteration I am opening and closing the connection.
In my peer review I was advised to fetch the result set once and batch process by iterating on the result set itself, i.e. without closing the connection.
Is the second method the right way of doing it?
Edit: After I fetch records in batches of size 500, I update a field for each record and put it on a messaging queue.
Yes, the second method is the right way to do it. Here are some reasons:
There is overhead to running the query multiple times.
The underlying data might change in the tables you are using, and the separate batches might be inconsistent.
You are depending on the ordering of the results, and if the sort key has duplicate values, successive batches can skip or repeat rows.
Your program starts
Connect to database
do some SQL (selects/inserts/whatever)
do some more SQL (selects/inserts/whatever)
do some more SQL (selects/inserts/whatever)
...
Disconnect from database
Your program ends
That is, keep the connection open as long as it is needed during the program. (Even if you don't explicitly disconnect, the connection is closed when your program terminates. This is important to note when building a web site -- each 'page' is essentially a separate 'program'; the db connection cannot be held between pages.)
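Applied to the batch-of-500 question above, a rough sketch could look like this: one connection for the whole run, the query executed once, and each row processed as it is streamed. The URL, table, and queue calls are placeholders, and depending on the driver you may need an extra hint so the result set is streamed instead of fully buffered (MySQL Connector/J, for instance, wants setFetchSize(Integer.MIN_VALUE) for that):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingExport {
    public static void main(String[] args) throws Exception {
        // URL, credentials, table and queue are placeholders for this sketch
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mariadb://localhost:3306/mydb", "user", "password");
             Statement st = con.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(500);          // ask the driver to fetch in chunks, not all at once
            try (ResultSet rs = st.executeQuery("SELECT id, payload FROM big_table")) {
                while (rs.next()) {
                    String enriched = enrich(rs.getString("payload")); // update the field in memory
                    publishToQueue(rs.getLong("id"), enriched);        // hand off to the messaging queue
                }
            }
        }
    }

    private static String enrich(String payload) { return payload.trim(); }          // placeholder
    private static void publishToQueue(long id, String payload) { /* placeholder */ }
}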
You have another implied question... "Should I grab a batch of rows at once, then process them in the client?" The answer is "It depends".
If the processing can be done in SQL, it is probably much more efficient to do it there. Example: summing up some numbers.
If you fetch some rows from one table, then for each of those rows, fetch row(s) from another table... It will be much more efficient to use an SQL JOIN.
"Batching" may not be relevant. The client interface is probably
Fetch rows from a table (possibly all rows, even if millions)
Look at each row in turn.
Please provide the specifics of what you will be doing with the million rows so we can discuss more specifically.
Polling loop:
If you check for new things to do only once a minute, do reconnect each time.
Given that, it does not make sense to hang onto a result set between polls.
Opening and closing a database connection is quite time consuming. Keeping the connection open will save a lot of time.
I have a scenario where I need to insert 16k records into a DB table. So, along with a normal DB batch insert, I have created Callable tasks, each of which takes its respective batch (of 500 records) and performs the insertion independently. I am curious how the underlying database will handle these requests. Will database locking at the page level block the rest of the Java threads until the first thread's batch of 500 records gets committed?
My answer is for Sybase ASE. For Sybase IQ see Guillaume's answer.
Will database locking at the page level block the rest of the Java threads until the first thread's batch of 500 records gets committed?
That depends on what locking granularity you have set. According to Sybase's doc, there are three locking granularities:
Allpages locking, which locks datapages and index pages
Datapages locking, which locks only the data pages
Datarows locking, which locks only the data rows
So, if you select Allpages your threads will block until the current batch gets committed. Otherwise, your threads will not block, but will naturally incur a higher locking overhead.
For full details on Sybase ASE's locking granularity, see this documentation.
From Sybase IQ's documentation:
Sybase allows multiple readers, but only one writer to a table.
So, unless you open and close the transaction for every row you insert (which will be slow), your threads will have to wait until one transaction closes before they can start a new one.
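To make the scenario concrete, this is roughly how the Java side of such a setup is often wired (the DataSource, table and row type are placeholders); whether the tasks then block each other depends on the locking granularity discussed above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

public class ParallelBatchInsert {
    record Row(long id, String value) {}          // placeholder row type

    static Callable<int[]> insertTask(DataSource ds, List<Row> chunk) {
        return () -> {
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO target_table (id, value) VALUES (?, ?)")) {
                con.setAutoCommit(false);
                for (Row r : chunk) {             // one batch of ~500 rows per task
                    ps.setLong(1, r.id());
                    ps.setString(2, r.value());
                    ps.addBatch();
                }
                int[] counts = ps.executeBatch();
                con.commit();                     // page/row locks are held until this commit
                return counts;
            }
        };
    }

    static void insertAll(DataSource ds, List<List<Row>> chunks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);   // 16k rows / 500 = 32 tasks
        try {
            pool.invokeAll(chunks.stream().map(c -> insertTask(ds, c)).toList());
        } finally {
            pool.shutdown();
        }
    }
}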
I have a J2EE server, currently running only one thread (the problem arises even within one single request), to save its internal model of data to MySQL/InnoDB tables.
Basic idea is to read data from flat files, do a lot of calculation and then write the result to MySQL. Read another set of flat files for the next day and repeat with step 1. As only a minor part of the rows change, I use a recordset of already written rows, compare to the current result in memory and then update/insert it correspondingly (no delete, just setting a deletedFlag).
Problem: Despite a purely sequential process I get lock timeout errors (#1204), and the InnoDB status dump shows record locks (though I do not know how to work out the details). To complicate things, everything works on my Windows machine, while the production system (where I can't install innotop) ends up with some record locks.
Now to the critical code:
Read data and calculate (works)
Get Connection from Tomcat Pool and set to autocommit=false
Use Statement to issue "LOCK TABLES order WRITE"
Open Recordset (Updateable) on table order
For each row in Recordset --> if difference, update from in-memory-object
For objects not yet in the database --> Insert data
Commit Connection, Close Connection
Steps 5/6 have a commit counter, so that the rows are committed every 500 changes (to avoid having 50,000 rows uncommitted). In the first run (so without any locks) this takes at most 30 seconds per table.
As stated above, right now I avoid any other interaction with the database, but in the future other processes (user requests) might read data or even write some fields. I would not mind those processes reading either old or new data, or having to wait a couple of minutes for a lock before saving their changes to the DB.
I would be happy about any recommendation on how to do this better.
Summary: Complex code calculates in-memory objects which are to be synchronized with the database. This sync currently seems to lock itself, despite the fact that it sequentially locks, changes, and unlocks the tables without any exceptions being thrown. But for some reason row locks seem to remain.
Kind regards
Additional information:
MySQL: SHOW PROCESSLIST lists no active connections (all asleep, or alternatively waiting for table locks on table order), while SHOW ENGINE INNODB STATUS reports a number of row locks (unfortunately I can't tell which transaction is meant, as the output is quite cryptic).
Solved: I had wrongly declared a ResultSet as updatable. That ResultSet was closed only in a finalize() method via the garbage collector, which was not fast enough; before that happened I reopened the ResultSet and therefore tried to acquire a lock on an already locked table.
Yet it was odd that innotop showed another query of mine hanging on a completely different table. But as it works for me, I do not care about the oddities. :-)
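For anyone running into the same thing: the fix amounts to closing the updatable ResultSet (and its Statement) deterministically, for example with try-with-resources, instead of leaving it to finalize(). A sketch with made-up table/column names and placeholder helpers:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class OrderSync {
    public void sync(Connection con) throws SQLException {
        try (Statement st = con.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
             ResultSet rs = st.executeQuery("SELECT id, status FROM `order`")) {
            while (rs.next()) {
                long id = rs.getLong("id");
                if (needsUpdate(id)) {                       // compare against the in-memory model
                    rs.updateString("status", newStatus(id));
                    rs.updateRow();
                }
            }
        }   // ResultSet and Statement close here, so their locks are released right away
    }

    private boolean needsUpdate(long id) { return false; }    // placeholder
    private String newStatus(long id) { return "SYNCED"; }    // placeholder
}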
I am writing a Java program using multiple threads.
I have more than 5000 threads, and each thread accesses the same table to insert or select data (not to update).
I use HSQLDB (file mode) with Hibernate/Spring.
The reason I use multiple threads is to reduce execution time, but the table is only ever accessed by one thread at a time.
I have configured hsqldb.tx=mvcc for multi-version concurrency control, but nothing changed.
Does anyone know how to allow multiple threads to access the same table at the same time?
Using more than one thread to SELECT from a table improves performance because the threads can access the same database table at the same time.
When multiple threads perform INSERT into a table, the INSERT statements must be executed one at a time by the database because there may be PRIMARY KEY or UNIQUE constraints that have to be checked in a queue to prevent inconsistencies in the database.
In any case, the computer is capable of running only as many threads as it has CPU cores at the same time. If you have more threads, they are queued by the OS.
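As a rough sketch of the read side (file path, table and pool size are placeholders): hsqldb.tx=mvcc can be passed in the connection URL, each thread gets its own Connection, and the pool is sized close to the number of CPU cores rather than 5000:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentReaders {
    private static final String URL = "jdbc:hsqldb:file:/data/mydb;hsqldb.tx=mvcc";

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);   // roughly the CPU core count
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                // each task uses its own Connection, so SELECTs can run concurrently under MVCC
                try (Connection con = DriverManager.getConnection(URL, "SA", "");
                     Statement st = con.createStatement();
                     ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM my_table")) {
                    rs.next();
                    System.out.println(rs.getLong(1));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}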
I believe HSQLDB file mode, hsql mode, and res mode do not support multithreading or multiple user connections at the same time. I am also trying to determine which mode is required to perform simultaneous tasks.