Our applications may run on different databases, depending on the customer's infrastructure.
We use Hibernate ORM, so we can deploy our applications on various RDBMSs.
We have noticed abnormal memory consumption in every environment where a MySQL database is used.
Analyzing the problem, we see that it arises when we use scrollable results.
Unlike in other environments (SQL Server, Oracle, ...), it looks like a scrollable result on MySQL immediately fetches all the rows of the query.
As far as I know, and from what I have seen with Oracle and SQL Server, when you use a scrollable result a cursor is opened, and rows are fetched from the DB server only when the next() method is called. All our cursors are FORWARD_ONLY.
I think the cause is the MySQL JDBC connector, which doesn't handle scrollable results correctly.
Is that correct?
Can I have the scrollable result working correctly on MySQL? And, if yes, how?
Thanks in advance to anyone
Yes, apparently MySQL caches ResultSet data by default because that is the most efficient way for it to operate:
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate and, due to
the design of the MySQL network protocol, is easier to implement.
You can try query.setFetchSize(); it gives the underlying JDBC driver a hint about your requirement.
But it all depends on the driver; some may simply ignore it and fetch everything anyway.
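For MySQL Connector/J specifically, the implementation notes linked above describe a streaming mode: a forward-only, read-only statement whose fetch size is set to Integer.MIN_VALUE. Here is a minimal sketch at the plain JDBC level, assuming an already open Connection; the class and method names are made up for illustration and the per-row work is a placeholder.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative class name; not part of Hibernate or Connector/J.
class MySqlStreamingSketch {

    /** Creates a statement that Connector/J should stream row by row. */
    static Statement streamingStatement(Connection connection) throws SQLException {
        Statement st = connection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Connector/J treats this exact value as a request to stream rows.
        st.setFetchSize(Integer.MIN_VALUE);
        return st;
    }

    /** Walks a query result without buffering all rows on the client. */
    static long countRows(Connection connection, String sql) throws SQLException {
        long rows = 0;
        try (Statement st = streamingStatement(connection);
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                rows++; // placeholder for real per-row processing
            }
        }
        return rows;
    }
}
```

With Hibernate, the corresponding hint would be query.setFetchSize(Integer.MIN_VALUE) before scrolling. According to the same Connector/J notes, the connection cannot run other statements until a streaming result set has been fully read or closed.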
For a thick-client project I'm working on, I have to remotely connect to a database (IBM i-series) and perform a number of SQL related tasks:
1) Download/Update a set of local/offline 'control' data - this data may have changed between runs unnoticed.
2) On command, download data from multiple (15-20) tables and store separately into a single Java object. The names of the tables are known, but the schema name changes between runs and can change inter-run (as far as I know, PreparedStatements do not allow one to dynamically insert the schema).
I had considered using joins/unions/etc to perform all of these queries as one, but the project requires me to have in-memory separations between table data (instead of one big joined lump).
3) Perform between 2 and 100+ repetitions of (2)
The last factor is that this needs to be run on high-latency (potentially dial-up) network connections using Java 1.5 on the oldest computers possible.
Currently I run 15-20 dynamically constructed PreparedStatements but I know this to be rather inefficient (I measured, so as to avoid premature optimization ala Knuth).
What would be the most efficient and error-tolerant method of performing these tasks?
My thoughts:
Regarding (1), I really have no idea other than checking the entire table against the new table, at which point I feel I might as well just download the new (potentially and likely unchanged) table and replace the old one, but this takes more time.
For (2): Ideally I'd be able to construct something similar to an array of SELECT statements, send them all at once, and have the database return one ResultSet per internal query. From what I understand, however, neither Statement nor PreparedStatement support returning multiple ResultSet objects.
Lastly, the best way I can think of doing (3) is to batch a number of (2) operations.
There is nothing special about having moving requirements, but the single most important thing when talking to most databases is to have a connection pool in your Java application and to use it properly.
This also applies here. The IBM i DB2/400 database is quite fast, and the database driver available in the jt400 project (type 4, no native code) is quite good, so you can pull over quite a bit of data in a short while simply by generating SQL on the fly.
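Since the schema name cannot be bound as a PreparedStatement parameter, "generating SQL on the fly" means concatenating it into the statement text, so it is worth validating it first. A minimal sketch, assuming a deliberately conservative rule for what counts as a valid name (real IBM i identifiers allow more characters); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class QualifiedSql {

    // Conservative whitelist: a letter followed by letters, digits or underscores.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    /** Returns the name unchanged, or throws if it is not a plain identifier. */
    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain SQL identifier: " + name);
        }
        return name;
    }

    /** Builds "SELECT * FROM schema.table" after validating both names. */
    static String selectAll(String schema, String table) {
        return "SELECT * FROM " + requireIdentifier(schema) + "." + requireIdentifier(table);
    }
}
```

The resulting string can then be passed to prepareStatement() as usual, with the row-level values still bound as parameters.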
Note that if you only have a single schema, you can specify it in the connection properties and then use non-qualified table names in your SQL statements. Read the JDBC properties in the InfoCenter very carefully - it is a bit tricky to get right. If you need multiple schemas, "naming=system" allows for library lists, i.e. a list of schemas to search for the tables, which can be very useful when done correctly. The IBM i folks can help you here.
That said, if the connection is the limiting factor, you might have a very strong case for running the "create object from tables" Java code directly on the IBM i. You should prepare now to be able to measure the traffic to the database, either with network monitoring tooling, with p6spy, or simply by going through a proxy (perhaps even a throttling one).
Ideally, you would have the database group provide you with a set of stored procedures to optimize the access to the database.
Since you don't have access, you may want to ask them if they have timestamp data in the database at the row level to see when records were modified, this way you can select only the data that's changed since some point in time.
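If such a row-level timestamp exists, the "only what changed" download could look roughly like the sketch below. The table and column names are passed in by the caller and are purely hypothetical here, as is the class name; whether the database actually maintains such a column is exactly what you would have to ask the database team.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

// Hypothetical helper; table and column names are supplied by the caller.
class ChangedRowsSketch {

    /** Builds a query that returns only rows modified after a given instant. */
    static String changedSinceSql(String qualifiedTable, String timestampColumn) {
        return "SELECT * FROM " + qualifiedTable + " WHERE " + timestampColumn + " > ?";
    }

    /** Prepares the query with the time of the previous successful download. */
    static PreparedStatement prepareChangedSince(Connection connection,
            String qualifiedTable, String timestampColumn, Timestamp lastSync)
            throws SQLException {
        PreparedStatement ps =
                connection.prepareStatement(changedSinceSql(qualifiedTable, timestampColumn));
        ps.setTimestamp(1, lastSync); // bound as a parameter, never concatenated
        return ps;
    }
}
```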
What @ThorbjørnRavnAndersen is suggesting is moving the database code onto the IBM host and connecting to it via RMI or JMS from the client. So the server code would be an RMI or JMS server that accesses the database on your behalf and returns Java objects instead of bringing SQL result sets across the wire.
I would pass along your requirements to the database team and see if they can't do something for you. I'm sure they don't want all these remote clients bringing all the data down each time, so it would benefit them as much as it would benefit you.
I have a big production web application (Glassfish 3.1 + MySQL 5.5). All tables are InnoDB. Once every few days the application hangs completely.
SHOW FULL PROCESSLIST shows many simple insert or update queries on different tables but all having status
Waiting for table level lock
Examples:
update user
set user.hasnewmessages = NAME_CONST('in_flag',_binary'\0' COLLATE 'binary')
where user.id = NAME_CONST('in_uid',66381)
insert into exchanges_itempacks
set packid = NAME_CONST('in_packId',332149), type = NAME_CONST('in_type',1), itemid = NAME_CONST('in_itemId',23710872)
Queries with the longest 'Time' are waiting for the table-level lock too.
Please help me figure out why MySQL tries to get a table-level lock and what could be locking all these tables. All articles about InnoDB locking say this engine uses no table locking unless you force it to.
My my.cnf has this:
innodb_flush_log_at_trx_commit = 0
innodb_support_xa = 0
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode=2
The binary log is off. I have no "LOCK TABLES" or other explicit locking commands at all. Transactions are READ_UNCOMMITTED.
SHOW ENGINE INNODB STATUS output:
http://avatar-studio.ru:8080/ph/imonout.txt
Are you using mysqldump to back up your database while it is still being accessed by your application? This could cause that behaviour.
I think there are some situations when MySQL does a full table lock (i.e. using auto-inc).
I found a link which may help you: http://mysqldatabaseadministration.blogspot.com/2007/06/innodb-table-locks.html
Also review the Java persistence code to make sure every connection is committed or rolled back and then closed (always close in a finally block).
Try setting innodb_table_locks=0 in MySQL configuration.
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_table_locks
Just a few ideas ...
I see you heavily use NAME_CONST in your code. Just try not to use it. MySQL can sometimes be buggy (I have found several bugs myself), so I recommend not relying on features that are uncommon or not well tested. It is related to column names, so maybe it locks something? It shouldn't if it only affects the result, but who knows? This is suspicious. Moreover, it is marked as a function for internal use only.
This may seem simple, but you don't have a long-running select statement that is possibly locking out updates and inserts? There's no query that's actually running and not locked?
Have you considered using MyISAM instead of InnoDB?
If you are not utilizing any transactional features, MyISAM might make more sense.
It's simpler, easier to optimize, and since it doesn't have sophisticated transactional capabilities, easier to configure in your my.cnf.
Also, depending on the type of db load your app creates, MyISAM might be more appropriate. I prefer MyISAM for read-heavy applications, again, it's easier to configure and understand.
Other suggestions:
It might be a good idea to find a way to not use NAME_CONST in your SQL.
"This function was added in MySQL 5.0.12. It is for internal use only."
When the documentation of an open source product says this, it's probably a good idea to heed its advice.
By default, MySQL stores all InnoDB table and schema data in one enormous file. There could be some kind of OS-level locking on that file that propagates to MySQL and prevents all table access. By using the innodb_file_per_table option, you may eliminate that potential issue. This also makes MySQL more space efficient.
In this case you have to create several database tables with the same columns and not insert more than 3000 rows per table. If you want to store more data, create another table dynamically (generate the table from code), insert the new data into it and read the data from that table. If more and more tables have to be generated, you have to create a new database.
I think this tip will help you design your database more carefully and resolve the error.
I've been looking around trying to determine some Hibernate behavior that I'm unsure about. In a scenario where Hibernate batching is properly set up, will it only ever use multiple insert statements when a batch is sent? Is it not possible to use a DB independent multi-insert statement?
I guess I'm trying to determine if I actually have the batching set up correctly. I see the multiple insert statements but then I also see the line "Executing batch size: 25."
There's a lot of code I could post but I'm trying to keep this general. So, my questions are:
1) What can you read in the logs to be certain that batching is being used?
2) Is it possible to make Hibernate use a multi-row insert versus multiple insert statements?
Hibernate uses multiple insert statements (one per entity to insert), but sends them to the database in batch mode (using Statement.addBatch() and Statement.executeBatch()). This is the reason you're seeing multiple insert statements in the log, but also "Executing batch size: 25".
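At the plain JDBC level, the pattern described above looks roughly like the following sketch. The table "person(name)" and the class name are hypothetical, and the batch size of 25 from the log line would simply be passed in as batchSize.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Illustrative sketch of JDBC batching; not Hibernate's actual code.
class JdbcBatchSketch {

    /** Number of executeBatch() calls needed for rowCount rows. */
    static int batchCalls(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    /** Inserts names into a hypothetical table "person(name)" in batches. */
    static void insertAll(Connection connection, List<String> names, int batchSize)
            throws SQLException {
        String sql = "INSERT INTO person (name) VALUES (?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();           // queue one INSERT on the statement
                if (++pending == batchSize) {
                    ps.executeBatch();   // hand the queued INSERTs to the driver in one call
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();       // flush the final partial batch
            }
        }
    }
}
```

Each queued statement is still a separate single-row INSERT, which is why the log shows one insert per entity. In Hibernate the batch size itself comes from the hibernate.jdbc.batch_size property.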
The use of batched statements greatly reduces the number of roundtrips to the database, and I would be surprised if it were less efficient than executing a single statement with multiple inserts. Moreover, it also allows mixing updates and inserts, for example, in a single database call.
I'm pretty sure it's not possible to make Hibernate use multi-row inserts, but I'm also pretty sure it would be useless.
I know that this is an old question, but I had the same problem: I thought that Hibernate batching meant that Hibernate would combine multiple inserts into one statement, which it doesn't seem to do.
After some testing I found this answer, which says that a batch of multiple inserts is just as good as a multi-row insert. I ran a test inserting 1000 rows, once using Hibernate batching and once without. Both tests took about 20s, so there was no performance gain from using Hibernate batching.
To be sure, I tried the rewriteBatchedStatements option of MySQL Connector/J, which actually combines multiple inserts into one statement. It reduced the time to insert 1000 records to 3s.
So after all, Hibernate batching seems to be useless and a real multi-row insert seems to be much better. Am I doing something wrong, or what causes my test results?
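To make the difference concrete: rewriteBatchedStatements is a Connector/J connection property, typically appended to the JDBC URL (for example jdbc:mysql://host/db?rewriteBatchedStatements=true, with host and db as placeholders). The sketch below only builds, by hand, the general shape of a multi-row INSERT; it is not the driver's internal code, and the class and method names are hypothetical.

```java
// Illustrative only: shows the shape of a multi-row INSERT statement.
class MultiRowInsertSketch {

    /** Builds "INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ..." for rowCount rows. */
    static String multiRowInsert(String table, String[] columns, int rowCount) {
        if (columns.length < 1 || rowCount < 1) {
            throw new IllegalArgumentException("Need at least one column and one row");
        }
        // One parenthesised group of placeholders per row, e.g. "(?, ?)".
        StringBuilder row = new StringBuilder("(");
        for (int i = 0; i < columns.length; i++) {
            row.append(i == 0 ? "?" : ", ?");
        }
        row.append(")");

        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table)
                .append(" (").append(String.join(", ", columns)).append(") VALUES ");
        for (int r = 0; r < rowCount; r++) {
            if (r > 0) {
                sql.append(", ");
            }
            sql.append(row);
        }
        return sql.toString();
    }
}
```

A batch of single-row inserts has to be handled statement by statement, whereas a statement of this shape carries many rows in one request, which is consistent with the timings reported above.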
Oracle bulk insert collects an array of entities and passes it to the database as a single block, associating it with a single looped insert/update/delete.
It is the only way to improve network throughput.
Oracle suggests doing this by calling a stored procedure from Hibernate and passing it an array of data.
http://biemond.blogspot.it/2012/03/oracle-bulk-insert-or-select-from-java.html?m=1
This is not only a software problem but also an infrastructure one!
The problem is network data flow optimization and TCP stack fragmentation.
MySQL has a function for this.
You have to do something like what is described in this article.
Transferring the right volume of data over the network is the solution.
You also have to check the network MTU and the Oracle SDU/TDU settings against the data transferred between the application and the database.
Is there a way to get a ResultSet you obtain from running a JDBC query to be lazily-loaded? I want each row to be loaded as I request it and not beforehand.
Short answer:
Use Statement.setFetchSize(1) before calling executeQuery().
Long answer:
This depends very much on which JDBC driver you are using. You might want to take a look at this page, which describes the behavior of MySQL, Oracle, SQL Server, and DB2.
Major take-aways:
Each database (i.e. each JDBC driver) has its own default behavior.
Some drivers will respect setFetchSize() without any caveats, whereas others require some "help".
MySQL is an especially strange case. See this article. It sounds like if you call setFetchSize(Integer.MIN_VALUE), then it will download the rows one at a time, but it's not perfectly clear.
Another example: here's the documentation for the PostgreSQL behavior. If auto-commit is turned on, then the ResultSet will fetch all the rows at once, but if it's off, then you can use setFetchSize() as expected.
One last thing to keep in mind: these JDBC driver settings only affect what happens on the client side. The server may still load the entire result set into memory, but you can control how the client downloads the results.
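For the PostgreSQL case mentioned above, a minimal sketch might look like this. It assumes an open Connection to PostgreSQL; the class name is hypothetical and counting rows stands in for real processing.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch; class and method names are made up.
class PostgresCursorFetchSketch {

    /** Reads a result in chunks of fetchSize rows instead of all at once. */
    static long countRows(Connection connection, String sql, int fetchSize)
            throws SQLException {
        // With auto-commit on, the PostgreSQL driver fetches everything up front.
        connection.setAutoCommit(false);
        long rows = 0;
        try (Statement st = connection.createStatement()) {
            st.setFetchSize(fetchSize); // rows pulled from the server per round trip
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // placeholder for real per-row processing
                }
            }
        }
        connection.commit(); // end the transaction that kept the cursor open
        return rows;
    }
}
```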
Could you not achieve this by setting the fetch size for your Statement to 1?
If you only fetch 1 row at a time each row shouldn't be loaded until you called next() on the ResultSet.
e.g.
Statement statement = connection.createStatement();
statement.setFetchSize(1);
ResultSet resultSet = statement.executeQuery("SELECT .....");
while (resultSet.next())
{
// process results. each call to next() should fetch the next row
}
There is an answer provided here.
Quote:
The Presto JDBC driver never buffers the entire result set in memory. The server API will return at most ~1MB of data to the driver per request. The driver will not request more data from the server until that data is consumed (by calling the next() method on ResultSet an appropriate number of times).
Because of how the server API works, the driver fetch size is ignored (per the JDBC specification, it is only a hint).
Proof that setFetchSize is ignored
I think what you would want to do is defer the actual loading of the ResultSet itself. You would need to implement that manually.
You will find this a LOT easier using hibernate. You will basically have to roll-your-own if you are using jdbc directly.
The fetching strategies in hibernate are highly configurable, and will most likely offer performance options you weren't even aware of.
I am writing a program that does a lot of writes to a Postgres database. In a typical scenario I would be writing say 100,000 rows to a table that's well normalized (three foreign integer keys, the combination of which is the primary key and the index of the table). I am using PreparedStatements and executeBatch(), yet I can only manage to push in say 100k rows in about 70 seconds on my laptop, when the embedded database we're replacing (which has the same foreign key constraints and indices) does it in 10.
I am new to JDBC and I don't expect it to beat a custom embedded DB, but I was hoping it would be only 2-3x slower, not 7x. Is there anything obvious that I may be missing? Does the order of the writes matter (i.e. if it's not the order of the index)? Are there things to look at to squeeze out a bit more speed?
This is an issue that I have had to deal with often on my current project. For our application, insert speed is a critical bottleneck. However, we have discovered that the vast majority of database users see select speed as their chief bottleneck, so you will find that there are more resources dealing with that issue.
So here are a few solutions that we have come up with:
First, all solutions involve using the postgres COPY command. Using COPY to import data into postgres is by far the quickest method available. However, the JDBC driver by default does not currently support COPY across the network socket. So, if you want to use it you will need to use one of two workarounds:
A JDBC driver patched to support COPY, such as this one.
If the data you are inserting and the database are on the same physical machine, you can write the data out to a file on the filesystem and then use the COPY command to import the data in bulk.
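The second workaround could be sketched as follows. This assumes non-null string values, a file location the database server process is allowed to read, and a server recent enough to accept the WITH (FORMAT csv) option; the class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative sketch; assumes every value is a non-null string.
class CopyFileSketch {

    /** Quotes one CSV field, doubling any embedded double quotes. */
    static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Writes the rows as CSV lines to the given file. */
    static void writeCsv(Path file, List<String[]> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String[] row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) {
                        line.append(',');
                    }
                    line.append(csvField(row[i]));
                }
                out.write(line.toString());
                out.write('\n');
            }
        }
    }

    /** Builds the server-side COPY statement for a file the server can read. */
    static String copyStatement(String table, Path file) {
        String path = file.toAbsolutePath().toString().replace("'", "''");
        return "COPY " + table + " FROM '" + path + "' WITH (FORMAT csv)";
    }
}
```

The statement returned by copyStatement() can then be run through an ordinary Statement.execute() call, since the server reads the file itself.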
Other options for increasing speed are using JNI to hit the postgres api so you can talk over the unix socket, removing indexes and the pg_bulkload project. However, in the end if you don't implement COPY you will always find performance disappointing.
Check if your connection is set to autoCommit. If autoCommit is true and you have 100 items in the batch when you call executeBatch(), then, depending on the driver, it may issue 100 individual commits. That can be a lot slower than calling executeBatch() followed by a single explicit commit().
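A minimal sketch of the "one batch, one commit" approach, assuming the caller supplies the INSERT statement text and one Object[] of parameter values per row; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Illustrative sketch; not tied to any particular driver.
class SingleCommitBatchSketch {

    /** Runs all rows as one batch inside one transaction; returns the batch length. */
    static int insertInOneTransaction(Connection connection, String sql, List<Object[]> rows)
            throws SQLException {
        boolean previous = connection.getAutoCommit();
        connection.setAutoCommit(false);          // stop committing per statement
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);  // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            connection.commit();                  // single explicit commit
            return counts.length;
        } catch (SQLException e) {
            connection.rollback();                // leave no half-written batch behind
            throw e;
        } finally {
            connection.setAutoCommit(previous);   // restore the caller's setting
        }
    }
}
```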
I would avoid the temptation to drop indexes or foreign keys during the insert. It puts the table in an unusable state while your load is running, since nobody can query the table while the indexes are gone. Plus, it seems harmless enough, but what do you do when you try to re-enable the constraint and it fails because something you didn't expect to happen has happened? An RDBMS has integrity constraints for a reason, and disabling them even "for a little while" is dangerous.
You can obviously try to change the size of your batch to find the best size for your configuration, but I doubt that you will gain a factor 3.
You could also try to tune your database structure. You might get better performance when using a single field as a primary key than when using a composite PK. Depending on the level of integrity you need, you might save quite some time by deactivating integrity checks on your DB.
You might also change the database you are using. MySQL is supposed to be pretty good for high speed simple inserts ... and I know there is a fork of MySQL around that tries to cut functionality to get very high performance on highly concurrent access.
Good luck !
Try disabling indexes and re-enabling them after the insert. Also, wrap the whole process in a transaction.