Large ResultSet on PostgreSQL query - Java

I'm running a query against a table in a PostgreSQL database. The database is on a remote machine. The table has around 30 sub-tables, using PostgreSQL's partitioning capability.
The query will return a large result set, something around 1.8 million rows.
In my code I use Spring JDBC support, method JdbcTemplate.query, but my RowCallbackHandler is not being called.
My best guess is that the PostgreSQL JDBC driver (I use version 8.3-603.jdbc4) is accumulating the whole result in memory before calling my code. I thought the fetchSize configuration could control this, but I tried it and nothing changed. I did this as the PostgreSQL manual recommended.
This query worked fine when I used Oracle XE, but I'm trying to migrate to PostgreSQL because of the partitioning feature, which is not available in Oracle XE.
My environment:
PostgreSQL 8.3
Windows Server 2008 Enterprise 64-bit
JRE 1.6 64-bit
Spring 2.5.6
PostgreSQL JDBC Driver 8.3-603

In order to use a cursor to retrieve data you have to set the ResultSet type to ResultSet.TYPE_FORWARD_ONLY (the default) and set autocommit to false, in addition to setting a fetch size. That is covered in the doc you linked to, but you didn't explicitly mention that you did those steps.
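For reference, here is a minimal sketch of all three pieces together with plain JDBC (the connection details, table name and fetch size are placeholders, not from your setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorFetchExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://remotehost/mydb", "user", "secret");
        conn.setAutoCommit(false);                 // 1. autocommit off
        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM big_partitioned_table",
                ResultSet.TYPE_FORWARD_ONLY,       // 2. forward-only (the default)
                ResultSet.CONCUR_READ_ONLY);
        ps.setFetchSize(1000);                     // 3. non-zero fetch size
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // rows now arrive in batches of ~1000 instead of all at once
        }
        rs.close();
        ps.close();
        conn.close();
    }
}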
Be careful with PostgreSQL's partitioning scheme. It can do horrible things to the optimizer and cause massive performance issues where there shouldn't be any (depending on the specifics of your data). In any case, is your table only 1.8M rows? There is no reason it would need to be partitioned based on size alone, provided it is appropriately indexed.

I'm betting that there's not a single client of your app that needs 1.8M rows all at the same time. You should think of a sensible way to chunk the results into smaller pieces and give users the chance to iterate through them.
That's what Google does. When you do a search there might be millions of hits, but they return results 25 at a time, with the idea that you'll find what you want on the first page.
If it's not a client, and the results are being massaged in some way, I'd recommend letting the database crunch all those rows and simply return the result. It makes no sense to return 1.8M rows just to do a calculation on the middle tier.
If neither of those apply, you've got a real problem. Time to rethink it.
After reading the later responses it sounds to me like this is more of a reporting solution that ought to be crunched in batch or calculated in real time and stored in tables that are not part of your transactional system. There's no way that bringing 1.8M rows to the middle tier for calculating moving averages can scale.
I'd recommend reorienting yourself - start thinking about it as a reporting solution.

The fetchSize property worked as described in the PostgreSQL manual.
My mistake was that I was setting auto commit = false on a connection from the connection pool that was not the connection being used by the prepared statement.
Thanks for all the feedback.

I did everything above, but I needed one last piece: be sure the call is wrapped in a transaction and set the transaction to read-only, so that no rollback state is required.
I added this: @Transactional(readOnly = true)
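Putting the pieces together, the shape of the working code is roughly this (class name, table name and fetch size are illustrative, and the class must of course be a Spring-managed bean so the transaction annotation is honored):

import java.sql.ResultSet;
import java.sql.SQLException;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;
import org.springframework.transaction.annotation.Transactional;

public class BigTableReader {

    private final JdbcTemplate jdbcTemplate;

    public BigTableReader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        // Non-zero fetch size so the driver streams rows through a cursor.
        this.jdbcTemplate.setFetchSize(1000);
    }

    // The read-only transaction keeps the statement on a single connection
    // with autocommit off, which is what the PostgreSQL driver needs.
    @Transactional(readOnly = true)
    public void readAllRows() {
        jdbcTemplate.query("SELECT * FROM big_partitioned_table", new RowCallbackHandler() {
            public void processRow(ResultSet rs) throws SQLException {
                // handle one row at a time here
            }
        });
    }
}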
Cheers.

Related

Cassandra Prepared Statements broken after schema migration

I'm preparing statements in the constructor of my repository class, like this
PreparedStatement getStatement = cqlSession.prepare(selectFrom("the_table")
        .all()
        .whereColumn("the_key").isEqualTo(bindMarker())
        .build());
I later bind it to a BoundStatement and read the results like this
String someColumn = row.isNull("some_column") ? null : row.getString("some_column");
At runtime I run an ALTER on the table, adding a column. I expected that to work, since I query for columns only by name. However, it seems that the driver has internally mapped the column names to column indexes, which are now all broken.
Is this expected? It severely limits what can be done in run time.
Am I missing something that would have forced a re-prepare of a statement on schema change?
Can I intercept it and re-prepare manually?
I'm running Spring Boot 2.5.9 and driver version 4.11.3.
The Cassandra cluster runs version 3.11.10.
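If there is no built-in hook, this is roughly the kind of manual re-prepare wrapper I have in mind (the class and method names here are purely illustrative):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

// Illustrative wrapper: keeps the original statement so it can be prepared
// again on demand, e.g. after an ALTER TABLE is applied at runtime.
class ReparableStatement {

    private final CqlSession session;
    private final SimpleStatement statement;
    private volatile PreparedStatement prepared;

    ReparableStatement(CqlSession session, SimpleStatement statement) {
        this.session = session;
        this.statement = statement;
        this.prepared = session.prepare(statement);
    }

    // Called after a known schema change to throw away the stale metadata.
    void reprepare() {
        prepared = session.prepare(statement);
    }

    BoundStatement bind(Object... values) {
        return prepared.bind(values);
    }
}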
Thanks for the question!
Your symptoms sound similar to the behaviours described in CASSANDRA-10786. If you're dealing with a version before Cassandra 4.0 this could very well be what's going on.
Does the description in that issue match up with what you're seeing?
To add to @absurdfarce's answer, Cassandra 3.7 is a very old release. In fact, it was released all the way back in early 2016.
There were several important fixes to prepared statements since then. Although a quick browse suggests those fixes aren't directly related to the issue you reported, they are significant nonetheless.
Additionally, there were several fixes to schema migration/propagation in the last 6 years. Again, they don't directly relate to your problem, and I'm ordinarily loath to push for an upgrade, but I think 6 years' worth of fixes between C* 3.7 and 3.11.latest merits it. Cheers!

How to find slow queries in MySQL (specifically MariaDB)

I have a few questions:
1) I'm new to performance testing; as a starting assignment I have to investigate slow queries in MariaDB (version: 10.0.17-MariaDB MariaDB Server).
I tried these settings in /etc/my.cnf.d/server.cnf:
[mysqld]
long_query_time=1
log-slow-queries=/var/log/mysql/log-slow-queries.log
And after doing that I could not start the database. I just get a simple
starting MySQL.... [FAILED] message.
I came across the Slow query log overview for MariaDB, which made a little sense :(
Can anyone point me to a tutorial on how this should be done?
2) In my application we already use Hibernate for the data layer. Does it even make sense to look for slow queries in the way mentioned above?
3) How can I achieve the same thing in MongoDB, i.e. list the most frequently used queries and the slow queries?
Any help would be appreciated.
Converting comment to answer:
When MySQL won't start, the first thing you should check is the MySQL error log (probably /var/log/(mysql/)mysqld.log) for the exact error.
In your case, "log-slow-queries" is a startup option name (and a deprecated one at that); you should use slow_query_log with a boolean value and slow_query_log_file for the filename.
slow_query_log=1 means ENABLE logging
long_query_time=1 means IF ENABLED log queries longer than 1 second
then there is
- log_queries_not_using_indexes=0/1 which, if enabled, will log even queries faster than 1s if they are not using indexes to locate rows
All of these and other options can be found, with descriptions, in the MySQL manual.
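Put together, a server.cnf section along these lines should work (the log file path is just an example and must be writable by the mysqld user):

[mysqld]
slow_query_log=1
slow_query_log_file=/var/log/mysql/slow-queries.log
long_query_time=1
log_queries_not_using_indexes=1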
For MongoDB there is a profiler, which is described in the answers to this question: How to find queries not using indexes or slow in mongodb

Tuning Jackrabbit data model (VERSION_BUNDLE table)

As part of our application, we use Jackrabbit (1.6.4) to store documents. Each document retrieved by our application is put into a folder structure in Jackrabbit, which is created if it does not already exist.
Our DBA has noticed that the following query is executed a lot against the Oracle (11.2.0.2.0) database holding the Jackrabbit schema - more than 50000 times per hour, causing a lot of IO on the database. In fact, it is one of the top 5 SQL statements in terms of IO over elapsed time (97% IO):
select BUNDLE_DATA from VERSION_BUNDLE where NODE_ID = :1
Taking a look at the database, one notices that this table initially contains only a single record, consisting of the node_id key (data type RAW) with the value DEADBEEFFACEBABECAFEBABECAFEBABE and a couple of bytes in the bundle_data BLOB column. Later on, more records are added with additional data.
The SQL for the table looks like this:
CREATE TABLE "VERSION_BUNDLE"
(
"NODE_ID" RAW(16) NOT NULL ENABLE,
"BUNDLE_DATA" BLOB NOT NULL ENABLE
);
I have the following questions:
Why is Jackrabbit accessing this table so frequently?
Any Jackrabbit tuning options to make this faster?
Is the BUNDLE_DATA value changed by Jackrabbit at all or is it just read for every access to the repository?
Is there any way to tune the database schema to make it deal better with this scenario?
Update: The table initially contains only one record; additional records are added over time as Jackrabbit decides internally. Access still seems to be mostly read-only, as insert or update statements are not reported as running a high number of times.
Is this physical I/O or logical? With the data being read that often I'd be surprised if the blocks are being aged out of the cache fast enough for physical I/O to be required.
If the JCR store is backed by an Oracle database you could reorganize the underlying table:
Build a hash-cluster of that table to prevent index accesses
Check if you've licenses to use partitioning option
Delete unnecessary versions in your application and the corresponding rows will get deleted too (version pruning; a rough sketch follows this list)
If you're storing binary objects like pictures or documents, also have a look at VERSION_BINVAL.
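For illustration only, here is a rough version-pruning sketch against the JCR 1.0 API that Jackrabbit 1.6 implements; the retention rule (drop everything except the root and base versions) is deliberately simplistic and the node path is a placeholder:

import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.version.Version;
import javax.jcr.version.VersionHistory;
import javax.jcr.version.VersionIterator;

public class VersionPruner {

    // Removes all versions of the node except the root version and the
    // current base version; removed versions free rows in VERSION_BUNDLE.
    public void prune(Session session, String nodePath) throws Exception {
        Node node = (Node) session.getItem(nodePath);
        if (!node.isNodeType("mix:versionable")) {
            return; // nothing to prune
        }
        VersionHistory history = node.getVersionHistory();
        String baseVersion = node.getBaseVersion().getName();

        VersionIterator versions = history.getAllVersions();
        while (versions.hasNext()) {
            Version version = versions.nextVersion();
            String name = version.getName();
            // The root version and the current base version cannot be removed.
            if ("jcr:rootVersion".equals(name) || name.equals(baseVersion)) {
                continue;
            }
            history.removeVersion(name);
        }
    }
}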
Why is Jackrabbit accessing this table so frequently?
It's a sign that you're creating versions in your repository. Is that something your application is supposed to do?
Any Jackrabbit tuning options to make this faster?
Not that I'm aware of; one option to investigate is upgrading to a more recent Jackrabbit version. Version 2.4.2 was just released, and 1.6.4 is almost two years old. It's possible that there were performance improvements between these releases.
Is the BUNDLE_DATA value changed by Jackrabbit at all or is it just read for every access to the repository?
By the looks of it, it's the GUID of the root repository node.
Is there any way to tune the database schema to make it deal better with this scenario?
As far as I know the schema is auto-generated by Jackrabbit, so the only option is to modify the table definition in a compatible way after it's been created. But that's a topic for a DBA, which I am not.
Why is Jackrabbit accessing this table so frequently?
We have seen that this table is accessed very often even if you are not asking for versions.
Take a look at this thread from the Jackrabbit users mailing list.

How to diagnose performance problems with SQL Server Views and JDBC

I have a view defined in SQL Server 2008 that joins 4 tables. Executing this view in SQL Server Management Studio takes roughly 3 seconds and returns about 45,000 records. My application is written in Java and uses Hibernate to simply do a "from MyViewObject" query in HQL. When this is run, the execution time is consistently around 45 seconds. I have also tried simply using JDBC to run this query and got the same level of performance, so I've assumed it has nothing to do with Hibernate.
My question: what can I do to diagnose this problem? There is obviously something different between how Management Studio runs the query and how my application runs it, but I have not been able to come up with much.
The only thing I've come up with as a potentially viable explanation is an issue with the jTDS library that provides the JDBC driver for SQL Server.
Any guidance here would be greatly appreciated.
UPDATE
I went back to trying pure JDBC and tried adding the selectMethod and responseBuffering attributes to my connection string but didn't get any improvements. I also took my JDBC code from my application and ran it from a test program containing nothing but my JDBC code and it ran in the expected 3 seconds. So to me this seems environmental for the application.
My application is a Google Web Toolkit (GWT) based app, and the JDBC code is being run in my primary RPC servlet. Essentially, the RPC method receives the call and immediately executes the JDBC code. Nothing in this setup gives me much indication of why the performance is terrible, though. I am going to try the JDBC 3.0 driver and see if that works any better, but it doesn't feel to me like that will fix the issue.
My goal for the moment is to get my query working live with JDBC and then switch it back over to Hibernate so I can keep the testing simple enough. Thanks for the help so far!
UPDATE 2
I'm finally starting to zero in on the source of the problem, though still no idea what the actual issue is. I opened up the view in SQL Server and copied the SQL statement (rather large) exactly into my code and executed it using JDBC instead of pulling the data from the view and most of the performance issues are gone. It seems that some combination of GWT, SQL Server Views and JDBC is not working properly here. I don't see keeping a very large hand-written query in my code as a long term solution, but it does offer a bit more insight.
<property name="hibernate.show_sql">true</property>
Setting this will show you the SQL query generated by Hibernate. Analyze the query and make sure you are not missing a relationship.
Reply to updates 1 and 2:
Like you mentioned, you ran the raw SQL query and it was fast. Another thing to remember about Hibernate is that it creates the objects returned by your query (this depends, of course, on whether you initialize lazy associations). How many objects does your query return? You can also do a simple benchmark to find where the issue is.
For example, print the current time before running the query and again after it. Do this for all the places you suspect are slowing your application down.
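For instance, a crude sketch of that kind of measurement, assuming a Hibernate Session is at hand (MyViewObject is the mapped view entity from the question):

import java.util.List;

import org.hibernate.Session;

public class QueryTimer {

    // Crude elapsed-time measurement around the HQL query and object mapping.
    public static void timeQuery(Session session) {
        long start = System.currentTimeMillis();
        List results = session.createQuery("from MyViewObject").list();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Query + mapping took " + elapsed + " ms for " + results.size() + " rows");
    }
}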
To analyze the problem you should look in your database's manual for tools that display the query execution plan. Maybe you're missing an index on a join column.

Something funny with embedded HSQLDB

I'm just curious about something. I'm using HSQLDB in my project (embedded, of course). At some point I felt the need to visualize what Hibernate was generating, so I got a free copy of DbVisualizer. Here is the hsqljdbc.properties:
jdbc.url=jdbc:hsqldb:file:mydb;create=true
hibernate hbm2ddl.auto=create
I downloaded HSQLDB 1.8.0_10 and did all the required setup. I could connect and see the tables, but after that, changes made to the tables don't seem to show up. I then tried deleting the db and generating a new one, but still nothing. Do you have any idea what's going on?
I usually use Derby, but I've realized lately that it's not that precise about relationship management. I'm using MySQL for the moment, which is not good for development, so I want to know whether I forgot to do something or it's just meant to behave that way. Thanks for reading this.
Using HSQLDB for development and testing is discussed in detail in the new Guide.
http://hsqldb.org/doc/2.0/guide/deployment-chapt.html#dec_app_dev_testing
HSQLDB uses a write-delay mechanism by default; changes are flushed to disk after 10 seconds in version 1.8.x, or after 0.5 seconds in version 2.0 and later.
You can force the database to shutdown and write all the changes when the last connection is closed with this URL:
jdbc.url=jdbc:hsqldb:file:mydb;shutdown=true
With HSQLDB 2.x you can use the write_delay property to force each commit to write to disk immediately:
jdbc.url=jdbc:hsqldb:file:mydb;hsqldb.write_delay=false
Version 2.2.9 and later persist the latest changes when the last connection is closed, so it may not be necessary to use hsqldb.write_delay=false for tests that close the connections.
With HSQLDB 1.8, you need to run an SQL command at the beginning to do this:
SET WRITE_DELAY FALSE
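For example, a minimal sketch of running that command right after opening the connection (URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DisableWriteDelay {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:hsqldb:file:mydb", "sa", "");
        Statement stmt = conn.createStatement();
        stmt.execute("SET WRITE_DELAY FALSE"); // flush every commit to disk immediately
        stmt.close();
        // ... use the connection as usual ...
        conn.close();
    }
}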
By default, HSQLDB keeps table contents in memory until the database is shut down: http://www.hsqldb.org/doc/guide/ch05.html#N10DD6
Depending on your needs (e.g. working in a development environment) this may be sufficient. For production, however, I'd rather use a DBMS that writes each change to disk in multiple places (which for me means Oracle, although MySQL probably works just as well).
Why don't you just set the show_sql property to true if you want to see what Hibernate does?

Categories

Resources