I'm having an interesting issue with Spring JDBC. I have a simple SELECT query which just selects a single row from the database by its identifier. When I run this query through PgAdmin raw on the database it returns in approximately 500ms. When I run it through my Java stack it takes a full 20 seconds, and it does not get faster on subsequent runs.
This is the last layer of my code that hangs:
return jdbcTemplate.queryForObject(QUERY_SINGLE_COMPONENT, queryArguments, new ComponentRowMapper());
(my jdbcTemplate is a NamedParameterJdbcTemplate). I have run VisualVM on my code during this execution:
the PostgreSQL driver readMore is taking essentially all of the time. I'm using the PostgreSQL driver 42.0.0 but I tried updating to the latest version which did not help.
Other IDs in this same query return just fine, other queries I'm making through this same codebase work just fine. Its been driving me crazy for days and would appreciate any insight!
EDIT: I've managed to isolate the issue to our schema search path. We use a test schema to stage database changes. When test is in the schema search path it is slow, when test is removed it is fast. In both cases it is still slower than when I run it in PgAdmin by an order of magnitude.
Related
I am using Spring 4.1.6, and I have my service working fine with Hibernate. In the root of the project I've got my schema.sql which is being run every time I run the server. The problem is that first time I ran the server, I put some data in db, and when I restarted it, the script was executed again and I lost all that data that I loaded before restart.
So, I think that I have two options two solve this problem:
Edit sql script to execute all queries just in case they do not exist (which would be more laborious since I have to edit the script every time I export my db)
Tell hibernate, by some way, to execute sql script just in some cases. That would be great if there existed some config that executes the script just in case the data base doesn't exist.
Do you know if this is even possible? Thanks in advance.
It sounds like this is the perfect use-case for a tool called Liquibase. This is basically a version control tool for your database which allows you to define changes to your schema and/or data and ensures that these changes are only applied once.
It's incredibly useful if multiple people are changing the same database schema and ensures that your database is always valid for the version of the code that you have checked out/released etc.
I am developing a very large scale J2EE application and we chose to use Derby as an embedded database for junit testing since hitting the actual prod database will slow down our tests. When I bootstrap my application, the Derby DB will create all the tables so I can run JDBC queries against it. It works fine but the drawback is I cannot actually query any of the tables except through JDBC calls at runtime, so if I need to make changes to my queries, I need to stop the app, modiify my query statements, then restart the application and run in debug. This process makes it very difficult when it comes to analyzing complex queries. Does anyone know of some kind of Derby plugin that can help me to query the DB without doing it through my java code?
If you are using Maven for your build, you can use the derby-maven-plugin, which I wrote and is available on GitHub and via Maven Central. It will take care of starting and stopping the database for you before your tests. You will need to populate this database, yourself of course. You will also have the database in your target/derby folder after the tests execute, so you can always query the data yourself afterwards. This will help you work in a separate development environment which doesn't affect the production database.
You can check here for my answer to a similar question.
I am developping an application that tests different WebServices, and I want it to be as generic as possible. I need to populate database to do JUnit tests, but I don't want these changes to be commited.
I know that some in-memory databases like HSQL DB allow testing on a sort of a virtual (or mock) database, but unfortunately I use oracle and I cannot change it now because of my complex data tables structure.
What is the best practice you suggest?
Thanks.
First of all, HSQL and Hibernate aren't related in any way. The question is whether you can find an embedded database which supports the same SQL as your production database (or rather the subset of SQL which your application uses).
A good candidate for this is H2 database since it emulates a lot of different SQL flavours.
On top of that: Don't test the database. Assume that the database is tested thoroughly by your vendor and just works.
In my code, I aim for:
Save and load each entity.
Generate the SQL for all the queries that I use and compare them against String literals in tests (i.e. I don't run the queries against the database all the time).
Some tests look for a System property. If it's set, then they will run the queries against the database. This happens during the night on my CI server.
The rationale for this: As long as the DB schema doesn't change, there is no point to actually run the queries. That means running them during the day while I sit in front of the computer is a huge waste of time.
To make sure that "low impact" changes don't slip through the gaps, I let a computer run them when I don't care.
Along the same lines, I have mocks for many DAOs which return various predefined results, so I don't have to query the database. The rationale here is that I want to test the processing of results from the database, not the JDBC API, the DB driver, the OS's TCP/IP stack, the network hardware (and software), or any other of the 1000 things between my code and the database records on a harddisk somewhere.
More details in my blog: http://blog.pdark.de/2008/07/26/testing-with-databases/
I have a view defined in SQL server 2008 that joins 4 tables together. Executing this view in SQL Server Management Studio takes roughly 3 seconds to run and returns about 45,000 records. My application is written in Java using hibernate to simply do a "from MyViewObject" query in HQL. When this is run, the execution time is consistently around 45 seconds. I have also tried simply using JDBC to run this query and received the same level of performance, so I've assumed it has nothing to do with hibernate.
My question: What can I do to diagnose this problem? There is obviously something different between how Management Studio is running the query vs how my application is running the query but I have not been able to come up with much.
The only thing I've come up with as a potentially viable explanation is an issue with the jtds library that contains the driver for SQL Server in Java.
Any guidance here would be greatly appreciated.
UPDATE
I went back to trying pure JDBC and tried adding the selectMethod and responseBuffering attributes to my connection string but didn't get any improvements. I also took my JDBC code from my application and ran it from a test program containing nothing but my JDBC code and it ran in the expected 3 seconds. So to me this seems environmental for the application.
My application is a Google Web Toolkit(GWT) based app, and the JDBC code is being run in my primary RPC Servlet. Essentially, the RPC method receives the call and immediately executes the JDBC code. Nothing in this setup gives me much indication of why the performance is terrible though. I am going to try the JDBC 3.0 driver and see if that works any better, but it doesn't feel like that will fix the issue to me quite yet.
My goal for the moment is to get my query working live with JDBC and then switch it back over to Hibernate so I can keep the testing simple enough. Thanks for the help so far!
UPDATE 2
I'm finally starting to zero in on the source of the problem, though still no idea what the actual issue is. I opened up the view in SQL Server and copied the SQL statement (rather large) exactly into my code and executed it using JDBC instead of pulling the data from the view and most of the performance issues are gone. It seems that some combination of GWT, SQL Server Views and JDBC is not working properly here. I don't see keeping a very large hand-written query in my code as a long term solution, but it does offer a bit more insight.
<property name="hibernate.show_sql">true</property>
setting this will show you the SQL query generated by hibernate. Analyze the query and make sure you are not missing a relationship.
reply for Update 1 and 2:
Like you mentioned, ran the query on your sql query and it seems like it is fast. So another thing to remember about hibernate is that it creates the object that is returned by your query (of course this depends if you initialize lazy obj. Dont remember what it is called). How many objects does your query return? also you can do a simple bench on where the issue is.
For example, before running the query, sysout the current time and then sysout the current time after. do these for all the places that you suspect is slowing your application down.
To analyze the problem you should look up you manual for tools that display the query or execution plan. Maybe you're missing an index on a join column.
I'm running a query against a table in a postgresql database. The database is on a remote machine. The table has around 30 sub-tables using postgresql partitioning capability.
The query will return a large result set, something around 1.8 million rows.
In my code I use spring jdbc support, method JdbcTemplate.query, but my RowCallbackHandler is not being called.
My best guess is that the postgresql jdbc driver (I use version 8.3-603.jdbc4) is accumulating the result in memory before calling my code. I thought the fetchSize configuration could control this, but I tried it and nothing changes. I did this as postgresql manual recomended.
This query worked fine when I used Oracle XE. But I'm trying to migrate to postgresql because of the partitioning feature, which is not available in Oracle XE.
My environment:
Postgresql 8.3
Windows Server 2008 Enterprise 64-bit
JRE 1.6 64-bit
Spring 2.5.6
Postgresql JDBC Driver 8.3-603
In order to use a cursor to retrieve data you have to set the ResultSet type of ResultSet.TYPE_FORWARD_ONLY (the default) and autocommit to false in addition to setting a fetch size. That is referenced in the doc you linked to but you didn't explicitly mention that you did those steps.
Be careful with PostgreSQL's partitioning scheme. It really does very horrible things with the optimizer and can cause massive performance issues where there should not be (depending on specifics of your data). In any case, is your row only 1.8M rows? There is no reason that it would need to be partitioned based on size alone given that it is appropriately indexed.
I'm betting that there's not a single client of your app that needs 1.8M rows all at the same time. You should think of a sensible way to chunk the results into smaller pieces and give users the chance to iterate through them.
That's what Google does. When you do a search there might be millions of hits, but they return 25 pages at a time with the idea that you'll find what you want in the first page.
If it's not a client, and the results are being massaged in some way, I'd recommend letting the database crunch all those rows and simply return the result. It makes no sense to return 1.8M rows just to do a calculation on the middle tier.
If neither of those apply, you've got a real problem. Time to rethink it.
After reading the later responses it sounds to me like this is more of a reporting solution that ought to be crunched in batch or calculated in real time and stored in tables that are not part of your transactional system. There's no way that bringing 1.8M rows to the middle tier for calculating moving averages can scale.
I'd recommend reorienting yourself - start thinking about it as a reporting solution.
The fetchSize property worked as described at postgres manual.
My mistake was that I was setting auto commit = false to a connection from a connection pool that was not the connection being used by the prepared statement.
Thanks for all the feedback.
I did everything above, but I needed one last piece: be sure the call is wrapped in a transaction and set the transaction to read only, so that no rollback state is required.
I added this: #Transactional(readOnly = true)
Cheers.