MS SQL Server sleeping transactions lock - java

I have a problem with locks held by sleeping transactions on MS SQL Server 2008.
Sometimes a session is not really sleeping, or a transaction has not completed and the lock is real. That is not my case.
We have a Java app running on Tomcat 6. When I run an update from Java and then try to select the updated records, I have no luck.
If I use with (nolock), it helps: I see the correctly changed rows, but after some time the update is rolled back.
If I run the update through Studio, it works fine and the result is immediate.
I see a similar situation in many places in my application.
For example, I have a nightly job that recalculates big chunks of data into bitfields. At the end it removes the old bitfields and copies the new ones in. But when the job completes, a rollback happens. I tried saving these bitfields manually in tmp tables and then replacing the old ones with the new ones through Studio, and that worked.
All statements and connections are closed; I verified this. I also tried making all these changes within a single transaction.
This application ran for a long time without any trouble, but now something has happened.
I tried to find the cause using sp_who, sp_who2, and various scripts that show locking queries. I also did extensive monitoring with SQL Server Profiler and searched the Internet for a solution. The only thing I found was
Error: 1222, Severity: 16, State: 18
which is a result of this problem, not the cause. Maybe I'm moving in the wrong direction. To me it looks as if something changed in the SQL Server configuration, and now, for some reason, it holds the connection and all changes made within it forever. When the connection is killed, everything is rolled back.
If you have any ideas I would appreciate it.
Thanks in advance for any help.
UPDATE: I searched and found one article: https://support.microsoft.com/en-us/kb/137983
which describes this scenario:
If the Windows NT Server computer has successfully closed the connection, but the client process still exists on the SQL Server as indicated by sp_who, then it may indicate a problem with SQL Server's connection management. In this case, you should work with your primary support provider to resolve this issue.
Maybe this is my case. I will investigate it further.
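One reading of these symptoms, offered as an assumption rather than a diagnosis: the Java code runs its updates on a connection with auto-commit disabled and some code path never reaches commit(), so the session shows as sleeping with an open transaction, keeps its locks, and loses every change when it is killed. A minimal sketch of a guard against that pattern follows. TxTemplate and SqlWork are hypothetical names; only standard java.sql calls are used.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper: runs a unit of work and always ends the transaction. */
final class TxTemplate {

    /** Work to perform against an already open connection. */
    interface SqlWork {
        void run(Connection con) throws SQLException;
    }

    static void inTransaction(Connection con, SqlWork work) throws SQLException {
        boolean previous = con.getAutoCommit();
        con.setAutoCommit(false);
        try {
            work.run(con);
            con.commit();    // without this the session keeps its locks while it sleeps
        } catch (SQLException | RuntimeException e) {
            con.rollback();  // release the locks now instead of when the session is killed
            throw e;
        } finally {
            con.setAutoCommit(previous); // hand the connection back in its original mode
        }
    }
}
```

Usage would look like TxTemplate.inTransaction(con, c -> { /* run the updates on c */ }); the point of the design is that no exit path leaves the transaction open.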

Related

Postgres vacuum/demon partially working when issued from JDBC

First of all I know it's odd to rely on a manual vacuum from the application layer, but this is how we decided to run it.
I have the following stack:
HikariCP
JDBC
Postgres 11 in AWS
Now here is the problem. When we start fresh with brand new tables and autovacuum=off, the manual vacuum works fine: I can see the number of dead tuples grow up to the threshold and then drop back to 0. The tables are updated heavily over parallel connections (HOT is in use as well). At some point, though, the number of dead rows sits at around 100k, jumps up to the threshold, and drops back only to 100k. The n_dead_tuples slowly creeps up.
Worst of all, when you issue vacuum from a pg console ALL the dead tuples are cleaned, but oddly enough when the application issues vacuum it succeeds yet cleans only the "threshold amount of records", not all of them.
Now I am pretty sure about the following:
Analyze is not running, nor auto-vacuum
There are no long running transactions
No replication is going on
These tables are "private"
What is the difference between issuing a vacuum from the console with auto-commit on and issuing it via JDBC? Why does the vacuum issued from the console clean ALL the tuples whereas the vacuum from JDBC cleans them only partially?
The JDBC vacuum is run on a fresh connection from the pool with the default isolation level. Yes, there are updates going on in parallel, but the same is true when a vacuum is executed from the console.
Is the connection from the pool somehow corrupted so that it cannot see the updates? Is the isolation level the problem?
Visibility Map corruption?
Index referencing old tuples?
Side note: I have observed the same behavior with autovacuum on and the cost limit through the roof (4000-8000), with the threshold at default + 5%. At first n_dead_tuples stays close to 0 for 4-5 hours... The next day the table is 86 GB with millions of dead tuples. All the other tables are vacuumed and OK...
PS: I will try to log a VACUUM VERBOSE from JDBC.
PS2: Because we are running in AWS, could a backup be causing it to stop cleaning?
PS3: When referring to vacuum I mean a plain vacuum, not a full vacuum. We are not issuing VACUUM FULL.
The main problem was that the vacuum was being run by a different user. The vacuuming I was seeing was actually the HOT updates plus selects running over that data, resulting in on-the-fly cleanup of the pages.
Next: vacuuming is affected by long-running transactions ACROSS ALL schemas and tables. Yes, ALL schemas and tables. Changing to the correct user fixed the vacuum, but it is still effectively ignored if there is an open (idle in transaction) session on any other schema.table.
Raising maintenance_work_mem helps, but in the end, when the system is under heavy load, all vacuuming is paused.
So we upgraded the DB's resources a bit and added a monitor to help us if there are any issues.
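The kind of monitor mentioned above could start from a query against pg_stat_activity that lists sessions with an open transaction, oldest first, since those are the ones that can hold vacuum back. This is a sketch under assumptions, not the monitor that was actually deployed: VacuumBlockers and its method are hypothetical names, and the column names are those of the pg_stat_activity view in PostgreSQL 11.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical monitor helper listing sessions that have a transaction open. */
final class VacuumBlockers {

    // Every backend except our own that currently has a transaction open,
    // regardless of which schema or table it touched.
    static final String OPEN_TRANSACTIONS_SQL =
            "SELECT pid, usename, state, xact_start "
          + "FROM pg_stat_activity "
          + "WHERE xact_start IS NOT NULL AND pid <> pg_backend_pid() "
          + "ORDER BY xact_start";

    static List<String> describeOpenTransactions(Connection con) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(OPEN_TRANSACTIONS_SQL)) {
            while (rs.next()) {
                rows.add(rs.getInt("pid") + " " + rs.getString("usename")
                        + " [" + rs.getString("state") + "] open since "
                        + rs.getTimestamp("xact_start"));
            }
        }
        return rows;
    }
}
```

Rows whose state is "idle in transaction" with an old xact_start are the first candidates to investigate.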

H2 database hanging on specific query

We have an application storing data in a local H2 database (file mode). Everything works fine except for a single query that's executed on application shutdown. The query is issued to the H2, but never returns (no exception is thrown).
As far as I know, this only appears on a single workstation (feature still in test not in production).
When using the database of that workstation on my own workstation the application stops right there waiting for the query to return. So with this specific database it is reproducible.
When opening the database in an external tool (DbVisualizer Pro if it matters) and issuing the same query (in particular I used explain analyze <query> to not modify the data in the database) that query runs forever, too. The query looks as follows:
DELETE TOP(1000) FROM my_schema.SOMETABLE ST WHERE ST.someDate < '2019-05-23'
The problem is not directly tied to the date shown, as the same thing happened yesterday (when the date was 2019-05-22).
The strange thing is that when I stop the query execution and modify the date, the query works as expected (also with explain analyze, so no data is modified). If I then switch back to the original date, it works as well.
When I start the application after that "trick", the query in question works like a charm. So I guess it must have something to do with the actual state of the particular database.
My question is: how can I find out what's wrong with the database file?
I've already tried to do this "health check", but this reveals no problems.
Side note: "Running forever" here means I killed the application process after waiting for some 20 minutes, but I guess that time should be enough for deleting 16 of 18 entries in that specific table.

Used jdbc connections seem to be leaking and I cannot figure out why

I have been fighting with this issue for ages and I cannot for the life of me figure out what the problem is. Let me set the stage for the stack we are using:
Web-based Java 8 application
GWT
Hibernate 4.3.11
MySQL
MongoDB
Spring
Tomcat 8 (incl. Tomcat connection pooling instead of C3P0, for example)
Hibernate Search / Lucene
Terracotta and EhCache
The problem is that every couple of days (sometimes every second day, sometimes once every 10 days, it varies) in the early hours of the morning, our application "locks up". To clarify, it does not crash, you just cannot log in or do anything for that matter. All background tasks - everything - just halt. If we attempt to login when it is in this state, we can see in our log file that it is authenticating us as a valid user, but no response is ever sent so the application just "spins".
The only pattern we have found to date related to when these "lock ups" occur is that it happens when our morning scheduled tasks or SAP imports are running. It is not always the same process that is running though, sometimes the lock up happens during one of our SAP imports and sometimes during internal scheduled task execution. All that these things have in common are that they run outside of business hours (between 1am and 6am) and that they are quite intensive processes.
We are using JavaMelody for monitoring, and what we see every time is that, starting at different times in this 1 - 6am window, the number of used jdbc connections starts to spike (as per the attached image). Once that starts, it is just a matter of time before the lock up occurs, and the only way to solve it is to bounce Tomcat, thereby restarting the application.
As far as I can tell, memory, CPU, etc. are all fine when the lock up occurs; the only thing that looks like it has an issue is the constantly increasing number of used jdbc connections.
I have checked the code for our transaction management so many times to ensure that transactions are being closed off correctly (the transaction management code is quite old fashioned: explicit begin and commit in try block, rollback in catch blocks and entity manager close in a finally block). It all seems correct to me so I am really, really stumped. In addition to this, I have also recently explicitly configured the Hibernate connection release mode properly to after_transaction, but the issue still occurs.
The other weird thing is that we run several instances of the same application for different clients and this issue only happens regularly for one client. They are our client with by far the most data to be processed though and although all clients run these scheduled tasks, this big client is the only one with SAP imports. That is why I originally thought that the SAP imports were the issue, but it locked up just after 1am this morning and that was a couple hours before the imports even start running. In this case it locked up during an internal scheduled task executing.
Does anyone have any idea what could be causing this strange behavior? I have looked into everything I can think of but to no avail.
After some time and a lot of trial and error, my team and I managed to sort out this issue. Turns out that the spike in JDBC connections was not the cause of the lock-ups but was instead a consequence of the lock-ups. Apache Terracotta was the culprit. It was just becoming unresponsive it seems. It might have been a resource allocation issue, but I don't think so since this was happening on servers that were low usage as well and they had more than enough resources available.
Fortunately we no longer actually needed Terracotta, so I removed it. As I said in the question, we were getting these lock-ups every couple of days - at least once per week, every week. Since removing it we have had no such lock-ups for 4 months and counting. So if anyone else experiences the same issue and you are using Terracotta, try dropping it and things might come right, as they did in my case.
As coladict said, you need to look at the "Opened jdbc connections" page in the JavaMelody monitoring report, and do so before the server "locks up".
Sorry if that means doing it at 2 or 3 in the morning, but perhaps you can run a wget command automatically during the night.

Hanging PostgreSQL Processes

We are running a Java EE web application in JBoss that is using PostgreSQL 8.0.9 as the database.
One page in the application runs a big and complicated query when it is loaded. We had a problem that manifested if a user requested this page and closed their browser window before the requested page was returned to the client. The problem was that the closing of the window would spawn a new PostgreSQL thread/process (viewable via top) and the new thread/process would take a long time to switch from SELECT to idle in the top output. If approximately 5 or more users did this (closed the browser window before the large complicated query page returned to the client) in a small window of time the spawned threads/processes were growing and not switching to idle (staying in SELECT) and consuming a lot of CPU, causing major performance problems. It is important to mention that if the users that closed the browser window logged out, the associated thread/process would switch to idle and the CPU use would decrease. It is also important to mention that if JBoss was restarted the applicable threads/processes would switch to idle (as all the users would be logged out by the restart).
The problem of the hanging threads/processes seems to have been resolved by a database backup and RESTORE. Now the new threads/processes that are spawned are switched from SELECT to idle in a generally short period of time and the CPU is not burdened by them as much. Also, performance on large complicated queries in general seems to have improved significantly since the RESTORE.
We run VACUUM every 24 hours on the database. We do not run REINDEX on the database because of data corruption risks. We do tend to have rather high await numbers on iostat outputs, especially in the performance problem cases described above.
What happens to a database when it is dumped and restored (ex. REINDEX, etc.)? Which one of these seems to be the key to our solution?
Is there a setting that manages the number of threads/processes that are spawned when browser windows are closed before a page with a large complicated query is returned to the client? Is there a setting to manage the transition of threads/processes like this from SELECT to idle? Is there a way to manage either of these at the application level?
Version 8.0 is already EOL, and 8.0.9 in particular is far behind on patches: 8.0.26 was the last release. You are missing many patches and should at least update to the latest 8.0 version, but also start migrating to a version that is still supported. Performance has improved considerably since versions 8.2 and 8.3.
Question: Why do you think REINDEX corrupts your data? Corruption of data would make this statement pretty useless... REINDEX is not something you would do every day, but sometimes you need it.

Stored proc running 30% slower through Java versus running directly on database

I'm using Java 1.6, JTDS 1.2.2 (also just tried 1.2.4 to no avail) and SQL Server 2005 to create a CallableStatement to run a stored procedure (with no parameters). I am seeing the Java wrapper running the same stored procedure 30% slower than using SQL Server Management Studio. I've run the MS SQL profiler and there is little difference in I/O between the two processes, so I don't think it's related to query plan caching.
The stored proc takes no arguments and returns no data. It uses a server-side cursor to calculate the values that are needed to populate a table.
I can't see how calling a stored proc from Java should add a 30% overhead; surely it's just a pipe to the database that SQL is sent down, and then the database executes it... Could the database be giving the Java app a different query plan?
I've posted to both the MSDN forums and the sourceforge JTDS forums (topic: "stored proc slower in JTDS than direct in DB"). I was wondering if anyone has any suggestions as to why this might be happening?
Thanks in advance,
-James
(N.B. Fear not, I will collate any answers I get in other forums together here once I find the solution)
Java code snippet:
sLogger.info("Preparing call...");
stmt = mCon.prepareCall("SP_WB200_POPULATE_TABLE_limited_rows");
sLogger.info("Call prepared. Executing procedure...");
stmt.executeQuery();
sLogger.info("Procedure complete.");
I have run sql profiler, and found the following:
Java app :
CPU: 466,514 Reads: 142,478,387 Writes: 284,078 Duration: 983,796
SSMS :
CPU: 466,973 Reads: 142,440,401 Writes: 280,244 Duration: 769,851
(Both with DBCC DROPCLEANBUFFERS run prior to profiling, and both produce the correct number of rows)
So my conclusion is that they both execute the same reads and writes, it's just that the way they are doing it is different, what do you guys think?
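Putting the profiler figures above side by side makes the point numerically: CPU, reads and writes differ by well under 2%, while the duration of the Java run is roughly 28% longer. The small sketch below only redoes that arithmetic on the numbers quoted in the question; ProfilerDelta and percentOver are hypothetical names.

```java
/** Recomputes the relative differences between the two profiler traces. */
final class ProfilerDelta {

    /** Percentage by which the Java figure exceeds the SSMS figure. */
    static double percentOver(long javaValue, long ssmsValue) {
        return (javaValue - ssmsValue) * 100.0 / ssmsValue;
    }

    public static void main(String[] args) {
        // Figures copied from the profiler output quoted above.
        System.out.printf("CPU:      %+.2f%%%n", percentOver(466_514L, 466_973L));
        System.out.printf("Reads:    %+.2f%%%n", percentOver(142_478_387L, 142_440_401L));
        System.out.printf("Writes:   %+.2f%%%n", percentOver(284_078L, 280_244L));
        System.out.printf("Duration: %+.2f%%%n", percentOver(983_796L, 769_851L));
    }
}
```

Identical work with a longer elapsed time points towards waiting (locks, network round trips, client-side handling) rather than extra I/O.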
It turns out that the query plans are significantly different for the different clients (the Java client is updating an index during an insert that the faster SQL client does not touch, and the way it executes joins is different too: nested loops vs. gather streams, nested loops vs. index scans, argh!). Quite why this is, I don't know yet (I'll re-post when I do get to the bottom of it).
Epilogue
I couldn't get this to work properly. I tried homogenising the connection properties (arithabort, ansi_nulls etc.) between the Java and Management Studio clients. In the end the two clients had very similar query/execution plans (but still with different actual plan_ids). I posted a summary of what I found to the MSDN SQL Server forums, as I found differing performance not just between a JDBC client and Management Studio but also with Microsoft's own command-line client, SQLCMD. I also checked some more radical things, like network traffic and wrapping the stored proc inside another stored proc, just for grins.
I have a feeling the problem lies somewhere in the way the cursor was being executed, and it was somehow giving rise to the Java process being suspended, but why a different client should give rise to this different locking/waiting behaviour when nothing else is running and the same execution plan is in operation is a little beyond my skills (I'm no DBA!).
As a result, I have decided that 4 days is enough of anyone's time to waste on something like this, so I will grudgingly code around it (if I'm honest, the stored procedure needed re-coding to be more incremental instead of re-calculating all data each week anyway) and chalk this one down to experience. I'll leave the question open. Big thanks to everyone who put their hat in the ring; it was all useful, and if anyone comes up with anything further, I'd love to hear some more options. If anyone finds this post as a result of seeing this behaviour in their own environment, then hopefully there are some pointers here that you can try yourself, and hopefully you will see further than we did.
I'm ready for my weekend now!
-James
You can attach the Profiler and monitor for the events SQL:BatchCompleted and SP:Completed, with a filter on duration > 1000. Run the procedure from your Java client and from SSMS. Compare the Reads and the Writes of the two events (Java vs. SSMS). Are they significantly different? This would indicate considerably different execution paths or plans, with significant difference in I/O.
Also try to capture the Showplan XML event of the two and compare the plans (save the event as a .sqlplan file and open it in SSMS for easy analysis). Do they have similar plans? Are there wild differences in Estimate vs. Actual (rows, rewinds, rebinds)? Do they have the same degree of parallelism? The plans can also be retrieved via the sys.dm_exec_requests view.
Are there any warning events raised, like Missing Column Statistics, Sort Warnings, Hash Warning, Execution Warnings, Blocked Process?
The point is that you have at your disposal a whole arsenal of investigation tools. Once you find the root cause of the difference, you can trace it down to what is different between your Java environment settings and the SSMS environment (ADO.Net SqlClient): things like the default transaction isolation level, ANSI settings, and so on.
Checking: Is your problem that two applications (SSMS, Java) are making the exact same identical call to SQL Server, and SQL Server is acting differently for each? If so, I hit things like this every year or two, and they hurt my brain for days.
Once, I ultimately isolated each process call and logged everything for the entire process in Profiler. I eventually noticed that the Login event (under TextData) showed a host of information, like so:
-- network protocol: TCP/IP
set quoted_identifier on
set arithabort off
set numeric_roundabort off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set cursor_close_on_commit off
set implicit_transactions off
set language us_english
set dateformat mdy
set datefirst 7
set transaction isolation level read committed
The "Existing Connection" event will show this information as well--but, sometimes immediately subsequent calls (batches, RPCs, I disremember just now) are sent [ISQL or OSQL did this, I think] to immediately reset some of these -- Arithabort and Quoted_Identifier seem to be favorites, and other SET options also get modified depending on the settings or requirements of whatever connectivity protocols your application's database interface is using.
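Comparing two such Login TextData dumps by eye is error-prone, so one option is to paste both into a small diff helper. The sketch below is an illustration only: SetOptionDiff is a hypothetical name, and the parser assumes the "set option value" line format shown above, skipping anything else (such as the "-- network protocol" comment line).

```java
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: diffs the SET options captured from two Login events. */
final class SetOptionDiff {
    private static final String ISOLATION = "transaction isolation level";

    /** Parses lines such as "set arithabort off" into option name -> value. */
    static Map<String, String> parse(String textData) {
        Map<String, String> options = new TreeMap<>();
        for (String raw : textData.split("\\R")) {
            String line = raw.trim().toLowerCase(Locale.ROOT);
            if (!line.startsWith("set ")) continue;       // skip comments and blank lines
            String rest = line.substring(4).trim();
            if (rest.startsWith(ISOLATION + " ")) {        // the one multi-word option name
                options.put(ISOLATION, rest.substring(ISOLATION.length()).trim());
            } else {
                int space = rest.indexOf(' ');
                if (space > 0) {
                    options.put(rest.substring(0, space), rest.substring(space + 1).trim());
                }
            }
        }
        return options;
    }

    /** Returns only the options whose values differ, as "left vs right". */
    static Map<String, String> diff(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new TreeMap<>();
        TreeSet<String> names = new TreeSet<>(left.keySet());
        names.addAll(right.keySet());
        for (String name : names) {
            String l = left.get(name);
            String r = right.get(name);
            if (!Objects.equals(l, r)) out.put(name, l + " vs " + r);
        }
        return out;
    }
}
```

Calling SetOptionDiff.diff(SetOptionDiff.parse(javaLogin), SetOptionDiff.parse(ssmsLogin)) leaves only the settings worth investigating; an option missing on one side shows up as null.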
Another one: some settings are kept as attributes of a procedure at "create" time, and others are factored in at compile time. On the one hand, your connection's SET values may be being overwritten by the configuration saved at the time the procedure was created; on the other hand, your two connections may differ so much that two execution plans are generated for one procedure. (All of this information is, after sufficient research, available in the sys. tables and DMVs.)
In short, it seems to me that SQL obscurities are messing you up. To this day, I loathe all these goombah settings. Things below my notice keep messing around with them [I mean, really, what fool would set implicit_transaction for a connection pool on? But once they did...] and it's hard to build structures when the ground (rules) keep changing out from underneath you. After all, remember what the guy said about building castles in a swamp...
I recall having a similar issue a while ago, because JTDS was silently converting a string parameter to Unicode or something similar. As a result of that conversion, SQL Server was unable to use the index which it was using when we ran the stored proc from SSMS.
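If Unicode conversion is suspected, jTDS has a connection property, sendStringParametersAsUnicode, that can be switched off in the JDBC URL. The sketch below only builds such a URL; JtdsUrl is a hypothetical name, the host, port and database are placeholders, and the setting matters only for calls that actually pass string parameters (the procedure in the question takes none). Turning it off sends strings in the server's non-Unicode character set, so it is unsuitable where NVARCHAR data must round-trip.

```java
/** Hypothetical helper that builds a jTDS URL with Unicode string parameters disabled. */
final class JtdsUrl {

    static String withoutUnicodeParameters(String host, int port, String database) {
        // jTDS URL format: jdbc:jtds:sqlserver://host:port/database;property=value
        return "jdbc:jtds:sqlserver://" + host + ":" + port + "/" + database
                + ";sendStringParametersAsUnicode=false";
    }
}
```

The resulting string would be passed to DriverManager.getConnection along with the credentials.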
HIH
Does the Java case include transmission of the results to the Java server (network overhead) plus some Java processing? A 12 minute query might produce quite a large amount of data.
If you are looking at the profiler and there is no difference between the executions then the difference must be with the client systems.
4 mins does seem too long just to prepare a statement to send, so the 12 min wait must cause some other effect -- no idea what it is.
I am not sure if this post is still relevant. We faced a similar problem in our application.
One key difference between running a stored procedure in SQL Management studio and one running from JDBC is that of transaction context. If you are using an ORM in Java, by default the stored procedure runs in a transaction context. When you run a stored procedure directly in SQL management studio the transaction is off. There is a substantial performance difference.
Sorry, I've not found a correct answer to this, so I don't want to allocate any of these as correct, so I am going to mark this answer as correct, and wish anyone luck who comes across anything similar!
Did you know that Microsoft ships JDBC drivers for its databases?
These may be more performant.
Obviously, you may have resolved the problem by now.
