First of all I know it's odd to rely on a manual vacuum from the application layer, but this is how we decided to run it.
I have the following stack :
HikariCP
JDBC
Postgres 11 in AWS
Now here is the problem. When we start fresh with brand new tables with autovacuum=off the manual vacuum is working fine. I can see the number of dead_tuples growing up to the threshold then going back to 0. The tables are being updated heavily in parallel connections (HOT is being used as well). At some point the number of dead rows is like 100k jumping up to the threshold and going back to 100k. The n_dead_tuples slowly creeps up.
Now the worst of all is that when you issue vacuum from a pg console ALL the dead tuples are cleaned, but oddly enough when the application is issuing vacuum it's successful, but partially cleans "threshold amount of records", but not all ?
Now I am pretty sure about the following:
Analyze is not running, nor auto-vacuum
There are no long running transactions
No replication is going on
These tables are "private"
Where is the difference between issuing a vacuum from the console with auto-commit on vs JDBC ? Why the vacuum issued from the console is cleaning ALL the tupples whereas the vacuum from the JDBC cleans it only partially ?
The JDBC vacuum is ran in a fresh connection from the pool with the default isolation level, yes there are updates going on in parallel, but this is the same as when a vacuum is executed from the console.
Is the connection from the pool somehow corrupted and can not see the updates? Is the ISOLATION the problem?
Visibility Map corruption?
Index referencing old tuples?
Side-note: I have observed that same behavior with autovacuum on and cost limit through the roof like 4000-8000 , threshold default + 5% . At first the n_dead_tuples is close to 0 for like 4-5 hours... The next day the table is 86gigs with milions of dead tuples. All the other tables are vacuumed and ok...
PS: I will try to log a vac verbose in the JDBC..
PS2: Because we are running in AWS could it be a backup that is causing it to stop cleaning ?
PS3: When refering to vaccum I mean simple vacuum, not full vacuum. We are not issuing full vacuum.
The main problem was that vacuum is run by another user. The vacuuming that I was seeing was the HOT updates + selects running over that data resulting in on-the-fly vacuum of the page.
Next: Vacuuming is affected by long running transactions ACROSS ALL schemas and tables. Yes, ALL schemas and tables. Changing to the correct user fixed the vacuum, but it will get ignored if there is an open_in_transaction in any other schema.table.
Work maintance memory helps, but in the end when the system is under heavy load all vacuuming is paused.
So we upgraded the DB's resources a bit and added a monitor to help us if there are any issues.
Related
Lot of SHOW TRANSACTION ISOLATION LEVEL appears in process list in Postgres 9.0 .
What are reasons for this and when it appears ?. All are in idle state.
How to disable this ?
I assume that with process list you mean the system view pg_stat_activity (accessible in pgAdmin III in the "statistics" tab or with "Tools/Server Status").
Since you say that the connections are idle, the query column does not show an active query, it shows the last query that has been issued in this database connection. I don't know which ORM or connection pooler you are using, but some software in your stack must insert these statements routinely at the end of a database action.
I wouldn't worry about them, this statements are not resource intensive and probably won't cause you any trouble.
If you really need to get rid of them, figure out which software in your stack causes them and investigate there.
So, we have already deploying an application, which consist a heavy business logic that my company uses. After some time, the performance was quite slower than before, actually in the weblogic data source configuration, we set the maximum connection to only 100, but recently it keeps on increasing until its limit.
We reconfigure the data source to 200, but it keeps on increasing, this is not ideal, because 100 is the max connection that we want it to be deployed.
Meanwhile, there were some thread stuck in the server too. But i think it's not the problem. Do someone knows why is this occuring so suddenly? (after implementation of a newer yet stable version, they said)
From the screenshot attached I can see that Active Connection Count is ~80. Also I can see connection are being leaked.
Enable Inactive Connection Timeout by defining some value (Based on avg time taken to execute statement).
Make sure that all JDBC connections are closed in your code after using it.
I have a problem with sleeping transaction locks on MS SQL Server 2008.
Sometimes it's not really sleeping or when transaction is not completed and there is real lock. But this is not my case.
We have tomcat 6 and java app on it. When I'm doing any update in java and then I'm trying to select updated records - no luck.
If I use with (nolock) - it helps, I see the correct changed rows, but then, after some period of time this updated is rolled back.
If I do update through Studio - then it works fine, immediate result.
Similar situation I have in many places of my application.
For example I have nightly job which is doing recalculation of big chunks of data into bitfields. And at the end it removes old bitfields and copies new bitfields inside. But when job is completed - rollback happens. I tried to save these bitfields manually in tmp tables and then I replaced old with new one through Studio - it worked.
All statements and connections are closed. I verified. And I tried to do all these changes in scope of one transaction.
This application was running for a long period of time without any troubles, but now something happened.
I tried to find out the reason and used sp_who, sp_who2, different scripts which shows locking queries, also I did massive monitoring with SQL server profiler and I tried to find any solution on the Internet. I found only
Error: 1222, Severity: 16, State: 18
which is the result of this problem, not the cause. May be I'm moving in wrong direction. For me it looks like something changed in SQL server configuration and now, for some reason, it holds connection and all changes, which were made in scope of it for ever. When this connection is killed - everything rolled back.
If you have any ideas I would appreciate it.
Thanks in advance for any help.
UPDATE: I've searched and found one article:https://support.microsoft.com/en-us/kb/137983
and there is an option like:
If the Windows NT Server computer has successfully closed the connection, but the client process still exists on the SQL Server as indicated by sp_who, then it may indicate a problem with SQL Server's connection management. In this case, you should work with your primary support provider to resolve this issue.
May be this is mine option. I will investigate it further.
We are running a Java EE web application in JBoss that is using PostgreSQL 8.0.9 as the database.
One page in the application runs a big and complicated query when it is loaded. We had a problem that manifested if a user requested this page and closed their browser window before the requested page was returned to the client. The problem was that the closing of the window would spawn a new PostgreSQL thread/process (viewable via top) and the new thread/process would take a long time to switch from SELECT to idle in the top output. If approximately 5 or more users did this (closed the browser window before the large complicated query page returned to the client) in a small window of time the spawned threads/processes were growing and not switching to idle (staying in SELECT) and consuming a lot of CPU, causing major performance problems. It is important to mention that if the users that closed the browser window logged out, the associated thread/process would switch to idle and the CPU use would decrease. It is also important to mention that if JBoss was restarted the applicable threads/processes would switch to idle (as all the users would be logged out by the restart).
The problem of the hanging threads/processes seems to have been resolved by a database backup and RESTORE. Now the new threads/processes that are spawned are switched from SELECT to idle in a generally short period of time and the CPU is not burdened by them as much. Also, performance on large complicated queries in general seems to have improved significantly since the RESTORE.
We run VACUUM every 24 hours on the database. We do not run REINDEX on the database because of data corruption risks. We do tend to have rather high await numbers on iostat outputs, especially in the performance problem cases described above.
What happens to a database when it is dumped and restored (ex. REINDEX, etc.)? Which one of these seems to be the key to our solution?
Is there a setting that manages the number of threads/processes that are spawned when browser windows are closed before a page with a large complicated query is returned to the client? Is there a setting to manage the transition of threads/processes like this from SELECT to idle? Is there away to manage either of these at the application level?
Version 8.0 is already EOL and version 8.0.9 hasn't been patched in a long time as well: 8.0.26 has been the last. You are missing many patches and should at least update to the latest 8.0-version, but also start a migration to a version that is still supported. Since version 8.2 and 8.3, performance has become much better.
Question: Why do you think REINDEX corrupts your data? Corruption of data would make this statement pretty useless... REINDEX is not something you would do every day, but sometimes you need it.
The underlying problem I want to solve is running a task that generates several temporary tables in MySQL, which need to stay around long enough to fetch results from Java after they are created. Because of the size of the data involved, the task must be completed in batches. Each batch is a call to a stored procedure called through JDBC. The entire process can take half an hour or more for a large data set.
To ensure access to the temporary tables, I run the entire task, start to finish, in a single Spring transaction with a TransactionCallbackWithoutResult. Otherwise, I could get a different connection that does not have access to the temporary tables (this would happen occasionally before I wrapped everything in a transaction).
This worked fine in my development environment. However, in production I got the following exception:
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
This happened when a different task tried to access some of the same tables during the execution of my long running transaction. What confuses me is that the long running transaction only inserts or updates into temporary tables. All access to non-temporary tables are selects only. From what documentation I can find, the default Spring transaction isolation level should not cause MySQL to block in this case.
So my first question, is this the right approach? Can I ensure that I repeatedly get the same connection through a Hibernate template without a long running transaction?
If the long running transaction approach is the correct one, what should I check in terms of isolation levels? Is my understanding correct that the default isolation level in Spring/MySQL transactions should not lock tables that are only accessed through selects? What can I do to debug which tables are causing the conflict, and prevent those tables from being locked by the transaction?
I consider keeping transaction open for an extended time evil. During my career the definition of "extended" has descended from seconds to milli-seconds.
It is an unending source of non-repeatable problems and headscratching problems.
I would bite the bullet in this case and keep a 'work log' in software which you can replay in reverse to clean up if the batch fails.
When you say your table is temporary, is it transaction scoped? That might lead to other transactions (perhaps on a different transaction) not being able to see/access it. Perhaps a join involving a real table and a temporary table somehow locks the real table.
Root cause: Have you tried to use the MySQL tools to determine what is locking the connection? It might be something like next row locking. I don't know the MySQL tools that well, but on oracle you can see what connections are blocking other connections.
Transaction timeout: You should create a second connection pool/data source with a much longer timeout. Use that connection pool for your long running task. I think your production environment is 'trying' to help you out by detecting stuck connections.
As mentioned by Justin regarding Transaction timeout, I recently faced the problem in which the connection pool ( in my case tomcat dbcp in Tomcat 7), had setting which was supposed to mark the long running connections mark abandon and then close them. After tweaking those parameters I could avoid that issue.