We have a lot of long-running processes in our Spring/Hibernate-based application, each of which spans a single long-running transaction.
Since we upgraded to MySQL 5.5, we often run into the effects of metadata locking (see https://www.percona.com/blog/2013/02/01/implications-of-metadata-locking-changes-in-mysql-5-5/).
Before MySQL 5.5, we could easily run ALTER TABLE ADD COLUMN even while a long-running process was active. This was useful for staggered deployments (one server at a time until all servers run the newest code).
With MySQL 5.5 this is no longer possible, because the long transactions hold a metadata lock, so we basically need to wait until all long-running processes holding those locks have finished.
In other words, we cannot run DDL statements while at least one long-running process is active.
That effectively means downtime, which we would like to avoid.
Question: which strategies exist to handle this situation?
My idea would be:
Have multiple SessionFactories:
a) SessionFactoryA (autocommit=true) for all tables which are not transactional (the ones we would like to touch with DDL statements)
b) SessionFactoryB (autocommit=false) for all other tables which should be transactional, even during long running processes.
Would that work? What are the downsides? Any other ideas?
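A minimal JDBC-level sketch of the two-SessionFactory idea, assuming two hypothetical pooled DataSources (one per factory) and a hypothetical SplitTransactions helper. In MySQL, metadata locks are held until the end of the transaction, and with autocommit=true every statement is its own transaction, so the autocommit path gives up its locks as soon as each statement finishes. The sketch also makes the main downsides visible: work done on the autocommit path is not rolled back when the transactional path fails, and the transactional path still holds metadata locks on every table it touches, so the long-running processes must never access the DDL-target tables through it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical helper illustrating the split; not a Hibernate API.
final class SplitTransactions {

    /** A unit of JDBC work (hypothetical callback type). */
    interface Work<T> {
        T run(Connection connection) throws SQLException;
    }

    private final DataSource nonTransactional; // pool for the tables we want to ALTER
    private final DataSource transactional;    // pool for tables that need real transactions

    SplitTransactions(DataSource nonTransactional, DataSource transactional) {
        this.nonTransactional = nonTransactional;
        this.transactional = transactional;
    }

    /** Every statement commits on its own, so its metadata lock is released when it ends. */
    <T> T autoCommit(Work<T> work) throws SQLException {
        try (Connection c = nonTransactional.getConnection()) {
            c.setAutoCommit(true);
            return work.run(c);
        }
    }

    /** Runs the work in one transaction: commit on success, roll back on failure. */
    <T> T inTransaction(Work<T> work) throws SQLException {
        try (Connection c = transactional.getConnection()) {
            c.setAutoCommit(false);
            try {
                T result = work.run(c);
                c.commit();
                return result;
            } catch (SQLException | RuntimeException e) {
                c.rollback();
                throw e;
            }
        }
    }
}
```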
Related
The application is hosted on multiple virtual machines, and the database is on a single server. All VMs point to the same database instance.
In this architecture, I have a table with very few records, but it is accessed and updated very heavily by threads running on the VMs. This causes a performance bottleneck and sometimes record-level exceptions. Database-level locking does not seem to be the best option, as it introduces significant delays in request processing.
Please suggest any other technique that could solve this problem.
A few questions first!
Is your application using connection pooling? If not, please use it. Creating a JDBC connection is expensive!
Is your application read-heavy or write-heavy?
What storage engine are your MySQL tables using, InnoDB or MyISAM? If your application is write-heavy, please use InnoDB tables, as InnoDB uses row-level locking and will serve concurrent requests better.
One special case: if you are implementing queues on top of database tables, find a database that has a built-in queue operation and use that, or use a reliable messaging service. Building queues on top of databases is typically not efficient. See e.g. http://mikehadlow.blogspot.co.uk/2012/04/database-as-queue-anti-pattern.html
In general, running transactions on databases is slow because at the end of each transaction the database needs to be sure that enough has been saved to disk that, if the system died right now, the changes made by the transaction would be safely preserved. If you don't need this durability, you might find it faster to write a single non-database application that does what the database does but writes nothing to disk, or still does database IO but only the minimum possible. Then, instead of all the VMs talking to the database directly, they would all talk to this application.
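A minimal sketch of that single in-memory application, assuming the hot rows are simple numeric counters keyed by a string id (HotCounterStore and its methods are hypothetical names). ConcurrentHashMap.merge makes each per-key update atomic with no disk IO and no database round trip. The trade-off from the answer still applies: if this process dies, any state not yet written back (for example from a periodic snapshot) is lost. How the VMs reach the service, for example over RMI or HTTP, is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a small, heavily updated table.
final class HotCounterStore {
    private final ConcurrentHashMap<String, Long> values = new ConcurrentHashMap<>();

    /** Atomically adds delta to the value under key and returns the new value. */
    long add(String key, long delta) {
        return values.merge(key, delta, Long::sum);
    }

    /** Returns the current value, or 0 if the key has never been updated. */
    long get(String key) {
        return values.getOrDefault(key, 0L);
    }

    /** Copy of the current values (weakly consistent under concurrent updates), e.g. to persist periodically. */
    Map<String, Long> snapshot() {
        return new HashMap<>(values);
    }
}
```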
I need to create a Java agent that detects any update to particular tables in a MySQL or PostgreSQL database and executes its instructions as soon as the update occurs.
Everything needs to be done automatically.
Since I'm a novice in Java, I was wondering whether you could give me any advice.
My options are:
1) A trigger that wakes up my Java application after a commit (using pg_notify and similar mechanisms).
2) The Java application subscribes to a particular ID in the database (I'm not sure this can be done, given that asynchronous updates are not possible, so my agent might need to ask the database for changes every xx seconds).
Thanks!
Yes, a trigger that uses NOTIFY is a good way to do it in PostgreSQL. The important limitation of the JDBC driver is that there is no way to receive notifications asynchronously; you have to poll. This is usually fine, as the LISTEN/NOTIFY mechanism is very lightweight: you can poll 10 (100?) times a second without causing performance problems. See http://jdbc.postgresql.org/documentation/83/listennotify.html for more.
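A sketch of the polling loop, with the notification source abstracted behind a hypothetical Supplier so the loop runs without a database. With PgJDBC, the supplier would, on a dedicated connection that has executed LISTEN on a channel (the channel name is up to you), run a cheap query such as SELECT 1 as the linked docs' example does, then call conn.unwrap(org.postgresql.PGConnection.class).getNotifications(), treating a null result as "nothing pending" and mapping each PGNotification to its getParameter() payload. The loop is bounded here for testability; a real agent would loop until shut down.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical poller: source returns pending payloads, handler reacts to each one.
final class NotificationPoller {
    private final Supplier<List<String>> source;
    private final Consumer<String> handler;
    private final long intervalMillis;

    NotificationPoller(Supplier<List<String>> source, Consumer<String> handler, long intervalMillis) {
        this.source = source;
        this.handler = handler;
        this.intervalMillis = intervalMillis;
    }

    /** Runs the given number of poll cycles and returns how many payloads were handled. */
    int pollCycles(int cycles) throws InterruptedException {
        int handled = 0;
        for (int i = 0; i < cycles; i++) {
            for (String payload : source.get()) {
                handler.accept(payload);
                handled++;
            }
            if (i < cycles - 1) {
                Thread.sleep(intervalMillis); // e.g. 100 ms for roughly 10 polls per second
            }
        }
        return handled;
    }
}
```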
MySQL is a little less helpful; you'll need to have triggers INSERT rows into a monitoring table and repeatedly poll that table with SELECT * (and then DELETE). This will work, but you are more likely to face a latency/performance trade-off.
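A minimal JDBC sketch of the MySQL approach, assuming a hypothetical change_log(id BIGINT AUTO_INCREMENT PRIMARY KEY, table_name VARCHAR(64), row_id BIGINT) table that AFTER INSERT/UPDATE triggers on the watched tables fill. The sketch deletes exactly the ids it read rather than clearing the table, so a change committed between the SELECT and the DELETE is not lost; with AUTO_INCREMENT such a row can even have a lower id than rows already read, because the id is assigned before the inserting transaction commits. Note that rows are deleted before the caller processes them; if every change must be handled at least once, process first and delete afterwards.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical poller for a trigger-populated change_log table.
final class ChangeLogPoller {

    /** One captured change (hypothetical shape). */
    static final class Change {
        final long id;
        final String tableName;
        final long rowId;

        Change(long id, String tableName, long rowId) {
            this.id = id;
            this.tableName = tableName;
            this.rowId = rowId;
        }
    }

    static final String SELECT_SQL = "SELECT id, table_name, row_id FROM change_log ORDER BY id LIMIT ?";
    static final String DELETE_SQL = "DELETE FROM change_log WHERE id = ?";

    /** Reads up to maxRows changes, then deletes exactly those ids in one JDBC batch. */
    static List<Change> drain(Connection c, int maxRows) throws SQLException {
        List<Change> changes = new ArrayList<>();
        try (PreparedStatement select = c.prepareStatement(SELECT_SQL)) {
            select.setInt(1, maxRows);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    changes.add(new Change(rs.getLong("id"), rs.getString("table_name"), rs.getLong("row_id")));
                }
            }
        }
        if (changes.isEmpty()) {
            return changes;
        }
        try (PreparedStatement delete = c.prepareStatement(DELETE_SQL)) {
            for (Change change : changes) {
                delete.setLong(1, change.id);
                delete.addBatch();
            }
            delete.executeBatch();
        }
        return changes;
    }
}
```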
We have a J2EE application built with Hibernate and Struts. The business functionality is implemented on top of an RMI registry.
In our application, around 250 concurrent users will upload batches containing large amounts of data (BATCHDET). Each batch is first checked against 30 validations and then inserted into tables with a parent-child relationship. Other operations, such as printing, also need heavy processing.
One table, containing 10 million records, is accessed by every type of transaction, and every process inserts into and updates it. This table has become the bottleneck, even though we have added all the required indexes.
After running for 30 minutes, the JVM uses all of its allocated 6 GB of RAM and stops responding. When we looked for the root cause, we found locking on the database side: all the update queries on the BATCHDET table were waiting. We tried everything we could think of, with no luck.
The system runs smoothly with 50 concurrent users but dies with the expected 250. Almost every module depends on BATCHDET, so we would rather not rewrite the implementation. Could you suggest a quick fix?
We use thread-based transaction demarcation in Hibernate, implemented in HibernateUtil.java. The transaction isolation level is READ COMMITTED. Is there any way to specify no locking for all search operations? We are using Oracle 10g.
Let me know if you need any other details.
~Amar
" Is there any way where we can define no lock for all search operation. we have oracle 10G RDBMS."
Oracle doesn't lock on selects, so in effect this is already in place.
Oracle also locks at the row level, so you need to stop thinking about the table as a whole and start thinking about individual rows.
You need to talk with your DBA. There's a whole bunch of things to monitor in Oracle at both the system and the session level. The DBA will be able to look at v$session and tell you what the individual sessions are waiting on. There might be locks, it might be a disk bottleneck, it may be index contention, or the database may be sitting there idle while all the inefficiency is in the Java layer.
The underlying problem I want to solve is running a task that generates several temporary tables in MySQL, which need to stay around long enough to fetch results from Java after they are created. Because of the size of the data involved, the task must be completed in batches. Each batch is a call to a stored procedure called through JDBC. The entire process can take half an hour or more for a large data set.
To ensure access to the temporary tables, I run the entire task, start to finish, in a single Spring transaction with a TransactionCallbackWithoutResult. Otherwise, I could get a different connection that does not have access to the temporary tables (this would happen occasionally before I wrapped everything in a transaction).
This worked fine in my development environment. However, in production I got the following exception:
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
This happened when a different task tried to access some of the same tables during the execution of my long-running transaction. What confuses me is that the long-running transaction only inserts into or updates temporary tables. All access to non-temporary tables is select-only. From the documentation I can find, the default Spring transaction isolation level should not cause MySQL to block in this case.
So my first question: is this the right approach? Can I ensure that I repeatedly get the same connection through a HibernateTemplate without a long-running transaction?
If the long running transaction approach is the correct one, what should I check in terms of isolation levels? Is my understanding correct that the default isolation level in Spring/MySQL transactions should not lock tables that are only accessed through selects? What can I do to debug which tables are causing the conflict, and prevent those tables from being locked by the transaction?
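MySQL temporary tables belong to the session (connection), not to the transaction, so holding one connection for the whole task is enough to keep them visible. A minimal sketch, assuming a hypothetical stored procedure build_batch(batch_id) and a hypothetical PinnedConnectionTask helper: all batches run on one pinned connection with autocommit on, so locks taken during a call are released when it commits instead of being held for the whole half hour. If the connection comes from a pool, close() only returns it, so the temporary tables survive; drop them before returning the connection.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

// Hypothetical helper: pins one connection (one MySQL session) for the whole task.
final class PinnedConnectionTask {

    /** Reads results from the temporary tables on the same connection (hypothetical callback). */
    interface ResultReader<T> {
        T read(Connection connection) throws SQLException;
    }

    static <T> T run(DataSource ds, List<Integer> batchIds, ResultReader<T> reader) throws SQLException {
        try (Connection c = ds.getConnection()) {
            // No long transaction: each call commits on its own, while the
            // session (and its temporary tables) lives until the connection closes.
            c.setAutoCommit(true);
            try (CallableStatement call = c.prepareCall("{call build_batch(?)}")) {
                for (int batchId : batchIds) {
                    call.setInt(1, batchId);
                    call.execute();
                }
            }
            // Same connection, so the temporary tables are still visible here.
            return reader.read(c);
        }
    }
}
```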
I consider keeping a transaction open for an extended time evil. During my career, the definition of "extended" has shrunk from seconds to milliseconds.
It is an unending source of non-reproducible, head-scratching problems.
I would bite the bullet in this case and keep a 'work log' in software which you can replay in reverse to clean up if the batch fails.
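A minimal in-memory sketch of such a work log, with hypothetical names. Each step would run in its own short transaction and register a compensating action (for example, deleting the rows that step inserted); on failure the compensations run newest first. A production version would persist the log, say in a table, so cleanup can also happen after a crash; keeping it in memory is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical compensation log: replayed in reverse to clean up a failed batch.
final class WorkLog {
    private final Deque<Runnable> undoStack = new ArrayDeque<>();

    /** Runs a step and, only if it succeeds, remembers how to undo it. */
    void step(Runnable action, Runnable undo) {
        action.run();
        undoStack.push(undo);
    }

    /** Undoes all recorded steps, most recent first, and clears the log. */
    void rollBack() {
        while (!undoStack.isEmpty()) {
            undoStack.pop().run();
        }
    }

    /** Forgets the recorded steps once the whole batch has succeeded. */
    void complete() {
        undoStack.clear();
    }
}
```

Usage: call step(...) for each batch inside a try block, complete() at the end, and rollBack() in the catch block.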
When you say your table is temporary, is it transaction-scoped? That might prevent other transactions (perhaps on a different connection) from seeing or accessing it. Perhaps a join involving a real table and a temporary table somehow locks the real table.
Root cause: have you tried using the MySQL tools to determine what is locking the connection? It might be something like next-key locking. I don't know the MySQL tools that well, but on Oracle you can see which connections are blocking other connections.
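In MySQL 5.5 to 5.7, the InnoDB tables in information_schema can show who is blocking whom (MySQL 8.0 moved this data to performance_schema.data_lock_waits, and querying innodb_trx requires the PROCESS privilege). A minimal JDBC sketch, with the LockWaitReport name being hypothetical: a blocking transaction with no running statement often means an idle but still open transaction, such as one long-running transaction between its statements.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical diagnostic: one line per (waiting, blocking) transaction pair.
final class LockWaitReport {
    static final String IDLE = "<no statement running>";

    static final String LOCK_WAITS_SQL =
        "SELECT r.trx_mysql_thread_id AS waiting_thread, r.trx_query AS waiting_query, "
      + "b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query "
      + "FROM information_schema.innodb_lock_waits w "
      + "JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id "
      + "JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id";

    static List<String> describe(Connection c) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (Statement st = c.createStatement(); ResultSet rs = st.executeQuery(LOCK_WAITS_SQL)) {
            while (rs.next()) {
                // trx_query is NULL when the transaction is not executing a statement
                lines.add("thread " + rs.getLong("waiting_thread")
                    + " (" + Objects.toString(rs.getString("waiting_query"), IDLE) + ")"
                    + " waits for thread " + rs.getLong("blocking_thread")
                    + " (" + Objects.toString(rs.getString("blocking_query"), IDLE) + ")");
            }
        }
        return lines;
    }
}
```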
Transaction timeout: You should create a second connection pool/data source with a much longer timeout. Use that connection pool for your long running task. I think your production environment is 'trying' to help you out by detecting stuck connections.
As Justin mentioned regarding the transaction timeout, I recently faced a problem in which the connection pool (in my case, Tomcat DBCP in Tomcat 7) had settings that marked long-running connections as abandoned and then closed them. After tweaking those parameters, I could avoid the issue.
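A sketch of the two-pool idea using Commons DBCP 1.x BasicDataSource property names (removeAbandoned, removeAbandonedTimeout in seconds, logAbandoned, maxActive), which the DBCP bundled with Tomcat 7 also uses; in Tomcat these would normally be attributes on the Resource element. The URL, pool sizes and timeout are placeholder assumptions. The resulting Properties could be passed to DBCP's BasicDataSourceFactory.createDataSource(Properties); that call is left out so the sketch needs no library.

```java
import java.util.Properties;

// Hypothetical pool configurations; values are placeholders, not recommendations.
final class PoolConfigs {

    /** Regular pool: connections held longer than 300 s may be treated as abandoned and closed. */
    static Properties regular(String url) {
        Properties p = new Properties();
        p.setProperty("url", url);
        p.setProperty("maxActive", "50");
        p.setProperty("removeAbandoned", "true");
        p.setProperty("removeAbandonedTimeout", "300"); // seconds
        p.setProperty("logAbandoned", "true");          // log where the abandoned connection was opened
        return p;
    }

    /** Separate, small pool for the long-running task, with abandoned-connection removal off. */
    static Properties longRunning(String url) {
        Properties p = regular(url);
        p.setProperty("maxActive", "2");
        p.setProperty("removeAbandoned", "false");
        return p;
    }
}
```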
I have this in mind:
On each server (they are all set up identically):
A free database like MySQL or PostgreSQL.
Tomcat 6.x for hosting Servlet-based Java applications
Hibernate 3.x as the ORM tool
Spring 2.5 for the business layer
Wicket 1.3.2 for the presentation layer
I place a load balancer in front of the servers, plus a replacement load balancer in case the primary load balancer goes down.
I use Terracotta to have the session information replicated between the servers. If a server goes down the user should be able to continue their work at another server, ideally as if nothing happened.
What is left to "solve" (I haven't actually tested this, and for example I don't know what I should use as a load balancer) is the database replication that is needed.
If a user interacts with the application and the database changes, that change must be replicated to the database servers on the other machines. How should I go about doing that? Should I use MySQL, PostgreSQL, or something else (ideally free, as we have a limited budget)? Do the other things above sound sensible?
Clarification: I cluster to get high availability first and foremost and I want to be able to add servers and use them all at the same time to get high scalability.
Since you're already using Terracotta, and you believe that a second DB is a good idea (agreed), you might consider expanding Terracotta's role. We have customers who use Terracotta for database replication. Here's a brief example/description, though I think they have stopped supporting clients for this product:
http://www.terracotta.org/web/display/orgsite/TCCS+Asynchronous+Data+Replication
You are trying to create multi-master replication, which is a very bad idea, as any change to any database has to be replicated to every other database. This is terribly slow: on one server you can get several hundred transactions per second using a couple of fast disks in RAID1 or RAID10, and much more with a good RAID controller with a battery-backed cache. If you add the overhead of communicating with all your servers, you'll get at most tens of transactions per second.
If you want high availability, you should go for a warm standby solution, where you have a server that is replicated but not used; when the main server dies, the standby takes over. You can lose some recent transactions if your main server dies.
You can also go for asynchronous one-master, multiple-slave replication. Every change to the database has to be performed on the one master server, but you can have several read-only slave servers. Data on these slave servers can be several transactions behind the master, so you can also lose some recent transactions if a server dies.
PostgreSQL has both types of replication: warm standby using log shipping, and one master with multiple slaves using Slony.
Only if you have a very small number of writes can you go for synchronous replication. This can also be set up for PostgreSQL using PgPool-II or Sequoia.
Please read the "High Availability, Load Balancing, and Replication" chapter of the PostgreSQL documentation for more.
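With one master and several read-only slaves, the application has to send every write to the master and may spread reads over the slaves. A minimal sketch of that routing, generic over the connection-source type (for example javax.sql.DataSource) so the logic stands alone; ReadWriteRouter is a hypothetical name, and reads that must see the latest data should still use forWrite(), since slaves can lag.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: writes to the single master, reads round-robin over slaves.
final class ReadWriteRouter<S> {
    private final S master;
    private final List<S> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(S master, List<S> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    /** Writes, and reads that must see the latest committed data, go to the master. */
    S forWrite() {
        return master;
    }

    /** Other reads go to a slave, which may be a few transactions behind the master. */
    S forRead() {
        if (slaves.isEmpty()) {
            return master;
        }
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), slaves.size());
        return slaves.get(i);
    }
}
```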
For my (Perl-driven) website, I am using MySQL on two servers with database replication. Each MySQL server is slave and master at the same time. I did this for redundancy, not for performance, but the setup has worked fine for the past 3 years; we had almost no downtime at all during this period.
Regarding Kent's question / comment: I am using the standard replication that comes with MySQL.
Regarding the failover mechanism: I am using DNSMadeEasy.com's failover functionality. I have a Perl script run every 5 minutes via cron that checks if replication is still running (and also lots of other things such as server load, HDD sanity, RAM usage, etc.). During normal operation, the faster of the two servers delivers all web pages. If the script detects that something is wrong with the server (or if the server is just plain down), DNSMadeEasy switches DNS entries so that the secondary server becomes primary. Once the "real" primary server is back up, MySQL automatically catches up on missing database changes and DNSMadeEasy automatically switches back.
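The check script described above is in Perl; as an illustration only, here is a Java sketch of the replication part of such a health check. It takes one row of SHOW SLAVE STATUS as a column-name-to-value map (for example read over JDBC with ResultSet.getString) and uses the real column names Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master; the ReplicationCheck name and the lag threshold are assumptions.

```java
import java.util.Map;

// Hypothetical decision logic for a periodic replication health check.
final class ReplicationCheck {

    /** True if both replication threads run and the slave lags at most maxLagSeconds. */
    static boolean healthy(Map<String, String> status, long maxLagSeconds) {
        if (!"Yes".equals(status.get("Slave_IO_Running"))) {
            return false; // e.g. "No" or "Connecting"
        }
        if (!"Yes".equals(status.get("Slave_SQL_Running"))) {
            return false;
        }
        String lag = status.get("Seconds_Behind_Master");
        if (lag == null) {
            return false; // NULL: lag unknown, treat as unhealthy
        }
        return Long.parseLong(lag) <= maxLagSeconds;
    }
}
```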
Here's an idea: read Theo Schlossnagle's book Scalable Internet Architectures.
What you're proposing is not the best idea.
Load balancers are expensive and not as valuable as they would appear. Use something simpler for distributing the load between your servers (something like Wackamole).
Rather than fool around with DB replication, spend your money on a reliable DB server separate from your front-end web servers. Do regular backups and in the very unlikely event of DB failure, get back running as quickly as possible from ordinary backups.
AFAIK, MySQL does a better job of scaling. See the documentation:
http://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html
There is also a blog where you can look at real-life examples:
http://highscalability.com/tags/mysql