I am trying to create a program that updates two different tables using SQL commands. My only worry is that if the program updates one of the tables, then loses its connection (or fails in some other way) and does NOT update the other table, there could be an issue. Is there a way I could either
A. Update them at the exact same time
or
B. Revert the first update if the second one fails.
Yes, use a SQL transaction. Here is the tutorial: JDBC Transactions
Depending on the database, I'd suggest using a stored procedure or function based on the operations involved. They're supported by:
MySQL
Oracle
SQL Server
PostgreSQL
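A stored procedure can wrap its statements in a database transaction (atomic in nature: it either happens completely or not at all), without the extra weight of sending each query over the line to the database. Whether a procedure call is atomic by default depends on the database, so check whether you need explicit transaction statements inside it. Because the queries already exist on the database and are parameterized (which protects against SQL injection as long as the procedure does not build SQL dynamically), less data is sent: only the parameter values.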
Most SQL servers support transactions, that is, queueing up a set of actions and then having them happen atomically. To do this, you wrap your queries as such:
START TRANSACTION;
*do stuff*
COMMIT;
You can consult your server's documentation for more information about what additional features it supports. For example, here is a more detailed discussion of transactions in MySQL.
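From Java, the same START TRANSACTION / COMMIT pattern can be sketched with plain JDBC. This is only a minimal sketch: the class `TxExample` and the interface `SqlWork` are hypothetical names introduced for illustration, and the two updates would be issued inside the callback.

```java
import java.sql.Connection;
import java.sql.SQLException;

class TxExample {
    /** A unit of work that should run inside one transaction (hypothetical helper type). */
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    /** Runs the work atomically: commit if it succeeds, roll back if it throws. */
    static void inTransaction(Connection conn, SqlWork work) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);          // start grouping statements into one transaction
        try {
            work.run(conn);                 // e.g. two executeUpdate calls, one per table
            conn.commit();                  // both updates become visible together
        } catch (SQLException | RuntimeException e) {
            conn.rollback();                // undo the first update if the second one failed
            throw e;
        } finally {
            conn.setAutoCommit(previous);   // restore the connection's original mode
        }
    }
}
```

A call might look like `TxExample.inTransaction(conn, c -> { /* update table A, then table B */ });`. If the connection is lost before the commit, the database is expected to discard the uncommitted work, so neither table is changed.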
I have an application that needs to know the current number of certain records in a database table. The solution must work without changing the database code or adding triggers or functions to it, so I need a database-vendor-independent solution.
My program is written in Java, but the database could be SQLite, MySQL, PostgreSQL, or MSSQL. For now I am doing it like this:
In a separate thread that is set as a daemon, my application sends a simple command through JDBC to the database to get the latest number of records matching a condition:
while(true){
SELECT COUNT(*) FROM Mytable WHERE exited='1'
}
This sort of coding causes the database to lock, slows down the whole system, and generates huge DB logs, which finally brings down the whole thing!
How can I do this the right way, so that I always have the latest number of certain records, or count only when the number has changed?
A SELECT statement should not -- by itself -- have the behavior that you are describing. For instance, nothing is logged with a SELECT. Now, it is possible that concurrent insert/update/delete statements are going on, and that these cause problems because the SELECT locks the table.
Two general things you can do:
Be sure that the comparison is of the same type. So, if exited is a number, do not use single quotes (mixing of types can confuse some databases).
Create an index on (exited). In basically all databases, this is a single command: create index idx_mytable_exited on mytable(exited).
If locking and concurrent transactions are an issue, then you will need to do more database specific things, to avoid that problem.
As others have said, make sure that exited is indexed.
Also, you can set the transaction isolation on your query to do a "dirty read"; this indicates to the database server that you do not need to wait for other processes' transactions to commit, and instead you wish to read the current value of exited on rows that are being updated by those other processes.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED is the standard syntax for using "dirty read".
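In JDBC the isolation level can be set on the connection instead of sending the SET statement yourself. The following is a rough sketch, assuming that `exited` is a numeric column and that the driver accepts this level; some databases treat READ UNCOMMITTED as a stricter level, and a driver may reject it. The class name `ExitedCounter` is made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ExitedCounter {
    /** Counts rows with exited = 1, asking the database to allow dirty reads. */
    static long countExited(Connection conn) throws SQLException {
        // Assumption: the driver supports this level; it may be upgraded or rejected.
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
        String sql = "SELECT COUNT(*) FROM Mytable WHERE exited = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 1); // numeric comparison, no quotes (assumes a numeric column)
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("COUNT(*) returned no row");
                }
                return rs.getLong(1);
            }
        }
    }
}
```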
Which way is better for saving a log of data access in a table in a transactional database:
using a trigger, or using a manual insert into the table?
Manual means writing a SQL query in the program that inserts the log entry into the table.
Auditing of this kind is mostly done via triggers. The main reasons are:
Developers cannot forget to call it, as could happen if a separate insert has to be fired.
A simple bug cannot cause the second insert to fail and leave the previous operation unaudited.
The auditing cannot be intentionally left out; it is fully controlled by the owner of the DB.
The extra network round trip and query parsing required by a second insert are not a small matter. For basic operations, their time cost is significant.
On the other hand, the only downside of this solution is the extra logic that now lives on the DB side. By default developers tend to keep as little logic in the DB as possible (which is normally a good idea), but in this case I think it is not a valid argument. This is not business logic, it is an organic part of your DB. The data about "who accessed what data" is still data, and belongs to the database.
Let me explain my question in more detail, since the title may not be very clear; I could not find a way to summarize the problem in a few words. Basically I have a web application whose DB has 5 tables. 3 of these are managed using JPA with the EclipseLink implementation. The other 2 tables are managed directly with SQL using the java.sql package. By "managed" I just mean that queries, insertions, deletions and updates are performed in two different ways.
Now the problem is that I have to monitor the response time of each call to the DB. To do this I have a library that uses aspects, so at runtime I can monitor the execution time of any code snippet. The question is: if I want to monitor the response time of a DB request (let's suppose the DB is remote, so the response time will also include network latency, which is fine), which instructions' execution time has to be considered in each of the two cases described above?
Here is an example to make this clearer.
Suppose I use JPA and execute a DB update. I have the following code:
EntityManagerFactory emf = Persistence.createEntityManagerFactory(persistenceUnit);
EntityManager em=emf.createEntityManager();
EntityToPersist e=new EntityToPersist();
em.persist(e);
Is it correct to assume that only the em.persist(e) instruction connects to the DB and makes a request?
The same question applies when using java.sql:
Connection c=dataSource.getConnection();
Statement statement = c.createStatement();
statement.executeUpdate(stm);
statement.close();
c.close();
In this case, is it correct to assume that only the statement.executeUpdate(stm) call connects to the DB and makes a request?
In case it is useful, the remote DBMS is MySQL.
I tried searching the web, but it is a rather specific problem and I'm not sure what to look for in order to find a solution without reading the full JPA or java.sql specification.
If you have any questions or something is unclear from my description, don't hesitate to ask.
Thanks a lot in advance.
In JPA (and therefore also in EclipseLink) you have to distinguish between SELECT queries (which do not need a transaction) and operations that change data (INSERT, UPDATE, DELETE: all of these need a transaction). When you select data, it is enough to measure the time of Query.getResultList() (and similar calls). For the other operations (EntityManager.persist(), merge() or remove()) there is a flushing mechanism, which basically forces the queue of queries (or a single query) from the cache to hit the database. So the question is when the EntityManager is flushed: usually on transaction commit, or when you call EntityManager.flush(). That raises another question, namely when the transaction is committed. The answer is that it depends on your connection setup (whether autocommit is true or not), but a correct setup uses autocommit=false, with transactions begun and committed in your code.
When working with statement.executeUpdate(stm) it is enough to measure only such calls.
PS: usually you do not connect directly to the database yourself, as that is done by a pool (even if you work with a DataSource), which simply gives you an already established connection, but that again depends on your setup.
PS2: for EclipseLink probably the most correct way would be to take a look in the source code in order to find when the internal flush is made and to measure that part.
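As a rough stand-in for the aspect library mentioned in the question, the span to measure can be wrapped in a small timing helper. This is only a sketch: `DbTimer` and `Timed` are hypothetical names, and it measures wall-clock time around whatever call you pass in.

```java
import java.util.concurrent.Callable;

class DbTimer {
    /** The value returned by a timed call plus the elapsed wall-clock nanoseconds. */
    static final class Timed<T> {
        final T value;
        final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    /** Runs the call and records how long it took, including any network latency. */
    static <T> Timed<T> time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        T value = call.call();
        return new Timed<>(value, System.nanoTime() - start);
    }
}
```

For the java.sql case this could wrap the single call, as in `DbTimer.time(() -> statement.executeUpdate(stm))`. For the JPA case the timed block would need to include the commit or the explicit flush() as well, because persist() on its own may not reach the database.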
For a thick-client project I'm working on, I have to remotely connect to a database (IBM i-series) and perform a number of SQL-related tasks:
1) Download/Update a set of local/offline 'control' data - this data may have changed between runs unnoticed.
2) On command, download data from multiple (15-20) tables and store separately into a single Java object. The names of the tables are known, but the schema name changes between runs and can change inter-run (as far as I know, PreparedStatements do not allow one to dynamically insert the schema).
I had considered using joins/unions/etc to perform all of these queries as one, but the project requires me to have in-memory separations between table data (instead of one big joined lump).
3) Perform between 2 and 100+ repetitions of (2)
The last factor is that this needs to be run on high-latency (potentially dial-up) network connections using Java 1.5 on the oldest computers possible.
Currently I run 15-20 dynamically constructed PreparedStatements, but I know this to be rather inefficient (I measured, so as to avoid premature optimization à la Knuth).
What would be the most efficient and error-tolerant method of performing these tasks?
My thoughts:
Regarding (1), I really have no idea other than checking the entire table against the new table, at which point I feel I might as well just download the new (potentially and likely unchanged) table and replace the old one, but this takes more time.
For (2): Ideally I'd be able to construct something similar to an array of SELECT statements, send them all at once, and have the database return one ResultSet per internal query. From what I understand, however, neither Statement nor PreparedStatement support returning multiple ResultSet objects.
Lastly, the best way I can think of doing (3) is to batch a number of (2) operations.
Changing requirements are nothing special. The single most important thing when talking to most databases from a Java application is to have a connection pool and to use it properly.
This also applies here. The IBM i DB2/400 database is quite fast, and the database driver available in the jt400 project (type 4, no native code) is quite good, so you can pull over quite a bit of data in a short while simply by generating SQL on the fly.
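Generating the SQL on the fly for a schema that changes between runs might look like the sketch below. It assumes SQL naming (`schema.table`) and checks that both names are plain identifiers before concatenating them, since they cannot be bound as PreparedStatement parameters. The class `SchemaSql`, the identifier pattern, and the example names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SchemaSql {
    // Assumption: plain identifiers only; adjust the pattern to your system's naming rules.
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Builds a schema-qualified SELECT for one table, rejecting suspicious names. */
    static String selectAll(String schema, String table) {
        if (!IDENT.matcher(schema).matches() || !IDENT.matcher(table).matches()) {
            throw new IllegalArgumentException("Not a plain identifier: " + schema + "." + table);
        }
        return "SELECT * FROM " + schema + "." + table;
    }

    /** Builds one SELECT per table so each result can be kept separate in memory. */
    static List<String> selectAll(String schema, List<String> tables) {
        List<String> statements = new ArrayList<>();
        for (String table : tables) {
            statements.add(selectAll(schema, table));
        }
        return statements;
    }
}
```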
Note that if you only have a single schema, you can specify it in the connection and then use non-qualified table names in your SQL statements. Read the JDBC properties in the InfoCenter very carefully, as it is a bit tricky to get right. If you need multiple schemas, "naming=system" allows for library lists, i.e. a list of schemas in which to look for the tables, which can be very useful when done correctly. The IBM i folks can help you here.
That said, if the connection is the limiting factor, you might have a very strong case for running the "create object from tables" Java code directly on the IBM i. You should prepare now to be able to measure the traffic to the database, either with network monitoring tools, with p6spy, or simply by going through a proxy (perhaps even a throttling one).
Ideally, you would have the database group provide you with a set of stored procedures to optimize the access to the database.
Since you don't have access, you may want to ask them if they have timestamp data in the database at the row level to see when records were modified, this way you can select only the data that's changed since some point in time.
What @ThorbjørnRavnAndersen is suggesting is moving the database code onto the IBM host and connecting to it via RMI or JMS from the client. The server code would then be an RMI or JMS server that accesses the database on your behalf and returns Java objects to you, instead of bringing SQL result sets across the wire.
I would pass along your requirements to the database team and see if they can't do something for you. I'm sure they don't want all these remote clients bringing all the data down each time, so it would benefit them as much as it would benefit you.
I've been looking around trying to determine some Hibernate behavior that I'm unsure about. In a scenario where Hibernate batching is properly set up, will it only ever use multiple insert statements when a batch is sent? Is it not possible to use a DB independent multi-insert statement?
I guess I'm trying to determine if I actually have the batching set up correctly. I see the multiple insert statements but then I also see the line "Executing batch size: 25."
There's a lot of code I could post but I'm trying to keep this general. So, my questions are:
1) What can you read in the logs to be certain that batching is being used?
2) Is it possible to make Hibernate use a multi-row insert versus multiple insert statements?
Hibernate uses multiple insert statements (one per entity to insert), but sends them to the database in batch mode (using Statement.addBatch() and Statement.executeBatch()). This is the reason you're seeing multiple insert statements in the log, but also "Executing batch size: 25".
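At the plain JDBC level, the batching described here looks roughly like the sketch below. It is not Hibernate's actual code: the table `person`, its `name` column, and the class `JdbcBatchExample` are hypothetical, and how the driver transmits the batch is driver-dependent.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcBatchExample {
    /** Queues one INSERT per name and submits them together as a single batch. */
    static int insertNames(Connection conn, List<String> names) throws SQLException {
        String sql = "INSERT INTO person (name) VALUES (?)"; // hypothetical table and column
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();                // queue this row's INSERT
            }
            int[] counts = ps.executeBatch(); // submit all queued INSERTs as one batch
            return counts.length;             // number of statements in the batch
        }
    }
}
```

Each row is still its own INSERT statement, which matches what shows up in the log; only the submission is grouped.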
The use of batched statements greatly reduces the number of roundtrips to the database, and I would be surprised if it were less efficient than executing a single statement with multiple inserts. Moreover, it also allows mixing updates and inserts, for example, in a single database call.
I'm pretty sure it's not possible to make Hibernate use multi-row inserts, but I'm also pretty sure it would be useless.
I know that this is an old question, but I had the same problem: I thought that Hibernate batching meant that Hibernate would combine multiple inserts into one statement, which it doesn't seem to do.
After some testing I found this answer, which says that a batch of multiple inserts is just as good as a multi-row insert. I ran a test inserting 1000 rows, once using Hibernate batching and once without. Both tests took about 20s, so there was no performance gain from using Hibernate batching.
To be sure, I tried the rewriteBatchedStatements option of MySQL Connector/J, which actually combines multiple inserts into one statement. It reduced the time to insert 1000 records to 3s.
So after all, Hibernate batching seems to be useless and a real multi-row insert seems to be much better. Am I doing something wrong, or what causes my test results?
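For reference, the rewriteBatchedStatements option is a Connector/J connection property and is usually passed in the JDBC URL. A small sketch of adding it to an existing URL follows; the helper class `MysqlBatchUrl` and the host and database names in the usage note are made up.

```java
class MysqlBatchUrl {
    /** Appends rewriteBatchedStatements=true to a MySQL JDBC URL. */
    static String withRewrite(String baseUrl) {
        // Use '&' if the URL already carries properties, otherwise start the query part.
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "rewriteBatchedStatements=true";
    }
}
```

For example, `MysqlBatchUrl.withRewrite("jdbc:mysql://dbhost:3306/test")` gives `jdbc:mysql://dbhost:3306/test?rewriteBatchedStatements=true`.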
Oracle bulk insert collects an array of entities and passes it to the database as a single block, applying a single looped insert/update/delete to it.
This is the only way to improve network throughput.
Oracle suggests doing this by calling a stored procedure from Hibernate and passing it an array of data.
http://biemond.blogspot.it/2012/03/oracle-bulk-insert-or-select-from-java.html?m=1
This is not only a software problem but also an infrastructure one!
The problem is network data-flow optimization and TCP stack fragmentation.
MySQL has a function for this.
You have to do something like what is described in this article.
The solution is to send the right volume of data in each network transfer.
You also have to check the network MTU and the Oracle SDU/TDU usage against the data transferred between the application and the database.