A hobby project of mine is a Java web application. It's a simple web page with a form. The user fills out the form, submits, and is presented with some results.
The data is coming over a JDBC Connection. When the user submits, I validate the input, build a "CREATE ALIAS" statement, a "SELECT" statement, and a "DROP ALIAS" statement. I execute them and do whatever I need to do with the ResultSet from the query.
Due to an issue with the ALIASes on the particular database/JDBC combination I'm using, each time the query is run these ALIASes must be created with a unique name. I ensure this with an int which gets incremented every time we go to the database.
So, my data access class looks a bit like:
private static final Connection connection = // initialized however
private static int uniqueInvocationNumber = 0;

public static Whatever getData(ValidatedQuery validatedQuery) {
    String aliasName = "TEMPALIAS" + String.valueOf(uniqueInvocationNumber);
    // build statements, execute statements, deal with results
    uniqueInvocationNumber++;
}
This works. However, I've recently been made aware that I'm firmly stuck in Jon Skeet's phase 0 of threading knowledge ("Complete ignorance - ignore any possibility of problems.") - I've never written either threaded code or thread-aware code. I have absolutely no idea what can happen when many users are using the application at the same time.
So my question is, (assuming I haven't stumbled to thread-safety by blind luck / J2EE magic):
How can I make this safe?
I've included information here which I believe is relevant but let me know if it's not sufficient.
Thanks a million.
EDIT: This is a proper J2EE web application using the Wicket framework. I'm typically deploying it inside Jetty.
EDIT: A long story about the motivation for the ALIASes, for those interested:
The database in question is DB2 on AS400 (i5, System i, iSeries, whatever IBM are calling it these days) and I'm using jt400.
Although DB2 on AS400 is kind of like DB2 on any other platform, tables have a concept of a "member" because of legacy stuff. A member is kind of like a chunk of a table. The query I want to run is
SELECT thisField FROM thisTable(thisMember)
which treats thisMember as a table in its own right so just gives you thisField for all the rows in the member.
Now, queries such as this run fine in an interactive SQL session, but don't work over JDBC (I don't know why). The workaround I use is to do something like
CREATE ALIAS tempAlias FOR thisTable(thisMember)
then a
SELECT thisField FROM tempAlias
then a
DROP ALIAS tempAlias
which works but for one show-stopping issue: when you do this repeatedly with the ALIAS always called "tempAlias", and have a case where thisField has a different length from one query to the next, the result set comes back garbled for the second query (getString for the first row is fine, the next one has a certain number of spaces prepended, the next one the same number of spaces further prepended - this is from memory, but it's something like that).
Hence the workaround of ensuring each ALIAS has a distinct name which clears this up.
I've just realised (having spent the time to tap this explanation out) that I probably didn't spend enough time thinking about the issue in the first place before seizing on the workaround. Unfortunately I haven't yet fulfilled my dream of getting an AS400 for my bedroom ;) so I can't try anything new now.
Well, I'm going to ignore any SQL stuff for the moment and just concentrate on the uniqueInvocationNumber part. There are two problems here:
There's no guarantee that the thread will see the latest value at any particular point
The increment isn't atomic
The simplest way to fix this in Java is to use AtomicInteger:
private static final AtomicInteger uniqueInvocationNumber = new AtomicInteger();

public static Whatever getData(ValidatedQuery validatedQuery) {
    String aliasName = "TEMPALIAS" + uniqueInvocationNumber.getAndIncrement();
    // build statements, execute statements, deal with results
}
Note that this still assumes you're only running a single instance on a single server. For a home project that's probably a reasonable assumption :)
Another potential problem is sharing a single connection amongst different threads. Typically a better way of dealing with database connections is to use a connection pool, and "open/use/close" a connection where you need to (closing the connection in a finally block).
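For illustration, a minimal sketch of that open/use/close pattern against a pooled DataSource (the pool setup itself - JNDI, a pooling library, etc. - is assumed and not shown; the class and method names here are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class DataAccess {
    // The pool is exposed as a javax.sql.DataSource; how it is
    // configured is left out of this sketch.
    private final DataSource dataSource;

    public DataAccess(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void runQuery(String sql) throws SQLException {
        Connection connection = dataSource.getConnection(); // borrow from the pool
        try {
            PreparedStatement statement = connection.prepareStatement(sql);
            ResultSet resultSet = statement.executeQuery();
            while (resultSet.next()) {
                // deal with results
            }
        } finally {
            connection.close(); // returns the connection to the pool
        }
    }
}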
If that static variable and the incrementing of the unique invocation number are visible to all requests, I'd say that's shared state that needs to be synchronized.
I know this doesn't answer your question but I would seriously consider re-implementing the feature so creating all those aliases isn't required. (Could you explain what kind of alias you're creating and why it's necessary?)
I understand this is just a hobby project, but consider putting "switch to a connection pool" on your to-do list. It's all part of the learning, which I guess is part of your motivation for doing this project. Connection pools are the proper way to deal with multiple simultaneous users in a database-backed web app.
Related
I am attempting to make a website (using HTML, JavaScript and JSP) that sends modification and selection queries to a DB at the same time. MySQL apparently doesn't like that (ConcurrentModificationExceptions everywhere).
I thought of creating something that receives SQL statements concurrently and then orders them into a queue based on some property, then executes the queue one by one after making sure that they don't contradict each other (an insert statement after one that deletes the table would contradict).
The problem is that I'm not sure how to check whether two statements conflict. What I had in mind is checking what the tables would theoretically look like if the statements were executed (by running them on a duplicated table) and then, if an error is thrown, concluding that one statement conflicts with another. But this means I have to duplicate the table many times, and I highly doubt it would work.
So, How can I check if two statements conflict?
For example:
String sql1 = "DELETE FROM users WHERE id=3625036";
String sql2 = "UPDATE users SET displayName=\"FOO\" WHERE id=3625036";
If these two are received concurrently and then ordered in some way, then sql2 might be executed after sql1 and that would throw an exception. How can I check for conflict in the given example?
MySQL, like all full database systems, supports lots of concurrent operations within normal transactional and locking restrictions. You're best off solving your particular problem by asking a question on Stack Overflow.
I think you shouldn't set students the task of managing queueing, etc. The complexity of what you're describing is significant, and more importantly, that's what database systems are for. They should be taught not to reinvent the wheel when they can make use of something far better than they can build - unless you specifically want to teach such low-level DB construction.
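To make that concrete, a minimal sketch of leaning on ordinary JDBC transactions instead of a hand-built queue (the audit_log table is hypothetical; only the users table comes from the question):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class UserWriter {
    // Both statements succeed or fail together; MySQL's own locking
    // serializes conflicting transactions, so no hand-rolled queue is needed.
    public void deleteUser(Connection connection, int userId) throws SQLException {
        connection.setAutoCommit(false);
        try {
            PreparedStatement delete =
                connection.prepareStatement("DELETE FROM users WHERE id = ?");
            delete.setInt(1, userId);
            delete.executeUpdate();

            PreparedStatement log = connection.prepareStatement(
                "INSERT INTO audit_log (user_id) VALUES (?)");
            log.setInt(1, userId);
            log.executeUpdate();

            connection.commit();
        } catch (SQLException e) {
            connection.rollback(); // undo both statements on any failure
            throw e;
        }
    }
}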
It could be a driver issue; try updating the MySQL driver.
Another workaround is to implement table-level synchronization in your code. For example:
class UserDAO {
    public void updateUsers(String sql) {
        synchronized (UserDAO.class) {
            // do update operations
        }
    }

    public void deleteUser(String sql) {
        synchronized (UserDAO.class) {
            // do delete operations
        }
    }
}
I am making a Java GUI and web application which will use the same MySQL database.
It's a DTH management system where all the information will be stored and retrieved dynamically depending on input.
I believe that views are static by nature and thus would be useless, as all my queries will have a different WHERE condition (userid).
Do I need to use triggers? I mean, I could code the Java to execute multiple statements instead of using an inbuilt trigger (e.g. inserting a customer's name and family members' names, where both keep a duplicate copy for the head of the family). Is there a performance hit? Am I wrong in some way?
And the same thing for stored procedures - what is their use? Can't I use methods in Java to do everything?
So, I am asking: is it possible to shift all the calculation-intensive stuff to Java and web script instead of SQL? If yes, does this mean I only have to create the backend structure of the database (i.e. all the different tables and FKs, PKs) and do the rest without using any SQL stuff in MySQL Workbench?
Thank you for helping.
There is (as always) one correct answer: It depends.
If you only want to show and query some data, you probably won't need trigger or stored procedures.
Views are a different thing: they are pretty helpful if you want a static view of a join table or something like that. If you don't need this, just don't use it.
Keys are really important. They make your data robust against wrong input.
What you should use is PreparedStatement instead of Statement. If you only use PreparedStatements, you are (nearly?) safe on the question of SQL injection.
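For example, a minimal sketch (the customers table and column names are made up; userid echoes the question's WHERE condition):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerQueries {
    // The driver sends the userid value separately from the SQL text,
    // so it can never be interpreted as SQL - unlike string concatenation.
    public ResultSet findByUser(Connection connection, String userId) throws SQLException {
        PreparedStatement statement = connection.prepareStatement(
            "SELECT * FROM customers WHERE userid = ?");
        statement.setString(1, userId);
        return statement.executeQuery();
    }
}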
We use views because they are faster than a plain SELECT query; for just showing data (not editing or updating) they are faster and preferable.
Triggers are fired on the database side, so they are faster because they execute two or more queries in a single execution.
The same goes for stored procedures, because we can execute more than one query over a single database connection. If we execute the queries separately, every execution pays the database connection overhead again (finding the database server, authenticating, finding the database, etc.).
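As an illustration of that single-round-trip point, calling a stored procedure from JDBC looks like this (the procedure name renew_subscription and its parameter are hypothetical):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class SubscriptionCalls {
    // One round trip to the server, however many statements the
    // procedure body executes internally.
    public void renew(Connection connection, int customerId) throws SQLException {
        CallableStatement call = connection.prepareCall("{call renew_subscription(?)}");
        call.setInt(1, customerId);
        call.execute();
    }
}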
For a thick-client project I'm working on, I have to remotely connect to a database (IBM i-series) and perform a number of SQL-related tasks:
Download/Update a set of local/offline 'control' data - this data may have changed between runs unnoticed.
On command, download data from multiple (15-20) tables and store separately into a single Java object. The names of the tables are known, but the schema name changes between runs and can change inter-run (as far as I know, PreparedStatements do not allow one to dynamically insert the schema).
I had considered using joins/unions/etc to perform all of these queries as one, but the project requires me to have in-memory separations between table data (instead of one big joined lump).
Perform between 2 and 100+ repetitions of (2)
The last factor is that this needs to be run on high-latency (potentially dial-up) network connections using Java 1.5 on the oldest computers possible.
Currently I run 15-20 dynamically constructed PreparedStatements, but I know this to be rather inefficient (I measured, so as to avoid premature optimization à la Knuth).
What would be the most efficient and error-tolerant method of performing these tasks?
My thoughts:
Regarding (1), I really have no idea other than checking the entire table against the new table, at which point I feel I might as well just download the new (potentially and likely unchanged) table and replace the old one, but this takes more time.
For (2): Ideally I'd be able to construct something similar to an array of SELECT statements, send them all at once, and have the database return one ResultSet per internal query. From what I understand, however, neither Statement nor PreparedStatement support returning multiple ResultSet objects.
Lastly, the best way I can think of doing (3) is to batch a number of (2) operations.
There is nothing special about having moving requirements, but the single most important thing when talking to most databases is having a connection pool in your Java application and using it properly.
This also applies here. The IBM i DB2/400 database is quite fast, and the database driver available in the jt400 project (type 4, no native code) is quite good, so you can pull over quite a bit of data in a short while simply by generating SQL on the fly.
Note that if you only have a single schema, you can tell the connection which one you need and can then use non-qualified table names in your SQL statements. Read the JDBC properties in the InfoCenter very carefully - it is a bit tricky to get right. If you need multiple schemas, "naming=system" allows for library lists - i.e. a list of schemas in which to look for the tables - which can be very useful when done correctly. The IBM i folks can help you here.
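For example, with the jt400 driver those properties can go on the connection URL - a sketch with placeholder host, library, and credential values; verify the exact property names against the InfoCenter for your release:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class As400Connections {
    public Connection open() throws SQLException {
        // "naming=system" enables library-list style, unqualified table names;
        // "libraries" lists the schemas searched for those tables.
        String url = "jdbc:as400://myhost;naming=system;libraries=MYLIB,OTHERLIB";
        return DriverManager.getConnection(url, "user", "password");
    }
}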
That said, if the connection is the limiting factor, you might have a very strong case for running the "create object from tables" Java code directly on the IBM i. You should prepare now for being able to measure the traffic to the database - either with network monitoring tooling, using p6spy, or simply by going through a proxy (perhaps even a throttling one).
Ideally, you would have the database group provide you with a set of stored procedures to optimize the access to the database.
Since you don't have access, you may want to ask them if they have timestamp data in the database at the row level to see when records were modified, this way you can select only the data that's changed since some point in time.
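A sketch of that delta query, assuming such a row-level timestamp exists (the CONTROL_DATA table and LAST_MODIFIED column are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class ControlDataSync {
    // Pull only the rows changed since the last successful download.
    public ResultSet fetchChangedRows(Connection connection, Timestamp lastSync)
            throws SQLException {
        PreparedStatement statement = connection.prepareStatement(
            "SELECT * FROM CONTROL_DATA WHERE LAST_MODIFIED > ?");
        statement.setTimestamp(1, lastSync);
        return statement.executeQuery();
    }
}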
What @ThorbjørnRavnAndersen is suggesting is moving the database code onto the IBM host and connecting to it via RMI or JMS from the client. So the server code would be an RMI or JMS server that accesses the database on your behalf and returns you Java objects instead of bringing SQL result sets across the wire.
I would pass along your requirements to the database team and see if they can't do something for you. I'm sure they don't want all these remote clients bringing all the data down each time, so it would benefit them as much as it would benefit you.
I have to go through a database and modify it according to a logic. The problem looks something like this: I have a history table in my database that I have to modify.
Before modifying anything I have to look at whether an object (which has several rows in the history table) had a certain state, say 4 or 9. If it had state 4 or 9 then I have to check the rows between the currently found row and the next state 4 or 9 row. If such a row (between those states) has a specific value in a specific column then I do something in the next row. I hope this is simple enough to give you an idea. I have to do this check for all the objects. Keep in mind that any object can be modified anywhere in its life cycle (of course until it reaches a final state).
I am using SQL Server 2005 and Hibernate. AFAIK I cannot do such a complicated check in Transact-SQL! So what would you recommend I do? So far I have been thinking of doing it as a JUnit test. This would have the advantage of having Hibernate help me do the modifications, and I would have Java for lists and other data structures I might need that don't exist in SQL. If I am doing it as a JUnit test, I am not losing my mapping files!
I am curious what approaches would you use?
I think you should be able to use cursors to manage the complicated checks in SQL Server. You didn't mention how frequently you need to do this, but if this is a one-time thing, you can either do it in Java or SQL Server, depending on your comfort level.
If this check needs to be applied on every CRUD operation, perhaps a database trigger is the way to go. If the logic may change frequently over time, I would much rather write the checks in Hibernate, assuming no one will hit the database directly.
Let's presume that you are writing an application for a retail store chain. So, you would design your object model such that you would define 'Store' as the core business object and lots of supporting objects. Let's say 'Store' looks like follows:
class Store implements Validatable {
    int storeNo;
    String storeName;
    // ... etc. ...
}
So, your client tells you that you have to import the store schedule from an Excel sheet into the application, and you would have to run a series of validations on them. For instance, 'StoreIsInSameCountry', 'StoreIsValid'... etc. So, you would design a Rule interface for checking all business conditions. Something like this:
interface Rule<T extends Validatable> {
    public Error check(T value) throws Exception;
}
Now, here comes the question. I am uploading 2000 stores from this Excel sheet, so I would end up running each rule defined for a store that many times. If I were to have 4 rules, that's 8000 queries to the database, i.e., 16000 hits to the connection pool. For a simple check where I would just have to check whether the store exists or not, the query would be:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID = ?
That way I would obtain my 'Store' object. When I don't get anything back from the database, that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
Alternatively, I could just do:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID in (1,2,3..... )
This query would actually return much faster than doing the one above it 2000 times.
However, it doesn't go well with the design that a Rule can be run for a single store only.
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, because it gives better performance in this scenario? Or should I change my design?
What would you do if you were in my shoes, and what is the best practice?
That way I would obtain my 'Store' object from the database. When I don't get anything back from the database, that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
This is what you should not do.
Create a temporary table, fill the table with your values and JOIN this table, like this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM temptable tt
JOIN STORE s
ON s.STORE_ID = tt.id
or this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM STORE s
WHERE s.STORE_ID IN
(
SELECT id
FROM temptable tt
)
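A sketch of driving that from Java - create the temporary table, batch-insert the ids, then run the join (the #temptable DDL shown is SQL Server syntax; other databases differ):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class StoreBulkLookup {
    public ResultSet findStores(Connection connection, List<Integer> storeIds)
            throws SQLException {
        Statement ddl = connection.createStatement();
        ddl.execute("CREATE TABLE #temptable (id INT)"); // SQL Server-style temp table

        PreparedStatement insert =
            connection.prepareStatement("INSERT INTO #temptable (id) VALUES (?)");
        for (Integer id : storeIds) {
            insert.setInt(1, id);
            insert.addBatch();
        }
        insert.executeBatch(); // all ids go over in one batch

        PreparedStatement query = connection.prepareStatement(
            "SELECT s.STORE_ATTRIB1, s.STORE_ATTRIB2 FROM #temptable tt "
            + "JOIN STORE s ON s.STORE_ID = tt.id");
        return query.executeQuery();
    }
}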
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, because it gives better performance in this scenario? Or should I change my design?
IN filters duplicates out.
If you want each eligible row to be selected for each duplicate value in the list, use JOIN.
IN is in no way a "not suggested methology".
In fact, there was a time when some databases did not support IN queries efficiently, which is why folk wisdom still advises against using it.
But if your store_id is indexed properly (and it most probably is, if it's a PRIMARY KEY which it looks like), then all modern versions of major databases (that is Oracle, SQL Server, MySQL and PostgreSQL) will use an efficient plan to perform this query.
See this article in my blog for performance details in SQL Server:
IN vs. JOIN vs. EXISTS
Note, that in a properly designed database, validation rules are also set-based.
I. e. you implement your validation rules as queries against the temptable.
However, to support legacy rules, you can select values from temptable row-by-agonizing-row, apply the rules, and delete values which did not pass validation.
SELECT store_id FROM store WHERE store_active = 1
or even
SELECT store_id FROM store
will tell you all the active stores in a single query. You can now conduct the other tests on stores you know to exist, and you've saved yourself 1,999 hits to the database.
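A sketch of that approach - load the ids once, then let the per-store rule check in memory (store_active is the answer's illustrative column; the class name is made up):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.Set;

public class StoreExistenceRule {
    private final Set<Integer> knownStoreIds = new HashSet<Integer>();

    // One query up front instead of one query per store.
    public StoreExistenceRule(Connection connection) throws SQLException {
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery(
            "SELECT store_id FROM store WHERE store_active = 1");
        while (resultSet.next()) {
            knownStoreIds.add(resultSet.getInt("store_id"));
        }
    }

    // The per-store Rule contract survives: no database hit per store.
    public boolean exists(int storeId) {
        return knownStoreIds.contains(storeId);
    }
}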
If you've got relatively uncontested database access, and no time constraint on how long the whole thing is going to take then you've no real need to worry about hitting the connection pool over and over again. That's what it's designed for, after all!
I think it's more of a business question, with parameters of how often the client runs the import, how long it would take you to implement either solution, and how expensive your time is per hour.
If it's something that runs once in a while, a bit of bad performance is acceptable in my opinion, especially if you can get the job done quick using clean code.
...a Rule can be run for a single store only.
Managing business rules along with performance is a tricky task, so there is a library ("Persistence Layer") that does exactly that. You define rules, then execute a bulk of commands; the library fetches from the DB whatever the rules require in a single query (by using temp tables rather than IN) and then passes it to the rules.
There is an example of a validator here.