I have a Java dynamic web app. I am exposing RESTful web services for my Android application.
The thing is that some of the services do DB updates. Now I want to host the application on a public domain, and I was wondering how parallel processing works on web hosting.
Say my service /updateDB updates the database. If two users hit the same service at the same time, will the two requests run concurrently? That could cause inconsistency in the data. How exactly does the whole thing work?
Do I need to take care of synchronisation in my code?
What kind of database are you using?
Certain database engines already have mechanisms in place to allow a transaction to complete before another request overwrites its data. Most web developers do not have to worry about this because the application server (WebSphere, WebLogic) and the database (MySQL, Oracle) take care of these things for you.
(I am going to oversimplify this for you.)
A request to the web service may perform one or more actions on the DB. These actions can be grouped together and called a transaction. A transaction can include one or more INSERTs, UPDATEs, DELETEs, etc. For example, when a new customer registers for your web service, the following actions take place, which can be treated as one transaction:
Insert the new customer's username and password into the Customer table
Insert the customer's address into the Address table
Update the total customer count in the Summary table
All the above actions are completed as one transaction. If any of them fails, then all of them are automatically reverted. Similarly, if two customers register simultaneously, the database takes care that they do not overwrite each other's data.
We can configure the database to make sure that every transaction completes before another transaction can dirty the data in a row.
In databases, these guarantees are called the ACID properties.
A - Atomicity - every transaction must complete as a whole; if anything in a transaction fails, do not complete the transaction and revert every previous action within it.
C - Consistency - every transaction that occurs will always update the database in a predefined manner, e.g. after every customer registration, all the actions within it have been executed.
I - Isolation - if more than one request comes in, they are executed against the database as if they ran separately.
D - Durability - after a transaction completes, the changes it made remain permanently.
For example, MySQL with the InnoDB engine supports this, and there are other databases which support it as well.
You can read more here
http://java.dzone.com/articles/beginners-guide-acid-and
This is a very vast topic in databases.
Programming languages have APIs which will help you write code in this manner. The basic takeaway is that databases and application servers will do most of the work for you; you just have to design the code so that transactions are identified and committed appropriately.
Java and other programming languages are aware of the ACID properties of databases and will help you achieve that goal.
Read more here about how to use Java to achieve the things mentioned above:
http://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html
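As a rough illustration, here is a minimal JDBC sketch of the customer-registration transaction described above (the table and column names are my own assumptions, not a real schema): all three statements commit together, and any failure rolls every one of them back.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RegisterCustomer {
    public static void register(String url, String dbUser, String dbPass,
                                String username, String password, String address) throws SQLException {
        try (Connection con = DriverManager.getConnection(url, dbUser, dbPass)) {
            con.setAutoCommit(false); // begin the transaction
            try (PreparedStatement cust = con.prepareStatement(
                         "INSERT INTO Customer (username, password) VALUES (?, ?)");
                 PreparedStatement addr = con.prepareStatement(
                         "INSERT INTO Address (username, address) VALUES (?, ?)");
                 PreparedStatement summ = con.prepareStatement(
                         "UPDATE Summary SET total_customers = total_customers + 1")) {
                cust.setString(1, username);
                cust.setString(2, password);
                cust.executeUpdate();
                addr.setString(1, username);
                addr.setString(2, address);
                addr.executeUpdate();
                summ.executeUpdate();
                con.commit();   // atomicity: all three actions complete as one unit
            } catch (SQLException e) {
                con.rollback(); // atomicity: revert every previous action
                throw e;
            }
        }
    }
}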
Similarly other languages have similar functionality and APIs.
Search Google for "java database transaction" or "<your favorite language> database transaction".
Related
I have a situation wherein, as part of an online transaction, I have to save some data into another database; a slight latency (a few seconds) in updating the other database is fine. Since both databases are Oracle, I have the three options below and need some insight as to which one is better.
Oracle database links: I convert the SQL into PL/SQL and let my database take care of writing into the other Oracle database. In the DEV environment both databases are on the same server as different schemas, while in production they are two separate Oracle RACs separated by a few routers and switches.
Spring Batch: Use a batch job to pick the transactions from my source database, process them, and write them into the target database. This way my online transactions would not fail if the other database ever goes down, hits a performance issue, or faces a network issue. And if they ever fail, I can code for job restartability. Is Spring Batch well suited for such an event-publishing case? Would I hit any challenges in the future?
2-phase commit: I simply implement 2PC and save the data in both databases in one transaction. Or, to make it more future-proof, save into a messaging system as well as my source database.
I am developing a Java application based on the Spring framework.
It
Connects to a MySQL database
Gets data from MySQLTable1 in POJOs
Manipulates it (update, delete) in memory
Inserts it into a Netezza database table
The above four steps are done for each client (A, B, C) every hour.
I am using a Spring JdbcTemplate to get the data like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before I write it to a Netezza table.
There are going to be multiple instances of this application running every hour through a scheduler.
So Client A and Client B can run concurrently, but each SELECT will be unique; I mean the data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are:
Is there a risk of data contamination?
Is there a need to implement database transaction using spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depends on what you are doing with your data, but I don't see how you can have data contamination if every instance is independent; you just have to make sure that instances running concurrently are not working on the same data (client ID).
Is there a need to implement database transaction using spring data transaction manager?
You will probably need a transaction for the insertion into the Netezza table. You certainly want your data to have a consistent state in the result table: if an error occurs in the middle of the process, you'll probably want to roll back everything that was inserted before it failed. Regarding the transaction manager, you don't especially need the Spring transaction manager, but since you are already using Spring it is a good option.
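A minimal sketch of that option (the target table, columns, and Object[] row representation are my assumptions): the Netezza inserts are wrapped in one Spring-managed transaction, so a mid-batch failure rolls everything back.

import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

public class NetezzaWriter {
    private final JdbcTemplate jdbc;
    private final TransactionTemplate tx;

    public NetezzaWriter(DataSource netezzaDataSource) {
        this.jdbc = new JdbcTemplate(netezzaDataSource);
        this.tx = new TransactionTemplate(new DataSourceTransactionManager(netezzaDataSource));
    }

    public void writeAll(List<Object[]> rows) {
        // Everything inside execute() commits or rolls back as one unit.
        tx.execute(status -> jdbc.batchUpdate(
                "INSERT INTO NetezzaTable1 (COL1, COL2, COL3) VALUES (?, ?, ?)", rows));
    }
}

DataSourceTransactionManager is the plain-JDBC choice here; in a fuller Spring setup you would more likely inject a configured PlatformTransactionManager instead of constructing one.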
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it? Probably not, but Spring Batch was made for this kind of application, so it might help you structure it (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be done without the framework, and it might be overkill for a really small application. But in the end, if you need those features, you'll probably want to use it.
Spring Batch is an ETL framework, so it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Client A and B read separate data, so they can never interfere with each other by accidentally reading or writing the same data. The risk would arise if two clients with the same ID were created, but that is not the case.
Is there a need to implement database transaction using spring data transaction manager?
There is no mandatory need for that, although programmatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need, although it would help a lot, especially with paging. How will you handle queries that return thousands of rows? Without a framework this has to be handled manually; a sketch of a paging reader follows.
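A minimal sketch of such a paging reader over the MySQL query (the reader name, page size, and Object[] row shape are illustrative choices, not requirements):

import java.util.Collections;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;

public class ReaderConfig {
    public JdbcPagingItemReader<Object[]> reader(DataSource mysqlDataSource, String clientId) {
        return new JdbcPagingItemReaderBuilder<Object[]>()
                .name("mySqlTable1Reader")
                .dataSource(mysqlDataSource)
                .selectClause("SELECT COL1, COL2, COL3")
                .fromClause("FROM MySQLTable1")
                .whereClause("WHERE CLIENTID = :clientId AND COL4 = 'CONDITION'")
                .sortKeys(Collections.singletonMap("COL1", Order.ASCENDING))
                .parameterValues(Map.of("clientId", clientId))
                .pageSize(500) // rows are fetched 500 at a time, not all at once
                .rowMapper((rs, i) -> new Object[] {
                        rs.getString(1), rs.getString(2), rs.getString(3) })
                .build();
    }
}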
I have a Java web application which uses Hibernate for storing data into the database and retrieving them.
The strategy I am currently using is to load everything from the database into the application at startup, and to save/update it in the database as the user interacts with the application.
What I have also done is to keep track of transaction history for each user as part of the business logic. (So this transaction history is all loaded at application startup.)
The problem I can see is that I shouldn't load the entire transaction history for every user: if there is a lot of transaction history, and users don't necessarily need to see it, that could be a lot of memory being used up, so it is not efficient.
I was wondering if there is something similar to what a PHP script can do, which is to query the database only when the user requests to see the transaction history, so that it doesn't tie up server resources (aside from querying the database). Or what are some suggestions/comments regarding what I am facing right now?
Thank you.
Query Hibernate when you need a given piece of information and let Hibernate manage putting it back to the database. This will allow Hibernate to manage the caching.
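A minimal sketch of that on-demand approach (the TransactionHistory entity, its userId/createdAt properties, and the page size are assumptions about your model):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class TransactionHistoryDao {
    private final SessionFactory sessionFactory;

    public TransactionHistoryDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Runs only when the user actually asks for the history, and is capped
    // so a long history cannot exhaust memory.
    public List<TransactionHistory> findForUser(long userId, int max) {
        try (Session session = sessionFactory.openSession()) {
            return session.createQuery(
                            "from TransactionHistory th where th.userId = :userId order by th.createdAt desc",
                            TransactionHistory.class)
                    .setParameter("userId", userId)
                    .setMaxResults(max)
                    .list();
        }
    }
}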
Note that when using Hibernate, you should let Hibernate manage the data completely; do not add or change data yourself using raw SQL.
If you are using a modern container, you should consider migrating to JPA as it is the standard in Java EE containers, allowing you to be more flexible when you need to scale. JPA is very close to Hibernate, but is an API, not an implementation, so you have more than one to choose from.
Why not query Hibernate for every request that comes in and release the session after the response? This is a common approach.
My current setup is a single dedicated server with a Java/Hibernate app running on Tomcat, the Apache HTTP server, and MySQL.
I need to get a second server to share the load, but using the same database from the first server.
The backend processing (excluding DB transactions) is time consuming, hence the second server for backend processing.
Will there be any unwanted consequences of this setup? Is this the optimal setup?
My apps do update/delete and have transaction control as follows:
beginTransaction();
getSession().save(obj);
// sessionFactory.openSession().save(obj); // alternative: save in a freshly opened session
commitTransaction();
As long as only one of the apps does database updates on a shared table you should be fine. What you definitely don't want to happen is:
app1: delete/update table1.record24
app2: delete/update table1.record24
because when Hibernate writes the records, one of the processes will notice the data has changed and throw an error. And as a classic Heisenbug, it's really difficult to reproduce.
When, on the other hand, the responsibilities are clearly separated (the apps share data for reading, but do not delete/update the same tables) it should be ok. Document that behavior though as a future upgrade may not take that into account.
EDIT 1: Answering comments
You overcome concurrency issues by design. For any given table:
Both apps may insert
Both apps may select
Only one of the apps may also update/delete in that table
Your frontend will probably insert into tables, and the backend can read those tables, update rows where necessary, create new result rows, and delete rows as cleanup.
Alternatively, when the apps communicate, the frontend can transfer ownership of the records for a given task to the business backend, which gives ownership back when finished. Make sure the Hibernate cache is flushed (the transaction is committed) and no Hibernate objects of that task are in use before transferring ownership.
The trick of the game is to ensure that Hibernate will not attempt to write records which have been changed by the other app, as that would result in a StaleStateException.
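That stale-data check is Hibernate's optimistic locking; a minimal sketch of a shared entity carrying a version column (the entity and field names are illustrative):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class SharedRecord {
    @Id
    private Long id;

    private String payload;

    // Hibernate increments this on every update; if the other app changed
    // the row in the meantime, the version no longer matches and the flush
    // fails with a StaleStateException/StaleObjectStateException.
    @Version
    private int version;
}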
An example of how I solved a similar problem:
app 1 receives data, and writes it in table1
app 2 reads table1, processes it, and writes/updates table2
app 2 deletes the processed records in table1
Note that app 1 only writes to the shared table. It also reads, writes and updates from other tables, but those tables are not accessed by app 2, so that's no problem.
It is a fairly common approach, both for failover and load balancing.
Here's a short article describing the setup:
http://raibledesigns.com/tomcat/
Beware of singletons in this setup.
I am not very familiar with databases and what they offer outside of the CRUD operations.
My research has led me to triggers. Basically it looks like triggers offer this type of functionality:
(from Wikipedia)
There are typically three triggering events that cause triggers to "fire":
INSERT event (as a new record is being inserted into the database).
UPDATE event (as a record is being changed).
DELETE event (as a record is being deleted).
My question is: is there some way I can be notified in Java (preferably including the data that changed) by the database when a record is Updated/Deleted/Inserted using some sort of trigger semantics?
What might be some alternate solutions to this problem? How can I listen to database events?
The main reason I want to do this is a scenario like this:
I have 5 client applications all in different processes/existing across different PCs. They all share a common database (Postgres in this case).
Let's say one client changes a record in the DB that all 5 clients are "interested" in. I am trying to think of ways for the clients to be "notified" of the change (preferably with the affected data attached) instead of having them query for the data at some interval.
Using Oracle, you can set up a trigger on a table and then have the trigger send a JMS message. Oracle has two different JMS implementations. You can then have a process that 'listens' for the message using the JDBC driver. I have used this method to push changes out to my application instead of polling.
If you are using a Java database (H2), you have additional options. In my current application (a SIEM), I have triggers in H2 that publish change events using JMX.
Don't mix up the database (which contains the data) and events on that data.
Triggers are one way, but normally you will have a persistence layer in your application. This layer can choose to fire off events when certain things happen - say to a JMS topic.
Triggers are a last-ditch option, because with them you're operating on relational items rather than on "events" in the data. (For example, an "update" could in reality map to a "company changed legal name" event.) If you rely on the DB, you'll have to map the inserts and updates back to real-life events... which you already knew about!
You can then layer other stuff on top of these notifications - like event stream processing - to find events that others are interested in.
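A minimal sketch of that persistence-layer publication using the JMS 2.0 API (the topic, repository shape, and event wording are all illustrative):

import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Topic;

public class CompanyRepository {
    private final ConnectionFactory connectionFactory;
    private final Topic companyEvents; // e.g. looked up via JNDI

    public CompanyRepository(ConnectionFactory connectionFactory, Topic companyEvents) {
        this.connectionFactory = connectionFactory;
        this.companyEvents = companyEvents;
    }

    public void changeLegalName(long companyId, String newLegalName) {
        // ... persist the change through the persistence layer here ...

        // Publish the business event, not the raw row update.
        try (JMSContext ctx = connectionFactory.createContext()) {
            ctx.createProducer().send(companyEvents,
                    "company " + companyId + " changed legal name to " + newLegalName);
        }
    }
}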
James
Hmm. So you're using PostgreSQL and you want to "listen" for events and be "notified" when they occur?
http://www.postgresql.org/docs/8.3/static/sql-listen.html
http://www.postgresql.org/docs/8.3/static/sql-notify.html
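On the Java side, a rough sketch of consuming those notifications through the pgjdbc driver (the channel name, connection details, and polling interval are assumptions; a trigger on the watched table would issue the corresponding NOTIFY):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class ChangeListener {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        try (Statement st = conn.createStatement()) {
            st.execute("LISTEN record_changes"); // subscribe to the channel
        }
        PGConnection pg = conn.unwrap(PGConnection.class);
        while (true) {
            // A dummy query prompts the driver to pick up pending notifications.
            try (Statement st = conn.createStatement()) {
                st.execute("SELECT 1");
            }
            PGNotification[] notes = pg.getNotifications();
            if (notes != null) {
                for (PGNotification n : notes) {
                    System.out.println("change on " + n.getName() + ": " + n.getParameter());
                }
            }
            Thread.sleep(500);
        }
    }
}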
Hope this helps!
Calling external processes from the database is very vendor specific.
Just off the top of my head:
SQL Server can call CLR programs from triggers,
PostgreSQL can call arbitrary C functions loaded dynamically,
MySQL can call arbitrary C functions, but they must be compiled in,
Sybase can make system calls if set up to do so.
The simplest thing to do is to have the insert/update/delete triggers make an entry in some log table, and have your Java program monitor that table. Good columns to have in your log table would be EVENT_CODE, LOG_DATETIME, and LOG_MSG.
Unless you require very high performance or need to handle hundreds of thousands of records, that is probably sufficient.
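A minimal sketch of that monitoring loop (the EVENT_LOG table name follows the columns suggested above; keeping the high-water mark in memory is a simplification):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class LogTablePoller {
    private Timestamp lastSeen = new Timestamp(0);

    public void poll(Connection conn) throws Exception {
        String sql = "SELECT EVENT_CODE, LOG_DATETIME, LOG_MSG FROM EVENT_LOG "
                + "WHERE LOG_DATETIME > ? ORDER BY LOG_DATETIME";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, lastSeen);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Advance the high-water mark and hand the event to the app.
                    lastSeen = rs.getTimestamp("LOG_DATETIME");
                    System.out.println(rs.getString("EVENT_CODE") + ": " + rs.getString("LOG_MSG"));
                }
            }
        }
    }
}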
I think you're confusing two things. They are both highly DB-vendor specific.
The first I shall call "triggers". I am sure there is at least one DB vendor who thinks triggers are different from this, but bear with me. A trigger is a server-side piece of code that can be attached to a table. For instance, you could run a PL/SQL stored procedure on every update in table X. Some databases allow you to write these in real programming languages, others only in their variant of SQL. Triggers are typically reasonably fast and scalable.
The other I shall call "events". These are triggers that fire in the database and allow you to define an event handler in your client program. I.e., any time there are updates to the client's database, fire updateClientsList in your program. For instance, using Python and Firebird, see http://www.firebirdsql.org/devel/python/docs/3.3.0/beyond-python-db-api.html#database-event-notification
I believe the earlier suggestion to use a monitor table is an equivalent way to implement this with some other database, maybe Oracle. SQL Server Notification Services, mentioned in another answer, is another implementation of this as well.
I would go so far as to say you had better REALLY know why you want the database to notify your client program; otherwise you should stick with server-side triggers.
What you're asking completely depends on both the database you're using and the framework you're using to communicate with your database.
If you're using something like Hibernate as your persistence layer, it has a set of listeners and interceptors that you can use to monitor records going in and out of the database.
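For example, a rough sketch of a Hibernate interceptor that observes saves and updates (registering it, e.g. via Configuration.setInterceptor, and forwarding the events to interested clients are left out; the class name is illustrative):

import java.io.Serializable;
import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

public class ChangeAuditInterceptor extends EmptyInterceptor {
    @Override
    public boolean onSave(Object entity, Serializable id, Object[] state,
                          String[] propertyNames, Type[] types) {
        System.out.println("INSERT observed: " + entity.getClass().getSimpleName() + "#" + id);
        return false; // we did not modify the entity state
    }

    @Override
    public boolean onFlushDirty(Object entity, Serializable id, Object[] currentState,
                                Object[] previousState, String[] propertyNames, Type[] types) {
        System.out.println("UPDATE observed: " + entity.getClass().getSimpleName() + "#" + id);
        return false;
    }
}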
There are a few different techniques here depending on the database you're using. One idea is to poll the database (which I'm sure you're trying to avoid). Basically you could check for changes every so often.
Another solution (if you're using SQL Server 2005) is to use Notification Services, although this technology is supposedly being replaced in SQL Server 2008 (we haven't seen a pure replacement yet, but Microsoft has talked about it publicly).
This is usually what the standard client/server application is for. If all inserts/updates/deletes go through the server application, which then modifies the database, then client applications can find out much more easily what changes were made.
If you are using PostgreSQL, it has the capability to deliver notifications to a JDBC client via LISTEN/NOTIFY.
I would suggest using a "last updated" timestamp column, possibly together with the user who updated the record, and then letting the clients check their local record timestamp against that of the persisted record.
The added complexity of callback/trigger functionality is just not worth it in my opinion, unless it is supported by both the database backend and the client library used, as with the notification services offered for SQL Server 2005 used together with ADO.NET.