We are trying to capture, in [near] real time, certain transactions occurring on the core database and copy them into a remote database connected via VPN.
These transactions can be identified easily, but we are struggling to decide on the workflow and on which technology to use.
For example:
1.) Dumping CSV file every x seconds.
From the core system we create a CSV file every x seconds with the required information. We will then push/pull this file to the remote system and process it.
2.) Web Service
We will have two web services, one on the sender side and another on the receiver side.
Every x seconds the sender web service will execute a query, fetch records from the source database, and push the data to the receiver web service in batches of 'y' records.
The receiver will then process the records and send an acknowledgement for 'y' records.
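The batching and acknowledgement in idea 2 could look roughly like the sketch below. It is only an illustration under assumptions: the class name BatchSender, the numeric record ids, and the receiver callback are all hypothetical stand-ins for the real query and web service call. The point it shows is that the sender only advances its "last acknowledged" marker after the receiver confirms a batch, so an unacknowledged batch is simply retried on the next cycle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: pushes records in batches of 'y' and advances a watermark only on acknowledgement. */
class BatchSender {
    private final int batchSize;   // 'y' in the description above
    private long lastAckedId = 0;  // highest record id the receiver has confirmed so far

    BatchSender(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    long lastAckedId() {
        return lastAckedId;
    }

    /**
     * @param pendingIds ids of records newer than lastAckedId, sorted ascending
     *                   (stand-in for a query such as "WHERE id > ? ORDER BY id")
     * @param receiver   returns true when it acknowledges a batch
     *                   (stand-in for the call to the receiving web service)
     * @return number of records acknowledged in this polling cycle
     */
    int pushPending(List<Long> pendingIds, Predicate<List<Long>> receiver) {
        int acked = 0;
        for (int from = 0; from < pendingIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, pendingIds.size());
            List<Long> batch = new ArrayList<>(pendingIds.subList(from, to));
            if (!receiver.test(batch)) {
                break; // no acknowledgement: stop here and retry from the watermark next cycle
            }
            lastAckedId = batch.get(batch.size() - 1);
            acked += batch.size();
        }
        return acked;
    }
}
```

Because the marker only moves forward on acknowledgement, the receiver may see a batch twice after a failure, so it should be able to process a repeated batch safely.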
Notes:
1.) Ideally we would like to make the process real time. Both of the above ideas are [near] real time, not real time.
2.) The source database system is not specific. It can be Oracle, MS SQL, MySQL, Sybase, Informix, etc.
3.) The remote target database is Oracle.
Any ideas are most welcome and also the technology used can be flexible.
The main focus is on minimizing the load this process puts on the core database.
Edit:
It is becoming more and more clear to me that getting actual real time with heterogeneous database systems will be nearly impossible, as the trigger/notify mechanisms for record insertion are RDBMS-specific.
I would like to shift the focus of the question to get better near Real time ideas apart from the above 2 examples shared.
Also please note that we have little to no control over the source database & also the process/service which originally inserts the records in the database. We only have control over the records.
See this article for an example of how to listen for database changes (in this case a database trigger) in PostgreSQL. Basically, you set up a function to handle the trigger that sends an event to all interested clients. Your application will then listen for this event and can start the sync whenever the trigger is executed. The example applies the trigger to new insertions on a specific table.
We have a situation where we have to perform a lengthy query to the database based on human input. As the input changes, the query has to be done over and over again, and the input may change once per second.
The problem is, we know that this will cause a spike in server activity for several seconds, and since it is not critical to have an answer immediately or on every input change, it means we can afford executing or not executing the query.
The criterion we would like to use is the current state of the database server: only allow the query to run if the server is in a low or medium load state, and skip the query when the server is under stress.
We use Oracle database for this, and so far we have not found any way, from Java, to do this except by actually loading into the server a known query and benchmark it, but that is essentially adding some load to the server. So my question: is there any other way, specifically in Oracle database, where we can discover from the Java side of the application the load of the database?
Depending on how you define "low or medium load state", I'd guess that hitting v$osstat would give you the information you're after. Of course, hitting v$osstat constantly will also add to the load on the server. You may want to write a job that periodically copies the v$osstat data to a table you control (and can thus index appropriately) so that your application can hit that table rather than hitting the dynamic performance view constantly.
Depending on the goal (i.e. are you trying to ensure that other users have enough resources, or are you trying to ensure that your app remains responsive), you may want to use Resource Manager to control resource utilization among users, you may want to run the query asynchronously from the application, and/or you may want to use some sort of cache at the middle tier to avoid hitting the database every time.
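On the Java side, the "skip the query when the server is busy" decision could be sketched as below. This is a minimal illustration, not a definitive implementation: the class name LoadGate, the threshold, and the probe are hypothetical. The probe stands for whatever reads the load figure (for example, the latest row of the snapshot table described above), and the gate caches that reading so the check itself does not hammer the database.

```java
import java.util.function.DoubleSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch: allows the expensive query only while the last load sample is under a threshold. */
class LoadGate {
    private final DoubleSupplier loadProbe;  // assumption: reads a load figure, e.g. from a snapshot table
    private final LongSupplier clockMillis;  // injectable clock, e.g. System::currentTimeMillis
    private final double maxLoad;            // above this value the query is skipped
    private final long maxSampleAgeMillis;   // how long a sample is reused before probing again

    private double cachedLoad;
    private long sampledAtMillis;
    private boolean hasSample = false;

    LoadGate(DoubleSupplier loadProbe, LongSupplier clockMillis, double maxLoad, long maxSampleAgeMillis) {
        this.loadProbe = loadProbe;
        this.clockMillis = clockMillis;
        this.maxLoad = maxLoad;
        this.maxSampleAgeMillis = maxSampleAgeMillis;
    }

    /** Returns true if the query may run now; refreshes the sample only when it has expired. */
    synchronized boolean allowQuery() {
        long now = clockMillis.getAsLong();
        if (!hasSample || now - sampledAtMillis >= maxSampleAgeMillis) {
            cachedLoad = loadProbe.getAsDouble();
            sampledAtMillis = now;
            hasSample = true;
        }
        return cachedLoad <= maxLoad;
    }
}
```

The application would call allowQuery() on each input change and simply drop the request when it returns false.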
I have not yet coded a potential solution to this, so before anyone asks I have zero code to back this up as I am trying to get a firm grip on the processing behind what needs to happen.
My problem is that I have an Oracle database that will be firing off jobs constantly (every 10 minutes or so), and I need a safe way (security-wise and data-integrity-wise) to terminate these jobs and prevent them from executing while a nightly backup takes place. For the sake of discussion this will be done via a cron job. The way I think it should work is that the cron job will fire off at 1 AM (or some other low-usage time; by low I mean < .001% of the user base will be interacting with the system). The Java process will need to execute some PL/SQL function on the database that does the following things:
1) A force terminate on all running jobs
2) A snapshot of data that is to be written to an arbitrary directory
3) Restart all jobs (mark them as enabled instead of disabled)
My question is this:
How can this be accomplished with the minimum amount of permissions, and does this loose architecture prevent data corruption, assuming Oracle is correctly generating undo/redo logs? If this is an insecure or poor way of doing this, any other suggestions are appreciated.
In Oracle 10+ DBMS_Scheduler has a window definition that does exactly what you want. When the window ends, processing of running jobs can be terminated.
http://docs.oracle.com/cd/E14072_01/appdev.112/e10577/d_sched.htm
I have a Swing desktop application that is installed on many desktops within a LAN. I have a MySQL database that all of them talk to. At precisely 5 PM every day, a thread wakes up in each of these applications and tries to back up files to a remote server. I would like to prevent all the desktop applications from doing the same thing.
The way I was thinking to do this was:
After waking up at 5 PM, all the applications will try to write a row to a MySQL table. They will all write the same information. Only one will succeed and the others will get a duplicate row exception. Whoever succeeds then goes on to run the backup program.
My questions are:
Is this the right way of doing things? Is there a better (easier) way?
I know we can do this using sockets as well, but I don't want to go down that route: it is too much coding, and I would also need to ensure that all the systems can talk to each other first (ping).
Will MySQL support such a feature? My DB is InnoDB, so I am thinking it does. Typically I will have about 20-30 users in the LAN. Will this cause a huge overhead for the DB to handle?
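For illustration, the "only one insert succeeds" idea described above might be sketched in JDBC as follows. Everything named here is an assumption: the table backup_claim, its primary-key column backup_day, and the class BackupClaim are hypothetical. The sketch also assumes the driver reports the duplicate key with an SQLState in class 23 (integrity constraint violation), which is what MySQL uses for duplicate-entry errors.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical sketch: the client whose INSERT succeeds is the one that runs the backup. */
class BackupClaim {
    /**
     * @param insert    prepared from SQL such as
     *                  "INSERT INTO backup_claim (backup_day) VALUES (?)"
     *                  where backup_day is the primary key (hypothetical table)
     * @param backupDay the key every client writes for today's backup, e.g. "2024-01-01"
     * @return true if this client won the claim, false if another client already inserted the row
     */
    static boolean tryClaim(PreparedStatement insert, String backupDay) throws SQLException {
        insert.setString(1, backupDay);
        try {
            insert.executeUpdate();
            return true;
        } catch (SQLException ex) {
            String state = ex.getSQLState();
            if (state != null && state.startsWith("23")) {
                return false; // integrity constraint violation: someone else holds the claim
            }
            throw ex; // anything else is a real failure, not a lost race
        }
    }
}
```

With 20-30 clients this is one small insert per client per day, so the database overhead should be negligible.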
If you could put an intermediate class in between the applications and the database that would queue up the results and allow them to proceed in an orderly manner you'd have it knocked.
It sounds like the applications all go directly against the database. You'll have to modify the applications to avoid this issue.
I have a lot of questions about the design:
Why are they all writing "the same row"? Aren't they writing information for their own individual instance?
Why would every one of them have exactly the same primary key? If there was an auto increment or timestamp you wouldn't have this problem.
What's the isolation set to on the database connection? If it's set to SERIALIZABLE, you'll force each one to wait until the previous one is done, at the cost of performance.
Could you have them all write files to a common directory and pick them up later in an orderly way?
I'm just brainstorming now.
It seems you want to back up server data, not client data.
I recommend using a 3-tier architecture with Java EE.
You could use a Timer Service then to trigger the backup.
Though usually a backup program is an independent program e.g. started by a cron job on the server. But again: you'll need a server to do this properly, not just a shared folder.
Here is what I would suggest. Instead of having all clients wake up at the same time and trying to perform the backup, stagger the time at which they wake up.
So when a client wakes up
- It will check some table in your DB (MySQL) to see whether a backup job has completed or is currently running. If the job has completed, the client will go on with its normal duties. You can decide how to handle the case when the job is running.
- If the client finds that the backup job has not been run for the day, it will start the backup job. At the same time, it will modify the row to indicate that the backup job has started. Once the backup has completed, the client will modify the table to indicate that the backup has completed.
This approach will prevent a burst of network activity and can also provide a rudimentary form of failover: if one client fails, another client can attempt the backup at a later time. (This is a bit more involved, though. Basically it comes down to what a client should do when it sees that a backup job is ongoing.)
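The decision each client makes on waking up could be sketched like this. It is an illustration under assumptions: the names BackupCoordinator, Status, and Action are hypothetical, and the "take over a job that has been running too long" rule is only one possible answer to the open question of what to do when a backup is ongoing.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of what a client does after reading the backup status row. */
class BackupCoordinator {
    enum Status { NOT_STARTED, RUNNING, DONE }
    enum Action { START_BACKUP, SKIP, TAKE_OVER }

    /**
     * @param status     today's status as read from the table
     * @param startedAt  when the running job was started (may be null unless RUNNING)
     * @param now        current time
     * @param staleAfter assumption: a job running longer than this is treated as failed
     */
    static Action decide(Status status, Instant startedAt, Instant now, Duration staleAfter) {
        switch (status) {
            case DONE:
                return Action.SKIP;          // nothing to do today
            case NOT_STARTED:
                return Action.START_BACKUP;  // this client becomes the backup runner
            default: // RUNNING
                boolean stale = startedAt != null
                        && Duration.between(startedAt, now).compareTo(staleAfter) > 0;
                return stale ? Action.TAKE_OVER : Action.SKIP;
        }
    }
}
```

The read and the following status update should be done atomically (for example, a single UPDATE that only matches while the status is still the one that was read, followed by a check of the affected row count); otherwise two clients waking close together can both decide to start.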
I'm doing a school software project with my class mates in Java.
We store the info on a remote db.
When we start the application we pull all the information from the database and transform it into objects to use in our application (using Java SQL statements).
In the application we edit some of these objects, and then when we exit the application we save or update the information in the database using Hibernate.
As you can see, we don't use Hibernate for pulling in information; we use it just for saving and updating.
We have two very similar problems.
Loading the objects (when we start the app) and saving the objects with Hibernate (when closing the app) both take too much time.
And our project is not a huge enterprise application. It is quite a small app: we just manage some students, teachers, homework and tests. So our DB is also very, very small.
How could we increase performance ?
later edit: if we use a local database it runs very quick, it just runs slow on remote databases
Are you saying you are loading the entire database into memory and then manipulating it? If that is the case, why don't you instead simply use the database as a storage device, and do lookups and manipulation as necessary (using Hibernate if you like, or something else if you don't)? The key there is to make sure that you are using connection pooling, as that will reduce the connection time.
If this is what you are doing, then you could be running into memory issues as well. By not caching the entire database in memory, you will reduce memory use and spread the network load out from startup and shutdown to the times when it actually needs to happen.
These two sentences are red flags for me:
"When we start the application we pull all the information from the database and transform it into objects to use in our application (using Java SQL statements). In the application we edit some of these objects and then when we exit the application we save or update information in the database using Hibernate."
Is there a requirements reason that you are loading all the information from the database into memory at startup, or why you're waiting until shutdown to save changes back in the database?
If not, I'd suggest a design change. If you've already got Hibernate mappings for the tables in the DB, I'd use Hibernate for all of your CRUD (create, read, update, delete) operations. And I'd only load the data that each page in your app needs, as it needs it.
If you can't make that kind of design change at this point, I think you've got to look closely at how you're managing the database connections. Are you using connection pools? Are you opening up multiple connections? Forgetting to release them?
Something else to look at. How are you using Hibernate to save the entities to the db? Are you doing a getHibernateTemplate().get on each one and then doing an entity.save or entity.update on each one? If so, that means you are also causing Hibernate to run a select query for each database object before it does a save or update. So, essentially, you'd be loading each database object twice (once at the beginning of the program, once before saving). To see if that's what's happening, you can turn on the show_sql property or use P6Spy to see exactly what queries Hibernate is running.
For what you are doing, you may very well be better off serializing your objects and writing them out to a flat file.
But, much more likely, you should just read / update objects directly from your database as needed instead of all at once, for all the reasons aperkins gives.
Also, consider what happens if your application crashes. If all of your updates are saved only in memory until the application is closed, everything would be lost if the app closes unexpectedly.
The difference in loading everything from a remote DB server versus loading everything from a local DB server is the network latency / pipe size. The network is a much smaller pipe than anything else. Two questions: first, how much data are we really talking about? Second, what is your network speed? 10/100/1000? Figure between 10 and 20% of your pipe size is going to be overhead due to everything from networking protocols to the actual queries themselves.
As others have stated, the way you've architected is usually high on the list of "don't do". When starting, pull only enough data to initialize the app. As the user works through it, pull what you need for that task.
The ONLY time you pull everything is when they are working in a disconnected state. In that case, you still don't load everything as objects in the application, you just work from a local data store which gets sync'ed with the remote server every so often.
The project is pretty much complete. We can't do large refactoring on it now.
I tried to use a second-level cache for Hibernate when saving: EhCacheProvider.
In hibernate.xml:
net.sf.ehcache.hibernate.EhCacheProvider
I have done a config for the cache, ehcache.xml:
I have put the cache.jar in the project build path
and I have set the Hibernate property for every class and set in the mapping.
But this cache doesn't seem to have an effect. I don't know if it works (if it is used).
Try minimising number of SQL queries, since every query has its own overhead.
You can enable database compression, which should speed things up when there is a lot of data.
Maybe you are connecting to the database many times?
Check the ping time of remote database server - it might be the problem.
As your application is just slow when running on a remote database server, I'd assume that the performance loss is due to:
Connecting to the server: try to reuse connections (pass the instance around) or use connection pooling
Query round-trip time: use as few queries as possible, see here in case of a hand-written DAL:
Preferred way of retrieving row with multiple relating rows
For Hibernate you may use its batch functionality and adjust hibernate.jdbc.batch_size.
In all cases, especially when you can't refactor larger parts of the codebase, use a profiler (method time or SQL queries) to find the bottleneck. I bet you'll find thousands of queries (each taking 10 ms RTT) which could be merged into one.
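To make the round-trip argument concrete, here is a small back-of-the-envelope calculation. The figure of 2000 queries is an assumed example (the text above only says "thousands"), and the class name RoundTripEstimate is hypothetical; the 10 ms round-trip time is the one mentioned above.

```java
/** Hypothetical sketch: estimates time spent purely waiting on the network. */
class RoundTripEstimate {
    /** Each query costs at least one network round trip, regardless of how little data it returns. */
    static long networkWaitMillis(int queries, long rttMillis) {
        return (long) queries * rttMillis;
    }

    public static void main(String[] args) {
        long rttMillis = 10; // round-trip time to the remote database
        System.out.println("2000 separate queries: " + networkWaitMillis(2000, rttMillis) + " ms waiting");
        System.out.println("1 merged query:        " + networkWaitMillis(1, rttMillis) + " ms waiting");
    }
}
```

On a local database the round trip is close to zero, which would explain why the same code only feels slow against the remote server.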
Some other things you can look into:
You can allocate more memory to the JVM
Use the jconsole tool to investigate what the bottlenecks are.
Why don't you have two separate threads?
Thread 1 will load your objects one by one.
Thread 2 will process objects as they are loaded.
Your app will seem more interactive at startup.
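A minimal sketch of the two-thread idea, using a BlockingQueue to hand objects from the loader to the processor. The class name PipelinedLoader is hypothetical, the list of strings stands in for rows read from the database, and upper-casing stands in for whatever "processing" means in the real app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: one thread loads items while a second thread processes them as they arrive. */
class PipelinedLoader {
    // Sentinel marking the end of the stream; compared by identity, so it cannot collide with real data.
    private static final String END = new String("END");

    static List<String> loadAndProcess(List<String> source) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        // Thread 1: loads objects one by one (stand-in for reading rows from the database).
        Thread loader = new Thread(() -> {
            for (String item : source) {
                queue.add(item);
            }
            queue.add(END);
        });

        // Thread 2: processes each object as soon as it is available.
        Thread processor = new Thread(() -> {
            try {
                for (String item = queue.take(); item != END; item = queue.take()) {
                    processed.add(item.toUpperCase());
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        loader.start();
        processor.start();
        loader.join();
        processor.join(); // after join, reading 'processed' from this thread is safe
        return processed;
    }
}
```

This only overlaps loading with processing; it does not reduce the number of queries, so it helps perceived startup time rather than total work.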
It never hurts to review the basics:
Improving speed means reducing time (obviously), and to do that, you find activities that take significant time but can be eliminated or replaced with something that uses less time. What I mean by activity is almost always a function call, method call, or property call, performed on a specific line of code for a specific purpose. It may invoke I/O or it may invoke computation, or both. If its purpose is not essential, then it can be optimized.
Many people use profilers to try to find these time-wasting lines of code, but most profilers miss the target because they look at functions, not lines, they go to sleep during I/O, and they worry about "self time".
Many more people try to guess what could be the problem, or they ask others to guess, such as by asking on SO. Such guesses, in the nature of guesses, are sometimes right - more often not, but people still invest time and resources in them.
There's a very simple way to find out for sure, without guessing, what could fruitfully be optimized, and here is one way to do it in Java.
Thanks for your answers. They were more than helpful.
We completely solved this problem like so:
Refactored the LOAD code. Now it uses Hibernate with Lazy Fetching.
Refactored the SAVE code. Now it saves just the data that was modified, right after it was modified. This way we don't have a HUGE save at the end.
I'm amazed at how well it all went. The amount of new code we had to write was very, very small.