Database polling using Java

I need to detect database changes from Java code. Any record that is added, updated, or deleted in any table of the DB should be recognized by the Java program. How could this be implemented: with JMS, or with a Java thread?
Update: Thanks for your support. I am using Oracle as the DB and WebLogic 10.3 Workshop. I want to get the updates from a table on which I have only read permission, so what do you suggest? I can't update the DB. All I can do is read it, and if there is any change in the table I need to be notified that certain rows have been added, deleted, or updated.

Unless the database can send a message to Java, you'll have to have a thread that polls.
A better, more efficient model would be one that fires events on changes. A database that has Java running inside (e.g., Oracle) could do it.

We do it by polling the DB using an EJB timer task. In essence, we have a status field which we update when we have processed that row.
So the EJB timer thread calls a procedure that grabs rows which are flagged "un-treated".
Dirty, but also very simple and robust. Especially, after a crash or something, it can still pick up from where it crashed without too much complexity.
The disadvantage is the wasted load on the DB, and also response time will be limited (probably requires seconds).

We have accomplished this in our firm by adding triggers to database tables that call an executable to issue a Tib Rendezvous message, which is received by all interested Java applications.
However, the ideal way to do this IMHO is to be in complete control of all database writes at the application level, and to notify any interested parties at this point (via multi-cast, Tib, etc). In reality this isn't always possible where you have a number of disparate systems.

You're indeed dependent on whether the database in question supports it. You'll also need to take the overhead into account. A lot of inserts/updates also means a lot of notifications, and your Java code has to handle them consistently, or else they will pile up.
If the data model allows it, just add an extra column holding a timestamp which gets updated on every insert/update. Most major DBs support auto-updating such a column on every insert/update. I don't know which DB server you're using, so I'll give only a MySQL-targeted example:
CREATE TABLE mytable (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
somevalue VARCHAR(255) NOT NULL,
lastupdate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX (lastupdate)
)
This way you don't need to worry about inserting/updating the lastupdate yourself. You can just do an INSERT INTO mytable (somevalue) VALUES (?) or UPDATE mytable SET somevalue = ? WHERE id = ? and the DB will do the magic.
After ensuring that the DB server's time and Java application's time are the same, you can just fire a background thread (using either Timer with TimerTask, or ScheduledExecutorService with Runnable or Callable) which does roughly this:
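// now and lastTimeChecked are java.sql.Timestamp; setDate() needs a java.sql.Date and drops the time of day.
Timestamp now = new Timestamp(System.currentTimeMillis());
// Half-open window, so a row stamped exactly at a boundary is not handled twice.
statement = connection.prepareStatement("SELECT id FROM mytable WHERE lastupdate > ? AND lastupdate <= ?");
statement.setTimestamp(1, this.lastTimeChecked);
statement.setTimestamp(2, now);
resultSet = statement.executeQuery();
while (resultSet.next()) {
    // Handle accordingly.
}
this.lastTimeChecked = now;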
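The background thread mentioned above could be set up roughly like this with ScheduledExecutorService. It is only a sketch: ChangePoller and the checkWindow callback are hypothetical names, and the callback is where the SELECT for the window between the previous and the current check would run.

```java
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Runs a polling task repeatedly, handing it the time window since the previous run. */
class ChangePoller {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final BiConsumer<Instant, Instant> checkWindow; // hypothetical callback: (from, to)
    private Instant lastTimeChecked = Instant.now();

    ChangePoller(BiConsumer<Instant, Instant> checkWindow) {
        this.checkWindow = checkWindow;
    }

    void start(long delay, TimeUnit unit) {
        // Fixed delay: the next poll starts only after the previous one has finished.
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, delay, unit);
    }

    void pollOnce() {
        Instant now = Instant.now();
        try {
            checkWindow.accept(lastTimeChecked, now); // e.g. run the SELECT for this window
            lastTimeChecked = now;                    // advance only after a successful poll
        } catch (RuntimeException e) {
            // An uncaught exception would cancel all future runs, so report it and retry next time.
            System.err.println("Poll failed, will retry: " + e);
        }
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Typical use would be new ChangePoller((from, to) -> ...).start(5, TimeUnit.SECONDS). Because the window only advances after a successful poll, a failed run is simply retried with a wider window on the next tick.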
Update: as per the question update, it turns out that you have no control over the DB. Well, then you don't have many good or efficient options. Either just refresh the entire list in Java memory with the entire data from the DB without checking/comparing for changes (probably the fastest way), or dynamically generate a SQL query based on the current data which excludes the current data from the results.
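If you do need to know which rows were added, updated or deleted rather than just refreshing, one option with read-only access is to keep the previous snapshot and compare it with a fresh one. A minimal sketch, assuming the table has a primary key that can serve as the map key; SnapshotDiff and its field names are made up for illustration, and loading the maps from the DB is left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Result of comparing two in-memory snapshots of the same table. */
class SnapshotDiff {
    final List<String> added = new ArrayList<>();
    final List<String> updated = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();

    /** Both maps are keyed by primary key; each value holds the column values of one row. */
    static SnapshotDiff between(Map<String, List<Object>> previous,
                                Map<String, List<Object>> current) {
        SnapshotDiff diff = new SnapshotDiff();
        for (Map.Entry<String, List<Object>> row : current.entrySet()) {
            List<Object> old = previous.get(row.getKey());
            if (old == null) {
                diff.added.add(row.getKey());        // key not seen before
            } else if (!old.equals(row.getValue())) {
                diff.updated.add(row.getKey());      // same key, different column values
            }
        }
        for (String key : previous.keySet()) {
            if (!current.containsKey(key)) {
                diff.deleted.add(key);               // key disappeared from the table
            }
        }
        return diff;
    }
}
```

This costs a full table read per poll plus memory for two snapshots, so it only suits tables of modest size.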

I assume that you're talking about a situation where anything can update a table. If for some reason you're instead talking about a situation where only the Java application will be updating the table that's different. If you're using Java only you can put this code in your DAO or EJB doing the update (it's much cleaner than using a trigger in this case).

An alternative way to do this is to funnel all database calls through a web service API, or perhaps a JMS API, which does the actual database calls. Processes could register there to get a notification of a database update.
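In plain Java, the registration part of such a gateway might look roughly like this. NotifyingGateway and ChangeEvent are hypothetical names, and the Runnable stands in for the real database call; a web service or JMS version would replace the in-process listener list with remote subscribers.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Single entry point for writes; interested parties register to hear about them. */
class NotifyingGateway {
    /** Hypothetical event: which table changed, how, and which row. */
    static final class ChangeEvent {
        final String table;
        final String operation; // "INSERT", "UPDATE" or "DELETE"
        final Object key;

        ChangeEvent(String table, String operation, Object key) {
            this.table = table;
            this.operation = operation;
            this.key = key;
        }
    }

    private final List<Consumer<ChangeEvent>> listeners = new CopyOnWriteArrayList<>();

    void register(Consumer<ChangeEvent> listener) {
        listeners.add(listener);
    }

    /** Runs the write first; listeners are notified only if it did not throw. */
    void write(String table, String operation, Object key, Runnable databaseWrite) {
        databaseWrite.run(); // the actual JDBC call would go here
        ChangeEvent event = new ChangeEvent(table, operation, key);
        for (Consumer<ChangeEvent> listener : listeners) {
            listener.accept(event);
        }
    }
}
```

This only sees changes that go through the gateway, so it does not help when other systems write to the tables directly.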

We have a similar requirement. In our case we have a legacy system, and we do not want to adversely impact performance on its existing transaction table.
Here's my proposal:
A new work table with pk to transaction and insert timestamp
A new audit table that has same columns as transaction table + audit columns
Trigger on transaction table to dump all insert/update/deletes to an audit table
Java process to poll the work table, join to the audit table, publish the event in question and delete from the work table.
Question is: What do you use for polling? Is quartz overkill? How can you scale back the polling frequency based on the current DB load?

Related

How much can I trust OracleDB's ROWID in a long run?

I am working on a small POC using Spring Boot and OracleDB.
The situation is :
During application startup, I load a few properties (some data) from the DB into a cache. There are going to be frequent requests that need this data, hence I decided to cache it. The data in the DB will rarely change. Only once in a while might someone insert/delete/update a couple of rows using a SQL script. For when it changes, I have implemented Oracle's database change notification, which notifies the Spring Boot service that some data has changed and that the data in the cache is now stale.
In the notification event, I only get the ROWID pseudocolumn which can be used to point to what portion of data from the db is different from the cache that I have. To be on the safer side, I have decided to cache ROWIDs to map the data in cache and data object in the notification event that DB sends me. While working for a couple of days, I have found out that the ROWID doesn't change but how much shall I trust this non-changing behavior of the ROWIDs in the long run or in the production environment?
A few scenarios, for clarification:
The cache reloads itself every time the server restarts, so data changing while the server is down is out of the picture.
So far in the POC, I am getting a notification for every insert/update/delete made in the DB by a SQL query/script.
Example of event.toString() for reference:
Connection information : local=view-localhost/127.0.0.1:47632, remote=view-localhost/127.0.0.1:57117
Registration ID : 1201
Notification version : 1
Event type : QUERYCHANGE
Database name : orcl
Query Change Description (length=1)
query ID=41, query change event type=QUERYCHANGE
Table Change Description (length=1): operation=[INSERT], tableName=SYSTEM.PRODUCT, objectNumber=73323
Row Change Description (length=1):
ROW: operation=INSERT, ROWID=AAAR5rAABAAAbHZAAA
This assumes your table does not have row movement enabled (check the ROW_MOVEMENT column in dba_tables).
You need to be careful with a delete followed by an insert: this logically gives the row a new rowid (it's a completely new row, after all).
You also need to be aware of table moves; this is an intensive operation that requires indexes to be rebuilt anyway, so it is unlikely to happen without plenty of notice.
Otherwise, a row will keep its rowid.
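Given those caveats, one way to avoid depending too heavily on rowid stability is to treat the rowid only as a pointer for re-reading the row, so a rowid you have never seen is handled like any other. A rough sketch; RowIdCache and the loadByRowId function are hypothetical, and the loader would typically run something like SELECT ... WHERE ROWID = ?:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache keyed by ROWID that re-reads a row whenever a change event mentions it. */
class RowIdCache<V> {
    private final Map<String, V> byRowId = new ConcurrentHashMap<>();
    private final Function<String, V> loadByRowId; // hypothetical loader; returns null if the row is gone

    RowIdCache(Function<String, V> loadByRowId) {
        this.loadByRowId = loadByRowId;
    }

    /** Applies one row-level change; operation is the name printed in the event (INSERT, UPDATE, DELETE). */
    void onRowChange(String operation, String rowId) {
        if ("DELETE".equals(operation)) {
            byRowId.remove(rowId);
            return;
        }
        // INSERT or UPDATE: re-read the row instead of assuming the rowid is already cached.
        V fresh = loadByRowId.apply(rowId);
        if (fresh == null) {
            byRowId.remove(rowId); // row vanished between the event and the read
        } else {
            byRowId.put(rowId, fresh);
        }
    }

    V get(String rowId) {
        return byRowId.get(rowId);
    }

    int size() {
        return byRowId.size();
    }
}
```

A table move would still leave stale rowids in the map, so a full reload after such maintenance remains the safe fallback.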

Change summary after executing SQL query

I am trying to log a “change summary” from each INSERT/UPDATE MySQL/SQL Server query that executes in a Java program. For example, let’s say I have the following query:
Connection con = ...
PreparedStatement ps = con.prepareStatement("INSERT INTO cars (color, brand) VALUES (?, ?)");
ps.setString(1, "red");
ps.setString(2, "toyota");
ps.executeUpdate();
I want to build a “change set“ from this query so I know that one row was inserted into the cars table with the values color=red and brand=toyota.
Ideally, I would like MySQL/SQL Server to tell me this information as that would be the most accurate. I want to avoid using a Java SQL parser because I may have queries with “IF EXISTS BEGIN ELSE END”, in which case I would want to know what was the final query that was inserted/updated.
I only want to track INSERT/UPDATE queries. Is this possible?
What ORM do you use? If you don't use one, now could be the time to start - you give the impression that you have all these prepared statements scattered throughout the code, which is something that needs improving anyway.
Using something like Hibernate means you can just activate its logging and keep the query/parameter data. It might also make you focus your data layer a bit more (if it's a bit haphazardly structured right now).
If you're not willing to switch to using an ORM consider creating your own class, perhaps called LoggingPreparedStatement, that is identical to normal PreparedStatement (subclass or wrapper of PreparedStatement such that it uses all the same method names etc so it's a drop in replacement) and logs whatever you want. Use find/replace across the code base to switch to using it.
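A wrapper like that does not have to implement every PreparedStatement method by hand; a dynamic proxy can intercept the calls instead. This is a sketch rather than a finished class: LoggingStatements.wrap and the log callback are invented names, and it only records parameters bound through the setXxx(index, value, ...) methods.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

class LoggingStatements {
    /** Wraps a PreparedStatement so that the SQL and bound parameters are reported on execute. */
    static PreparedStatement wrap(PreparedStatement target, String sql,
                                  BiConsumer<String, Map<Integer, Object>> log) {
        Map<Integer, Object> params = new TreeMap<>();
        return (PreparedStatement) Proxy.newProxyInstance(
            LoggingStatements.class.getClassLoader(),
            new Class<?>[] {PreparedStatement.class},
            (proxy, method, args) -> {
                String name = method.getName();
                if (name.startsWith("set") && args != null && args.length >= 2
                        && args[0] instanceof Integer) {
                    // setString(1, "red"), setInt(2, 5), ...: the first argument is the parameter index.
                    params.put((Integer) args[0], "setNull".equals(name) ? null : args[1]);
                } else if (name.startsWith("execute") && args == null) {
                    // No-argument executeXxx() calls run the prepared SQL with the bound parameters.
                    log.accept(sql, new TreeMap<>(params));
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the driver's own exception, e.g. SQLException
                }
            });
    }
}
```

You would call it as LoggingStatements.wrap(con.prepareStatement(sql), sql, logger) wherever a statement is created. Note that it logs what was sent, not what the database ended up changing, so conditional SQL on the server side still needs one of the database-side approaches.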
As an alternative to doing it on the client side, you can get the database to do it. SQL Server has change tracking; I don't know what there is for MySQL, but it will be something proprietary. For something consistent, most databases have triggers with some mechanism for identifying old and new data, and you can stash this in one or more history tables to see what was changed and when. Triggers that keep history have a regularity to their code that means they can be generated programmatically from a list of the table columns and datatypes, so you can query the DB for the column names (most databases have some virtual tables that describe the real tables), generate your triggers in code, and (re)apply them whenever the schema changes.
The advantage of using triggers is that they very easily identify the data that was changed. The disadvantage is that this is all they can see, so if you want your trigger to know more (such as who ran the query, or what the query was), you have to add that info to the table or the session so the trigger can access it. If you're not willing to add useless columns to a table (and indeed, why should you), you can rename all your tables and provide a set of views that select from the new names and carry the old names. These views can expose extra columns that your client side can update, and the views themselves can have INSTEAD OF triggers that update the real tables. That doesn't help for deletions, though, because deleting data doesn't need any data from the client, so the whole thing is a mess. If you were going that wholesale on your DB, you'd just switch to stored procedures for your data modifications and embark on a massive job of changing your client-side calls.
An alternative that is also well leveraged for SQL Server is CONTEXT_INFO, a 128-byte block of binary data that lives for the life of your connection/session, or its newer upgrade SESSION_CONTEXT, a 256 KB set of key-value pairs.
If you're building something on the client side that logs the user, query and parameter data, and you're also building a trigger that logs the data change, you could use these variables, programmatically set at the start of each data modification statement, to give your trigger something more involved than "what is the current time" to identify which triggered dataset relates to which logged query. For example, generate a GUID in the client and pass it to the DB in some globally readable way so that the database trigger can see it and log it in the history table, tying the client-side log of the statement and parameters to the server-side set of logged row changes.

Configuring database change notification to get only newly inserted or updated data in Java

I am building an application that does some processing after looking up a database (oracle).
Currently, I have configured the application with Spring Integration and it polls data in a periodic fashion regardless of whether any data is updated or inserted.
The problem here is that I cannot add or use any column to distinguish between old and new records. Also, even when nothing has been inserted or updated in the table, the poller still polls data from the database and feeds it into the message channel.
To avoid that, I want to switch to database change notification, and I need to register a query, something like
SELECT * FROM EMPLOYEE WHERE STATUS='ACTIVE'
Now, this active status is true for both old and new entries, and I want to eliminate the old entries from my list, so that only after a new insert or an update to an existing row do I get the data that was newly added or recently updated.
Well, it is really sad that you can't modify the data model in the database. I'd suggest insisting on a change to the table for your convenience. For example, a single extra LAST_MODIFIED column might be enough, so you could filter out the old records and poll only those whose date is recent.
Oracle also has triggers, so you can perform some action on INSERT/UPDATE and record the change in some other table for your purposes.
Otherwise you have no choice but to use an extra persistence service to track loaded records, for example a MetadataStore based on Redis or MongoDB: https://docs.spring.io/spring-integration/docs/4.3.12.RELEASE/reference/html/system-management-chapter.html#metadata-store
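The tracking idea can be illustrated without any framework. In this sketch an in-memory map stands in for the external store, and SeenRowFilter is a made-up class, not Spring Integration's MetadataStore API; it remembers a fingerprint per primary key and passes on only rows that are new or whose contents differ from last time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Filters a polled result set down to rows that are new or changed since the previous poll. */
class SeenRowFilter {
    // Stand-in for an external key/value store such as Redis or MongoDB.
    private final Map<String, Integer> fingerprints = new ConcurrentHashMap<>();

    /** polledRows maps primary key to column values; returns the keys worth forwarding. */
    List<String> newOrChanged(Map<String, List<Object>> polledRows) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Object>> row : polledRows.entrySet()) {
            // hashCode() keeps the sketch short; a real store would want a stronger digest.
            Integer fingerprint = row.getValue().hashCode();
            Integer previous = fingerprints.put(row.getKey(), fingerprint);
            if (!fingerprint.equals(previous)) {
                result.add(row.getKey()); // never seen, or seen with different contents
            }
        }
        return result;
    }
}
```

The poller still reads every ACTIVE row each time; the filter only stops unchanged rows from reaching the message channel.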

Accessing database multiple times

I am working on a solution for the problem below but could not find any best practice or tool for it.
For a batch of requests (say 5000 unique IDs and records) received in a web service, the service has to fetch the rows for those unique IDs from the database, keep them in a buffer (or cache), and compare them with the records received. If a particular value (say a column) has changed, it is updated in the table for that unique ID, and in turn the child tables of that table are also affected. For example, if someone changes his laptop model number and country, the model number is updated in one table and the country value in another. In this way, multiple tables are accessed in a short time. The maximum number of records in a web service call might reach 70K in one call in an hour.
I don't have any option other than implementing it in Java. Is there a good practice for implementing this, or can it be achieved using any open-source Java tools? Please suggest. Thanks.
Hibernate is likely to be the first thing you should try. I tend to avoid it because it is overkill for most of my applications, but it is a standard tool for accessing a database which anyone who knows Java should at least have an understanding of. There are dozens of other solutions you could use, but Hibernate is the most often used.
JDBC is the API to use to access a relational database. Useful performance and security tips:
use prepared statements
use where ... in () queries to load many rows at once, but beware of the limit on the number of values in the in clause (1000 max in Oracle)
use batched statements to make your updates, rather than executing each update separately (see http://download.oracle.com/javase/1.3/docs/guide/jdbc/spec2/jdbc2.1.frame6.html)
See http://download.oracle.com/javase/tutorial/jdbc/ for a tutorial on JDBC.
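To stay under the IN-clause limit when loading thousands of ids, the list can be split into chunks and one prepared statement built per chunk size. A small sketch; InClauseChunks is a made-up helper, and the table and column names passed to it are assumed to be trusted constants, never user input:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InClauseChunks {
    static final int ORACLE_MAX_IN_VALUES = 1000;

    /** Splits ids into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            chunks.add(ids.subList(from, Math.min(from + chunkSize, ids.size())));
        }
        return chunks;
    }

    /** Builds "SELECT * FROM table WHERE column IN (?, ?, ...)" with n placeholders. */
    static String selectByIds(String table, String column, int n) {
        return "SELECT * FROM " + table + " WHERE " + column
            + " IN (" + String.join(", ", Collections.nCopies(n, "?")) + ")";
    }
}
```

For each chunk you would prepare selectByIds(..., chunk.size()), bind the ids with setLong or setString in a loop, and read the rows; 5000 ids then cost five queries instead of 5000.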
This doesn't sound that complicated. Of course, you must know (or learn):
SQL
JDBC
Then you can go through the web service data record by record and for each record do the following:
fetch corresponding database record
for each field in record
    if updated
        execute corresponding update SQL statement
commit // every so many records
70K records per hour should not be the slightest problem for a decent RDBMS.

database audit table

I have an existing application that I am working w/ and the customer has defined the table structure they would like for an audit log. It has the following columns:
storeNo
timeChanged
user
tableChanged
fieldChanged
BeforeValue
AfterValue
Usually I just have simple audit columns on each table that provide a userChanged, and timeChanged value. The application that will be writing to these tables is a java application, and the calls are made via jdbc, on an oracle database. The question I have is what is the best way to get the before/after values. I hate to compare objects to see what changes were made to populate this table, this is not going to be efficient. If several columns change in one update, then this new table will have several entries. Or is there a way to do this in oracle? What have others done in the past to track not only changes but changed values?
This is traditionally what Oracle triggers are for. Each insert or update triggers a stored procedure which has access to the "before and after" data, which you can do with as you please, such as logging the old values to an audit table. It's transparent to the application.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:59412348055
If you use Oracle 10g or later, you can use built in auditing functions. You paid good money for the license, might as well use it.
Read more at http://www.oracle.com/technology/pub/articles/10gdba/week10_10gdba.html
"the customer has defined the table structure they would like for an audit log"
Dread words.
Here is how you would implement such a thing:
create or replace trigger emp_bur before update on emp for each row
begin
    -- Log a column only when its value changed (note: != is never true when either side is NULL).
    if :new.ename != :old.ename then
        insert_audit_record('EMP', 'ENAME', :old.ename, :new.ename);
    end if;
    if :new.sal != :old.sal then
        insert_audit_record('EMP', 'SAL', :old.sal, :new.sal);
    end if;
    if :new.deptno != :old.deptno then
        insert_audit_record('EMP', 'DEPTNO', :old.deptno, :new.deptno);
    end if;
end;
/
As you can see, it involves a lot of repetition, but that is easy enough to handle, with a code generator built over the data dictionary. But there are more serious problems with this approach.
It has a sizeable overhead: a single update which touches ten fields will generate ten insert statements.
The BeforeValue and AfterValue columns become problematic when we have to handle different datatypes - even dates and timestamps become interesting, let alone CLOBs.
It is hard to reconstruct the state of a record at a point in time. We need to start with the earliest version of the record and apply the subsequent changes incrementally.
It is not immediately obvious how this approach would handle INSERT and DELETE statements.
Now, none of those objections are a problem if the customer's underlying requirement is to monitor changes to a handful of sensitive columns: EMPLOYEES.SALARY, CREDIT_CARDS.LIMIT, etc. But if the requirement is to monitor changes to every table, a "whole record" approach is better: just insert a single audit record for each row affected by the DML.
I'll ditto on triggers.
If you have to do it at the application level, I don't see how it would be possible without going through these steps:
start a transaction
SELECT FOR UPDATE of the record to be changed
for each field to be changed, pick up the old value from the record and the new value from the program logic
for each field to be changed, write an audit record
update the record
end the transaction
If there's a lot of this, I think I would be creating an update-record function to do the compares, either at a generic level or a separate function for each table.
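A generic version of that compare function might look something like the following. It covers only the comparison step, producing one audit row per changed field in the customer's layout; AuditEntries and AuditRecord are invented names, values are flattened with String.valueOf for simplicity, and the SELECT FOR UPDATE, the inserts and the transaction handling are left to the caller.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class AuditEntries {
    /** One row of the audit table described in the question. */
    static final class AuditRecord {
        final String storeNo;
        final Instant timeChanged;
        final String user;
        final String tableChanged;
        final String fieldChanged;
        final String beforeValue;
        final String afterValue;

        AuditRecord(String storeNo, Instant timeChanged, String user, String tableChanged,
                    String fieldChanged, String beforeValue, String afterValue) {
            this.storeNo = storeNo;
            this.timeChanged = timeChanged;
            this.user = user;
            this.tableChanged = tableChanged;
            this.fieldChanged = fieldChanged;
            this.beforeValue = beforeValue;
            this.afterValue = afterValue;
        }
    }

    /** before holds the row as read from the DB; after holds the fields about to be written. */
    static List<AuditRecord> forUpdate(String storeNo, String user, String table, Instant when,
                                       Map<String, Object> before, Map<String, Object> after) {
        List<AuditRecord> records = new ArrayList<>();
        for (Map.Entry<String, Object> field : after.entrySet()) {
            Object oldValue = before.get(field.getKey());
            Object newValue = field.getValue();
            if (!Objects.equals(oldValue, newValue)) { // null-safe, unlike a plain equals() call
                records.add(new AuditRecord(storeNo, when, user, table, field.getKey(),
                        String.valueOf(oldValue), String.valueOf(newValue)));
            }
        }
        return records;
    }
}
```

The returned records can then be written with a single batched insert inside the same transaction as the update.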
