Querying a DB in a loop continuously in Java

Is it advisable to query a database continuously in a loop, to get any new data that is added to a specific table?
I have the piece of code below:
while (true) {
    try {
        // get connection
        // execute only a "SELECT" query
    } catch (Exception e) {
        // at least log the exception instead of swallowing it
    } finally {
        // close connection
    }
    // sleep 5 seconds before the next poll
}

It is a simple approach that works in many cases. Make sure that the SELECT statement you use puts as little load as possible on the database.
A better (but more difficult to set up) variant would be to use some mechanism to get actively informed by the database about changes. Some databases can, for example, send information through a queuing mechanism, which in turn can be triggered by a database trigger.

Querying a database continuously in a loop is not advisable, but if you really need to do it you can daemonize your program.

If the interval is longer than 5 s, a timer would be appropriate.
For staying fully up to date:
Triggers and cascading inserts/deletes can propagate data inside the database itself.
Otherwise, issue a message to a queue before altering the database. This does not necessarily need to be a Message Queue product (with capitals); it can be any kind of queue, such as a publish/subscribe mechanism.

On one hand, if your database has a low change rate, it would be better to use or implement a notification system. Many RDBMSs have notification features (Oracle's Database Change Notification, Postgres' Asynchronous Notifications, ...; see the sketch below), and if your RDBMS does not have them, they are easy to implement or emulate using triggers (if your RDBMS supports them).
On the other hand, if the change rate is very high, then your solution is preferable. But you need to tune the interval carefully, and you must note that polling at intervals to detect changes has a negative side effect.
With a notification system it is easy to tell the program exactly what changed (a new row X inserted into table A, row Y updated in table B, …).
But if you read your data at intervals, it is not easy to determine what has changed. You then have two options:
a) not only read, but also load and process all the information every interval;
b) or not only read, but also compare the database data with memory-resident data to determine what has changed every interval.
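To make the notification route concrete, here is a minimal sketch using Postgres' Asynchronous Notifications through the pgjdbc driver. The channel name, connection details, and the trigger that would call pg_notify are assumptions for illustration, not part of the question:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class TableChangeListener {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass")) {
            // A trigger on the watched table would call pg_notify('table_changed', ...)
            try (Statement st = con.createStatement()) {
                st.execute("LISTEN table_changed");
            }
            PGConnection pgCon = con.unwrap(PGConnection.class);
            while (true) {
                // Block for up to 5 s waiting for the server to push a notification
                PGNotification[] notes = pgCon.getNotifications(5000);
                if (notes != null) {
                    for (PGNotification n : notes) {
                        System.out.println("Changed: " + n.getParameter());
                    }
                }
            }
        }
    }
}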

Related

Can I have local state in a Kafka Processor?

I've been reading a bit about the Kafka concurrency model, but I still struggle to understand whether I can have local state in a Kafka Processor, or whether that will fail in bad ways?
My use case is: I have a topic of updates, I want to insert these updates into a database, but I want to batch them up first. I batch them inside a Java ArrayList inside the Processor, and send them and commit them in the punctuate call.
Will this fail in bad ways? Am I guaranteed that the ArrayList will not be accessed concurrently?
I realize that there will be multiple Processors and multiple ArrayLists, depending on the number of threads and partitions, but I don't really care about that.
I also realize I will lose the ArrayList if the application crashes, but I don't care if some events are inserted twice into the database.
This works fine in my simple tests, but is it correct? If not, why?
What you use for local state in your Kafka consumer application is up to you, and only the current thread/consumer will access the local state data in your ArrayList. If you have multiple threads, one per Kafka consumer, each thread can have its own private ArrayList or HashMap to store state in. You could also use something like a local RocksDB database for persistent local state.
A few things to look out for:
If you're batching updates together to send to the DB, are those updates in any way related, say, because they're part of a transaction? If they are, you might run into problems splitting them up. An easy way to keep them together is to set a key for your messages, such as a transaction ID or some other unique identifier for the transaction; that way all the updates with that transaction ID will end up in one specific partition, so whoever consumes them is sure to always have the complete set of updates for that transaction.
How are you validating that you got ALL the transactions before your batch update? Again, this is important if you're dealing with database updates inside transactions. You could simply wait for a pre-determined amount of time to ensure you have all the updates (say, maybe 30 seconds is enough in your case). Or maybe you send an "EndOfTransaction" message that details how many messages you should have gotten, as well as maybe a CRC or hash of the messages themselves. That way, when you get it, you can either use it to validate you have all the messages already, or you can keep waiting for the ones that you haven't gotten yet.
Make sure you don't commit the messages you're keeping in memory to Kafka until after you've batched them, sent them to the database, and confirmed that the updates went through successfully. This way, if your application dies, the next time it comes back up it will receive again the messages you haven't yet committed to Kafka.
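To make that last point concrete, here is a minimal sketch of a consumer that batches records in memory and only commits offsets after a (hypothetical) database write succeeds. The topic name, flush threshold, and writeBatchToDatabase helper are assumptions:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "db-writer");
        props.put("enable.auto.commit", "false"); // commit manually after the DB write
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("updates")); // topic name is hypothetical
            List<String> batch = new ArrayList<>();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    batch.add(r.value());
                }
                if (batch.size() >= 100) {       // flush threshold is arbitrary
                    writeBatchToDatabase(batch); // hypothetical helper: insert and confirm
                    consumer.commitSync();       // only now mark the messages as consumed
                    batch.clear();
                }
            }
        }
    }

    private static void writeBatchToDatabase(List<String> batch) {
        // insert the batch in one database transaction; omitted for brevity
    }
}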

Constantly check database [duplicate]

I'm using JDBC and need to constantly check the database against changing values.
What I have currently is an infinite loop running, with an inner loop iterating over changing values, and each iteration checking against the database.
public void runInBG() { // this method is called from another thread
    while (true) {
        while (els.hasElements()) {
            Test el = (Test) els.next();
            // NOTE: concatenating values into SQL invites injection; a PreparedStatement would be safer
            String sql = "SELECT * FROM Test WHERE id = '" + el.getId() + "'";
            // getTestRecord() makes the connection, calls executeQuery(), etc.,
            // and returns a Record object with the values
            Record r = db.getTestRecord(sql);
            if (r != null) {
                // do something
            }
        }
    }
}
I think this isn't the best way.
The other way I'm considering is the reverse: keep iterating over the database.
UPDATE
Thank you for the feedback regarding timers, but I don't think it will solve my problem.
Once a change occurs in the database I need to process the results almost instantaneously against the changing values ("els" from the example code).
Even if the database does not change it still has to check constantly against the changing values.
UPDATE 2
OK, to anyone interested in the answer: I believe I have the solution now. Basically, the solution is NOT to use the database for this. Load in, update, add, etc. only what's needed from the database into memory.
That way you don't have to open and close the database constantly; you only deal with the database when you make a change to it, reflect those changes back into memory, and then work only with whatever is in memory at the time.
Sure, this is more memory intensive, but performance is absolutely key here.
As to the periodic "timer" answers, I'm sorry, but this is not right at all. Nobody has explained how the use of timers would solve this particular situation.
But thank you again for the feedback; it was still helpful.
Another possibility would be using ScheduledThreadPoolExecutor.
You could implement a Runnable containing your logic and register it to the ScheduledExecutorService as follows:
ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(10);
executor.scheduleAtFixedRate(myRunnable, 0, 5, TimeUnit.SECONDS);
The code above creates a ScheduledThreadPoolExecutor with 10 threads in its pool and registers a Runnable to it that will run every 5 seconds, starting immediately.
To schedule your runnable you could use:
scheduleAtFixedRate
Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given period; that is, executions will commence after initialDelay, then initialDelay + period, then initialDelay + 2 * period, and so on.
scheduleWithFixedDelay
Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given delay between the termination of one execution and the commencement of the next.
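A minimal sketch contrasting the two modes; the polling body and pool size are placeholders:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PollingScheduler {
    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

        Runnable pollDb = () -> {
            // query the database for new rows here
        };

        // Fixed rate: a run starts every 5 s, regardless of how long each run takes
        executor.scheduleAtFixedRate(pollDb, 0, 5, TimeUnit.SECONDS);

        // Fixed delay: 5 s pause between the end of one run and the start of the next
        // executor.scheduleWithFixedDelay(pollDb, 0, 5, TimeUnit.SECONDS);
    }
}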
Here you can see the advantages of ThreadPoolExecutor and judge whether it fits your requirements. I also recommend the question Java Timer vs ExecutorService? to help you make a good decision.
Keeping the while(true) in runInBG() is a bad idea; you'd better remove it. Instead you can have a scheduler/timer (use Timer and TimerTask) which calls runInBG() periodically and checks for updates in the DB.
You could use a Timer:
Timer timer = new Timer("runInBG");
// MyClass extends TimerTask and contains your repeated work in its run() method
MyClass t = new MyClass();
timer.schedule(t, 0, 2000); // run every 2 seconds, starting immediately
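For completeness, a runnable sketch of the same idea, assuming MyClass extends TimerTask:

import java.util.Timer;
import java.util.TimerTask;

public class MyClass extends TimerTask {
    @Override
    public void run() {
        // poll the database for updates here
    }

    public static void main(String[] args) {
        Timer timer = new Timer("runInBG");
        timer.schedule(new MyClass(), 0, 2000); // every 2 s, starting immediately
    }
}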
As you said in the comment above, if the application controls the updates and inserts, then you can create a framework which notifies the 'BG' thread or process about changes in the database. Notification can be over the network via JMS, intra-VM using the observer pattern, or both local and remote.
You can have a generic notification message like this (it can be a class for local notifications or a text message for remote notifications):
<Notification>
    <Type>update/insert</Type>
    <Entity>
        <Name>Account/Customer</Name>
        <Id>id</Id>
    </Entity>
</Notification>
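For the intra-VM case, a minimal observer-pattern sketch mirroring the message above; all class and method names here are assumptions:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ChangeNotifier {
    public interface ChangeListener {
        void onChange(String type, String entityName, long id);
    }

    private final List<ChangeListener> listeners = new CopyOnWriteArrayList<>();

    public void addListener(ChangeListener l) {
        listeners.add(l);
    }

    // Called by the writer after it has committed an insert/update
    public void publish(String type, String entityName, long id) {
        for (ChangeListener l : listeners) {
            l.onChange(type, entityName, id);
        }
    }
}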
To avoid a 'busy loop', I would try to use triggers. H2 also supports a DatabaseEventListener API, that way you wouldn't have to create a trigger for each table.
This may not always work, for example if you use a remote connection.

How to iterate over db records correctly with Hibernate

I want to iterate over records in the database and update them. However, since the updating both takes some time and is prone to errors, I need to a) not keep the db waiting (as e.g. with a ScrollableResults) and b) commit after each update.
Second thing is that this is done in multiple threads, so I need to ensure that if thread A is taking care of a record, thread B is getting another one.
How can I implement this sensibly with hibernate?
To give a better idea, the following code would be executed by several threads, where all threads share a single instance of the RecordIterator:
Iterator<Record> iter = db.getRecordIterator();
while (iter.hasNext()) {
    Record rec = iter.next();
    // do something lengthy here
    db.save(rec);
}
So my question is how to implement the RecordIterator. If I perform a query on every next(), how do I ensure that I don't return the same record twice? If I don't, which query should I use to return detached objects? Is there a flaw in the general approach (e.g. use one RecordIterator per thread and let the db somehow handle synchronization)? Additional info: there are way too many records to keep them locally (e.g. in a set of treated records).
Update: Because the overall process takes some time, the status of Records can change in the meantime, and so the ordering of a query's results can change too. I guess to solve this problem I have to mark records in the database once I return them for processing...
Hmmm, what about pushing your objects from a reader thread into a bounded blocking queue, and letting your updater threads read from that queue?
In your reader, do some paging with setFirstResult/setMaxResults. E.g. if you have at most 1000 elements in your queue, fill them up 500 at a time. When the queue is full, the next push will automatically wait until the updaters take the next elements.
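A sketch of that reader/updater split; the Record type and the fetchPage helper (which with Hibernate would be a query using setFirstResult and setMaxResults) are assumptions:

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PagedProducer {
    private static final int PAGE_SIZE = 500;
    private final BlockingQueue<Record> queue = new ArrayBlockingQueue<>(1000);

    // Reader thread: pages through the table and pushes rows into the bounded queue.
    // put() blocks when the queue is full, so the updaters throttle the reader.
    public void produce() throws InterruptedException {
        int first = 0;
        List<Record> page;
        while (!(page = fetchPage(first, PAGE_SIZE)).isEmpty()) {
            for (Record r : page) {
                queue.put(r);
            }
            first += PAGE_SIZE;
        }
    }

    // Updater threads call this in a loop; take() blocks until work is available
    public Record nextRecord() throws InterruptedException {
        return queue.take();
    }

    // Hypothetical helper: with Hibernate this would be a query using
    // setFirstResult(first) and setMaxResults(max)
    private List<Record> fetchPage(int first, int max) {
        return List.of(); // placeholder
    }

    static class Record {}
}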
My suggestion, since you're sharing an instance of the master iterator, would be to run all of your threads using a shared Hibernate transaction, with one load at the beginning and one big save at the end. You load all of your data into a single Set which you can iterate over using your threads (be careful of locking: you might want to split off a section for each thread, or somehow manage the shared resource so that you don't overlap).
The beauty of the Hibernate solution is that the records aren't immediately saved to the database: since you're using a transaction, they are stored in Hibernate's cache and are all written back to the database at once at the end. This saves on those expensive database writes you're worried about, and it gives you an actual object to work with on each iteration instead of just a database row.
I see in your update that the status of the records may change during processing, and this could always cause a problem. If this is a constantly running or long-running process, then my advice with a Hibernate solution would be to work in smaller sets and, yes, add a flag to mark records that have been updated, so that when you move to the next set you can pick up the ones that haven't been touched.
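One way to implement that flag so two threads never pick up the same record: claim a batch with an atomic UPDATE before reading it back. The table and column names are assumptions, and UPDATE ... LIMIT is MySQL-specific:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class RecordClaimer {
    // Atomically mark up to batchSize NEW rows as belonging to this worker,
    // then read back exactly the rows this worker claimed.
    static List<Integer> claimBatch(Connection con, String workerId, int batchSize)
            throws SQLException {
        try (PreparedStatement claim = con.prepareStatement(
                "UPDATE records SET status = 'IN_PROGRESS', worker = ? "
                + "WHERE status = 'NEW' LIMIT ?")) {
            claim.setString(1, workerId);
            claim.setInt(2, batchSize);
            claim.executeUpdate();
        }
        List<Integer> ids = new ArrayList<>();
        try (PreparedStatement read = con.prepareStatement(
                "SELECT id FROM records WHERE status = 'IN_PROGRESS' AND worker = ?")) {
            read.setString(1, workerId);
            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getInt("id"));
                }
            }
        }
        return ids;
    }
}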

Java: Do something on event in SQL Database?

I'm building an application with distributed parts. That is, while one part (the writer) may be inserting and updating information in a database, another part (the reader) is reading off and acting on that information.
Now, I wish to trigger an action event in the reader and reload information from the DB whenever I insert something from the writer.
Is there a simple way to do this?
Would this be a good idea?
// READER
while (true) {
    connect();
    // reload info from DB
    executeQuery("select * from foo");
    disconnect();
}
EDIT: More info
This is a restaurant point-of-sale system.
Whenever the cashier punches an order into the DB, the application in the kitchen gets a notification. This is the kind of system you would see at McDonald's.
The applications needn't be connected in any way, but each part will connect to a single MySQL server.
And I believe I'd expect immediate notifications.
You might consider setting up an embedded JMS server in your application, I would recommend ActiveMQ as it is super easy to embed.
For what you want to do, a JMS topic is a perfect fit. When the cashier punches in an order, the order is not written to the database but put in a message on the topic; let's name it newOrders.
On the topic there are two subscribers: NewOrderPersister and KitchenNotifier. Each has an onMessage(Message msg) method which receives the details of the order. One saves it to the database; the other adds it to a screen, or yells it through the kitchen with text-to-speech, whatever.
The nice part of this is that the publisher does not need to know which subscribers are waiting for the messages, or how many. So if you want a NewOrderCounter in the back office to keep a running count of how much money the owner has made today, or a FrenchFriesOrderListener to drive a special display near the deep fryer, nothing has to change in the rest of the application. They just subscribe to the topic.
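A minimal sketch of the embedded-broker setup with the newOrders topic from the answer; the vm:// transport shortcut and the order text are assumptions:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class OrderTopicDemo {
    public static void main(String[] args) throws Exception {
        // The vm:// transport starts an embedded, non-persistent broker in-process
        ConnectionFactory cf =
                new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
        Connection con = cf.createConnection();
        con.start();

        // JMS sessions are single-threaded, so use one per producer/consumer
        Session subSession = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Session pubSession = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic newOrders = subSession.createTopic("newOrders");

        // KitchenNotifier subscriber: reacts to every published order
        MessageConsumer kitchen = subSession.createConsumer(newOrders);
        kitchen.setMessageListener(msg -> {
            try {
                System.out.println("Kitchen: " + ((TextMessage) msg).getText());
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });

        // The cashier publishes an order to the topic
        MessageProducer cashier = pubSession.createProducer(newOrders);
        cashier.send(pubSession.createTextMessage("1x Burger, 2x Fries"));
    }
}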
The idea you are talking about is called "polling". As Graphain pointed out, you must add a delay in the loop. The amount of delay should be decided based on factors like how quickly your reader needs to detect changes in the database and how fast the writer is expected to insert/update data.
The next improvement to your solution would be a change indicator within the database. Your algorithm would look something like this:
// READER
while (true) {
    connect();
    // check whether anything changed since the last poll
    change_count = executeQuery("select change_count from change_counters where counter = 'foo'");
    if (change_count > last_change_count) {
        last_change_count = change_count;
        reload();
    }
    disconnect();
}
The above change ensures that you do not reload data unnecessarily.
You can further tune the solution by keeping a row-level change count, so that you can reload only the updated rows, as sketched below.
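A sketch of that row-level variant; the foo schema, the change_count column (bumped by the writer or a trigger), and the processRow callback are assumptions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class IncrementalReloader {
    private long lastSeen = 0;

    // Reload only the rows whose change_count advanced since the last poll
    void reloadChanged(Connection con) throws SQLException {
        String sql = "SELECT id, data, change_count FROM foo WHERE change_count > ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, lastSeen);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    processRow(rs.getInt("id"), rs.getString("data"));
                    lastSeen = Math.max(lastSeen, rs.getLong("change_count"));
                }
            }
        }
    }

    private void processRow(int id, String data) {
        // update the in-memory copy of the row here
    }
}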
I don't think it's a good idea to use a database to synchronize processes. The parties using the database should synchronize directly; i.e., the writer should write its orders and then notify the kitchen that there is a new order. The notification could even be the order itself (or some ID from the database). The notifications can be sent via a message broker.
It's more or less like in a real restaurant. The kitchen rings a bell when meals are finished and the waiters fetch them. They don't poll unnecessarily.
If you really want to use the database for synchronization, you should look into triggers and stored procedures. I'm fairly sure most RDBMSs allow the creation of stored procedures in Java or C that can do arbitrary things, like opening a socket and communicating with another computer. While this is possible, and not as bad as polling, I still don't think it's a very good idea.
Well, to start with, you'd want some kind of wait timer in there, or it will literally poll as fast as it can, which would be a pretty bad idea unless you want to simulate what it would be like if Google were hosted on one database.
What kind of environment do the apps run in? Are we talking same machine notification, cross-network, over the net?
How frequently do updates occur and how soon does the reader need to know about them?
I have done something similar before using JGroups. I don't remember the exact details as it was quite a few years ago, but I had a listener on the "writer" end which would use JGroups to send out a notification of the change, causing the receivers to respond accordingly.

Help with Java threads or executors: executing several MySQL selects, inserts and updates simultaneously

I'm writing an application to analyse a MySQL database, and I need to execute several DMLs simultaneously; for example:
// In ResultSet rsA: SELECT * FROM A;
rsA.beforeFirst();
while (rsA.next()) {
    id = rsA.getInt("id");
    // Retrieve data from table B: SELECT * FROM B WHERE B.Id = <id>
    // Crunch some numbers using the data from B
    // Close ResultSet B
}
I'm declaring an array of data objects, each with its own Connection to the database, which in turn calls several methods for the data analysis. The problem is that all threads use the same connection, so all tasks throw exceptions: "Lock wait timeout exceeded; try restarting transaction".
I believe there is a way to write the code in such a way that any given object has its own connection and executes the required tasks independent from any other object. For example:
// The DataObject class has its own connection to the database,
// so each instance should use its own connection.
// It also has a "run" method, which contains all the required tasks.
DataObject[] dataObjects = new DataObject[N + 1];
for (int i = 0; i <= N; i++) {
    dataObjects[i] = new DataObject(id[i]);
}

Executor ex = Executors.newFixedThreadPool(10);
for (int i = 0; i <= N; i++) {
    ex.execute(dataObjects[i]);
}
// Here is where the problem lies: each instance creates a new connection,
// but every DML from any of the objects is funnelled through just one connection
// (in the MySQL command line, "SHOW PROCESSLIST;" shows every connection,
// and all but one are idle).
Can you point me in the right direction?
Thanks
I think the problem is that you've conflated a lot of middle-tier, transactional, and persistence logic into one class.
If you're dealing directly with ResultSet, you're not thinking about things in a very object-oriented fashion.
You're smart if you can figure out how to get the database to do some of your calculations.
If not, I'd recommend keeping Connections open for the minimum time possible. Open a Connection, get the ResultSet, map it into an object or data structure, close the ResultSet and Connection in local scope, and return the mapped object/data structure for processing.
You keep persistence and processing logic separate this way. You save yourself a lot of grief by keeping connections short-lived.
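A sketch of that shape: the Connection and ResultSet live only inside the method, and the caller only ever gets back plain mapped objects. The table, columns, and connection details are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BDao {
    private final String url = "jdbc:mysql://localhost/mydb"; // assumption
    private final String user = "user", pass = "pass";        // assumption

    // Connection and ResultSet are opened and closed in local scope;
    // only the mapped objects escape the method
    List<BRow> findByAId(int aId) throws SQLException {
        String sql = "SELECT id, value FROM B WHERE a_id = ?";
        try (Connection con = DriverManager.getConnection(url, user, pass);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, aId);
            try (ResultSet rs = ps.executeQuery()) {
                List<BRow> rows = new ArrayList<>();
                while (rs.next()) {
                    rows.add(new BRow(rs.getInt("id"), rs.getDouble("value")));
                }
                return rows;
            }
        }
    }

    static class BRow {
        final int id;
        final double value;
        BRow(int id, double value) { this.id = id; this.value = value; }
    }
}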
If a stored procedure solution is slow it could be due to poor indexing. Another solution will perform equally poorly if not worse. Try running EXPLAIN PLAN and see if any of your queries are using TABLE SCAN. If yes, you have some indexes to add. It could also be due to large rollback logs if your transactions are long-running. There's a lot you could and should do to ensure you've done everything possible with the solution you have before switching. You could go to a great deal of effort and still not address the root cause.
After some brain-breaking I figured out my own mistakes... I want to share this new knowledge, so here I go.
I made a very big mistake by declaring the Connection object as a static field in my code. So obviously, even though I created a new Connection for each new data object, every transaction went through a single, static connection.
With that first issue corrected, I went back to the drawing board and realized that my process was:
Read an Id from an input table
Take a block of data related to the Id read in step 1, stored in other input tables
Crunch numbers: Read the related input tables and process the data stored in them
Save the results in one or more output tables
Repeat the process while I have pending Ids in the input table
Just by using a dedicated connection for input reading and a dedicated connection for output writing, the performance of my program increased... but I needed a lot more!
My original approach for steps 3 and 4 was to save each result into the output as soon as I had it, but I found a better approach:
Read the input data
Crunch the numbers, and put the results in a bunch of queues (one for each output table)
A separate thread checks every second whether there's data in any of the queues; if there is, it writes it to the tables
So, by dividing input and output tasks over different connections, redirecting the core process output to queues, and using a dedicated thread for output storage tasks, I finally achieved what I wanted: multithreaded DML execution!
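A sketch of that queue-plus-writer-thread idea, using a blocking take instead of the once-a-second check described above; the table schema and the Result type are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OutputWriter implements Runnable {
    private final BlockingQueue<Result> queue = new LinkedBlockingQueue<>();

    // Called by the number-crunching worker threads
    public void submit(Result r) {
        queue.offer(r);
    }

    // The single writer thread owns its own connection, so the workers'
    // reads and the writes never share one connection
    @Override
    public void run() {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO output_table (id, value) VALUES (?, ?)")) {
            while (!Thread.currentThread().isInterrupted()) {
                Result r = queue.take(); // blocks until a result is available
                ps.setInt(1, r.id);
                ps.setDouble(2, r.value);
                ps.executeUpdate();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static class Result {
        final int id;
        final double value;
        Result(int id, double value) { this.id = id; this.value = value; }
    }
}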
I know there are better approaches to this particular problem, but this one works quite fine.
So... if anyone is stuck with a problem like this... I hope this helps.
