Notifying a Java application of Postgres changes

Problem
I'm building a Postgres database for a few hundred thousand products. I will set up an index (Solr, or maybe ElasticSearch) to improve query times for complex search queries.
The question now is: how do I keep the index synchronized with the database?
In the past I had an application that polled the database periodically to check for updates, but that leaves the index outdated for the interval between the database update and the polling run that picks it up.
I would prefer a solution in which the database notifies my application (a Java application) that something has changed, and the application then decides whether the index needs to be updated. To be more precise, I would build a producer/consumer structure in which a replica receives notifications from Postgres that something changed; if the change is relevant to the indexed data, it is stored in a queue of pending updates. The consumer would drain this queue and build the documents to be stored in the index.
Possible Solutions
One solution would be to write a kind of replica endpoint, in which the application behaves like a Postgres instance that is replicating the data from the original database. Does anyone have experience with this approach?
What other solutions do I have for this problem?

Use LISTEN and NOTIFY to tell your app that things have changed.
You can send the NOTIFY from a trigger that also records changes in a queue table.
You'll need a PgJDBC connection that has issued a LISTEN for the event(s) you're using. If you're using SSL, it must poll the database by sending periodic empty queries (""); if you are not using SSL, this can be avoided by using the driver's asynchronous notification checks. You'll need to unwrap the Connection object from your connection pool so you can cast the underlying connection to a PGConnection and use listen/notify with it. See the related answer.
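For illustration, here is a minimal sketch of that polling pattern with stock PgJDBC (the channel name index_updates and the connection details are my own placeholders, not part of the original setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class ListenPoller {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "postgres", "password")) {
            // unwrap the driver-specific interface to read pending notifications
            PGConnection pgConn = conn.unwrap(PGConnection.class);
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("LISTEN index_updates");
            }
            while (true) {
                // a no-op query makes the driver process incoming async messages
                try (Statement stmt = conn.createStatement()) {
                    stmt.execute("SELECT 1");
                }
                PGNotification[] notifications = pgConn.getNotifications();
                if (notifications != null) {
                    for (PGNotification n : notifications) {
                        System.out.println(n.getName() + ": " + n.getParameter());
                    }
                }
                Thread.sleep(500); // poll interval
            }
        }
    }
}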
The producer/consumer part will be harder. To have multiple crash-safe concurrent consumers in PostgreSQL you need to use advisory locking with pg_try_advisory_lock(...). If you don't need concurrent consumers, it's easy: you just SELECT ... LIMIT 1 FOR UPDATE one row at a time.
At the time of writing, the hope was that 9.4 would include an easier method of skipping locked rows with FOR UPDATE; that feature eventually landed in PostgreSQL 9.5 as FOR UPDATE ... SKIP LOCKED.
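A minimal single-consumer sketch of the queue-table side, assuming a hypothetical changes_queue(id, payload, done) table; on 9.5+ you could append SKIP LOCKED to the SELECT to get safe concurrent consumers:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Dequeue and process one pending change inside a transaction, so a crash
// before commit simply leaves the row to be picked up again.
static void consumeOne(Connection conn) throws SQLException {
    conn.setAutoCommit(false);
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT id, payload FROM changes_queue WHERE NOT done " +
             "ORDER BY id LIMIT 1 FOR UPDATE")) {
        if (rs.next()) {
            long id = rs.getLong("id");
            String payload = rs.getString("payload");
            // ... rebuild the affected index document from the payload ...
            try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE changes_queue SET done = true WHERE id = ?")) {
                upd.setLong(1, id);
                upd.executeUpdate();
            }
        }
        conn.commit();
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    }
}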

To use Postgres' LISTEN and NOTIFY you need a driver that supports asynchronous notifications. The stock Postgres JDBC driver does not deliver them asynchronously (it requires the polling described above).
To LISTEN on a channel continuously from an application server, go for the pgjdbc-ng 0.6 driver:
http://impossibl.github.io/pgjdbc-ng/
It supports async notifications, without polling.

In general, I would recommend implementing loose coupling using the EAI patterns. Then, if you decide to exchange the database, the code on the index side does not change.
In case you want to stick with tight coupling, I would recommend using LISTEN/NOTIFY. In Java, it is important to use the pgjdbc-ng driver, because it supports async notifications without polling.
Here's an asynchronous pattern (based on this answer):
import com.impossibl.postgres.api.jdbc.PGConnection;
import com.impossibl.postgres.api.jdbc.PGNotificationListener;
import com.impossibl.postgres.jdbc.PGDataSource;
import java.sql.Statement;

public static void listenToNotifyMessage() {
    PGDataSource dataSource = new PGDataSource();
    dataSource.setHost("localhost");
    dataSource.setPort(5432);
    dataSource.setDatabase("database_name");
    dataSource.setUser("postgres");
    dataSource.setPassword("password");

    PGNotificationListener listener = (int processId, String channelName, String payload)
            -> System.out.println("notification = " + payload);

    try (PGConnection connection = (PGConnection) dataSource.getConnection()) {
        Statement statement = connection.createStatement();
        statement.execute("LISTEN test");
        statement.close();
        connection.addNotificationListener(listener);
        // Notifications are only delivered while the connection is open,
        // so we keep the method alive with an endless loop here.
        while (true) {
            Thread.sleep(500);
        }
    } catch (Exception e) {
        System.err.println(e);
    }
}
From another connection or statement, you can now execute NOTIFY test, 'This is a payload';. You can also execute NOTIFY from triggers, etc.
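For example, the notification can be fired from any other plain JDBC connection (a sketch; the connection details are assumed):

try (Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/database_name", "postgres", "password");
     Statement stmt = conn.createStatement()) {
    // fires the listener registered on channel "test" above
    stmt.execute("NOTIFY test, 'This is a payload'");
}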

Related

Server overloading due to mysql and tomcat

I have a system with multiple sensors, and I need to collect data from each sensor every minute. I am using
final Runnable collector = new Runnable() {
    public void run() { ... }
};
scheduler.scheduleAtFixedRate(collector, 0, 1, TimeUnit.MINUTES);
to initiate the process every minute; it starts an individual thread for each sensor. Each thread opens a MySQL connection, gets the sensor's details from the database, opens a socket to collect data, stores the data in the database, and closes the socket and the DB connection. (I make sure all the connections are closed.)
Now there are other applications which I use to generate alerts and reports from that data.
Now, as the number of sensors increases, the server is getting overloaded and the applications are getting slow.
I need some expert advice on how to optimise my system and on the best way to implement this type of system. Should I use only one application to collect data, generate alarms, generate reports, generate chart images, etc.?
Thanks in advance.
Here is the basic code for data collector application
public class OnlineSampling {
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public void startProcess(int start) {
        try {
            final Runnable collector = new Runnable() {
                @SuppressWarnings("rawtypes")
                public void run() {
                    DataBase db = new DataBase();
                    db.connect("localhost");
                    try {
                        ArrayList instruments = new ArrayList();
                        // Check if the database is connected
                        if (db.isConnected()) {
                            String query = "SELECT instrumentID,type,ip,port,userID FROM onlinesampling WHERE status = 'free'";
                            instruments = db.getData(query, 5);
                            for (int i = 0; i < instruments.size(); i++) {
                                ...
                                OnlineSamplingThread comThread = new OnlineSamplingThread(userID, id, type, ip, port, gps, unitID, parameterID, timeZone, units, parameters, scaleFactors, offsets, storageInterval);
                                comThread.start();
                                // This OnlineSamplingThread opens the socket, collects the data and does a few more things
                            }
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        // Disconnect from the database
                        db.disconnect();
                    }
                }
            };
            scheduler.scheduleAtFixedRate(collector, 0, 60, TimeUnit.SECONDS);
        } catch (Exception e) {}
    }
}
UPDATED:
How many sensors do you have? We have around 400 sensors (and the number is increasing).
How long is the data-gathering session with each sensor? Each sensor has a small webserver with a SIM card to connect to the internet. It depends on the 3G network; in normal conditions it takes no more than 3.5 seconds.
Are you closing the network connections properly after you're done with a single sensor? I make sure I close the socket every time, and I have also set a timeout of 3.5 seconds on each socket.
What OS are you using to collect sensor data? We have our own protocol to communicate with the sensors, using socket programming.
Is it configured as a server or a desktop? Each sensor is a server.
What you probably need is connection pooling - instead of opening one DB connection per sensor, have a shared pool of opened connections that each thread uses when it needs to access the DB. That way, the number of connections can be much smaller than the number of your sensors (assuming that most of the time, the program will do other things than read/write into the DB, like communicate with the sensor or wait for sensor response).
If you don't use a framework that has connection pooling feature, you can try Apache Commons DBCP.
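A minimal DBCP 2 setup might look like this sketch (the JDBC URL, credentials, and pool sizes are assumptions to adapt):

import org.apache.commons.dbcp2.BasicDataSource;
import java.sql.Connection;

// One shared pool for the whole collector process
BasicDataSource pool = new BasicDataSource();
pool.setUrl("jdbc:mysql://localhost:3306/sensors");
pool.setUsername("collector");
pool.setPassword("secret");
pool.setMaxTotal(20); // far fewer connections than sensors
pool.setMaxIdle(10);

// Each worker borrows a connection and returns it promptly
try (Connection conn = pool.getConnection()) {
    // ... read sensor details / write samples ...
}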
Reuse any open files or sockets whenever you can. DBCP is a good start.
Reuse any threads if you can. That "comThread" is very suspect in that regard.
Consider adding queues to your worker threads. This will allow you to have threads that process tasks/jobs serially.
Profile, Profile, Profile!! You really have no idea what to optimize until you profile. JProfiler and YourKit are both very popular, but there are some free tools such as Netbeans and VisualVM.
Use Database caching such as Redis or Memcache
Consider using Stored Procedures versus inline queries
Consider using a Service Oriented Architecture or Micro-Services. Splitting each application function into a separate service which can be tightly optimized for that function.
These suggestions are based on the small amount of code you posted, but profiling should give you a much better idea.
Databases are made to handle loads way beyond "hundreds of inserts" per minute. In fact, a MySQL database can easily handle hundreds of inserts per second, so your problem is probably not load-related.
The first goal is to find out what is slow or what is collapsing. Run all the queries that your application runs and see if any of them are abnormally slow compared to the others. Alternatively, configure the Slow Query Log (https://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html) with parameters fitting your problem, and then analyze the output.
Once you find what the problem is, you can ask for help here with more information. We have no way to help you further with the information provided.
However, just as a hunch: what is the max_connections parameter value for your database? The default is 100 or 151, I think, so if you have more than 151 sensors connected to the database at the same time it will queue or drop new incoming connections. If that's your issue, you just have to minimise the time sensors stay connected to your database.
Your system is (almost certainly) slowing down because of the enormous overhead of starting threads, opening database connections, and then closing them. 300 sensors means five of these operations per second, continuously. That's too many.
Here's what you need to do to make this scalable.
First step
Make your sampling program long-running, rather than starting it over frequently.
Have it start a sensor thread for each 20 sensors (approximately).
Each thread will query its sensors one by one and insert the results into some sort of thread-safe data structure. A Bag or a Queue would be suitable.
When your sensor threads come to the end of each minute's work, make each of them sleep for the remaining time before the next minute starts, then start over.
Have your program start a single database-writing thread. That thread will open a database connection and hold it open. It will then take results from the queue and write them to the database, waiting when no results are available.
The database-writing thread should start a MySQL transaction, then INSERT some number of rows (ten to 100), then commit the transaction and start another one, rather than using the default autocommit behavior. (If you're using MyISAM tables, you don't need to do this.)
This will drastically improve your throughput and reduce your MySQL overhead.
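As a sketch of the writer thread described above (the samples table, the Sample class, and the batch size of 50 are assumptions, not part of the original answer):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.util.concurrent.BlockingQueue;

// Hypothetical value object filled in by the sensor threads.
class Sample {
    int sensorId;
    double value;
    Timestamp timestamp;
}

// Drains samples produced by the sensor threads and writes them in batches,
// committing every 50 rows instead of autocommitting each insert.
void writeLoop(Connection conn, BlockingQueue<Sample> queue) throws Exception {
    conn.setAutoCommit(false);
    try (PreparedStatement insert = conn.prepareStatement(
            "INSERT INTO samples (sensor_id, value, ts) VALUES (?, ?, ?)")) {
        int pending = 0;
        while (true) {
            Sample s = queue.take(); // blocks while no results are available
            insert.setInt(1, s.sensorId);
            insert.setDouble(2, s.value);
            insert.setTimestamp(3, s.timestamp);
            insert.addBatch();
            if (++pending == 50) {
                insert.executeBatch();
                conn.commit(); // one transaction per batch, not per row
                pending = 0;
            }
        }
    }
}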
Second step
When your workload gets too big for a single program instance with multiple sensor threads to handle, run multiple instances of the program, each with its own list of sensors.
Third step
When the workload gets too big for a single machine, add another one and run new instances of your program on that new machine.
Collecting data from hundreds of sensors should not pose a performance problem if done correctly. To scale this process you should carefully manage your database connections as well as your sensor connections and you should leverage queues for the sensor-sampling and sensor-data writing processes. If your sensor count is stable, you can cache the sensor connection data, possibly with periodic updates to your sensor connection cache.
Use a connection pool to talk to your database. Query your database for the sensor connection information, then release that connection back to the pool as soon as possible -- do not keep the database connection open while talking to the sensor. It's likely reading sensor connection data (which talks to your database) can be done in a single thread, and that thread creates sensor sampling jobs for your executor.
Within each sensor sampling job, open the HTTP sensor connection, collect sensor data, close HTTP sensor connection, and then create a sensor data write job to write the sampling data to the database. Assuming your sensors are distinct nodes, an HTTP sensor connection pool is not likely to help much because HTTP client and server connections are relatively light (unlike database connections).
Writing sensor-sampling data back to the database should also be made in a queue and these database write jobs should use your database connection pool.
With this design, you should be able to easily handle hundreds of sensors and likely thousands of sensors with modest hardware running a Linux server OS as the collector and a properly configured database.
I suggest you test these processes independently, so you know the sustainable rate of each step:
reading and caching sensor connection data and creating sampling jobs;
executing sampling jobs and creating write jobs; and
executing sample-data write jobs.
Let me know if you'd like code as well.
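Since code was offered, here is a rough sketch of the sampling stage feeding the write queue, reusing the Sample class from the earlier sketch; SensorInfo, readSensor(...), and the pool size are hypothetical placeholders:

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Stage 1 reads/caches sensor info and creates sampling jobs; stage 2 samples
// each sensor and enqueues a write job for the DB-writer stage.
abstract class SamplingPipeline {
    private final ExecutorService samplers = Executors.newFixedThreadPool(20);
    final BlockingQueue<Sample> writeQueue = new LinkedBlockingQueue<>();

    void scheduleSampling(List<SensorInfo> sensors) {
        for (final SensorInfo sensor : sensors) {
            // each job opens the sensor connection, reads, closes, then hands off
            samplers.submit(() -> writeQueue.offer(readSensor(sensor)));
        }
    }

    // hypothetical: opens the HTTP/socket connection to one sensor and reads a sample
    abstract Sample readSensor(SensorInfo sensor);
}

class SensorInfo { /* ip, port, credentials, ... */ }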

querying DB in a loop continuously in java

Is it advisable to query a database continuously in a loop, to pick up any new data that is added to a specific table?
I have below a piece of code:
while (true) {
    try {
        // get connection
        // execute only "SELECT" query
    } catch (Exception e) {
    } finally {
        // close connection
    }
    // sleep 5 seconds
}
It is a simple approach that works in many cases. Make sure that the select statement you use puts as little load as possible on the database.
The better (but more difficult to set up) variant would be to use some mechanism to be actively informed by the database about changes. Some databases can, for example, send information through a queuing mechanism, which in turn could be triggered by a database trigger.
Querying the database in a loop is not advisable, but if you need to do it you can daemonize your program.
If the interval is longer than 5 s, a timer would be appropriate.
For staying fully up to date:
Triggers and cascading inserts/deletes can propagate data inside the database itself.
Otherwise, before altering the database, publish messages to a message queue. This does not necessarily need to be a Message Queue product (capitals); it can be any kind of queue, such as a publish/subscribe mechanism or whatever.
On one hand, if your database has a low change rate it would be better to use or implement a notification system. Many RDBMSs have notification features (Oracle's Database Change Notification, Postgres' Asynchronous Notifications, ...), and if your RDBMS does not have them, they are easy to emulate using triggers (if your RDBMS supports them).
On the other hand, if the change rate is very high, then your polling solution is preferable. But you need to tune the interval carefully, and you should note that reading at intervals to detect changes has a negative side effect.
With a notification system it is easy to tell the program exactly what changed (a new row X inserted in table A, row Y updated in table B, ...).
But if you read your data at intervals, it is not easy to determine what has changed. Then you have two options:
a) you must not only read but also load/process all information every interval;
b) or you must not only read but also compare the database data with memory-resident data to determine what changed every interval.

On DBConnections reuse

Today:
I continue to work on an application which constantly creates and closes a LOT of (SQL) connections to the same (Oracle) database, using the same credentials, running very basic selects, updates, and inserts.
Currently, a singleton CustomConnectionManager checks its pool to see if any CustomConnectors are available to be handed out. Once handed out, they may be closed by the client. Upon close(), the underlying SQLConnection (created and maintained by the CustomConnector) is also closed. If no CustomConnector is available, a new CustomConnector is created.
The good thing about this is that the SQLConnection is reliably closed after each use; however, very little reuse is going on, because the value lies in the SQLConnection, not in the CustomConnector.
Since all users of the system connect to the same database, the same connections could be reused to accommodate all requests. The original solution, which created a new Connector for each request, seems very wasteful.
Proposed:
The singleton CustomConnectionManager will maintain two queues:
a queue of available CustomConnectors, each of which maintains its own SQLConnection, and
a queue of in-use CustomConnectors. Upon request, a new CustomConnector will be created and handed out.
Client interaction only happens with the singleton CustomConnectionManager.
When a new connection is needed, the manager creates it, gives it to the client, and places it in the in-use queue. When the client is done using the connection, instead of closing it, the client calls .markConnectorAvailable(), which puts it back into the availableConnectors queue (the client will no longer be able to control the underlying SQLConnection).
Question 1: What do you think? Will this work? Is there an existing solution that already does this well?
Question 2: If the proposed approach is not a complete waste, what is a good point for a CustomConnector to close its SQLConnection?
That's connection pooling. ADO.NET kills pooled connections after they have been idle for a set time (configurable in the connection string; the default is two minutes, as I remember).
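For reference, here is a rough sketch of the proposed two-queue manager built on a BlockingQueue (names follow the question; the Oracle URL is a placeholder, and in practice an existing pool such as Apache Commons DBCP usually does this better):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the proposed design: connectors are parked in an "available" queue
// instead of being closed, and tracked while handed out.
class CustomConnectionManager {
    private static final CustomConnectionManager INSTANCE = new CustomConnectionManager();
    private final BlockingQueue<Connection> available = new LinkedBlockingQueue<>();
    private final Set<Connection> inUse = ConcurrentHashMap.newKeySet();

    static CustomConnectionManager getInstance() { return INSTANCE; }

    Connection borrowConnection() throws SQLException {
        Connection conn = available.poll();
        if (conn == null) {
            conn = DriverManager.getConnection("jdbc:oracle:thin:@//host:1521/db", "user", "pw");
        }
        inUse.add(conn);
        return conn;
    }

    // called instead of close(); the connection goes back into the pool
    void markConnectorAvailable(Connection conn) {
        inUse.remove(conn);
        available.offer(conn);
    }
}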

Multiple database queries in parallel, for a single client request

To complete certain user requests, my application issues multiple DB queries from a single method, but they are currently executed sequentially, so the application is blocked until it has received the response/data for the previous query before proceeding to the next one. This is not something I like much. I would like to issue the queries in parallel.
Also, after issuing the queries I would like to do some other work (instead of being blocked until the previous queries respond), and on getting the response for each query I would like to execute a code block specific to that query's data. What is the way to do this?
Edit:My DB API does provide connection pooling.
I'm just a little bit familiar with Java multithreading.
Using:-
------
Java 1.6
Cassandra 1.1 Database with Hector
You should understand a few things before you start doing this:
To benefit from concurrency, you need to have multiple DB connections. The best way to solve this is to create a DB connection pool.
You need to create a Runnable/Callable class for executing a DB statement. You will need to put together some messaging system to alert listeners when your query has completed.
Understand that when you are sending multiple requests at the same time, all bets are off as to which will complete first, and there may be conflicts between statements that destabilize your app.
I had a similar task/issue. To assemble a complete result I need to send a few requests to a few different services (some over REST, some over Thrift), and to decrease latency I need to send them in parallel. My idea is to use java.util.concurrent.Future and make a simple aggregation manager which fires off many requests together, waits for the last response to arrive, and returns all the needed data. In a more advanced solution this manager could combine the final result while the other queries are still running, but that solution may not be thread safe.
Here's a very trivial/limited approach:
final Connection conn = ...;
final Object[] result = new Object[1];

Thread t1 = new Thread(new Runnable() {
    public void run() {
        // pseudocode: in real JDBC, create a Statement from conn and read its ResultSet here
        Object results = conn.executeQuery();
        result[0] = results;
    }
});
t1.setName("DBQueryWorker");
t1.start();

// do other work

t1.join(); // wait for thread one (a busy-wait on t1.isAlive() would burn CPU)
This is a simple approach, but many others are possible (eg, thread pooling via Java Concurrency task executors, Spring task executors, etc).
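For example, a Java 6-compatible variant with an executor and Futures (runQuery1/runQuery2 and handleQuery1/handleQuery2 are hypothetical placeholders for your own query calls and per-query handlers):

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

ExecutorService executor = Executors.newFixedThreadPool(4);

Future<List<String>> f1 = executor.submit(new Callable<List<String>>() {
    public List<String> call() throws Exception {
        return runQuery1(); // borrows its own connection from the pool
    }
});
Future<List<String>> f2 = executor.submit(new Callable<List<String>>() {
    public List<String> call() throws Exception {
        return runQuery2();
    }
});

// ... do other work while the queries run ...

handleQuery1(f1.get()); // blocks until query 1 is done
handleQuery2(f2.get());
executor.shutdown();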

Java: Do something on event in SQL Database?

I'm building an application with distributed parts.
Meaning, while one part (the writer) may be inserting or updating information in a database, the other part (the reader) is reading and acting on that information.
Now, I wish to trigger an action event in the reader and reload information from the DB whenever I insert something from the writer.
Is there a simple way about this?
Would this be a good idea? :
// READER
// READER
while (true) {
    connect();
    // reload info from DB
    executeQuery("select * from foo");
    disconnect();
}
EDIT : More info
This is a restaurant Point-of-sale system.
Whenever the cashier punches an order into the DB, the application in the kitchen gets a notification. This is the kind of system you would see at McDonald's.
The applications needn't be connected in any way, but each part will connect to a single MySQL server.
And I believe I'd expect immediate notifications.
You might consider setting up an embedded JMS broker in your application; I would recommend ActiveMQ, as it is super easy to embed.
For what you want to do, a JMS topic is a perfect fit. When the cashier punches in an order, the order is not written to the database directly but published as a message on the topic; let's name it newOrders.
On the topic there are two subscribers: NewOrderPersister and KitchenNotifier. These will each have an onMessage(Message msg) method which receives the details of the order. One saves it to the database; the other adds it to a screen, or yells it through the kitchen with text-to-speech, whatever.
The nice part of this is that the publisher does not need to know which and how many subscribers are waiting for the messages. So if you want a NewOrderCounter in the back office to keep a running count of how much money the owner has made today, or a FrenchFriesOrderListener to drive a special display near the deep fryer, nothing has to change in the rest of the application. They just subscribe to the topic.
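Here is a minimal embedded-ActiveMQ sketch of that setup (the vm:// URL starts a broker inside the JVM; the topic name and message contents are illustrative):

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class NewOrdersTopicDemo {
    public static void main(String[] args) throws Exception {
        // vm:// transport starts an embedded, non-persistent broker inside the JVM
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic newOrders = session.createTopic("newOrders");

        // KitchenNotifier-style subscriber: reacts to each published order
        MessageConsumer kitchen = session.createConsumer(newOrders);
        kitchen.setMessageListener(message -> {
            try {
                System.out.println("Kitchen sees: " + ((TextMessage) message).getText());
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // Cashier side: publish an order to the topic
        MessageProducer producer = session.createProducer(newOrders);
        producer.send(session.createTextMessage("1x burger, 2x fries"));

        Thread.sleep(1000); // give the listener a moment before shutting down
        connection.close();
    }
}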
The idea you are talking about is called "polling". As Graphain pointed out, you must add a delay in the loop. The amount of delay should be decided based on factors like how quickly your reader needs to detect changes in the database and how fast the writer is expected to insert or update data.
The next improvement to your solution could be a change indicator within the database. Your algorithm would look something like this:
// READER
while (true) {
    connect();
    // check the change counter instead of reloading everything
    change_count = executeQuery("select change_count from change_counters where counter = 'foo'");
    if (change_count > last_change_count) {
        last_change_count = change_count;
        reload();
    }
    disconnect();
}
The above change will ensure that you do not reload data unnecessarily.
You can further tune the solution to keep a row level change count so that you can reload only the updated rows.
I don't think it's a good idea to use a database to synchronize processes. The parties using the database should synchronize directly, i.e., the writer should write its orders and then notify the kitchen that there is a new order. Then again the notification could be the order itself (or some ID for the database). The notifications can be sent via a message broker.
It's more or less like in a real restaurant. The kitchen rings a bell when meals are finished and the waiters fetch them. They don't poll unnecessarily.
If you really want to use the database for synchronization, you should look into triggers and stored procedures. I'm fairly sure most RDBMSs allow the creation of stored procedures in Java or C that can do arbitrary things, like opening a socket and communicating with another computer. While this is possible and not as bad as polling, I still don't think it is a very good idea.
Well, to start with, you'd want some kind of wait timer in there, or it is literally going to poll as often as it possibly can, which would be a pretty bad idea unless you want to simulate what it would be like if Google were hosted on one database.
What kind of environment do the apps run in? Are we talking same machine notification, cross-network, over the net?
How frequently do updates occur and how soon does the reader need to know about them?
I have done something similar before using JGroups. I don't remember the exact details, as it was quite a few years ago, but I had a listener on the "writer" end which would then use JGroups to send out a notification of the change, causing the receivers to respond accordingly.
