I have a system with multiple sensors, and I need to collect data from each sensor every minute. I am using
final Runnable collector = new Runnable() { public void run() { ... } };
scheduler.scheduleAtFixedRate(collector, 0, 1, TimeUnit.MINUTES);
to initiate the process every minute; it starts an individual thread for each sensor. Each thread opens a MySQL connection, gets the sensor's details from the database, opens a socket to collect the data, stores the data in the database, and then closes the socket and the DB connection. (I make sure all the connections are closed.)
Now there are other applications which I use to generate alerts and reports from that data.
Now, as the number of sensors increases, the server is starting to get overloaded and the applications are getting slow.
I need some expert advice on how to optimise my system and the best way to implement this type of system. Should I use only one application to do everything (collect data + generate alarms + generate reports + generate chart images, etc.)?
Thanks in advance.
Here is the basic code for the data collector application:
import java.util.ArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OnlineSampling
{
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public void startProcess(int start)
    {
        try
        {
            final Runnable collector = new Runnable()
            {
                @SuppressWarnings("rawtypes")
                public void run()
                {
                    DataBase db = new DataBase();
                    db.connect("localhost");
                    try
                    {
                        ArrayList instruments = new ArrayList();
                        // Check if the database is connected
                        if (db.isConnected())
                        {
                            String query = "SELECT instrumentID,type,ip,port,userID FROM onlinesampling WHERE status = 'free'";
                            instruments = db.getData(query, 5);
                            for (int i = 0; i < instruments.size(); i++)
                            {
                                ...
                                OnlineSamplingThread comThread = new OnlineSamplingThread(userID, id, type, ip, port, gps, unitID, parameterID, timeZone, units, parameters, scaleFactors, offsets, storageInterval);
                                comThread.start();
                                // This OnlineSamplingThread opens the socket, collects the data and does a few more things
                            }
                        }
                    }
                    catch (Exception e)
                    {
                        e.printStackTrace();
                    }
                    finally
                    {
                        // Disconnect from the database
                        db.disconnect();
                    }
                }
            };
            scheduler.scheduleAtFixedRate(collector, 0, 60, TimeUnit.SECONDS);
        }
        catch (Exception e) {}
    }
}
UPDATED:
How many sensors do you have? We have around 400 sensors (increasing).
How long is the data-gathering session with each sensor?
Each sensor has a small webserver with a SIM card in it to connect to the internet. It depends on the 3G network; in normal conditions it does not take more than 3.5 seconds.
Are you closing the network connections properly after you're done with a single sensor? I make sure I close the socket every time, and I have also set the timeout duration for each socket, which is 3.5 seconds.
What OS are you using to collect sensor data? We have our own protocol to communicate with the sensors using socket programming.
Is it configured as a server or a desktop? Each sensor is a server.
What you probably need is connection pooling - instead of opening one DB connection per sensor, have a shared pool of opened connections that each thread uses when it needs to access the DB. That way, the number of connections can be much smaller than the number of your sensors (assuming that most of the time, the program will do other things than read/write into the DB, like communicate with the sensor or wait for sensor response).
If you don't use a framework that has a connection pooling feature, you can try Apache Commons DBCP.
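If it helps, here is a minimal sketch of a shared DBCP2 pool (the URL, credentials and pool sizes below are placeholders, not values from your setup):

import java.sql.Connection;
import java.sql.SQLException;
import org.apache.commons.dbcp2.BasicDataSource;

public class SensorDbPool {
    // One shared pool for the whole collector application.
    private static final BasicDataSource POOL = new BasicDataSource();
    static {
        POOL.setUrl("jdbc:mysql://localhost/sensors");   // placeholder URL
        POOL.setUsername("collector");                   // placeholder credentials
        POOL.setPassword("secret");
        POOL.setMaxTotal(20);   // far fewer connections than sensors
        POOL.setMaxIdle(10);
    }

    public static Connection borrow() throws SQLException {
        // close() on the returned connection hands it back to the pool
        return POOL.getConnection();
    }
}

Each sensor thread would then call SensorDbPool.borrow() in a try-with-resources block instead of opening its own connection.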
Reuse any open files or sockets whenever you can. DBCP is a good start.
Reuse any threads if you can. That "comThread" is very suspect in that regard.
Consider adding queues to your worker threads. This will allow you to have threads that process tasks/jobs serially (see the sketch after this list).
Profile, Profile, Profile!! You really have no idea what to optimize until you profile. JProfiler and YourKit are both very popular, but there are some free tools such as Netbeans and VisualVM.
Use Database caching such as Redis or Memcache
Consider using Stored Procedures versus inline queries
Consider using a Service Oriented Architecture or Micro-Services. Splitting each application function into a separate service which can be tightly optimized for that function.
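For the "reuse threads / add queues" points above, a rough sketch (OnlineSamplingTask is a hypothetical Runnable that wraps the existing per-sensor logic from OnlineSamplingThread):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Created once at startup instead of spawning a new thread per sensor every minute.
ExecutorService sensorPool = Executors.newFixedThreadPool(20);

// Inside the scheduled collector, submit one job per sensor; the pool's internal
// queue serializes the work once all 20 threads are busy.
sensorPool.submit(new OnlineSamplingTask(userID, id, type, ip, port /* , ... */));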
These are from the small amount of code you posted, but profiling should give you a much better idea.
Databases are made to handle loads of way more than "hundreds of inserts" per minute. In fact, a MySQL database can easily handle hundreds of inserts per second, so your problem is probably not related to the load.
The first goal is to find out "what is slow" or "what is collapsing": run all the queries that your application runs and see if any of them are abnormally slow compared to the others. Alternatively, configure the Slow Query Log (https://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html) with parameters that fit your problem, and then analyze the output.
Once you find "what" the problem is, you can ask for help here, laying out more information. We have no way to help you with only the information provided so far.
However, just as a hunch: what is the max_connections parameter value for your database? The default value is 100 or 151, I think, so if you have more than 151 sensors connected to the database at the same time it will queue or drop the new incoming connections. If that's your issue, you just have to minimise the time sensors stay connected to your database and it will fix the problem.
Your system is (almost certainly) slowing down because of the enormous overhead of starting threads, opening database connections, and then closing them. 300 sensors means five of these operations per second, continuously. That's too many.
Here's what you need to do to make this scalable.
First step
Make your sampling program long-running, rather than starting it over frequently.
Have it start one sensor thread for every 20 sensors (approximately).
Each thread will query its sensors one by one and insert the results into some sort of thread-safe data structure. A Bag or a Queue would be suitable.
When your sensor threads come to the end of each minute's work, make each of them sleep for the remaining time before the next minute starts, then start over.
Have your program start a single database-writing thread. That thread will open a database connection and hold it open. It will then take results from the queue and write them to the database, waiting when no results are available.
The database-writing thread should start a MySQL transaction, then INSERT some number of rows (ten to 100), then commit the transaction and start another one, rather than using the default autocommit behavior. (If you're using MyISAM tables, you don't need to do this.) A sketch of the queue and the batching writer follows below.
This will drastically improve your throughput and reduce your MySQL overhead.
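Here is a rough sketch of the queue plus single database-writing thread described above; Sample, openConnection() and the table/column names are placeholders for your own code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Shared by the sensor threads (producers) and the single writer thread (consumer).
BlockingQueue<Sample> results = new LinkedBlockingQueue<>();

Runnable dbWriter = () -> {
    try (Connection conn = openConnection()) {          // openConnection(): your own code
        conn.setAutoCommit(false);
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO samples (sensor_id, value, taken_at) VALUES (?, ?, ?)");
        List<Sample> batch = new ArrayList<>();
        while (true) {
            batch.clear();
            batch.add(results.take());                  // block until at least one result
            results.drainTo(batch, 99);                 // then take up to 100 rows
            for (Sample s : batch) {
                ps.setInt(1, s.sensorId);
                ps.setDouble(2, s.value);
                ps.setTimestamp(3, s.takenAt);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();                              // one commit per batch, not per row
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
};
new Thread(dbWriter, "db-writer").start();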
Second step
When your workload gets too big for a single program instance with multiple sensor threads to handle, run multiple instances of the program, each with its own list of sensors.
Third step
When the workload gets too big for a single machine, add another one and run new instances of your program on that new machine.
Collecting data from hundreds of sensors should not pose a performance problem if done correctly. To scale this process you should carefully manage your database connections as well as your sensor connections and you should leverage queues for the sensor-sampling and sensor-data writing processes. If your sensor count is stable, you can cache the sensor connection data, possibly with periodic updates to your sensor connection cache.
Use a connection pool to talk to your database. Query your database for the sensor connection information, then release that connection back to the pool as soon as possible -- do not keep the database connection open while talking to the sensor. It's likely reading sensor connection data (which talks to your database) can be done in a single thread, and that thread creates sensor sampling jobs for your executor.
Within each sensor sampling job, open the HTTP sensor connection, collect sensor data, close HTTP sensor connection, and then create a sensor data write job to write the sampling data to the database. Assuming your sensors are distinct nodes, an HTTP sensor connection pool is not likely to help much because HTTP client and server connections are relatively light (unlike database connections).
Writing sensor-sampling data back to the database should also go through a queue, and these database write jobs should use your database connection pool.
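A compact sketch of that job flow, assuming loadSensorInfoFromCache(), readSensor(...) and insertSample(...) are hypothetical stand-ins for your own code and dataSource is a pooled javax.sql.DataSource:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService samplingPool = Executors.newFixedThreadPool(20); // talks to the sensors
ExecutorService writePool = Executors.newFixedThreadPool(4);     // talks to the database

for (SensorInfo sensor : loadSensorInfoFromCache()) {            // cached sensor connection data
    samplingPool.submit(() -> {
        SensorData data = readSensor(sensor);                    // open HTTP, read, close
        writePool.submit(() -> insertSample(dataSource, data));  // uses the DB connection pool
    });
}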
With this design, you should be able to easily handle hundreds of sensors and likely thousands of sensors with modest hardware running a Linux server OS as the collector and a properly configured database.
I suggest you test these processes independently, so you know the sustainable rates for each step:
reading and caching sensor connection data and creating sampling jobs;
executing sampling jobs and creating writing jobs; and,
executing sample-data writing jobs.
Let me know if you'd like code as well.
I encountered several stuck JDBC connections in my code due to poor network health. I am planning to use the java.sql.Connection.setNetworkTimeout method. As per the docs:
Sets the maximum period a Connection or objects created from the Connection will wait for the database to reply to any one request
Now, what exactly is the request here? My query takes a really long time to respond and even longer to process (I am using a JDBC interface to a big data DB). So do I need to keep this timeout bigger than the expected query execution time (to prevent false triggers), or are there keep-alive messages being exchanged to keep track of the network connection, in which case I would keep it really low?
So if your NetworkTimeout is smaller than the QueryTimeout, the query will be terminated on your side - thread that waits for the DB to reply (notice that setNetworkTimeout has Executor executor parameter) will be interrupted. Depending on the underlying implementation NetworkTimeout may cancel the query on the DB side as well.
If NetworkTimeout > QueryTimeout, and the query completes within QueryTimeout, then nothing bad should happen. If the problems you experience fall exactly into this case, you should try to work on the OS-level settings for keeping TCP connections alive so that no firewall terminates them too soon.
When it comes to keeping TCP connections alive it is usually more a matter of the OS level settings than the application itself. You can read more about it (Linux) here.
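As an illustration only (the timeout values are made up, and connection is an existing java.sql.Connection), the two timeouts can be set like this:

import java.sql.Statement;
import java.util.concurrent.Executors;

// Network timeout: upper bound on waiting for any single reply on the socket.
// Keep it larger than the longest query you expect, or long-running queries
// will be aborted on the client side.
connection.setNetworkTimeout(Executors.newSingleThreadExecutor(), 15 * 60 * 1000); // milliseconds

Statement stmt = connection.createStatement();
stmt.setQueryTimeout(10 * 60); // seconds; keep it smaller than the network timeout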
Problem
I'm building a Postgres database for a few hundred thousand products. I will set up an index (Solr or maybe Elasticsearch) to improve query times for complex search queries.
The question now is how to keep the index synchronized with the database.
In the past I had a kind of application that polled the database periodically to check for updates to apply, but that leaves a window during which the index is outdated (from the database update until the next polling run).
I would prefer a solution in which the database notifies my (Java) application that something has changed, and the application then decides whether the index needs to be updated. To be more precise, I would build a kind of producer/consumer structure in which the replica receives notifications from Postgres that something has changed; if it is relevant to the indexed data, it is stored in a stack of updates to do. The consumer would consume this stack and build the documents to be stored in the index.
Possible Solutions
One solution would be to write a kind of replication endpoint, where the application would behave as a Postgres instance that is used to replicate the data from the original database. Does anyone have experience with this approach?
Which other solution do I have for this problem?
Use LISTEN and NOTIFY to tell your app that things have changed.
You can send the NOTIFY from a trigger that also records changes in a queue table.
You'll need a PgJDBC connection that has sent a LISTEN for the event(s) you're using. If you're using SSL, it must poll the database by sending periodic empty queries (""); if you are not using SSL, this can be avoided by using the async notification checks. You'll need to unwrap the Connection object from your connection pool so you can cast the underlying connection to a PgConnection in order to use listen/notify. See related answer
The producer/consumer bit will be harder. To have multiple crash-safe concurrent consumers in PostgreSQL you need to use advisory locking with pg_try_advisory_lock(...). If you don't need concurrent consumers then it's easy, you just SELECT ... LIMIT 1 FOR UPDATE a row at a time.
Hopefully 9.4 will include an easier method of skipping locked rows with FOR UPDATE, as there's work in development for it.
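For the single-consumer case, a rough JDBC sketch (the index_queue table, its columns, dataSource and updateIndex(...) are all hypothetical stand-ins):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);
    long id = -1;
    String payload = null;
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT id, payload FROM index_queue ORDER BY id LIMIT 1 FOR UPDATE")) {
        if (rs.next()) {
            id = rs.getLong("id");
            payload = rs.getString("payload");
        }
    }
    if (payload != null) {
        updateIndex(payload);                           // your indexing code
        try (PreparedStatement del =
                 conn.prepareStatement("DELETE FROM index_queue WHERE id = ?")) {
            del.setLong(1, id);
            del.executeUpdate();
        }
    }
    conn.commit();                                      // commit releases the row lock
}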
To use Postgres LISTEN and NOTIFY you need a driver that supports asynchronous notifications. The standard Postgres JDBC driver does not support asynchronous notifications.
To constantly LISTEN on a channel from the application server, go for the pgjdbc-ng 0.6 driver.
http://impossibl.github.io/pgjdbc-ng/
It supports async notifications, without polling.
In general, I would recommend implementing loose coupling using the EAI patterns. Then, if you decide to swap out the database, the code on the index side does not change.
In case you want to stick with tight coupling, I would recommend using LISTEN/NOTIFY. In Java, it is important to use the pgjdbc-ng driver, because it supports async notifications without polling.
Here's an asynchronous pattern (based on this answer):
import com.impossibl.postgres.api.jdbc.PGConnection;
import com.impossibl.postgres.api.jdbc.PGNotificationListener;
import com.impossibl.postgres.jdbc.PGDataSource;

import java.sql.Statement;

public static void listenToNotifyMessage() {
    PGDataSource dataSource = new PGDataSource();
    dataSource.setHost("localhost");
    dataSource.setPort(5432);
    dataSource.setDatabase("database_name");
    dataSource.setUser("postgres");
    dataSource.setPassword("password");

    PGNotificationListener listener = (int processId, String channelName, String payload)
            -> System.out.println("notification = " + payload);

    try (PGConnection connection = (PGConnection) dataSource.getConnection()) {
        Statement statement = connection.createStatement();
        statement.execute("LISTEN test");
        statement.close();
        connection.addNotificationListener(listener);

        // It only works while the connection is open, therefore we loop endlessly here.
        while (true) {
            Thread.sleep(500);
        }
    } catch (Exception e) {
        System.err.println(e);
    }
}
From other sessions or statements, you can now execute NOTIFY test, 'This is a payload';. You can also issue NOTIFY from triggers etc.
I have an application that listens on a port for UDP datagrams. I use a UDP inbound channel adapter to listen on this port. My UDP channel adapter is configured to use a ThreadPoolTaskExecutor to dispatch incoming UDP datagrams. After the UDP channel adapter I use a direct channel. My channel has only one subscriber i.e. a service activator.
The service adds the incoming messages to a synchronized list stored in memory. Then, I have a single thread that retrieves the content of the list every 5 seconds and does a batch update to a MySQL database.
My problem:
A first bulk of messages arrives. The threads of my ThreadPoolExecutor get the incoming messages from the UDP channel adapter and add them to the synchronized list. Let's say 10,000 messages have been received and inserted.
The background thread retrieves the 10,000 messages and does a batch update (JdbcTemplate.update(String[])).
At this point, the background thread waits for the response from the database. But now, because the database takes time to execute the 10,000 INSERTs, 20,000 messages have been received and are present in the list.
The background thread receives a response from the database. Then it retrieves the 20,000 messages and does a batch update (JdbcTemplate.update(String[])).
It takes the database even more time to execute the INSERTs, and during this time 35,000 messages have been received and stored in the list.
The heap size grows constantly and, after a certain time, causes an out-of-memory exception.
I'm trying to find solution to improve the performance of my application.
Thanks
Storing 10,000 records every 5 seconds is quite a lot for any database to sustain.
You need to consider other options:
use a different data store, e.g. a NoSQL data store or a flat file.
ensure you have good write performance on your disks, e.g. using a write cache.
use a disk subsystem with multiple disks or an SSD drive.
Suggestions
a. Do you really need a single synchronized list? Can't you have a group of lists and divide the work between them, say by running hashCode on a key of the data?
b. Can you use a pool of threads that read information from the list (I would use a queue here, by the way)? This way, when one thread is "stuck" due to a heavy batch insertion, other threads can still read "jobs" from the queue and perform them (see the sketch after this list).
c. Is your database co-hosted on the same machine as the application? This can improve performance.
d. Can you post your insert query? Maybe someone can offer you a way to optimize it.
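For suggestions a and b, a sketch of partitioned queues with one writer thread each; Message and writeBatch(...) are stand-ins for your own message class and your existing JdbcTemplate batch update:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionedWriter {

    // Minimal stand-in for whatever your UDP service activator produces.
    public static class Message {
        final String key;
        final String payload;
        public Message(String key, String payload) { this.key = key; this.payload = payload; }
        public String getKey() { return key; }
    }

    private final List<BlockingQueue<Message>> queues = new ArrayList<>();

    public PartitionedWriter(int partitions) {
        for (int i = 0; i < partitions; i++) {
            BlockingQueue<Message> q = new LinkedBlockingQueue<>();
            queues.add(q);
            Thread t = new Thread(() -> drainLoop(q), "writer-" + i);
            t.setDaemon(true);
            t.start();
        }
    }

    // Route by key so one slow batch insert only stalls its own partition.
    public void enqueue(Message m) throws InterruptedException {
        queues.get((m.getKey().hashCode() & 0x7fffffff) % queues.size()).put(m);
    }

    private void drainLoop(BlockingQueue<Message> q) {
        List<Message> batch = new ArrayList<>();
        while (true) {
            try {
                batch.clear();
                batch.add(q.take());        // block for at least one message
                q.drainTo(batch, 4999);     // then grab up to 5,000 in one go
                writeBatch(batch);          // your existing JdbcTemplate batch update
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void writeBatch(List<Message> batch) { /* existing JDBC/JdbcTemplate code */ }
}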
Use a Database Connection pool so that you don't have to wait on the commit on any one thread. Just grab the next available connection and do parallel inserts.
I get 5,000 inserts per second sustained on a SQL Server table, but that required quite a few optimizations. I did not use all of the tips below; some might be of use to you.
Check the MySQL Insert Speed documentation tips in
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
Parallelize the insert process (a JdbcTemplate batch sketch follows after this list)
Aggregate messages if possible. Instead of storing every message, insert one row with information about the messages received in a timeframe, of a certain type, etc.
Change the table to have no indexes or foreign keys except for the primary key
Switch to writing to a text file (and import it during the night with a LOAD DATA bulk import if you really want it in the database)
Use a separate database instance to serve only your table
...
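Since the question already uses Spring's JdbcTemplate, a parameterized batchUpdate is usually far cheaper than sending individual INSERT strings. A sketch, with the table, columns and Message accessors as placeholders:

import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

// messages: whatever was drained from the in-memory list this cycle.
List<Object[]> rows = new ArrayList<>();
for (Message m : messages) {
    rows.add(new Object[] { m.getSource(), m.getPayload(), m.getReceivedAt() });
}
jdbcTemplate.batchUpdate(
        "INSERT INTO udp_messages (source, payload, received_at) VALUES (?, ?, ?)",
        rows);   // one prepared statement executed as a single JDBC batch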
I have 20 factory machines, all processing tasks. I have a database adapter that talks to a MySQL database in which the machines store all their info. Is it safe to have each factory machine panel have its own database adapter and updater thread? The updater thread just continuously checks whether the panel's current taskID is the same as the one in the database, and if not, it repopulates the panel with information about the new task.
I'm not certain whether having that many connections will add overhead or not.
RDBMSs are designed to be accessed by multiple clients at a given time. It's one of their purposes.
So I don't think 20, or even a thousand, simultaneous connections will cause any problem at all.
Rather than having many connections all doing the same task, create one process which maintains the list of task IDs (if it differs per machine) and checks the current task ID in the database. If it has changed, send a message to the machines whose task ID changed so they can update their panels. This avoids unnecessary load on the database and also handles an increase in the number of machines without any impact. A sketch follows below.
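A sketch of that single-poller idea; taskDao and panels are hypothetical stand-ins for one query returning machineId -> taskId and a registry of the 20 panels:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// One scheduled thread polls the database; panels are only told about real changes.
ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
Map<Integer, Integer> lastTaskIds = new HashMap<>();

poller.scheduleAtFixedRate(() -> {
    Map<Integer, Integer> current = taskDao.currentTaskIds();   // machineId -> taskId
    current.forEach((machineId, taskId) -> {
        if (!taskId.equals(lastTaskIds.put(machineId, taskId))) {
            panels.get(machineId).refresh(taskId);               // notify only changed panels
        }
    });
}, 0, 5, TimeUnit.SECONDS);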
The number of connections in MySQL is controlled by the max_connections system variable (total number of allowed simultaneous connections) and max_user_connections (max number of simultaneous connections per user). Take a look at your server settings and maybe change them. The default values are definitely bigger than 20, though.
I have a java server that handles logins from multiple clients. The server creates a thread for each tcp/ip socket listener. Database access is handled by another thread that the server creates.
At the moment the number of clients I have attaching to the server is quite low (<100) so I have no real performance worries, but I am working out how I should handle more clients in the future. My concern is that with lots of clients my server and database threads will get bogged down by constant calls to their methods from the client threads.
Specifically in relation to the database: At the moment each client thread accesses the public database thread on its server parent and executes a data access method. What I think I should do is have some kind of message queue that a client thread can put its data request on and the database thread will do it when it gets round to it. If there is data to be returned from the data access call then it can put it on a queue for the client thread to pick up. All of this wouldn't hit the main server code or any other client threads.
I therefore think that I want to implement an asynchronous message queue that client threads can put a message on and the database thread will pick up from. Is that the right approach? Any thoughts and links to somewhere I can read up about implementation would be appreciated.
I would not recommend this approach.
JMS was born for this sort of thing. It'll be better than any implementation you'll write from scratch. I'd recommend using a Java EE app server that has JMS built in or something like ActiveMQ or RabbitMQ that you can add to a servlet engine like Tomcat.
I would strongly encourage you to investigate these before writing your own.
What you are describing sounds like an ExecutorCompletionService. This is essentially an async task broker that accepts requests (Runnables or Callables) from one thread, returning a "handle" to the forthcoming result in the form of a Future. The request is then executed in a thread pool (which could be a single-thread pool) and the result of the request is then delivered back to the calling thread through the Future.
In between the time that the request is submitted and response is supplied, your client thread will simply wait on the Future (with an optional timeout).
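A minimal sketch of that pattern, using ExecutorService.submit directly (ExecutorCompletionService adds a completion queue on top of this); UserRecord, userDao and login are hypothetical stand-ins for your own types:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// One (or a few) threads own all database access.
ExecutorService dbExecutor = Executors.newSingleThreadExecutor();

// Inside a client thread: hand the DB work to the broker and wait on the Future.
Future<UserRecord> pending = dbExecutor.submit(new Callable<UserRecord>() {
    public UserRecord call() throws Exception {
        return userDao.loadUser(login);   // hypothetical DAO call
    }
});
UserRecord user = pending.get(5, TimeUnit.SECONDS);   // optional timeout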
I would advise, however, that if you're expecting a big increase in the number of clients (and therefore client threads), you should evaluate some of the Java NIO Server frameworks out there. This will allow you to avoid allocating one thread per client, especially since you expect all these threads to spend some time waiting on DB requests. If this is the case, I would suggest looking at MINA or Netty.
Cheers.
//Nicholas
It sounds like what you want to do is limit the number of concurrent requests you allow to the database (to stop it being overloaded).
I suggest you have a limited size connection pool. When too many threads want to use the database they will have to wait until a connection is free. A simple way to do this is with a BlockingQueue with all the connections created in advance.
import java.sql.Connection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

private final BlockingQueue<Connection> connections = new ArrayBlockingQueue<Connection>(40);
{
    // create the connections up front and add them to the queue
}

// to perform a query:
Connection conn = connections.take();   // blocks until a connection is free (throws InterruptedException)
try {
    // do something
} finally {
    connections.add(conn);              // return the connection to the pool
}
This way you can keep your thread design much the same as it is and limit the number of concurrent queries to the database. With some tweaking you can create the connections as needed and provide a time out if a database connection cannot be obtained quickly.