Real-time data consumption from MySQL - Java

I have a use case in which my data is present in MySQL.
For each new row inserted into MySQL, I have to perform analytics on the new data.
How I am currently solving this problem:
My application is a Spring Boot application, in which I use a scheduler that checks for new rows in the database every 2 seconds.
The problem with the current approach:
Even if there is no new data in the MySQL table, the scheduler still fires a MySQL query to check whether new data is available.
One way to solve this type of problem in any SQL database is triggers.
But so far I have not succeeded in creating MySQL triggers that can call a Java-based Spring application or even a simple Java application.
My question is:
Is there any better way to solve my use case above? I am even open to switching to another storage (database) system if it is built for this type of use case.

This fundamentally sounds like an architecture issue. You're essentially using a database as an API which, as you can see, causes all kinds of issues. Ideally, this db would be wrapped in a service that can manage the notification of systems that need to be notified. Let's look at a few different options going forward.
Continue to poll
You didn't outline what the actual issue is with your current polling approach. Is running the job when it's not needed causing an issue of some kind? I'd be a proponent for just leaving it unless you're interested in making a larger change.
Database Trigger
While I'm unaware of a way to launch a java process via a db trigger, you can do an HTTP POST from one. With that in mind, you can have your batch job staged in a web app that uses a POST to launch the job when the trigger fires.
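As a rough illustration of the web-app side only, here is a minimal sketch assuming Spring Boot and Spring Batch; the endpoint path, job bean and parameter names are made up for illustration, and how the trigger actually performs the HTTP POST is left out:

```java
// Minimal sketch: an endpoint the trigger's HTTP POST can hit to launch a staged batch job.
// The "/jobs/analytics" path and the analyticsJob bean are hypothetical names.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class AnalyticsJobController {

    private final JobLauncher jobLauncher;
    private final Job analyticsJob; // hypothetical Spring Batch job bean

    public AnalyticsJobController(JobLauncher jobLauncher, Job analyticsJob) {
        this.jobLauncher = jobLauncher;
        this.analyticsJob = analyticsJob;
    }

    @PostMapping("/jobs/analytics")
    public ResponseEntity<String> launch() throws Exception {
        // Unique parameter so repeated POSTs start new job instances
        jobLauncher.run(analyticsJob, new JobParametersBuilder()
                .addLong("requestedAt", System.currentTimeMillis())
                .toJobParameters());
        return ResponseEntity.accepted().body("analytics job started");
    }
}
```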
Wrap existing datastore in a service
This is, IMHO, the best option. This allows there to be a system of record that provides an API that can be versioned, etc. It would also allow any logic around who to notify to be encapsulated in this service.
Replace data store with something that allows for better notifications
Without any real information on what the data being stored is, it's hard to say how practical this is. But something like Apache Kafka or Apache Geode would be an option that provides the ability to be notified when new data is persisted (Kafka by listening to the topic, Geode via a continuous query).
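To give a feel for the Kafka option, here is a minimal consumer sketch; it assumes the producing side publishes each new record as a message, and the topic name and bootstrap address are placeholders:

```java
// Minimal sketch: listen on a Kafka topic for newly persisted rows instead of polling MySQL.
// The "new-rows" topic and localhost broker are assumptions for illustration.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewRowListener {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "analytics-consumer");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("new-rows"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Run the analytics for the new row here
                    System.out.println("new row payload: " + record.value());
                }
            }
        }
    }
}
```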
For the record, I'd advocate for the wrapping of the existing database in a service. That service would be the only way into the db and take on responsibility for any notifications required.

Related

What would be the best approach to fetch, transform and update data in Spring?

We have a requirement where we need to pull data from multiple REST API services, transform it and populate it into a new database. There could be a huge number of records that have to be fetched, transformed and updated this way. But it is a one-time activity: once all the data we get from the REST calls has been transformed and populated into the new DB, we will not have to re-run the transformation later. What is the best way to achieve this in Spring?
Could Spring Batch be a possible solution if it has to be a one-time execution?
If it is a one-time thing I wouldn't bother using Spring Batch. I would simply call the external APIs, get the data, transform it and then persist it in your database. You could trigger the process either by exposing an endpoint in your own API to start it or relying on a Scheduled task.
Keeping things as simple as possible (but never simpler) is one of the greatest assets you can have while developing software, but it is also one of the hardest things to achieve for us as software engineers, simply because we usually overthink the solutions.
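As a minimal sketch of that "call the APIs, transform, persist" approach, run once at startup via CommandLineRunner; the source URL, DTO shape and target table are placeholders, not part of the original question:

```java
// Minimal one-time extract/transform/load sketch; everything named here is hypothetical.
import org.springframework.boot.CommandLineRunner;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class OneTimeMigrationRunner implements CommandLineRunner {

    private final JdbcTemplate jdbcTemplate;
    private final RestTemplate restTemplate = new RestTemplate();

    public OneTimeMigrationRunner(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void run(String... args) {
        // 1. Extract: fetch records from the external REST service (placeholder URL)
        CustomerDto[] customers = restTemplate.getForObject(
                "https://legacy.example.com/api/customers", CustomerDto[].class);
        if (customers == null) {
            return;
        }
        // 2. Transform + 3. Load: per-record transformation and insert into the new DB
        for (CustomerDto dto : customers) {
            String fullName = (dto.firstName + " " + dto.lastName).trim();
            jdbcTemplate.update(
                    "INSERT INTO customer (external_id, full_name) VALUES (?, ?)",
                    dto.id, fullName);
        }
    }

    // Placeholder DTO mirroring the external API's JSON
    static class CustomerDto {
        public long id;
        public String firstName;
        public String lastName;
    }
}
```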
For this kind of problem, it would be better to use an ETL (extract, transform, and load) tool or framework; my recommendation is Kafka.

How to constantly check if time is up for auction site

I am working on a project where we are creating an auction site that works on a weekly basis. The plan is to start all auctions on Monday and end all on Friday.
So far I have figured out that I need a database that holds the start and end date so I can check how much time is left and so on. But I need to be able to constantly check whether the time is up or not, and I do not know how to proceed. What is the proper way to do this?
We are using Java 8 with Spring, and React as the frontend.
Two solutions:
Use a WebSocket: the server sets a timer that is due on Friday, and once the timer expires, it sends the event to the client.
The client side runs a timer as well.
You have 3 layers in play here:
Frontend (React)
Backend (Java8/Spring app)
Database
Now you need to figure out how to propagate data between those layers.
Propagating from backend to frontend can be done using either polling or websockets.
Propagating from database to backend can be done using either polling or database triggers.
I'd personally connect React with Spring App via websockets. Then I'd have some background task polling the database and pushing the discovered changes to connected websocket clients.
Here is Spring's own tutorial on WebSockets: https://spring.io/guides/gs/messaging-stomp-websocket/
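A minimal sketch of that "poll the database, push to WebSocket clients" idea, assuming a STOMP broker is already configured as in the Spring guide and that scheduling is enabled; the query, destination and payload are placeholders:

```java
// Minimal sketch: scheduled DB poll that pushes ended auctions to subscribed clients.
// Assumes @EnableScheduling and @EnableWebSocketMessageBroker (with a "/topic" simple broker)
// are configured elsewhere; table and column names are hypothetical.
import java.sql.Timestamp;
import java.time.Instant;
import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class AuctionStatusPusher {

    private final JdbcTemplate jdbcTemplate;
    private final SimpMessagingTemplate messagingTemplate;

    public AuctionStatusPusher(JdbcTemplate jdbcTemplate, SimpMessagingTemplate messagingTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        this.messagingTemplate = messagingTemplate;
    }

    // Check every 30 seconds which auctions have just ended and notify clients
    @Scheduled(fixedRate = 30_000)
    public void pushEndedAuctions() {
        List<Long> endedIds = jdbcTemplate.queryForList(
                "SELECT id FROM auction WHERE end_date <= ? AND closed = false",
                Long.class, Timestamp.from(Instant.now()));
        for (Long id : endedIds) {
            jdbcTemplate.update("UPDATE auction SET closed = true WHERE id = ?", id);
            // React clients subscribed to /topic/auction-ended receive this
            messagingTemplate.convertAndSend("/topic/auction-ended", id);
        }
    }
}
```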
I think you are looking for a pull model. Basically your Java application needs to pull the end date from the database at certain intervals. You can write a cron job for that. Quartz is one of the popular Java-based schedulers out there: http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html . It handles distributed systems as well, so if your application runs multiple instances, Quartz can cover that for you.
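A minimal Quartz sketch of that cron-driven check; the job body and the every-5-minutes cron expression are placeholders:

```java
// Minimal sketch: a Quartz cron trigger firing a job that checks auction end dates.
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class EndDateCheckJob implements Job {

    @Override
    public void execute(JobExecutionContext context) {
        // Pull the auction end dates from the database here and act on expired ones
        System.out.println("checking auction end dates...");
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        JobDetail job = JobBuilder.newJob(EndDateCheckJob.class)
                .withIdentity("endDateCheck")
                .build();

        // Fire every 5 minutes; note Quartz cron expressions include a seconds field
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("endDateCheckTrigger")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0/5 * * * ?"))
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```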
Another variant of the pull model: you can read the entries with end dates into a JVM local cache or some other cache (Redis, Memcached) and run a cron job on that. But you have to maintain cache consistency with the database.
Which one you choose depends on your business use case (how frequently the end date changes, how frequently you need to check for end dates, etc.).
Another option would be to go for a push model, but a push model won't work with traditional databases for this case.
A possible option is to extend org.springframework.web.servlet.handler.HandlerInterceptorAdapter
and write the logic that checks the current time against your time range in this class, throwing an exception if the check fails.
A potential optimization is to cache values from the DB (at least for some time, for example 15 minutes), as it will help to decrease the number of actual calls to the database.
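A minimal sketch of that interceptor, checking the weekday window (Monday to Friday, as an assumption standing in for the dates cached from the DB) and failing the request otherwise; the exception type and registration via a WebMvcConfigurer are left as assumptions:

```java
// Minimal sketch: reject requests outside the auction window; the Monday-Friday check
// stands in for whatever end-date values you cache from the database.
import java.time.DayOfWeek;
import java.time.LocalDate;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

public class AuctionWindowInterceptor extends HandlerInterceptorAdapter {

    @Override
    public boolean preHandle(HttpServletRequest request,
                             HttpServletResponse response,
                             Object handler) throws Exception {
        DayOfWeek today = LocalDate.now().getDayOfWeek();
        boolean auctionOpen = today.getValue() >= DayOfWeek.MONDAY.getValue()
                && today.getValue() <= DayOfWeek.FRIDAY.getValue();
        if (!auctionOpen) {
            // Fail the request if bidding is attempted outside the auction window
            throw new IllegalStateException("Auction is closed");
        }
        return true;
    }
}
```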

Should I use AKKA for the periodical task

I have a terminal server monitor project. In the backend, I use Spring MVC, MyBatis and PostgreSQL. Basically I query the session information from the DB, send it back to the front-end and display it to users. But there are some large queries (like counting total users, total sessions, etc.) which slow down the system when a user opens the website. So I want to run these queries as asynchronous tasks so the website opens fast rather than waiting for the query. Also, I would like to check the terminal server state from the DB periodically (every hour), and if the terminal server fails or the average load is too high, notify the admins. I do not know what I should use, maybe Akka, or is there any other way to do these two jobs (1. do the large query asynchronously, 2. do some periodical query)? Please help me, thanks!
You can achieve this using Spring and caching where necessary.
If the data you're displaying is not required to be "in real-time", but it can be "near real-time" you can read the data from the DB periodically and cache it. Your app then reads from the cache.
There are different approaches you can explore.
You can try to create a materialized view in PostgreSQL which will hold the statistic data you need. Depending on your requirements you have to see how to handle refresh intervals etc.
Another approach is to use an application-level cache - you can leverage Spring for that (Spring docs). You can populate the cache on start-up and refresh it as necessary.
The task that runs every hour can be implemented, again leveraging Spring (Spring docs), with the @Scheduled annotation.
To answer your question - don't use Akka - you have all the tools necessary to achieve the task in the Spring ecosystem.
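A minimal sketch combining the cache and the hourly check, assuming @EnableCaching and @EnableScheduling are on a configuration class; the query strings, cache name, refresh interval and notification call are placeholders:

```java
// Minimal sketch: cache heavy dashboard queries and run an hourly health check with Spring.
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class DashboardStatsService {

    private final JdbcTemplate jdbcTemplate;

    public DashboardStatsService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Heavy query served from the cache after the first call
    @Cacheable("dashboardStats")
    public long totalSessions() {
        return jdbcTemplate.queryForObject("SELECT COUNT(*) FROM session", Long.class);
    }

    // Evict every 15 minutes so the dashboard stays "near real-time"
    @CacheEvict(value = "dashboardStats", allEntries = true)
    @Scheduled(fixedRate = 15 * 60 * 1000)
    public void evictStats() {
    }

    // Hourly health check; replace the println with your admin notification of choice
    @Scheduled(cron = "0 0 * * * *")
    public void checkTerminalServerState() {
        Double avgLoad = jdbcTemplate.queryForObject(
                "SELECT AVG(load_pct) FROM server_state", Double.class);
        if (avgLoad != null && avgLoad > 90.0) {
            System.out.println("ALERT: terminal server average load is " + avgLoad + "%");
        }
    }
}
```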
Akka is not very relevant here; it is for an event-driven programming model that deals with concurrency issues to build highly scalable multithreaded applications.
You can use the Spring task scheduler for running heavy queries periodically. If you want to keep it simple, you can solve your problem by simply storing data like total users, total sessions, etc. in the global application context, and periodically updating this data from the database using the Spring scheduler. You can also store the same in a separate database table, so that this data can be easily loaded at initialization time.
I really don't see why you would need "memcached", "materialized views", "WebSockets" and other heavy technologies and frameworks for caching a small set of data. All you need is to maintain a set of global parameters in your application context and keep them updated using a scheduled task as frequently as desired.

Can you place a listener on a db table or similar

I have a Java backend system. We need to integrate with a third party. I need to return results to a client. Currently we are using a view (SQL Server) that the third party writes to, and we keep a tracker of the unique ID somewhere.
I have a Spring-wired poller that runs every 10 minutes, returns everything from what was last sent up to the end, and updates the tracker table with the new ID. Nothing complicated.
I would like to know if there is a simple way to almost "listen" on the table. If any new rows are added, grab them and return them via my service. And if there is, is it advisable?
Disclaimer: I have not exactly done this, yet, and SQL Server is not my playground. However, combining triggers with non-SQL commands should be possible nowadays, and searching the internet for 'sql server notification' yields a section on 'Query Notifications in SQL Server':
https://msdn.microsoft.com/en-us/library/t9x04ed2(v=vs.110).aspx
In general, as long as it is possible to somehow send a command to a socket from inside a trigger then you could use a PUB-SUB message queue (RabbitMQ, NSQ etc.) to send a notification that you can retrieve in your Java program. Of course, you would have to install triggers on any columns that you want to monitor. Whether it is possible to monitor a schema (or database) for changes in general - this might only be possible if there is some logging inside the database that you have access to. This might not be there, out of the box. The trigger is probably the cleaner way because it resides in the schema/database itself and does not need to access system tables.
EDIT: also found this SO question/answer about socket connection inside a trigger: Creating socket inside a SQL-CLR trigger or stored procedure
You could use triggers on the view, but that won't help in Java land as most DBs aren't going to let you run Java code there. Your choices are: 1) polling like you are doing, or 2) create a web service (or JMS queue or something) that the third party calls to push the update/insert of data. Then you are in Java land and Hibernate/Spring can handle the insert and do whatever processing you need.
You could use triggers on the tables, if performance is not an issue.

Getting events from a database

I am not very familiar with databases and what they offer outside of the CRUD operations.
My research has led me to triggers. Basically it looks like triggers offer this type of functionality:
(from Wikipedia)
There are typically three triggering events that cause triggers to "fire":
INSERT event (as a new record is being inserted into the database).
UPDATE event (as a record is being changed).
DELETE event (as a record is being deleted).
My question is: is there some way I can be notified in Java (preferably including the data that changed) by the database when a record is Updated/Deleted/Inserted using some sort of trigger semantics?
What might be some alternate solutions to this problem? How can I listen to database events?
The main reason I want to do this is a scenario like this:
I have 5 client applications all in different processes/existing across different PCs. They all share a common database (Postgres in this case).
Let's say one client changes a record in the DB that all 5 of the clients are "interested" in. I am trying to think of ways for the clients to be "notified" of the change (preferably with the affected data attached) instead of them querying for the data at some interval.
Using Oracle you can set up a trigger on a table and then have the trigger send a JMS message. Oracle has two different JMS implementations. You can then have a process that will 'listen' for the message using the JDBC driver. I have used this method to push changes out to my application vs. polling.
If you are using a Java database (H2) you have additional options. In my current application (SIEM) I have triggers in H2 that publish change events using JMX.
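For the JMS approach described above, here is a minimal, vendor-neutral sketch of the listening side only; obtaining the ConnectionFactory and the queue name is Oracle AQ/JMS specific and is assumed to be handled elsewhere, and all names are placeholders:

```java
// Minimal sketch: listen for change messages enqueued by a database trigger via JMS.
// The ConnectionFactory wiring and the "TABLE_CHANGE_EVENTS" queue name are assumptions.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class ChangeEventListener {

    public static void listen(ConnectionFactory connectionFactory) throws Exception {
        Connection connection = connectionFactory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("TABLE_CHANGE_EVENTS");
        MessageConsumer consumer = session.createConsumer(queue);

        // Callback fires whenever the trigger enqueues a change message
        consumer.setMessageListener((Message message) -> {
            try {
                if (message instanceof TextMessage) {
                    String payload = ((TextMessage) message).getText();
                    System.out.println("row changed: " + payload);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        connection.start();
    }
}
```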
Don't mix up the database (which contains the data), and events on that data.
Triggers are one way, but normally you will have a persistence layer in your application. This layer can choose to fire off events when certain things happen - say to a JMS topic.
Triggers are a last-ditch thing, as you're then operating on relational items rather than "events" on the data. (For example, an "update" could in reality map to a "company changed legal name" event.) If you rely on the DB, you'll have to map the inserts and updates back to real-life events... which you already knew about!
You can then layer other stuff on top of these notifications - like event stream processing - to find events that others are interested in.
James
Hmm. So you're using PostgreSQL and you want to "listen" for events and be "notified" when they occur?
http://www.postgresql.org/docs/8.3/static/sql-listen.html
http://www.postgresql.org/docs/8.3/static/sql-notify.html
Hope this helps!
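To make the LISTEN/NOTIFY route concrete on the Java side, here is a minimal sketch using the PostgreSQL JDBC driver; the channel name and connection details are placeholders, and the trivial query in the loop is the documented way to make the driver pick up pending notifications:

```java
// Minimal sketch: consume NOTIFY events from Java via the pgjdbc driver.
// A trigger (or any session) would run: NOTIFY record_changed, 'payload'.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class PgEventListener {

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
        PGConnection pgConn = conn.unwrap(PGConnection.class);

        try (Statement stmt = conn.createStatement()) {
            stmt.execute("LISTEN record_changed"); // channel the trigger NOTIFYs on
        }

        while (true) {
            // Issuing a trivial query lets the driver collect pending notifications
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("SELECT 1");
            }
            PGNotification[] notifications = pgConn.getNotifications();
            if (notifications != null) {
                for (PGNotification n : notifications) {
                    // The payload can carry e.g. the table name and primary key
                    System.out.println("channel=" + n.getName() + " payload=" + n.getParameter());
                }
            }
            Thread.sleep(500);
        }
    }
}
```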
Calling external processes from the database is very vendor specific.
Just off the top of my head:
SQL Server can call CLR programs from triggers,
PostgreSQL can call arbitrary C functions loaded dynamically,
MySQL can call arbitrary C functions, but they must be compiled in,
Sybase can make system calls if set up to do so.
The simplest thing to do is to have the insert/update/delete triggers make an entry in some log table, and have your java program monitor that table. Good columns to have in your log table would be things like EVENT_CODE, LOG_DATETIME, and LOG_MSG.
Unless you require very high performance or need to handle 100Ks of records, that is probably sufficient.
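A minimal sketch of the monitoring side of that log-table approach; the table and column names follow the suggestion above, the polling is done with Spring's scheduler, and keeping the last-seen ID in memory is a simplification:

```java
// Minimal sketch: poll the trigger-populated log table for new entries.
// Assumes @EnableScheduling; table/column names mirror the suggestion above.
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ChangeLogPoller {

    private final JdbcTemplate jdbcTemplate;
    private long lastSeenId = 0;

    public ChangeLogPoller(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Scheduled(fixedDelay = 5_000)
    public void pollChangeLog() {
        List<Map<String, Object>> rows = jdbcTemplate.queryForList(
                "SELECT id, event_code, log_datetime, log_msg FROM change_log WHERE id > ? ORDER BY id",
                lastSeenId);
        for (Map<String, Object> row : rows) {
            lastSeenId = ((Number) row.get("id")).longValue();
            // Dispatch the event to whichever part of the application is interested
            System.out.println(row.get("event_code") + ": " + row.get("log_msg"));
        }
    }
}
```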
I think you're confusing two things. They are both highly db vendor specific.
The first I shall call "triggers". I am sure there is at least one DB vendor who thinks triggers are different than this, but bear with me. A trigger is a server-side piece of code that can be attached to a table. For instance, you could run a PSQL stored procedure on every update in table X. Some databases allow you to write these in real programming languages, others only in their variant of SQL. Triggers are typically reasonably fast and scalable.
The other I shall call "events". These are triggers that fire in the database and allow you to define an event handler in your client program. I.e., any time there are updates to the client's database, fire updateClientsList in your program. For instance, using Python and Firebird, see http://www.firebirdsql.org/devel/python/docs/3.3.0/beyond-python-db-api.html#database-event-notification
I believe the previous suggestion to use a monitor is an equivalent way to implement this using some other database. Maybe Oracle? MSSQL Notification Services, mentioned in another answer, is another implementation of this as well.
I would go so far as to say you'd better REALLY know why you want the database to notify your client program, otherwise you should stick with server side triggers.
What you're asking completely depends on both the database you're using and the framework you're using to communicate with your database.
If you're using something like Hibernate as your persistence layer, it has a set of listeners and interceptors that you can use to monitor records going in and out of the database.
There are a few different techniques here depending on the database you're using. One idea is to poll the database (which I'm sure you're trying to avoid). Basically you could check for changes every so often.
Another solution (if you're using SQL Server 2005) is to use Notification Services, although this technology is supposedly being replaced in SQL 2008 (we haven't seen a pure replacement yet, but Microsoft has talked about it publicly).
This is usually what the standard client/server application is for. If all inserts/updates/deletes go through the server application, which then modifies the database, then client applications can find out much easier what changes were made.
If you are using PostgreSQL, it has the capability to listen for notifications from a JDBC client.
I would suggest using a timestamp column, last updated, together with possibly the user updating the record, and then letting the clients check their local record timestamp against that of the persisted record.
The added complexity of adding a callback/trigger functionality is just not worth it in my opinion, unless supported by the database backend and the client library used, like for instance the notification services offered for SQL Server 2005 used together with ADO.NET.
