Database polling when there's an insert - Spring Data JPA - java

I have a requirement where whenever there's an entry in the table, I want to trigger an event. I have used EntityListeners (a Spring Data JPA concept) for this, which works perfectly fine; the issue is that the insert can also happen through stored procedures or manual entry. I searched online and found the Spring JPA inbound and outbound channel adapter concept, but I don't think it helps much with what I want to achieve. Can anybody clarify whether this concept helps me, as I don't have much idea about it, or suggest how I can achieve this?

There are no "great" mechanisms for raising events "from the data layer" in SQL Server.
There are three "OK" ones:
Triggers (only arguably OK)
Triggers seem like an obvious solution, but then you have to ask yourself... what will the trigger actually do? If it just writes data into another table, you still haven't gotten yourself outside the database. There are various arcane tricks you could try to use for this, like CLR procedures, or a few extended procedures.
But if you go down that route, you have to start thinking about another consideration: Triggers happen in the same transaction as the DML operation that caused them to fire. If they take time to execute, you'll be slowing down your OLTP workloads. If they do anything that is potentially unreliable they could fail, causing your transaction to roll back.
Triggers plus service broker
Service broker provides a mechanism - perhaps the only even-half-sensible mechanism - to get your data out of SQL and into some kind of listener in a "push" based manner. You still have a trigger, but the trigger writes data to a service broker queue. A listener can use a WAITFOR (RECEIVE ...) statement to block until data appears in the queue. The nice thing about this is that once the trigger has pushed data into a broker queue, its job is done. The "receipt" of that data is decoupled from the transaction that caused it to be enqueued in the first place. This sort of service broker mechanism is what is used by things like the SqlDependency class built into .NET.
The two main issues with service broker are complexity and performance. Service broker has a steep learning curve, and it's easy to get things wrong. Performance becomes complex if you need to scale, because while it's "easy" to build XML or JSON payloads, large set-based data changes can mean those payloads are massive.
In any case, if you want to explore this route, you're going to want to read (all of) the excellent articles on the subject by Remus Rusanu.
Bear in mind that this is an asynchronous "near real time" mechanism, not a synchronous "real time" mechanism like triggers.
Polling a built-in change detection mechanism: CDC or Change Tracking
SQL Server comes with two flavours of technology that can natively "watch" changes that happen in tables and record them: Change Tracking and Change Data Capture (CDC).
Neither of these pushes data out of the database; they're both "pull" based. What they do is store additional data in the database when changes happen. CDC can provide a complete log of every change, whereas change tracking "points to" rows that have changed via the primary key values. Though both of these involve "polling history", there are significant differences between them, so read the fine print.
Note that CDC is "doubly asynchronous" - the data is read from the transaction log, so recording the data is not part of the original transaction. And then you have to poll the CDC data; it's not pushed out to you. Furthermore, the functions generated by Microsoft when you enable CDC can be unbelievably slow as soon as you ask for something useful, like net changes with mask (which can tell you which columns really changed their value), and your ability to enable CDC comes with a lot of caveats and limitations (again, read the docs for all of this).
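To make the "pull" side concrete, here is a minimal Java sketch of a version-based polling pass. The ChangeSource interface, its method names and the in-memory stand-in are all assumptions made for illustration; with Change Tracking the two calls would typically wrap CHANGE_TRACKING_CURRENT_VERSION() and a query over CHANGETABLE(CHANGES ...), but no database access is shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical abstraction over the database calls; not a real driver API.
// With Change Tracking, currentVersion() would typically wrap
// CHANGE_TRACKING_CURRENT_VERSION() and changedKeysSince(v) a query over
// CHANGETABLE(CHANGES <table>, v).
interface ChangeSource {
    long currentVersion();
    List<String> changedKeysSince(long version);
}

class ChangePoller {
    private final ChangeSource source;
    private final Consumer<String> listener;
    private long lastSyncedVersion;

    ChangePoller(ChangeSource source, long startVersion, Consumer<String> listener) {
        this.source = source;
        this.lastSyncedVersion = startVersion;
        this.listener = listener;
    }

    // One polling pass, meant to be called on a timer.
    // Returns how many changed keys were handed to the listener.
    int pollOnce() {
        long upTo = source.currentVersion();   // read the high-water mark first
        if (upTo == lastSyncedVersion) {
            return 0;                          // nothing new since the last pass
        }
        List<String> keys = source.changedKeysSince(lastSyncedVersion);
        keys.forEach(listener);
        // A change committed between the two reads is delivered now and again
        // on the next pass, so the listener should be idempotent.
        lastSyncedVersion = upTo;
        return keys.size();
    }
}

class ChangePollerDemo {
    public static void main(String[] args) {
        // In-memory stand-in: the entry at index i was changed at version i + 1.
        List<String> log = new ArrayList<>();
        ChangeSource source = new ChangeSource() {
            public long currentVersion() { return log.size(); }
            public List<String> changedKeysSince(long v) {
                return new ArrayList<>(log.subList((int) v, log.size()));
            }
        };
        ChangePoller poller =
                new ChangePoller(source, 0, key -> System.out.println("changed: " + key));
        log.add("order-1");
        log.add("order-2");
        poller.pollOnce();   // hands order-1 and order-2 to the listener
        poller.pollOnce();   // nothing new, so the listener is not called
    }
}
```

Reading the current version before fetching the changes means a pass can deliver a change twice but should not skip one, which is usually the safer trade-off for an event feed.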
As to which of these is "best", well, that's a matter of opinion and circumstance. I have used CDC extensively, service broker rarely, and triggers almost never, as a way of getting events out of SQL. I have never actually used change tracking in a production environment, but if I had the choice again I would probably have chosen change tracking rather than change data capture, at least until or unless there were requirements that mandated the use of CDC because of its additional functionality beyond what change tracking can provide.
One last note: If you need to "guarantee" that the events that get raised have in fact been collected by a listener and successfully forwarded to subscribers, well, you have some work ahead of you! Guaranteed messaging is hard.

Collection processing or database request ? which one is better

This is my first post on stackoverflow, so please be nice to me :-)
So let me explain the context. I'm developing a web service with a standard layer (resources, services, DAO Layer...). I use JPA with hibernate implementation for my object model with the database.
For a parent class A and a child class B, most of the time when I want to find a B object in the collection, I use the Stream API to filter the collection based on what I want. My question here is more general: is it better to search for an object by querying the database (from my point of view this is going to cause a lot of calls to the database but use less CPU), or to do the opposite and search the object model by processing the collection (this is going to cause fewer database calls, but more CPU work)?
If you consider latency, the database will always be slower.
So you gotta ask yourself some questions:
how far away is the database (latency)?
how big is the dataset?
how do I process the data?
do I have any major runtime issues?
You wrote that querying the database "gonna cause a lot of calls to the database but it's gonna use less CPU", while processing the collection "gonna cause less database calls, but more CPU process".
If the CPU cost is noticeable, your program is probably not written with performance in mind. I suggest you check the Big-O complexity of your lookups if you have any major runtime issues.
Your question is very broad, so it's hard to tell you which approach might be best for your use case.
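As a rough illustration of the complexity point, here is a small Java sketch comparing a stream scan with a one-time index. The Child class and both method names are made up for the example; they stand in for the B objects held by A.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class ChildLookup {
    // Hypothetical child entity, standing in for class B.
    static final class Child {
        final long id;
        final String name;
        Child(long id, String name) { this.id = id; this.name = name; }
    }

    // Linear scan: O(n) for every lookup.
    static Optional<Child> findByScan(List<Child> children, long id) {
        return children.stream().filter(c -> c.id == id).findFirst();
    }

    // Build once in O(n); each later lookup is O(1) on average.
    // Assumes ids are unique (toMap throws IllegalStateException on duplicates).
    static Map<Long, Child> indexById(List<Child> children) {
        return children.stream().collect(Collectors.toMap(c -> c.id, Function.identity()));
    }

    public static void main(String[] args) {
        List<Child> children = List.of(new Child(1, "first"), new Child(2, "second"));
        System.out.println(findByScan(children, 2).map(c -> c.name).orElse("none"));
        System.out.println(indexById(children).get(2L).name);
    }
}
```

If the same collection is searched many times, the index tends to pay for itself quickly; for a single lookup on a small collection the scan is fine.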
Use the database to return the data you need, and Java to perform the processing that would be complicated to do in a JPQL/SQL query.
Databases are designed to perform queries more efficiently than Java (streams or not).
Besides, fetching a lot of data from a database only to keep part of it is not efficient.
The database is usually faster since it is optimized for requesting specific data. Usually one would add indexes to speed up querying on certain fields.
TLDR: Filter your data in the database and process it in Java.
This isn't an easy question to answer, since there are many different factors that would influence my decision to go to the db or not. First, I think it's fair to say that, for almost every app I've worked on in the past 20 years, hitting the DB for information is the default strategy. More recently (say past 10 or so years) data access through web service calls has become common as well.
For me, the main question would be something along the lines of, "Are there any situations when I would not hit an external resource (DB, Service, or even file read) for data every time I need it?"
So, I'll outline some of the things I would consider.
Is the data search space very small?
If you are searching a data space of tens of different records, then this information might be a candidate for non-db storage. On the other hand, once you get past a fairly small set of records, this approach becomes increasingly untenable. Examples of these "small sets" might be something like salutations (Mr., Ms., Dr., Mrs., Lord). I look for small sets of data that rarely change, which I, as a lazy developer, wouldn't mind typing into a configuration file. Once I get past something like 50 different records (like US States, for example), I want to pull that info from a DB or service call.
Are the data cacheable?
If you have multiple requests that could legitimately use the exact same data, then leverage caching in your application. Examine the data and expected usage of your service for opportunities to leverage regularities in data and likely requests to cache data whenever possible. Remember to consider cache keys, how long items should be cached, and when cached items should be evicted.
In many web usage scenarios, it's not uncommon that each display could include a fairly large amount of cached information, and a small amount of dynamic data. Menu and other navigation items are good candidates for caching. User-specific data, such as contract-specific pricing in an eCommerce app, is often a poor candidate.
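The three things to consider above (cache keys, how long items live, and eviction) can be sketched in a few lines of Java. This is only an illustration under assumptions: TtlCache, its loader function and the injected clock are hypothetical names, and a real application would more likely use an existing caching library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;      // injectable so expiry can be tested
    private final Function<K, V> loader;   // hypothetical: fetches from the DB or a service

    TtlCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    // Returns the cached value, reloading it when missing or expired.
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAt > now)
                        ? old
                        : new Entry<V>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }

    // Explicit eviction, for when the underlying data is known to have changed.
    void evict(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        TtlCache<String, String> menus =
                new TtlCache<>(60_000, System::currentTimeMillis, key -> "menu for " + key);
        System.out.println(menus.get("home"));  // loaded through the loader
        System.out.println(menus.get("home"));  // served from the cache
    }
}
```

The loader must not call back into the same cache, because it runs inside ConcurrentHashMap.compute.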
Can you pre-load some data into cache?
Some items can be read once and cached for the entire duration of your application. A list of US States and/or Canadian Provinces is a good example here. These almost never change, so once read from the db, you would rarely need to read them again. Consider application components that can load such data on startup, and then hold this data in an appropriate collection.
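A startup-loaded holder for such data could look roughly like this in Java. StateCatalog and the supplier passed to it are hypothetical; the supplier stands in for whatever one-time database read your application performs.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

class StateCatalog {
    private final Map<String, String> namesByCode;

    // The supplier is a hypothetical stand-in for the one-time database read.
    // It is called exactly once, here, and the result is copied and frozen.
    StateCatalog(Supplier<Map<String, String>> loadFromDb) {
        this.namesByCode = Collections.unmodifiableMap(new LinkedHashMap<>(loadFromDb.get()));
    }

    String nameOf(String code) {
        return namesByCode.get(code);
    }

    int size() {
        return namesByCode.size();
    }

    public static void main(String[] args) {
        StateCatalog states =
                new StateCatalog(() -> Map.of("CA", "California", "NY", "New York"));
        System.out.println(states.nameOf("CA"));
    }
}
```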

can apache flink be used to join huge non real time data smart?

I am supposed to join some huge SQL tables with the JSON of some REST services by some common key (we are talking about multiple SQL tables with a few REST service calls). The thing is, this data is not a real-time/infinite stream, and I also don't think I could order the output of the REST services by the join columns. Now the silly way would be to bring in all the data and then match the rows, but that would imply storing everything in memory or in some storage like Cassandra or Redis.
But I was wondering if Flink could use some kind of stream window to join, say, X elements (so really just store those elements in RAM at a given point) while also storing the non-matched elements for a later match, maybe in some kind of hash map. This is what I mean by smart join.
The devil is in the details, but yes, in principle this kind of data enrichment is quite doable with Flink. Your requirements aren't entirely clear, but I can provide some pointers.
For starters you will want to acquaint yourself with Flink's managed state interfaces. Using these interfaces will ensure your application is fault tolerant, upgradeable, rescalable, etc.
If you wanted to simply preload some data, then you might use a RichFlatMapFunction and load the data in the open() method. In your case a CoProcessFunction might be more appropriate. This is a streaming operator with two inputs that can hold state and also has access to timers (which can be used to expire state that is no longer needed, and to emit results after waiting for out-of-order data to arrive).
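The buffering logic that such a two-input operator would keep in its state can be sketched in plain Java. This is deliberately not Flink API: BufferedJoin, Pair and the method names are invented for illustration, the maps stand in for keyed state, and the sketch assumes each key appears at most once per side and that values are non-null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class BufferedJoin<K, L, R> {
    static final class Pair<L, R> {
        final L left;
        final R right;
        Pair(L left, R right) { this.left = left; this.right = right; }
    }

    // Elements still waiting for their counterpart from the other input.
    private final Map<K, L> pendingLeft = new HashMap<>();
    private final Map<K, R> pendingRight = new HashMap<>();

    // Called for each element of the first input (e.g. a SQL row).
    Optional<Pair<L, R>> onLeft(K key, L left) {
        R right = pendingRight.remove(key);
        if (right == null) {
            pendingLeft.put(key, left);      // no match yet, keep it for later
            return Optional.empty();
        }
        return Optional.of(new Pair<>(left, right));
    }

    // Called for each element of the second input (e.g. a REST result).
    Optional<Pair<L, R>> onRight(K key, R right) {
        L left = pendingLeft.remove(key);
        if (left == null) {
            pendingRight.put(key, right);
            return Optional.empty();
        }
        return Optional.of(new Pair<>(left, right));
    }

    // Number of buffered, still unmatched elements.
    int pendingCount() {
        return pendingLeft.size() + pendingRight.size();
    }

    public static void main(String[] args) {
        BufferedJoin<Integer, String, String> join = new BufferedJoin<>();
        join.onLeft(1, "sql-row-1");                     // buffered
        join.onRight(1, "rest-json-1")                   // matches and frees the buffer
            .ifPresent(p -> System.out.println(p.left + " + " + p.right));
    }
}
```

Only unmatched elements stay in memory, so the footprint depends on how far apart matching elements arrive; in Flink, timers would be the place to expire entries that never find a match.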
Flink also has support for asynchronous I/O, which can make working with external services more efficient.
One could also consider approaching this with Flink's higher level SQL and Table APIs, by wrapping the REST service calls as user-defined functions.

Interprocess communication via a database

If I have 2 processes running on different nodes and they share a database, is there a pattern that lets one node send a notification to the other process via the database?
Is some kind of polling a table normally used or is there a better way?
Instead of polling (which translates into burning not only CPU cycles but in this case also database resources and bandwidth), how about this: if you are using Oracle, you could define an update trigger on the table you want to be notified about and call a Java stored procedure from the trigger. The stored procedure could then use whatever notification mechanism you like to notify the other component about the change. This is not going to be extremely fast, but well ...
The proper way would be to have the component that updates the database send a parallel notification to the other component, again using any available technology for this (RMI, JMS, etc.).
If you want to use a database, you can insert entries into a table on the producing side and poll to find new entries on the consuming side. This may be the simplest option for your project.
There are many possible alternatives such as JMS, RMI, sockets, NoSQL databases, files, but without more information it's not possible to tell if these would be better. (Often simplest is best)
Polling is not an optimal solution. If you have a large number of clients or users, the database is going to be kept busy answering the pollers.
Users blocking or waiting for an update is much preferable, if possible. Users generally prefer a responsive system.
The two main criteria to consider before deciding are the maximum number of concurrent users and how quickly users need to be notified of the event they have expressed an interest in.
A better solution than polling, if your database supports it, is to block until the database signals a change, similar in spirit to select() or inotify(). For instance, PostgreSQL has LISTEN/NOTIFY, and a client can wait in select() on the connection socket instead of busy-looping. That being said, database-as-IPC is considered an anti-pattern.
The simplest solution is just polling from the other process.
However, if you want the other process to receive the data change immediately, you should consider using some notification mechanism, such as RPC, an HTTP request, etc.

Java synchronization options for preventing duplicate orders (file, db locking?)

I have two use cases for placing an order on a website. One is directly submitted from a web front end with a credit card, and the other is a notification of an external payment from a processor like PayPal. In both situations, I need to ensure that the order is only placed one time.
I would like to use the same mechanism for both scenarios if possible, to help with code reuse. In the first use case, the user can submit the order form multiple times, resulting in different threads trying to place an order. I can use ajax to stop this, but I need a server side solution for certainty. In the second use case, the notification messages may be sent through in duplicates, so I need to protect against that too.
I want the solution to be scalable across a distributed environment, so a memory lock is out of the question. I was looking at saving a unique token to the database to prevent multiple submissions there, but I really don't want to be messing with the existing database transactions. The real solution, it seems, is to lock on something external like a file in a shared location across JVMs.
All orders have a unique long id, so I could use that to synchronize. What would be the best way of doing this? I could potentially create a file per id, or do something fancier with a region of the file. However I don't have much experience with file locking, so if there is a better option I would love to hear it. Any code samples would help very much.
If you already have a unique long id, nothing better can happen to you than a simple database table with manually assigned primary keys. Every RDBMS (and also key-value NoSQL databases) will effectively and efficiently detect primary key clashes. It is basically:
1. Start transaction
2. INSERT INTO orders VALUES (your_unique_id)
3. Commit
Depending on the database, 2. or 3. will throw an exception which you can easily catch.
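Catching that exception could look roughly like this in Java. OrderGuard, OrderInsert and placeOnce are hypothetical names; the OrderInsert hook stands in for the transaction/INSERT/commit sequence so the sketch runs without a database. Drivers differ in which exception they throw, so the check accepts both SQLIntegrityConstraintViolationException and any SQLState in class 23 (the standard class for integrity constraint violations).

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.HashSet;
import java.util.Set;

class OrderGuard {
    // Hypothetical hook: runs the transaction, the INSERT and the commit
    // for the given order id.
    interface OrderInsert {
        void insert(long orderId) throws SQLException;
    }

    // SQLState class 23 covers every integrity constraint violation, so this
    // assumes the primary key is the only constraint that can fail here.
    static boolean isConstraintViolation(SQLException e) {
        String state = e.getSQLState();
        return e instanceof SQLIntegrityConstraintViolationException
                || (state != null && state.startsWith("23"));
    }

    // Returns true if this call placed the order, false if the id already existed.
    static boolean placeOnce(long orderId, OrderInsert insert) throws SQLException {
        try {
            insert.insert(orderId);
            return true;
        } catch (SQLException e) {
            if (isConstraintViolation(e)) {
                return false;   // someone else already placed this order
            }
            throw e;            // anything else is a real failure
        }
    }

    public static void main(String[] args) throws SQLException {
        // In-memory stand-in for the orders table and its primary key.
        Set<Long> table = new HashSet<>();
        OrderInsert insert = id -> {
            if (!table.add(id)) {
                throw new SQLIntegrityConstraintViolationException("duplicate key");
            }
        };
        System.out.println(placeOnce(1001L, insert));
        System.out.println(placeOnce(1001L, insert));
    }
}
```

Because the uniqueness check lives in the database, the same call works for both the web submission and the payment notification, on any number of JVMs.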
If you really want to avoid databases (could you elaborate a little bit more why?), you can:
Use file locking (nasty and not scalable), don't go that way.
In-memory locking with clustering (with Terracotta it's like working with normal boolean that is magically clustered)
Queuing requests and having only single consumer.
Using JMS and single-threaded consumer looks promising, however you still have to discover duplicates (but at least you avoid concurrently placed orders) and it might be terribly slow...

Way to know table is modified

There are two different processes developed in Java, running independently.
If either process modifies the table, can I get a notification that the table was modified? My objective is to keep an object always in sync with a table in the database: if any modification happens on the table, I want to modify the object.
If the table is modified, can I get a notification about it? Do databases provide any facility like this?
We use SQL Server and have certain triggers that fire when a table is modified and call an external binary. The binary we call sends a TIBCO Rendezvous message to notify other applications that the table has been updated.
However, I'm not a huge fan of this solution - Much better to control writing to your table through one "custodian" process and have other applications delegate to that. To enforce this you could change permissions on your table so that only your custodian process can write to the database.
The other advantage of this approach is being able to provide a caching layer within your custodian process to cater for common access patterns. Granted that a DBMS performs caching anyway, but by offering it at the application layer you will have more control / visibility over it.
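A custodian along these lines could be sketched in Java as follows. TableCustodian and the writer hook are hypothetical; the hook stands in for the actual database write, and the listeners here are in-process callbacks, whereas notifying other applications would need a transport such as JMS.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

class TableCustodian<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();
    private final BiConsumer<K, V> writer;   // hypothetical: performs the database write

    TableCustodian(BiConsumer<K, V> writer) {
        this.writer = writer;
    }

    void subscribe(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    // The only write path: database first, then cache, then notifications.
    // If the database write throws, nothing is cached or announced.
    void write(K key, V value) {
        writer.accept(key, value);
        cache.put(key, value);               // values must be non-null
        listeners.forEach(l -> l.accept(key, value));
    }

    // Reads of previously written rows are served from the cache.
    V read(K key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        TableCustodian<Long, String> custodian =
                new TableCustodian<>((id, row) -> System.out.println("UPDATE row " + id));
        custodian.subscribe((id, row) -> System.out.println("row " + id + " is now " + row));
        custodian.write(1L, "shipped");
        System.out.println(custodian.read(1L));
    }
}
```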
No, the database doesn't provide these services. You have to query it periodically to check for modifications, or use some JMS solution to send notifications from one app to another.
You could add a timestamp column (last_modified) to the tables and check it periodically for updates, or use sequence numbers (which are incremented on updates, similar in concept to optimistic locking).
You could use JBoss Cache, which provides update mechanisms.
One way to do this: wrap your database statement in a method that returns true when it completes successfully. Keep that flag in scope in your code so that you can check whether the table has been modified whenever you need to. Why not try it like this?
If you're willing to take the hack approach, and your database stores tables as files (e.g. MySQL), you could always have something check the modification time of the files on disk to see if it has changed.
Of course, this won't work for databases like Oracle, where tables are assigned to tablespaces and the tablespaces are what have storage on disk.
(yes, I know this is a bad approach, that's why I said it's a hack -- but we don't know all of the requirements, and if he needs something quick, without re-writing the whole application, this would technically work for some databases)
