Java synchronization options for preventing duplicate orders (file, db locking?)


I have two use cases for placing an order on a website. One is submitted directly from a web front end with a credit card, and the other is a notification of an external payment from a processor like PayPal. In both situations, I need to ensure that the order is only placed one time.
I would like to use the same mechanism for both scenarios if possible, to help with code reuse. In the first use case, the user can submit the order form multiple times, resulting in different threads trying to place the same order. I can use ajax to stop this, but I need a server-side solution for certainty. In the second use case, the notification messages may be sent in duplicate, so I need to protect against that too.
I want the solution to be scalable across a distributed environment, so an in-memory lock is out of the question. I was looking at saving a unique token to the database to prevent multiple submissions there, but I really don't want to be messing with the existing database transactions. The real solution, it seems, is to lock on something external, like a file in a location shared across JVMs.
All orders have a unique long id, so I could use that to synchronize. What would be the best way of doing this? I could potentially create a file per id, or do something fancier with a region of the file. However, I don't have much experience with file locking, so if there is a better option I would love to hear it. Any code samples would help very much.

If you already have a unique long id, you can't do better than a simple database table with manually assigned primary keys. Every RDBMS (and also key-value NoSQL databases) will effectively and efficiently detect primary key clashes. It is basically:
1. Start transaction
2. INSERT INTO orders VALUES (your_unique_id)
3. Commit
Depending on the database, step 2 or step 3 will throw an exception which you can easily catch.
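The insert-and-catch steps above could be sketched with plain JDBC roughly as follows. This is only a sketch under assumptions: the class name OrderClaim, the table name orders and its id column are made up for illustration, and the duplicate check relies on the driver either throwing SQLIntegrityConstraintViolationException or reporting an SQLState in class 23, which you should verify for your database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

final class OrderClaim {
    private OrderClaim() {}

    /**
     * Tries to record the order id. Returns true if this caller inserted it,
     * false if another request had already inserted the same id.
     */
    static boolean tryClaim(Connection conn, long orderId) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                        // 1. start transaction
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO orders (id) VALUES (?)")) {  // hypothetical table and column
            ps.setLong(1, orderId);
            ps.executeUpdate();                           // 2. insert
            conn.commit();                                // 3. commit
            return true;
        } catch (SQLException e) {
            conn.rollback();
            if (isDuplicateKey(e)) {
                return false;                             // somebody else placed this order
            }
            throw e;                                      // any other failure is a real error
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }

    /** Assumption: duplicates surface as this subclass or as SQLState class 23. */
    static boolean isDuplicateKey(SQLException e) {
        String state = e.getSQLState();
        return e instanceof SQLIntegrityConstraintViolationException
                || (state != null && state.startsWith("23"));
    }
}
```

A caller would place the order only when tryClaim returns true and treat false as "already handled", which covers both the double form submit and the duplicated payment notification.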
If you really want to avoid databases (could you elaborate a little bit more why?), you can:
Use file locking (nasty and not scalable), don't go that way.
In-memory locking with clustering (with Terracotta it's like working with normal boolean that is magically clustered)
Queuing requests and having only single consumer.
Using JMS and single-threaded consumer looks promising, however you still have to discover duplicates (but at least you avoid concurrently placed orders) and it might be terribly slow...
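The single-consumer idea can be illustrated inside one JVM with a single-threaded executor standing in for the queue and its consumer. This is a toy sketch, not a JMS setup: the class SingleConsumerOrders is hypothetical, and in a distributed deployment the set of already placed ids would have to live somewhere shared (for example the database table described earlier) rather than in a HashSet.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SingleConsumerOrders implements AutoCloseable {
    // One worker thread: requests are queued and handled strictly one at a time.
    private final ExecutorService consumer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "order-consumer");
        t.setDaemon(true);
        return t;
    });

    // Only ever touched by the consumer thread, so no extra locking is needed.
    private final Set<Long> placed = new HashSet<>();

    /** Queues the request; the future yields true if the order was newly placed. */
    Future<Boolean> submit(long orderId) {
        Callable<Boolean> task = () -> placed.add(orderId); // false means duplicate
        return consumer.submit(task);
    }

    /** Convenience wrapper that blocks until the consumer has handled the request. */
    boolean submitAndWait(long orderId) {
        try {
            return submit(orderId).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        consumer.shutdown();
    }
}
```

Because every request passes through the same thread, two submissions of the same id can never race; the second one simply sees the id already recorded.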

Related

Collection processing or database request ? which one is better

This is my first post on stackoverflow, so please be nice to me :-)
So let me explain the context. I'm developing a web service with a standard layer (resources, services, DAO Layer...). I use JPA with hibernate implementation for my object model with the database.
For a parent class A and a child class B, most of the time when I want to find an object B in the collection, I use the Stream API to filter the collection based on what I want. My question here is more general: is it better to search for an object by querying the database (from my point of view this is going to cause a lot of calls to the database, but it will use less CPU), or to do the opposite and search over the model objects by processing the collection (this is going to cause fewer database calls, but more CPU processing)?

If you consider latency, the database will always be slower.
So you gotta ask yourself some questions:
how far away is the database (latency)?
how big is the dataset?
How do I process them ?
do I have any major runtime issues ?
"from my point of view this gonna cause a lot of calls to the database but it's gonna use less CPU), or do the opposite by searching over the model object and process over collection (this gonna cause less database calls, but more CPU process)"
Your program is probably not written very efficiently. I suggest you check its Big-O complexity if you have any major runtime problems.
Your Question is very broad, so it's hard to tell you, for your use-case, which might be the best.

Use the database to return the data you need, and Java to perform processing on it that would be complicated to do in a JPQL/SQL query.
Databases are designed to perform queries more efficiently than Java (streams or not).
Besides, fetching a lot of data from a database only to keep a part of it is not efficient.

The database is usually faster since it is optimized for requesting specific data. Usually one would add indexes to speed up querying on certain fields.
TLDR: Filter your data in the database and process it in Java.

This isn't an easy question to answer, since there are many different factors that would influence my decision to go to the db or not. First, I think it's fair to say that, for almost every app I've worked on in the past 20 years, hitting the DB for information is the default strategy. More recently (say past 10 or so years) data access through web service calls has become common as well.
For me, the main question would be something along the lines of, "Are there any situations when I would not hit an external resource (DB, Service, or even file read) for data every time I need it?"
So, I'll outline some of the things I would consider.
Is the data search space very small?
If you are searching a data space of tens of different records, then this information might be a candidate for non-db storage. On the other hand, once you get past a fairly small set of records, this approach becomes increasingly untenable. Examples of these "small sets" might be something like salutations (Mr., Ms., Dr., Mrs., Lord). I look for small sets of data that rarely change, which I, as a lazy developer, wouldn't mind typing into a configuration file. Once I get past something like 50 different records (like US States, for example), I want to pull that info from a DB or service call.
Are the data cacheable?
If you have multiple requests that could legitimately use the exact same data, then leverage caching in your application. Examine the data and expected usage of your service for opportunities to leverage regularities in data and likely requests to cache data whenever possible. Remember to consider cache keys, how long items should be cached, and when cached items should be evicted.
In many web usage scenarios, it's not uncommon that each display could include a fairly large amount of cached information and a small amount of dynamic data. Menu and other navigation items are good candidates for caching. User-specific data, such as contract-specific pricing in an eCommerce app, are often poor candidates.
Can you pre-load some data into cache?
Some items can be read once and cached for the entire duration of your application. A list of US States and/or Canadian Provinces is a good example here. These almost never change, so once read from the db, you would rarely need to read them again. Consider application components that can load such data on startup, and then hold this data in an appropriate collection.

Concurrency Control on my Code

I am working on an order capture and generator application. The application works fine with concurrent users working on different orders. The problem starts when two users from different systems/locations try to work on the same order. The business impact is that the application generates duplicate data for that order, since two users are working on it simultaneously.
I have tried to synchronize the method where I generate the order, but that would mean that no other user can work on any new order, since synchronized puts a lock on that whole method. This would block all users from generating a new order while one order is being processed, since they would all hit the synchronized code.
I have also tried criteria initialization for an order, but with no success.
Can anyone please suggest a proper approach??
All suggestions/comments are welcome. Thanks in advance.

Instead of synchronizing on the method level, you may use block-level synchronization for the blocks of code which must be operated on by only one thread at a time. This way you can increase the scope for parallel processing of the same order.
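As a rough illustration of block-level synchronization, the sketch below goes one step further than the answer states and synchronizes on a lock object per order id, so that users working on different orders do not block each other. The class PerOrderLocks is hypothetical, it only helps inside a single JVM, and lock objects are never removed here, which a real application would need to handle.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

final class PerOrderLocks {
    // One lock object per order id (never evicted in this sketch).
    private final ConcurrentMap<Long, Object> locks = new ConcurrentHashMap<>();

    /** Returns the lock for an order, creating it atomically on first use. */
    Object lockFor(long orderId) {
        return locks.computeIfAbsent(orderId, id -> new Object());
    }

    /** Runs the work while holding only this order's lock. */
    <T> T withOrderLock(long orderId, Supplier<T> work) {
        synchronized (lockFor(orderId)) {   // block-level, not method-level
            return work.get();
        }
    }
}
```

Only the code that generates data for a given order would go inside withOrderLock; everything else stays outside the synchronized block.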

On a grand scale, if you are backing your entities with a database, I would advise you to look at optimistic locking.
Add a version field to your order entity. Once the order is placed (the first time) the version is 1. Every update must then build on the current version, so imagine two concurrent processes:
a -> Read data (version=1)
     Update data
     Store data (set version=2 if version=1)
b -> Read data (version=1)
     Update data
     Store data (set version=2 if version=1)
If these two processes run concurrently rather than one after the other, one of them will fail to store its data. That is the losing user, who will have to retry his edits (this time reading version=2).
If you use JPA, optimistic locking is as easy as adding a @Version attribute to your model. If you use raw JDBC, you will need to add it to the update condition:
update table set version=2, data=xyz where orderid=x and version=1
That is by far the best and in fact preferred solution to your general problem.
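The version check can be illustrated without a database. The sketch below is an in-memory stand-in for the conditional update statement, with a hypothetical VersionedOrderStore class; with JPA or JDBC the database performs the same comparison for you.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class VersionedOrderStore {
    /** Immutable snapshot of a stored order. */
    static final class Row {
        final int version;
        final String data;

        Row(int version, String data) {
            this.version = version;
            this.data = data;
        }
    }

    private final ConcurrentMap<Long, Row> rows = new ConcurrentHashMap<>();

    /** Placing the order for the first time stores it with version 1. */
    void insert(long orderId, String data) {
        rows.putIfAbsent(orderId, new Row(1, data));
    }

    Row read(long orderId) {
        return rows.get(orderId);
    }

    /**
     * Mirrors "update ... set version = expected + 1 where orderid = ? and version = expected".
     * Returns false when someone else has already bumped the version.
     */
    boolean update(long orderId, int expectedVersion, String newData) {
        boolean[] updated = {false};
        rows.computeIfPresent(orderId, (id, current) -> {
            if (current.version != expectedVersion) {
                return current;                       // stale read: leave the row untouched
            }
            updated[0] = true;
            return new Row(expectedVersion + 1, newData);
        });
        return updated[0];
    }
}
```

If processes a and b both read version 1, whichever calls update first wins; the other gets false back and has to re-read the order before retrying.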

Interprocess communication via a database

If I have 2 processes running in different nodes and they share a database, is there a pattern that one node be able to send some notification to the other process via the database?
Is some kind of polling a table normally used or is there a better way?

Instead of polling (which burns not only CPU cycles but in this case also database resources and bandwidth), how about this: if you are using Oracle, you could define an update trigger on the table you want to be notified about and call a Java Stored Procedure (JSP) from the trigger. The JSP could then use whatever notification mechanism you like to notify the other component about the change. This is not going to be extremely fast, but well ...
The proper way would be to have the component that updates the database send a parallel notification to the other component, again using any available technology for this (RMI, JMS, etc.).

If you want to use a database, you can insert entries into a table on the producing side and poll to find new entries on the consuming side. This may be the simplest option for your project.
There are many possible alternatives such as JMS, RMI, sockets, NoSQL databases, files, but without more information it's not possible to tell if these would be better. (Often simplest is best.)

Polling is not an optimal solution. If you have a large number of clients or users, the database is going to be kept busy answering to the pollsters.
Users blocking or waiting for an update is much preferable, if possible. Users generally prefer a responsive system.
The two main criteria to consider before deciding are the maximum number of concurrent users and how quickly users needs to be notified of the event they have expressed an interest in.

A better solution than polling, if your database supports it, is to block until the database notifies you, much like select() or inotify() do for file descriptors and files. For instance, PostgreSQL supports LISTEN/NOTIFY, so a client can wait without a busy loop (e.g. by calling select() on the connection's socket) until a notification arrives. That being said, Database-as-IPC is considered an anti-pattern.

The simplest solution is just polling from the other process.
However, if you want the other process to receive the data change immediately, you should consider using some notification mechanism, such as RPC, an HTTP request, etc.

Using hashmap or H2 database?

I am developing a web application in which I need to store session, user messages etc. I am thinking of using HashMap or H2 database.
Please let me know which is better approach in terms of performance and memory utilization. The web site has to support 10,000 users.
Thanks.

As usual with these questions, I would worry about performance as/when you know it's an issue.
10000 users is not a lot of data to hold in memory. I would likely start off with a standard Java collection, and look at performance when you predict it's going to cause you grief.
Abstract out the access to this Java collection such that when you substitute it, the refactoring required is localised (and perhaps make it configurable, such that you can easily perform before/after performance tests with your different solutions -H2, Derby, Oracle, etc. etc.)
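One way to abstract the access, sketched under the assumption that sessions are simple string values: callers depend only on an interface, and the collection-backed implementation can later be swapped for an H2- or Oracle-backed one. The names SessionStore and InMemorySessionStore are made up for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** What the rest of the application sees; no mention of the storage used. */
interface SessionStore {
    void put(String sessionId, String value);

    Optional<String> get(String sessionId);

    void remove(String sessionId);
}

/** Starting point: a standard Java collection held in memory. */
final class InMemorySessionStore implements SessionStore {
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    @Override
    public void put(String sessionId, String value) {
        sessions.put(sessionId, value);
    }

    @Override
    public Optional<String> get(String sessionId) {
        return Optional.ofNullable(sessions.get(sessionId));
    }

    @Override
    public void remove(String sessionId) {
        sessions.remove(sessionId);
    }
}
```

A database-backed class implementing the same interface could then be selected by configuration, which keeps before/after performance tests cheap to run.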

If your session objects aren't too big (which should be the case), there is no need to persist them in a database.
Using a database for this would add a lot of complexity in a case where you can start with a few lines of code. So don't use a database; simply store them in a light in-memory structure (a HashMap, for example).
You may need to implement a way to clean your HashMap if you don't want to keep sessions in memory when the user left a long time ago. Many solutions are available (the easiest is simply to have a background thread that removes sessions that are too old from time to time). Note that it's usually easier to clean a HashMap than a database.
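The background cleanup could look roughly like this. It is a sketch with assumed names (ExpiringSessions, maxIdleMillis); it uses a ConcurrentHashMap rather than a plain HashMap because the cleaner thread and request threads touch the map concurrently, and the current time is passed in so the eviction logic can be exercised without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ExpiringSessions<V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccessMillis;

        Entry(V value, long lastAccessMillis) {
            this.value = value;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    private final Map<String, Entry<V>> sessions = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    ExpiringSessions(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    void put(String id, V value, long nowMillis) {
        sessions.put(id, new Entry<>(value, nowMillis));
    }

    /** Returns the session value (or null) and refreshes its last-access time. */
    V get(String id, long nowMillis) {
        Entry<V> entry = sessions.get(id);
        if (entry == null) {
            return null;
        }
        entry.lastAccessMillis = nowMillis;
        return entry.value;
    }

    int size() {
        return sessions.size();
    }

    /** Drops every session that has been idle for longer than maxIdleMillis. */
    void evictExpired(long nowMillis) {
        sessions.values().removeIf(e -> nowMillis - e.lastAccessMillis > maxIdleMillis);
    }

    /** Starts a daemon thread that evicts old sessions at a fixed period. */
    ScheduledExecutorService startCleaner(long periodSeconds) {
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "session-cleaner");
            t.setDaemon(true);
            return t;
        });
        cleaner.scheduleAtFixedRate(
                () -> evictExpired(System.currentTimeMillis()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return cleaner;
    }
}
```

In the web application you would call startCleaner once at startup and pass System.currentTimeMillis() to put and get.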

Both H2 (when run as an in-memory database) and a HashMap are gonna keep the data in memory (so from a space point of view they are almost the same).
If lookups are simple, like KEY -> VALUE, then looking up in the HashMap will be quicker.
If you have to do comparisons like KEY < 100 etc use H2.
In fact 10K user info is not that high a number.

If you don't need to save user messages, use the collections. But if the messages should be saved, be sure to use a database, because after a restart you would lose all data held in memory.

The problem with using a HashMap for storing objects is that you would run into issues when your site becomes too big for one server and would need to be clustered in order to scale with demand. Then you would face problems with how to synchronise the HashMap instances on different servers.
A possible alternative would be to use a key-value store like Redis, as you won't need the structure of a database, or even to use the distributed cache abilities of something like EHCache.

Way to know table is modified

There are two different processes developed in Java, running independently.
If either process modifies the table, can I get a notification that the table was modified? My objective is to keep an object always in sync with a table in the database: if any modification happens on the table, I want to update the object.
If the table is modified, can I be notified of it? Do databases provide any facility like this?

We use SQL Server and have certain triggers that fire when a table is modified and call an external binary. The binary we call sends a Tib rendezvous message to notify other applications that the table has been updated.
However, I'm not a huge fan of this solution - Much better to control writing to your table through one "custodian" process and have other applications delegate to that. To enforce this you could change permissions on your table so that only your custodian process can write to the database.
The other advantage of this approach is being able to provide a caching layer within your custodian process to cater for common access patterns. Granted that a DBMS performs caching anyway, but by offering it at the application layer you will have more control / visibility over it.

No, database doesn't provide these services. You have to query it periodically to check for modification. Or use some JMS solution to send notifications from one app to another.

You could add a timestamp column (last_modified) to the tables and check it periodically for updates, or use sequence numbers (which are incremented on updates, similar in concept to optimistic locking).
You could use jboss cache which provides update mechanisms.
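The sequence-number check could be sketched like this. How the current value is read is left to a LongSupplier, which is an assumption of this example: in practice it might run a query for the latest last_modified or sequence value of the table. The class name TableChangeDetector is hypothetical.

```java
import java.util.function.LongSupplier;

final class TableChangeDetector {
    private final LongSupplier currentSequence; // e.g. wraps a query for the latest sequence number
    private long lastSeen;

    TableChangeDetector(LongSupplier currentSequence) {
        this.currentSequence = currentSequence;
        this.lastSeen = currentSequence.getAsLong(); // baseline taken at startup
    }

    /** Returns true if the table's sequence number moved since the previous poll. */
    synchronized boolean pollChanged() {
        long now = currentSequence.getAsLong();
        if (now == lastSeen) {
            return false;
        }
        lastSeen = now;
        return true;
    }
}
```

The process that holds the in-memory object would call pollChanged periodically and reload the object whenever it returns true.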

One way you can do this: enclose your database statement in a method that returns 'true' when it completes successfully. Keep that flag in scope in your code so that you can check whether the table has been modified whenever you want. Why not try it like this?

If you're willing to take the hack approach, and your database stores tables as files (e.g., MySQL), you could always have something that checks the modification time of the files on disk and looks to see if it's changed.
Of course, for databases like Oracle, where tables are assigned to tablespaces and the tablespaces are what have storage on disk, it won't work.
(yes, I know this is a bad approach, that's why I said it's a hack -- but we don't know all of the requirements, and if he needs something quick, without re-writing the whole application, this would technically work for some databases)
