I'm working on a project that runs in a clustered environment, where there are many nodes and a single database. The project uses Spring-data-JPA (1.9.0) and Hibernate (5.0.1). I'm having trouble resolving how to prevent duplicate row issues.
For the sake of example, here's a simple entity:
@Entity
@Table(name = "scheduled_updates")
public class ScheduledUpdateData {
    public enum UpdateType {
        TYPE_A,
        TYPE_B
    }
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "id")
    private UUID id;
    @Column(name = "type", nullable = false)
    @Enumerated(EnumType.STRING)
    private UpdateType type;
    @Column(name = "source", nullable = false)
    private UUID source;
}
The important part is that there is a UNIQUE(type, source) constraint.
And of course, matching example repository:
@Repository
public interface ScheduledUpdateRepository extends JpaRepository<ScheduledUpdateData, UUID> {
    ScheduledUpdateData findOneByTypeAndSource(final UpdateType type, final UUID source);
    //...
}
The idea for this example is that parts of the system can insert rows to be scheduled for something that runs periodically, any number of times between said runs. When whatever that something is actually runs, it doesn't have to worry about operating on the same thing twice.
How can I write a service method that would conditionally insert into this table? A few things I've tried that don't work are:
Find > Act - The service method would use the repository to see if an entry already exists, and then either update the found entry or save a new one as needed. This does not work.
Try insert > Update if fail - The service method would try to insert, catch the exception due to the unique constraint, and then do an update instead. This does not work since the transaction will already be in a rolled-back state and no further operations can be done in it.
Native query with "INSERT INTO ... WHERE NOT EXISTS ..." - The repository has a new native query:
@Repository
public interface ScheduledUpdateRepository extends JpaRepository<ScheduledUpdateData, UUID> {
    // ...
    @Modifying
    @Query(nativeQuery = true, value = "INSERT INTO scheduled_updates (type, source)" +
            " SELECT :type, :src" +
            " WHERE NOT EXISTS (SELECT * FROM scheduled_updates WHERE type = :type AND source = :src)")
    void insertUniquely(@Param("type") final String type, @Param("src") final UUID source);
}
This unfortunately also does not work, as Hibernate appears to perform the SELECT used by the WHERE clause on its own first - which means in the end multiple inserts are tried, causing a unique constraint violation.
I definitely don't know a lot of the finer points of JTA, JPA, or Hibernate. Any suggestions on how to insert into tables with unique constraints (beyond just the primary key) across multiple JVMs?
Edit 2016-02-02
With Postgres (2.3) as the database, I tried using isolation level SERIALIZABLE. Sadly, by itself this still caused constraint violation exceptions.
You are trying to ensure that only 1 node can perform this operation at a time.
The best (or at least most DB-agnostic) way to do this is with a 'lock' table.
This table will have a single row, and will act as a semaphore to ensure serial access.
Make sure that this method is wrapped in a transaction:
// this line will block if any other thread already has a lock
// until that thread's transaction commits
Lock lock = entityManager.find(Lock.class, Lock.ID, LockModeType.PESSIMISTIC_WRITE);
// just some change to the row, it doesn't matter what
lock.setDateUpdated(new Timestamp(System.currentTimeMillis()));
entityManager.merge(lock);
entityManager.flush();
// find your entity by unique constraint
// if it exists, update it
// if it doesn't, insert it
Hibernate and its query language offer support for an insert statement. So you can actually write that query with HQL. See here for more information. http://docs.jboss.org/hibernate/orm/5.0/userguide/html_single/Hibernate_User_Guide.html#_hql_syntax_for_insert
It sounds like an upsert case, that can be handled as suggested here.
Find > Act - The service method would use the repository to see if an entry already exists, and then either update the found entry or save a new one as needed. This does not work.
Why does this not work?
Have you considered "optimistic locking"?
These two posts may help:
https://www.baeldung.com/jpa-optimistic-locking
https://www.baeldung.com/java-jpa-transaction-locks
Just a quick question, in case something stands out immediately.
We're migrating an EAR/EJB application from WebLogic 11g to the latest WS Liberty (22.x), also upgrading several of the frameworks, including JPA to 2.2. This also changes the JPA implementation to EclipseLink. We came from com.oracle.weblogic.11g.modules:javax.persistence:1.0.0.0_1-0-2. The underlying DB is MS SQL Server.
And I'm running into some weirdness with regards to related objects not being resolved/queried intermittently.
Just as an example, we have entities where the columns hold reference data codes or similar lookups. Say I have an entity called PaymentRecordT and it has a status code which refers to a ref table that also holds a textual description. Something like this:
SQL:
CREATE TABLE [PAYMENT_RECORD_T](
[PAYMENT_ID] [int] NOT NULL,
...
[PAYMENT_STATUS_CD] [CHAR](8) NOT NULL,
...
)
ALTER TABLE [PAYMENT_RECORD_T] WITH CHECK ADD CONSTRAINT [FK_PAYM4] FOREIGN KEY([PAYMENT_STATUS_CD])
REFERENCES [RECORD_STATUS_T] ([REC_STAT_CD])
GO
CREATE TABLE [RECORD_STATUS_T] (
[RECORD_STAT_CD] [CHAR](8) NOT NULL,
[RECORD_STAT_DSC] [VARCHAR](60) NOT NULL
CONSTRAINT [PK_RECORD_STATUS_T] PRIMARY KEY CLUSTERED (
[RECORD_STAT_CD] ASC
)WITH (PAD_INDEX = OFF...) ON [PRIMARY]
) ON [PRIMARY]
GO
Java:
@Table(name = "PAYMENT_RECORD_T")
@Entity
public class PaymentRecordT {
    ...
    @ManyToOne
    @PrimaryKeyJoinColumn(name = "payment_status_cd", referencedColumnName = "REC_STAT_CD")
    private RecordStatusT recordStatusT;
}
@Table(name = "RECORD_STATUS_T")
@Entity
public class RecordStatusT {
    @Column(name = "REC_STAT_CD")
    @Id
    private String recStatCd;
    @Column(name = "REC_STAT_DSC")
    @Basic
    private String recStatDsc;
}
Other relations in our app might not be primary key relations but loose relations, in which case it's just @JoinColumn, but the pattern would be the same.
My 'weirdness' is the following:
So in this example I have a list of 10 'Payment Records', each of which has such a record status, which is actually NOT NULL in the database. When I do the initial retrieval via an EJB method, it grabs the 10 records and I also get the correctly resolved/queried record statuses.
Then I add a new record via an EJB method (TRANSACTION_REQUIRED). After the add method returns, I can query the new payment record in the database via SSMS. It's committed, it looks 100% correct, and it contains a correct record status code.
Now I run the retrieval method again and I get the 11 records as I would expect. Only the 11th (newly inserted) record will have recordStatusT as null.
When I restart the app all goes well again for the retrieval of all 11 records. But for subsequent additions the outcome seems again 'undefined'.
In the JDBC logging I can see that during the original retrieval of the records the record_status_t table was queried, but the second time around it was not, and I have no explanation why.
I played with FetchType.EAGER and read up on caching etc., but I'm not getting anywhere.
Any ideas?
Thanks for your time
Carsten
I solved the problem by ensuring that after inserts/updates the objects aren't being queried from the cache.
In the end, rather than doing it with a query hint, I disabled caching for the entity involved using the @Cacheable annotation, like so:
@Table(name = "PAYMENT_RECORD_T")
@Entity
@Cacheable(false)
public class PaymentRecordT {
    ...
    @ManyToOne
    @PrimaryKeyJoinColumn(name = "payment_status_cd", referencedColumnName = "REC_STAT_CD")
    private RecordStatusT recordStatusT;
}
I still feel like there should be a better solution. EclipseLink tracks the inserts/updates, so it should be able to track what needs rereading from the DB and what doesn't. I still feel like I don't fully understand the entire picture, but this works for me and it's reasonably clean.
I can leave the considerable amount of read-only data/objects cacheable and the few that are changeable as non-cacheable.
Thanks for reading
Carsten
Given I have an entity Car with a column model that doesn't accept NULLs:
@Table(name = "CAR")
@Entity
public class Car extends AbstractEntity<Long> {
    @Column(name = "MODEL", nullable = false)
    private final String model;
}
When I prepare database schema, insert data (including NULLs in MODEL column) manually and start up application, it doesn't fail to start.
Why is that?
Do conditions specified in the @Column annotation only apply to insert/update operations, not to read operations?
Yes, you can read null values with nullable = false. But when you try to save or update an entity with model = null, a JPA-level error will be thrown.
Check out the specification for nullable.
These JPA constraints just prevent invalid data from being written to the database, so that the database isn't called for no reason (by the way, you should have the same constraints in your database as you have in JPA).
These constraints have nothing to do with data that is already there. So that's why your application doesn't fail to start.
Have a look at this answer for better explanation.
Let's say I have a List of entities:
List<SomeEntity> myEntities = new ArrayList<>();
SomeEntity.java:
@Entity
@Table(name = "entity_table")
public class SomeEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;
    private int score;
    public SomeEntity() {}
    public SomeEntity(long id, int score) {
        this.id = id;
        this.score = score;
    }
}
MyEntityRepository.java:
@Repository
public interface MyEntityRepository extends JpaRepository<SomeEntity, Long> {
    List<SomeEntity> findAllByScoreGreaterThan(int score);
}
So when I run:
myEntityRepository.findAllByScoreGreaterThan(10);
Then Hibernate will load all of the records in the table into memory for me.
There are millions of records, so I don't want that. Then, in order to intersect, I need to compare each record in the result set to my List.
In native MySQL, what I would have done in this situation is:
create a temporary table and insert into it the entities' ids from the List.
join this temporary table with the "entity_table", use the score filter and then only pull the entities that are relevant to me (the ones that were in the list in the first place).
This way I gain a big performance increase, avoid any OutOfMemoryErrors and have the machine of the database do most of the work.
Is there a way to achieve such an outcome with Spring Data JPA's query methods (with hibernate as the JPA provider)? I couldn't find in the documentation or in SO any such use case.
I understand you have a set of entity_table identifiers and you want to find each entity_table whose identifier is in that subset and whose score is greater than a given score.
So the obvious question is: how did you arrive at the initial subset of entity_tables, and couldn't you just add the criteria of that query to your query that also checks for "score is greater than x"?
But if we ignore that, I think there are two possible solutions. If the list of some_entity identifiers is small (what exactly is "small" depends on your database), you could just use an IN clause and define your method as:
List<SomeEntity> findByScoreGreaterThanAndIdIn(int score, Set<Long> ids);
If the number of identifiers is too large to fit in an IN clause (or you're worried about the performance of using an IN clause) and you need to use a temporary table, the recipe would be:
Create an entity that maps to your temporary table. Create a Spring Data JPA repository for it:
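@Entity
@Table(name = "temp_table")  // table and column names taken from the native query further down
class TempEntity {
    @Id
    @Column(name = "id")
    private Long entityId;
    protected TempEntity() {}  // no-arg constructor required by JPA
    TempEntity(Long entityId) { this.entityId = entityId; }
}
interface TempEntityRepository extends JpaRepository<TempEntity, Long> { }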
Use its save method to save all the entity identifiers into the temporary table. As long as you enable insert batching this should perform all right -- how to enable differs per database and JPA provider, but for Hibernate at the very least set the hibernate.jdbc.batch_size Hibernate property to a sufficiently large value. Also flush() and clear() your entityManager regularly or all your temp table entities will accumulate in the persistence context and you'll still run out of memory. Something along the lines of:
int count = 0;
for (SomeEntity someEntity : myEntities) {
    tempEntityRepository.save(new TempEntity(someEntity.getId()));
    // flush and clear after every 1000 saves, not only after the first 1000
    if (++count % 1000 == 0) {
        entityManager.flush();
        entityManager.clear();
    }
}
Add a find method to your SomeEntityRepository that runs a native query that does the select on entity_table and joins to the temp table:
@Query(value = "SELECT t.id, t.score FROM entity_table t INNER JOIN temp_table tt ON t.id = tt.id WHERE t.score > ?1", nativeQuery = true)
List<SomeEntity> findByScoreGreaterThan(int score);
Make sure you run both methods in the same transaction, so create a method in a @Service class that you annotate with @Transactional(propagation = Propagation.REQUIRES_NEW) and that calls both repository methods in succession. Otherwise your temp table's contents will be gone by the time the SELECT query runs and you'll get zero results.
You might be able to avoid native queries by having your temp table entity have a @ManyToOne to SomeEntity, since then you can join in JPQL; I'm just not sure if you'll be able to avoid actually loading the SomeEntity instances to insert them in that case (or if creating a new SomeEntity with just an ID would work). But since you say you already have a list of SomeEntity, that's perhaps not a problem.
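The flush-every-N-rows step above can be isolated into a small helper. This is only a sketch in plain Java, with no JPA dependency: BatchWriter and saveInBatches are hypothetical names, and the two callbacks stand in for tempEntityRepository.save(...) and for entityManager.flush() followed by entityManager.clear(). It flushes after each full batch and once more for a trailing partial batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: not part of Spring Data or Hibernate.
final class BatchWriter {

    // Calls save for every item, runs flushAndClear after each full batch
    // and once more at the end if a partial batch is still pending.
    // Returns how many times flushAndClear was invoked.
    static <T> int saveInBatches(List<T> items, int batchSize,
                                 Consumer<T> save, Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (T item : items) {
            save.accept(item);
            if (++pending == batchSize) {
                flushAndClear.run();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            flushAndClear.run();
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 2500; i++) {
            ids.add(i);
        }
        // In real code the callbacks would call the repository and the EntityManager.
        int flushes = saveInBatches(ids, 1000, id -> { }, () -> { });
        System.out.println("flushes: " + flushes);
    }
}
```

In the real service method the batch size would normally match the hibernate.jdbc.batch_size setting, so that each flush sends whole JDBC batches.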
I need something similar myself, so will amend my answer as I get a working version of this.
You can:
1) Make a paginated native query via JPA (remember to add an order clause to it) and process a fixed amount of records
2) Use a StatelessSession (see the documentation)
I am facing a strange problem in Hibernate. Operating in a multithreaded environment, when trying to insert into one of the tables, I am getting duplicate entries in the table. Only the primary key is different; all other fields are exact duplicates.
I am using Hibernate + Oracle, and Spring's HibernateTemplate object.
Here's the relevant portion of my BO class, and below it the code that saves the object. I am not using any transient fields.
I have checked other posts related to this, but none of them addresses the root cause of the problem. I don't want to introduce any constraints/unique indexes on the db table.
@Entity
@Table(name="ADIRECIPIENTINTERACTION")
@Lazy(value = true)
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
@GenericGenerator(name="recipientInteractionSeq", strategy = "native", parameters =
    { @Parameter(name="sequence", value="SEQiRecipientInteractId")})
public class RecipientInteractionBO extends BusinessObject{
    private static final long serialVersionUID = 1L;
    @Id
    @GeneratedValue(generator = "recipientInteractionSeq", strategy = GenerationType.AUTO)
    @Column(name="IRECIPIENTINTERACTIONID")
    private long lId; ....
And here's the code used to save the BO:
RecipientInteractionBO recInt = (RecipientInteractionBO) objectPS
.getUniqueResult(detachedCriteria);
if (recInt == null) {
recInt = new RecipientInteractionBO();
....
hibernateTemplateObj.insertObject(recInt);
} else {
...
hibernateTemplateObj.saveOrUpdate(recInt);
}
Please let me know if any other details are required.
Check your data persistence code for possible race conditions between multiple threads. You are checking for the existence of the RecipientInteractionBO, which presumably queries the database. If two threads run simultaneously, both check for its existence; since it isn't there for either of them, both persist a new entity. You might need to use synchronization so that checking and inserting/updating is done by only one thread at a time.
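As a rough illustration of that synchronization, here is a minimal sketch in plain Java. InteractionTable is a hypothetical in-memory stand-in for the database table (it has no unique constraint, as in the question), and the names are assumptions, not part of HibernateTemplate. The existence check and the insert run under one lock, so no second thread can slip in between them. Note that this only serializes threads inside a single JVM; it does nothing across several application nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for a table without a unique constraint.
final class InteractionTable {
    private final List<String> rows = new ArrayList<>();
    private final Object lock = new Object();

    // Check and insert happen under the same lock, so they form one atomic step.
    boolean insertIfAbsent(String key) {
        synchronized (lock) {
            if (rows.contains(key)) {
                return false;
            }
            rows.add(key);
            return true;
        }
    }

    int count(String key) {
        synchronized (lock) {
            int n = 0;
            for (String row : rows) {
                if (row.equals(key)) {
                    n++;
                }
            }
            return n;
        }
    }
}

final class SynchronizedInsertDemo {

    // Starts the given number of threads that all try to insert the same key
    // at the same moment, and returns how many rows with that key exist afterwards.
    static int runConcurrentInserts(int threads, String key) {
        InteractionTable table = new InteractionTable();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                table.insertIfAbsent(key);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.count(key);
    }

    public static void main(String[] args) {
        System.out.println("rows: " + runConcurrentInserts(8, "recipient-42"));
    }
}
```

With a real database, the lock would have to be held until the inserting transaction has committed; otherwise the next thread's existence check may still not see the new row.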
I have a Keyword and a KeywordType as entities. There are lots of keywords of few types. When trying to persist the second keyword of a type, the unique constraint is violated and the transaction is rolled back. Searching SO I found several possibilities (some of them from different contexts, so I'm not sure of their validity here): this post and this post advise catching the exception, which would be of no use to me, as I'd end up where I started and would still need to somehow persist the keyword.
The same applies to locking, proposed for a different situation here. Custom insert statements, as proposed in this and this post, wouldn't work properly I guess, since I'm using Oracle and not MySQL and wouldn't like to tie the implementation to Hibernate. A different workaround would be trying to retrieve the type first in the code generating the keywords, and set it on the keyword if found or create a new one if not.
So, what would be the best - most robust, portable (for different databases and persistence providers) and sane approach here?
Thank you.
The involved entities:
public class Keyword {
    @Id
    @GeneratedValue
    private long id;
    @Column(name = "VALUE")
    private String value;
    @ManyToOne
    @JoinColumn(name = "TYPE_ID")
    private KeywordType type;
    ...
}
and
@Entity
@Table(uniqueConstraints = {@UniqueConstraint(columnNames = { "TYPE" }) })
public class KeywordType {
    @Id
    @GeneratedValue
    private long id;
    @Column(name = "TYPE")
    private String type;
    ...
}
Your last solution is the right one, IMO. Search for the keyword type, and if not found, create it.
Catching the exception is not a good option because:
it's hard to know which exception to catch while keeping your code portable across JPA and DB engines;
the JPA engine will be in an undetermined state after such an exception, and you should always roll back in this case.
Note, however, that with this technique you might still have two transactions searching for the same type in parallel and then trying to insert it in parallel. One of the transactions will roll back, but this will be much less frequent.
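The search-then-create flow, including the rare case where a parallel writer wins the race, can be sketched in plain Java as follows. KeywordTypeTable is a hypothetical in-memory stand-in for the table with its unique TYPE column, and UniqueViolation stands in for whatever exception the persistence provider raises; none of these names come from JPA. In a real JPA setup the failed insert leaves its transaction rolled back, so the second lookup would have to run in a fresh transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the KeywordType table and its unique TYPE column.
final class KeywordTypeTable {

    static final class UniqueViolation extends RuntimeException {
        UniqueViolation(String type) {
            super("duplicate type: " + type);
        }
    }

    private final Map<String, Long> idByType = new ConcurrentHashMap<>();
    private final AtomicLong sequence = new AtomicLong();

    // Returns the id of the given type, or null if no such row exists.
    Long find(String type) {
        return idByType.get(type);
    }

    // Inserts a new row and returns its id; rejects duplicates like a unique constraint would.
    long insert(String type) {
        long id = sequence.incrementAndGet();
        if (idByType.putIfAbsent(type, id) != null) {
            throw new UniqueViolation(type);
        }
        return id;
    }
}

final class KeywordTypeResolver {

    // Search first; create only if nothing was found. If the insert is rejected
    // because another writer created the row in the meantime, read that row instead.
    static long getOrCreate(KeywordTypeTable table, String type) {
        Long existing = table.find(type);
        if (existing != null) {
            return existing;
        }
        try {
            return table.insert(type);
        } catch (KeywordTypeTable.UniqueViolation lostRace) {
            Long winner = table.find(type);
            if (winner == null) {
                throw lostRace;
            }
            return winner;
        }
    }

    public static void main(String[] args) {
        KeywordTypeTable table = new KeywordTypeTable();
        long first = getOrCreate(table, "COLOR");
        long second = getOrCreate(table, "COLOR");
        System.out.println(first == second);
    }
}
```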
If you're using EJB 3.1 and you don't mind serializing this operation, a singleton bean using container managed concurrency can solve the problem.
@Singleton
@ConcurrencyManagement(ConcurrencyManagementType.CONTAINER)
public class KeywordTypeManager
{
    @Lock(LockType.WRITE)
    public void upsert(KeywordType keywordType)
    {
        // Only one thread can execute this at a time.
        // Your implementation here:
        // ...
    }
    @Inject
    private KeywordTypeDao keywordTypeDao;
}
I would go for this option:
A different workaround would be trying
to retrieve the type first in the code
generating the keywords, and set it on
the keyword if found or create a new
one if not.