Hibernate Search is not reindexing based on changes to "calculated" values - java

We use Hibernate Search (4.1) throughout our application to manage searching and indexing of resources, but we sometimes need to manage "calculated" values in the index, e.g., @IndexedEmbedded or @Field placed on getters that have no backing property:
public class Resource {

    @ManyToMany(...)
    private List<Keyword> keywords = new ArrayList<Keyword>();

    public List<Keyword> getKeywords() {
        return keywords;
    }

    public void setKeywords(List<Keyword> keywords) {
        this.keywords = keywords;
    }

    @IndexedEmbedded
    public List<Keyword> getIndexedKeywords() {
        List<Keyword> toReturn = new ArrayList<Keyword>();
        for (Keyword keyword : getKeywords()) {
            if (keyword.isIndexed()) {
                toReturn.add(keyword);
            }
        }
        return toReturn;
    }
}
...
public void saveResource(Resource resource, Collection<Keyword> keywords) {
    resource.getKeywords().retainAll(keywords);
    resource.getKeywords().addAll(keywords);
    // triggers a persist in the database correctly, but does not trigger a reindex
    session.saveOrUpdate(resource);
}
but calling saveResource does not cause Hibernate Search to reindex the resource. Shouldn't it?

Hibernate Search 4.1 (and anything after 3.4) uses 'smart' checks to decide whether reindexing needs to happen, matching the @IndexedEmbedded declarations against Hibernate's persisted fields. If they do not match, Hibernate Search does not reindex the object, because the checks see nothing that changed. This is likely because Hibernate Search would otherwise have to either perform a large amount of reflection and introspection on indexed fields to determine whether the underlying values have changed, or calculate and store the result of each call so it could detect changes later, which would both slow down reindexing and bloat the Lucene index with a stored hash or the entire result value. It is probably best to either taint a persisted field so that reindexing is triggered, or manually reindex the object at the end.
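For the manual option, here is a minimal sketch using the Hibernate Search 4.x FullTextSession API (my addition, not part of the original answer):
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
...
public void saveResource(Resource resource, Collection<Keyword> keywords) {
    resource.getKeywords().retainAll(keywords);
    resource.getKeywords().addAll(keywords);
    session.saveOrUpdate(resource);
    // force (re)indexing of this instance, bypassing the 'smart' dirty checks
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    fullTextSession.index(resource);
}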

Related

How to implement Seed and Next when extending UserVersionType

I'm trying to implement a String-based UserVersionType. I have found enough examples to understand how to use the UserType methods to an extent, but I can't find anything that shows me exactly what to do with next() and seed().
So I have something like this:
public class StringVersionType implements UserType, UserVersionType {

    ...

    public int compare(Object o1, Object o2) {
        String a = (String) o1;
        String b = (String) o2;
        return a.compareTo(b);
    }

    public Object next(Object arg0, SharedSessionContractImplementor arg1) {
        return "DUMMY SEED"; // + LocalTime.now().toString();
    }

    public Object seed(SharedSessionContractImplementor session) {
        return "DUMMY SEED"; // LocalTime.now().toString();
    }
}
I've tried adding simple code that returns a string that is always the same, and code that might change the version number, but I always get an error on update. Looking at the Hibernate console output: when I add almost anything to these UserVersionType methods, Hibernate stops doing a select and then an update, and instead goes straight to a new insert query, which fails because the primary key already exists.
Obviously I'm misunderstanding what seed and next should do, but I can't find any useful documentation.
Can anyone tell me more about how to use them?
Seed:
Generate an initial version.
Parameters:
session - The session from which this request originates. May be null; currently this only happens during startup when trying to determine the "unsaved value" of entities.
Returns:
an instance of the type
@Override
public Object seed(SharedSessionContractImplementor session) {
    return ( (UserVersionType) userType ).seed( session );
}
For properties mapped as either version or timestamp, the insert statement gives you two options. You can either specify the property in the properties_list, in which case its value is taken from the corresponding select expressions, or omit it from the properties_list, in which case the seed value defined by the org.hibernate.type.VersionType is used
next:
Increment the version.
Parameters:
session - The session from which this request originates.
current - the current version
Returns:
an instance of the type
public Object next(Object current, SessionImplementor session) {
    return ( (UserVersionType) userType ).next( current, session );
}
From the docs:
"UPDATE statements, by default, do not effect the version or the timestamp attribute values for the affected entities. However, you can force Hibernate to set the version or timestamp attribute values through the use of a versioned update. This is achieved by adding the VERSIONED keyword after the UPDATE keyword. Note, however, that this is a Hibernate specific feature and will not work in a portable manner. Custom version types, org.hibernate.usertype.UserVersionType, are not allowed in conjunction with a update versioned statement."
Other Docs:
Dedicated version number
The version number mechanism for optimistic locking is provided through a @Version annotation.
The @Version annotation
@Entity
public class Flight implements Serializable {
    ...
    @Version
    @Column(name = "OPTLOCK")
    public Integer getVersion() { ... }
}
Here, the version property is mapped to the OPTLOCK column, and the entity manager uses it to detect conflicting updates, and prevent the loss of updates that would be overwritten by a last-commit-wins strategy.
The version column can be any kind of type, as long as you define and implement the appropriate UserVersionType.
Your application is forbidden from altering the version number set by Hibernate. To artificially increase the version number, see the documentation for LockModeType.OPTIMISTIC_FORCE_INCREMENT or LockModeType.PESSIMISTIC_FORCE_INCREMENT in the Hibernate Entity Manager reference documentation.
Database-generated version numbers
If the version number is generated by the database, for example by a trigger, use the annotation @org.hibernate.annotations.Generated(GenerationTime.ALWAYS).
Declaring a version property in hbm.xml
<version
    column="version_column"
    name="propertyName"
    type="typename"
    access="field|property|ClassName"
    unsaved-value="null|negative|undefined"
    generated="never|always"
    insert="true|false"
    node="element-name|@attribute-name|element/@attribute|."
/>
This is all I can find in the documentation that can help you understand why and how to use those methods. Give me feedback about any parts that are irrelevant, due to my misunderstanding of the question, so I can remove them.
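Reading between the lines of the error described in the question: if next() returns the same constant as seed(), the stored version always matches the "unsaved value", so Hibernate treats the entity as new and issues an insert. A minimal sketch (my own, not from the documentation) of seed()/next() for a String-based version that avoids this:
@Override
public Object seed(SharedSessionContractImplementor session) {
    // initial version for a brand-new row
    return "1";
}

@Override
public Object next(Object current, SharedSessionContractImplementor session) {
    // must return a value different from 'current' so Hibernate can
    // tell that the row was updated
    return String.valueOf(Long.parseLong((String) current) + 1);
}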

Spring cache @CacheEvict matches key in a list?

I'm using Spring cache and trying to evict the cache by a list of keys (ids).
@CacheEvict(value = "cacheName", key = ?, condition = ?)
public void deleteByIds(List<Integer> ids) { ... }
How can I manage to do that?
@CacheEvict
Annotation indicating that a method (or all methods on a class) triggers a cache evict operation.
The cacheNames (or value) attribute
Names of the caches in which method invocation results are stored.
condition
SpEL expression used for making the eviction conditional.
key
SpEL expression for computing the key dynamically; root.method, root.target, and root.caches are available for references to the method, target object, and affected cache(s) respectively.
Solution for your problem:
Assuming that every object from the list is cached individually, for example in cacheName = "entities", with the entity id (the String representation of the Integer value) as the key, you can write a second method to evict the entries one by one.
public void deleteByIds(List<Integer> ids) {
    for (Integer id : ids) {
        deleteEntity(id.toString());
    }
}

// note: the annotated method must be public and invoked through the Spring
// proxy (e.g., from another bean) for the eviction to apply; a direct
// self-invocation like the loop above bypasses the annotation
@CacheEvict(value = "entities", key = "#entityId", condition = "#entityId != null")
public void deleteEntity(String entityId) {
    // processing: e.g., delete from the database; the annotation evicts the cache entry
}
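If per-key eviction is not required, a simpler alternative (my addition, not from the original answer) is to clear the whole cache in one go:
// evicts every entry in the "entities" cache when ids are deleted in bulk
@CacheEvict(value = "entities", allEntries = true)
public void deleteByIds(List<Integer> ids) {
    // delete the rows; the entire cache is cleared when the method completes
}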

Spring Boot JPARepository performance on save()

I have an issue where my Spring Boot application's performance is very slow when inserting data.
I am extracting a large subset of data from one database and inserting the data into another database.
The following is my entity.
@Entity
@Table(name = "element")
public class VXMLElementHistorical {

    @Id
    @Column(name = "elementid")
    private long elementid;

    @Column(name = "elementname")
    private String elementname;

    // Getter/Setter methods...
}
I have configured a JPA repository
public interface ElementRepository extends JpaRepository<Element, Long> {
}
and call the save() method with my object
@Transactional
public void processData(List<sElement> hostElements) throws DataAccessException {
    List<Element> elements = new ArrayList<Element>();
    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elements.add(element);
    }
    try {
        elementRepository.save(elements);
    } catch (Exception e) {
        // catch etc...
    }
}
What is happening is that each item takes between 6 and 12 seconds to insert. I have turned on Hibernate trace logging and statistics, and when I call the save function Hibernate performs two queries, a select and an insert. The select query takes 99% of the overall time.
I have run the select query directly on the database and the result returns in nanoseconds, which leads me to believe it is not an indexing issue; however, I am no DBA.
I have created a load test in my dev environment, and with similar load sizes the overall process time is nowhere near as long as in my prod environment.
Any suggestions?
Instead of creating a list of elements and saving those, save the individual elements. Every now and then do a flush and clear to prevent dirty checking from becoming a bottleneck.
@PersistenceContext
private EntityManager entityManager;

@Transactional
public void processData(List<sElement> hostElements) throws DataAccessException {
    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elementRepository.save(element);
        if ((i % 50) == 0) {
            entityManager.flush();
            entityManager.clear();
        }
    }
    entityManager.flush(); // flush the last records
}
You want to flush + clear every x elements (here it is 50, but you might want to find your own best number).
Now as you are using Spring Boot you also might want to add some additional properties. Like configuring the batch-size.
spring.jpa.properties.hibernate.jdbc.batch_size=50
This will, if your JDBC driver supports it, convert 50 single insert statements into 1 large batch insert, i.e. 50 inserts become 1 insert.
See also https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
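Two related Hibernate properties are commonly paired with the batch size so that statements for the same table are grouped into the same batch (my addition, not from the original answer):
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true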
As @M. Deinum said in a comment, you can improve this by calling flush() and clear() after a certain number of inserts, like below.
int i = 0;
for (Element element : elements) {
    dao.save(element);
    if (++i % 20 == 0) {
        dao.flushAndClear();
    }
}
Since loading the entities seems to be the bottleneck, and you really just want to do inserts (i.e., you know the entities don't exist in the database), you probably shouldn't use the standard save method of Spring Data JPA.
The reason is that it performs a merge which triggers Hibernate to load an entity that might already exist in the database.
Instead, add a custom method to your repository which does a persist on the entity manager. Since you are setting the Id in advance, make sure you have a version property so that Hibernate can determine that this indeed is a new entity.
This should make the select go away.
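A sketch of that custom method using Spring Data's custom-fragment pattern (the names ElementRepositoryCustom and ElementRepositoryImpl are illustrative, not from the original answer; each type would normally live in its own file):
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.data.jpa.repository.JpaRepository;

public interface ElementRepositoryCustom {
    Element persist(Element entity);
}

public class ElementRepositoryImpl implements ElementRepositoryCustom {

    @PersistenceContext
    private EntityManager entityManager;

    @Override
    public Element persist(Element entity) {
        // persist() schedules an insert without the select that merge() may issue
        entityManager.persist(entity);
        return entity;
    }
}

// the repository picks the fragment up by extending it
public interface ElementRepository extends JpaRepository<Element, Long>, ElementRepositoryCustom {
}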
Other advice given in other answers is worth considering as a second step:
enable batching;
experiment with intermediate flushing and clearing of the session;
save one instance at a time without gathering them in a collection, since the call to merge or persist doesn't actually trigger writing to the database; only the flush does (this is a simplification, but it will do for this context).

How do I handle an empty Java set of strings in AWS DynamoDB

I'm trying to store an array of strings within an AWS DynamoDB table. For the most part this array will be populated with at least one string; however, there is the case where the array could be empty.
I've created a DynamoDB model in a Java Lambda function that has a set of strings as one of its properties. If I try to save a DynamoDB model when the set of strings is empty, it gives me an error saying I can't store an empty set in DynamoDB.
So my question is, how would I handle removing that set property from my model when it's empty, before I save / update it in DynamoDB?
Here's an example of the model.
@DynamoDBTable(tableName = "group")
public class Group {

    private String _id;
    private Set<String> users;

    @Null
    @DynamoDBHashKey
    @DynamoDBAutoGeneratedKey
    public String getId() {
        return _id;
    }

    public void setId(final String id) {
        _id = id;
    }

    @DynamoDBAttribute
    public Set<String> getUsers() {
        return users;
    }

    public void setUsers(final Set<String> users) {
        this.users = users;
    }

    public void addUser(String userId) {
        if (this.users == null) {
            this.setUsers(new HashSet<String>(Arrays.asList(userId)));
        } else {
            this.getUsers().add(userId);
        }
    }
}
When I first create a group it could have no users, or it could have one or more users.
This is somewhat of an old question, but the way I would solve this problem is with a custom DynamoDBMarshaller.
Making use of the @DynamoDBMarshalling annotation, you can decorate the POJO accessor methods to dictate to the DynamoDB mapper which marshaller class to use to serialize and deserialize the set of strings. This way you get control over the special use cases.
Here is also a link to an AWS blog post with an example
The one caveat with the approach above is that the custom marshaller solution serializes and deserializes to/from a string, so the representation in the database wouldn't be a set per se. However, I wouldn't consider that to be too bad.
Another approach might be to use the Document API, instead of the object mappers, which gives you full control over the items. Though I would still go for the custom marshaller with string backing.
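For concreteness, a minimal sketch of such a marshaller (my own, assuming the AWS SDK v1 DynamoDBMarshaller interface and user ids that never contain commas):
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMarshaller;

public class StringSetMarshaller implements DynamoDBMarshaller<Set<String>> {

    // sentinel for "empty", since DynamoDB also rejects empty string values
    private static final String EMPTY_TOKEN = "__empty__";

    @Override
    public String marshall(Set<String> users) {
        if (users == null || users.isEmpty()) {
            return EMPTY_TOKEN;
        }
        return String.join(",", users);
    }

    @Override
    public Set<String> unmarshall(Class<Set<String>> clazz, String value) {
        if (value == null || EMPTY_TOKEN.equals(value)) {
            return new HashSet<String>();
        }
        return new HashSet<String>(Arrays.asList(value.split(",")));
    }
}
It would then be wired up on the getter with @DynamoDBMarshalling(marshallerClass = StringSetMarshaller.class).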
When it came down to it, the actual problem was that the code I inherited had the SaveBehavior on the DynamoDBMapperConfig set to UPDATE_SKIP_NULL_ATTRIBUTES, which skips null values and therefore never removes them from DynamoDB. See this post for an explanation of all the different save behaviors: https://java.awsblog.com/post/Tx1MH3BFPW8FX6W/Using-the-SaveBehavior-Configuration-for-the-DynamoDBMapper
To elaborate on this a little bit further.
I'm setting my DynamoDBMapperConfig.SaveBehavior as such within my Java project.
_dynamoDBMapperConfig = new DynamoDBMapperConfig.Builder()
        .withSaveBehavior(SaveBehavior.UPDATE)
        .withTableNameResolver(TABLE_NAME_RESOLVER)
        .withConversionSchema(ConversionSchemas.V2)
        .build();
Then whenever you update a model that has a mapped @DynamoDBAttribute, you also need to explicitly set the particular value to null:
user.setFirstName(null);
This was the easiest way I found to remove attributes from a DynamoDB entry.

Hibernate: Check foreign key constraint violation before delete?

The system I'm currently working on has a policy in which objects that have no relationships can be freely deleted, while those that do must be logically deleted. This is meant to prevent historical information from being deleted.
So basically, what I was trying to do was determine whether an object's key is currently present in another table. If it isn't I would simply call delete(), otherwise I would set a property that indicates a logical delete, and call update().
I'm using Spring transaction management, so I'm trying to mess with the session itself as little as possible. My initial approach seemed to work at first, but you'll see that it has a major flaw:
@Transactional
public void deleteObject(SomeEntity object) {
    // try to delete
    this.someEntityDAO.delete(object);
    try {
        // force the foreign key constraint check
        this.someEntityDAO.flush();
    } catch (ConstraintViolationException e) {
        // reload the object
        object = this.someEntityDAO.loadById(object.getId());
        // save as inactive instead of deleting
        object.setActive(false);
        this.someEntityDAO.update(object);
    }
}
Since Hibernate exceptions are fatal, this is completely unreliable (even though it works). I was wondering if there is a way to do a sort of "peek" operation, testing whether the delete would fail due to a constraint without actually performing it (and thus invalidating the session). The only thing I can think of is to manually check each related table to see if the id is present, but that would be tedious and error-prone for tables with many relationships. I want to leverage the constraints that are already in place in the database, if possible.
Speaking specifically to:
So basically, what I was trying to do was determine whether an object's key is currently present in another table. If it isn't I would simply call delete(), otherwise I would set a property that indicates a logical delete, and call update().
and:
I was wondering if there is a way to do a sort of "peek" operation in which I could test if the delete will fail due to a constraint, without actually performing the operation (and thus invalidating the session).
I have only worked occasionally with Hibernate, but the general answer is: this is what SQL is for. It's all in your where clause!
For clarity: you do your delete with a where clause strong enough that the check happens in the transaction itself; the delete then removes whatever meets the constraints given.
Update:
When you write:
"So basically, what I was trying to do was determine whether an object's key is currently present in another table. If it isn't I would simply call delete(), otherwise I would set a property that indicates a logical delete, and call update()."
the problem is that YOU are trying to do this when you should let (direct) the database engine do it for you in your SQL. Investigate the use of the "not exists" clause...
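A sketch of that idea in Hibernate (table and column names here are illustrative assumptions): delete only when no referencing rows exist, all in one statement, and fall back to a logical delete when nothing was deleted:
int deleted = session.createSQLQuery(
        "delete from some_entity where id = :id "
      + "and not exists (select 1 from child_table c where c.some_entity_id = :id)")
    .setParameter("id", object.getId())
    .executeUpdate();
if (deleted == 0) {
    // rows still reference this entity: fall back to a logical delete
    object.setActive(false);
    session.update(object);
}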
I know I'm late, but this is how you can do it:
Create a procedure in your DB:
PROCEDURE CHECK_DELETABLE(pk IN SomeTable.pk%TYPE,
                          CHECK_OUT OUT NUMBER)
IS
BEGIN
  CHECK_OUT := 1;
  BEGIN
    -- attempt the delete; a constraint violation lands in the handler below
    EXECUTE IMMEDIATE 'delete from SomeTable where pk = ' || pk;
  EXCEPTION WHEN OTHERS THEN
    CHECK_OUT := 0;
  END;
  -- roll back either way: this procedure only tests whether the delete would succeed
  ROLLBACK;
END;
Then you can write a Java function that tells you whether the record is deletable:
public boolean checkDeletable(Long pk) {
    if (pk == null || pk < 1) {
        return false;
    }
    return new CheckDeletableProcedure(getDataSource()).execute(pk);
}
And the CheckDeletableProcedure:
public class CheckDeletableProcedure extends StoredProcedure {

    private static final String STORED_PROC_NAME = "SOME_PACKAGE.CHECK_DELETABLE";

    public CheckDeletableProcedure(DataSource ds) {
        super(ds, STORED_PROC_NAME);
        declareParameter(new SqlParameter("pk", Types.BIGINT));
        declareParameter(new SqlOutParameter("CHECK_OUT", Types.INTEGER));
    }

    public boolean execute(Long pk) {
        HashMap<String, Object> inParams = new HashMap<>();
        inParams.put("pk", pk);
        Map<String, Object> result = execute(inParams);
        return ((Integer) result.get("CHECK_OUT")) == 1;
    }
}
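Putting it together with the logical-delete policy from the question, usage might look like this (a sketch, reusing the method names above):
@Transactional
public void deleteObject(SomeEntity object) {
    if (checkDeletable(object.getId())) {
        // no referencing rows: a real delete is safe
        this.someEntityDAO.delete(object);
    } else {
        // historical data exists: logical delete instead
        object.setActive(false);
        this.someEntityDAO.update(object);
    }
}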
