Ensure entity is added to Hibernate Search index - Java

We currently have a process that can be summarized as follows:
Insert a list of entity A records from a batch load process.
Update the status of those entities after a specified date has passed.
We use Hibernate Search to index some of the properties of entity A. However, we also have a requirement that we don't index an entity until its status has been updated.
Currently, we check at indexing time with an EntityIndexingInterceptor whether or not to exclude the entity based on its status.
The problem is that we don't index the status field itself, so when it changes, Hibernate's transparent mechanism of adding the entity to the index isn't triggered, and the entity is never added.
Is there a better way to force Hibernate to add the entity to the index without adding the status field itself to the index? We currently rebuild the index nightly, which is usually OK but still leaves a window where an entity may not be searchable until the next rebuild.
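For reference, a minimal sketch of the kind of interceptor described above, assuming Hibernate Search 4.2+/5.x; EntityA, Status, and getStatus() are placeholders for the actual model. It would be registered on the entity with @Indexed(interceptor = StatusInterceptor.class).

    import org.hibernate.search.indexes.interceptor.EntityIndexingInterceptor;
    import org.hibernate.search.indexes.interceptor.IndexingOverride;

    public class StatusInterceptor implements EntityIndexingInterceptor<EntityA> {

        @Override
        public IndexingOverride onAdd(EntityA entity) {
            // Skip indexing until the status has been updated.
            return entity.getStatus() == Status.UPDATED
                    ? IndexingOverride.APPLY_DEFAULT : IndexingOverride.SKIP;
        }

        @Override
        public IndexingOverride onUpdate(EntityA entity) {
            return entity.getStatus() == Status.UPDATED
                    ? IndexingOverride.UPDATE : IndexingOverride.REMOVE;
        }

        @Override
        public IndexingOverride onDelete(EntityA entity) {
            return IndexingOverride.APPLY_DEFAULT;
        }

        @Override
        public IndexingOverride onCollectionUpdate(EntityA entity) {
            return onUpdate(entity);
        }
    }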

Which version of Hibernate Search are you using? Dirty checking optimization should be automatically disabled when using interceptors:
Dirty checking optimization is disabled when interceptors are used. Dirty checking optimization does check what has changed in an entity and only triggers an index update if indexed properties are changed. The reason is simple, your interceptor might depend on a non indexed property which would be ignored by this optimization.
(From the documentation)
If it isn't, please report the issue with a reproducer, or at least mention the exact version of Hibernate Search/Hibernate ORM you are using: JIRA, test case templates
Until we fix this (if it's actually a bug), you can always disable the dirty checking optimization explicitly:
hibernate.search.enable_dirty_check = false

Related

How to keep a Java list in memory synced with a table in a database?

I want to search for an input in a list. That list resides in a database. I see two options for doing that:
Hit the database for each search and return the result.
Keep a copy in memory, synced with the table; search in memory and return the result.
I like the second option as it will be faster. However, I am confused about how to keep the list in sync with the table.
Example: I have a list L = [12, 11, 14, 42, 56]
and I receive an input: 14
I need to return whether or not the input exists in the list. The list can be updated by other applications, so I need to keep it in sync with the table.
What would be the most optimized approach here, and how do I keep the list in sync with the database?
Is there any way my application can be informed of changes in the table, so that I can reload the list on demand?
Instead of recreating your own implementation of something that already exists, I would leverage Hibernate's Second Level Cache (2LC) with an implementation such as EhCache.
With a 2LC you can specify a time-to-live expiration for your entities; once they expire, any query will reload them from the database. As long as the cached entities have not expired, Hibernate will hydrate them from the 2LC application cache rather than from the database.
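As a rough sketch, assuming EhCache as the 2LC provider (the entity and the "numbers" region are made up; the time-to-live itself is configured in ehcache.xml):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE, region = "numbers")
    public class NumberEntry {

        @Id
        private Long id;

        // Reads are served from the 2LC until the region's
        // time-to-live (set in ehcache.xml) expires.
        private Integer value;

        // getters/setters omitted
    }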
If you are using Spring, you might also want to take a look at @Cacheable. This operates at the component/bean tier, allowing Spring to cache a result set in a named region. See their documentation for more details.
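A minimal sketch of that idea, assuming caching is enabled via @EnableCaching; NumberRepository and findAllValues() are hypothetical stand-ins for your DAO:

    import java.util.List;
    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    @Service
    public class NumberLookupService {

        private final NumberRepository repository; // hypothetical DAO

        public NumberLookupService(NumberRepository repository) {
            this.repository = repository;
        }

        // The first call loads from the database; later calls are served
        // from the "numbers" cache region until it is evicted.
        @Cacheable("numbers")
        public List<Integer> loadNumbers() {
            return repository.findAllValues();
        }

        // Call this whenever you know the table has changed.
        @CacheEvict(value = "numbers", allEntries = true)
        public void invalidate() {
        }
    }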
To satisfy your requirement, you should control reads and writes in one place; otherwise there will always be cases where the data is out of sync.

How to update Hibernate applications in production the right way?

I read the discussion about using hbm2ddl.auto=update in order to auto-update changes to the database schema.
The thread is from 2008 and I do not know how safe it is to use the auto-update mode today.
We are running a small Java EE application on GlassFish with Hibernate 4.3.11 and PostgreSQL. We plan to use continuous integration with Jenkins.
Is it useful to work with hbm2ddl.auto=update enabled? Or is it better to use a simple alternative and apply/check the updates manually?
I know it is hard to give a blanket statement.
You should not use hbm2ddl.auto=update to update production databases.
A few reasons:
Hibernate will only add missing columns and never modify existing ones. Therefore, if you rename a property (Client to Customer), Hibernate will create a new column Customer and leave the column Client untouched. You will need to manually "move" the data over and remove the orphan column.
Hibernate will not remove constraints from columns that are no longer mapped. So if your Client column was NOT NULL, any insert into that table will now fail, because Hibernate no longer provides data for the orphan column (which still has its NOT NULL constraint).
Hibernate will not touch the data types of existing columns. So if you change a property type from String to Date, Hibernate will leave the column defined as varchar.
Hibernate does not remove columns whose properties you deleted, leading to data pollution and, in the worst case (when the constraints remain in place), to a no-longer-working application.
If you create additional constraints on existing columns, Hibernate will not create them, because the columns already existed. (You might miss important constraints on the production database that you added on existing columns.)
So performing the updates yourself is safer. If you have to keep track of what Hibernate will and won't do anyway, you are better off doing it yourself from scratch.
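One common way to keep such updates explicit is a dedicated migration tool; Flyway is used here purely as an example (it is not mentioned in the original answer, and the connection details are placeholders), assuming Flyway 5+:

    import org.flywaydb.core.Flyway;

    public class Migrate {

        public static void main(String[] args) {
            // Versioned SQL scripts live on the classpath under db/migration,
            // e.g. V2__rename_client_to_customer.sql containing:
            //   ALTER TABLE client_data RENAME COLUMN client TO customer;
            // i.e. exactly the renames and constraint changes that
            // hbm2ddl.auto=update will not perform for you.
            Flyway flyway = Flyway.configure()
                    .dataSource("jdbc:postgresql://localhost/app", "app", "secret")
                    .load();
            flyway.migrate();
        }
    }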

Google App Engine / Objectify Soft Delete

I am using Objectify for my DAO layer on GAE, and I want to make most of my entities soft-deletable. Is it a good idea to have these entities extend a parent class with an isActive boolean, should I use an embedded class, or should I just define an isSoftDeletable interface?
I am asking because Objectify seems to store entities sharing a parent class in the same entity kind (at least from what I see in _ah/admin), and that may slow down queries when everything is under the same entity kind.
Which is the best way, or is there a better way, to do soft delete in GAE?
Please advise, and thanks in advance!
There is no single right answer to this question. The optimal solution chiefly depends on what percentage of your entities is likely to be in the deleted state at any given time.
One option is to store a field like @Index(IfTrue.class) boolean active; and add this filter to all queries:
ofy.load().type(Thing.class).filter("size >", 20).filter("active", true)
The downside of this is that it requires adding extra indexes - possibly several because you may now need multi-property indexes where single-property indexes would have sufficed.
Alternatively, you can store a 'deleted' flag and manually exclude deleted entities from query results. Fewer indexes to maintain, but it adds extra overhead to each query as you pull back records you don't want. If your deleted entries are sparse, this won't matter.
One last trick: you might find it best to index a deleted date, since that is probably the most useful: @Index Date deleted; This lets you filter("deleted", null) to get the active items, and it also lets you filter by datestamp to find really old entities that you may wish to purge. However, be aware that the deleted date will be pulled into any multi-property indexes, possibly increasing index size significantly if you have a high percentage of deleted entities. In that case, you may wish to use @Index(IfNull.class) Date deleted; and purge sufficiently old entities with map-reduce instead.
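A minimal sketch of that last variant, assuming Objectify 4+ (the Thing entity is made up to match the earlier query example):

    import java.util.Date;
    import java.util.List;
    import com.googlecode.objectify.annotation.Entity;
    import com.googlecode.objectify.annotation.Id;
    import com.googlecode.objectify.annotation.Index;
    import com.googlecode.objectify.condition.IfNull;
    import static com.googlecode.objectify.ObjectifyService.ofy;

    @Entity
    public class Thing {
        @Id Long id;
        @Index long size;

        // Indexed only while null: active entities stay queryable, and
        // deleted ones stop paying index writes for this property.
        @Index(IfNull.class) Date deleted;

        // Active entities are those whose 'deleted' field is null (and indexed).
        public static List<Thing> loadActive() {
            return ofy().load().type(Thing.class)
                    .filter("deleted", null)
                    .list();
        }
    }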
I agree with StickFigure's answer. Take advantage of the difference between an "empty" index and a "null" index. The trade-off is that each write incurs more datastore write operations: adding an index entry costs at least 2 additional write ops (ascending and descending) every time you update that value, and deleting the entry costs 2 more writes. Personally, I think this is worthwhile.
Query time should be fairly predictable whenever you query on a single property of an entity kind: if you think about what happens under the covers, the datastore scans a list of index entries in sequential order and then does a parallel batch get of the entity data.

Lucene seems to be caching search results - why?

In my project we use Lucene 2.4.1 for full-text search. This is a J2EE project; the IndexSearcher is created once. In the background, the index is refreshed every couple of minutes (when the content changes). Users can search the index through a search mechanism on the page.
The problem is that the results returned by Lucene seem to be cached somehow.
This is the scenario I noticed:
I start the application and search for 'keyword' - 6 results are returned.
The index is refreshed; using Luke I see that there are now 8 results for the query 'keyword'.
I search again using the application - again 6 results are returned.
I analyzed our configuration and haven't found any caching anywhere. I debugged the search, and there is no caching in our code; searcher.search returns 6 results.
Does Lucene cache results internally somehow? What properties etc. should I check?
An open IndexReader does not see changes made by IndexWriters; be sure to call IndexReader.reopen() to pick up the latest changes.
Also make sure that your IndexWriter is committing the changes, either through an explicit commit(), a close(), or by having autoCommit set to true.
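A small sketch of the reopen step against the Lucene 2.4 API used here (the holder class is made up; only reopen(), getIndexReader(), and close() come from Lucene):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    public final class SearcherHolder {

        private volatile IndexSearcher searcher;

        public SearcherHolder(IndexSearcher initial) {
            this.searcher = initial;
        }

        // Call this after the background refresh commits. reopen() returns
        // the same instance if nothing changed, or a new reader that sees
        // the committed changes; the stale reader must be closed by us.
        public synchronized void maybeRefresh() throws IOException {
            IndexReader current = searcher.getIndexReader();
            IndexReader fresh = current.reopen();
            if (fresh != current) {
                searcher = new IndexSearcher(fresh);
                current.close();
            }
        }

        public IndexSearcher get() {
            return searcher;
        }
    }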
With versions prior to 2.9.0, Lucene cached query results automatically. In later releases there is no caching unless you wrap your query in a QueryFilter and then wrap the result in a CachingWrapperFilter. You could consider switching to a release >= 2.9.0 if reopening the index becomes a problem.
One more note: for an IndexReader to see documents updated by other threads in real time, the reader has to be initialized with the read-only parameter set to false; otherwise reopen() will not work.

Best practice to realize a long-term history mode for an O/RM system (Hibernate)?

I have mapped several Java classes like Customer, Assessment, Rating, ... to a database with Hibernate.
Now I am thinking about a history mode for all changes to the persistent data. The application is a web application. When data is deleted (or edited), another user should have the possibility to see the change and undo it. Since the changes are out of the scope of the current session, I don't know how to solve this with something like the Command pattern, which is usually recommended for undo functionality.
For single-value edits, an approach like the one in this question sounds OK. But what about the deletion of a whole persistent entity? The simplest way is a flag in the table marking whether the customer is deleted or not. The most complex way is a separate table for each class where deleted entities are stored. Is there anything in between? And how can I integrate these two things into an O/RM system (in my case Hibernate) comfortably, without messing around too much with SQL (which I want to avoid for portability reasons) while still keeping enough flexibility?
Is there a best practice?
One approach to maintaining an audit/undo trail is to mark each version of an object's record with a version number. Finding the current version would be painful if this were a simple increasing number, so reverse version numbering works best: version 0 is always the current one, and on every update the version numbers of all previous versions are incremented. Deleting an object is done by incrementing the version numbers on the current records and not inserting a new version 0.
Compared to an attribute-by-attribute approach, this makes for far simpler rollbacks and historic views, but it does take more space.
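A sketch of those two steps as native SQL through a Hibernate Session (table and column names are illustrative, not from the original answer):

    import org.hibernate.Session;

    public class CustomerHistoryDao {

        // Reverse version numbering: version 0 is always the current record.
        public void saveNewVersion(Session session, long customerId, String name) {
            // Push every existing version down by one...
            session.createSQLQuery(
                    "UPDATE customer_history SET version = version + 1 "
                  + "WHERE customer_id = :id")
                    .setParameter("id", customerId)
                    .executeUpdate();
            // ...then insert the new state as version 0.
            session.createSQLQuery(
                    "INSERT INTO customer_history (customer_id, version, name) "
                  + "VALUES (:id, 0, :name)")
                    .setParameter("id", customerId)
                    .setParameter("name", name)
                    .executeUpdate();
        }
    }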
One way to do it would be to have a "change history" entity with properties for the id of the changed entity, the action (edit/delete), the property name, the original value, and the new value, and maybe also a reference to the user performing the edit. A deletion would create entries for all properties of the deleted entity with the action "delete".
This entity would provide enough data to perform undos and to view the change history.
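Sketched as a JPA entity (all names are illustrative):

    import java.util.Date;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    @Entity
    public class ChangeHistory {

        @Id @GeneratedValue
        private Long id;

        private String entityName;    // e.g. "Customer"
        private Long entityId;        // id of the changed entity
        private String action;        // "EDIT" or "DELETE"
        private String propertyName;
        private String originalValue;
        private String newValue;
        private String changedBy;     // user performing the edit
        private Date changedAt;

        // getters/setters omitted
    }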
Hmm, I'm looking for an answer to this too. So far the best I've found is the Envers framework (www.jboss.org/envers/), but even that seems like more work than should be necessary.
