We are migrating a whole application, originally developed in Oracle Forms a few years back, to a Java 7 web-based application with Hibernate (4.2.7.Final) and Hibernate Search (4.1.1.Final).
One of the requirements is: while users are working with the new migrated version, they must still be able to use the Oracle Forms version - so the Hibernate Search indexes will get out of sync. Is it feasible to implement a servlet so that some PL/SQL code can call a URL that updates the local indexes on the application server (AS)?
I thought of implementing some sort of clustering mechanism for Hibernate, but as I read through the documentation I realised that, while clustering may be a good option for scalability and performance, it is probably overkill just for keeping legacy data in sync.
Does anyone have any idea how to implement a service, accessible via a servlet, that updates the local AS index for a given model entity with a given ID?
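A minimal sketch of such a servlet, using the Hibernate Search 4.x session API; the URL pattern, parameter names and the HibernateUtil helper holding the SessionFactory are assumptions, not part of the question:

```java
import java.io.IOException;
import java.io.Serializable;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.hibernate.Session;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

// Called by the legacy side, e.g. /reindex?entity=com.example.Customer&id=42
@WebServlet("/reindex")
public class ReindexServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String entityClassName = req.getParameter("entity"); // whitelist this in a real implementation
        Serializable id = Long.valueOf(req.getParameter("id"));

        Session session = HibernateUtil.getSessionFactory().openSession(); // HibernateUtil is an assumed helper
        try {
            Class<?> entityClass = Class.forName(entityClassName);
            FullTextSession fullTextSession = Search.getFullTextSession(session);
            fullTextSession.beginTransaction();

            Object entity = session.get(entityClass, id);
            if (entity != null) {
                fullTextSession.index(entity);          // row exists: add or update the document
            } else {
                fullTextSession.purge(entityClass, id); // row was deleted: remove the document
            }
            fullTextSession.getTransaction().commit();
        } catch (ClassNotFoundException e) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "Unknown entity " + entityClassName);
        } finally {
            session.close();
        }
    }
}
```

The PL/SQL side would then only need to issue an HTTP request (for example via UTL_HTTP) with the entity name and primary key after its own commit.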
I don't know exactly what you mean by the clustering part, but anyway:
It seems like you are facing a similar problem to mine. I am currently working on a Hibernate Search adaptation for JPA providers other than Hibernate ORM (EclipseLink, TopLink, etc.), and at the moment I am working on an automatic reindexing feature. Since JPA doesn't have an event system suitable for reindexing with Hibernate Search, I came up with the idea of using triggers at the database level to keep track of everything.
For a basic OneToOne relationship it's pretty straightforward; for other things like relation tables, or anything that is not stored in the main table of an entity, it gets a bit trickier. But once you have a system for OneToOne relationships, it's not that hard to take the next step. Okay, let's start:
Imagine two entities: Place and Sorcerer in the Lord of the Rings universe. In order to keep things simple, let's just say they are in a (quite restrictive :D) 1:1 relationship with each other. Normally you end up with two tables named SORCERER and PLACE.
Now you have to create 3 triggers (one for CREATE, one for DELETE and one for UPDATE) on each table (SORCERER and PLACE) that store information about which entity has changed (only the id; for mapping tables there are always multiple ids) and how (CREATE, UPDATE, DELETE) into special UPDATE tables. Let's call these PLACE_UPDATES and SORCERER_UPDATES.
In addition to the ID of the original object that changed and the event type, these tables need an ID field that is UNIQUE across all UPDATE tables. This is needed because, when you feed information from the UPDATE tables into the Hibernate Search index, you have to make sure the events are applied in the right order or you will break your index. How such a UNIQUE ID can be created on your database should be easy to find on the internet/Stack Overflow.
Okay. Now that you have set up the triggers correctly, you just have to find a way to read all the UPDATE tables in a feasible fashion (I do this by querying the update tables separately, sorting each query by the UNIQUE id field, and then comparing the first result of each query with the others) and then update the index.
This can be a bit tricky and you have to find the correct way of dealing with each specific update event, but it can be done (that's what I am currently working on).
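A rough sketch of the consuming side for one of the UPDATE tables, using plain JDBC and the Hibernate Search 4.x API; the column names and the Place mapping below are just the example from above, not real code from my project:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.Session;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

// The example entity from above, assumed to be mapped and indexed.
@Entity
@Indexed
class Place {
    @Id
    Long id;

    @Field
    String name;
}

public class PlaceUpdatesPoller {

    // Reads the pending events for PLACE in the order they were recorded and
    // applies them to the local index. Column names are assumptions.
    public void applyPendingPlaceEvents(Connection connection, Session session) throws Exception {
        FullTextSession fullTextSession = Search.getFullTextSession(session);
        fullTextSession.beginTransaction();

        PreparedStatement statement = connection.prepareStatement(
                "SELECT event_id, event_type, place_id FROM PLACE_UPDATES ORDER BY event_id");
        ResultSet rs = statement.executeQuery();
        while (rs.next()) {
            String eventType = rs.getString("event_type");
            Long placeId = rs.getLong("place_id");

            if ("DELETE".equals(eventType)) {
                fullTextSession.purge(Place.class, placeId);             // remove the stale document
            } else {
                Place place = (Place) session.get(Place.class, placeId); // CREATE/UPDATE: re-read and re-index
                if (place != null) {
                    fullTextSession.index(place);
                }
            }
            // A real implementation would also delete (or mark) the consumed event row
            // here, and would merge the events of all *_UPDATES tables by their unique id.
        }
        rs.close();
        statement.close();
        fullTextSession.getTransaction().commit();
    }
}
```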
If you're interested in that part, you can find it here:
https://github.com/Hotware/Hibernate-Search-JPA/blob/master/hibernate-search-db/src/main/java/com/github/hotware/hsearch/db/events/IndexUpdater.java
The link to the whole project is:
https://github.com/Hotware/Hibernate-Search-JPA/
This uses Hibernate-Search 5.0.0.
I hope this was of help (at least a little bit).
And about your remote indexing problem:
The update tables can easily be used as some kind of dump for events until you send them to the remote machine that is to be updated.
Related
Recently I came across a schema model like this.
The structure looks exactly the same; I have just replaced the entity names with generic table names.
Starting from table C, all the tables (C through L) have close to 200 columns.
The reason for posting this is that I have never come across a structure like this before. If anyone has already experienced something like this, or has worked on something similar or more complex, please share your thoughts:
Is having a structure like this good or bad, and why?
Assume we need an API to save data for a table structure like this:
How do we design the API?
How do we manage transactions across all these tables?
In the service code there are a few cases where we might need to read data from these tables and transfer it to an external system.
The catch is that the external system accepts the request in a flattened structure, not in the hierarchy we have as mentioned above. If this data needs to be transferred to the external system, how do we manage marshalling and unmarshalling?
Last but not least, the API that manages data like this will be consumed at least 2K times a day.
What are your thoughts on this? I don't know exactly why we need it; it needs a detailed discussion and we need to break things up.
If I consider Spring Data JPA and Hibernate, what are all the things I need to consider?
More importantly, the row values in all these tables will be limited based on the ownerId/tenantId, so the data needs to be consistent across all the tables.
I cannot comment on the general aspect of the structure, as that is pretty domain-specific and one would need to know why this structure was chosen to be able to say whether it's good or not. Either way, you probably can't change it anyway, so why bother asking whether it's good or not?
Having said that, with such a model there are a few aspects that you should consider:
When updating data, it is pretty important to update only the columns that really changed, to avoid index trashing and to allow the DB to use spare storage in pages. This is a performance concern that usually comes up when using Hibernate with such models, because Hibernate by default updates all "updatable" columns, not just the dirty ones. There is an option to do dynamic updates, though (see the sketch after the next point). Without dynamic updates, you might produce a few more IOs per update and thus hold locks for longer, which affects overall scalability.
When reading data, it is very important not to use join fetching by default, as that might result in a result-set size explosion.
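A minimal sketch of both points on a hypothetical wide entity (all names are made up; @DynamicUpdate is Hibernate-specific, the rest is plain JPA):

```java
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

import org.hibernate.annotations.DynamicUpdate;

@Entity
@DynamicUpdate // UPDATE statements contain only the dirty columns, not all ~200
public class TableC {

    @Id
    private Long id;

    // ... the ~200 scalar columns would live here ...

    // Keep child collections lazy and opt into join fetching per query, because
    // fetching several large collections in one query multiplies the row count.
    @OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
    private Set<TableD> children = new HashSet<>();
}

@Entity
class TableD {

    @Id
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private TableC parent;
}
```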
I am maintaining a legacy project using JSF / PrimeFaces / Hibernate; the database is DB2. The original code was migrated from Delphi to Java, but the database structure was kept, since it came from a vendor (we can't change it). There are some tables used to fetch a sequential id (a SELECT MAX followed by an UPDATE).
The table structure has a composite key (year and number). The issue today is: we select the max number for the year from a param table (which holds the "next sequential" value). Sometimes users working concurrently get the same number, causing errors when trying to persist duplicated keys.
I tried to implement a Hibernate Interceptor to fetch and set the value in the onSave method, but I was unable to make it avoid the duplicated-key issue (I tried using it as SessionFactory-scoped). I also tried making the methods synchronized, but that didn't work either.
Is there a way to prevent this duplicated-key issue (programmatically, without changing the database) using Hibernate features?
Thanks in advance!
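One Hibernate feature that fits the "no database changes" constraint is a pessimistic row lock on the parameter record, so that concurrent sessions queue up on that row instead of reading the same value. A minimal sketch, with the entity and column names made up, and assuming the lock is taken in the same transaction that persists the new record:

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.LockMode;
import org.hibernate.LockOptions;
import org.hibernate.Session;

// Hypothetical mapping of the vendor's "next sequential" param table.
@Entity
class NumberParam {
    @Id
    private Integer year;
    private Integer nextNumber;

    public Integer getNextNumber() { return nextNumber; }
    public void setNextNumber(Integer nextNumber) { this.nextNumber = nextNumber; }
}

public class SequentialIdService {

    // The row lock (SELECT ... FOR UPDATE under the covers) serializes concurrent
    // callers on the param row, so they cannot read the same "next" value.
    public int nextNumber(Session session, int year) {
        NumberParam param = (NumberParam) session.get(
                NumberParam.class, Integer.valueOf(year),
                new LockOptions(LockMode.PESSIMISTIC_WRITE));
        int next = param.getNextNumber();
        param.setNextNumber(next + 1);
        return next;
    }
}
```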
I'm not sure if something special exists for this use case - but it feels like a case where someone is likely to have come up with some sort of useful structure/technique/design pattern.
My Situation
I have a set of SQL commands executed from the middle tier (Java) to insert/update/delete data in any of a set of very large tables, via joins from a related staging table.
I have more SQL commands which update various derived tables based on the staging table / actual table contents. Different tables interact with different derived tables via different queries (as usual). These commands may have to be interleaved with the first set depending on the use case - so I can't necessarily execute set 1 and then set 2 all at once.
My Question
So, I need to build a chain of commands that is executed sequentially, and I need to trigger a rollback if any of them fails. I'd like to do this in the clearest, best-documented way possible.
Does anyone know a standard way of coding this? I'm sure anyone migrating from stored-procedure code to middle-tier code has done this before, and I don't want to reinvent the wheel if there are good options out there.
Additional Information
One of my main concerns is keeping everything clear. To elaborate, I'll have a set of queries specifically designed to:
Truncate staging table A' and populate it with primary keys targeting deletion records
Delete from actual table A based on join with A'
Truncate staging table A' and populate it with full data for upserts
Update/Insert records from A' to A based on joins
The same logic will apply to tables B, C, D, etc. Unfortunately, it can be the case that just A and C need an extra step, like syncing deletes to a certain derived table, done after the deletions but before the upserts.
I'd obviously like to group all the logic for updating a table, and I'd like to group all the logic for updating a derived table as well, but at execution time they have to be intelligently interleaved, and this sounds messy to me.
Don't write such a thing yourself. This is what JTA was born for.
You can use either JPA or Spring to do it.
Annotate the unit of work as transactional and let the database and JDBC handle it.
If you must do it yourself, follow the aspect-oriented approach and make it a decorator-style "before & after" implementation.
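A minimal sketch of the Spring flavour, assuming the statements are run through a JdbcTemplate; the table names and SQL below are placeholders for the steps listed in the question, not a real schema:

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TableASyncService {

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public TableASyncService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Every statement runs in one transaction; any RuntimeException rolls the whole
    // chain back. Note that on some databases TRUNCATE is DDL and cannot be rolled
    // back, which is why the staging table is cleared with DELETE here.
    @Transactional
    public void syncTableA() {
        jdbcTemplate.update("DELETE FROM a_staging");
        jdbcTemplate.update("INSERT INTO a_staging (id) SELECT id FROM incoming_deletes"); // hypothetical source
        jdbcTemplate.update("DELETE FROM a WHERE id IN (SELECT id FROM a_staging)");
        // derived-table steps for A can be interleaved here, after the deletes and before the upserts
        jdbcTemplate.update("DELETE FROM a_staging");
        jdbcTemplate.update("INSERT INTO a_staging SELECT * FROM incoming_upserts");       // hypothetical source
        jdbcTemplate.update("MERGE INTO a USING a_staging s ON (a.id = s.id) "
                + "WHEN MATCHED THEN UPDATE SET a.payload = s.payload "
                + "WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (s.id, s.payload)");
    }
}
```

With plain JPA/JTA the same idea works by putting the chain behind a single transaction boundary and executing the statements as native queries.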
Facts
Database: PostgreSQL (latest)
Programming language: Java
Problem statement (simplified)
We have 2 tables - overview and details. There could be millions of rows in "overview", and each row of "overview" can have millions of rows associated with it in "details". The foreign key details.overview_id refers to overview.id. Most queries are of the general form SELECT * FROM details WHERE overview_id = xxx AND details.id > yyy AND details.id < zzz. If we have a single table for details, the queries will be too slow (although the queries on details are almost always on primary keys). More on the nature of the DB activity: INSERT and UPDATE on overview happen infrequently. INSERTs on details happen at a rapid pace, while UPDATEs on that table almost never happen and bulk DELETEs happen sometimes.
What we already have
In the past we used raw SQL to partition the table "details" against each row in "overview". (In practice we did not actually partition; instead we created new tables based on a template. These tables did not have any column called overview_id (saving storage space); instead we had a separate table that mapped overview.id to the table name of the specific partition table.) So, as you can understand, the partitions had to be generated on the fly as new rows were inserted into overview, and partitions were dropped as rows were deleted from overview. All of this was managed inside the application. The application-database interaction has been blazing fast, but the application code is fairly complex, which makes it hard to maintain. Also, with raw SQL lying around everywhere, it is hard to scale the DB horizontally - we would have to reinvent what most JPA providers have already done.
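For illustration, a rough sketch of the lookup described above, with plain JDBC and made-up table and column names (this is not our actual code):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import javax.sql.DataSource;

// A mapping table resolves overview.id to the physical per-overview details table,
// which is then queried directly; the partition tables carry no overview_id column.
public class DetailsDao {

    private final DataSource dataSource;

    public DetailsDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public List<Long> findDetailIds(long overviewId, long fromId, long toId) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            String tableName;
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT table_name FROM overview_partition_map WHERE overview_id = ?")) {
                ps.setLong(1, overviewId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalStateException("No partition mapped for overview " + overviewId);
                    }
                    tableName = rs.getString(1);
                }
            }
            // tableName comes from our own mapping table, never from user input.
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT id FROM " + tableName + " WHERE id > ? AND id < ?")) {
                ps.setLong(1, fromId);
                ps.setLong(2, toId);
                try (ResultSet rs = ps.executeQuery()) {
                    List<Long> ids = new ArrayList<>();
                    while (rs.next()) {
                        ids.add(rs.getLong(1));
                    }
                    return ids;
                }
            }
        }
    }
}
```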
Current goal
Currently we are exploring options for a mechanism by which this partitioning can happen behind the scenes - possibly handled by a JPA provider (I understand that this is not part of the JPA spec), so that we can focus on the application while the underlying framework/layer takes care of the scalability issues.
I looked at OpenJPA Slice and EclipseLink. Both of them provide partition (shard) management across hosts. We certainly need that, but we also need partition management within a single host. However, if there is a better or more elegant solution, or a totally different angle from which to look at this, I would be really glad to know about it.
I will appreciate any insight you can provide.
Thanks.
Prajesh
Have you looked into using Postgres's table partitioning?
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
Thank you all for your comments/answers to date. We decided to stick with what we already have (see the section named "What we already have"), with minor modifications.
I have mapped several Java classes like Customer, Assessment, Rating, ... to a database with Hibernate.
Now I am thinking about a history mode for all changes to the persistent data. The application is a web application. When data is deleted (or edited), another user should have the possibility to see the changes and undo them. Since the changes are outside the scope of the current session, I don't know how to solve this with something like the Command pattern, which is usually recommended for undo functionality.
For single-value editing, an approach like the one in this question sounds OK. But what about the deletion of a whole persistent entity? The simplest way is to add a flag to the table indicating whether the customer is deleted or not. The most complex way is to create a table for each class where deleted entities are stored. Is there anything in between? And how can I integrate these two things into an O/RM system (in my case Hibernate) comfortably, without messing around too much with SQL (which I want to avoid for portability reasons) and still have enough flexibility?
Is there a best practice?
One approach to maintaining audit/undo trails is to mark each version of an object's record with a version number. Finding the current version would be painful if this were a simple increasing version number, so reverse version numbering works best: version 0 is always the current one, and on every update the version numbers of all previous versions are incremented. Deleting an object is done by incrementing the version numbers on the current records and not inserting a new one at 0.
Compared to an attribute-by-attribute approach, this makes for far simpler rollbacks and historic version views, but it does take more space.
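A rough sketch of that scheme with Hibernate, assuming a composite key of (customerId, version) and a native SQL statement for the bulk version shift (all names are made up; equals/hashCode and getters are omitted):

```java
import java.io.Serializable;

import javax.persistence.Embeddable;
import javax.persistence.EmbeddedId;
import javax.persistence.Entity;

import org.hibernate.Session;

@Embeddable
class CustomerVersionId implements Serializable {
    Long customerId;
    Integer version; // 0 = current, higher numbers are older

    CustomerVersionId() { }
    CustomerVersionId(Long customerId, Integer version) {
        this.customerId = customerId;
        this.version = version;
    }
}

@Entity
class CustomerHistory {
    @EmbeddedId
    CustomerVersionId id;
    String name;
    // ... the other audited columns ...
}

public class CustomerHistoryDao {

    // Update: push all existing versions down by one, then store the new state as version 0.
    public void saveNewVersion(Session session, Long customerId, String newName) {
        shiftVersions(session, customerId);
        CustomerHistory current = new CustomerHistory();
        current.id = new CustomerVersionId(customerId, 0);
        current.name = newName;
        session.save(current);
    }

    // Delete: shift the versions so that no version-0 row exists any more.
    public void delete(Session session, Long customerId) {
        shiftVersions(session, customerId);
    }

    private void shiftVersions(Session session, Long customerId) {
        session.createSQLQuery(
                "UPDATE CustomerHistory SET version = version + 1 WHERE customerId = :id")
               .setParameter("id", customerId)
               .executeUpdate();
    }
}
```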
One way to do it would be to have a "change history" entity with properties for the id of the entity changed, the action (edit/delete), the property name, the original value, and the new value - maybe also a reference to the user performing the edit. A deletion would create entries for all properties of the deleted entity with the action "delete".
This entity would provide enough data to perform undos and viewing of change history.
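A minimal sketch of such an entity (all names are made up):

```java
import java.util.Date;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

// One row per changed property of an audited entity.
@Entity
public class ChangeHistoryEntry {

    @Id
    @GeneratedValue
    private Long id;

    private String entityName;    // e.g. "Customer"
    private Long entityId;        // id of the changed entity
    private String action;        // "EDIT" or "DELETE"
    private String propertyName;
    private String originalValue;
    private String newValue;
    private String changedBy;     // user performing the edit

    @Temporal(TemporalType.TIMESTAMP)
    private Date changedAt;

    // getters/setters omitted for brevity
}
```

Undoing a change then means reading the entries for a given entity id in reverse order and writing originalValue back to the corresponding properties (or re-inserting the entity in the delete case).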
Hmm, I'm looking for an answer to this too. So far the best I've found is the www.jboss.org/envers/ framework, but even that seems to me like more work than should be necessary.
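For reference, the Envers entry point is the @Audited annotation plus the AuditReader API; a minimal sketch (the Customer entity and its fields are just placeholders):

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;

import org.hibernate.envers.AuditReader;
import org.hibernate.envers.AuditReaderFactory;
import org.hibernate.envers.Audited;

// Envers writes an audit-table row for every insert/update/delete of an audited entity.
@Entity
@Audited
public class Customer {

    @Id
    private Long id;
    private String name;
    // getters/setters omitted
}

class CustomerAuditQueries {

    // Loads the state of a customer as it was at a given revision, e.g. to show
    // a change history or to copy old values back as a manual "undo".
    Customer loadRevision(EntityManager em, Long customerId, Number revision) {
        AuditReader reader = AuditReaderFactory.get(em);
        return reader.find(Customer.class, customerId, revision);
    }
}
```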