I started a Spring WebFlux project some times ago, the goal of this project is to offer an REST API which collects its data from a database.
I currently go with a reactive approach thanks Reactor project included inside Spring 5 release and created reactive controllers. I need to persist in my database normalized datas with relations, this is why I use PostgreSQL.
At the time I am writing this lines, no reactive programming support is provided for JDBC and so JPA. But my controllers are only truly non-blocking if other components that they work with are also non-blocking. If I write Spring WebFlux controllers that still depend on blocking repositories, then my reactive controllers will be blocked waiting for them to produce data.
I would like to be non-blocking end to end, so I wonder to move on one of the NoSQL databases supported by Spring Data : Cassandra DB or MongoDB. I don't think Cassandra DB really fits to my needs, I will need to rewrite my entities and think differently my database's structure to be query oriented.
I read it is possible to keep some relations between my entities with MongoDB, especially with the last 4.0 version without refractor completely my db schema. But I wonder what is worth ?
Switch to MongoDB even if I need to keep relational datas
Keep to fetch data in a blocking fashion and then translate it into a reactive type as soon as possible
Forget Spring WebFlux and go back to Spring MVC (probably not)
Thank you for any help and advice !
I think it depends on your context, it seems that moving to a document db might not be a good fit for your data as it seems fully relational unless you are sure you can model your data as a bunch of aggreates, otherwise you might end up having other problems such as transaction consistency when checking consistency rules between your models. As a first option i would try to fetch data in another thread, perhaps wrapping the call in an rxjava observable. Although it is still a blocking call it will not block the main thread and you will be able to make better use of resources.
Those are my 2 cents.
Regards
Is Hibernate is sufficient for handling an ERP application with 1000+ tables, Is there is any performance bottleneck occurred in any stage of the development.
If we proceed with Hibernate which design pattern we follow.
Hibernate is an ORM tool. It is supposed to map classes to tables, objects to tuples and so on. The main purpose of such a tool should be to cut down the boilerplate in accessing the database while saving, updating, deleting or fetching data from tables corresponding to an OO environment. If you have 1000+ tables in your system and they all (or in conjunctions) correspond to classes in your OO API, you should be good to go.
However, the real problem with using Hibernate with high volumes of data lies in the type of operation you want to perform. For CUD operations, hibernate is quite handy, but if you have to read data from several tables at once, you might have to apply a lot of trick with lazy loading (which for me is an overhead), otherwise it will try to load the data from entire family of classes (dependents and their dependents and so on). I found iBatis good for such a scenario (Reporting tool).
So, my answer would be unless you are going to do a lot of object persistence/ update/ delete in the object-oriented structure which is going to replicate in your database, you should not put the effort in trying to implement Hibernate.
A brief comparison of both (Hibernate and iBatis) can be found below:
https://softwareengineering.stackexchange.com/a/192005
https://www.javaworld.com/article/2077875/open-source-tools/ibatis--hibernate--and-jpa--which-is-right-for-you-.html
I am developing java application based on spring framework.
It
Connects to a MySQL database
Gets data from MySQLTable1 in POJOs
Manipulates (update,delete) it in memory
Inserts into a Netezza database table
The above 4 processes are done for each client (A,B,C) every hour.
I am using a spring JDBC template to get the data like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before I write it to a Netezza table.
There are going to be multiple instance of this application running every hour through a scheduler.
So Client A and Client B can be running concurrently but the SELECT will be unique,
I mean data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are :
Is there a risk of data contamination?
Is there a need to implement database transaction using spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depend on what you are doing with your data but I don't see how you can have data contamination if every instance is independant, you just have to make sure that every instances that run concurrently are not working on the same data (Client ID).
Is there a need to implement database transaction using spring data transaction manager?
You will probably need a transaction for insertion into the Netezza table. You certainly want your data to have a consistent state in the result table. If an error occur in the middle of the process, you'll probably want to rollback everything that was inserted before it failed. Regarding the transaction manager, you don't especially need the Spring transaction manager, but since you are using Spring it might be a good option.
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it, probably not, but Spring Batch was made for those kind of application, so it might help you to structure your application (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be made without the framework and it might be overkill to use it if you have a really small application. But at the end, if you need those features, you'll probably want to use it...
Spring Batch is ETL, so using it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Client A and B read separate data, so they can never interfere with each other by reading or writing the same data by accident. The risk would be if two clients with the same ID are created, but that is not the case.
Is there a need to implement database transaction using spring data transaction manager?
There is no mandatory need to do that, although programatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need to do this, although it would help a lot, especially in the paging aspect. How will you handle queries that return thousands of rows? Without a framework this needs to be handled manually.
I wag going through a hibernate tutorial, where they say that hibernate is not suitable for data centric application. I am very much impressed by the 'object oriented structure' it gives to the program, but my application is very much data centric(it fetches and updates huge number of records. But I dont use any stored procedures). Cant I use hibernate?Are there any wrappers written over hibernate, which I can use for my application?Any help is appreciated.
I am not sure about specific meaning of phrase data centric. Aren't all database applications data centric? However, if you do process tons of data, Hibernate may not be the best choice. Hibernate is best to represent object models mapped to the database and it may have role in any application, but to do ETL (extract/transform/load) tasks you may need to write very efficient SQL by hand.
In principal you can, but it tends to be slow. Hibernate more or less creates an object for every row retrieved from the database. If you do this with large volumes of data, performance takes a serious hit. Also updates on many rows using a single update have only very basic support.
A wrapper won't help, at least with the object creation issue.
There are many advantages of using Hibernate, when one gets their object model correct as a developer there is a lot of appeal in interacting with the database via objects but in practice I have found initially Hibernate is great but becomes very frustrating when you come against issues like performance and fault finding.
When it comes to decision on the DA (Data Access) layer I ask myself this question.
Am I writing an application which has a requirement to run an different databases?
If the answer is yes then I will consider an (ORM) like Hibernate.
If its no then I will normally just use JDBC normally via Spring.
I feel that interacting with the database via JDBC is a lot more transparant and easier to find faults and performance tune.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Is there a viable alternative to Hibernate? Preferably something that doesn't base itself on JPA.
Our problem is that we are building a complex (as in, many objects refer to each other) stateful RIA system. It seems as Hibernate is designed to be used mainly on one-off applications - JSF and the like.
The problem is mainly that of lazy loading. Since there can be several HTTP requests between the initialization and actually loading lazy collections, a session per transaction is out of the question. A long-lived session (one per application) doesn't work well either, because once a transaction hits a snag and throws an exception, the whole session is invalidated, thus the lazy loaded objects break. Then there's all kinds of stuff that just don't work for us (like implicit data persisting of data from outside an initialized transaction).
My poor explanations aside, the bottom line is that Hibernate does magic we don't like. It seems like TopLink isn't any better, it also being written on top of EJB.
So, a stateless persistence layer (or even bright-enough object-oriented database abstraction layer) is what we would need the most.
Any thoughts, or am I asking for something that doesn't exist?
Edit: I'm sorry for my ambiguous terminology, and thank you all for your corrections and insightful answers. Those who corrected me, you are all correct, I meant JPA, not EJB.
If you're after another JPA provider (Hibernate is one of these) then take a look at EclipseLink. It's far more fully-featured than the JPA 1.0 reference implementation of TopLink Essentials. In fact, EclipseLink will be the JPA 2.0 reference implementation shipped with Glassfish V3 Final.
JPA is good because you can use it both inside and outside a container. I've written Swing clients that use JPA to good effect. It doesn't have the same stigma and XML baggage that EJB 2.0/2.1 came with.
If you're after an even lighter weight solution then look no further than ibatis, which I consider to be my persistence technology of choice for the Java platform. It's lightweight, relies on SQL (it's amazing how much time ORM users spend trying to make their ORM produce good SQL) and does 90-95% of what JPA does (including lazy loading of related entities if you want).
Just to correct a couple of points:
JPA is the peristence layer of EJB, not built on EJB;
Any decent JPA provider has a whole lot of caching going on and it can be hard to figure it all out (this would be a good example of "Why is Simplicity So Complex?"). Unless you're doing something you haven't indicatd, exceptions shouldn't be an issue for your managed objects. Runtime exceptions typically rollback transactions (if you use Spring's transaction management and who doesn't do that?). The provider will maintain cached copies of loaded or persisted objects. This can be problematic if you want to update outside of the entity manager (requiring an explicit cache flush or use of EntityManager.refresh()).
As mentioned, JPA <> EJB, they're not even related. EJB 3 happens to leverage JPA, but that's about it. We have a bunch of stuff using JPA that doesn't even come close to running EJB.
Your problem is not the technology, it's your design.
Or, I should say, your design is not an easy fit on pretty much ANY modern framework.
Specifically, you're trying to keep transactions alive over several HTTP requests.
Naturally, most every common idiom is that each request is in itself one or more transactions, rather than each request being a portion of a larger transaction.
There is also obvious confusion when you used the term "stateless" and "transaction" in the same discussion, as transactions are inherently stateful.
Your big issue is simply managing your transactions manually.
If you transaction is occurring over several HTTP requests, AND those HTTP requests happen to be running "very quicky", right after one another, then you shouldn't really be having any real problem, save that you WILL have to ensure that your HTTP requests are using the same DB connection in order to leverage the Databases transaction facility.
That is, in simple terms, you get a connection to the DB, stuff it in the session, and make sure that for the duration of the transaction, all of your HTTP requests go through not only that same session, but in such a way that the actual Connection is still valid. Specifically, I don't believe there is an off the shelf JDBC connection that will actually survive failover or load balancing from one machine to another.
So, simply, if you want to use DB transactions, you need to ensure that your using the same DB Connection.
Now, if your long running transaction has "user interactions" within it, i.e. you start the DB transaction and wait for the user to "do something", then, quite simply, that design is all wrong. You DO NOT want to do that, as long lived transactions, especially in interactive environments, are just simply Bad. Like "Crossing The Streams" Bad. Don't do it. Batch transactions are different, but interactive long lived transactions are Bad.
You want to keep your interactive transactions as short lived as practical.
Now, if you can NOT ensure you will be able to use the same DB connection for your transaction, then, congratulations, you get to implement your own transactions. That means you get to design your system and data flows as if you have no transactional capability on the back end.
That essentially means that you will need to come up with your own mechanism to "commit" your data.
A good way to do this would be where you build up your data incrementally into a single "transaction" document, then feed that document to a "save" routine that does much of the real work. Like, you could store a row in the database, and flag it as "unsaved". You do that with all of your rows, and finally call a routine that runs through all of the data you just stored, and marks it all as "saved" in a single transaction mini-batch process.
Meanwhile, all of your other SQL "ignores" data that is not "saved". Throw in some time stamps and have a reaper process scavenging (if you really want to bother -- it may well be actually cheaper to just leave dead rows in the DB, depends on volume), these dead "unsaved" rows, as these are "uncomitted" transactions.
It's not as bad as it sounds. If you truly want a stateless environment, which is what it sounds like to me, then you'll need to do something like this.
Mind, in all of this the persistence tech really has nothing to do with it. The problem is how you use your transactions, rather than the tech so much.
I think you should have a look at apache cayenne which is a very good alternative to "big" frameworks. With its decent modeler, the learning curve is shorten by a good documentation.
I've looked at SimpleORM last year, and was very impressed by its lightweight no-magic design. Now there seems to be a version 3, but I don't have any experience with that one.
Ebean ORM (http://www.avaje.org)
It is a simpler more intuitive ORM to use.
Uses JPA Annotations for Mapping (#Entity, #OneToMany etc)
Sessionless API - No Hibernate Session or JPA Entity Manager
Lazy loading just works
Partial Object support for greater performance
Automatic Query tuning via "Autofetch"
Spring Integration
Large Query Support
Great support for Batch processing
Background fetching
DDL Generation
You can use raw SQL if you like (as good as Ibatis)
LGPL licence
Rob.
BEA Kodo (formerlly Solarmetric Kodo) is another alternative. It supports JPA, JDO, and EJ3. It is highly configurable and can support agressive pre-fetching, detaching/attaching of objects, etc.
Though, from what you've described, Toplink should be able to handle your problems. Mostly, it sounds like you need to be able to attach/detach objects from the persistence layer as requests start and end.
Just for reference, why the OP's design is his biggest problem: spanning transactions across multiple user requests means you can have as many open transactions at a given time as there are users connected to your app - a transaction keeps the connection busy until it is committed/rolled back. With thousand of simultaneously connected users, this can potentially mean thousands of connections. Most databases don't support this.
Neither Hibernate nor Toplink (EclipseLink) is based on EJB, they are both POJO persistancy frameworks (ORM).
I agree with the previous answer: iBatis is a good alternative to ORM frameworks: full control over sql, with a good caching mechanism.
One other option is Torque, I am not saying it is better than any of the options mentioned above but just that it is another option to look at.
It is getting quite old now but may fit some of your requirements.
Torque
When I was myself looking for a replacement to Hibernate I stumbled upon DataNucleus Access Platform, which is an Apache2-licensed ORM. It isn't just ORM as it provides persistence and retrieval of data also in other datasources than RDBMS, like LDAP, DB4O and XML. I don't have any usage experience, but it looks interesting.
Consider breaking your paradigm completely with something like tox. If you need Java classes you could load the XML result into JDOM.