I have a scenario where data is uploaded from an Excel sheet to a MySQL DB. I am using Spring Data JPA. The service recursively saves the entities after populating them with data taken from the Excel sheet. This causes "unable to acquire JDBC connections" errors after a certain load.
I tried @Transactional to no advantage. I am now thinking of using the EntityManager manually in code and controlling the transaction boundary myself, so that all the recursive save calls happen within one transaction and therefore one connection object. I just wanted to check whether that would be a good idea, or whether there is another, more performant approach I should take. Needless to say, I have to do it through entities.
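For reference, the manual-EntityManager idea described above might look roughly like this. This is only a sketch; the RowEntity type and the "excel-import" persistence unit name are hypothetical placeholders.

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.EntityTransaction;
import javax.persistence.Persistence;

public class ManualExcelImport {

    public void importRows(List<RowEntity> rows) {
        // In a real application the factory would be created once and reused.
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("excel-import");
        EntityManager em = emf.createEntityManager();
        EntityTransaction tx = em.getTransaction();
        try {
            tx.begin();                  // one transaction, hence one connection
            for (RowEntity row : rows) {
                em.persist(row);         // every save happens inside the same transaction
            }
            tx.commit();
        } catch (RuntimeException e) {
            if (tx.isActive()) {
                tx.rollback();
            }
            throw e;
        } finally {
            em.close();
            emf.close();
        }
    }
}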
My answer is based entirely on the assumption that the way the requirement is implemented is faulty, since no code is shared in the question.
With your approach, yes, you will run out of connections: populating an entity is surely much faster than persisting it in the database, and since you are doing it recursively, your application will run out of connections at some point if the amount of data is high. The numbers are certainly a factor here.
The approach I would prefer is to prepare your entities (assuming all the data is for a common entity class) and store them in a collection; once it is ready, you can persist all of it in one transaction using the saveAll() method.
If the data is not for a common entity, you can create multiple lists of the different entities and initiate the DB operations after processing the Excel sheet.
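A rough sketch of that collect-then-save idea with Spring Data JPA; the RowDto, RowEntity, and RowRepository names are made up for illustration:

import java.util.ArrayList;
import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ExcelImportService {

    private final RowRepository rowRepository; // hypothetical JpaRepository<RowEntity, Long>

    public ExcelImportService(RowRepository rowRepository) {
        this.rowRepository = rowRepository;
    }

    @Transactional // one transaction, one connection for the whole import
    public void importSheet(List<RowDto> rows) {
        List<RowEntity> entities = new ArrayList<>();
        for (RowDto dto : rows) {
            entities.add(toEntity(dto)); // build everything in memory first, no DB work here
        }
        rowRepository.saveAll(entities); // persist the whole collection in one go
    }

    private RowEntity toEntity(RowDto dto) {
        RowEntity entity = new RowEntity();
        entity.setValue(dto.getValue()); // hypothetical mapping
        return entity;
    }
}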
I use ORMLite in a solution made up of a server and clients.
On the server side I use PostgreSQL, on the client side I use SQLite.
In code I use the same ORMLite methods, without caring which DB is being managed (PostgreSQL or SQLite).
Let's say that:
Table A corresponds to class A
I have an ArrayList of A objects
I want to insert all the items of the ArrayList into the DB.
Today I use a for() loop and insert them one by one (inside a TransactionManager).
When the items are few there is no problem, but now the items are becoming more numerous and this is probably not the best way, also because I lock the DB for a long time.
I'm searching for a way to insert all the items in one step, so that it goes quickly and doesn't lock the DB for a long time. I understood that it should be some sort of stored procedure (I'm not an expert...).
Note that some items could be new (that is, no item with the same primary key id exists yet), so an INSERT must be performed; other items could already exist, so an UPDATE should be performed.
Thank you
I'm searching for a way to insert all the items in one step, so that it goes quickly and doesn't lock the DB for a long time.
So there are two ways to do this that I know of: transactions and disabling auto-commit. If you are inserting into the database and it all needs to happen "at once" from a consistency standpoint, transactions are the only way to go. If you just want to insert and update a large number of records with higher performance, then you can disable auto-commit, do the operations, and then commit. Depending on the database implementation, this is what the TransactionManager is really doing.
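To illustrate, here is a rough ORMLite sketch that runs the whole list in one transaction and uses createOrUpdate() for the insert-or-update requirement from the question; the Dao and ConnectionSource are assumed to be set up elsewhere, and A is the class from the question:

import java.util.List;
import java.util.concurrent.Callable;
import com.j256.ormlite.dao.Dao;
import com.j256.ormlite.misc.TransactionManager;
import com.j256.ormlite.support.ConnectionSource;

public class BulkSaver {

    public void saveAll(final Dao<A, Integer> dao,
                        ConnectionSource connectionSource,
                        final List<A> items) throws Exception {
        // Everything inside the callable runs in a single transaction,
        // so auto-commit is effectively disabled until it returns.
        TransactionManager.callInTransaction(connectionSource, new Callable<Void>() {
            public Void call() throws Exception {
                for (A item : items) {
                    dao.createOrUpdate(item); // INSERT if the id is new, UPDATE if it already exists
                }
                return null;
            }
        });
    }
}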
I understood that it should be some sort of stored procedure...
I don't see how stored procedures helps you at all. They aren't magic.
but now the items are becoming more numerous and this is probably not the best way, also because I lock the DB for a long time.
I don't think there is a magic solution to this. If you are pushing a large number of objects to the database and you need the data to be transactional, then locks are going to have to be held during the updates. One thing to realize is that Postgres should handle this a lot better than SQLite. SQLite does not (I don't think) have row-level locking, meaning that the whole DB is paused during transactions. Postgres has a much more mature locking system and should be more performant in this situation. This is also why SQLite is so fast in many other operations: it isn't burdened with the locking complexity.
One thing to consider is rearchitecting your schema. Try to figure out the minimal amount of data that needs to be inserted transactionally. For example, maybe just the object relationships need to be changed transactionally while all of the data can be stored later. You could have an AccountOwner object which has just two ids, while all of the information about the Account is stored outside of the transaction. This makes your schema more complicated but maybe much faster.
Hope something here helps.
You can use entityManager.merge([list of items]);
The entityManager will insert the list in one shot.
merge() creates the object if it doesn't exist in the database and updates it if it already exists.
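Note that merge() takes one entity at a time, so in practice a sketch of this idea loops over the list inside a single transaction (assuming a resource-local JPA EntityManager is available; A is the class from the question):

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityTransaction;

public class MergeAll {

    public void mergeAll(EntityManager em, List<A> items) {
        EntityTransaction tx = em.getTransaction();
        tx.begin();
        for (A item : items) {
            em.merge(item); // inserts a new row or updates the existing one by primary key
        }
        tx.commit();
    }
}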
I am developing a dictionary application and using many external sources to collect the data.
This data is collected from those sources only the first time; after that I persist it to my DB and fetch it from there.
The problem I am facing is that some words like set, cut, put, etc. have hundreds of meanings and many examples as well. It takes around 10 seconds to persist all this data to MySQL. I am using MyBatis to persist the data, and because of this the response time is getting ruined. Without the database persist, I get a response in 400-500 ms if I show the data directly after fetching it from the sources.
I am trying to find a way to persist the data in the background. I am using the MVC pattern, so the DAO layer is separate.
Is it a good idea to use threading in the DAO layer as a solution? Or should I use some messaging tool like Kafka to send a message to persist the given word in the background? What else can I do?
Note: I prefer MySQL as the DB right now; I will probably use Redis for caching later on.
My overall answer to the question plus further comments:
Do not bulk insert with the MyBatis foreach element. Instead, execute the statement in a Java iteration over the list of objects to store, using ExecutorType.REUSE or ExecutorType.BATCH (read the documentation).
For transactions, configure the environment in the main mybatis-config.xml:
transactionManager type JDBC to manage the transaction in code: session = sessionFactory.openSession(); ... session.commit(); session.rollback();
transactionManager type MANAGED to let the container manage it.
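A rough sketch of the BATCH executor with a JDBC-managed transaction; the "MeaningMapper.insertMeaning" statement id and the Meaning type are placeholders:

import java.util.List;
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;

public class MeaningBatchWriter {

    // sqlSessionFactory is assumed to be built from mybatis-config.xml elsewhere
    public void saveMeanings(SqlSessionFactory sqlSessionFactory, List<Meaning> meanings) {
        // The BATCH executor reuses one prepared statement and sends the inserts in batches.
        SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        try {
            for (Meaning meaning : meanings) {
                session.insert("MeaningMapper.insertMeaning", meaning);
            }
            session.commit();   // the JDBC transaction is committed here
        } catch (RuntimeException e) {
            session.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}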
Furthermore, you can let the web app send the response, while a new thread takes its time to store the data.
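A minimal sketch of that hand-off, assuming an ExecutorService plus a hypothetical MeaningDao wrapping the batch code above and a hypothetical SourceClient for the external dictionaries:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WordService {

    // a single background worker so persistence never blocks the request thread
    private final ExecutorService persistPool = Executors.newSingleThreadExecutor();

    private final MeaningDao meaningDao;     // hypothetical DAO wrapping the MyBatis batch writer
    private final SourceClient sourceClient; // hypothetical client for the external sources

    public WordService(MeaningDao meaningDao, SourceClient sourceClient) {
        this.meaningDao = meaningDao;
        this.sourceClient = sourceClient;
    }

    public List<Meaning> lookup(String word) {
        List<Meaning> meanings = sourceClient.fetch(word);           // the fast 400-500 ms path
        persistPool.submit(() -> meaningDao.saveMeanings(meanings)); // the slow MySQL path runs in the background
        return meanings;                                             // respond without waiting for the persist
    }
}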
I'm working with a code base that is new to me, and it uses iBatis.
I need to update or add to an existing table, and it may involve 20,000+ records.
The process will run once per day, and run in the middle of the night.
I'm getting the data from a web services call. I plan to get the data, populate one model-type object per record, and pass each object to a method that reads the data in the object and updates/inserts it into the table.
Example:
ArrayList records = new ArrayList();
Foo foo = new Foo();
foo.setFirstName("Homer");
foo.setLastName("Simpson");
records.add(foo);
// make more Foo objects and add them to the ArrayList.
updateOrInsert(records); // this method then iterates over the list and calls some method that does the updating/inserting
My main question is how to handle all of the updating/inserting as one transaction. If the system goes down before all of the records are read and used to update/insert the table, I need to know, so that I can go back to the web services call and try again when the system is OK.
I am using Java 1.4, and the DB is Oracle.
I would highly recommend you consider using Spring Batch - http://static.springsource.org/spring-batch/
The framework provides a lot of the essential features required for batch processing: error reporting, transaction management, multi-threading, scaling, and input validation.
The framework is very well designed and very easy to use.
The approach you have listed might not perform very well, since you wait to read all the objects, store them all in memory, and only then insert them into the database.
You might want to consider designing the process as follows:
1. Create a cache capable of storing 200 objects
2. Invoke the web service to fetch the data
3. Create an instance of an object, validate it, and store the data in the object's fields
4. Add the object to the cache
5. When the cache is full, perform a batch commit of the objects in the cache to the database
6. Repeat from step 2
Spring Batch will allow you to perform batch commits, control the size of those commits, perform error handling when reading input (in your case, retrying the request), and perform error handling while writing the data to the database.
Have a look at it.
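If you prefer to stay on plain iBatis, the all-or-nothing requirement can be sketched roughly like this: batch the statements inside a single transaction, so a mid-run failure rolls everything back and the job can simply be retried. The "Foo.upsertFoo" statement id is a placeholder (it could map to an Oracle MERGE statement), and the code sticks to Java 1.4 syntax:

import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;
import com.ibatis.sqlmap.client.SqlMapClient;

public class FooBatchUpdater {

    // sqlMapClient is assumed to be configured elsewhere
    public void updateOrInsert(SqlMapClient sqlMapClient, List records) throws SQLException {
        sqlMapClient.startTransaction();
        try {
            sqlMapClient.startBatch();
            for (Iterator it = records.iterator(); it.hasNext();) {
                Foo foo = (Foo) it.next();
                sqlMapClient.update("Foo.upsertFoo", foo); // placeholder upsert statement
            }
            sqlMapClient.executeBatch();
            sqlMapClient.commitTransaction();  // all or nothing
        } finally {
            sqlMapClient.endTransaction();     // rolls back if the commit was never reached
        }
    }
}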
We have a large table of approximately 1 million rows, and a data file with millions of rows. We need to regularly merge a subset of the data in the text file into a database table.
The main reason it is slow is that the data in the file has references to other JPA objects, meaning those JPA objects need to be read back for each row in the file. I.e. imagine we have 100,000 people and 1,000,000 asset objects:
Person object --> Asset list
Our application currently uses pure JPA for all of its data manipulation requirements. Is there an efficient way to do this using JPA/ORM methodologies, or am I going to need to revert to pure SQL and vendor-specific commands?
Why not use the age-old technique of divide and conquer? Split the file into small chunks and then have parallel processes work on those small files concurrently.
And use the batch inserts/updates offered by JPA and Hibernate (more details here).
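To illustrate the JPA/Hibernate side (assuming Hibernate as the provider), the usual pattern is to set hibernate.jdbc.batch_size and flush/clear the persistence context at the same interval; the batch size is illustrative and Asset is the entity from the question:

import java.util.List;
import javax.persistence.EntityManager;

public class AssetBatchWriter {

    private static final int BATCH_SIZE = 50; // should match hibernate.jdbc.batch_size

    // em is assumed to be a transaction-scoped EntityManager
    public void persistAll(EntityManager em, List<Asset> assets) {
        int i = 0;
        for (Asset asset : assets) {
            em.persist(asset);
            if (++i % BATCH_SIZE == 0) {
                em.flush(); // push the batched inserts to the database
                em.clear(); // detach the persisted entities so the context stays small
            }
        }
    }
}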
The ideal way in my opinion though is to use batch support provided by plain JDBC and then commit at regular intervals.
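And a rough sketch of the plain-JDBC variant with batching and periodic commits; the table and column names are placeholders:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class JdbcAssetLoader {

    private static final int COMMIT_INTERVAL = 1000; // illustrative

    // connection is assumed to come from your DataSource
    public void load(Connection connection, List<String[]> rows) throws SQLException {
        connection.setAutoCommit(false);
        PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO asset (person_id, name) VALUES (?, ?)"); // placeholder SQL
        try {
            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++count % COMMIT_INTERVAL == 0) {
                    ps.executeBatch();
                    connection.commit(); // commit at regular intervals, as suggested above
                }
            }
            ps.executeBatch();
            connection.commit();
        } finally {
            ps.close();
        }
    }
}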
You might also want to look at Spring Batch, as it provides splitting/parallelization/iterating through files etc. out of the box. I have used all of these successfully for an application of considerable size.
One possible answer, which is painfully slow, is to do the following:
For each line in the file:
Read the data line
Fetch the reference object
Check if the data is attached to the reference object
If not, add the data to the reference object and persist
So slow it is not worth considering.
I have a database with around 150K records and a primary key on the table. Each record takes less than 1 kB. Constructing a POJO from a DB record takes about 1-2 seconds (there is some business logic that takes too much time). This is read-only data, so I'm planning to cache it. What I'm thinking of doing is loading the data in subsets (200 records each time) and creating a thread that constructs the POJOs and keeps them in a Hashtable. While the cache is being loaded (when I start the application), the user will see a wait sign. Since storing the data in a Hashtable is an issue, I'll actually store the processed data in another DB table (marshalling the POJO to XML).
I use a third-party API to load the data from the database. Once I load a record, I have to load the associations for the loaded data, and then the associations of those associations found at the top level. It's like loading a family tree.
I can't use Hibernate or any ORM framework, as I'm using a third-party API to load the data which is shipped with the database itself (it's a product). Moreover, I don't think loading the data once is a big issue.
If there were a possibility to fine-tune the business logic, I wouldn't have asked this question here.
Caching the data on demand is an option, but I'm trying to see if I can do anything better.
Please suggest a better idea if you are aware of one. Thank you.
Please suggest a better idea if you are aware of one.
Yes, fix the business logic so that it doesn't take 1 to 2 seconds per record. That's a ridiculously long time.
Before you do that, profile your application to make sure that it is really the business logic that is causing the slow record loading, and not something else. (For example, it could be a pathological data structure, or a database issue.)
Once you've fixed the root cause of the slow record loading, it is still a good idea to cache the read-only records, but you probably don't need to preload the cache. Instead, just load the records on demand.
It sounds like you are reinventing the wheel. I'd be looking to use Hibernate. Apart from simplifying the code to access the database, Hibernate has built-in caching and lazy loading of data, so it only creates objects as you request them. Ergo, a lot of what you describe above is already in place and you can concentrate on sorting out your business logic. I suspect that once you solve the business logic performance issue, there will be no need for such a complicated caching system and the Hibernate defaults will be sufficient.
As maximdim said in a comment, preloading the whole thing will take a lot of time. If your system is not very strange, the user won't need all the data at once. Just cache on demand instead. I would also recommend using an established caching solution, such as EHCache, which has persistence via DiskStore -- the only issue is that whatever you cache in this case has to be Serializable. Since you can marshal it as XML, I'm betting you can serialize it too, which should be faster.
In a past project, we had to query a very busy, very sluggish service running in an off-site mainframe in order to assemble one of the entities. Average response times from our app were dominated by this query. Since the data we retrieved was mostly read-only, caching with EHCache solved our problems.
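A small sketch of an on-demand cache with Ehcache 2.x and disk overflow; the cache name and sizes are illustrative, RecordLoader and RecordPojo are hypothetical types, and as noted above the cached value must be Serializable:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class RecordCache {

    // hypothetical interface wrapping the slow third-party load plus business logic
    public interface RecordLoader {
        RecordPojo load(String id);
    }

    private final Cache cache;

    public RecordCache() {
        CacheManager manager = CacheManager.create(); // uses ehcache.xml or the failsafe defaults
        // name, in-memory size, overflowToDisk=true, eternal=true, no TTL/TTI: all illustrative
        cache = new Cache("records", 10000, true, true, 0, 0);
        manager.addCache(cache);
    }

    public RecordPojo get(String id, RecordLoader loader) {
        Element element = cache.get(id);
        if (element != null) {
            return (RecordPojo) element.getObjectValue(); // fast path: already cached
        }
        RecordPojo pojo = loader.load(id); // slow path: build the POJO on demand
        cache.put(new Element(id, pojo));
        return pojo;
    }
}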
JDBM has a nice persistent map implementation (http://code.google.com/p/jdbm2/) that may help you do local caching - it would certainly be a lot faster than serializing your POJOs to XML and writing them back into a SQL database.
If your data is truly read-only, then I'd think that the best solution would be to treat the source database as an input queue that feeds your app database. Create a background process (heck, a service would be better), and have it monitor the source database and keep your app database synced.