I am developing a dictionary application and using many external sources to collect the data.
This data is collected from those sources only the first time; after that I persist it to my DB and fetch it from there.
The problem I am facing is that some words like set, cut, put, etc. have hundreds of meanings and many examples as well. It takes around 10 seconds to persist all this data to MySQL. I am using MyBatis to persist the data, and because of this the response time is getting screwed up. Without the database persist, I get a response in 400-500 ms if I show the data directly after fetching it from the sources.
I am trying to find a way to persist the data in the background. I am using the MVC pattern, so the DAO layer is separate.
Is it a good idea to use threading in the DAO layer as a solution? Or should I use a messaging tool like Kafka to send a message and persist the given word in the background? What else can I do?
Note: I prefer MySQL as the DB right now; I will probably use Redis for caching later on.
My overall answer to the question, plus further comments:
Do not bulk insert with the MyBatis foreach. Instead, execute the statement in a Java loop over the list of objects to store, using ExecutorType REUSE or BATCH (read the documentation).
For transactions, configure the environment in the main mybatis-config XML:
transactionManager type JDBC, to manage the transaction in code: session = sessionFactory.openSession(); session.commit(); session.rollback();
transactionManager type MANAGED, to let the container manage it.
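A minimal sketch combining the two points (batch executor plus manual commit/rollback under the JDBC transaction manager); the WordMapper interface, its insertMeaning statement, and the Meaning model are hypothetical names:

import java.util.List;
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;

public class WordDao {
    private final SqlSessionFactory sessionFactory;

    public WordDao(SqlSessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void saveMeanings(List<Meaning> meanings) {
        // ExecutorType.BATCH reuses one PreparedStatement and sends the rows to MySQL together
        SqlSession session = sessionFactory.openSession(ExecutorType.BATCH);
        try {
            WordMapper mapper = session.getMapper(WordMapper.class);
            for (Meaning meaning : meanings) {
                mapper.insertMeaning(meaning); // one statement per row, not one giant foreach SQL
            }
            session.commit();       // transactionManager type JDBC: commit in code
        } catch (RuntimeException e) {
            session.rollback();     // and roll back in code on failure
            throw e;
        } finally {
            session.close();
        }
    }
}

// hypothetical mapper, bound to an <insert> statement in its mapper XML
public interface WordMapper {
    void insertMeaning(Meaning meaning);
}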
Furthermore, you can let the web app send the response while a new thread takes its time to store the data, as sketched below.
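A sketch of that hand-off, reusing the WordDao above; the single-thread executor and the fetchFromExternalSources call are illustrative:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DictionaryService {
    private final WordDao wordDao;
    // a single worker thread keeps the writes ordered and off the request thread
    private final ExecutorService persistPool = Executors.newSingleThreadExecutor();

    public DictionaryService(WordDao wordDao) {
        this.wordDao = wordDao;
    }

    public List<Meaning> lookup(String word) {
        List<Meaning> meanings = fetchFromExternalSources(word);  // the 400-500 ms path
        persistPool.submit(() -> wordDao.saveMeanings(meanings)); // persist in the background
        return meanings;                                          // respond immediately
    }

    private List<Meaning> fetchFromExternalSources(String word) {
        // calls out to the external dictionary sources (not shown)
        throw new UnsupportedOperationException("not shown");
    }
}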
Related
I am given a situation where there is a database that has been in use for the last 6 months. From now on, a new database will be used. All insert operations will happen in the new database, but for retrievals (all GETs), a search has to be made in both the old and new databases. Design a microservice: how can the database configuration be done to achieve this?
Though not ideal, you can define multiple DataSources in your Spring Boot project. Define a controller that intercepts the GET call and routes it to a service that has the logic to talk to the two different sources and build the response for your REST queries. You can find an example here:
https://www.baeldung.com/spring-data-jpa-multiple-databases
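As a rough sketch of the read path, assuming each DataSource is wired to its own Spring Data repository (CustomerRecord and both repository names are invented for illustration):

import java.util.Optional;
import org.springframework.stereotype.Service;

@Service
public class RecordLookupService {
    private final NewDbRecordRepository newDb; // repository backed by the new DataSource
    private final OldDbRecordRepository oldDb; // repository backed by the legacy DataSource

    public RecordLookupService(NewDbRecordRepository newDb, OldDbRecordRepository oldDb) {
        this.newDb = newDb;
        this.oldDb = oldDb;
    }

    // inserts always target the new DB; reads check the new DB first, then fall back to the old one
    public Optional<CustomerRecord> findById(long id) {
        Optional<CustomerRecord> hit = newDb.findById(id);
        return hit.isPresent() ? hit : oldDb.findById(id);
    }
}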
Another thing you can do is introduce Elasticsearch: index all your old DB data that is part of the GET inquiry call, and fire the query at Elasticsearch rather than the DB.
I have a scenario wherein data is uploaded from an Excel sheet to a MySQL DB. I am using Spring Data JPA. The service calls the entities recursively, after stuffing them with data taken from the Excel sheet, to save them in the DB. This produces "unable to acquire JDBC connections" errors after a certain load.
I tried @Transactional, to no advantage. I am now thinking of using an EntityManager manually in code and controlling the transaction boundary, so that all recursive save calls on entities happen within one transaction and thereby one connection object. I just wanted to check whether that would be a good idea, or whether there is another, more performant approach I should take. Needless to say, I have to do it through entities.
My answer is based entirely on the assumption that the implementation is faulty, as no code is shared in the question.
With your approach, yes, you will run out of connections: populating entities will surely be much faster than persisting them to the database, and since you are doing it recursively, your application will exhaust the connection pool at some point if the amount of data is high. The numbers are certainly a factor here.
The approach I would prefer is to prepare your entities (assuming all the data is for a common entity class) and store them in a collection; once it is ready, persist all of them in one transaction using the saveAll() method, as sketched below.
If the data is not for a common entity, you can create multiple lists of different entities and initiate the DB operations after processing the Excel sheet.
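A sketch of the single-transaction saveAll() approach with Spring Data JPA; ExcelRow, RowEntity, and the repository are placeholder names:

import java.util.ArrayList;
import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ExcelImportService {
    private final RowEntityRepository repository;

    public ExcelImportService(RowEntityRepository repository) {
        this.repository = repository;
    }

    @Transactional // one transaction, and therefore one connection, for the whole import
    public void importSheet(List<ExcelRow> rows) {
        List<RowEntity> entities = new ArrayList<>();
        for (ExcelRow row : rows) {
            entities.add(toEntity(row)); // build everything in memory first
        }
        repository.saveAll(entities);    // then persist the whole collection at once
    }

    private RowEntity toEntity(ExcelRow row) {
        // mapping from the spreadsheet row to the entity (not shown)
        throw new UnsupportedOperationException("not shown");
    }
}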
I have a use case like this:
Read bulk records from multiple tables (more than 10,000 records).
Run business logic to validate the records.
Update the validated records into a different table (in the same database) from the one they were retrieved from.
I would like to implement this use case with Spring Batch and a scheduler that runs it at a certain point in time.
I have read about Spring Batch and understand that there are an ItemReader, an ItemProcessor, and an ItemWriter, which execute as a chunk-oriented job.
I would also like to implement it with multi-threading by defining a taskExecutor (org.springframework.core.task.SimpleAsyncTaskExecutor). I have decided to go with the below approach:
Read records from the DB in the ItemReader, with a query that calls a DAO implemented in another module using the Spring Hibernate transaction manager.
Process the records one at a time in the ItemProcessor.
Update the records in the target table in the ItemWriter, with a commit interval of some number.
I am new to Spring Batch, so I would like to understand whether this is a good solution or whether there is a better way to implement it. I also have a few questions about how DB connections and transactions will be maintained.
Will there be one connection and transaction for the whole batch job, or will multiple connections and transactions be opened at certain points of execution? How is this handled?
How can I effectively process this use case with multi-threading, handling records with 10 or 20 threads at a time?
Can someone please provide a brief explanation, or any samples, to help me understand this concept better?
Thanks in advance.
Your approach sounds good to me.
I will try to answer your first question.
Will there be one connection and transaction for the whole batch job, or will multiple connections and transactions be opened at certain points of execution? How is this handled?
You can have multiple data sources and multiple transaction managers, but managing them will be difficult, as you will have to take care of many of the things that Spring Batch can do on its own.
Most Spring Batch operations, like restart and stop, need metadata that Spring Batch stores in the DB. If you try to play with that, those operations might not work very well.
I would suggest you keep both the Spring Batch tables and your business-specific tables in the same data source.
That way you need only one data source and one transaction manager, and you do not have to worry about the transaction issues you might otherwise face.
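For reference, a skeletal chunk-oriented step along the lines the question describes, using the pre-Spring-Batch-5 StepBuilderFactory API; SourceRecord, TargetRecord, the chunk size of 100, and the throttle limit of 10 are just example values:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
@EnableBatchProcessing // sets up the single transaction manager the step will use
public class ValidationJobConfig {

    @Bean
    public Step validateAndUpdateStep(StepBuilderFactory steps,
                                      ItemReader<SourceRecord> reader,
                                      ItemProcessor<SourceRecord, TargetRecord> processor,
                                      ItemWriter<TargetRecord> writer) {
        return steps.get("validateAndUpdateStep")
                .<SourceRecord, TargetRecord>chunk(100)      // commit interval: one transaction per 100 items
                .reader(reader)
                .processor(processor)                        // validation logic lives here
                .writer(writer)                              // writes to the other table
                .taskExecutor(new SimpleAsyncTaskExecutor()) // makes the step multi-threaded
                .throttleLimit(10)                           // cap at 10 concurrent threads
                .build();
    }
}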
Let me explain my question in more detail, since the title may not be very clear, but I couldn't find a way to summarize the problem in a few words. Basically, I have a web application whose DB has 5 tables. 3 of these are managed using JPA with the EclipseLink implementation. The other 2 tables are managed directly with SQL using the java.sql package. By "managed" I just mean that queries, insertions, deletions and updates are performed in two different ways.
Now the problem: I have to monitor the response time of each call to the DB. To do this I have a library that uses aspects, so at runtime I can monitor the execution time of any code snippet. The question is: in the two distinct cases described above, which instructions' execution time should be counted when measuring the response time of a DB request? (Let's suppose the DB is remote, so the response time also includes network latency, but that is fine.)
Let me give an example to be clearer.
Take the case of using JPA to execute a DB update. I have the following code:
EntityManagerFactory emf = Persistence.createEntityManagerFactory(persistenceUnit);
EntityManager em = emf.createEntityManager();
EntityToPersist e = new EntityToPersist();
em.persist(e);
Is it correct to suppose that only the em.persist(e) instruction connects and makes a request to the DB?
The same question for java.sql:
Connection c = dataSource.getConnection();
Statement statement = c.createStatement();
statement.executeUpdate(stm);
statement.close();
c.close();
In this case, is it correct to suppose that only statement.executeUpdate(stm) connects and makes a request to the DB?
If it is useful to know, the remote DBMS is MySQL.
I tried to search the web, but it is a rather particular problem and I'm not sure what to look for in order to find a solution without reading the full JPA or java.sql specifications.
Please don't hesitate to ask if you have any questions or if something in my description is not clear.
Thank you a lot in advance.
In JPA (so also in EclipseLink) you have to differentiate SELECT queries, which do not need any transaction, from queries that change data (DELETE, INSERT, UPDATE), which all need a transaction.
When you select data, it is enough to measure the time of Query.getResultList() (and similar calls).
For the other operations (EntityManager.persist(), merge(), remove()) there is a flushing mechanism, which basically forces the queue of queries (or a single query) out of the cache to hit the database. So the question is when the EntityManager is flushed: usually on transaction commit, or when you call EntityManager.flush(). And that raises another question: when does the transaction commit? The answer: it depends on your connection setup (whether autocommit is true or not), but a very correct setup is autocommit=false, where you begin and commit transactions in your code.
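A sketch of where the timed block could sit for a JPA write; the persistence-unit name myPU is made up, and EntityToPersist is the class from the question:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class TimedPersist {
    public static void main(String[] args) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("myPU");
        EntityManager em = emf.createEntityManager();
        EntityToPersist e = new EntityToPersist();

        em.getTransaction().begin();
        em.persist(e); // usually only queues the INSERT in the persistence context

        long start = System.nanoTime();
        em.flush();                   // forces the queued INSERT to hit the database
        em.getTransaction().commit(); // with autocommit=false, the commit round trip
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("DB round trip took " + elapsedMs + " ms");
        em.close();
        emf.close();
    }
}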
When working with statement.executeUpdate(stm), it is enough to measure just those calls.
PS: usually you do not connect directly to the database; that is done by a pool (even if you work with a DataSource), which simply hands you an already established connection. But again, that depends on your setup.
PS2: for EclipseLink, probably the most correct way would be to look at the source code to find where the internal flush happens, and measure that part.
I'm working with a code base that is new to me, and it uses iBatis.
I need to update or add to an existing table, and it may involve 20,000+ records.
The process will run once per day, and run in the middle of the night.
I'm getting the data from a web-services call. I plan to get the data, populate one model-type object per record, and pass each model-type object to some method that reads the data in the object and updates/inserts it into the table.
Example:
ArrayList records = new ArrayList();
Foo foo = new Foo();
foo.setFirstName("Homer");
foo.setLastName("Simpson");
records.add(foo);
// make more Foo objects and add them to the ArrayList
updateOrInsert(records); // this method then iterates over the list and calls some method that does the updating/inserting
My main question is how to handle all of the updating/inserting as a transaction. If the system goes down before all of the records are read and used to update/insert the table, I need to know, so I can go back to the web-services call and try again when the system is OK.
I am using Java 1.4, and the db is Oracle.
I would highly recommend you consider using Spring Batch - http://static.springsource.org/spring-batch/
The framework provides a lot of the essential features required for batch processing: error reporting, transaction management, multi-threading, scaling, and input validation.
The framework is very well designed and very easy to use.
The approach you have listed might not perform very well, since you would be waiting to read all the objects, storing them all in memory, and then inserting them into the database.
You might want to consider designing the process as follows:
Create a cache capable of storing 200 objects
Invoke the webservice to fetch the data
Create an instance of an object, validate and store the data in the object's fields
Add the object to the cache.
When the cache is full, perform a batch commit of the objects in the cache to the database
Continue from step 2 until all the records are processed (sketched below)
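A minimal sketch of that buffered batch commit in plain JDBC (written in modern Java for brevity; the table and column names are invented, and Foo is the model from the question):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class FooBatchWriter {
    private static final int BATCH_SIZE = 200; // the cache size from step 1

    public void writeAll(Connection c, List<Foo> records) throws SQLException {
        c.setAutoCommit(false); // we commit per batch, not per row
        String sql = "INSERT INTO foo (first_name, last_name) VALUES (?, ?)";
        try (PreparedStatement ps = c.prepareStatement(sql)) {
            int buffered = 0;
            for (Foo foo : records) {
                ps.setString(1, foo.getFirstName());
                ps.setString(2, foo.getLastName());
                ps.addBatch();
                if (++buffered == BATCH_SIZE) {
                    ps.executeBatch();
                    c.commit();      // one transaction per BATCH_SIZE rows
                    buffered = 0;
                }
            }
            if (buffered > 0) {      // flush whatever is left at the tail of the list
                ps.executeBatch();
                c.commit();
            }
        } catch (SQLException ex) {
            c.rollback();            // on failure, only the last partial batch is undone
            throw ex;                // caller knows to retry the web-services call
        }
    }
}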
Spring Batch will let you perform batch commits, control the size of each commit, handle errors when reading input (in your case, retrying the request), and handle errors while writing the data to the database.
Have a look at it.