I have a Spring Boot API connected to a MongoDB database.
On a specific route I fetch the events for a given user (it searches a big collection with millions of documents). The problem is that getting these documents takes more than 20 s, whereas in the mongo shell I get them in 0.5 s.
I've already added an index on userId (it made things a lot faster).
I've googled the problem but couldn't find an answer about this (or maybe I missed the point).
My method does something very basic:
public Collection<Event> getEventsForUser(final String tenantId, final String orgId, final String userId) throws EventNotFoundException {
    Collection<MongoEvent> mongoEvents = mongoEventRepository.findByTenantIdAndOrganizationIdAndUserIdIgnoreCase(tenantId, orgId, userId);
    if (mongoEvents != null && !mongoEvents.isEmpty()) {
        return mongoEvents.stream().map(MongoEvent::getEvent).collect(Collectors.toList());
    }
    throw new EventNotFoundException("Events not found.");
}
Is it normal or is there a solution to optimize the query?
Thanks!
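One thing worth checking, offered as a hypothesis rather than a diagnosis: the Spring Data MongoDB reference documentation lists the IgnoreCase keyword in derived queries as producing an anchored regular expression (^value$) with the case-insensitive i option, and the MongoDB documentation notes that case-insensitive regex queries generally cannot use indexes effectively. If the query timed in the shell was an exact match, the two are not doing the same work; running explain() on both forms would confirm it. Common workarounds are a case-insensitive index defined with a collation, or storing a normalized (lowercased) copy of the ID in its own indexed field and querying that by equality. The stdlib-only model below is not MongoDB; it only contrasts the two access paths, and the userIdLower field it stands in for is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Stdlib-only model, NOT MongoDB: contrasts a case-insensitive regex that must
// test every key with an exact-match lookup on a normalized key.
final class UserIdLookupModel {
    // Stands in for an index on the original userId field.
    private final Map<String, List<Integer>> byUserId = new HashMap<>();
    // Stands in for an index on a hypothetical lowercased copy (userIdLower).
    private final Map<String, List<Integer>> byUserIdLower = new HashMap<>();

    static String normalize(String userId) {
        // Locale.ROOT avoids locale-dependent casing (e.g. the Turkish dotless i).
        return userId.toLowerCase(Locale.ROOT);
    }

    void add(String userId, int docId) {
        byUserId.computeIfAbsent(userId, k -> new ArrayList<>()).add(docId);
        byUserIdLower.computeIfAbsent(normalize(userId), k -> new ArrayList<>()).add(docId);
    }

    // Roughly what an IgnoreCase query asks for: ^<userId>$ matched case-insensitively,
    // which here means testing the pattern against every distinct key.
    List<Integer> findIgnoreCaseByScan(String userId) {
        Pattern pattern = Pattern.compile("^" + Pattern.quote(userId) + "$", Pattern.CASE_INSENSITIVE);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : byUserId.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                hits.addAll(entry.getValue());
            }
        }
        Collections.sort(hits);
        return hits;
    }

    // Exact match on the normalized key: a single lookup.
    List<Integer> findByNormalizedKey(String userId) {
        List<Integer> hits = new ArrayList<>(byUserIdLower.getOrDefault(normalize(userId), List.of()));
        Collections.sort(hits);
        return hits;
    }
}
```

Both methods return the same documents for ASCII IDs; the difference is that findIgnoreCaseByScan touches every key while findByNormalizedKey does one lookup. In the repository, the analogous change would be a method on the normalized field without IgnoreCase, with the query value lowercased the same way before the call.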
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I'm new to the Spring Framework, so these questions might seem silly.
I have a database with almost 5000 entries in it. I need to create a GET endpoint that takes 5 parameters and filters the data depending on which of those parameters are present. I was able to do it, but I don't think I am doing it efficiently. So here are my questions:
First, which approach is better: retrieving all data from the database using repository.findAll() and then using stream + filter to filter the data, OR writing the query in the JPA repository interface and then simply calling those methods? Which one would be more efficient?
Second, what is the best way to retrieve a large amount of data? In my case there are 5000 entries, so how should I retrieve them? I've read something about Pageable but I'm not 100% sure. Is that the way to go, or is there a better option?
Any help appreciated. Thanks :)
For the first question, it is better to retrieve only the required records from the DB instead of retrieving all entries and then filtering them in Java. Writing the query in the JPA repository is one option, but you can also use a CriteriaQuery. A CriteriaQuery gives you more control when building the filter programmatically, and it also helps with your second question.
Yes, pagination is one approach, especially for web applications. The main idea of pagination is to divide a large set of records into smaller chunks (pages): the user looks for a record on the first chunk (page) and requests the second page only if they didn't find it there.
The example below covers both of your questions. It searches a large number of orders.
The bean OrderSearchCriteria.java holds the filter parameters:
public class OrderSearchCriteria {
    private String user;
    private Date periodFrom;
    private Date periodTo;
    private String status;
    private Integer pageLimit;
    private Integer page;
    private Integer offset;
    private String sortOrder;
    .....
}
Repository
public interface OrderRepository extends JpaRepository<Order, Integer> , JpaSpecificationExecutor<Order>{}
The service below uses the Criteria API, through a Specification, to filter orders based on the submitted criteria:
@Service
public class OrderServiceImpl implements OrderService {
    ......
    @Override
    public Page<Order> orderSearch(OrderSearchCriteria orderSearchCriteria) {
        // If no page number was sent, derive the zero-based page from the offset.
        if (orderSearchCriteria.getPage() == null) {
            orderSearchCriteria.setPage(orderSearchCriteria.getOffset() / orderSearchCriteria.getPageLimit());
        }
        return orderRepository.findAll(OrderSearchSpecification.orderSearch(orderSearchCriteria),
                PageRequest.of(orderSearchCriteria.getPage(), orderSearchCriteria.getPageLimit()));
    }

    private static class OrderSearchSpecification {
        public static Specification<Order> orderSearch(OrderSearchCriteria orderSearchCriteria) {
            return new Specification<Order>() {
                private static final long serialVersionUID = 1L;

                @Override
                public Predicate toPredicate(Root<Order> root, CriteriaQuery<?> query, CriteriaBuilder criteriaBuilder) {
                    List<Predicate> predicates = new ArrayList<>();
                    // Each filter is added only when its criterion is set ("ALL" means no filter).
                    if (!StringUtils.isEmpty(orderSearchCriteria.getUser()) && !orderSearchCriteria.getUser().equalsIgnoreCase("ALL")) {
                        Join<Order, User> userJoin = root.join("user");
                        predicates.add(criteriaBuilder.equal(userJoin.get("name"), orderSearchCriteria.getUser()));
                    }
                    if (!StringUtils.isEmpty(orderSearchCriteria.getStatus()) && !orderSearchCriteria.getStatus().equalsIgnoreCase("ALL")) {
                        predicates.add(criteriaBuilder.equal(root.get("status"), orderSearchCriteria.getStatus()));
                    }
                    if (orderSearchCriteria.getPeriodFrom() != null) {
                        predicates.add(criteriaBuilder.greaterThanOrEqualTo(root.get("entryDate"), orderSearchCriteria.getPeriodFrom()));
                    }
                    if (orderSearchCriteria.getPeriodTo() != null) {
                        predicates.add(criteriaBuilder.lessThan(root.get("entryDate"), orderSearchCriteria.getPeriodTo()));
                    }
                    // Sort by entry date; any value other than "DESC" sorts ascending.
                    if (!StringUtils.isEmpty(orderSearchCriteria.getSortOrder())) {
                        if (orderSearchCriteria.getSortOrder().equalsIgnoreCase("DESC")) {
                            query.orderBy(criteriaBuilder.desc(root.get("entryDate")));
                        } else {
                            query.orderBy(criteriaBuilder.asc(root.get("entryDate")));
                        }
                    }
                    // A conjunction of zero predicates is true, i.e. no filtering.
                    return criteriaBuilder.and(predicates.toArray(new Predicate[0]));
                }
            };
        }
    }
}
Call orderSearch from the controller:
@ResponseBody
@RequestMapping(path = "/order/search", method = RequestMethod.POST)
public HashMap<String, Object> orderSearch(@RequestBody OrderSearchCriteria orderSearchCriteria) {
    Page<Order> page = getOrderService().orderSearch(orderSearchCriteria);
    HashMap<String, Object> result = new HashMap<>();
    result.put("total", page.getTotalElements());
    result.put("rows", page.getContent());
    return result;
}
I hope this helps.
What is better depends on context, and only you know what is better in yours. Nevertheless, I'd suggest considering the following solution.
1) Use Spring Data JPA Specifications
You say that some of the 5 parameters may be present and some may not. I'd suggest using Spring Data JPA Specifications. Here is a good article with examples.
The idea is as follows. For each of your 5 parameters, you create a specification. In the article's example, these are the methods customerHasBirthday() and isLongTermCustomer() in the class CustomerSpecifications.
Then you create a query dynamically, depending on what parameters are present:
if (parameter1 is present){
add specification 1 to the "where" clause
}
...
if (parameter5 is present){
add specification 5 to the "where" clause
}
Then call findAll() with the resulting combined specification.
Of course, other solutions are possible: you can build a JPQL query string dynamically, depending on which parameters are present, or you can dynamically build a native SQL query. But specifications have one more advantage: they plug directly into Spring Data's paging support (JpaSpecificationExecutor provides findAll(Specification, Pageable)), whereas a hand-built query string has to handle paging and counting itself.
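The "add a condition only when the parameter is present" pattern can be tried out without a database using plain java.util.function.Predicate, which composes with and() in the same way a Spring Data Specification does. This is an in-memory stand-in for illustration only (filtering in memory is what the first answer advises against for real data); the Customer class, its fields, and the two parameters shown are hypothetical.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical entity used only for this sketch.
final class Customer {
    final String name;
    final String city;
    final LocalDate since;

    Customer(String name, String city, LocalDate since) {
        this.name = name;
        this.city = city;
        this.since = since;
    }
}

final class CustomerFilters {
    private CustomerFilters() {}

    // Each optional parameter contributes a condition only when it is non-null;
    // with no parameters the combined predicate matches everything.
    static Predicate<Customer> build(String city, LocalDate sinceBefore) {
        Predicate<Customer> combined = c -> true;
        if (city != null) {
            combined = combined.and(c -> city.equalsIgnoreCase(c.city));
        }
        if (sinceBefore != null) {
            combined = combined.and(c -> c.since.isBefore(sinceBefore));
        }
        return combined;
    }

    static List<String> namesMatching(List<Customer> customers, String city, LocalDate sinceBefore) {
        return customers.stream()
                .filter(build(city, sinceBefore))
                .map(c -> c.name)
                .collect(Collectors.toList());
    }
}
```

With a Specification the structure is the same, except each condition produces a JPA Predicate, so the database does the filtering.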
2) Use Paging
If your application has only 2-3 users that send only a few requests per hour, then loading 5000 items per request might work well. But if all the results need to be rendered in the browser, this can take a lot of resources on the client and can become a performance problem.
If you have more users that send more requests, then CPU and RAM on the server side can also become insufficient, and you can face performance problems and, as a consequence, very long response times for users.
That's why I'd suggest using paging. You can limit the number of elements in the response. Suppose you set the page size to 100. Then each request needs fewer resources:
On the database level: the database returns only 100 elements instead of 5000, which performs better.
The application creates only 100 Java objects from the JDBC response instead of 5000 -> less memory and less CPU used.
The application has less overhead converting Java objects to JSON, again less memory and less CPU.
The response time is better, because sending 100 elements from the application to the user takes less time than sending 5000 elements.
Browser performance can be better. It depends on the client logic: if the client application is not smart and renders every response element, it will perform better, because rendering 100 elements is faster than rendering 5000 elements.
There are many tutorials about paging; work through one or two that you like.
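The arithmetic behind page-based requests is small enough to sketch. The hypothetical helper below maps a zero-based page number and a page size to the offset and row count a query would use, computes the total number of pages, and, like the orderSearch example earlier, maps an offset back to a page number. The numbers in the checks reuse the 5000-entry, 100-per-page scenario.

```java
// Hypothetical helper: page/size <-> offset/limit arithmetic for paged queries.
final class PageMath {
    private PageMath() {}

    // Offset of the first row on a zero-based page.
    static long offset(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        return (long) page * size;
    }

    static int totalPages(long totalElements, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        // Ceiling division: a partial last page still counts as a page.
        return (int) ((totalElements + size - 1) / size);
    }

    // Rows actually returned for a page; the last page may be short, later pages empty.
    static int rowsOnPage(long totalElements, int page, int size) {
        long remaining = totalElements - offset(page, size);
        return (int) Math.max(0, Math.min(size, remaining));
    }

    // Inverse mapping: which page contains a given offset.
    static int pageContaining(long offset, int size) {
        return (int) (offset / size);
    }
}
```

With 5000 entries and a page size of 100 there are 50 pages; a client that wants everything still needs 50 requests, but each one returns at most 100 rows.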
I have a list of status enum values that I am currently iterating over, using a basic counter to store how many items in the list have the specific value I am looking for. However, I want to improve on this considerably, and I think there may be a way to use some kind of JPA query on a paging and sorting repository to accomplish the same thing.
My current version, which isn't as optimized as I would like, is as follows:
public enum MailStatus {
    SENT("SENT"),
    DELETED("DELETED"),
    SENDING("SENDING");

    private final String value;

    MailStatus(String value) { this.value = value; }
}
var mailCounter = 0
val mails = mailService.getAllMailForUser(userId).toMutableList()
mails.forEach { mail ->
    if (mail.status == MailStatus.SENT) {
        mailCounter++
    }
}
With a paging and sorting JPA repository, is there some way to query this instead and get a count of all mail that has a status of SENT only?
I tried the following, but I seem to get everything rather than just the 'SENT' status:
fun countByUserIdAndMailStatusIn(userId: UUID, mailStatus: List<MailStatus>): Long
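For comparison, the in-memory count can at least be written as a single stream expression, and an EnumMap gives the counts for all statuses in one pass. Both still load every mail for the user, so they only tidy the code; to avoid loading the rows, Spring Data's derived countBy methods run the count in the database, for example a method declared along the lines of long countByUserIdAndStatus(UUID userId, MailStatus status), assuming the entity property is named status. The sketch uses a trimmed copy of the MailStatus enum (without the string values) and a hypothetical Mail class.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Trimmed copy of the question's enum, without the string values.
enum MailStatus { SENT, DELETED, SENDING }

// Hypothetical stand-in for the mail entity; only the status matters here.
final class Mail {
    final MailStatus status;

    Mail(MailStatus status) {
        this.status = status;
    }
}

final class MailCounts {
    private MailCounts() {}

    // Same result as the forEach counter, as a single stream expression.
    static long countWithStatus(List<Mail> mails, MailStatus wanted) {
        return mails.stream().filter(m -> m.status == wanted).count();
    }

    // One pass that counts every status; statuses that never occur map to 0.
    static Map<MailStatus, Long> countAll(List<Mail> mails) {
        Map<MailStatus, Long> counts = new EnumMap<>(MailStatus.class);
        for (MailStatus s : MailStatus.values()) {
            counts.put(s, 0L);
        }
        for (Mail m : mails) {
            counts.merge(m.status, 1L, Long::sum);
        }
        return counts;
    }
}
```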
I'm using Spring's named queries to access my data and came upon an issue.
I have a rather long named query in my spring data repo:
List<LandscapeLocationEntity> findByIdCustomerIdAndIdProductGroupAndIdProductIdAndIdLocationIdInAndActiveFlag(String customerId, String ProductGroup, String productId, List<Integer> locationId, boolean activeFlag);
The query works perfectly fine as long as I provide a List<Integer> with only one entry. As soon as the list contains more than one entry, it throws a java.sql.SQLException: Borrow prepareStatement from pool failed.
As you can see in the screenshot, the call is different from the actual query (LOCATIONID(?,?) vs. LOCATIONID(?)).
I have a workaround which just executes the queries separately but that's not the way I want to have it in the long term.
If you need further information please tell me.
UPDATE:
To prove my point I removed all the other attributes and still get the same error:
Query is now: List<LandscapeLocationEntity> findByIdLocationIdIn(List<Integer> locations);
I am using DynamoDBMapper for a class, let's say "User" (username being the primary key), which has a field called "Status". It is a hash+range key table, and every time a user's status changes (changes are extremely infrequent), we add a new entry to the table along with the timestamp (which is the range key). To fetch the current status, this is what I am doing:
DynamoDBQueryExpression expr =
        new DynamoDBQueryExpression(new AttributeValue().withS(userName))
                .withScanIndexForward(false).withLimit(1);
PaginatedQueryList<User> result =
        this.getMapper().query(User.class, expr);
if (result == null || result.size() == 0) {
    return null;
}
for (final User user : result) {
    System.out.println(user.getStatus());
}
For some reason, this prints all the statuses the user has had so far. I have set scanIndexForward to false so that the results are in descending order, and I set a limit of 1. I expect this to return only the latest entry in the table for that username.
Moreover, when I look at the wire logs, I see a huge number of entries being returned, far more than 1. For now, I am using:
final String currentStatus = result.get(0).getStatus();
What I am trying to understand is: what is the whole point of the withLimit clause in this case, or am I doing something wrong?
In March 2013 on the AWS forums a user complained about the same problem.
A representative from Amazon pointed him to the queryPage function.
It seems the limit does not cap the total number of elements; rather, it limits the chunk of elements retrieved in a single API call, so queryPage might help.
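A small model may make the page-versus-total distinction concrete. As I understand the AWS SDK for Java (v1) documentation, query(...) returns a PaginatedQueryList that fetches further pages lazily as you iterate over it, and calling size() loads all of them, while the limit only caps how many items a single request returns; queryPage(...) issues one request and returns a QueryResultPage whose getResults() holds just that page. The classes below are stdlib-only stand-ins with hypothetical names, not the SDK.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Stand-in for the table: each request returns at most `limit` items, newest first.
final class FakeStatusTable {
    private final List<String> statusesNewestFirst;
    int requests = 0;

    FakeStatusTable(List<String> statusesNewestFirst) {
        this.statusesNewestFirst = statusesNewestFirst;
    }

    List<String> fetchPage(int startIndex, int limit) {
        requests++;
        int end = Math.min(startIndex + limit, statusesNewestFirst.size());
        return new ArrayList<>(statusesNewestFirst.subList(Math.min(startIndex, end), end));
    }
}

// Like a lazily paginated list: iterating keeps requesting pages until one comes back
// empty, so `limit` bounds each request, not the total number of items seen.
final class LazyPagedResults implements Iterable<String> {
    private final FakeStatusTable table;
    private final int limit;

    LazyPagedResults(FakeStatusTable table, int limit) {
        this.table = table;
        this.limit = limit;
    }

    // Like a single-page query: one request, at most `limit` items.
    List<String> firstPage() {
        return table.fetchPage(0, limit);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private List<String> page = table.fetchPage(0, limit);
            private int fetched = page.size();
            private int pos = 0;

            @Override
            public boolean hasNext() {
                if (pos < page.size()) {
                    return true;
                }
                if (page.isEmpty()) {
                    return false;
                }
                page = table.fetchPage(fetched, limit);
                fetched += page.size();
                pos = 0;
                return !page.isEmpty();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.get(pos++);
            }
        };
    }
}
```

In the question's code, both result.size() and the for loop would walk every page in this model, which matches the behavior described; reading only the first page corresponds to queryPage.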
You could also look into the pagination loading strategy configuration (DynamoDBMapperConfig.PaginationLoadingStrategy).
Also, you can always open a GitHub issue for the team.
I notice strange behavior when querying the GAE datastore. Under certain circumstances, Filter does not work for integer fields. The following Java code reproduces the problem:
log.info("start experiment");
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
int val = 777;
// create and store the first entity.
Entity testEntity1 = new Entity(KeyFactory.createKey("Test", "entity1"));
Object value = Integer.valueOf(val);
testEntity1.setProperty("field", value);
datastore.put(testEntity1);
// create the second entity by using BeanUtils.
Test test2 = new Test(); // just a regular bean with an int field
test2.setField(val);
Entity testEntity2 = new Entity(KeyFactory.createKey("Test", "entity2"));
Map<String, Object> description = BeanUtilsBean.getInstance().describe(test2);
for (Entry<String, Object> entry : description.entrySet()) {
    testEntity2.setProperty(entry.getKey(), entry.getValue());
}
datastore.put(testEntity2);
// now try to retrieve the entities from the database...
Filter equalFilter = new FilterPredicate("field", FilterOperator.EQUAL, val);
Query q = new Query("Test").setFilter(equalFilter);
Iterator<Entity> iter = datastore.prepare(q).asIterator();
while (iter.hasNext()) {
    log.info("found entity: " + iter.next().getKey());
}
log.info("experiment finished");
The log looks like this:
INFO: start experiment
INFO: found entity: Test("entity1")
INFO: experiment finished
For some reason it only finds the first entity even though both entities are actually stored in the datastore and both 'field' values are 777 (I see it in the Datastore Viewer)! Why does it matter how the entity is created? I would like to use BeanUtils, because it is convenient.
The same problem occurs on the local devserver and when deployed to GAE.
OK, I found out what is going on. The "problem" is that BeanUtils transforms integers into strings. A string looks exactly the same in the Datastore Viewer, but it is of course not the same value. This pretty much fooled me; I should have studied the Apache BeanUtils manual.
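If you want the convenience of describe() without the String conversion, one option is to read the getters yourself with the JDK's java.beans.Introspector, so an int property comes back as an Integer. This is a sketch that has not been tested on App Engine; ExampleBean is a hypothetical stand-in for the question's Test bean.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bean standing in for the question's Test class.
class ExampleBean {
    private int field;

    public int getField() {
        return field;
    }

    public void setField(int field) {
        this.field = field;
    }
}

final class TypedBeanDescriber {
    private TypedBeanDescriber() {}

    // Reads every readable property and keeps its runtime type:
    // an int property comes back as an Integer, not as the String "777".
    static Map<String, Object> describe(Object bean) {
        Map<String, Object> properties = new LinkedHashMap<>();
        try {
            // Stopping at Object.class leaves out getClass(), so no "class" entry is produced.
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter != null) {
                    properties.put(pd.getName(), getter.invoke(bean));
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot describe " + bean.getClass().getName(), e);
        }
        return properties;
    }
}
```

The loop from the question could then iterate over TypedBeanDescriber.describe(test2).entrySet() instead of the BeanUtils description, so the stored property keeps its numeric type and the EQUAL filter on 777 can match it.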
Have you given the datastore 1 second after writing before you query the data? Sometimes you don't have to (ancestor queries, perhaps) but other times you do. The GAE/J documentation will give full details.
The fact that the entities are created with BeanUtils is completely irrelevant. If the entities are in the datastore (you can see them in the viewer) and the field value is indexed (the viewer does not show "unindexed" next to the value), then you can query for them using a filter. This works; it is the basic functionality of the datastore.
Given that the entities are created and indexed, I suggest that Ian Marshall's suggestion is probably correct. To test this, go to the App Engine preferences and untick "Enable local HRD support". This ensures that when you write an Entity, you can query for it immediately.
It does not matter whether you store an Integer, an int, or any other integer type: they are all stored internally as a long value, and when you read the value back you get a Long (despite having stored an Integer).