Performance: OpenJPA query (3000+ records) is slow - java

I'm using WebSphere Application Server 7 with the built-in OpenJPA 1.2.3 and an Oracle database. I have the following entity:
@NamedNativeQuery(name = Contract.GIVE_ALL_CONTRACTS,
    query = "SELECT number, name \n" +
            "FROM contracts \n" +
            "WHERE startdate <= ?1 \n" +
            "AND enddate > ?1",
    resultSetMapping = Contract.GIVE_ALL_CONTRACTS_MAPPING)
@SqlResultSetMapping(name = Contract.GIVE_ALL_CONTRACTS_MAPPING,
    entities = { @EntityResult(entityClass = Contract.class, fields = {
        @FieldResult(name = "number", column = "number"),
        @FieldResult(name = "name", column = "name")
    })
})
@Entity
public class Contract {

    public static final String GIVE_ALL_CONTRACTS = "Contract.giveAllContracts";
    public static final String GIVE_ALL_CONTRACTS_MAPPING = "Contract.giveAllContractsMapping";

    @Id
    private Integer number;

    private String name;

    public Integer getNumber() {
        return number;
    }

    public String getName() {
        return name;
    }
}
And the following code to retrieve the contracts:
Query query = entityManager.createNamedQuery(Contract.GIVE_ALL_CONTRACTS);
query.setParameter(1, referenceDate);
List contracts = query.getResultList();
entityManager.clear();
return contracts;
The retrieved contracts are passed to a webservice.
Executing this query in Oracle SQL Developer takes around 0.35 seconds for 3608 records.
The call to query.getResultList() takes around 4 seconds.
With a logger in the constructor of the entity, I can see that batches of about 10-20 entities are created with the same timestamp; then for about 0.015 seconds it does something else, OpenJPA-internal work I guess.
Is there a way to speed up OpenJPA? Or is the only solution caching?

Object creation may account for a fair share of the performance hit. While running your code in the server, you are not only querying the database but also allocating memory and creating a new Contract object for each row. An expanding heap or garbage collection cycles may account for the idle periods you observed.
I'd suggest you skim through the OpenJPA documentation on how to process large result sets.
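For example, OpenJPA's FetchPlan lets you control how many rows are fetched per database round trip instead of materializing everything at once; a minimal sketch, assuming the query object from the question:

import org.apache.openjpa.persistence.OpenJPAPersistence;
import org.apache.openjpa.persistence.OpenJPAQuery;

// Cast the plain JPA query to OpenJPA's extended interface...
OpenJPAQuery openjpaQuery = OpenJPAPersistence.cast(query);
// ...and fetch rows in chunks of 500 rather than all 3608 at once.
openjpaQuery.getFetchPlan().setFetchBatchSize(500);
List contracts = openjpaQuery.getResultList();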

I suggest downloading VisualVM and setting up profiling for the packages involved. VisualVM can show the time spent in the different methods, which should, in theory, sum up to the roughly 4 seconds you measured for getResultList(). You will be able to analyze how the total time is distributed between your code, OpenJPA and the network I/O. This will help you identify the bottleneck.
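If setting up a profiler is too heavy, a crude timing harness around the call already separates the JPA/JDBC cost from whatever happens downstream; a sketch (the log line is purely illustrative):

long start = System.nanoTime();
List contracts = query.getResultList(); // runs the SQL and builds the Contract objects
long elapsedMs = (System.nanoTime() - start) / 1000000L;
System.out.println("getResultList: " + elapsedMs + " ms for " + contracts.size() + " rows");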

Related

Hibernate + Postgres batch update does not work

Is there any way to do batch updates?
I created a simple entity:
@Entity
@Getter
@Setter
public class A {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "A_ID_GENERATOR")
    @SequenceGenerator(name = "A_ID_GENERATOR", sequenceName = "a_id_seq")
    private Long id;
    private String name;
}
Next I generated 10,000 objects of class A and saved them to the database.
Then I fetched the list of A from the database, set a new name on each and saved them again:
@PutMapping
@Transactional
public String updateAllTest() {
    var list = aRepository.findAll();
    for (int i = 0; i < list.size(); i++) {
        list.get(i).setName("AA" + i);
    }
    return "OK";
}
What did I expect? I expected Hibernate to do a batch update, and Hibernate did: its statistics say it executed 200 batches (batch size = 500).
But when I look at the database log files there are no batches, only single updates: 10,000 rows.
It looks the same as with batch inserts before adding reWriteBatchedInserts=true to the JDBC driver.
So is there any way to do batch updates in Postgres with Hibernate, or not?
The key thing to understand is that reusing the same server-side statement handle for multiple executions is batching. The log is just telling you about every execution, but that doesn't mean it is slow. It's doing exactly what it should do.
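For completeness, Hibernate's JDBC batching still has to be switched on; a minimal sketch of the relevant settings, passed programmatically here (the persistence unit name "pu" is just a placeholder):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

Map<String, String> props = new HashMap<>();
props.put("hibernate.jdbc.batch_size", "500"); // group statements into JDBC batches
props.put("hibernate.order_updates", "true");  // sort updates so consecutive statements batch better
EntityManagerFactory emf = Persistence.createEntityManagerFactory("pu", props);

In a Spring Boot setup the same keys would go into application.properties under the spring.jpa.properties prefix.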

Spring Data - insert data depending on previous insert

I need to save data into 2 tables (an entity and an association table).
I simply save my entity with the save() method from my entity repository.
Then, for performance reasons, I need to insert rows into an association table in native SQL. The rows reference the entity I saved before.
The issue: I get an integrity constraint violation on a foreign key. The entity saved first isn't visible to this second query.
Here is my code :
The repo :
public interface DistributionRepository extends JpaRepository<Distribution, Long>, QueryDslPredicateExecutor<Distribution> {

    @Modifying
    @Query(value = "INSERT INTO DISTRIBUTION_PERIMETER(DISTRIBUTION_ID, SERVICE_ID) SELECT :distId, p.id FROM PERIMETER p "
            + "WHERE p.id in (:serviceIds) AND p.discriminator = 'SRV' ", nativeQuery = true)
    void insertDistributionPerimeter(@Param(value = "distId") Long distributionId, @Param(value = "serviceIds") Set<Long> servicesIds);
}
The service :
@Service
public class DistributionServiceImpl implements IDistributionService {

    @Inject
    private DistributionRepository distributionRepository;

    @Override
    @Transactional
    public DistributionResource distribute(final DistributionResource distribution) {
        // 1. Entity creation and saving
        Distribution created = new Distribution();
        final Date distributionDate = new Date();
        created.setStatus(EnumDistributionStatus.distributing);
        created.setDistributionDate(distributionDate);
        created.setDistributor(agentRepository.findOne(distribution.getDistributor().getMatricule()));
        created.setDocument(documentRepository.findOne(distribution.getDocument().getTechId()));
        created.setEntity(entityRepository.findOne(distribution.getEntity().getTechId()));
        created = distributionRepository.save(created);
        // 2. Association table
        final Set<Long> serviceIds = new HashSet<Long>();
        for (final ServiceResource sr : distribution.getServices()) {
            serviceIds.add(sr.getTechId());
        }
        // EXCEPTION HERE
        distributionRepository.insertDistributionPerimeter(created.getId(), serviceIds);
        // ... (mapping back to a DistributionResource and the return statement omitted)
    }
}
The two queries seem to be in different transactions even though I set the @Transactional annotation. I also tried executing my second query with entityManager.createNativeQuery() and got the same result...
Invoke entityManager.flush() before you execute your native queries, or use saveAndFlush instead.
In your specific case I would recommend using
created = distributionRepository.saveAndFlush(created);
Important: your "native" queries must use the same transaction! (Otherwise you would need a different transaction isolation level.)
You also wrote:
I don't really understand why the flush action is not done by default
Flushing is handled by Hibernate (it can be configured; the default is "auto"). This means that Hibernate will flush the data at some point in time, but always before you commit the transaction or execute another SQL statement via Hibernate. So normally this is no problem, but in your case you bypass Hibernate with your native query, so Hibernate does not know about that statement and therefore does not flush its data.
See also this answer of mine: https://stackoverflow.com/a/17889017/280244 about this topic
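A minimal sketch of the flush-based variant, assuming an EntityManager injected into the service via @PersistenceContext:

created = distributionRepository.save(created);
entityManager.flush(); // force the pending INSERT for 'created' to the database, still in the same transaction
distributionRepository.insertDistributionPerimeter(created.getId(), serviceIds); // the FK target row now exists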

Custom Iterator in Datastore shows zero using Java

I use a Datastore entity to store an int counter, initialized as number = 0. In an add function I use count = number++. The first time the app is pushed to GAE it shows 0, and only then starts counting from 1. So I changed it to int number = 10; the value I read back is still zero, even though the Datastore stores 10. How can I get the updated value in my Java page after inserting a record? This happens only on the first deploy and the first ticket; after that it shows the correct (iterator) value.
Thanks.
Here is my code:
@Entity
public class Ticket {

    //static int nextID = 17;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String title;
    private Text description;
    static int nextID = 10;
    private int current;

    public Ticket(String title, Text description) {
        current = nextID++;
        this.title = title;
        // this.priority = priority; // 'priority' is not declared anywhere in this snippet
        this.description = description;
    }
Google Cloud Platform has multiple datacenters and they do not always hold the same state of the same record. This is especially true if you query for the value immediately after increasing it. You can use ancestor queries to retrieve the Entity, that will make it up to date and consistent. Please look at the article [1] for details.
Here's a link about ancestor queries [2]. The idea is that when you use an ancestor query, it forces the query to return data only after all changes are finalized (for that entity group). This ensures up-to-date, strong consistency.
[1] - https://cloud.google.com/developers/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/
[2] - https://cloud.google.com/appengine/docs/java/datastore/queries#Java_Ancestor_queries
Example (from the second link I provided):
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Entity tom = new Entity("Person", "Tom");
Key tomKey = tom.getKey();
datastore.put(tom);
Entity weddingPhoto = new Entity("Photo", tomKey);
weddingPhoto.setProperty("imageURL",
"http://domain.com/some/path/to/wedding_photo.jpg");
Entity babyPhoto = new Entity("Photo", tomKey);
babyPhoto.setProperty("imageURL",
"http://domain.com/some/path/to/baby_photo.jpg");
Entity dancePhoto = new Entity("Photo", tomKey);
dancePhoto.setProperty("imageURL",
"http://domain.com/some/path/to/dance_photo.jpg");
Entity campingPhoto = new Entity("Photo");
campingPhoto.setProperty("imageURL",
"http://domain.com/some/path/to/camping_photo.jpg");
List<Entity> photoList = Arrays.asList(weddingPhoto, babyPhoto,
dancePhoto, campingPhoto);
datastore.put(photoList);
Query photoQuery = new Query("Photo")
.setAncestor(tomKey);
// This returns weddingPhoto, babyPhoto, and dancePhoto,
// but not campingPhoto, because tom is not an ancestor
List<Entity> results = datastore.prepare(photoQuery)
.asList(FetchOptions.Builder.withDefaults());
See these parts:
Entity weddingPhoto = new Entity("Photo", tomKey);
Entity dancePhoto = new Entity("Photo", tomKey);
Each of these creates an Entity with the ancestor key tomKey.
Now save the Entities into the Datastore:
List<Entity> photoList = Arrays.asList(weddingPhoto, babyPhoto,
dancePhoto, campingPhoto);
datastore.put(photoList);
When you need to fetch the results, perform a special query:
Query photoQuery = new Query("Photo")
.setAncestor(tomKey);
This makes sure photoQuery isn't just an arbitrary query hitting whichever datacenter happens to answer with whatever data it has; it fetches the up-to-date data from the Datastore.
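Note that a lookup by Key is also strongly consistent, so if you already hold the entity's Key you can read the freshest value without an ancestor query; a small sketch:

try {
    Entity latestTom = datastore.get(tomKey); // strongly consistent read by key
} catch (EntityNotFoundException e) {
    // the entity was never saved or has been deleted
}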

Memory leak with paged JPA queries under JBoss AS 5.1

I'm trying to integrate Hibernate Search into one of the projects I'm currently working on. The first step in such an endeavour is fairly simple: index all the existing entities with Hibernate Search (which uses Lucene under the hood). Many of the tables mapped to entities in the domain model contain a lot of records (> 1 million), and I'm using a simple pagination technique to split them into smaller units. However, I'm experiencing a memory leak while indexing the entities. Here's my code:
@Service(objectName = "LISA-Admin:service=HibernateSearch")
@Depends({"LISA-automaticStarters:service=CronJobs", "LISA-automaticStarters:service=InstallEntityManagerToPersistenceMBean"})
public class HibernateSearchMBeanImpl implements HibernateSearchMBean {

    private static final int PAGE_SIZE = 1000;
    private static final Logger LOGGER = LoggerFactory.getLogger(HibernateSearchMBeanImpl.class);

    @PersistenceContext(unitName = "Core")
    private EntityManager em;

    @Override
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void init() {
        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
        Session s = (Session) em.getDelegate();
        SessionFactory sf = s.getSessionFactory();
        Map<String, EntityPersister> classMetadata = sf.getAllClassMetadata();
        for (String key : classMetadata.keySet()) {
            LOGGER.info("Class: " + key + "\nEntity name: " + classMetadata.get(key).getEntityName());
            Class entityClass = classMetadata.get(key).getMappedClass(EntityMode.POJO);
            // null check before use: getMappedClass can return null for non-POJO mappings
            if (entityClass != null && entityClass.getAnnotation(Indexed.class) != null) {
                LOGGER.info("Class: " + entityClass.getCanonicalName());
                index(fullTextEntityManager, entityClass, classMetadata.get(key).getEntityName());
            }
        }
    }

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void index(FullTextEntityManager pFullTextEntityManager, Class entityClass, String entityName) {
        LOGGER.info("Class " + entityClass.getCanonicalName() + " is indexed by hibernate search");
        int currentResult = 0;
        Query tQuery = em.createQuery("select c from " + entityName + " as c order by oid asc");
        tQuery.setFirstResult(currentResult);
        tQuery.setMaxResults(PAGE_SIZE);
        List entities;
        do {
            entities = tQuery.getResultList();
            indexUnit(pFullTextEntityManager, entities);
            currentResult += PAGE_SIZE;
            tQuery.setFirstResult(currentResult);
        } while (entities.size() == PAGE_SIZE);
        LOGGER.info("Finished indexing for " + entityClass.getCanonicalName() + ", current result is " + currentResult);
    }

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void indexUnit(FullTextEntityManager pFullTextEntityManager, List entities) {
        for (Object object : entities) {
            pFullTextEntityManager.index(object);
            LOGGER.info("Indexed object with id " + ((BusinessObject) object).getOid());
        }
    }
}
It's just a simple MBean whose init method I execute manually via JBoss's JMX console. When I monitor the execution of the method in JVisualVM, I see that the memory usage grows constantly until all the heap is consumed, and although many garbage collections happen, no memory gets freed, which leads me to believe I have introduced a memory leak in my code. I cannot spot the offending code, however, so I'm hoping for your assistance in locating it.
The problem is certainly not in the indexing itself, because I get the leak even without it, so I think I'm not doing the pagination right. The only reference to the entities that I hold, however, is the list entities, which should be easily garbage collected after each iteration of the loop calling indexUnit.
Thanks in advance for your help.
EDIT
Changing the code to
List entities;
do {
    Query tQuery = em.createQuery("select c from " + entityName + " as c order by oid asc");
    tQuery.setFirstResult(currentResult);
    tQuery.setMaxResults(PAGE_SIZE);
    entities = tQuery.getResultList();
    indexUnit(pFullTextEntityManager, entities);
    currentResult += PAGE_SIZE;
    tQuery.setFirstResult(currentResult);
} while (entities.size() == PAGE_SIZE);
alleviated the problem. The leak is still there, but not as bad as it was. I guess there is something faulty with the JPA query itself, keeping references it shouldn't, but who knows.
It looks like the injected EntityManager is holding on to references to all the entities returned from your query. It's a container-managed EM, so it should be closed or cleared automatically at the end of a transaction - but you're doing a bunch of non-transactional queries.
If you are just going to index the entities, you might want to call em.clear() at the end of the loop in init(). The entities will be detached (the EntityManager will no longer track changes made to them), but if they're just going to be GC'ed, that shouldn't be a problem.
I don't think there is a "leak"; however, I do think that you're accumulating a high number of entities in the persistence context (yes, you are, since you're loading them) and ultimately eating all the memory. You need to clear the EM after each loop iteration (without clear, paging doesn't help). Something like this:
do {
    entities = tQuery.getResultList();
    indexUnit(pFullTextEntityManager, entities);
    pFullTextEntityManager.clear();
    currentResult += PAGE_SIZE;
    tQuery.setFirstResult(currentResult);
} while (entities.size() == PAGE_SIZE);
It seems this question won't find a real solution. In the end I just moved the indexing code out into a separate app - the leak is still there, but it doesn't matter that much, since the app runs to completion (with a huge heap) outside of the critical container.

Is there a way to get the count size for a JPA Named Query with a result set?

I like the idea of Named Queries in JPA for static queries I'm going to do, but I often want to get the count result for the query as well as a result list from some subset of the query. I'd rather not write two nearly identical NamedQueries. Ideally, what I'd like to have is something like:
#NamedQuery(name = "getAccounts", query = "SELECT a FROM Account")
.
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = q.getCount();
So let's say m is 10, s is 0 and there are 400 rows in Account. I would expect r to have a list of 10 items in it, but I'd want to know there are 400 rows in total. I could write a second @NamedQuery:
@NamedQuery(name = "getAccountCount", query = "SELECT COUNT(a) FROM Account a")
but it seems a DRY violation to do that if I'm always just going to want the count. In this simple case it is easy to keep the two in sync, but if the query changes, it seems less than ideal that I have to update both @NamedQueries to keep the values in line.
A common use case here would be fetching some subset of the items, but needing some way of indicating total count ("Displaying 1-10 of 400").
So the solution I ended up using was to create two @NamedQuerys, one for the result set and one for the count, capturing the base query in a static string to maintain DRY and ensure that both queries stay consistent. So for the above, I'd have something like:
#NamedQuery(name = "getAccounts", query = "SELECT a" + accountQuery)
#NamedQuery(name = "getAccounts.count", query = "SELECT COUNT(a)" + accountQuery)
.
static final String accountQuery = " FROM Account";
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = ((Long)em.createNamedQuery("getAccounts.count").getSingleResult()).intValue();
Obviously, with this example the query body is trivial and this is overkill. But with much more complex queries, you end up with a single definition of the query body and can ensure the two queries stay in sync. You also get the advantage that the queries are precompiled, and at least with EclipseLink you get validation at startup time instead of when you call the query.
By naming the two queries consistently, it is also possible to wrap the code so that both queries are run given just the base name of the query.
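A sketch of such a wrapper built on that naming convention; the PagedResult holder class is hypothetical, everything else is plain JPA:

public static PagedResult findPage(EntityManager em, String baseName, int first, int max) {
    // "<baseName>" returns the page of entities...
    List items = em.createNamedQuery(baseName)
            .setFirstResult(first)
            .setMaxResults(max)
            .getResultList();
    // ...and "<baseName>.count" returns the total row count.
    long total = ((Number) em.createNamedQuery(baseName + ".count").getSingleResult()).longValue();
    return new PagedResult(items, total); // PagedResult: hypothetical holder for (items, total)
}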
setFirstResult/setMaxResults do not return a subset of a result set; the query hasn't even been run when you call these methods. They affect the SELECT query that gets generated and executed when you call getResultList. If you want the total record count, you'll have to SELECT COUNT your entities in a separate query (typically before paginating).
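In other words, nothing hits the database until getResultList() runs; a small illustration, using the Account entity from the question:

Query q = em.createQuery("SELECT a FROM Account a");
q.setFirstResult(0);  // no SQL has been executed yet
q.setMaxResults(10);  // still no SQL
List r = q.getResultList(); // the paginated SELECT runs here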
For a complete example, check out Pagination of Data Sets in a Sample Application using JSF, Catalog Facade Stateless Session, and Java Persistence APIs.
You can also use introspection to read the named query annotations, like this:
String getNamedQueryCode(Class<? extends Object> clazz, String namedQueryKey) {
    NamedQueries namedQueriesAnnotation = clazz.getAnnotation(NamedQueries.class);
    NamedQuery[] namedQueryAnnotations = namedQueriesAnnotation.value();
    String code = null;
    for (NamedQuery namedQuery : namedQueryAnnotations) {
        if (namedQuery.name().equals(namedQueryKey)) {
            code = namedQuery.query();
            break;
        }
    }
    if (code == null) {
        // not found on this class; try an annotated superclass
        if (clazz.getSuperclass().getAnnotation(MappedSuperclass.class) != null) {
            code = getNamedQueryCode(clazz.getSuperclass(), namedQueryKey);
        }
    }
    // may still be null if the named query was not found
    return code;
}
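Hypothetical usage, deriving the count query at runtime from the named query's JPQL string (splitting on " FROM" is a simplification that only works for simple queries):

String jpql = getNamedQueryCode(Account.class, "getAccounts"); // e.g. "SELECT a FROM Account a"
String countJpql = "SELECT COUNT(a)" + jpql.substring(jpql.indexOf(" FROM"));
long total = ((Number) em.createQuery(countJpql).getSingleResult()).longValue();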
