Hibernate row count on Criteria with already set Projection

For a grid component I have in my web applications I have a "GridModel" class which gets passed a Criteria.
The GridModel class has a method to get the results for a specific page by adding setFirstResult(...) and setMaxResults(...) to the Criteria.
But I also need the total count of rows for the Criteria, so I have the following method:
public int getAvailableRows() {
    Criteria c = criteriaProvider.getCriteria();
    c.setProjection(Projections.rowCount());
    return ((Long) c.uniqueResult()).intValue();
}
This worked perfectly, but now I have a grid that requires a Criteria that already uses setProjection() in combination with setResultTransformer(). It seems that the getAvailableRows() method above overrides the setProjection() of the original Criteria creating wrong results.
Can I wrap a count Criteria around the original Criteria instead somehow? Or how would I solve this?

I've had a similar experience when trying to use Projections.rowCount() in conjunction with a groupBy expression. I was able to work around it in a slightly 'hacky' manner by:
1. Remembering the previous projection and result transformer
2. Setting the projection on the Criteria to a modified version (see below)
3. Performing the row count DB hit
4. Restoring the previous projection + transformer so the Criteria can be used for actual result retrieval if needed
// Assumes criteria is the org.hibernate.Criteria being counted and that it is a CriteriaImpl
final CriteriaImpl criteriaImpl = (CriteriaImpl) criteria;
final Projection originalProjection = criteriaImpl.getProjection();
final ResultTransformer originalResultTransformer =
        criteriaImpl.getResultTransformer();
final Projection rowCountProjection;
// If we identify that we have a function with a group by clause
// we need to handle it in a special fashion
if ( originalProjection != null && originalProjection.isGrouped() )
{
    final CriteriaQueryTranslator criteriaQueryTranslator =
            new CriteriaQueryTranslator(
                    (SessionFactoryImplementor) mySessionFactory,
                    criteriaImpl,
                    criteriaImpl.getEntityOrClassName(),
                    CriteriaQueryTranslator.ROOT_SQL_ALIAS );
    rowCountProjection = Projections.projectionList()
            .add( Projections.rowCount() )
            .add( Projections.sqlGroupProjection(
                    // This looks stupid but is seemingly required to ensure we have a valid query
                    "count(count(1))",
                    criteriaQueryTranslator.getGroupBy(),
                    new String[]{}, new Type[]{} ) );
}
else
{
    rowCountProjection = Projections.rowCount();
}
// Get total count of elements by setting a count projection
final Long rowCount =
        (Long) criteria.setProjection( rowCountProjection ).uniqueResult();
// Restore the previous projection + transformer (step 4 above)
criteria.setProjection( originalProjection );
criteria.setResultTransformer( originalResultTransformer );
A few caveats here:
This still won't give the expected results if you give it a criteria with a single sum projection, as that is not considered an isGrouped() projection: it will replace the sum with a count. I don't consider this an issue, because getting the row count for an expression of that nature probably doesn't make sense.
When I was dealing with this I wrote some unit tests to make sure the row count was as expected without projections, with property-based projections and with groupBy projections, but I've written this from memory so I can't guarantee that small kinks won't need ironing out.

Related

entityManager.createQuery() taking lot of time to build query and bind the parameters. Performance affected

We are using Spring JPA criteria query ( javax.persistence.criteria.CriteriaQuery) to fetch data from database. We use the javax.persistence.criteria.Predicate to build the predicates. We have 1500 'OR' predicates in one query. And each predicate having 6 'AND' predicates.
SELECT * FROM TABLE_ABC AS t1 WHERE
    (t1.column1 = 'c11' AND t1.column2 = 'c12' AND t1.column3 = 'c13'
        AND t1.column4 = 'c14' AND t1.column5 = 'c15')
    OR
    (t1.column1 = 'c21' AND t1.column2 = 'c22' AND t1.column3 = 'c23'
        AND t1.column4 = 'c24' AND t1.column5 = 'c25')
    OR
    (t1.column1 = 'c31' AND t1.column2 = 'c32' AND t1.column3 = 'c33'
        AND t1.column4 = 'c34' AND t1.column5 = 'c35').....
Earlier we were using "org.hibernate.Criteria" with 'Conjunction' and 'Disjunction' to build the same query. This approach worked efficiently. As "org.hibernate.Criteria" is deprecated, we are moving to the javax.persistence.criteria package, and we are facing a big degradation in performance. Drilling down into the logs indicates that most of the time is spent in the step
=> entityManager.createQuery(), which performs the following operations:
CriteriaCompiler.compile
CriteriaQueryImpl$1.buildCompiledQuery
CriteriaCompiler$1$1.bind
These operations consume most of the time.
Is there any way to make this execution faster?
Is 'javax.persistence.criteria.CriteriaQuery' the way forward?
Please help here!
Please see code below:
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.READ_COMMITTED)
public List<Data> getData(List<DataDAO> dataReqList) {
    CriteriaBuilder builder = entityManager.getCriteriaBuilder();
    CriteriaQuery<DataReq> criteriaQuery = builder.createQuery(DataReq.class);
    Root<DataReq> dataReqRoot = criteriaQuery.from(DataReq.class);
    // One AND group per requested item; the groups are OR-ed together below
    Predicate[] predicateArr = new Predicate[dataReqList.size()];
    int i = 0;
    for (DataDAO dataReq : dataReqList) {
        predicateArr[i] = builder.and(
            builder.equal(dataReqRoot.get(TEST_S), dataReq.getS()),
            builder.equal(dataReqRoot.get(TEST_T2), dataReq.getT2()),
            builder.equal(dataReqRoot.get(K1), dataReq.getK1()),
            builder.equal(dataReqRoot.get(K2), dataReq.getK2()),
            builder.equal(dataReqRoot.get(TEST_P), dataReq.getP()),
            builder.equal(dataReqRoot.get(TEST_T1), dataReq.getT1()),
            builder.equal(dataReqRoot.get(TEST_I), dataReq.getI()));
        i++;
    }
    return getResultList(builder, criteriaQuery, predicateArr);
}
private List<Data> getResultList(CriteriaBuilder builder,
        CriteriaQuery<DataReq> criteriaQuery, Predicate[] predicateArr) {
    criteriaQuery.where(builder.or(predicateArr));
    // This is the call where most of the time is spent
    TypedQuery<DataReq> query = entityManager.createQuery(criteriaQuery);
    List<DataReq> dataReqList = null;
    try {
        dataReqList = query.getResultList();
    } catch (Exception e) {
        ...
    }
    return convertToData(dataReqList);
}
The same query built with "org.hibernate.Criteria" using 'Conjunction' and 'Disjunction' runs very efficiently, in milliseconds.

For context, depending on the database you're using, this is like a dynamic IN predicate with row value expressions. If supported, you could also write:
WHERE (t1.column1, t1.column2, t1.column3, t1.column4, t1.column5, t1.column6) IN (
('c11', 'c12', 'c13', 'c14', 'c15', 'c16'),
('c21', 'c22', 'c23', 'c24', 'c25', 'c26'),
...
)
Such long IN lists will cause problems not only in client libraries that produce dynamic SQL, but also on the server side. You mentioned bind variables. Perhaps the old API you were using was not using bind variables after all, but inlined all the values into the query. I've seen that perform much better in Oracle for large sets of parameters, so this is one of the cases where inline values might be better than bind variables.
Since you're using Hibernate, you could try setting the literal handling mode (as far as I know, the accepted values are auto, bind and inline):
<property name="hibernate.criteria.literal_handling_mode" value="bind"/>
See HHH-9576 and this answer
A possibly even better solution using arrays
The above would (maybe) help restore the previous performance you've experienced, but depending on your IN list size, there might be even better solutions. I've blogged about an alternative where you could use arrays instead of individual bind values, in case you're using Oracle or PostgreSQL.
A possibly even better solution using temporary tables
Another option that I've seen work very often is to use temporary tables of the form (assuming Oracle):
CREATE GLOBAL TEMPORARY TABLE predicates (
column1 VARCHAR2(100),
column2 VARCHAR2(100),
column3 VARCHAR2(100),
column4 VARCHAR2(100),
column5 VARCHAR2(100),
column6 VARCHAR2(100)
)
And then, prior to running your query, batch insert all the various predicate values into that table and then semi join it:
WHERE (t1.column1, t1.column2, t1.column3, t1.column4, t1.column5, t1.column6) IN (
SELECT column1, column2, column3, column4, column5, column6
FROM predicates
)
If you don't have temporary tables, you can try ordinary tables instead, and add a transaction_id column to it, cleaning up its contents manually after your queries.

Page<> vs Slice<> when to use which?

I've read in the Spring Data JPA documentation about two different types of objects you can return when you 'page' the dynamic queries built from repositories.
Page and Slice
Page<User> findByLastname(String lastname, Pageable pageable);
Slice<User> findByLastname(String lastname, Pageable pageable);
So, I've tried to find some articles or anything else discussing the main difference between the two, their different usages, how performance changes, and how sorting affects both types of queries.
Does anyone have this kind of knowledge, articles, or some good source of information?

Page extends Slice and knows the total number of elements and pages available by triggering a count query. From the Spring Data JPA documentation:
A Page knows about the total number of elements and pages available. It does so by the infrastructure triggering a count query to calculate the overall number. As this might be expensive depending on the store used, Slice can be used as return instead. A Slice only knows about whether there’s a next Slice available which might be just sufficient when walking through a larger result set.

The main difference between Slice and Page is that the latter provides richer pagination details: the total number of records satisfying the query (getTotalElements()), the total number of pages (getTotalPages()), and next-page availability (hasNext()). The former only provides next-page availability (hasNext()). Because it avoids the count query, Slice can give a significant performance benefit when you deal with a very large, growing table.
Let's dig deeper into the technical implementation of both variants.
Page
static class PagedExecution extends JpaQueryExecution {
    @Override
    protected Object doExecute(final AbstractJpaQuery repositoryQuery, JpaParametersParameterAccessor accessor) {
        Query query = repositoryQuery.createQuery(accessor);
        return PageableExecutionUtils.getPage(query.getResultList(), accessor.getPageable(),
                () -> count(repositoryQuery, accessor));
    }
    private long count(AbstractJpaQuery repositoryQuery, JpaParametersParameterAccessor accessor) {
        List<?> totals = repositoryQuery.createCountQuery(accessor).getResultList();
        return (totals.size() == 1 ? CONVERSION_SERVICE.convert(totals.get(0), Long.class) : totals.size());
    }
}
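If you look at the snippet above, PagedExecution#doExecute hands PagedExecution#count to PageableExecutionUtils.getPage as a callback. That callback runs the count query to get the total number of records satisfying the condition whenever the total cannot be derived from the fetched page itself.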
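To make the role of that count callback concrete, here is a minimal, library-free sketch of the idea. The class and method names (LazyCountSketch, resolveTotal, totalPages) are made up for illustration, and the short-circuit rules are a simplified version of what PageableExecutionUtils.getPage does as far as I understand it, not a copy of the Spring code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

class LazyCountSketch {

    /** Resolves the total element count, calling countQuery only when the page alone cannot tell. */
    static long resolveTotal(List<?> content, long offset, int pageSize, LongSupplier countQuery) {
        if (offset == 0 && content.size() < pageSize) {
            // The first page is not full, so it already contains every row.
            return content.size();
        }
        if (offset > 0 && !content.isEmpty() && content.size() < pageSize) {
            // A partially filled later page must be the last one.
            return offset + content.size();
        }
        // Otherwise the total is unknown and the (possibly expensive) count has to run.
        return countQuery.getAsLong();
    }

    /** Number of pages needed for total elements (rounding up). */
    static int totalPages(long total, int pageSize) {
        return (int) ((total + pageSize - 1) / pageSize);
    }

    public static void main(String[] args) {
        // Hypothetical count query that pretends the table holds 400 rows.
        LongSupplier countQuery = () -> 400L;
        long shortPage = resolveTotal(Arrays.asList("a", "b", "c"), 0, 10, countQuery);
        long fullPage = resolveTotal(Collections.nCopies(10, "x"), 0, 10, countQuery);
        System.out.println("short first page -> total " + shortPage);
        System.out.println("full first page  -> total " + fullPage
                + ", pages " + totalPages(fullPage, 10));
    }
}
```

The point of the sketch is that a Page pays for the count query on most requests, which is exactly the cost a Slice avoids.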
Slice
static class SlicedExecution extends JpaQueryExecution {
    @Override
    protected Object doExecute(AbstractJpaQuery query, JpaParametersParameterAccessor accessor) {
        Pageable pageable = accessor.getPageable();
        Query createQuery = query.createQuery(accessor);
        int pageSize = 0;
        if (pageable.isPaged()) {
            pageSize = pageable.getPageSize();
            createQuery.setMaxResults(pageSize + 1);
        }
        List<Object> resultList = createQuery.getResultList();
        boolean hasNext = pageable.isPaged() && resultList.size() > pageSize;
        return new SliceImpl<>(hasNext ? resultList.subList(0, pageSize) : resultList, pageable, hasNext);
    }
}
If you look at the snippet above, to find out whether a next set of results is present (for hasNext()), SlicedExecution#doExecute fetches one extra element for paged requests (createQuery.setMaxResults(pageSize + 1)) and drops it again when building the result (hasNext ? resultList.subList(0, pageSize) : resultList).
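The "fetch one extra row" trick can be tried without a database. The following is a small sketch that applies it to an in-memory list; SliceSketch and SimpleSlice are hypothetical names used only for this illustration and are not Spring Data types.

```java
import java.util.ArrayList;
import java.util.List;

class SliceSketch {

    /** Minimal stand-in for a Slice: the page content plus a "has next" flag. */
    static final class SimpleSlice<T> {
        final List<T> content;
        final boolean hasNext;

        SimpleSlice(List<T> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    /** Emulates "OFFSET offset LIMIT pageSize + 1" on an in-memory list. */
    static <T> SimpleSlice<T> slice(List<T> table, int offset, int pageSize) {
        int from = Math.min(offset, table.size());
        int to = Math.min(from + pageSize + 1, table.size());
        List<T> fetched = table.subList(from, to);
        // One more row than requested came back, so another slice exists.
        boolean hasNext = fetched.size() > pageSize;
        // Drop the extra row before handing the content to the caller.
        List<T> content = hasNext ? fetched.subList(0, pageSize) : fetched;
        return new SimpleSlice<>(new ArrayList<>(content), hasNext);
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            table.add(i);
        }
        SimpleSlice<Integer> first = slice(table, 0, 10);
        SimpleSlice<Integer> last = slice(table, 20, 10);
        System.out.println("first: " + first.content.size() + " rows, hasNext=" + first.hasNext);
        System.out.println("last:  " + last.content.size() + " rows, hasNext=" + last.hasNext);
    }
}
```

No count is ever computed here, which is why this approach scales to large result sets, at the price of not knowing how many pages exist.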
Application:
Page
Use when the UI needs the total number of results up front, for example to render page numbers to navigate (e.g., a bank statement with page numbers).
Slice
Use when the UI does not need the total up front and only lets the user move on by scrolling or by clicking a next button (e.g., a Facebook feed search).

Lucene 6 - How to influence ranking with numeric value?

I am new to Lucene, so apologies for any unclear wording. I am working on an author search engine. The search query is the author name. The default search results are good - they return the names that match the most. However, we want to rank the results by author popularity as well, a blend of both the default similarity and a numeric value representing the circulations their titles have. The problem with the default results is it returns authors nobody is interested in, and while I can rank by circulation alone, the top result is generally not a great match in terms of name. I have been looking for days for a solution for this.
This is how I am building my index:
IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get(INDEX_LOCATION)),
new IndexWriterConfig(new StandardAnalyzer()));
writer.deleteAll();
for (Contributor contributor : contributors) {
Document doc = new Document();
doc.add(new TextField("name", contributor.getName(), Field.Store.YES));
doc.add(new StoredField("contribId", contributor.getContribId()));
doc.add(new NumericDocValuesField("sum", sum));
writer.addDocument(doc);
}
writer.close();
The name is the field we want to search on, and the sum is the field we want to weight our search results with (but still taking into account the best match for the author name). I'm not sure if adding the sum to the document is the correct thing to do in this situation. I know that there will need to be some experimentation to figure out how to best blend the weighting of the two factors, but my problem is I don't know how to do it in the first place.
Any examples I've been able to find are either pre-Lucene 4 or don't seem to work. I thought this was what I was looking for, but it doesn't seem to work. Help appreciated!

As demonstrated in the blog post you linked, you could use a CustomScoreQuery; this would give you a lot of flexibility and influence over the scoring process, but it is also a bit overkill. Another possibility is to use a FunctionScoreQuery; since they behave differently, I will explain both.
Using a FunctionScoreQuery
A FunctionScoreQuery can modify a score based on a field.
Let's say you are usually performing a search like this:
Query q = .... // pass the user input to the QueryParser or similar
TopDocs hits = searcher.search(q, 10); // Get 10 results
Then you can modify the query in between like this:
Query q = .....
// Note that a Float field would work better.
DoubleValuesSource boostByField = DoubleValuesSource.fromLongField("sum");
// Create a query, based on the old query and the boost
FunctionScoreQuery modifiedQuery = new FunctionScoreQuery(q, boostByField);
// Search as usual, but with the modified query
TopDocs hits = searcher.search(modifiedQuery, 10);
This will modify the scoring based on the value of the "sum" field. Sadly, however, there isn't a way to control the influence of the DoubleValuesSource (besides scaling the values during indexing), at least none that I know of.
To have more control, consider using the CustomScoreQuery.
Using a CustomScoreQuery
Using this kind of query will allow you to modify a score of each result any way you like. In this context we will use it to alter the score based on a field in the index. First, you will have to store your value during indexing:
doc.add(new StoredField("sum", sum));
Then we will have to create our very own query class:
private static class MyScoreQuery extends CustomScoreQuery {
    public MyScoreQuery(Query subQuery) {
        super(subQuery);
    }
    // The CustomScoreProvider is what actually alters the score
    private class MyScoreProvider extends CustomScoreProvider {
        private LeafReader reader;
        private Set<String> fieldsToLoad;
        public MyScoreProvider(LeafReaderContext context) {
            super(context);
            reader = context.reader();
            // We create a HashSet which contains the name of the field
            // which we need. This allows us to retrieve the document
            // with only this field loaded, which is a lot faster.
            fieldsToLoad = new HashSet<>();
            fieldsToLoad.add("sum");
        }
        @Override
        public float customScore(int doc_id, float currentScore, float valSrcScore) throws IOException {
            // Get the result document from the index
            Document doc = reader.document(doc_id, fieldsToLoad);
            // Get boost value from index
            IndexableField field = doc.getField("sum");
            Number number = field.numericValue();
            // This is just an example on how to alter the current score
            // based on the value of "sum". You will have to experiment
            // here.
            float influence = 0.01f;
            float boost = number.floatValue() * influence;
            // Return the new score for this result, based on the
            // original lucene score.
            return currentScore + boost;
        }
    }
    // Make sure that our CustomScoreProvider is being used.
    @Override
    public CustomScoreProvider getCustomScoreProvider(LeafReaderContext context) {
        return new MyScoreProvider(context);
    }
}
Now you can use your new Query class to modify an existing query, similar to the FunctionScoreQuery:
Query q = .....
// Create a query, based on the old query and the boost
MyScoreQuery modifiedQuery = new MyScoreQuery(q);
// Search as usual, but with the modified query
TopDocs hits = searcher.search(modifiedQuery, 10);
Final remarks
Using a CustomScoreQuery, you can influence the scoring process in all kinds of ways. Remember however that the method customScore is called for each search result - so don't perform any expensive computations there, as this would severely slow down the search process.
I've created a small gist of a full working example of the CustomScoreQuery here: https://gist.github.com/philippludwig/14e0d9b527a6522511ae79823adef73a
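Since the "influence" factor in customScore needs experimentation, it can help to play with the arithmetic outside of Lucene first. The sketch below is plain Java with made-up scores and circulation sums. linearBlend is the formula from customScore above, while dampedBlend (a logarithm on the sum) is only one possible alternative that I am assuming here, not something the answer prescribes.

```java
class ScoreBlendSketch {

    /** Linear blend, as in customScore above: textScore + sum * influence. */
    static double linearBlend(double textScore, long sum, double influence) {
        return textScore + sum * influence;
    }

    /** Damped variant (an assumption for illustration): the logarithm keeps huge sums in check. */
    static double dampedBlend(double textScore, long sum, double influence) {
        return textScore + Math.log1p(sum) * influence;
    }

    public static void main(String[] args) {
        // Made-up numbers: a close name match with few circulations
        // versus a weak name match with many circulations.
        double closeMatch = 2.0;
        double weakMatch = 1.2;
        long fewCirculations = 10;
        long manyCirculations = 5000;

        System.out.println("linear, close match: " + linearBlend(closeMatch, fewCirculations, 0.01));
        System.out.println("linear, weak match:  " + linearBlend(weakMatch, manyCirculations, 0.01));
        System.out.println("damped, close match: " + dampedBlend(closeMatch, fewCirculations, 0.1));
        System.out.println("damped, weak match:  " + dampedBlend(weakMatch, manyCirculations, 0.1));
    }
}
```

With these sample numbers the linear blend lets the popular but weakly matching author win by a wide margin, while the damped blend keeps the close name match on top. Which behaviour is right depends on your data.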

Modifying a large set of objects using JPA / EclipseLink

I need to iterate 50k objects and change some fields in them.
I'm limited in memory so I don't want to bring all 50k objects into memory at once.
I thought of doing it with the following code using a cursor, but I was wondering whether all the objects I've processed using the cursor are left in the EntityManager cache.
The reason I don't want to do it with offset and limit is because the database needs to work much harder since each page is a complete new query.
From previous experience once the Entity manager cache gets bigger, updates become real slow.
So usually I call flush and clear after every few hundreds of updates.
The problem here is that flushing / clearing will break the cursor.
I will be happy to learn the best approach of updating a large set of objects without loading them all into memory.
Additional information on how the EclipseLink cursor works in such a scenario would be valuable too.
JpaQuery<T> jQuery = (JpaQuery<T>) query;
jQuery.setHint(QueryHints.RESULT_SET_TYPE, ResultSetType.ForwardOnly)
.setHint(QueryHints.SCROLLABLE_CURSOR, true);
Cursor cursor = jQuery.getResultCursor();
Iterator<MyObj> cursorIterator = cursor.iterator();
while (cursorIterator.hasNext()) {
MyObj myObj = cursorIterator.next();
ChangeMyObj(myObj);
}
cursor.close();

Use pagination + entityManager.clear() after each page. Also execute every page in a single transaction, or you will have to create/get a new EntityManager after an exception occurs (at least with Hibernate: the EntityManager instance could be in an inconsistent state after an exception).

Try this sample code:
List<?> results;
int index = 0;
int max = 100;
do {
    Query query = entityManager.createQuery("JPQL QUERY");
    query.setMaxResults(max)
         .setFirstResult(index);
    results = query.getResultList();
    for (Object c : results) {
        // modify the entity here
    }
    // Write pending changes first: clear() detaches entities and discards unflushed changes
    entityManager.flush();
    entityManager.clear();
    index = index + results.size();
} while (results.size() > 0);

Is there a way to get the count size for a JPA Named Query with a result set?

I like the idea of Named Queries in JPA for static queries I'm going to do, but I often want to get the count result for the query as well as a result list from some subset of the query. I'd rather not write two nearly identical NamedQueries. Ideally, what I'd like to have is something like:
@NamedQuery(name = "getAccounts", query = "SELECT a FROM Account a")
.
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = q.getCount();
So let's say m is 10, s is 0 and there are 400 rows in Account. I would expect r to have a list of 10 items in it, but I'd want to know there are 400 rows total. I could write a second @NamedQuery:
@NamedQuery(name = "getAccountCount", query = "SELECT COUNT(a) FROM Account a")
but it seems a DRY violation to do that if I'm always just going to want the count. In this simple case it is easy to keep the two in sync, but if the query changes, it seems less than ideal that I have to update both @NamedQuery definitions to keep the values in line.
A common use case here would be fetching some subset of the items, but needing some way of indicating total count ("Displaying 1-10 of 400").

So the solution I ended up using was to create two @NamedQuery definitions, one for the result set and one for the count, but capturing the base query in a static string to maintain DRY and ensure that both queries remain consistent. So for the above, I'd have something like:
@NamedQuery(name = "getAccounts", query = "SELECT a" + accountQuery)
@NamedQuery(name = "getAccounts.count", query = "SELECT COUNT(a)" + accountQuery)
.
static final String accountQuery = " FROM Account a";
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = ((Long)em.createNamedQuery("getAccounts.count").getSingleResult()).intValue();
Obviously, with this example, the query body is trivial and this is overkill. But with much more complex queries, you end up with a single definition of the query body and can ensure you have the two queries in sync. You also get the advantage that the queries are precompiled and, at least with EclipseLink, you get validation at startup time instead of when you call the query.
By naming the two queries consistently, it is possible to wrap the code that runs both of them in a helper that only needs the base name of the query.
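As a rough sketch of such a wrapper: the helper below derives the count query name from the base name using the ".count" convention shown above. To keep the example runnable without a JPA provider, NamedQueryRunner is a hypothetical stand-in for the two EntityManager calls; it is not a JPA type, and all other names here are invented for illustration as well.

```java
import java.util.ArrayList;
import java.util.List;

class PagedNamedQuerySketch {

    /** Hypothetical stand-in for the EntityManager calls; not part of JPA. */
    interface NamedQueryRunner<T> {
        List<T> list(String queryName, int first, int max);
        long count(String queryName);
    }

    /** Rows of one page plus the total row count. */
    static final class PageResult<T> {
        final List<T> rows;
        final long total;

        PageResult(List<T> rows, long total) {
            this.rows = rows;
            this.total = total;
        }
    }

    /** Naming convention from the answer: "getAccounts" -> "getAccounts.count". */
    static String countQueryName(String baseName) {
        return baseName + ".count";
    }

    /** Runs the list query and its count sibling, given only the base name. */
    static <T> PageResult<T> fetchPage(NamedQueryRunner<T> runner, String baseName, int first, int max) {
        List<T> rows = runner.list(baseName, first, max);
        long total = runner.count(countQueryName(baseName));
        return new PageResult<>(rows, total);
    }

    /** Builds a label such as "Displaying 1-10 of 400". */
    static String label(int first, int rowsOnPage, long total) {
        if (rowsOnPage == 0) {
            return "Displaying 0 of " + total;
        }
        return "Displaying " + (first + 1) + "-" + (first + rowsOnPage) + " of " + total;
    }

    public static void main(String[] args) {
        // Fake runner that pretends the Account table holds 400 rows.
        NamedQueryRunner<String> fake = new NamedQueryRunner<String>() {
            public List<String> list(String queryName, int first, int max) {
                List<String> rows = new ArrayList<>();
                for (int i = first; i < Math.min(first + max, 400); i++) {
                    rows.add("account-" + i);
                }
                return rows;
            }
            public long count(String queryName) {
                return 400L;
            }
        };
        PageResult<String> page = fetchPage(fake, "getAccounts", 0, 10);
        System.out.println(label(0, page.rows.size(), page.total));
    }
}
```

In a real application the runner would presumably delegate to em.createNamedQuery(name).setFirstResult(first).setMaxResults(max).getResultList() and to em.createNamedQuery(name).getSingleResult(), as in the snippet above.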

Calling setFirstResult/setMaxResults does not return a subset of an existing result set. The query hasn't even been run when you call these methods; they affect the generated SELECT query that will be executed when you call getResultList. If you want the total record count, you'll have to SELECT COUNT your entities in a separate query (typically before paginating).
For a complete example, check out Pagination of Data Sets in a Sample Application using JSF, Catalog Facade Stateless Session, and Java Persistence APIs.

Well, you can use reflection to read the named query annotations, like this:
String getNamedQueryCode(Class<?> clazz, String namedQueryKey) {
    String code = null;
    NamedQueries namedQueriesAnnotation = clazz.getAnnotation(NamedQueries.class);
    if (namedQueriesAnnotation != null) {
        for (NamedQuery namedQuery : namedQueriesAnnotation.value()) {
            if (namedQuery.name().equals(namedQueryKey)) {
                code = namedQuery.query();
                break;
            }
        }
    }
    if (code == null) {
        // Fall back to a @MappedSuperclass parent, if there is one
        Class<?> parent = clazz.getSuperclass();
        if (parent != null && parent.getAnnotation(MappedSuperclass.class) != null) {
            code = getNamedQueryCode(parent, namedQueryKey);
        }
    }
    // null if not found
    return code;
}
