Hibernate paging is resulting in SELECT and UPDATE calls

I am trying to implement paging in Hibernate and I am seeing some weird behavior. I have tried two queries, both with the same result:
List<SomeData> dataList = (List<SomeData>) session.getCurrentSession()
    .createQuery("from SomeData ad where ad.bar = :bar order by ad.id.name")
    .setString("bar", foo)
    .setFirstResult(i * PAGE_SIZE)
    .setMaxResults(PAGE_SIZE)
    .setFetchSize(PAGE_SIZE) // PAGE_SIZE is 1000 in my case
    .list();
and
List<SomeData> dataList = (List<SomeData>) session.getCurrentSession()
    .createCriteria(SomeData.class)
    .addOrder(Order.asc("id.name"))
    .add(Expression.eq("bar", foo))
    .setFirstResult(i * PAGE_SIZE)
    .setMaxResults(PAGE_SIZE)
    .list();
I have this in a for loop, and each time the query runs, the run time increases: the first call returns in 100 ms, the second in 150 ms, the fifth takes 2 seconds, and so on.
Looking at the server (MySQL 5.1.36) logs, I see that the SELECT query is generated properly, with the LIMIT clause, but for each record that is returned, Hibernate for some reason also emits an UPDATE query. After the first iteration it updates 1,000 records, after the second 2,000 records, and so on. So for a page size of 1,000 and 5 iterations of the loop, the database gets hit with 15,000 queries (1K + 2K + 3K + 4K + 5K). Why is that happening?
I tried making a native SQL query and it worked as expected. The query is:
List asins = (List) session.getCurrentSession()
    .createSQLQuery("SELECT * FROM some_data WHERE foo = :foo ORDER BY bar "
        + "LIMIT :from, :page")
    .addScalar(..)
    .setInteger("page", PAGE_SIZE)
    .setInteger("from", i * PAGE_SIZE)
    ... // set other params
    .list();
My mapping class has setters/getters for the blob object as:
void setSomeBlob(Blob blob) {
    this.someByteArray = this.toByteArray(blob);
}

Blob getSomeBlob() {
    return Hibernate.createBlob(someByteArray);
}

Turn on bound-parameter logging (you can do that by setting the "org.hibernate.type" log level to "TRACE") to see what specifically is being updated.
Most likely you're modifying the entities after they've been loaded, either explicitly or implicitly (e.g. returning a different value from a getter, or using a default value somewhere). Note that your getSomeBlob() getter builds a brand-new Blob on every call, which is exactly the kind of implicit modification that dirty checking will pick up.
Another possibility is that you've recently altered (one of) the table(s) you're selecting from, and a column default in the table doesn't match the default value in the entity.
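If dirty checking on the blob is indeed the culprit, one quick way to confirm is to load the page read-only so Hibernate never flushes the phantom changes. A minimal sketch against the query from the question (setReadOnly is standard Hibernate Query API):
List<SomeData> dataList = (List<SomeData>) session.getCurrentSession()
    .createQuery("from SomeData ad where ad.bar = :bar order by ad.id.name")
    .setString("bar", foo)
    .setFirstResult(i * PAGE_SIZE)
    .setMaxResults(PAGE_SIZE)
    .setReadOnly(true) // entities loaded by this query are never dirty-checked
    .list();
If the spurious UPDATEs disappear, the durable fix is to make the getter idempotent, e.g. create the Blob once and cache it, so the snapshot comparison sees the same value on every call.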

Related

Activiti HistoricProcessInstanceQuery returned with missing processVariables

I am trying to query HistoricProcessInstances from the Activiti historyService, including the processVariables, but some of the processes have missing variables in the returned list. I monitored the database to see the SQL query Activiti generated, and it turned out the query joins 3 tables and can only return 20,000 records. I have approximately 550 processes with 37 processVariables each, so that would be 20,350 records.
In the monitored SQL query there is a rnk (rank) assigned to each line in the result, and it is always between 1 and 20,000.
...from ACT_HI_PROCINST RES
left outer join ACT_HI_VARINST VAR on RES.PROC_INST_ID_ = VAR.EXECUTION_ID_ and VAR.TASK_ID_ is null
inner join ACT_HI_VARINST A0 on RES.PROC_INST_ID_ = A0.PROC_INST_ID_
WHERE RES.END_TIME_ is not NULL and
    A0.NAME_ = 'processOwner' and
    A0.VAR_TYPE_ = 'string' and
    A0.TEXT_ = 'user123'
) RES
) SUB WHERE SUB.rnk >= 1 AND
    SUB.rnk < 20001 and ...
Is there any possible solution to increase this threshold, or to create a HistoricProcessInstanceQuery that includes only specific processVariables?
My code snippet for the query:
processHistories = historyService.createHistoricProcessInstanceQuery()
    .processDefinitionKey(processKey)
    .variableValueEquals(VariableNames.processOwner, username)
    .includeProcessVariables()
    .finished()
    .orderByProcessInstanceStartTime()
    .desc()
    .list();
You can use a native query via HistoryService.createNativeHistoricProcessInstanceQuery and enter your own SQL (copy it from the actual historic process instance query, without the rank WHERE clause), as sketched below.
Likely this is more a restriction imposed by your database than by Activiti/Flowable.
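A minimal sketch, assuming the Activiti NativeHistoricProcessInstanceQuery API (parameters use MyBatis-style #{...} placeholders); the SQL here is illustrative and should be replaced with the query captured from your monitor, minus the rank window. Note that includeProcessVariables() does not apply to native queries, so variables would need to be loaded separately:
List<HistoricProcessInstance> processHistories = historyService
    .createNativeHistoricProcessInstanceQuery()
    .sql("SELECT RES.* FROM ACT_HI_PROCINST RES "
        + "INNER JOIN ACT_HI_VARINST A0 ON RES.PROC_INST_ID_ = A0.PROC_INST_ID_ "
        + "WHERE RES.END_TIME_ IS NOT NULL "
        + "AND A0.NAME_ = #{varName} AND A0.VAR_TYPE_ = 'string' AND A0.TEXT_ = #{varValue} "
        + "ORDER BY RES.START_TIME_ DESC")
    .parameter("varName", "processOwner")
    .parameter("varValue", "user123")
    .list();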

JPA concurrent postgresql counter column with retrieving value

Pre-requisites:
PostgreSQL
Spring Boot with Spring Data JPA
Problem
I have 2 tables, Products and ProductsLocationCounter. Each product has location_id and counter_value fields, among others; location_id is also the primary key of ProductsLocationCounter.
ProductsLocationCounter is meant to keep a count of products per location_id, incremented whenever a new product is added.
The problem is that I also need the counter value at that point in time to be attached to the product entity.
So the flow would be like:
1. create product
2. counter_value = get counter
3. increment counter
4. product.counter_value = counter_value
Of course this has to be done in a concurrent manner.
Now, I've read about and tried different solutions.
This Stack Overflow post suggests I should let the DB handle the concurrency, which sounds fine to me. But the trick is that I need the value of the counter in the same transaction, so I created a trigger:
CREATE FUNCTION maintain_location_product_count_fun() RETURNS TRIGGER AS
$$
DECLARE
    counter_var BIGINT;
BEGIN
    IF TG_OP IN ('INSERT') THEN
        SELECT product_location_count.counter INTO counter_var FROM product_location_count WHERE id = new.location_id FOR UPDATE;
        UPDATE product_location_count SET counter = counter + 1 WHERE id = new.location_id;
        UPDATE products SET counter_value = counter_var WHERE location_id = new.location_id;
    END IF;
    RETURN NULL;
END
$$
LANGUAGE plpgsql;
CREATE TRIGGER maintain_location_product_count_trig
AFTER INSERT ON products
FOR EACH ROW
EXECUTE PROCEDURE maintain_location_product_count_fun();
and tested it with a parallel stream:
IntStream.range(1, 5000)
    .parallel()
    .forEach(value -> {
        executeInsideTransactionTemplate(status -> {
            // locationId is assumed to be defined outside the stream
            var location = locationRepository.findById(locationId).get();
            return addProductWithLocation(location);
        });
    });
I got no duplication in the counter_value column. Is this trigger safe for multi-threaded apps? I haven't worked with triggers/PostgreSQL functions before, so I'm not sure what to expect.
The second solution I tried was to add PESSIMISTIC_WRITE on the findById method of the ProductsLocationCounter entity, but I ended up getting
cannot execute SELECT FOR UPDATE in a read-only transaction
even though I was executing the code in a @Transactional annotated method (which by default has read-only = false).
The third was to update and retrieve the value of the counter in the same statement, but Spring JPA doesn't allow that (nor the underlying db), as the update statement only returns the number of rows affected.
Is there any other solution, or do I need to add something to the trigger function to make it thread-safe? Thank you.
This is how I achieved what I needed.
Long story short, I used a SQL function and called it from the repository; I didn't need the trigger anymore. A sketch is below; see also:
https://stackoverflow.com/a/74208072/3018285
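In short, a single SQL function does the increment and hands back the new value atomically (UPDATE ... RETURNING), and the repository calls it natively. A minimal sketch with hypothetical names (next_location_counter and the repository method are illustrative, not from the linked answer):
-- Hypothetical helper; UPDATE ... RETURNING makes increment-and-read one atomic statement.
CREATE FUNCTION next_location_counter(loc_id BIGINT) RETURNS BIGINT AS
$$
    UPDATE product_location_count SET counter = counter + 1
    WHERE id = loc_id
    RETURNING counter;
$$ LANGUAGE sql;

// Hypothetical Spring Data JPA repository method invoking the function.
@Query(value = "SELECT next_location_counter(:locationId)", nativeQuery = true)
Long nextCounterValue(@Param("locationId") Long locationId);
The value returned can then be set on the product entity before saving it, all within the same transaction.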

Cassandra Pagination Using Datastax driver 3.6: Null paging state and fetch size not honoured

We are trying to build an application that returns paginated results from a Cassandra DB for a UI.
The UI would pass fetchSize and pagingState to our API, and based on that we would return a List<MyObject> of size = fetchSize. If pagingState is passed, we would resume the query from the last page (as described in the Cassandra docs: https://docs.datastax.com/en/developer/java-driver/3.6/manual/paging/).
Please note that I'm using Cassandra driver version 3.6.
But with this implementation, Cassandra always returns all entries in the database, ignoring the fetch size, which in turn results in a null value from ResultSet.getExecutionInfo().getPagingState(). How do I solve this?
I created 16 records in my database for MyObject and tried passing a fetch size of 5 to get them. All 16 records have the same partition key, ID1.
// Util method to invoke a Statement. "session" is the Cassandra session.
public static ResultSet execute(int pageSize, Statement statement, String pageState) {
    if (isVoid(pageSize)) {
        pageSize = -1;
    }
    statement.setFetchSize(pageSize);
    if (!isVoid(pageState)) {
        statement.setPagingState(PagingState.fromString(pageState));
    }
    return session.execute(statement);
}
// Accessor interface method for my query that returns a Statement object
@Query("SELECT * FROM " + MY_TABLE + " WHERE id=:id")
Statement getAll(@Param("id") String id);

// Main code returning a list of MyObject; "mapper" is the object Mapper
Statement statement = accessor.getAll("ID1");
ResultSet rs = execute(5, statement, null);
List<MyObject> list = mapper.map(rs).all();
PagingState pageState = rs.getExecutionInfo().getPagingState();
In the above code, I expected Cassandra to return a list of 5 MyObject instances and a non-null value for my pageState variable.
Neither worked as expected:
the list had a size of 16 (basically it fetched all records),
and because of that, pageState was null, as all records had already been fetched.
What am I missing here?
EDIT:
From observation, the ResultSet does honour the fetchSize passed in the statement, but when we map it to List<MyObject> using the all() method, it fetches all the results in the database (of size = cluster-wide fetchSize).
So when I invoked Result#one pageSize (= 5) times and pushed the results into a List, I got the paging state as well as a page-sized list of results.
Sample util method for the above:
public static <T> List<T> getPaginatedList(ResultSet resultSet, Mapper<T> mapper, int pageSize) {
    List<T> entities = new ArrayList<>();
    Result<T> result = mapper.map(resultSet);
    IntStream.range(1, pageSize).forEach(i -> {
        entities.add(result.one());
    });
    return entities;
}
What is the performance impact of this?
As you were able to discern, the reason you are getting all results back despite specifying setFetchSize is that fetch size simply sets the requested size of each page. When you invoke all(), the driver transparently pages through all results.
Calling one() individually will not have a performance impact compared to all(); however, I would recommend changing your logic for consuming the page, as I would expect IntStream.range(1, pageSize) to misbehave when the result set is shorter than a page (i.e. you set the fetch size to 500, but there are only 495 rows). Instead you could use IntStream.range(0, resultSet.getAvailableWithoutFetching()).
You could also choose to iterate over the result set until ResultSet.isExhausted() returns true to prevent fetching the next page.
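Putting that together, a minimal sketch of a page-consuming helper, assuming the 3.x mapper API (Mapper, Result); it drains only the rows already fetched for the current page, leaving the paging state intact:
public static <T> List<T> getPage(ResultSet resultSet, Mapper<T> mapper) {
    List<T> entities = new ArrayList<>();
    Result<T> result = mapper.map(resultSet);
    // Rows already fetched for this page; consuming more would trigger a
    // synchronous fetch of the next page.
    int available = result.getAvailableWithoutFetching();
    for (int i = 0; i < available; i++) {
        entities.add(result.one());
    }
    return entities;
}
After draining, resultSet.getExecutionInfo().getPagingState() still points at the next page and can be serialized back to the UI.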

The query in HibernateSearch does not sort by updated data

I'm using Hibernate Search 5.7.1.Final with Elasticsearch, from Java.
I have two main types of objects in my model,
Book and Journal, which inherit from the class Record
and share some common properties: rank and title.
The typical rank for a book is 1 and the typical value for a journal
is 2, but sometimes it can be different.
I created some data with default ranks, but for one book
I wanted to test whether updates to the rank value would be respected
in the query results.
I first set its value to 0 before saving:
...
Book book03 = new Book("Special Book", prus);
book03.setRank(0);
session.save(book03);
Then I obtain the object using Hibernate (not Hibernate Search)
mechanisms and update the value:
session.clear();
session.beginTransaction();
Criteria specialCriteria = session.createCriteria(Book.class);
Book special = (Book) specialCriteria.add(Restrictions.eq("title", "Special Book")).uniqueResult();
special.setRank(6);
session.saveOrUpdate(special);
session.getTransaction().commit();
Now I want to obtain all books and journals and sort them by rank.
I combine a query for books and a query for journals using a boolean query.
I don't know how to obtain all objects of a specific type without specifying
any criteria, so I query for objects that have anything in the title.
session.clear();
sessionFactory.getCache().evictEntityRegions();
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.clear();
QueryBuilder bookQueryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(Book.class).get();
QueryBuilder journalQueryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(Journal.class).get();
QueryBuilder generalQueryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(Record.class).get();
org.apache.lucene.search.Query bookLuceneQuery
= bookQueryBuilder.keyword().wildcard().onField("title").matching("*").createQuery();
org.apache.lucene.search.Query journalLuceneQuery
= journalQueryBuilder.keyword().wildcard().onField("title").matching("*").createQuery();
org.apache.lucene.search.Query combinedLuceneQuery = generalQueryBuilder
.bool()
.should( bookLuceneQuery )
.should( journalLuceneQuery )
.createQuery();
FullTextQuery combinedQuery = fullTextSession.createFullTextQuery(combinedLuceneQuery, Record.class);
Sort sort = generalQueryBuilder.sort()
.byField("rank").asc()
.createSort();
combinedQuery.setSort(sort);
List result = combinedQuery.list();
System.out.println("\n\nSearch results for books and journals:");
for(Object object : result)
{
Record record = (Record) object;
System.out.println(record.getTitle());
System.out.println(record.getRank());
System.out.println();
}
The problem is that I obtain the following results:
Special Book
6
book01
1
book02
1
book04
1
journal01
2
journal02
2
It looks like the value was eventually updated to 6
(this value is displayed when printing the object),
but during sorting the value 0 is used
(that's why the Special Book ended up at the top).
As you can see, I tried resetting the session so that nothing gets cached,
but it didn't help.
Another hypothesis of mine is that the rank is not the only element
that affects the sorting, and some relevance score is taken into consideration,
because the problem does not occur when I don't use the boolean
query: when I only ask for books, Special Book is the last
in the result list. But I would prefer to use the combined query
so that I can paginate the results.
By default Elasticsearch (and Hibernate Search, when it uses Elasticsearch) is near-real-time, meaning there will be a slight delay (1 second by default) before your updates affect the results of search queries. The moment when updates are taken into account is called a "refresh".
Did you try adding a slight pause (e.g. 2 s) after your update?
If that solves the issue, then the refresh delay was causing it. In a real-world situation the delay may not be a problem, because you will rarely perform an update and then immediately run a query that requires the index to be perfectly up to date.
If you do have such requirements, though, you can set the property hibernate.search.default.elasticsearch.refresh_after_write to true, as sketched below. Be aware that this will impact write performance negatively, though.
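For reference, a minimal sketch of passing that property programmatically; the property name comes from the answer above, while the bootstrap code around it is assumed (hibernate.properties or persistence.xml work just as well):
// Assumed bootstrap via org.hibernate.cfg.Configuration.
Configuration cfg = new Configuration();
// Force an Elasticsearch index refresh after each write, trading write
// throughput for immediately visible search results.
cfg.setProperty("hibernate.search.default.elasticsearch.refresh_after_write", "true");
SessionFactory sessionFactory = cfg.buildSessionFactory();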

Couchbase query does not see documents added recently

I'm performing a test with Couchbase 4.0 and Java SDK 2.2. I'm inserting 10 documents whose keys always start with "190".
After inserting these 10 documents I query them with:
cb.restore("190", cache);
Thread.sleep(100);
cb.restore("190", cache);
The query within the 'restore' method is:
Statement st = Select.select("meta(c).id, c.*").from(this.bucketName + " c").where(Expression.x("meta(c).id").like(Expression.s(callId + "_%")));
N1qlQueryResult result = bucket.query(st);
The first call to restore returns 0 documents:
Query 'SELECT meta(c).id, c.* FROM cache c WHERE meta(c).id LIKE "190_%"' --> Size = 0
The second call (100ms later) returns the 10 documents:
Query 'SELECT meta(c).id, c.* FROM cache c WHERE meta(c).id LIKE "190_%"' --> Size = 10
I tried adding PersistTo.MASTER to the insert statement, but that doesn't work either.
It seems that the insert is not persisted immediately.
Any help would be really appreciated.
Joan.
You're using N1QL to query the data, and N1QL is only eventually consistent (by default), so documents only show up after the indexes have been recalculated. This isn't related to whether or not the data is persisted (meaning: written from RAM to disk).
You can change the scan_consistency level from its default, NOT_BOUNDED, to get consistent results, but queries will then take longer to return. See the ScanConsistency options in the Couchbase Java SDK documentation, and the sketch below.
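A minimal sketch against the Java SDK 2.x API (st is the Statement from the question; N1qlParams and ScanConsistency are SDK types):
// REQUEST_PLUS blocks the query until the index has caught up with all
// mutations made before the request, at the cost of higher latency.
N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);
N1qlQueryResult result = bucket.query(N1qlQuery.simple(st, params));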
