Modifying a large set of objects using JPA / EclipseLink - java

I need to iterate over 50k objects and change some fields in them.
I'm limited in memory, so I don't want to bring all 50k objects into memory at once.
I thought of doing it with a cursor, using the following code, but I was wondering whether all the objects I've processed through the cursor are left in the EntityManager cache.
The reason I don't want to do it with offset and limit is that the database has to work much harder, since each page is a completely new query.
From previous experience, once the EntityManager cache gets bigger, updates become really slow.
That's why I usually call flush and clear after every few hundred updates.
The problem here is that flushing/clearing will break the cursor.
I would be happy to learn the best approach for updating a large set of objects without loading them all into memory.
Additional information on how the EclipseLink cursor works in such a scenario would be valuable too.
JpaQuery<T> jQuery = (JpaQuery<T>) query;
jQuery.setHint(QueryHints.RESULT_SET_TYPE, ResultSetType.ForwardOnly)
      .setHint(QueryHints.SCROLLABLE_CURSOR, true);

Cursor cursor = jQuery.getResultCursor();
Iterator<MyObj> cursorIterator = cursor.iterator();
while (cursorIterator.hasNext()) {
    MyObj myObj = cursorIterator.next();
    ChangeMyObj(myObj);
}
cursor.close();

Use pagination + entityManager.clear() after each page. Also execute every page in a single transaction, OR you will have to create/get a new EntityManager after an exception occurs (at least with Hibernate: the EntityManager instance could be in an inconsistent state after an exception).

Try this sample code:
List results;
int index = 0;
int max = 100;
do {
    // one transaction per page, as noted above (resource-local API assumed)
    entityManager.getTransaction().begin();
    Query query = entityManager.createQuery("JPQL QUERY");
    query.setMaxResults(max)
         .setFirstResult(index);
    results = query.getResultList();
    Iterator it = results.iterator();
    while (it.hasNext()) {
        Object c = it.next();
        // ... modify the entity here ...
    }
    entityManager.getTransaction().commit();
    entityManager.clear(); // detach the processed page
    index = index + results.size();
} while (results.size() > 0);
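If the extra database work from offset paging is the concern (each page restarts the scan), a common variant is to page on the primary key instead of an offset. A minimal sketch, assuming MyObj has a numeric id property with a getId() accessor (both are assumptions, not from the original code) and that the loop runs inside a transaction:
Long lastId = 0L;
List<MyObj> page;
do {
    page = entityManager
            .createQuery("SELECT o FROM MyObj o WHERE o.id > :lastId ORDER BY o.id", MyObj.class)
            .setParameter("lastId", lastId)
            .setMaxResults(100)
            .getResultList();
    for (MyObj myObj : page) {
        ChangeMyObj(myObj);
        lastId = myObj.getId(); // hypothetical accessor
    }
    entityManager.flush(); // push the pending updates
    entityManager.clear(); // keep the persistence context small
} while (!page.isEmpty());
Because each page is selected by "id greater than the last one seen", the database can walk the primary-key index instead of re-counting skipped rows.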

Related

Hibernate row count on Criteria with already set Projection

For a grid component in my web applications, I have a "GridModel" class which gets passed a Criteria.
The GridModel class has a method to get the results for a specific page by adding setFirstResult(...) and setMaxResults(...) to the Criteria.
But I also need the total count of rows for the Criteria, so I have the following method:
public int getAvailableRows() {
    Criteria c = criteriaProvider.getCriteria();
    c.setProjection(Projections.rowCount());
    return ((Long) c.uniqueResult()).intValue();
}
This worked perfectly, but now I have a grid that requires a Criteria which already uses setProjection() in combination with setResultTransformer(). It seems that the getAvailableRows() method above overrides the setProjection() of the original Criteria, producing wrong results.
Can I wrap a count Criteria around the original Criteria instead somehow? Or how would I solve this?
I've had a similar experience when trying to use Projections.rowCount() in conjunction with a group-by expression. I was able to circumvent things in a slightly 'hacky' manner by:
Remembering the previous projection and result transformer
Setting the projection on the Criteria to a modified version (see below)
Performing the row-count DB hit
Restoring the previous projection + transformer so the Criteria can be used for the actual result retrieval afterwards (a sketch of this last step follows the snippet below)
final Projection originalProjection = criteriaImpl.getProjection();
final ResultTransformer originalResultTransformer =
        criteriaImpl.getResultTransformer();

final Projection rowCountProjection;
// If we identify that we have a function with a group by clause
// we need to handle it in a special fashion
if ( originalProjection != null && originalProjection.isGrouped() )
{
    final CriteriaQueryTranslator criteriaQueryTranslator =
            new CriteriaQueryTranslator(
                    (SessionFactoryImplementor) mySessionFactory,
                    criteriaImpl,
                    criteriaImpl.getEntityOrClassName(),
                    CriteriaQueryTranslator.ROOT_SQL_ALIAS );
    rowCountProjection = Projections.projectionList()
            .add( Projections.rowCount() )
            .add( Projections.sqlGroupProjection(
                    // This looks stupid but is seemingly required to ensure we have a valid query
                    "count(count(1))",
                    criteriaQueryTranslator.getGroupBy(),
                    new String[]{}, new Type[]{} ) );
}
else
{
    rowCountProjection = Projections.rowCount();
}

// Get the total count of elements by setting a count projection
final Long rowCount =
        (Long) criteria.setProjection( rowCountProjection ).uniqueResult();
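For completeness, step 4 from the list above, which the snippet stops short of; a minimal sketch, assuming criteria is the same Criteria instance the count projection was set on (the setProjection(null) idiom is commonly used to reset a Hibernate projection, so restoring a possibly-null originalProjection should behave the same way):
// Put the original projection and transformer back so the Criteria
// can be reused to fetch the actual page of results.
criteria.setProjection( originalProjection );
criteria.setResultTransformer( originalResultTransformer );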
A few caveats here:
This still won't give the expected results if you try to give it a criteria with a single sum projection, as that is not considered an isGrouped() projection - it will splat the sum with a count. I don't consider this an issue, because getting the row count for an expression of that nature probably doesn't make sense.
When I was dealing with this I wrote some unit tests to make sure the row count was as expected without projections, with property-based projections, and with group-by projections, but I've written this from memory so can't guarantee small kinks won't need ironing out.

DAO Class in java taking too much time to fetch Data

I have the following method in my DAO class:
public List<Environment> fetchMiddlewareVersions() throws SQLException {
    System.out.println("reached version");
    Environment environment;
    List<Environment> environments = new ArrayList<Environment>();
    try {
        connection = DBConnectionUtil.getConnection();
        preparedStatement = connection.prepareStatement(
                "select * from middleware_version_details order by application,environment");
        // set the fetch size on the statement that is actually executed
        // (the original set it on a separate, unused Statement)
        preparedStatement.setFetchSize(100);
        resultSet = preparedStatement.executeQuery();
        while (resultSet.next()) {
            environment = new Environment();
            environment.setAppName(resultSet.getString("APPLICATION"));
            environment.setHostName(resultSet.getString("HOSTNAME"));
            environment.setSoftwareComponent(resultSet.getString("SOFTWARE_COMPONENT"));
            environment.setVersion(resultSet.getString("VERSION"));
            environment.setInstallPath(resultSet.getString("INSTALL_PATH"));
            environment.setRemarks(resultSet.getString("REMARKS"));
            environment.setEnvironmental(resultSet.getString("ENVIRONMENT"));
            environments.add(environment);
        }
    }
By the time I get the entire data into the JSP page, 20-30 seconds have already passed. How do I increase the speed of the fetch? I tried DynaCache and it hasn't helped.
So, barring any sort of connectivity issues, it almost always comes down to the number of records you're fetching. If you're fetching A TON of records, the method will not return until it has gone through each item and created an array object.
I would try adding a LIMIT and OFFSET clause to your SQL statement to retrieve records only, say, 25 at a time. setFetchSize(int) does not affect the number of overall records, only the number of records the underlying transport will fetch at a time from your server. Also, move your SQL query into a static final variable:
private static final String SQL_FETCH_MIDDLEWARE_VERSION =
        "SELECT * FROM middleware_version_details ORDER BY application, environment " +
        "LIMIT ? OFFSET ?";
then set the limit and the offset in your prepared statement like so:
preparedStatement.setInt( 1, <RECORD COUNT> );
preparedStatement.setInt( 2, <RECORD START> );
Third, do you have an index on application and environment? If you do not, and you will constantly be ordering, filtering, and joining on those columns, you should add one.
Fourth, and it's a minor point but one that I adhere to: calling resultSet.getString( "<COLUMN NAME>" ) causes another lookup to find the column index. It's not usually a huge deal, but if you're trying to be as performant as possible, you should use the numeric index. You can do this by creating private static variables holding the indexes:
private static final int INDEX_ENVIRONMENT = 6;
or you can use a counter and just ensure that the columns are in the correct order, something like this:
while (resultSet.next()) {
    // JDBC column indexes are 1-based, so the counter must start at 1
    int iC = 1;
    environment = new Environment();
    environment.setAppName(resultSet.getString( iC++ ));
    environment.setHostName(resultSet.getString( iC++ ));
    environment.setSoftwareComponent(resultSet.getString( iC++ ));
    environment.setVersion(resultSet.getString( iC++ ));
    environment.setInstallPath(resultSet.getString( iC++ ));
    environment.setRemarks(resultSet.getString( iC++ ));
    environment.setEnvironmental(resultSet.getString( iC++ ));
    environments.add(environment);
}
Just ensure that you're setting the variables in the correct order and it will be slightly more performant. I like this counter approach as well because it allows me to easily adapt to changing schemas.

ATG Repository API

I'm trying to update multiple records via an ATG class extending GenericService.
However, I'm running up against a roadblock.
How do I do a multiple-insert query where I can keep adding all the items/rows to the cached object and then do a single command to sync with the table using item.add()?
Sample code
The first part clears out the rows in the table before insertion happens (it would be mighty helpful if anyone knows of a way to clear all rows in a table without having to loop through and delete them one by one).
MutableRepository repo = (MutableRepository) feedRepository;
RepositoryView view = null;
try {
    view = getFeedRepository().getView(getFeedRepositoryFeedDataDescriptorName());
    RepositoryItem[] items = null;
    if (view != null) {
        QueryBuilder qb = view.getQueryBuilder();
        Query getFeedsQuery = qb.createUnconstrainedQuery();
        items = view.executeQuery(getFeedsQuery);
    }
    if (items != null && items.length > 0) {
        // remove all items in the repository
        for (RepositoryItem item : items) {
            repo.removeItem(item.getRepositoryId(), getFeedRepositoryFeedDataDescriptorName());
        }
    }
    for (RSSFeedObject rfo : feedEntries) {
        MutableRepositoryItem feedItem = repo.createItem(getFeedRepositoryFeedDataDescriptorName());
        feedItem.setPropertyValue(DB_COL_AUTHOR, rfo.getAuthor());
        feedItem.setPropertyValue(DB_COL_FEEDURL, rfo.getFeedUrl());
        feedItem.setPropertyValue(DB_COL_TITLE, rfo.getTitle());
        // note: the original passed DB_COL_FEEDURL here as well, which looks like
        // a copy-paste slip; the published date presumably has its own column constant
        feedItem.setPropertyValue(DB_COL_FEEDURL, rfo.getPublishedDate());
        RepositoryItem item = repo.addItem(feedItem);
    }
} catch (RepositoryException e) {
    // handle/log the failure
}
The way I interpret your question is that you want to add multiple repository items to your repository but you want to do it fairly efficiently at a database level. I suggest you make use of the Java Transaction API as recommended in the ATG documentation, like so:
TransactionManager tm = ...
TransactionDemarcation td = new TransactionDemarcation();
try {
    try {
        td.begin(tm);
        ... do repository item work ...
    }
    finally {
        td.end();
    }
}
catch (TransactionDemarcationException exc) {
    ... handle the exception ...
}
Assuming you are using a SQL repository in your example, the SQL INSERT statements will be issued after each call to addItem but will not be committed until/if the transaction completes successfully.
ATG does not provide support for deleting multiple records in a single SQL statement. You can use transactions, as #chrisjleu suggests, but there is no way to do the equivalent of a DELETE WHERE ID IN {"1", "2", ...}. Your code looks correct.
It is possible to invoke stored procedures or execute custom SQL through an ATG Repository, but that isn't generally recommended for portability/maintenance reasons. If you did that, you would also need to flush the appropriate portions of the item/query caches manually.
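On that last point, a hedged sketch of what the manual flush can look like, assuming the repository is a standard GSA (SQL) repository; GSARepository lives in atg.adapter.gsa and exposes invalidateCaches():
// Assumption: feedRepository is backed by a GSA repository. After executing
// custom SQL behind the repository's back, drop its item and query caches
// so stale entries are not served from memory.
((GSARepository) feedRepository).invalidateCaches();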

Hibernate memory management

I have an application that uses Hibernate. In one part I am trying to retrieve documents. Each document has an account number. The model looks something like this:
private Long _id;
private String _acct;
private String _message;
private String _document;
private String _doctype;
private Date _review_date;
I then retrieve the documents with a document service. A portion of the code is here:
public List<Doc_table> getDocuments(int hours_, int dummyFlag_, List<String> accts) {
    List<Doc_table> documents = new ArrayList<Doc_table>();
    Session session = null;
    Criteria criteria = null;
    try {
        // Lets create a previous Date by subtracting the number of
        // subtractHours_ passed.
        session = HibernateUtil.getSession();
        session.beginTransaction();
        if (accts == null) {
            Calendar cutoffTime = Calendar.getInstance();
            cutoffTime.add(Calendar.HOUR_OF_DAY, hours_);
            criteria = session.createCriteria(Doc_table.class)
                    .add(Restrictions.gt("dbcreate_date", cutoffTime.getTime()))
                    .add(Restrictions.eq("dummyflag", dummyFlag_));
        } else {
            criteria = session.createCriteria(Doc_table.class)
                    .add(Restrictions.in("acct", accts));
        }
        documents = criteria.list();
        for (int x = 0; x < documents.size(); x++) {
            Doc_table document = documents.get(x);
            // ......... more stuff here
        }
This works great if I'm retrieving a small number of documents. But when the result set is large, I get a heap space error, probably because the documents take up a lot of space, and when you retrieve several thousand of them, bad things happen.
All I really want to do is find each document that fits my criteria, grab its account number, and return a list of account numbers (a far smaller object than a list of document objects). If this were JDBC, I would know exactly what to do.
But in this case I'm stumped. I guess I'm looking for a way to get back just the account numbers of the Doc_table objects.
Or alternatively, some way to retrieve the documents that fit my criteria from the database one at a time using Hibernate (instead of bringing back the whole List of objects, which uses too much memory).
There are several ways to deal with the problem:
Loading the docs in batches of a smaller size (a sketch follows this list)
(The way you noticed) not querying for the documents, but only for the account numbers:
List accts = session.createQuery("SELECT d._acct FROM Doc d WHERE ...").list();
or
List<String> accts = session.createCriteria(Doc.class)
        .setProjection(Projections.property("_acct"))
        .list();
When there is a special field in your Document class that contains the huge amount of document byte data, you could map that field as lazily loaded.
Creating a second, read-only entity class that contains only the fields you need, mapped to the same table
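For the first option, a minimal batching sketch using the names from the question (Doc_table, accts, session); the batch size is arbitrary, and clearing the session after each batch keeps the first-level cache from growing:
int batchSize = 500; // arbitrary; tune to your heap
int offset = 0;
List<Doc_table> batch;
do {
    batch = session.createCriteria(Doc_table.class)
            .add(Restrictions.in("acct", accts))
            .setFirstResult(offset)
            .setMaxResults(batchSize)
            .list();
    for (Doc_table doc : batch) {
        // grab the account number here
    }
    offset += batch.size();
    session.clear(); // evict the processed batch from the session cache
} while (!batch.isEmpty());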
Instead of fetching all documents, i.e. all records at once, try to limit the number of rows being fetched. Also, consider a strategy where you store documents temporarily as flat files and fetch them later, or delete them after usage. Though it's a longer process, it's an efficient way of handling and delivering documents from a database.

How do I check to see if a column name exists in a CachedRowSet?

I am querying data from views that are subject to change. I need to know if the column exists before I do a crs.get******().
I have found that I can query the metadata like this to see if a column exist before I request the data from it:
ResultSetMetaData meta = crs.getMetaData();
int numCol = meta.getColumnCount();
for (int i = 1; i < numCol + 1; i++)
    if (meta.getColumnName(i).equals("name"))
        return true;
Is there a simpler way of checking to see if a column exists?
EDIT
It must be database agnostic. That is why I am referencing the CachedRowSet instead of the database.
There's not a simpler way with the general JDBC API (at least not that I know of, or can find...I've got exactly the same code in my home-grown toolset.)
Your code isn't complete:
ResultSetMetaData meta = crs.getMetaData();
int numCol = meta.getColumnCount();
for (int i = 1; i < numCol + 1; i++) {
    if (meta.getColumnName(i).equals("name")) {
        return true;
    }
}
return false;
That being said, if you use proprietary, database-specific APIs and/or SQL queries, I'm sure you can find more elegant ways of doing the same thing. But you'd have to write custom code for each database you need to deal with. I'd stick with the JDBC APIs, if I were you.
Is there something about your proposed solution that makes you think it's incorrect? It seems simple enough to me.
You could take the shorter approach of using the fact that findColumn() will throw a SQLException if the column isn't in the CachedRowSet.
for example
try {
    int foundColIndex = results.findColumn("nameOfColumn");
} catch (SQLException e) {
    // column not present; do whatever else makes sense
}
Likely an abuse of exception handling (per Effective Java, 2nd ed., Item 57), but it is an alternative to looping through all the columns from the metadata.
Which database?
I think in Oracle there are dictionary views where the columns are listed.
I don't remember if they work for views also, but I guess they do; it was something like:
select column_name from all_tab_columns where table_name = 'MYVIEW'
or
select object_name from all_objects where object_name = 'MYVIEW' and object_type = 'VIEW'
I don't remember exactly the syntax. You should have special permissions, though.
Every RDBMS should have something similar.
You can also perform the query
select * from myView where 1 = 0;
and get the columns from the metadata, if what you want is to avoid fetching the data just to find out whether the columns are present.
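A minimal sketch of that trick with plain JDBC, assuming an open Connection conn (a CachedRowSet populated from the same query would expose the same metadata):
// The WHERE 1 = 0 query returns no rows, only the result-set shape,
// so reading the metadata is cheap.
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery("select * from myView where 1 = 0")) {
    ResultSetMetaData meta = rs.getMetaData();
    for (int i = 1; i <= meta.getColumnCount(); i++) {
        System.out.println(meta.getColumnName(i));
    }
}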
No, there really isn't a better way. You may want to relook at the problem. If you can redefine the problem, sometimes it makes the solution simpler because the problem has changed.
WARNING: following comment purely from memory without any supporting paperwork :)
If I recall correctly, there is a mysterious problem that rears its ever-so-ugly head when the Oracle cached rowset implementation is used with connection pooling. There appears to be a silent reference to the connection held within the cached rowset object (even though it's supposed to be disconnected) which, on garbage collection, closes another connection subsequently opened from the pool. For this reason I eventually gave up and wrote my own data-object layer (these days I'd hand that over to Spring & Hibernate).
Old thread, but I've just faced the same problem and ended up with a utility function:
private Set<String> getColumnNames(ResultSet cached) throws SQLException {
    ResultSetMetaData metaData = cached.getMetaData();
    // rangeClosed: JDBC column indexes run from 1 to getColumnCount() inclusive
    return IntStream.rangeClosed(1, metaData.getColumnCount())
            .mapToObj(i -> {
                try {
                    return metaData.getColumnName(i);
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }).collect(toSet());
}
It'd be quite elegant if we didn't have to catch exceptions inside a lambda (without some ugly hacks).
Following on from the top answer in this thread from Jared, and one of its under-rated comments from corsiKa:
ResultSetMetaData meta = crs.getMetaData();
int numCol = meta.getColumnCount();
Set<String> columns = new HashSet<>();
for (int i = 1; i <= numCol; i++) {
    columns.add(meta.getColumnName(i));
}
return columns.contains("name");

Categories

Resources