Ehcache - why are the entries so big? - java

I have a fairly simple data model like:
class MyParent {
    // 7 fields here, some numeric, some String, not longer than 50 chars total
    Set<MyChild> children;
}
class MyChild {
    int ownerId;
    // 3 more fields, numeric or dates
}
MyParent, MyChild and MyParent.children are all cached with the read-only strategy.
I have 40,000 instances of MyParent and 100,000 instances of MyChild. That yields 180,000 cache entries (if you add the 40,000 MyParent.children collection entries).
I want to cache everything, grouped by ownerId. Not wanting to reinvent the wheel, I wanted to use the query cache, like:
Query query = session.createQuery(
        "select distinct p from MyParent p join fetch p.children c where c.ownerId = :ownerId");
query.setParameter("ownerId", ownerId);
query.setCacheable(true);
query.setCacheRegion("MyRegion");
query.list();
For all 1,500 values of ownerId.
The cache works, but I noticed it's huge! Measured with Ehcache.calculateInMemorySize(), each entry is over one kilobyte on average. To cache ~180,000 entries I would need over 200 MB. That's outrageous, given that the entries themselves are much smaller.
Where does the overhead come from and how can I decrease it?

I'm not sure from the question which cache you used to do the math, but let me use the MyParent class as an example. Given what you explained about the class, on a 64-bit VM with compressedOops enabled, a MyParent instance would be a little below 500 bytes in heap. And that is without the Set; I'll explain why later (it'd be another 128 bytes on top otherwise). The cache also needs to hold the key for that entry, which adds to the calculation...
Hibernate doesn't directly use the primary key as the key for what it stores in the cache, but a CacheKey entry. That instance holds the pk of the entity the value represents, as well as four other fields: type, the Hibernate type mapping; entityOrRoleName, the entity or collection-role name; tenantId, the tenant identifier associated with this data; and finally the hashCode of the pk (see org.hibernate.type.Type.getHashCode).
Sadly it doesn't end there: the value for that entry isn't the MyParent instance either, but a CacheEntry instance. Besides more metadata (subClass, the entity's name, which defaults to the FQCN; lazyPropertiesAreUnfetched, a boolean; and the optimistic locking value out of the entity), that instance still doesn't hold the MyParent instance, but a disassembled representation of it: an array of the state (all properties) of the entity.
I guess that with this information, the "estimated" sizes of your Hibernate caches will make more sense. I'd like to stress that these are only estimations, and if I remember correctly how the value is calculated, it is probably slightly above reality; some of the information in the CacheKey, for instance, should probably be accounted for differently. As of Ehcache 2.5, you will be able to enable memory-based tuning on caches (and even at the CacheManager level). When that is done, cache entries are measured precisely and calculateInMemorySize() will give you the real measured size of the cache.
You can download the 2.5 beta from ehcache.org now. Also note that with byte-based sizing on your caches, the sizing engine will account for instances shared across cached entries in Hibernate's cache types. You can read more on how this all works here: http://ehcache.org/documentation/configuration.html#Memory_Based_Cache_Sizing_Ehcache_2.5_and_higher
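For illustration, here is a minimal sketch of byte-based sizing with the Ehcache 2.5 API; the region name and the 64 MB cap are assumptions for the example, not values from the question:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.MemoryUnit;

public class ByteSizedCacheExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        // Cap the cache by bytes instead of by entry count (Ehcache 2.5+).
        CacheConfiguration config = new CacheConfiguration()
                .name("MyRegion")
                .maxBytesLocalHeap(64, MemoryUnit.MEGABYTES);
        Cache cache = new Cache(config);
        manager.addCache(cache);
        // With byte-based sizing enabled, this reports the measured size.
        System.out.println(cache.calculateInMemorySize());
    }
}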
Hope that helps you make more sense out of it all...
Alex

Related

Hibernate Batch Save Nested Objects

I have the below class structure:
class A {
    int id;
    List<B> blist;
    List<C> clist;
    List<D> dlist;
}
I get a JSON as input, which is mapped to object A by a mapper. Now I have object A, which has the lists of B, C and D objects. I want to use batching to reduce the insert time. I went through the documentation, which describes the solution for saving multiple parent objects. How would I use the batching capability in my case, which has nested lists of objects of multiple types?
I have enabled batch inserts using
<property name="hibernate.jdbc.batch_size">50</property>
This by itself doesn't give me any batching unless I clear and flush the session. Any suggestions on how I should go about this?
The problem is that you're using the IDENTITY strategy.
Whenever you save a new entity, Hibernate will place it into the Session's 1LC; however, in order to do that the identifier must be known. The problem with IDENTITY strategy is that Hibernate must actually perform the insert to determine the identifier value.
In the end, batch insert capabilities are disabled.
You should either try to load your data using business key values that are known up front or, worst case, use the SEQUENCE generation type with a sequence optimizer to minimize the database hits. This will allow batch inserts to work.
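As a sketch (the sequence name is illustrative, reusing the entity from the question), the mapping could look like this; allocationSize lets Hibernate use a pooled optimizer, so one sequence round trip covers a whole batch of identifiers:

import javax.persistence.*;

@Entity
public class A {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "a_seq")
    // allocationSize matching hibernate.jdbc.batch_size means one
    // database hit provides identifiers for an entire insert batch.
    @SequenceGenerator(name = "a_seq", sequenceName = "a_id_seq", allocationSize = 50)
    private int id;

    // ... the B/C/D lists as in the question
}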
UPDATE
For situations where you have no business key that defines the uniqueness of a row and your database doesn't have SEQUENCE support, you can manage the identifiers yourself, either with a custom identifier generator or simply in your loop as code.
The caveat here is that this solution is not thread-safe. You should guarantee that at no point would you ever be running this logic in two threads simultaneously, which is typically not something one does anyway with bulk data loads.
Define a variable to store your identifier in. We will need to initialize this variable from the existing maximum identifier value in the database; if no rows exist, we will likely want to initialize it to 1.
Long value = ... // createQuery ( "SELECT MAX(id) FROM YourEntity" )
value = ( value == null ? 1L : value + 1);
The next step is to change the @Id annotated field. It should not be marked as @GeneratedValue, since we're going to let the application provide the value.
For each row you're going to insert, simply call your setId( value ) method with the value variable initialized in the previous step.
Increment your value variable by 1.
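Putting the steps together, a minimal sketch (the entity name YourEntity comes from the snippet above; the surrounding session/transaction handling is assumed). The periodic flush/clear keeps the persistence context small, which is what lets JDBC batching actually help:

// Assumes an open Session `session` inside a transaction.
Long value = (Long) session.createQuery("select max(e.id) from YourEntity e").uniqueResult();
value = (value == null ? 1L : value + 1);

int batchSize = 50; // should match hibernate.jdbc.batch_size
for (int i = 0; i < entities.size(); i++) {
    YourEntity entity = entities.get(i);
    entity.setId(value++);   // application-assigned identifier
    session.save(entity);
    if ((i + 1) % batchSize == 0) {
        session.flush();     // push the current batch to the database
        session.clear();     // detach entities to keep memory flat
    }
}
session.flush();
session.clear();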

Hibernate associations using too much memory

I have a table "class" which is linked to tables "student" and "teacher".
A "class" is linked to multiple students and teachers via a foreign key relationship.
When I use Hibernate associations and fetch a large number of entities (tried with 5,000), I am seeing that it takes 4 times more memory than if I just use foreign key placeholders.
Is there something wrong with Hibernate associations?
Can I use a memory profiler to figure out what's using too much memory?
This is how the schema is:
class(id,className)
student(id,studentName,class_id)
teacher(id,teacherName,class_id)
class_id is the foreign key.
Case #1 - Hibernate Associations
1) In the Class entity, students and teachers are mapped as:
@Entity
@Table(name = "class")
public class Class {
    private Integer id;
    private String className;
    private Set<Student> students = new HashSet<Student>();
    private Set<Teacher> teachers = new HashSet<Teacher>();

    @OneToMany(fetch = FetchType.EAGER, mappedBy = "classRef")
    @Cascade({ CascadeType.ALL })
    @Fetch(FetchMode.SELECT)
    @BatchSize(size = 500)
    public Set<Student> getStudents() {
        return students;
    }
}
2) In Student and Teacher, the class is mapped as:
@Entity
@Table(name = "student")
public class Student {
    private Integer id;
    private String studentName;
    private Class classRef;

    @ManyToOne
    @JoinColumn(name = "class_id")
    public Class getClassRef() {
        return classRef;
    }
}
Query used:
sessionFactory.openSession().createQuery("from Class where id < 5000");
This, however, was taking a huge amount of memory.
Case #2 - Remove associations and fetch separately
1) No mapping in the Class entity
@Entity
@Table(name = "class")
public class Class {
    private Integer id;
    private String className;
}
2) Only a placeholder for the foreign key in Student and Teacher
@Entity
@Table(name = "student")
public class Student {
    private Integer id;
    private String studentName;
    private Integer class_id;
}
Queries used :
sessionFactory.openSession().createQuery("from Class where id<5000");
sessionFactory.openSession().createQuery("from Student where class_id = :classId");
sessionFactory.openSession().createQuery("from Teacher where class_id = :classId");
Note - Only the important parts of the code are shown. I am measuring the memory usage of the fetched entities via the JAMM library.
I also tried marking the query as read-only in case #1, as below; it improves memory usage only a very little, so that's not the solution.
Query query = sessionFactory.openSession().
createQuery("from Class where id<5000");
query.setReadOnly(true);
List<Class> classList = query.list();
sessionFactory.getCurrentSession().close();
Below are the heap dump snapshots sorted by size. It looks like the entities maintained by Hibernate are creating the problem.
[Snapshot of heap dump for the Hibernate associations program]
[Snapshot of heap dump for fetching using separate entities]
You are doing an EAGER fetch with the annotation below. This will fetch all the students even without you accessing getStudents(). Make it lazy and it will fetch only when needed.
From
@OneToMany(fetch = FetchType.EAGER, mappedBy = "classRef")
To
@OneToMany(fetch = FetchType.LAZY, mappedBy = "classRef")
When Hibernate loads a Class entity containing OneToMany relationships, it replaces the collections with its own custom version of them. In the case of a Set, it uses a PersistentSet. As can be seen on grepcode, this PersistentSet object contains quite a bit of stuff, much of it inherited from AbstractPersistentCollection, to help Hibernate manage and track things, particularly dirty checking.
Among other things, the PersistentSet contains a reference to the session, a boolean to track whether it's initialized, a list of queued operations, a reference to the Class object that owns it, a string describing its role (not sure what exactly that's for, just going by the variable name here), the string uuid of the session factory, and more. The biggest memory hog among the lot is probably the snapshot of the unmodified state of the set, which I would expect to approximately double memory consumption by itself.
There's nothing wrong here, Hibernate is just doing more than you realized, and in more complex ways. It shouldn't be a problem unless you are severely short on memory.
Note, incidentally, that when you save a new Class object that Hibernate was previously unaware of, Hibernate replaces the simple HashSet you created with a new PersistentSet, storing the original HashSet wrapped inside the PersistentSet in its set field. All Set operations are forwarded to the wrapped HashSet, while also triggering the PersistentSet's dirty-tracking and queuing logic. With that in mind, you should not keep and use any external references to the Set from before saving; instead, fetch a new reference to Hibernate's PersistentSet instance and use that if you need to make changes (to the set, not to the students or teachers within it) after the initial save.
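A short sketch of that caveat (setStudents and the studentA/studentB variables are assumed for illustration; the entities follow the question):

Set<Student> original = new HashSet<Student>();
original.add(studentA);

Class clazz = new Class();
clazz.setStudents(original);  // plain HashSet before the save
session.save(clazz);          // Hibernate wraps it in a PersistentSet

// Don't keep using `original`; re-read the reference instead.
Set<Student> managed = clazz.getStudents(); // now Hibernate's PersistentSet
managed.add(studentB);        // tracked by dirty checking and queuing logic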
Regarding the huge memory consumption you are noticing, one potential reason is that the Hibernate Session has to maintain the state of each entity it has loaded in the form of an EntityEntry object, i.e., one extra EntityEntry object for each loaded entity. This is needed for Hibernate's automatic dirty-checking mechanism during the flush stage, which compares the current state of an entity with its original state (the one stored as the EntityEntry).
Note that this EntityEntry is different from the object we access in application code when we call session.load/get/createQuery/createCriteria. It is internal to Hibernate and stored in the first-level cache.
Quoting from the Javadocs for EntityEntry:
We need an entry to tell us all about the current state of an object
with respect to its persistent state. Implementation Warning: Hibernate
needs to instantiate a high amount of instances of this class,
therefore we need to take care of its impact on memory consumption.
One option, assuming the intent is only to read and iterate through the data and not to change those entities, is to use a StatelessSession instead of a Session.
The advantage as quoted from Javadocs for Stateless Session:
A stateless session does not implement a first-level cache nor
interact with any second-level cache, nor does it implement
transactional write-behind or automatic dirty checking
With no automatic dirty checking, there is no need for Hibernate to create an EntityEntry for each loaded entity as it did in the earlier case with Session. This should reduce pressure on memory utilization.
That said, StatelessSession does have its own set of limitations, as mentioned in its Javadoc documentation.
One limitation worth highlighting is that it doesn't lazy-load collections. If we are using a StatelessSession and want to load the associated collections, we should either join-fetch them using HQL or eager-fetch them using Criteria.
Another is that it does not interact with the second-level cache at all.
So, given that it doesn't have the overhead of a first-level cache, you may want to try a StatelessSession and see if that fits your requirements and helps reduce memory consumption.
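A minimal sketch of the StatelessSession approach (the join fetch compensates for the lack of lazy loading noted above; entity names follow the question):

StatelessSession statelessSession = sessionFactory.openStatelessSession();
try {
    // No first-level cache, no EntityEntry per row, no dirty checking.
    List<Class> classes = statelessSession
            .createQuery("select distinct c from Class c join fetch c.students where c.id < 5000")
            .list();
    for (Class c : classes) {
        // read-only processing here
    }
} finally {
    statelessSession.close();
}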
Yes, you can use a memory profiler, like VisualVM or YourKit, to see what takes up so much memory. One way is to take a heap dump and then load it into one of these tools.
However, you also need to make sure that you compare apples to apples. Your queries in case #2,
sessionFactory.openSession().createQuery("from Student where class_id = :classId");
sessionFactory.openSession().createQuery("from Teacher where class_id = :classId");
select students and teachers for only one class, while in case #1 you select far more. You need to use <= :classId instead.
In addition, it is a little strange that you have one student and one teacher record per class. A teacher can teach more than one class, and a student can be in more than one class. I do not know what exact problem you're solving, but if indeed a student can participate in many classes and a teacher can teach more than one class, you will probably need to design your tables differently.
Try @Fetch(FetchMode.JOIN); this generates one query instead of multiple select queries. Also review the generated queries. I prefer using Criteria over HQL (just a thought).
For profiling, use free tools like VisualVM or JConsole. YourKit is good for advanced profiling, but it is not free; I believe there is a trial version.
You can take a heap dump of your application and analyze it with any memory analyzer tool to check for memory leaks.
BTW, I am not exactly sure about the memory usage in this particular scenario.
It's likely the reason is the bidirectional link from Student to Class and from Class to Students. When you fetch Class A (id 4900), the Class object must be hydrated; in turn this must go and pull all the Student objects (and teachers, presumably) associated with this class. When this happens, each Student object must be hydrated, which causes the fetch of every class the Student is a part of. So although you only wanted class A, you end up with:
Fetch Class A (id 4900)
Returns Class A with reference to 3 students, Student A, B, C.
Student A has ref to Class A, B (id 5500)
Class B needs hydrating
Class B has reference to Students C,D
Student C needs hydrating
Student C only has reference to Class A and B
Student C hydration complete.
Student D needs hydrating
Student D only has reference to Class B
Student B hydration complete
Class B hydration complete
Student B needs hydrating (from original class load class A)
etc... With eager fetching, this continues until all links are hydrated. The point being that it's possible you end up with Classes in memory that you didn't actually want, or whose id is not less than 5000.
This could get worse fast.
Also, you should make sure you are overriding the hashCode and equals methods (see the sketch after this answer); otherwise you may be getting redundant objects, both in memory and in your set.
One way to improve this is either to change to LAZY loading, as others have mentioned, or to break the bidirectional links. If you know you will only ever access students per class, then don't have the link from Student back to Class. For the student/class example the bidirectional link makes sense, but maybe it can be avoided.
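One common (if debated) identifier-based pattern, sketched for the Student entity from the question:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Student)) return false;
    Student other = (Student) o;
    // Two persistent Students are equal when they share a database id.
    return id != null && id.equals(other.id);
}

@Override
public int hashCode() {
    // Constant for transient instances, stable once the id is assigned.
    return id != null ? id.hashCode() : 0;
}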
As you say, "I want all the collections", so lazy loading won't help.
Do you need every field of every entity? If not, use a projection to get just the bits you want. See when to use Hibernate Projections.
Alternatively, consider having minimalist Teacher-Lite and Student-Lite entities that the full-fat versions extend.
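For the projection route, a hedged Criteria sketch (assuming the usual org.hibernate.criterion imports; Class here is the entity from this question):

// Fetch only the needed fields instead of fully hydrated entities.
List<Object[]> rows = session.createCriteria(Class.class)
        .add(Restrictions.lt("id", 5000))
        .setProjection(Projections.projectionList()
                .add(Projections.property("id"))
                .add(Projections.property("className")))
        .list();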

Ignoring some entity fields during saving in Objectify 4

I am trying to use the Objectify @IgnoreSave annotation together with a simple If condition (IfEmpty, IfNull), but it seems not to be working. Without an If condition the actual value is not persisted, as expected; however, when I use an If condition, it is always persisted (e.g. with the IfNull condition and a null value provided, the null is persisted and hence the original value in the datastore is deleted).
...
@IgnoreSave(IfNull.class)
private String email;
...
...
this.objectify.save().entity(userDetails).now();
...
Is there any additional configuration needed? Or has anyone experienced the same?
From "hence original value in datastore deleted" it sounds like you misunderstand a fundamental characteristic of the GAE datastore - entities are stored whole. If you #IgnoreSave a field, it will be ignored during save and thus the field will not be present in the datastore. You do not get to update some fields and not others.

storing huge HashMap in database

I have something like:
import java.util.HashMap;
import java.util.List;

public class A {
    HashMap<Long, List<B>> hashMap = new HashMap<Long, List<B>>();
}

class B {
    int a;
    int b;
    int c;
}
And I want to store this in a database, because it will be huge.
I will have more than 250,000,000 keys in the HashMap, and each key references a large list of data (say around 1,000 entries per list).
How can I do this for the best performance when retrieving the list of B objects for a given Long key from the database?
Any other suggestions?
Thanks in advance.
To me, this looks like a classical One-To-Many or Many-To-Many association between two tables.
If each B belongs to only one A, then you would have a table A and a table B containing a foreign key to A.
If a given B can belong to multiple As, then you would have a table A, a table B, and a join table between the two tables.
Make sure to index the foreign keys.
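A minimal JPA sketch of the one-to-many case (field and column names are assumptions for illustration):

import java.util.List;
import javax.persistence.*;

@Entity
public class A {
    @Id
    private Long id; // the former HashMap key

    @OneToMany(mappedBy = "owner", fetch = FetchType.LAZY)
    private List<B> bs;
}

@Entity
public class B {
    @Id
    @GeneratedValue
    private Long id;
    private int a;
    private int b;
    private int c;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "a_id") // the foreign key column; index it in the database
    private A owner;
}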
As you have a very large data set, up to 0.25 billion keys × ~1,000 list entries × ~20 bytes, or about 5 TB, the main problem is that it can't be stored in memory and is too large to store on SSD, so you will have to access disk efficiently; otherwise you are looking at a latency of about 8 ms per key. This should be your primary concern, since otherwise it will take days just to access every key randomly once.
Unless you have a good understanding of how to implement this with memory-mapped files, you will need to use a database, preferably one designed to handle large numbers of records. You will also need a disk subsystem, not only for capacity but to give you more spindles, so that you can increase the number of requests you can perform concurrently.
Using Infinispan, you could just work with your huge map and have parts of it (the ones not recently accessed) stored to disk to save RAM. That is easier than writing a whole DB layer, and (I think) faster, and it uses less memory at runtime (the entire map is never fully in memory).
You can map this directly as a one-to-many relationship. You need two tables: one to hold the keys (let's call it KeyTable), and the other to hold the B objects (BTable). BTable needs a foreign key to KeyTable. Then you can query something like this to get the objects with key 1234:
SELECT * FROM BTABLE WHERE key=1234;
For performance, you probably should code this using JDBC instead of something like Hibernate, to have better control of memory usage.
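A hedged JDBC sketch of that retrieval (table and column names follow the example above; connection setup is assumed):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// One indexed lookup returns the whole list of B rows for a key.
public static List<B> loadBs(Connection conn, long key) throws SQLException {
    List<B> result = new ArrayList<B>();
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT a, b, c FROM BTABLE WHERE key = ?")) {
        ps.setLong(1, key);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                B b = new B();
                b.a = rs.getInt("a");
                b.b = rs.getInt("b");
                b.c = rs.getInt("c");
                result.add(b);
            }
        }
    }
    return result;
}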

Map database column1, column2, columnN to a collection of elements

In legacy database tables we have numbered columns like C1, C2, C3, ..., C100 or M1, M2, M3, ..., M100.
These columns represent BLOB data.
It is not possible to change anything in this database.
Using a JPA Embeddable, we map all of the columns to single fields, and then during embedding we override the names with 100 override annotations.
Recently we switched to Hibernate, and I've found things like UserCollectionType and CompositeUserType, but I haven't found any use cases that are close to mine.
Is it possible to implement some user type by using Hibernate to be able to map a bundle of columns to a collection without additional querying?
Edit:
As you probably noticed, the names of the columns can differ from table to table. I want to create one type, say "LegacyArray", without the need to specify all of the @Columns each time I use this type.
But instead I'd use
@Type(type = "LegacyArrayUserType",
      parameters = {
          @Parameter(name = "prefix", value = "A"),
          @Parameter(name = "size", value = "128")
      })
List<Integer> legacyA;

@Type(type = "LegacyArrayUserType",
      parameters = {
          @Parameter(name = "prefix", value = "B"),
          @Parameter(name = "size", value = "64")
      })
List<Integer> legacyB;
I can think of a couple of ways that I would do this.
1. Create views for the collection information that simulates a normalized table structure, and map it to Hibernate as a collection:
Assuming your existing table is called primaryentity, I would create a view that's similar to the following:
-- untested SQL...
create view childentity as
(select primaryentity_id, c1 from primaryentity union
select primaryentity_id, c2 from primaryentity union
select primaryentity_id, c3 from primaryentity union
--...
select primaryentity_id, c100 from primaryentity)
Now, from Hibernate's perspective, childentity is just a normalized table that has a foreign key to primaryentity. Mapping this should be pretty straightforward, and is covered here:
http://docs.jboss.org/hibernate/stable/core/reference/en/html/collections.html
The benefits of this approach:
From Hibernate's point of view, the tables are normalized, it's a fairly simple mapping
No updates to your existing tables
The drawbacks:
Data is read-only, I don't think your view can be defined in an updatable manner (I could be wrong)
Requires change to the database, you may need to create lots of views
Alternately, if your DBA won't even let you add a view to the database, or if you need to perform updates:
2. Use Hibernate's dynamic model mapping facility to map your C1, C2, C3 properties to a Map, and have some code in your DAO layer do the appropriate conversion between the Map and the Collection property:
I have never done this myself, but I believe Hibernate does allow you to map tables to HashMaps. I'm not sure how dynamic Hibernate allows this to be (i.e., can you get away with simply specifying the table name and have Hibernate automatically map all the columns?), but it's another way I can think of doing it.
If going with this approach though, be sure to use the data access object pattern, and ensure that the internal implementation (use of HashMaps) is hidden from the client code. Also be sure to check before writing to the database that the size of your collection does not exceed the number of available columns.
The benefits of this approach:
No change to the database at all
Data is updatable
O/R Mapping is relatively simple
The drawbacks:
Lots of plumbing in the DAO layer to map the appropriate types
Uses experimental Hibernate features that may change in the future
Personally, I think that design sounds like it breaks first normal form for relational databases. What happens if you need C101 or M101? Change your schema again? I think it's very intrusive.
If you add Hibernate to the mix it's even worse. Adding C101 or M101 means having to alter your Java objects, your Hibernate mappings, everything.
If you had 1:m relationships with C and M tables, you'd be able to handle the cases I just cited by adding additional rows. Your Java objects would contain Collection<C> or Collection<M>, and your Hibernate mappings would be one-to-many mappings that don't change.
Maybe the reason you don't see any Hibernate examples matching your case is that it's a design that's not recommended.
If you must, maybe you should look at Hibernate Component Mapping.
UPDATE: The fact that this is legacy is duly noted. My point in bringing up first normal form is as much for others who might find this question in the future as it is for the person who posted the question. I would not want to answer the question in such a way that it silently asserted this design as "good".
Pointing out Hibernate component mapping is pertinent because knowing the name of what you're looking for can be the key when you're searching. Hibernate allows an object model to be finer grained than the relational model it maps. You are free to model a denormalized schema (e.g., Name and Address objects as part of a larger Person object). That's just the name they give such a technique. It might help find other examples as well.
Sorry if I'm misunderstanding your problem here, I don't know much about Hibernate. But couldn't you just concatenate during selection from database to get something like what you want?
Like:
SELECT whatever
, C1||C2||C3||C4||...||C100 AS CDATA
, M1||M2||M3||M4||...||M100 AS MDATA
FROM ...
WHERE ...
(Of course, the concatenation operator differs between RDBMSs.)
[EDIT] I suggest using a CompositeUserType. Here is an example. There is also a good example on page 228f of the book "Java Persistence with Hibernate".
That allows you to handle the many columns as a single object in Java.
The mapping looks like this:
@org.hibernate.annotations.Columns(columns = {
    @Column(name = "C1"),
    @Column(name = "C2"),
    @Column(name = "C3"),
    ...
})
private List<Integer> c;
Hibernate will load all columns at once during the normal query.
In your case, you must copy the int values from the list into a fixed number of columns in nullSafeSet (which writes to a PreparedStatement). Pseudocode:
for (int i = 0; i < numColumns; i++)
    if (i < list.size())
        statement.setInt(index + i, list.get(i));
    else
        statement.setNull(index + i, Hibernate.INTEGER.sqlType());
In nullSafeGet you must create a list and stop adding elements when a column is NULL. For additional safety, I suggest creating your own list implementation which doesn't allow growing beyond the number of columns (inherit from ArrayList and override ensureCapacity()).
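A hedged sketch of that read side, assuming the Hibernate 3.x UserType signature (later versions add a session parameter):

// Rebuild the List<Integer> from the mapped columns, stopping at the first NULL.
public Object nullSafeGet(ResultSet rs, String[] names, Object owner)
        throws HibernateException, SQLException {
    List<Integer> values = new ArrayList<Integer>();
    for (String name : names) {
        int value = rs.getInt(name);
        if (rs.wasNull()) {
            break; // a NULL column marks the end of the stored list
        }
        values.add(value);
    }
    return values;
}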
[EDIT2] If you don't want to type all the @Column annotations, use a code generator for them. That can be as simple as a script which you give a prefix and a count, and it prints @Column(...) lines to System.out. After the script has run, just cut and paste the output into the source. For example:
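A trivial sketch of such a generator (prefix and count are command-line arguments here purely for illustration):

public class ColumnAnnotationGenerator {
    public static void main(String[] args) {
        String prefix = args[0];               // e.g. "C"
        int count = Integer.parseInt(args[1]); // e.g. 100
        for (int i = 1; i <= count; i++) {
            System.out.println("@Column(name=\"" + prefix + i + "\"),");
        }
    }
}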
The only other solution would be to access the internal Hibernate API to build that information at runtime, but that API is internal, so a lot of it is private. You can use Java reflection and setAccessible(true), but that code probably won't survive the next Hibernate update.
You can use UserTypes to map a given number of columns to any type you wish. This could be a collection if, for example, the collections are always bounded in size by a known number of items.
It's been a while (> 3 years) since I used Hibernate, so I'm pretty rusty, but I recall it being very easy to do; your bespoke UserType class gets passed the ResultSet to hydrate your object from it.
I too have never used Hibernate.
I suggest writing a small program in an interpreted language (such as Python) in which you can execute a string as if it were a command. You could then construct the statements programmatically, taking the tedious work out of writing them by hand.
