I have a question regarding caching techniques in Java web applications.
If I implement Ehcache, where will the cached data be stored?
Will the cached data live in the area covered by the GC? I mean, will the GC collect the Java objects which I cached earlier?
After reading the sites of some caching frameworks, I understood that at the core level they use a hashtable or HashMap, where the value is our data and the key depends on the application logic.
Suppose in Ehcache I configure:
maxBytesLocalHeap="50m"
maxBytesLocalDisk="50G"
1. What I understand is that up to 50 MB (maxBytesLocalHeap) of cached data will be stored in heap memory (the data that falls under this limit will be seen by the GC).
2. With maxBytesLocalDisk, up to 50 GB will be stored on the local disk (assuming it is stored as a flat file in a temp folder of the server), where the GC will not care about the entities or objects, as they are outside the heap.
Is my understanding correct?
Thanks
Vijay
The GC will collect your objects only when no other object holds a reference to them. The GC doesn't know which of your data is cached; it just looks for unreachable objects.
Yes, HashMap is commonly used to store cached data and retrieve it when needed.
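For illustration, a size-bounded variant is easy to build on top of LinkedHashMap (a minimal sketch; the capacity is an arbitrary example value and this version is not thread-safe):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal sketch: a LinkedHashMap in access order evicts its least recently
    // used entry once the configured capacity is exceeded. Not thread-safe.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public LruCache(int maxEntries) {
            super(16, 0.75f, true);      // accessOrder = true gives LRU iteration order
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;  // drop the eldest (least recently used) entry
        }
    }

Used as Map<String, String> cache = new LruCache<>(10_000), it behaves like a HashMap but never grows past the limit.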
Related
I am using a HashMap as a cache to store IDs and names, because they are used frequently and need to live for the lifetime of the application.
For every user of the application, around 5,000 or more IDs and names (depending on the workspace) get stored in the HashMap. At some point a java.lang.OutOfMemoryError gets thrown, since I am saving a lot of (id, name) pairs in the HashMap.
I don't want to simply clear my HashMap cache, but I know that to be efficient we have to evict entries, using an LRU approach or something similar.
Note: I don't want to use Redis, Memcached, or any in-memory key-value store.
Use case: Slack returns an ID in place of the user name in every message.
For example: "Hello #john doe" comes back as "Hello #dxap123".
I don't want to make an API call for every message just to get the user name.
Can somebody suggest a more efficient alternative, or correct me if I am doing something wrong in my approach?
As others have said, 5,000 entries shouldn't give you an out-of-memory error, but if you don't keep a limit on the size of the map you will eventually get one. You should keep only the most recently used or most frequently used values to bound the size of the map.
Google's Guava library has cache implementations which I think would fit your use case:
https://github.com/google/guava/wiki/CachesExplained
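A minimal sketch of what that could look like, assuming Guava is on the classpath (the size limit, expiry, and class name are placeholders, not part of the original answer):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import java.util.concurrent.TimeUnit;

    public class UserNameCache {
        // Bounded, thread-safe cache: once the size limit is reached, entries are
        // evicted in roughly least-recently-used order; idle entries also expire.
        private final Cache<String, String> idToName = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .expireAfterAccess(1, TimeUnit.HOURS)
                .build();

        public void put(String id, String name) {
            idToName.put(id, name);
        }

        public String getName(String id) {
            return idToName.getIfPresent(id);   // null means: fall back to the Slack API call
        }
    }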
For 5,000 key-value pairs it should not throw an OutOfMemoryError; if it does, you are not managing the HashMap properly. If you have many more than 5,000 items and want an alternative to a plain HashMap, you can use Ehcache, a widely adopted Java cache with tiered storage options, instead of going with separate in-memory cache technologies.
The storage tiers supported by Ehcache include (a configuration sketch follows this list):
On-Heap Store: Uses the Java heap memory to store cache entries and shares that memory with the application. This part of the cache is also scanned by the garbage collector. It is very fast, but also very limited.
Off-Heap Store: Uses RAM outside the heap to store cache entries. This memory is not subject to garbage collection. It is still quite fast, but slower than the on-heap store, because cache entries have to be moved into on-heap memory before they can be used.
Disk Store: Uses the hard disk to store cache entries. Much slower than RAM. It is recommended to use a dedicated SSD that is only used for caching.
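As a rough illustration of how these tiers combine in Ehcache 3's programmatic configuration (the sizes, cache name, and persistence directory below are placeholder values, not recommendations):

    import org.ehcache.Cache;
    import org.ehcache.CacheManager;
    import org.ehcache.config.builders.CacheConfigurationBuilder;
    import org.ehcache.config.builders.CacheManagerBuilder;
    import org.ehcache.config.builders.ResourcePoolsBuilder;
    import org.ehcache.config.units.EntryUnit;
    import org.ehcache.config.units.MemoryUnit;

    public class TieredCacheExample {
        public static void main(String[] args) {
            CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                    .with(CacheManagerBuilder.persistence("/tmp/ehcache-data"))    // disk tier location
                    .withCache("userNames", CacheConfigurationBuilder
                            .newCacheConfigurationBuilder(String.class, String.class,
                                    ResourcePoolsBuilder.newResourcePoolsBuilder()
                                            .heap(1_000, EntryUnit.ENTRIES)        // on-heap, seen by the GC
                                            .offheap(10, MemoryUnit.MB)            // off-heap RAM, not GC-managed
                                            .disk(100, MemoryUnit.MB, true)))      // persistent disk tier
                    .build(true);

            Cache<String, String> cache = cacheManager.getCache("userNames", String.class, String.class);
            cache.put("dxap123", "john doe");
            System.out.println(cache.get("dxap123"));

            cacheManager.close();
        }
    }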
You can find the documentation here: http://www.ehcache.org/documentation/
If you are using Spring Boot, you can follow this article to implement it:
https://springframework.guru/using-ehcache-3-in-spring-boot/
If the names are not unique, try calling String.intern() on each name before inserting it into the map; that reduces memory usage because identical strings then share a single instance.
I have a program which, at some point, persists a blob in a blob store.
For a given blob, a hash produced by a fixed algorithm is used as the identifier, so when two blobs are identical, their BlobIDs are identical.
Persisting a blob is slow.
To make things worse, the program tends to put the same blob several times within a short time frame (a few seconds).
My initial idea was to use a concurrent set to keep track of what has already been put in the blob store.
Sadly, that would create a memory leak, because the set would grow without bound.
I'm looking for a kind of concurrent LRU (Least Recently Used) set in Java.
Is there such a data structure? Or can I build one with existing libraries (such as Guava)?
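For reference, one way such a structure could be approximated is a Guava Cache whose values are ignored, giving a size-bounded, thread-safe, roughly-LRU set (a sketch; persistBlob, the capacity, and the expiry are placeholders, and eviction is approximate rather than strict LRU):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import java.util.concurrent.TimeUnit;

    public class BlobDeduplicator {
        // Thread-safe, size- and time-bounded map used as a set of recently seen blob IDs.
        private final Cache<String, Boolean> recentlyPersisted = CacheBuilder.newBuilder()
                .maximumSize(100_000)
                .expireAfterWrite(1, TimeUnit.MINUTES)   // duplicates arrive within seconds
                .build();

        public void persistIfNew(String blobId, byte[] blob) {
            // putIfAbsent on the backing ConcurrentMap makes the check-and-mark step atomic.
            if (recentlyPersisted.asMap().putIfAbsent(blobId, Boolean.TRUE) == null) {
                persistBlob(blobId, blob);   // placeholder for the real (slow) blob-store call
            }
        }

        private void persistBlob(String blobId, byte[] blob) {
            // the actual blob-store write would go here
        }
    }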
In short, I want to know how effective it is to use an ArrayList in Java to hold objects with a lot of data in them. How large can an ArrayList grow, and are there any issues with using an ArrayList to hold 2000+ customer detail objects at runtime? Does it hit performance in any way? Or is there a better way to design an app which needs quick access to data?
I am developing a new module (a customer lead tracker) for my small ERM application, which also handles payroll details for a company. So far the data has not been huge; with this module I expect the database to grow fast, and I will have to load 2000+ customer records from the database to perform various data manipulations and updates.
I wanted some suggestions as to which approach would be better:
Querying the customer database (100+ columns) and fetching the data to work with for each transaction (a lot of separate queries each time).
Loading each row into an object, saving them in an ArrayList at the start, using the list to work with each row when required, and saving the objects (rows) back at the end of a transaction.
Sorry if I have asked a dumb question; I am really a start-up independent developer, so this may sound a bit awkward from an experienced developer's perspective.
It depends on how much memory you have. Querying the DB for each and every transaction is not a good approach either. A better approach would be to load a batch of data into memory, sized according to the memory you have, and once you are done with it, release it and fire the next set of DB queries. This way you can optimize memory usage as well as DB queries.
An ArrayList can hold no more than 2^31 - 1 elements, because the index of its backing array is an int.
There is an approach called an in-memory DB, which means you hold a lot of data in memory to gain fast access to it. But this approach also implies that:
a. you have a lot of memory available for holding all the necessary data (it could be several tens of gigabytes);
b. your DB implements a compact form of data storage. That means the DB will not contain ready-made Java objects, but fragments of byte-array data from which you construct objects on demand.
So you need to reckon how much memory you will need for all the data you want to load and decide whether this approach is viable or not.
I'm interested in using db4o to store the training data for a learning algorithm. This will consist of (potentially) hundreds of millions of objects. Each object is on average 2 KB in size, based on my benchmarking.
The training algorithm needs to iterate over the entire set of objects repeatedly (perhaps 10 times). It doesn't care what order the objects are in.
My question is this: When I retrieve a very large set of objects from DB4O, are they all loaded into memory, or are they pulled off disk as needed?
Clearly, pulling hundreds of millions of 2 KB objects into memory won't be practical on the type of servers I'm working with (they have about 19 GB of RAM).
Is Db4o a wise choice here?
db4o's activation mechanism allows you to control which objects are loaded into memory. For complex object graphs you should probably use transparent activation, where db4o loads an object into memory as soon as it is used.
However, db4o doesn't explicitly remove objects from memory. It only keeps weak references to loaded objects, so if an object is reachable it will stay in memory (just like any other object). Optionally, you can explicitly deactivate an object.
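A rough sketch of what iterating a large result set with activation and deactivation might look like (assuming db4o 8; the TrainingExample class and file name are illustrative, and full transparent activation usually also requires the persistent classes to be instrumented or to implement Activatable):

    import com.db4o.Db4oEmbedded;
    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.config.EmbeddedConfiguration;
    import com.db4o.ta.TransparentActivationSupport;

    public class TrainingIteration {
        // Illustrative entity; in practice it would carry the ~2 KB of training data.
        public static class TrainingExample {
            byte[] features;
        }

        public static void main(String[] args) {
            EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();
            config.common().add(new TransparentActivationSupport());  // activate objects only when touched

            ObjectContainer db = Db4oEmbedded.openFile(config, "training.db4o");
            try {
                ObjectSet<TrainingExample> examples = db.query(TrainingExample.class);
                while (examples.hasNext()) {
                    TrainingExample example = examples.next();   // pulled from disk on demand
                    // feed the example to the learning algorithm here
                    db.deactivate(example, Integer.MAX_VALUE);   // drop the graph so the GC can reclaim it
                }
            } finally {
                db.close();
            }
        }
    }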
I just want to add a few notes on the scalability of db4o. db4o was built for embedding in applications and devices; it was never built for large datasets, so it has its limitations.
It is internally single-threaded; most db4o operations block all other db4o operations.
It can only deal with relatively small databases. By default a database is limited to 2 GB. You can increase that up to 127 GB, but I think db4o operates well in the 2-16 GB range; beyond that the database is probably too large for it. In any case, hundreds of millions of 2 KB objects is far too large a database (100 million 2 KB objects => 200 GB).
Therefore you should probably look at larger object databases, like VOD. Or maybe a graph database like Neo4j would also be a good choice for your problem?
We are running into unusually high memory usage issues, and I have observed that in many places in our code we pull hundreds of records from the DB, pack them into custom data objects, add them to an ArrayList, and store that in the session. I would like to know what the recommended upper limit is for storing data in the session; just a good-practice versus bad-practice kind of question.
I am using JRockit 1.5 and 1.6 GB of RAM. I profiled with JProbe and found that some parts of the app have a very heavy memory footprint. Most of this data is put into the session to be used later.
That depends entirely on how many sessions are typically present (which in turn depends on how many users you have, how long they stay on the site, and the session timeout) and how much RAM your server has.
But first of all: have you actually used a memory profiler to tell you that your "high memory usage" is caused by session data, or are you just guessing?
If the only problem you have is "high memory usage" on a production machine (i.e. it can handle the production load but is not performing as well as you'd like), the easiest solution is to get more RAM for the server - much quicker and cheaper than redesigning the app.
But caching entire result sets in the session is bad for a different reason as well: what if the data changes in the DB and the user expects to see that change? If you're going to cache, use one of the existing systems that do this at the DB request level - they'll allow you to cache results between users and they have facilities for cache invalidation.
If you're storing data in the session to improve performance, consider using true caching, since a cache is application-wide whereas the session is per-user, which results in unnecessary duplication of otherwise identical objects.
If, however, you're storing them for the user to edit (which I doubt, since hundreds of objects is far too many), try minimizing the amount of data stored, or look into optimistic concurrency control.
I'd say this heavily depends on the number of active sessions you expect. If you're writing an intranet application with < 20 users, it's certainly no problem to put a few MB in the session. However, if you're expecting 5000 live sessions, for instance, each MB of data stored per session accounts for 5 GB of RAM.
However, I'd generally recommend not to store any data from DB in session. Just fetch from DB for every request. If performance is an issue, use an application-wide cache (e.g. Hibernate's 2nd level cache).
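For example, with Hibernate a read-mostly entity can be marked for the second-level cache roughly like this (a sketch; the entity is illustrative, and the second-level cache plus a cache provider also have to be enabled in the Hibernate configuration):

    import javax.persistence.Cacheable;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // Entities marked this way are served from the application-wide second-level
    // cache on repeated reads, instead of being copied into every user's session.
    @Entity
    @Cacheable
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class Customer {
        @Id
        private Long id;

        private String name;

        public Long getId() { return id; }
        public String getName() { return name; }
    }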
What kind of data is it? Is it really needed per session or could it be cached at application level? Do you really need all the columns or only a subset? How often is it being accessed? What pages does it need to be available on? And so on.
It may make much more sense to retrieve the records from the DB when you really need to. Storing hundreds of records in session is never a good strategy.
I'd say try to store the minimum amount of data that will be enough to recreate the necessary environment in a subsequent request. If you're storing in memory to avoid a database round-trip, then a true caching solution such as Memcache might be helpful.
If you're storing these sessions in memory instead of a database, then the round-trip is saved, and requests will be served faster as long as the memory load is low and there's no paging. Once the number of clients goes up and paging begins, most clients will see a huge degradation in response times; the two are inversely related.
It's better to measure the latency to your database server; in most cases it is low enough for the database to be considered a viable means of storage instead of keeping everything in memory.
Try to split the data you are currently storing in the session into user-specific and static data. Then implement caching for all the static parts. This will give you a lot of reuse application-wide and still allow you to cache the specific data a user is working on.
You could also create a mini per-user SQLite database, connect to it, store the data the user is accessing in it, retrieve the records from it while the user is requesting them, and simply delete the SQLite database after the user disconnects.
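A minimal sketch of that idea, assuming the xerial sqlite-jdbc driver is on the classpath (the table layout, directory, and class name are placeholders):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // One small SQLite database per user: created on login, queried while the
    // user works, and deleted again when the session ends.
    public class PerUserStore implements AutoCloseable {
        private final Path dbFile;
        private final Connection conn;

        public PerUserStore(String userId) throws Exception {
            dbFile = Path.of("sessions", userId + ".db");
            Files.createDirectories(dbFile.getParent());
            conn = DriverManager.getConnection("jdbc:sqlite:" + dbFile);
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload TEXT)");
            }
        }

        public void put(long id, String payload) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT OR REPLACE INTO records (id, payload) VALUES (?, ?)")) {
                ps.setLong(1, id);
                ps.setString(2, payload);
                ps.executeUpdate();
            }
        }

        public String get(long id) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT payload FROM records WHERE id = ?")) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }

        @Override
        public void close() throws Exception {
            conn.close();
            Files.deleteIfExists(dbFile);   // drop the per-user database when the session ends
        }
    }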