We are running into unusually high memory usage issues. And I observed that many places in our code we are pulling 100s of records from DB, packing it in custom data objects, adding it to an arraylist and storing in session. I wish to know what is the recommended upper limit storing data in session. Just a good practice bad practice kind of thing.
I am using JRockit 1.5 and 1.6GB of RAM. I did profiling with Jprobe and found that some parts of app have very heavy memory footprint. Most of this data is being into session to be used later.
That depends entirely on how many sessions are typically present (which in turn depends on how many users you have, how long they stay on the site, and the session timeout) and how much RAM your server has.
But first of all: have you actually used a memory profiler to tell you that your "high memory usage" is caused by session data, or are you just guessing?
If the only problem you have is "high memory usage" on a production machine (i.e. it can handle the production load but is not performing as well as you'd like), the easiest solution is to get more RAM for the server - much quicker and cheaper than redesigning the app.
But caching entire result sets in the session is bad for a different reason as well: what if the data changes in the DB and the user expects to see that change? If you're going to cache, use one of the existing systems that do this at the DB request level - they'll allow you to cache results between users and they have facilities for cache invalidation.
If you're storing data in session to improve performance, consider using true caching since cache is application-wide, whereas session is per-user, which results in unneccessary duplication of otherwise similar objects.
If, however, you're storing them for user to edit this objects (which I doubt, since hundreds of objects is way too much), try minimizing the amount of data stored or research optimistic concurrency control.
I'd say this heavily depends on the number of active sessions you expect. If you're writing an intranet application with < 20 users, it's certainly no problem to put a few MB in the session. However, if you're expecting 5000 live session for instance, each MB of data stored per session accounts for 5GB of RAM.
However, I'd generally recommend not to store any data from DB in session. Just fetch from DB for every request. If performance is an issue, use an application-wide cache (e.g. Hibernate's 2nd level cache).
What kind of data is it? Is it really needed per session or could it be cached at application level? Do you really need all the columns or only a subset? How often is it being accessed? What pages does it need to be available on? And so on.
It may make much more sense to retrieve the records from the DB when you really need to. Storing hundreds of records in session is never a good strategy.
I'd say try to store the minimum amount of data that will be enough to recreate the necessary environment in a subsequent request. If you're storing in memory to avoid a database round-trip, then a true caching solution such as Memcache might be helpful.
If you're storing these sessions in memory instead of a database, then the round-trip is saved, and requests will be served faster as long as the memory load is low, and there's no paging. Once the number of clients goes up and paging begins, most clients will see a huge degradation in response times. Both these variables and inversely related.
Its better to measure the latency to your database server, which is usually low enough in most cases to be considered as a viable means of storage instead of in-memory.
Try to split the data you are currently storing in the session into user-specific and static data. Then implement caching for all the static parts. This will give you a lot of reuse application-wide and still allow you to cache the specific data a user is working on.
You could also make per-user mini sqlite database and connect to it, and store the data the user is accessing in it, then just retrieve the records from it, while the user is requesting it, and after the user disconnects just delete the sqlite database.
Related
I am developing a web application in which I need to store session, user messages etc. I am thinking of using HashMap or H2 database.
Please let me know which is better approach in terms of performance and memory utilization. The web site has to support 10,000 users.
Thanks.
As usual with these questions, I would worry about performance as/when you know it's an issue.
10000 users is not a lot of data to hold in memory. I would likely start off with a standard Java collection, and look at performance when you predict it's going to cause you grief.
Abstract out the access to this Java collection such that when you substitute it, the refactoring required is localised (and perhaps make it configurable, such that you can easily perform before/after performance tests with your different solutions -H2, Derby, Oracle, etc. etc.)
If your session objects aren't too big (which should be the case), there is no need to persist them in a database.
Using a database for this would add a lot of complexity in a case when you can start with a few lines of code. So don't use a database, simply store them in a ligth memory structure (HashMap for example).
You may need to implement a way to clean your HashMap if you don't want to keep sessions in memory when the user left from a long time. Many solutions are available (the easiest is simply to have a background thread removing from time to time the too old sessions). Note that it's usually easier to clean a hashmap than a database.
Both H2 and Hash Map are gonna keep the data in memory (So from space point of view they are almost the same).
If look ups are simple like KEY VALUE then looking up in the Hash Map will be quicker.
If you have to do comparisons like KEY < 100 etc use H2.
In fact 10K user info is not that high a number.
If you don't need to save user messages - use the collections. But if the message is should be saved, be sure to use a database. Because after restart you lost all data.
The problem with using a HashMap for storing objects is that you would run into issues when your site becomes too big for one server and would need to be clustered in order to scale with demand. Then you would face problems with how to synchronise the HashMap instances on different servers.
A possible alternative would be to use a key-value store like Redis as you won't need the structure of a database or even use the distributed cache abilities of something like EHCache
I need to store about 100 thousands of objects representing users. Those users have a username, age, gender, city and country.
The users should be searchable by a range of age and any of the other attributes, but also a combination of attributes (e.g. women between 30 and 35 from Brussels). The results should be found quickly as it is one of the Server's services for many connected Clients). Users may only be deleted or added, not updated.
I've thought of a fast database with indexed attributes (like h2 db which seems to be pretty fast, and I've seen they have a in-memory mode)
I was wondering if any other option was possible before going for the DB.
Thank you for any ideas !
How much memory does your server have? How much memory would these objects take up? Is it feasible to keep them all in memory, or not? Do you really need the speedup of keeping in memory, vs shoving in a database? It does make it more complex to keep in memory, and it does increase hardware requirements... are you sure you need it?
Because all of what you describe could be ran on a very simple server and put in a very simple database and give you the results you want in the order of 100ms per request. Do you need faster than 100ms response time? Why?
I would use a RDBMS - there are plenty of good ORMs available, such as Hibernate, which allow you to transparently stuff the POJOs into a db. Once you've got the data access abstracted, you then have the freedom to decide how best to persist the data.
For this size of project, I would use the H2 database. It has both embedded and client/server modes, and can operate from disk or entirely in memory.
Most definitely a relational database. With that size you'll want a client-server system, not something embedded like Sqlite. Pick one system depending on further requirements. Indexing is a basic feature, most systems support it. Personally I'd try something that's popular and free such as MySQL or PostgreSQL so you can more easily google your way out of problems. If you make your SQL queries generic enough (no vendor-specific constructs), you can switch systems without much pain. I agree with bwawok, try whether a standard setup is good enough and think of optimizations later.
Did you think to use cache system like EHCache or Memcached?
Also If you have enough memory you can use some sorted collection like TreeMap as index map, or HashMap to search user by name (separate Map per field). It will take more memory but can be effective. Also you can find based on the user query experience the most frequently used query with the best selectivity and create comparator based on this query onli. In this case subset of the element will not be a big and can can be filter fast without any additional optimization.
http://www.grails.org/1.3.1+Release+Notes
Improved Query Caching
The findAll query method now supports
taking advantage of the 2nd level
cache.
Book.findAll("from Book as b where b.author=:author", [author:'Dan Brown'], [cache: true])
What advantages or disadvantages of using 2nd level cache ?
I'm developing web-server for iPhone application so i have a lot of parallel connections, DB queries, etc.
Generally, the 2nd level cache holds the application data previously retrieved from the database. The advantage is that you can make big savings on avoiding database calls for the same data. If 2nd level cache is going to be efficient or not depends on how your app is working with the data and also on the size of the data you can store in memory. Probably the only major disadvantage is that cache need to be invalidated when data is updated in the database. When that happens from your application, some frameworks can handle that automatically (e.g. write trough cache), but if database is changed externally you can only rely on the cace expiration.
I want to store data with every request (what user viewed what page of my site).
With each request I will put the data (~100 bytes) in the memcache.
Every 5 seconds I will persist that data from the memcache to the datastore.
How rare would data loss be in this scenario?
You basically can't trust memcache to keep your data at all. It's just a cache, after all, and can choose to evict data whenever it feels it is nessecary.
Having said that, in your particular scenario, the worst that will happen is you will lose 5 seconds worth of data. I don't think that's a big deal if you're just storing "pageview" data. Besides, unless the cache is running out of memory, there's really no need for it to evict data and so it's probably going to be fairly rare.
That depends on how heavily your app uses memcache for other things. Instead of this approach, though, I would suggest using task queues to store the data to the datastore for each request, without slowing the user-facing request down.
Do not trust memcached to remember your things for later. If you want something persisted then persist it there and then.
I want to cache data for a user session in a web application built on struts.What is the best way to do it .Currently we store certain information from the DB in java objects in the user's session .Works fine till now but people are now concerned about memory usage etc.
Any thought on how best to get around this problem.
Works fine till now but people are now
concerned about memory usage etc.
Being "concerned" is relatively meaningless - do they have any concrete reason for it? Statistics that show how much memory session objects are taking up? Along the same line: do you have concrete reasons for wanting to cache the data in the user session? Have you profiled the app and determined that fetching this data from the DB for each request is slowing down your app significantly?
Don't guess. Measure.
It's usually bad practice to store whole objects in the user session for this reason. You should probably store just the keys in the session and re-query the database when you need them again. This is trade-off, but usually acceptable for most use-cases. Querying the database on keys is usually acceptable to do between requests, rather than storing objects in session.
If you must have them in session, consider using something like an LRUMap (in Apache Collections).