Custom caching implementation in Java

I want to implement some sort of lightweight caching in Java that integrates easily and is easy to deploy with a Java application.
The cache layer will sit between the application and the database layer: no database caching, no Spring, no Hibernate, no Ehcache, no HTTP caching.
We can use the file system or a nano database so that the cache can be restored after the process restarts.
I tried LRU Cache:
http://stackoverflow.com/questions/224868/easy-simple-to-use-lru-cache-in-java
http://www.programcreek.com/2013/03/leetcode-lru-cache-java/
But I am not sure what to do after overflow: should I save the data into a database (and which database would be better for fast inserts and seeks), or should I use the file system?
Does anyone have better input on implementing a caching mechanism in Java?

"But I am not sure what to do after overflow: should I save the data into a database (and which database would be better for fast inserts and seeks), or should I use the file system?"
It depends on the use case. If your cached values are very big, you can store each of them in its own file and use the hash of the cache key as the file name.
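The one-file-per-entry idea could be sketched roughly as follows. This is only a minimal illustration, not a complete cache: the class name FilePerEntryCache, the choice of SHA-256 and the byte[] value type are assumptions made for the example, and there is no eviction or size limit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

/** Hypothetical sketch: one file per cache entry, named by the SHA-256 hash of the key. */
final class FilePerEntryCache {
    private final Path dir;

    FilePerEntryCache(Path dir) throws IOException {
        // Reuses the directory if it already exists, so entries survive a restart.
        this.dir = Files.createDirectories(dir);
    }

    /** Maps an arbitrary key to a fixed-length hex file name. */
    static String fileNameFor(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    void put(String key, byte[] value) throws IOException {
        // A more robust version would write to a temporary file and rename it,
        // so that readers never observe a half-written value.
        Files.write(dir.resolve(fileNameFor(key)), value);
    }

    Optional<byte[]> get(String key) throws IOException {
        try {
            return Optional.of(Files.readAllBytes(dir.resolve(fileNameFor(key))));
        } catch (NoSuchFileException miss) {
            return Optional.empty();
        }
    }

    void remove(String key) throws IOException {
        Files.deleteIfExists(dir.resolve(fileNameFor(key)));
    }
}
```

Since the file name only contains the hash, the original key cannot be recovered from the directory listing; if that is needed, the key would have to be stored inside the file as well.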
If your values are small, storing them as separate files adds a lot of overhead, so it is better to store the cached entries in one or a few files. To implement this you need to learn about "external indexes" and "memory management" or "free space management" (e.g. best fit, next fit and compaction strategies). This actually leads to the implementation of a tiny database, so maybe just use one :) Some that come to mind: LevelDB, MapDB, LMDB, RocksDB.
Keep in mind that cache operations come in concurrently from the application, so the cache may evict a value while a request for the same key comes in at the same time. Will you implement just the basic operations like Cache.get and Cache.put, or also CAS operations like Cache.putIfAbsent? Do you want to make efficient use of multi-core systems, as they are common today?
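To make the difference between basic and CAS operations concrete, here is a small sketch built on java.util.concurrent.ConcurrentHashMap. The class ConcurrentCacheSketch and its method names are invented for illustration; a real cache would add eviction, expiry and the disk overflow discussed above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical in-memory cache exposing basic and CAS-style operations. */
final class ConcurrentCacheSketch<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    V get(K key) {
        return map.get(key);
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    /** Stores the value only if the key is absent; returns true if it was stored. */
    boolean putIfAbsent(K key, V value) {
        return map.putIfAbsent(key, value) == null;
    }

    /** Compare-and-swap: replaces the value only if the current one equals 'expected'. */
    boolean replace(K key, V expected, V update) {
        return map.replace(key, expected, update);
    }

    /** Loads the value at most once per key, even when many threads miss at the same time. */
    V getOrLoad(K key, Function<? super K, ? extends V> loader) {
        return map.computeIfAbsent(key, loader);
    }

    void evict(K key) {
        map.remove(key);
    }
}
```

Each call is atomic for a single key, so an eviction and a concurrent request for the same key cannot corrupt the map; the request simply sees a miss or a hit.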
Still, even when using a tiny database, you will need to prepare for some months of engineering work.
"Does anyone have better input on implementing a caching mechanism in Java?"
You can read my blog at cruftex.net for some more input on implementing lightweight and fast caching in Java.
For a cache implementation with overflow you can take a look at imcache. But imcache is not a fully-fledged generic cache, because, for example, CAS operations are missing; see the Cache interface.
My own high-performance Java cache implementation, cache2k, features CAS operations, events, loaders and writers, expiry, etc., and it will eventually get some overflow to disk, too. However, I am not sure about the time frame... If you are interested in working in this area: contributions are welcome!

Related

Is cache2k a persistent cache?

I have used cache2k in my Java project and it was simple (key-value pairs) and easy to use. Now I want to know whether cache2k is a persistent or a non-persistent cache.
I found an answer here:
https://stackoverflow.com/a/23709996/12605243 which was written in 2014 and said that cache2k was going to become a persistent cache.
So my question is: am I using a persistent or a non-persistent cache? I have read the docs but was unable to find the answer.

Basically, it's possible to add persistence via CacheLoader and CacheWriter. We use that in several ways, with the file system or a database as storage. When adding persistence this way, the cache operates in the so-called "cache through" mode. Some operations of the cache, especially get and put, operate transparently and read or write the data via the loader and writer to the storage. Other operations, like CAS operations, just interact with the in-memory cache.
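The "cache through" behaviour can be illustrated with a small stand-alone sketch. Note that this is not the cache2k API: the class CacheThroughSketch and its constructor are invented for the example, and a plain Function and BiConsumer stand in for the loader and the writer.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of a cache operating in "cache through" mode. */
final class CacheThroughSketch<K, V> {
    private final ConcurrentMap<K, V> memory = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // reads a value from the backing storage
    private final BiConsumer<K, V> writer; // writes a value to the backing storage

    CacheThroughSketch(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    /** Read through: on a miss the value is fetched from storage and kept in memory. */
    V get(K key) {
        return memory.computeIfAbsent(key, loader);
    }

    /** Write through: the storage is updated first, then the in-memory copy. */
    void put(K key, V value) {
        writer.accept(key, value);
        memory.put(key, value);
    }

    /** CAS-style operation: only touches the in-memory cache, not the storage. */
    boolean putIfAbsent(K key, V value) {
        return memory.putIfAbsent(key, value) == null;
    }
}
```

With a map standing in for the file system or database, `new CacheThroughSketch<>(storage::get, storage::put)` gives a cache whose entries can be reloaded by a fresh instance after a restart, as long as they were written with put.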
The persistence feature, as it was planned, was meant to be transparent for all cache operations. Although it's feasible and the basic work is done in the internal infrastructure, we don't have a big need for it. Other features and tasks seem more important. However, I am happy to hear about potential use cases.

How Can I Make Fast Desktop Application With Remote Database?

I am going to make a desktop application with a MySQL database. My database tables change frequently, almost 60% of them, so I think caching may be a bad idea. Can anyone suggest:
How can I make a fast desktop application with a remote database?
My language is Java.

The biggest problem with most projects that have performance as their primary concern is that people tend to make some exotic choices that end up complicating the project without any real benefits. Unless you have actual hands-on experience with the environment you will be working in, start simple.
Set some realistic goals about how often you have to refresh your data before you start. If your data changes very frequently, e.g. every second, does it make sense to try to show the changes in real time? A query every second will make everyone involved miserable.
Use a thread to take care of the queries. You don't need more than one, since any more will only make the race conditions in the database worse.
Design your database layer to be insulated from the rest of the application. Also time your DB-related operations from the beginning in order to track the impact of your optimizations.
Start with Hibernate / ORMLite. Although I cannot talk about ORMLite, I have used (optimized) Hibernate in heavy load environments without any problems. If you have complicated objects you should give it a try, it sure beats using plain JDBC and implementing the cache mechanism yourself.
Find out when you need lazy loading and when it's slowing you down (due to the select n+1 problem).
If you have performance issues, optimize. You don't have to map every single relationship. Use custom SQL in separate methods to get the objects you need when you need them. You can write a query that only returns table ids and afterwards ask Hibernate to load the corresponding objects.
Optimize your SQL: avoid joins, use subselects, "where id in (...)", etc.
Implement (database) paging if it makes sense.
If all else fails, start using plain SQL. You'll have already written the most complex queries and you'll know where your bigger bottlenecks are.
You could use a local SQLite to save the less volatile data and talk to the database mainly to get lists of ids and the stuff that you're missing. For example if you have users and orders, you can assume that you will have many more new orders per minute/second than users per hour.
To sum up, set clear performance goals before you start, always use a separate thread for data retrieval, avoid reinventing the wheel and keep it as simple as possible.
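The advice above about a single query thread and timing database operations from the start could look roughly like this. It is a sketch under assumptions: the class QueryWorker, the thread name "db-queries" and the label parameter are made up for the example, and the Supplier stands in for whatever Hibernate or JDBC call actually runs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper: runs all database work on one background thread and times it. */
final class QueryWorker implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "db-queries");
        thread.setDaemon(true); // do not keep the desktop application alive on exit
        return thread;
    });

    /** Queues a query; queries run one at a time, in submission order. */
    <T> CompletableFuture<T> submit(String label, Supplier<T> query) {
        return CompletableFuture.supplyAsync(() -> {
            long start = System.nanoTime();
            try {
                return query.get();
            } finally {
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println(label + " took " + millis + " ms");
            }
        }, executor);
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}
```

In a Swing application the result would typically be handed back to the UI thread, for example with `worker.submit("load orders", this::loadOrders).thenAccept(orders -> SwingUtilities.invokeLater(() -> show(orders)))`, where loadOrders and show are placeholders for your own code.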

Here are some generic approaches to the problem.
0) HW: make sure you do not have bottlenecks in your hardware that you can cheaply remove. (Adding hardware is faster and cheaper than dev hours in most cases.)
1) Caching:
Perhaps you can cache (locally or in a distributed cache like memcached) the 40% of the data that tends to be immutable. You could invalidate the cache when data gets modified. You should choose the right entities and granularity level for building the keys.
2) Replication:
If the first is too much overhead, you could create replicas (slaves) of your MySQL database and read from there. Again, you have to know when you can afford to have some stale data.
3) NoSQL:
Moving in that direction, but increasing the dev effort, you could move to some distributed store (take a look at the CAP theorem before making a choice).
Hope it helps

It depends on your database structure and application. You can use an object-relational mapping library like ORMLite and refresh objects loaded from the database in the background with threads. With ORMLite you can also use LazyForeignCollection to load only the data your application requires.

Minimize unnecessary database calls.
If the fields in your database keep changing, you can move from a relational database to a NoSQL database like MongoDB.
You can use multithreading on the server side for data processing, and cluster the application servers. When using multithreading, use it effectively and be aware of the synchronized keyword: it will degrade performance to some extent.
Follow coding best practices: avoid unnecessary instance variables and prefer local variables, which also helps with thread safety.
You can also use MyBatis for mapping, including for large queries.
You can cache in the DAO layer, in the service layer and even on the client side, but be sure to keep the caches synchronized with the database. You can use different caching solutions.
You can add database indexes for fast retrieval.
Do not use the same service for querying large amounts of data; break it down into different services, which will help you to process the data in a multithreaded way.
If the application is not a hard real-time system, you can also use a messaging solution, i.e. process the data asynchronously.

Options for In-memory databases (Open source and Java-based)

I have a web app that makes external web service calls on behalf of its clients. I want to cache the data returned by some web services in the web app so that other clients can reuse this data and run filters and queries on the cached data.
The current architecture of the web app uses Apache Camel, Spring and Jetty. I'm looking for in-memory database options and their pros and cons.

Hazelcast (Java API): you can distribute the in-memory data grid (with maps, multimaps, sets, lists, queues, topics) over multiple nodes very easily and use a load/store interface implementation with a disk-based DB. You can do something similar with Ehcache.
Redis is another option (use the Java client to access it). You can simply configure the conf file to write data to disk (or avoid it altogether) & should not have to write your own load/store classes.
Besides these, there are a number of options you could use. Not sure if you are only looking at open source options, looking at distributed options or not.
Hope it helps.

Have you considered using Memcached? It is not a database, but a caching system you can control from inside your application.
Here are a few more thoughts about in-memory databases. First, almost every modern RDBMS has a memory caching system inside it. The more memory you give to the database server (and configure for caching), the more it will keep in memory for later. If you put together a system with enough memory to cache all the tables, you will have an "in memory" cache without the overhead of another database.
Most fully "in memory" databases are used for high-volume/large-data systems where performance is key. And, because they are for extreme performance systems, you are going to pay for them. Or more specifically, pay extra for them. For example, the SAP/Sybase DBs that support full in-memory can cost you from 40% to 300% more than our existing products.
So, in answer to your question, do you really need one?

Try Redisson: distributed and scalable familiar Java data structures (Set, Map, ConcurrentMap, List, Queue, Lock, AtomicLong, CountDownLatch, Publish / Subscribe) on top of the in-memory database Redis.

Can a streaming collection be implemented in Java?

I need to implement a utility server that tracks a few custom variables that will be sent from other servers. To track the variables, a key-value collection, either JDK-defined or custom, needs to be used.
Here are a few considerations:
Keeping all the variables in the server's memory all the time is memory intensive.
This server needs to be very lightweight and I do not want heavy database operations.
Is there a pre-defined streaming collection which can serialize the data once memory use passes a threshold and retrieve it on demand?
I hope I have defined the problem clearly.
Please suggest a better approach if there is one.

This looks very promising, but it is still in the development stage: JDBM3.
Edit: the current version of these file-backed collections is MapDB.

Database
What you've described sounds exactly like you should use a database (i.e. indexed key/value store, too big for memory but want performance benefits of in-memory caching where possible).
I'd recommend a lightweight embedded database such as H2 - it's small, fast and should suit your purposes very well.

Have you thought of using an off-the-shelf NoSQL key-value store? Redis, for example?
If you want it to be Java only, you have the option of using a library like Ehcache; it would have the functionality you need.

Highest Performance Database in Java

I need ideas for implementing a (really) high-performance in-memory database/storage mechanism in Java, in the range of storing 20,000+ Java objects that are updated every 5 or so seconds.
Some options I am open to:
Pure JDBC/database combination
JDO
JPA/ORM/database combination
An Object Database
Other Storage Mechanisms
What is my best option? What are your experiences?
EDIT: I also need to be able to query these objects.

You could try something like Prevayler (basically an in-memory cache that handles serialization and backup for you so data persists and is transactionally safe). There are other similar projects.
I've used it for a large project, it's safe and extremely fast.
If it's the same set of 20,000 objects, or at least not 20,000 new objects every 5 seconds but lots of changes, you might be better off caching the changes and periodically writing them in batch mode (JDBC batch updates are much faster than individual row updates). It depends on whether you need each write to be transactionally wrapped, and whether you'll need a record of the change logs or just the aggregate changes.
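Collecting changes and writing them out periodically in one batch might be sketched like this. It covers only the "aggregate changes" case, where the latest value per key wins; the class WriteBehindBuffer and its batchWriter callback are assumptions made for the example. With JDBC, the callback would typically loop over the batch calling PreparedStatement.addBatch() and finish with executeBatch().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical buffer that coalesces updates and hands them over in batches. */
final class WriteBehindBuffer<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>();
    private final Consumer<Map<K, V>> batchWriter; // e.g. a JDBC batch update

    WriteBehindBuffer(Consumer<Map<K, V>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    /** Records a change; a later change to the same key replaces the earlier one. */
    synchronized void update(K key, V value) {
        pending.put(key, value);
    }

    /** Writes everything collected so far as one batch; does nothing if there are no changes. */
    void flush() {
        Map<K, V> batch;
        synchronized (this) {
            if (pending.isEmpty()) {
                return;
            }
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        // The slow write happens outside the lock so that updates are not blocked.
        batchWriter.accept(batch);
    }
}
```

flush could be triggered every few seconds from a ScheduledExecutorService. If the batch write fails, the changes in that batch are lost in this sketch, so a real implementation needs a retry or re-queue strategy.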
Edit: as other posts have mentioned Prevayler I thought I'd leave a note on what it does:
Basically you create a searchable/serializable object (typically a Map of some sort) which is wrapped in a Prevayler instance, which is serialized to disk. Rather than making changes directly to your map, you make changes by sending your Prevayler instance a serializable record of your change (just an object that contains the change instruction). Prevayler's version of a transaction is to write your serialization changes to disk so that in the event of failure it can load the last complete backup and then replay the changes against that. It's safe, although you do have to have enough memory to load all of your data, and it's a fairly old API, so no generic interfaces, unfortunately. But definitely stable and works as advertised.
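The pattern described above, journaling serializable change instructions and replaying them after a restart, can be illustrated without Prevayler itself. The following is not the Prevayler API: Command, Put and JournaledMap are invented names, and the sketch leaves out snapshots, so recovery replays the whole journal rather than starting from the last complete backup.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** A serializable change instruction (hypothetical, not Prevayler's interface). */
interface Command extends Serializable {
    void applyTo(Map<String, String> state);
}

final class Put implements Command {
    private static final long serialVersionUID = 1L;
    private final String key;
    private final String value;

    Put(String key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public void applyTo(Map<String, String> state) {
        state.put(key, value);
    }
}

/** In-memory map whose changes are journaled to disk and replayed on startup. */
final class JournaledMap {
    private final Map<String, String> state = new HashMap<>();
    private final Path journal;

    JournaledMap(Path journal) throws IOException, ClassNotFoundException {
        this.journal = journal;
        if (Files.exists(journal)) {
            replay();
        }
    }

    /** Writes the command to the journal first, then applies it in memory. */
    synchronized void execute(Command command) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(command);
        }
        // Length-prefixed records, appended to the end of the journal file.
        try (DataOutputStream out =
                 new DataOutputStream(new FileOutputStream(journal.toFile(), true))) {
            out.writeInt(bytes.size());
            bytes.writeTo(out);
        }
        command.applyTo(state);
    }

    synchronized String get(String key) {
        return state.get(key);
    }

    private void replay() throws IOException, ClassNotFoundException {
        try (DataInputStream in =
                 new DataInputStream(new BufferedInputStream(Files.newInputStream(journal)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfJournal) {
                    break;
                }
                byte[] record = new byte[length];
                in.readFully(record);
                try (ObjectInputStream obj =
                         new ObjectInputStream(new ByteArrayInputStream(record))) {
                    ((Command) obj.readObject()).applyTo(state);
                }
            }
        }
    }
}
```

As in the description, all data must fit in memory, and reads never touch the disk. The sketch does not force writes to the device, so it illustrates the replay idea rather than real crash safety.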

I highly recommend H2. This is a kind of "second generation" version of HSQLDB done by one of the original authors. H2 allows us to unit-test our DAO layer without requiring an actual PostgreSQL database, which is awesome.
There is an active net group and mailing list, and the author Thomas Mueller is very responsive to queries (hah, little pun there.)

I don't know if it is the fastest option, but I've been very satisfied with H2 whenever I've used it. It's written by the same person who originally wrote Hypersonic (which later became HSQLDB).
Another option that is allegedly very fast is Prevayler.

It is a bit of an old question, but these days there are a whole lot of databases that perform at the level of 20,000 operations/s. Which database to choose depends on the data structure and the type of queries you'd like to make. It also depends on the overall volume.
We had a similar problem with a large volume of time series data, about 300,000 rec/s, and we ended up writing a new database with a simple enough API and decent performance. It can do about 2,000,000 object writes/s, and we did without an ORM.
It later evolved into QuestDB.

Try the following, it performs really well with Hibernate and other ORM frameworks
http://hsqldb.org/

Chronicle Map is an embeddable, pure Java persistent database providing a simple java.util.Map interface. It sustains about 1 million queries/updates per second from a single thread, has consistent read/write performance, and scales almost linearly with the number of cores in the machine.
Here is some recent performance research with actual numbers:
Comparison of Jetbrains Xodus, Oracle Berkeley DB JE BTree, MapDB TreeMap, Chronicle Map and H2 MVStore Map
LmdbJava Benchmarks

I would give a try to OrientDB.

Terracotta might also be an answer for you. It allows multiple VMs to share objects so you can distribute load etc...

You can also check out db4o

If you want to store all of your data in memory, you might want to look at Prevayler.
I've never used it myself, but it seems like a much better solution than using a relational database for those cases in which all of your data can be stored in memory.

Berkeley DB for Java is a fast in memory database, extremely useful for simple object graphs.

HSQLDB is quite fast, but it is not ACID transaction-safe. The fastest Java database I know is db4o: benchmarks.
Edit: Please notice that Prevayler is not a database, see http://www.prevayler.org/wiki.jsp?topic=PrevaylerIsNotADatabase. If you're out of RAM, you're out of luck.

H2 is truly fantastic: in-memory mode, normal server mode and transactions, you have it all. However, it doesn't compare in performance to the object databases. I see db4o mentioned; I have in fact had much better performance with NeoDatis, and everything is nicely set up in Maven repositories. It is not very robust, though: like a Ferrari, it is fast, but it is not a truck like Oracle.

You can try CSQL (available in open source and enterprise versions). It provides a 30X performance improvement over disk-based database systems and offers a JDBC interface. It can be configured to work as a stand-alone main-memory database or as a transparent cache for MySQL, Postgres and Oracle databases.
