Can a streaming collection be implemented in Java? - java

I needed to implement a utility server that tracks few custom variables that will be sent from any other server. To track the variables, a key value collection, either JDK defined or custom needs to be used.
Here are few considerations -
Keeping all the variables in memory of the server all the time is memory intensive.
This server needs to be a very lightweight server and I do not want heavy database operations.
Is there a pre-defined streaming collection which can serialize the data after a threshold memory and retrieve it on need basis?
I hope I am clear in defining the problem statement.
Please suggest if any other better approach.

this thing looks very promising, but is in development stage...
JDBM3
Edit Current version of the file backed collections: MapDB.

Database
What you've described sounds exactly like you should use a database (i.e. indexed key/value store, too big for memory but want performance benefits of in-memory caching where possible).
I'd recommend a lightweight embedded database such as H2 - it's small, fast and should suit your purposes very well.

Have you thought of using an on the shelf nosql queue value store? Redis for example?
If you want it java only you have the option of using a lib like ehcache, it would have the functionalities you need.

Related

Custom caching implementation in Java

I want to implement some sort of lightweight caching in Java which is easily integrable in Java and should be easy to deploy with a Java application.
The cache layer will be between the application and the database layer: no database caching, no Spring, no Hibernate, no EHcache, no http caching.
We can use a file system or a nano database so that the cache can be restored so that the cache can be restored after the process restart.
I tried LRU Cache:
http://stackoverflow.com/questions/224868/easy-simple-to-use-lru-cache-in-java
http://www.programcreek.com/2013/03/leetcode-lru-cache-java/
But I am not sure how to after overflow should I save database into database (which database will be better to use for faster insert and seek of data). Or I should use File System?
Any one has better inputs to implement caching mechanism in Java?
But I am not sure how to after overflow should I save database into database(which database will be better to use for faster insert ans seek ok data) Or I should use File System?
It depends on the use case. If your cached values are very big, you can store each of it in a file and use the hash of the cache key as file name.
If you have values small in size, storing them as separate files would be a lot of overhead, so it is better to store the cached entries into one or a couple of files. To implement this you need to learn about "external indexes" and "memory management" or "free space management" (e.g. best fit, next fit and compaction strategies). This actually leads to the implementation of a tiny database, so may be use one :) Some stuff that comes to my mind: LevelDB, MapDB, LMDB, RocksDB
Keep in mind that caching operations come in concurrently from the application, so the cache may evict a value and a request to the same key may come in at the same time. Will you implement just the basic operations like Cache.get and Cache.put or also CAS-operations like Cache.putIfAbsent? Do you want to efficiently use multi core system, as they are common today?
Still, when using a tiny database, you will need to prepare for some months of engineering work.
Any one has better inputs to implement caching mechanism in Java?
You can read my blog at cruftex.net for some more input to implement lightweight and fast caching in Java.
For a cache implementation with overflow you can take a look at imcache. But imcache is not a fully-fledged generic cache, because for example CAS-operations are missing, see the Cache interface
My own high performance Java cache implementation cache2k, features CAS-operations, events, loaders&writers, expiry, etc. and it will eventually get some overflow to disk, too. However, I am not sure about the time frame... When you are interested to work in this area: contributions are welcome!

Java in-memory database-like object

I'm looking at a project that will benefit from an in-memory database-like object. I will have perhaps one or two thousand objects with the same structure, all inheriting from an abstract class. There will be a couple of string fields, an int, perhaps an enum or two (or maybe even a set of enums), and then a set of Strings. There would need to be a transient boolean field as well, but that likely wouldn't be a concern.
These objects would be instantiated and constructed from preset data, although additional ones can be created beyond that if need be. They may be stored in an XML file or something similar. I'd rather not hardwire the entire thing, of course, and using a local database like SQLite feels like overkill.
Storing these objects would be relatively simple if not for one thing: I want the user to easily be able to find the object they want from ANY of the values, most of which would be unique. This rules out a HashMap unless I want to wrap a massive bunch of them, which is hardly ideal. That leaves me looking for a sort of indexed, in-memory database-like object that supports retrieval via any field of the object. It may not have to store the objects directly, but could assemble them upon retrieval, or retrieve a "row" based on one field, get another field from the same "row" that serves as a sort of key, and then retrieve the object from a single HashMap based on that key.
In short, the idea is to easily and quickly retrieve objects with the same fields based on any field they contain. I've seen a variety of different libraries and such that may do this sort of thing, but there's a real myriad of these things out there. Whatever may work would need to be free and compatible with a variety of open licenses.
For an "in memory database-like object" please don't re-invent the wheel. However simple or complex your needs are, using a library that has already been through years (or decades) of debugging and optimization is absolutely the right thing to do if it can be made to suit your needs.
SQLite is a fine choice. It is not Java, however, and requires a separate JDBC driver (no JDBC driver for SQLite is maintained by the SQlite project).
Another RDBMS which is small footprint, reasonably lightweight, and robust is HSQLDB which includes a version 4.1 JDBC driver as part of the project. It is 100% Java. It provides native support for in memory tables. Any time I use the word "database" and "thousands of objects", my thoughts immediately go to HSQLDB.
The project is managed by the HSQLDB development group and is available here: http://www.hsqldb.org
I suggest going with the Caching solutions provided by Guava framework
https://code.google.com/p/guava-libraries/wiki/CachesExplained
A little late but you could use: http://www.mapdb.org
From their website:
MapDB is an embedded database engine for Java. It provides Maps and other collections backed by disk or off-heap memory storage. MapDB is free under Apache License.

Options for In-memory databases (Open source and Java-based)

I've a web app that makes external web service calls on behalf of it's clients. I want to cache the data returns by some web services in the web app so that other clients can reuse this data and run filters and queries on this cached data.
The current architecture of the web app uses Apache Camel, Spring and Jetty. I'm looking for options (pros/cons) of in-memory database options.
Hazelcast (Java API) - you can distribute the in-memory datagrid (with map, multimap, sets, lists, queues, topics) over multiple nodes very easily & use load/store interface implementation with a disk based DB. You can do something similar with EHCache.
Redis is another option (use the Java client to access it). You can simply configure the conf file to write data to disk (or avoid it altogether) & should not have to write your own load/store classes.
Besides these, there are a number of options you could use. Not sure if you are only looking at open source options, looking at distributed options or not.
Hope it helps.
Have you considered using MemCached? It is not a database, but a caching system you can control from inside your application.
Here are a few more thoughts about in-memory databases. First almost every modern RDBMS has a memory caching system inside it. The more memory you give to the database server (and configure it for caching) the more that it will store in memory for later. If you put together a system with enough memory to cache all the tables, you will have an "in memory" cache without the overhead of another database.
Most total "in memory" databases are used for high volume/large data systems where performance is totally key. And, because they are for extreme performance systems, you are going to pay for them. Or more specifically, pay extra for them. For example, the SAP/Sybase DB's that support full in-memory can cost you from 40% to 300% more than our existing products.
So, in answer to your question, do you really need one?
Try Redisson - distributed and scalable familar Java data structures (Set, Map, ConcurrentMap, List, Queue, Lock, AtomicLong, CountDownLatch, Publish / Subscribe) on top of in-memory db Redis.

java embedded database w/ ability to store as one file

I need to create a storage file format for some simple data in a tabular format, was trying to use HDF5 but have just about given up due to some issues, and I'd like to reexamine the use of embedded databases to see if they are fast enough for my application.
Is there a reputable embedded Java database out there that has the option to store data in one file? The only one I'm aware of is SQLite (Java bindings available). I tried H2 and HSQLDB but out of the box they seem to create several files, and it is highly desirable for me to have a database in one file.
edit: reasonably fast performance is important. Object storage is not; for performance concerns I only need to store integers and BLOBs. (+ some strings but nothing performance critical)
edit 2: storage data efficiency is important for larger datasets, so XML is out.
Nitrite Database http://www.dizitart.org/nitrite-database.html
NOsql Object (NO2 a.k.a Nitrite) database is an open source nosql
embedded document store written in Java with MongoDB like API. It
supports both in-memory and single file based persistent store.
H2 uses only one file, if you use the latest H2 build with the PAGE_STORE option. It's a new feature, so it might not be solid.
If you only need read access then H2 is able to read the database files from a zip file.
Likewise if you don't need persistence it's possible to have an in-memory only version of H2.
If you need both read/write access and persistence, then you may be out of luck with standard SQL-type databases, as these pretty much all uniformly maintain the index and data files separately.
Once i used an object database that saved its data to a file. It has a Java and a .NET interface. You might want to check it out. It's called db4o.
Chronicle Map is an embedded pure Java database.
It stores data in one file, i. e.
ChronicleMap<Integer, String> map = ChronicleMap
.of(Integer.class, String.class)
.averageValue("my-value")
.entries(10_000)
.createPersistedTo(databaseFile);
Chronicle Map is mature (no severe storage bugs reported for months now, while it's in active use).
Idependent benchmarks show that Chronicle Map is the fastest and the most memory efficient key-value store for Java.
The major disadvantage for your use case is that Chronicle Map supports only a simple key-value model, however more complex solution could be build on top of it.
Disclaimer: I'm the developer of Chronicle Map.
If you are looking for a small and fast database to maybe ship with another program I would check Apache Derby I don't know how you would define embedded-database but I used this in some projects as a debugging database that can be checked in with the source and is available on every developer machine instantaneous.
This isn't an SQL engine, but If you use Prevayler with XStream, you can easily create a single XML file with all your data. (Prevayler calls it a snapshot file.)
Although it isn't SQL-based, and so requires a little elbow grease, its self-contained nature makes development (and especially good testing) much easier. Plus, it's incredibly fast and reliable.
You may want to check out jdbm - we use it on several projects, and it is quite fast. It does use 2 files (a database file and a log file) if you are using it for ACID type apps, but you can drop directly to direct database access (no log file) if you don't need solid ACID.
JDBM will easily support integers and blobs (anything you want), and is quite fast. It isn't really designed for concurrency, so you have to manage the locking yourself if you have multiple threads, but if you are looking for a simple, solid embedded database, it's a good option.
Since you mentioned sqlite, I assume that you don't mind a native db (as long as good java bindings are available). Firebird works well with java, and does single file storage by default.
Both H2 and HSQLDB would be excellent choices, if you didn't have the single file requirement.
I think for now I'm just going to continue to use HDF5 for the persistent data storage, in conjunction with H2 or some other database for in-memory indexing. I can't get SQLite to use BLOBs with the Java driver I have, and I can't get embedded Firebird up and running, and I don't trust H2 with PAGE_STORE yet.

Highest Performance Database in Java

I need ideas to implement a (really) high performance in-memory Database/Storage Mechanism in Java. In the range of storing 20,000+ java objects, updated every 5 or so seconds.
Some options I am open to:
Pure JDBC/database combination
JDO
JPA/ORM/database combination
An Object Database
Other Storage Mechanisms
What is my best option? What are your experiences?
EDIT: I also need like to be able to Query these objects
You could try something like Prevayler (basically an in-memory cache that handles serialization and backup for you so data persists and is transactionally safe). There are other similar projects.
I've used it for a large project, it's safe and extremely fast.
If it's the same set of 20,000 objects, or at least not 20,000 new objects every 5 seconds but lots of changes, you might be better off cacheing the changes and periodically writing the changes in batch mode (jdbc batch updates are much faster than individual row updates). Depends on whether you need each write to be transactionally wrapped, and whether you'll need a record of the change logs or just aggregate changes.
Edit: as other posts have mentioned Prevayler I thought I'd leave a note on what it does:
Basically you create a searchable/serializable object (typically a Map of some sort) which is wrapped in a Prevayler instance, which is serialized to disk. Rather than making changes directly to your map, you make changes by sending your Prevayler instance a serializable record of your change (just an object that contains the change instruction). Prevayler's version of a transaction is to write your serialization changes to disk so that in the event of failure it can load the last complete backup and then replay the changes against that. It's safe, although you do have to have enough memory to load all of your data, and it's a fairly old API, so no generic interfaces, unfortunately. But definitely stable and works as advertised.
I highly recommend H2. This is a kind of "second generation" version of HSQLDB done by one of the original authors. H2 allows us to unit-test our DAO layer without requiring an actual PostgreSQL database, which is awesome.
There is an active net group and mailing list, and the author Thomas Mueller is very responsive to queries (hah, little pun there.)
I don't know if it is the fastest option, but I've been very satisfied with H2 whenever I've used it. It's written by the same person who originally wrote Hypersonic (which later became HSQLDB).
Another option that is allegedly very fast is Prevayler.
It is a bit of an old question, but these days there is a whole lot of databases that have a level of performance of 20,000/s. Which database to chose depends on data structure and type of queries you'd like to be making. It also depends on overall volume.
We had similar problem with large volume of time series data, about 300,000 rec/s and we ended up writing a new database, with simple enough API and decent performance. It can do about 2,000,000 object writes/s and we did away without ORM.
It later evolved into QuestDB.
Try the following, it performs really well with Hibernate and other ORM frameworks
http://hsqldb.org/
Chronicle Map is an embeddable pure Java persistent database, providing a simple java.util.Map interface. It withstands about 1 million queries/updates per second from a single thread, consistent read/write performance and scales almost linearly to the number of cores in the machine.
Here are some recent performance research with actual numbers:
Comparison of Jetbrains Xodus, Oracle Berkeley DB JE BTree, MapDB TreeMap, Chronicle Map and H2 MVStore Map
LmdbJava Benchmarks
I would give a try to OrientDB.
Terracotta might also be an answer for you. It allows multiple VMs to share objects so you can distribute load etc...
You can also check out db4o
If you want to store all of your data in memory, you might want to look at Prevayler.
I've never used it myself, but it seems like a much better solution than using a relational database for those cases in which all of your data can be stored in memory.
Berkeley DB for Java is a fast in memory database, extremely useful for simple object graphs.
hsqldb is quite fast, but it is not ACID transaction-safe. The fastest java-database I know is db4o: benchmarks.
Edit: Please notice that Prevayler is not a database, see http://www.prevayler.org/wiki.jsp?topic=PrevaylerIsNotADatabase. If you're out of RAM, you're out of luck.
H2 is truly fantastic, indeed, in memory, normal server and transactional, you have it all. However It doesn't compare in performance to the object databases, I see Db4o mentioned, I have had much better performance with Neodatis in fact, and everything nicely set up in Maven repositories. Although not very robust, like a Ferrari, fast but not a truck like Oracle.
You can try CSQL (available under open source and enterprise version) It provides 30X performance improvement over disk based database systems and provides JDBC interface. It can be configured to work as stand alone main memory database or as a transparent cache to MySQL, Postgres, Oracle databases.

Categories

Resources