Java embedded database with the ability to store data in one file

I need to create a storage file format for some simple tabular data. I was trying to use HDF5 but have just about given up due to some issues, so I'd like to reexamine embedded databases to see whether they are fast enough for my application.
Is there a reputable embedded Java database out there that has the option to store data in one file? The only one I'm aware of is SQLite (Java bindings available). I tried H2 and HSQLDB but out of the box they seem to create several files, and it is highly desirable for me to have a database in one file.
edit: reasonably fast performance is important. Object storage is not; for performance concerns I only need to store integers and BLOBs. (+ some strings but nothing performance critical)
edit 2: storage data efficiency is important for larger datasets, so XML is out.

Nitrite Database http://www.dizitart.org/nitrite-database.html
NOsql Object (NO2 a.k.a Nitrite) database is an open source nosql
embedded document store written in Java with MongoDB like API. It
supports both in-memory and single file based persistent store.

H2 uses only one file, if you use the latest H2 build with the PAGE_STORE option. It's a new feature, so it might not be solid.

If you only need read access then H2 is able to read the database files from a zip file.
Likewise if you don't need persistence it's possible to have an in-memory only version of H2.
If you need both read/write access and persistence, then you may be out of luck with standard SQL-type databases, as these pretty much all uniformly maintain the index and data files separately.

I once used an object database that saved its data to a file. It has a Java and a .NET interface. You might want to check it out. It's called db4o.

Chronicle Map is an embedded pure Java database.
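It stores data in one file, e.g.:
    import net.openhft.chronicle.map.ChronicleMap;

    // databaseFile is a java.io.File; createPersistedTo(...) throws IOException
    ChronicleMap<Integer, String> map = ChronicleMap
        .of(Integer.class, String.class)
        .averageValue("my-value")
        .entries(10_000)
        .createPersistedTo(databaseFile);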
Chronicle Map is mature (no severe storage bugs reported for months now, while it's in active use).
Independent benchmarks show that Chronicle Map is the fastest and the most memory efficient key-value store for Java.
The major disadvantage for your use case is that Chronicle Map supports only a simple key-value model; however, more complex solutions could be built on top of it.
Disclaimer: I'm the developer of Chronicle Map.

If you are looking for a small and fast database to maybe ship with another program, I would check out Apache Derby. I don't know how you would define an embedded database, but I have used it in some projects as a debugging database that can be checked in with the source and is instantly available on every developer machine.

This isn't an SQL engine, but if you use Prevayler with XStream, you can easily create a single XML file with all your data. (Prevayler calls it a snapshot file.)
Although it isn't SQL-based, and so requires a little elbow grease, its self-contained nature makes development (and especially good testing) much easier. Plus, it's incredibly fast and reliable.

You may want to check out jdbm - we use it on several projects, and it is quite fast. It does use 2 files (a database file and a log file) if you are using it for ACID type apps, but you can drop directly to direct database access (no log file) if you don't need solid ACID.
JDBM will easily support integers and blobs (anything you want), and is quite fast. It isn't really designed for concurrency, so you have to manage the locking yourself if you have multiple threads, but if you are looking for a simple, solid embedded database, it's a good option.
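To make "manage the locking yourself" concrete, here is a minimal sketch that wraps a non-thread-safe store in a read/write lock. It does not use the JDBM API at all: a plain HashMap stands in for the store, and the names LockedStore and runConcurrentIncrements are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.UnaryOperator;

/** Guards a store that is not thread-safe with one read/write lock. */
final class LockedStore<K, V> {
    // Stand-in for the embedded store; a plain HashMap is not thread-safe either.
    private final Map<K, V> store = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Many readers may hold the read lock at the same time. */
    V get(K key) {
        lock.readLock().lock();
        try {
            return store.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Read-modify-write under the write lock, so no update is lost. */
    void update(K key, UnaryOperator<V> change) {
        lock.writeLock().lock();
        try {
            store.put(key, change.apply(store.get(key)));
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Demo helper: several threads increment one counter; returns the final value. */
    static int runConcurrentIncrements(int threads, int perThread) {
        LockedStore<String, Integer> counters = new LockedStore<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.update("hits", v -> v == null ? 1 : v + 1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentIncrements(4, 1000)); // prints 4000
    }
}
```

With a real store you would put the store's own read and write calls where the HashMap calls are. A single lock is the simplest option; whether it is fast enough depends on how write-heavy your workload is.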

Since you mentioned sqlite, I assume that you don't mind a native db (as long as good java bindings are available). Firebird works well with java, and does single file storage by default.
Both H2 and HSQLDB would be excellent choices, if you didn't have the single file requirement.

I think for now I'm just going to continue to use HDF5 for the persistent data storage, in conjunction with H2 or some other database for in-memory indexing. I can't get SQLite to use BLOBs with the Java driver I have, and I can't get embedded Firebird up and running, and I don't trust H2 with PAGE_STORE yet.

Related

Can a streaming collection be implemented in Java?

I need to implement a utility server that tracks a few custom variables that will be sent from any other server. To track the variables, a key-value collection, either JDK-defined or custom, needs to be used.
Here are a few considerations:
Keeping all the variables in memory of the server all the time is memory intensive.
This server needs to be a very lightweight server and I do not want heavy database operations.
Is there a pre-defined streaming collection which can serialize the data after a threshold memory and retrieve it on need basis?
I hope I am clear in defining the problem statement.
Please suggest a better approach if there is one.
This looks very promising, but it is still in the development stage: JDBM3.
Edit: the current version of the file-backed collections is MapDB.
Database
What you've described sounds exactly like you should use a database (i.e. indexed key/value store, too big for memory but want performance benefits of in-memory caching where possible).
I'd recommend a lightweight embedded database such as H2 - it's small, fast and should suit your purposes very well.
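For illustration only, this is roughly what "keep the hot entries in memory, push the rest to disk" looks like if you hand-roll it with the JDK. It is a sketch under several assumptions: values are Serializable and non-null, keys stay in memory, there is no crash safety, and it is not thread-safe. SpillingMap and its methods are names invented for this example; a real embedded database (or MapDB) handles all of this for you.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Keeps the most recently used values in memory and spills the rest to disk. */
final class SpillingMap<K, V extends Serializable> {
    private final int maxInMemory;
    private final Path spillDir;
    // accessOrder=true makes iteration start at the least recently used entry
    private final LinkedHashMap<K, V> hot = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<K, Path> spilled = new HashMap<>(); // keys stay in memory
    private long nextFileId = 0;

    SpillingMap(int maxInMemory, Path spillDir) {
        this.maxInMemory = maxInMemory;
        this.spillDir = spillDir;
        try {
            Files.createDirectories(spillDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Values must be non-null; null is used to signal "absent". */
    void put(K key, V value) {
        Path stale = spilled.remove(key);
        if (stale != null) delete(stale);
        hot.put(key, value);
        spillOverflow();
    }

    V get(K key) {
        V value = hot.get(key);
        if (value != null) return value;
        Path file = spilled.get(key);
        if (file == null) return null;
        value = read(file);
        put(key, value); // promote back into memory, possibly spilling another entry
        return value;
    }

    int inMemoryCount() { return hot.size(); }

    /** Writes least recently used entries to disk until the threshold is met. */
    private void spillOverflow() {
        Iterator<Map.Entry<K, V>> it = hot.entrySet().iterator();
        while (hot.size() > maxInMemory && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            Path file = spillDir.resolve("entry-" + (nextFileId++) + ".ser");
            write(file, eldest.getValue());
            spilled.put(eldest.getKey(), file);
            it.remove();
        }
    }

    private void write(Path file, V value) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private V read(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private void delete(Path file) {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would look like new SpillingMap<String, Integer>(1000, someDirectory), followed by put and get as with a normal map. One file per spilled entry is wasteful for small values, which is one more reason to prefer an existing store over something like this.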
Have you thought of using an off-the-shelf NoSQL key-value store? Redis, for example?
If you want it Java-only, you have the option of using a library like Ehcache, which has the functionality you need.

Using file system instead of database to store pdf files in jackrabbit

In our project we use jackrabbit with spring and tomcat to manage pdf files.
Currently MySql database is being used to store blob files (in terms of jackrabbit it's called BundleDbPersistenceManager).
As the number of generated files grows, we are thinking of using the file system instead of the database to boost performance and to eliminate replication overhead.
In the spec, the Jackrabbit team recommends using BundleFsPersistenceManager instead, but with comments like this:
Not meant to be used in production environments (except for read-only uses)
Does anyone have any experience using BundleFsPersistenceManager and can reference any resources on painless migration from blobs in mysql database to files in the filesystem?
Thank you very much in advance
Persistence in Jackrabbit is a bit complicated, it makes sense to read the configuration overview documentation first.
In Jackrabbit, binaries are stored in the data store by default, and not in the persistence manager. Even if you use the BundleDbPersistenceManager, large binary files are stored in the data store. You can combine the (default) FileDataStore with the BundleDbPersistenceManager.
I would recommend not using the BundleFsPersistenceManager, because data can get corrupted quite easily if the program gets killed while writing.

Opinion on data storage

I have an upcoming project where the core of it will be storing a mapping between two integers. ( 1234 in column A maps to 4567 in column B). There are roughly 1000 mappings. A lookup on the mappings will be done every time a user hits a certain url on the site.
It seems like inserting it into our relational database is overkill. The overhead of selecting it out on every hit seems high. On the other hand, storing it in an XML file and loading that flat file from disk every time there's a hit also seems less than optimal.
So my question is this: what is the ideal data structure and method to persist this mapping?
The system architecture is tomcat + apache + mysql. The code will be running in tomcat.
EDIT:
Mappings are static, I won't need to change them. Seems like the XML file in a hashmap is the way to go.
I would use a properties file or an XML file, load it into memory (as a HashMap<Integer, Integer>) on startup and then just serve from the hashmap.
If you need to change the mapping at execution time, you could either write it back immediately or potentially just write changes incrementally (and update the in-memory map), with a process to unify the original file and the changes on startup. This doesn't work terribly well if you need to scale to multiple servers, of course - at that point you need to work out what sort of consistency you need etc. A database is probably the simplest way of proceeding, but it depends on the actual requirements.
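A minimal sketch of the load-once-then-serve-from-memory approach, assuming a properties-style file with lines such as 1234=4567. The class name IntMapping and the file layout are assumptions made for this example, not something from the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Loads "1234=4567" style lines once and serves lookups from memory. */
final class IntMapping {
    private final Map<Integer, Integer> map;

    private IntMapping(Map<Integer, Integer> map) {
        this.map = Collections.unmodifiableMap(map);
    }

    /** Parses java.util.Properties syntax from any Reader. */
    static IntMapping load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Integer, Integer> parsed = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            parsed.put(Integer.valueOf(key.trim()),
                       Integer.valueOf(props.getProperty(key).trim()));
        }
        return new IntMapping(parsed);
    }

    /** Convenience overload for a file on disk; call it once at startup. */
    static IntMapping load(Path file) {
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the mapped value, or null when the key is unknown. */
    Integer lookup(int key) {
        return map.get(key);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        IntMapping mapping = IntMapping.load(new StringReader("1234=4567\n42=7\n"));
        System.out.println(mapping.lookup(1234)); // prints 4567
    }
}
```

In a Tomcat application you would typically build the IntMapping once (for example in a ServletContextListener) and share the instance. Because the map is never modified after loading, concurrent reads need no locking.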
I agree a relational database seems a bit of overkill. You may want to look at a NoSQL database. MongoDB is my personal favourite, but there are plenty out there. Do a search on NoSQL databases.
A NoSQL database will allow you to store this mapping as a simple document, with extremely fast searching and updating of the data. Obviously it's another technology in your stack though, so that's something for you to consider.
You could try using an in-memory database like H2 or HSQLDB. The memory footprint will likely be larger than with in-memory hashmap and file, but on the upside you can use SQL for querying and updating and don't need to worry about concurrent access.

Alternatives for storing data other than databases like MySQL, SQL, etc.

I have completed my Address Book project in core Java, in which my data is stored in a database (MySQL).
The problem I am facing is that when I run my program on another computer, I have to create the whole database again.
So please tell me any alternative for storing my data without using database software like MySQL, SQL, etc.
You can use an in-memory database such as HSQLDB, Derby (a.k.a JavaDB), H2, ..
All of those can run without any additional software installation and can be made to act like just another library.
I would suggest using an embeddable, lightweight database such as SQLite. Check it out.
From the features page (under the section Suggested Uses For SQLite):
Application File Format. Rather than
using fopen() to write XML or some
proprietary format into disk files
used by your application, use an
SQLite database instead. You'll avoid
having to write and troubleshoot a
parser, your data will be more easily
accessible and cross-platform, and
your updates will be transactional.
The whole point of StackOverflow was so that you would not have to email around questions/answers :)
You could store data in a filesystem, memory (use serialisation etc) which are simple alternatives to DB. You can even use HSQLDB which can be run completely in memory
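As a rough sketch of the serialisation route, the whole address book can be written to and read from one ordinary file with the JDK's object streams. The class AddressBookFile, the Contact fields, and the file name addressbook.ser are all assumptions for this example.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class AddressBookFile {
    /** A contact; Serializable so the whole list can be written in one call. */
    static final class Contact implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String phone;

        Contact(String name, String phone) {
            this.name = name;
            this.phone = phone;
        }
    }

    /** Overwrites the file with the complete list of contacts. */
    static void save(Path file, List<Contact> contacts) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(contacts));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored contacts, or an empty list if the file does not exist yet. */
    @SuppressWarnings("unchecked")
    static List<Contact> load(Path file) {
        if (!Files.exists(file)) {
            return new ArrayList<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<Contact>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get("addressbook.ser"); // hypothetical file name
        List<Contact> contacts = load(file);
        contacts.add(new Contact("Ada", "555-0100"));
        save(file, contacts);
        System.out.println(load(file).size() + " contact(s) stored");
    }
}
```

The file travels with the program, so nothing has to be set up on the other computer. The trade-offs are that every save rewrites the whole file, there is no query language, and changing the Contact class later can make old files unreadable.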
If your data is not very big, you may use a simple text file and store everything in it, then load it into memory. But this will change the way you modify and query the data.
Database software like mysql, sql etc provides an abstraction in terms of implementation effort. If you wish to avoid using the same, you can think of having your own database like XML or flat files. XML is still a better choice as XML parsers or handlers are available. Putting your data in your customised database/flat files will not be manageable in the long run.
Why don't you explore SQLite? It is file-based, which means you don't need to install it separately, and you still have standard SQL to retrieve or interact with the data. I think SQLite will be a better choice.
Just use Prevayler (prevayler.org). Faster and simpler than using a database.
I assume from your question that you want some form of persistent storage on the local file system of the machine your application runs on. In addition to that, you need to decide how the data in your application is to be used, and the volume of it. Do you need a database? Are you going to be searching the data by different fields? Do you need a query language? Is the data small enough to fit into a simple data structure in memory? How resilient does it need to be? The answers to these types of questions will help lead to the correct choice of storage. It could be that all you need is a simple CSV file, XML or similar. There are a host of lightweight databases such as SQLite, Berkeley DB, JavaDB etc., but whether or not you need the power of a database is up to your requirements.
A store that I'm using a lot these days is Neo4j. It's a graph database and is not only easy to use but also is completely in Java and is embedded. I much prefer it to a SQL alternative.
In addition to the other answers about embedded databases: I have been working on an object database that directly serializes Java objects without the need for an ORM. Its name is Sofof and I use it in my projects. It has many features, which are described on its website.

Highest Performance Database in Java

I need ideas to implement a (really) high performance in-memory Database/Storage Mechanism in Java. In the range of storing 20,000+ java objects, updated every 5 or so seconds.
Some options I am open to:
Pure JDBC/database combination
JDO
JPA/ORM/database combination
An Object Database
Other Storage Mechanisms
What is my best option? What are your experiences?
EDIT: I also need to be able to query these objects.
You could try something like Prevayler (basically an in-memory cache that handles serialization and backup for you so data persists and is transactionally safe). There are other similar projects.
I've used it for a large project, it's safe and extremely fast.
If it's the same set of 20,000 objects, or at least not 20,000 new objects every 5 seconds but lots of changes, you might be better off caching the changes and periodically writing them in batch mode (JDBC batch updates are much faster than individual row updates). It depends on whether you need each write to be transactionally wrapped, and whether you'll need a record of the change logs or just aggregate changes.
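The caching-then-batching idea could look something like the sketch below. It keeps only the latest value per key and hands the accumulated changes to a writer callback once a threshold is reached. The class WriteBehindBuffer and its parameters are invented for this example; with JDBC, the callback is where you would loop over the entries calling PreparedStatement.addBatch() and then executeBatch().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Collects changes in memory and writes them out in batches. */
final class WriteBehindBuffer<K, V> {
    private final int flushThreshold;
    private final Consumer<Map<K, V>> batchWriter;
    private Map<K, V> pending = new LinkedHashMap<>();

    WriteBehindBuffer(int flushThreshold, Consumer<Map<K, V>> batchWriter) {
        this.flushThreshold = flushThreshold;
        this.batchWriter = batchWriter;
    }

    /** Records a change; a later change to the same key replaces the earlier one. */
    synchronized void update(K key, V value) {
        pending.put(key, value);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    /** Hands all pending changes to the writer as one batch. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        Map<K, V> batch = pending;
        pending = new LinkedHashMap<>();
        batchWriter.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        WriteBehindBuffer<Integer, String> buffer =
                new WriteBehindBuffer<>(3, batch -> batchSizes.add(batch.size()));
        buffer.update(1, "a");
        buffer.update(1, "b"); // coalesced with the previous update
        buffer.update(2, "c");
        buffer.update(3, "d"); // third distinct key triggers a flush
        buffer.update(4, "e");
        buffer.flush();        // write whatever is left
        System.out.println(batchSizes); // prints [3, 1]
    }
}
```

Note the trade-off mentioned above: intermediate values are lost (only the aggregate change is written), and anything still pending is gone if the process dies before a flush. In practice you would also call flush() from a timer, not only when the threshold is hit.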
Edit: as other posts have mentioned Prevayler I thought I'd leave a note on what it does:
Basically you create a searchable/serializable object (typically a Map of some sort) which is wrapped in a Prevayler instance, which is serialized to disk. Rather than making changes directly to your map, you make changes by sending your Prevayler instance a serializable record of your change (just an object that contains the change instruction). Prevayler's version of a transaction is to write your serialization changes to disk so that in the event of failure it can load the last complete backup and then replay the changes against that. It's safe, although you do have to have enough memory to load all of your data, and it's a fairly old API, so no generic interfaces, unfortunately. But definitely stable and works as advertised.
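To show the shape of that pattern, here is a bare-bones sketch using only the JDK. This is not Prevayler's API, and it is far less careful about crashes than Prevayler is (the snapshot is not written atomically, and flush() does not force bytes to disk). The names JournaledStore, Command, and SetValue are made up for this illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** In-memory map whose changes are journaled to disk before being applied. */
final class JournaledStore implements AutoCloseable {
    private final Path snapshotFile;
    private final Path journalFile;
    private final Map<String, Integer> state = new HashMap<>();
    private ObjectOutputStream journal;

    JournaledStore(Path dir) {
        snapshotFile = dir.resolve("snapshot.ser");
        journalFile = dir.resolve("journal.ser");
        try {
            Files.createDirectories(dir);
            recover();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("recovery failed", e);
        }
        snapshot(); // fold the replayed commands into a fresh snapshot
    }

    /** Loads the last snapshot, then replays every command journaled after it. */
    @SuppressWarnings("unchecked")
    private void recover() throws IOException, ClassNotFoundException {
        if (Files.exists(snapshotFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(snapshotFile))) {
                state.putAll((Map<String, Integer>) in.readObject());
            }
        }
        if (Files.exists(journalFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(journalFile))) {
                while (true) {
                    ((Command) in.readObject()).applyTo(state);
                }
            } catch (EOFException endOfJournal) {
                // every fully written command has been replayed
            }
        }
    }

    /** Journal first, then apply, so a change is recorded before memory is touched. */
    synchronized void execute(Command command) {
        try {
            journal.writeObject(command);
            journal.flush(); // hands bytes to the OS; a real system would also sync to disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        command.applyTo(state);
    }

    synchronized Integer get(String key) {
        return state.get(key);
    }

    /** Writes the full state, then starts a new empty journal. */
    synchronized void snapshot() {
        try {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(snapshotFile))) {
                out.writeObject(new HashMap<>(state));
            }
            if (journal != null) {
                journal.close();
            }
            journal = new ObjectOutputStream(Files.newOutputStream(journalFile)); // truncates
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void close() {
        try {
            journal.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** A change instruction: serializable so it can be journaled and replayed. */
interface Command extends Serializable {
    void applyTo(Map<String, Integer> state);
}

final class SetValue implements Command {
    private static final long serialVersionUID = 1L;
    private final String key;
    private final int value;

    SetValue(String key, int value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public void applyTo(Map<String, Integer> state) {
        state.put(key, value);
    }
}
```

You would open a store on a directory, call execute(new SetValue("a", 1)) for each change, and close it; opening a new JournaledStore on the same directory rebuilds the map from the snapshot plus the journal. As with Prevayler, all data has to fit in memory.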
I highly recommend H2. This is a kind of "second generation" version of HSQLDB done by one of the original authors. H2 allows us to unit-test our DAO layer without requiring an actual PostgreSQL database, which is awesome.
There is an active net group and mailing list, and the author Thomas Mueller is very responsive to queries (hah, little pun there.)
I don't know if it is the fastest option, but I've been very satisfied with H2 whenever I've used it. It's written by the same person who originally wrote Hypersonic (which later became HSQLDB).
Another option that is allegedly very fast is Prevayler.
It is a bit of an old question, but these days there are a whole lot of databases that reach a performance level of 20,000/s. Which database to choose depends on the data structure and the type of queries you'd like to be making. It also depends on overall volume.
We had a similar problem with a large volume of time series data, about 300,000 rec/s, and we ended up writing a new database with a simple enough API and decent performance. It can do about 2,000,000 object writes/s, and we did without an ORM.
It later evolved into QuestDB.
Try the following, it performs really well with Hibernate and other ORM frameworks
http://hsqldb.org/
Chronicle Map is an embeddable pure Java persistent database, providing a simple java.util.Map interface. It sustains about 1 million queries/updates per second from a single thread, offers consistent read/write performance, and scales almost linearly with the number of cores in the machine.
Here is some recent performance research with actual numbers:
Comparison of Jetbrains Xodus, Oracle Berkeley DB JE BTree, MapDB TreeMap, Chronicle Map and H2 MVStore Map
LmdbJava Benchmarks
I would give OrientDB a try.
Terracotta might also be an answer for you. It allows multiple VMs to share objects so you can distribute load etc...
You can also check out db4o
If you want to store all of your data in memory, you might want to look at Prevayler.
I've never used it myself, but it seems like a much better solution than using a relational database for those cases in which all of your data can be stored in memory.
Berkeley DB for Java is a fast in memory database, extremely useful for simple object graphs.
hsqldb is quite fast, but it is not ACID transaction-safe. The fastest java-database I know is db4o: benchmarks.
Edit: Please notice that Prevayler is not a database, see http://www.prevayler.org/wiki.jsp?topic=PrevaylerIsNotADatabase. If you're out of RAM, you're out of luck.
H2 is truly fantastic: in-memory mode, normal server mode, and transactions, you have it all. However, it doesn't compare in performance to the object databases. I see db4o mentioned; I have in fact had much better performance with Neodatis, and everything is nicely set up in Maven repositories. It is not very robust, though: like a Ferrari, it is fast, but it is not a truck like Oracle.
You can try CSQL (available in open source and enterprise versions). It provides a 30X performance improvement over disk-based database systems and provides a JDBC interface. It can be configured to work as a standalone main memory database or as a transparent cache for MySQL, Postgres, and Oracle databases.
