In an application I'm working on, I need a write-behind data log. That is, the application accumulates data in memory, and can hold all the data in memory. It must, however, persist, tolerate reasonable faults, and allow for backup.
Obviously, I could write to a SQL database; Derby springs to mind for easy embedding. But I'm not tremendously fond of dealing with a SQL API (JDBC, however lipsticked), and I don't need queries, indices, or any other decoration. The records go out, and on restart, I need to read them all back.
Are there any other suitable alternatives?
Try using just a simple log file.
As data comes in, store it in memory and write (append) it to a file. A write() followed by fsync() will guarantee (on most systems; read your system and filesystem docs carefully) that the data has been written to persistent storage (disk). These are the same mechanisms any database engine uses to get data into persistent storage.
On restart, reload the log. Occasionally trim the front of the log file so disk usage doesn't grow without bound. Or model the log file as a circular buffer the same size as what you can hold in memory.
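A minimal sketch of the append-and-sync part in Java (the class name is illustrative; FileDescriptor.sync() is the JVM's counterpart to fsync()):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Append-only log: write a record, then force it to the disk.
    public class AppendLog implements AutoCloseable {
        private final FileOutputStream out;

        public AppendLog(File file) throws IOException {
            out = new FileOutputStream(file, true); // true = append mode
        }

        public void append(byte[] record) throws IOException {
            out.write(record);
            // write() followed by fsync(): block until the data has been
            // handed to the storage device.
            out.getFD().sync();
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }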
Have you looked at (now Oracle) Berkeley DB for Java? The "Direct Persistence Layer" is actually quite simple to use; see the DPL docs.
It has several options for backups, comes with a few utilities, and runs embedded.
(Licensing: a form of the BSD license, I believe.)
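For a feel of the API, here is a minimal DPL sketch, assuming Berkeley DB Java Edition on the classpath (the class names and directory are illustrative):

    import java.io.File;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.persist.EntityStore;
    import com.sleepycat.persist.PrimaryIndex;
    import com.sleepycat.persist.StoreConfig;
    import com.sleepycat.persist.model.Entity;
    import com.sleepycat.persist.model.PrimaryKey;

    @Entity
    class LogRecord {
        @PrimaryKey(sequence = "ID") // key assigned from a sequence
        long id;
        byte[] payload;
    }

    public class DplExample {
        public static void main(String[] args) throws Exception {
            File dir = new File("dbenv");
            dir.mkdirs(); // the environment directory must exist

            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            Environment env = new Environment(dir, envConfig);

            StoreConfig storeConfig = new StoreConfig();
            storeConfig.setAllowCreate(true);
            EntityStore store = new EntityStore(env, "LogStore", storeConfig);

            PrimaryIndex<Long, LogRecord> byId =
                    store.getPrimaryIndex(Long.class, LogRecord.class);

            LogRecord rec = new LogRecord();
            rec.payload = new byte[] {1, 2, 3};
            byId.put(rec); // persisted; replay via byId.entities() on restart

            store.close();
            env.close();
        }
    }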
I am using SQLite to persist non-critical information to disk. The database I am working with is relatively small, up to 10 GB. At the same time, the workstation has plenty of RAM to keep it all in memory.
What I want is to reduce disk writes as much as possible; dumping the whole database to disk once an hour would be a brilliant solution.
The Java <-> SQLite connection is made via the org.xerial.sqlite-jdbc JDBC driver.
The connection string looks like:
"jdbc:sqlite:/disk/persistence.db"
If the data is not critical, just use an in-memory database, and never write it to disk at all.
If the data is more critical and you do want to save it, you can use PRAGMA synchronous = OFF or even PRAGMA journal_mode = OFF to avoid some writes, at the cost of possible data corruption.
If you want to avoid all writes, use an in-memory database, and manually make an on-disk copy with the backup API.
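A sketch of that last option with the Xerial driver, which exposes SQLite's online backup API through a non-standard "backup to" statement (verify this against your driver version; the table and paths are illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HourlyDump {
        public static void main(String[] args) throws Exception {
            // Keep the whole database in memory; note it lives only as
            // long as this connection stays open.
            Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE log (ts INTEGER, msg TEXT)");
                st.executeUpdate("INSERT INTO log VALUES (1, 'hello')");
                // Run this once an hour, e.g. from a ScheduledExecutorService:
                st.executeUpdate("backup to /disk/persistence.db");
            }
            conn.close();
        }
    }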
I have an existing database in a file. I want to load the database into memory to speed up my queries, because I'm running a lot of queries and the database isn't very large (<50 MB). Is there any way to do this?
50 MB easily fits in the OS file cache; you do not need to do anything.
If the file locking results in a noticeable overhead (which is unlikely), consider using the exclusive locking mode.
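For reference, that mode is enabled with a pragma on an open connection (a fragment, not a full program; conn is assumed to be a java.sql.Connection to the database):

    try (Statement st = conn.createStatement()) {
        // Hold the file lock for the lifetime of the connection instead
        // of re-acquiring it for every transaction.
        st.execute("PRAGMA locking_mode = EXCLUSIVE");
    }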
You could create a RAM drive and have the database use those files instead of your HDD/SSD-hosted files. If you have insane performance requirements, you could go for an in-memory database as well.
Before you go for any in-memory solution: what is "a lot of queries", and what is the expected response time per query? Chances are that the database program isn't the performance bottleneck, but rather slow application code, inefficient queries, or a lack of indexes.
I think SQLite does not support concurrent access to the database, which would waste a lot of performance. If writes occur rather infrequently, you could boost your performance by keeping copies of the database and having different threads read different SQLite instances (never tried that).
Neither of the solutions suggested by CL and Ray will perform as well as a true in-memory database, due to the simple fact of file-system overhead (irrespective of whether the data is cached and/or in a RAM drive; those measures will help, but you can't beat getting the file system out of the way entirely).
SQLite allows multiple concurrent readers, but any write transaction will block readers until it is complete.
SQLite only allows a single process to use an in-memory database, though that process can have multiple threads.
You can't load (open) a persistent SQLite database as an in-memory database (at least, the last time I looked into it). You'll have to create a second, in-memory database and read from the persistent database to populate it. But if the database is only 50 MB, that shouldn't be an issue. There are third-party tools that will then let you save that in-memory SQLite database and subsequently reload it.
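One way to do that load is the Xerial JDBC driver's non-standard "restore from" statement, the counterpart of "backup to" (a sketch only; verify against your driver version, and note the path is illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class LoadIntoMemory {
        public static void main(String[] args) throws Exception {
            Connection mem = DriverManager.getConnection("jdbc:sqlite::memory:");
            try (Statement st = mem.createStatement()) {
                // Copy the on-disk database into the in-memory one.
                st.executeUpdate("restore from /path/to/existing.db");
            }
            // Query `mem` from here on; keep the connection open, since the
            // in-memory database disappears when it is closed.
        }
    }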
I'm trying to migrate a MySQL table to MongoDB. My table has 6 million entries. I'm using Java with Morphia. When I have saved about 1.2 million of them, my memory is almost all consumed.
I've read that Mongo stores the data in memory and saves it to disk afterwards. Is it possible to send something like a commit to free some of the memory?
1) In terms of durability, you can tell the MongoDB Java driver (which Morphia is using) which strategy to use; see https://github.com/mongodb/mongo-java-driver/blob/master/src/main/com/mongodb/WriteConcern.java#L53. It's simply a trade-off between speed and durability: from NONE (not even connectivity issues will cause an error) up to FSYNC_SAFE (the data is definitely written to disk). A configuration sketch follows this answer.
For the internal details check out http://www.kchodorow.com/blog/2012/10/04/how-mongodbs-journaling-works/
2) Your whole data set is mapped to memory (that's why the 32-bit edition has a size limit of 2 GB), but it is only actually loaded when required. MongoDB leaves that to the operating system by using mmap. So as long as more RAM is available, MongoDB will happily load all the data it needs into RAM to make queries very quick. If no more memory is available, it's up to the operating system to swap out old pages. This has the nice effect that your data is kept in memory even if you restart the MongoDB process; only if you restart the server itself must the data be fetched from disk again. The downside is that the database process would probably have a better idea than the operating system of what should be swapped out first.
I'm not using MongoDB on Windows and haven't seen that message on Mac or Linux (yet), but the operating system should handle this for you (and automatically swap out pieces of information as required). Have you tried setting the driver to JOURNAL_SAFE (which should be a good compromise between data safety and speed)? With that setting, no data should be lost, even if the MongoDB process dies.
3) In general, MongoDB is built to use as much available memory as possible, but you might be able to restrict it with http://captaincodeman.com/2011/02/27/limit-mongodb-memory-use-windows/ - which I haven't tested, as we are using (virtual) Linux servers.
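A sketch of the write-concern setup from point 1), assuming the legacy 2.x Java driver that Morphia wrapped at the time (the host and the exact constant names should be checked against your driver version):

    import com.mongodb.Mongo;
    import com.mongodb.WriteConcern;

    public class WriteConcernSetup {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost", 27017);
            // JOURNAL_SAFE waits until the write has reached the on-disk
            // journal: slower than NONE, cheaper than FSYNC_SAFE.
            mongo.setWriteConcern(WriteConcern.JOURNAL_SAFE);
            // ... create the Morphia Datastore from this Mongo instance and
            // save entities as usual; they inherit the write concern ...
            mongo.close();
        }
    }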
If you just want to release some of the memory MongoDB is using, then after your data is processed and mongod is idle, you can run this command from the mongo shell:
use admin
db.runCommand({closeAllDatabases: 1})
Then you will see the mapped, vsize, and res figures output by mongostat go down a lot.
I have tried it, and it works. Hope it helps! ^_^
This may not be possible, but I thought I'd give it a try. I have a job that processes some data and makes one of three decisions for each item it processes: keep, discard, or modify/reprocess (because it's unsure whether to keep or discard it). This generates a very large amount of data, because reprocessing may break an item into many different parts.
My initial approach was to send the items to an ExecutorService that processed them, but because the number of items was large, I would run out of memory very quickly. Then I decided to offload the queue to a messaging server (RabbitMQ), which works fine, but now I'm bound by network I/O. What I like about RabbitMQ is that it keeps messages in memory up to a certain level and then dumps old messages to the local drive, so if I have 8 GB of memory on my server, I can still have a 100 GB message queue.
So my question is: is there any library with a similar feature in Java? Something I can use as a non-blocking queue that keeps only X items in the queue (either by number of items or by size) and writes the rest to the local drive.
Note: right now I'm only asking for this to be used on one server. In the future I might add more servers, but because each server is self-generating data, I would try to take messages from one queue and push them to another if one server's queue is empty. The library would not need network access, but I would need to access the queue from another Java process. I know this is a long shot, but I figured if anyone would know, it would be SO.
Not sure if it is the approach you are looking for, but why not use a lightweight database like HSQLDB and a persistence layer like Hibernate? You can keep your messages in memory, commit them to the database to save them on disk, and later query them with a convenient SQL query.
Actually, as Cuevas wrote, HSQLDB could be a solution. If you use the CACHED tables it provides, you can specify the maximum amount of memory used; data exceeding that limit is sent to the hard drive.
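A minimal sketch of that setup, assuming HSQLDB on the classpath (the connection properties and table layout are illustrative; check hsqldb.cache_rows against the HSQLDB docs for your version):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DiskBackedQueue {
        public static void main(String[] args) throws Exception {
            // File-backed database; cache_rows caps how many rows of
            // CACHED tables are held in memory at once.
            Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:file:queuedb;hsqldb.cache_rows=100000",
                    "SA", "");
            try (Statement st = conn.createStatement()) {
                // CACHED tables keep only part of their data in memory;
                // the rest lives on disk.
                st.execute("CREATE CACHED TABLE queue ("
                        + "id BIGINT IDENTITY, payload VARBINARY(1024))");
                st.executeUpdate("INSERT INTO queue (payload) VALUES (X'CAFE')");
            }
            conn.close();
        }
    }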
Use the filesystem. It's old-school, yet so many engineers get bitten by libraries because they are lazy. True, HSQLDB provides lots of value-added features, but in the context of being lightweight...
We are using SQLite (Xerial JDBC driver) in a Windows desktop Java application. Now we are moving to a client-server version of the same application, where multiple Java-based Swing clients will connect to the same SQLite db file on a designated Windows server PC. Please correct me if I'm wrong:
Is keeping the SQLite database file on a network share the only option for using SQLite in this mode, or is there some other solution that I am missing?
Will using SQLite increase the chances of DB corruption ?
I don't see a lot of concurrent update operations. There will be 5-10 clients trying to read and update the same DB. In that case, is it better to use an enterprise-grade DB (MySQL, Postgres)?
From the FAQ, the paragraph before the one quoted below:
SQLite uses reader/writer locks to control access to the database. (Under Win95/98/ME which lacks support for reader/writer locks, a probabilistic simulation is used instead.) But use caution: this locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time. On Windows, Microsoft's documentation says that locking may not work under FAT filesystems if you are not running the Share.exe daemon. People who have a lot of experience with Windows tell me that file locking of network files is very buggy and is not dependable. If what they say is true, sharing an SQLite database between two or more Windows machines might cause unexpected problems.
I would not share a SQLite database file over the network, as it appears you would be buying yourself nasty synchronization problems and hard-to-reproduce data corruption.
Put another way, you would be using a general file-sharing mechanism as a substitute for the server component of another DBMS. Those other DBMSs are specifically tested and field-hardened for multiple-client access; SQLite has great merits, but this isn't one of them.
This is a FAQ:
[...] We are aware of no other embedded SQL database engine that supports as much concurrency as SQLite. SQLite allows multiple processes to have the database file open at once, and for multiple processes to read the database at once. When any process wants to write, it must lock the entire database file for the duration of its update. But that normally only takes a few milliseconds. Other processes just wait on the writer to finish then continue about their business. Other embedded SQL database engines typically only allow a single process to connect to the database at once. [...]
Also read SQLite is serverless.
Whether SQLite is sufficient for your needs is impossible to tell. If you have long-running update transactions, locking the whole database might be a serious issue. Since you're using JDBC to access it, there shouldn't be many problems switching to another database engine if necessary.