Java in-memory database-like object - java

I'm looking at a project that will benefit from an in-memory database-like object. I will have perhaps one or two thousand objects with the same structure, all inheriting from an abstract class. There will be a couple of string fields, an int, perhaps an enum or two (or maybe even a set of enums), and then a set of Strings. There would need to be a transient boolean field as well, but that likely wouldn't be a concern.
These objects would be instantiated and constructed from preset data, although additional ones can be created beyond that if need be. They may be stored in an XML file or something similar. I'd rather not hardwire the entire thing, of course, and using a local database like SQLite feels like overkill.
Storing these objects would be relatively simple if not for one thing: I want the user to easily be able to find the object they want from ANY of the values, most of which would be unique. This rules out a HashMap unless I want to wrap a massive bunch of them, which is hardly ideal. That leaves me looking for a sort of indexed, in-memory database-like object that supports retrieval via any field of the object. It may not have to store the objects directly, but could assemble them upon retrieval, or retrieve a "row" based on one field, get another field from the same "row" that serves as a sort of key, and then retrieve the object from a single HashMap based on that key.
In short, the idea is to easily and quickly retrieve objects with the same fields based on any field they contain. I've seen a variety of different libraries and such that may do this sort of thing, but there's a real myriad of these things out there. Whatever may work would need to be free and compatible with a variety of open licenses.

For an "in memory database-like object" please don't re-invent the wheel. However simple or complex your needs are, using a library that has already been through years (or decades) of debugging and optimization is absolutely the right thing to do if it can be made to suit your needs.
SQLite is a fine choice. It is not Java, however, and requires a separate JDBC driver (no JDBC driver for SQLite is maintained by the SQlite project).
Another RDBMS which is small footprint, reasonably lightweight, and robust is HSQLDB which includes a version 4.1 JDBC driver as part of the project. It is 100% Java. It provides native support for in memory tables. Any time I use the word "database" and "thousands of objects", my thoughts immediately go to HSQLDB.
The project is managed by the HSQLDB development group and is available here: http://www.hsqldb.org

I suggest going with the Caching solutions provided by Guava framework
https://code.google.com/p/guava-libraries/wiki/CachesExplained

A little late but you could use: http://www.mapdb.org
From their website:
MapDB is an embedded database engine for Java. It provides Maps and other collections backed by disk or off-heap memory storage. MapDB is free under Apache License.

Related

XML vs JSON vs SQLite for only reading data

I have a collection of 350 locations in the United States with each containing about 25 subcategories. The data structure looks something like this:
Location (ex: Albany, NY)
--> Things to do
--> Population
... 23 More
Which of the following would be best for loading this data into the app: JSON, XML, or SQLite? Just to clarify, I don't need to edit this data in any way. I simply need to read it so that the information can be loaded into TextView's.
Edit:
I'm attempting to implement Room and XML and so far the XML seems to be the simplest to implement. Is it bad practice to use the XML solution? It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices. Would it still be a better practice to implement the Room solution?
Undoubtedly, among all of these RDB is the most efficient one, both in terms of storage and query response. I personally do not see any point in using xml and json as these have been traditionally used for exchange of data and are inefficient for storage and queries.
I would suggest that you evaluate the following:
a) how are you going to store the data: single file vs multiple files(for example by subject)
b) are you going to be doing updates on the strings or just appending(SQL will be better suited for updates but if it just reading data after a batch processing flat files might be better suited)
c) How complex are the queries that you want to implement.XML and SQL are better suited for queries that might try to address metadata (date stored, original location address, etc.) than JSON
Once you determine what you want to optimize: whether it is on adding metadata, fast updates, fast querying, ease of storage, fast retrieval of subject files, etc. then you can decide the tradeoffs with other less important goals. In this specific instance the devil is very much in the details.
In most cases it would be better to use a database because it increases readability and maintainability. Especially if you want to show these information inside a kind of list-view. If you use JSON or XML you'll have to parse or write a lot of code to switch between things or load them with a good performance. Consider the case of using Room, LiveData and a RecyclerView, this will reduce the code you'll need and improve( a lot) performance and readability of your app code. By the way you should provide more information about how you want to use and where you want to show these information. XML (or the Android resource system) should be used if you plan to use the resource system itself with its qualifiers to reduce your work. Most of the time JSON is used to communicate outside or with another app in an easy way or for REST requests/responses.
The one option that wouldn't make sense to use at all for your use case is SQLite. Unless you plan on running specific queries on the data for preprocessing before loading them into your view it doesn't worth the overhead (even if I don't imagine is a lot with 350 locations)
XML vs JSON serve the same usecase without much difference, read up their specifics in this website: https://www.json.org/xml.html
I would personally go for JSON due to the simplicity of the format.
Edit:
#simo-r Argument is also a valid one in regards to readability of your code. While there are libraries that can make reading json/xml easier by default Android has really good SQLite support so it might make sense to use it. Ultimately it is in your personal preference and where you see the project growing.
I have a collection of 350 locations in the United States with each containing about 25 subcategories.
The main issue is scalability
Will you, in the next few years, keep just a few hundred locations, or do you imagine, that, if your software becomes successful, your data would grow to many thousands of locations?
If yes: choose SQLite because it could store many records, in an efficient way. Don't forget to have a good database schema with appropriate indexes. See this and read about database normalization. Also, an SQLite database could later be migrated (with efforts) to PostGreSQL.
If no (your data has just a few megabytes): keep JSON or XML. The data is in the page cache.
Consider also YAML, and sometimes a mixed approach.
don't forget to document how your data is organized and accessed.
See also the data persistence chapter of this draft report
If you gonna simply bind data into text views, you can just store the text as strings.xml. As simple as that.
Go with JSON.
Advantages :
Low overhead ( Vs SQLite )
Lightweight parsers like Jackson available using which you can easily convert your data into custom object or data-structure if you need.
Maintainable. As most of the developers understand the format.
I would suggest using JSON. Reason below
JSON vs XML
JSON is lightweight than XML and would take fewer resources(network and storage). Performance of the app increases.
JSON parsing is easy and as mentioned above, its trivial.
JSON is friendly to javascript, in case it's required.
JSON vs SQLite
350 data set with 23 attributes, can be easily managed by JSON. RDBMS is not required.
SQLite becomes an overhead. It's an extra layer and layer comes with a cost. Especially if the application is containerized, the architecture becomes complicated. One needs to deal with volume mapping etc, in case of JSON you can keep the data as part of the application code.
Importantly, since data is static, keep the application stateless by keeping the data alongside the codebase. This makes lot more sense from architectural perspective.
Problem
You have a fixed set of information with a simple structure that you wish to deliver to clients.
Questions to Reflect On
Do I expect this information to significantly changed or modified ever?
Do I expect to increase the amount of information available?
What kind of help do I have? Do they have a background in software engineering or is it someone of a different profession that has to wear a lot of hats?
What is the scale of the project? Are you expecting a large amount of users or just people interested in a very niche application?
JSON or XML
JSON and XML provide similar services: they are both data transfer protocols. If the information is not expected to grow both might be a great option. If its public information, just serve these files statically over nginx. You can point a worker with limited software engineering experience to update these files; they're just files in a folder presented in a human readable format... its extremely simple to do. These updates should be minor and infrequent.
JavaScript Object Notation(JSON) Pros
solid browser and backend support
small size and fast parsing by the javascript engine
very human readable, easy for the untrained eye to make changes
Extensible Markup Language(XML) Pros
standard meta-data option
supports namespaces
solid backend support and is often baked into frameworks
This article explains XML and JSON differences really well (in 2020) if these highlights were not sufficient for your investigation.
Database System
There are a plethora of database systems out there. Their job is to efficiently retrieve specific information from a large volume of data stored. The key reason to use databases is scalability. Scalability means a number of things; I view it as adapting to drastic change. If you expect this information to frequently change or grow, go with a database.
Object Relational Mapping (ORM)
Databases can be cumbersome to use. I would recommend using an ORM on top of them. These encapsulate a database and makes it more user friendly (language specific). Room makes sense in your use case especially for java android development. Encapsulation also allows you to migrate to other databases later without change your code. Here's a good article that discusses Room and SQLite!
Miscellaneous
"Is it bad practice to use an XML solution?"
No. The important thing is that it works, is understandable, and runs efficiently. Just keep in mind that XML and JSON are data transfer protocols and they do THAT job well. This stackoverflow discussion may be helpful to gain a better picture of what that means; be sure to read more than just the accepted answer.
"It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices."
Although testing for functionality is great, keep in mind that your test is not a load test and does not verify what you're trying to confirm. I would explore load testing, Wikipedia is a good place to start!

Can a streaming collection be implemented in Java?

I needed to implement a utility server that tracks few custom variables that will be sent from any other server. To track the variables, a key value collection, either JDK defined or custom needs to be used.
Here are few considerations -
Keeping all the variables in memory of the server all the time is memory intensive.
This server needs to be a very lightweight server and I do not want heavy database operations.
Is there a pre-defined streaming collection which can serialize the data after a threshold memory and retrieve it on need basis?
I hope I am clear in defining the problem statement.
Please suggest if any other better approach.
this thing looks very promising, but is in development stage...
JDBM3
Edit Current version of the file backed collections: MapDB.
Database
What you've described sounds exactly like you should use a database (i.e. indexed key/value store, too big for memory but want performance benefits of in-memory caching where possible).
I'd recommend a lightweight embedded database such as H2 - it's small, fast and should suit your purposes very well.
Have you thought of using an on the shelf nosql queue value store? Redis for example?
If you want it java only you have the option of using a lib like ehcache, it would have the functionalities you need.

C#/Java Best practice to save data locally

Lets say I want to make a program on c#.net for a video club to store clients and theirs rents.
What is the best and modern way (and standalone way) to store this data? xml, binary seriliazation, sqlite, access?
I will also need to query often the data. For example a client come, I search him via name, I find him and I add him a new rent. Also (now or on future) there will be data for dvds too (add dvds and on short there will relation between dvds and clients). Its like a database but because database is not standalone I want my program works only if user have on his pc .net installed. I could use mysql but this needs mysql server installed...
What you think is the best and most modern solution?
various sql's are good enuff: sqlite, sql ce, sql express.
Access may be the easiest starting solution as you can visualise the data without much trouble.
if it is not critical data, simply store it in an xml or json file using xml to object mapping like JAXB or xsd.exe or JSON to object mapping like GSON.
If it is critical best is to go with DB.
Since it's a small solution use either ms sql express or mysql. Both should be free (as in free beer).
Sql express is best for .net, mysql for java.
You can use an embedded database. If you don't like SQL databases (some can be embedded) you can try one of the many free "NoSQL" database http://nosql-database.org/ has a list of 122.
I think the best solution is the simplest which works. I don't think the most modern is necessarily going to be the best. For example, I'd be looking at the number of members. Is this number as big as from a large movie store chain? Or is for a small group of people? If only a small number, is it possible for you to use Excel or similar?
Are you choosing C# because you want to learn it or are you driven by the end result. i.e. a working and useful asset tracking system? (Which many already exist of course)
Other than that, many databases can be stored in a single file and are quite easy to create. I think there is an flat file driver for SQLite, and JDBC certainly has one.
You may also wish to consider the need to inform your members of your privacy policy with dealing with their personal information (eg. How do you keep private Mr Bloggs's dirty movie rental collection). We often forget this stuff in our eagerness for creating a cool application!
Good luck!

java embedded database w/ ability to store as one file

I need to create a storage file format for some simple data in a tabular format, was trying to use HDF5 but have just about given up due to some issues, and I'd like to reexamine the use of embedded databases to see if they are fast enough for my application.
Is there a reputable embedded Java database out there that has the option to store data in one file? The only one I'm aware of is SQLite (Java bindings available). I tried H2 and HSQLDB but out of the box they seem to create several files, and it is highly desirable for me to have a database in one file.
edit: reasonably fast performance is important. Object storage is not; for performance concerns I only need to store integers and BLOBs. (+ some strings but nothing performance critical)
edit 2: storage data efficiency is important for larger datasets, so XML is out.
Nitrite Database http://www.dizitart.org/nitrite-database.html
NOsql Object (NO2 a.k.a Nitrite) database is an open source nosql
embedded document store written in Java with MongoDB like API. It
supports both in-memory and single file based persistent store.
H2 uses only one file, if you use the latest H2 build with the PAGE_STORE option. It's a new feature, so it might not be solid.
If you only need read access then H2 is able to read the database files from a zip file.
Likewise if you don't need persistence it's possible to have an in-memory only version of H2.
If you need both read/write access and persistence, then you may be out of luck with standard SQL-type databases, as these pretty much all uniformly maintain the index and data files separately.
Once i used an object database that saved its data to a file. It has a Java and a .NET interface. You might want to check it out. It's called db4o.
Chronicle Map is an embedded pure Java database.
It stores data in one file, i. e.
ChronicleMap<Integer, String> map = ChronicleMap
.of(Integer.class, String.class)
.averageValue("my-value")
.entries(10_000)
.createPersistedTo(databaseFile);
Chronicle Map is mature (no severe storage bugs reported for months now, while it's in active use).
Idependent benchmarks show that Chronicle Map is the fastest and the most memory efficient key-value store for Java.
The major disadvantage for your use case is that Chronicle Map supports only a simple key-value model, however more complex solution could be build on top of it.
Disclaimer: I'm the developer of Chronicle Map.
If you are looking for a small and fast database to maybe ship with another program I would check Apache Derby I don't know how you would define embedded-database but I used this in some projects as a debugging database that can be checked in with the source and is available on every developer machine instantaneous.
This isn't an SQL engine, but If you use Prevayler with XStream, you can easily create a single XML file with all your data. (Prevayler calls it a snapshot file.)
Although it isn't SQL-based, and so requires a little elbow grease, its self-contained nature makes development (and especially good testing) much easier. Plus, it's incredibly fast and reliable.
You may want to check out jdbm - we use it on several projects, and it is quite fast. It does use 2 files (a database file and a log file) if you are using it for ACID type apps, but you can drop directly to direct database access (no log file) if you don't need solid ACID.
JDBM will easily support integers and blobs (anything you want), and is quite fast. It isn't really designed for concurrency, so you have to manage the locking yourself if you have multiple threads, but if you are looking for a simple, solid embedded database, it's a good option.
Since you mentioned sqlite, I assume that you don't mind a native db (as long as good java bindings are available). Firebird works well with java, and does single file storage by default.
Both H2 and HSQLDB would be excellent choices, if you didn't have the single file requirement.
I think for now I'm just going to continue to use HDF5 for the persistent data storage, in conjunction with H2 or some other database for in-memory indexing. I can't get SQLite to use BLOBs with the Java driver I have, and I can't get embedded Firebird up and running, and I don't trust H2 with PAGE_STORE yet.

Highest Performance Database in Java

I need ideas to implement a (really) high performance in-memory Database/Storage Mechanism in Java. In the range of storing 20,000+ java objects, updated every 5 or so seconds.
Some options I am open to:
Pure JDBC/database combination
JDO
JPA/ORM/database combination
An Object Database
Other Storage Mechanisms
What is my best option? What are your experiences?
EDIT: I also need like to be able to Query these objects
You could try something like Prevayler (basically an in-memory cache that handles serialization and backup for you so data persists and is transactionally safe). There are other similar projects.
I've used it for a large project, it's safe and extremely fast.
If it's the same set of 20,000 objects, or at least not 20,000 new objects every 5 seconds but lots of changes, you might be better off cacheing the changes and periodically writing the changes in batch mode (jdbc batch updates are much faster than individual row updates). Depends on whether you need each write to be transactionally wrapped, and whether you'll need a record of the change logs or just aggregate changes.
Edit: as other posts have mentioned Prevayler I thought I'd leave a note on what it does:
Basically you create a searchable/serializable object (typically a Map of some sort) which is wrapped in a Prevayler instance, which is serialized to disk. Rather than making changes directly to your map, you make changes by sending your Prevayler instance a serializable record of your change (just an object that contains the change instruction). Prevayler's version of a transaction is to write your serialization changes to disk so that in the event of failure it can load the last complete backup and then replay the changes against that. It's safe, although you do have to have enough memory to load all of your data, and it's a fairly old API, so no generic interfaces, unfortunately. But definitely stable and works as advertised.
I highly recommend H2. This is a kind of "second generation" version of HSQLDB done by one of the original authors. H2 allows us to unit-test our DAO layer without requiring an actual PostgreSQL database, which is awesome.
There is an active net group and mailing list, and the author Thomas Mueller is very responsive to queries (hah, little pun there.)
I don't know if it is the fastest option, but I've been very satisfied with H2 whenever I've used it. It's written by the same person who originally wrote Hypersonic (which later became HSQLDB).
Another option that is allegedly very fast is Prevayler.
It is a bit of an old question, but these days there is a whole lot of databases that have a level of performance of 20,000/s. Which database to chose depends on data structure and type of queries you'd like to be making. It also depends on overall volume.
We had similar problem with large volume of time series data, about 300,000 rec/s and we ended up writing a new database, with simple enough API and decent performance. It can do about 2,000,000 object writes/s and we did away without ORM.
It later evolved into QuestDB.
Try the following, it performs really well with Hibernate and other ORM frameworks
http://hsqldb.org/
Chronicle Map is an embeddable pure Java persistent database, providing a simple java.util.Map interface. It withstands about 1 million queries/updates per second from a single thread, consistent read/write performance and scales almost linearly to the number of cores in the machine.
Here are some recent performance research with actual numbers:
Comparison of Jetbrains Xodus, Oracle Berkeley DB JE BTree, MapDB TreeMap, Chronicle Map and H2 MVStore Map
LmdbJava Benchmarks
I would give a try to OrientDB.
Terracotta might also be an answer for you. It allows multiple VMs to share objects so you can distribute load etc...
You can also check out db4o
If you want to store all of your data in memory, you might want to look at Prevayler.
I've never used it myself, but it seems like a much better solution than using a relational database for those cases in which all of your data can be stored in memory.
Berkeley DB for Java is a fast in memory database, extremely useful for simple object graphs.
hsqldb is quite fast, but it is not ACID transaction-safe. The fastest java-database I know is db4o: benchmarks.
Edit: Please notice that Prevayler is not a database, see http://www.prevayler.org/wiki.jsp?topic=PrevaylerIsNotADatabase. If you're out of RAM, you're out of luck.
H2 is truly fantastic, indeed, in memory, normal server and transactional, you have it all. However It doesn't compare in performance to the object databases, I see Db4o mentioned, I have had much better performance with Neodatis in fact, and everything nicely set up in Maven repositories. Although not very robust, like a Ferrari, fast but not a truck like Oracle.
You can try CSQL (available under open source and enterprise version) It provides 30X performance improvement over disk based database systems and provides JDBC interface. It can be configured to work as stand alone main memory database or as a transparent cache to MySQL, Postgres, Oracle databases.

Categories

Resources