I am running economic simulations. Hundreds or thousands of agents (objects) have to log data over time, and the data always has the same structure. It is typically composed of a number of Booleans, floats, and integers, and possibly arrays/lists (between 5 and 100 different variables). During the simulation the database has no read access. After the simulation the data will not be changed anymore. For every simulation I will create a new database. The current programming languages are Java for one project and Python for a second. It's also possible that in the future the project will run on a network. If it matters: the objects communicate via 0MQ. We are using MySQL and SQLite.
How do I connect the thousands of objects to the database? In the end, all the data should be in one database.
Currently we send the data via ZeroMQ messaging to one object that writes it into the database.
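For context, roughly what that collector object does today. This is only a simplified sketch in Java, assuming the JeroMQ library and the SQLite JDBC driver; the message format, table, and column names are made up for illustration.
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
// Single collector: agents PUSH records over 0MQ, this object PULLs them and batch-inserts via JDBC.
public class Collector {
    public static void main(String[] args) throws Exception {
        try (ZContext ctx = new ZContext();
             Connection db = DriverManager.getConnection("jdbc:sqlite:simulation.db")) {
            db.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS records(agent_id INT, tick BIGINT, payload TEXT)");
            db.setAutoCommit(false);
            ZMQ.Socket sink = ctx.createSocket(SocketType.PULL);
            sink.bind("tcp://*:5555");                        // agents connect and push here
            PreparedStatement insert = db.prepareStatement(
                "INSERT INTO records(agent_id, tick, payload) VALUES (?, ?, ?)");
            int pending = 0;
            while (!Thread.currentThread().isInterrupted()) {
                String msg = sink.recvStr();                  // e.g. "agentId|tick|serialized variables"
                String[] parts = msg.split("\\|", 3);
                insert.setInt(1, Integer.parseInt(parts[0]));
                insert.setLong(2, Long.parseLong(parts[1]));
                insert.setString(3, parts[2]);
                insert.addBatch();
                if (++pending >= 1000) {                      // commit in batches, not per message
                    insert.executeBatch();
                    db.commit();
                    pending = 0;
                }
            }
        }
    }
}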
If you use Python (and you're okay with storing the data in a non-SQL format), I would recommend the object database ZODB (by Zope). Essentially, you'll be creating a dictionary (which can contain literally any data type you want). I use this for my own research and it has been great. There's also a fairly reasonable learning curve, meaning it won't take you more than a few days to really master it.
Since you mention that during your simulation the database "has no read access", you can use ZODB as a standalone tool. However, if you find yourself running multiple simulations in parallel (say, multithreading or cloud computing), you will have to look into ZEO (also by Zope) to make this setup work.
With ZODB, each of your simulations need not be its own database; it can be a separate key of a single "simulation" database, where the data for each run is neatly nested underneath its key. You'll be able to say something like print Simulation['run99']['output'] just as easily as you would for, say, the 98th run of your simulation.
I have a web application in which I'm maintaining many static Maps to store my relevant information. The application is deployed on a server, and each and every hit to the server-side Java uses these Maps to match the key, get the appropriate result, and send it back to the client side. My code contains a rank-and-retrieval feature, so I have to read the entire keySet of each of these Maps.
My question is:
1. Is working with static variables better than storing this data in a local embedded DB like Apache Derby and then using it?
2. The use of this data is very frequent. So if I use a database, will that be a faster approach? Since I read the full keySet, a WHERE clause may not be useful in many operations.
3. How does the server's memory get impacted by holding data in static variables?
My number of Maps is fixed, but the size of the Maps keeps increasing. Please suggest the better solution.
If you want the data to be saved regularly, an embedded database like H2 makes sense. You then also have snapshots of the data, and development and structural changes are a bit safer.
A real database also has considerable power behind it: concurrency, caching, and so on. An embedded (file-based) database less so.
The problem with Maps is that data extraction can involve several levels of indirection. It is more versatile to have SQL queries with joins on the tables.
So SQL is more abstract (it does not prescribe the actual query implementation) and easier to test. SQL, for instance, relieves the developer from hand-coding reports.
So go for a database, IMHO, when you are really doing hard work.
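As a rough illustration of the embedded route (a sketch only, assuming the standard H2 JDBC driver; table and column names are made up):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
// Minimal embedded H2 usage: the database lives in a local file, no separate server needed.
public class EmbeddedH2Example {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection("jdbc:h2:./data/appdb")) {
            db.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS results(term VARCHAR(64) PRIMARY KEY, score INT)");
            // MERGE acts as insert-or-update keyed on the primary key
            PreparedStatement upsert =
                db.prepareStatement("MERGE INTO results KEY(term) VALUES (?, ?)");
            upsert.setString(1, "someKey");
            upsert.setInt(2, 42);
            upsert.executeUpdate();
            try (ResultSet rs = db.createStatement()
                    .executeQuery("SELECT term, score FROM results ORDER BY score DESC")) {
                while (rs.next()) {
                    System.out.println(rs.getString("term") + " -> " + rs.getInt("score"));
                }
            }
        }
    }
}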
What you might want to consider is storing the data in the map when it is searched for.
For instance, if a user searches for something specific, that something is stored in the map so that the next user who searches for that gets the data directly from the map rather than the database.
There are some downsides though: you need to make sure that if the data is changed in the database, the map/cache is cleared or updated with the new data, so as to prevent feeding outdated data to the user.
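A minimal sketch of that read-through idea (the database lookup is just a placeholder function here, not your real query code):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
// Read-through cache: check the map first, fall back to the database on a miss.
public class SearchCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loadFromDb;   // placeholder for the real DB query

    public SearchCache(Function<String, String> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    public String lookup(String term) {
        // Hits the database only on the first request for a given term
        return cache.computeIfAbsent(term, loadFromDb);
    }

    public void invalidate(String term) {
        // Call this whenever the underlying row changes, to avoid serving stale data
        cache.remove(term);
    }
}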
As for the impact on the server's memory, it depends on the size of the data you're storing. It's hard to give you a precise answer, but you can however test that on your own:
long memoryBefore = Runtime.getRuntime().freeMemory();
// populate your map
long memoryAfter = Runtime.getRuntime().freeMemory();
System.out.println(memoryBefore - memoryAfter);
That should give you the number of bytes used (more or less, depending on the operations you run between memoryBefore and memoryAfter, as you may have instantiated other classes/variables unrelated to the Map).
I'm working on an application for a pharmacy. Basically, this application has a class "Item" and another class "SellingInvoice" which logs selling processes.
So my question is: the pharmacy is expected to have about ten thousand products in stock, and I'm storing these products in a linked list of type Item, and storing the invoices in a linked list as well; on closing the app I save them using an ObjectOutputStream and reload them upon start. Is that bad practice? Do I have to use a database instead?
My second question is: if I continue using a LinkedList and ObjectOutputStream, what is better for performance and memory, storing the actual Item as a field in the invoice class, or just its ID and then fetching the Item by that ID reference when needed?
Thanks in advance.
It is a bad idea to use ObjectOutputStream like that.
Here are some of the reasons:
If your application crashes (or the power fails) before you "save", then all changes are lost.
Saving "all objects" is expensive.
Serialized objects are opaque. It is only practical to look at them from Java code.
Serialized objects are fragile. If your application classes change, you may find that old serialized objects can no longer be read. That's bad enough, but now consider what happens if your client wants to look at pharmacy records from 5 years ago ... from a backup tape.
Serialized objects provide no way of searching ... apart from reading all of the objects one at a time.
Designs which involve reading all objects into memory do not scale. You are liable to run out of memory. Or compromise on your requirements to avoid running out of memory.
By contrast:
A database won't lose any changes that have been committed. Databases are much more resilient to things like application errors and system-level failures.
Committing database changes is not as expensive, because you only write data that has changed.
Typical databases can be viewed, queried, and if necessary repaired using an off-the-shelf database tool.
Changing Java code doesn't break the database. And for some schema changes, there are ways to migrate the existing database schema and records to match an updated data model.
Databases have indexes and query languages for implementing efficient search.
Databases scale because the primary copy of the data is on disk, not in memory.
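To make the contrast concrete, here is a rough sketch of the database route over plain JDBC with embedded H2 (table and column names are invented; the same idea works with any JDBC database):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
// Items live in their own table and invoices reference them by id, so every change
// is committed as it happens and nothing is lost if the application crashes.
public class PharmacyDb {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection("jdbc:h2:./data/pharmacy")) {
            db.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS item(" +
                "id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, " +
                "name VARCHAR(200), price DECIMAL(10,2))");
            db.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS invoice_line(invoice_id BIGINT, item_id BIGINT, quantity INT)");

            PreparedStatement addItem =
                db.prepareStatement("INSERT INTO item(name, price) VALUES (?, ?)");
            addItem.setString(1, "Aspirin 500mg");
            addItem.setBigDecimal(2, new java.math.BigDecimal("3.50"));
            addItem.executeUpdate();

            // Look an item up by name instead of scanning a list held in memory
            PreparedStatement find =
                db.prepareStatement("SELECT id, price FROM item WHERE name = ?");
            find.setString(1, "Aspirin 500mg");
            try (ResultSet rs = find.executeQuery()) {
                while (rs.next()) {
                    System.out.println("id=" + rs.getLong("id") + " price=" + rs.getBigDecimal("price"));
                }
            }
        }
    }
}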
I'm fairly new to Android development and I have a general how-to question:
My app gets sensor data from the step detector (detected steps get added up).
Now I need to store those steps (which will be a lot of data).
The steps should be stored like this:
If the steps are from today, they are stored on a per-hour basis.
Otherwise, they are stored on a per-day basis.
SharedPreferences is out, as it only stores key-value pairs.
But can SQLite handle this? Or is there any other way?
A future feature could be to sync this data with a server.
This could end up being thousands of entries, and the app will also support other large data sets which need to be stored in a similar way.
Try using the Realm NoSQL database for this. The point is, you can save the entire database on the SD card as a separate file for each day and process it later. It is native and works very fast with large amounts of data. You can process all your readings later on: open the database, transform the readings (perhaps interpolate values for older ones to shrink the data in size), then upload it to the cloud and delete the database file.
But anyway, the database is just an implementation detail; consider abstracting out all your storage operations so you can replace the DB later on.
As far as I know, SQLite stores all tables in a single file, so you will need a column for the date and all records will be stored in a single table. Realm is more flexible for this task.
SQLite can be used; it will be there as long as your application exists on the device. However, if you want, you can use a cloud service: Azure provides a simple and easy-to-use App Service with Easy Tables, where you can directly call the APIs and it internally takes care of making the connection and inserting the data into the table. You can use the Free Tier of App Service to test the concept.
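If you go the SQLite route, a minimal sketch of a helper with a timestamp column could look like this (class, table, and column names are only examples):
import android.content.ContentValues;
import android.content.Context;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;
// One table holds every entry; the timestamp column decides whether a row belongs to
// an hourly bucket (today's data) or a daily bucket (older data).
public class StepsDbHelper extends SQLiteOpenHelper {

    public StepsDbHelper(Context context) {
        super(context, "steps.db", null, 1);
    }

    @Override
    public void onCreate(SQLiteDatabase db) {
        db.execSQL("CREATE TABLE steps (" +
                "bucket_start INTEGER NOT NULL, " +   // epoch millis of the hour or day
                "granularity TEXT NOT NULL, " +       // 'HOUR' or 'DAY'
                "step_count INTEGER NOT NULL)");
    }

    @Override
    public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
        db.execSQL("DROP TABLE IF EXISTS steps");
        onCreate(db);
    }

    public void addBucket(long bucketStartMillis, String granularity, int count) {
        ContentValues values = new ContentValues();
        values.put("bucket_start", bucketStartMillis);
        values.put("granularity", granularity);
        values.put("step_count", count);
        getWritableDatabase().insert("steps", null, values);
    }
}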
I have a Spring based Java application. I have two types of data.
The first is the number of indexed documents in my application. Documents are indexed only 2 or 3 times a week.
The second is the number of searches. Many users search for things in my application, and I want to visualize the search terms. A lot of data flows in at any time.
What do you suggest me to store such kind of data using Java?
For the first one, I think I can use RRD or something like that, or I can even write the data into a table in MySQL, etc.
For the second one, I can use a more sophisticated database, and I can put an in-memory database such as H2 between my sophisticated database and the user interface.
Any ideas?
Have you considered using Redis? It has great support for atomic increments if you want to track search counts, and it's also very fast since data is stored in memory.
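For example, a sketch using the Jedis client (key names are arbitrary), keeping a per-term counter plus a sorted set for the top-searches visualization:
import redis.clients.jedis.Jedis;
// Each search bumps a counter; a sorted set keeps terms ordered by popularity.
public class SearchCounter {
    public static void main(String[] args) {
        try (Jedis redis = new Jedis("localhost", 6379)) {
            String term = "economics";
            redis.incr("searches:" + term);              // atomic per-term counter
            redis.zincrby("searches:top", 1, term);      // running ranking for visualization
            // Top 10 search terms, highest score first
            System.out.println(redis.zrevrangeWithScores("searches:top", 0, 9));
        }
    }
}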
For a thick-client project I'm working on, I have to remotely connect to a database (IBM i-series) and perform a number of SQL-related tasks:
Download/Update a set of local/offline 'control' data - this data may have changed between runs unnoticed.
On command, download data from multiple (15-20) tables and store separately into a single Java object. The names of the tables are known, but the schema name changes between runs and can change inter-run (as far as I know, PreparedStatements do not allow one to dynamically insert the schema).
I had considered using joins/unions/etc to perform all of these queries as one, but the project requires me to have in-memory separations between table data (instead of one big joined lump).
Perform between 2 and 100+ repetitions of (2)
The last factor is that this needs to be run on high-latency (potentially dial-up) network connections using Java 1.5 on the oldest computers possible.
Currently I run 15-20 dynamically constructed PreparedStatements, but I know this to be rather inefficient (I measured, so as to avoid premature optimization à la Knuth).
What would be the most efficient and error-tolerant method of performing these tasks?
My thoughts:
Regarding (1), I really have no idea other than checking the entire table against the new table, at which point I feel I might as well just download the new (potentially and likely unchanged) table and replace the old one, but this takes more time.
For (2): Ideally I'd be able to construct something similar to an array of SELECT statements, send them all at once, and have the database return one ResultSet per internal query. From what I understand, however, neither Statement nor PreparedStatement support returning multiple ResultSet objects.
Lastly, the best way I can think of doing (3) is to batch a number of (2) operations.
There is nothing special about having moving requirements, but the single most important thing when talking to most databases is having a connection pool in your Java application and using it properly.
This also applies here. The IBM i DB2/400 database is quite fast, and the database driver available in the jt400 project (type 4, no native code) is quite good, so you can pull over quite a bit of data in a short while simply by generating SQL on the fly.
Note that if you only have a single schema, you can tell the connection which one you need and can then use non-qualified table names in your SQL statements. Read the JDBC properties in the InfoCenter very carefully; it is a bit tricky to get right. If you need multiple schemas, "naming=system" allows for library lists, i.e. a list of schemas in which to look for the tables, which can be very useful when done correctly. The IBM i folks can help you here.
That said, if the connection is the limiting factor, you might have a very strong case for running the "create object from tables" Java code directly on the IBM i. You should already now prepare to be able to measure the traffic to the database, either with network monitoring tooling, using p6spy, or simply by going through a proxy (perhaps even a throttling one).
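A rough sketch of that connection setup with the jt400 JDBC driver (host, credentials, library, and table names are placeholders; verify the exact property values against the jt400/InfoCenter documentation):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
// System naming plus a library list lets SQL use unqualified table names;
// the driver resolves them against the listed schemas.
public class IbmIExample {
    public static void main(String[] args) throws Exception {
        Class.forName("com.ibm.as400.access.AS400JDBCDriver");
        String url = "jdbc:as400://MYHOST;naming=system;libraries=LIBONE,LIBTWO";
        try (Connection db = DriverManager.getConnection(url, "USER", "PASSWORD");
             PreparedStatement query = db.prepareStatement("SELECT * FROM CONTROLTBL");
             ResultSet rs = query.executeQuery()) {
            while (rs.next()) {
                // map each row into the in-memory object here
            }
        }
    }
}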
Ideally, you would have the database group provide you with a set of stored procedures to optimize the access to the database.
Since you don't have access, you may want to ask them if they have row-level timestamp data in the database to see when records were modified; this way you can select only the data that has changed since some point in time.
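For instance, if such a timestamp column exists, the delta fetch is just a query with a cutoff parameter (the column and table names below are hypothetical):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;
// Pull only the rows modified since the last successful sync instead of the whole table.
public class DeltaFetch {
    public static List<String> changedSince(Connection db, Timestamp lastSync) throws SQLException {
        List<String> keys = new ArrayList<>();
        try (PreparedStatement query = db.prepareStatement(
                "SELECT ROW_KEY FROM CONTROLTBL WHERE LAST_MODIFIED > ?")) {
            query.setTimestamp(1, lastSync);
            try (ResultSet rs = query.executeQuery()) {
                while (rs.next()) {
                    keys.add(rs.getString("ROW_KEY"));
                }
            }
        }
        return keys;
    }
}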
What @ThorbjørnRavnAndersen is suggesting is moving the database code onto the IBM host and connecting to it via RMI or JMS from the client. So the server code would be an RMI or JMS server that accesses the database on your behalf and returns Java objects instead of bringing SQL result sets across the wire.
I would pass along your requirements to the database team and see if they can't do something for you. I'm sure they don't want all these remote clients bringing all the data down each time, so it would benefit them as much as it would benefit you.