Dynamic data storage with search and mapping for a Java project


I'm looking for a good dynamic data store, preferably Java-based or at least very easy to use from Java.
The main problem in my project is that our data structures will not remain stable; they will change from time to time. A conventional relational database will struggle here, because dropping and adding columns is pretty risky. That suggests a NoSQL, XML-based, or even file-based store.
All input comes from external sources: a SOAP callback, a JSON callback, a CSV file import, or manual entry. Based on that input I have to create entities and then fill them with data.
The last thing I have to keep an eye on is bringing unstructured, semi-structured, and differently structured data into a unified form. It would also be nice to be able to maintain a huge amount of data in acceptable time.
Any ideas?

HyperSQL is open source and free.
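Whichever store is chosen, the "unified form" requirement from the question can be illustrated in plain Java. The following is only a minimal sketch under assumptions: the class name `DynamicRecord`, the field-name normalization rule, and the naive comma splitting are all hypothetical and not part of any library mentioned here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical schema-free record that any importer (SOAP, JSON, CSV, manual) could fill.
class DynamicRecord {
    private final String entityType;
    private final Map<String, Object> fields = new LinkedHashMap<>();

    DynamicRecord(String entityType) {
        this.entityType = entityType;
    }

    // Field names are normalized so "First Name" and "first_name" share one key.
    DynamicRecord set(String name, Object value) {
        fields.put(normalize(name), value);
        return this;
    }

    Object get(String name) {
        return fields.get(normalize(name));
    }

    String entityType() {
        return entityType;
    }

    Map<String, Object> fields() {
        return Collections.unmodifiableMap(fields);
    }

    static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
    }

    // Naive CSV handling (no quoting support); assumed good enough for a sketch.
    static DynamicRecord fromCsv(String entityType, String headerLine, String dataLine) {
        String[] headers = headerLine.split(",");
        String[] values = dataLine.split(",", -1);
        DynamicRecord record = new DynamicRecord(entityType);
        for (int i = 0; i < headers.length && i < values.length; i++) {
            record.set(headers[i], values[i].trim());
        }
        return record;
    }
}
```

New columns in an input file simply become new keys, so no schema change is needed on the Java side; how such a record is persisted is left open.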

Related

How to expand code/description to a complex object?

I want to present a list of the names/basic attributes of some complex objects (i.e. they are comprised of multiple collections of other objects) in a recycler view, then get the full object on user selection. For example, the top level objects are "Play Scripts", and each contains a number of "Spoken Lines" spoken by one of the "Actors" associated with the Play Script.
I'm trying to use the Android Architecture Components to do this and have (using Florian's tutorials at codinginflow.com) successfully used Room to create a simplified Play_Script class, DAO and Repository. I've also created some basic REST web services in ASP.NET which can serve up data from a MySQL DB.
It strikes me that the path that I am going down will perform poorly and use excessive network bandwidth getting lots of data that I won't use. I'm getting every Play Script (including its Spoken Lines etc) just so that I have the Play Script "Name" and "Description" attributes to populate the Recycler.
In the olden days, I'd just "SELECT ID, Name, Description FROM Play_Script" and once the user had made their choice, I'd use the ID as the key to get everything else that I needed. I suspect that I'm missing something fundamental in the design of my data entities but can't come up with any keywords that would let me search for examples of this common sort of task being done well (/at all).
Please can you help this SO noob with his 1st question?
Cheers,
Z
Update 15 May:
Though I haven't had a response, from what I've been reading in recent weeks (e.g. re Dependency Injection) I suspect that there is no blanket approach for this sort of thing in Android development. It appears that people generally either retrieve extensive data and then use what they require or else build multiple Web Service APIs to return sparse data that includes keys that the client can use to expand when required. So, for example you might make both a "plays_light" and a "plays_detail" Get API.

My solution has been exactly as my May update - i.e. to extend the web API and offer a number of similar calls that return varying granularities of information. It's not particularly elegant and I suspect there may be better ways but it works. In general, I'm finding that the user tends to need less detail in the parent entities and more as we get to individual children/grandchildren.
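As a rough illustration of the "light" versus "detail" split, here is a minimal Java sketch. It uses an in-memory map as a stand-in for the web service, and the class and method names (`PlayScriptCatalog`, `listSummaries`, `loadDetail`) are assumptions made for the example, not the poster's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the two web-service calls.
class PlayScriptCatalog {

    // Sparse view for the list screen: only what a RecyclerView row needs.
    static final class Summary {
        final long id;
        final String name;
        final String description;

        Summary(long id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    // Full view, fetched only after the user picks a row.
    static final class Detail {
        final Summary summary;
        final List<String> spokenLines;

        Detail(Summary summary, List<String> spokenLines) {
            this.summary = summary;
            this.spokenLines = new ArrayList<>(spokenLines);
        }
    }

    private final Map<Long, Detail> store = new LinkedHashMap<>();

    void add(long id, String name, String description, List<String> spokenLines) {
        store.put(id, new Detail(new Summary(id, name, description), spokenLines));
    }

    // Plays the role of "plays_light": IDs, names and descriptions only.
    List<Summary> listSummaries() {
        List<Summary> result = new ArrayList<>();
        for (Detail detail : store.values()) {
            result.add(detail.summary);
        }
        return result;
    }

    // Plays the role of "plays_detail": the full object for one ID, or null if unknown.
    Detail loadDetail(long id) {
        return store.get(id);
    }
}
```

The list screen would only ever call `listSummaries()`, and the ID carried by the selected `Summary` is the key for the single `loadDetail` call.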
I do now realise why some apps are so slow though: it's easy to be lazy in the web service design and just return loads of data, only a fragment of which will be used by the client, and to justify this by convincing yourself that a single API will be universally applicable and thus easier to understand for whoever picks up the code down the line.
Again, it could be my inexperience, but I find the local caching of relational data retrieved through the API calls quite clunky on the Android side: lots of storing foreign keys and then re-parsing JSON to get data into the SQLite tables. I'd hoped Dagger would have been more useful in simplifying this than it has turned out to be so far. I actually unravelled a whole load of Dagger-related code just to preserve my sanity. Not sure I was entirely successful!
Better answers are still very much welcome.
Z

I want to preserve my data during service restart, but my data is not in simple variable name-value or table format. How should I go about this?

I want to preserve data across service restarts. The service uses an ArrayList of ArrayLists of integers, plus some other variables.
Since the data is about 40-60 MB, I don't want it to be regenerated each time the service restarts (that takes a lot of time); I want to generate the data once and reuse a copy on the next restart.
How can it be done?
Before suggesting that I write the data to a file, please consider how I would put a data structure similar to a multidimensional array (3D or above) into a file, and that reading it back is likely to take significant time too.

You can try writing your data after generation to a file. Then on next service restart, you can simply read that from the file.
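For the nested-list shape described in the question, writing to a file can be fairly direct. Below is a minimal sketch, assuming the data is an `ArrayList<ArrayList<Integer>>` and that a simple length-prefixed binary layout is acceptable; the class name `NestedListStore` and the file layout are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: stores a list of integer lists as length-prefixed binary data.
class NestedListStore {

    static void save(List<? extends List<Integer>> data, File file) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(data.size());           // number of inner lists
            for (List<Integer> row : data) {
                out.writeInt(row.size());        // length of this inner list
                for (int value : row) {
                    out.writeInt(value);         // 4 bytes per integer
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static ArrayList<ArrayList<Integer>> load(File file) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int rows = in.readInt();
            ArrayList<ArrayList<Integer>> data = new ArrayList<>(rows);
            for (int r = 0; r < rows; r++) {
                int length = in.readInt();
                ArrayList<Integer> row = new ArrayList<>(length);
                for (int i = 0; i < length; i++) {
                    row.add(in.readInt());
                }
                data.add(row);
            }
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A deeper nesting level could follow the same pattern with one more length prefix per level. Whether reading 40-60 MB this way is fast enough on a given device would need to be measured.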

If you need persistent data, then put it into a database
https://developer.android.com/guide/topics/data/data-storage
or try some object database like http://objectbox.io/

So you're afraid that reading from the file would take a long time due to its size and the number and size of the rows (the inner arrays).
I think it might be worth stopping for a minute to ask yourself whether you need all this data at once. Maybe you only need a portion of it at any given time, and there are scenarios in which you don't use some (or maybe most) of the data? If this is likely, I would suggest that you compute the data on demand, when required, and only keep a memory-based cache for future demand in the current session.
Otherwise, if you do need all the data at a given time, you have a trade-off between size on disk and processing time. You can shrink the data using some compression algorithm, but that comes at the expense of processing time. On the other hand, you can just serialize your data object and save it to disk as is: less time, more disk space.
Another solution for your scenario could be to just use a DB and a cursor (Room on top of SQLite). I don't know exactly what you're trying to do, but your arrays can easily be modeled in a DB. Model a single row as you'd like and add the outer index of the array to that model. Then save the models into the DB, potentially making the outer index field the primary key of the table.
Regardless of the above, consider whether you really need this data persisted on the client; maybe you can store it on the server side? If so, there are other storage and access solutions that are not part of the Android client side.

Thank you all for answering this question.
This is what I have finally settled for:
Instead of using the structure as part of the app, I made it into a tool that prepares the data for use by the main app. Doing so also removed the concern about service restarts.
The tool first reads all the strings from the input file(s).
It then puts them into the structure one at a time. (This is the part I had doubts about and asked the question about: since all the data lives in the in-memory structure, it becomes unusable as soon as the program terminates.)
Next, I prepared another structure for writing this data to a file, and wrote all the data out, so that I no longer need to read all the input files again and again, only a few lines.
Then I thought: why spend time reading files when I can hard-code the data into my app? So, as the final step of this preprocessing tool, I turned the data into a class of the form switch(input){case X: return Y}.
Now I just have to put this class into the app I wanted to make.
I know this all sounds very abstract, perhaps even stretching the concept of abstract. If you want to know the details, please let me know. I am also including a link to my "tool". Please visit it and let me know if there would have been a better way.
P.S. There could be errors in this tool yet, which if you find, let me know to fix them.
P.P.S.
link: Kompressor Tool

What is a good framework to implement data transformation rules through UI

Let me describe the problem. A lot of suppliers send us data files in various formats (with various headers). We do not have any control over the data format (which columns the suppliers send us). This data then needs to be converted to our standard transactions (this standard is constant and defined by us).
The challenge here is that we do not control which columns suppliers send us in their files, while the destination standard is constant. I have been asked to develop a framework through which end users can define their own data transformation rules in a UI (say, field A in the destination transaction is equal to columnX+columnY, or to the first 3 characters of columnZ from the input file). There will be many such data transformation rules.
The goal is that users should be able to add all these supplier files (and convert all their data to my company's format from the front-end UI with minimal code changes). Please suggest some frameworks for this (preferably Java-based).
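To make the two example rules concrete, here is a minimal Java sketch of how user-defined rules might be represented once a UI has captured them. Everything here is an assumption for illustration: the class name `TransformationRules`, the rule builders `concat` and `prefix`, and the reading of "columnX+columnY" as string concatenation rather than numeric addition.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical rule set: each destination field is computed from one input row.
class TransformationRules {
    private final Map<String, Function<Map<String, String>, String>> rules = new LinkedHashMap<>();

    TransformationRules rule(String destinationField, Function<Map<String, String>, String> fn) {
        rules.put(destinationField, fn);
        return this;
    }

    // Rule builder: the values of two input columns joined together.
    static Function<Map<String, String>, String> concat(String columnA, String columnB) {
        return row -> row.getOrDefault(columnA, "") + row.getOrDefault(columnB, "");
    }

    // Rule builder: the first `length` characters of an input column.
    static Function<Map<String, String>, String> prefix(String column, int length) {
        return row -> {
            String value = row.getOrDefault(column, "");
            return value.substring(0, Math.min(length, value.length()));
        };
    }

    // Applies every rule to one input row (header name -> cell value).
    Map<String, String> apply(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        rules.forEach((field, fn) -> out.put(field, fn.apply(row)));
        return out;
    }
}
```

A UI would then only need to let users pick a rule builder and its parameters per destination field; validating dates, amounts and similar formats is a separate concern that this sketch does not cover.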

I've worked in a similar field before. I'm not sure I would trust customers/suppliers to use such a tool correctly and design 100% bulletproof transformations. Mapping columns is one thing, but how about formatting problems in dates, monetary values and the like? You'd probably need to check their creations manually anyway, or you'll end up with some really nasty data consistency issues. Errors caused by faulty data transformation are little beasts hiding in the dark and jumping at you when you need them the least.
If all you need is a relatively simple, graphical way to design data conversions, check out something like Talend Open Studio (just google it). It calls itself an ETL tool, but we used it for all kinds of stuff.

What database to use?

I'm new to databases, but I think I finally have a situation where flat files won't work.
I'm writing a program to analyze the outcomes of multiplayer games, where each game could have any number of players grouped into any number of teams. I want to allow for players who win, tie, or leave partway through the game (and win/lose based on team performance).
I also might want to store historical player ratings (unless it's faster to just recompute that from their game history), so I don't know if that means storing each player's rating alongside each game played, or having a separate table for each player, or what.

I don't see any criteria that affect the choice of database, but I'll list the free ones:
PostgreSQL
MySQL
SQL Server Express
Oracle Express
I don't recommend an embedded database like SQLite, because embedded databases trade away features to accommodate space and size concerns. I don't agree with its philosophy that data typing should be relaxed - it has led to numerous questions on SO about how to deal with date/time filtering, among others...
You'll want to learn about normalization and getting data to Third Normal Form (3NF), because it minimizes data redundancy and, combined with foreign key constraints, helps preserve referential integrity. For example, your player stats would not be stored in the database - they'd be calculated at the time of the request based on the data on hand.
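The idea of calculating stats at request time instead of storing them can be sketched in Java. This is only an illustration under assumptions: the class `GameHistory`, its `Participation` rows (one per player per game, roughly what a normalized table might hold) and the `winRate` statistic are all hypothetical, and an in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a normalized "participation" table.
class GameHistory {
    enum Outcome { WIN, TIE, LOSS }

    // One row per player per game; no aggregated stats are stored anywhere.
    static final class Participation {
        final int gameId;
        final String player;
        final int team;
        final Outcome outcome;

        Participation(int gameId, String player, int team, Outcome outcome) {
            this.gameId = gameId;
            this.player = player;
            this.team = team;
            this.outcome = outcome;
        }
    }

    private final List<Participation> rows = new ArrayList<>();

    void record(int gameId, String player, int team, Outcome outcome) {
        rows.add(new Participation(gameId, player, team, outcome));
    }

    long gamesPlayed(String player) {
        return rows.stream().filter(r -> r.player.equals(player)).count();
    }

    long wins(String player) {
        return rows.stream()
                .filter(r -> r.player.equals(player) && r.outcome == Outcome.WIN)
                .count();
    }

    // Derived on demand from the raw rows, so it can never drift out of sync with them.
    double winRate(String player) {
        long played = gamesPlayed(player);
        return played == 0 ? 0.0 : (double) wins(player) / played;
    }
}
```

In a real database the same derivation would typically be a query with aggregation over the participation table.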

You didn't mention any need for locking mechanisms, where multiple users may be competing to write the same data to the same resource (a database record, or a file in the case of flat files) simultaneously. What I would suggest is to get a good book on database design and try to understand the normalization rules in depth. Distributing data across separate tables has a performance impact, but it also affects how easy queries are to construct. This is a very involved topic, and there's no simple answer to it. That's why companies hire database administrators to keep their data structures optimized.
You might want to look at SQLite, if you need a lightweight database engine.

Some good options were mentioned already, but I really think that on the Java platform, H2 is a very good choice. It is perfect for testing (as an in-memory test database), but also works very well for embedded use cases and as a stand-alone "real database". Plus, it is easy to export a dump file and import from it, which makes moving data around simple. It works efficiently too.
It is developed by a very good Java DB guy, and is not his first take, and you can see this from maturity of the project. On top of this it is still being actively developed as well as supported.

A word on why nobody even mentions any of the "NoSQL" databases while you have used it as a tag:
Non-SQL databases are getting a lot of attention (or even outright hype) recently, because of some high-profile use cases, because they're new (and therefore interesting), and because of their promise of incredible scalability (which is "sexy" to programmers). However, only a very few very big players actually need that kind of scalability - and you certainly don't.
Another factor is that SQL databases require you to define your DB schema (the structure of tables and columns) beforehand, and changing it is somewhat problematic (especially if you already have a very large database). Non-SQL databases are more flexible in that regard, but you pay for it with more complex code (e.g. after you introduce a new field, your code needs to be able to deal with elements where it's not yet present). It doesn't sound like you need this kind of flexibility either.
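The "more complex code" point can be illustrated with a small Java sketch. It assumes documents are read as plain maps and that a `rating` field was introduced after some documents had already been stored; the class name `PlayerDocuments`, the field name and the default value are all hypothetical.

```java
import java.util.Map;

// Hypothetical reader for schema-free player documents.
class PlayerDocuments {
    // Assumed fallback for documents written before "rating" existed.
    static final int DEFAULT_RATING = 1500;

    static int ratingOf(Map<String, Object> document) {
        Object value = document.get("rating");
        // Older documents lack the field entirely, so every reader must handle that case.
        return (value instanceof Number) ? ((Number) value).intValue() : DEFAULT_RATING;
    }
}
```

With a fixed SQL schema, the equivalent step would be a one-time migration that adds the column with a default value, after which readers can rely on it being present.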

Also try OrientDB. It's free (Apache 2 license), runs everywhere, supports SQL and is really fast. It can insert 1,000,000 records in 6 seconds on common hardware.

Java Application to show a lot of charts and stats, storing the data?

I'm working on a Java application, one of its functions is to show detailed information in graph form with the odd statistic and "top 10" list here and there.
The data is generated live by the application. Consider it an internet "honeypot": the data is the result of external attacks. The graphs will need to take varying forms, such as:
1. Overall Statistics (Charts showing frequency of attacks per minute/hour/day, No. of attacks today, No. of attack-type attacks, Top 10 attackers)
2. Per Sensor (Charts showing frequency of attacks per minute/hour/day, Sensor 1 attacks today, No. of attack-type attacks, Top 10 attackers)
3. Per Attack-Type (Pie Chart)
The information for each attack type can vary quite a bit, and there will be other information that some have and some don't (e.g. a DoS will have an attacker-address, whereas a Remote Exploit to upload a file will have an attacker-address and a file-name).
Initially I approached this by creating classes: there is a DoS data structure in which all the details of that attack can be stored, and these are stored inside a Vector, but this became a serious headache very fast.
The obvious solution to me is to create a database (MySQL?) with a table for each attack type; from this, getting all the information for 1., 2. and 3. is merely an SQL query away.
However, I can't help but feel that my database solution is a tad nasty and that I'm missing something here, so after hitting my head against the problem I'm asking here.
Any pointers greatly appreciated!

I'd lean towards building the entire concept of 'attack' out as a class composed of all of the potential objects and fields necessary to describe any type of attack. You could specify interfaces as necessary to specify the contract of each particular attack type (for factory creation, etc) but then persist the entire object to a database with a schema pretty much identical to your implementation class structure. This should probably give you a pretty good ability to do the reporting that you want and I think implementation would be reasonably straightforward.
Without knowing just how large your attack tree is, it's a little difficult to be sure my approach is correct, but maybe this will be useful.
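A minimal Java sketch of that single-class approach might look as follows. The names (`AttackLog`, `Attack`, `topAttackers`) and the choice of a nullable `fileName` for attack types that have no file are assumptions for the example; an in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical log holding every attack type in one class with optional fields.
class AttackLog {
    enum Type { DOS, REMOTE_EXPLOIT }

    static final class Attack {
        final Type type;
        final String sensor;
        final String attackerAddress;
        final String fileName; // null when the attack type has no file (e.g. DoS)

        Attack(Type type, String sensor, String attackerAddress, String fileName) {
            this.type = Objects.requireNonNull(type);
            this.sensor = Objects.requireNonNull(sensor);
            this.attackerAddress = Objects.requireNonNull(attackerAddress);
            this.fileName = fileName;
        }
    }

    private final List<Attack> attacks = new ArrayList<>();

    void add(Type type, String sensor, String attackerAddress, String fileName) {
        attacks.add(new Attack(type, sensor, attackerAddress, fileName));
    }

    // Data for the per-attack-type pie chart.
    Map<Type, Long> countByType() {
        return attacks.stream()
                .collect(Collectors.groupingBy(a -> a.type, Collectors.counting()));
    }

    // "Top N attackers": most frequent addresses first, ties broken alphabetically.
    List<String> topAttackers(int n) {
        Map<String, Long> counts = attacks.stream()
                .collect(Collectors.groupingBy(a -> a.attackerAddress, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Per-sensor charts would follow the same pattern with an extra filter on `sensor`. With a single table mirroring this class, each of these reports would correspond to one grouped query.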

Not sure but what you're describing looks like an OLAP cube so maybe consider using a star schema or a snowflake schema and have a look at something like Pentaho:
A complete Business Intelligence platform that includes reporting, analysis (OLAP), dashboards, data mining and data integration (ETL).
