Application Design Document DB - java

I must develop an application that uses a DB and is structured in this way:
there are METADATA tables (Document Type, Document Attributes, ...);
starting from the METADATA tables, the DATA tables are created/modified (also at runtime, during the normal application lifecycle).
E.g.: if I create a new Document Type by inserting a new record into the METADATA table named "Document Type" (Contract, Invoice, Note, ...), the corresponding new DATA table is created in the DB. This table has as columns the attributes that I have defined in the other METADATA table (Document Attributes).
I'd like to use an ORM to map the METADATA tables, because their structure does not change at runtime.
Is it possible to also map the DATA tables? I'd like to work with POJOs for the DATA tables as well.
I think it is not possible to create classes at runtime and modify their structure (with reflection it is possible, but is that the best choice?).
This is probably the same logic/problem faced by CMS and CRM systems.
Do you have any suggestions on how to structure my application, especially regarding the ORM / DB?
Thank you

JPA with EclipseLink would probably be my choice.
You are in a position a bit like this: a web application whose business logic is editable online by the organisation (as in a CMS), say with a rules engine. Very generic, sounds nice, reduces development costs. But: changes can no longer be tested, cannot be put in version control, and are immediately live. No debugging, no compiler intelligence and error messages, no cleanup (like code cleanup). In fact, such a system is quite a quality risk and a drain on maintenance.
In your case, you need very strong tools: version control, separate staging server, controllable logic, reporting (automatic documentation of the model and interactions).
JPA can in principle create the database tables itself from the entities (or the other way round). So that is one layer. It maintains a meta model.
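As an illustration, here is a minimal sketch of mapping the static METADATA tables as JPA entities; the entity, table and column names are my own assumptions based on the question:

```java
import javax.persistence.*;
import java.util.List;

// Sketch only: maps the "Document Type" metadata table. Names are assumptions.
@Entity
@Table(name = "DOCUMENT_TYPE")
public class DocumentType {
    @Id @GeneratedValue
    private Long id;

    private String name;   // e.g. "Contract", "Invoice", "Note"

    @OneToMany(mappedBy = "documentType", cascade = CascadeType.ALL)
    private List<DocumentAttribute> attributes;
    // getters/setters omitted
}

// Maps the "Document Attributes" metadata table: one row per column of a DATA table.
@Entity
@Table(name = "DOCUMENT_ATTRIBUTE")
class DocumentAttribute {
    @Id @GeneratedValue
    private Long id;

    private String columnName;   // column to create in the generated DATA table
    private String sqlType;      // e.g. "VARCHAR(255)", "DATE"

    @ManyToOne
    private DocumentType documentType;
}
```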
Starting from your own meta model, you would indeed need to do something clever.
The easiest, most achievable option would be to stay with just your meta model and define versioning etc. on top of it. Hand-written code would then use generic data structures such as Maps. You could store code in the database too, version it, and execute it with the Java Scripting API. But you would need to make the extent of this clear to management, and convince yourself of the feasibility: build a prototype.
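A rough sketch of that idea, using a plain Map for the generic document data and the Java Scripting API for a rule stored in the database; the field names, the rule text and the availability of a JavaScript engine on your runtime are all assumptions:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import java.util.HashMap;
import java.util.Map;

public class GenericDocumentDemo {
    public static void main(String[] args) throws Exception {
        // A "document" is just a generic attribute map, driven by the meta model.
        Map<String, Object> invoice = new HashMap<>();
        invoice.put("documentType", "Invoice");
        invoice.put("amount", 199.90);

        // A rule that could be stored (and versioned) in the database as text.
        String rule = "doc.get('amount') > 100";

        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        engine.put("doc", invoice);
        Object needsApproval = engine.eval(rule);
        System.out.println("Needs approval: " + needsApproval);
    }
}
```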
I definitely hope someone knows a better solution.
P.S.
Maybe a NoSQL database would be an option; your use case looks a bit like a document DB.

It's difficult to offer good advice with so little information. However, you included the "JCR" tag and it sounds like your use case is similar to a content or document management system, so you should definitely consider JCR. See "When to use JCR over other options?" for a more general response, but I'll try to describe its benefits for your particular use case. (Always choose the best tool for the job, though.)
Perhaps the best feature of JCR for you is the ability to have a flexible schema that allows you to add (or change) the metadata for different documents and over time. You can either restrict the properties to ensure your apps only add accepted information, or you can choose to be a little looser and keep your options open to add additional metadata as needed. You can even break your metadata into "characteristics" that can be added to individual nodes via mixins. In short, JCR gives you a ton of flexibility in designing your data structure, while giving you several knobs to control how much enforcement/flexibility you use.
Using a JCR repository will also let you keep that metadata with the document (if you wanted; you can always separate them). And, this may give your applications great performance if they use access patterns that read a document and then get the metadata. Navigation is often much faster than querying.
Thirdly, many JCR repositories even support querying repository content (by mapping node types into relational table-like structures). JCR 2.0 has several languages, including XPath and a SQL-like language called "JCR-SQL2".
There are some object-mapping libraries for JCR, but they'll very likely hinder really making use of JCR's data structure flexibility. JCR itself is a Java API, and it has built-in support for events, security, queries, locking, versioning, etc.
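To give a feel for the plain JCR API, here is a small sketch that stores a document node and finds it again with JCR-SQL2; the credentials, node names and properties are assumptions, and the Repository instance would come from your chosen implementation:

```java
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

public class JcrSketch {
    public static void run(Repository repository) throws Exception {
        Session session = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // Store a document with its metadata as properties on a node.
            Node docs = session.getRootNode().addNode("documents");
            Node invoice = docs.addNode("invoice-42", "nt:unstructured");
            invoice.setProperty("documentType", "Invoice");
            invoice.setProperty("amount", 199.90);
            session.save();

            // Query it back with JCR-SQL2.
            QueryManager qm = session.getWorkspace().getQueryManager();
            Query q = qm.createQuery(
                    "SELECT * FROM [nt:unstructured] AS d WHERE d.[documentType] = 'Invoice'",
                    Query.JCR_SQL2);
            QueryResult result = q.execute();
            for (NodeIterator it = result.getNodes(); it.hasNext(); ) {
                System.out.println(it.nextNode().getPath());
            }
        } finally {
            session.logout();
        }
    }
}
```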
If you do look at JCR, be sure to check out the different implementations, including Jackrabbit and ModeShape. They each offer something different (e.g., Jackrabbit is the reference implementation, while ModeShape offers some extensions and additional features like expanded query languages and sequencing; see this related question).

XML vs JSON vs SQLite for only reading data

I have a collection of 350 locations in the United States with each containing about 25 subcategories. The data structure looks something like this:
Location (ex: Albany, NY)
--> Things to do
--> Population
... 23 More
Which of the following would be best for loading this data into the app: JSON, XML, or SQLite? Just to clarify, I don't need to edit this data in any way. I simply need to read it so that the information can be loaded into TextViews.
Edit:
I'm attempting to implement Room and XML and so far the XML seems to be the simplest to implement. Is it bad practice to use the XML solution? It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices. Would it still be a better practice to implement the Room solution?
Undoubtedly, among all of these an RDB is the most efficient, both in terms of storage and query response. I personally do not see any point in using XML or JSON, as these have traditionally been used for data exchange and are inefficient for storage and querying.
I would suggest that you evaluate the following:
a) How are you going to store the data: a single file vs. multiple files (for example, by subject)?
b) Are you going to be doing updates on the strings or just appending? (SQL is better suited for updates, but if you are just reading data after batch processing, flat files might be better suited.)
c) How complex are the queries you want to implement? XML and SQL are better suited than JSON for queries that address metadata (date stored, original location address, etc.).
Once you determine what you want to optimize for (adding metadata, fast updates, fast querying, ease of storage, fast retrieval of subject files, etc.), you can decide the tradeoffs against the less important goals. In this specific instance the devil is very much in the details.
In most cases it would be better to use a database because it increases readability and maintainability, especially if you want to show this information inside some kind of list view. If you use JSON or XML you'll have to parse or write a lot of code to switch between things or to load them with good performance. Consider using Room, LiveData and a RecyclerView: this will reduce the code you need and improve (a lot) the performance and readability of your app code. By the way, you should provide more information about how you want to use this data and where you want to show it. XML (or the Android resource system) should be used if you plan to use the resource system itself, with its qualifiers, to reduce your work. Most of the time JSON is used to communicate with the outside world or with another app in an easy way, or for REST requests/responses.
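For illustration, a minimal sketch of what the Room side might look like; the entity fields and DAO queries are assumptions based on the categories in the question:

```java
import androidx.lifecycle.LiveData;
import androidx.room.Dao;
import androidx.room.Entity;
import androidx.room.Insert;
import androidx.room.PrimaryKey;
import androidx.room.Query;
import java.util.List;

// Hypothetical Room mapping for the location data; field names are assumptions.
@Entity(tableName = "locations")
public class Location {
    @PrimaryKey(autoGenerate = true)
    public long id;
    public String name;        // e.g. "Albany, NY"
    public String thingsToDo;
    public int population;
}

@Dao
interface LocationDao {
    // Observed by the RecyclerView adapter through LiveData
    @Query("SELECT * FROM locations ORDER BY name")
    LiveData<List<Location>> getAll();

    @Insert
    void insertAll(List<Location> locations);
}
```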
The one option that wouldn't make sense to use at all for your use case is SQLite. Unless you plan on running specific queries on the data for preprocessing before loading it into your view, it isn't worth the overhead (though I don't imagine it is much with 350 locations).
XML and JSON serve the same use case without much difference; read up on their specifics here: https://www.json.org/xml.html
I would personally go for JSON due to the simplicity of the format.
Edit:
@simo-r's argument is also a valid one regarding the readability of your code. While there are libraries that can make reading JSON/XML easier, Android has really good SQLite support by default, so it might make sense to use it. Ultimately it comes down to your personal preference and where you see the project growing.
I have a collection of 350 locations in the United States with each containing about 25 subcategories.
The main issue is scalability.
Will you, in the next few years, keep just a few hundred locations, or do you imagine that, if your software becomes successful, your data could grow to many thousands of locations?
If yes: choose SQLite, because it can store many records in an efficient way. Don't forget a good database schema with appropriate indexes; see this and read about database normalization. Also, an SQLite database can later be migrated (with some effort) to PostgreSQL.
If no (your data is just a few megabytes): keep JSON or XML. The data will sit in the page cache anyway.
Consider also YAML, and sometimes a mixed approach.
Don't forget to document how your data is organized and accessed.
See also the data persistence chapter of this draft report.
If you are simply going to bind data into text views, you can just store the text in strings.xml. As simple as that.
Go with JSON.
Advantages:
Low overhead (vs. SQLite).
Lightweight parsers like Jackson are available, with which you can easily convert your data into custom objects or data structures if needed (see the sketch below).
Maintainable, as most developers understand the format.
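A small sketch of the Jackson approach mentioned above; the Location fields and the JSON layout are assumptions:

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.InputStream;
import java.util.List;

// Hypothetical POJO matching one location entry in the JSON file.
class Location {
    public String name;         // e.g. "Albany, NY"
    public String thingsToDo;
    public int population;
}

public class LocationLoader {
    // Reads a JSON array of locations (e.g. from the app's assets) into POJOs.
    public static List<Location> load(InputStream json) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        return mapper.readValue(json, new TypeReference<List<Location>>() {});
    }
}
```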
I would suggest using JSON. Reasons below.
JSON vs XML
JSON is more lightweight than XML and takes fewer resources (network and storage), so the performance of the app increases.
JSON parsing is easy and, as mentioned above, trivial.
JSON is friendly to JavaScript, in case that's required.
JSON vs SQLite
A data set of 350 entries with ~25 attributes can easily be managed with JSON; an RDBMS is not required.
SQLite becomes an overhead: it's an extra layer, and every layer comes with a cost. Especially if the application is containerized, the architecture becomes more complicated; you need to deal with volume mapping etc., whereas with JSON you can keep the data as part of the application code.
Importantly, since the data is static, keep the application stateless by keeping the data alongside the codebase. This makes a lot more sense from an architectural perspective.
Problem
You have a fixed set of information with a simple structure that you wish to deliver to clients.
Questions to Reflect On
Do I expect this information to ever be significantly changed or modified?
Do I expect to increase the amount of information available?
What kind of help do I have? Do they have a background in software engineering or is it someone of a different profession that has to wear a lot of hats?
What is the scale of the project? Are you expecting a large amount of users or just people interested in a very niche application?
JSON or XML
JSON and XML provide similar services: they are both data transfer formats. If the information is not expected to grow, both might be a great option. If it's public information, just serve these files statically over nginx. You can point a worker with limited software engineering experience to update these files; they're just files in a folder presented in a human-readable format, so it's extremely simple to do. These updates should be minor and infrequent.
JavaScript Object Notation(JSON) Pros
solid browser and backend support
small size and fast parsing by the JavaScript engine
very human readable, easy for the untrained eye to make changes
Extensible Markup Language(XML) Pros
standard meta-data option
supports namespaces
solid backend support and is often baked into frameworks
This article explains the differences between XML and JSON really well (as of 2020) if these highlights were not sufficient for your investigation.
Database System
There are a plethora of database systems out there. Their job is to efficiently retrieve specific information from a large volume of stored data. The key reason to use a database is scalability. Scalability means a number of things; I view it as adapting to drastic change. If you expect this information to frequently change or grow, go with a database.
Object Relational Mapping (ORM)
Databases can be cumbersome to use, so I would recommend using an ORM on top of them. An ORM encapsulates the database and makes it more user friendly (and language specific). Room makes sense in your use case, especially for Java Android development. The encapsulation also allows you to migrate to another database later without changing your code. Here's a good article that discusses Room and SQLite!
Miscellaneous
"Is it bad practice to use an XML solution?"
No. The important thing is that it works, is understandable, and runs efficiently. Just keep in mind that XML and JSON are data transfer formats and they do THAT job well. This Stack Overflow discussion may be helpful to gain a better picture of what that means; be sure to read more than just the accepted answer.
"It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices."
Although testing for functionality is great, keep in mind that your test is not a load test and does not verify what you're trying to confirm. I would explore load testing; Wikipedia is a good place to start!

JPA, Start with entities vs database schema

Which is better when using JPA, especially when starting a new project?
Start with designing entities and then let JPA generate the database or
Start with the database schema and let tools generate entity classes?
I'm part of a small company. I'm both the software developer and the DBA, and I have complete freedom over the application and DB design.
I'm just starting the project.
If you want to design a database, then start with the schema. If you want to write software, then start with the entities. The point of an ORM is to let you think about an object model without having to worry about the database that stores it, so questions of this type actually confuse the issue somewhat by insinuating crossover between the realms. Are you a software developer or a DBA? That, much more than the fact that you're using JPA, is what will determine the correct answer for you.
Um - neither? The power of JPA is that you don't have to generate one from the other! Generation of entities or database schema might be a good starting place if you have one already in place; but the generated stuff is not something you will want to use long term.
You cannot simply design one side of the mapping without any consideration of the other. If you will share the database with other applications, you will need to give more weight to the database schema. If your application has a complex model, you will want to focus on the object model first, allowing it to be driven by the use cases you uncover as you develop your application.
I tend to start with the object model first (even without a backing database to start) because that allows me to see the application in action earlier and get a feel for what we really want to build. But integration with the database must happen earlier rather than later; as its constraints will quickly impose themselves on your object model. :-)
It depends on your needs. Usually in a product development environment there are different teams that work on database design, interface design and implementation. In that case you have no option other than generating the JPA entities from the already existing database design.
Secondly, if you are starting from scratch and you know what you are up to, you can probably start writing your own entities (Java classes) and then generate the database from them.
Better to go with the database schema first, because some features are not available in a JPA-generated database; in JPA we can't give default values for our columns. Check HERE for the allowable attributes in JPA.
It's up to you whether to take a top-down or bottom-up approach. If this schema will be exclusive to your application, consider how well your team members and analysts understand ORM versus the DB. In my experience, analysts understand things better in terms of tables; but if they are comfortable discussing classes or UML diagrams, go with JPA. Also take into consideration the views of your DBA and build engineer.
If you have complete flexibility and are not restricted to a DB schema, and you want a clean object model in your Java application, then start with the model first and generate the schema from the model. This also allows you to generate clean JSON representations from the model to serve as the on-the-wire format for your objects, using technologies like Jackson (or Gson).
Doing the DB schema first and reverse-engineering model classes from it will result in relational concepts seeping into your model classes, leaving you with a poor (polluted) model.
In summary, do model-first unless your hands are tied and you must map to some existing schema.
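If you go model-first, here is a sketch of letting the provider generate the schema from the entities via the standard JPA 2.1 schema-generation property; the persistence unit name "app-pu" is an assumption and must match your persistence.xml:

```java
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import java.util.HashMap;
import java.util.Map;

public class SchemaFromModel {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Ask the JPA provider (EclipseLink, Hibernate, ...) to (re)create the tables
        // for all mapped entities in the persistence unit.
        props.put("javax.persistence.schema-generation.database.action", "drop-and-create");

        EntityManagerFactory emf = Persistence.createEntityManagerFactory("app-pu", props);
        emf.close();
    }
}
```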

How can I persist a POJO with transactions, versioning, history, and hierarchy - using no-schema?

I am writing a system where I want to be able to persist various "Job Templates" to some kind of datastore. In our system, the Templates are domain objects which contain descriptions of how to do various mathematical calculations.
I would like to have some kind of robust store for these Template POJOs - preferably some kind of shared/remote repository so that multiple systems could access, and potentially modify, the Templates.
I am slightly overwhelmed by the number of NoSQL, graph-, document- and object-databases out there, and was hoping to get some guidance from people who've done this before!
In my ideal world, this would be a magical no-schema datastore, so that any Template object could be written out to it with a simple method call, and then retrieved again later. I would ideally like to have the following features -
1) Versioning - so I can "overwrite" a previous POJO, and track the version changes
2) History - so I can go back (time machine style) and retrieve prior versions
3) Transactions - so everything is consistent and bulletproof
4) Hierarchy - so I can group POJOs, and find them by path, etc
Some options appear to include Jackrabbit OCM, OrientDB, CouchDB. Also there is JDO (DataNucleus) vs JPA. Could anyone share any insights to cut through the fog of too much choice?
What an open question!
I use Jackrabbit, which fulfills all your needs and is very easy to get up and running. You can perform XPath queries as well as several other types. It is not very good on documentation, but the JCR docs are usually enough to get by. There are other JCR implementations; you might want to consider them and find benchmarks if you intend to push the limits :)
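For instance, a rough sketch of storing a serialized Template and keeping its history with JCR versioning (mix:versionable); the node names and the "payload" property are assumptions, and the Session would come from your Jackrabbit repository:

```java
import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.version.Version;
import javax.jcr.version.VersionManager;

public class TemplateVersioningSketch {
    public static void saveNewVersion(Session session, String serializedTemplate) throws Exception {
        VersionManager vm = session.getWorkspace().getVersionManager();
        Node root = session.getRootNode();
        Node templates = root.hasNode("templates") ? root.getNode("templates")
                                                   : root.addNode("templates");
        Node tpl;
        if (templates.hasNode("interest-calc")) {
            tpl = templates.getNode("interest-calc");
            vm.checkout(tpl.getPath());          // must check out before editing a versionable node
        } else {
            tpl = templates.addNode("interest-calc", "nt:unstructured");
            tpl.addMixin("mix:versionable");     // enables check-in/check-out history
        }
        tpl.setProperty("payload", serializedTemplate);   // the serialized Template POJO
        session.save();

        Version v = vm.checkin(tpl.getPath());   // snapshot; prior versions stay retrievable
        System.out.println("Created version " + v.getName());
    }
}
```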

Suggest a persistent strategy for a workflow system

I am in the process of creating a UI configuration tool for my pet project. One aspect of this tool lets the end user DEFINE his orchestration. I then need to save this orchestration definition into a database. There will be an executable version of this definition in the running system. The executable version is created dynamically, on demand.
The idea is to separate the DEFINITION from the EXECUTABLE version so that I have the flexibility to choose the runtime version among BPMN, JPDL, or a POJO-based workflow solution (BeanFlow).
Limitation: I can't use the BPMN editors that come with frameworks like jBPM, Activiti, etc., as I want to use my own UI that is specific to my domain.
I need suggestions on HOW to PERSIST the definition.
Should I use rdbms tables? If so, is there a db schema I can borrow that is close to orchestration concepts?
Should I serialize my definition to BPMN/JPDL XML instance document?
Are there any other simple formats that I can use?
By "orchestration" I'm assuming you mean a finite state machine. Where the current state dictates what transitions can be followed to other states. The representation of states and transitions as edges and vertices often produces a directed acyclic graph, however there are times when the graph will cycle (e.g. draft -- submit for approval --> pending approval -- reject --> draft).
In practice, separating the definition from execution calls for a persistence format that can easily accommodate customization. As your system evolves you will find a number of unanticipated edge cases whose solution should not require altering a persistence schema, only code. This implies XML or a NoSQL solution - something whose schema is easily changed or non existent.
Now, having written my own XML definition for this purpose (for uninteresting reasons I'll omit), my suggestion is to use JPDL (or BPMN). The reason is that their definitions likely incorporate whatever you're considering now and whatever you will consider in the future, and they enable customization, such as hanging arbitrary data or behavior off them at a given point. You also get the advantage of tools already built (not just UI) for dealing with cycle detection and ensuring there is a path to completion, for example.
Some of the interesting features I know JPDL possesses are the ability to help merge forked processes, timed tasks (including those that repeat periodically), and facilities for sending notifications. This last item, notification, bears some further exposition. One of the things I've found with my own system is the need to send out configurable email whose content is based on the data flowing through. These existing engines make that relatively easy by providing a way to plug variables into text that is then dynamically evaluated at run time before transmission. They also provide bridges between the engine and whatever user store you have, for the purpose of sending notifications to groups of people, tasking them, and enforcing security policy.
Finally, depending on the scope of your system, you will probably still be using a database as well. What I suggest is storing off the XML and data being orchestrated into the database in a serialized format. Then, if the data is being altered as it travels through the execution, write out serializations of the data - and perhaps workflow if it is also changed - into a history/audit log table as well.
I would NOT use rdbms tables, or if you do, store the definitions as text blobs. Trying to make records for the definition is a bad idea because it's much more inflexible and difficult to change your definition over time. Many people would use different approaches, but I'd use JSON or YAML, and avoid XML. The motivation for that is to make it as simple as possible. Trying to use XML, especially a formalized specific format of XML is going to make you spend much more time meeting an exact specification that doesn't actually do anything to help what you're trying to accomplish. JSON and YAML are both very easy to work with from a code perspective. YAML is more easily readable by humans and easier to edit, and isn't as tricky for punctuation and escaping as JSON. JSON is more widely used, and is smaller than YAML. JSON also has a binary counterpart, BSON, if document size is a concern.
Once you have an importer/exporter that goes to/from your internal objects to your data format, then persisting using RDBMS, or other mechanisms, will be straightforward. You could even use CouchDB, which could offer other benefits to your application and may be a great fit.
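As a sketch of that importer/exporter idea, here is how a definition POJO could be serialized with Jackson and stored as a text blob over JDBC; the WorkflowDefinition class and the table/column names are assumptions:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;
import java.util.Map;

// Hypothetical definition object standing in for your orchestration model.
class WorkflowDefinition {
    public String startNode;
    public Map<String, List<String>> transitions;   // node -> nodes reachable from it
}

public class DefinitionStore {
    private final ObjectMapper mapper = new ObjectMapper();

    public void save(Connection conn, String name, WorkflowDefinition def) throws Exception {
        String json = mapper.writeValueAsString(def);        // exporter: POJO -> JSON
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO workflow_definition (name, definition) VALUES (?, ?)")) {
            ps.setString(1, name);
            ps.setString(2, json);                            // stored as a text blob
            ps.executeUpdate();
        }
    }

    public WorkflowDefinition load(Connection conn, String name) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT definition FROM workflow_definition WHERE name = ?")) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return mapper.readValue(rs.getString(1), WorkflowDefinition.class);  // importer
            }
        }
    }
}
```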
Very good question! Here is my two cents:
RDBMS: if you do this you will be able to query the workflow instances, for example: which tokens are at 'node X'?
Storing XML as a CLOB: simplicity is the strength of this solution, but you can't really query these definitions, only fetch them by id.
NoSQL: there are a lot of different solutions for different problems. MongoDB is a popular one; it provides document-oriented persistence.
How about a simple serialisation of the composed UI using, for example, XStream, and then storing the serialised bits in the database as a binary column? Then when the user logs in, get the associated data, deserialise it, initialise if required, and display it.
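A minimal sketch of that XStream idea; OrchestrationDefinition is a hypothetical stand-in for the composed UI object, and the type whitelisting reflects newer XStream security defaults:

```java
import com.thoughtworks.xstream.XStream;

// Hypothetical object standing in for the composed UI / orchestration definition.
class OrchestrationDefinition {
    public String name = "approval-flow";
}

public class XStreamSketch {
    public static void main(String[] args) {
        XStream xstream = new XStream();
        // Newer XStream versions require whitelisting types before deserialization.
        xstream.allowTypes(new Class[] { OrchestrationDefinition.class });

        OrchestrationDefinition def = new OrchestrationDefinition();
        String xml = xstream.toXML(def);          // serialize -> store in a CLOB/BLOB column
        OrchestrationDefinition back =
                (OrchestrationDefinition) xstream.fromXML(xml);  // deserialize after login
        System.out.println(back.name);
    }
}
```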

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.
I have heard that the Data Replication Service provided by db4o is really good. It allows you to use Hibernate to back onto an RDBMS; I don't think it supports JDBC, though (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1).
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).
If you already have a model and DAO layer for your codebase, you can just create your own sync framework; it isn't hard.
Copying data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing requires some level of knowledge about what has been synced already. You can either do it at runtime by getting a list of UUIDs from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populated by a trigger upon insert/update in TableInA) and run from that. Your tool can be a TimerTask so the databases are kept in sync at the time granularity you desire.
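A rough JDBC sketch of the copy-and-diff approach described above; the table and column names ("items", "uuid", "payload") are assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashSet;
import java.util.Set;

public class SimpleTableSync {
    public static void sync(Connection a, Connection b) throws Exception {
        // 1. Figure out which uuids are already present in database B.
        Set<String> existing = new HashSet<>();
        try (Statement st = b.createStatement();
             ResultSet rs = st.executeQuery("SELECT uuid FROM items")) {
            while (rs.next()) existing.add(rs.getString(1));
        }

        // 2. Copy the missing rows from A to B (dropping any A-specific metadata).
        try (Statement st = a.createStatement();
             ResultSet rs = st.executeQuery("SELECT uuid, payload FROM items");
             PreparedStatement ins = b.prepareStatement(
                     "INSERT INTO items (uuid, payload) VALUES (?, ?)")) {
            while (rs.next()) {
                if (existing.contains(rs.getString("uuid"))) continue;
                ins.setString(1, rs.getString("uuid"));
                ins.setString(2, rs.getString("payload"));
                ins.executeUpdate();
            }
        }
    }
}
```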
However, there is probably some tool out there that does all of this without the implementation faff, and each implementation would differ based on business needs anyway. In addition, at the database level there will be replication tools.
True synchronization requires some data that I hope your database schema has (you can read the SyncML docs to see how they proceed). Sync4J won't help you much; it's really high-level and XML-oriented. If you don't foresee any conflicts (which means really easy synchronisation), you could try a lightweight ETL like Enhydra Octopus.
I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.
