Create schema and tables on demand at runtime - Java

So as the title suggests - I need to create an application (preferably Spring Boot) which will create schemas and tables based on user input. Basically, a REST endpoint will be offered to clients, where they upload their data model in JSON format. I'll parse the JSON and construct the DB artifacts (schema and tables) at runtime. Once all the tables are created, the client gets a REST endpoint (with a unique identifier) to perform CRUD operations on their schema.
The approach I am currently considering is:
Create a superuser in the DB before deploying the app, which will have privileges to create new schemas and databases.
Create prepared statements to invoke schema/table creation on demand. The statements will have placeholders to take the schema name and table definition (a rough sketch follows this list).
After proper authentication, allow users to upload their data model definition in JSON.
Validate the JSON and invoke the schema/table creation statements.
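For illustration only, a minimal sketch of the creation step, assuming Spring's JdbcTemplate. Note that JDBC prepared statements can bind only values, not identifiers such as schema names, so the name has to be whitelisted and quoted instead:

import java.util.regex.Pattern;
import org.springframework.jdbc.core.JdbcTemplate;

public class SchemaProvisioner {

    // Strict whitelist: identifiers cannot be bound as '?' parameters.
    private static final Pattern SAFE_IDENTIFIER =
            Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,62}");

    private final JdbcTemplate jdbcTemplate;

    public SchemaProvisioner(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void createSchema(String schemaName) {
        if (!SAFE_IDENTIFIER.matcher(schemaName).matches()) {
            throw new IllegalArgumentException("Invalid schema name: " + schemaName);
        }
        // Safe to interpolate after whitelisting; quoting preserves case.
        jdbcTemplate.execute("CREATE SCHEMA \"" + schemaName + "\"");
    }
}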
A few questions I had in mind:
Since all these DB operations will be invoked from a single superuser's account, is it safe?
The schemas and tables will be created with native SQL queries instead of Hibernate's ORM capabilities. Is that safe and efficient?
For the CRUD operations, is it possible to switch the DB connection from the superuser to the client-specific schema created in the earlier steps? Or should I continue using the same superuser for the CRUD operations?
It would be nice if it were possible to switch schemas at runtime using Hibernate/Spring Boot.
What I would like is a general approach to this problem. I do not need any code.

A typical web application already has permissions to DELETE all the data for all the users.
JPA makes your queries slower, not faster. JPA can help with caching, but it doesn't seem like you need that here.
Yes, you can have multiple datasources in Spring Boot. Look at this for example: https://www.baeldung.com/spring-abstract-routing-data-source
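A minimal sketch of that routing approach (TenantContext here is a hypothetical holder, populated per request, e.g. by an authentication filter):

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Hypothetical per-request tenant holder, e.g. set by an auth filter.
class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();
    static void set(String tenant) { CURRENT.set(tenant); }
    static String get() { return CURRENT.get(); }
}

// Routes each JDBC call to the DataSource registered for the current tenant.
public class TenantRoutingDataSource extends AbstractRoutingDataSource {
    @Override
    protected Object determineCurrentLookupKey() {
        return TenantContext.get();
    }
}

The per-tenant DataSource instances are registered once via setTargetDataSources(...), so switching schemas at runtime comes down to setting the lookup key before the query runs.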
Be aware that your database might not like having millions of tables. Query planning, maintenance jobs, backups, etc. all take performance penalties. Basically, databases are not designed for your use case.

Related

How to join a record set that is returned from a web service with one of your SQL tables

I thought about this solution: get the data from the web service, insert it into a table, and then join it with the other table, but that will affect performance, and afterwards I must delete all that data.
Are there other ways to do this?
You don't return a record set from a web service. HTTP knows nothing about your database or result sets.
HTTP requests and responses are strings. You'll have to parse out the data, turn it into queries, and manipulate it.
Performance depends a great deal on things like having proper indexes on columns in WHERE clauses, the nature of the queries, and a lot of details that you don't provide here.
This sounds like a classic case of "client versus server". Why don't you write a stored procedure that does all that work on the database server? You are describing a lot of work: bringing a chunk of data to the middle tier, manipulating it, putting it back, and then deleting it. I'd figure out how to have the database do it if I could.
No, you don't need to save anything into the database; there are a number of ways to convert XML to a table without persisting it.
For example, in an Oracle database you can use XMLTable/XMLType/XQuery/dbms_xml
to convert the XML result from the web service into a table and then use it in your queries.
For example:
If you use Oracle 12c, you can use JSON_QUERY: Oracle 12c JSON
XMLTable: oracle-xmltable-tutorial
See also this week's discussion about converting XML into table data.
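As a rough sketch of the XMLTable route from plain JDBC (the table, column, and XPath names here are purely illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class XmlJoinExample {

    // Joins an XML payload fetched from a web service against a regular
    // table via Oracle's XMLTABLE, so nothing is persisted.
    public void joinAgainstOrders(Connection conn, String xmlPayload) throws Exception {
        String sql =
            "SELECT o.order_id, x.customer_name " +
            "FROM orders o " +
            "JOIN XMLTABLE('/customers/customer' " +
            "       PASSING XMLTYPE(?) " +
            "       COLUMNS customer_id   NUMBER        PATH '@id', " +
            "               customer_name VARCHAR2(100) PATH 'name') x " +
            "  ON o.customer_id = x.customer_id";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, xmlPayload);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("order_id") + " "
                            + rs.getString("customer_name"));
                }
            }
        }
    }
}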
It is common to think about applications having a three-tier structure: user interface, "business logic"/middleware, and backend data management. The idea of pulling records from a web service and (temporarily) inserting them into a table in your SQL database has some advantages, as the "join" you wish to perform can be quickly implemented in SQL.
Oracle (as other SQL DBMS) features temporary tables which are optimized for just such tasks.
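For instance, a sketch of the temporary-table variant; the staging table is created once as DDL, and all names are illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;

// One-time DDL, not per request:
//   CREATE GLOBAL TEMPORARY TABLE ws_staging (customer_id NUMBER, name VARCHAR2(100))
//   ON COMMIT DELETE ROWS;
public class TempTableJoin {

    public void stageAndJoin(Connection conn, List<Object[]> wsRows) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO ws_staging (customer_id, name) VALUES (?, ?)")) {
            for (Object[] row : wsRows) {
                ins.setObject(1, row[0]);
                ins.setObject(2, row[1]);
                ins.addBatch();
            }
            ins.executeBatch();
        }
        try (PreparedStatement q = conn.prepareStatement(
                "SELECT o.order_id, s.name FROM orders o " +
                "JOIN ws_staging s ON o.customer_id = s.customer_id");
             ResultSet rs = q.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getLong(1) + " " + rs.getString(2));
            }
        }
        conn.commit(); // ON COMMIT DELETE ROWS empties the staging table
    }
}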
However, this might not be the best approach given your concerns about performance. It's a guess that your "middleware" layer is written in Java, given the tags placed on the question; the lack of any explicit description suggests you may be attempting a two-tier design, where user interface programs connect directly with the backend data management resources.
Given your apparent investment in Oracle products, you might find it worthwhile to incorporate Oracle Middleware elements in your design. In particular Oracle Fusion Middleware promises to enable "data integration" between web services and databases.

How to handle concurrent SQL updates, given that the database structure can change at runtime

I am developing a Spring MVC application.
For now I am using InnoDB/MySQL, but I have to develop the application to support other databases as well.
Can anyone please suggest how to handle concurrent SQL updates on a single record?
Suppose two users are trying to update the same record; how should such a scenario be handled?
Note: my database structure depends on some configuration (it can change at runtime), and my Spring controller is a singleton.
Thanks.
Update:
Just for reference, I am going to implement a version field as described in https://stackoverflow.com/a/3618445/3898076.
Transactions are the way to go when it comes to concurrent SQL updates; in Spring you can use a transaction manager.
As for the database structure, as far as I know MySQL does not support transactions for DDL commands; that is, if you change the structure concurrently with updates, you're likely to run into problems.
To handle multiple users working on the same data, you need to implement a manual "lock" or "version" field on the table to keep track of the last update, as in the sketch below.
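A minimal sketch of such a version check, assuming Spring's JdbcTemplate; the table and column names are illustrative:

import java.math.BigDecimal;
import org.springframework.dao.OptimisticLockingFailureException;
import org.springframework.jdbc.core.JdbcTemplate;

public class AccountDao {

    private final JdbcTemplate jdbc;

    public AccountDao(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // The UPDATE only succeeds if the row still carries the version the
    // caller read; a concurrent writer's UPDATE matches zero rows instead.
    public void updateBalance(long id, long expectedVersion, BigDecimal newBalance) {
        int updated = jdbc.update(
            "UPDATE account SET balance = ?, version = version + 1 " +
            "WHERE id = ? AND version = ?",
            newBalance, id, expectedVersion);
        if (updated == 0) {
            throw new OptimisticLockingFailureException(
                "account " + id + " was modified concurrently");
        }
    }
}

Because the check is plain SQL rather than a database-specific lock, it behaves the same on InnoDB/MySQL and on other databases.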

Risk of data contamination due to in-memory processing - Java

I am developing a Java application based on the Spring framework. It:
Connects to a MySQL database
Gets data from MySQLTable1 into POJOs
Manipulates the data (updates, deletes) in memory
Inserts it into a Netezza database table
These four steps are done for each client (A, B, C) every hour.
I am using a spring JDBC template to get the data like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before I write it to a Netezza table.
There are going to be multiple instances of this application running every hour through a scheduler.
So client A and client B can be running concurrently, but each SELECT will be unique;
I mean the data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are:
Is there a risk of data contamination?
Is there a need to implement database transaction using spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depends on what you are doing with your data, but I don't see how you can have data contamination if every instance is independent; you just have to make sure that instances running concurrently are not working on the same data (client ID).
Is there a need to implement database transaction using spring data transaction manager?
You will probably need a transaction for the insertion into the Netezza table. You certainly want your data to have a consistent state in the result table. If an error occurs in the middle of the process, you'll probably want to roll back everything that was inserted before it failed. Regarding the transaction manager, you don't especially need the Spring transaction manager, but since you are using Spring it might be a good option.
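A minimal sketch of wrapping the load in one transaction, assuming Spring's JdbcTemplate and declarative transactions; the table and column names are illustrative:

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

public class NetezzaWriter {

    private final JdbcTemplate netezzaJdbc;

    public NetezzaWriter(JdbcTemplate netezzaJdbc) {
        this.netezzaJdbc = netezzaJdbc;
    }

    // A failure anywhere in the batch rolls back every row inserted so far.
    @Transactional
    public void writeAll(List<Object[]> rows) {
        netezzaJdbc.batchUpdate(
            "INSERT INTO target_table (col1, col2, col3) VALUES (?, ?, ?)",
            rows);
    }
}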
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it? Probably not, but Spring Batch was made for this kind of application, so it might help you structure it (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be done without the framework, and it might be overkill for a really small application. But in the end, if you need those features, you'll probably want to use it...
Spring Batch is ETL, so using it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Client A and B read separate data, so they can never interfere with each other by reading or writing the same data by accident. The risk would arise if two clients with the same ID were created, but that is not the case.
Is there a need to implement database transaction using spring data transaction manager?
There is no mandatory need to do that, although programmatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need, although it would help a lot, especially with paging. How will you handle queries that return thousands of rows? Without a framework, this needs to be handled manually.
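For a sense of what the framework gives you, a minimal chunk-oriented step sketch in the Spring Batch 5 style; the reader/writer beans are assumed to exist and the item type is illustrative:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ClientJobConfig {

    // Each chunk of 100 items is read, written, and committed as one
    // transaction; restart, skip, and paging come from the framework.
    @Bean
    public Step clientStep(JobRepository jobRepository,
                           PlatformTransactionManager txManager,
                           ItemReader<Object[]> reader,
                           ItemWriter<Object[]> writer) {
        return new StepBuilder("clientStep", jobRepository)
                .<Object[], Object[]>chunk(100, txManager)
                .reader(reader)
                .writer(writer)
                .build();
    }
}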

Database independence using Hibernate

I am using Hibernate for ORM in my Java application. I want to write custom queries combining multiple tables and using DB functions like sum(salary).
I also want to support multiple databases without rewriting the SQL for each database. The approach currently followed
is to have stored procedures specific to each DB (Oracle, MySQL, etc.), and whichever we want to support, we change the configuration file in the application.
What I am looking for is a solution generic enough that I need not write stored procedures or SQL for every new functionality.
If you really want to keep it portable, you'll need to do it all with HQL.
There's no reason you couldn't do multi-table joins and aggregate functions in HQL; you just need to limit yourself to the ones it supports.
Once you start doing database-vendor specific things, you are no longer database independent, by definition.
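For instance, a portable aggregate over a join in HQL; the Employee/Department entities and their properties are illustrative:

import java.util.List;
import org.hibernate.Session;

public class SalaryReport {

    // HQL references mapped entities, not tables, so the same query
    // runs unchanged against any dialect Hibernate supports.
    public List<Object[]> totalSalaryPerDepartment(Session session) {
        return session.createQuery(
                "select d.name, sum(e.salary) " +
                "from Employee e join e.department d " +
                "group by d.name", Object[].class)
            .getResultList();
    }
}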
A perfect fit is Hibernate Criteria.
Hibernate provides alternate ways of manipulating objects and, in turn, the data available in RDBMS tables. One of these is the Criteria API, which allows you to build up a criteria query object programmatically, applying filtration rules and logical conditions.
http://www.tutorialspoint.com/hibernate/hibernate_criteria_queries.htm
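A short sketch in the classic Criteria API the tutorial covers (deprecated since Hibernate 5.2 in favour of JPA's CriteriaQuery); the Employee entity is illustrative:

import java.util.List;
import org.hibernate.Session;
import org.hibernate.criterion.Projections;
import org.hibernate.criterion.Restrictions;

// Illustrative mapped entity (mapping metadata omitted).
class Employee { Long id; String name; double salary; }

public class CriteriaExample {

    // Filters and aggregates are composed programmatically, so no
    // vendor-specific SQL is written by hand.
    public List<?> highEarnerSalarySum(Session session) {
        return session.createCriteria(Employee.class)
                .add(Restrictions.gt("salary", 2000d))
                .setProjection(Projections.sum("salary"))
                .list();
    }
}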

Distributed query with Hibernate multi-tenancy

I am using Hibernate's multi-tenancy feature via JPA, with a database per tenant strategy. One of my requirements is to be able to run a query against a table that exists in each database but obviously with different data. Is this possible?
Thanks in advance for your time.
Nope, this is not possible, because when Hibernate runs queries it is already initialized with a connection. Multi-tenancy support in Hibernate is basically done a little "outside of Hibernate" itself: it amounts to feeding Hibernate the proper connection, and once fed, it is bound to that connection.
If you need cross-tenant queries you might want to reconsider multi-tenancy, or change JPA provider to one that supports the "shared schema" approach, e.g. EclipseLink. With the shared schema approach you have two choices:
run a native query against the table containing the multi-tenant-aware entities
create an additional entity - don't mark it as multi-tenant - map it to the table containing the multi-tenant-aware entities, and run a JPQL query in the standard manner
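For reference, a sketch of what a shared-schema entity looks like in EclipseLink; the entity and column names are illustrative:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.Multitenant;
import org.eclipse.persistence.annotations.TenantDiscriminatorColumn;

// Rows for all tenants share one table, discriminated by a column.
@Entity
@Multitenant // defaults to SINGLE_TABLE
@TenantDiscriminatorColumn(name = "TENANT_ID", contextProperty = "tenant.id")
public class Invoice {

    @Id
    private Long id;

    private String number;
}

A second entity mapped to the same table but without @Multitenant would then see every tenant's rows, which is the second option above.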
