Design-level query regarding database access layer implementation for Java applications

Objective:
I want to design a DAO layer for my application, which will be used by web applications and also by core Java programs.
First, is it a good approach to have a webservice for each database operation
(like: http://sometomcatappurl//insertEmp,updateEmp)?
I am planning to configure a connection pool in Tomcat's context.xml,
access the data source using Spring, and perform operations using JdbcTemplate.
I will use batch updates in a few places where I need to process requests in bulk.
The above URLs will be called from various internal applications to communicate with the database.
Expected requests per day: 10 million inserts, 20 million updates, 20 million selects.
Concern: Java, Tomcat, and Oracle/MySQL databases are the technology constraints.
Which approach will be the fastest and most scalable way of dealing with this volume of database operations?
Can Spring handle this many requests? Is it a good approach to have a webservice perform the database operations?
Note: I don't want to bloat my applications by writing database connection code everywhere that reads settings from a properties file; that is the reason for going with a context declaration and a pool size.
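For reference, a minimal sketch of the setup described above: a Tomcat-managed pool looked up over JNDI and wrapped in a Spring JdbcTemplate, with a batch-update path for the bulk cases. The resource name jdbc/appDB, the EMP table, and the method names are illustrative assumptions, not from the question.

import java.util.List;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class EmpDao {
    private final JdbcTemplate jdbc;

    public EmpDao() throws Exception {
        // Look up the container-managed connection pool declared in
        // context.xml instead of building connections from properties files.
        DataSource ds = (DataSource) new InitialContext()
                .lookup("java:comp/env/jdbc/appDB");   // assumed resource name
        this.jdbc = new JdbcTemplate(ds);
    }

    public void insertEmp(long id, String name) {
        jdbc.update("INSERT INTO EMP (ID, NAME) VALUES (?, ?)", id, name);
    }

    // Batch path for the bulk-processing cases mentioned in the question.
    public int[] insertEmps(List<Object[]> rows) {
        return jdbc.batchUpdate("INSERT INTO EMP (ID, NAME) VALUES (?, ?)", rows);
    }
}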

A webservice works best for coarse-grained transactions. If the webservice is just a wrapper around a JDBC connection, you have only added overhead to the process. The webservice should represent a complete logical unit of work that either all succeeds or all fails.
In my experience, performance is 95% about managing the database. Spring adds overhead measured in nanoseconds per transaction, but a query against a table that lacks the well-tuned index it needs can add 10 milliseconds to a transaction.
I am not sure what you consider "huge database operations". The workload you describe appears pretty mundane to me.
A good design that I have seen used to improve the scalability of systems is to put the business transaction processing (the operations that resolve to database inserts and updates) behind a webservice layer on dedicated hardware, with the DAO and DSO layers embedded within it. Database selects for user display, however, were run directly against the database to minimize overhead. The DB server was scaled to handle the workload, running on a high-end SAN appliance.
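To make "complete logical unit of work" concrete, here is a hedged sketch of a coarse-grained operation using Spring's declarative transactions. The transferEmployee operation and the table names are hypothetical, and the class is assumed to be a Spring-managed bean with transaction management enabled.

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

public class EmployeeTransferService {
    private final JdbcTemplate jdbc;

    public EmployeeTransferService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // One webservice call, one transaction: all three statements
    // commit together or roll back together.
    @Transactional
    public void transferEmployee(long empId, long fromDept, long toDept) {
        jdbc.update("UPDATE EMP SET DEPT_ID = ? WHERE ID = ?", toDept, empId);
        jdbc.update("UPDATE DEPT SET HEADCOUNT = HEADCOUNT - 1 WHERE ID = ?", fromDept);
        jdbc.update("UPDATE DEPT SET HEADCOUNT = HEADCOUNT + 1 WHERE ID = ?", toDept);
    }
}

Exposing one endpoint per unit of work like this, rather than separate insertEmp/updateEmp URLs, keeps the service coarse-grained and avoids leaving the database in a half-updated state.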

Related

Where to synchronize inside a Java WebApp

My employer has currently given me a project that has me scratching my head about synchronization.
I'm going to first describe the situation I'm in:
I've been asked to create a PDF report/quotation tool. It takes its data from CSV files, because the actual database the data lives in is used by old IBM software and, for reasons unknown, they don't want any direct access to that database. So instead of copying the data to another database, they apparently found it perfectly fine to just create a folder on the server with loads and loads of CSV files. The software has to load this data into the application, query it, transform it where needed, do calculations, and then return a PDF file to the end user.
The problem here is that getting, querying, and calculating things takes a fair amount of time. The other problem is that they want it to be a WebApp, because the business team does not want to install any new software; they're mostly moving towards doing everything online (since the start of the pandemic). It being a WebApp means that every computation has to be done by the WebApp, and likewise all the data fetching.
My question: Is each call to a servlet by a separate user treated as a separate servlet, so that I should only synchronize the methods in the business logic (getting and using the data)? Or should I write some code that sits in the middle of the servlet, receives a user ID (as a reference), runs the business logic in a synchronized fashion, and then returns the data and the PDF file?
(I hope you get the gist of it...)
Everything will run on Apache Tomcat 8, if that helps. The build is Java 11 LTS.
Sorry, no code yet. But I've made some drawings.
With Java web applications, the usual pattern is for components not to hold conversational state (meaning information specific to a particular user's request). If you need to keep state for a user on the server, you can use the HTTP session. With an SPA or Ajax application it is often easier to keep a lot of that kind of state in the browser. The less state you keep on the server, the easier things are as your application scales: you don't have to pin sessions to servers (which messes up load balancing) or copy lots of session state across a cluster.
For simple (non-reactive) web apps that do blocking I/O, each request-response cycle gets its own dedicated thread from Tomcat's pool. That thread delivers the HTTP request to the servlet, handles the business logic, blocks while talking to the database, then carries the HTTP response.
(Reactive webapps are more complex to build; you will need a non-blocking database driver and you will have fewer choices of database, so I would steer clear of those, at least for your first web application.)
The thread pool used by Tomcat has to protect itself from concurrent access, but that doesn't impact your code. Likewise, there are third-party middle-tier caching libraries that have to deal with concurrency, but you can avoid dealing with it directly. All of your logic is confined to one thread, so it doesn't interfere with processing done by other threads unless there are shared mutable data structures. Those data structures are the part of the application where synchronization might be one of several possible solutions; a sketch follows below.
Synchronization and other locking schemes are local to one instance of the application. If you want to stand up multiple instances of this application, be aware that each one locks separately from the others. For some things it is therefore better to do the locking in the database, since that is shared across webapp instances.
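As a sketch of that shared-mutable-data case, here is a minimal, hypothetical cache that many request threads may touch at once. The QuoteCache name and the renderer function are assumptions, and ConcurrentHashMap stands in for hand-rolled synchronized blocks.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class QuoteCache {
    // Shared across every request thread in Tomcat's pool.
    private final Map<String, byte[]> pdfByQuoteId = new ConcurrentHashMap<>();

    // computeIfAbsent is atomic per key, so two users requesting the
    // same quotation at the same moment render the PDF only once.
    public byte[] getOrRender(String quoteId, Function<String, byte[]> renderer) {
        return pdfByQuoteId.computeIfAbsent(quoteId, renderer);
    }
}

Note that, per the point above, this cache is still local to one instance of the application.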
If you can use a database to store your data, so that you can rely on the database for caching and indexing, it seems likely your application can avoid doing much locking at all.
If you want examples, there are a lot of small examples of building web apps with Spring at https://spring.io/guides. These are self-hosted Spring Boot applications, so you can put them together quickly and run them right away.
Standing up a rogue database may not be the best course, since databases need looking after by DBAs. My advice is to put together two project plans: one using a database and one using the flat files. The flat-file plan will have to allow for handling caching, indexing the data, replicating data from the legacy database, and doing without the standard tools that generate PDFs from SQL queries. The database plan should involve a lot less infrastructure work and a shorter time until you can get down to cranking out reports.

2PC or event publisher (Spring Batch) or Oracle Database Link

I have a situation wherein, as part of an online transaction, I have to save some data into another database; a slight latency (a few seconds) in updating the other database is fine. Since both databases are Oracle, I have the three options below and need some insight into which one is better.
Oracle Database Links: I convert the SQL into PL/SQL and have my database take care of writing into the other Oracle database. In the DEV environment both databases are on the same server as different schemas, while in production they are two separate Oracle RACs separated by a few routers and switches.
Spring Batch: Use a batch job to pick the transactions from my source database, process them, and write them into the target database. This way my online transactions would not fail if the other database ever goes down, hits a performance issue, or faces a network issue. And if they ever fail, I can code for job restartability. Is Spring Batch well suited to such an event-publishing case? Would I hit any challenges in the future?
2-Phase Commit: I simply implement 2PC and save the data in both databases in one transaction. Or, to make it more future-proof, save to a messaging system as well as my source database.

Should I use Akka for the periodic task

I have a terminal server monitoring project. In the backend I use Spring MVC, MyBatis, and PostgreSQL. Basically, I query session information from the DB, send it back to the front end, and display it to users. But there are some large queries (like counting total users, total sessions, etc.) which slow down the system when a user opens the website. So I want to run these queries as asynchronous tasks so the website opens fast rather than waiting for the query. Also, I want to check the terminal server state in the DB periodically (every hour), and if a terminal server fails or its average load is too high, notify the admins. I do not know what I should use, maybe Akka, or some other way to do these two jobs (1. run the large queries asynchronously, 2. run periodic queries)? Please help me, thanks!
You can achieve this using Spring, with caching where necessary.
If the data you're displaying is not required to be in real time, but near real time is acceptable, you can read the data from the DB periodically and cache it. Your app then reads from the cache.
There are different approaches you can explore.
You can create a materialized view in PostgreSQL that holds the statistics you need. Depending on your requirements, you have to decide how to handle refresh intervals, etc.
Another approach is to use an application-level cache; you can leverage Spring for that (Spring docs). You can populate the cache on startup and refresh it as necessary.
The task that runs every hour can again be implemented with Spring, using the @Scheduled annotation (Spring docs); a sketch follows below.
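A minimal sketch of that cache-and-refresh idea, assuming @EnableScheduling is configured and assuming a hypothetical SessionStatsRepository that wraps the slow queries:

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

interface SessionStatsRepository {          // hypothetical DAO over the heavy queries
    long countTotalUsers();
    long countTotalSessions();
}

@Component
public class DashboardStatsCache {
    private final SessionStatsRepository repo;
    private volatile long totalUsers;       // volatile: written by the scheduler
    private volatile long totalSessions;    // thread, read by request threads

    public DashboardStatsCache(SessionStatsRepository repo) {
        this.repo = repo;
    }

    // Runs shortly after startup and then every hour; controllers read the
    // cached fields instead of issuing the heavy queries on every page load.
    @Scheduled(initialDelay = 0, fixedRate = 60 * 60 * 1000)
    public void refresh() {
        totalUsers = repo.countTotalUsers();
        totalSessions = repo.countTotalSessions();
    }

    public long getTotalUsers()    { return totalUsers; }
    public long getTotalSessions() { return totalSessions; }
}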
To answer your question: don't use Akka. You have all the tools necessary to achieve this in the Spring ecosystem.
Akka is not very relevant here; it is an event-driven programming model that deals with concurrency issues to build highly scalable multithreaded applications.
You can use the Spring task scheduler to run heavy queries periodically. If you want to keep it simple, you can solve your problem by storing data like total users and total sessions in the global application context, and periodically updating it from the database using the Spring scheduler. You could also store the same data in a separate database table so that it can be loaded easily at initialization time.
I really don't see why you would need memcached, materialized views, WebSockets, or other heavyweight technologies and frameworks for caching a small set of data. All you need is to maintain a set of global parameters in your application context and keep them updated with a scheduled task, as frequently as desired.

Multiprocessing on web hosting

I have a Java dynamic web app. I am exposing RESTful webservices for my Android application.
The thing is that some of the services do DB updates. Now I want to host the application on a public domain, and I was wondering how parallel processing works on web hosting.
Say my service /updateDB updates the database. If two users hit the same service at the same time, will the two requests run concurrently? That could cause inconsistency in the data. How exactly does the whole thing work?
Do I need to take care of synchronization in my code?
What kind of database are you using?
Certain database engines already have mechanisms in place to allow a transaction to complete before another request overwrites its data. Most web developers do not have to worry about this, because the application server (WebSphere, WebLogic) and the database (MySQL, Oracle) take care of these things for you.
(I am going to oversimplify this for you.)
A request to the webservice may perform one or more actions on the DB. These actions can be grouped together and called a transaction. A transaction can include one or more INSERT, UPDATE, DELETE, etc. For example, when a new customer registers for your webservice, the following actions take place, which can be treated as one transaction:
Insert the new customer's username and password in the Customer table
Insert the customer's address in the Address table
Update the total customer count in the Summary table
All the above actions can be completed as one transaction. If any of them fails, every previous action within the transaction is automatically reverted. Similarly, if two customers register simultaneously, the database takes care that they do not overwrite each other; a JDBC sketch of this transaction follows below.
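Here is a hedged sketch of that registration transaction in plain JDBC; the table and column names are illustrative, not from the original post.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class RegistrationDao {
    private final DataSource ds;

    public RegistrationDao(DataSource ds) { this.ds = ds; }

    public void register(String user, String passwordHash, String address) throws SQLException {
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);            // begin the transaction
            try (PreparedStatement cust = con.prepareStatement(
                     "INSERT INTO CUSTOMER (USERNAME, PASSWORD) VALUES (?, ?)");
                 PreparedStatement addr = con.prepareStatement(
                     "INSERT INTO ADDRESS (USERNAME, ADDRESS) VALUES (?, ?)");
                 PreparedStatement summ = con.prepareStatement(
                     "UPDATE SUMMARY SET TOTAL_CUSTOMERS = TOTAL_CUSTOMERS + 1")) {
                cust.setString(1, user);
                cust.setString(2, passwordHash);
                cust.executeUpdate();
                addr.setString(1, user);
                addr.setString(2, address);
                addr.executeUpdate();
                summ.executeUpdate();
                con.commit();                    // all three actions succeed together
            } catch (SQLException e) {
                con.rollback();                  // or none of them take effect
                throw e;
            }
        }
    }
}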
We can configure the database to make sure that every transaction completes before another transaction can dirty the data in a row.
In databases these guarantees are called the ACID properties.
A - Atomicity - every transaction must complete; if anything in a transaction fails, do not complete the transaction and revert every previous action within it.
C - Consistency - every transaction that occurs updates the database in a predefined manner, e.g. after every customer registration, all the actions within it are executed.
I - Isolation - if more than one request comes in, they are executed against the database separately from one another.
D - Durability - after a transaction completes, the changes it made remain permanently.
For example, the MySQL database with the InnoDB engine supports this. There are other databases that support it as well.
You can read more here:
http://java.dzone.com/articles/beginners-guide-acid-and
This is a very vast topic in databases.
Programming languages have APIs that help you write code in this manner. The basic takeaway is that databases and application servers will do most of the work for you; you just have to design the code structure to identify transactions and commit them appropriately.
Java and other programming languages are aware of the ACID properties of the DB and will help you achieve that goal.
Read more here about how to use Java to achieve what we mentioned above:
http://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html
Other languages have similar functionality and APIs.
In Google, search for "java database transaction" or "<your favorite language> database transaction".

Risk of data contamination due to in-memory processing - Java

I am developing a Java application based on the Spring Framework.
It:
Connects to a MySQL database
Gets data from MySQLTable1 into POJOs
Manipulates it (update, delete) in memory
Inserts it into a Netezza database table
The above four steps are performed for each client (A, B, C) every hour.
I am using a Spring JdbcTemplate to get the data like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before I write it to a Netezza table (a sketch of this read step follows below).
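A minimal sketch of that read step: JdbcTemplate plus a row mapper that builds POJOs. The ClientRecord class is a stand-in for the real POJO, not from the original post.

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class ClientReader {

    public static class ClientRecord {
        private final String col1, col2, col3;
        public ClientRecord(String col1, String col2, String col3) {
            this.col1 = col1; this.col2 = col2; this.col3 = col3;
        }
        public String getCol1() { return col1; }
        public String getCol2() { return col2; }
        public String getCol3() { return col3; }
    }

    private final JdbcTemplate jdbc;

    public ClientReader(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    // Each application instance calls this with its own client ID, so the
    // result lists never overlap between instances.
    public List<ClientRecord> readFor(String clientId) {
        return jdbc.query(
            "SELECT COL1, COL2, COL3 FROM MySQLTable1 WHERE CLIENTID = ? AND COL4 = 'CONDITION'",
            (rs, rowNum) -> new ClientRecord(
                rs.getString("COL1"), rs.getString("COL2"), rs.getString("COL3")),
            clientId);
    }
}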
There are going to be multiple instances of this application running every hour through a scheduler.
So Client A and Client B can be running concurrently, but the SELECT will be unique;
I mean the data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember, all of these are stored in memory as POJOs.
My questions are:
Is there a risk of data contamination?
Is there a need to implement database transactions using the Spring transaction manager?
Does my application really need something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool, but that is out of scope.
Is there a risk of data contamination?
It depends on what you are doing with your data, but I don't see how you can have data contamination if every instance is independent; you just have to make sure that instances running concurrently are not working on the same data (client ID).
Is there a need to implement database transactions using the Spring transaction manager?
You will probably need a transaction for the insertion into the Netezza table. You certainly want your data to be in a consistent state in the result table: if an error occurs in the middle of the process, you'll probably want to roll back everything that was inserted before the failure. As for the transaction manager, you don't especially need Spring's, but since you are already using Spring it is a good option.
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it? Probably not. But Spring Batch was made for this kind of application, so it might help you structure it (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be done without the framework, and it might be overkill for a really small application. But in the end, if you need those features, you'll probably want to use it.
Spring Batch is an ETL framework, so it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Clients A and B read separate data, so they can never interfere with each other by accidentally reading or writing the same data. The risk would arise if two clients with the same ID were created, but that is not the case here.
Is there a need to implement database transactions using the Spring transaction manager?
There is no mandatory need, although hand-rolled programmatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need, although it would help a lot, especially with paging. How will you handle queries that return thousands of rows? Without a framework this has to be handled manually; a sketch of a chunk-oriented step follows below.
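For illustration only, a hedged sketch of such a chunk-oriented step in Spring Batch 4 style, reusing the ClientRecord POJO sketched above; the bean wiring, both DataSources, and the Netezza table name are assumptions.

import javax.sql.DataSource;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;

public class ClientStepConfig {

    public Step clientStep(StepBuilderFactory steps,
                           DataSource mysql, DataSource netezza) {
        // Streams rows from MySQL with a cursor instead of loading
        // everything into memory at once.
        JdbcCursorItemReader<ClientReader.ClientRecord> reader =
            new JdbcCursorItemReaderBuilder<ClientReader.ClientRecord>()
                .name("clientReader")
                .dataSource(mysql)
                .sql("SELECT COL1, COL2, COL3 FROM MySQLTable1 "
                   + "WHERE CLIENTID = 'A' AND COL4 = 'CONDITION'")
                .rowMapper((rs, n) -> new ClientReader.ClientRecord(
                    rs.getString(1), rs.getString(2), rs.getString(3)))
                .build();

        JdbcBatchItemWriter<ClientReader.ClientRecord> writer =
            new JdbcBatchItemWriterBuilder<ClientReader.ClientRecord>()
                .dataSource(netezza)
                .sql("INSERT INTO NetezzaTable1 (COL1, COL2, COL3) VALUES (?, ?, ?)")
                .itemPreparedStatementSetter((item, ps) -> {
                    ps.setString(1, item.getCol1());
                    ps.setString(2, item.getCol2());
                    ps.setString(3, item.getCol3());
                })
                .build();

        // Each chunk of 100 rows is read, written, and committed as one
        // transaction, so a failure rolls back only the current chunk.
        return steps.get("clientStep")
                .<ClientReader.ClientRecord, ClientReader.ClientRecord>chunk(100)
                .reader(reader)
                .writer(writer)
                .build();
    }
}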
