I am looking for advice on how to properly architect a system I am building.
I'll try to give a high-level view of the system's requirements:
The system uses various data providers (2 for now, but more to come). The data can be retrieved:
either directly from a remote database (hosted on the provider's server, direct MySQL access)
or directly from files on the local server (the provider pushes files by SCP). This case needs additional logic to check whether the files have arrived and to purge or move them afterwards
The method to use depends on the provider.
The system then needs to regularly import data from these providers into a local database. The import logic differs for every provider.
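To make the file-based case concrete, here is a minimal sketch of the "check, read, then move" step. The names `ProviderSource` and `FileDropSource`, and the inbox/archive directory layout, are hypothetical and only for illustration; the database-backed provider would be a second implementation of the same interface.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical abstraction: each provider knows how to fetch its own raw records. */
interface ProviderSource {
    List<String> fetch() throws IOException;
}

/** File-based provider: reads the files pushed into an inbox, then moves them to an archive. */
class FileDropSource implements ProviderSource {
    private final Path inbox;
    private final Path archive;

    FileDropSource(Path inbox, Path archive) {
        this.inbox = inbox;
        this.archive = archive;
    }

    @Override
    public List<String> fetch() throws IOException {
        Files.createDirectories(archive);

        // Collect the regular files first, so nothing is moved while the directory is being listed.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inbox)) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    files.add(candidate);
                }
            }
        }
        Collections.sort(files); // deterministic processing order

        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            lines.addAll(Files.readAllLines(file));
            // Move the processed file out of the inbox so it is not imported twice.
            Files.move(file, archive.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        return lines;
    }
}
```

An empty inbox simply yields an empty list, so a scheduled import can call `fetch()` unconditionally.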
Putting all the code in a monolithic application obviously seems like a bad idea, so I'm looking into splitting it into multiple services.
I'm really not sure how to architect this, though, and would appreciate some advice.
Details:
Java + Spring if needed to expose an API between services
Scalability / speed not really important (I hate saying that - but in this case it doesn't matter whether the imports take 1 ms or 4 hours)
Everything needs to be hosted on a single server and no data can leave it (sensitive data)
Any help highly appreciated!
My employer has currently given me a project that has me scratching my head about synchronization.
I'm going to first talk about the situation I'm in:
I've been asked to create a PDF report/quotation tool that takes its data from CSV files. The actual database holding the data is used by old IBM software, and for reasons unknown they don't want any direct access to it; instead of copying the data to another database, they decided to create a folder on the server holding a very large number of CSV files. The tool has to load the data into the application, query it, transform it where needed, do calculations, and then return a PDF file to the end user.
There are two problems. First, getting, querying, and calculating things takes a fair amount of time. Second, they want it to be a web app, because the business team does not want to install any new software and has been moving towards doing everything online since the start of the pandemic. Being a web app means that every computation, as well as fetching the data, has to be done by the web app itself.
My question: Is each call to a servlet by a separate user treated as a separate servlet, so that I should only synchronize the methods in the business logic (getting and using the data)? Or should I write some code that sits in front of the servlet, receives a user ID (as a reference), runs the business logic in a synchronized fashion, and then receives the data and returns the PDF file?
(I hope you get the gist of it...)
Everything will run on Apache Tomcat 8 if that helps. The build uses Java 11 (LTS).
Sorry, no code yet. But I've made some drawings.
With Java web applications, the usual pattern is for the components to not have conversational state (meaning information specific to a particular user's request). If you need to keep state for a user on the server, you can use the HTTP session. With a SPA or Ajax application it's often easier to keep a lot of that kind of state in the browser. The less state you keep on the server, the easier things are as your application scales: you don't have to pin sessions to servers (messing up load balancing) or copy lots of session state across a cluster.
For simple (non-reactive) web apps that do blocking I/O, each request-response cycle gets its own dedicated thread from Tomcat's pool. That thread delivers the HTTP request to the servlet, handles the business logic and blocks while talking to the database, then carries the HTTP response.
(Reactive web apps are going to be more complex to build; you will need a non-blocking database driver and you will have fewer choices for databases, so I would steer clear of those, at least for your first web application.)
The threadpool used by tomcat has to protect itself from concurrent access but that doesn't impact your code. Likewise there are 3rd party middletier caching libraries that have to deal with concurrency but you can avoid dealing with it directly. All of your logic is confined to one thread so it doesn't interfere with processing done by other threads unless there are shared mutable data structures. Those data structures would be the part of the application where synchronization might be one of several possible solutions.
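As a minimal sketch of that distinction (the class and method names are hypothetical, not taken from the question): the per-request calculation below uses only local variables and needs no locking, while the one shared structure is a concurrent collection, which is one alternative to explicit synchronization.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** One instance is shared by all request threads, as a servlet or service typically is. */
class ReportService {
    // Shared mutable state: touched by every request thread, so it must be thread-safe.
    private final ConcurrentHashMap<String, LongAdder> reportsPerUser = new ConcurrentHashMap<>();

    /** Per-request work: everything here lives in local variables confined to the calling thread. */
    String render(String userId, List<Integer> amounts) {
        int total = 0;
        for (int amount : amounts) {
            total += amount;
        }
        // The only cross-thread interaction; computeIfAbsent and LongAdder are both thread-safe.
        reportsPerUser.computeIfAbsent(userId, key -> new LongAdder()).increment();
        return userId + ":" + total;
    }

    long reportCount(String userId) {
        LongAdder counter = reportsPerUser.get(userId);
        return counter == null ? 0 : counter.sum();
    }
}
```

If the counter were a plain `HashMap<String, Long>`, concurrent requests could lose updates; that map would be the part needing a lock or a concurrent replacement, not `render` as a whole.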
Synchronization or other locking schemes are local to one instance of the application. If you want to stand up multiple instances of this application then you need to be aware each one would be locking separately from the others. So for some things it's better to do locking in the database, since that is shared across webapp instances.
If you can make use of a database to store your data, so that you can rely on the database for caching and indexing, then it seems likely your application should be able to avoid doing a lot of locking.
If you want examples there are a lot of small examples for building web apps using spring at https://spring.io/guides. These are spring boot applications that are self hosted so you can put them together quickly and run them right away.
Going rogue with a database may not be the best course, since databases need looking after by DBAs. My advice is to put together two project plans, one for using a database and one for using the flat files. The flat-file plan will have to allow for addressing issues like handling caching, indexing data, replicating data from the legacy database, and not having standard tools that generate PDFs from SQL queries. The alternative plan using a database should involve a lot less sorting out of infrastructure and a shorter time until you can get down to cranking out reports.
We maintain our server once a week.
Sometimes the customer asks us to change some settings that are already cached on the server.
My colleague always writes some JSP code to change these settings, which are stored in memory.
Is this a good approach?
If our project does not run in a web container, which tools can help me?
Usually, in my experience, the server configuration is not stored only in the server's memory:
What happens if, after a configuration change, the server is restarted or goes down for some system reason?
What happens if you have more than one instance of the same server to work on (a cluster of servers in other words)?
So, usually, people opt for various "externalized configuration" options that can range from "file-based" configuration plus a redeploy of the whole cluster upon each configuration change, to configuration management servers (like Consul, etcd, etc.). There are also some solutions that come from (and are used in) the Java world: Apache ZooKeeper and Spring Cloud Config Server, to name a few; there are others. In addition, it's sometimes convenient to store the configuration in a database.
Now to your question: If your project is not a web container and you don't care that configuration will "disappear" after a server restart and you're not running a distributed cluster of servers, then, using JSP indeed doesn't seem appropriate in this case.
Maybe you should take a look at JMX - Java management extensions, that have a built-in solution so that you probably will be able to get rid of a web container (which seems to be not used by your team anyway other than for JSP modifications that you've described).
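A minimal sketch of what that could look like with a standard MBean, using only the JDK's `javax.management` API. The setting name `MaxUsers`, the object name `com.example:type=AppSettings` and the class names are made up for illustration. In a real deployment the attribute would be changed from a JMX client such as JConsole rather than from the same process.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSettingsDemo {

    /** Management interface; for a standard MBean it must be public and named <ClassName>MBean. */
    public interface AppSettingsMBean {
        int getMaxUsers();
        void setMaxUsers(int maxUsers);
    }

    /** In-memory settings holder (hypothetical) that is exposed through JMX. */
    public static class AppSettings implements AppSettingsMBean {
        private volatile int maxUsers = 40;

        @Override
        public int getMaxUsers() {
            return maxUsers;
        }

        @Override
        public void setMaxUsers(int maxUsers) {
            this.maxUsers = maxUsers;
        }
    }

    /** Registers the MBean, changes the setting through the MBean server, and returns the new value. */
    static int demo() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example:type=AppSettings");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            AppSettings settings = new AppSettings();
            server.registerMBean(settings, name);

            // A remote JMX client would perform the equivalent of this call.
            server.setAttribute(name, new Attribute("MaxUsers", 50));
            return settings.getMaxUsers();
        } catch (Exception e) {
            throw new IllegalStateException("JMX demo failed", e);
        }
    }
}
```

As with the JSP approach, a value changed this way lives only in memory and is lost on restart unless it is also persisted somewhere.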
You basically need an in-memory cache. There are multiple solutions in other answers, which include creating your own implementation or using an existing Java library. You can also get the data from a database and add a cache over the database layer.
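A hand-rolled version of "a cache over the database layer" can be quite small. This is only a sketch: `SettingsCache` is a hypothetical name, and the `Function` passed in stands in for whatever database lookup the project actually uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Read-through cache in front of a slower store, for example a database table of settings. */
class SettingsCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> store; // stand-in for the real lookup, e.g. a DAO call

    SettingsCache(Function<String, String> store) {
        this.store = store;
    }

    /** Returns the cached value, loading it from the store on first access. */
    String get(String key) {
        return cache.computeIfAbsent(key, store);
    }

    /** Drops one entry so that the next get() reloads it from the store. */
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

When the customer asks for a setting to be changed, the change goes into the store and the entry is invalidated, so the new value survives a restart.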
I'm trying to properly understand the package-by-feature approach.
1 - Let's say I have 2 features that tap into the same data. For instance, one feature could be visualizing bank account information with various sophisticated options. The other feature is about making transactions from the bank account (we could well imagine that this feature does not involve visualization; it could simply be provided as a REST service).
1.a - The data model is shared across the two features here. How does that impact package by feature? Should we create redundant data model classes in the 2 packages? Should we create a specific package for the data model instead?
Which leads me to the second question:
2 - In general, how are cross-cutting concerns dealt with?
2.a - For instance, the case above when it comes to the data model?
2.b - Or, when it comes to database access or some common access to an external service (shared by different features, each doing something different with it)?
2.c - Or the front end, or the overall bundling of the application in general.
What I mean here is the following case. Currently I have an application which has:
(i) a message transfer capability (between participants of the system)
(ii) a message monitoring capability, whereby it automatically detects rule violations and gives penalties
(iii) a visualization capability dedicated to the administrator of the system
(iv) a notification capability provided to the administrator of the system to send messages to participants
(v) a violation cancellation capability for the admin as well, and so on.
The point is that all of it has to be packaged in one application that I call the marketplace infrastructure. Should the marketplace infrastructure that wires everything together have its own package, even if it is not a feature?
I think the same applies in some way to a web application as well. There has to be one central point that bundles all the feature modules / packages together. If each module defines routes, controllers, etc., there should be a central set of routes that imports all the module routes, for instance.
If the application has a database behind it that is used by different features, who is going to start the database and wire up all the modules?
So the bottom line is: what about the cross-functional stuff (data models, service access, etc.) and the bundling (wiring everything together)?
PS: By wiring I mean dependency injection; still, the object graph has to be defined somewhere.
Many thanks for any help.
I have multiple clients:
client 1 - 40 users
client 2 - 50 users
client 3 - 60 users
And I have a web application that is supposed to serve all the clients.
The application is deployed into Tomcat. Each client has its own database.
What I want to implement is a single web application instance which serves all the clients. The client (and the database to connect to) is identified by the context path from the URL.
I.e. I have the following scenario in mind:
Some user requests http://mydomain.com/client1/
Tomcat invokes a single instance of my application (no matter which context is requested)
My application processes the rest of the request thinking that it's deployed to the /client1 context path, i.e. all redirects or relative URLs should be resolved against http://mydomain.com/client1/
When client 2 requests http://mydomain.com/client2/, I want my application (the same instance) to process it just as if it were deployed to the /client2 context path.
Is this possible with Tomcat?
Your application has to do this, not Tomcat. Now, you could deploy your application in three new contexts (client1, client2, client3) with slightly different database configuration for each, and if you are careful to use relative URLs (i.e. don't do things like /images) then you can do this without changes. This is the transparent way of making your application reusable, in that your application is unaware of the global picture that there are 3 different instances of itself running. That means you can easily deploy more and more without having to change your application. You just configure a new instance and go. This only requires that you don't use absolute URLs to resources. Using ServletContext.getContextPath() and using .. in your CSS, scripts, etc. is helpful here as well.
Probably one of the biggest advantages of working this way is that your app doesn't care about global concerns. Because it's not involved in those decisions, you can run 3 instances on one Tomcat server, or if one client needs more scaling they can be moved to their own Tomcat server easily. Making your app portable forces you to deal with how to install your app in any environment. This is a pillar of horizontal scaling, which your situation could very much take advantage of, given that you can split your DB data without having to rejoin them (a huge advantage). The option you asked about doesn't force you to deal with this, so when the time comes to deal with it, it will be painful.
The other option is more involved and requires significant changes to your application. It works by parsing the incoming URL, pulling out the name of the client, and then using that name to look up, in a configuration file, the database that should be used for that client. Spring MVC can handle things like extracting variables from URL paths. You then have to make sure everything you render back to them points to their portion of the URL. This would probably carry a lot of the same requirements as the first option. You can use absolute URLs for things like JavaScript, CSS, and images, but URLs to your app would have to be rewritten at runtime so that they are relative to the requesting client. The benefit is that you only load your application once.
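Stripped of any framework, the core of this second option might look roughly like the sketch below. `TenantResolver` and its methods are hypothetical names, and the map stands in for the configuration file; a real application would do this in a filter or interceptor and hand the result to its data-source selection logic.

```java
import java.util.Map;
import java.util.Optional;

/** Maps the first URL path segment (the client name) to that client's database. */
class TenantResolver {
    private final Map<String, String> jdbcUrlByClient; // would be loaded from a configuration file

    TenantResolver(Map<String, String> jdbcUrlByClient) {
        this.jdbcUrlByClient = jdbcUrlByClient;
    }

    /** Extracts the first non-empty path segment, e.g. "/client1/orders/7" gives "client1". */
    static Optional<String> clientName(String requestPath) {
        for (String segment : requestPath.split("/")) {
            if (!segment.isEmpty()) {
                return Optional.of(segment);
            }
        }
        return Optional.empty();
    }

    /** Looks up the database for the requesting client; empty if the client is unknown. */
    Optional<String> jdbcUrlFor(String requestPath) {
        return clientName(requestPath).map(jdbcUrlByClient::get);
    }

    /** Rewrites an application-relative link so that it points to the client's portion of the URL. */
    static String link(String client, String appPath) {
        return "/" + client + appPath;
    }
}
```

Every link, redirect and form action the application renders would have to go through something like `link(...)`, which is where most of the "significant changes" come from.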
Just as an aside, if you host your CSS, Javascript, images on a CDN in production then both of these options must be relative URL aware. Upsides and downsides to using CDNs as well.
While that sounds good, it might not be a good thing, because all clients use the same version of the app. Also, if you bring down the app to fix something for client1 or to do maintenance, it affects all clients. If you think you'll have to do customization per client, then this option will get messy quickly. Upgrading a single client means all clients must upgrade, and depending on your business model this might not be acceptable. Furthermore, I'm not entirely sure you'll save a lot of memory by running only a single copy of the application, because most apps only take up 10 MB of loaded code. The vast majority of the memory is in the VM and in processing requests, and using a single Tomcat instance means you share the VM. And with 1 or 3 instances running you still have the same number of requests. You might see a difference of 30-100 MB, which in today's world is chump change, and all of those other concerns aren't addressed if you choose to save only a couple of MB.
Essentially, there are facilities in Tomcat to aid you in doing this (multiple contexts), but it's mostly up to your application to handle this, especially if it's a single instance.
Currently we are building web service applications with Spring, Hibernate, MySQL and Tomcat. We are not using a real application server - SOA architecture. Regarding the persistence layer: today we are using Hibernate with MySQL, but in a year's time we may end up with MongoDB and Morphia.
The idea here is to design the system architecture independently of the concrete database engine or persistence layer, and to get the maximum benefit.
Let me explain - https://s3.amazonaws.com/creately-published/gtp2dsmt1. We have two cases here:
Scenario one:
We have one database that is replicated (not in the beginning) and different applications. Each application is one WAR that has its own controllers, application context, and servlet XML. The domain and persistence layer is imported as a Maven library; there is one version of it, which is included in each application.
Pros:
Small applications that are easy to maintain
Distributed solution - each application can be moved to its own Tomcat instance or to a different machine, for example
Cons:
Possible problems when using the Hibernate session and syncing it between different applications. I don't know whether that is possible at all with that implementation.
Scenario two - one application that has internal logic to split and organize different services - News and User.
Pros:
One persistence layer - the full feature set of Hibernate
More of a J2EE look, with options to extend to the next level: integrate EJB and move to an application server
Cons:
One huge WAR application that takes more effort to maintain
Not distributed as in the first scenario
I prefer the first scenario, but I'm worried about Hibernate's behavior in that case and about all the benefits I can get from it.
I'll be very thankful for your opinion on that case.
Cheers
Possible problems when using the Hibernate session and syncing it between different applications. I don't know whether that is possible at all with that implementation.
There are a couple of solutions that solve this exact problem:
Terracotta
Take a look at Hibernate Distributed Cache Tutorial
Also there is a bit older slide share Scaling Hibernate with Terracotta that delivers the point in pictures
Infinispan
Take a look at Using Infinispan as JPA-Hibernate Second Level Cache Provider
Going with the first solution (distributed) may be the right way to go.
It all depends on what the business problem is
Of course distributed is cool and fault tolerant and, and... but RAM and disks are getting cheaper and cheaper, so "scaling up" (and having a couple of hot-hot replicas) is actually NOT all that bad => these are props to the "second" approach you described.
But let's say you go with approach #1. If you do that, you would benefit from switching to NoSQL in the future, since you now have replica sets, sharding, etc., and actually several nodes to support the concept.
But is 100% consistency a must-have (e.g. does the product have to do with money)? How big are you planning to become => are you ready to maintain hundreds of servers? Do you have complex aggregate queries that need to run faster than xteen hours?
These are the questions that, in addition to your understanding of the business, should help you land on #1 or #2.
So, this is a very late answer, but I'm finally ready to give one. I'll put some details here about the further development of the REST service application.
In the end I landed on solution #1 from tolitius's great answer, with the option to migrate to solution #2 at a later stage.
This is the application architecture - I'll add graphics later.
Persistence layer - this holds the domain model and all database operations. It is generated from the database model with Spring Roo, with a generated repository and service layer for easy migration later.
Business layer - all the business logic necessary for the operations is located here. This layer depends on the Persistence layer.
Presentation layer - validation and controllers calling the Business layer.
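In plain Java, the dependency direction between the three layers can be sketched as below. All names are hypothetical and the in-memory repository is only a stand-in: in the real project this layer is generated by Spring Roo on top of Hibernate, and the wiring is done by Spring rather than by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Persistence layer: domain model plus a repository interface that hides the storage engine.
class News {
    final String title;

    News(String title) {
        this.title = title;
    }
}

interface NewsRepository {
    List<News> findAll();
}

/** Stand-in implementation; a Hibernate- or Morphia-backed one would implement the same interface. */
class InMemoryNewsRepository implements NewsRepository {
    private final List<News> items;

    InMemoryNewsRepository(List<News> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public List<News> findAll() {
        return new ArrayList<>(items);
    }
}

// Business layer: depends only on the repository interface, not on a concrete database.
class NewsService {
    private final NewsRepository repository;

    NewsService(NewsRepository repository) {
        this.repository = repository;
    }

    List<String> headlines(int limit) {
        List<String> titles = new ArrayList<>();
        for (News news : repository.findAll()) {
            if (titles.size() == limit) {
                break;
            }
            titles.add(news.title);
        }
        return titles;
    }
}

// Presentation layer: validates input, then delegates to the business layer.
class NewsController {
    private final NewsService service;

    NewsController(NewsService service) {
        this.service = service;
    }

    String list(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return String.join(", ", service.headlines(limit));
    }
}
```

Because the business layer only sees `NewsRepository`, swapping the persistence technology later should not reach the business or presentation code.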
All of this runs on Tomcat without application server extras. At a later phase it can be moved to an application server, implementing the Service Locator pattern fully.
Infrastructure - geo-located servers with a geo load balancer, a MySQL replication ring between all of them, and one backup server in case of failure.
My idea was to build a more modern system architecture, but from my experience with Java technology this is a "normal risk" situation.
With more experience come more beautiful solutions :) Looking forward to this!