Consider the situation. I am writing a statistical analysis app. The app has multiple tiers:
- a frontend UI written for multiple device types: desktop, browser, mobile;
- a mid-tier servlet that offers a so-called REST service to these frontends;
- a backend that performs the computationally heavy statistical processing, which in turn communicates with a further backend database.
Because statistical analysis requires a huge amount of processing power, you would never dream of delegating such processing to the front-end.
- The statistical analyses consist of procedures, or a series of workflow steps.
- Some steps require so much processing power that you would not want to repeat them.
- If you have a workflow of 20 steps, you cannot execute step 20 without first executing step 19, which cannot be executed without first executing step 18, and so on.
- There are observation points: for example, the statistician must inspect the results of steps 3, 7, 9, 14 and 19 before telling the client side to proceed to the next step.
- Each of these steps is a request to the REST service, telling the backend supercomputer to progressively build up the statistical model in memory.
- There are many workflows, and some workflows may incidentally share step results; e.g., Flow[dry]:Step[7] may share the result of Flow[wet]:Step[10]. Given the amount of processing involved, we absolutely must prevent repeating a step that has already been accomplished incidentally by another flow. A hypothetical sketch of what I mean follows this list.
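To illustrate, here is a minimal sketch (all names invented) of step results keyed by the step and its inputs rather than by the owning flow, so that two flows performing the same computation share one result:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Hypothetical: results are keyed by (step definition + inputs), not by
    // workflow, so Flow[dry]:Step[7] and Flow[wet]:Step[10] hit the same entry.
    public class StepResultCache {

        private final ConcurrentMap<String, Object> resultsByKey =
                new ConcurrentHashMap<String, Object>();

        public Object getOrCompute(String stepDefinition, String inputsDigest,
                                   StepComputation step) {
            String key = stepDefinition + ":" + inputsDigest;
            Object cached = resultsByKey.get(key);
            if (cached != null) {
                return cached; // another flow already did this expensive work
            }
            Object result = step.run(); // the costly part
            Object raced = resultsByKey.putIfAbsent(key, result);
            return raced != null ? raced : result;
        }

        public interface StepComputation {
            Object run();
        }
    }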
Therefore, you can see that in the so-called REST service being designed, it is not possible for each request to be independent of any previous request.
Therefore, how true can the following statement be?
All REST interactions are stateless. That is, each request contains
all of the information necessary for a connector to understand the
request, independent of any requests that may have preceded it.
Obviously, the application I described requires that each request depend on previous requests. There are three possibilities that I can see concerning this app:
1. My app does not comply with REST, since it cannot satisfy stateless requests. It may use the JAX-RS framework, but using JAX-RS and all the trappings of REST does not make it REST, simply because it fails the statelessness criterion.
2. My app is badly designed: I should stop trying to avoid the temporal and financial cost of rebuilding a statistical model, even if a workflow takes 5 to 15 minutes. Just make sure there is no dependence on previous requests, and repeat costly steps when necessary.
3. The statelessness criterion is outdated, and my understanding of REST is outdated or defective, in that the REST community has been constantly ignoring this criterion.
Is my app considered RESTful?
New Question: ISO 9000
Finally, in case my app is not considered completely RESTful, would all references to "REST" need to be omitted to pass ISO 9000 certification?
New edit:
REST-in-piece
OK, my colleague and I have discussed this and decided to call such an architecture/pattern "REST-in-piece": REST in piecemeal stages.
ISTM you're reading too much into statelessness. A REST API supports traditional CRUD operations. The API for CouchDB is a good example of how DB state is updated by a series of stateless transactions.
Your task is to identify what the resources are and the "state transfers" between them. Each step in your workflow is a different state transfer, marked by a different URI. Each update/change to a resource has an accompanying POST/PATCH or an idempotent PUT or DELETE operation.
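As a minimal sketch of that idea, assuming JAX-RS and with all resource names invented, each workflow step could be its own resource:

    import java.net.URI;
    import javax.ws.rs.*;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.core.UriBuilder;

    // Hypothetical: each step is a resource with its own URI; completing
    // step N yields a link to step N+1.
    @Path("/flows/{flowId}/steps")
    public class WorkflowStepResource {

        @GET
        @Path("/{stepId}")
        @Produces("application/json")
        public Response getStepResult(@PathParam("flowId") String flowId,
                                      @PathParam("stepId") int stepId) {
            // Safe, cacheable read: the statistician inspects results here.
            return Response.ok(lookupResult(flowId, stepId)).build();
        }

        @POST
        @Path("/{stepId}")
        public Response executeStep(@PathParam("flowId") String flowId,
                                    @PathParam("stepId") int stepId) {
            Object result = runOrReuse(flowId, stepId); // may reuse another flow's result
            URI next = UriBuilder.fromPath("/flows/{flowId}/steps/{next}")
                                 .build(flowId, stepId + 1);
            // The response links to the next state transfer in the workflow.
            return Response.created(next).entity(result).build();
        }

        private Object lookupResult(String flowId, int stepId) { /* ... */ return null; }
        private Object runOrReuse(String flowId, int stepId) { /* ... */ return null; }
    }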
If you want to gain a better understanding of what it means to be RESTful and the reasons behind each design choice, I recommend spending an hour reading Chapter 5 of Roy Fielding's dissertation.
When making design choices, just think about what the principles of RESTful design are trying to accomplish. Set up your design so that queries are safe (don't change state) and are done in ways that are bookmarkable, cacheable, distributable, etc. Let each step in the workflow jump to a new state with a distinct URI so that a user can back up, branch out different ways, etc. The whole idea is to create a scalable, flexible design.
You are updating an in-memory model via a REST API. This means that you are maintaining state on the server between requests.
The REST-ful way of addressing this would be to make the client maintain the state: simply process the request and return, in the response, all the information needed to construct the next request. The server then reconstructs the in-memory model from the information in the request and does its thing. That way, if you operate in, e.g., a clustered environment, any of the available servers would be able to handle the request.
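A minimal sketch of that stateless variant, assuming JAX-RS with a JSON provider; the request/response shapes are invented for illustration:

    import javax.ws.rs.Consumes;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;

    @Path("/steps")
    public class StatelessStepResource {

        // Hypothetical shapes: the "snapshot" stands in for whatever
        // serialized form of the model the client ships back and forth.
        public static class StepRequest {
            public String modelSnapshot;  // state from the previous response
            public String stepParameters; // what to do in this step
        }

        public static class StepResponse {
            public String modelSnapshot;  // new state, echoed in the next request
            public StepResponse(String snapshot) { this.modelSnapshot = snapshot; }
        }

        @POST
        @Consumes("application/json")
        @Produces("application/json")
        public StepResponse execute(StepRequest request) {
            // Rebuild the model from the state the client shipped, apply this
            // step, and hand the new state straight back to the client.
            String rebuilt = request.modelSnapshot == null ? "" : request.modelSnapshot;
            return new StepResponse(rebuilt + "|" + request.stepParameters);
        }
    }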
Whether or not this is the most efficient way to do things depends on your application. There are loads of enterprise applications that use a server-side session and elaborate load balancing to ensure that clients always use the same nodes in a cluster. So having server-side state is an entirely valid design choice, and there are plenty of ways of implementing this robustly. However, server-side state generally complicates scaling out, and REST in the purest sense is all about avoiding server-side state and avoiding complexity.
A workaround/compromise is persisting the state in some kind of database or store. That way your nodes can fetch the state from disk before processing a request.
It all depends on what you need and what is acceptable for you. As the previous commenter mentioned, don't get too hung up on this whole statefulness thing. Clearly somebody will have to maintain state; the question is merely where the best place to put that state is for you, and how you access it. Basically there are a couple of tradeoffs that have to do with various what-if scenarios. For example, if the server crashes, do you want your client to re-run the entire set of requests to reconstruct the calculation, or do you prefer to simply resend the last request? I can imagine that you don't really need high availability here and don't mind the low risk that something occasionally goes wrong for your clients. In that case, having the state on the server side in memory is an acceptable solution.
Assuming your server holds the computation state in some hash map, a REST-ful way of passing the state around could then be to simply send back the key for the model in the response. That's a perfectly REST-ful API, and you can change the implementation to persist the state, or do something else, without changing the API when needed. And this is the main point of being REST-ful: decouple the implementation details from the API. Your client doesn't need to know where you put the state or how you store it. All it needs is a resource representation of that state that can be manipulated.
Of course the key should be represented as a URI. I recommend you read Jim Webber's "REST in Practice". It's a great introduction to designing REST-ful APIs.
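A minimal sketch of that pattern, assuming JAX-RS; the map and all names are invented:

    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import javax.ws.rs.GET;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.core.UriBuilder;

    // Hypothetical: the model lives server-side in a map, but the API only
    // exposes it as a resource URI, so the storage can later move to a
    // database without the API changing.
    @Path("/models")
    public class ModelResource {

        private static final ConcurrentMap<String, Object> MODELS =
                new ConcurrentHashMap<String, Object>();

        @POST
        public Response createModel() {
            String key = UUID.randomUUID().toString();
            MODELS.put(key, new Object() /* the in-memory model */);
            // The key is handed out as a URI, not as an implementation detail.
            return Response.created(UriBuilder.fromPath("/models/{key}").build(key))
                           .build();
        }

        @GET
        @Path("/{key}")
        public Response getModel(@PathParam("key") String key) {
            Object model = MODELS.get(key);
            return model == null
                    ? Response.status(Response.Status.NOT_FOUND).build()
                    : Response.ok(model.toString()).build();
        }
    }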
Related
My employer has given me a project that has me scratching my head about synchronization.
I'm going to first talk about the situation I'm in:
I've been asked to create a PDF-report/quotation tool that takes data from CSV files. (The actual database the data is in is used by old IBM software, and for unknown reasons they don't want any direct access to that database; so instead of copying the data to another database, they apparently found it incredibly fine to just create a folder on the server with loads and loads and loads of CSV files.) This piece of software is to load the data into the application, query it, transform it where needed, do calculations, and then return a PDF file to the end user.
The problem here is that getting, querying, and calculating things takes a fair amount of time. The other problem is that they want it to be a WebApp, because the business team does not want to install any new software; they're mostly moving towards doing everything online (since the start of the pandemic). It being a WebApp means that every computation, and getting the data, has to be done by the WebApp.
My question: is each call to a servlet by a separate user treated as a separate servlet, so that I should only synchronize the methods on the business logic (getting and using the data)? Or should I write some code that sits in the middle of the servlet, receives a user id (as a reference), runs the business logic in a synchronized fashion, and then receives the data and returns the PDF file?
(I hope you get the gist of it...)
Everything will run on Apache Tomcat 8, if that helps. The build is Java 11 LTS.
Sorry, no code yet. But I've made some drawings.
With Java web applications, the usual pattern is for the components not to have conversational state (meaning information specific to a specific user's request). If you need to keep state for a user on the server, you can use the HTTP session. With a SPA or Ajax application it's often easier to keep a lot of that kind of state in the browser. The less state you keep on the server, the easier things are as your application scales: you don't have to pin sessions to servers (messing up load balancing) or copy lots of session state across a cluster.
For simple (non-reactive) web apps that do blocking I/O, each request-response cycle gets its own dedicated thread from Tomcat's pool. That thread delivers the HTTP request to the servlet, handles the business logic and blocks while talking to the database, then carries the HTTP response.
(Reactive webapps are going to be more complex to build; you will need a non-blocking database driver and you will have fewer choices for databases, so I would steer clear of those, at least for your first web application.)
The thread pool used by Tomcat has to protect itself from concurrent access, but that doesn't impact your code. Likewise, there are third-party mid-tier caching libraries that have to deal with concurrency, but you can avoid dealing with it directly. All of your logic is confined to one thread, so it doesn't interfere with processing done by other threads unless there are shared mutable data structures. Those data structures would be the part of the application where synchronization might be one of several possible solutions, as in the sketch below.
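A rough sketch of that point, assuming the Servlet 3.x API available on Tomcat 8 (the servlet, the cache and all names are invented for illustration):

    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import javax.servlet.ServletException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Each request runs on its own Tomcat thread, so locals are safe.
    // Only the shared cache needs a thread-safe structure; no synchronized
    // methods on the business logic are required.
    @WebServlet("/report")
    public class ReportServlet extends HttpServlet {

        // Shared across all request threads: must be thread-safe.
        private final ConcurrentMap<String, byte[]> reportCache =
                new ConcurrentHashMap<>();

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            String userId = req.getParameter("userId"); // confined to this thread
            byte[] pdf = reportCache.get(userId);
            if (pdf == null) {
                pdf = buildPdf(userId); // the slow part, runs unsynchronized
                reportCache.putIfAbsent(userId, pdf);
            }
            resp.setContentType("application/pdf");
            resp.getOutputStream().write(pdf);
        }

        private byte[] buildPdf(String userId) {
            return new byte[0]; // stand-in for the CSV crunching and PDF build
        }
    }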
Synchronization or other locking schemes are local to one instance of the application. If you want to stand up multiple instances of this application, you need to be aware that each one would be locking separately from the others. So for some things it's better to do the locking in the database, since that is shared across webapp instances.
If you can use a database to store your data, so that you can rely on the database for caching and indexing, then it seems likely your application should be able to avoid doing a lot of locking.
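For instance, a minimal JDBC sketch of locking in the database rather than in Java (the table and column names are invented):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // The row lock is held by the database, so it works across several
    // webapp instances, unlike Java-level synchronization.
    public class DbLockExample {

        public void refreshIfStale(Connection con, String datasetId) throws SQLException {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT loaded_at FROM dataset_cache WHERE id = ? FOR UPDATE")) {
                ps.setString(1, datasetId);
                try (ResultSet rs = ps.executeQuery()) {
                    // A second instance blocks on the SELECT ... FOR UPDATE
                    // until this transaction commits.
                    if (rs.next() && isStale(rs.getTimestamp("loaded_at"))) {
                        reloadFromCsv(con, datasetId); // only one instance reloads
                    }
                }
                con.commit();
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }

        private boolean isStale(java.sql.Timestamp loadedAt) { return loadedAt == null; }
        private void reloadFromCsv(Connection con, String datasetId) { /* ... */ }
    }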
If you want examples, there are a lot of small examples of building web apps with Spring at https://spring.io/guides. These are Spring Boot applications that are self-hosted, so you can put them together quickly and run them right away.
Going rogue with a database may not be the best course, since databases need looking after by DBAs. My advice is to put together two project plans: one using a database, and one using the flat files. The flat-file plan will have to allow for addressing issues like caching, indexing the data, replicating data from the legacy database, and not having standard tools that generate PDFs from SQL queries. The alternative plan using a database should involve a lot less sorting out of infrastructure and a shorter time until you can get down to cranking out reports.
We're designing an architecture for communication between several applications, and we have decided to use Mirth as a (pseudo) ESB. In our processes we want to give control back to users as soon as we can, so when an action is fired by a user (for example, pressing the Save button after filling in a form), some (necessary) changes are made in the database and then a message has to be sent to another system. The user doesn't have to wait until the message is sent, so our application gives back control once the database changes are done. Message composition is done asynchronously in the background. But we don't really know which approach we should follow:
a) Start a new thread in our app in which we collect all the necessary data (starting from "primary data", that is, some primary keys that allow us to find all the information), fill an HL7 message, and send it to a queue where Mirth is listening.
b) Send the "primary data" to Mirth and delegate the HL7 message composition to it. Mirth can access the database directly to collect the necessary data, or another option could be to invoke some REST/SOAP services of our own.
In the case of option b), we have some doubts about how to invoke Mirth:
b.1) Our app makes the database modifications and writes the primary data to a queue (distributed transaction).
b.2) Our app makes the database modifications and calls a SOAP or REST service published by Mirth, which simply writes the message to a queue that Mirth is also reading (no distributed transaction in our app).
Some argue that composing the message in our app and using Mirth only as a broker is "misusing" Mirth. On the other side, some colleagues find that having Mirth access the app database directly is very intrusive, and that it should not know our schema. The last option, invoking an app service from Mirth that returns all the necessary information for the HL7 message, is like sending the "primary data" from the app to Mirth only to get it back when Mirth calls the service (passing that data as a parameter).
Thank you for your advice.
I'm not sure Mirth is the appropriate tool to use as an enterprise service bus where your requirements include real-time notifications/events to allow the user to proceed after submitting a form.
Without knowing more, such as the architecture in play, we can't really advise you.
IMO, as someone with experience of Mirth integration, as well as of designing database-dependent applications, I would say that Mirth isn't the appropriate tool for the job.
(1) There is not enough information for "expert advice", and there is no single, clear, technically justified answer.
(2) Option (a) looks like the least expensive and easiest to implement for the first version, especially with reuse of stable, tested libraries like HAPI. (A sketch of this option follows at the end of this answer.)
(3) In your design, treat your enterprise service bus as a black-box component and concentrate on designing the interfaces and clarifying the asynchronous message sequences. This way the service bus internals and the message routing and queuing decisions can be postponed to deployment time, with some coding effort, by following the adapter design pattern.
(4) Arguments worded like "misusing", "intrusive", "like it", "nice" perhaps indicate a valid point of view, but as such they do not create measurable, verifiable decision criteria or performance indicators, and should not be used alone.
(5) This is the right time to apply a decision-making process and weight-evaluate the various options. As a minimal formal input I'd recommend Plus/Minus/Interesting.
(6) In your decision the following points should not be omitted:
securing data privacy (health state is a private property protected by law in some countries)
fault tolerance (robustness, reliability, exception handling)
maintenance costs (do you have qualified people to maintain it? can the solution monitor and auto-correct itself, or will someone have to review millions of lines of logs manually?)
development costs (do you have qualified people already? how many lines of code can you reuse vs. how many will you have to create and debug?)
(7) I'm sorry that my answer is not directly helpful; my choice would be to compose the message in a reliable, secured application server, whatever that means in this case and regardless of how its axons or pseudopods are connected.
Last but not least: record why you made the choice, forever, so that you can test and validate your assumptions at any time later, when the original decision makers are lost in the sands of time.
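As a rough sketch of option (a), assuming HAPI (ca.uhn.hl7v2) for the HL7 v2 composition and any JMS provider for the queue; the queue, patient id and all names are illustrative only:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import ca.uhn.hl7v2.model.v24.message.ADT_A01;
    import ca.uhn.hl7v2.parser.PipeParser;

    // Compose the HL7 message in the application with HAPI, then hand the
    // finished message to the queue that Mirth listens on (option a).
    public class Hl7Publisher {

        public void publishAdmit(ConnectionFactory factory, Queue mirthInbound,
                                 String patientId) throws Exception {
            ADT_A01 adt = new ADT_A01();
            adt.initQuickstart("ADT", "A01", "P"); // fills MSH boilerplate
            adt.getPID().getPatientIdentifierList(0).getID().setValue(patientId);
            String payload = new PipeParser().encode(adt);

            Connection con = factory.createConnection();
            try {
                Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(mirthInbound);
                producer.send(session.createTextMessage(payload));
            } finally {
                con.close();
            }
        }
    }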
How should I design an application comprised of numerous (but identical) independent processes that need to communicate data to an enterprise application and be monitored and accessible by a web interface?
Here's a more concrete example in Java:
The independent processes are multiple instances of a standalone J2SE application that receives, on initialization, data about a "user" entity and then starts doing stuff regarding this user. (This is an infinite process, so any batch sort of design would be wrong here; similarly, the starting times of these processes are irrelevant.)
The enterprise application is a set of J2EE beans and web services that implement business logic, DB access, etc., and that are (for example) hosted on GlassFish.
The web front-end is a set of JSPs (perhaps also on GlassFish) that work with the beans.
Now ideally, I want a way for the processes in (1) to be able to invoke methods from the beans in (2), but also for the beans in (2) to be able to update the processes in (1) about things.
So these are the required flows of execution, assuming there are 10 independent processes of (1) running for 10 different users (consider a "user" something easily identifiable by, say, a number):
Something happens in one of the processes of (1), and it invokes a method of the enterprise application (2) with some data.
One of the real, human users (already identified by the web app) clicks something on a web page of (3); this invokes a method in (2), and then some "magical" entity (which I have no idea how to name) finds the independent process from (1) that is responsible for this particular user and updates the process with some new data.
My best approach so far is to expose these J2SE apps via JMX and go from there, but there is one thing I don't understand: who or what should hold a mapping of the sort "the process at URI X is responsible for user Y" and then direct the calls accordingly?
BTW, please feel free to give any advice outside of the Java platform (!), as long as it is a platform that can be scaled easily.
EDIT:
Also, is there a way to "host" such independent processes on some app-server? Something that will re-spawn processes if they fail, allow for deployment and monitoring of such processes on remote machines etc.?
It has been some time since I last used the Java Message Service, so I'm afraid I am not up to date with the technical details, but from your description it seems like it would suit your case for handling communication between the administration GUI and the client processes.
There are various options (I believe you are interested in asynchronous communication), so you should take a look at the latest developments and examine for yourself whether it fits your case or not.
Regarding the data size that the server would exchange with the processes, I believe this is a different topic, and I must say that the answer depends. Would it be better to send all the data in the message? Or should the message be just a notification, so that the client is notified and then connects to some enterprise bean to check the new state? I would prefer the latter, but this is something you should decide based on your requirements. I wouldn't blindly exclude the first option unless I had some clear evidence that it wouldn't work.
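A rough sketch of the notification style, assuming plain JMS; the topic, the userId property and the selector convention are all invented for illustration:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.Session;
    import javax.jms.Topic;

    // Each standalone process subscribes only to messages for "its" user via
    // a JMS message selector, so the beans never need a registry of process
    // URIs for this direction; they just publish with a userId property.
    public class UserUpdateListener {

        public void listen(ConnectionFactory factory, Topic userUpdates,
                           int userId) throws Exception {
            Connection con = factory.createConnection();
            Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);

            // The selector filters on a property the publisher must set.
            MessageConsumer consumer =
                    session.createConsumer(userUpdates, "userId = " + userId);

            consumer.setMessageListener(new MessageListener() {
                public void onMessage(Message notification) {
                    // Notification only; fetch the actual state from the beans.
                    refreshStateFromEnterpriseApp();
                }
            });
            con.start();
        }

        private void refreshStateFromEnterpriseApp() { /* call the EJB / web service */ }
    }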
Regarding scaling, I don't think it can be much worse than the scaling of the rest of your beans. As far as the server is concerned, the processes are all clients that need to be served.
Please take the above advice with a grain of salt: I don't know the specifics of your problem/design; I am speaking in a general way.
I hope that helps.
Are there any recommendations, best practices or good articles on providing integration hooks?
Let's say I'm developing a web-based ordering system. Eventually I'd like my client to be able to write some code, package it into a jar, dump it into the classpath, and have it change the way the software behaves.
For example, if an order comes in, the code
1. may send an email or SMS
2. may write some additional data into the database
3. may change data in the database, or decide that the order should not be saved into the database (cancel the data save)
Point 3 is quite dangerous, since it interferes too much with data integrity, but if we want integration to be that flexible, is it doable?
Options so far:
1. provide hooks for specific actions, e.g. if this and that occurs, call this method; the client will write the implementation for that method. This is too rigid, though.
2. a mechanism similar to servlet filters: there is code before the actual action is executed and code after. Not quite sure how this could be designed, though.
We're using Struts2 if that matters.
This integration must be able to detect a "state change", not just the "end state" after the core action executes.
For example, if an order changes state from In Progress to Paid, it will do something, but if it changes from Draft to Paid, it should not do anything. The core action in this case would be loading the order object from the database, changing the state to Paid, and saving it again (or doing an SQL update).
Many options, including:
Workflow tool
AOP
Messaging
DB-layer hooks
The easiest (for me at the time) was a message-based approach. I did a sort-of ad-hoc thing using Struts 2 interceptors, but a cleaner approach would use Spring and/or JMS.
As long as the relevant information is contained in the message, it's pretty much completely open-ended. Having a system accessible via services/etc. means the messages can tap back in to the main app in ways you haven't anticipated.
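As one possible sketch of such a hook (not tied to Struts 2; java.util.ServiceLoader is used here just to illustrate the "drop a jar into the classpath" idea, and all names are invented). The event carries both the old and the new state, so a handler can react to the transition itself rather than just the end state:

    import java.util.ServiceLoader;

    // Handlers are discovered from any jar on the classpath; the client jar
    // registers its implementation in META-INF/services.
    public class OrderEvents {

        public interface Handler {
            void onStateChanged(long orderId, String oldState, String newState);
        }

        // Called by the core action after it saves the order.
        public static void fireStateChanged(long orderId,
                                            String oldState, String newState) {
            for (Handler handler : ServiceLoader.load(Handler.class)) {
                // A handler that only cares about In Progress -> Paid simply
                // checks the pair and returns otherwise.
                handler.onStateChanged(orderId, oldState, newState);
            }
        }
    }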
If you want this to work without system restarts, another option would be to implement handlers in a dynamic language (e.g., Groovy). Functionality can be stored in a DB. Using a Spring factory makes this pretty fun and reduces some of the complexity of a message-based approach.
One issue with a synchronous approach, however, is if a handler deadlocks or takes a long time; it can impact that thread at the least, or the system as a whole under some circumstances.
We have a J2EE app on which we're still working. It runs on an Oracle DB, and the business tier is coded with EJB 2.0, with a rich client interface.
Now, the application is going to be deployed at multiple sites, and each site will be creating new items (new contracts, etc.).
What we want is to replicate all new items to a central DB that uses the same schema as the local ones.
What do you think is the most effective way to accomplish this?
I've thought about serializing all newly created items and sending them to the remote site for integration through a Java Message Service queue. Is that approach good?
And there are also going to be some changes that will need to be replicated back to the satellites.
I would say that a synchronous relationship with the centre introduces coupling that you don't want, hence your async idea seems pretty good to me. You presumably have some location-dependent identifier in the records, so that new contract creations in the different locations will not clash, and you accept some latency in the replication to the centre.
So the simple case is: just use JMS messages from each location to the centre.
The nice thing about this approach is that the satellites don't even need to know about the database structure in the centre; it can be designed completely independently.
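A minimal sketch of the sending side under those assumptions (the queue, the originSite property and the types are invented):

    import java.io.Serializable;
    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.ObjectMessage;
    import javax.jms.Queue;
    import javax.jms.Session;

    // Each satellite serializes its new items and ships them to the centre
    // over JMS; the centre consumes them and inserts into its own schema,
    // which the satellites never see.
    public class ReplicationSender {

        public void sendNewItem(ConnectionFactory factory, Queue toCentre,
                                Serializable newContract, String siteId)
                throws Exception {
            Connection con = factory.createConnection();
            try {
                Session session = con.createSession(true, Session.SESSION_TRANSACTED);
                ObjectMessage msg = session.createObjectMessage(newContract);
                msg.setStringProperty("originSite", siteId); // location-dependent id
                session.createProducer(toCentre).send(msg);
                session.commit(); // the message leaves only if the send commits
            } finally {
                con.close();
            }
        }
    }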
Things get more interesting if you also need to replicate changes back from the centre to the satellites. The big question is whether we might get conflicts between changes at the centre and changes at a satellite.
Simple case: any data item has one "home". For example, the originating satellite is the only place where changes are made, or, after creation, the centre is the only place to make changes. In that case we can treat the centre as the "hub": it can propagate changes out to the satellites. Simple JMS will do just fine for that.
Slightly harder case: changes can be made anywhere, but only in one place at a time. Here we can introduce some kind of locking scheme. I would tend to make the centre the owner and use synchronous web services to lock and update the data. Now we are coupled, but that's necessary if we are to have a definitive owner.
Quite complex case: anyone can change anything anywhere, without locking. It's a kind of "act first, apologise later" approach: we take the optimistic view that changes won't clash. We can send the changes to the centre for approval, and the centre can either use optimistic locking or merge non-conflicting changes. I would tend to do this by queueing changes at the originator but actually processing them via synchronous calls, thereby decoupling the specification of a change from the availability of the centre. Some more sophisticated databases have diff/merge capabilities that may help with this.
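A minimal sketch of the optimistic variant at the centre, using a version column (the table and column names are invented):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Every row carries a version number; an update only succeeds if the
    // version the satellite last saw is still the current one.
    public class OptimisticMerge {

        public boolean applyChange(Connection con, long contractId,
                                   int expectedVersion, String newValue)
                throws SQLException {
            PreparedStatement ps = con.prepareStatement(
                    "UPDATE contract SET value = ?, version = version + 1 " +
                    "WHERE id = ? AND version = ?");
            try {
                ps.setString(1, newValue);
                ps.setLong(2, contractId);
                ps.setInt(3, expectedVersion);
                // Zero rows updated means someone else changed the row first:
                // time to merge, or to apologise.
                return ps.executeUpdate() == 1;
            } finally {
                ps.close();
            }
        }
    }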
The big questions are the extent to which you want to be coupled to the availability of the centre, and the likelihood of conflicting changes. Quite often, cunning application design can greatly reduce the likelihood of conflict.