best practices execute external programs in Java web apps

best practices execute external programs in Java web apps - java

I have a java app (in fact it is grails) I need to execute an external program. Preferably I want my app to be self-contained, i.e. the external scripts/programs to be part of the war file. This external script/program also needs to produce some files.
I guess, my question is if there is some kind of best practices how to do these sort of things so that the final product is not too flaky depending on app permissions and what not?

One of the things you need to ensure that, only one thread executes an instance of your program at a time. so you need some locking and synchronization there.
Imagine a scenario where multiple users/requests/threads trying to execute the same program with different input, that will be a disaster. so you either need to lock the program while one is executing and others wait, or you need to create new instances everytime you want to run the program. you should be very careful about this.
Also, you want to clean up after the program runs and if it produces any output.
You need to be careful if the user can pass malicious commands to your system and tries to hijack other applications.
Overall, you have to be careful about security and correctness (the first scheme i mentioned.)

Security - ensure that your app does not allow for the execution of arbitary (user supplied) code on the host system. Think SQL-Injection style attacks. If you need to pass around data, I suggest inserting it into a database first and then passing the primary key to your external process, this will help avoid buffer overflow type situations.
Robustness - can this program fail, or take along time, or have other unknown side effects. Isolate your main web app from this program by executing from a different thread, or even a different process.
Logging - if you need to collect logging from this external app, you may want to pass in a session id (or equivalent) so you can track back any errors to web sessions.

You could design a small administrative system that will track service requests. It would be a very useful component, as most projects have a purpose like this.
The app should be executed from a service, the request to that service itself should be asynchronous. Also on top of this you can get feedback and track that service status.

Related

Where to syncronize inside a Java WebApp

My employer has currently given me a project that has me scratching my head about synchronization.
I'm going to first talk about the situation I'm in:
I've been asked to create a pdf-report/quotation-tool that takes data (from csv-files; because the actual database the data is on is being used by old IBM software and they for reasons (unknown) don't want any direct access to this database (so instead of making copies of the data to other databases, they apparently found it incredibly fine to just create a folder on the server with loads and loads and loads of CSV-files.)), this piece of software is to load data into the application, query it, transform where needed, do calculations and then return with a pdf-file to the end-user.
The problem here is that getting, querying, and calculating things takes a fair amount of time, the other problem is: they want it to be a WebApp because the business team does not want to install any new software, they're mostly moving towards doing everything online (since the start of the pandemic), it being a WebApp means that every computation has to be done by the WebApp and getting the data likewise.
My question: Is each call to a servlet by a separate user treated as a separate servlet and should I only synchronize the methods on the business logic (getting and using the data); or should I write some code that puts itself in the middle of the servlet, receives a user-id (as reference), that then runs the business-logic in a synchronized-fashion, then receiving data and returning the pdf-file?
(I hope you get the gist of it...)
Everything will run on Apache Tomcat 8 if that helps. Build is Java 11lts.
Sorry, no code yet. But I've made some drawings.

With java web applications, the usual pattern is for the components to not have conversational state (meaning information specific to a specific user's request). If you need to keep state for a user on the server, you can use the http session. With a SPA or Ajax application it's often easier to keep a lot of that kind of state in the browser. The less state you keep on the server the easier things are as your application scales, you don't have to pin sessions to servers (messing up load balancing) or copy lots of session state across a cluster.
For simple (non-reactive) web apps that do blocking i/o, each request-response cycle gets its own dedicated thread from tomcat's pool. That thread delivers the http request to the servlet, handles the business logic and blocks while talking to the database, then carries the http response.
(Reactive webapps are going to be more complex to build, you will need a non-blocking database driver and you will have less choices for databases, so I would steer clear of those, at least for your first web application.)
The threadpool used by tomcat has to protect itself from concurrent access but that doesn't impact your code. Likewise there are 3rd party middletier caching libraries that have to deal with concurrency but you can avoid dealing with it directly. All of your logic is confined to one thread so it doesn't interfere with processing done by other threads unless there are shared mutable data structures. Those data structures would be the part of the application where synchronization might be one of several possible solutions.
Synchronization or other locking schemes are local to one instance of the application. If you want to stand up multiple instances of this application then you need to be aware each one would be locking separately from the others. So for some things it's better to do locking in the database, since that is shared across webapp instances.
If you can make use of a database to store your data, so that you can rely on the database for caching and indexing, then it seems likely your application should be able to avoid having doing a lot of locking.
If you want examples there are a lot of small examples for building web apps using spring at https://spring.io/guides. These are spring boot applications that are self hosted so you can put them together quickly and run them right away.
Going rogue with a database may not be the best course since databases need looking after by DBAs. My advice is put together two project plans, one for using a database, and one for using the flat files. The flat file one will have to allow for addressing issues like handling caching, indexing data, replication of data from the legacy database, and not having standard tools that generate pdfs from sql queries. The alternative plan using a database should have a lot less sorting out of infrastructure and a shorter time til you can get down to cranking out reports.

Securing Java Agent

I am currently working on a legacy application where database calls are kind of scattered all over. I need to execute some logic linked to security (business) every time some sort of DML is executed. For this, I am thinking of using a java agent and intercepting the calls and subsequently executing the business logic.
The issue is that this agent needs to be secured and I need to ensure that there is no way that a different agent developed with a similar but different logic is loaded.
Is there any kind of mutual authentication possible between the application and the java agent which would ensure that a wrong agent is in no way able to be loaded

Is there any kind of mutual authentication possible between the application and the java agent which would ensure that a wrong agent is in no way able to be loaded.
No. The application has (almost) no knowledge of the agent. Certainly there is no way to for the application to properly validate the agent.
I guess, you could design a "protocol" where the application only works if an agent calls a particular application method in a particular way. However that could be circumvented by reverse engineering the application or a real agent, and using that knowledge to write a bad agent that imitates the required behavior.
But I think that you are going about this the wrong way. There are many better (simpler, cleaner, more efficient) ways to inject behavior into a Java application. And I think I can detect that part of your motivation is that you want to modify the legacy application as little as possible. And that is the primary motivation for your complicated agent-based approach.
(My reaction to that is that you are likely to spend more effort on the agent stuff than you would save by not modifying the legacy app. And the result will be much harder to maintain.)
I also suspect that your requirement for mutual authentication of the application and the interceptors (however they are implemented) is not really necessary. A consequence of using agents rather than a fundamental requirement.
If I was doing this and I was this concerned about protecting against (insider) attempts to subvert the business rules, I would:
Choose an alternative mechanism
Implement the new code and modifications to the legacy app as required.
Audit the main application and interceptor logic
Join the two parts together (depending on how the mechanism work)
Put the combined system onto a secured machine or container.

The Java attach API is already secured in the way that an agent must either:
be specified on the command line.
be attached from a JVM running by the same OS user.
To establish both cases, you must own privileges that can already escalate beyond the privileges of the agent running within the JVM process. For these reasons, it should not be necessary to validate your agent from within the JVM.
In theory you could write a Java agent that instruments the instrumentation API and make sure this agent is attached first. The instrumentation API's instrumentation could then check the jar file of the attached agent prior to its attachment and compare it to some known seed, for example. If this seed does not match, you could fail the agent's initialization.

Combining java spring/thread and database access for time-critical web applications

I'm developing an MVC spring web app, and I would like to store the actions of my users (what they click on, etc.) in a database for offline analysis. Let's say an action is a tuple (long userId, long actionId, Date timestamp). I'm not specifically interested in the actions of my users, but I take this as an example.
I expect a lot of actions by a lot of (different) users par minutes (seconds). Hence the processing time is crucial.
In my current implementation, I've defined a datasource with a connection pool to store the actions in a database. I call a service from the request method of a controller, and this service calls a DAO which saves the action into the database.
This implementation is not efficient because it waits that the call from the controller and all the way down to the database is done to return the response to the user. Therefore I was thinking of wrapping this "action saving" into a thread, so that the response to the user is faster. The thread does not need to be finished to get the reponse.
I've no experience in these massive, concurrent and time-critical applications. So any feedback/comments would be very helpful.
Now my questions are:
How would you design such system?
would you implement a service and then wrap it into a thread called at every action?
What should I use?
I checked spring Batch, and this JobLauncher, but I'm not sure if it is the right thing for me.
What happen when there are concurrent accesses at the controller, the service, the DAO and the datasource level?
In more general terms, what are the best practices for designing such applications?
Thank you for your help!

Take a singleton object # apps level and update it with every user action.
This singleton object should have a Hashmap as generic, which should get refreshed periodically say after it reached a threshhold level of 10000 counts and save it to DB, as a spring batch.
Also, periodically, refresh it / clean it upto the last no.# of the records everytime it processed. We can also do a re-initialization of the singleton instance , weekly/ monthly. Remember, this might lead to an issue of updating the same in case, your apps is deployed into multiple JVM. So, you need to implement the clone not supported exception in singleton.

Here's what I did for that :
Used aspectJ to mark all the actions of the user I wanted to collect.
Then I sent this to log4j with an asynchronous dbAppender...
This lets you turn it on or off with log4j logging level.
works perfectly.

If you are interested in the actions your users take, you should be able to figure that out from the HTTP requests they send, so you might be better off logging the incoming requests in an Apache webserver that forwards to your application server. Putting a cluster of web servers in front of application servers is a typical practice (they're good for serving static content) and they are usually logging requests anyway. That way the logging will be fast, your application will not have to deal with it, and the biggest work will be writing a script to slurp the logs into a database where you can do analysis.

Typically it is considered bad form to spawn your own threads in a Java EE application.
A better approach would be to write to a local queue via JMS and then have a separate component, e.g., a message driven bean (pretty easy with EJB or Spring) which persists it to the database.
Another approach would be to just write to a log file and then have a process read the log file and write to the database once a day or whenever.
The things to consider are: -
How up-to-date do you need the information to be?
How critical is the information, can you lose some?
How reliable does the order need to be?
All of these will factor into how many threads you have processing your queue/log file, whether you need a persistent JMS queue and whether you should have the processing occur on a remote system to your main container.
Hope this answers your questions.

Recommendations on providing integration api

Are there any recommendations, best practices or good articles on providing integration hooks ?
Let's say I'm developing a web based ordering system. Eventually I'd like my client to be able to write some code, packaged it into a jar, dump it into the classpath, and it would change the way the software behaves.
For example, if an order comes in, the code
1. may send an email or sms
2. may write some additional data into the database
3. may change data in the database, or decide that the order should not be saved into the database (cancel the data save)
Point 3 is quite dangerous since it interferes too much with data integrity, but if we want integration to be that flexible, is it doable ?
Options so far
1. provide hooks for specific actions, e.g. if this and that occurs, call this method, client will write implementation for that method, this is too rigid though
2. mechanism similar to servlet filters, there is code before the actual action is executed and code after, not quite sure how this could be designed though
We're using Struts2 if that matters.
This integration must be able to detect a "state change", not just the "end state" after the core action executes.
For example if an order changes state from In Progress to Paid, then it will do something, but if it changes from Draft to Paid, it should not do anything.The core action in this case would be loading the order object from the database, changing the state to Paid, and saving it again (or doing an sql update).

Many options, including:
Workflow tool
AOP
Messaging
DB-layer hooks
The easiest (for me at the time) was a message-based approach. I did a sort-of ad-hoc thing using Struts 2 interceptors, but a cleaner approach would use Spring and/or JMS.
As long as the relevant information is contained in the message, it's pretty much completely open-ended. Having a system accessible via services/etc. means the messages can tap back in to the main app in ways you haven't anticipated.
If you want this to work without system restarts, another option would be to implement handlers in a dynamic language (e.g., Groovy). Functionality can be stored in a DB. Using a Spring factory makes this pretty fun and reduces some of the complexity of a message-based approach.
One issue with a synchronous approach, however, is if a handler deadlocks or takes a long time; it can impact that thread at the least, or the system as a whole under some circumstances.

How to detect database events with, for example, Java

Is there a way to detect database events, e.g. insert, update and delete, comparable to file access monitors like JNotify (can detect read, create, modify of files and directories)?
Looking for something like database event listeners because I don't want to do polling.
Thanks!

In general no. AFAIK, no such facility exists in JDBC or in the SQL standards.
It might be possible for certain databases / configurations using database specific functionality. For example, if the database can be configured to run arbitrary Java code in a trigger, you might be able to get it to send an event into a pubsub system that will deliver it to your application code.
But I think it would be better to modify your application code-base to generate the events itself.

It depends on the database you are using, as some allow you to run code that would allow you to make a call to a server, through a trigger, but, then you take a performance hit on these modifications, so you would want to use a webservice that doesn't send any information back to, to limit the performance hit.
It also depends on if you are using a server, as then you could use AspectJ to monitor any of the update queries.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.