When I receive a request on my API, I want to do a series of steps, each being a check or an enrichment. Each step could either succeed or fail. On Success, the next step should be carried out. On Failure, an end-step should be executed, and the flow is done. For that I have considered Spring State Machine, as it seems to fit the bill.
I have read up on the documentation and played around with it, but some things elude me:
Should there be a 1-to-1 relationship between a request and a State Machine, meaning that for every request, I make a new State Machine instance? Or should I somehow reuse a completed State Machine by resetting the machine for the next request?
What about cleanup of completed State Machines? There doesn't seem to be a way to destroy and clean a State Machine instance. If I create 1 per request, I've effectively introduced a memory leak, unless the framework somehow handles resources.
There is no absolutely correct answer to your question so I just need to leave some comments here. State machine as a concept is so loose that it gives you so many different ways to do things.
The whole concept of steps running one after another relates to how the tasks recipe was implemented. It executes a DAG of tasks, and if a parent task fails, the machine enters an error state, giving the user a chance to fix things and request the machine to continue. See statemachine-recipes-tasks and statemachine-examples-tasks. This kind of use case might be a good candidate for a new recipe, as it is pretty generic.
The framework should clear things up after a machine has been stopped, and eventually the JVM should collect the garbage. If you find something abnormal, please file a GitHub issue and we'll fix things.
We have a sample, statemachine-examples-eventservice, which reuses machines, but I'm currently re-implementing that sample (it works but should be implemented better), as I was told by our head chef that what I did there is dumb (SPR-15042). Machines cannot be used with a session scope, and things go south if a rich object (which SSM is) is serialised.
It is relatively easy to build a combination of states and choices that would do your step flow. The only question is how reusable you want this to be (thus a generic recipe would be a good thing, PRs welcome :) )
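To make the step flow concrete, here is a minimal plain-Java sketch of the control flow that such states and choices would encode. It deliberately uses no Spring State Machine API; RequestContext, Step and StepFlow are hypothetical names invented for illustration. One short-lived flow run per request is assumed, so there is nothing to reset afterwards.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-request context; every name in this sketch is made up. */
class RequestContext {
    final List<String> trace = new ArrayList<String>();
    String failedStep; // stays null while every step succeeds
}

/** One check or enrichment; returns true on success, false on failure. */
interface Step {
    String name();
    boolean apply(RequestContext ctx);
}

class StepFlow {
    private final List<Step> steps;
    private final Step endStep; // executed once if any step fails

    StepFlow(List<Step> steps, Step endStep) {
        this.steps = steps;
        this.endStep = endStep;
    }

    /** Runs the steps in order; stops at the first failure and runs the end step. */
    boolean run(RequestContext ctx) {
        for (Step step : steps) {
            ctx.trace.add(step.name());
            if (!step.apply(ctx)) {
                ctx.failedStep = step.name();
                ctx.trace.add(endStep.name());
                endStep.apply(ctx);
                return false;
            }
        }
        return true;
    }

    /** Demo helper: builds a step with a fixed outcome. */
    static Step fixed(final String name, final boolean outcome) {
        return new Step() {
            public String name() { return name; }
            public boolean apply(RequestContext ctx) { return outcome; }
        };
    }
}
```

If this is all the flow needs, a full state machine may be more than required; the recipe approach becomes more attractive once steps need to wait for events or be resumed after an error.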
As for error handling, something I presented in a statechart in gh-240 is also worth considering.
There have been some questions about whether SSM could work as a more generic flow engine, but that is probably something it is never going to be, as it would be a completely new project. Though most flows could be handled as separate recipes.
Recently, I have faced one java interview question. It goes like this : "There are 3 Microservices (flow goes from 1st to 2nd to 3rd) which takes a minimum roundabout of 0.5sec to provide the response. But the web request should get response in 1sec itself. How to achieve this ?"
Is there any architecture design, pattern, or setting that would achieve this?
It's a very vague question with no easy direct answer; it is more about identifying which direction you would go in and which analysis options you would suggest. This is site reliability engineering (SRE), which includes many tricks and approaches.
I would start by analyzing and clarifying what business process is implemented by this sequence of requests, and look for needless work (developers do not always write correct code, hence some unneeded calls to services, the DB, etc.).
Monitor network latency and identify where the focus area is. If the network takes significant time, then it makes sense to improve network hardware or software and to look for problems such as bad packets that make the client resend data, a "packet storm" issue, etc. If the network is fine, focus on the services.
Then consider caching data from downstream services (in-process or distributed cache, depending on architecture and data type). This step should be done carefully, with a full understanding of the nature of the data, e.g. can it be cached, for how long, and how should the data be refreshed or evicted?
Pay attention to possible optimisation of the code being executed. Developers often don't keep performance in mind during implementation, and so create functionality with unneeded operations (for example some sorting, filtering, synchronization (with locks), etc.).
As part of that code optimisation, parallelize everything that can be parallelized inside the code execution (no guarantee it helps) and get rid of locks. For example, there can be dependencies on the DB or other sources before/after calls to a downstream service, which may lead to unpredictable blocking; in such a situation it makes sense to execute tasks in parallel threads that do not block each other.
If no low-level tricks help, it may be a signal to revisit the architecture of the services, e.g. if the 1 s SLO is very important, then maybe it makes sense to merge microservices 1+2 or 2+3 into a bigger service to reduce data transformation and transfer between services (the gain needs to be calculated first).
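As a rough illustration of the parallelization point, here is a minimal Java sketch that runs two independent lookups concurrently and bounds the total wait with a timeout. ParallelCalls, fetchBoth and slow are hypothetical names, and the sleeping suppliers only stand in for real DB or downstream calls. It only helps when the two calls really are independent of each other.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class ParallelCalls {
    /** Runs two independent lookups concurrently and joins their results within a timeout. */
    static String fetchBoth(Supplier<String> first, Supplier<String> second, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(first, pool);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(second, pool);
            // Total latency is roughly max(a, b) instead of a + b.
            return a.thenCombine(b, (x, y) -> x + "+" + y)
                    .get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("lookup failed or timed out", e);
        } finally {
            pool.shutdownNow(); // sketch only: a real service would reuse a shared pool
        }
    }

    /** Stand-in for a slow dependency such as a DB query or downstream call. */
    static Supplier<String> slow(String value, long delayMillis) {
        return () -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return value;
        };
    }
}
```

For example, ParallelCalls.fetchBoth(ParallelCalls.slow("profile", 300), ParallelCalls.slow("prices", 300), 1000) should take about as long as the slower of the two lookups rather than their sum.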
There are many more things to consider, depending on how deep you would like to go.
Alright, so I have a Spring application that takes in a Network Representation and boots up virtual machines to represent the network that was passed in.
It uses a low level API to bring up the VMs, there is no database involved.
What I need to figure out is how to handle the situation where a user submits a 10-node (or any number) network model and the application goes through and builds up the network (starting VMs): if a node fails to start up, I want to be able to react to that. I would like to be able to roll back my changes (i.e. destroy all nodes that were created).
I've been told that I need to look into "Transactions" but I am unsure whether or not that applies to this scenario when I'm not using a database.
As a side note, I do have logic to take down nodes if a user sends in that request.
My question is -- how do I handle this?
Also, is this the best stack overflow for this question?
It does seem that you are looking for transactional behavior, and specifically, for atomicity ("all or nothing"). But usually "transaction" connotes certain guarantees (particularly around ACID properties) that will be difficult or impossible to achieve where human-level timescales on the order of minutes are involved.
Probably "workflow with compensation for errors" is more what you would be looking for here.
I would implement this manually, perhaps with tool support (e.g. workflow engines). Kick off a process to spawn your network, and keep track of the current progress, such as VMs created, VMs in progress, etc. If there are errors that demand a rollback, then have another process that performs a cleanup. The behavior of the cleanup process itself could fail, so it might retry its various steps a couple times before generating a report that says "this cleanup step failed".
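A minimal Java sketch of that "track progress, then compensate" idea could look like the following. VmApi is a hypothetical wrapper around your low-level API (not a real library), and the sketch assumes that a failed start leaves nothing behind for the failing node itself. Nodes are destroyed newest first, each destroy is retried a few times, and whatever still cannot be cleaned up is reported back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical wrapper around the low-level VM API; not a real library. */
interface VmApi {
    void start(String nodeId);   // assumed to throw RuntimeException if the node fails to boot
    void destroy(String nodeId); // assumed to throw RuntimeException if teardown fails
}

class BuildResult {
    final boolean success;
    final List<String> cleanupFailures; // nodes that could not be destroyed during rollback

    BuildResult(boolean success, List<String> cleanupFailures) {
        this.success = success;
        this.cleanupFailures = cleanupFailures;
    }
}

class NetworkBuilder {
    private final VmApi api;
    private final int cleanupAttempts;

    NetworkBuilder(VmApi api, int cleanupAttempts) {
        this.api = api;
        this.cleanupAttempts = cleanupAttempts;
    }

    /** Starts every node; on the first failure, destroys the nodes started so far. */
    BuildResult build(List<String> nodeIds) {
        Deque<String> started = new ArrayDeque<String>();
        try {
            for (String id : nodeIds) {
                api.start(id);
                started.push(id); // remember progress so it can be compensated
            }
            return new BuildResult(true, new ArrayList<String>());
        } catch (RuntimeException startFailure) {
            return new BuildResult(false, rollback(started));
        }
    }

    private List<String> rollback(Deque<String> started) {
        List<String> failures = new ArrayList<String>();
        for (String id : started) { // push() adds at the head, so this iterates newest first
            if (!destroyWithRetry(id)) {
                failures.add(id);
            }
        }
        return failures;
    }

    private boolean destroyWithRetry(String id) {
        for (int attempt = 1; attempt <= cleanupAttempts; attempt++) {
            try {
                api.destroy(id);
                return true;
            } catch (RuntimeException e) {
                // fall through and retry; a real system would log the error here
            }
        }
        return false;
    }
}
```

A non-empty cleanupFailures list is the "this cleanup step failed" report mentioned above; someone (or something) still has to act on it.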
If there are shared resources involved then you would need to implement some kind of isolation mechanism as well. Sometimes this is easy enough--e.g., DHCP helps you avoid duplicate IPs. If you're updating a DNS zone file then you'd want to synchronize access to that to avoid concurrent writes. Etc.
I have to write a program that is meant to run 'forever', meaning that it won't terminate regularly. Up until now I always wrote programs that would run and be terminated at the end of the day. The program has to do some synchronizations, pause for n minutes and then sync again.
AFAIK there should be no problem with my current implementation and it should theoretically run just fine, but I'm lacking any real-world experience.
So are there any 'patterns' or best practices for writing very robust and resource efficient java programs that have a very long runtime? What could be possible problems after for example a month/year of runtime?
Some background :
Java : 1.7 but compiled down to 1.5
OS : Windows (exact version is not certain yet)
Thanks in advance
Just a brain dump of all the things I've had to keep in mind when writing this kind of app.
Avoid Memory Leaks
I had an app that runs once at midday, every day, and in that I had a FileWriter. I wasn't closing that properly, and then we started wondering why our virtual machine was going into meltdown after a few weeks. Memory leaks can come in the form of anything really, with one of the most common examples being that you don't de-reference an object appropriately. For example, using a class's field as a method of temporary storage. Often the class persists, and so does the reference. This leaves you with objects sitting in memory and doing nothing.
Use the right kind of Scheduler
I used a java Timer in that app, and learnt later, when another app was changing the system clock, that it's better to use a ScheduledThreadPoolExecutor. So if you plan on keeping it completely Java based, I would strongly recommend using that over a Timer for all of the reasons detailed in this question.
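For the "sync, pause for n minutes, sync again" loop from the question, a minimal sketch with ScheduledThreadPoolExecutor might look like this. SyncScheduler is a hypothetical class name; scheduleWithFixedDelay measures the pause from the end of one run to the start of the next, which matches the description in the question.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SyncScheduler {
    // A single worker thread is enough: runs of the sync never overlap.
    private final ScheduledExecutorService executor = new ScheduledThreadPoolExecutor(1);

    /** Runs the task immediately, then again 'pause' units after each run finishes. */
    ScheduledFuture<?> start(Runnable syncTask, long pause, TimeUnit unit) {
        return executor.scheduleWithFixedDelay(syncTask, 0, pause, unit);
    }

    /** Stops scheduling further runs and waits for a run in progress to finish. */
    boolean stop(long timeout, TimeUnit unit) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Usage would be something like new SyncScheduler().start(mySyncTask, 15, TimeUnit.MINUTES), where mySyncTask is your own Runnable.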
Be mindful of memory usage and your environment
If your app is loading large amounts of data each and every day, and you have other apps running on the same server, you may want to be careful about the timing. For example, if three of the apps already run their scheduled operation at midday, running yours at any other time would probably be a smart move. Be mindful of the environment in which you're executing your code.
Error handling
You probably want to configure your app to let you know if something has gone wrong, without the app breaking down. If it's running at a certain time every few hours, that means people are probably depending on it, so I would have a function in your Java code that sends out an email to you, detailing the nature of the exception.
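One detail worth knowing here: with a ScheduledExecutorService, an exception that escapes a periodic task suppresses all subsequent executions of that task, so the app silently stops syncing. A minimal sketch of a guard that reports the failure and keeps the schedule alive is below. FailureNotifier and GuardedTask are hypothetical names; the notifier is where the e-mail sending would go.

```java
/** Hypothetical notification hook, e.g. backed by e-mail; not a real library API. */
interface FailureNotifier {
    void notifyFailure(String taskName, Throwable error);
}

/** Wraps a task so that a failure is reported but never escapes to the scheduler. */
class GuardedTask implements Runnable {
    private final String name;
    private final Runnable delegate;
    private final FailureNotifier notifier;

    GuardedTask(String name, Runnable delegate, FailureNotifier notifier) {
        this.name = name;
        this.delegate = delegate;
        this.notifier = notifier;
    }

    public void run() {
        try {
            delegate.run();
        } catch (RuntimeException e) {
            // Report and swallow: letting this escape would cancel future runs.
            safeNotify(e);
        }
    }

    private void safeNotify(Throwable error) {
        try {
            notifier.notifyFailure(name, error);
        } catch (RuntimeException notifyError) {
            // The notifier itself must not break the schedule either.
            System.err.println("Could not report failure of " + name + ": " + notifyError);
        }
    }
}
```

Errors such as OutOfMemoryError are deliberately not caught in this sketch; whether the process should try to survive those is a separate decision.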
Make it configurable
Again, if it needs to run at various points in the day, you don't want to have to pull the thing down for a few hours to work out some minor changes to your code. Instead, port it into a java Properties file, or into an XML Config (or really, whatever). The advantage of this is that you can update your program and get it up and running before anyone really noticed the difference.
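A minimal sketch of the Properties approach, with made-up key names (sync.pauseMinutes, alert.email) and a hypothetical SyncConfig class. It sticks to try/finally so it also compiles at older source levels such as the one mentioned in the question.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class SyncConfig {
    final long pauseMinutes;
    final String alertEmail;

    private SyncConfig(long pauseMinutes, String alertEmail) {
        this.pauseMinutes = pauseMinutes;
        this.alertEmail = alertEmail;
    }

    /** Reads the settings from a .properties file; call again to pick up changes. */
    static SyncConfig load(String path) throws IOException {
        Properties props = new Properties();
        InputStream in = new FileInputStream(path);
        try {
            props.load(in);
        } finally {
            in.close(); // always release the file handle
        }
        return from(props);
    }

    /** Applies defaults for missing keys; the key names are illustrative only. */
    static SyncConfig from(Properties props) {
        long pause = Long.parseLong(props.getProperty("sync.pauseMinutes", "15").trim());
        String email = props.getProperty("alert.email", "");
        return new SyncConfig(pause, email);
    }
}
```

Reloading the file at the start of each sync cycle is one simple way to change the interval without restarting the process.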
Be afraid of the static keyword
That bad boy will make objects persist, even when you destroy their parent reference. It is the mother of all memory leaks if you are not careful with it. It's fine for constants, and things that you know don't need to change and need to exist within the project to run well, but if you're using it for random values inside a project, you're going to quickly wonder why your app is crashing every few hours rather than syncing.
Props to #X86 for reminding me of that one.
Memory leaks are likely to be the biggest problem. Ensure that there are no long-term references held after an iteration of your logic. Even a relatively small object being referenced forever, will exhaust the memory eventually (and worse, it's going to be harder to detect during testing if the growth rate is 1GB/month). One approach that may help is using the snapshot functionality of profilers: take a snapshot during the pause, let the sync run a few times, and take another snapshot. Comparing these should show the delta between the synchronizations, which should hopefully be zero.
Cache maintenance is another issue. The overall size of a cache needs to be strictly limited (whereas often you can get away without in short-running programs, because everything seen will be small enough to not cause problems). Equally it's more important to do cache-invalidation properly - broadly speaking, everything that gets cached will become stale at some point while your program is still running, and you need to be able to detect this and take appropriate action. This can be tricky depending on where the golden source of the cached data is.
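For the size limit, a minimal sketch using only the JDK is a LinkedHashMap in access order with removeEldestEntry overridden, which gives a small least-recently-used cache. BoundedCache is a hypothetical name. The sketch is not thread-safe (wrap it with Collections.synchronizedMap if several threads use it) and it does nothing about staleness, which still needs its own invalidation logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A size-limited LRU cache; evicts the least recently used entry when full. */
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so get() refreshes an entry
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after every insertion.
        return size() > maxEntries;
    }
}
```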
The last thing I'll mention is exception-handling. For short-running processes, it's often enough to simply let the process die when an exception is encountered, so the issue can be dealt with, and the app rerun. With a long-running process you'll likely need to be more defensive than this. Consider running parts of your program in threads, which can be restarted* if/when they fail. You may need a supervisor-type module, which checks that everything else is still heartbeating and reboots it if not. If appropriate to your structure, this is anecdotally a lot easier to achieve with actors-style libraries rather than Java's standard executors. And if it's at all possible, you may want to have hooks (perhaps exposed over JMX/MBeans) that let you modify the behaviour somewhat, to allow a short-term hack/workaround to be effected without having to bring the process down. Though this requires quite some amount of foresight to predict exactly what's going to go wrong in several months...
*or rather, the job can be restarted in another thread
I need to time/performance check a piece of code, in production.
The code runs on a Java stack and most probably has log4j integrated. It interacts with JMS: it sends a request and picks up a response. I need to prove that from the user event (i.e. a click on the front end) to the point where it waits on JMS, it is relatively fast. In other words, I need to prove (know) that most of the round-trip time is spent waiting for a message from JMS.
I am currently looking at http://perf4j.codehaus.org/devguide.html. However, I would like to poll the group for suggestions. A few restrictions that I need to work with are:
I need something that can be run on production. It needs to be something that I can switch on and off relatively easily.
It needs to be something that can not be too heavy memory / CPU usage wise.
It needs to be something that I can put into the existing code base with least amount of change in the existing code.
So, does anyone have any suggestions apart from http://perf4j.codehaus.org/devguide.html?
Aspects, switched on and off either with JVM system arguments (which requires a restart) or with JMX if you need real-time on/off.
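A minimal sketch of the switchable timing part, independent of how it gets woven in: StageTimer is a hypothetical class that an aspect's around-advice (or a hand-written wrapper) could call. The enabled flag is the piece you would expose through JMX or initialise from a system property; that wiring is assumed and not shown.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Accumulates elapsed time for one stage; cheap no-op while switched off. */
class StageTimer {
    private final AtomicBoolean enabled = new AtomicBoolean(false);
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    void setEnabled(boolean on) {
        enabled.set(on);
    }

    /** Runs the work; records its duration only while the timer is enabled. */
    void time(Runnable work) {
        if (!enabled.get()) {
            work.run();
            return;
        }
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    long calls() {
        return calls.get();
    }

    long totalNanos() {
        return totalNanos.get();
    }
}
```

With one timer around the code up to the JMS send and another around the wait for the reply, comparing the two totals gives the split the question asks about.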
We are considering development of a mission critical application in Java EE, and one thing that really impressed me is the lack of session isolation in the platform. Let me explain the scenario.
We have a native Windows application (a complete ERP solution) that receives about 2k LoC and 50 bug-fixes per month from sparse contributors. It also supports scripting, so the customer can add their own logic and we have no clue about what such logic does. Instead of using a thread pool, each server node has a broker and a process pool. The broker receives a client request, enqueues it until a pooled instance is free, sends request to that instance, delivers response to client, and releases the instance back to the process pool.
This architecture is robust because with so many sparse contributions and custom scripting, it's not uncommon for a deployed version to have some serious bug such as an infinite loop, a long-waiting pessimistic lock, a memory corruption or memory leakage. We implemented a memory limit, a timeout for requests, and a simple watchdog. Whenever some process fails to answer correctly and on time, the broker simply kills it, so the watchdog detects and starts another instance. If a process crashes before it started to answer a request, the broker sends the same request to another pooled instance, and the user doesn't know about any failure on the server side (except in admin logs). This is nice because some instances are slowly trashed by bogus code as they work on requests. Because most session data is held at the client or (in rare cases) at a shared storage, it seems to work perfectly.
Now considering a move to Java EE, I couldn't find anything similar on the spec or popular application servers such as Glassfish and JBoss. Yes, I know that most cluster implementations do transparent fail-over with session replication, but we have small companies that use our system on a simple 2-node cluster (and we also have adventurers that use the system on a 1-node server). With a thread pool, I understand that a buggy thread can bring an entire node down, because the server cannot detect and safely kill it. Bringing an entire node down is much worse than killing a single process - we have deployments where each node has about 100 pooled process instances.
I know that IBM and SAP are aware of this problem, based on http://www.trl.ibm.com/people/kawatiya/pub/Kawachiya07vee.pdf and http://java.sys-con.com/node/47362, respectively. But based on recent JSRs, forums and open-source tools, there isn't much activity on the community.
Now come the questions!
If you have a similar scenario and use Java EE, how did you solve?
Do you know about an upcoming open-source product or change in Java EE spec that can address this issue?
Does .NET have the same problem? Can you explain or cite references?
Do you know about some modern and open platform that can address this issue and is worth the task doing ERP business logic?
Please, I have to ask you not to suggest more testing or any kind of QA investment, because we cannot force our customers to do this for their own scripts. We also have cases where urgent bug fixes must bypass QA, and while we force the customer to accept this, we cannot make them accept that a buggy piece of software can affect a range of unrelated features. This issue is about robust architectures, not development process.
Thanks for your attention!
What you have stumbled upon is a fundamental issue regarding the use of Java and "hostile" applications.
It's a fundamental issue not just at the Java EE level, but at the core JVM level. The typical JVMs available have all sorts of issues with loading "unsafe code". From memory leaks, class loader leaks, resource exhaustion, and unclean thread kills, the typical JVM is simply not robust enough to handle badly behaving code well in a shared environment.
A simple example is memory exhaustion of the Java heap. As a basic rule, NOBODY (and by nobody, I specifically mean the core java library and just about every other 3rd party library out there) catches OutOfMemoryError. There are the rare few who do, but even they can do little about it. Typical code handles the exceptions it "expects" to handle, but lets others fall through. Unchecked throwables (of which OOM is one; strictly it is an Error, not a RuntimeException) will happily bubble up through the call stack all the way to the top, leaving behind a wreckage of unchecked critical path code, leaving all sorts of things in unknown state.
Things such as Constructors or static initializers which "can't fail" leaving behind uninitialized class members which are "never null". These damaged classes simply don't know they're damaged. Nobody knows they're damaged, and there's no way to clean them up. A Heap that hits OOM is an unsafe image and pretty much needs to be restarted (unless, of course, you wrote or audited ALL of the code yourself, which, naturally, you won't -- who would?).
Now, there may well be vendor specific JVMs which are better behaved and give you better control. The ones based on the Sun/Oracle JVM (i.e. most of them) do not.
So, it's not necessarily a Java EE issue, it's a JVM issue.
Hosting hostile code in the JVM is a bad idea. The only way it's practical is if you host a scripting language, and that scripting language implements some kind of resource control. That could be done, and you can tweak the existing ones as a start (JavaScript, Groovy, Jython, JRuby). The fact that these languages give users direct access to Java libraries makes them potentially dangerous, so you may have to restrict that as well to only aspects wrapped by script handlers. At this point, though, the "why use Java at all" question floats up.
You'll note Google App Engine does none of these. It spools up a separate JVM for each application that's being run, but even then it greatly restricts what can be done within those JVMs, notably through the existing Java security model. The distinction here is that these instances tend to be "long lived" so as not to endure the processing costs of startup and shutdown. I should say, they SHOULD be long lived, and those that are not do incur those costs.
You can make several instances of the JVM yourself, give them a bit of infrastructure to handle requests for logic, give them custom class loader logic to try and protect from class loader leaks, and minimally let you kill the instances off (they're simply a process) if you want. That can work, and probably work "ok" depending on the granularity of the calls, and the "start up" time for your logic. The start up time will minimally be the loading of the classes for the logic from run to run, that alone may make this a bad idea. And it certainly WON'T be "Java EE". Java EE is not set up to do this kind of thing. But you're not clear what Java EE features you're looking at either.
Effectively, this is what Apache and "mod_php" do. Several instances, as processes, individually handling requests, with badly behaving ones being killed off as necessary. This is why PHP is common in the shared hosting business. In this structure, it's basically "safe".
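To give an idea of the "they're simply a process" part, here is a minimal Java sketch of running a unit of work in a separate OS process and killing it if it overruns. WorkerProcess is a hypothetical name, and the command you pass (for example a java launcher plus your own worker main class) is entirely up to you; pooling, request hand-off and a watchdog are left out.

```java
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

class WorkerProcess {
    /**
     * Runs the command as a separate OS process and kills it if it overruns.
     * Returns the exit code, or an empty value if the process had to be killed.
     */
    static OptionalInt runWithTimeout(List<String> command, long timeout, TimeUnit unit) {
        Process process;
        try {
            // inheritIO avoids the child blocking on a full output pipe.
            process = new ProcessBuilder(command).inheritIO().start();
        } catch (IOException e) {
            throw new IllegalStateException("could not start worker", e);
        }
        try {
            if (process.waitFor(timeout, unit)) {
                return OptionalInt.of(process.exitValue());
            }
            process.destroyForcibly().waitFor(); // the OS reclaims all of its memory
            return OptionalInt.empty();
        } catch (InterruptedException e) {
            process.destroyForcibly();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }
}
```

Starting a fresh JVM per request pays the start-up cost described above every time, so in practice this would sit behind a pool of pre-started workers, much like the broker in the question.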
I believe your scenario is highly untypical, thus it is improbable that there is a ready-made framework/platform addressing this need. Java EE sort of assumes that the request processing code is written by the same team as the rest of the app, thus it need not be isolated, watched and reset that often, and bug fixes would be handled the same way in all parts of the system. This assumption greatly simplifies development, deployment, testing etc. for most projects, not forcing them to pay for something they don't need. And yes, it isn't suitable for everyone. If you want something fundamentally different, you probably need to implement a fair amount of failover logic yourself. Java EE does provide the fundamental building blocks for this though.
I believe (although have no concrete experience to prove it) that .NET or other platforms are basically built on similar assumptions.
We had a similar - though not so severe - port of a really enormous Perl site to Java. On receiving an HTTP request we instantiate a class and call its processRequest method, surrounded by try-catch and time measurement. Adding a timer and thread would suffice to be able to kill the thread. This probably is sufficient in real life.
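A minimal sketch of the "timer and thread" idea using the standard executors; TimedRequestRunner is a hypothetical name and the Callable stands in for the processRequest call. One caveat that matters for the original question: Future.cancel(true) only interrupts the thread, so code that never checks for interruption (a tight infinite loop, for instance) keeps running.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedRequestRunner {
    private final ExecutorService pool;

    TimedRequestRunner(ExecutorService pool) {
        this.pool = pool;
    }

    /** Returns the handler's result, or the fallback if it fails or overruns. */
    <T> T run(Callable<T> handler, long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(handler);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; it cannot force it to stop
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the handler threw; a real system would log e.getCause()
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}
```

This protects the caller's response time, but unlike killing a process it does not reclaim memory or locks held by the misbehaving code.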
A Java EE server like GlassFish is an OSGi container, so you might have more means of isolation there.
You could also run an array of (web or local) applications to which you dispatch requests via a central web application. Those applications are then isolated.
Even more isolated are serialized sessions and operating system processes starting a new JVM.