I have a simple process like this:
The process doesn't have any user tasks, only some service tasks, but it will be started many times, so performance is important.
I set the history level to none for better performance, and it was effective in a load test.
I have a question I couldn't find an answer to on the web:
Is there any way to disable the runtime database in Camunda? I'm not sure whether this is a rational goal, but I'd like to know.
The process models are not read from the classpath, but deployed to the database and read from there. So Camunda requires a relational database even if you disable the history and have no asynchronous continuations or wait states in your process model.
However, if you do not require persistence at all, you can simply configure an in-memory database, such as the H2 database Camunda ships for development purposes in its different distributions. Switch the database URL to e.g. jdbc:h2:mem:camunda-db (see https://www.h2database.com/html/features.html#in_memory_databases) for an in-memory configuration.
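As a sketch, with the Camunda Spring Boot starter (the exact property names are an assumption for that distribution; other distributions configure the datasource in bpm-platform.xml or the process engine XML instead), the switch could look like:

```properties
# application.properties (Camunda Spring Boot starter assumed)
# Keep the in-memory DB alive for the whole JVM, not just one connection:
spring.datasource.url=jdbc:h2:mem:camunda-db;DB_CLOSE_DELAY=-1
spring.datasource.driver-class-name=org.h2.Driver
# History already disabled for performance, as described above:
camunda.bpm.history-level=none
```

Note that everything, including deployments and running instances, is lost when the JVM stops.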
24 Hour Fitness is running millions of process instances daily using a similar approach. You may be interested in this talk they gave at CamundaCon 2020.1:
https://vimeo.com/440715573
I have a situation wherein, as part of an online transaction, I have to save some data into another database; a slight latency (a few seconds) in updating the other database is fine. Since both databases are Oracle, I have the three options below and need some insight into which one is better.
Oracle database links: I convert the SQL into PL/SQL and let my database take care of writing into the other Oracle database. In the DEV environment both databases are on the same server as different schemas, while in production they are two separate Oracle RACs separated by a few routers and switches.
Spring Batch: Use a batch job to pick the transactions from my source database, process them, and write them into the target database. This way my online transactions would not fail if the other database ever goes down, hits a performance issue, or faces a network issue, and if they ever fail I can code for job restartability. Is Spring Batch well suited for such an event-publishing case? Would I hit any challenges in the future?
2-Phase Commit: I simply implement 2PC and save the data in both databases in one transaction. Or, to make it more future-proof, save to a messaging system as well as my source database.
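The "save to a messaging system as well as my source database" idea is essentially the transactional outbox pattern: the event is written in the same local transaction as the business data, and a relay forwards it to the target later, retrying as needed. A minimal in-memory sketch of the idea (the class and method names are illustrative, not from any framework; real code would use an OUTBOX table and a scheduled relay):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of the transactional outbox pattern: the business write and the
// outbox entry are committed together; a relay later drains the outbox and
// forwards each event to the second database (here just a list).
class OutboxDemo {
    final List<String> businessRows = new ArrayList<>();
    final Queue<String> outbox = new ArrayDeque<>();       // stands in for an OUTBOX table
    final List<String> secondDatabase = new ArrayList<>(); // stands in for the target DB

    // In a real system both writes happen in one local Oracle transaction,
    // so they succeed or fail together.
    synchronized void saveWithEvent(String row) {
        businessRows.add(row);
        outbox.add(row);
    }

    // The relay (e.g. a Spring Batch job or scheduled task) retries until it
    // succeeds, so a temporary outage of the target database never fails the
    // online transaction.
    synchronized void relayOnce() {
        String event;
        while ((event = outbox.poll()) != null) {
            secondDatabase.add(event);
        }
    }
}
```

The few-seconds latency tolerance mentioned above is exactly what makes this pattern applicable.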
I have a terminal server monitoring project. In the backend I use Spring MVC, MyBatis, and PostgreSQL. Basically, I query session information from the DB, send it back to the front end, and display it to users. But there are some large queries (like total users, total sessions, etc.) that slow down the system when a user opens the website, so I want to run these queries as asynchronous tasks so the website opens quickly rather than waiting for the results. I would also periodically check the terminal server's state in the DB (every hour) and notify admins if the terminal server fails or the average load is too high. I don't know what I should use, maybe Akka, or some other way to do these two jobs (1. run the large queries asynchronously; 2. run periodic queries)? Please help me, thanks!
You can achieve this using Spring and caching where necessary.
If the data you're displaying is not required to be "real-time" but can be "near real-time", you can read the data from the DB periodically and cache it. Your app then reads from the cache.
There are different approaches you can explore.
You can try to create a materialized view in PostgreSQL that will hold the statistics you need. Depending on your requirements, you'll have to decide how to handle refresh intervals, etc.
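A sketch of the materialized-view approach in PostgreSQL (the table and column names are illustrative; adapt them to your schema):

```sql
-- Precompute the expensive aggregates once, instead of on every page load.
CREATE MATERIALIZED VIEW session_stats AS
SELECT count(*)                  AS total_sessions,
       count(DISTINCT user_name) AS total_users
FROM   terminal_sessions;

-- Re-run on your chosen interval, e.g. from a scheduled task:
REFRESH MATERIALIZED VIEW session_stats;
```

The web request then does a cheap SELECT from session_stats rather than scanning the session table.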
Another approach is to use an application-level cache; you can leverage Spring for that (Spring docs). You can populate the cache on startup and refresh it as necessary.
The task that runs every hour can again be implemented with Spring's @Scheduled annotation (Spring docs).
To answer your question: don't use Akka; you have all the tools necessary to achieve this in the Spring ecosystem.
Akka is not very relevant here; it is an event-driven programming model that deals with concurrency issues to build highly scalable multithreaded applications.
You can use the Spring task scheduler to run heavy queries periodically. If you want to keep it simple, you can solve your problem by storing data like total users, total sessions, etc. in the global application context, and periodically updating it from the database using the Spring scheduler. You can also store the same data in a separate database table so that it can easily be loaded at initialization time.
I really don't see why you would need memcached, materialized views, WebSockets, or other heavy technologies and frameworks for caching a small set of data. All you need is to maintain a set of global parameters in your application context and keep them updated via a scheduled task as frequently as desired.
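Outside Spring, the same idea, a small set of statistics held in memory and refreshed on a schedule, can be sketched with plain JDK classes (loadStatsFromDb here is a placeholder for your real MyBatis query):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Holds precomputed statistics so page loads read from memory
// instead of running the heavy queries on every request.
class StatsCache {
    private final Map<String, Long> stats = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start() {
        refresh(); // populate once at startup so the first request hits the cache
        scheduler.scheduleAtFixedRate(this::refresh, 1, 1, TimeUnit.HOURS);
    }

    // Stand-in for the expensive DB queries (total users, total sessions, ...).
    private void refresh() {
        stats.put("totalUsers", loadStatsFromDb("totalUsers"));
        stats.put("totalSessions", loadStatsFromDb("totalSessions"));
    }

    long get(String key) {
        return stats.getOrDefault(key, 0L);
    }

    private long loadStatsFromDb(String key) {
        return 42L; // placeholder for the real query
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The hourly health check that notifies admins can hang off the same scheduler.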
I'm trying to determine all the things I need to consider when deploying jobs to a clustered environment.
I'm not concerned about parallel processing or other scaling things at the moment; I'm more interested in how I make everything act as if it was running on a single server.
So far I've determined that triggering a job should be done via messaging.
The thing that's throwing me for a loop right now is how to utilize something like the Spring Batch Admin UI (even if it's a hand rolled solution) in a clustered deployment. Getting the job information from a JobExplorer seems like one of the keys.
Is Will Schipp's spring-batch-cluster project the answer, or is there a more agreed upon community answer?
Or do I not even need to worry because the JobRepository will be pulling from a shared database?
Or do I need to publish job execution info to a message queue to update the separate Job Repositories?
Are there other things I should be concerned about, like the jobIncrementers?
BTW, if it wasn't clear that I'm a total noob to Spring batch, let it now be known :-)
Spring XD (http://projects.spring.io/spring-xd/) provides a distributed runtime for deploying clusters of containers for batch jobs. It manages the job repository and provides a way to deploy, start, restart, etc., the jobs on the cluster. It addresses fault tolerance (if a node goes down, the job is redeployed, for example) as well as many other features needed to maintain a clustered Spring Batch environment.
I'm adding the answer that I think we're going to roll with unless someone comments on why it's dumb.
If Spring Batch is configured to use a shared database for all the DAOs that the JobExplorer will use, then running in a cluster isn't much of a concern.
We plan on using Quartz jobs to create JobRequest messages which will be put on a queue. The first server to get to the message will actually kick off the Spring Batch job.
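That hand-off can be sketched with a JDK BlockingQueue standing in for the real message broker: each JobRequest is consumed by exactly one node, so the job starts once even though every server is polling.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Each cluster node polls the same queue; the queue hands a given
// JobRequest to exactly one node, so the batch job is launched once.
class JobRequestQueueDemo {
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    static final List<String> started = new CopyOnWriteArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        queue.put("nightlyReportJob"); // the Quartz trigger publishes one request

        Runnable node = () -> {
            String jobName = queue.poll(); // only one node receives the message
            if (jobName != null) {
                started.add(jobName);      // this node launches the Spring Batch job
            }
        };

        Thread a = new Thread(node), b = new Thread(node);
        a.start(); b.start();
        a.join(); b.join();
    }
}
```

With a real broker (ActiveMQ, RabbitMQ, etc.) the competing-consumers delivery gives the same "first server wins" behavior across JVMs.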
Monitoring running jobs will not be an issue because the JobExplorer gets all of its information from the database, and it doesn't look like it caches that information, so we won't run into cluster issues there either.
So to directly answer the questions...
Is Will Schipp's spring-batch-cluster project the answer, or is there a more agreed upon community answer?
There is some cool stuff in there, but it seems like overkill when just getting started. I'm not sure there is a community-agreed-upon answer.
Or do I not even need to worry because the JobRepository will be pulling from a shared database?
This seems correct. If using a shared database, all of the nodes in the cluster can read and write all the job information. You just need a way to ensure a timer job isn't getting triggered more than once. Quartz already has a cluster solution.
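For reference, Quartz's clustered setup is configuration-only; a sketch of the relevant quartz.properties entries (values illustrative, and every node must point at the same database with the same settings):

```properties
# quartz.properties - clustered JDBC job store
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=20000
org.quartz.scheduler.instanceId=AUTO
```

With this in place, a given trigger fires on only one node of the cluster.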
Or do I need to publish job execution info to a message queue to update the separate Job Repositories?
Again, this shouldn't be needed because the execution info is written to the database.
Are there other things I should be concerned about, like the jobIncrementers?
It doesn't seem like this is a concern. The JDBC DAO implementations use database sequences to increment values.
I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every 3 minutes or so, I need a set of jobs to kick off in a web application context that will concurrently hit web services to obtain the latest version of the data and push it to a database. The problem is that the database will be heavily read from at the same time to do tons of complex calculations on the data. We are currently using Spring, so I have been looking at Spring Batch to run this process. Does anyone have suggestions/patterns/examples of using Spring or other technologies for a similar system?
We have used ServletContextListeners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener fires when the app server starts or restarts the application, and the timer task then acts like a separate thread that repeats your code at the specified interval.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
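A minimal sketch of that approach with plain java.util.Timer (in a web app the Timer would be created in the listener's contextInitialized and cancelled in contextDestroyed; the short 50 ms period here is only so the demo finishes quickly):

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Repeats a task on a fixed schedule, the way a ServletContextListener
// would start it when the application comes up.
class RepeatingJobDemo {
    static final AtomicInteger runs = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ranTwice = new CountDownLatch(2);
        Timer timer = new Timer("data-refresh", true); // daemon thread

        timer.scheduleAtFixedRate(new TimerTask() {
            @Override public void run() {
                // call the web services and write results to the database here
                runs.incrementAndGet();
                ranTwice.countDown();
            }
        }, 0, 50); // in production: 3 * 60 * 1000 for "every 3 minutes"

        ranTwice.await(); // wait until the task has fired twice
        timer.cancel();   // contextDestroyed() would do this on shutdown
    }
}
```

One caveat with this approach: in a clustered deployment each node gets its own timer, so the job runs once per node unless you coordinate externally.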
Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.
I think it's also worth looking into integration frameworks like Apache Camel. Also take a look at the so-called Enterprise Integration Patterns; check the catalog, as it might provide you with some useful vocabulary for thinking about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.
What can be the pain points when executing a performance test on a Java application which uses Hibernate as the ORM tool and Oracle 11g as the database?
I am also thinking of benchmarking the application; what should I do for that?
Thanks
The key things are:
1) As far as practical, test the application using real-world usage scenarios. This can be rather complicated in practice; I've used Perl scripts based on WWW::Mechanize and HTTP::Recorder for this in the past.
2) Failing that, use ab or JMeter.
3) Record as much as possible (you don't mention which web server you are using; if it's Apache, add %D to the logs).
4) Make sure you saturate the system; you want to be triggering some major garbage collections (or prove the application is homeostatic, which is a very rare thing for a Java program).
5) Analyse the web server and GC logs.
The first place to start is to agree what is acceptable performance. Without that agreement, anything else is premature.
Different application types will have different pain points: the mix of reads and writes, concurrent updates (especially on the same data, e.g. selling concert tickets or airplane seats), data volumes.
Not sure to what extent your app "uses Oracle 11g as database" or even what type of environment you have (I assume a typical OLTP one), but from the Oracle side you can do several things (to name a few):
From an overall DB standpoint, look into AWR (Automatic Workload Repository, formerly Statspack). I believe this is built into Enterprise Manager as well.
SQL Trace + tkprof.
If using any PL/SQL, DBMS_HPROF (the hierarchical profiler).
If using any PL/SQL, log significant actions to log tables (via autonomous transactions), recording the timestamp of each entry, the action taken, etc. Roll your own or use an existing framework (there are several out there); just make sure it's flexible (can change the level of logging output).
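For the SQL Trace + tkprof route, a rough outline of the commands (illustrative session IDs and file names; requires the appropriate grants and access to the server's trace directory):

```shell
# In SQL*Plus, trace one application session including waits and binds:
#   EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 123, serial_num => 456, waits => TRUE, binds => TRUE);
#   -- ... exercise the application under load ...
#   EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 123, serial_num => 456);

# Then format the raw trace file, sorted by elapsed parse/execute/fetch time:
tkprof orcl_ora_12345.trc report.txt sys=no sort=prsela,exeela,fchela
```

The tkprof report shows per-statement timings and row counts, which pairs well with the Hibernate SQL logging mentioned below.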
Have Hibernate log all executed SQL.
Check this link for configuration properties and set hibernate.show_sql to true. Once you can see what's being executed, check any unusual statements and profile those you suspect are slower than expected.
Afterwards, checking their execution plans and tweaking them to be better optimized will help both your application and your database.
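The relevant settings (standard Hibernate property names; set them in persistence.xml, hibernate.cfg.xml, or your Spring configuration):

```properties
hibernate.show_sql=true
# pretty-print statements so long queries stay readable in the log
hibernate.format_sql=true
# tag each statement with a comment showing which mapping/HQL produced it
hibernate.use_sql_comments=true
```

Remember to turn these off again outside the performance-testing window, since the logging itself has a cost.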
Not sure if using different technologies (e.g. Hibernate in your case) would change the way you perform the performance test of an application.
There are standard tools to run performance tests, and they should be applicable to the technologies used by your application too.
Setting show_sql to true would definitely help in looking at the queries and analyzing them further, but may not help with the overall performance test of your application.
Look at the following post Java benchmarking tool for benchmarking tools.
Hope this helps.