A lot of database migrations in a Spring project - Java

I have a project with more than 50 Liquibase migrations.

I have tables: Currencies, Countries ... and they are populated by the migrations right now.
The problem is that for each integration test where the context starts, I have to run all 50 of my migrations. It takes time, and as you know Spring is not the fastest framework.
What can I do? Gradle currently spends about 10 minutes running all the tests.
Of course, you may say it is a monolith; yes, it is, but the customer doesn't want to split up the logic because the average skill level of the team is quite low.
How can I speed up my integration tests?

Depending on the kind of migrations, they may not be an actual performance issue. I'm looking at about 130 migrations in one project at the moment, and while they do take a certain amount of time, it's nothing compared to the time it takes to set up and tear down the test context. Starting from a clean slate I'd expect it to shave off maybe 10-20 seconds at best.
It may make sense to restart for other reasons, though. For example, we have changesets from 2015 that are rolled back in later changesets, so they're just extra clutter. The documentation isn't very specific about it, but you can remove all changesets and start from the beginning in the middle of a project. However, you then need to be careful that you know what the correct state of the database is (without any new changesets you might make). As mentioned in the docs, that usually means the state of the production database.
But remember, this does not guarantee a significant speed-up.
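One thing worth checking before touching the changelog at all: Spring's TestContext framework caches the application context between test classes that share exactly the same configuration, so the migrations only run once per test JVM rather than once per test class. A minimal sketch (assuming Spring Boot style tests; the class and profile names here are made up):

import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.ActiveProfiles;

@SpringBootTest
@ActiveProfiles("integration-test")   // hypothetical profile pointing at the test database
public abstract class AbstractIntegrationTest {
    // Keep this base class free of @MockBean and @DirtiesContext: both change the
    // context cache key (or evict the cached context) and force Spring to build a
    // fresh context, which re-runs all the Liquibase migrations each time.
}

class CurrencyServiceIT extends AbstractIntegrationTest {
    // tests here reuse the cached context and the already-migrated database
}

If most of your test classes can extend one such base class, the 50 migrations stop being a per-test cost and the remaining overhead is the single context start-up.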

Related

Why should a test database be created/deleted when testing?

I have been assigned to test a MongoDB database used by a Java backend. I was told that I had to create the database entirely from a script for this task.
But I have difficulty understanding the benefit of creating the database from scratch with a script instead of having a permanent test database, where I imagine data would be inserted on startup and cleaned on teardown in either case.
Why is it beneficial from a testing perspective to create and delete a database when testing?
Sometimes tests fail and therefore it may happen that the teardown phase will never be reached.
Furthermore, deleting the database is the simplest and most effective way to clean it, even if perhaps not the most efficient, and it guarantees that you do not forget something in your cleanup routine.
In particular for performance tests it is important that the database is in exactly the same state for each run, otherwise the run times cannot be compared with each other: an improvement in a consecutive run could have been caused simply because tablespaces were already increased or similar things, and not because the code optimisation worked …
Most of the time, a test means a predefined environment and an expected reaction of that environment to our assumed states. To verify this we need a process that is as automated and repeatable as possible, without interference from manual setup or configuration.
In the software development process we try to cover as many test cases as possible for QA of the product. When there are many test cases, each one should be isolated from the others. If they are not well isolated, the results may vary from one execution round to the next, which eventually invalidates the testing process.
They need not be. However:
You lose portability.
You don't have a known start state for your test.
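To make the "known start state" point concrete, here is a minimal sketch of recreating the test database before each test (assuming the MongoDB Java driver and JUnit 5; the connection string, database and collection names are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.junit.jupiter.api.BeforeEach;

class OrderRepositoryIT {

    private MongoDatabase db;

    @BeforeEach
    void recreateTestDatabase() {
        MongoClient client = MongoClients.create("mongodb://localhost:27017"); // assumed local test instance
        db = client.getDatabase("app_test");                                   // hypothetical database name
        db.drop();                                                             // clean slate, nothing left over from a failed run
        db.getCollection("currencies").insertOne(new Document("code", "EUR")); // known seed data for every run
    }
}

Dropping the whole database in the setup phase (rather than cleaning in teardown) also sidesteps the failure mode mentioned above, where a failed test never reaches its cleanup code.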

Why does the IntelliJ unit test view show a much shorter run time than the actual wall-clock time of running tests?

The IntelliJ Run view shows the runtime of each individual test.
I have a module with about 800 tests for which IntelliJ indicates about 1s500ms. However, the tests clearly take significantly longer to run as measured by wall-clock time.
These are pure unit tests, with no external system (or file system) connections, so they are 100% CPU bound (barring operating system paging, etc., but I have a ton of memory so that's not an issue).
What are the sources of the discrepancy? Does IntelliJ not take into account setup/teardown and other bootstrapping?
Is there any way to make it show the wall-clock time it took to run all tests?
Any ideas on how I can speed the tests up (e.g. can I make them run in parallel?)
Thank you.
The test runner only counts the setup and run time of the tests. There is usually additional time spent on the build. Even if the project and tests have been built already, the build system first needs to figure out whether there were any changes. Depending on your build system and the size of your project, this can take significant time; Gradle is notorious for this. In Eclipse (back in the day, when I was still using it) this used to be much better, because it builds Java code very efficiently in the background. Modern build systems suffer from these additional delays because they do not only compile and package, they also check external dependencies.
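On the parallelism question: if the tests really are independent and CPU bound, and assuming JUnit 5 (the question does not say which framework is in use), you can opt classes into parallel execution. A rough sketch:

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.parallel.Execution;
import org.junit.jupiter.api.parallel.ExecutionMode;

// Also requires junit.jupiter.execution.parallel.enabled=true in
// src/test/resources/junit-platform.properties (available since JUnit 5.3).
@Execution(ExecutionMode.CONCURRENT)
class PriceCalculatorTest {

    @Test
    void addsVatToNetPrice() {
        // purely CPU-bound assertions run safely in parallel
        // as long as they share no mutable static state
    }
}

Gradle can additionally fork multiple test JVMs (maxParallelForks), which helps when the bottleneck is per-JVM rather than per-test, but it also multiplies the start-up overhead described above.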

How to speed up frequent writing

We created a Java agent which does a check on our application suite to see, for instance, whether the parent/child structure is still correct. For this it needs to check 8000+ documents across several applications.
The check itself goes very fast. We use a navigator to retrieve data from views and only read data from those entries. The problem is within our logging mechanism. Whenever we report a log entry with level SEVERE (aka: a really big issue) the backend document is updated immediately. This is because we don't want to lose any info about these issues.
In our test runs we see that everything runs smoothly, but as soon as we 'create' a lot of severe issues the performance drops enormously because of all the writes. I would like to hear from any Notes developers facing the same challenge: how could we speed up the writing without losing any data?
-- added more info after comment from simon --
It's a scheduled agent which runs every night to check for inconsistencies. The goal is of course to find inconsistencies, fix the cause, and eventually have no inconsistencies reported at all.
It's a scheduled agent which runs every night to check for inconsistencies.
OK. So there are a number of factors to take into account.
Are there any embedded JARs? When an agent has embedded JARs, the server has to detach them from the agent to disk before it can run the code. This is done every time the agent executes, and it can be a performance hit. If your agent spawns a number of times, remove the embedded JARs and put them into the lib\ext folder on the server instead (requires a server restart).
You mention it runs at night. By default, general housekeeping processes run at night. Check the notes.ini for the scheduled server tasks and appraise what impact they have on the server/agent when running. For example:
ServerTasksAt1=Catalog,Design
ServerTasksAt2=Updall
ServerTasksAt5=Statlog
In this case, if the agent runs between 2 and 5, then Updall could have an impact on it. Also check Program documents for scheduled executions.
In what way are you writing? If you are creating a document for each incident and the document contents are not large, then the write time should be reasonable. What is liable to be a performance hit is one of the following:
If you are multi threading those writes.
Pulling a log document, appending a line, saving and then repeating.
One last thing to think about: if you are getting 3000 errors, there must be a point where X number of errors means there is no point continuing, and you should instead alert the admin via SNMP/email/etc. It might be worth coding that in as well.
Other than that, you should probably post some sample code relating to the write.
Hmm, a difficult and rather general question.
As far as I understand, you update the documents in the view you are walking through. I would set view.AutoUpdate to false. This ensures that the view is not reloaded while you are running your code. This should speed up your code.
This is an extract from the Designer help:
Avoid automatically updating the parent view by explicitly setting
AutoUpdate to False. Automatic updates degrade performance and may
invalidate entries in the navigator ("Entry not found in index"). You
can update the view as needed with Refresh.
Hope that helps.
If that does not help you might want to post a code fragment or more details.
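For what it is worth, a minimal sketch of that advice in a Java agent (the view name and the idea of reading only column values are assumptions about your design, not taken from your code):

import lotus.domino.*;

public class ConsistencyCheckAgent extends AgentBase {
    public void NotesMain() {
        try {
            Session session = getSession();
            Database db = session.getAgentContext().getCurrentDatabase();
            View view = db.getView("($ParentChildCheck)");   // hypothetical view name
            view.setAutoUpdate(false);                       // no reloads while walking, avoids "Entry not found in index"
            ViewNavigator nav = view.createViewNav();
            ViewEntry entry = nav.getFirst();
            while (entry != null) {
                java.util.Vector cols = entry.getColumnValues(); // read-only check, no document open
                // ... compare parent/child keys from the column values here ...
                ViewEntry next = nav.getNext(entry);
                entry.recycle();                                 // free the backend handle for each entry
                entry = next;
            }
        } catch (NotesException e) {
            e.printStackTrace();
        }
    }
}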
Create separate documents for each error rather than one huge document.
or
Write to a text file directly rather than to a database, and then pull it into a document later if necessary. This should speed things up considerably.
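A rough sketch of that last idea (the file location and line format are just placeholders): append each SEVERE entry to a plain text file and flush after every line, so nothing is lost if the agent dies, then import or attach the file to a single log document once the run is finished.

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;

public class SevereIssueLog implements AutoCloseable {

    private final BufferedWriter out;

    public SevereIssueLog(Path file) throws IOException {
        out = Files.newBufferedWriter(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public void severe(String docUnid, String message) throws IOException {
        out.write(LocalDateTime.now() + "\t" + docUnid + "\t" + message);
        out.newLine();
        out.flush();   // keep the no-data-loss guarantee without a document save per issue
    }

    @Override
    public void close() throws IOException {
        out.close();   // afterwards the file can be pulled into one Notes log document
    }
}

Appending and flushing a line is far cheaper than opening, updating and saving a backend document for every single issue.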

SOAP Web service, Performance test possible?

I have a SOAP web service written in Java and using Spring-WS.
I need to know whether it can handle 2 million requests per day, and what its performance would be.
- To what extent is performance under heavy usage related to my Java code and architecture? Is there anything I can improve in them?
- And to what extent is it related to the application server I use? Which app server should I use, what are the limitations or settings, and how can I configure and test this performance?
Thanks
With SOAP having an architectural underpinning in the HTTP protocol, there are literally dozens of commercial and open source tools which you can use to perform your load and scalability tests.
What you will want to do is make sure that whatever tool you select meets the technical needs of exercising your interface correctly, is a good match for your in-house skills, and contains the reporting that you need to determine success or failure in your testing efforts. If you get a tool which is a mismatch on any of the three fronts then you may as well have purchased the most expensive tool on the market and hired the most expensive consulting firm to use it... sort of like driving nails with the butt end of a screwdriver instead of using the proper tool (a hammer).
Most of the commercial tools on the market today have lease/rental options for short-term use, and then there are the open source varieties as well. Each tool has an engineered efficiency associated with core tasks such as script construction, test construction, test execution and analysis, which is distinct to the tool. The commercial tools tend to have a more balanced weight across the tasks, while the open source ones tend to require a higher level of effort on the edge tasks of script creation and analysis.
Given that you are going to be working with at least a couple of million samples (assuming you will want to run for at least 24 hours) then you need to make sure that the analysis tools have a demonstrable track record with large data sets. The long standing commercial performance test tools all have demonstrable track records at this level, the open source ones are hit and miss and in some cases analysis becomes a roll-your-own proposition against logged response time data. You can see where you can burn lots of time building your own analysis engine here!
What you want to do is technically possible. You may want to re-examine your performance requirements, though, and here is why. I happen to be working with an organization that today uses a web services interface to serve the needs of clients around the world. Their backend archive of transactional data is approaching 250TB from over ten years' worth of work. On an hourly basis over the past year, the high-water mark was around 60K requests per hour. Projected over 24 hours this still works out to less than your 2 million requests per day. If you test to this level and you find issues, are you finding genuine issues or are you finding engineering ghosts, things that would never occur in production due to the differences between production and test volumes? Modeling your load properly is always difficult, but the time spent nailing the load model for your mix of transactions and the proper volume will be time well spent in not using your developer skills to chase performance ghosts and burning budget while doing so.
Good luck!
You could also use StresStimulus, which is a plugin for Fiddler.
Keep in mind that 1 day (24 hours) is a large sample. I suspect that 2,000,000 hits won't be evenly spread across the 24 hours. Try to understand what the traffic pattern will look like. If 1/2 of the traffic comes between the hours of 1am and 4am (for example) you'll want to test for 1M hits in 3 hours as well as 2M in 24.
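To put rough numbers on that (simple arithmetic, not measured figures): 2,000,000 requests spread evenly over 24 hours is only about 23 requests per second, whereas 1,000,000 requests squeezed into a 3-hour peak is roughly 93 requests per second. It is the peak window, not the daily total, that should size the load test.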
SoapUI will be able to help you load test your web service by building the SOAP messages from the WSDL:
http://www.soapui.org/
Apache JMeter, which is a performance/load testing tool, will help you load test it with either the SOAP sampler or the regular HTTP Sampler:
http://jmeter.apache.org/
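If you just want a quick sanity check before committing to one of those tools, a rough sketch like the one below can drive concurrent POSTs against the service using only the JDK (Java 11+). The endpoint URL, SOAPAction header and envelope body are placeholders, and a real load test should still be done with JMeter, SoapUI or a similar tool so you get proper ramp-up control and reporting:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SoapLoadSketch {
    public static void main(String[] args) throws Exception {
        String envelope = "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                        + "<soapenv:Body><!-- request payload here --></soapenv:Body></soapenv:Envelope>";
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(50);   // ~50 concurrent callers
        AtomicLong ok = new AtomicLong();
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {                         // total number of samples
            pool.submit(() -> {
                HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/ws")) // placeholder endpoint
                        .timeout(Duration.ofSeconds(10))
                        .header("Content-Type", "text/xml; charset=utf-8")
                        .header("SOAPAction", "\"\"")
                        .POST(HttpRequest.BodyPublishers.ofString(envelope))
                        .build();
                try {
                    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                    if (response.statusCode() == 200) ok.incrementAndGet();
                } catch (Exception e) {
                    // a dropped request counts as a failure; a real tool would record latency percentiles too
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.printf("%d successful responses in %.1f s (%.1f req/s)%n", ok.get(), seconds, ok.get() / seconds);
    }
}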

Long-running stats process - thoughts on language choice?

I am on a LAMP stack for a website I am managing. There is a need to roll up usage statistics (a variety of things related to our desktop product).
I initially tackled the problem with PHP (since I already had a bunch of classes to work with the data). All worked well on my dev box, which was running 5.3.
Long story short, 5.1 memory management seems to be a lot worse, and I've had to do a lot of fiddling to get the long-term roll-up scripts to run in a fixed memory space. Our server guys are unwilling to upgrade PHP at this time. I've since moved my dev server back to 5.1 so I don't run into this problem again.
For mining MySQL databases to roll up statistics for different periods and resolutions, potentially with a process that runs all the time in the future (as opposed to on a cron schedule), what language would you recommend? I was looking at Python (which I know more or less), Java (which I don't know that well), or sticking it out with PHP (which I know quite well).
Edit: design clarification for commenter
Resolutions: The way the rollup script works currently is that I have some classes for defining resolutions and buckets. I have year, month, week, day -- given a "bucket number", each class gives a start and end timestamp that defines the time range for that bucket -- this is based on an arbitrary epoch date. The system maintains "complete" records, i.e. it will complete its rolled-up dataset for each resolution since the last time it was run.
SQL Strat: The base stats are located in many dissimilar schemas and tables. For the most part I do individual queries for each rolled-up stat, then fill one record for insert. You are suggesting nested subqueries such as:
INSERT INTO rolled_up_stats (someval1, someval2, ...)
VALUES ((SELECT SUM(somestat) FROM someschema),
        (SELECT AVG(somestat2) FROM someschema2))
Those subqueries will generate temporary tables, right? In my experience that has been slow as molasses in the past. Is this a better approach?
Edit 2: Adding some inline responses to the question
Language was a bottleneck in the case of PHP 5.1 -- I was essentially told I had made the wrong language choice (though the scripts worked fine on 5.3). You mention Python, which I am checking out for this task. To be clear, what I am doing is providing a management tool for usage statistics of a desktop product (the logs are actually written by an EJB server to MySQL tables). I do Apache log file analysis, as well as more custom web reporting on the web side, but this project is separate. The approach I've taken so far is aggregate tables. I'm not sure what these message queue products could do for me; I'll take a look.
To go a bit further -- the data is being used to chart activity over time at the service and the customer level, to allow management to understand how the product is being used. You might select a time period (April 1 to April 10) and retrieve a graph of total minutes of usage of a certain feature at different granularities (hours, days, months, etc.) depending on the time period selected. It's essentially an after-the-fact analysis of usage. The need seems to be tending towards real time, however (look at the last hour of usage).
There are a lot of different approaches to this problem, some of which are mentioned here, but what you're doing with the data post-rollup is unclear...?
If you want to utilize this data to provide digg-like 'X diggs' buttons on your site, or summary graphs or something like that which needs to be available on some kind of ongoing basis, you can actually utilize memcache for this, and have your code keep the cache key for the particular statistic up to date by incrementing it at the appropriate times.
You could also keep aggregation tables in the database, which can work well for more complex reporting. In this case, depending on how much data you have and what your needs are, you might be able to get away with having an hourly table, and then just creating views based on that base table to represent days, weeks, etc.
If you have tons and tons of data, and you need aggregate tables, you should look into offloading statistics collection (and perhaps the database queries themselves) to a queue like RabbitMQ or ActiveMQ. On the other side of the queue put a consumer daemon that just sits and runs all the time, updating things in the database (and perhaps the cache) as needed.
One thing you might also consider is your web server's logs. I've seen instances where I was able to get a somewhat large portion of the required statistics from the web server logs themselves after just minor tweaks to the log format rules in the config. You can rotate the logs on a regular schedule and then process them offline, recording the results in a reporting database.
I've done all of these things with Python (I released loghetti for dealing with Apache combined format logs, specifically), though I don't think language is a limiting factor or bottleneck here. Ruby, Perl, Java, Scala, or even awk (in some instances) would work.
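If you do go the queue route, here is a rough sketch of the consumer-daemon side, assuming RabbitMQ's Java client (the broker host and queue name are placeholders, and the actual aggregation work is left as a comment):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

public class StatsConsumerDaemon {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                                   // placeholder broker host
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("usage-stats", true, false, false, null); // hypothetical durable queue

        DeliverCallback onMessage = (consumerTag, delivery) -> {
            String event = new String(delivery.getBody(), StandardCharsets.UTF_8);
            // ... update the aggregation tables (and the cache) for this event ...
        };
        channel.basicConsume("usage-stats", true, onMessage, consumerTag -> { });
        // the client library delivers messages on its own non-daemon threads,
        // so the process keeps running and waiting for new events
    }
}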
I have worked on a project that did a similar thing in the past, so I have actual experience with the performance. You would be hard pressed to beat the performance of "INSERT ... SELECT" (not "INSERT ... VALUES (SELECT ...)"). Please see http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
The advantage of doing that, especially if you keep the roll-up code in MySQL procedures, is that all you need from the outside is a cron job to poke the DB into performing the right roll-ups at the right times -- as simple as a shell script with 'mysql <correct DB arguments etc.> "CALL RollupProcedure"'
This way, you are guaranteeing yourself zero memory allocation bugs, as well as getting decent performance when the MySQL DB is on a separate machine (no moving of data across the machine boundary...)
EDIT: Hourly resolution is fine -- just run an hourly cron-job...
If you are running mostly SQL commands, why not just use MySQL etc. on the command line? You could create a simple table that lists the aggregate data, then run a command like mysql -u[user] -p[pass] < commands.sql to pass SQL in from a file.
Or, split the work into smaller chunks and run them sequentially (as PHP files if that's easiest).
If you really need it to be a continual long-running process then a programming language like Python or Java would be better, since you can create a loop and keep it running indefinitely. PHP is not suited to that kind of thing. It would be pretty easy to convert any PHP classes to Java.
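To sketch what that long-running Java process could look like (the JDBC URL, credentials and the RollupProcedure name follow the shell-script idea from the earlier answer and are placeholders):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RollupDaemon {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(RollupDaemon::runRollup, 0, 1, TimeUnit.HOURS); // hourly resolution
    }

    private static void runRollup() {
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://dbhost/stats", "user", "pass");
             CallableStatement call = conn.prepareCall("{CALL RollupProcedure()}")) {
            call.execute();   // the database does the heavy INSERT ... SELECT work, the JVM stays small
        } catch (SQLException e) {
            e.printStackTrace();   // a real daemon would log and alert here
        }
    }
}

Whether this loop lives in Java or stays a plain cron job calling the procedure is mostly a matter of how close to real time the reporting needs to be.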
