I have a Java service that will now execute in batch mode. Multi-threaded support has been added to the service, so for every batch request a dedicated thread pool executes the batch. The question is: how do I test this? I have functional tests that pass under the threaded version of the service, but I somehow feel there must be an idiom for testing this.
There really isn't a "good" way to do this. The best thing I can suggest would be TestNG, which allows you to annotate your test methods and cause them to be executed in n threads concurrently. For example:
@Test(invocationCount = 10, threadPoolSize = 10)
public void testSomethingConcurrently() {
...
}
My TestNG knowledge is rusty at best, but AFAIK that should invoke the testSomethingConcurrently method 10 times concurrently. This is a nice and declarative way to run multi-threaded tests against your code.
You can of course do something similar in JUnit by spawning threads manually in a loop, but that's insanely ugly and difficult to work with. I had to do this once for a project I was working on; those tests were a nightmare to deal with since their failures weren't always repeatable.
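For illustration, here is a minimal sketch of that manual approach, assuming JUnit 4 and some hypothetical code under test inside the task body; the important part is collecting the Futures, so that failures thrown on worker threads are re-thrown in the test thread instead of being swallowed:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.junit.Test;

public class ManualConcurrencyTest {

    private static final int THREADS = 10;

    @Test
    public void hammerServiceFromManyThreads() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < THREADS; i++) {
                tasks.add(() -> {
                    // exercise the (hypothetical) code under test here
                    return null;
                });
            }
            // invokeAll blocks until every task has finished; Future.get()
            // re-throws any exception or assertion error raised on a worker
            // thread, which would otherwise go unnoticed.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get();
            }
        } finally {
            pool.shutdownNow();
        }
    }
}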
Concurrency testing is difficult and prone to frustration, due to its non-deterministic nature. This is part of why there is such a big push these days to use better concurrency abstractions, ones which are easier to "reason about" and convince one's self of correctness.
Usually the things that bring down multithreaded applications are timing issues. I suspect that unit testing the full multithreaded environment would require huge changes to the code base.
What you may well be able to do, though, is to test the implementation of the thread pool in isolation.
By substituting the body of the threads with test code you should be able to construct your pathological conditions of locking and resource usage.
Then, unit test the functional elements in a single threaded environment, where you can ignore timing.
While this isn't perfect, it does guarantee repeatability, which is of great importance for your unit testing (as alluded to in Daniel Spiewak's answer).
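As a rough sketch of that idea (JUnit 4, with a plain ExecutorService standing in for whatever pool implementation you actually use), the thread bodies are replaced by latch-controlled test code so you can keep every worker busy while more work queues up behind them:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.junit.Test;

public class BatchPoolIsolationTest {

    @Test
    public void allQueuedTasksRunEvenWhenEveryWorkerBlocks() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stand-in for the batch pool
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();

        // Substitute the real batch work with a controllable body: each task
        // blocks until released, simulating slow work that keeps all four
        // workers occupied while the remaining tasks pile up in the queue.
        for (int i = 0; i < 20; i++) {
            pool.execute(() -> {
                try {
                    release.await(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
            });
        }

        release.countDown();   // unblock the workers
        pool.shutdown();
        assertTrue(pool.awaitTermination(10, TimeUnit.SECONDS));
        assertEquals(20, completed.get());
    }
}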
I used
@Test(threadPoolSize = 100, invocationCount = 100)
@DataProvider(parallel = true)
which executed the test 100 times in parallel across 100 threads.
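Roughly, the combination looks like the sketch below (the class, provider name, and inputs are made up for illustration). Note that the parallel data provider and invocationCount/threadPoolSize multiply, so the total load can get large quickly:

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class ParallelLookupTest {

    // Hypothetical inputs; with parallel = true, TestNG feeds these rows to the
    // test method from its data-provider thread pool (10 threads by default,
    // configurable in the suite or on the command line).
    @DataProvider(name = "ids", parallel = true)
    public Object[][] ids() {
        return new Object[][] { {1L}, {2L}, {3L} };
    }

    // invocationCount/threadPoolSize additionally repeat the whole method 100
    // times across 100 worker threads.
    @Test(dataProvider = "ids", threadPoolSize = 100, invocationCount = 100)
    public void lookUpConcurrently(long id) {
        // call the supposedly thread-safe code under test with 'id'
    }
}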
I agree with Daniel, concurrency testing is indeed very difficult.
I don't have a solution for concurrency testing, but I will tell you what I do when I want to test code involving multi-threading.
Most of my testing is done using JUnit and JMock. Since JMock doesn't work well with multiple threads, I use an extension to JMock that provides synchronous versions of Executor and ScheduledExecutorService. These allow me to test code targeted to be run by multiple threads in a single thread, where I'm also able to control the execution flow. As I said before, this doesn't test concurrency; it only checks the behavior of my code in a single-threaded way, but it reduces the number of errors I get when I switch to the multi-threaded executors.
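The same idea can be hand-rolled without JMock. Here is a sketch of a same-thread Executor that a test can inject in place of the real pool, so the "asynchronous" path runs deterministically on the test thread (this illustrates the technique only, it is not the JMock extension's actual API):

import java.util.concurrent.Executor;

// Runs every submitted task immediately on the calling thread. Injecting this
// in place of the real thread pool makes the asynchronous code path execute
// deterministically inside a single-threaded unit test.
public class SameThreadExecutor implements Executor {
    @Override
    public void execute(Runnable command) {
        command.run();
    }
}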
Anyway, I recommend using the Java concurrency API (java.util.concurrent). It makes things a lot easier.
Related
If I want to benchmark methods in my Java class, which is the better choice, JMH or JMeter? What are the pros/cons of the two?
I have a lot of experience with both JMeter and JMH. I would say there are many cases where you can use either of them, but for benchmarking a Java class, only JMH is really relevant: you can't do class-level benchmarking with JMeter, only a crude approximation of it. It may still work if your class does some long operations where precision is not that important. Reasons:
JMeter's own overhead is significant. For example, I have a JMH test that benchmarks function X, and it averages about 0.01 ms. I also use this function in a custom Java sampler in JMeter to set things up; that sampler typically shows an average of about 20 ms. That's a 2000-fold difference, which is pure JMeter overhead.
With JMeter you cannot avoid JVM optimization, which makes the benchmark effectively unrealistic (although due to problem #1 it may not matter at all). JMH has mechanisms to prevent this pitfall; I don't think you can solve that issue in any clean way in JMeter.
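For reference, here is a minimal JMH sketch of that mechanism (the benchmarked method is a made-up stand-in): the result is sunk into a Blackhole so the JIT can't eliminate the call as dead code, and warmup iterations keep startup noise out of the measurement:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)        // excluded from the reported result
@Measurement(iterations = 10)
@Fork(1)
public class FunctionXBenchmark {

    private long seed = 42;

    // Stand-in for the real method you want to benchmark.
    private long functionX() {
        return seed * 31 + 7;
    }

    @Benchmark
    public void measureFunctionX(Blackhole bh) {
        // Consuming the result prevents the JIT from treating the call as
        // dead code and optimizing it away.
        bh.consume(functionX());
    }
}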
I also disagree with Dmitri T's notion that JMH has a "Limited number of modes" - it actually has more relevant modes than JMeter, and they are easier to set up:
First of all, it has several modes you can choose from (including "time to hold the load"). It can also run all modes at once (with separate results), so you don't have to choose a single mode, and you specify this via an annotation or a command-line parameter alone. In JMeter this is only possible with additional development (e.g. a separate thread group or test).
JMH doesn't have ramp-up, but it has warmup, which lets you exclude the initial executions from the result, keeping the results clean of initial startup noise - essentially the same goal as ramp-up.
There are definitely ways to control iterations: their number, their duration, and so on, via annotations or the command line.
Also, there are quite a few things that are very easy in JMH, whereas in JMeter they require a lot of workarounds. For example:
Synchronizing iterations is a matter of an annotation in JMH, but would require careful setup in JMeter.
Asymmetric tests, which allow you to test, for example, a producer/consumer model at the same time but measure the two sides independently. In JMH you write your tests, mark them with annotations, and you are done; in JMeter you'd need significant overhead to set it up right.
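A minimal sketch of such an asymmetric benchmark in JMH (the queue-based producer/consumer here is just an illustration): both methods run at the same time within one group, but JMH reports their scores separately:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Group;
import org.openjdk.jmh.annotations.GroupThreads;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Group)
public class ProducerConsumerBenchmark {

    private final ConcurrentLinkedQueue<Long> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong sequence = new AtomicLong();

    @Benchmark
    @Group("pipeline")
    @GroupThreads(2)   // two producer threads
    public void produce() {
        queue.offer(sequence.incrementAndGet());
    }

    @Benchmark
    @Group("pipeline")
    @GroupThreads(2)   // two consumer threads, measured independently
    public Long consume() {
        return queue.poll();
    }
}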
In terms of reporting, JMH, much like JMeter, has plugins for Jenkins and TeamCity that produce result tables and graphs. It can also publish results in various formats which can be consumed, processed, or stored by other tools.
So if JMH is so great, what is JMeter good for?
Mainly for testing various network protocols, which JMH was not built for. There you probably don't care about JMeter's overhead or JVM optimization, and you can take advantage of JMeter's built-in samplers. Nothing prevents you from testing any network protocol with JMH, of course (as long as you use a proper library), but JMeter relieves you of writing custom code to handle communication protocols.
You can't or don't want to write Java code. In JMeter you can express your logic visually, which makes it possible for people who don't code to write tests (although you may still need to control the test logic with some programming concepts, like loops or timers, and you may need some scripting in pre/post-processing). Visual recording can also be attractive if you can use it (that is, if you are recording an HTTP test).
You may also feel that JMeter tests are at the "functional" level, while JMH tests are at the "unit" test level, but that's quite subjective.
If it's really your Java class and all you need is to measure the number of operations per unit of time, go for JMH, as it is the more lightweight solution and can be integrated directly into your application's build system (especially if you use Maven).
Pros:
Simplicity (given you're a Java developer)
Small footprint
Cons:
Limited number of modes (workload models) supported
Very limited reporting options which don't tell much to a non-technical person
Not many integration options
JMeter is a multi-protocol load-testing tool which can be used for assessing certain functions' performance as well; there are two mechanisms available:
JSR223 Sampler and Groovy language - where you can write arbitrary code
JUnit Request sampler, which detects JUnit tests in libraries on the JMeter classpath so you can execute them with an increased number of threads.
Pros:
Extremely powerful in terms of workload model definition: you can configure thread ramp-up, ramp-down, iterations, time to hold the load, etc.
Can be run from command-line, IDE, Ant task, Maven plugin, etc.
Test reports can be exported as CSV, charts, or the HTML Reporting Dashboard
Cons:
If you don't have previous JMeter experience you will need to learn another tool
Relatively large footprint (at least ~100 MB of extra JVM heap), and the reporting subsystem can negatively impact your test results, i.e. you will see higher throughput with JMH on the same hardware, assuming you use 90% of the available OS resources.
I'm opening this question since I can't find easy-to-understand, summarized information about this topic. There isn't even a good YouTube video that explains it.
I'm currently studying real-time programming, and static and dynamic scheduling is part of it. I just can't seem to get my head around it.
If someone can explain the advantages and disadvantages of static and dynamic scheduling in an educational way, that would really be helpful.
What I've got so far is the following:
Static scheduling:
An off-line approach where the schedule is generated manually. It can be modified at run-time, but that isn't advisable because it can cause the threads to miss their deadlines. It's easy to implement and to analyze, and because it's easy to analyze, it's easy to see whether the system is going to meet all of its deadlines.
Dynamic scheduling:
An on-line approach where the schedule is generated automatically. It can be modified at run-time by the system, and (in most cases) this shouldn't cause the threads to miss their deadlines. If the system changes, it's easy to generate a new schedule since it's generated automatically. There is no guarantee that the system meets all of its deadlines.
Can anyone explain these two a bit better than I have, or perhaps add more information about them? Perhaps illustrate it with an image so it's easier to wrap my head around.
In simple terms,
Static scheduling is the mechanism where we have already controlled the order/way the threads/processes execute in our code (at compile time). If you have used any control (locks, semaphores, joins, sleeps) over threads in your program to achieve some goal, then you have intended to use static (compile-time) scheduling.
Dynamic scheduling is the mechanism where thread scheduling is done by the operating system, based on a scheduling algorithm implemented at the OS level. The execution order of the threads then depends entirely on that algorithm, unless we have put some control on it (with static scheduling).
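As a small illustration of the difference in the sense used above (the class is made up for the example), the first pair of threads is left entirely to the OS scheduler, while the second pair is forced into a fixed order by the code itself:

public class SchedulingDemo {

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () ->
                System.out.println(Thread.currentThread().getName() + " ran");

        // "Dynamic" in the sense used above: both threads are simply started,
        // and the OS scheduler alone decides which one runs first.
        Thread a = new Thread(work, "A");
        Thread b = new Thread(work, "B");
        a.start();
        b.start();
        a.join();
        b.join();

        // "Static" in the sense used above: the code itself imposes an order
        // by joining on the first thread before starting the second.
        Thread c = new Thread(work, "C");
        c.start();
        c.join();   // C is guaranteed to finish before D starts
        Thread d = new Thread(work, "D");
        d.start();
        d.join();
    }
}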
I don't think 'advantages' is the best term here. Simply put, when you implement any control over threads in your code to achieve some task, you should make sure you use minimal controls, and in the most optimized way. :))
Addition:
Comparison between Static & Dynamic Scheduling
Generally we would never have a computer program that depends completely on only one of static or dynamic scheduling.
Instead, we have some programs that are pretty much controlled from the code itself (strongly static). This would be a good example of that.
And some programs are strongly dynamic (weakly static). This would be a good example of that: there, apart from the start of the two threads, the rest of the program's execution is a free flyer.
Please don't try to find a hard-and-fast criterion that would classify a program as either strongly static or strongly dynamic. :))
Positives & Negatives
Dynamic scheduling is faster in execution than static scheduling, since it's basically a free flyer without any intentional waits, joins, etc. (any kind of synchronization/protection between threads).
Dynamic scheduling is not aware of any thread dependencies (safety, synchronization, etc.). If you followed the sources I mentioned above, you probably get the idea.
So generally, how good a multi-threading programmer you are depends on how few restrictions, dependencies, and bottlenecks you impose on your threads while still achieving your task successfully. :))
I think I have covered quite a few things. Please ask if you have any questions. :))
Dynamic Scheduling –
o Main advantages (PROs):
- Enables handling cases of dependence unknown at compile time
- Simplifies compiler
- Allows compiled code to run efficiently on a different pipeline
o Disadvantages (CONs):
- Significant increase in hardware complexity
- Increased power consumption
- Could generate imprecise exceptions
During static scheduling, the order of the threads or processes is already controlled by the compiler, so it is determined at compile time.
If there is a data dependence involving memory, it cannot be resolved or recognised at compile time, which is why the concept of dynamic scheduling was introduced.
Dynamic scheduling also determines the order of execution, but here the hardware does it rather than the compiler.
I know support exists for running JUnit or TestNG test suites in parallel, but it requires specific configuration (such as specifying thread counts, for example) and, most importantly, does not prevent race conditions in non-thread-safe code.
Are there any tools for the JVM which transparently (ie, without explicit configuration) allocate individual tests to different CPU cores (using different threads in the same process or different processes), while preventing race conditions regardless of thread-safety?
If no such tool exists, what would be the best approach to implement one?
My strong suggestion would be to run your tests concurrently with the usual JUnit / TestNG tools.
The reason is simple: if a test fails due to a race condition, then the test has done its job perfectly - it has identified a bug in your design, code, or concurrency assumptions that you should fix.
Anything that is non-thread-safe that is used simultaneously by multiple test threads (e.g. a mutable static singleton object that is being used on a global basis) is probably a design flaw - you should either make it thread safe or initialise it separately each time as a local object.
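As a small sketch of what that means in practice (the names are made up), the first map is the kind of globally shared mutable state that breaks under parallel tests; the alternatives are to make it thread-safe or to stop sharing it:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedStateExamples {

    // Problematic under parallel tests: globally shared, mutable state that
    // every test thread reads and writes (the singleton flaw described above).
    static final Map<String, Integer> SHARED_UNSAFE = new HashMap<>();

    // Option 1: make the shared structure thread-safe.
    static final Map<String, Integer> SHARED_SAFE = new ConcurrentHashMap<>();

    // Option 2: avoid sharing and give each test its own instance.
    Map<String, Integer> freshPerTest() {
        return new HashMap<>();
    }
}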
All,
Recently I developed some code that is supposedly a thread-safe class. The reason I say 'supposedly' is that even after using synchronized blocks, immutable data structures, and concurrent classes, I was not able to test the code for some cases because of the JVM's thread-scheduling environment, i.e. I only had test cases on paper but could not replicate the same test environment. Are there any specific guidelines the experienced members here can share about how to test a multi-threaded environment?
The first thing is, you can't ensure through testing alone that your class is fully thread-safe. Whatever tests you run on it, you still need to have your code reviewed by as many experienced eyes as you can get, to detect subtle concurrency issues.
That said, you can devise specific test scenarios to try to cover all possible inter-thread timing scenarios, as you did. For ideas on this (and for designing thread-safe classes in general), it is recommended to read Java Concurrency in Practice.
Moreover, you can run stress tests, executing many threads simultaneously over an extended period of time. The number of threads should be way over the reasonable limit to make sure thread contention happens often - this raises the chance that latent concurrency bugs manifest over time.
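A rough sketch of such a stress test (JUnit 4; the AtomicLong is just a stand-in for your supposedly thread-safe class): far more threads than cores, all released at once to maximize contention, with an invariant checked at the end:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.junit.Test;

public class CounterStressTest {

    private static final int THREADS = 200;          // well above the core count
    private static final int CALLS_PER_THREAD = 10_000;

    @Test
    public void noUpdatesAreLostUnderHeavyContention() throws Exception {
        AtomicLong counter = new AtomicLong();        // replace with your class under test
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        CountDownLatch startGun = new CountDownLatch(1);

        for (int i = 0; i < THREADS; i++) {
            pool.execute(() -> {
                try {
                    startGun.await();                 // release every thread at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int j = 0; j < CALLS_PER_THREAD; j++) {
                    counter.incrementAndGet();
                }
            });
        }

        startGun.countDown();
        pool.shutdown();
        assertTrue(pool.awaitTermination(5, TimeUnit.MINUTES));
        // The invariant: every increment is visible, no lost updates.
        assertEquals((long) THREADS * CALLS_PER_THREAD, counter.get());
    }
}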
Another thing I would recommend is to use code-coverage measurement tools and set a high standard as your goal. For example, set a high goal for modified condition/decision coverage.
We use GroboUtils to create multi-threaded tests.
If you have code that you plan to test in order to make it reliable, then make it single-threaded.
Threading should be reserved for code that either doesn't particularly need to work, or is simple enough to be statically analysed and proven correct without testing.
I was just wondering whether we actually need the algorithm to be multi-threaded if it is to make use of multi-core processors, or will the JVM make use of multiple cores even though our algorithm is sequential?
UPDATE:
Related Question:
Muti-Threaded quick or merge sort in java
I don't believe any current, production JVM implementations perform automatic multi-threading. They may use other cores for garbage collection and some other housekeeping, but if your code is expressed sequentially it's difficult to automatically parallelize it and still keep the precise semantics.
There may be some experimental/research JVMs which try to parallelize areas of code which the JIT can spot as being embarrassingly parallel, but I haven't heard of anything like that for production systems. Even if the JIT did spot this sort of thing, it would probably be less effective than designing your code for parallelism in the first place. (Writing the code sequentially, you could easily end up making design decisions which would hamper automatic parallelism unintentionally.)
Your implementation needs to be multi-threaded in order to take advantage of the multiple cores at your disposal.
Your system as a whole can make use of multiple cores by running different applications or services on different cores. Each running application, though, will work off a single thread/core unless implemented otherwise.
Java will not automatically split your program into threads. Currently, if you want your code to be able to run on multiple cores at once, you need to tell the computer, through threads or some other mechanism, how to split the code into tasks and what the dependencies between those tasks are. However, other tasks can run concurrently on the other cores, so your program may still run faster on a multi-core processor if you are running other things concurrently.
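For example, here is a minimal contrast between a sequential computation and one that explicitly asks the runtime to spread the work over the available cores (via the common ForkJoinPool behind parallel streams):

import java.util.stream.LongStream;

public class ParallelSumDemo {

    public static void main(String[] args) {
        // Sequential: runs on a single core no matter how many are available.
        long sequential = LongStream.rangeClosed(1, 100_000_000).sum();

        // Parallel: the stream is split into chunks that the common ForkJoinPool
        // spreads across the available cores - the programmer has to ask for it.
        long parallel = LongStream.rangeClosed(1, 100_000_000).parallel().sum();

        System.out.println(sequential == parallel); // same result, different use of cores
    }
}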
An easy way to make your current code parallelizable is to use JOMP to parallelize for loops and other processing-intensive, easily parallelized parts of your code.
I don't think using a multi-threaded algorithm will make use of multi-core processors effectively unless it is coded for effectiveness. Here is a nice article that talks about making use of multi-core processors for developers:
http://java.dzone.com/news/building-multi-core-ready-java