How to create a test environment for a multi-threaded application - java

All,
Recently I developed a class that is supposedly thread-safe. I say 'supposedly' because, even after using synchronized blocks, immutable data structures and concurrent classes, I was not able to test some cases because of how the JVM schedules threads; i.e. I had test cases on paper but could not reproduce them in a test environment. Are there any specific guidelines that the experienced members here can share on how to test in a multi-threaded environment?

First thing is, you can't ensure only with testing that your class is fully thread-safe. Whatever tests you run on it, you still need to have your code reviewed by as many experienced eyes as you can get, to detect subtle concurrency issues.
That said, you can devise specific test scenarios to try to cover all possible inter-thread timing scenarios, as you did. For ideas on this (and for designing thread-safe classes in general), it is recommended to read Java Concurrency in Practice.
Moreover, you can run stress tests, executing many threads simultaneously over an extended period of time. The number of threads should be well above the reasonable limit, so that thread contention happens often; this raises the chance that latent concurrency bugs manifest over time.
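As a rough sketch of the stress-test idea above: the harness below releases all worker threads at the same moment with a latch, so that they contend as much as possible. The class and method names are made up for illustration, and an AtomicInteger stands in for the class under test; swap in your own class and your own invariant check.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness: names are illustrative, not from any library.
class StressTestSketch {

    /** Starts all workers at once; each increments the shared counter perThread times. */
    static int hammer(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(); // stand-in for the class under test
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads); // every worker has started
        CountDownLatch go = new CountDownLatch(1);          // starting gun
        CountDownLatch done = new CountDownLatch(threads);  // every worker has finished
        try {
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    ready.countDown();
                    try {
                        go.await(); // block until all workers are lined up
                        for (int i = 0; i < perThread; i++) {
                            counter.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            ready.await();
            go.countDown(); // release everyone at the same instant
            if (!done.await(60, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return counter.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A thread-safe counter must end at exactly threads * perThread.
        System.out.println(hammer(64, 10_000));
    }
}
```

A passing run proves nothing by itself; the point is that a broken class tends to fail such a run sooner or later, so it is worth repeating it many times.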

Another thing I would recommend is to use code coverage tools and set a high standard as your goal. For example, set a high target for modified condition/decision coverage.

We use GroboUtils to create multi threaded tests.

If you have code that you plan to test in order to make it reliable, then make it single threaded.
Threading should be reserved for code that either doesn't particularly need to work, or is simple enough to be statically analysed and proven correct without testing.

Related

Why should a testdatabase be created/deleted when testing?

I have been assigned to test a MongoDB database from a Java backend. I was told that I had to create the database entirely from a script for this task.
But I have difficulty understanding the benefit of creating a database from scratch with a script instead of having a permanent test database. I imagine data would be inserted on startup and cleaned on teardown in both cases.
Why is it beneficial, from a testing perspective, to create and delete a database when testing?
Sometimes tests fail and therefore it may happen that the teardown phase will never be reached.
Furthermore, deleting a database is the simplest and most effective way to clean it, although perhaps not the most efficient. It guarantees that you do not forget something in your cleanup routine.
And for performance tests in particular, it is important that the database is in exactly the same state for each run; otherwise the run times cannot be compared with each other. An improvement in a consecutive run could have been caused just because tablespaces had already been enlarged, or similar things, and not because the code optimisation worked …
Most of the time, a test means a predefined environment and an expected reaction of that environment to our assumed states. To verify it we need a process that is as automated and repeatable as possible, without interference from manual setup or configuration.
In the software development process we try to consider as many test cases as possible for QA of the product. When there are many test cases, each one should be isolated from the others. If they are not well isolated, the results may vary from one execution round to the next, which eventually invalidates the testing process.
They need not be. However:
You lose portability.
You don't have a known start state for your test.

Advantages and disadvantages with Static- and Dynamic Scheduling

I'm opening this question since I can't find easy-to-understand, summarized information about this topic. There isn't even a good YouTube video that explains it.
I'm currently studying real-time programming, and static and dynamic scheduling are part of it. I just can't seem to get my head around it.
If someone can explain the advantages and disadvantages of static and dynamic scheduling in an educational way, that would be really helpful.
What I've got so far is the following:
Static scheduling:
An off-line approach where the schedule is generated manually. It can be modified at run time, but this isn't recommended because it can cause threads to miss their deadlines. It's easy to implement and to analyze. Because it's easy to analyze, it's easy to see whether the system is going to meet all of its deadlines.
Dynamic scheduling:
An on-line approach where the schedule is generated automatically. It can be modified at run time by the system, and in most cases this shouldn't cause threads to miss their deadlines. If the system changes, it's easy to generate a new schedule since it's generated automatically. There is no guarantee that the system meets all its deadlines.
Can anyone explain these two a bit better than me, or perhaps add more information about them? Perhaps illustrate it with an image so it'll be easier to wrap my head around it.
In simple terms,
Static scheduling is the mechanism where we have already fixed, in our code (at compile time), the order in which threads/processes execute. If you have used any control (locks, semaphores, joins, sleeps) over threads in your program to achieve some goal, then you have intended to use static (compile-time) scheduling.
Dynamic scheduling is the mechanism where thread scheduling is done by the operating system, based on whatever scheduling algorithm is implemented at the OS level. So the execution order of threads depends entirely on that algorithm, unless we have put some control on it (with static scheduling).
I think 'advantages' is not the best term here. Simply put, when you implement any control over threads in your code to achieve some task, you should make sure that you use as few controls as possible, and in the most optimized way. :))
Addition:
Comparison between Static & Dynamic Scheduling
Generally we would never have a computer program that depends completely on only one of static or dynamic scheduling.
Instead, some programs are largely controlled from the code itself (strongly static). This would be a good example for that.
And some programs are strongly dynamic (weakly static). This would be a good example for that. There you might see that, other than the start of 2 threads, the rest of the program execution is a free flyer.
Please don't try to find a definitive criterion that would label a program as either strongly static or strongly dynamic. :))
Positives & Negatives
Dynamic scheduling is faster in execution than static scheduling, since it's basically a free flyer without any intentional waits, joins etc. (any kind of synchronization/protection between threads).
Dynamic scheduling is not aware of any thread dependencies (safety, synchronization etc.). If you followed the sources I mentioned above, you probably have the idea.
So generally, how good a multi-threading programmer you are depends on how few restrictions, dependencies and bottlenecks you have placed on your threads while still achieving your task successfully. :))
I think I have covered quite a few things. Please ask me questions if you have any. :))
Dynamic Scheduling –
o Main advantages (PROs):
- Enables handling cases of dependence unknown at compile time
- Simplifies compiler
- Allows compiled code to run efficiently on a different pipeline
o Disadvantages (CONs):
- Significant increase in hardware complexity
- Increased power consumption
- Could generate imprecise exceptions
In static scheduling, the order of the threads or processes is already controlled by the compiler, so it is decided at compile time.
If there is a data dependency involving memory, it cannot be resolved or recognised at compile time, which is why the concept of dynamic scheduling was introduced.
Dynamic scheduling also determines the order of execution, but here the hardware does this rather than the compiler.

Java Parallel Programming

I need to parallelize a CPU-intensive Java application on my multicore desktop, but I am not so comfortable with thread programming. I looked at Scala, but this would mean learning a new language, which is really time-consuming. I also looked at the Ateji PX Java parallel extensions, which seem very easy to use, but I have not yet had a chance to evaluate them. Would anyone recommend it? Other suggestions welcome.
Thanks in advance for your help
Bill
I would suggest you try the built-in ExecutorService for distributing multiple tasks across multiple threads/cores. Do you have any requirements which this might not do for you?
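A minimal sketch of the ExecutorService approach suggested above, assuming the work can be cut into independent chunks. The class name, the chunking scheme and the sumOfSquares workload are invented for illustration; only the java.util.concurrent calls are the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: names and workload are illustrative only.
class ExecutorSketch {

    /** Stand-in for a CPU-intensive, independent unit of work. */
    static long sumOfSquares(int from, int to) {
        long sum = 0;
        for (long i = from; i <= to; i++) {
            sum += i * i;
        }
        return sum;
    }

    /** Splits 1..n into roughly `chunks` ranges and sums them on a pool sized to the cores. */
    static long parallelSumOfSquares(int n, int chunks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int size = (n + chunks - 1) / chunks; // ceiling division
            for (int start = 1; start <= n; start += size) {
                final int lo = start;
                final int hi = Math.min(n, start + size - 1);
                Callable<Long> task = () -> sumOfSquares(lo, hi);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSumOfSquares(1000, 7));
    }
}
```

Because the chunks share no mutable state, no synchronization is needed beyond what Future.get() already provides.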
The Java concurrency utilities:
http://download.oracle.com/javase/1.5.0/docs/guide/concurrency/overview.html
make parallel programming on Java even easier than it already was. I would suggest starting there - if you are uncomfortable with that level of working with threads, I would think twice about proceeding further. Parallelizing anything requires some level of technical comfort with how concurrent computation is done and coordinated. In my opinion, it can't get much easier than that framework - which is part of the reason why you see so few alternatives.
Second, the main thing you should think about is what the unit of work is for parallelization. If your units of work are independent (i.e., each parallel task does not affect the others), this is generally far easier because you don't need to worry about much (or any) synchronization at all. Put effort into thinking about how to model the problem so that the computation is as independent as possible. If you model it well, you will almost certainly reduce the lines of code (which reduces errors, etc.).
Admittedly, frameworks that automatically parallelize for you are less error prone, but can be suboptimal if your model unit of work doesn't play to their parallelization scheme.
I am the lead developer of Ateji PX. As you mention, guaranteeing thread safety is an important topic. It is also a very difficult one, and there's not much help out there besides hand-written and hand-checked @ThreadSafe annotations. See e.g. "The problem with threads".
We are currently working on a parallel verifier for Ateji PX. This has become possible because parallelism in Ateji PX is compositional (unlike threads) and based on a sound mathematical foundation, namely pi-calculus. Even without a tool, experience shows that expressing parallelism in an intuitive and compositional way makes it much easier to "think parallel" and catch errors earlier.
I browsed quickly through the Ateji PX web site. It seems to be a nice product, but I'm afraid you will be disappointed at some point, since Ateji PX only provides an intuitive, simple way of performing high-level parallel operations such as distributing the workload over several workers, creating rendezvous points between parallel tasks, etc. However, as you can read in the FAQ section "How do you detect and prevent data dependencies?", Ateji PX does not ensure that the underlying code is thread safe. So at any rate you'll still need skills in Java thread programming.
Edit:
Also consider that when maintenance time comes and you aren't available to perform it, it'll be easier to find a contractor, employee or trainee with skills in standard Java multithreaded programming than in Ateji PX.
Last word: there's a free 30-day evaluation, so try it.
Don't worry, Java 7 is coming with the Fork/Join framework by Doug Lea for parallel processing.
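For reference, a small sketch of what Fork/Join code looks like, using the java.util.concurrent classes that shipped with Java 7. The SumTask class and the threshold value are arbitrary choices for illustration; a real cutoff should be tuned by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: class names and THRESHOLD are illustrative only.
class ForkJoinSketch {

    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // below this, just loop
        private final long[] data;
        private final int lo;
        private final int hi; // exclusive

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                          // run the left half asynchronously
            return right.compute() + left.join(); // do the right half in this thread
        }
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(); // defaults to one worker per core
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data));
    }
}
```

Note that this divides work among the cores of one machine; it does not distribute work across machines.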

Is Multi-Threaded algorithm required to make use of Multi-core processors?

I was just wondering whether an algorithm actually needs to be multi-threaded to make use of multi-core processors, or whether the JVM will make use of multiple cores even though the algorithm is sequential?
UPDATE:
Related Question:
Muti-Threaded quick or merge sort in java
I don't believe any current, production JVM implementations perform automatic multi-threading. They may use other cores for garbage collection and some other housekeeping, but if your code is expressed sequentially it's difficult to automatically parallelize it and still keep the precise semantics.
There may be some experimental/research JVMs which try to parallelize areas of code which the JIT can spot as being embarrassingly parallel, but I haven't heard of anything like that for production systems. Even if the JIT did spot this sort of thing, it would probably be less effective than designing your code for parallelism in the first place. (Writing the code sequentially, you could easily end up making design decisions which would hamper automatic parallelism unintentionally.)
Your implementation needs to be multi-threaded in order to take advantage of the multiple cores at your disposal.
Your system as a whole can use a single core per running application or service. Each running application, though, will work off a single thread/core unless implemented otherwise.
Java will not automatically split your program into threads. Currently, if you want your code to be able to run on multiple cores at once, you need to tell the computer, through threads or some other mechanism, how to split the code into tasks and what the dependencies between those tasks are. However, other tasks can run concurrently on the other cores, so your program may still run faster on a multicore processor if you are running other things at the same time.
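To make "telling the computer how to split up the code" concrete, here is a bare-bones sketch using plain Thread objects. The prime-counting workload and all names are made up for illustration; each thread takes every threads-th candidate number and writes its result into its own array slot.

```java
// Hypothetical example: names and workload are illustrative only.
class ManualThreadsSketch {

    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Counts primes in 2..limit, with the candidates dealt out round-robin to the threads. */
    static long countPrimes(int limit, int threads) {
        long[] partial = new long[threads]; // one slot per thread, so no sharing
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long count = 0;
                for (int n = 2 + id; n <= limit; n += threads) {
                    if (isPrime(n)) {
                        count++;
                    }
                }
                partial[id] = count;
            });
            workers[t].start();
        }
        long total = 0;
        try {
            for (int t = 0; t < threads; t++) {
                workers[t].join(); // join() makes the worker's write visible here
                total += partial[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(countPrimes(1_000_000, cores));
    }
}
```

The same loop run with threads = 1 is the sequential version, which makes it easy to compare timings on a multi-core machine.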
An easy way to make your current code parallelizable is to use JOMP to parallelize for loops and the processing-intensive, easily parallelized parts of your code.
I don't think a multi-threaded algorithm will make use of multi-core processors effectively unless it is coded for that. Here is a nice article which talks about making use of multi-core processors for developers -
http://java.dzone.com/news/building-multi-core-ready-java

How do you add tests for multi threaded support?

I have a Java service which now will execute in a batch mode. Multi threaded support is added to the service so for every batch request a thread pool will be dedicated to execute the batch. The question is how do I test this? I have functional tests that pass under the threaded version of the service but, somehow, I feel there must be an idiom for testing this.
There really isn't a "good" way to do this. The best thing I can suggest would be TestNG, which allows you to annotate your test methods and cause them to be executed in n threads concurrently. For example:
@Test(invocationCount = 10, threadPoolSize = 10)
public void testSomethingConcurrently() {
...
}
My TestNG knowledge is rusty at best, but AFAIK that should invoke the testSomethingConcurrently method 10 times concurrently. This is a nice and declarative way to run multi-threaded tests against your code.
You can of course do something similar in JUnit by spawning threads manually in a loop, but that's insanely ugly and difficult to work with. I had to do this once for a project I was working on; those tests were a nightmare to deal with since their failures weren't always repeatable.
Concurrency testing is difficult and prone to frustration, due to its non-deterministic nature. This is part of why there is such a big push these days to use better concurrency abstractions, ones which are easier to "reason about" and to convince oneself of their correctness.
Usually the things that bring down multithreaded applications are issues of timing. I suspect that unit testing the full multithreaded environment would require huge changes to the code base.
What you may well be able to do, though, is to test the implementation of the thread pool in isolation.
By substituting the body of the threads with test code you should be able to construct your pathological conditions of locking and resource usage.
Then, unit test the functional elements in a single threaded environment, where you can ignore timing.
While this isn't perfect it does guarantee repeatability which is of great importance for your unit testing. (as alluded to in Daniel Spiewak's answer)
I used
@Test(threadPoolSize = 100, invocationCount = 100)
@DataProvider(parallel = true)
to run the test 100 times across a pool of 100 threads.
I agree with Daniel, concurrency testing is indeed very difficult.
I don't have a solution for concurrency testing, but I will tell you what I do when I want to test code involving multi-threading.
Most of my testing is done using JUnit and JMock. Since JMock doesn't work well with multiple threads, I use an extension to JMock that provides synchronous versions of Executor and ScheduledExecutorService. These allow me to test code that is meant to be run by multiple threads in a single thread, where I'm also able to control the execution flow. As I said before, this doesn't test concurrency. It only checks the behavior of my code in a single-threaded way, but it reduces the number of errors I get when I switch to the multi-threaded executors.
Anyway, I recommend using the new Java concurrency API. It makes things a lot easier.
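The synchronous-executor idea can be sketched without JMock at all, assuming the class under test receives its Executor as a constructor argument. The Greeter class and its methods below are hypothetical; the only real API used is java.util.concurrent.Executor, implemented here by simply running each task on the calling thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical example: Greeter is an invented class under test.
class DirectExecutorSketch {

    /** Class under test: hands its work to whatever Executor it is given. */
    static class Greeter {
        private final Executor executor;
        private final List<String> sent = new ArrayList<>();

        Greeter(Executor executor) {
            this.executor = executor;
        }

        void greetAll(List<String> users) {
            for (String user : users) {
                executor.execute(() -> sent.add("hello " + user));
            }
        }

        List<String> sent() {
            return sent;
        }
    }

    /** Runs the Greeter with an executor that executes each task immediately, in order. */
    static List<String> runSynchronously(List<String> users) {
        Executor direct = Runnable::run; // same-thread executor: no scheduling involved
        Greeter greeter = new Greeter(direct);
        greeter.greetAll(users);
        return greeter.sent();
    }

    public static void main(String[] args) {
        System.out.println(runSynchronously(Arrays.asList("ann", "bob")));
    }
}
```

In production the same Greeter would be given a real thread pool, at which point the plain ArrayList would need to be replaced or guarded; the single-threaded test only checks the logic, not the thread safety.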
