I am currently debugging a rather complicated algorithm that fixes errors in a bit stream. The BitReader interface is quite simple, and the main reading method looks like this:
/**
 * Reads bits from the stream.
 *
 * @param length number of bits to read (<= 64)
 * @return the read bits in the least significant bits
 */
long read(int length) throws IOException;
The objective is to test whether BitStreamFixer actually fixes the stream (in a way that is too hard to describe here). Basically I need to provide “broken” inputs to it and test whether its output is as correct as it can be (some inputs can't be fixed completely), like this:
BitStreamFixer fixer = new BitStreamFixer(input);
int word1 = fixer.readWord();
int word2 = fixer.readWord();
// possibly a loop here
assertEquals(VALID_WORD1, word1);
assertEquals(VALID_WORD2, word2);
// maybe a loop here too
Now, the BitStreamFixer class accepts an instance of BitReader. When unit testing the fixer, I obviously need one such instance. But where do I get one? I have two obvious options: either give it a real implementation of BitReader or mock it.
The former option is not really appealing because it would create a dependency on another object which has nothing to do with the class being tested. Moreover, it's not that easy because existing BitReader implementations read from input streams, so I'd need either a file or a carefully prepared byte array, which is tedious to set up.
The latter option looks better and fits the usual unit testing approach. However, since I'm not even supposed to know what arguments the fixer will pass to read, mocking it is not easy. I'd have to go with the when(bitReader.read(anyInt())).thenAnswer(...) approach, implementing a custom Answer full of bit-fiddling logic to spoon-feed the object under test with the proper bits in chunks of whatever size it asks for. Considering that the bit streams I'm working with have a rather complicated higher-level structure, that's not easy. And introducing logic in unit tests doesn't smell good either.
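For illustration, that Answer-based mock would look roughly like this (buildBrokenStream and extractBits are hypothetical helpers I'd still have to write and trust):

@Test
public void fixesBrokenStream_withMockedReader() throws IOException {
    long[] brokenStream = buildBrokenStream();           // hypothetical: pre-encoded broken bits
    AtomicInteger bitPosition = new AtomicInteger(0);

    BitReader bitReader = mock(BitReader.class);
    when(bitReader.read(anyInt())).thenAnswer(invocation -> {
        int length = invocation.getArgument(0);
        int start = bitPosition.getAndAdd(length);
        // slice out 'length' bits starting at 'start', whatever chunk size the fixer asks for
        return extractBits(brokenStream, start, length);  // hypothetical bit-fiddling helper
    });

    BitStreamFixer fixer = new BitStreamFixer(bitReader);
    assertEquals(VALID_WORD1, fixer.readWord());
}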
What do you think, is there any other option? Or maybe one of these can be improved in a way I fail to notice?
Write, test, and use a clear reusable test helper.
In a general sense, in unit testing you're supposed to establish confidence in a system by watching it successfully interact with systems that you DO have confidence in. Of course you also want the test to be fast, deterministic, and easy to read/modify, but ultimately those come second to the assertion that your system works.
You've listed two options:
Use a mock BitReader, where you have enough confidence in predicting your system's interactions that you can set up the entire "when A then B" conversation. Mocking can be pretty easy when you have a small API surface of independent methods, like an RPC layer, but mocking can be very difficult when you have a stateful object with unpredictable method calls. Mocking is further useful to deterministically stub nondeterministic systems, like external servers or pseudorandom sources, or systems that don't exist yet; none of those is the case for you.
Because your read method can take a wide variety of parameters, each of which is valid and changes your system's state, it's probably not a smart idea to use mocking here. Unless the order of calls that BitStreamFixer makes to BitReader is deterministic enough to be part of its contract, a mock BitReader will likely result in a brittle test: one that breaks when the implementation changes even though the system is still perfectly functional. You'll want to avoid that.
Note that mocking should never yield "complicated logic", only complicated set-up. You're using mocks to avoid using real logic in your tests.
Use a real BitReader, which sounds like it will be painful and opaque to construct. This is probably the most realistic solution, though, especially if you've already finished writing and testing it.
You worry about "introducing new dependencies", but if your BitReader implementation exists and is fast, deterministic, and well-tested, then you shouldn't feel any worse about using it than using a real ArrayList or ByteArrayInputStream in your test. It sounds like the only real problem here is that creating the byte array would make it hard to maintain your test, which is a valid consideration.
In the comments, though, the real answer comes through: Build the BitWriter you're missing.
@Test
public void shouldFixBrokenStream() {
  BitReader bitReader = new StreamBitReader(BitWriter.create()
      .pushBits(16, 0x8080)
      .pushBits(12, 0x000) // invalid 12-bit sequence
      .pushBits(16, 0x8080)
      .asByteArrayInputStream());
  BitStreamFixer fixer = new BitStreamFixer(bitReader);
  assertEquals(0x80808080, fixer.read(32));
}
/** Of course, you could also skip the stream-backed BitReader and just write an in-memory one yourself. */
@Test
public void shouldFixBrokenStream_bitReader() {
  BitReader bitReader = new InMemoryBitReader();
  bitReader.pushBits(16, 0x8080);
  bitReader.pushBits(12, 0x000); // invalid 12-bit sequence
  bitReader.pushBits(16, 0x8080);
  BitStreamFixer fixer = new BitStreamFixer(bitReader);
  assertEquals(0x80808080, fixer.read(32));
}
This is more readable than constructing an opaque bitstream offline and copy-pasting it into your test (particularly if well-commented), less brittle than mocks, and much more testable itself than an anonymous inner class (or Answer-based version of the same). It is also likely that you can use a system like that across multiple test cases, and possibly even multiple tests.
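If that in-memory reader doesn't exist yet, it is small enough to sketch. Assuming the read(int) contract from the question (bits returned in the least significant bits of the result), something like this would do; the class and bit order are illustrative, not an existing library:

import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class InMemoryBitReader implements BitReader {
    private final List<Boolean> bits = new ArrayList<>();
    private int position = 0;

    /** Appends the 'length' least significant bits of 'value', most significant first. */
    void pushBits(int length, long value) {
        for (int i = length - 1; i >= 0; i--) {
            bits.add(((value >>> i) & 1L) != 0);
        }
    }

    @Override
    public long read(int length) throws IOException {
        if (position + length > bits.size()) {
            throw new EOFException("only " + (bits.size() - position) + " bits left");
        }
        long result = 0;
        for (int i = 0; i < length; i++) {
            result = (result << 1) | (bits.get(position++) ? 1L : 0L);
        }
        return result;
    }
}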
I'm in a java context and am using Mockito (but I'm not bound to it) for basic mocking needs.
I have code like this
public class AuditInfoSerializer {
[..]
public Map<String, Object> doStuff(Object a) {
doOtherStuff("hello", new TempClass(someField, <someParams>));
doOtherStuff("world", new TempClass(someField, <otherParams>));
return getResult();
}
}
and in a test I want to verify that there are two instances of TempClass created with the correct set of parameters when I call the doStuff method.
Is this possible somehow?
You don't want to verify temporary data created inside the object under test. You want to mock its dependencies and assert on the behavior of the object under test: that is, with this input you get this output.
Verifying a mock is a trade-off reserved for mocked methods that return nothing and only produce side effects.
So use it only when you have no other choice.
In your unit test, what you want to assert is what the method under test returns, that is, the result of getResult().
Do that with Assert.assertEquals(...), not with Mockito.verify(...).
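A sketch of that state-based style for the question's class (the input and the expected values are placeholders, and the map keys are a guess based on the question's code):

@Test
public void doStuff_buildsTheExpectedAuditMap() {
    AuditInfoSerializer serializer = new AuditInfoSerializer();

    Map<String, Object> result = serializer.doStuff(someInput);

    // Assert on the observable outcome instead of verifying internal constructor calls.
    assertEquals(expectedHelloValue, result.get("hello"));
    assertEquals(expectedWorldValue, result.get("world"));
}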
For the most part I agree with @davidxxx's point about the mock-verifying trade-off. If you have a setup that allows you to make assertions about an outcome, like a map that is created as a result, go for it!
From an API perspective doStuff is a simple straight-forward method: Throw something at it, get something back. The information you are interested in will be contained in the map (this would be your assertion).
There is a lot going on under the hood before doStuff returns something. Many people tend to want to break up encapsulation when testing stuff. They are constantly looking for ways to uncover what is going on behind the curtains. I believe that's totally natural. But of course, it's also an anti-pattern. It doesn't matter what tool you (mis)use to break natural boundaries (mocking frameworks, custom reflection, "back doors" in your code base, etc.). It is always wrong. As @Michael already pointed out, the call to doOtherStuff is indeed an implementation detail. Take the perspective of client code that makes a call to doStuff. Is it interested in how the map is created? I doubt it. This should also be your testing perspective.
One last thing about using verification in tests. I would like to mitigate the trade off statement. I really don't like the generalization here. Verification is not always the less attractive choice compared to real assertions:
// Valid test without any verification
@Test
void method_foo_returns_gibberish(@Mock SomeInput someInput) {
    // Maybe this is just to prevent an NPE ...
    when(someInput.readStuff()).thenReturn("bla");
    assertEquals("gibberish", Foo.foo(someInput));
}

// Test made possible by verification
@Test
void method_foo_is_readonly(@Mock SomeInput someInput) {
    Foo.foo(someInput);
    verify(someInput).readStuff();
    verifyNoMoreInteractions(someInput);
}
This is just the most obvious example that I could think of. There is a faction of BDD geniuses who strive to build their whole architecture around verification-driven tests! There is an excellent article by Martin Fowler on the subject.
When talking about testing, most of the time, there is no black and white. Using mocks and verification means writing different tests.
As always, it's about picking the right tool.
I write a lot of unit tests. Often, you need to write carefully considered test cases by hand, a form of whitebox testing. If you are lucky enough to work for a company with separate quality assurance engineers, perhaps someone else writes test cases for you (kind of a mix between white and black box testing).
Many times, however, randomized testing would find many bugs and would serve as a great complement to hand-written cases.
For example, I might have a self-contained class and be able to express the invariants and broad-stroke behavior of the class simply (such as "this method never throws an exception" or "this method always returns a positive value"). I would like a test framework that just bashes on my class and checks the invariants.
A similar case: I often have a class which implements similar functionality to another class (but does it with different performance characteristics or with some added functionality). I would like to A vs B test the two classes in a randomized way. For example, if I was implementing TreeMap, I could use HashMap as a comparable implementation (modulo a few differences due to the sorted behavior of TreeMap) and check most of the basic functionality in a randomized way. Similarly, someone implementing LinkedList could use ArrayList as a comparable implementation and vice versa.
I've written some basic stuff to do this in the past, but it is painstaking to set up all the boilerplate to:
Create objects with random initial state
Apply random mutations
Create mappings between "like" objects for A vs B testing
Define invariants and rules such as "when will exceptions be thrown"
I still do it from time to time, but I want to reduce my effort level. For example, are there frameworks that remove or simplify the required boilerplate?
Otherwise, what techniques are used to do randomized testing in Java?
This is related, but not the same as fuzz testing. Fuzz testing seems to focus on random inputs to a single entity, in the hope of triggering bad behavior, often with an adaptive input model based on dynamic coverage observations. That covers a lot of the above, but doesn't cover things like A vs B testing when comparable implementations exist, or invariant checking. In any case, I'm also interested in decent fuzz testing libraries for Java.
I think what you're trying to find is a library for Property Based Testing in Java (see types of randomized testing). In short: instead of testing the value of the result, you're testing a property of it. E.g. instead of checking that 2 + 2 is 4, you're checking properties like:
random1 + 0 = random1
random1 + random2 >= random1
...
Take a look at this article that explains Property Based Testing in details.
Another option that you mention is to check against a Test Oracle - something that knows the true answer (e.g. an old, bullet-proof algorithm). So you pass random input to both the old and the new algorithm and check that the results are equal.
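A sketch of that oracle idea applied to the TreeMap/HashMap example from the question, with a fixed seed so the run is repeatable:

@Test
public void treeMapAgreesWithHashMapOracle() {
    Random random = new Random(42L);                 // fixed seed: same sequence every run
    Map<Integer, Integer> candidate = new TreeMap<>();
    Map<Integer, Integer> oracle = new HashMap<>();

    for (int i = 0; i < 10_000; i++) {
        int key = random.nextInt(100);
        if (random.nextBoolean()) {
            candidate.put(key, i);
            oracle.put(key, i);
        } else {
            candidate.remove(key);
            oracle.remove(key);
        }
        assertEquals(oracle.get(key), candidate.get(key));
    }
    assertEquals(oracle, candidate);                 // Map.equals compares the entry sets
}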
A couple of Java libraries:
JUnit QuickCheck - a specialized lib for Property Based Testing. It allows you to define properties and feeds random values into them to check that they hold (a sketch follows this list). So far (06/2016) it's pretty young, so you may want to check out ScalaCheck, since it's possible to write Scala tests for Java code.
Datagen - random values generator for Java in case standard randomizers are not enough. Disclaimer: I'm the author.
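Coming back to junit-quickcheck: a sketch of how the properties listed above might be expressed with it. The annotations and package names are from memory, so double-check them against the version you pick up:

import org.junit.runner.RunWith;
import com.pholser.junit.quickcheck.Property;
import com.pholser.junit.quickcheck.runner.JUnitQuickcheck;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import static org.junit.Assume.assumeTrue;

@RunWith(JUnitQuickcheck.class)
public class AdditionProperties {

    @Property
    public void zeroIsAnIdentity(int x) {
        assertEquals(x, x + 0);
    }

    @Property
    public void addingANonNegativeNumberNeverDecreases(int x, int y) {
        assumeTrue(y >= 0 && x <= Integer.MAX_VALUE - y);  // skip inputs that would overflow
        assertTrue(x + y >= x);
    }
}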
I have a fairly complex algorithm I would like to implement (in Java) with TDD. The algorithm, for translation of natural languages, is called stack decoding.
When I tried to do that, I was able to write and pass some simple test cases (empty translation, one word, etc.), but I am not able to move in the direction of the algorithm I actually want. I mean that I can't figure out how to write such a massive amount of the algorithm, described below, in baby steps.
This is the pseudo code of the algorithm:
1: place empty hypothesis into stack 0
2: for all stacks 0...n − 1 do
3: for all hypotheses in stack do
4: for all translation options do
5: if applicable then
6: create new hypothesis
7: place in stack
8: recombine with existing hypothesis if possible
9: prune stack if too big
10: end if
11: end for
12: end for
13: end for
Am I missing any way I can do the baby steps, or should I just get some coverage and do the major implementation?
TL;DR
Recast your task as testing an implementation of an API.
As the algorithm is expressed in terms of concepts (hypothesis, translation option) that are themselves quite complicated, program it bottom-up by first developing (using TDD) classes that encapsulate those concepts.
As the algorithm is not language specific, but abstracts language specific operations (all translation options, if applicable), encapsulate the language specific parts in a separate object, use dependency injection for that object, and test using simple fake implementations of that object.
Use your knowledge of the algorithm to suggest good test cases.
Test the API
By focusing on the implementation (the algorithm), you are making a mistake. Instead, first imagine you had a magic class which did the job that the algorithm performs. What would its API be? What would its inputs be, what would its outputs be? And what would the required connections between the inputs and outputs be. You want to encapsulate your algorithm in this class, and recast your problem as generating this class.
In this case, it seems the input is a sentence that has been tokenized (split into words), and the output is a tokenized sentence which has been translated into a different language. So I guess the API is something like this:
interface Translator {

   /**
    * Translate a tokenized sentence from one language to another.
    *
    * @param original
    *           The sentence to translate, split into words,
    *           in the language of the {@linkplain #getTranslatesFrom() locale this
    *           translates from}.
    * @return The translated sentence, split into words,
    *           in the language of the {@linkplain #getTranslatesTo() locale this
    *           translates to}. Not null; containing no null or empty elements.
    *           An empty list indicates that the translator was unable to translate the
    *           given sentence.
    *
    * @throws NullPointerException
    *           If {@code original} is null, or contains a null element.
    * @throws IllegalArgumentException
    *           If {@code original} is empty or has any empty elements.
    */
   public List<String> translate(List<String> original);

   public Locale getTranslatesFrom();

   public Locale getTranslatesTo();
}
That is, it is an example of the Strategy design pattern. So your task becomes not "how do I use TDD to implement this algorithm" but rather "how do I use TDD to implement a particular case of the Strategy design pattern".
Next you need to think up a sequence of test cases, from easiest to hardest, that use this API. That is, a set of original sentence values to pass to the translate method. For each of those inputs, you must give a set of constraints on the output. The translation must satisfy those constraints. Note that you already have some constraints on the output:
Not null; containing no null or empty elements.
You will want some example sentences for which you know for sure what the algorithm should output. I suspect you will find there are very few such sentences. Arrange these tests from easiest to pass to hardest to pass. This becomes your TODO list while you are implementing the Translator class.
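For the constraints that every output must satisfy, a small reusable check can be shared by all of those test cases; a sketch against the interface above:

// Shared check for the invariants stated in the translate() Javadoc above.
private static void assertSatisfiesTranslateContract(List<String> translated) {
    assertNotNull(translated);                 // never null
    for (String word : translated) {
        assertNotNull(word);                   // no null elements
        assertFalse(word.isEmpty());           // no empty elements
    }
    // an empty list is allowed: it means "unable to translate"
}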
Implement Bottom Up
You will find that making your code pass more than a couple of these cases is very hard. So how can you thoroughly test your code?
Look again at the algorithm. It is complicated enough that the translate method does not do all the work directly itself. It will delegate to other classes for much of the work.
place empty hypothesis into stack 0
Do you need a Hypothesis class? A HypothesisStack class?
for all translation options do
Do you need a TranslationOption class?
if applicable then
Is there a method TranslationOption.isApplicable(...)?
recombine with existing hypothesis if possible
Is there a Hypothesis.combine(Hypothesis) method? A Hypothesis.canCombineWith(Hypothesis) method?
prune stack if too big
Is there a HypothesisStack.prune() method?
Your implementation will likely need additional classes. You can implement each of those individually using TDD. Your few tests of the Translator class will end up being integration tests. The other classes will be easier to test than the Translator because they will have precisely defined, narrow definitions of what they ought to do.
Therefore, defer implementing Translator until you have implemented those classes it delegates to. That is, I recommend that you write your code bottom up rather than top down. Writing code that implements the algorithm you have given becomes the last step. At that stage you have classes that you can use to write the implementation using Java code that looks very like the pseudo code of your algorithm. That is, the body of your translate method will be only about 13 lines long.
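To make that concrete, here is a rough sketch of what the final translate method might look like once those collaborators exist; every class and method name below belongs to the hypothetical design sketched above, not to an existing API:

public List<String> translate(List<String> original) {
    List<HypothesisStack> stacks = HypothesisStack.emptyStacks(original.size() + 1);
    stacks.get(0).place(Hypothesis.empty());                          // 1: place empty hypothesis into stack 0
    for (HypothesisStack stack : stacks) {                            // 2: for all stacks
        for (Hypothesis hypothesis : stack) {                         // 3: for all hypotheses in stack
            for (TranslationOption option
                    : optionGenerator.getOptionsFor(hypothesis, original)) {   // 4: for all translation options
                if (option.isApplicable(hypothesis)) {                // 5: if applicable
                    Hypothesis next = hypothesis.extendWith(option);  // 6: create new hypothesis
                    HypothesisStack target = stacks.get(next.wordsCovered());
                    target.placeOrRecombine(next);                    // 7-8: place in stack, recombine if possible
                    target.pruneIfTooBig();                           // 9: prune stack if too big
                }
            }
        }
    }
    return stacks.get(original.size()).bestHypothesis().toSentence();
}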
Push Real-Life Complications into an Associated Object
Your translation algorithm is general purpose; it can be used to translate between any pair of languages. I guess that what makes it applicable to translating a particular pair of languages is the for all translation options and if applicable then parts. I guess the latter could be implemented by a TranslationOption.isApplicable(Hypothesis) method. So what makes the algorithm specific to a particular language is generating the translation options. Abstract that into a factory object that the class delegates to. Something like this:
interface TranslationOptionGenerator {
Collection<TranslationOption> getOptionsFor(Hypothesis h, List<String> original);
}
So far you have probably been thinking in terms of translating between real languages, with all their nasty complexities. But you do not need that complexity to test your algorithm. You can test it using a fake pair of languages, which are much simpler than real languages. Or (equivalently) using a TranslationOptionGenerator that is not as rich as a practical one. Use dependency injection to associate the Translator with a TranslationOptionGenerator.
Use Simple Fake Implementations instead of Realistic Associates
Now consider some of the simplest extreme cases of a TranslationOptionGenerator that the algorithm must deal with:
Such as when there are no options for the empty hypothesis
There is only one option
There are only two options.
An English to Harappan translator. Nobody knows how to translate to Harappan, so this translator indicates it is unable to translate every time.
The identity translator: a translator that translates English to English.
A Limited English to newborn translator: it can translate only the sentences "I am hungry", "I need changing" and "I am tired", which it translates to "Waaaa!".
A simple British to US English translator: most words are the same, but a few have different spellings.
You can use this to generate test cases that do not require some of the loops, or do not require the isApplicable test to be made.
Add those test cases to your TODO list. You will have to write simple fake TranslationOptionGenerator objects for use by those test cases.
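For example, a first test against one of those fakes might look like this (IdentityTranslator is a hypothetical Translator wired to an equally trivial TranslationOptionGenerator):

@Test
public void identityTranslatorReturnsTheOriginalSentence() {
    Translator translator = new IdentityTranslator(Locale.ENGLISH);   // hypothetical trivial implementation

    List<String> original = Arrays.asList("this", "is", "a", "test");
    List<String> translated = translator.translate(original);

    assertEquals(original, translated);
    assertEquals(Locale.ENGLISH, translator.getTranslatesFrom());
    assertEquals(Locale.ENGLISH, translator.getTranslatesTo());
}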
The key here is to understand that "baby steps" do not necessarily mean writing just a small amount of production code at a time. You can just as well write a lot of it, provided you do it to pass a relatively small & simple test.
Some people think that TDD can only be applied one way, namely by writing unit tests. This is not true. TDD does not dictate the kind of tests you should write. It's perfectly valid to do TDD with integration tests that exercise a lot of code. Each test, however, should be focused in a single well-defined scenario. This scenario is the "baby step" that really matters, not the number of classes that may get exercised by the test.
Personally, I develop a complex Java library exclusively with integration-level tests, rigorously following the TDD process. Quite often, I create a very small and simple-looking integration test, which ends up taking a lot of programming effort to make pass, requiring changes to several existing classes and/or the creation of new ones. This has been working well for me for the past 5+ years, to the point I now have over 1300 such tests.
TL;DR: Start with nothing and don't write any code that a test does not call for. Then you can't help but TDD the solution
When building something via TDD, you should start with nothing then implement test cases until it does the thing you want. That's why they call it red green refactoring.
Your first test would be to see that the internal object holds 0 hypotheses (with an empty implementation it would be null [red]). Then you initialize the hypothesis list [green].
Next you would write a test that the hypothesis gets checked for applicability (right after creation it hasn't been [red]). Implement the "if applicable" logic and apply it to the one hypothesis [green].
Then you write a test that when a hypothesis is applicable, a new hypothesis is created (check that there is more than one hypothesis after processing an applicable one [red]). Implement the create-hypothesis logic and stick it in the if body [green].
Conversely, if a hypothesis is not applicable, nothing happens ([green]).
Just kinda follow that logic to individually test the algorithm over time. It's much easier to do with an incomplete class than a complete one.
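A sketch of that very first baby step (the class and accessor names are hypothetical):

@Test
public void newDecoderStartsWithNoHypotheses() {
    StackDecoder decoder = new StackDecoder();       // hypothetical class under test
    assertEquals(0, decoder.hypothesisCount());      // red until the hypothesis list is initialized
}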
When designing test cases, I want to be able to use data that is random but static.
If I use data that is not random, then I will use trivial examples that are representative of the data I expect, rather than the data I have guarded against in my code. For example, if my code is expecting a string with a max length of 15 characters then I would rather specify these constraints and have the data generated for me, within those constraints, rather than some arbitrary example which may be, due to my expectations, within a more strict set of constraints.
If I use data that is not static, then my tests won't be repeatable. It is no good using a string that changes every time the test is run if the test then fails occasionally. It would be much better to use a consistent string and then specify more constraints upon how that string is generated (and obviously make the same checks in my code), if and when a bug is found.
Is this a good strategy for test data?
If so, then I know how to achieve both of these goals independently. For static, but non-random data I just enter something arbitrary e.g. foo. For something random but not static, I just use apache random utils e.g. randomString(5). How can I get both?
Note that when data must be unique, it would also be handy to have some way to specify that two pieces of generated data must be distinct. Randomness does this most of the time but cannot be relied upon, obviously, without having unreliable tests!
TL;DR: How can I specify the type of data I want to generate, without having randomised generated data?
Use a random with a constant seed. You can use the Random(long seed) constructor for it.
The RandomStringUtils.random() method can accept a Random source, which you could have created with a constant seed as described.
Using a constant seed is very useful for making experiments reproducible - and using them is a very good practice, IMO.
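Putting the two together might look like this; the seven-argument random overload is the one that accepts a Random source in Commons Lang 3, but double-check it against your version:

import java.util.Random;
import org.apache.commons.lang3.RandomStringUtils;

// Constant seed: the "random" strings are identical on every test run.
Random seeded = new Random(42L);

String first  = RandomStringUtils.random(15, 0, 0, true, true, null, seeded);
String second = RandomStringUtils.random(15, 0, 0, true, true, null, seeded);
// first and second differ from each other (handy when values must be unique),
// but both stay the same across runs, so failures are reproducible.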
Don't do it. It gives you a headache, makes your tests unreadable, and gives you no benefit. You already see the problems: the specification of constraints. So let's look at the imagined benefits. You worry that data you provide manually is more constrained than random data. But you want to use the same data every time (same seed). So how do you know that the random data is better than your manually provided data? How do you know that you chose the seed properly? If you are not sure whether your test data is good enough, then:
simplify your code (extract methods/classes, avoid ifs, avoid nulls, be more immutable and functional)
look at edge cases and include them in your tests
look at the generated data, check whether any of it differs from what you had in mind, and add that data to your tests
use mutation testing
whenever a bug is discovered during development, UAT or production, add that data to your tests
do truly random (not repeatable), long-running tests; every generated input that breaks the tests should be logged and added to your deterministic unit tests
By pretending to use random data you just lie to yourself. The data is not random, yet you don't control it, and it makes you stop thinking about the edge cases of your code. So don't do it: face the truth, make your tests readable, and check more conditions.
What you are describing is property based testing - the best known example being Haskell's quickcheck.
http://www.haskell.org/haskellwiki/Introduction_to_QuickCheck1
There have been a number of java ports such as
https://bitbucket.org/blob79/quickcheck
https://github.com/kjw/supercheck
https://github.com/pholser/junit-quickcheck
The Quickcheck philosophy emphasises the use of random data, but most (all?) of the java ports allow you to set a fixed seed so that the generated values are repeatable.
I've never got round to actually trying this approach, but I would hope it would make your tests more readable (rather than less readable as piotrek suggests), by separating the values from the tests.
If knowledge of the values is important to understand the test/SUT behavior then it is the wrong approach.
Instancio is a data generation library for unit tests that does what you are looking for. E.g. if you need a random string of a certain length:
Foo foo = Instancio.of(Foo.class)
.generate(field("fooString"), gen -> gen.string().length(10))
.create();
To generate random but predictable values, you can supply a seed:
Foo foo = Instancio.of(Foo.class)
.generate(field("fooString"), gen -> gen.string().length(10))
.withSeed(123)
.create();
Or if you use JUnit 5:
@ExtendWith(InstancioExtension.class)
class ExampleTest {

    @Seed(1234)
    @Test
    void example() {
        Foo foo = Instancio.of(Foo.class)
            .generate(field("fooString"), gen -> gen.string().length(10))
            .create();
        // ...
    }
}
If you always need predictable data, you can configure a global seed value through a properties file. This way you won't need to specify it in the code.
https://github.com/instancio/instancio/
Functions (side-effect free ones) are such a fundamental building block, but I don't know of a satisfying way of testing them in Java.
I'm looking for pointers to tricks that make testing them easier. Here's an example of what I want:
public void setUp() {
  myObj = new MyObject(...);
}

// This is sooo 2009 and not what I want to write:
public void testThatSomeInputGivesExpectedOutput() {
  assertEquals(expectedOutput, myObj.myFunction(someInput));
  assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput));
  // I don't want to repeat/write the following checks to see
  // that myFunction is behaving functionally.
  assertEquals(expectedOutput, myObj.myFunction(someInput));
  assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput));
}

// The following two tests are more in the spirit of what I'd like
// to write, but they don't test that myFunction is functional:
public void testThatSomeInputGivesExpectedOutput() {
  assertEquals(expectedOutput, myObj.myFunction(someInput));
}

public void testThatSomeOtherInputGivesExpectedOutput() {
  assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput));
}
I'm looking for some annotation I can put on the test(s), MyObject or myFunction to make the test framework automatically repeat invocations to myFunction in all possible permutations for the given input/output combinations I've given, or some subset of the possible permutations in order to prove that the function is functional.
For example, above the (only) two possible permutations are:
myObj = new MyObject();
myObj.myFunction(someInput);
myObj.myFunction(someOtherInput);
and:
myObj = new MyObject();
myObj.myFunction(someOtherInput);
myObj.myFunction(someInput);
I should be able to provide only the input/output pairs (someInput, expectedOutput) and (someOtherInput, expectedOtherOutput), and the framework should do the rest.
I haven't used QuickCheck, but it seems like a non-solution. It is documented as a generator. I'm not looking for a way to generate inputs to my function, but rather a framework that lets me declaratively specify what part of my object is side-effect free and invoke my input/output specification using some permutation based on that declaration.
Update: I'm not looking to verify that nothing changes in the object, a memoizing function is a typical use-case for this kind of testing, and a memoizer actually changes its internal state. However, the output given some input always stays the same.
If you are trying to test that the functions are side-effect free, then calling them with random arguments isn't really going to cut it. The same applies for a random sequence of calls with known arguments, or a pseudo-random one with random or fixed seeds. Chances are that a (harmful) side-effect will only occur with a sequence of calls that your randomizer never happens to select.
There is also a chance that the side-effects won't actually be visible in the outputs of any of the calls that you are making ... no matter what the inputs are. The side-effects could be on some other related objects that you didn't think to examine.
If you want to test this kind of thing, you really need to implement a "white-box" test where you look at the code and try and figure out what might cause (unwanted) side-effects and create test cases based on that knowledge. But I think that a better approach is careful manual code inspection, or using an automated static code analyser ... if you can find one that would do the job for you.
OTOH, if you already know that the functions are side-effect free, implementing randomized tests "just in case" is a bit of a waste of time, IMO.
I'm not quite sure I understand what you are asking, but it seems like Junit Theories (http://junit.sourceforge.net/doc/ReleaseNotes4.4.html#theories) could be an answer.
In this example, you could create a Map of key/value pairs (input/output) and call the method under test several times with values picked from the map. This will not prove, that the method is functional, but will increase the probability - which might be sufficient.
Here's a quick example of such an additional probably-functional test:
@Test
public void probablyFunctionalTestForMethodX() {
    Map<Object, Object> inputOutputMap = initMap(); // this loads the input/output values
    for (int i = 0; i < maxIterations; i++) {
        Map.Entry<Object, Object> test = pickAtRandom(inputOutputMap); // this picks a map entry randomly
        assertEquals(test.getValue(), myObj.myFunction(test.getKey()));
    }
}
Problems of higher complexity could be solved based on the Command pattern: you could wrap the test methods in command objects, add the command objects to a list, shuffle the list, and execute the commands (= the embedded tests) according to that list.
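A sketch of that Command-pattern idea, using Runnable as the command type and a fixed seed so the shuffled order is repeatable:

List<Runnable> checks = new ArrayList<>();
checks.add(() -> assertEquals(expectedOutput, myObj.myFunction(someInput)));
checks.add(() -> assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput)));

Collections.shuffle(checks, new Random(42L));   // repeatable "random" order
checks.forEach(Runnable::run);                  // first pass
checks.forEach(Runnable::run);                  // second pass: the answers must not have changed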
It sounds like you're attempting to test that invoking a particular method on a class doesn't modify any of its fields. This is a somewhat odd test case, but it's entirely possible to write a clear test for it. For other "side effects", like invoking other external methods, it's a bit harder. You could replace local references with test stubs and verify that they weren't invoked, but you still won't catch static method calls this way. Still, it's trivial to verify by inspection that you're not doing anything like that in your code, and sometimes that has to be good enough.
Here's one way to test that there are no side effects in a call:
public void test_MyFunction_hasNoSideEffects() {
MyClass systemUnderTest = makeMyClass();
MyClass copyOfOriginalState = systemUnderTest.clone();
systemUnderTest.myFunction();
assertEquals(systemUnderTest, copyOfOriginalState); //Test equals() method elsewhere
}
It's somewhat unusual to try to prove that a method is truly side effect free. Unit tests generally attempt to prove that a method behaves correctly and according to contract, but they're not meant to replace examining the code. It's generally a pretty easy exercise to check whether a method has any possible side effects. If your method never sets a field's value and never calls any non-functional methods, then it's functional.
Testing this at runtime is tricky. What might be more useful would be some sort of static analysis. Perhaps you could create a @Functional annotation, then write a program that would examine the classes of your program for such methods and check that they only invoke other @Functional methods and never assign to fields.
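A minimal sketch of such a marker annotation; the retention and target are chosen so a bytecode or source analyser could find it, and can be adjusted as needed:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Marks a method that is intended to be free of side effects. */
@Retention(RetentionPolicy.CLASS)   // visible to bytecode analysers, not needed at runtime
@Target(ElementType.METHOD)
public @interface Functional {
}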
Randomly googling around, I found somebody's master's thesis on exactly this topic. Perhaps he has working code available.
Still, I will repeat that it is my advice that you focus your attention elsewhere. While you CAN mostly prove that a method has no side effects at all, it may be better in many cases to quickly verify this by visual inspection and focus the remainder of your time on other, more basic tests.
Have a look at http://fitnesse.org/: it is often used for acceptance testing, but I found it is an easy way to run the same tests against a huge amount of data.
In JUnit you can write your own test runner. This code is not tested (I'm not sure whether methods that take arguments will be recognized as test methods; maybe some more runner setup is needed?):
public class MyRunner extends BlockJUnit4ClassRunner {
    @Override
    protected Statement methodInvoker(final FrameworkMethod method, final Object test) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                Iterable<Object[]> permutations = getPermutations();
                for (Object[] permutation : permutations) {
                    method.invokeExplosively(test, permutation[0], permutation[1]);
                }
            }
        };
    }
}
It should then only be a matter of providing a getPermutations() implementation. For example, it can take data from some List<Object[]> field annotated with a custom annotation and produce all the permutations.
I think the term you're missing is "Parameterized Tests". However, it seems to be more tedious in JUnit than in the .NET flavor. In NUnit, the following test executes 6 times, with all combinations:
[Test]
public void MyTest(
[Values(1,2,3)] int x,
[Values("A","B")] string s)
{
...
}
For Java, your options seem to be:
JUnit supports this with version 4. However, it's a lot of code (it seems JUnit is adamant about test methods not taking parameters). This is the least invasive option.
DDSteps, a JUnit plugin. See this video: it takes values from an appropriately named Excel spreadsheet. You also need to write a mapper/fixture class that maps values from the spreadsheet onto members of the fixture class, which are then used to invoke the SUT.
Finally, you have Fit/Fitnesse. It's as good as DDSteps, except for the fact that the input data is in HTML/Wiki form. You can paste from an Excel sheet into Fitnesse and it formats it correctly at the push of a button. You need to write a fixture class here too.
I'm afraid I can't find the link anymore, but JUnit 4 has some helper functions to generate test data. It's something like:
public void testData() {
data = {2, 3, 4};
data = {3,4,5 };
...
return data;
}
JUnit will then test your methods with this data. But as I said, I can't find the link anymore (I forgot the keywords) for a detailed (and correct) example.
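The feature being half-remembered here is most likely JUnit 4's Parameterized runner; a minimal sketch of how it is actually written (the addition example is just a stand-in):

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;
import static org.junit.Assert.assertEquals;

@RunWith(Parameterized.class)
public class AdditionTest {

    @Parameters
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
            { 2, 3, 5 },
            { 3, 4, 7 }
        });
    }

    private final int a, b, expectedSum;

    public AdditionTest(int a, int b, int expectedSum) {
        this.a = a;
        this.b = b;
        this.expectedSum = expectedSum;
    }

    @Test
    public void addsCorrectly() {
        assertEquals(expectedSum, a + b);   // runs once per row in data()
    }
}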