java unit test of a method interacting with binary files in filesystem


I'm quite new to java programming, but I'll try to use the correct terms and avoid misunderstandings as much as possible.
I've found some answers to topics quite similar to my problem, but either I just cannot see how they apply to it, or they really don't fit. Some of them use mocked objects, but I'm not sure that is the right option in my case.
General description
I need an array of objects whose information is loaded from randomly accessed binary files. The first bytes of each binary file are a header that defines how the data is stored in the file; basically it gives the lengths of some fields, which help to compute the position of the desired data in the file.
So now I want to test the method that will be called to load the desired data, which is specified by a UnitListElement object, into the Unit object. For this I focus only on a single read of a binary file.
More detailed view
I have a Java class called Unit with some attributes, let's say a, b and c. The values for these attributes are loaded by a method called getDataFromBinFile:
public class Unit{
    public double[] a;
    public double[] b;
    public double[] c;
    public void getDataFromBinFile(UnitListElement element){
        // here the data is loaded from the binary file with random access
    }
}
The method for loading the data opens the binary file and accesses the desired data in it. The data to be read is specified in a UnitListElement object:
public class UnitListElement{
public String pathOfFile;
public int beginToReadAt; // info related to where the desired data begins
public int finishReading; // info related to where the desired data ends
}
The attributes beginToReadAt and finishReading are time references which are used, along with the binary file's header, to compute the first and last byte positions to read from the binary file.
So what I need to do is write a test where I call the method getDataFromBinFile(unitListEl) and check whether the data loaded is correct or not.
options for solutions
1st option
Some posts about similar problems propose using mock objects. I've tried to find documentation about mocking objects, but I haven't found any easy beginner's guide. So, although I don't understand mock objects very well, my impression is that they do not fit this case, since what I want to test is the reading of the binary file, not just the interaction with other objects.
2nd option
Another option is to create the binary file for the test inside the test with a helper method, e.g. in a @BeforeClass method, run the test with this temporary file, and then delete it in an @AfterClass method.
Question
What do you think is the best practice considering a TDD approach? Do mock objects really fit in this case? If they do, is there any documentation with basic examples for total beginners?
Or, on the other hand, is creating the file more suitable for testing reading methods?
Thanks a lot in advance.

Mocking can be applied to your case, but it is in fact not strictly necessary here. All you need is to decouple the actual data processing logic in getDataFromBinFile from the code reading the bytes from files.
You can achieve this in (at least) two ways:
With mocks: hide the file reading code behind an interface method which takes a UnitListElement and returns a byte array, then use this in getDataFromBinFile. Then you can mock this interface in your tests with a mock reader which just returns some predefined bytes without accessing any files. (Alternatively, you can move the file reading logic into UnitListElement itself, as for now it seems to be a POD class.)
Without mocks: change the signature of getDataFromBinFile to take a byte array parameter instead of a UnitListElement. In your real production code, you can read the data from the file position described by the UnitListElement, then pass it to getDataFromBinFile. In your unit tests, you can just pass any binary data to it directly. (Note that in this case, it makes sense to rename your method to something like getDataFromBytes.)
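As a rough sketch of the variant without mocks: the payload layout below (a plain run of big-endian doubles) is only an assumption for illustration, and UnitParser and parseDoubles are hypothetical names that do not appear in the question.

```java
import java.nio.ByteBuffer;

// Hypothetical parser: assumes the payload is a sequence of big-endian doubles.
final class UnitParser {
    private UnitParser() {}

    /** Decodes every complete 8-byte group in {@code bytes} as one double. */
    static double[] parseDoubles(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        double[] values = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getDouble();
        }
        return values;
    }
}
```

A unit test can then build the input bytes in memory (for example with ByteBuffer.allocate(...).putDouble(...)) and compare the returned array, so no file is touched. The production code would still read the byte range described by the UnitListElement and hand it to this method.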
For mocking, I have been using EasyMock so far. I find its documentation fairly easy to understand, hope that helps.

I don't have much experience in TDD. Mocking is not required when you are testing reading from or writing to a file; the best option is to have a test version of the file on which your test will run. Mocking is meant to be used when you cannot easily create a testable object for your use case, e.g. if you are testing interaction with a server.
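To make the test-file approach concrete, here is a minimal sketch. The file format (a 4-byte record count followed by that many doubles) and the names TempBinaryFileExample, writeSample and readRecord are assumptions made up for this example; the real header layout from the question is not known.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical format: a 4-byte int header with the record count, followed by that many doubles.
final class TempBinaryFileExample {
    private TempBinaryFileExample() {}

    /** Writes a small temporary file in the assumed format and returns its path. */
    static Path writeSample(double... values) throws IOException {
        Path file = Files.createTempFile("unit-test", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.writeInt(values.length);
            for (double v : values) {
                raf.writeDouble(v);
            }
        }
        return file;
    }

    /** Reads the record at {@code index} by seeking past the header. */
    static double readRecord(Path file, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int count = raf.readInt();
            if (index < 0 || index >= count) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            raf.seek(Integer.BYTES + (long) index * Double.BYTES);
            return raf.readDouble();
        }
    }
}
```

In a JUnit test, writeSample would typically run in the setup method and the file would be deleted in the teardown method, which matches the second option described in the question.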

I prefer not to create test binary files, as any change in the format of the file being read means changing the test files as well (and thus the tests).
Since you are following a TDD approach, you must already have tests written for the "UnitListElement" class, so for this situation mocking seems to be a better solution. Your objective is to test the "getDataFromBinFile" method and not (currently) the "UnitListElement" class methods, hence you can mock the "UnitListElement" class (or an interface it implements and which is passed to the getDataFromBinFile method). Mocking "UnitListElement" means you can return predefined or any specific values from any method call on the class whenever it is accessed in the "getDataFromBinFile" method. Finally, you can use the values returned from your mock in the "getDataFromBinFile" method and assert on the result of the method after your business logic is performed. I haven't used many mocking frameworks, but most often I have been using the EasyMock framework. For a start, you can look at a basic EasyMock example.

Just make a test binary file.
This process is reading a file, so there is no reason to worry about the file system. The file will always be deterministic (if you altered the file during reading, that would be another story).
If you want to do a test with the objects after you've read them in, I would suggest just creating them in your test (unless this is very hard to do, as with a sound file).
Also, I would suggest the abstraction of a stream instead of a file, but I would STILL test this with a test file. By the way, make sure the test file is small; it's a test after all.
Some people might argue "tests aren't supposed to hit the file system", but where do you think the .class files are loaded from?
Also, I would get the stream via the Java class loader:
this.getClass().getResourceAsStream("yourfile.name");
happy testing!
Llewellyn Falco
http://www.approvaltests.com

Related

Test Retrieval from Data Structure without testing Store

Let's say I want to write some JUnit tests for a new, untested data structure BlackBox. It has an interface similar to a Map, but there's no way to tell what is going on inside of it:
blackBox.get(key);
blackBox.put(key, value);
How do I correctly unit test .get and .put in the two following scenarios? I cannot figure out how to test the two independently.
I am using TDD, and therefore want to write the tests first.
BlackBox has been written by someone else and I want to test it.
I know that if I had access to the source, I can do the following:
Whitebox.setInternalState(blackBox, "storage", storageObject);
assertEquals(expectedAnswer, blackBox.get("key"));
I can do the opposite to test .put(). The issue is that those tests rely on the implementation of the class.
So how can I individually test .get and .put without knowing or relying on the implementation details of the class?

I cannot figure out how to test the two independently.
Why do you want to do that? Does the contract state they are independent? I don't think so. I'm guessing the contract says:
A new object is created empty, so get will return null / throw an exception.
When you put something, you can get it.
You cannot get what you didn't put.
When you put many times on the same key, you will get the latest value.
And so on. You can test each of those invariants. When you use Whitebox you start to test implementation details, not the contract, and that makes refactoring (e.g. switching to a faster implementation) much harder.
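The invariants above could be checked through the public API alone, roughly as follows. The BlackBox interface shape, the choice of null for missing keys, and the MapBackedBlackBox stand-in are all assumptions made so that the sketch runs; the real class is only known to offer get and put.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of the class under test; only get and put are known from the question.
interface BlackBox<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Stand-in implementation so the sketch runs; any implementation should pass the same checks.
final class MapBackedBlackBox<K, V> implements BlackBox<K, V> {
    private final Map<K, V> storage = new HashMap<>();

    @Override public V get(K key) { return storage.get(key); }
    @Override public void put(K key, V value) { storage.put(key, value); }
}

final class BlackBoxContract {
    private BlackBoxContract() {}

    /** Runs the contract checks against a fresh, empty instance (assumes null for missing keys). */
    static void check(BlackBox<String, Integer> box) {
        require(box.get("missing") == null, "empty box returns null");
        box.put("a", 1);
        require(Integer.valueOf(1).equals(box.get("a")), "put then get returns the value");
        require(box.get("b") == null, "a key that was never put stays absent");
        box.put("a", 2);
        require(Integer.valueOf(2).equals(box.get("a")), "latest put wins");
    }

    private static void require(boolean condition, String rule) {
        if (!condition) {
            throw new AssertionError("contract violated: " + rule);
        }
    }
}
```

Each require call would normally be its own JUnit test method; they are grouped here only to keep the example short. None of the checks looks at internal state, so the implementation can change freely.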

A better way to call static methods in user-submitted code?

I have a large data set. I am creating a system which allows users to submit java source files, which will then be applied to the data set. To be more specific, each submitted java source file must contain a static method with a specific name, let's say toBeInvoked(). toBeInvoked will take a row of the data set as an array parameter. I want to call the toBeInvoked method of each submitted source file on each row in the data set. I also need to implement security measures (so toBeInvoked() can't do I/O, can't call exit, etc.).
Currently, my implementation is this: I have a list of the names of the java source files. For each file, I create an instance of the custom secure ClassLoader which I coded, which compiles the source file and returns the compiled class. I use reflection to extract the static method toBeInvoked() (e.g. method = c.getMethod("toBeInvoked", double[].class)). Then, I iterate over the rows of the data set, and invoke the method on each row.
There are at least two problems with my approach:
it appears to be painfully slow (I've heard reflection tends to be slow)
the code is more complicated than I would like
Is there a better way to accomplish what I am trying to do?

There is no significantly better approach given the constraints that you have set yourself.
For what it is worth, what makes this "painfully slow" is compiling the source files to class files and loading them. That is many orders of magnitude slower than the use of reflection to call the methods.
(Use of a common interface rather than static methods is not going to make a measurable difference to speed, and the reduction in complexity is relatively small.)
If you really want to simplify this and speed it up, change your architecture so that the code is provided as a JAR file containing all of the compiled classes.

Assuming your #toBeInvoked() could be defined in an interface rather than being static (it should be!), you could just load the class and cast it to the interface:
Class<? extends YourInterface> c = Class.forName("name", true, classLoader).asSubclass(YourInterface.class);
YourInterface i = c.getDeclaredConstructor().newInstance();
Afterwards invoke #toBeInvoked() directly.
Also have a look into java.util.ServiceLoader, which could be helpful for finding the right class to load in case you have more than one source file.
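A self-contained sketch of the interface approach follows. RowFunction, SumRow and RowFunctionLoader are hypothetical names, and the double return type is an assumption, since the question does not say what toBeInvoked returns.

```java
// Hypothetical interface that submitted classes would implement instead of exposing a static method.
interface RowFunction {
    double toBeInvoked(double[] row);
}

// Example submission; in the real system this class would come from the custom class loader.
final class SumRow implements RowFunction {
    @Override
    public double toBeInvoked(double[] row) {
        double sum = 0;
        for (double v : row) {
            sum += v;
        }
        return sum;
    }
}

final class RowFunctionLoader {
    private RowFunctionLoader() {}

    /** Loads the named class once and returns a new instance typed as RowFunction. */
    static RowFunction load(String className, ClassLoader loader) throws ReflectiveOperationException {
        Class<? extends RowFunction> type =
                Class.forName(className, true, loader).asSubclass(RowFunction.class);
        return type.getDeclaredConstructor().newInstance();
    }
}
```

Reflection is used once per submitted class to create the instance; after that, every row is processed with an ordinary interface call such as function.toBeInvoked(row).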

Personally, I would use an interface. This will allow you to have multiple instances, each with its own state (useful for multi-threading), but more importantly you can use the interface both to define which methods must be implemented and to call those methods.
Reflection is slow, but only relative to other options such as a direct method call. If you are scanning a large data set, the fact that you have to pull data from main memory is likely to be much more expensive.

I would suggest the following steps for your problem.
To check whether the method contains any unwanted code, you need a check script which can do these checks at upload time.
Create an interface having a method toBeInvoked() (not a static method).
All the classes which are uploaded must implement this interface and put their logic inside this method.
You can have your custom class loader scan a particular folder for new classes being added and load them accordingly.
When a file is uploaded and successfully validated, you can compile it and copy the class file to the folder which the class loader scans.
Your processor class can look for new files and then call the toBeInvoked() method on the loaded class when required.
Hope this helps. (Note that I have used a similar mechanism to dynamically load workflow step classes in a workflow engine tool which was developed.)

Suggestions for a java Mock File (to mock java.io.File)

Does anyone have suggestions for a java mock File object?
I am using a third-party class which needs a java.io.File object as an argument.
I receive the data for this file in a stream over a web service (also one of their products).
One solution is to write all this data to a file and offer this to the class. This is a solution I don't like: it takes away the advantage of using the web service instead of just downloading the file.
It would be quicker and more efficient to put this data from memory into a mock File and offer this mock File to the third-party class.
It would probably have to be a MockFile extending java.io.File and overriding all the functions that do the actual interfacing with the file on the hard disk.
I know the third party should have used a stream as the input argument instead of a file. However, this is beyond my influence.

This is just a suggestion based on my understanding of your question.
I believe, you must be doing something like this,
public void doSomething(){
//Pre processing
Object result=new ThirdPartyCode().actualMethod(file);
//Post processing
}
Mock objects make more sense from a unit testing perspective.
Your objective is not to unit test the third-party library function, but to unit test the doSomething() method. So you can probably create a wrapper around the third-party function, maybe something like this:
public class Wrapper implements MyWrapper{
    public Object invokeThirdPartyFunction(File file){
        return new ThirdPartyCode().actualMethod(file);
    }
}
Now you can create a mock wrapper (implementing the same interface) and use this mock wrapper for all your JUnit test cases.
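A small sketch of how the wrapper idea might look when the calling code receives the wrapper from outside. FileProcessor and Importer are hypothetical names standing in for the wrapper interface and for the class that contains doSomething(); the string result is invented for illustration.

```java
import java.io.File;

// Hypothetical seam around the third-party call, playing the role of the wrapper interface above.
interface FileProcessor {
    Object process(File file);
}

// Code under test depends on the seam, not on the third-party class directly.
final class Importer {
    private final FileProcessor processor;

    Importer(FileProcessor processor) {
        this.processor = processor;
    }

    /** Pre- and post-processing around the wrapped call; here it only formats the result. */
    String doSomething(File file) {
        Object result = processor.process(file);
        return "processed:" + result;
    }
}
```

In a test, the third-party call can then be replaced by a hand-written fake such as new Importer(file -> file.getName().toUpperCase()), so the File is never opened. Note that this only helps with testing your own code; the third-party class itself still needs a real file in production.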

Does the tested class only query the mock File's name, attributes etc., or does it actually attempt to open the file?
In the former case, you can easily create your mock using e.g. EasyMock or an equivalent mocking framework.
The latter case is more tricky, and I am afraid if the input stream is created internally by the class, you have no choice other than actually creating a real test file on the HD.

You could load the 3rd party code using an ASM based classloader that maps java.io.File to your own "fake" implementation. It's a bit of work, and needs to be performed carefully... For example you will need to also map FileInputStream, etc.

You shouldn't use files (or any external dependency) in unit tests. Apart from using mocks, your approaches will result in problematic tests.
See this javaranch article for more

Save object in debug and then use it as stub in tests

My application connects to a db and gets a tree of categories from there. In debug mode I can see this big tree object, and I thought it would be useful to save this object somewhere on disk to use in test stubs. Like this:
mockedDao = mock(MyDao.class);
when(mockedDao.getCategoryTree()).thenReturn(mySavedObject);
Assume mySavedObject is huge, so I don't want to build it manually or write special generation code. I just want to be able to serialize it and save it somewhere during a debug session, then deserialize it and pass it to thenReturn in tests.
Is there a standard way to do so? If not, what is the best way to implement such an approach?

I do love your idea, it's awesome!
I am not aware of a library that would offer that feature out of the box. You can try using ObjectOutputStream and ObjectInputStream (i.e. the standard Java serialization) if your objects all implement Serializable. Typically they do not. In that case, you might have more luck using XStream or one of its friends.
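For the standard serialization route, a minimal round-trip sketch could look like this. Snapshots, save and load are hypothetical helper names, and the sketch assumes every object in the saved graph implements Serializable.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: assumes the whole object graph implements Serializable.
final class Snapshots {
    private Snapshots() {}

    /** Writes the object to the target file using standard Java serialization. */
    static void save(Serializable object, Path target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(target))) {
            out.writeObject(object);
        }
    }

    /** Reads an object back from the file and casts it to the expected type. */
    static <T> T load(Path source, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(source))) {
            return type.cast(in.readObject());
        }
    }
}
```

During a debug session, save could be called once (for example from the debugger's expression evaluator) and the resulting file kept with the test resources; the test then calls load and passes the result to thenReturn. Keep in mind that a saved snapshot can stop loading if the classes change in an incompatible way.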

We usually mock the entire DB in such scenarios, reusing (and implicitly testing) the code to load the categories from the DB.
Specifically, our unit tests run against an in-memory database (hsqldb), which we initialize prior to each test run by importing test data.

Have a look at Dynamic Managed Beans - they offer a way to change values of a running Java application. Maybe there's a way to define an MBean that holds your tree, read the tree, store it somewhere and inject it again later.

I've run into this same problem and considered possible solutions. A few months ago I wrote custom code to print a large binary object as hex encoded strings. My toJava() method returns a String which is source code for a field definition of the object required. This wasn't hard to implement. I put log statements in to print the result to the log file, and then cut and paste from the log file to a test class. New unit tests reference that file, giving me the ability to dig into operations on an object that would be very hard to build another way.
This has been extremely useful but I quickly hit the limit on the size of bytecode in a compilation unit.

Writing long test method names to describe tests vs using in code documentation

For writing unit tests, I know it's very popular to write test methods that look like
public void Can_User_Authenticate_With_Bad_Password()
{
...
}
While this makes it easy to see what the test is testing for, I think it looks ugly and it doesn't display well in auto-generated documentation (like sandcastle or javadoc).
I'm interested to see what people think about using a naming scheme that consists of the method being tested, then an underscore, "Test" and the test number, and then using XML documentation comments (.NET) or Javadoc comments to describe what is being tested.
/// <summary>
/// Tests for user authentication with a bad password.
/// </summary>
public void AuthenticateUser_Test1()
{
...
}
By doing this I can easily group my tests together by what methods they are testing, I can see how many tests I have for a given method, and I still have a full description of what is being tested.
We have some regression tests that run against a data source (an XML file), and these files may be updated by someone without access to the source code (QA monkey), and they need to be able to read what is being tested and where, in order to update the data sources.

I prefer the "long names" version - although only to describe what happens. If the test needs a description of why it happens, I'll put that in a comment (with a bug number if appropriate).
With the long name, it's much clearer what's gone wrong when you get a mail (or whatever) telling you which tests have failed.
I would write it in terms of what it should do though:
LogInSucceedsWithValidCredentials
LogInFailsWithIncorrectPassword
LogInFailsForUnknownUser
I don't buy the argument that it looks bad in autogenerated documentation - why are you running JavaDoc over the tests in the first place? I can't say I've ever done that, or wanted generated documentation. Given that test methods typically have no parameters and don't return anything, if the method name can describe them reasonably that's all the information you need. The test runner should be capable of listing the tests it runs, or the IDE can show you what's available. I find that more convenient than navigating via HTML - the browser doesn't have a "Find Type" which lets me type just the first letters of each word of the name, for example...

Does the documentation show up in your test runner? If not that's a good reason for using long, descriptive names instead.
Personally I prefer long names and rarely see the need to add comments to tests.

I've done my dissertation on a related topic, so here are my two cents: Any time you rely on documentation to convey something that is not in your method signature, you are taking the huge risk that nobody would read the documentation.
When developers are looking for something specific (e.g., scanning a long list of methods in a class to see if what they're looking for is already there), most of them are not going to bother to read the documentation. They want to deal with one type of information that they can easily see and compare (e.g., names), rather than have to start redirecting to other materials (e.g., hover long enough to see the JavaDocs).
I would strongly recommend conveying everything relevant in your signature.

Personally I prefer using the long method names. Note you can also have the method name inside the expression, as:
Can_AuthenticateUser_With_Bad_Password()

I suggest smaller, more focussed (test) classes.
Why would you want to javadoc tests?

What about changing
Can_User_Authenticate_With_Bad_Password
to
AuthenticateDenieTest
AuthenticateAcceptTest
and name the suite something like User

As a group, how do we feel about a hybrid naming scheme like this:
/// <summary>
/// Tests for user authentication with a bad password.
/// </summary>
public void AuthenticateUser_Test1_With_Bad_Password()
{
...
}
and we get the best of both.
