Which access modifier should I use for my methods in Java?

I thought this was such a common question, but I can't find anything that helps me. I'm relatively new to Java and practicing for a job application. For that I've started writing a tool to transform data (e.g. read a CSV, translate some columns and write the result as SQL inserts to a file).
If you're interested you can find it here; I'll copy some code for my question: https://github.com/fvosberg/datatransformer
I've started with a class that reads the CSV (and which will get more complex to handle fields that contain the separator, and so on). My IDE (IntelliJ IDEA) suggested using the strictest access modifiers possible for my methods. Why should I hide these methods (with private) from subclasses?
package de.frederikvosberg.datatransformer;

import java.io.BufferedReader;
import java.io.Reader;
import java.util.*;

class CSVInput {

    private final BufferedReader _reader;
    private String separator = ",";

    public CSVInput(Reader reader) {
        _reader = new BufferedReader(reader);
    }

    public List<SortedMap<String, String>> readAll() throws java.io.IOException {
        List<SortedMap<String, String>> result = new LinkedList<>();
        List<String> headers = readHeaders();
        String line;
        while ((line = _reader.readLine()) != null) {
            result.add(
                    colsFromLine(headers, line)
            );
        }
        _reader.close();
        return result;
    }

    private List<String> readHeaders() throws java.io.IOException {
        List<String> headers = new ArrayList<>();
        String line = _reader.readLine();
        if (line == null) {
            throw new RuntimeException("There is no first line for the headers in the CSV");
        }
        return valuesFromLine(line);
    }

    public void setSeparator(String separator) {
        this.separator = separator;
    }

    /**
     * creates a list of values from a CSV line
     * it uses the separator field
     *
     * @param line a line with values separated by this.separator
     * @return a list of values
     */
    private List<String> valuesFromLine(String line) {
        return Arrays.asList(
                line.split(this.separator)
        );
    }

    private SortedMap<String, String> colsFromLine(List<String> headers, String line) {
        SortedMap<String, String> cols = new TreeMap<>();
        List<String> values = valuesFromLine(line);
        Iterator<String> headersIterator = headers.iterator();
        Iterator<String> valuesIterator = values.iterator();
        while (headersIterator.hasNext() && valuesIterator.hasNext()) {
            cols.put(headersIterator.next(), valuesIterator.next());
        }
        if (headersIterator.hasNext() || valuesIterator.hasNext()) {
            throw new RuntimeException("The size of a row doesn't fit with the size of the headers");
        }
        return cols;
    }
}
Another downside is unit testing. I would like to write separate tests for my methods, especially the CSVInput::valuesFromLine method, which will get more complex. My unit test for this class is testing so much, and I really don't want to have too many things in my head when developing.
Any suggestions from experienced Java programmers?
Thanks in advance
Replies to comments
Thank you for your comments. Let me answer to the comments here, for the sake of clarity.
"Why should I hide these methods (with private) from subclasses?" Why
do you keep your car keys away from your front door?
For security purposes, but why does it affect security when I change the access modifier of the colsFromLine method? This method accepts the headers as a parameter, so it doesn't rely on any internal state, nor does it change it.
The next advantage of strict access modifiers I can think of is helping other developers by showing them which methods they should use and where the logic belongs.
Don't write your test to depend on the internal implementation of the
functionality, just write a test to verify the functionality.
I don't. It depends on what you mean by internal implementation. I don't check any internal states or variables. I just want to test the algorithm which is going to parse the CSV in steps.
"My Unit test for this class is testing so much" - If too many tests
on a class, you should rethink your design. It's very likely that your
class literally is doing too much, and should be broken up.
I don't have many tests on my class, but if I keep going the way I started, I'm going to write many tests for the same method (parsing the CSV) because it has many edge cases. The tests will grow in size because of different boilerplate. That concern is why I'm asking here.

To answer your direct question: you should always strive to hide as much as possible, from client code as well as from subclasses.
The point is: you want (theoretically) to be able to change parts or all of your implementation without affecting other elements in your system. When client/subclass code knows about such implementation details ... sooner or later, such code starts relying on them. To avoid that, you keep them out of sight. The golden rule is: good OO design is about behavior (or "contracts") of objects and methods. You absolutely do not care how some method does its job; you only care about what it does. The how part should be invisible!
Having said that, sometimes it does make sense to give "package protected" visibility to some of your methods, in order to make them available within your unit tests.
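For example, a minimal sketch using the CSVInput class from the question: if valuesFromLine is made package-private (no modifier) instead of private, a test class in the same package (e.g. under src/test/java) can call it directly. The test class name is just an illustration.
package de.frederikvosberg.datatransformer;

import static org.junit.Assert.assertEquals;

import java.io.StringReader;
import java.util.Arrays;
import org.junit.Test;

public class CSVInputTest {

    @Test
    public void splitsLineOnSeparator() {
        // assumes valuesFromLine was changed from private to package-private
        CSVInput input = new CSVInput(new StringReader("a,b\n1,2"));
        assertEquals(Arrays.asList("1", "2"), input.valuesFromLine("1,2"));
    }
}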
And beyond that: I don't see much point in extending your CsvInput (prefer camel case, even for class names!) class anyway. As usual: prefer composition over inheritance!
In any case, such kind of "assignments" are excellent material for practicing TDD. You write a test (that checks one aspect); and then you write the code to pass that test. Then you write another test checking "another" condition; and so on.

Related

Mockito style anyXXX methods for unit testing

While unit testing some methods, there can be scenarios where the values of some parameters do not matter and can be any value.
For example in this piece of code:
public void method(String arg1, String arg2, int arg3) {
    if (arg1 == null) throw new NullPointerException("arg1 is null");
    // some other code
}
When unit testing the behavior that an NPE must be thrown when arg1 is null, the values of the other arguments do not matter; they can be any value or null.
So I wanted to document the fact that the values do not matter for the method under test.
I thought of following options:
Option 1: Define constants of ANY_XXX
I thought of explicitly creating constants ANY_STRING and ANY_INT, which contain a fixed value which documents that it can be any value and the method under test does not care about the actual value.
I can put all these constants in a single class called Any and reuse them across all test classes.
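A minimal sketch of what such an Any class could look like (the names and the concrete values are illustrative; the point is only that the names document "the value is irrelevant"):
public final class Any {

    // The concrete values are irrelevant; the names document that the test does not care.
    public static final String ANY_STRING = "any string";
    public static final int ANY_INT = 42;

    private Any() {
        // no instances
    }
}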
Option 2: Random values for ANY_XXX
This option seems a bit hacky to me as I have read somewhere that randomness should not be brought into test cases. But in this scenario this randomness will not be visible as the parameters will not create any side effect.
Which approach would be more suitable for better, readable tests?
UPDATE:
While I can use the ANY_XXX approach by defining constants in an Any class, I am also thinking of generating ANY_XXX values with some constraints, such as:
Any.anyInteger().nonnegative();
Any.anyInteger().negative();
Any.anyString().thatStartsWith("ab");
I am thinking that maybe Hamcrest matchers can be used for creating this chaining, but I am not sure if this approach is a good one. Similar anyObject()-style methods are already provided by Mockito, but those only work on mocks and spies and not on normal objects. I want to achieve the same for normal objects for more readable tests.
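For reference, here is a rough sketch of how such a fluent API could be hand-rolled without Hamcrest or Mockito; the class and method names simply mirror the wished-for syntax above and are purely hypothetical:
import java.util.Random;
import java.util.UUID;

public final class Any {

    private static final Random RANDOM = new Random();

    public static AnyInteger anyInteger() {
        return new AnyInteger();
    }

    public static AnyString anyString() {
        return new AnyString();
    }

    public static final class AnyInteger {
        public int nonnegative() {
            return RANDOM.nextInt(Integer.MAX_VALUE);      // 0 .. MAX_VALUE-1
        }
        public int negative() {
            return -1 - RANDOM.nextInt(Integer.MAX_VALUE); // -1 .. MIN_VALUE+1
        }
    }

    public static final class AnyString {
        public String thatStartsWith(String prefix) {
            return prefix + UUID.randomUUID();
        }
    }

    private Any() {
    }
}
Note that several answers below argue against random values in tests; returning fixed, boring values from these methods would serve the documentation purpose just as well.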
Why I want to do this?
Suppose I have a class
class MyObject {
    public MyObject(int param1, Object param2) {
        if (param1 < 0) throw new IllegalArgumentException();
        if (param2 == null) throw new NullPointerException();
    }
}
And now while writing tests for constructor
class MyObjectTest {
    @Test(expected = NullPointerException.class)
    public void testConstructor_ShouldThrowNullpointer_IfSecondParamIsNull() {
        // emphasizing the fact that the value of the first parameter has no relationship with the result, for better test readability
        new MyObject(Any.anyInteger().nonnegative(), null);
    }
}
I see both of them quite a lot.
Personally, I disagree that randomness should not be brought into tests. Using randomness to some degree should make your tests more robust, but not necessarily easier to read.
If you go for the first approach I would not create a constants class, but rather pass the values (or nulls) directly, since then you see what you pass in without needing to look in another class - which should make your tests more readable. You can also easily modify your tests later if you need the other parameters.
My preference is to build up a utility class of constants along with methods to help with the creation of the constant values for tests, e.g.:
public final class Values {

    public static final int ANY_INT = randomInt(Integer.MIN_VALUE, Integer.MAX_VALUE);
    public static final int ANY_POSITIVE_INT = randomInt(0, Integer.MAX_VALUE);
    public static final String ANY_ISBN = randomIsbn();
    // etc...

    public static int randomInt(int min, int max) { /* omitted */ }

    public static String randomIsbn() { /* omitted */ }

    // etc...
}
Then I would use static imports to pull the constants and methods I needed for a particular test class.
I use the ANY_ constants only in situations where I do not care about the value, I find that they can make the intent of the test clearer, for example:
// when
service.fooBar(ANY_INT, ANY_INT, ANY_INT, ANY_INT, 5);
It's clear that the value 5 is of some significance - although it would be better as a local variable.
The utility methods can be used for adhoc generation of values when setting up tests, e.g.:
// given
final String isbn1 = randomIsbn();
final String isbn2 = randomIsbn();
final Book[] books = { new Book(isbn1), new Book(isbn2) };
// when
bookRepository.store(books);
Again, this can help to keep the test classes concerned about the tests themselves and less about data set up.
In addition to this I have also used a similar approach for domain objects. When you combine the two approaches it can be quite powerful, e.g.:
public final class Domain {

    public static Book book() {
        return new Book(randomIsbn());
    }

    // etc...
}
I faced the same problem when I started to write unit tests for my project and had to deal with numerous arrays, lists, integer inputs, strings, etc.
So I decided to use QuickCheck and create a generator util class.
Using the generators in this library, you can generate primitive data types and Strings easily.
For example, when you want to generate an integer, simply use the IntegerGenerator class. You can define maximum and minimum values in the constructor of the generator. You can also use the CombinedGeneratorSamples class to generate data structures like lists, maps and arrays.
Another feature of this library is implementing the Generator interface for custom class generators.
You're overthinking this and creating unnecessary barriers for your project:
If you want to document your method, do it with words! That's what Javadoc is for.
If you want to test your method with "any positive int", then just call it with a couple of different positive ints. In your case, ANY does not mean testing every possible integer value.
If you want to test your method with "a string that starts with ab", call it with "abcd", then "abefgh", and just add a comment on the test method!
Sometimes we are so caught up with frameworks and good practices that it takes common sense away.
In the end: most readable = simplest.
How about using a caller method for the actual method?
// This is the actual method that needs to be tested
public void theMethod(String arg1, String arg2, int arg3, float arg4) {
}
Create a caller method that calls the method with the required parameters and default (or null) values for the rest of the params, and run your test case on this caller method:
// The caller method
@Test
public void invokeTheMethod(String param1) {
    theMethod(param1, "", 0, 0.0F); // Pass in some default values or even null
}
Although you will have to be pretty sure that passing default values to theMethod(...) for the other parameters won't cause any NPE.
I see 3 options:
Never pass nulls; forbid your team from passing nulls. Nulls are evil. Passing null should be the exception, not the rule.
Simply use an annotation in production code: @NotNull or something like that. If you use Lombok, this annotation will also do the actual validation.
And if you really have to do it in tests, then simply create a test with a proper name:
static final String ANY_STRING = "whatever";

@Test
public void should_throw_NPE_when_first_parameter_is_null() {
    object.method(null, ANY_STRING, ANY_STRING); // use catch-exception or JUnit's expected
}
If you're willing to give the JUnitParams framework a go, you could parameterize your tests, giving meaningful names to your parameters:
@Test
@Parameters({
        "17, M",
        "2212312, M" })
public void shouldCreateMalePerson(int ageIsNotRelevant, String sex) throws Exception {
    assertTrue(new Person(ageIsNotRelevant, sex).isMale());
}
I'm always in favor of the constants approach. The reason is that I believe it gets more readable than chaining several matchers.
Instead of your example:
class MyObjectTest {
    @Test(expected = NullPointerException.class)
    public void testConstructor_ShouldThrowNullpointer_IfSecondParamIsNull() {
        new MyObject(Any.anyInteger().nonnegative(), null);
    }
}
I would do:
class MyObjectTest {
    private static final int SOME_NON_NEGATIVE_INTEGER = 5;

    @Test(expected = NullPointerException.class)
    public void testConstructor_ShouldThrowNullpointer_IfSecondParamIsNull() {
        new MyObject(SOME_NON_NEGATIVE_INTEGER, null);
    }
}
Also, I prefer the use of 'SOME' over 'ANY', but that's also a matter of personal taste.
If you're considering testing the constructor with a number of different variants as you mentioned (nonNegative(), negative(), thatStartsWith(), etc.), I would suggest that you instead write parameterized tests. I recommend JUnitParams for that; here's how I'd do it:
@RunWith(JUnitParamsRunner.class)
class MyObjectTest {

    @Test(expected = NullPointerException.class)
    @Parameters({"-4000", "-1", "0", "1", "5", "10000"})
    public void testConstructor_ShouldThrowNullpointer_IfSecondParamIsNull(int i) {
        new MyObject(i, null);
    }

    ...
}
I suggest you go with constant values for those parameters which may be arbitrary. Adding randomness makes your test runs not repeatable. Even if parameter values "don't matter" here, actually the only "interesting" case is when a test fails, and with random behavior added in, you might not be able to reproduce the error easily. Also, simpler solutions are often better, and easier to maintain: using a constant is certainly simpler than using random numbers.
Of course if you go with constant values, you could put these values in static final fields, but you could also put them in methods, with names such as arbitraryInt() (returning e.g. 0) and so on. I find the syntax with methods cleaner than with constants, as it resembles Mockito's any() matchers. It also allows you to replace the behavior more easily in case you need to add more complexity later on.
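A minimal sketch of that method-based variant (the class and method names are illustrative):
public final class Arbitrary {

    // Fixed, boring values: the method names document that the caller does not care.
    public static int arbitraryInt() {
        return 0;
    }

    public static String arbitraryString() {
        return "arbitrary";
    }

    private Arbitrary() {
    }
}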
In case you want to indicate that a parameter doesn't matter and the parameter is an object (not primitive type), you can also pass empty mocks, like so: someMethod(null, mock(MyClass.class)). This conveys to a person reading the code that the second parameter can be "anything", since a newly created mock has only very basic behavior. It also doesn't force you to create your own methods for returning "arbitrary" values. The downside is it doesn't work for primitive types or for classes which can't be mocked, e.g. final classes like String.
OK... I see a big problem with your approach!
The other value doesn't matter? Who guarantees this? The writer of the test, the writer of the code? What if you have a method which throws some unrelated exception if the first parameter is exactly 1000000, even if the second parameter is null?
You have to formulate your test cases: what is the test specification... What do you want to prove? Is it:
In some cases, if the first parameter is some arbitrary value and the second is null, this method should throw a NullPointerException
For any possible first input value, if the second value is null the method should always throw a NullPointerException
If you want to test the first case, your approach is OK. Use a constant, a random value, a builder... whatever you like.
But if your specification actually requires the second condition, none of your presented solutions are up to the task, since they only test some arbitrary value. A good test should still be valid if the programmer changes some code in the method. This means the right way to test this method would be a whole series of test cases, testing all corner cases as with all other methods. So each critical value which can lead to a different execution path should be checked - or you need a test suite which checks for code-path completeness...
Otherwise your test is just bogus and there to look pretty...

How do I deal with excessive arguments in constructors?

I am in the process of making an application which is meant to be a personal pet project of mine, designed around comparing two vehicles against one another from a computer game I play.
The vehicles have a bunch of stats such as speed, health points, turret traversal, etc., and I want to create a small app that will highlight the differences between the vehicles. However, I have come to a stumbling block: the argument lists of the constructors for each vehicle are huge and difficult to read.
Here is an example of object creation with the given constructor:
HeavyTank T110E5 = new HeavyTank("T110E5", 2200,54.56d, 875, 37, 30, 254,76,38, 203,127,70,300, 202, 6,32, 400,745,10);
I am sure you can guess that this application I am making is a tank comparer based off World of Tanks, where I am hard-coding the tank stats. But as you can see, the arguments taken are difficult to read, making it difficult to create new objects without getting confused. Each tank has different stats, so this means I would have to hard-code close to 100+ tanks individually. If someone here has a solution for reducing the mess, or recommendations, I am willing to listen.
I would also like to reiterate a point I made at the top: this application is not for commercial purposes and is purely just a personal pet project of mine.
I would suggest storing your tank data in a simple file format, for example a CSV file with one line per tank. Then your tank object can take something like an InputStream as a parameter and stream in the tank details from the file.
And if you use a CSV file you can just use a typical spreadsheet program to edit your tank details quite easily.
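A rough sketch of that idea, one tank per line. The column order and the stat names (hit points, top speed) are guesses at what the numbers in the question mean, and for brevity the sketch assumes a hypothetical reduced HeavyTank(String, int, double) constructor; the real code would parse and pass the full stat list the same way.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TankCsvLoader {

    public static List<HeavyTank> load(String fileName) throws IOException {
        List<HeavyTank> tanks = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");
                String name = cols[0];
                int hitPoints = Integer.parseInt(cols[1].trim());
                double topSpeed = Double.parseDouble(cols[2].trim());
                // ... parse the remaining columns the same way ...
                tanks.add(new HeavyTank(name, hitPoints, topSpeed)); // real code would pass the remaining stats too
            }
        }
        return tanks;
    }
}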
This sounds like a perfect use case for the builder pattern.
There are lots of good examples on how to implement this. Here's how Google uses it in the CacheBuilder class of Guava:
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
        .maximumSize(10000)
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .removalListener(MY_LISTENER)
        .build(
                new CacheLoader<Key, Graph>() {
                    public Graph load(Key key) throws AnyException {
                        return createExpensiveGraph(key);
                    }
                });
This particular example even uses method chaining, but that's less important. The main feature is that newBuilder() sets some default parameters which you can then selectively adjust with one or more method calls. Finally, you call build() to actually create the instance (and calling the actual constructor with the long list of parameters).
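Applied to the tank example, the call site could look something like this. The builder method names are hypothetical guesses at what the numbers in the question mean; the Builder class itself would follow the same pattern as the NutritionalFacts example further down.
HeavyTank t110e5 = new HeavyTank.Builder("T110E5")   // the name is the one required argument
        .hitPoints(2200)
        .topSpeed(54.56)
        .hullArmor(254, 76, 38)
        .viewRange(400)
        .build();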
JavaDoc + good IDE will help you figure out which arguments you need to provide.
Otherwise, you could wrap the arguments in a single object and pass that to the constructor. An overloaded constructor would work well, because you could unpack the arguments and pass it to the original.
If you know the arguments in advance, you could always put them in a properties file and read them using java.util.Properties.
Use a fluent interface. Though generally used outside the constructor, why not partly inside?
You might consider putting that information in a properties file that you could then easily edit. Your constructor would then take the name of the properties file and have the additional job of enforcing completeness and validation. While I still favor strongly-typed constructors, a property file approach might be easier to use in the long run.
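A small sketch of the java.util.Properties approach, here done in a loader method rather than in the constructor itself. The file name and keys are made up, and it again assumes a hypothetical reduced HeavyTank(String, int, double) constructor for brevity:
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

public class TankPropertiesLoader {

    public static HeavyTank load(String fileName) throws IOException {
        Properties props = new Properties();
        try (Reader in = new FileReader(fileName)) {
            props.load(in);
        }
        // completeness and validation checks would go here, as suggested above
        return new HeavyTank(
                props.getProperty("name"),
                Integer.parseInt(props.getProperty("hitPoints")),
                Double.parseDouble(props.getProperty("topSpeed"))); // plus the remaining stats in real code
    }
}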
If you have a mix of required and optional arguments (with sensible defaults), you can use the builder pattern. An example from Effective Java:
public class NutritionalFacts {
    private int sodium;
    private int fat;
    private int carbo;

    public static class Builder {
        private int sodium;
        private int fat;
        private int carbo;

        public Builder(int s) {
            this.sodium = s;
        }

        public Builder fat(int f) {
            this.fat = f;
            return this;
        }

        public Builder carbo(int c) {
            this.carbo = c;
            return this;
        }

        public NutritionalFacts build() {
            return new NutritionalFacts(this);
        }
    }

    private NutritionalFacts(Builder b) {
        this.sodium = b.sodium;
        this.fat = b.fat;
        this.carbo = b.carbo;
    }
}
Then you can do things like:
NutritionalFacts cocaCola = new NutritionalFacts.Builder(35).fat(0).carbo(27).build();
The first argument (sodium) is required. The other attributes are optional and can be supplied through aptly-named methods.
You can still use the builder pattern if you have a large number of required parameters; you can check that they have been set in the build() method.
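For example, build() could verify required values before constructing the instance. This sketch assumes a variant of the builder above where sodium is set through a setter instead of the constructor, using a nullable boxed field as a crude "was it set?" marker; purely illustrative:
public static class Builder {
    private Integer sodium;   // required; null means "not set yet"
    private int fat;
    private int carbo;

    public Builder sodium(int s) { this.sodium = s; return this; }
    public Builder fat(int f) { this.fat = f; return this; }
    public Builder carbo(int c) { this.carbo = c; return this; }

    public NutritionalFacts build() {
        if (sodium == null) {
            throw new IllegalStateException("sodium is required");
        }
        return new NutritionalFacts(this);
    }
}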
I recommend implementing a standard constructor without any parameters. Setters and getters can be generated via Eclipse. This helps you keep an overview of all members and makes adding or removing them easier.
The next thing you should do is write all your vehicle data into an XML file. The tags help you with the vehicle names, values, etc. Adding and removing vehicles will also be easier.
You can access XML files via JDOM. Just read the vehicle tags and call the setter methods with the values from the XML file. This can be done with a loop.
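A sketch of that loop with JDOM 2; the XML element names and the setter names are assumptions about how the file and the vehicle class might look:
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;

public class VehicleXmlLoader {

    public static List<HeavyTank> load(File xmlFile) throws Exception {
        Document doc = new SAXBuilder().build(xmlFile);
        List<HeavyTank> tanks = new ArrayList<>();
        for (Element vehicle : doc.getRootElement().getChildren("vehicle")) {
            HeavyTank tank = new HeavyTank();   // no-arg constructor, as suggested above
            tank.setName(vehicle.getChildText("name"));
            tank.setHitPoints(Integer.parseInt(vehicle.getChildText("hitPoints")));
            tank.setTopSpeed(Double.parseDouble(vehicle.getChildText("topSpeed")));
            // ... one setter call per stat ...
            tanks.add(tank);
        }
        return tanks;
    }
}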
In the end, the only thing you have to take care of is your vehicle XML file.
Feel free to ask if there are any questions.
You have three reasonable options that I see:
1) Use the standard constructor("string", value, "string", value, ...) format.
This is likely more effort than you will want to put into this program.
2) Use a constructor() method and then setPropertyName(value) methods for each property.
This will allow you to easily add them, but you still need to do it by hand.
3) Read the data in from a file.
Set up the file using columns with names (to avoid just moving the "I forgot what value comes next" problem to a different place). This should be the best option.
My suggestion would be to create an object to hold these types with a specific parameter list upon initialization. You can divide up that object based on what types go together logically. Then you'll only need two or three constructors with five parameters each. For example:
HeavyTank T110E5 = new HeavyTank("T110E5", 2200,54.56d, 875, 37, 30, 254,76,38, 203,127,70,300, 202, 6,32, 400,745,10);
would become:
StatObject1 stat1 = new StatObject1(2200, 54.56d);
StatObject2 stat2 = new StatObject2(875, 37, 30);
StatObject3 stat3 = new StatObject3(254, 76, 38);
...
HeavyTank T110E5 = new HeavyTank("T110E5", stat1, stat2, stat3, ...);
or however that works out logically for your system. The benefit of this is that you can apply this to any other tank you create, and can create modifiers in every StatObject class specific to that data.

Using { } to segment large blocks of code to improve code-readability - Good practice?

I'm considering the option of using anonymous { } code blocks to logically separate "code blocks" inside the same method, something that (theoretically) should improve the readability of the code.
I'm wondering which of the following 2 code segments is better to your eyes?
Also, do the two code segments compile to the same bytecode? In other words, can using { } hurt the performance of the code in any way?
Option 1: Code block without { } indentation
public static String serviceMatch(HttpServletRequest servletRequest, RequestTypeEnum requestTypeEnum, ...censored..., RequestStatistics requestStatistics) {
    Request request;

    // We get the parser that fits the ...censored..., effectively transforming the HttpServletRequest to the application-local "Request*" object
    RequestParser parser = RequestParserFactory.getParser(...censored...);

    // Populate basic parameters, the "heavy" data will be lazy loaded
    request = parser.parse(servletRequest);

    // Instead of polluting the parsers let's put it here... (unless we identify meaningful justifications for the other alternative of changing the RequestParser.parse() interface)
    request.requestType = requestTypeEnum;

    // Store the request statistics object on the request, so that we have access to it from all over the code
    request.requestStatistics = requestStatistics;

    // Update timestamp when request was parsed
    request.requestStatistics._1_end_parseRequest = System.currentTimeMillis();

    /*
     * ...censored...
     */
    MatchResult matchResult = Matcher.findMatch(...censored...);

    /*
     * ...censored...
     */
    String reply = ReplyFormatFactory.getFormatter(...censored...

    // Update timestamp when reply finished construction
    request.requestStatistics._6_end_formatReply = System.currentTimeMillis();

    return reply;
}
Option 2: Code block with { } indentation
public static String serviceMatch(HttpServletRequest servletRequest, RequestTypeEnum requestTypeEnum, ...censored..., RequestStatistics requestStatistics) {
    Request request;

    /*
     * Request parsing block
     */
    {
        // We get the parser that fits the ...censored..., effectively transforming the HttpServletRequest to the application-local "Request*" object
        RequestParser parser = RequestParserFactory.getParser(...censored...);

        // Populate basic parameters, the "heavy" data will be lazy loaded
        request = parser.parse(servletRequest);

        // Instead of polluting the parsers let's put it here... (unless we identify meaningful justifications for the other alternative of changing the RequestParser.parse() interface)
        request.requestType = requestTypeEnum;

        // Store the request statistics object on the request, so that we have access to it from all over the code
        request.requestStatistics = requestStatistics;
    }

    // Update timestamp when request was parsed
    request.requestStatistics._1_end_parseRequest = System.currentTimeMillis();

    /*
     * ...censored...
     */
    MatchResult matchResult = Matcher.findMatch(...censored...);

    /*
     * ...censored...
     */
    String reply = ReplyFormatFactory.getFormatter(...censored...

    // Update timestamp when reply finished construction
    request.requestStatistics._6_end_formatReply = System.currentTimeMillis();

    return reply;
}
Thanks for the review, Maxim.
If you're looking into adding extra { }'s within the same method just for the sake of readability, my advice would be to consider refactoring your method into several smaller methods. These smaller methods have the advantage of being easier to understand by themselves, and being more reusable (if they are "loosely coupled"). See the single responsibility principle.
If you come to the state that it would be handy to put the brackets around some part of code (like in Option 2), you should move it to its own method. That's what improves readability.
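Using the code from the question, the "request parsing block" from Option 2 would become something like this (keeping the ...censored... placeholders as-is):
// Extracted from serviceMatch(); the ...censored... parts stay whatever they are in the original
private static Request parseRequest(HttpServletRequest servletRequest,
                                     RequestTypeEnum requestTypeEnum,
                                     RequestStatistics requestStatistics) {
    RequestParser parser = RequestParserFactory.getParser(/* ...censored... */);
    Request request = parser.parse(servletRequest);
    request.requestType = requestTypeEnum;
    request.requestStatistics = requestStatistics;
    return request;
}
serviceMatch() then simply starts with request = parseRequest(servletRequest, requestTypeEnum, requestStatistics); and the braces disappear.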
By the way, I also think you don't really need to comment every single line of your code. For example the timestamp update is self-explanatory even without the comment.
I don't generally add a brace-delimited block without some syntactic reason, but if a variable will only be needed within a limited scope, I'd rather create a nested scope than define the variable in the middle of a larger one (since in the latter case there's no clear indication of where the variable goes out of 'useful' scope).
As for pulling out such a code block into another method, I think it's a good idea if the resulting method both (1) has a reasonable batch of parameters, and (2) can be given a name that describes its behavior as well as the actual code does. If using the method would require passing an excessive number of parameters, or if one would have to look at the code in the method to understand what its caller is doing, then I think it's better to use an anonymous scoping block.
I think this is a bit subjective; there is no right or wrong answer... My opinion is: don't do it. Separate blocks of code with comment blocks that precede and explain why they are different, but don't use the braces. When I see braces, I immediately think there should be a leading if, while, or something... and not finding one is a little weird.
You should probably use separate methods instead. You can call the first block processRequest. Anyone who reads this code will be able to see which parameters are used, what data is returned, what it does (even without comments). Blocks don't provide such information.
Bytecode will likely be the same.
I sometimes prefer to use the second option. That happens when extracting separate methods would lead to a mess with multiple return parameters (that is, wrapping them in an artificial object).
Lighttpd has comment blocks in its configuration file, made in this style:
#{{{ module name
module.option = value;
module.option = value;
#}}}
So you can just comment instead of {}'ing your code.
In Perl, anything inside { }, sub { } or eval { } will be evaluated; however, keeping a large number of { } blocks inside some subroutine is considered bad enough to push the code out into smaller parts:
$html .= eval { $val = &getNextPiece(); return $val; };
So the practice is known.
Braces are usually used to group statements for control structures and the like. I find them jarring when used for anything else.
If I have an overlong function that (for whatever reason) I don't want to split up, I break it apart into blocks with comments.
Braces { } have their purpose (even more in Java 7) and I think they are rarely used just for readability. Personally, if they are used like in Option 2, the first thing that comes to my mind is, "Is this a static block?". Hence, I find Option 1 "more normal" and readable.
If you are really keen on sticking with one method and not refactoring this chunk of code as suggested by many here, then use comments as line separators instead. Something like:
/* -------------------------------------------- */
/* describe in detail here why you don't want to put this in another method */
/* so other readers will know why! */

// We get the parser that fits the ...censored..., effectively transforming the HttpServletRequest to the application-local "Request*" object
RequestParser parser = RequestParserFactory.getParser(...censored...);

// Populate basic parameters, the "heavy" data will be lazy loaded
request = parser.parse(servletRequest);

// Instead of polluting the parsers let's put it here... (unless we identify meaningful justifications for the other alternative of changing the RequestParser.parse() interface)
request.requestType = requestTypeEnum;

// Store the request statistics object on the request, so that we have access to it from all over the code
request.requestStatistics = requestStatistics;

/* -------- END of confusing block ------------- */
IMHO, comments are probably the best way to make code readable.
If you're developing in C#, I would advise you to use #region ... #endregion instead for readability purposes.

How can I simplify testing of side-effect free methods in Java?

Functions (side-effect free ones) are such a fundamental building block, but I don't know of a satisfying way of testing them in Java.
I'm looking for pointers to tricks that make testing them easier. Here's an example of what I want:
public void setUp() {
    myObj = new MyObject(...);
}

// This is sooo 2009 and not what I want to write:
public void testThatSomeInputGivesExpectedOutput() {
    assertEquals(expectedOutput, myObj.myFunction(someInput));
    assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput));
    // I don't want to repeat/write the following checks to see
    // that myFunction is behaving functionally.
    assertEquals(expectedOutput, myObj.myFunction(someInput));
    assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput));
}

// The following two tests are more in the spirit of what I'd like
// to write, but they don't test that myFunction is functional:
public void testThatSomeInputGivesExpectedOutput() {
    assertEquals(expectedOutput, myObj.myFunction(someInput));
}

public void testThatSomeOtherInputGivesExpectedOutput() {
    assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput));
}
I'm looking for some annotation I can put on the test(s), MyObject or myFunction to make the test framework automatically repeat invocations to myFunction in all possible permutations for the given input/output combinations I've given, or some subset of the possible permutations in order to prove that the function is functional.
For example, above the (only) two possible permutations are:
myObj = new MyObject();
myObj.myFunction(someInput);
myObj.myFunction(someOtherInput);
and:
myObj = new MyObject();
myObj.myFunction(someOtherInput);
myObj.myFunction(someInput);
I should be able to only provide the input/output pairs (someInput, expectedOutput), and (someOtherInput, someOtherOutput), and the framework should do the rest.
I haven't used QuickCheck, but it seems like a non-solution. It is documented as a generator. I'm not looking for a way to generate inputs to my function, but rather a framework that lets me declaratively specify what part of my object is side-effect free and invoke my input/output specification using some permutation based on that declaration.
Update: I'm not looking to verify that nothing changes in the object, a memoizing function is a typical use-case for this kind of testing, and a memoizer actually changes its internal state. However, the output given some input always stays the same.
If you are trying to test that the functions are side-effect free, then calling them with random arguments isn't really going to cut it. The same applies to a random sequence of calls with known arguments, or a pseudo-random one with random or fixed seeds. There's a good chance that a (harmful) side-effect will only occur for some sequence of calls that your randomizer doesn't happen to select.
There is also a chance that the side-effects won't actually be visible in the outputs of any of the calls that you are making ... no matter what the inputs are. The side-effects could be on some other related objects that you didn't think to examine.
If you want to test this kind of thing, you really need to implement a "white-box" test where you look at the code and try and figure out what might cause (unwanted) side-effects and create test cases based on that knowledge. But I think that a better approach is careful manual code inspection, or using an automated static code analyser ... if you can find one that would do the job for you.
OTOH, if you already know that the functions are side-effect free, implementing randomized tests "just in case" is a bit of a waste of time, IMO.
I'm not quite sure I understand what you are asking, but it seems like JUnit Theories (http://junit.sourceforge.net/doc/ReleaseNotes4.4.html#theories) could be an answer.
In this example, you could create a Map of key/value pairs (input/output) and call the method under test several times with values picked from the map. This will not prove that the method is functional, but it will increase the probability - which might be sufficient.
Here's a quick example of such an additional probably-functional test:
@Test
public void probablyFunctionalTestForMethodX() {
    Map<Object, Object> inputOutputMap = initMap(); // this loads the input/output values
    for (int i = 0; i < maxIterations; i++) {
        Map.Entry test = pickAtRandom(inputOutputMap); // this picks a map entry randomly
        assertEquals(test.getValue(), myObj.myFunction(test.getKey()));
    }
}
Problems with a higher complexity could be solved based on the Command pattern: You could wrap the test methods in command objects, add the command object to a list, shuffle the list and execute the commands (= the embedded tests) according to that list.
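A compact sketch of that command-pattern idea inside a test method, reusing the names from the question (assumes Java 8+, java.util imports and JUnit's assertEquals are available):
// wrap each assertion as a command, shuffle, then run them all
List<Runnable> commands = new ArrayList<>();
commands.add(() -> assertEquals(expectedOutput, myObj.myFunction(someInput)));
commands.add(() -> assertEquals(expectedOtherOutput, myObj.myFunction(someOtherInput)));

Collections.shuffle(commands);   // a random order of the embedded tests
commands.forEach(Runnable::run);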
It sounds like you're attempting to test that invoking a particular method on a class doesn't modify any of its fields. This is a somewhat odd test case, but it's entirely possible to write a clear test for it. For other "side effects", like invoking other external methods, it's a bit harder. You could replace local references with test stubs and verify that they weren't invoked, but you still won't catch static method calls this way. Still, it's trivial to verify by inspection that you're not doing anything like that in your code, and sometimes that has to be good enough.
Here's one way to test that there are no side effects in a call:
public void test_MyFunction_hasNoSideEffects() {
    MyClass systemUnderTest = makeMyClass();
    MyClass copyOfOriginalState = systemUnderTest.clone();

    systemUnderTest.myFunction();

    assertEquals(systemUnderTest, copyOfOriginalState); // Test equals() method elsewhere
}
It's somewhat unusual to try to prove that a method is truly side effect free. Unit tests generally attempt to prove that a method behaves correctly and according to contract, but they're not meant to replace examining the code. It's generally a pretty easy exercise to check whether a method has any possible side effects. If your method never sets a field's value and never calls any non-functional methods, then it's functional.
Testing this at runtime is tricky. What might be more useful would be some sort of static analysis. Perhaps you could create a @Functional annotation, then write a program that would examine the classes of your program for such methods and check that they only invoke other @Functional methods and never assign to fields.
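The annotation itself is trivial; the checker would be the real work. A minimal sketch, with retention chosen so a bytecode- or compile-time tool can see it:
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Marker for methods that are intended to be side-effect free. */
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
public @interface Functional {
}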
Randomly googling around, I found somebody's master's thesis on exactly this topic. Perhaps he has working code available.
Still, I will repeat that it is my advice that you focus your attention elsewhere. While you CAN mostly prove that a method has no side effects at all, it may be better in many cases to quickly verify this by visual inspection and focus the remainder of your time on other, more basic tests.
Have a look at http://fitnesse.org/: it is often used for acceptance tests, but I found it is an easy way to run the same tests against a huge amount of data.
In JUnit you can write your own test runner. This code is not tested (I'm not sure if methods which take arguments will be recognized as test methods; maybe some more runner setup is needed?):
public class MyRunner extends BlockJUnit4ClassRunner {

    @Override
    protected Statement methodInvoker(final FrameworkMethod method, final Object test) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                Iterable<Object[]> permutations = getPermutations();
                for (Object[] permutation : permutations) {
                    method.invokeExplosively(test, permutation[0], permutation[1]);
                }
            }
        };
    }
}
It should only be a matter of providing a getPermutations() implementation. For example, it could take data from some List<Object[]> field annotated with a custom annotation and produce all the permutations.
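One way getPermutations() could be filled in, reading the pairs from a field marked with a custom annotation. All the names here are made up; the signature is adjusted to receive the test instance (which evaluate() has in scope), and for brevity this sketch returns one shuffled ordering rather than generating every permutation:
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Retention(RetentionPolicy.RUNTIME)
@interface InputOutputPairs {
}

// inside the runner
@SuppressWarnings("unchecked")
private Iterable<Object[]> getPermutations(Object test) throws IllegalAccessException {
    for (Field field : test.getClass().getDeclaredFields()) {
        if (field.isAnnotationPresent(InputOutputPairs.class)) {
            field.setAccessible(true);
            List<Object[]> pairs = new ArrayList<>((List<Object[]>) field.get(test));
            Collections.shuffle(pairs); // simplification: one random ordering, not all permutations
            return pairs;
        }
    }
    return Collections.emptyList();
}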
I think the term you're missing is "parameterized tests". However, it seems to be more tedious in JUnit than in the .NET flavor. In NUnit, the following test executes 6 times, with all combinations.
[Test]
public void MyTest(
[Values(1,2,3)] int x,
[Values("A","B")] string s)
{
...
}
For Java, your options seem to be:
JUnit supports this with version 4. However it's a lot of code (it seems JUnit is adamant about test methods not taking parameters). This is the least invasive option.
DDSteps, a JUnit plugin. See this video; it takes values from an appropriately named Excel spreadsheet. You also need to write a mapper/fixture class that maps values from the spreadsheet into members of the fixture class, which are then used to invoke the SUT.
Finally, you have Fit/FitNesse. It's as good as DDSteps, except for the fact that the input data is in HTML/wiki form. You can paste from an Excel sheet into FitNesse and it formats it correctly at the push of a button. You need to write a fixture class here too.
I'm afraid I can't find the link anymore, but JUnit 4 has some helper functions to generate test data. It's something like:
// with JUnit 4's Parameterized runner
@Parameters
public static Collection<Object[]> testData() {
    return Arrays.asList(new Object[][] {
            { 2, 3, 4 },
            { 3, 4, 5 }
    });
}
JUnit will then test your methods with this data. But as I said, I can't find the link anymore (I forgot the keywords) for a detailed (and correct) example.

Application design for processing data prior to database

I have a large collection of data in an Excel file (and CSV files). The data needs to be placed into a database (MySQL). However, before it goes into the database it needs to be processed. For example, if column 1 is less than column 3, add 4 to column 2. There are quite a few rules that must be followed before the information is persisted.
What would be a good design to follow to accomplish this task? (using java)
Additional notes
The process needs to be automated, in the sense that I don't have to manually go in and alter the data. We're talking about thousands of lines of data with 15 columns of information per line.
Currently, I have a sort of chain-of-responsibility design set up: one class (Java) for each rule. When one rule is done, it calls the following rule.
More Info
Typically there are about 5000 rows per data sheet. Speed isn't a huge concern because this large input doesn't happen often.
I've considered Drools, however I wasn't sure the task was complicated enough for Drools.
Example rules:
All currency (data in specific columns) must not contain currency symbols.
Category names must be uniform (e.g. book case = bookcase)
Entry dates can not be future dates
Text input can only contain [A-Z 0-9 \s]
etc..
Additionally, if any column of information is invalid it needs to be reported when processing is complete (or processing should maybe stop).
My current solution works. However, I think there is room for improvement, so I'm looking for ideas as to how it can be improved and/or how other people have handled similar situations.
I've considered (very briefly) using Drools, but I wasn't sure the work was complicated enough to take advantage of it.
If I didn't care to do this in one step (as Oli mentions), I'd probably use a pipes-and-filters design. Since your rules are relatively simple, I'd probably do a couple of delegate-based classes. For instance (C# code, but Java should be pretty similar... perhaps someone could translate? A Java sketch follows below):
interface IFilter {
    IEnumerable<string> Filter(IEnumerable<string> file);
}

class PredicateFilter : IFilter {
    public PredicateFilter(Predicate<string> predicate) { this.Predicate = predicate; }
    public Predicate<string> Predicate { get; private set; }
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            if (this.Predicate(s)) {
                yield return s;
            }
        }
    }
}

class ActionFilter : IFilter {
    public ActionFilter(Action<string> action) { this.Action = action; }
    public Action<string> Action { get; private set; }
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            this.Action(s);
            yield return s;
        }
    }
}

class ReplaceFilter : IFilter {
    public ReplaceFilter(Func<string, string> replace) { this.Replace = replace; }
    public Func<string, string> Replace { get; private set; }
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            yield return this.Replace(s);
        }
    }
}
From there, you could either use the delegate filters directly, or subclass them for the specifics. Then, register them with a Pipeline that will pass them through each filter.
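Since the answer invites a translation: a rough Java equivalent using java.util.function, with two of the rules from the question as example filters (the rule expressions and sample rows are just illustrations, and the ActionFilter would follow the same pattern):
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

interface Filter {
    List<String> apply(List<String> lines);
}

class PredicateFilter implements Filter {
    private final Predicate<String> predicate;
    PredicateFilter(Predicate<String> predicate) { this.predicate = predicate; }
    @Override
    public List<String> apply(List<String> lines) {
        return lines.stream().filter(predicate).collect(Collectors.toList());
    }
}

class ReplaceFilter implements Filter {
    private final Function<String, String> replace;
    ReplaceFilter(Function<String, String> replace) { this.replace = replace; }
    @Override
    public List<String> apply(List<String> lines) {
        return lines.stream().map(replace).collect(Collectors.toList());
    }
}

class Pipeline {
    static List<String> run(List<String> lines, List<Filter> filters) {
        for (Filter filter : filters) {
            lines = filter.apply(lines);   // each filter feeds the next
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("12,$30,BOOK CASE", "7,40,BOOKCASE");
        List<String> cleaned = Pipeline.run(rows, Arrays.asList(
                new ReplaceFilter(row -> row.replace("$", "")),                // strip currency symbols
                new ReplaceFilter(row -> row.replace("BOOK CASE", "BOOKCASE")) // unify category names
        ));
        System.out.println(cleaned);
    }
}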
I think your method is OK, especially if you use the same interface on every processor.
You could also look at something called Drools, currently JBoss Rules. I used that some time ago for a rule-heavy part of my app, and what I liked about it is that the business logic can be expressed in, for instance, a spreadsheet or a DSL which then gets compiled to Java (at run time, and I think there's also a compile-time option). It makes rules a bit more succinct and thus readable. It's also very easy to learn (2 days or so).
Here's a link to the open-source JBoss Rules. At jboss.com you can undoubtedly purchase an officially maintained version if that's more to your company's taste.
Just create a function to enforce each rule, and call every applicable function for each value. I don't see how this requires any exotic architecture.
A class for each rule? Really? Perhaps I'm not understanding the quantity or complexity of these rules, but I would (semi-pseudo-code):
public class ALine {
    private int col1;
    private int col2;
    private int coln;
    // ...

    public ALine(string line) {
        // read row into private variables
        // ...
        this.Process();
        this.Insert();
    }

    public void Process() {
        // do all your rules here working with the local variables
    }

    public void Insert() {
        // write to DB
    }
}

foreach line in csv
    new ALine(line);
Your methodology of using classes for each rule does sound a bit heavyweight, but it has the advantage of being easy to modify and expand should new rules come along.
As for loading the data, bulk loading is the way to go. I have read some information which suggests it may be as much as 3 orders of magnitude faster than loading using insert statements. You can find some information on it here.
Bulk load the data into a temp table, then use SQL to apply your rules.
Use the temp table as the basis for the insert into the real table.
Drop the temp table.
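A hedged sketch of what that could look like with JDBC and MySQL. The connection URL, table names, columns and file name are placeholders; the one UPDATE shown is the example rule from the question, and LOAD DATA LOCAL INFILE may need to be enabled on the connection:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoadExample {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost/mydb", "user", "password");
             Statement st = con.createStatement()) {

            st.execute("CREATE TEMPORARY TABLE staging LIKE target_table");
            st.execute("LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE staging"
                     + " FIELDS TERMINATED BY ',' IGNORE 1 LINES");

            // example rule: if column 1 is less than column 3, add 4 to column 2
            st.executeUpdate("UPDATE staging SET col2 = col2 + 4 WHERE col1 < col3");

            st.executeUpdate("INSERT INTO target_table SELECT * FROM staging");
            st.execute("DROP TEMPORARY TABLE staging");
        }
    }
}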
You can see that all the different answers are coming from the answerers' own experience and perspective.
Since we don't know much about the complexity and number of rows in your system, we tend to give advice based on what we have done before.
If you want to narrow it down to one or two solutions for your implementation, try giving more details.
Good luck
It may not be what you want to hear, and it isn't the "fun way" by any means, but there is a much easier way to do this.
So long as your data is evaluated line by line... you can set up another worksheet in your Excel file and use spreadsheet-style functions to do the necessary transforms, referencing the data from the raw data sheet. For more complex functions you can use the VBA embedded in Excel to write custom operations.
I've used this approach many times and it works really well; it's just not very sexy.
