Here is my problem (big picture). I have a project which uses large and complicated (by which I mean contains multiple levels of nested structures) Matlab structures. This is predictably slow (especially when trying to load / save). I am attempting to improve runtimes by converting some of these structures into Java Objects. The catch is that the data in these Matlab structures is accessed in a LOT of places, so anything requiring a rewrite of access syntax would be prohibitive. Hence, I need the Java Objects to mimic as closely as possible the behavior of Matlab structures, particularly when it comes to accessing the values stored within them (the values are only set in one place, so the lack of operator overloading in java for setting isn't a factor for consideration).
The problem (small picture) that I am encountering lies with accessing data from an array of these structures. For example,
person(1)
.age = 20
.name
.first = 'John'
.last = 'Smith'
person(2)
.age = 25
.name
.first = 'Jane'
.last = 'Doe'
Matlab will allow you to do the following,
>>age = [person(1:2).age]
age =
20 25
Attempting to accomplish the same with Java,
>>jperson = javaArray('myMatlab.Person', 2);
>>jperson(1) = Person(20, Name('John', 'Smith'));
>>jperson(2) = Person(25, Name('Jane', 'Doe'));
>>age = [jperson(1:2).age]
??? No appropriate method or public field age for class myMatlab.Person[]
Is there any way that I can get the Java object to mimic this behavior?
The first thought I had was to simply extend the Person[] class, but this doesn't appear to be possible because it is final. My second approach was to create a wrapper class containing an ArrayList of Person, however I don't believe this will work either because calling
wrappedPerson(1:2)
would either be interpreted as a constructor call to a wrappedPerson class or an attempt to access elements of a non-existent array of WrappedPerson (since java won't let me override a "()" operator). Any insight would be greatly appreciated.
The code I am using for my java class is
import java.util.ArrayList;

public class Person {
    int _age;
    // Initialized here; otherwise _names.add() in the constructor throws a NullPointerException
    ArrayList<Name> _names = new ArrayList<Name>();
    public Person(int age, Name name) {
        _age = age;
        _names.add(name);
    }
    public int age() {return _age;}
    public void age(int age) {_age = age;}
    public Name[] name() {return _names.toArray(new Name[0]);}
    public void name(Name name) { _names.add(name);}
}
public class Name {
String _first;
String _last;
public Name(String first, String last) {
_first = first;
_last = last;
}
public String first() {return _first;}
public void first(String firstName) {_first = firstName;}
public String last() {return _last;}
public void last(String lastName) {_last = lastName;}
}
TL;DR: It's possible, with some fancy OOP M-code trickery. Altering the behavior of () and . can be done with a Matlab wrapper class that defines subsref on top of your Java wrapper classes. But because of the inherent Matlab-to-Java overhead, it probably won't end up being any faster than normal Matlab code, just a lot more complicated and fussy. Unless you move the logic in to Java as well, this approach probably won't speed things up for you.
I apologize in advance for being long-winded.
Before you go whole hog on this, you might benchmark the performance of Java structures as called from your Matlab code. While Java field access and method calls are much faster on their own than Matlab ones, there is substantial overhead to calling them from M-code, so unless you push a lot of the logic down into Java as well, you might well end up with a net loss in speed. Every time you cross the M-code to Java layer, you pay. Have a look at the benchmark over at this answer: "Is MATLAB OOP slow or am I doing something wrong?" to get an idea of scale. (Full disclosure: that's one of my answers.) It doesn't include Java field access, but it's probably on the order of method calls due to the autoboxing overhead. And if you are coding Java classes as in your example, with getter and setter methods instead of public fields (that is, in "good" Java style), then you will be incurring the cost of Java method calls with each access, and it's going to be bad compared to pure Matlab structures.
All that said, if you wanted to make that x = [foo(1:2).bar] syntax work inside M-code where foo is a Java array, it would basically be possible. The () and . are both evaluated in Matlab before calling to Java. What you could do is define your own custom JavaArrayWrapper class in Matlab OOP corresponding to your Java array wrapper class, and wrap your (possibly wrapped) Java arrays in that. Have it override subsref and subsasgn to handle both () and .. For (), do normal subsetting of the array, returning it wrapped in a JavaArrayWrapper. For the . case:
If the wrapped object is scalar, invoke the Java method as normal.
If the wrapped object is an array, loop over it, invoke the Java method on each element, and collect the results. If the results are Java objects, return them wrapped in a JavaArrayWrapper.
But. Due to the overhead of crossing the Matlab/Java barrier, this would be slow, probably an order of magnitude slower than pure Matlab code.
To get it to work at speed, you could provide a corresponding custom Java class that wraps Java arrays and uses the Java Reflection API to extract the property of each selected array member object and collect them in an array. The key is that when you do a "chained" reference in Matlab like x = foo(1:3).a.b.c and foo is an object, it doesn't do a stepwise evaluation where it evaluates foo(1:3), and then calls .a on the result, and so on. It actually parses the entire (1:3).a.b.c reference, turns that in to a structured argument, and passes the entire thing in to the subsref method of foo, which has responsibility for interpreting the entire chain. The implicit call looks something like this.
x = subsref(foo, [ struct('type','()','subs',{{[1 2 3]}}), ...
struct('type','.', 'subs','a'), ...
struct('type','.', 'subs','b'), ...
struct('type','.', 'subs','c') ] )
So, given that you have access to the entire reference "chain" up front, if foo were an M-code wrapper class that defined subsref, you could convert that entire reference to a Java argument and pass it in a single method call to your Java wrapper class, which would then use Java Reflection to dynamically go through the wrapped array, select the referenced elements, and do the chained references, all inside the Java layer. E.g. it would call getNestedFields() in a Java class like this.
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class DynamicFieldAccessArrayWrapper {
    private final List<Object> _wrappedArray = new ArrayList<Object>();

    // selectedIndexes are 1-based (Matlab style); null selects every element
    public Object[] getNestedFields(int[] selectedIndexes, String[] fieldPath)
            throws NoSuchFieldException, IllegalAccessException {
        List<Object> result = new ArrayList<Object>();
        if (selectedIndexes == null) {
            selectedIndexes = new int[_wrappedArray.size()];
            for (int i = 0; i < selectedIndexes.length; i++) {
                selectedIndexes[i] = i + 1;
            }
        }
        for (int ix : selectedIndexes) {
            Object val = _wrappedArray.get(ix - 1);
            for (String fieldName : fieldPath) {
                Field field = val.getClass().getField(fieldName); // public fields only
                val = field.get(val);
            }
            result.add(val);
        }
        return result.toArray(); // Return as array so Matlab can auto-unbox it; will need more type detection to get array type right
    }
}
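The class above reads public fields, while the Person class in the question exposes accessor methods such as age() and first(). A rough variant that follows a chain of zero-argument methods instead might look like the sketch below. The class name MethodChainAccessor is made up for illustration, and Person and Name here are simplified stand-ins for the question's classes (one Name per Person), not the original code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the question's classes (assumption: one Name per Person).
class Name {
    private final String first;
    private final String last;
    Name(String first, String last) { this.first = first; this.last = last; }
    public String first() { return first; }
    public String last() { return last; }
}

class Person {
    private final int age;
    private final Name name;
    Person(int age, Name name) { this.age = age; this.name = name; }
    public int age() { return age; }
    public Name name() { return name; }
}

// Hypothetical helper: follows a chain of public zero-argument methods on each element.
class MethodChainAccessor {
    static Object[] getNested(Object[] elements, String... methodPath) {
        List<Object> result = new ArrayList<Object>();
        for (Object element : elements) {
            Object val = element;
            for (String methodName : methodPath) {
                try {
                    Method m = val.getClass().getMethod(methodName);
                    m.setAccessible(true); // the declaring class itself may not be public
                    val = m.invoke(val);   // primitive results come back boxed
                } catch (Exception e) {
                    throw new IllegalArgumentException("cannot call " + methodName + "()", e);
                }
            }
            result.add(val);
        }
        return result.toArray();
    }
}
```

With an array of two Person objects, getNested(people, "name", "first") would collect the two first names in a single call from the Matlab side.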
Then your M-code wrapper class would examine the result and decide whether it was primitive-ish and should be returned as a Matlab array or comma-separated list (i.e. multiple argouts, which get collected with [...]), or should be wrapped in another JavaArrayWrapper M-code object.
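Part of that "is it primitive-ish?" decision could also be pushed into the Java layer. As a minimal sketch (ResultPacker and pack are hypothetical names, not part of any Matlab or Java API), the Java side could hand back a double[] whenever every collected value is a Number, and leave the Object[] alone otherwise:

```java
// Hypothetical helper: packs all-numeric results into a primitive array.
class ResultPacker {
    // Returns a double[] when every element is a Number,
    // otherwise returns the original Object[] unchanged.
    static Object pack(Object[] values) {
        if (values.length == 0) {
            return values;
        }
        for (Object v : values) {
            if (!(v instanceof Number)) {
                return values; // mixed or non-numeric content: leave as objects
            }
        }
        double[] packed = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            packed[i] = ((Number) values[i]).doubleValue();
        }
        return packed;
    }
}
```

The M-code wrapper would then only need to check whether it got a numeric array back or something that still needs wrapping.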
The M-code wrapper class would look something like this.
classdef MyMJavaArrayWrapper < handle
% Inherit from handle because Java objects are reference-y
properties
jWrappedArray % holds a DynamicFieldAccessArrayWrapper
end
methods
function varargout = subsref(obj, s)
if isequal(s(1).type, '()')
indices = s(1).subs{1}; % subs is a cell array; take the index vector out of it
s(1) = [];
else
indices = [];
end
% TODO: check for unsupported indexing types in remaining s
fieldNameChain = parseFieldNamesFromArgs(s);
out = getNestedFields( obj.jWrappedArray, indices, fieldNameChain );
varargout = unpackResultsAndConvertIfNeeded(out);
end
end
end
The overhead involved in marshalling and unmarshalling the values for the subsref call would probably overwhelm any speed gain from the Java bits.
You could probably eliminate that overhead by replacing your M-code implementation of subsref with a MEX implementation that does the structure marshalling and unmarshalling in C, using JNI to build the Java objects, call getNestedFields, and convert the result to Matlab structures. This is way beyond what I could give an example for.
If this looks a bit horrifying to you, I totally agree. You're bumping up against the edges of the language here, and trying to extend the language (especially to provide new syntactic behavior) from userland is really hard. I wouldn't seriously do something like this in production code; just trying to outline the area of the problem you're looking around.
Are you dealing with homogeneous arrays of these deeply nested structures? Maybe it would be possible to convert them to "planar organized" structures, where instead of an array of structs with scalar fields, you have a scalar struct with array fields. Then you can do vectorized operations on them in pure M-code. This would make things a lot faster, especially with save and load, where the overhead scales per mxarray.
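The same planar idea carries over if some of the data does end up in Java: one object holding an array per field, rather than an array of objects. A rough sketch, with PeopleTable as a made-up name and the age/first/last fields taken from the example in the question:

```java
// Hypothetical "planar" container: one array per field, row i describes person i.
class PeopleTable {
    private final int[] age;
    private final String[] first;
    private final String[] last;

    PeopleTable(int[] age, String[] first, String[] last) {
        if (age.length != first.length || age.length != last.length) {
            throw new IllegalArgumentException("all columns need the same length");
        }
        this.age = age.clone();
        this.first = first.clone();
        this.last = last.clone();
    }

    int size() { return age.length; }

    // Each accessor returns a whole column in one call.
    int[] age() { return age.clone(); }
    String[] first() { return first.clone(); }
    String[] last() { return last.clone(); }

    String fullName(int row) { return first[row] + " " + last[row]; }
}
```

A whole column then crosses the Matlab/Java boundary in one method call instead of one call per element.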
Object A has method B(), and lives for most of the life of the application. B calls object C method D(). D() returns an array holding up to x MyData objects. MyData might be a POD (plain old data)/PDS (passive data structure) or might be more, but a MyData can be reused by calling methods or setting fields; its identity or functionality isn't cast in stone during construction or otherwise.
Currently B() is defined like:
class A {
public void B() {
MyData[] amydata = c.D( 5 );
:
:
}
}
Currently D() is defined like:
MyData[] D( int iRows ) {
MyData[] amydata = new MyData[ iRows ];
for ( int i = 0; i < iRows; i++ ) {
if ( no more data )
return amydata;
amydata [ i ] = new MyData();
// calculate or fill in MyData structure.
}
return amydata;
}
A will always, or at least for a long time (e.g., until the user reconfigures it), ask for the same number of rows, even though the data will differ.
So what if I have the caller pass in the array reference:
class A {
int iRequestSize = 5;
int iResultSize;
MyData[] amydata = new MyData[ iRequestSize ];
public void B() {
iResultSize = c.D( iRequestSize, amydata );
:
:
// use up to iResultSize even though array is potentially bigger.
}
}
// returns number of rows actually used
int D( int iRowsMax, MyData[] amydata ) {
for ( int i = 0; i < iRowsMax; i++ ) {
if ( no more data )
return i;
if ( amydata [ i ] == null )
amydata [ i ] = new MyData();
// calculate or fill in MyData structure.
}
return iRowsMax;
}
I'm a C++ guy and new to Java, but it seems that assuming MyData can be recycled like this, the second version should avoid creating and copying MyData's, as well as eliminating garbage collection?
I would say the second variant is worse.
In the first variant amydata and all the objects referenced by it can be garbage collected as soon as the method B() exits (assuming that B doesn't store a reference to amydata somewhere else.)
In the second variant amydata cannot be garbage collected as long as the instance of A lives.
Consider the case where upon the first call to D() it returns 5 references to MyData objects, but on subsequent calls it returns no more rows. In the first variant the amydata array and the 5 referenced MyData objects can be garbage collected as soon as B() returns. But in the second variant neither the amydata array nor the 5 MyData objects referenced through it can be garbage collected - possibly never during the whole runtime of your application.
Remember: the Java Garbage Collector is optimized for many short-lived objects.
Disclaimer: Reading the OP's comments, I have to admit that I didn't get his real intent, i.e. to develop a soft-real-time application, avoiding garbage collection as much as possible, a very special and rare situation in the Java world.
So the following answer does not match his problem. But as a casual reader migrating from C++ to Java might stumble over this question and answer, he/she might get some useful hints on typical Java programming style.
Although the syntax of Java and C++ is quite similar, the runtime environments are very different, so you should adopt a different coding style.
As a decades-long Java guy, I'd surely prefer the original method signature. As a caller of the D() method, why should I create the results data structure instead of getting it from the method I am calling? That reverses the natural flow of data.
I know, in good old C times when dynamic memory management meant lots of headache, it was very common to prepare the result array outside of the function and have the function only fill in the results, the way you wrote the second version. But forget about that with Java, and just let the garbage collector do its job (and it's very good at that job). Typically trying to "help" the GC results in code that's in fact less efficient and harder to read. And if you really want to stick to that style, there's no need to pass both the max rows number and the array, as the array itself knows its length (that's different from old-style C arrays), giving the max row number.
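If you do stay with the fill-in style, a sketch of the single-argument version could look like this. Source, fill and the value field are placeholders I made up; the point is only that out.length replaces the separate row count:

```java
// Placeholder for the question's MyData.
class MyData {
    int value;
}

// Hypothetical data source that hands out a fixed number of rows.
class Source {
    private int remaining;

    Source(int rows) { remaining = rows; }

    // Fills out[] from the front, reusing existing instances,
    // and returns how many slots were actually written.
    int fill(MyData[] out) {
        int i = 0;
        while (i < out.length && remaining > 0) {
            if (out[i] == null) {
                out[i] = new MyData();
            }
            out[i].value = remaining--;
            i++;
        }
        return i;
    }
}
```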
You assume
the second version should avoid creating and copying MyData's
That sounds like a misconception about Java's inner workings. Every time you execute a new MyData(...) expression, you create a new instance somewhere on the heap. Providing a MyData[] array doesn't avoid that. Translated to C terminology, the array just holds pointers to MyData objects, not the real objects. And Java instances are hardly ever copied (unless you explicitly call something like object.clone()). It's just the reference (= pointer) to the instance that gets copied when you assign something to a variable.
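A tiny illustration of that reference-copy behaviour (ReferenceDemo is just a name for the demo, and MyData is a placeholder with one field):

```java
// Placeholder for the question's MyData.
class MyData {
    int value;
}

class ReferenceDemo {
    static boolean sharesInstance() {
        MyData original = new MyData();
        MyData[] array = new MyData[1];
        array[0] = original;  // copies the reference, not the object
        array[0].value = 42;  // the change is visible through both names
        return array[0] == original && original.value == 42;
    }
}
```

sharesInstance() returns true: there is only ever one MyData instance, reachable through two references.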
But even the first version is far from perfect, if I understand its purpose correctly. The D() method itself can determine when there's no more data available, so why should it return an array longer than necessary? With Java arrays that's a bit inconvenient, so typical Java code returns a List<MyData> in similar situations.
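A sketch of what a List-returning version might look like. The Iterator parameter is my stand-in for whatever "no more data" means in the real code, and Producer is a made-up holder class:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Placeholder for the question's MyData, constructed fully initialized.
class MyData {
    final int value;
    MyData(int value) { this.value = value; }
}

class Producer {
    // Returns at most maxRows items; the list is exactly as long as the data available.
    static List<MyData> d(int maxRows, Iterator<Integer> source) {
        List<MyData> rows = new ArrayList<MyData>();
        while (rows.size() < maxRows && source.hasNext()) {
            rows.add(new MyData(source.next()));
        }
        return rows;
    }
}
```

The caller just uses rows.size() and never has to track a separate result count.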
One more comment on the MyData() constructor and later "calculate or fill in MyData structure". I know that style exists (and is quite popular in the C family of languages), but it's not predominant in Java, and I especially dislike it. To me, it sounds like asking "Give me a car" and getting just a skeleton instead of a usable car. If I want it to have wheels, an engine and seats, I later have to supply them myself. If a usable car needs the selection of options, I want to supply them when ordering the car / calling the constructor, so that I can honestly call the result a car instead of a skeleton.
And finally a comment on the Java naming conventions: the vast majority of Java code follows the conventions, so your method names beginning with upper case look very strange to me.
This question isn't specifically about performing tokenization with regular expressions, but more so about how an appropriate type of object (or appropriate constructor of an object) can be matched to handle the tokens output from a tokenizer.
To explain a bit more, my objective is to parse a text file containing lines of tokens into appropriate objects that describe the data. My parser is in fact already complete, but at present is a mess of switch...case statements and the focus of my question is how I can refactor this using a nice OO approach.
First, here's an example to illustrate what I'm doing overall. Imagine a text file that contains many entries like the following two:
cat 50 100 "abc"
dog 40 "foo" "bar" 90
When parsing those two particular lines of the file, I need to create instances of classes Cat and Dog respectively. In reality there are quite a large number of different object types being described, and sometimes different variations of numbers of arguments, with defaults often being assumed if the values aren't there to explicitly state them (which means it's usually appropriate to use the builder pattern when creating the objects, or some classes have several constructors).
The initial tokenization of each line is being done using a Tokenizer class I created that uses groups of regular expressions that match each type of possible token (integer, string, and a few other special token types relevant to this application) along with Pattern and Matcher. The end result from this tokenizer class is that, for each line it parses, it provides back a list of Token objects, where each Token has a .type property (specifying integer, string, etc.) along with primitive value properties.
For each line parsed, I have to:
switch...case on the object type (first token);
switch on the number of arguments and choose an appropriate constructor for that number of arguments;
check that each token type is appropriate for the types of arguments needed to construct the object;
log an error if the quantity or combination of argument types aren't appropriate for the type of object being called for.
The parser I have at the moment has a lot of switch/case or if/else all over the place to handle this and although it works, with a fairly large number of object types it's getting a bit unwieldy.
Can someone suggest an alternative, cleaner and more 'OO' way of pattern matching a list of tokens to an appropriate method call?
The answer was in the question; you want a Strategy, basically a Map where the key would be, e.g., "cat" and the value an instance of:
final class CatCreator implements Creator {
final Argument<Integer> length = intArgument("length");
final Argument<Integer> width = intArgument("width");
final Argument<String> name = stringArgument("name");
public List<Argument<?>> arguments() {
return asList(length, width, name);
}
public Cat create(Map<Argument<?>, String> arguments) {
return new Cat(length.get(arguments), width.get(arguments), name.get(arguments));
}
}
Supporting code that you would reuse between your various object types:
abstract class Argument<T> {
    abstract T get(Map<Argument<?>, String> arguments);
    private Argument() {
    }
    static Argument<Integer> intArgument(String name) {
        return new Argument<Integer>() {
            Integer get(Map<Argument<?>, String> arguments) {
                return Integer.parseInt(arguments.get(this));
            }
        };
    }
    static Argument<String> stringArgument(String name) {
        return new Argument<String>() {
            String get(Map<Argument<?>, String> arguments) {
                return arguments.get(this);
            }
        };
    }
}
I'm sure someone will post a version that needs less code but uses reflection. Choose either but do bear in mind the extra possibilities for programming mistakes making it past compilation with reflection.
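To show how the map lookup replaces the outer switch, here is a deliberately simplified sketch. It is not the Argument-based design above: the Creator interface here takes the raw token strings directly, and CreatorRegistry, parseLine and the Cat fields are names I invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified strategy interface (assumption: arguments arrive as plain strings).
interface Creator {
    Object create(List<String> args);
}

class Cat {
    final int length;
    final int width;
    final String name;
    Cat(int length, int width, String name) {
        this.length = length;
        this.width = width;
        this.name = name;
    }
}

class CatCreator implements Creator {
    public Object create(List<String> args) {
        if (args.size() != 3) {
            throw new IllegalArgumentException("cat expects 3 arguments, got " + args.size());
        }
        return new Cat(Integer.parseInt(args.get(0)), Integer.parseInt(args.get(1)), args.get(2));
    }
}

// The map from keyword to strategy stands in for the switch on the first token.
class CreatorRegistry {
    private final Map<String, Creator> creators = new HashMap<String, Creator>();

    void register(String keyword, Creator creator) {
        creators.put(keyword, creator);
    }

    Object parseLine(List<String> tokens) {
        Creator creator = creators.get(tokens.get(0));
        if (creator == null) {
            throw new IllegalArgumentException("unknown object type: " + tokens.get(0));
        }
        return creator.create(tokens.subList(1, tokens.size()));
    }
}
```

Adding a new object type then means writing one Creator and one register call, with no change to parseLine.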
I have done something similar, where I have decoupled my parser from code emitter, which I consider anything else but the parsing itself. What I did, is introduce an interface which the parser uses to invoke methods on whenever it believes it has found a statement or a similar program element. In your case these may well be individual lines you have shown in the example in your question. So whenever you have a line parsed you invoke a method on the interface, an implementation of which will take care of the rest. That way you isolate the program generation from parsing, and both can do well on their own (well, at least the parser, as the program generation will implement an interface the parser will use). Some code to illustrate my line of thinking:
interface CodeGenerator
{
void onParseCat(int a, int b, String c); ///As per your line starting with "cat..."
void onParseDog(int a, String b, String c, int d); /// In same manner
}
class Parser
{
final CodeGenerator cg;
Parser(CodeGenerator cg)
{
this.cg = cg;
}
void parseCat() /// When you already know that the sequence of tokens matches a "cat" line
{
/// ...
cg.onParseCat(/* variable values you have obtained during parsing/tokenizing */);
}
}
This gives you several advantages, one of which is that you do not need complicated switch logic, since you have already determined the type of statement/expression/element and invoke the correct method. You can even use a single name such as onParse in the CodeGenerator interface, relying on Java method overloading, if you always want to use the same method name. Remember also that you can query methods at runtime with Java, which can help you further in removing switch logic.
getClass().getMethod("onParse", Integer.class, Integer.class, String.class).invoke(this, a, b, c);
Just note that the above uses the Integer class instead of the primitive type int, and that your methods must be overloaded on parameter type and count: if you have two distinct statements using the same parameter sequence, the above may fail because there would be at least two methods with the same signature. This is of course a limitation of method overloading in Java (and many other languages).
In any case, you have several methods to achieve what you want. The key to avoid switch is to implement some form of virtual method call, rely on built-in virtual method call facility, or invoke particular methods for particular program element types using static binding.
Of course, you will need at least one switch statement where you determine which method to actually call based on what string your line starts with. It's either that or introducing a Map<String,Method> which gives you a runtime switch facility, where the map will map a string to a proper method you can call invoke (part of Java) on. I prefer to keep switch where there is not substantial amount of cases, and reserve Java Maps for more complicated run-time scenarios.
But since you talk about "fairly large amount of object types", may I suggest you introduce a runtime map and use the Map class indeed. It depends on how complicated your language is, and whether the string that starts your line is a keyword, or a string in a far larger set.
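A minimal sketch of such a runtime map, assuming every handler takes the remaining tokens as a String[] (LineDispatcher, the handler names and the log field are all invented for illustration):

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class LineDispatcher {
    private final Map<String, Method> handlers = new HashMap<String, Method>();
    final StringBuilder log = new StringBuilder(); // records calls, only for the demo

    LineDispatcher() {
        try {
            handlers.put("cat", LineDispatcher.class.getDeclaredMethod("onParseCat", String[].class));
            handlers.put("dog", LineDispatcher.class.getDeclaredMethod("onParseDog", String[].class));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    void onParseCat(String[] args) { log.append("cat:").append(args.length).append(';'); }
    void onParseDog(String[] args) { log.append("dog:").append(args.length).append(';'); }

    // Looks up the handler by keyword and invokes it reflectively.
    void dispatch(String keyword, String... args) {
        Method handler = handlers.get(keyword);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for " + keyword);
        }
        try {
            handler.invoke(this, (Object) args); // the cast passes args as ONE parameter
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that a typo in a method name here only shows up at runtime, which is the usual price of reflection.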
Recently I refactored the code of a 3rd party hash function from C++ to C. The process was relatively painless, with only a few changes of note. Now I want to write the same function in Java and I came upon a slight issue.
In the C/C++ code there is a C preprocessor macro that takes a few integer variables names as arguments and performs a bunch of bitwise operations with their contents and a few constants. That macro is used in several different places, therefore its presence avoids a fair bit of code duplication.
In Java, however, there is no equivalent for the C preprocessor. There is also no way to affect any basic type passed as an argument to a method - even autoboxing produces immutable objects. Coupled with the fact that Java methods return a single value, I can't seem to find a simple way to rewrite the macro.
Avenues that I considered:
Expand the macro by hand everywhere: It would work, but the code duplication could make things interesting in the long run.
Write a method that returns an array: This would also work, but it would repeatedly result into code like this:
long tmp[] = bitops(k, l, m, x, y, z);
k = tmp[0];
l = tmp[1];
m = tmp[2];
x = tmp[3];
y = tmp[4];
z = tmp[5];
Write a method that takes an array as an argument: This would mean that all variable names would be reduced to array element references - it would be rather hard to keep track of which index corresponds to which variable.
Create a separate class e.g. State with public fields of the appropriate type and use that as an argument to a method: This is my current solution. It allows the method to alter the variables, while still keeping their names. It has the disadvantage, however, that the State class will get more and more complex, as more macros and variables are added, in order to avoid copying values back and forth among different State objects.
How would you rewrite such a C macro in Java? Is there a more appropriate way to deal with this, using the facilities provided by the standard Java 6 Development Kit (i.e. without 3rd party libraries or a separate preprocessor)?
Option 3, create your own MutableInteger wrapper class.
class MutableInteger {
    public int value;
    public MutableInteger(int v) { this.value = v; }
}
public void swap3(MutableInteger k, MutableInteger l, MutableInteger m) {
    int t = m.value;
    m.value = l.value;
    l.value = k.value;
    k.value = t;
}
Create a separate class e.g. State with public fields of the appropriate type and use that as an argument to a method
This, but as an intermediate step. Then continue refactoring - ideally class State should have private fields. Replace the macros with methods to update this state. Then replace all the rest of your code with methods that update the state, until eventually your program looks like:
System.out.println(new State(System.in).hexDigest());
Finally, rename State to SHA1 or whatever ;-)
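As a rough picture of the intermediate step, here is a sketch in which the macro body becomes a method on the state object. MixState and the three operations inside mix() are invented placeholders, not the real hash function's arithmetic:

```java
// Hypothetical state object: the fields the macro used to modify become private members.
class MixState {
    private int a;
    private int b;
    private int c;

    MixState(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    // Stand-in for the macro body: updates all three variables in one call.
    void mix() {
        a ^= b;
        b += c;
        c = Integer.rotateLeft(c, 1);
    }

    int a() { return a; }
    int b() { return b; }
    int c() { return c; }
}
```

Each place that used the macro becomes a state.mix() call, and the variables keep their names inside the class.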
How can I pass a primitive type by reference in java? For instance, how do I make an int passed to a method modifiable?
There isn't a way to pass a primitive directly by reference in Java.
A workaround is to instead pass a reference to an instance of a wrapper class, which then contains the primitive as a member field. Such a wrapper class could be extremely simple to write for yourself:
public class IntRef { public int value; }
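For completeness, a small usage sketch of such a wrapper (IntRefDemo and addTen are names made up for the demo):

```java
class IntRef {
    public int value;
}

class IntRefDemo {
    // The method receives a copy of the reference, but both copies point at the same IntRef.
    static void addTen(IntRef ref) {
        ref.value += 10;
    }

    static int demo() {
        IntRef counter = new IntRef();
        counter.value = 5;
        addTen(counter);
        return counter.value; // the caller sees the change made inside addTen
    }
}
```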
But how about some pre-built wrapper classes, so we don't have to write our own? OK:
The Apache commons-lang Mutable* classes:
Advantages: Good performance for single threaded use. Completeness.
Disadvantages: Introduces a third-party library dependency. No built-in concurrency controls.
Representative classes: MutableBoolean, MutableByte, MutableDouble, MutableFloat, MutableInt, MutableLong, MutableObject, MutableShort.
The java.util.concurrent.atomic Atomic* classes:
Advantages: Part of the standard Java (1.5+) API. Built-in concurrency controls.
Disadvantages: Small performance hit when used in a single-threaded setting. Missing direct support for some datatypes, e.g. there is no AtomicShort.
Representative classes: AtomicBoolean, AtomicInteger, AtomicLong, and AtomicReference.
Note: As user ColinD shows in his answer, AtomicReference can be used to approximate some of the missing classes, e.g. AtomicShort.
Length 1 primitive array
OscarRyz's answer demonstrates using a length 1 array to "wrap" a primitive value.
Advantages: Quick to write. Performant. No 3rd party library necessary.
Disadvantages: A little dirty. No built-in concurrency controls. Results in code that does not (clearly) self-document: is the array in the method signature there so I can pass multiple values? Or is it here as scaffolding for pass-by-reference emulation?
Also see
The answers to StackOverflow question "Mutable boolean field in Java".
My Opinion
In Java, you should strive to use the above approaches sparingly or not at all. In C it is common to use a function's return value to relay a status code (SUCCESS/FAILURE), while a function's actual output is relayed via one or more out-parameters. In Java, it is best to use Exceptions instead of return codes. This frees up method return values to be used for carrying the actual method output -- a design pattern which most Java programmers find to be more natural than out-parameters.
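To make that concrete, here is a sketch of a method that would use a status code plus an int out-parameter in C, written the Java way. PortParser and parsePort are made-up names; Integer.parseInt and the exception types are standard:

```java
class PortParser {
    // Returns the parsed value directly; failures are reported via exceptions,
    // so no status code or out-parameter is needed.
    static int parsePort(String text) {
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + text, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }
}
```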
Nothing in java is passed by reference. It's all passed by value.
Edit: Both primitives and object types are passed by value. You can never alter the passed value/reference and expect the originating value/reference to change. Example:
String a;
int b;
doSomething(a, b);
...
public void doSomething(String myA, int myB) {
// whatever I do to "myA" and "myB" here will never ever ever change
// the "a" and "b"
}
The only way to get around this hurdle, regardless of it being a primitive or reference, is to pass a container object, or use the return value.
With a holder:
private class MyStringHolder {
String a;
MyStringHolder(String a) {
this.a = a;
}
}
MyStringHolder holdA = new MyStringHolder("something");
public void doSomething(MyStringHolder holder) {
// alter holder.a here and it changes.
}
With return value
int b = 42;
b = doSomething(b);
public int doSomething(int b) {
return b + 1;
}
Pass an AtomicInteger, AtomicBoolean, etc. instead. There isn't one for every primitive type, but you can use, say, an AtomicReference<Short> if necessary too.
Do note: there should very rarely be a need to do something like this in Java. When you want to do it, I'd recommend rethinking what you're trying to do and seeing if you can't do it some other way (using a method that returns an int, say... what exactly the best thing to do is will vary from situation to situation).
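A short sketch of both variants, using only the standard java.util.concurrent.atomic classes (AtomicDemo and its two methods are names I made up):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class AtomicDemo {
    // The callee mutates the shared holder, so the caller sees the new value.
    static void increment(AtomicInteger counter) {
        counter.incrementAndGet();
    }

    // There is no AtomicShort, so an AtomicReference<Short> stands in for it.
    static void doubleIt(AtomicReference<Short> ref) {
        ref.set((short) (ref.get() * 2));
    }
}
```

Note that doubleIt does a separate get and set, so it is not atomic as a whole; that only matters if several threads share the reference.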
That's not possible in Java, as an alternative you can wrap it in a single element array.
void demo() {
int [] a = { 0 };
increment ( a );
}
void increment( int [] v ) {
v[0]++;
}
But there are always better options.
You can't. But you can return an integer which is a modified value
int i = 0;
i = doSomething(i);
If you are passing in more than one you may wish to create a Data Transfer Object (a class specifically to contain a set of variables which can be passed to classes).
Pass an object that has that value as a field.
That's not possible in Java
One option is to use classes like java.lang.Integer, then you're not passing a primitive at all.
On the other hand, you can just use code like:
int a = 5;
a = func(a);
and have func return the modified value.
Occasionally, we have to write methods that receive many, many arguments, for example:
public void doSomething(Object objA , Object objectB ,Date date1 ,Date date2 ,String str1 ,String str2 )
{
}
When I encounter this kind of problem, I often encapsulate the arguments into a map.
Map<Object,Object> params = new HashMap<Object,Object>();
params.put("objA", objA);
......
public void doSomething(Map<Object,Object> params)
{
// extracting params
Object objA = (Object)params.get("objA");
......
}
This is not a good practice; encapsulating the parameters in a map is a waste of efficiency.
The good thing is the clean signature: it is easy to add other parameters with minimal modification.
What's the best practice for this kind of problem?
In Effective Java, Chapter 7 (Methods), Item 40 (Design method signatures carefully), Bloch writes:
There are three techniques for shortening overly long parameter lists:
break the method into multiple methods, each which require only a subset of the parameters
create helper classes to hold group of parameters (typically static member classes)
adapt the Builder pattern from object construction to method invocation.
For more details, I encourage you to buy the book, it's really worth it.
Using a map with magical String keys is a bad idea. You lose any compile time checking, and it's really unclear what the required parameters are. You'd need to write very complete documentation to make up for it. Will you remember in a few weeks what those Strings are without looking at the code? What if you made a typo? Use the wrong type? You won't find out until you run the code.
Instead use a model. Make a class which will be a container for all those parameters. That way you keep the type safety of Java. You can also pass that object around to other methods, put it in collections, etc.
Of course if the set of parameters isn't used elsewhere or passed around, a dedicated model may be overkill. There's a balance to be struck, so use common sense.
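A sketch of what such a container might look like. ReportRequest, ReportService and the field names are invented for the example; the only real API used is java.util.Date:

```java
import java.util.Date;

// Hypothetical parameter container: typed fields instead of magic map keys.
final class ReportRequest {
    private final Date from;
    private final Date to;
    private final String title;
    private final String format;

    ReportRequest(Date from, Date to, String title, String format) {
        if (to.before(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        this.from = new Date(from.getTime()); // defensive copies, Date is mutable
        this.to = new Date(to.getTime());
        this.title = title;
        this.format = format;
    }

    Date from() { return new Date(from.getTime()); }
    Date to() { return new Date(to.getTime()); }
    String title() { return title; }
    String format() { return format; }
}

class ReportService {
    // The signature stays stable when another field is added to ReportRequest.
    static String describe(ReportRequest request) {
        return request.title() + " (" + request.format() + ")";
    }
}
```

A typo in a field name or a wrong type is now a compile error, and validation lives in one place.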
If you have many optional parameters, you can create a fluent API: replace the single method with a chain of methods.
exportWithParams().datesBetween(date1,date2)
.format("xml")
.columns("id","name","phone")
.table("angry_robots")
.invoke();
Using static import you can create inner fluent APIs:
... .datesBetween(from(date1).to(date2)) ...
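Behind such a chain there is usually just an object whose methods each set one option and return this. A minimal sketch, assuming made-up names (ExportRequest, the "csv" default) and leaving out the date range for brevity:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical fluent request object: every option method returns this.
class ExportRequest {
    private String format = "csv"; // optional, has a default
    private String table;          // required
    private final List<String> columns = new ArrayList<String>();

    static ExportRequest exportWithParams() {
        return new ExportRequest();
    }

    ExportRequest format(String format) {
        this.format = format;
        return this;
    }

    ExportRequest table(String table) {
        this.table = table;
        return this;
    }

    ExportRequest columns(String... names) {
        columns.addAll(Arrays.asList(names));
        return this;
    }

    // Terminal call: validates the collected options and does the work.
    String invoke() {
        if (table == null) {
            throw new IllegalStateException("table is required");
        }
        return "export " + columns + " from " + table + " as " + format;
    }
}
```

Optional parameters are simply the calls you leave out, and required ones are checked in invoke().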
It's called "Introduce Parameter Object". If you find yourself passing the same parameter list in several places, just create a class which holds them all.
XXXParameter param = new XXXParameter(objA, objB, date1, date2, str1, str2);
// ...
doSomething(param);
Even if you don't find yourself passing same parameter list so often, that easy refactoring will still improve your code readability, which is always good. If you look at your code 3 months later, it will be easier to comprehend when you need to fix a bug or add a feature.
It's a general philosophy of course, and since you haven't provided any details, I cannot give you more detailed advice either. :-)
First, I'd try to refactor the method. If it's using that many parameters, it may be too long anyway. Breaking it down would both improve the code and potentially reduce the number of parameters to each method. You might also be able to refactor the entire operation into its own class.
Second, I'd look for other places where I'm using the same parameter list (or a superset of it). If there are several, it likely signals that these properties belong together. In that case, create a class to hold the parameters and use it.
Lastly, I'd evaluate whether the number of parameters makes it worth creating a map object to improve code readability. I think this is a personal call: there is pain either way with this solution, and where the trade-off point lies may differ. For six parameters I probably wouldn't do it. For ten I probably would (if none of the other methods worked first).
This is often a problem when constructing objects.
In that case, use the Builder pattern; it works well if you have a big list of parameters and don't always need all of them.
You can also adapt it to method invocation.
It also increases readability a lot.
public class BigObject
{
    // public getters
    // private setters (the nested Builder may still call them)
    public static class Builder
    {
        private A f1;
        private B f2;
        private C f3;
        private D f4;
        private E f5;
        public Builder setField1(A f1) { this.f1 = f1; return this; }
        public Builder setField2(B f2) { this.f2 = f2; return this; }
        public Builder setField3(C f3) { this.f3 = f3; return this; }
        public Builder setField4(D f4) { this.f4 = f4; return this; }
        public Builder setField5(E f5) { this.f5 = f5; return this; }
        public BigObject build()
        {
            BigObject result = new BigObject();
            result.setField1(f1);
            result.setField2(f2);
            result.setField3(f3);
            result.setField4(f4);
            result.setField5(f5);
            return result;
        }
    }
}
// Usage:
BigObject boo = new BigObject.Builder()
.setField1(/* whatever */)
.setField2(/* whatever */)
.setField3(/* whatever */)
.setField4(/* whatever */)
.setField5(/* whatever */)
.build();
You can also put verification logic into Builder set..() and build() methods.
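A minimal sketch of what such verification might look like follows. It uses a small hypothetical Report class with invented fields rather than BigObject so that it compiles on its own; which checks belong in a setter and which in build() is a judgment call, and the split shown here is just one option.

```java
// Hypothetical class; "Report" and its fields are invented for this sketch.
final class Report {
    private final String title;
    private final int pages;

    private Report(String title, int pages) {
        this.title = title;
        this.pages = pages;
    }

    String getTitle() { return title; }
    int getPages() { return pages; }

    static final class Builder {
        private String title;
        private int pages = 1;

        Builder setTitle(String title) {
            this.title = title;
            return this;
        }

        // Per-field check: reject a bad value as soon as it is supplied.
        Builder setPages(int pages) {
            if (pages < 1) {
                throw new IllegalArgumentException("pages must be >= 1, got " + pages);
            }
            this.pages = pages;
            return this;
        }

        // Completeness check: run once, after everything has been collected.
        Report build() {
            if (title == null || title.isEmpty()) {
                throw new IllegalStateException("title is required");
            }
            return new Report(title, pages);
        }
    }
}
```

With this split, new Report.Builder().setPages(0) fails immediately, while a missing title is only reported when build() is called.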
There is a pattern called Parameter Object.
The idea is to use one object in place of all the parameters. If you need to add parameters later, you just add them to the object; the method's interface remains the same.
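The "interface remains the same" part could be sketched like this. SearchParams, Searcher and the caseSensitive flag are hypothetical; the assumption is that caseSensitive was added after the first version shipped, and the original two-argument constructor is kept so existing callers still compile.

```java
import java.util.Locale;

// Hypothetical parameter object; all names are invented for this sketch.
final class SearchParams {
    private final String query;
    private final int limit;
    private final boolean caseSensitive; // the parameter that was added later

    // Original constructor, kept so that existing callers are unaffected.
    SearchParams(String query, int limit) {
        this(query, limit, false);
    }

    // Constructor introduced together with the new field.
    SearchParams(String query, int limit, boolean caseSensitive) {
        this.query = query;
        this.limit = limit;
        this.caseSensitive = caseSensitive;
    }

    String getQuery() { return query; }
    int getLimit() { return limit; }
    boolean isCaseSensitive() { return caseSensitive; }
}

final class Searcher {
    // The signature is the same before and after 'caseSensitive' was introduced.
    static String search(SearchParams p) {
        String q = p.isCaseSensitive()
                ? p.getQuery()
                : p.getQuery().toLowerCase(Locale.ROOT);
        return q + " (max " + p.getLimit() + ")";
    }
}
```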
You could create a class to hold that data. It needs to be meaningful enough, though, but it is much better than using a map (OMG).
Code Complete* suggests a couple of things:
"Limit the number of a routine's parameters to about seven. Seven is a magic number for people's comprehension" (p 108).
"Put parameters in input-modify-output order ... If several routines use similar parameters, put the similar parameters in a consistent order" (p 105).
Put status or error variables last.
As tvanfosson mentioned, pass only the parts of a structured variable (object) that the routine needs. That said, if you're using most of the structured variable in the function, then just pass the whole structure, but be aware that this promotes coupling to some degree.
* First Edition, I know I should update. Also, it's likely that some of this advice may have changed since the second edition was written when OOP was beginning to become more popular.
Using a Map is a simple way to clean the call signature but then you have another problem. You need to look inside the method's body to see what the method expects in that Map, what are the key names or what types the values have.
A cleaner way would be to group all parameters in an object bean but that still does not fix the problem entirely.
What you have here is a design issue. With more than 7 parameters to a method you will start to have problems remembering what they represent and what order they have. From here you will get lots of bugs just by calling the method in wrong parameter order.
You need a better design of the app not a best practice to send lots of parameters.
Good practice would be to refactor. What about these objects means that they should be passed in to this method? Should they be encapsulated into a single object?
Create a bean class, set all the parameters on it (via setter methods), and pass this bean object to the method.
Look at your code, and see why all those parameters are passed in. Sometimes it is possible to refactor the method itself.
Using a map leaves your method vulnerable. What if somebody using your method misspells a parameter name, or posts a string where your method expects a UDT?
Define a Transfer Object. It'll provide you with type-checking at the very least; it may even be possible for you to perform some validation at the point of use instead of within your method.
I would say stick with the way you did it before.
The number of parameters in your example is not a lot, but the alternatives are much more horrible.
Map - There's the efficiency thing that you mentioned, but the bigger problems here are:
Callers don't know what to send you without referring to something else... Do you have Javadocs which state exactly what keys and values are used? If you do (which is great), then having lots of parameters isn't a problem either.
It becomes very difficult to accept different argument types. You can either restrict input parameters to a single type, or use Map<String, Object> and cast all the values. Both options are horrible most of the time.
Wrapper objects - this just moves the problem since you need to fill the wrapper object in the first place - instead of directly to your method, it will be to the constructor of the parameter object.
Whether moving the problem is appropriate depends on how much said object is reused. For instance:
Would not use it: It would only be used once on the first call, so a lot of additional code to deal with 1 line...?
{
AnObject h = obj.callMyMethod(a, b, c, d, e, f, g);
SomeObject i = obj2.callAnotherMethod(a, b, c, h);
FinalResult j = obj3.callAFinalMethod(c, e, f, h, i);
}
May use it: Here, it can do a bit more. First, it can factor out the parameters for 3 method calls. It can also perform 2 other lines itself... so it becomes a state variable in a sense...
{
AnObject h = obj.callMyMethod(a, b, c, d, e, f, g);
e = h.resultOfSomeTransformation();
SomeObject i = obj2.callAnotherMethod(a, b, c, d, e, f, g);
f = i.somethingElse();
FinalResult j = obj3.callAFinalMethod(a, b, c, d, e, f, g, h, i);
}
Builder pattern - this is an anti-pattern in my view. The most desirable error handling mechanism is to detect errors earlier, not later; but with the builder pattern, the detection of calls with missing mandatory parameters (the programmer did not think to include them) is moved from compile time to run time. Of course, if the programmer intentionally put null or such in the slot, that will be caught at run time either way, but catching some errors earlier is still a much bigger advantage than catering for programmers who refuse to look at the parameter names of the method they are calling.
I find it appropriate only when dealing with a large number of optional parameters, and even then, the benefit is marginal at best. I am very much against the builder "pattern".
The other thing people forget to consider is the role of the IDE in all this.
When methods have parameters, IDEs generate most of the code for you, and you have the red lines reminding you what you need to supply/set. When using option 3... you lose this completely. It's now up to the programmer to get it right, and there are no cues during coding and compile time... the programmer must test it to find out.
Furthermore, options 2 and 3, if adopted widely where they are not needed, have long-term negative implications for maintenance because of the large amount of duplicate code they generate. The more code there is, the more there is to maintain, and the more time and money is spent maintaining it.
This is often an indication that your class holds more than one responsibility (i.e., your class does TOO much).
See The Single Responsibility Principle for further details.
If you are passing too many parameters, then try to refactor the method. Maybe it is doing a lot of things that it is not supposed to do. If that is not the case, then try substituting the parameters with a single class. This way you can encapsulate everything in a single class instance and pass the instance around instead of the individual parameters.
... and Bob's your uncle: No-hassle fancy-pants APIs for object creation!
https://projectlombok.org/features/Builder