I'm revisiting data structures and algorithms to refresh my knowledge and from time to time I stumble across this problem:
Often, several data structures do need to swap some elements on the underlying array. So I implement the swap() method in ADT1, ADT2 as a private non-static method. The good thing is, being a private method I don't need to check on the parameters, the bad thing is redundancy. But if I put the swap() method in a helper class as a public static method, I need to check the indices every time for validity, making the swap call very unefficient when many swaps are done.
So what should I do? Neglect the performance degragation, or write small but redundant code?
Better design should always trump small inefficiencies. Only address performance problem if it actually is proven to be one.
Besides, what kind of checking are you doing anyway? Aren't naturally thrown ArrayIndexOutOfBoundsException and/or NullPointerException good enough?
It's worth nothing that while there's public static Collections.swap(List<?>,int,int), java.util.Arrays makes its overloads (for int[], long[], byte[], etc) all private static.
I'm not sure if Josh Bloch ever explicitly addressed why he did that, but one might guess that it has something to do with Item 25 on his book Effective Java 2nd Edition: Prefer lists to arrays. Yes, there will be "performance degradation" in using List, but it's negligible, and the many advantages more than make up for it.
If you don't need to make the checks in the private method, don't make them in the static one. This will result in a RuntimeException for invalid calls, but since all your calls are supposed to be valid, it will be as though you've used a private method.
It's always better for your code to be less efficient than to be duplicated (some constant calls are not considerable). At least that is what is taught at my university.
Code duplication produces bugs. So you prefer your program to work correctly rather than to work a little faster.
If you want to prevent constraints checking: what comes to my mind is that you can either accept naturally thrown exceptions as polygenelubricants suggested or create an abstract super class to all your data structures based on arrays. That abstract class would have protected method swap that will not check parameters. It's not perfect, but I guess that a protected method that does not check parameters is better than a public method that does not do it.
Related
I have a method that list the elements of an ArrayList, typically it prints the components of a global variable ArrayList, but in one specific instance I need it to print the components of a local variable.
So I have this
public static void listPlayers(ArrayList<Player> characters, boolean beingRolled) {
//print character components (beingRolled specifies which parts of each players to print
}
and I use that when I'm using the local variable ArrayList,
when I want to use the global i call this version
public static void listPlayers(boolean beingRolled) {
listPlayers(players, abitraryBoolean);
}
where players is the global variable
Another thing I was thinking about is anywhere I want to use the global I could pass a null reference for characters and write this method
public static void listPlayers(ArrayList<Player> characters, boolean beingRolled) {
if (characters == null) characters = players;
//print components
}
Which is the more professional/recommended version?
In general, stay away from null; the overloaded method signature is a better approach.
That said, one thing you said is scary -- You have a static method operating on a global variable. In general, that is a bad idea. You should consider refactoring to use Object/class scoped state, rather than global, static scoped state.
I would even go so far as to say that you should only use the overload that takes the ArrayList as a parameter, and pass it the global variable every time. Then, even if you must use a global variable, at least you're using it in one fewer place.
Edit: One of my professors back in college wrote a book on refactoring that is very readable and has a lot of good content (though the typography is a bit odd). It's called Principle-Based Refactoring: Learning Software Design Principles by Applying Refactoring Rules, by Steve Halladay. I highly suggest reading the first half (second half is essentially a reference).
The first version is better because it is easier for someone else to understand. When a function's arguments are all listed, the caller can more easily predict what the function will do. If the function relies on variables which are not visible, say if this is available in a library without the source, the caller will not understand why the function acts in an unexpected way. This would be impossible to understand without good documentation.
Class A
Class A {
public HashMap <Integer,Double> myHashMap;
public A(){
myHashMap = new HashMap()
}
}
class B
Class B {
private A anInstanceOfA;
public B(A a) {
this.anInstanceOfA = a;
}
aMethod(){
anInstanceOfA.myHashMap.get(1); <--getting hashmap value for key = 1
//proceed to use this value, but instead of storing it to a variable
// I use anInstanceOfA.myHashMap.get(1) each time I need that value.
}
In aMethod() I use anInstanceOfA.myHashMap.get(1) to get the value for key = 1. I do that multiple times in aMethod() and I'm wondering if there is any difference in efficiency between using anInstanceOfA.myHashMap.get(1) multiple times or just assigning it to a variable and using the assigned variable multiple times.
I.E
aMethod(){
theValue = anInstanceOfA.myHashMap.get(1);
//proceed to use theValue in my calculations. Is there a difference in efficiency?
}
In theory the JVM can optimise away the difference to be very small (compared to what the rest of the program is doing). However I prefer to make it a local variable as I believe it makes the code clearer (as I can give it a meaningful name)
I suggest you do what you believe is simpler and clearer, unless you have measured a performance difference.
The question seems to be that you want to know if it is more expensive to call get(l) multiple times instead of just once.
The answer to this is yes. The question is if it is enough to matter. The definitive answer is to ask the JVM by profiling. You can, however, guess by looking at the get method in your chosen implementation and consider if you want to do all that work every time.
Note, that there is another reason that you might want to put the value in a variable, namely that you can give it a telling name, making your program easier to maintain in the future.
This seems like a micro-optimization, that really doesn't make much difference in the scheme of things.
As #peter already suggested, 'optimizing' for style/readability is a better rationale for choosing the second option over the first one. Optimizing for speed only starts making sense if you really do a lot of calls, or if the call is very expensive -- both are probably not the case in your current example.
Put it in a local variable, for multiple reasons:
It will be much faster. Reading a local variable is definitely cheaper than a HashMap lookup, probably by a factor of 10-100x.
You can give the local variable a good, meaningful name
Your code will probably be shorter / simpler overall, particularly if you use the local variable many times.
You may get bugs during future maintenance if someone modifies one of the get calls but forgets to change the others. This is a problem whenever you are duplicating code. Using a local variable minimises this risk.
In concurrent situations, the value could theoretically change if the HashMap is modified by some other code. You normally want to get the value once and work with the same value. Although if you are running into problems of this nature you should probably be looking at other solutions first (locking, concurrent collections etc.)
For example given the following methods:
public double[] getCoord(){
return coord;
}
public double getCoord(int variable){
return coord[variable];
}
Would it be better to call
object.getCoord()[1]
or
object.getCoord(1)
and why?
Although there is no performance difference, the second method presents a far superior API, because Java arrays are always mutable. The first API lets your users write
object.getCoord()[1] = 12.345;
and modify internals of your object behind your back. This is never a good thing: even a non-malicious users could do things you never intended, simply by mistake.
In terms of performance, it doesn't matter. The first method returns a reference to the array, not a copy.
That said, the second method protects the array from being modified outside the class.
No, Java doesn't read the whole array when you use the subscript operator ([]). With regards to would it be better to use the accessor method to grab the array first, then index into it versus call a method that does the same thing, it's probably negligible. You're still incurring the overhead (minimal mind you) of invoking a function and returning a result either way.
I am going to guess that #2 is marginally slower because a parameter has to be pushed onto the stack prior to the call to getCoord(int). Not much in it though.
Neither has to read the whole array.
Both are slower than direct array access, for example coord[1].
When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot proof.
You might also want to make it compatible with older systems that can't handle generics, like .Net 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics so Java's Iterable<T> is out, as is .Net's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for Item i : items), and the foreach / For Each loop in .Net, and you don't want them to have to do any casting. Basically you want your API to be now-friendly as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a typesafe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, to maintain immutability you have to provide a defensive copy each and every time they access it.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .Net is super-fast too. If you have like:
class Schedule {
private Appointment[] internalArray;
public Appointment[] appointments() {
return (Appointment[]) internalArray.clone();
}
}
people can do like:
for (Appointment a : schedule.appointments()) {
a.doSomething();
}
and it will be simple, clear, type-safe, and fast.
But they could do something like:
for (int i = 0; i < schedule.appointments().length; i++) {
Appointment a = schedule.appointments()[i];
}
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot proof. Have I failed if a minority of people are unwitting cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB I happen to be designing this library for both Java and .Net, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
cloning the array once, and iterating over it using the enhanced for loop
iterating over an ArrayList using
the enhanced for loop
iterating over an unmodifyable
ArrayList (from
Collections.unmodifyableList) using
the enhanced for loop
iterating over the array the bad way (cloning it repeatedly in the length check
and when getting each indexed item).
For 10 objects, the relative speeds (doing multiple repeats and taking the median) were like:
1,000
1,300
1,300
5,000
For 100 objects:
1,300
4,900
6,300
85,500
For 1000 objects:
6,400
51,700
56,200
7,000,300
For 10000 objects:
68,000
445,000
651,000
655,180,000
Rough figures for sure, but enough to convince me of two things:
Cloning, then iterating is definitely
not a performance issue. In fact
it's consistently faster than using a
List. (this is why Java's
enum.values() method returns a
defensive copy of an array instead of
an immutable list.)
If you repeatedly call the method,
repeatedly cloning the array unnecessarily,
performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.
clone() is fast but not what I would describe as super faster.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
interface AppointmentHandler {
public void onAppointment(Appointment appointment);
}
class Schedule {
public void forEachAppointment(AppointmentHandler ah) {
for(Appointment a: internalArray)
ah.onAppointment(a);
}
}
Since you can't really have it both ways, I would suggest that you create a pre generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone using Java 1.5 or later, they will expect the usage of Generics and Iterable and all the newer languange features.
I think the usage of arrays should be non-existent. It does not make for an easy to use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.
As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.
I overheard two of my colleagues arguing about whether or not to create a new data model class which only contains one string field and a setter and a getter for it. A program will then create a few objects of the class and put them in an array list. The guy who is storing them argue that there should be a new type while the guy who is getting the data said there is not point going through all this trouble while you can simple store string.
Personally I prefer creating a new type so we know what's being stored in the array list, but I don't have strong arguments to persuade the 'getting' data guy. Do you?
Sarah
... a new data model class which only contains one string field and a setter and a getter for it.
If it was just a getter, then it is not possible to say in general whether a String or a custom class is better. It depends on things like:
consistency with the rest of your data model,
anticipating whether you might want to change the representation,
anticipating whether you might want to implement validation when creating an instance, add helper methods, etc,
implications for memory usage or persistence (if they are even relevant).
(Personally, I would be inclined to use a plain String by default, and only use a custom class if for example, I knew that it was likely that a future representation change / refinement would be needed. In most situations, it is not a huge problem to change a String into custom class later ... if the need arises.)
However, the fact that there is proposed to be a setter for the field changes things significantly. Instances of the class will be mutable, where instances of String are not. On the one hand this could possibly be useful; e.g. where you actually need mutability. On the other hand, mutability would make the class somewhat risky for use in certain contexts; e.g. in sets and as keys in maps. And in other contexts you may need to copy the instances. (This would be unnecessary for an immutable wrapper class or a bare String.)
(The simple answer is to get rid of the setter, unless you really need it.)
There is also the issue that the semantics of equals will be different for a String and a custom wrapper. You may therefore need to override equals and hashCode to get a more intuitive semantic in the custom wrapper case. (And that relates back to the issue of a setter, and use of the class in collections.)
Wrap it in a class, if it matches the rest of your data model's design.
It gives you a label for the string so that you can tell what it represents at run time.
It makes it easier to take your entity and add additional fields, and behavior. (Which can be a likely occurrence>)
That said, the key is if it matches the rest of your data model's design... be consistent with what you already have.
Counterpoint to mschaef's answer:
Keep it as a string, if it matches the rest of your data model's design. (See how the opening sounds so important, even if I temper it with a sentence that basically says we don't know the answer?)
If you need a label saying what it is, add a comment. Cost = one line, total. Heck, for that matter, you need a line (or three) to comment your new class, anyway, so what's the class declaration for?
If you need to add additional fields later, you can refactor it then. You can't design for everything, and if you tried, you'd end up with a horrible mess.
As Yegge says, "the worst thing that can happen to a code base is size". Add a class declaration, a getter, a setter, now call those from everywhere that touches it, and you've added size to your code without an actual (i.e., non-hypothetical) purpose.
I disagree with the other answers:
It depends whether there's any real possibility of adding behavior to the type later [Matthew Flaschen]
No, it doesn’t. …
Never hurts to future-proof the design [Alex]
True, but not relevant here …
Personally, I would be inclined to use a plain String by default [Stephen C]
But this isn’t a matter of opinion. It’s a matter of design decisions:
Is the entity you store logically a string, a piece of text? If yes, then store a string (ignoring the setter issue).
If not – then do not store a string. That data may be stored as a string is an implementation detail, it should not be reflected in your code.
For the second point it’s irrelevant whether you might want to add behaviour later on. All that matters is that in a strongly typed language, the data type should describe the logical entity. If you handle things that are not text (but may be represented by text, may contain text …) then use a class that internally stores said text. Do not store the text directly.
This is the whole point of abstraction and strong typing: let the types represent the semantics of your code.
And finally:
As Yegge says, "the worst thing that can happen to a code base is size". [Ken]
Well, this is so ironic. Have you read any of Steve Yegge’s blog posts? I haven’t, they’re just too damn long.
It depends whether there's any real possibility of adding behavior to the type later. Even if the getters and setters are trivial now, a type makes sense if there is a real chance they could do something later. Otherwise, clear variable names should be sufficient.
In the time spent discussing whether to wrap it in a class, it could be wrapped and done with. Never hurts to future-proof the design, especially when it only takes minimal effort.
I see no reason why the String should be wrapped in a class. The basic perception behind the discussion is, the need of time is a String object. If it gets augmented later, get it refactored then. Why add unnecessary code in the name of future proofing.
Wrapping it in a class provides you with more type safety - in your model you can then only use instances of the wrapper class, and you can't easily make a mistake where you put a string that contains something different into the model.
However, it does add overhead, extra complexity and verbosity to your code.