API Design for Idiot-Proof Iteration Without Generics - java

When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot proof.
You might also want to make it compatible with older systems that can't handle generics, like .Net 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics so Java's Iterable<T> is out, as is .Net's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for (Item i : items)), and the foreach / For Each loop in .Net, and you don't want them to have to do any casting. Basically you want your API to be now-friendly as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a typesafe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, to maintain immutability you have to provide a defensive copy each and every time they access it.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .Net is super-fast too. If you have like:
class Schedule {
    private Appointment[] internalArray;
    public Appointment[] appointments() {
        return (Appointment[]) internalArray.clone();
    }
}
people can do like:
for (Appointment a : schedule.appointments()) {
    a.doSomething();
}
and it will be simple, clear, type-safe, and fast.
But they could do something like:
for (int i = 0; i < schedule.appointments().length; i++) {
    Appointment a = schedule.appointments()[i];
}
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
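The fix is trivial, of course - just call the accessor once and iterate over the local copy, something like:
Appointment[] appointments = schedule.appointments(); // clone once
for (int i = 0; i < appointments.length; i++) {
    Appointment a = appointments[i];
    a.doSomething();
}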
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot proof. Have I failed if a minority of people are unwittingly cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB I happen to be designing this library for both Java and .Net, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
cloning the array once, and iterating over it using the enhanced for loop
iterating over an ArrayList using the enhanced for loop
iterating over an unmodifiable ArrayList (from Collections.unmodifiableList) using the enhanced for loop
iterating over the array the bad way (cloning it repeatedly in the length check and when getting each indexed item).
For 10 objects, the relative speeds (doing multiple repeats and taking the median; numbers in the same order as the four approaches above) were like:
1,000 / 1,300 / 1,300 / 5,000
For 100 objects: 1,300 / 4,900 / 6,300 / 85,500
For 1000 objects: 6,400 / 51,700 / 56,200 / 7,000,300
For 10000 objects: 68,000 / 445,000 / 651,000 / 655,180,000
Rough figures for sure, but enough to convince me of two things:
Cloning, then iterating is definitely not a performance issue. In fact it's consistently faster than using a List. (This is why Java's enum.values() method returns a defensive copy of an array instead of an immutable list.)
If you repeatedly call the method, repeatedly cloning the array unnecessarily, performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.

clone() is fast, but not what I would describe as super fast.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
interface AppointmentHandler {
    public void onAppointment(Appointment appointment);
}

class Schedule {
    private Appointment[] internalArray;

    public void forEachAppointment(AppointmentHandler ah) {
        for (Appointment a : internalArray)
            ah.onAppointment(a);
    }
}
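A caller would then pass a handler, for example as an anonymous inner class (which also works fine pre-generics):
schedule.forEachAppointment(new AppointmentHandler() {
    public void onAppointment(Appointment appointment) {
        appointment.doSomething();
    }
});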

Since you can't really have it both ways, I would suggest that you create a pre-generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone using Java 1.5 or later, they will expect the usage of generics and Iterable and all the newer language features.
I think the usage of arrays should be non-existent. It does not make for an easy to use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.

As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.

Related

AtomicInteger & lambda expressions in single-threaded app

I need to modify a local variable inside a lambda expression in a JButton's ActionListener and since I'm not able to modify it directly, I came across the AtomicInteger type.
I implemented it and it works just fine but I'm not sure if this is a good practice or if it is the correct way to solve this situation.
My code is the following:
newAnchorageButton.addActionListener(e -> {
    AtomicInteger anchored = new AtomicInteger();
    anchored.set(0);
    cbSets.forEach(cbSet ->
        cbSet.forEach(cb -> {
            if (cb.isSelected())
                anchored.incrementAndGet();
        })
    );
    // more code where I use the 'anchored' variable...
});
I'm not sure if this is the right way to solve this since I've read that AtomicInteger is used mostly for concurrency-related applications and this program is single-threaded, but at the same time I can't find another way to solve this.
I could simply use two nested for-loops to go over those arrays, but I'm trying to reduce the method's cognitive complexity as much as I can according to the SonarLint VS Code extension, and leaving those for-loops theoretically increases the method's complexity and therefore hurts its readability and maintainability.
Replacing the for-loops with lambda expressions reduces the cognitive complexity but maybe I shouldn't pay that much attention to it.
While it is safe enough in single-threaded code, it would be better to count them in a functional way, like this:
long anchored = cbSets.stream() // get a stream of the sets
.flatMap(List::stream) // flatten to list of cb's
.filter(JCheckBox::isSelected) // only selected ones
.count(); // count them
Instead of mutating an accumulator, we limit the flattened stream to only the ones we're interested in and ask for the count.
More generally, though, it is always possible to sum things up or generally aggregate the values without a mutable variable. Consider:
record Country(int population) { }

int total = countries.stream()
        .mapToInt(Country::population)
        .reduce(0, Math::addExact);
and leaving those for-loops theoretically increases the method complexity and therefore its readability and maintainability.
This is obvious horsepuckey. x.forEach(foo -> bar) is not 'cognitively simpler' than for (var foo : x) bar; - you can map each AST node straight over from one to the other.
If a definition is being used to define complexity which concludes that one is significantly more complex than the other, then the only correct conclusion is that the definition is silly and should be fixed or abandoned.
To make it practical: Yes, introducing AtomicInteger, whilst performance wise it won't make one iota of difference, does make the code way more complicated. AtomicInteger's simple existence in the code suggests that concurrency is relevant here. It isn't, so you'd have to add a comment to explain why you're using it. Comments are evil. (They imply the code does not speak for itself, and they cannot be tested in any way). They are often the least evil, but evil they are nonetheless.
The general 'trick' for keeping lambda-based code cognitively easily followed is to embrace the pipeline:
You write some code that 'forms' a stream. This can be as simple as list.stream(), but sometimes you do some stream joining or flatmapping a collection of collections.
You have a pipeline of operations that operate on single elements in the stream and do not refer to the whole or to any neighbour.
At the end, you reduce (using collect, reduce, max - some terminator) such that the reducing method returns what you need.
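As a rough illustration of that shape, reusing the checkbox example from the question (and assuming, as in the other answer, that cbSets is a list of lists of JCheckBox):
List<String> selectedLabels = cbSets.stream()      // 1. form the stream
        .flatMap(List::stream)                     //    (flatten the collection of collections)
        .filter(JCheckBox::isSelected)             // 2. operate on single elements only
        .map(JCheckBox::getText)
        .collect(Collectors.toList());             // 3. reduce with a terminator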
The above model (and the other answer follows it precisely) tends to result in code that is as readable/complex as the 'old style' code, and rarely (but sometimes!) more readable, and significantly less complicated. Deviate from it and the result is virtually always considerably more complicated - a clear loser.
Not all for loops in java fit the above model. If it doesn't fit, then trying to force that particular square peg into the round hole will take a lot of effort and almost always results in code that is significantly worse: Either an order of magnitude slower or considerably more cognitively complicated.
It also means that it is virtually never 'worth' rewriting perfectly fine readable non-stream based code into stream based code; at best it becomes a percentage point more readable according to some personal tastes, with no significant universally agreed upon improvement.
Turn off that silly linter rule. The fact that it considers the above 'less' complex, and that it evidently determines that for (var foo : x) bar; is 'more complicated' than x.forEach(foo -> bar) is proof enough that it's hurting way more than it is helping.
I have the following to add to the two other answers:
Two general good practices in your code are in question:
Lambdas shouldn't be longer than 3-4 lines
Except in some precise cases, lambdas of stream operations should be stateless.
For #1, consider extracting the code of the lambda to a private method for example, when it's getting too long.
You will probably gain in readability, and you will also probably gain in better separating UI from business logic.
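For example (method names here are just illustrative), the listener body could be delegated to a private method so the lambda stays a single line:
newAnchorageButton.addActionListener(e -> onNewAnchorage());

private void onNewAnchorage() {
    long anchored = cbSets.stream()
            .flatMap(List::stream)
            .filter(JCheckBox::isSelected)
            .count();
    // ... rest of the handler, using 'anchored'
}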
For #2, you are probably not concerned since you are working in a single thread at the moment, but streams can be parallelized, and they may not always execute exactly as you think they do.
For that reason, it's always better to keep the code stateless in stream pipeline operations. Otherwise you might be surprised.
More generally, streams are very good, very concise, but sometimes it's just better to do the same with good old loops.
Don't hesitate to come back to classic loops.
When Sonar tells you that the complexity is too high, in fact, you should try to factorize your code: split into smaller methods, improve the model of your objects, etc.

In Java, should I use ArrayList<Long> or long[] ?

I am writing a program which accepts 400 numbers of type long and will modify some of them depending on conditions at runtime and I want to know whether to use ArrayList<Long> or long[].
Which will be faster to use? I am thinking of using long[] because size is fixed.
When the size is fixed, long[] is faster, but it makes for a less maintainable API, because it does not implement the List interface.
Note a long[] is faster for 2 reasons:
It uses primitive longs and not boxed Long objects (this also enables better cache performance, since the longs are allocated contiguously and the Longs aren't guaranteed to be).
An array is a much simpler and more efficient data structure.
Nevertheless, for simpler maintainability I would use a List<Long>, unless performance is very critical in this part of the program.
If you use this collection very often in a tight loop - and your profiler says it is indeed a bottleneck - I would then switch to a more efficient long[].
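To make the trade-off concrete, a minimal sketch of both options for the 400 fixed slots (names are just illustrative):
// primitive array: fixed size, no boxing, modify in place
long[] values = new long[400];
values[17] = values[17] * 2;

// boxed list: List API available, but every element is a Long object
List<Long> boxed = new ArrayList<Long>(Collections.nCopies(400, 0L));
boxed.set(17, boxed.get(17) * 2);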
As far as the speed goes, it almost does not matter for a list of 400 items. If you need to grow your list dynamically, ArrayList<Long> is better; if the size is fixed, long[] may be better (and a bit faster, although again, you would probably not notice the difference in most situations).
There are few things which haven't been mentioned in other answers:
A generic collection is in actuality a collection of Objects, or better said, this is what the Java compiler will make of it, while a long[] will always remain what it is.
A consequence of the first bullet point is that if you do something that eventually puts something other than a Long into your collection, in certain situations the compiler will let it through (because Java's type system is unsound; as an example, it will allow you to upcast and then re-cast to a completely disparate type).
A more general consequence of these two is that Java generics are half-baked, and in some less trivial cases such as reflection, serialization etc. they may "surprise" you. It is in fact safer to use plain arrays than generics.
package tld.example;

import java.util.List;
import java.util.ArrayList;

class Example {

    static void testArray(long[] longs) {
        System.out.println("testArray");
    }

    static void testGeneric(List<Long> longs) {
        System.out.println("testGeneric");
    }

    @SuppressWarnings("unchecked")
    public static void main(String... arguments) {
        List<Long> fakeLongs = new ArrayList<Long>();
        List<Object> mischiefManaged = (List<Object>)(Object)fakeLongs;
        mischiefManaged.add(new Object());
        // this call succeeds and prints "testGeneric";
        // we sneaked a wrong type into this list
        // and it went unnoticed
        testGeneric(fakeLongs);

        long[] realLongs = new long[1];
        // this cast fails at runtime because it is not possible to perform it,
        // even though the compiler thinks it is
        Object[] forgedLongs = (Object[])(Object)realLongs;
        forgedLongs[0] = new Object();
        testArray(realLongs);
    }
}
This example is a bit contrived because it is difficult to come up with a short convincing example, but trust me, in less trivial cases, when you have to use reflection and unsafe casts this is quite a possibility.
Now, you have to consider that besides what is reasonable, there is tradition. Every community has a set of its customs and traditions. There are a lot of superficial beliefs, such as those voiced here, e.g. when someone claims that implementing the List API is an unconditional good, and if this does not happen, then it must be bad... This is not just a dominating opinion, it is what the overwhelming majority of Java programmers believe. After all, it doesn't matter that much, and Java, as a language, has plenty of other shortcomings... so, if you want to secure your job interview or simply avoid conflicts with other Java programmers, then use Java generics, no matter the reason. But if you don't like it - well, perhaps just use some other language ;)
long[] is both much faster and takes much less memory. Read "Effective Java" Item 49: "Prefer primitive types to boxed primitives"
I think it is better to use ArrayList because it is much, much easier to maintain in the long run. In future, if the size of your array grows beyond 400, maintaining a long[] has a lot of overhead, whereas an ArrayList grows dynamically so you don't need to worry about the increased size.
Also, deletion of elements is handled much better by ArrayList than by a plain long[], as it automatically shifts the remaining elements so that they still appear in order; with a plain array you have to do that reorganisation yourself.
Why not use one of the many list implementations that work on primitive types? TLongArrayList for instance. It can be used just like Java's List but is based on a long[] array, so you have the advantages of both sides.
HPPC has a short overview of some libraries: https://github.com/carrotsearch/hppc/blob/master/ALTERNATIVES.txt
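A rough sketch of the usage, assuming Trove's TLongArrayList (check the exact package and constructors for the library you pick):
import gnu.trove.list.array.TLongArrayList;

TLongArrayList values = new TLongArrayList(400);  // backed by a long[], no boxing
values.add(42L);
values.set(0, values.get(0) + 1);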

Why does Scala implement for as a closure?

Recent events on the blogosphere have indicated that a possible performance problem with Scala is its use of closures to implement for.
What are the reasons for this design decision, as opposed to a C or Java-style "primitive for" - that is one which will be turned into a simple loop?
(I'm making a distinction between Java's for and its "foreach" construct here, as the latter involves an implicit Iterator).
More detail, following up from Peter. This bit of Scala:
object ScratchFor {
  def main(args : Array[String]) : Unit = {
    for (val s <- args) {
      println(s)
    }
  }
}
creates 3 classes: ScratchFor$$anonfun$main$1.class, ScratchFor$.class and ScratchFor.class.
ScratchFor::main just forwards to the companion object, ScratchFor$.MODULE$::main, which spins up a ScratchFor$$anonfun$main$1 (which is an implementation of AbstractFunction1).
It's in the apply() method of this anonymous inner impl of AbstractFunction1 that the actual code lives, which is effectively the loop body.
I don't see HotSpot being able to rewrite this into a simple loop. Happy to be proved wrong on this, though.
Traditional for loops are clumsy, verbose and error-prone. I think it is proof enough of this that "for-each" loops were added to Java, C# and C++, but if you want more details you may check item 46 of Effective Java.
Now, for-each loops are still much faster than Scala's for-comprehension, but they are also much less powerful (and more clumsy) because they cannot return values. If you want to transform or filter a collection (or do both to a group of collections), you'll still have to handle all the mechanical details of constructing the result collection in addition to computing the values. Not to mention it inevitably uses some mutable state.
Finally, even though for-each loops are adequate enough for collections, they are not suited to other monadic classes (of which collections are a subset).
So Scala has a general method which takes care of all of the above. Yes, it is slower, but the goal is to have the compiler effectively optimise it well enough so that this doesn't become a hindrance (and, of course, JIT could help here as well).
That has not been accomplished to this date, but -optimise has reduced a lot of ground between common for-each loops and for-comprehensions on the latest versions of Scala. If performance is essential, you can always use while or tail recursion.
Now, it would be possible for Scala to have common for loops or for-each loops as special cases specifically targeted at performance issues (since for-comprehensions can do everything they do). However, that violates two principles that guide Scala's design:
Reduce complexity. Yes, contrary to what some say, that is a design goal, and special cases that serve no other purpose other than optimise performance -- even though a workable solution exists for performance cases -- would needlessly increase the complexity of the language.
Scalability. This is in the sense that the user can scale the language for any size of problem by writing libraries. The point here is that having the compiler optimise one particular class, such as Range, would make it impossible for the user to create a replacement class that would perform just as well.
The for comprehension in Scala is a powerful general-purpose looping and pattern-matching construct. Look at what it can do:
case class Person(first: String, last: String) {}
val people = List(Person("Isaac","Newton"), Person("Michael","Jordan"))
val lastfirst = for (Person(f,l) <- people) yield l+", "+f
for (n <- lastfirst) println(n)
The second case looks pretty straightforward--take each item in a collection and print it. But the first takes apart a list containing a custom data structure and transforms it into a different collection type!
The first for there highlights only a small portion of the capability of the construct; it is both extremely powerful and extremely general. In order to maintain this power, the for must be able to turn into something very general, which means closures. Then the question is: do you also introduce special cases that operate on known collections in simple ways with improved performance? The answer thus far has been mostly no, instead preferring solutions that optimize the general closure-taking methods that for turns into.
Whether this is useful for you in particular depends on whether you are using the general capabilities a lot (in which case you will be glad) or not (in which case you may wish progress was faster).
Still, try -optimize. It often usefully speeds up simple for-comprehensions these days.
The for-comprehension is much more than a simple loop.
If you need an imperative loop, use while. If you want to write performant code in Scala, you need to know this. Just like you have to know about language implementation when you want to write fast code in every other language.
So, since the for-comprehension is not a simple loop, I hope you understand that it's not compiled down to a simple loop.
I would assume using a closure is a general solution. A more optimal solution in some cases would be to "inline" the closure as a loop and eliminate the need to create an object. Perhaps the Scala designers feel the JIT should do this, rather than having the compiler do it.
Let's say in Java this is the same as writing
public static void main(String... args) {
    for_loop(args, new Function<String>() {
        public void apply(String s) {
            System.out.println(s);
        }
    });
}

interface Function<T> {
    void apply(T s);
}

public static <T> void for_loop(T[] ts, Function<T> tFunc) {
    for (T t : ts) tFunc.apply(t);
}
This is fairly easy to inline (if you're a human). What is surprising is that Scala doesn't have an intrinsic to perform the optimisation and eliminate the need for a new object. Certainly the JIT could do it in theory, but in practice it might be a while before it handles this specific case.
I'm surprised that no one has mentioned one of the pitfalls you can get into if for does not create a closure.
In Python for example:
ls = [None] * 3
for i in [0, 1, 2]:
    ls[i] = lambda: i
print(ls[0]())
print(ls[1]())
print(ls[2]())
This prints 2 2 2, because i has a longer lifetime than the for loop. I run into this trap all the time in Python and R.
So even in the very simplest of cases, it is important for for in Scala to be implemented using an anonymous function, because it creates an environment to store variables.

Casting and Generics, Any performance difference?

I am coding in Android a lot lately. Though I am comfortable in Java, I am missing some ideas about the core concepts being used there.
I am interested to know whether any performance difference is there between these 2 codes.
First Method:
//Specified as member variable.
ArrayList <String> myList = new ArrayList <String>();
and using as String temp = myList.get(1);
2nd Method:
ArrayList myList = new ArrayList(); //Specified as member variable.
and using
String temp1 = myList.get(1).toString();
I know it's about casting. Does the first method have a great advantage over the second?
Most of the time in real coding I have to use the second method because the ArrayList can take different data types, so I end up specifying
ArrayList<Object> myList = new ArrayList<Object>();
or something similarly generic.
In short, there's no performance difference worth worrying about, if it exists at all. Generic information isn't stored at runtime anyway, so there's not really anything else happening to slow things down - and as pointed out by other answers it may even be faster (though even if it hypothetically were slightly slower, I'd still advocate using generics.) It's probably good to get into the habit of not thinking about performance so much on this level. Readability and code quality are generally much more important than micro-optimisations!
In short, generics would be the preferred option since they guarantee type safety and make your code cleaner to read.
In terms of the fact you're storing completely different object types (i.e. not related from some inheritance hierarchy you're using) in an arraylist, that's almost definitely a flaw with your design! I can count the times I've done this on one hand, and it was always a temporary bodge.
Generics aren't reified, which means they go away at runtime. Using generics is preferred for several reasons:
It makes your code clearer, as to which classes are interacting
It keeps it type safe: you can't accidentally add an object of the wrong type to the list
It's faster: casting requires the JVM to test type castability at runtime, in case it needs to throw a ClassCastException. With Generics, the compiler knows what types things must be, and so it doesn't need to check them.
There is a performance difference in that code:
The second method is actually slower.
The reason why:
Generics don't require casting/conversion (your code uses a conversion method, not a cast), the type is already correct. So when you call the toString() method, it is an extra call with extra operations that are unnecessary when using the method with generics.
There wouldn't be a problem with casting, as you are using the toString() method. But you could accidentally add an incorrect object (such as an array of Strings). The toString() method would work properly and not throw an exception, but you would get odd results.
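For example, with the raw (non-generic) list nothing stops this, and the result is just confusing rather than an exception:
ArrayList myList = new ArrayList();        // raw type: nothing checks what goes in
myList.add(new String[] { "a", "b" });     // compiles and runs fine
String temp1 = myList.get(0).toString();   // no ClassCastException, but temp1 is
                                           // something like "[Ljava.lang.String;@6d06d69c"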
As Android is used for mobiles and handheld devices where resources are limited, you have to be careful while coding.
Casting can add overhead if you are storing the String data type in an ArrayList.
So in my opinion you should use the first method and be specific.
There is no runtime performance difference because of "type erasure".
But if you are using Java 1.5 or above, you SHOULD use generics and not the weakly typed counterparts.
Advantages of generics --
* The flexibility of dynamic binding, with the advantage of static type-checking. Compiler-detected errors are less expensive to repair than those detected at runtime.
* There is less ambiguity between containers, so code reviews are simpler.
* Using fewer casts makes code cleaner.

Standard API way to check if one array is contained in another

I have two byte[] arrays in a method like this:
private static boolean containsBytes(byte[] body, byte[] checker){
//Code you do not want to ever see here.
}
I want to, using the standard API as much as possible, determine if the series contained in the checker array exists anywhere in the body array.
Right now I'm looking at some nasty code that did a hand-crafted algorithm. The performance of the algorithm is OK, which is about all you can say for it. I'm wondering if there is a more standard api way to accomplish it. Otherwise, I know how to write a readable hand-crafted one.
To get a sense of scale here, the checker array would not be larger than 48 (probably less) and the body might be a few kb large at most.
Not in the standard library (like Jon Skeet said, probably nothing there that does this) but Guava could help you here with its method Bytes.indexOf(byte[] array, byte[] target).
boolean contained = Bytes.indexOf(body, checker) != -1;
Plus, the same method exists in the classes for the other primitive types as well.
I don't know of anything in the standard API to help you here. There may be something in a third party library, although it would potentially need to be implemented repeatedly, once for each primitive type :(
EDIT: I was going to look for Boyer-Moore, but this answer was added on my phone, and I ran out of time :)
Depending on the data and your requirements, you may find that a brute force approach is absolutely fine - and a lot simpler to implement than any of the fancier algorithms available. The simple brute force approach is generally my first port of call - it often turns out to be perfectly adequate :)
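For reference, a straightforward brute-force version of the method from the question might look like this (not tuned, just readable):
private static boolean containsBytes(byte[] body, byte[] checker) {
    if (checker.length == 0) {
        return true;                        // an empty pattern is trivially contained
    }
    // try every starting offset in body that leaves room for checker
    for (int i = 0; i <= body.length - checker.length; i++) {
        int j = 0;
        while (j < checker.length && body[i + j] == checker[j]) {
            j++;
        }
        if (j == checker.length) {
            return true;
        }
    }
    return false;
}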
You probably already know this, but what you're trying to (re-)implement is basically a string search:
http://en.wikipedia.org/wiki/String_searching_algorithm
The old code might in fact be an implementation of one of the string search algorithms; for better performance, it might be good to implement one of the other algorithms. You didn't mention how often this method is going to be called, which would help to decide whether it's worth doing that.
The collections framework can both cheaply wrap an array in the List interface and search for a sublist. I think this would work reasonably well:
import java.util.Arrays;
import java.util.Collections;
boolean found = Collections.indexOfSubList(Arrays.asList(body), Arrays.asList(checker)) >= 0;
// note: this assumes boxed Byte[] arrays; Arrays.asList on a primitive byte[]
// yields a single-element List<byte[]>, not a List<Byte>
