AtomicInteger & lambda expressions in single-threaded app

I need to modify a local variable inside a lambda expression in a JButton's ActionListener and since I'm not able to modify it directly, I came across the AtomicInteger type.
I implemented it and it works just fine but I'm not sure if this is a good practice or if it is the correct way to solve this situation.
My code is the following:
newAnchorageButton.addActionListener(e -> {
    AtomicInteger anchored = new AtomicInteger();
    anchored.set(0);
    cbSets.forEach(cbSet ->
        cbSet.forEach(cb -> {
            if (cb.isSelected())
                anchored.incrementAndGet();
        })
    );
    // more code where I use the 'anchored' variable...
});
I'm not sure if this is the right way to solve this since I've read that AtomicInteger is used mostly for concurrency-related applications and this program is single-threaded, but at the same time I can't find another way to solve this.
I could simply use two nested for-loops to go over those arrays, but I'm trying to reduce the method's cognitive complexity as much as I can according to the SonarLint VS Code extension, and leaving those for-loops theoretically increases the method complexity and therefore hurts its readability and maintainability.
Replacing the for-loops with lambda expressions reduces the cognitive complexity but maybe I shouldn't pay that much attention to it.

While it is safe enough in single-threaded code, it would be better to count them in a functional way, like this:
long anchored = cbSets.stream()      // get a stream of the sets
    .flatMap(List::stream)           // flatten to a stream of checkboxes
    .filter(JCheckBox::isSelected)   // only selected ones
    .count();                        // count them
Instead of mutating an accumulator, we limit the flattened stream to only the ones we're interested in and ask for the count.
More generally, though, it is always possible to sum things up or generally aggregate the values without a mutable variable. Consider:
record Country(int population) { }
int total = countries.stream()
    .mapToInt(Country::population)
    .reduce(0, Math::addExact);
Note: we never mutate any values; instead, we combine each successive value with the preceding one, producing a new value. One could use sum(), but I prefer reduce(0, Math::addExact) because it cannot overflow silently: Math.addExact throws an ArithmeticException instead of wrapping around.
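To make the difference between sum() and reduce(0, Math::addExact) concrete, here is a minimal sketch. The class and method names (ExactSumSketch, wrappingSum, exactSum) are made up for illustration; only IntStream and Math.addExact come from the JDK.

```java
import java.util.stream.IntStream;

class ExactSumSketch {
    // sum() uses plain int addition, so an overflow wraps around silently.
    static int wrappingSum(int... values) {
        return IntStream.of(values).sum();
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static int exactSum(int... values) {
        return IntStream.of(values).reduce(0, Math::addExact);
    }

    public static void main(String[] args) {
        System.out.println(exactSum(1, 2, 3));
        System.out.println(wrappingSum(Integer.MAX_VALUE, 1));
        try {
            exactSum(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

Whether an exception is preferable to a wrapped value depends on the application; the point is only that the failure becomes visible.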

"...and leaving those for-loops theoretically increases the method complexity and therefore hurts its readability and maintainability."
This is obvious horsepuckey. x.forEach(foo -> bar) is not 'cognitively simpler' than for (var foo : x) bar; - you can map each AST node straight over from one to the other.
If a definition is being used to define complexity which concludes that one is significantly more complex than the other, then the only correct conclusion is that the definition is silly and should be fixed or abandoned.
To make it practical: Yes, introducing AtomicInteger, whilst performance wise it won't make one iota of difference, does make the code way more complicated. AtomicInteger's simple existence in the code suggests that concurrency is relevant here. It isn't, so you'd have to add a comment to explain why you're using it. Comments are evil. (They imply the code does not speak for itself, and they cannot be tested in any way). They are often the least evil, but evil they are nonetheless.
The general 'trick' for keeping lambda-based code cognitively easily followed is to embrace the pipeline:
1. You write some code that 'forms' a stream. This can be as simple as list.stream(), but sometimes you do some stream joining or flatmapping a collection of collections.
2. You have a pipeline of operations that operate on single elements in the stream and do not refer to the whole or to any neighbour.
3. At the end, you reduce (using collect, reduce, max - some terminator) such that the reducing method returns what you need.
The above model (and the other answer follows it precisely) tends to result in code that is as readable/complex as the 'old style' code, and rarely (but sometimes!) more readable, and significantly less complicated. Deviate from it and the result is virtually always considerably more complicated - a clear loser.
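As an illustration of that three-part shape, here is a small sketch. The class name PipelineSketch, the method longNamesUpperCased and the sample data are all hypothetical; the stream calls themselves are standard JDK API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineSketch {
    static List<String> longNamesUpperCased(List<List<String>> groups) {
        return groups.stream()                      // 1. form the stream ...
                .flatMap(List::stream)              //    ... flattening the nested lists
                .filter(name -> name.length() > 3)  // 2. per-element operations only
                .map(String::toUpperCase)
                .collect(Collectors.toList());      // 3. the terminator returns the result
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("Bob", "Alice"),
                Arrays.asList("Carol", "Dan"));
        System.out.println(longNamesUpperCased(groups));
    }
}
```

No step looks at a neighbouring element or at a variable outside the pipeline, which is what keeps it easy to follow.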
Not all for loops in java fit the above model. If it doesn't fit, then trying to force that particular square peg into the round hole will take a lot of effort and almost always results in code that is significantly worse: Either an order of magnitude slower or considerably more cognitively complicated.
It also means that it is virtually never 'worth' rewriting perfectly fine readable non-stream based code into stream based code; at best it becomes a percentage point more readable according to some personal tastes, with no significant universally agreed upon improvement.
Turn off that silly linter rule. The fact that it considers the above 'less' complex, and that it evidently determines that for (var foo : x) bar; is 'more complicated' than x.forEach(foo -> bar) is proof enough that it's hurting way more than it is helping.

I have the following to add to the two other answers:
Two general good practices in your code are in question:
1. Lambdas shouldn't be longer than 3-4 lines.
2. Except in some precise cases, lambdas of stream operations should be stateless.
For #1, consider extracting the code of the lambda to a private method for example, when it's getting too long.
You will probably gain in readability, and you will also probably gain in better separating UI from business logic.
For #2, you are probably not concerned since you are working in a single thread at the moment, but streams can be parallelized, and they may not always execute exactly as you think they do.
For that reason, it's always better to keep the code stateless in stream pipeline operations. Otherwise you might be surprised.
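A minimal sketch of what "stateless" buys you, using plain Boolean values in place of checkboxes (an assumption made so the example runs without Swing). StatelessCountSketch and countSelected are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StatelessCountSketch {
    // No lambda here touches a shared variable, so the result is the same
    // whether the stream runs sequentially or in parallel.
    // A version that incremented an outside counter from forEach would only
    // be correct sequentially, or with a thread-safe counter.
    static long countSelected(List<List<Boolean>> sets, boolean parallel) {
        Stream<List<Boolean>> source = parallel ? sets.parallelStream() : sets.stream();
        return source
                .flatMap(List::stream)
                .filter(Boolean::booleanValue)
                .count();
    }

    public static void main(String[] args) {
        List<List<Boolean>> sets = Arrays.asList(
                Arrays.asList(true, false, true),
                Arrays.asList(false, true));
        System.out.println(countSelected(sets, false));
        System.out.println(countSelected(sets, true));
    }
}
```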
More generally, streams are very good, very concise, but sometimes it's just better to do the same with good old loops.
Don't hesitate to come back to classic loops.
When Sonar tells you that the complexity is too high, what you should really do is refactor your code: split it into smaller methods, improve the model of your objects, etc.

Related

How to decide between lambda iteration and normal loop?

Since the introduction of Java 8 I got really hooked on lambdas and started using them whenever possible, mostly to get accustomed to them. One of the most common usages is when we want to iterate and act upon a collection of objects, in which case I resort to either forEach or stream(). I rarely write the old for(T t : Ts) loop and I almost forgot about the for(int i = 0.....).
However, we were discussing this with my supervisor the other day and he told me that lambdas aren't always the best choice and can sometimes hinder performance. From a lecture I had seen on this new feature I got the feeling that lambda iterations are always fully optimized by the compiler and will (always?) be better than bare iterations, but he begs to differ. Is this true? If yes how do I distinguish between the best solution in each scenario?
P.S: I'm not talking about cases where it is recommended to apply parallelStream. Obviously those will be faster.

Performance depends on so many factors, that it’s hard to predict. Normally, we would say, if your supervisor claims that there was a problem with performance, your supervisor is in charge of explaining what problem.
One thing someone might be afraid of is that behind the scenes, a class is generated for each lambda creation site (with the current implementation), so if the code in question is executed only once, this might be considered a waste of resources. This harmonizes with the fact that lambda expressions have a higher initialization overhead than ordinary imperative code (we are not comparing to inner classes here), so inside class initializers, which only run once, you might consider avoiding them. This is also in line with the fact that you should never use parallel streams in class initializers, so this potential advantage isn't available here anyway.
For ordinary, frequently executed code that is likely to be optimized by the JVM, these problems do not arise. As you supposed correctly, classes generated for lambda expressions get the same treatment (optimizations) as other classes. At these places, calling forEach on collections bears the potential of being more efficient than a for loop.
The temporary object instances created for an Iterator or the lambda expression are negligible; however, it might be worth noting that a foreach loop will always create an Iterator instance whereas lambda expressions do not always do so. While the default implementation of Iterable.forEach will create an Iterator as well, some of the most often used collections take the opportunity to provide a specialized implementation, most notably ArrayList.
The ArrayList's forEach is basically a for loop over an array, without any Iterator. It will then invoke the accept method of the Consumer, which will be a generated class containing a trivial delegation to the synthetic method containing the code of your lambda expression. To optimize the entire loop, the horizon of the optimizer has to span the ArrayList's loop over an array (a common idiom recognizable for an optimizer), the synthetic accept method containing a trivial delegation and the method containing your actual code.
In contrast, when iterating over the same list using a foreach loop, an Iterator implementation is created containing the ArrayList iteration logic, spread over two methods, hasNext() and next(), and instance variables of the Iterator. The loop will repeatedly invoke the hasNext() method to check the end condition (index<size) and next(), which will recheck the condition before returning the element, as there is no guarantee that the caller properly invokes hasNext() before next(). Of course, an optimizer is capable of removing this duplication, but that requires more effort than not having it in the first place. So to get the same performance as the forEach method, the optimizer's horizon has to span your loop code, the nontrivial hasNext() implementation and the nontrivial next() implementation.
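The structural difference described above can be sketched with a toy array-backed list. TinyList is a hypothetical class written for this illustration; it is not how ArrayList is actually implemented, it only mirrors the two shapes of iteration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

class TinyList<E> implements Iterable<E> {
    private final E[] elements;

    @SafeVarargs
    TinyList(E... elements) {
        this.elements = elements.clone();
    }

    // Specialized forEach: one plain loop over the array, no Iterator object.
    @Override
    public void forEach(Consumer<? super E> action) {
        for (int i = 0; i < elements.length; i++) {
            action.accept(elements[i]);
        }
    }

    // Iterator: the same logic split across hasNext() and next().
    // next() re-checks the bound because callers may skip hasNext().
    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < elements.length;
            }

            @Override
            public E next() {
                if (index >= elements.length) {
                    throw new NoSuchElementException();
                }
                return elements[index++];
            }
        };
    }

    public static void main(String[] args) {
        TinyList<String> list = new TinyList<>("x", "y", "z");
        list.forEach(System.out::println);   // array loop + Consumer
        for (String s : list) {              // hasNext()/next() pairs
            System.out.println(s);
        }
    }
}
```

Both loops visit the same elements; what differs is how much code the optimizer has to see through to reduce each one to a simple array loop.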
Similar things may apply to other collections having a specialized forEach implementation as well. This also applies to Stream operations, if the source provides a specialized Spliterator implementation, which does not spread the iteration logic over two methods like an Iterator.
So if you want to discuss the technical aspects of foreach vs. forEach(…), you may use this information.
But as said, these aspects describe only potential performance aspects as the work of the optimizer and other runtime environmental aspects may change the outcome completely. I think, as a rule of thumb, the smaller the loop body/action is, the more appropriate is the forEach method. This harmonizes perfectly with the guideline of avoiding overly long lambda expressions anyway.

It depends on specific implementation.
In general, the forEach method and a foreach loop over an Iterator usually have pretty similar performance, as they use a similar level of abstraction. stream() is usually slower (often by 50-70%), as it adds another level that provides access to the underlying collection.
The advantages of stream() generally are the possible parallelism and easy chaining of operations, with a lot of reusable ones provided by the JDK.

Why does Java CharSequence.chars() return an IntStream? [duplicate]

In Java 8, there is a new method String.chars() which returns a stream of ints (IntStream) that represent the character codes. I guess many people would expect a stream of chars here instead. What was the motivation to design the API this way?

As others have already mentioned, the design decision behind this was to prevent the explosion of methods and classes.
Still, personally I think this was a very bad decision. Given that they did not want to add a CharStream, which is reasonable, there should have been different methods instead of chars(). I would think of:
Stream<Character> chars(), which gives a stream of boxed characters and carries a light performance penalty.
IntStream unboxedChars(), which would be used for performance-critical code.
However, instead of focusing on why it is done this way currently, I think this answer should focus on showing a way to do it with the API that we have gotten with Java 8.
In Java 7 I would have done it like this:
for (int i = 0; i < hello.length(); i++) {
    System.out.println(hello.charAt(i));
}
And I think a reasonable method to do it in Java 8 is the following:
hello.chars()
    .mapToObj(i -> (char)i)
    .forEach(System.out::println);
Here I obtain an IntStream and map it to an object via the lambda i -> (char)i, this will automatically box it into a Stream<Character>, and then we can do what we want, and still use method references as a plus.
Be aware, though, that you must use mapToObj. If you forget and use map, nothing will complain, but you will still end up with an IntStream, and you might be left wondering why it prints the integer values instead of the strings representing the characters.
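The mapToObj versus map pitfall can be seen side by side in the following sketch. CharsSketch, viaMapToObj and viaMap are hypothetical names chosen for this example.

```java
import java.util.stream.Collectors;

class CharsSketch {
    // mapToObj boxes each value into a Character, giving a Stream<Character>.
    static String viaMapToObj(String s) {
        return s.chars()
                .mapToObj(i -> (char) i)
                .map(c -> String.valueOf(c))
                .collect(Collectors.joining(","));
    }

    // map stays in IntStream: the char produced by the cast is widened
    // straight back to int, so the numeric codes survive.
    static String viaMap(String s) {
        return s.chars()
                .map(i -> (char) i)
                .mapToObj(i -> String.valueOf(i))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(viaMapToObj("hi"));
        System.out.println(viaMap("hi"));
    }
}
```

For the input "hi", the first method yields the characters and the second yields their codes (104 and 105).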
Other ugly alternatives for Java 8:
By remaining in an IntStream and wanting to print them ultimately, you cannot use method references anymore for printing:
hello.chars()
    .forEach(i -> System.out.println((char)i));
Moreover, using method references to your own methods does not work anymore! Consider the following:
private void print(char c) {
    System.out.println(c);
}
and then
hello.chars()
    .forEach(this::print);
This will give a compile error, as there possibly is a lossy conversion.
Conclusion:
The API was designed this way to avoid adding a CharStream. I personally think the method should return a Stream<Character>; the current workaround is to use mapToObj(i -> (char)i) on the IntStream to be able to work properly with the characters.

The answer from skiwi covered many of the major points already. I'll fill in a bit more background.
The design of any API is a series of tradeoffs. In Java, one of the difficult issues is dealing with design decisions that were made long ago.
Primitives have been in Java since 1.0. They make Java an "impure" object-oriented language, since the primitives are not objects. The addition of primitives was, I believe, a pragmatic decision to improve performance at the expense of object-oriented purity.
This is a tradeoff we're still living with today, nearly 20 years later. The autoboxing feature added in Java 5 mostly eliminated the need to clutter source code with boxing and unboxing method calls, but the overhead is still there. In many cases it's not noticeable. However, if you were to perform boxing or unboxing within an inner loop, you'd see that it can impose significant CPU and garbage collection overhead.
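To show where that overhead comes from, here is a small sketch of the same loop written with a boxed and with a primitive accumulator. BoxingSketch and its methods are hypothetical names, and the sketch makes no timing claims; it only shows which statement triggers the boxing.

```java
class BoxingSketch {
    // Boxed accumulator: each += unboxes the Long, adds, and boxes the
    // result again, which may allocate a new Long object per iteration.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: plain arithmetic, no objects involved.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(100));
        System.out.println(sumPrimitive(100));
    }
}
```

Both methods compute the same value; they differ only in the work done per iteration.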
When designing the Streams API, it was clear that we had to support primitives. The boxing/unboxing overhead would kill any performance benefit from parallelism. We didn't want to support all of the primitives, though, since that would have added a huge amount of clutter to the API. (Can you really see a use for a ShortStream?) "All" or "none" are comfortable places for a design to be, yet neither was acceptable. So we had to find a reasonable value of "some". We ended up with primitive specializations for int, long, and double. (Personally I would have left out int but that's just me.)
For CharSequence.chars() we considered returning Stream<Character> (an early prototype might have implemented this) but it was rejected because of boxing overhead. Considering that a String has char values as primitives, it would seem to be a mistake to impose boxing unconditionally when the caller would probably just do a bit of processing on the value and unbox it right back into a string.
We also considered a CharStream primitive specialization, but its use would seem to be quite narrow compared to the amount of bulk it would add to the API. It didn't seem worthwhile to add it.
The penalty this imposes on callers is that they have to know that the IntStream contains char values represented as ints and that casting must be done at the proper place. This is doubly confusing because there are overloaded API calls like PrintStream.print(char) and PrintStream.print(int) that differ markedly in their behavior. An additional point of confusion possibly arises because the codePoints() call also returns an IntStream but the values it contains are quite different.
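The difference between the values in chars() and codePoints() shows up as soon as a string contains a character outside the Basic Multilingual Plane. The sketch below uses U+1F600, written as a surrogate pair; CodePointsSketch and its members are hypothetical names.

```java
class CodePointsSketch {
    // "a" followed by U+1F600, which needs two char values (a surrogate pair).
    static final String TEXT = "a\uD83D\uDE00";

    // chars() yields one int per UTF-16 code unit.
    static long charCount() {
        return TEXT.chars().count();
    }

    // codePoints() yields one int per Unicode code point.
    static long codePointCount() {
        return TEXT.codePoints().count();
    }

    static int secondCodePoint() {
        return TEXT.codePoints().toArray()[1];
    }

    public static void main(String[] args) {
        System.out.println(charCount());
        System.out.println(codePointCount());
        System.out.println(Integer.toHexString(secondCodePoint()));
    }
}
```

Casting each int to char is therefore only meaningful for the chars() stream, not for the codePoints() stream.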
So, this boils down to choosing pragmatically among several alternatives:
We could provide no primitive specializations, resulting in a simple, elegant, consistent API, but which imposes a high performance and GC overhead;
we could provide a complete set of primitive specializations, at the cost of cluttering up the API and imposing a maintenance burden on JDK developers; or
we could provide a subset of primitive specializations, giving a moderately sized, high performing API that imposes a relatively small burden on callers in a fairly narrow range of use cases (char processing).
We chose the last one.

Calling getters on an object vs. storing it as a local variable (memory footprint, performance)

In the following piece of code we make a call listType.getDescription() twice:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
    if (listType.getDescription() != null)
    {
        children.add(new SelectItem( listType.getId() , listType.getDescription()));
    }
}
I would tend to refactor the code to use a single variable:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
    String description = listType.getDescription();
    if (description != null)
    {
        children.add(new SelectItem(listType.getId() ,description));
    }
}
My understanding is the JVM is somehow optimized for the original code and especially nesting calls like children.add(new SelectItem(listType.getId(), listType.getDescription()));.
Comparing the two options, which one is the preferred method and why? That is in terms of memory footprint, performance, readability/ease, and others that don't come to my mind right now.
When does the latter code snippet become more advantageous over the former, that is, is there any (approximate) number of listType.getDescription() calls when using a temp local variable becomes more desirable, as listType.getDescription() always requires some stack operations to store the this object?

I'd nearly always prefer the local variable solution.
Memory footprint
A single local variable costs 4 or 8 bytes. It's a reference and there's no recursion, so let's ignore it.
Performance
If this is a simple getter, the JVM can optimize the repeated call itself (typically by inlining it), so there's no real difference. If it's an expensive call which can't be optimized, memoizing manually makes it faster.
Readability
Follow the DRY principle. In your case it hardly matters, as the local variable name is about as long as the method call, but for anything more complicated it improves readability, as you don't have to spot the differences between the two expressions. If you know they're the same, make it clear by using the local variable.
Correctness
Imagine your SelectItem does not accept nulls and your program is multithreaded. The value of listType.getDescription() can change in the meantime and you're toast.
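The correctness argument can be simulated without real threads. In the sketch below, the nested ListType is a hypothetical stand-in whose description is non-null on the first call and null afterwards, imitating another thread clearing it between the check and the use.

```java
class LocalVariableSketch {
    // Hypothetical type: the getter's result changes between calls.
    static class ListType {
        private int calls = 0;

        String getDescription() {
            return calls++ == 0 ? "first" : null;
        }
    }

    // Calls the getter twice: the null check passes, the second read is null.
    static String readTwice(ListType listType) {
        if (listType.getDescription() != null) {
            return listType.getDescription();
        }
        return "none";
    }

    // Reads once into a local: the checked value is the value that gets used.
    static String readOnce(ListType listType) {
        String description = listType.getDescription();
        if (description != null) {
            return description;
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(readTwice(new ListType()));
        System.out.println(readOnce(new ListType()));
    }
}
```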
Debugging
Having a local variable containing an interesting value is an advantage.
The only thing to win by omitting the local variable is saving one line. So I'd do it only in cases when it really doesn't matter:
very short expression
no possible concurrent modification
simple private final getter

I think option number two is definitely better because it improves the readability and maintainability of your code, which is the most important thing here. This kind of micro-optimization won't really help you unless you are writing an application where every millisecond is important.

I'm not sure either is preferred. What I would prefer is clearly readable code over performant code, especially when that performance gain is negligible. In this case I suspect there's next to no noticeable difference (especially given the JVM's optimisations and code-rewriting capabilities)

In the context of imperative languages, the value returned by a function call cannot be memoized (See http://en.m.wikipedia.org/wiki/Memoization) because there is no guarantee that the function has no side effect. Accordingly, your strategy does indeed avoid a function call at the expense of allocating a temporary variable to store a reference to the value returned by the function call.
In addition to being slightly more efficient (which does not really matter unless the function is called many times in a loop), I would opt for your style due to better code readability.

I agree on everything. About the readability I'd like to add something:
I see lots of programmers doing things like:
if (item.getFirst().getSecond().getThird().getForth() == 1 ||
item.getFirst().getSecond().getThird().getForth() == 2 ||
item.getFirst().getSecond().getThird().getForth() == 3)
Or even worse:
item.getFirst().getSecond().getThird().setForth(item2.getFirst().getSecond().getThird().getForth())
If you are calling the same chain of 10 getters several times, please use an intermediate variable. It's just much easier to read and debug.

I would agree with the local variable approach for readability only if the local variable's name is self-documenting. Calling it "description" wouldn't be enough (which description?). Calling it "selectableListTypeDescription" would make it clear. I would throw in that the incremented variable in the for loop should be named "selectableListType" (especially if the "listTypeManager" has accessors for other ListTypes).
The other reason would be if there's no guarantee this is single-threaded or your list is immutable.

Why does Scala implement for as a closure?

Recent events on the blogosphere have indicated that a possible performance problem with Scala is its use of closures to implement for.
What are the reasons for this design decision, as opposed to a C or Java-style "primitive for" - that is one which will be turned into a simple loop?
(I'm making a distinction between Java's for and its "foreach" construct here, as the latter involves an implicit Iterator).
More detail, following up from Peter. This bit of Scala:
object ScratchFor {
  def main(args: Array[String]): Unit = {
    for (s <- args) {
      println(s)
    }
  }
}
creates 3 classes: ScratchFor$$anonfun$main$1.class ScratchFor$.class ScratchFor.class
ScratchFor::main just forwards to the companion object, ScratchFor$.MODULE$::main which spins up an ScratchFor$$anonfun$main$1 (which is an implementation of AbstractFunction1).
It's in the apply() method of this anonymous inner impl of AbstractFunction1 that the actual code lives, which is effectively the loop body.
I don't see HotSpot being able to rewrite this into a simple loop. Happy to be proved wrong on this, though.

Traditional for loops are clumsy, verbose and error-prone. I think it is proof enough of this that "for-each" loops were added to Java, C# and C++, but if you want more details you may check item 46 of Effective Java.
Now, for-each loops are still much faster than Scala for-comprehension, but they are also much less powerful (and more clumsy) because they cannot return values. If you want to transform or filter a collection (or do both to a group of collections), you'll still have to handle all the mechanical details of constructing the result collection in addition to computing the values. Not to mention it inevitably uses some mutable state.
Finally, even though for-each loops are adequate for collections, they are not suited to other monadic classes (of which collections are a subset).
So Scala has a general method which takes care of all of the above. Yes, it is slower, but the goal is to have the compiler effectively optimise it well enough so that this doesn't become a hindrance (and, of course, JIT could help here as well).
That has not been accomplished to this date, but -optimise has closed much of the gap between common for-each loops and for-comprehensions on the latest versions of Scala. If performance is essential, you can always use while or tail recursion.
Now, it would be possible for Scala to have common for loops or for-each loops as special cases specifically targeted at performance issues (since for-comprehensions can do everything they do). However, that violates two principles that guide Scala's design:
Reduce complexity. Yes, contrary to what some say, that is a design goal, and special cases that serve no purpose other than optimising performance -- even though a workable solution exists for performance cases -- would needlessly increase the complexity of the language.
Scalability. This is in the sense that the user can scale the language for any size of problem by writing libraries. The point here is that having the compiler optimise one particular class, such as Range, would make it impossible for the user to create a replacement class that would perform just as well.

The for comprehension in Scala is a powerful general-purpose looping and pattern-matching construct. Look at what it can do:
case class Person(first: String, last: String) {}
val people = List(Person("Isaac","Newton"), Person("Michael","Jordan"))
val lastfirst = for (Person(f,l) <- people) yield l+", "+f
for (n <- lastfirst) println(n)
The second case looks pretty straightforward--take each item in a collection and print it. But the first takes apart a list containing a custom data structure and transforms it into a different collection type!
The first for there highlights only a small portion of the capability of the construct; it is both extremely powerful and extremely general. In order to maintain this power, the for must be able to turn into something very general, which means closures. Then the question is: do you also introduce special cases that operate on known collections in simple ways with improved performance? The answer thus far has been mostly no, instead preferring solutions that optimize the general closure-taking methods that for turns into.
Whether this is useful for you in particular depends on whether you are using the general capabilities a lot (in which case you will be glad) or not (in which case you may wish progress was faster).
Still, try -optimize. It often usefully speeds up simple for-comprehensions these days.

The for-comprehension is much more than a simple loop.
If you need an imperative loop, use while. If you want to write performant code in Scala, you need to know this. Just like you have to know about language implementation when you want to write fast code in every other language.
So, since the for-comprehension is not a simple loop, I hope you understand that it's not compiled down to a simple loop.

I would assume using a closure is a general solution. A more optimal solution in some cases would be to "inline" the closure as a loop and eliminate the need to create an object. Perhaps the Scala designers feel the JIT should do this, rather than having the compiler do this.
Let's say in Java this is the same as writing
public static void main(String... args) {
    for_loop(args, new Function<String>() {
        public void apply(String s) {
            System.out.println(s);
        }
    });
}
interface Function<T> {
    void apply(T s);
}
// Takes an array rather than varargs: a varargs parameter must come last.
public static <T> void for_loop(T[] ts, Function<T> tFunc) {
    for (T t : ts) tFunc.apply(t);
}
This is fairly easy to inline (if you're a human). What is surprising is that Scala doesn't have an intrinsic to perform the optimisation to eliminate the need for a new object. Certainly the JIT could do it in theory, but in practice, it might be a while before it handles this specific case.

I'm surprised that no one has mentioned one of the pitfalls you can get into if for does not create a closure.
In Python for example:
ls = [None] * 3
for i in [0, 1, 2]:
    ls[i] = lambda: i
print(ls[0]())
print(ls[1]())
print(ls[2]())
This prints 2 2 2, because i has a longer lifetime than the for loop. I run into this trap all the time in Python and R.
So even in the very simplest of cases, it is important for for in Scala to be implemented using an anonymous function, because it creates an environment to store variables.

API Design for Idiot-Proof Iteration Without Generics

When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot proof.
You might also want to make it compatible with older systems that can't handle generics, like .Net 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics so Java's Iterable<T> is out, as is .Net's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for (Item i : items)), and the foreach / For Each loop in .Net, and you don't want them to have to do any casting. Basically you want your API to be now-friendly as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a typesafe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, to maintain immutability you have to provide a defensive copy each and every time they access it.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .Net is super-fast too. If you have like:
class Schedule {
    private Appointment[] internalArray;
    public Appointment[] appointments() {
        return (Appointment[]) internalArray.clone();
    }
}
people can do like:
for (Appointment a : schedule.appointments()) {
    a.doSomething();
}
and it will be simple, clear, type-safe, and fast.
But they could do something like:
for (int i = 0; i < schedule.appointments().length; i++) {
    Appointment a = schedule.appointments()[i];
}
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
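The cost of the second loop can be made visible by counting clones. In this sketch the cloneCount field is instrumentation added purely for illustration, String stands in for Appointment, and CloneCountSketch is a hypothetical wrapper class.

```java
class CloneCountSketch {
    static class Schedule {
        private final String[] internalArray;
        int cloneCount = 0;   // instrumentation for this sketch only

        Schedule(String... appointments) {
            this.internalArray = appointments.clone();
        }

        String[] appointments() {
            cloneCount++;
            return internalArray.clone();   // defensive copy on every call
        }
    }

    // The accessor is called in the loop condition and in the loop body.
    static int clonesWhenCalledInsideLoop(Schedule schedule) {
        for (int i = 0; i < schedule.appointments().length; i++) {
            String a = schedule.appointments()[i];
        }
        return schedule.cloneCount;
    }

    // The accessor is called once and the copy is reused.
    static int clonesWhenHoisted(Schedule schedule) {
        String[] appointments = schedule.appointments();
        for (int i = 0; i < appointments.length; i++) {
            String a = appointments[i];
        }
        return schedule.cloneCount;
    }

    public static void main(String[] args) {
        System.out.println(clonesWhenCalledInsideLoop(new Schedule("a", "b", "c")));
        System.out.println(clonesWhenHoisted(new Schedule("a", "b", "c")));
    }
}
```

With n appointments the first loop makes 2n + 1 copies of an n-element array (the final failing length check also clones), so the total work grows quadratically, while the hoisted version makes one copy.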
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot proof. Have I failed if a minority of people are unwittingly cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB I happen to be designing this library for both Java and .Net, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
cloning the array once, and iterating over it using the enhanced for loop
iterating over an ArrayList using the enhanced for loop
iterating over an unmodifiable ArrayList (from Collections.unmodifiableList) using the enhanced for loop
iterating over the array the bad way (cloning it repeatedly in the length check and when getting each indexed item).
The relative timings (doing multiple repeats and taking the median; lower is faster) were like:
Objects    Clone once    ArrayList    Unmodifiable list    Repeated clone
10              1,000        1,300                1,300             5,000
100             1,300        4,900                6,300            85,500
1000            6,400       51,700               56,200         7,000,300
10000          68,000      445,000              651,000       655,180,000
Rough figures for sure, but enough to convince me of two things:
Cloning, then iterating is definitely not a performance issue. In fact it's consistently faster than using a List. (This is why Java's enum.values() method returns a defensive copy of an array instead of an immutable list.)
If you repeatedly call the method, repeatedly cloning the array unnecessarily, performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.

clone() is fast, but not what I would describe as super-fast.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
interface AppointmentHandler {
    public void onAppointment(Appointment appointment);
}
class Schedule {
    private Appointment[] internalArray;
    public void forEachAppointment(AppointmentHandler ah) {
        for (Appointment a : internalArray)
            ah.onAppointment(a);
    }
}
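For completeness, a sketch of how a caller might use such a callback API. The Appointment class with its title field, the Schedule constructor and the titles helper are assumptions made for this example; only the AppointmentHandler and forEachAppointment shapes come from the snippet above.

```java
class HandlerSketch {
    // Hypothetical minimal Appointment, just enough to have something to read.
    static class Appointment {
        final String title;

        Appointment(String title) {
            this.title = title;
        }
    }

    interface AppointmentHandler {
        void onAppointment(Appointment appointment);
    }

    static class Schedule {
        private final Appointment[] internalArray;

        Schedule(Appointment... appointments) {
            this.internalArray = appointments.clone();
        }

        // The array never leaves the class, so no defensive copy is needed.
        void forEachAppointment(AppointmentHandler ah) {
            for (Appointment a : internalArray) {
                ah.onAppointment(a);
            }
        }
    }

    // Caller side: an anonymous class, no generics or lambdas required.
    static String titles(Schedule schedule) {
        final StringBuilder sb = new StringBuilder();
        schedule.forEachAppointment(new AppointmentHandler() {
            public void onAppointment(Appointment appointment) {
                sb.append(appointment.title).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        Schedule schedule = new Schedule(new Appointment("dentist"), new Appointment("lunch"));
        System.out.println(titles(schedule));
    }
}
```

The trade-off is that callers can no longer use the language's own loop syntax, which was one of the stated goals of the question.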

Since you can't really have it both ways, I would suggest that you create a pre-generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone using Java 1.5 or later, they will expect the usage of generics and Iterable and all the newer language features.
I think the usage of arrays should be non-existent. It does not make for an easy to use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.

As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.
