Usage of Java 9 collection factories - java

In the context of the comments and answers given at List.of() or Collections.emptyList() and List.of(...) or Collections.unmodifiableList() I came up with the following two rules of thumb (which also apply to the Set and Map factories accordingly).
Don't replace all occurrences
Keep using Collections.emptyList() for readability and when e.g. initializing lazy field members like:
class Bean {
private List<Bean> beans = Collections.emptyList();
public List<Bean> getBeans() {
if (beans == Collections.EMPTY_LIST) { beans = new ArrayList<>(); }
return beans;
}
}
Use new factories as method argument builders
Use the new factories List.of() and variants as a quick, less-to-type version when calling an executable with List parameter(s). Here are my current substitution patterns:
Collections.emptyList() --> List.of()
Collections.singletonList(a) --> List.of(a)
Arrays.asList(a, ..., z) --> List.of(a, ..., z)
In a fictional usage of Collections.indexOfSubList, the following lines
Collections.indexOfSubList(Arrays.asList(1, 2, 3), Collections.emptyList());
Collections.indexOfSubList(Arrays.asList(1, 2, 3), Collections.singletonList(1));
Collections.indexOfSubList(Arrays.asList(1, 2, 3), Arrays.asList(1));
Collections.indexOfSubList(Arrays.asList(1, 2, 3), Arrays.asList(2, 3));
Collections.indexOfSubList(Arrays.asList(1, 2, 3), Arrays.asList(1, 2, 3));
will read
Collections.indexOfSubList(List.of(1, 2, 3), List.of());
Collections.indexOfSubList(List.of(1, 2, 3), List.of(1));
Collections.indexOfSubList(List.of(1, 2, 3), List.of(1));
Collections.indexOfSubList(List.of(1, 2, 3), List.of(2, 3));
Collections.indexOfSubList(List.of(1, 2, 3), List.of(1, 2, 3));
Do you (dis-)agree?

Generally, use of the new factories is safe for new code, where there is no existing code that depends on behaviors of the existing collections.
There are several reasons the new collection factories aren't drop-in replacements for code that initializes collections using the existing APIs. Obviously immutability is one of the most prominent reasons; if you need to modify the collection later, it obviously can't be immutable! But there are other reasons as well, some of them quite subtle.
For an example of replacement of existing APIs with the new APIs, see JDK-8134373. The review threads are here: Part1 Part2.
Here's a rundown of the issues.
Array Wrapping vs Copying. Sometimes you have an array, e.g. a varargs parameter, and you want to process it as a list. Sometimes Arrays.asList is the most appropriate thing here, as it's just a wrapper. By contrast, List.of creates a copy, which might be wasteful. On the other hand, the caller still has a handle to the wrapped array and can modify it, which might be a problem, so sometimes you want to pay the expense of copying it, for example, if you want to keep a reference to the list in an instance variable.
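The wrap-versus-copy difference can be observed directly. The following is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class WrapVersusCopy {
    // Arrays.asList wraps the array: later writes to the array show through.
    static String wrappedView() {
        String[] letters = {"a", "b"};
        List<String> wrapped = Arrays.asList(letters);
        letters[0] = "z";
        return wrapped.get(0);
    }

    // List.of copies the array: later writes to the array are not visible.
    static String copiedSnapshot() {
        String[] letters = {"a", "b"};
        List<String> copied = List.of(letters);
        letters[0] = "z";
        return copied.get(0);
    }

    public static void main(String[] args) {
        System.out.println(wrappedView());    // prints z
        System.out.println(copiedSnapshot()); // prints a
    }
}
```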
Hashed Collection Iteration Order. The new Set.of and Map.of structures randomize their iteration order. The iteration order of HashSet and HashMap is undefined, but in practice it turns out to be relatively stable. Code can develop inadvertent dependencies on iteration order. Switching to the new collection factories may expose old code to iteration order dependencies, surfacing latent bugs.
Prohibition of Nulls. The new collections prohibit nulls entirely, whereas the common non-concurrent collections (ArrayList, HashMap) allow them.
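As a small illustration of the null prohibition (a sketch with hypothetical class and method names), an ArrayList accepts a null element while List.of rejects it at creation time:

```java
import java.util.ArrayList;
import java.util.List;

final class NullHandling {
    // ArrayList accepts null elements; returns the resulting size.
    static int arrayListWithNull() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add(null);
        return list.size();
    }

    // List.of rejects a null element with a NullPointerException.
    static boolean factoryRejectsNull() {
        try {
            List.of("a", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(arrayListWithNull());  // prints 2
        System.out.println(factoryRejectsNull()); // prints true
    }
}
```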
Serialization Format. The new collections have a different serialization format from the old ones. If the collection is serialized, or it's stored in some other class that's serialized, the serialized output will differ. This might or might not be an issue. But if you expect to interoperate with other systems, this could be a problem. In particular, if you transmit the serialized form of the new collections to a Java 8 JVM, it will fail to deserialize, because the new classes don't exist on Java 8.
Strict Mutator Method Behavior. The new collections are immutable, so of course they throw UnsupportedOperationException when mutator methods are called. There are some edge cases, however, where behavior is not consistent across all the collections. For example,
Collections.singletonList("").addAll(Collections.emptyList())
does nothing, whereas
List.of("").addAll(Collections.emptyList())
will throw UOE. In general, the new collections and the unmodifiable wrappers are consistently strict in throwing UOE on any call to a mutator method, even if no actual mutation would occur. Other immutable collections, such as those from Collections.empty* and Collections.singleton*, will throw UOE only if an actual mutation would occur.
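The edge case above can be checked with a short sketch. The helper names are hypothetical; each method reports whether addAll with an empty argument throws UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class StrictMutators {
    // True if calling addAll with an empty collection throws UOE on the target.
    private static boolean throwsOnEmptyAddAll(List<String> target) {
        try {
            target.addAll(Collections.emptyList());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    static boolean singletonListThrows() {
        return throwsOnEmptyAddAll(Collections.singletonList(""));
    }

    static boolean listOfThrows() {
        return throwsOnEmptyAddAll(List.of(""));
    }

    static boolean unmodifiableWrapperThrows() {
        return throwsOnEmptyAddAll(Collections.unmodifiableList(new ArrayList<>()));
    }

    public static void main(String[] args) {
        System.out.println(singletonListThrows());       // prints false
        System.out.println(listOfThrows());              // prints true
        System.out.println(unmodifiableWrapperThrows()); // prints true
    }
}
```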
Duplicates. The new Set and Map factories reject duplicate elements and keys. This is usually not a problem if you're initializing a collection with a list of constants. Indeed, if a list of constants has a duplicate, it's probably a bug. Where this is potentially an issue is when a caller is allowed to pass in a collection or array (e.g., varargs) of elements. If the caller passes in duplicates, the existing APIs would silently omit the duplicates, whereas the new factories will throw IllegalArgumentException. This is a behavioral change that might impact callers.
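A brief sketch of the duplicate behavior, with hypothetical helper names. The last method shows one possible workaround when callers may pass duplicates: Set.copyOf (Java 10+) is documented to tolerate duplicates in its argument, although it still rejects nulls.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DuplicateHandling {
    // HashSet silently drops duplicates.
    static int hashSetSize(String... elements) {
        return new HashSet<>(Arrays.asList(elements)).size();
    }

    // Set.of throws IllegalArgumentException when given duplicate elements.
    static boolean setOfRejects(String... elements) {
        try {
            Set.of(elements);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    // Set.copyOf accepts a collection with duplicates and keeps one of each.
    static Set<String> tolerantCopy(String... elements) {
        return Set.copyOf(Arrays.asList(elements));
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize("a", "b", "a"));         // prints 2
        System.out.println(setOfRejects("a", "b", "a"));        // prints true
        System.out.println(tolerantCopy("a", "b", "a").size()); // prints 2
    }
}
```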
None of these issues are fatal problems, but they are behavioral differences that you should be aware of when retrofitting existing code. Unfortunately this means that doing a mass replacement of existing calls with the new collection factories is probably ill-advised. It's probably necessary to do some inspection at each site to assess any potential impact of the behavioral changes.

(Im)Mutability
First of all, it is important to note that the collection factories return immutable variants. Unfortunately, this does not show in the type system so you have to track that manually / mentally. This already forbids some replacements that might otherwise be worthwhile, so it must become rule 0 in your list of rules. :)
For example, creating a collection of seed elements that are later modified by other code might look like this:
private final Set<String> commonLetters = initialCommonLetters();
private static Set<String> initialCommonLetters() {
Set<String> letters = new HashSet<>();
letters.add("a");
letters.add("e");
return letters;
}
It would be great to simply write commonLetters = Set.of("a", "e"); but this will likely break other code, as the returned set is immutable.
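If the set has to stay mutable, one option is to use the factory only for the seed elements and copy them into a mutable collection. A minimal sketch, assuming a hypothetical holder class named Letters:

```java
import java.util.HashSet;
import java.util.Set;

final class Letters {
    // Set.of supplies the seed elements; the HashSet copy stays mutable.
    private final Set<String> commonLetters = new HashSet<>(Set.of("a", "e"));

    Set<String> commonLetters() {
        return commonLetters;
    }

    public static void main(String[] args) {
        Letters letters = new Letters();
        letters.commonLetters().add("i"); // works, because the field holds a HashSet
        System.out.println(letters.commonLetters().size()); // prints 3
    }
}
```

This costs one extra copy at construction time, which is usually negligible for small seed sets.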
Constants
The (im)mutability discussion immediately leads to constants. This was a major reason to introduce them! Gone are the days where you need a static initializer block to create that COMMON_LETTERS constant. This would hence be the place where I would look first for use cases.
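For comparison, here is a sketch of the same constant written both ways. The class name and the _OLD suffix are only for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class LetterConstants {
    // Pre-Java 9 style: a static initializer block plus an unmodifiable wrapper.
    static final Set<String> COMMON_LETTERS_OLD;
    static {
        Set<String> letters = new HashSet<>();
        letters.add("a");
        letters.add("e");
        COMMON_LETTERS_OLD = Collections.unmodifiableSet(letters);
    }

    // Java 9+ style: a single expression.
    static final Set<String> COMMON_LETTERS = Set.of("a", "e");

    public static void main(String[] args) {
        System.out.println(COMMON_LETTERS.equals(COMMON_LETTERS_OLD)); // prints true
    }
}
```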
Replacing
As you say, there seems to be no reason to start replacing calls to Collections::empty..., Collections::singleton..., or Arrays::asList just for the fun of it. What I would do, though, as soon as I start using the new methods in a class I would replace the old variants as well to have the code rely on fewer concepts, making understanding it easier.
Preference
The last argument is also one that could apply to the of() variants in general. While Collections::empty... and Collections::singleton... are somewhat clearer about their intent, I slightly tend towards saying that always using of, no matter how many arguments you have, offsets that advantage by writing code that, as a whole, uses fewer concepts.
I see no reason to continue using Arrays::asList.

Related

Java List of() static method

I am executing below code snippet
System.out.println(List.of(1, 2).getClass());
System.out.println(List.of(1, 2, 3).getClass());
The output of this code is:
class java.util.ImmutableCollections$List2
class java.util.ImmutableCollections$ListN
I am expecting java.util.ImmutableCollections$List3 as the output for the second statement, because there is an of() method that takes three parameters. Why does Java create ImmutableCollections$ListN rather than ImmutableCollections$List3?
Edited: This is a Java 9 question. There are 12 overloaded of() methods in the List interface: eleven take a fixed number of parameters, from zero to 10, and the twelfth takes varargs to handle a list of N elements. So I am expecting List0 to List10 implementations for the fixed-argument overloads, but it is returning ListN with three parameters. Yes, it is an implementation detail, but I am just curious to know more about this.
The main reason to have several different private implementations of List is to save space.
Consider an implementation that stores its elements in an array. (This is essentially what ListN does.) In Hotspot (64-bit with compressed object pointers, each 4 bytes) each object requires a 12-byte header. The ListN object has a single field containing the array, for a total of 16 bytes. An array is a separate object, so that has another 12-byte header plus a 4-byte length. That's another 16 bytes, not counting any actual elements stored. If we're storing two elements, they take 8 bytes. That brings the total to 40 bytes for storing a two-element list. That's quite a bit of overhead!
If we were to store the elements of a small list in fields instead of an array, that object would have a header (12 bytes) plus two fields (8 bytes) for a total of 20 bytes -- half the size. For small lists, there's a considerable savings with storing elements in fields of the List object itself instead of in an array, which is a separate object. This is what the old List2 implementation did. It's recently been superseded by the List12 implementation, which can store lists of one or two elements in fields.
Now, in the API there are 12 overloaded List.of() methods: zero to ten fixed args plus varargs. Shouldn't there be corresponding List0 through List10 and ListN implementations?
There could be, but there doesn't necessarily have to be. An early prototype of these implementations had the optimized small list implementations tied to the APIs. So the zero, one, and two fixed arg of() methods created instances of List0, List1, and List2, and the varargs List.of() method created an instance of ListN. This was fairly straightforward, but it was quite restrictive. We wanted to be able to add, remove, or rearrange implementations at will. It's considerably more difficult to change APIs, since we have to remain compatible. Thus, we decided to decouple things so that the number of arguments in the APIs was largely independent of the implementation instantiated underneath.
In JDK 9 we ended up with the 12 overloads in the API, but only four implementations: field-based implementations holding 0, 1, and 2 elements, and an array-based implementation holding an arbitrary number. Why not add more field-based implementations? Diminishing returns and code bloat. Most lists have few elements, and there's an exponential dropoff in the occurrences of lists as the number of elements gets larger. The space savings get relatively smaller compared to an array-based implementation. Then there's the matter of maintaining all those extra implementations. Either they'd have to be entered directly in the source code (bulky) or we'd switch over to a code generation scheme (complex). Neither seemed justified.
Our startup performance guru Claes Redestad did some measurements and found that there was a speedup in having fewer list implementations. The reason is megamorphic dispatch. Briefly, if the JVM is compiling the code for a virtual call site and it can determine that only one or two different implementations are called, it can optimize this well. But if there are many different implementations that can be called, it has to go through a slower path. (See this article for Black Magic details.)
For the list implementations, it turns out that we can get by with fewer implementations without losing much space. The List1 and List2 implementations can be combined into a two-field List12 implementation, with the second field being null if there's only one element. We only need one zero-length list, since it's immutable! For a zero-length list, we can get rid of List0 and just use a ListN with a zero-length array. It's bigger than an old List0 instance, but we don't care, since there's only one of them.
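To make the two-field idea concrete, here is a simplified sketch of a field-based list for one or two elements. It is not the JDK's List12 source; the class name SmallList and its constructors are hypothetical. It relies on the fact that nulls are prohibited, so null can safely mean "no second element".

```java
import java.util.AbstractList;
import java.util.Objects;

// Hypothetical illustration of a field-based immutable list; not JDK code.
final class SmallList<E> extends AbstractList<E> {
    private final E e0;
    private final E e1; // null means the list has only one element

    SmallList(E e0) {
        this.e0 = Objects.requireNonNull(e0);
        this.e1 = null;
    }

    SmallList(E e0, E e1) {
        this.e0 = Objects.requireNonNull(e0);
        this.e1 = Objects.requireNonNull(e1);
    }

    @Override
    public int size() {
        return e1 == null ? 1 : 2;
    }

    @Override
    public E get(int index) {
        if (index == 0) {
            return e0;
        }
        if (index == 1 && e1 != null) {
            return e1;
        }
        throw new IndexOutOfBoundsException("Index: " + index);
    }

    public static void main(String[] args) {
        System.out.println(new SmallList<>("a"));      // prints [a]
        System.out.println(new SmallList<>("a", "b")); // prints [a, b]
    }
}
```

Because AbstractList's default mutator methods throw UnsupportedOperationException, the sketch is unmodifiable without any extra code, and no separate array object is allocated.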
These changes just went into the JDK 11 mainline. Since the API is completely decoupled from the implementations, there is no compatibility issue.
There are additional possibilities for future enhancements. One potential optimization is to fuse an array onto the end of an object, so the object has a fixed part and a variable-length part. This will avoid the need for an array object's header, and it will probably improve locality of reference. Another potential optimization is with value types. With value types, it might be possible to avoid heap allocation entirely, at least for small lists. Of course, this is all highly speculative. But if new features come along in the JVM, we can take advantage of them in the implementations, since they're entirely hidden behind the API.
ListN is the all-purpose version. List2 is an optimised implementation. There is no such optimised implementation for a list with three elements.
There currently exist* optimised versions for lists and sets with zero, one and two elements. List0, List1, List2, Set0 etc...
There's also an optimised implementation for an empty map, Map0, and for a map containing a single key-value pair, Map1.
Discussion relating to how these implementations are able to provide performance improvements can be seen in JDK-8166365.
*bear in mind this is an implementation detail which may be subject to change, and actually is due to change fairly soon
Neither ImmutableCollections$List2 nor ImmutableCollections$ListN is generated at runtime. There are four classes already written:
static final class List0<E> extends AbstractImmutableList<E> { ... }
static final class List1<E> extends AbstractImmutableList<E> { ... }
static final class List2<E> extends AbstractImmutableList<E> { ... }
static final class ListN<E> extends AbstractImmutableList<E> { ... }
Starting with of(E e1, E e2, E e3) and up to of(E e1, ..., E e10) an instance of ImmutableCollections.ListN<> is going to be created.
Why does Java create ImmutableCollections$ListN rather than ImmutableCollections$List3?
The designers have probably decided that 3 and N cases are similar and it's not worth writing a separate class for 3. Apparently, they won't get enough benefits from $List3, $List7, $List10 as they have got from the $List0, $List1, and $List2 versions. They are specifically-optimised.
Currently, 4 classes cover 10 methods. If they decided to add some more methods (e.g. with 22 arguments), there would still be these 4 classes.
Imagine you are writing 22 classes for 22 methods. How much unnecessary code duplication would it involve?
Those are both classes that are being returned. i.e. there is a separate class for ImmutableCollections$List2 and ImmutableCollections$ListN (the $ indicates an inner class)
This is an implementation detail, and (presumably) List2 exists for (possibly) some optimisation reason. I suspect if you look at the source (via your IDE or similar) you'll see two distinct inner classes.
As Jon Skeet rightly mentioned, it is an implementation detail. The specification of List.of says that it returns an immutable List, and that's all that matters.
The developers probably decided that they could provide efficient implementations of one-element (List1) and two-element lists (List2), and that all other sizes could be handled by a single type (ListN). This could change at some point in the future - maybe they will introduce a List3 at some point, maybe not.
As per the rules of polymorphism and encapsulation, none of this matters. As long as the returned object is a List, you should not concern yourself with its actual implementation.

Collectors.toUnmodifiableList in java-10

How do you create an unmodifiable List/Set/Map with Collectors.toList/toSet/toMap, since toList (and the like) are documented as:
There are no guarantees on the type, mutability, serializability, or thread-safety of the List returned
Before Java 10 you had to provide a Function with Collectors.collectingAndThen, for example:
List<Integer> result = Arrays.asList(1, 2, 3, 4)
.stream()
.collect(Collectors.collectingAndThen(
Collectors.toList(),
x -> Collections.unmodifiableList(x)));
With Java 10, this is much easier and a lot more readable:
List<Integer> result = Arrays.asList(1, 2, 3, 4)
.stream()
.collect(Collectors.toUnmodifiableList());
Internally, it's the same thing as Collectors.collectingAndThen, but returns an instance of unmodifiable List that was added in Java 9.
Additionally, to clear up a documented difference between the two implementations (collectingAndThen vs. toUnmodifiableList):
The Collectors.toUnmodifiableList would return a Collector that
disallows null values and will throw NullPointerException if it is
presented with a null value.
static void additionsToCollector() {
// this works fine unless you try and operate on the null element
var previous = Stream.of(1, 2, 3, 4, null)
.collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
// next up ready to face an NPE
var current = Stream.of(1, 2, 3, 4, null).collect(Collectors.toUnmodifiableList());
}
and furthermore, that's owing to the fact that the former constructs an instance of Collections.UnmodifiableRandomAccessList while the latter constructs an instance of ImmutableCollections.ListN which adds to the list of attributes brought to the table with static factory methods.
Stream#toList
Java 16 adds a method on the Stream interface: toList(). To quote the Javadoc:
The returned List is unmodifiable; calls to any mutator method will always cause UnsupportedOperationException to be thrown.
Not just more convenient than Collectors, this method has some goodies like better performance on parallel streams.
In particular with parallel() -- as it avoids result copying.
Benchmark is a few simple ops on 100K elem stream of Long.
For a further reading please go to: http://marxsoftware.blogspot.com/2020/12/jdk16-stream-to-list.html
In summary, the link states something like this:
Gotcha: It may be tempting to go into one's code base and use
stream.toList() as a drop-in replacement for
stream.collect(Collectors.toList()), but there may be differences in
behavior if the code has a direct or indirect dependency on the
implementation of stream.collect(Collectors.toList()) returning an
ArrayList. Some of the key differences between the List returned by
stream.collect(Collectors.toList()) and stream.toList() are spelled
out in the remainder of this post.
The Javadoc-based documentation for Collectors.toList() states
(emphasis added), "Returns a Collector that accumulates the input
elements into a new List. There are no guarantees on the type,
mutability, serializability, or thread-safety of the List returned..."
Although there are no guarantees regarding the "type, mutability,
serializability, or thread-safety" on the List provided by
Collectors.toList(), it is expected that some may have realized it's
currently an ArrayList and have used it in ways that depend on the
characteristics of an ArrayList
What I understand is that Stream.toList() will result in an immutable List.
Stream.toList() provides a List implementation that is immutable (type
ImmutableCollections.ListN that cannot be added to or sorted) similar
to that provided by List.of() and in contrast to the mutable (can be
changed and sorted) ArrayList provided by
Stream.collect(Collectors.toList()). Any existing code depending on
the ability to mutate the ArrayList returned by
Stream.collect(Collectors.toList()) will not work with Stream.toList()
and an UnsupportedOperationException will be thrown.
Although the implementation nature of the Lists returned by
Stream.collect(Collectors.toList()) and Stream.toList() are very
different, they still both implement the List interface and so they
are considered equal when compared using List.equals(Object)
And this method will allow nulls, so starting from Java 16 we will have:
mutable/null-friendly----->Collectors.toList()
immutable/null-friendly--->Stream.toList()
immutable/null-hostile---->Collectors.toUnmodifiableList() //Naughty
It's great.
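The three combinations can be exercised with a short sketch (Java 16+ because of Stream.toList()). The class and helper names are hypothetical. Note that the mutability of the Collectors.toList() result is only current behavior, not a documented guarantee, so the sketch prints no expected value for it.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ToListVariants {
    // True if the list accepts an add call, false if it throws UOE.
    static boolean isMutable(List<Integer> list) {
        try {
            list.add(0);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    static List<Integer> viaCollectorsToList() {
        return Stream.of(1, null).collect(Collectors.toList());
    }

    static List<Integer> viaStreamToList() {
        return Stream.of(1, null).toList();
    }

    // Collectors.toUnmodifiableList() throws NullPointerException on a null element.
    static boolean unmodifiableCollectorRejectsNull() {
        try {
            Stream.of(1, null).collect(Collectors.toUnmodifiableList());
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isMutable(viaCollectorsToList())); // implementation-dependent
        System.out.println(isMutable(viaStreamToList()));     // prints false
        System.out.println(viaStreamToList().get(1));         // prints null
        System.out.println(unmodifiableCollectorRejectsNull()); // prints true
    }
}
```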
List/Set/Map.copyOf
You asked:
How do you create an Unmodifiable List/Set/Map
As of Java 10, simply pass your existing list/set/map to:
List.copyOf
Set.copyOf
Map.copyOf
These static methods return an unmodifiable List, unmodifiable Set, or unmodifiable Map, respectively. Read the details on those linked Javadoc pages.
No nulls allowed.
If the passed collection is already unmodifiable, that passed collection is simply returned, no further work, no new collection.
Note: If using the convenient Stream#toList method in Java 16+ as described in this other Answer, there is no point to this solution here, no need to call List.copyOf. The result of toList is already unmodifiable.
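A minimal sketch of List.copyOf as a snapshot, with hypothetical names. Whether copyOf returns the very same instance for an already-unmodifiable list is an implementation behavior, so the sketch prints that result without claiming a value.

```java
import java.util.ArrayList;
import java.util.List;

final class CopyOfDemo {
    // The snapshot is independent of later changes to the source list.
    static int snapshotSizeAfterSourceGrows() {
        List<String> source = new ArrayList<>(List.of("a", "b"));
        List<String> snapshot = List.copyOf(source);
        source.add("c");
        return snapshot.size();
    }

    // The snapshot itself rejects modification.
    static boolean snapshotIsUnmodifiable() {
        List<String> snapshot = List.copyOf(new ArrayList<>(List.of("a", "b")));
        try {
            snapshot.add("c");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(snapshotSizeAfterSourceGrows()); // prints 2
        System.out.println(snapshotIsUnmodifiable());       // prints true

        List<String> alreadyUnmodifiable = List.of("a", "b");
        // Identity check: the result depends on the JDK implementation.
        System.out.println(List.copyOf(alreadyUnmodifiable) == alreadyUnmodifiable);
    }
}
```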

Why does Spliterator<?> define NONNULL as a characteristic?

The javadoc of Spliterator (which is basically what is really behind a Stream if I understand things correctly) defines many characteristics which make sense such as SIZED, CONCURRENT, IMMUTABLE etc.
But it also defines NONNULL; why?
I'd have thought that it would be the user's responsibility to ensure that and that if, for instance, a developer tried to .sort() a non SORTED stream where there are null elements he/she would rightfully be greeted with an NPE...
But then this characteristic exists. Why? The javadoc of Spliterator itself doesn't mention any real usage of it, and neither does the package-info.java of the java.util.stream package...
From the documentation of Spliterator:
A Spliterator also reports a set of characteristics() of its structure, source, and elements from among ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, and SUBSIZED. These may be employed by Spliterator clients to control, specialize or simplify computation.
Note that it does not mention the prevention of NullPointerExceptions. If you sort a Stream which might contain null values it is your responsibility to provide a Comparator which can handle nulls.
The second sentence also makes it clear that using these flags is only an option, not a requirement for “Spliterator clients”, which is not limited to usage by Streams.
So regardless of whether it is used by the current implementation of the Stream API, are there possibilities to take advantage of the knowledge about a NONNULL characteristic?
I think so. An implementation may branch to specialized code for a non-null Spliterator and then use null to represent certain states, e.g. absent values or the initial value before processing the first element. In fact, the actual implementation code for dealing with Streams which may contain null is complicated. But of course, you always have to weigh up whether the simplification of one case justifies the code duplication.
But sometimes the simplification is as simple as knowing that there are no null values implies that you can use one of the Concurrent… Collections, which don’t allow nulls, internally.
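As a sketch of how a Spliterator client might query and use the flag, the following checks NONNULL on two standard collections. ConcurrentLinkedQueue documents that its spliterator reports NONNULL, while ArrayList's does not. The class and the containsNull helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentLinkedQueue;

final class NonNullCheck {
    static boolean reportsNonNull(Collection<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.NONNULL);
    }

    // A hypothetical client: skip the null scan when the source promises no nulls.
    static boolean containsNull(Collection<?> source) {
        if (reportsNonNull(source)) {
            return false;
        }
        for (Object element : source) {
            if (element == null) {
                return true;
            }
        }
        return false;
    }

    static boolean queueReportsNonNull() {
        return reportsNonNull(new ConcurrentLinkedQueue<>(List.of(1, 2, 3)));
    }

    static boolean arrayListReportsNonNull() {
        return reportsNonNull(new ArrayList<>(List.of(1, 2, 3)));
    }

    public static void main(String[] args) {
        System.out.println(queueReportsNonNull());     // prints true
        System.out.println(arrayListReportsNonNull()); // prints false
        System.out.println(containsNull(new ArrayList<>(Arrays.asList(1, null)))); // prints true
    }
}
```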
I found the following comments in the code for the enum StreamOpFlag.
// The following Spliterator characteristics are not currently used but a
// gap in the bit set is deliberately retained to enable corresponding
// stream flags if//when required without modification to other flag values.
//
// 4, 0x00000100 NONNULL(4, ...
// 5, 0x00000400 IMMUTABLE(5, ...
// 6, 0x00001000 CONCURRENT(6, ...
// 7, 0x00004000 SUBSIZED(7, ...

Immutable Collections that return a new view when modified

Firstly, I'm not asking about a wrapper that makes the wrapped collection read-only, such as Collections.unmodifiableXXX. My library would have a different API where all modifier methods return a new collection.
For example, a simple list would no longer have void set methods; they would return a new List instead.
Ideally the package would include the same immutable variations of List, Set, Map and even boring Stack.
UPDATE
// i am omitting generics etc to keep things simple.
XList list = List.fromArray( 1, 2, 3 );
XList list2 = list.add( 4 );
System.out.println( list ); // 1, 2, 3
System.out.println( list2 ); // 1, 2, 3, 4
removes, sets etc all return a different List after updating elements etc.
Actually, I think that the OP is describing a functional collections API. This can be implemented using copy-on-write, but the key difference is in the API design itself.
I couldn't find an alternative Java collections framework that works like this. This is not to say that you couldn't write one ...
(The standard Java copy-on-write collections are mutable, and behave like "ordinary" collections in most respects. The purpose of using the copy-on-write mechanism in these classes is to allow concurrent iteration and modification, and reduces synchronization overheads on shared collections with a lot of thread contention.)
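To illustrate the API difference, here is a naive sketch of such a functional list. XList is the hypothetical type from the question, and this version simply copies all elements on every add. Real persistent data structures avoid that cost by sharing structure between versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical functional list: add returns a new XList and leaves the receiver unchanged.
final class XList<E> {
    private final List<E> elements; // unmodifiable, never changed after construction

    private XList(List<E> elements) {
        this.elements = elements;
    }

    @SafeVarargs
    static <E> XList<E> of(E... items) {
        return new XList<>(List.of(items));
    }

    XList<E> add(E item) {
        List<E> copy = new ArrayList<>(elements);
        copy.add(item);
        return new XList<>(List.copyOf(copy));
    }

    int size() {
        return elements.size();
    }

    @Override
    public String toString() {
        return elements.toString();
    }

    public static void main(String[] args) {
        XList<Integer> list = XList.of(1, 2, 3);
        XList<Integer> list2 = list.add(4);
        System.out.println(list);  // prints [1, 2, 3]
        System.out.println(list2); // prints [1, 2, 3, 4]
    }
}
```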
What you are describing is called "copy-on-write", and Java has two implementations of such collections: CopyOnWriteArrayList and CopyOnWriteArraySet
How about using the copy on write collections from java 5?
http://www.javamex.com/tutorials/synchronization_concurrency_8_copy_on_write.shtml
The Clojure programming language (which compiles to Java byte code, runs on the JVM, and is fully interoperable with Java) has collections with exactly the semantics you are looking for.
Here's an overview page describing the general principle behind Clojure's data structures: http://clojure.org/data_structures
Although Clojure has its own Lisp-like syntax, most of its library functions are implemented in Java. For example, Clojure lists are defined by PersistentList.java. You could import and use this class in any Java program. Clojure has similar classes for Sets, Maps, etc...
You can't do it within the vanilla Collections framework, because e.g. Collection.add(Object) is specified to return a boolean, and you can't override it to return a new Collection instead.
You might try the data structures in Functional Java, perhaps.

Java: Does Collections.unmodifiableXYZ(...) in special cases make a collection object a flyweight?

The possible answers are either "never" or "it depends".
Personally, I would say, it depends.
Following usage would make a collection appear (to me) to be a flyweight:
public final static List<Integer> SOME_LIST =
Collections.unmodifiableList(
new LinkedList<Integer>(){ // scope begins
{
add(1);
add(2);
add(3);
}
} // scope ends
);
Right? You can't ever change it, because the only place where the
"original" collection object is known (which could be changed), is the
scope inside unmodifiableList's parameter list, which ends immediately.
Second thing is: when you retrieve an element from the list, it's an
Integer which itself is a flyweight.
Other obvious cases where final static and unmodifiableList are
not used, would not be considered as flyweights.
Did I miss something?
Do I have to consider some internal aspects of LinkedList which could
compromise the flyweight?
i think you are referring to the flyweight pattern. the fundamental idea of this pattern is that you are dealing with complex objects whose instances can be reused, and put out different representations with its methods.
to make such a object work correctly it should be immutable.
immutability is clearly given when creating a List the way you described.
but since there are no external objects/parameters on which SOME_LIST operates, i would not call this an example of a flyweight pattern.
another typical property of the flyweight pattern is the "interning" of such objects. when creating just a single instance of an object this does not make sense.
if you are dealing a lot with lists that are passed around from one object to another and you want to ensure the Immutability, a better option might be to use Google-Collections.
final static ImmutableList<Integer> someList = ImmutableList.of(1, 2, 3);
of course it is also possible to construct more complex Immutable Objects with Builders.
this creates an instance of an immutable list. it will still implement the List interface, but will refuse to execute any add(), addAll(), set(), or remove() operation.
so you can still pass it to methods when a List interface is required, yet be sure that its content is not altered.
I think your example is about immutable objects; a flyweight is something quite different. Immutable objects are candidates for flyweight, but a flyweight doesn't have to be immutable, it just has to be designed to save memory.
Having the library detect that the mutable List has not otherwise escaped is a bit of an ask, although theoretically possible.
If you serialise the returned object, then trusted code could view the internal object. Although the serialised forms of the classes are documented, it's not documented that the method uses those classes.
In practical terms, any cache is down to the user of the API.
(Why LinkedList for an immutable list, btw? Other than it changes the unmodifiable implementation.)
Integer is only a flyweight from -128 to 127.
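The cached range can be observed through Integer.valueOf, which is also what autoboxing uses. A minimal sketch with a hypothetical class name; values outside -128 to 127 may or may not be cached depending on JVM settings, so no expected output is given for that case.

```java
final class IntegerCacheDemo {
    // True if two boxings of the same value yield the same object.
    static boolean sameInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // prints true
        System.out.println(sameInstance(-128)); // prints true
        System.out.println(sameInstance(128));  // depends on the JVM's cache configuration
    }
}
```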
See also http://www.javaworld.com/javaworld/jw-07-2003/jw-0725-designpatterns.html.
