Java List.of() static method

I am executing below code snippet
System.out.println(List.of(1, 2).getClass());
System.out.println(List.of(1, 2, 3).getClass());
The output of this code is:
class java.util.ImmutableCollections$List2
class java.util.ImmutableCollections$ListN
I was expecting java.util.ImmutableCollections$List3 as the output of the second statement, because there is an of() method that takes three parameters. Why does Java create ImmutableCollections$ListN but not ImmutableCollections$List3?
Edit: This is a Java 9 question. There are 12 overloaded of() methods in the List interface: eleven of them take a fixed number of parameters, from zero to ten, and the twelfth takes varargs to handle N elements. So I was expecting List0 through List10 implementations for the fixed-arity methods, but with three parameters it returns ListN. Yes, it is an implementation detail, but I am just curious to know more about it.

The main reason to have several different private implementations of List is to save space.
Consider an implementation that stores its elements in an array. (This is essentially what ListN does.) In Hotspot (64-bit with compressed object pointers, each 4 bytes) each object requires a 12-byte header. The ListN object has a single field containing the array, for a total of 16 bytes. An array is a separate object, so that has another 12-byte header plus a 4-byte length. That's another 16 bytes, not counting any actual elements stored. If we're storing two elements, they take 8 bytes. That brings the total to 40 bytes for storing a two-element list. That's quite a bit of overhead!
If we were to store the elements of a small list in fields instead of an array, that object would have a header (12 bytes) plus two fields (8 bytes) for a total of 20 bytes -- half the size. For small lists, there's a considerable savings with storing elements in fields of the List object itself instead of in an array, which is a separate object. This is what the old List2 implementation did. It's recently been superseded by the List12 implementation, which can store lists of one or two elements in fields.
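To make the space comparison concrete, here is a minimal sketch of the two shapes (my own illustration, not the JDK's actual source; the class names are made up). The field-based list is a single object, while the array-based one pays for a second object with its own header and length field.
// Field-based: one object, header plus two reference fields (the List12 idea).
final class FieldBackedPair<E> extends java.util.AbstractList<E> {
    private final E e0;
    private final E e1;   // null when the list holds only one element
    FieldBackedPair(E e0, E e1) { this.e0 = e0; this.e1 = e1; }
    public E get(int index) {
        if (index == 0) return e0;
        if (index == 1 && e1 != null) return e1;
        throw new IndexOutOfBoundsException(String.valueOf(index));
    }
    public int size() { return e1 == null ? 1 : 2; }
}
// Array-based: two objects, the list itself plus its backing array (the ListN idea).
final class ArrayBackedList<E> extends java.util.AbstractList<E> {
    private final E[] elements;
    @SafeVarargs ArrayBackedList(E... elements) { this.elements = elements; }
    public E get(int index) { return elements[index]; }
    public int size() { return elements.length; }
}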
Now, in the API there are 12 overloaded List.of() methods: zero to ten fixed args plus varargs. Shouldn't there be corresponding List0 through List10 and ListN implementations?
There could be, but there doesn't necessarily have to be. An early prototype of these implementations had the optimized small list implementations tied to the APIs. So the zero, one, and two fixed arg of() methods created instances of List0, List1, and List2, and the varargs List.of() method created an instance of ListN. This was fairly straightforward, but it was quite restrictive. We wanted to be able to add, remove, or rearrange implementations at will. It's considerably more difficult to change APIs, since we have to remain compatible. Thus, we decided to decouple things so that the number of arguments in the APIs was largely independent of the implementation instantiated underneath.
In JDK 9 we ended up with the 12 overloads in the API, but only four implementations: field-based implementations holding 0, 1, and 2 elements, and an array-based implementation holding an arbitrary number. Why not add more field-based implementations? Diminishing returns and code bloat. Most lists have few elements, and there's an exponential dropoff in the occurrences of lists as the number of elements gets larger. The space savings get relatively smaller compared to an array-based implementation. Then there's the matter of maintaining all those extra implementations. Either they'd have to be entered directly in the source code (bulky) or we'd switch over to a code generation scheme (complex). Neither seemed justified.
Our startup performance guru Claes Redestad did some measurements and found that there was a speedup in having fewer list implementations. The reason is megamorphic dispatch. Briefly, if the JVM is compiling the code for a virtual call site and it can determine that only one or two different implementations are called, it can optimize this well. But if there are many different implementations that can be called, it has to go through a slower path. (See this article for Black Magic details.)
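A toy illustration of the dispatch point (my own example, assuming the usual java.util imports, not JDK code): the virtual call below is easy for the JIT to inline and optimize if only one or two concrete List classes ever reach it, but it becomes megamorphic, and therefore slower, once many different implementations show up at the same call site.
static int totalSize(List<List<Integer>> lists) {
    int total = 0;
    for (List<Integer> inner : lists) {
        total += inner.size();   // virtual call site: the number of receiver classes seen here matters to the JIT
    }
    return total;
}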
For the list implementations, it turns out that we can get by with fewer implementations without losing much space. The List1 and List2 implementations can be combined into a two-field List12 implementation, with the second field being null if there's only one element. We only need one zero-length list, since it's immutable! For a zero-length list, we can get rid of List0 and just use a ListN with a zero-length array. It's bigger than an old List0 instance, but we don't care, since there's only one of them.
These changes just went into the JDK 11 mainline. Since the API is completely decoupled from the implementations, there is no compatibility issue.
There are additional possibilities for future enhancements. One potential optimization is to fuse an array onto the end of an object, so the object has a fixed part and a variable-length part. This will avoid the need for an array object's header, and it will probably improve locality of reference. Another potential optimization is with value types. With value types, it might be possible to avoid heap allocation entirely, at least for small lists. Of course, this is all highly speculative. But if new features come along in the JVM, we can take advantage of them in the implementations, since they're entirely hidden behind the API.

ListN is the all-purpose version. List2 is an optimised implementation. There is no such optimised implementation for a list with three elements.
There currently exist* optimised versions for lists and sets with zero, one and two elements. List0, List1, List2, Set0 etc...
There's also an optimised implementation for an empty map, Map0, and for a map containing a single key-value pair, Map1.
Discussion relating to how these implementations are able to provide performance improvements can be seen in JDK-8166365.
*bear in mind this is an implementation detail which may be subject to change, and actually is due to change fairly soon

Neither ImmutableCollections$List2 nor ImmutableCollections$ListN is generated at runtime. There are four classes already written:
static final class List0<E> extends AbstractImmutableList<E> { ... }
static final class List1<E> extends AbstractImmutableList<E> { ... }
static final class List2<E> extends AbstractImmutableList<E> { ... }
static final class ListN<E> extends AbstractImmutableList<E> { ... }
From of(E e1, E e2, E e3) up to of(E e1, ..., E e10), an instance of ImmutableCollections.ListN is created.
Why java creating ImmutableCollections$ListN but not ImmutableCollections$List3?
The designers probably decided that the 3-element and N-element cases are similar enough that it's not worth writing a separate class for 3. Apparently, they wouldn't get as much benefit from $List3, $List7, or $List10 as they got from the specifically optimised $List0, $List1, and $List2 versions.
Currently, 4 classes cover all 12 methods. If they decided to add some more methods (e.g. with 22 arguments), there would still be these 4 classes.
Imagine you are writing 22 classes for 22 methods. How much unnecessary code duplication would it involve?
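To illustrate the decoupling, here is a simplified sketch of how the factory overloads can delegate to a small set of private implementations (my own paraphrase, not the exact JDK source):
// Each fixed-arity overload simply picks whichever private implementation the maintainers currently prefer,
// so the mapping can change between releases without touching the public API.
static <E> List<E> of(E e1, E e2) {
    return new ImmutableCollections.List12<>(e1, e2);    // small, field-based implementation
}
static <E> List<E> of(E e1, E e2, E e3) {
    return new ImmutableCollections.ListN<>(e1, e2, e3); // general array-based implementation
}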

Those are both classes that are being returned; i.e. ImmutableCollections$List2 and ImmutableCollections$ListN are two separate classes (the $ indicates a nested class).
This is an implementation detail, and (presumably) List2 exists for (possibly) some optimisation reason. I suspect if you look at the source (via your IDE or similar) you'll see two distinct inner classes.

As Jon Skeet rightly mentioned, it is an implementation detail. The specification of List.of says that it returns an immutable List, and that's all that matters.
The developers probably decided that they could provide efficient implementations of one-element (List1) and two-element lists (List2), and that all other sizes could be handled by a single type (ListN). This could change at some point in the future - maybe they will introduce a List3 at some point, maybe not.
As per the rules of polymorphism and encapsulation, none of this matters. As long as the returned object is a List, you should not concern yourself with its actual implementation.

Related

Best way to convert a PriorityQueue to a sorted array

I am concerned with different styles of creating a sorted array from a Java PriorityQueue containing a few thousand elements. The Java 8 docs say
If you need ordered traversal, consider using Arrays.sort(pq.toArray()).
However, I do like the streaming API, so what I had at first was
Something[] elems = theHeap.stream().sorted(BY_CRITERION.reversed())
.toArray(Something[]::new);
(Where BY_CRITERION is the custom comparator of the PriorityQueue, and I do want the reverse order of that.) Is there any disadvantage to using this idiom, as compared to the following:
Something[] elems = theHeap.toArray(new Something[0]);
Arrays.sort(elems, BY_CRITERION.reversed());
This latter code certainly appears to be following the API doc recommendation more directly, but apart from that, is it really more efficient in terms of memory, such as fewer temporary structures being allocated etc.?
I would think that the streaming solution would have to buffer the stream elements in a temporary structure (an array?) and then sort them, and finally copy the sorted elements to the array allocated in toArray().
While the imperative solution would buffer the heap elements in a newly allocated array and then sort them. So this might be one copy operation fewer. (And one array allocation. The discussion of Collection.toArray(new T[size]) vs. Collection.toArray(new T[0]) is tangentially related here. For example, see here for why on OpenJDK the latter is faster.)
And what about sorting efficiency? The docs of Arrays.sort() say
Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays
while the documentation of Stream.sorted() is silent on this point. So at least in terms of being reliably documented, the imperative solution would seem to have an advantage.
But is there anything more to know?
Fundamentally, both variants do the same thing, and since both are valid solutions within the library’s intended use cases, there is no reason why the implementations should prefer one over the other when choosing algorithms or adding optimizations.
In practice, this means that the most expensive operation, the sorting, ends up at the same implementation methods internally. The sorted(…) operation of the Stream implementation buffers all elements into an intermediate array, followed by calling Arrays.sort(T[], int, int, Comparator<? super T>), which will delegate to the same method as the method Arrays.sort(T[], Comparator<? super T>) you use in your first variant, the target being a sort method within the internal TimSort class.
So everything that has been said about the time and space complexity of Arrays.sort applies to Stream.sorted as well. There is a performance difference, though. In the OpenJDK implementations up to Java 10, the Stream is not capable of fusing the sorted step with the subsequent toArray step so as to use the result array directly for sorting. So currently, the Stream variant bears a final copying step from the intermediate array used for sorting to the final array created by the function passed to toArray. Future implementations might learn this trick; then there will be no relevant performance difference at all between the two solutions.

Improve memory usage: IntegerHashMap

We use a HashMap<Integer, SomeType>() with more than a million entries. I consider that large.
But integers are their own hash code. Couldn't we save memory with a, say, IntegerHashMap<Integer, SomeType>() that uses a special Map.Entry, using int directly instead of a pointer to an Integer object? In our case, that would save 1000000x the memory required for an Integer object.
Any faults in my line of thought? Too special to be of general interest? (at least, there is an EnumHashMap)
add1. The first generic parameter of IntegerHashMap is used to make it closely similar to the other Map implementations. It could be dropped, of course.
add2. The same should be possible with other maps and collections. For example ToIntegerHashMap<KeyType, Integer>, IntegerHashSet<Integer>, etc.
What you're looking for is a "primitive collections" library. They are usually much better in memory usage and performance. One of the oldest and most popular libraries was called "Trove". However, it is a bit outdated now. The main active libraries in use now are:
Goldman Sachs Collections
fastutil
Koloboke
See Benchmarks Here
Some words of caution:
"integers are their own hash code" I'd be very careful with this statement. Depending on the integers you have, the distribution of keys may be anything from optimal to terrible. Ideally, I'd design the map so that you can pass in a custom IntFunction as hashing strategy. You can still default this to (i) -> i if you want, but you probably want to introduce a modulo factor, or your internal array will be enormous. You may even want to use an IntBinaryOperator, where one param is the int and the other is the number of buckets.
I would drop the first generic param. You probably don't want to implement Map<Integer, SomeType>, because then you will have to box / unbox in all your methods, and you will lose all your optimizations (except space). Trying to make a primitive collection compatible with an object collection will make the whole exercise pointless.
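If pulling in a library is not an option, the core idea is straightforward to hand-roll. Below is a minimal open-addressing map from int keys to objects (my own sketch, not production code: fixed capacity, no removal, and one int value reserved as a sentinel), just to show where the savings come from: the keys live in a plain int[] instead of being boxed into Integer objects.
final class IntObjectMap<V> {
    private static final int FREE = Integer.MIN_VALUE;   // sentinel: this one key value cannot be stored
    private final int[] keys;
    private final Object[] values;

    IntObjectMap(int capacity) {
        keys = new int[capacity];
        values = new Object[capacity];
        java.util.Arrays.fill(keys, FREE);
    }

    void put(int key, V value) {
        int slot = indexOf(key);
        keys[slot] = key;
        values[slot] = value;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int slot = indexOf(key);
        return keys[slot] == key ? (V) values[slot] : null;
    }

    // Linear probing; the multiply spreads clustered int keys over the table.
    private int indexOf(int key) {
        int h = (key * 0x9E3779B9) >>> 16;                // non-negative starting point
        for (int i = 0; i < keys.length; i++) {
            int slot = (h + i) % keys.length;
            if (keys[slot] == FREE || keys[slot] == key) {
                return slot;
            }
        }
        throw new IllegalStateException("map is full");
    }
}
The libraries listed above do the same thing properly, with resizing, removal, and better collision handling, which is why they are usually the better choice.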

Sorting a List in parallel without creating a temporary array in Java 8

Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort for sorting lists.
I can use toArray, sort that array, and store the result back in my list, but that will temporarily increase memory usage, which if I'm using parallel sorting is already high because parallel sorting only pays off for huge lists. Instead of twice the memory (the list plus parallelSort's working memory), I'm using thrice (the list, the temporary array and parallelSort's working memory). (Arrays.parallelSort documentation says "The algorithm requires a working space no greater than the size of the original array".)
Memory usage aside, Collections.parallelSort would also be more convenient for what seems like a reasonably common operation. (I tend not to use arrays directly, so I'd certainly use it more often than Arrays.parallelSort.)
The library can test for RandomAccess to avoid trying to e.g. quicksort a linked list, so that can't be a reason for a deliberate omission.
How can I sort a List in parallel without creating a temporary array?
There doesn't appear to be any straightforward way to sort a List in parallel in Java 8. I don't think this is fundamentally difficult; it looks more like an oversight to me.
The difficulty with a hypothetical Collections.parallelSort(list, cmp) is that the Collections implementation knows nothing about the list's implementation or its internal organization. This can be seen by examining the Java 7 implementation of Collections.sort(list, cmp). As you observed, it has to copy the list elements out to an array, sort them, and then copy them back into the list.
This is the big advantage of the List.sort(cmp) extension method over Collections.sort(list, cmp). It might seem that this is merely a small syntactic advantage being able to write myList.sort(cmp) instead of Collections.sort(myList, cmp). The difference is that myList.sort(cmp), being an interface extension method, can be overridden by the specific List implementation. For example, ArrayList.sort(cmp) sorts the list in-place using Arrays.sort() whereas the default implementation implements the old copyout-sort-copyback technique.
It should be possible to add a parallelSort extension method to the List interface that has similar semantics to List.sort but does the sorting in parallel. This would allow ArrayList to do a straightforward in-place sort using Arrays.parallelSort. (It's not entirely clear to me what the default implementation should do. It might still be worth it to do copyout-parallelSort-copyback.) Since this would be an API change, it can't happen until the next major release of Java SE.
As for a Java 8 solution, there are a couple workarounds, none very pretty (as is typical of workarounds). You could create your own array-based List implementation and override sort() to sort in parallel. Or you could subclass ArrayList, override sort(), grab the elementData array via reflection and call parallelSort() on it. Of course you could just write your own List implementation and provide a parallelSort() method, but the advantage of overriding List.sort() is that this works on the plain List interface and you don't have to modify all the code in your code base to use a different List subclass.
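For example, the subclass workaround could look roughly like this (a sketch of my own; it copies out and writes back rather than grabbing elementData via reflection, so it trades an extra copy for not touching ArrayList internals):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.ListIterator;

class ParallelSortArrayList<E> extends ArrayList<E> {
    @Override
    @SuppressWarnings("unchecked")
    public void sort(Comparator<? super E> c) {
        // Copy out, parallel-sort the copy, then write the result back in place,
        // mirroring what the default List.sort implementation does sequentially.
        Object[] a = toArray();
        Arrays.parallelSort((E[]) a, c);
        ListIterator<E> it = listIterator();
        for (Object e : a) {
            it.next();
            it.set((E) e);
        }
    }
}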
I think you are doomed to use a custom List implementation augmented with your own parallelSort or else change all your other code to store the big data in Array types.
This is the inherent problem with layers of abstract data types. They're meant to isolate the programmer from details of implementation. But when the details of implementation matter - as in the case of underlying storage model for sort - the otherwise splendid isolation leaves the programmer helpless.
The standard List sort documents provide an example. After the explanation that mergesort is used, they say
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n² log(n) performance that would result from attempting to sort a linked list in place.)
In other words, "since we don't know the underlying storage model for a List and couldn't touch it if we did, we make a copy organized in a known way." The parenthesized expression is based on the fact that the List "i'th element accessor" on a linked list is Omega(n), so the normal array mergesort implemented with it would be a disaster. In fact it's easy to implement mergesort efficiently on linked lists. The List implementer is just prevented from doing it.
A parallel sort on List has the same problem. The standard sequential sort fixes it with custom sorts in the concrete List implementations. The Java folks just haven't chosen to go there yet. Maybe in Java 9.
Use the following:
yourCollection.parallelStream().sorted().collect(Collectors.toList());
This will be parallel when sorting, because of parallelStream(). I believe this is what you mean by parallel sort?
Just speculating here, but I see several good reasons for generic sort algorithms preferring to work on arrays instead of List instances:
Element access is performed via method calls. Despite all the optimizations JIT can apply, even for a list that implements RandomAccess, this probably means a lot of overhead compared to plain array accesses which can be optimized very well.
Many algorithms require copying some fragments of the array to temporary structures. There are efficient methods for copying arrays or their fragments. An arbitrary List instance, on the other hand, can't be easily copied. New lists would have to be allocated, which poses two problems. First, this means allocating new objects, which is likely more costly than allocating arrays. Second, the algorithm would have to choose which implementation of List should be allocated for this temporary structure. There are two obvious solutions, both bad: either just choose some hard-coded implementation, e.g. ArrayList, but then it could just allocate simple arrays as well (and if we're generating arrays then it's much easier if the source is also an array); or let the user provide some list factory object, which makes the code much more complicated.
Related to the previous issue: there is no obvious way of copying one list into another due to how the API is designed. The best the List interface offers is the addAll() method, but this is probably not efficient for most cases (think of pre-allocating the new list to its target size vs. adding elements one by one, which many implementations do).
Most lists that need to be sorted will be small enough for another copy to not be an issue.
So probably the designers thought of CPU efficiency and code simplicity most of all, and this is easily achieved when the API accepts arrays. Some languages, e.g. Scala, have sort methods that work directly on lists, but this comes at a cost and probably is less efficient than sorting arrays in many cases (or sometimes there will probably just be a conversion to and from array performed behind the scenes).
By combining the existing answers I came up with this code.
This works if you are not interested in creating a custom List class and if you don't mind creating a temporary array (Collections.sort does that anyway).
This uses the initial list and does not create a new one as in the parallelStream solution.
// Convert List to Array so we can use Arrays.parallelSort rather than Collections.sort.
// Note that Collections.sort begins with this very same conversion, so we're not adding overhead
// in comparison with Collections.sort.
Foo[] fooArr = fooLst.toArray(new Foo[0]);
// Multithreaded TimSort. Automatically falls back to a single-threaded sort when the array is small (8192 elements or fewer).
Arrays.parallelSort(fooArr, Comparator.comparing(Foo::yourMethod));
// Refill the List using the sorted Array, the same way Collections.sort does it.
ListIterator<Foo> i = fooLst.listIterator();
for (Foo e : fooArr) {
    i.next();
    i.set(e);
}

When is it better to use a vector than an array and vice versa in java?

When is it better to use a Vector than an array in Java, and vice versa? And why?
Vector: never, unless an API requires it, because it's a class, not an interface.
List: this should be your default array-like collection. It's an interface so anything can be a List if it needs to. (and there are lots of List implementations out there e.g. ArrayList, LinkedList, CopyOnWriteArrayList, ImmutableList, for various feature sets)
Vector is threadsafe, but so is the Collections.synchronizedList() wrapper.
array: rarely, if required by an API. The one other major advantage of arrays is when you need a fixed-length array of primitives: the memory required is fairly compact, compared to a List<Integer>, where the integers need to be boxed into Integer objects.
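A small illustration of that boxing point (my own example, assuming the usual java.util imports):
int[] primitives = new int[1_000_000];            // one flat block of ints
List<Integer> boxed = new ArrayList<>();          // a resizable array of references to Integer objects
for (int i = 0; i < 1_000_000; i++) {
    boxed.add(i);                                 // each add autoboxes the int into an Integer
}
// A thread-safe view, comparable to what Vector gives you:
List<Integer> syncView = Collections.synchronizedList(boxed);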
A Vector (or List) when you don't know beforehand how many elements are going to be inserted.
An array when you absolutely know the maximum number of elements it will hold over its whole life.
Since there are no high performance penalties when using List or Vector (or any collection, for that matter), I would always choose to use them. Their flexibility is too important not to be considered.
Nowadays I only use arrays when I absolutely need to. Example: when using an API that requires them.
First off, ArrayList is a faster implementation than Vector (but it is not thread-safe).
Arrays are handy when you know the length beforehand and will not change much (or at all).
When declaring a method, use List.
Do not use Vector, it's an early part of the JDK and was retrofitted to work with Collections.
If there's a very performance-sensitive algorithm, a private array member can be helpful. If there's a need to return its contents or pass them to a method, it's generally best to construct an object around it, perhaps as simple as Arrays.asList(thePrivateArray). For a thread-safe list: Collections.synchronizedList(Arrays.asList(thePrivateArray)). In order to prevent modification of the array contents, I typically use Collections.unmodifiableList(Arrays.asList(thePrivateArray)).

What is the need of collection framework in java?

What is the need for the Collections Framework in Java, since all data operations (sorting/adding/deleting) are possible with arrays, and moreover arrays use less memory and perform better than collections?
Can anyone point me to a real-world, data-oriented example that shows the difference between the two (arrays/collections)?
Arrays are not resizable.
Java Collections Framework provides lots of different useful data types, such as linked lists (allows insertion anywhere in constant time), resizeable array lists (like Vector but cooler), red-black trees, hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
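In code, that abstraction means the same calling code works no matter which implementation backs the reference (my own example, assuming java.util imports):
List<String> names = new ArrayList<>();   // could just as well be new LinkedList<>()
names.add("Bea");
names.add("Al");
Collections.sort(names);                  // written against the List interface, not a concrete class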
Several reasons:
Java's collection classes provide a higher-level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing a complicated data structure (e.g., a hash table) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementations you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized wrappers, etc.
Finally, arrays allow covariance: setting an element of an array is not guaranteed to succeed due to typing errors that are detectable only at run time. Generics prevent this problem in collections.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
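For contrast, the generic counterpart is rejected at compile time instead of failing at run time (my own example):
List<String> strings = new ArrayList<>();
// List<Object> objects = strings;   // does not compile: List<String> is not a List<Object>
// objects.add(new Date());          // ...so this mistake can never happen at run time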
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1).
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
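A quick sketch of the Set point (my own example, assuming java.util imports):
String[] words = {"alpha", "beta", "gamma"};
boolean inArray = Arrays.asList(words).contains("beta");   // linear scan: O(N)

Set<String> wordSet = new HashSet<>(Arrays.asList(words));
boolean inSet = wordSet.contains("beta");                  // hashing: O(1) expected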
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it is based on an array, whereas the Hash* collections just work out of the box. As always, pick the right tool for the job.
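For instance, counting occurrences of keys is a one-liner with a HashMap, whereas an array-based version would need its own lookup and resizing code (my own example):
Map<String, Integer> counts = new HashMap<>();
for (String word : List.of("a", "b", "a")) {
    counts.merge(word, 1, Integer::sum);   // insert 1, or add 1 to the existing count
}
// counts now maps "a" to 2 and "b" to 1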
Well, the basic premise is "wrong", since Java has included the Dictionary class since before the collection interfaces existed in the language...
Collections offer Lists, which are somewhat similar to arrays, but they also offer many things that arrays are not. I'll assume you were just talking about List (and even Set) and will leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array; however, there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developer's learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
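For example, the "growth code" mentioned above looks something like this when done by hand, which is essentially what ArrayList hides from you (my own sketch):
int[] data = new int[4];
int size = 4;                                     // logical number of elements in use
// Need room for one more element:
if (size == data.length) {
    data = Arrays.copyOf(data, data.length * 2);  // allocate a bigger array and copy everything over
}
data[size++] = 42;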
Arrays are not always efficient. What if you need something like a LinkedList? It looks like you need to learn some data structures: http://en.wikipedia.org/wiki/List_of_data_structures
Java collections provide different functionality, usability, and convenience.
When we want to work with a group of objects in an application, arrays alone cannot help us, or rather they force us into some cumbersome operations.
One important difference is usability and convenience, especially given that collections automatically expand in size when needed.
Collections come with methods that simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item occupies in the list).
Used- Most cases where you just need to store a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?", "is the item in this category?" etc.
Collections are a framework in Java, and a framework is much easier to use than implementing things yourself and then using them. Your concern is why we don't just use arrays: arrays have drawbacks, for example they are static, so you have to define the size up front, and if your array is large, much of that memory can end up wasted.
So you can prefer ArrayList, which is part of the collection hierarchy.
Complexity is another issue: if you want to insert into an array, you have to shift elements to the right index yourself, whereas with LinkedList all those operations are already implemented and you only need to call them, which makes your code less complex. There are various other advantages of the collection hierarchy.
The Collections Framework is much higher level than arrays and provides important interfaces and classes with which we can manage groups of objects in a much more sophisticated way, using the many methods each collection already provides.
For example:
ArrayList - It's like a dynamic array, i.e. we don't need to declare its size; it grows as we add elements to it and shrinks as we remove elements from it, during the runtime of the program.
LinkedList - It can be used as a queue (FIFO) or even as a stack (LIFO).
HashSet - It stores its elements by a process called hashing. The order of elements in a HashSet is not guaranteed.
TreeSet - TreeSet is the best candidate when one needs to store a large number of sorted elements with fast access to them.
ArrayDeque - It can also be used to implement a first-in, first-out (FIFO) queue or a last-in, first-out (LIFO) stack.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
TreeMap - TreeMap stores key-value pairs sorted in ascending key order, and retrieval of an element from a TreeMap is quite fast.
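Two quick examples of the above (my own illustration, assuming java.util imports):
Deque<Integer> stack = new ArrayDeque<>();
stack.push(1);
stack.push(2);                    // LIFO: pop() returns 2, then 1

Map<String, Integer> byKey = new TreeMap<>();
byKey.put("b", 2);
byKey.put("a", 1);                // iteration visits keys in sorted order: "a", then "b"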
