I came across newArrayListWithCapacity(int) (1) in the codebase and was wondering if it has any advantages.
From what I understand, if we don't know the size, new ArrayList will serve the purpose, and if we do know the exact size, we can simply use an array. I'm trying to understand the benefits of using newArrayListWithCapacity(int). And how is it different from newArrayListWithExpectedSize(int)?
If I define the expected size as x and end up with y entries, does that hurt performance?
1: https://guava.dev/releases/15.0/api/docs/com/google/common/collect/Lists.html#newArrayListWithCapacity(int)
@GwtCompatible(serializable=true)
public static <E> ArrayList<E> newArrayListWithCapacity(int initialArraySize)
Creates an ArrayList instance backed by an array of the exact size specified; equivalent to ArrayList.ArrayList(int).
Note: if you know the exact size your list will be, consider using a fixed-size list (Arrays.asList(Object[])) or an ImmutableList instead of a growable ArrayList.
Note: If you have only an estimate of the eventual size of the list, consider padding this estimate by a suitable amount, or simply use newArrayListWithExpectedSize(int) instead.
The method and the constructor do the same thing, which is to provide an ArrayList with a specific capacity, meaning that the initial backing array is created with the provided size.
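To make the x-versus-y question concrete, here is a minimal sketch that counts how often the backing array would be reallocated while elements are added. It assumes OpenJDK's growth policy (the new capacity is the old one plus half of it); that policy is an implementation detail, not part of the ArrayList contract, and CapacityDemo and reallocations are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityDemo {
    // Counts how many times a growable array would be reallocated while
    // adding n elements one at a time, ASSUMING OpenJDK's policy of growing
    // to old + old/2 (and at least old + 1). Each reallocation copies the array.
    static int reallocations(int initialCapacity, int n) {
        int capacity = initialCapacity;
        int count = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) { // array is full, must grow before adding
                capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Guava's Lists.newArrayListWithCapacity(100) delegates to this constructor:
        // the backing array has length 100, but the list itself is still empty.
        List<String> list = new ArrayList<>(100);
        System.out.println(list.isEmpty()); // true: capacity is not size

        System.out.println(reallocations(100, 100)); // x == y: 0
        System.out.println(reallocations(100, 101)); // y slightly larger than x: 1
        System.out.println(reallocations(10, 100));  // x much smaller than y: 6
    }
}
```

Under that assumption, guessing too low (x < y) costs a few extra array copies, and guessing too high (x > y) only wastes the unused slots; neither affects correctness.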
Guava created a factory method wrapping the ArrayList(int) constructor because, while the constructor is well documented, it can sometimes confuse people (is the int an initial size or a capacity?). The Guava team did the same with other collection factory methods such as Sets.newHashSetWithExpectedSize(int) and Maps.newHashMapWithExpectedSize(int).
The pros are the following:
You have an explicit factory method name. This is good because not everyone knows what the constructors mean. So having a factory method that says what it creates is handy for people not entirely familiar with the API, but who are reading the code.
You don't have to handle generics in any way. The type arguments are inferred for you through the generic method mechanism. Guava added this method back in the Java 6 days, when you had to spell out the type argument in every constructor call (such as List<String> strings = new ArrayList<String>(10)). Since Java 7 you can simply use the diamond operator <> (such as List<String> strings = new ArrayList<>(10)), but at the time the gain was huge!
The cons are:
You pay for one extra method call. But usually, when you're already using Java's standard collections, raw performance is not your main concern: some other libraries provide high-performance collections, and neither the Java Collections nor Guava's collections are serious contenders there. Also, as MikeFHay mentions in the comments below, the method call will likely be inlined by modern JVMs.
You depend on Google Guava. This is not really a letdown, because Guava is a fantastic library, but if this is the only reason you use Guava, it might be.
The Guava team made that method because, in their view, the pros clearly outweigh the cons (a view I personally share).
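On the second half of the question: newArrayListWithExpectedSize(int) does not use the estimate as-is but pads it, so a slightly low estimate does not immediately trigger a reallocation. In the Guava sources I'm aware of, the padding is 5 + n + n/10; treat the exact formula as an implementation detail that may differ between versions. The sketch below reproduces the idea with plain java.util and a hypothetical paddedCapacity helper.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedSizeDemo {
    // Hypothetical helper mimicking the padding Guava applies in
    // newArrayListWithExpectedSize: 5 + n + n/10, clamped to the int range.
    // The exact formula is an assumption and may change between Guava versions.
    static int paddedCapacity(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize cannot be negative: " + expectedSize);
        }
        long padded = 5L + expectedSize + (expectedSize / 10);
        return (int) Math.min(padded, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Roughly: newArrayListWithCapacity(100) allocates exactly 100 slots,
        // while newArrayListWithExpectedSize(100) allocates a padded array.
        List<String> exact = new ArrayList<>(100);
        List<String> padded = new ArrayList<>(paddedCapacity(100));
        System.out.println(paddedCapacity(100)); // 115
        System.out.println(paddedCapacity(0));   // 5
    }
}
```

So WithCapacity means "I know the size", while WithExpectedSize means "this is an estimate, leave some headroom".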
I downloaded the source code for Google Guava just out of curiosity to see what is backing the immutable collections.
I was going through ImmutableList and noticed that it is still backed by an old-fashioned array, i.e. an Object[].
So I'm just curious: an Object[] array is mutable by nature, and it is not thread-safe. I already knew that ArrayList and CopyOnWriteArrayList were backed by an Object[] array, but those are mutable.
So is ImmutableList only immutable and thread-safe because its internal properties are well encapsulated and protected? Is it also immutable and thread-safe because the encapsulation ensures nothing will modify it after construction? Will there ever be a day when low-level arrays like this are replaced by something better and less legacy, something inherently immutable and final rather than immutable through careful encapsulation?
Yes, the ImmutableList is "only" immutable because it does not allow its internal backing array to be modified.
That is exactly the same situation as for java.lang.String, which also wraps a private array (a char[], or a byte[] since the compact strings change in Java 9). Along the same lines, the very useful concurrency libraries are largely implemented as regular Java classes relying on only very few (and very basic) JVM synchronization primitives.
So that should be good enough. Obviously, people writing these library classes have to be careful and knowledgeable, but this requirement does not magically go away by moving the code even more "low-level" into JVM primitives. (Agreed, you can shoot yourself in the foot using dark voodoo like reflection now, but that hardly happens by accident, and in regular usage, a "user-land" implementation works just as well.)
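A minimal sketch of "immutable by encapsulation", assuming nothing about Guava's actual ImmutableList internals: a hypothetical FrozenList copies the caller's array into a private final field and exposes only read methods. java.util.AbstractList makes add, set and remove throw UnsupportedOperationException by default, and the final field gives safe publication to other threads under the Java Memory Model, provided the constructor does not leak this. Note the copy is shallow: the elements themselves are not copied.

```java
import java.util.AbstractList;
import java.util.List;

public class FrozenListDemo {
    // Hypothetical read-only list; not Guava's implementation.
    static final class FrozenList<E> extends AbstractList<E> {
        // A plain mutable array, but private and never handed out.
        private final Object[] elements;

        @SafeVarargs
        FrozenList(E... source) {
            // Defensive copy: later writes to the caller's array cannot reach us.
            this.elements = source.clone();
        }

        @Override
        @SuppressWarnings("unchecked")
        public E get(int index) {
            return (E) elements[index]; // out-of-range index throws ArrayIndexOutOfBoundsException
        }

        @Override
        public int size() {
            return elements.length;
        }
        // add/set/remove are inherited from AbstractList and throw
        // UnsupportedOperationException, so the array never changes after construction.
    }

    public static void main(String[] args) {
        String[] backing = {"a", "b", "c"};
        List<String> list = new FrozenList<>(backing);
        backing[0] = "z";                // mutate the caller's array...
        System.out.println(list.get(0)); // ...the list still prints a
        try {
            list.add("d");
        } catch (UnsupportedOperationException e) {
            System.out.println("add rejected");
        }
    }
}
```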
Will there ever be a day
That is pure speculation. But there is apparently work on "value types", which is a related topic.
It is obvious that immutability increases reusability, since it creates a new object on each state change. Can somebody give me a practical scenario where we need an immutable class?
Consider java.lang.String. If it weren't immutable, every time you ever have a string you want to be confident wouldn't change underneath you, you'd have to create a copy.
Another example is collections: it's nice to be able to accept or return a genuinely immutable collection (e.g. from Guava - not just an immutable view on a mutable collection) and have confidence that it won't be changed.
Whether those count as "needs" or not, I don't know - but I wouldn't want to develop without them.
A good example is related to hashing. A class overrides the equals() and hashCode() methods so that it can be used in data structures like HashSet and (as keys in) HashMap, and the hash code is typically derived from some identifying member attributes. However, if these attributes were to change, then so would the object's hash code, so an object already stored in a hashing data structure can no longer be found there.
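A minimal sketch of that hashing problem, using a hypothetical MutablePoint class whose hashCode() depends on mutable fields: once the object is mutated while inside a HashSet, lookups no longer find it, although it is still stored. Making x and y final (an immutable class) rules this out.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MutableKeyDemo {
    // Hypothetical mutable class whose hash code depends on mutable fields.
    static final class MutablePoint {
        int x;
        int y;

        MutablePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MutablePoint)) return false;
            MutablePoint p = (MutablePoint) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1, 2);
        set.add(p);

        p.x = 5; // hash code changes while p sits under its old hash

        System.out.println(set.contains(p));                      // false: looked up under the new hash
        System.out.println(set.contains(new MutablePoint(1, 2))); // false: old hash matches, but no longer equal
        System.out.println(set.size());                           // 1: still stored, just unreachable by lookup
    }
}
```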
Java provides a nice example: String.
This article has a good color example (since color definitions don't change).
http://www.ibm.com/developerworks/java/library/j-jtp02183/index.html
By persistent collections I mean collections like those in Clojure.
For example, I have a list with the elements (a,b,c).
With a normal list, if I add d, my original list will have (a,b,c,d) as its elements.
With a persistent list, when I call list.add(d), I get back a new list, holding (a,b,c,d).
However, the implementation attempts to share elements between the list wherever possible, so it's much more memory efficient than simply returning a copy of the original list.
It also has the advantage of being immutable (if I hold a reference to the original list, then it will always return the original 3 elements).
This is all explained much better elsewhere (e.g. http://en.wikipedia.org/wiki/Persistent_data_structure).
Anyway, my question is: what's the best library providing this functionality for use in Java? Can I use the Clojure collections somehow (other than by directly using Clojure)?
Just use the ones in Clojure directly. While you obviously might not want to use the language itself, you can still use the persistent collections directly, as they are all just Java classes.
import clojure.lang.IPersistentMap;
import clojure.lang.PersistentHashMap;
IPersistentMap map = PersistentHashMap.create("key1", "value1");
assert map.valAt("key1").equals("value1");
// assoc returns a new map and leaves the original untouched
IPersistentMap map2 = map.assoc("key2", "value2");
assert map2 != map;
assert map.valAt("key2") == null;
assert map2.valAt("key1").equals("value1");
assert map2.valAt("key2").equals("value2");
(disclaimer: I haven't actually compiled that code :)
The downside is that the collections aren't typed, i.e. they have no generics.
What about pcollections?
You can also check out Clojure's implementation of persistent collections (PersistentHashMap, for instance).
I was looking for a slim, Java-"friendly" persistent collection framework and took TotallyLazy and PCollections, both mentioned in this thread, for a test drive, because they sounded most promising to me.
Both provide reasonably simple interfaces to manipulate persistent lists:
// TotallyLazy
PersistentList<String> original = PersistentList.constructors.empty(String.class);
PersistentList<String> modified = original.append("Mars").append("Raider").delete("Raider");
// PCollections
PVector<String> original = TreePVector.<String>empty();
PVector<String> modified = original.plus("Mars").plus("Raider").minus("Raider");
Both PersistentList and PVector extend java.util.List, so both libraries should integrate well into an existing environment.
It turns out, however, that TotallyLazy runs into performance problems when dealing with larger lists (as already mentioned in a comment above by @levantpied). On my MacBook Pro (Late 2013), inserting 100,000 elements and returning the immutable list took TotallyLazy ~2000 ms, whereas PCollections finished within ~120 ms.
My (simple) test cases are available on Bitbucket, if someone wants to take a more thorough look.
[UPDATE]: I recently had a look at Cyclops X, which is a high-performance and more complete library targeted at functional programming. Cyclops also contains a module for persistent collections.
https://github.com/andrewoma/dexx is a port of Scala's persistent collections to Java. It includes:
Set, SortedSet, Map, SortedMap and Vector
Adapters to view the persistent collections as java.util equivalents
Helpers for easy construction
Paguro provides type-safe versions of the actual Clojure collections for use in Java 8+. It includes: List (Vector), HashMap, TreeMap, HashSet, and TreeSet. They behave exactly the way you specify in your question and have been painstakingly fit into the existing java.util collections interfaces for maximum type-safe Java compatibility. They are also a little faster than PCollections.
Coding your example in Paguro looks like this:
// List with the elements (a,b,c)
ImList<T> list = vec(a,b,c);
// With a persistent list, when I call list.add(d),
// I get back a new list, holding (a,b,c,d)
ImList<T> newList = list.append(d);
list.size(); // still returns 3
newList.size(); // returns 4
You said,
The implementation attempts to share elements between the list
wherever possible, so it's much more memory efficient and fast than
simply returning a copy of the original list. It also has the
advantage of being immutable (if I hold a reference to the original
list, then it will always return the original 3 elements).
Yes, that's exactly how it behaves. Daniel Spiewak explains the speed and efficiency of these collections much better than I could.
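The structural sharing described in that quote can be illustrated with a minimal persistent linked list. PList is a hypothetical class, not Paguro's or Clojure's API. It only supports prepending, because that is the operation a singly linked list can do without copying; Clojure's vector (which Paguro wraps) uses a wide tree so that appends can share structure as well.

```java
public class ConsListDemo {
    // Hypothetical minimal persistent (immutable) singly linked list.
    static final class PList<T> {
        private static final PList<?> EMPTY = new PList<>(null, null, 0);

        final T head;        // first element (unused for the empty list)
        final PList<T> tail; // shared with older lists, never copied
        final int size;

        private PList(T head, PList<T> tail, int size) {
            this.head = head;
            this.tail = tail;
            this.size = size;
        }

        @SuppressWarnings("unchecked")
        static <T> PList<T> empty() {
            return (PList<T>) EMPTY;
        }

        // Returns a NEW list; `this` is reused as its tail, so nothing is copied.
        PList<T> prepend(T value) {
            return new PList<>(value, this, size + 1);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (PList<T> node = this; node.size > 0; node = node.tail) {
                sb.append(node.head);
                if (node.size > 1) sb.append(", ");
            }
            return sb.append(")").toString();
        }
    }

    public static void main(String[] args) {
        PList<String> abc = PList.<String>empty().prepend("c").prepend("b").prepend("a");
        PList<String> dabc = abc.prepend("d");
        System.out.println(abc);              // (a, b, c): the original is unchanged
        System.out.println(dabc);             // (d, a, b, c)
        System.out.println(dabc.tail == abc); // true: the old list is shared, not copied
    }
}
```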
You may want to check out clj-ds. I haven't used it, but it seems promising. According to the project's README, it extracted the data structures from Clojure 1.2.0.
Functional Java implements a persistent List, lazy List, Set, Map, and Tree. There may be others, but I'm just going by the information on the front page of the site.
I am also interested to know what the best persistent data structure library for Java is. My attention was directed to Functional Java because it is mentioned in the book, Functional Programming for Java Developers.
There's pcollections (Persistent Collections) library you can use:
http://code.google.com/p/pcollections/
The top-voted answer suggests using the Clojure collections directly, which I think is a very good idea. Unfortunately, because Clojure is a dynamically typed language and Java is not, the Clojure libraries are very uncomfortable to use from Java.
Because of this, and the lack of lightweight, easy-to-use wrappers for the Clojure collection types, I have written my own library of generic Java wrappers for the Clojure collection types, with a focus on ease of use and clear interfaces.
https://github.com/cornim/ClojureCollections
Maybe this will be of use to somebody.
P.S.: At the moment only PersistentVector, PersistentMap and PersistentList have been implemented.
In the same vein as Cornelius Mund, Pure4J ports the Clojure collections into Java and adds Generics support.
However, Pure4J is aimed at introducing pure programming semantics to the JVM through compile time code checking, so it goes further to introduce immutability constraints to your classes, so that the elements of the collection cannot be mutated while the collection exists.
This may or may not be what you want to achieve: if you are just after using the Clojure collections on the JVM I would go with Cornelius' approach, otherwise, if you are interested in pursuing a pure programming approach within Java then you could give Pure4J a try.
Disclosure: I am the developer of Pure4J.
totallylazy is a very good FP library which has implementations of:
PersistentList<T>: the concrete implementations are LinkedList<T> and TreeList<T> (for random access)
PersistentMap<K, V>: the concrete implementations are HashTreeMap<K, V> and ListMap<K, V>
PersistentSortedMap<K, V>
PersistentSet<T>: the concrete implementation is TreeSet<T>
Example of usage:
import static com.googlecode.totallylazy.collections.PersistentList.constructors.*;
import com.googlecode.totallylazy.collections.PersistentList;
import com.googlecode.totallylazy.numbers.Numbers;
...
PersistentList<Integer> list = list(1, 2, 3);
// Create a new list with 0 prepended
list = list.cons(0);
// Prints 0::1::2::3
System.out.println(list);
// Do some actions on this list (e.g. remove all even numbers)
list = list.filter(Numbers.odd);
// Prints 1::3
System.out.println(list);
totallylazy is constantly being maintained. The main disadvantage is the total absence of Javadoc.
I'm surprised nobody has mentioned vavr. I've been using it for a long time now.
http://www.vavr.io
Description from their site:
Vavr core is a functional library for Java. It helps to reduce the amount of code and to increase the robustness. A first step towards functional programming is to start thinking in immutable values. Vavr provides immutable collections and the necessary functions and control structures to operate on these values. The results are beautiful and just work.
https://github.com/arnohaase/a-foundation is another port of Scala's libraries.
It is also available from Maven Central: com.ajjpj.a-foundation:a-foundation
JavaDoc of ImmutableSet says:
Unlike Collections.unmodifiableSet, which is a view of a separate collection that can still change, an instance of this class contains its own private data and will never change. This class is convenient for public static final sets ("constant sets") and also lets you easily make a "defensive copy" of a set provided to your class by a caller.
But ImmutableSet still stores references to its elements, so I couldn't figure out how it differs from Collections.unmodifiableSet(). Sample:
StringBuffer s = new StringBuffer("a");
ImmutableSet<StringBuffer> set = ImmutableSet.of(s);
s.append("b"); // s is now "ab": the element is still changed here!
Could anyone explain it?
Consider this:
Set<String> x = new HashSet<String>();
x.add("foo");
ImmutableSet<String> guava = ImmutableSet.copyOf(x);
Set<String> builtIn = Collections.unmodifiableSet(x);
x.add("bar");
System.out.println(guava.size()); // Prints 1
System.out.println(builtIn.size()); // Prints 2
In other words, ImmutableSet is immutable despite whatever collection it's built from potentially changing - because it creates a copy. Collections.unmodifiableSet prevents the returned collection from being directly changed, but it's still a view on a potentially-changing backing set.
Note that if you start changing the contents of the objects referred to by any set, all bets are off anyway. Don't do that. Indeed, it's rarely a good idea to create a set using a mutable element type in the first place. (Ditto maps using a mutable key type.)
Besides the behavioral difference that Jon mentions, an important difference between ImmutableSet and the Set created by Collections.unmodifiableSet is that ImmutableSet is a type. You can pass one around and have it remain clear that the set is immutable by using ImmutableSet rather than Set throughout the code. With Collections.unmodifiableSet, the returned type is just Set... so it's only clear that the set is unmodifiable at the point where it is created unless you add Javadoc everywhere you pass that Set saying "this set is unmodifiable".
Kevin Bourrillion (Guava lead developer) compares immutable / unmodifiable collections in this presentation. While the presentation is two years old, and focuses on "Google Collections" (which is now a subpart of Guava), this is a very interesting presentation. The API may have changed here and there (the Google Collections API was in Beta at the time), but the concepts behind Google Collections / Guava are still valid.
You might also be interested in this other SO question ( What is the difference between google's ImmutableList and Collections.unmodifiableList() ).
A difference between the two not stated in other answers is that ImmutableSet does not permit null values, as described in the Javadoc:
A high-performance, immutable Set with reliable, user-specified iteration order. Does not permit null elements.
(The same restriction applies to values in all Guava immutable collections.)
For example:
ImmutableSet.of(null);
ImmutableSet.builder().add("Hi").add(null); // Fails in the Builder.
ImmutableSet.copyOf(Arrays.asList("Hi", null));
All of these fail at runtime. In contrast:
Collections.unmodifiableSet(new HashSet<>(Arrays.asList("Hi", null)));
This is fine.
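For comparison, the JDK's own unmodifiable copies (Set.of from Java 9 and Set.copyOf from Java 10) reject nulls the same way, while an unmodifiable view simply passes through whatever its backing set contains. A small sketch using only java.util, assuming Java 10 or later; NullElementDemo and copyRejects are hypothetical names. Unlike Guava's ImmutableSet, Set.copyOf makes no promise about iteration order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class NullElementDemo {
    // Returns true if Set.copyOf refuses to copy the given set.
    static boolean copyRejects(Set<String> source) {
        try {
            Set.copyOf(source);
            return false;
        } catch (NullPointerException e) {
            return true; // Set.copyOf throws NPE if any element is null
        }
    }

    public static void main(String[] args) {
        Set<String> withNull = new HashSet<>(Arrays.asList("Hi", null));

        // View: exposes whatever the backing set holds, null included.
        Set<String> view = Collections.unmodifiableSet(withNull);
        System.out.println(view.contains(null)); // true

        // JDK copy: like Guava's ImmutableSet.copyOf, rejects null elements.
        System.out.println(copyRejects(withNull)); // true
    }
}
```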