From time to time during code reviews I see constructors like this one:
Foo(Collection<String> words) {
this.words = Collections.unmodifiableCollection(words);
}
Is this a proper way of protecting the internal state of the class? If not, what's the idiomatic approach to creating a proper defensive copy in constructors?
It looks like it should be, but it isn't correct, because the caller can still modify the underlying collection.
Instead of wrapping the list, you should make a defensive copy, for example by using Guava's ImmutableList instead.
Foo(Collection<String> words) {
if (words == null) {
throw new NullPointerException( "words cannot be null" );
}
this.words = ImmutableList.copyOf(words);
if (this.words.isEmpty()) { //example extra precondition
throw new IllegalArgumentException( "words can't be empty" );
}
}
So the correct way of establishing the initial state for the class is:
Check the input parameter for null.
If the input type isn't guaranteed to be immutable (as is the case with Collection), make a defensive copy. In this case, because the element type is immutable (String), a shallow copy will do, but if it wasn't, you'd have to make a deeper copy.
Perform any further precondition checking on the copy. (Performing it on the original could leave you open to TOCTTOU attacks.)
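On Java 10 and later, the three steps above can also be sketched with the JDK alone. This is only a sketch under assumptions: it relies on List.copyOf being available, and the class name WordHolder and its accessor are made up for illustration. Note that List.copyOf also rejects null elements, which the Guava version does as well.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical class, named only for this illustration.
final class WordHolder {
    private final List<String> words;

    WordHolder(Collection<String> words) {
        // Step 1: reject null input early.
        Objects.requireNonNull(words, "words cannot be null");
        // Step 2: defensive copy. List.copyOf (Java 10+) returns an unmodifiable
        // list and throws NullPointerException if any element is null.
        this.words = List.copyOf(words);
        // Step 3: check further preconditions on the copy, not on the caller's collection.
        if (this.words.isEmpty()) {
            throw new IllegalArgumentException("words can't be empty");
        }
    }

    List<String> words() {
        return words;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("foo"));
        WordHolder holder = new WordHolder(source);
        source.add("bar"); // later changes by the caller are not visible
        System.out.println(holder.words()); // prints [foo]
    }
}
```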
Collections.unmodifiableCollection(words);
only creates a wrapper through which you can't modify words; it doesn't mean that words can't be modified elsewhere. For example:
List<String> words = new ArrayList<>();
words.add("foo");
Collection<String> fixed = Collections.unmodifiableCollection(words);
System.out.println(fixed);
words.add("bar");
System.out.println(fixed);
result:
[foo]
[foo, bar]
If you want to preserve the current state of words in an unmodifiable collection, you need to copy the elements of the passed collection yourself and then wrap the copy with Collections.unmodifiableCollection(wordsCopy);
for example, if you only want to preserve the order of the words:
this.words = Collections.unmodifiableCollection(new ArrayList<>(words));
// separate list holding current words ---------^^^^^^^^^^^^^^^^^^^^^^
No, that doesn't protect it fully.
The idiom I like to use to make sure that the contents are immutable:
public Breaker(Collection<String> words) {
this.words = Collections.unmodifiableCollection(
Arrays.asList(
words.toArray(
new String[words.size()]
)
)
);
}
The downside, though, is that if you pass in a HashSet or TreeSet, you lose its fast lookup. You could copy into something other than a fixed-size List if you care about hash or tree characteristics.
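If the lookup characteristics matter, one option is to copy into a set instead of a list. A minimal sketch, assuming the field can be typed as Set<String> (the original field type isn't shown) and using a made-up class name, SetBreaker:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of the class above; the name is for illustration only.
final class SetBreaker {
    private final Set<String> words;

    SetBreaker(Collection<String> words) {
        // Copy into a private HashSet (keeps hash-based contains()),
        // then wrap it so it can't be modified through this reference.
        this.words = Collections.unmodifiableSet(new HashSet<>(words));
    }

    boolean contains(String word) {
        return words.contains(word);
    }

    Set<String> words() {
        return words;
    }
}
```

Swap in TreeSet or LinkedHashSet for HashSet if sorted or insertion order needs to be kept. Bear in mind that copying into any set drops duplicates.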
I want to collect items based on a filter. But the resulting list should not be initialized if no match was found. I'd prefer null instead of an empty list.
List<String> match = list
.stream()
.filter(item -> "match".equals(item.getProperty()))
.collect(Collectors.toList());
if (match != null && !match.isEmpty()) {
//handle seldom match
}
Problem: most of the time I will not have a match, resulting in an empty collection. Which means most of the time the list is instantiated even though I don't need it.
Collectors.toList() allocates a List using ArrayList::new, which is a very cheap operation since ArrayList doesn't actually allocate the backing array until elements are inserted. All the constructor does is initialize an internal Object[] field to the value of a statically created empty array. The actual backing array is initialized to its "initial size" only when the first element is inserted.
So why go through the pain of avoiding this construction? It sounds like a premature optimization.
If you're so worried about GC pressure, just don't use Streams. The stream and the Collector itself are probably quite a lot more "expensive" to create than the list.
I am only thinking of a case when something other than Collectors.toList() would be expensive to compute, otherwise use:
... collect(Collectors.collectingAndThen(
        Collectors.toList(),
        list -> list.isEmpty() ? null : list));
But just keep in mind that someone using that List would most probably expect an empty one in case of missing elements, instead of a null.
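For reference, here is how that collector could be packaged and used. This is only a sketch: the helper name toListOrNull and the sample data are made up, and the list is still allocated internally before being replaced by null, so it changes what callers see rather than what gets allocated.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NullIfEmpty {
    // Hypothetical helper: collects to a List, then maps an empty result to null.
    static <T> Collector<T, ?, List<T>> toListOrNull() {
        return Collectors.collectingAndThen(
                Collectors.toList(),
                list -> list.isEmpty() ? null : list);
    }

    public static void main(String[] args) {
        List<String> none = Stream.of("a", "b")
                .filter("match"::equals)
                .collect(toListOrNull());
        List<String> some = Stream.of("a", "match")
                .filter("match"::equals)
                .collect(toListOrNull());
        System.out.println(none); // prints null
        System.out.println(some); // prints [match]
    }
}
```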
Creating an empty ArrayList is quite cheap and laziness here would only make things worse.
Otherwise, here is a variant that could defer to null if you really really wanted to:
private static <T> List<T> list(Stream<T> stream) {
Spliterator<T> sp = stream.spliterator();
if (sp.getExactSizeIfKnown() == 0) {
System.out.println("Exact zero known");
return null;
}
T[] first = (T[]) new Object[1];
boolean b = sp.tryAdvance(x -> first[0] = x);
if (b) {
List<T> list = new ArrayList<>();
list.add(first[0]);
sp.forEachRemaining(list::add);
return list;
}
return null;
}
List<String> doSomething(String input){
if(input == null){
return Collections.emptyList();
}
List<String> lst = getListfromSomewhereElse(input);
if(lst.isEmpty()){
return Collections.emptyList(); //Approach 1
// return lst; //Return same empty list
}
// do some more processing on lst
return lst;
}
I prefer approach 1 because it's more readable and explicit. Which is better, approach 1 or 2?
The question is: if the list is empty, should I return the same list, or explicitly create a new empty list and return that?
Collections.emptyList() returns a constant member of Collections, so it costs no extra time (it can be optimized by the JIT) or memory.
On the other hand, returning the result of getListfromSomewhereElse may keep alive an empty list created by other code. It could be any List implementation, and it could hold on to some memory. Generally that's not a problem, as that method is presumably also developed, reviewed and tested by your own team, but who knows what happens in external libraries?
For example, getListfromSomewhereElse could read a really large file into memory and then remove all elements from the list. The empty list would then still hold capacity for thousands of elements unless you (or they) know its structure and trim the excess capacity. Approach 1 simply sidesteps this by using an already existing constant list.
As a side note, if you process list elements in Java 8 stream style, you naturally get a new list from the .collect(Collectors.toList()) step. The JDK developers don't force emptyList in this case, though.
So, unless you are sure about getListfromSomewhereElse, you had better return Collections.emptyList() (or new ArrayList() or whatever list type your method contract promises).
I would prefer
List<String> doSomething(String input) {
List<String> list = new ArrayList<String>();
if (input != null) {
List<String> listFromSomewhereElse = getListfromSomewhereElse(input);
list.addAll(listFromSomewhereElse);
}
return list;
}
Keep in mind that Collections.emptyList() is unmodifiable. Depending on the result of getListFromSomewhereElse a client of doSomething might be confused that it can sometimes modify the list it gets and under some other situation it throws an UnsupportedOperationException. E.g.
List<String> list = someClass.doSomething(null);
list.add("A");
will throw UnsupportedOperationException
while
List<String> list = someClass.doSomething("B");
list.add("A");
might work depending on the result of getListFromSomewhereElse
It's very seldom necessary to do (pseudocode):
if(input list is empty) {
return an empty list
} else {
map each entry in input list to output list
}
... because every mainstream way of mapping an input list to an output list produces an empty list "automatically" for an empty input. For example:
List<String> input = Collections.emptyList();
List<String> output = new ArrayList<>();
for(String entry : input) {
output.add(entry.toLowerCase());
}
return output;
... will return an empty list. To treat an empty list as a special case makes for wasted code, and less expressive code.
Likewise, the modern Java approach of using Streams does the same:
List<String> output = input.stream()
.map( s -> s.toLowerCase())
.collect(Collectors.toList());
... will create an empty List in output, with no "special" handling for an empty input.
Collections.emptyList() returns a class that specifically implements an immutable, empty list. It has a very simple implementation, for example its size() is just return 0;.
But this means your caller won't be able to modify the returned list when, and only when, it's empty. Although immutability is a good thing, it's inconsistent to sometimes return an immutable list and other times not, and this could result in errors you detect late. If you want to enforce immutability, do it always, by wrapping the response in Collections.unmodifiableList() or by using an immutable list from a library like Guava.
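One way to make the contract consistent is to wrap every return path. A minimal sketch, in which getListfromSomewhereElse is stubbed out with invented behavior purely so the example runs:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ConsistentResult {
    // Stub standing in for the real method; its behavior here is an assumption.
    static List<String> getListfromSomewhereElse(String input) {
        List<String> result = new ArrayList<>();
        if (!input.isEmpty()) {
            result.add(input);
        }
        return result;
    }

    // Every caller gets an unmodifiable list, whether or not it is empty.
    static List<String> doSomething(String input) {
        return Collections.unmodifiableList(getListfromSomewhereElse(input));
    }
}
```

With this shape, list.add("A") fails in the same way for empty and non-empty results, so a caller that wrongly mutates the result finds out on the first test run.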
You also test whether the input is null. Consider whether this is necessary. Who's going to be calling the method? If it's just you, then don't do that! If you know you're not going to do it, your code needn't check for it.
If it's a public API for other programmers, you might need to handle nulls gracefully, but in many cases it's entirely appropriate to document that the input mustn't be null, and just let it throw a NullPointerException if it happens (or you can force one early by starting your method with Objects.requireNonNull(input)).
Conclusion: my recommendation is:
List<String> doSomething(String input){
Objects.requireNonNull(input); // or omit this and let an
// exception happen further down
return doMoreProcessingOn(getListfromSomewhereElse(input));
}
It's best if doMoreProcessingOn() produces a new List, rather than modifying input.
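As an illustration of that last point, doMoreProcessingOn() could be written with a stream so that it always builds a fresh list and needs no special case for empty input. The processing itself (lower-casing each entry) is invented for this sketch; the original doesn't say what the processing is.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class Processing {
    // Hypothetical processing step: returns a new list and leaves the input untouched.
    static List<String> doMoreProcessingOn(List<String> input) {
        Objects.requireNonNull(input, "input must not be null");
        return input.stream()
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }
}
```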
I have a Collection of elements and a single element. I want my getter function to look a little like this:
public List<Element> getElements(boolean includeOtherElement){
if (includeOtherElement){
return elements + otherElement; //Obviously this doesn't work, but I'm looking for something that would work like this
}
return elements;
}
Is there a way I could achieve this behavior in a single line, such as with the Stream API or something similar?
EDIT:
I have since realized that I should not be modifying state in a getter method.
If you don't want to add otherElement to the elements list but still want to return a list, you will have to provide a new list.
Simple way:
public List<Element> getElements(boolean includeOtherElement){
if (includeOtherElement){
List<Element> extendedList = new ArrayList<>(elements);
extendedList.add(otherElement);
return extendedList;
}
return elements;
}
This would create a shallow copy of your list.
If you really want to, you could provide your own list implementation, which delegates the first indices to the elements list and the last index to the otherElement;
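A minimal sketch of such a delegating list, built on AbstractList. The names ListViews and withExtra are made up, and the view is read-only, which is an assumption of this sketch rather than something the question requires:

```java
import java.util.AbstractList;
import java.util.List;

final class ListViews {
    // Read-only view: indices 0..n-1 delegate to 'elements', index n is 'extra'.
    static <E> List<E> withExtra(List<E> elements, E extra) {
        return new AbstractList<E>() {
            @Override
            public E get(int index) {
                if (index == elements.size()) {
                    return extra;
                }
                return elements.get(index); // throws for out-of-range indices
            }

            @Override
            public int size() {
                return elements.size() + 1;
            }
        };
    }
}
```

Nothing is copied here: the view reflects later changes to elements, and add or remove calls on the view throw UnsupportedOperationException because AbstractList doesn't implement them by default.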
Usually you do not want your getter function to be changing variables. I would recommend copying the list into a temporary variable, e.g. temp = new ArrayList<>(elements), adding the extra element via temp.add(otherElement), and returning temp. (A plain temp = elements would only alias the original list, so the add would still modify it.) This way you will not change your current variable while still getting the same behavior.
I'm not sure if this answer will work exactly for you but I think it is a good example of something that is possible with the API. The reason that it may or may not be usable in a particular case is that it depends on the location of the extra element(s) being constant in the list.
Suppose it was the case that you didn't want to create a copy of the list, because you actually wanted whoever calls the getter to be able to modify the list. Or, suppose that you just didn't want to make a defensive copy of the list, perhaps because the list is very, very large. In these cases you don't want to add or remove the extra element from the list, you want to return a view of the list which either includes or excludes the extra element.
This is actually possible with existing API, using the List.subList(int, int) method. subList doesn't create a copy, like e.g. String.substring(...) does. Rather, the returned sublist is mutable (if the original list was), and changes to the sublist show through to the original list.
(Though I should note that the semantics of the sublist become undefined if the backing list is structurally modified, i.e. you shouldn't add/remove to and from the original list while keeping a reference to the sublist. If you change the original list through a reference other than the sublist, you have to create a new sublist.)
So what you could do is keep a complete list with all of the elements, then return a sublist of the range which excludes the element(s) you don't want to return.
This could also be used in conjunction with Collections.unmodifiableList(List) if the code which calls the getter is not supposed to be able to mutate the list.
Here's an example which shows this in action:
import java.util.*;
class Example {
public static void main (String[] args) {
Example e = new Example();
List<Object> sub = e.getList(false);
// sublist is [1, 2, 3]
System.out.printf("sublist is %s%n", sub);
sub.add(0, -1);
sub.add(1, 0);
sub.add(5);
// sublist is [-1, 0, 1, 2, 3, 5]
System.out.printf("sublist is %s%n", sub);
List<Object> all = e.getList(true);
// full list is [element0, -1, 0, 1, 2, 3, 5]
System.out.printf("full list is %s%n", all);
// As a side-note, this sequence of calls is
// e.g. what you aren't supposed to do:
// all.add(something); // <- structural modification
// sub.add(another); // <- this is undefined
// You have to make a new sublist first:
// sub = e.getList(false);
// sub.add(another); // <- this is fine now
}
List<Object> objects = new ArrayList<Object>();
Example() {
objects.add("element0");
objects.add(1);
objects.add(2);
objects.add(3);
}
List<Object> getList(boolean includeElement0) {
if (includeElement0) {
return objects;
} else {
return objects.subList(1, objects.size());
}
}
}
Here is the example on Ideone: http://ideone.com/PloZCh.
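The read-only combination with Collections.unmodifiableList mentioned earlier could look roughly like this. It's a sketch that reuses the shape of the example above; the class name ReadOnlyExample is made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReadOnlyExample {
    private final List<Object> objects = new ArrayList<>();

    ReadOnlyExample() {
        objects.add("element0");
        objects.add(1);
        objects.add(2);
        objects.add(3);
    }

    // Returns an unmodifiable view; no elements are copied.
    List<Object> getList(boolean includeElement0) {
        List<Object> view = includeElement0
                ? objects
                : objects.subList(1, objects.size());
        return Collections.unmodifiableList(view);
    }
}
```

The same caveat applies as before: a sublist view obtained earlier should not be used after the backing list has been structurally modified.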
Also, while not returning a List, this can be done by returning a Stream:
Stream<E> stream = list.stream();
if (includeOtherElement) {
return Stream.concat(stream, Stream.of(otherElement));
} else {
return stream;
}
I suppose this may be the canonical way to do such a thing in Java 8.
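If a List is still required, the concatenated stream can be collected in the same expression. A sketch with a made-up holder class, using String in place of the question's Element type:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ElementHolder {
    private final List<String> elements;
    private final String otherElement;

    ElementHolder(List<String> elements, String otherElement) {
        this.elements = elements;
        this.otherElement = otherElement;
    }

    // Builds a new list on each call when the extra element is requested;
    // the 'elements' field itself is never modified.
    List<String> getElements(boolean includeOtherElement) {
        return includeOtherElement
                ? Stream.concat(elements.stream(), Stream.of(otherElement))
                        .collect(Collectors.toList())
                : elements;
    }
}
```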
I have a concurrent List used in a multi-threaded environment. Once the List is built, the most common operation is traversing it. I am wondering which of the following two methods is more efficient, or what the cost of creating a new List is versus using synchronized. Or maybe there are better ways?
List<Object> list = new CopyOnWriteArrayList<Object>();
public int[] getAllValue1() {
List<Object> list2 = new ArrayList<Object>(list);
int[] data = new int[list2.size()];
int i = 0;
for (Object obj : list2) {
data[i++] = obj.getValue();
}
return data;
}
public int[] getAllValue2() {
synchronized (list) {
int[] data = new int[list.size()];
int i = 0;
for (Object obj : list) {
data[i++] = obj.getValue();
}
return data;
}
}
UPDATE
getAllValue1(): It is threadsafe, because it takes a snapshot of the CopyOnWriteArrayList, which is itself a threadsafe List. However, as sharakan points out, the cost is iterating over two lists and creating a local ArrayList, which could be costly if the original list is large.
getAllValue2(): It is also threadsafe because of the synchronized block. (Assume other functions synchronize properly.) The reason for the synchronized block is that I want to pre-allocate the array, so the .size() call must be consistent with the iteration. (The iteration itself is threadsafe, because it uses CopyOnWriteArrayList.) However, the cost here is the opportunity cost of the synchronized block: if 1 million clients call getAllValue2(), each one has to wait.
So I guess the answer really depends on how many concurrent users need to read the data. If there are not many concurrent users, method 2 is probably better. Otherwise, method 1 is better. Agree?
In my usage, I have a couple of concurrent clients, so method 2 is probably preferred. (BTW, my list has about 10k entries.)
getAllValue1 looks good given that you need to return an array of primitive types based on a field of your objects. It'll be two iterations, but consistent, and you won't cause any contention between reader threads. You haven't posted any profiling results, but unless your list is quite large I'd be more worried about contention in a multithreaded environment than the cost of two complete iterations.
You could remove one iteration if you change the API. Easiest way to do that is to return a Collection instead, as follows:
public Collection<Integer> getAllValue1() {
List<Integer> list2 = new ArrayList<Integer>(list.size());
for (Object obj : list) {
list2.add(obj.getValue());
}
return list2;
}
If you can change your API that way, that'd be an improvement.
I think the second one is more efficient. The reason is that in the first one you create another list locally. That means that if the original list contains a lot of data, all of it is going to be copied. If it contains millions of entries, that will be an issue.
However, there is also the list.toArray() method.
The Collections class also contains some useful stuff:
Collection synchronizedCollection = Collections.synchronizedCollection(list);
or
List synchronizedList = Collections.synchronizedList(list);
If you need the objects' values rather than the objects themselves, go with your second snippet. Otherwise, you can replace the appropriate parts of the second snippet with the functions above.
Edit (again):
Since you are using a copy on write array list (should've been more observant) I would get the iterator and use that to initialize your array. Since the iterator is a snapshot of the array at the time you ask for it you can synchronize on the list to get the size and then subsequently iterate without fear of ConcurrentModificationException or having the iterator change.
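public int[] getAllValue1() {
    final int[] data;
    final Iterator<Object> it;
    synchronized (list) {
        // Take the size and the snapshot iterator under the same lock
        // so they describe the same state of the list.
        data = new int[list.size()];
        it = list.iterator();
    }
    int idx = 0;
    while (it.hasNext()) {
        data[idx++] = it.next().getValue();
    }
    return data;
}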
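Since CopyOnWriteArrayList iterators and spliterators work on a snapshot, another option is to let a stream do both the sizing and the traversal over that single snapshot, which avoids the explicit lock. A sketch, assuming Java 8 or later and a made-up element type Item with an int getValue() (the original code doesn't show the element type):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ValueStore {
    // Hypothetical element type, introduced only for this example.
    static final class Item {
        private final int value;

        Item(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }
    }

    private final List<Item> list = new CopyOnWriteArrayList<>();

    void add(Item item) {
        list.add(item);
    }

    int[] getAllValues() {
        // stream() is backed by the list's snapshot spliterator, so the
        // array size and its contents come from the same snapshot.
        return list.stream().mapToInt(Item::getValue).toArray();
    }
}
```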
I am storing Integer objects representing an index of objects I want to track. Later in my code I want to check to see if a particular object's index corresponds to one of those Integers I stored earlier. I am doing this by creating an ArrayList and creating a new Integer from the index of a for loop:
ArrayList<Integer> courseselectItems = new ArrayList();
//Find the course elements that are within a courseselect element and add their indices to the ArrayList
for(int i=0; i<numberElementsInNodeList; i++) {
if (nodeList.item(i).getParentNode().getNodeName().equals("courseselect")) {
courseselectItems.add(new Integer(i));
}
}
I then want to check later if the ArrayList contains a particular index:
//Cycle through the namedNodeMap array to find each of the course codes
for(int i=0; i<numberElementsInNodeList; i++) {
if(!courseselectItems.contains(new Integer(i))) {
//Do Stuff
}
}
My question is, when I create a new Integer by using new Integer(i) will I be able to compare integers using ArrayList.contains()? That is to say, when I create a new object using new Integer(i), will that be the same as the previously created Integer object if the int value used to create them are the same?
I hope I didn't make this too unclear. Thanks for the help!
Yes, you can use List.contains() as that uses equals() and an Integer supports that when comparing to other Integers.
Also, because of auto-boxing you can simply write:
List<Integer> list = new ArrayList<Integer>();
...
if (list.contains(37)) { // auto-boxed to Integer
...
}
It's worth mentioning that:
List list = new ArrayList();
list.add(new Integer(37));
if (list.contains(new Long(37))) {
...
}
will always return false because an Integer is not a Long. This trips up most people at some point.
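A small runnable check of that behavior. It is a sketch that uses Integer.valueOf and Long.valueOf rather than the constructors, which are deprecated in recent Java versions; the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxedContains {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<>();
        list.add(Integer.valueOf(37));

        System.out.println(list.contains(Integer.valueOf(37))); // prints true
        System.out.println(list.contains(Long.valueOf(37L)));   // prints false
        System.out.println(list.contains(37));                  // prints true (boxed to Integer)
        System.out.println(list.contains(37L));                 // prints false (boxed to Long)
    }
}
```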
Lastly, try and make your variables that are Java Collections of the interface type not the concrete type so:
List<Integer> courseselectItems = new ArrayList();
not
ArrayList<Integer> courseselectItems = new ArrayList();
My question is, when I create a new Integer by using new Integer(i) will I be able to compare integers using ArrayList.contains()? That is to say, when I create a new object using new Integer(i), will that be the same as the previously created Integer object if the int value used to create them are the same?
The short answer is yes.
The long answer is ...
That is to say, when I create a new object using new Integer(i), will that be the same as the previously created Integer object if the int value used to create them are the same?
I assume you mean "... will that be the same instance as ..."? The answer to that is no - calling new will always create a distinct instance separate from the previous instance, even if the constructor parameters are identical.
However, despite having separate identity, these two objects will have equivalent value, i.e. calling .equals() between them will return true.
Collection.contains()
It turns out that having separate instances of equivalent value (.equals() returns true) is okay. The .contains() method is in the Collection interface. The Javadoc description for .contains() says:
http://java.sun.com/javase/6/docs/api/java/util/Collection.html#contains(java.lang.Object)
boolean contains(Object o)
Returns true if this collection contains the specified element. More formally, returns true if and only if this collection contains at least one element e such that (o==null ? e==null : o.equals(e)).
Thus, it will do what you want.
Data Structure
You should also consider whether you have the right data structure.
Is the list solely about containment? Is the order important? Do you care about duplicates? Since a list is ordered, using a list can imply that your code cares about ordering, or that you need to maintain duplicates in the data structure.
However, if order is not important, if you don't want or won't have duplicates, and if you really only use this data structure to test whether it contains a specific value, then you might want to consider whether you should be using a Set instead.
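A minimal sketch of the Set variant, with made-up index values standing in for the ones found in the node list:

```java
import java.util.HashSet;
import java.util.Set;

final class IndexSet {
    public static void main(String[] args) {
        // Hypothetical indices; in the question they come from the node list loop.
        Set<Integer> courseselectItems = new HashSet<>();
        courseselectItems.add(2);
        courseselectItems.add(5);
        courseselectItems.add(5); // duplicates are ignored

        System.out.println(courseselectItems.size());      // prints 2
        System.out.println(courseselectItems.contains(5)); // prints true
        System.out.println(courseselectItems.contains(3)); // prints false
    }
}
```

For a HashSet, contains() is a hash lookup, whereas ArrayList.contains() scans the list, so the difference grows with the number of stored indices.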
Short answer is yes, you should be able to do ArrayList.contains(new Integer(14)), for example, to see if 14 is in the list. The reason is that Integer overrides the equals method to compare itself correctly against other instances with the same value.
Yes it will, because List.contains() uses the equals() method of the object to be compared, and Integer.equals() does compare the integer value.
As cletus and DJ mentioned, your approach will work.
I don't know the context of your code, but if you don't care about the particular indices, consider the following style also:
List<Node> courseSelectNodes = new ArrayList<Node>();
//Find the course elements that are within a courseselect element
//and add them to the ArrayList
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i); // NodeList is not Iterable, so loop by index
if (node.getParentNode().getNodeName().equals("courseselect")) {
courseSelectNodes.add(node);
}
}
// Do stuff with courseSelectNodes
for(Node node : courseSelectNodes) {
//Do Stuff
}
I'm putting my answer in the form of a (passing) test, as an example of how you might research this yourself. Not to discourage you from using SO - it's great - just to try to promote characterization tests.
import java.util.ArrayList;
import junit.framework.TestCase;
public class ContainsTest extends TestCase {
public void testContains() throws Exception {
ArrayList<Integer> list = new ArrayList<Integer>();
assertFalse(list.contains(new Integer(17)));
list.add(new Integer(17));
assertTrue(list.contains(new Integer(17)));
}
}
Yes, automatic boxing occurs, but this results in a performance penalty. It's not clear from your example why you would want to solve the problem in this manner.
Also, because of boxing, creating the Integer object by hand is superfluous.