List<String> listStr = new ArrayList<String>();
if(listStr.size() == 0){
}
versus
if(listStr.isEmpty()){
}
In my view, one benefit of listStr.isEmpty() is that it doesn't get the size of the list and then compare it to zero; it just checks whether the list is empty. Are there any other advantages? I often see if(listStr.size() == 0) instead of if(listStr.isEmpty()) in codebases. Is there a reason it's checked this way that I am not aware of?
Answers to a similar question cover this. Basically, in some list implementations the isEmpty() method simply checks whether the size is zero (so from a performance point of view the two are practically equivalent). In other list types (for example, linked lists that do not cache their size), however, counting the items takes more time than checking whether the list is empty.
For this reason it is always convenient to use the method isEmpty() to check if a list is empty. The reasons for which such a method is provided in all types of lists are also related to the interface, since ArrayList, Vector and LinkedList implement the same List interface: this interface has the isEmpty() method; then, each specific type of list provides its implementation of isEmpty() method.
No, there's no reason. isEmpty() expresses the intent more clearly, and should be preferred. PMD even has a rule for that. It doesn't matter much, though.
.size() can be O(1) or O(N), depending on the data structure; .isEmpty() is never O(N).
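To make the O(1) versus O(N) point concrete, here is a minimal sketch of a list that does not cache its size. UncachedLinkedList is a made-up class for illustration only; java.util.LinkedList itself does keep a size field.

```java
/** Hypothetical singly linked list that does NOT cache its size. */
final class UncachedLinkedList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value) {
            this.value = value;
        }
    }

    private Node<T> head;
    private Node<T> tail;

    /** Appends a value at the end of the list. */
    void add(T value) {
        Node<T> node = new Node<>(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    /** O(1): only looks at the head reference. */
    boolean isEmpty() {
        return head == null;
    }

    /** O(n): walks every node because no counter is maintained. */
    int size() {
        int count = 0;
        for (Node<T> n = head; n != null; n = n.next) {
            count++;
        }
        return count;
    }
}
```

With a structure like this, size() == 0 has to walk the whole list just to learn that it is not empty, while isEmpty() answers after a single reference check.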
isEmpty() may express the intent better, as JB Nizet suggests, but if you are an old-school programmer, your style may lean towards expressions like .size() > 0. So if the answer can rest on an opinion about one's intent, it can equally be: do what your muscle memory tells you.
In many cases size() and isEmpty() are virtually the same, as others have said. And when writing software you don't want to optimize prematurely. I think the curiosity behind the question is justified, but to be effective and efficient, code the way that's natural for you if optimal performance isn't critical. Dwelling on these subtle behaviors can waste development time.
Related
If I have a Name object and have an ArrayList of type Name (names), and I want to ascertain whether my list of names contains a given Name object (n), I could do it two ways:
boolean exists = names.contains(n);
or
boolean exists = names.stream().anyMatch(x -> x.equals(n));
I was considering if these two would behave the same and then thought about what happens if n was assigned null?
For contains, as I understand it, if the argument is null, then it returns true if the list contains null. How would I achieve this with anyMatch? Would it be by using Objects.equals(x, n)?
If that is how it works, then which approach is more efficient - is it anyMatch as it can take advantage of laziness and parallelism?
The problem with the stream-based version is that if the collection (and thus its stream) contains null elements, then the predicate will throw a NullPointerException when it tries to call equals on this null object.
This could be avoided with
boolean exists = names.stream().anyMatch(x -> Objects.equals(x, n));
But there is no practical advantage to be expected for the stream-based solution in this case. Parallelism might bring an advantage for really large lists, but one should not casually throw in some parallel() here and there assuming that it may make things faster. First, you should clearly identify the actual bottlenecks.
And in terms of readability, I'd prefer the first, classical solution here. If you want to check whether the list of names.contains(aParticularValue), you should do this - it just reads like prose and makes the intent clear.
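The null handling described above can be checked with a small sketch. The class and method names here (NullSafeMatch, existsViaStream, existsViaUnsafeStream) are made up for illustration; only contains, anyMatch and Objects.equals come from the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class NullSafeMatch {
    /** Same result as names.contains(n), including the case n == null. */
    static <T> boolean existsViaStream(List<T> names, T n) {
        return names.stream().anyMatch(x -> Objects.equals(x, n));
    }

    /** Throws NullPointerException if a null element is reached before a match. */
    static <T> boolean existsViaUnsafeStream(List<T> names, T n) {
        return names.stream().anyMatch(x -> x.equals(n));
    }

    public static void main(String[] args) {
        // Arrays.asList permits null elements.
        List<String> names = Arrays.asList("Ann", null, "Bob");

        System.out.println(names.contains(null));
        System.out.println(existsViaStream(names, null));
        System.out.println(existsViaStream(names, "Bob"));

        try {
            existsViaUnsafeStream(names, "Bob");
        } catch (NullPointerException e) {
            System.out.println("NPE: equals was called on the null element");
        }
    }
}
```

Note that the unsafe variant fails here even though the searched value is not null; it is the null element in the list that triggers the exception.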
EDIT
Another advantage of the contains approach was mentioned in the comments and in the other answer, and that may be worth mentioning here: If the type of the names collection is later changed, for example, to be a HashSet, then you'll get the faster contains-check (with O(1) instead of O(n)) for free - without changing any other part of the code. The stream-based solution would then still have to iterate over all elements, and this could have a significantly lower performance.
They should provide the same result if hashCode() and equals() are written in a reasonable way.
But the performance may be completely different. For a List it wouldn't matter much, but for a HashSet, contains() will use hashCode() to locate the element, which will (most probably) be done in constant time. The second solution, by contrast, loops over all items and calls a function for each, so it runs in linear time.
If n is null, it usually doesn't matter for non-null elements, since well-behaved equals() methods return false for a null argument. The stream version can still throw a NullPointerException if the list itself contains a null element.
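One rough way to see the difference is to count equals() calls. This is only a sketch: the Name class below is a made-up stand-in with an instrumented equals(), and the exact counts depend on the JDK implementation and on how the hash codes are distributed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

/** Hypothetical Name type that counts how often equals() runs. */
final class Name {
    static int equalsCalls = 0;
    private final String value;

    Name(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        equalsCalls++;
        return other instanceof Name && ((Name) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

final class ContainsCost {
    /** Returns the number of Name.equals() calls made by one contains() lookup. */
    static int equalsCallsFor(Collection<Name> names, Name probe) {
        Name.equalsCalls = 0;
        names.contains(probe);
        return Name.equalsCalls;
    }

    public static void main(String[] args) {
        List<Name> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(new Name("name" + i));
        }
        Name probe = new Name("name999"); // equal to the last element in the list

        System.out.println("ArrayList equals() calls: " + equalsCallsFor(list, probe));
        System.out.println("HashSet equals() calls:   " + equalsCallsFor(new HashSet<>(list), probe));
    }
}
```

The list has to compare against elements one by one until it finds the match, while the set only compares against the few entries that share the probe's hash bucket.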
I've got an ArrayList that can be anywhere from 0 to 5000 items long (pretty big objects, too).
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
Is creating a HashMap alongside this ArrayList, to achieve constant-time lookup, a valid strategy here, in order to reduce the complexity to O(n)? Or is the overhead of another data structure simply not worth it? I believe it would take up no additional space (besides for the references).
(I know, I'm sure 'it depends on what I'm doing', but I'm seriously wondering if there's any drawback that makes it pointless, or if it's actually a common strategy to use. And yes, I'm aware of the quote about prematurely optimizing. I'm just curious from a theoretical standpoint).
First of all, a short side note:
And yes, I'm aware of the quote about prematurely optimizing.
What you are asking about here is not "premature optimization"!
You are not talking about replacing a multiplication with some odd bitwise operations "because they are faster (on a 90's PC, in a C-program)". You are thinking about the right data structure for your application pattern. You are considering the application cases (though you did not tell us many details about them). And you are considering the implications that the choice of a certain data structure will have on the asymptotic running time of your algorithms. This is planning, or maybe engineering, but not "premature optimization".
That being said, and to tell you what you already know: It depends.
To elaborate on this a bit: It depends on the actual operations (methods) that you perform on these collections, how frequently you perform them, how time-critical they are, and how memory-sensitive the application is.
(For 5000 elements, the latter should not be a problem, as only references are stored - see the discussion in the comments)
In general, I'd also be hesitant to really store the Set alongside the List, if they are always supposed to contain the same elements. This wording is intentional: You should always be aware of the differences between both collections. Primarily: A Set can contain each element only once, whereas a List may contain the same element multiple times.
For all hints, recommendations and considerations, this should be kept in mind.
But even if it is taken for granted that the lists will always contain each element only once in your case, you still have to make sure that both collections are maintained properly. If you really just stored them side by side, you could easily cause subtle bugs:
private Set<T> set = new HashSet<T>();
private List<T> list = new ArrayList<T>();

// Fine
void add(T element)
{
    set.add(element);
    list.add(element);
}

// Fine
void remove(T element)
{
    set.remove(element);
    list.remove(element); // May be expensive, but ... well
}

// Added later, 100 lines below the other methods:
void removeAll(Collection<T> elements)
{
    set.removeAll(elements);
    // Ooops - something's missing here...
}
To avoid this, one could even consider creating a dedicated collection class - something like a FastContainsList that combines a Set and a List, and forwards the contains call to the Set. But you'll quickly notice that it will be hard (or maybe impossible) to not violate the contracts of the Collection and List interfaces with such a collection, unless the clause that "You may not add elements twice" becomes part of the contract...
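A rough sketch of what such a class might look like, under the assumption that it deliberately does not implement List (which sidesteps the contract problem) and that duplicates are rejected. Everything about this FastContainsList is hypothetical; only the name comes from the paragraph above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical wrapper; deliberately does NOT implement java.util.List. */
final class FastContainsList<T> {
    private final Set<T> set = new HashSet<>();
    private final List<T> list = new ArrayList<>();

    /** Rejects duplicates so the set and the list always hold the same elements. */
    boolean add(T element) {
        if (!set.add(element)) {
            return false;
        }
        list.add(element);
        return true;
    }

    /** Removes from both collections; the list removal is a linear scan. */
    boolean remove(T element) {
        if (!set.remove(element)) {
            return false;
        }
        list.remove(element);
        return true;
    }

    /** Goes through remove() so neither collection can be forgotten. */
    boolean removeAll(Collection<? extends T> elements) {
        boolean changed = false;
        for (T element : elements) {
            changed |= remove(element);
        }
        return changed;
    }

    /** Forwarded to the set: expected constant time. */
    boolean contains(Object element) {
        return set.contains(element);
    }

    T get(int index) {
        return list.get(index);
    }

    int size() {
        return list.size();
    }

    /** Read-only view, so callers cannot bypass the bookkeeping. */
    List<T> asList() {
        return Collections.unmodifiableList(list);
    }
}
```

Routing every mutation through one place is the main design choice here: a later removeAll cannot drift out of sync the way it does in the example above.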
So again, all this depends on what you want to do with these methods, and which interface you really need. If you don't need the indexed access of List, then it's easy. Otherwise, referring to your example:
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
You can avoid this by creating the sets locally:
static <T> List<T> computeIntersection(List<T> list0, List<T> list1)
{
    Set<T> set0 = new LinkedHashSet<T>(list0);
    Set<T> set1 = new LinkedHashSet<T>(list1);
    set0.retainAll(set1);
    return new ArrayList<T>(set0);
}
This will have a running time of O(n). Of course, if you do this frequently, but rarely change the contents of the lists, there may be options to avoid the copies, but for the reason mentioned above, maintaining the required data structures may become tricky.
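If the lists may contain duplicates that should survive, a variation builds only one set and filters the other list. This is just a sketch under that assumption; the Intersections class and the method name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Intersections {
    /**
     * Returns the elements of list0 that also occur in list1, in list0's order.
     * Unlike the set-based computeIntersection, duplicates in list0 are kept.
     */
    static <T> List<T> intersectKeepingDuplicates(List<T> list0, List<T> list1) {
        Set<T> lookup = new HashSet<>(list1);   // one pass over list1 to build
        List<T> result = new ArrayList<>();
        for (T element : list0) {               // one pass over list0
            if (lookup.contains(element)) {     // expected constant time per lookup
                result.add(element);
            }
        }
        return result;
    }
}
```

The expected running time is still linear in the combined size of both lists, assuming reasonable hashCode() implementations.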
I have two ways of checking if a List is empty or not
if (CollectionUtils.isNotEmpty(listName))
and
if (listName != null && listName.size() != 0)
My architect tells me that the former is better than the latter, but I think the latter is better.
Can anyone please clarify it?
You should absolutely use isEmpty(). Computing the size() of an arbitrary list could be expensive. Even validating whether it has any elements can be expensive, of course, but there's no optimization for size() which can't also make isEmpty() faster, whereas the reverse is not the case.
For example, suppose you had a linked list structure which didn't cache the size (whereas LinkedList<E> does). Then size() would become an O(N) operation, whereas isEmpty() would still be O(1).
Additionally of course, using isEmpty() states what you're actually interested in more clearly.
CollectionUtils.isNotEmpty checks that your collection is not null and not empty. This is better than writing the double check by hand, but only if you already have this Apache library in your project. If you don't, then use:
if(list != null && !list.isEmpty())
Unless you are already using CollectionUtils, I would go for List.isEmpty(): fewer dependencies.
Performance-wise, CollectionUtils will be a tad slower, because it follows the same logic but has additional overhead.
So it comes down to readability vs. performance vs. dependencies. Not much of a difference, though.
if (CollectionUtils.isNotEmpty(listName))
Is the same as:
if(listName != null && !listName.isEmpty())
In the first approach listName can be null and no NullPointerException will be thrown. In the second approach you have to check for null manually. The first approach is better because it requires less work from you. Using .size() != 0 is unnecessary; I have also learned that it can be slower than .isEmpty().
If you have the Apache Commons utilities in your project, use the first one, because it's shorter and does exactly the same as the latter. The only difference between the two is how they look in the source code.
Also, an empty check using
listName.size() != 0
is discouraged because all collection implementations have the
listName.isEmpty()
method, which does exactly the same.
So all in all, if you have the Apache common utils in your classpath anyway, use
if (CollectionUtils.isNotEmpty(listName))
in any other case use
if(listName != null && !listName.isEmpty())
You will not notice any performance difference. Both lines do exactly the same.
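For projects without Apache Commons, the null-safe check can be wrapped in a small helper. A minimal sketch: the class name NullSafe is made up here, and the method only mirrors the behaviour described above (false for null or empty, true otherwise).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class NullSafe {
    private NullSafe() {
        // static helper, not meant to be instantiated
    }

    /** Returns true only if the collection is non-null and has at least one element. */
    static boolean isNotEmpty(Collection<?> collection) {
        return collection != null && !collection.isEmpty();
    }

    public static void main(String[] args) {
        List<String> listName = new ArrayList<>();
        System.out.println(isNotEmpty(null));      // null is treated as "empty"
        System.out.println(isNotEmpty(listName));  // empty list
        listName.add("x");
        System.out.println(isNotEmpty(listName));  // one element
    }
}
```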
Apache Commons' CollectionUtils.isNotEmpty(Collection) is a NULL-SAFE check
Returns TRUE if the Collection/List is not empty and not null
Returns FALSE if the Collection is null or empty
Example:
List<String> properties = new ArrayList<>();
...
if (CollectionUtils.isNotEmpty(properties)) {
    // process the list
} else {
    // list is null or empty
}
Refer:
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/CollectionUtils.html#isNotEmpty(java.util.Collection)
isEmpty()
Returns true if this list contains no elements.
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/List.html
A good example of where this matters in practice is the ConcurrentSkipListSet implementation in the JDK, which states:
Beware that, unlike in most collections, the size method is not a constant-time operation.
This is a clear case where isEmpty() is much more efficient than checking whether size()==0.
You can see why, intuitively, this might be the case in some collections. If it's the sort of structure where you have to traverse the whole thing to count the elements, then if all you want to know is whether it's empty, you can stop as soon as you've found the first one.
Use CollectionUtils.isEmpty(Collection coll)
Null-safe check if the specified collection is empty.
Null returns true.
Parameters:
coll - the collection to check, may be null
Returns:
true if empty or null
The isEmpty() method of org.apache.commons.collections4.CollectionUtils checks whether a collection (List, Set, etc.) is empty. It checks for null as well as for the size of the collection. CollectionUtils.isEmpty() is a static method that accepts a Collection as a parameter.
I would use the first one. It is clear right away what it does. I don't think the null check is necessary here.
table.column = ANY(ARRAY[ :canEmptyArrayParameter ]::BIGINT[])
This helps me handle an array parameter that may be empty;
the parameter can be Collections.emptyList().
To check whether a collection is empty, you can use the .count() method. Example:
DBCollection collection = mMongoOperation.getCollection("sequence");
if(collection.count() == 0) {
    SequenceId sequenceId = new SequenceId("id", 0);
    mMongoOperation.save(sequenceId);
}
How expensive is calling size() on a List or Map in Java? Or is it better to save the value of size() in a variable if it is accessed frequently?
The answer is that it depends on the actual implementation class. For some Map and Collection classes, size() is a cheap constant-time operation. For others, it may entail counting the members.
The Java Collections Cheatsheet (V2) is normally a good source for this kind of information, but the host server is currently a bit sick.
The "coderfriendly.com" domain is no more, but I tracked down a copy of the cheat-sheet on scribd.com.
The cost of size() will also be obvious from looking at the source code. (And this is an "implementation detail" that is pretty much guaranteed to not change ... for the standard collection classes.)
FOLLOWUP
Unfortunately, the cheatsheet only documents the complexity of size for queue implementations. I think that's because it is O(1) for all other collections; see @seanizer's answer.
List and Map are interfaces, so it's impossible to say. For the implementations in the Java Standard API, the size is generally kept in a field and thus not performance-relevant.
For most Collections, calling size() is a constant-time operation. There are however some exceptions. One is ConcurrentLinkedQueue. From the Javadoc of the size() method:
Beware that, unlike in most collections, this method is NOT a constant-time operation. Because of the asynchronous nature of these queues, determining the current number of elements requires an O(n) traversal.
So I'm afraid there's no generic answer, you have to check the documentation of the individual collection you are using.
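For a collection like this, reading size() once into a local variable does save work. A small sketch under that assumption; SizeCaching and pollSnapshot are made-up names, and in concurrent code the cached value is only a snapshot that may already be stale.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class SizeCaching {
    /**
     * Polls roughly the elements that were present when the method was called
     * and returns their sum. size() is read once instead of on every iteration.
     */
    static int pollSnapshot(Queue<Integer> queue) {
        int n = queue.size(); // for ConcurrentLinkedQueue this is one O(n) traversal
        int sum = 0;
        for (int i = 0; i < n; i++) {
            Integer value = queue.poll();
            if (value == null) {
                break; // another thread may have drained the queue meanwhile
            }
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        queue.add(1);
        queue.add(2);
        queue.add(3);
        System.out.println(pollSnapshot(queue));
    }
}
```

Writing the loop condition as i < queue.size() instead would repeat the traversal on every iteration. If the goal is only to drain the queue, polling until null avoids size() altogether.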
For ArrayList the implementation looks like this:
public int size() {
    return lastIndex - firstIndex;
}
So there is no overhead.
You can check the source code for detailed info for your required Impl.
Note: The source given is from openjdk
Implement it, then test it. If it is slow, take a closer look.
"Premature optimisation is the root of all evil." - D. Knuth
Also: You should not require certain implementation features, especially if they are black-boxed. What happens if you replace that list with a concurrent list at a later date? What happens if Oracle decides to rewrite List? Will it still be fast? You just don't know.
You don't have to worry much about that. The list implementations keep track of size. The cost of the call is just O(1). If you are very curious, you can read the source code for the implementations of Collection's concrete classes and see the size() method there.
Implementation gets it from a private pre-computed variable so it's not expensive.
No need to store it. It's not at all expensive. Check the source of ArrayList and HashMap.
I think some implementations of LinkedList count the total for each call. The call to a method itself can be a little taxing, but only if we're talking about large iterations or driver coding for hardware would that really be an issue.
In either case, if you save it to a local variable, there won't be any problems.
If all that you're doing is a simple one-pass iteration (i.e. only hasNext() and next(), no remove()), are you guaranteed linear time performance and/or amortized constant cost per operation?
Is this specified in the Iterator contract anywhere?
Are there data structures/Java Collection which cannot be iterated in linear time?
java.util.Scanner implements Iterator<String>. A Scanner is hardly a data structure (e.g. remove() makes absolutely no sense). Is this considered a design blunder?
Is something like PrimeGenerator implements Iterator<Integer> considered bad design, or is this exactly what Iterator is for? (hasNext() always returns true, next() computes the next number on demand, remove() makes no sense).
Similarly, would it have made sense for java.util.Random implements Iterator<Double>?
Should a type really implement Iterator if it's effectively only using one-third of its API? (i.e. no remove(), always hasNext())
There is no such guarantee. As you point out, anyone can model anything as Iterator. Individual producers of iterators would have to specify their individual performance.
Nothing in the Iterator documentation mentions any kind of performance guarantee, so there is no guarantee.
It also wouldn't make sense to require this constraint on such a universal tool.
A much more useful constraint would be to document, on each iterator() method, the time constraints that the returned Iterator instance fulfills (for example, an Iterator over a general-purpose Collection will most likely be able to guarantee linear-time operation).
Similarly, nothing in the documentation requires hasNext() to ever return false, so an endless Iterator would be perfectly valid.
However, there is a general assumption that all Iterator instances behave like "normal" Iterator instances as returned by Collection.iterator() in that they return some number of values and end at some point. This is not required by the documentation and, strictly speaking, any code depending on that fact would be subtly broken.
All of your proposals sound reasonable for Iterator. The API docs explicitly say remove need not be supported, and suggest that one not use the older Enumeration, which works just like Iterator except without remove.
Also, infinite-length streams are a very useful concept in functional programming, and can be implemented with an Iterator that always hasNext. It's a feature that it can handle either case.
It sounds like you're thinking of iterators in the sense of a list or set traversal. I think a more useful mental model is a discrete object stream: anything that you want to handle one at a time and that can be streamed from a source as discrete instances.
In that sense a stream of prime numbers or of list objects both makes sense, and the model doesn't imply anything about the finite-ness of the data source.
I can imagine a use case for this, and it seems intuitive enough. Personally I think it's fine.
// Note: a for-each loop requires PrimeGenerator to implement Iterable, not just Iterator
for(long prime : new PrimeGenerator()){
    //do stuff
    if(condition){
        break;
    }
}
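For a loop like this to compile, the generator has to be an Iterable that hands out an Iterator. A minimal sketch of what such a class could look like; this PrimeGenerator is hypothetical (simple trial division, Long values) and not taken from any library.

```java
import java.util.Iterator;

/** Hypothetical endless generator; usable in a for-each loop because it is Iterable. */
final class PrimeGenerator implements Iterable<Long> {
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private long candidate = 1;

            @Override
            public boolean hasNext() {
                return true; // the sequence never ends
            }

            @Override
            public Long next() {
                // Advance to the next prime on demand.
                do {
                    candidate++;
                } while (!isPrime(candidate));
                return candidate;
            }
            // remove() is not overridden; since Java 8 the default implementation
            // throws UnsupportedOperationException.
        };
    }

    /** Trial division; fine for a sketch, slow for large numbers. */
    private static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }
}
```

Because hasNext() never returns false, the caller is responsible for leaving the loop, as the break in the snippet above does.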