Java contains vs anyMatch behaviour

If I have a Name object and have an ArrayList of type Name (names), and I want to ascertain whether my list of names contains a given Name object (n), I could do it two ways:
boolean exists = names.contains(n);
or
boolean exists = names.stream().anyMatch(x -> x.equals(n));
I was wondering whether these two behave the same, and then thought about what happens if n is null.
For contains, as I understand it, a null argument makes it return true if the list contains null. How would I achieve the same with anyMatch - would it be by using Objects.equals(x, n)?
If that is how it works, which approach is more efficient - is it anyMatch, since it can take advantage of laziness and parallelism?

The problem with the stream-based version is that if the collection (and thus its stream) contains null elements, then the predicate will throw a NullPointerException when it tries to call equals on this null object.
This could be avoided with
boolean exists = names.stream().anyMatch(x -> Objects.equals(x, n));
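To make the difference concrete, here is a small sketch (the list contents are invented for illustration):
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

List<String> names = Arrays.asList("Alice", null, "Bob");
String n = "Bob";

names.contains(n);                                    // true - contains is null-safe
// names.stream().anyMatch(x -> x.equals(n));         // NullPointerException at the null element
names.stream().anyMatch(x -> Objects.equals(x, n));   // true - null-safe predicate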
But there is no practical advantage to be expected for the stream-based solution in this case. Parallelism might bring an advantage for really large lists, but one should not casually throw in some parallel() here and there assuming that it may make things faster. First, you should clearly identify the actual bottlenecks.
And in terms of readability, I'd prefer the first, classical solution here. If you want to check whether the list of names.contains(aParticularValue), you should do this - it just reads like prose and makes the intent clear.
EDIT
Another advantage of the contains approach was mentioned in the comments and in the other answer, and it may be worth repeating here: if the type of the names collection is later changed, for example to a HashSet, then you get the faster contains check (O(1) instead of O(n)) for free, without changing any other part of the code. The stream-based solution would still have to iterate over all elements, which could perform significantly worse.
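As a quick illustration (a sketch; Name and n are from the question, the rest is invented), only the construction changes while the check stays identical:
import java.util.Collection;
import java.util.HashSet;

Collection<Name> names = new HashSet<Name>(); // was: new ArrayList<Name>()
// ...
boolean exists = names.contains(n);           // unchanged call, now O(1) on average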

They should provide the same result if hashCode() and equals() are written in a reasonable way.
But the performance may be completely different. For Lists it wouldn't matter that much, but for a HashSet contains() will use hashCode() to locate the element, and (most probably) this is done in constant time, while the stream-based solution loops over all items and calls a function for each, so it runs in linear time.
If n is null it mostly doesn't matter, since well-written equals() methods return false for a null argument; the difference only shows up if the list itself contains null elements, as discussed above.

Related

fj.data.Set comparison

In java.util, we can use the containsAll method to compare two java.util.Set. What's the best way to compare two fj.data.Set?
Is there really any "valuable" benefit of using fj over java.util?
I have never used that library nor will I ever, but looking through the API I found the method
public final boolean subsetOf(Set<A> s)
Returns true if this set is a subset of the given set.
Parameters: s - A set which is a superset of this set if this method returns true.
Returns: true if this set is a subset of the given set.
I believe this should be used like a "reversed" containsAll:
a.containsAll(b) is true if and only if b.subsetOf(a) is true (not sure how equal sets are handled; my guess is that it's fine, since a set is a subset of itself).
Afterthought: I just noticed how fishy the wording in the javadoc is. The parameter description depends on the output: a superset of this set if this method returns true. You're not supposed to make assumptions about the parameter or describe it conditionally. A better wording would be along the lines of: a set to be checked for being a superset.
To compare two sets use Set.subsetOf. So s1.subsetOf(s2) if s1 is a subset of s2 (in contains terms, s2 contains s1).
To create sets use Set.set which takes an ordering of elements (that is, how to know whether an item is less than another so the set can be stored in a red-black tree).
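A minimal sketch of both calls, assuming FunctionalJava's built-in Ord.intOrd ordering (the values are invented for illustration):
import fj.Ord;
import fj.data.Set;

Set<Integer> s1 = Set.set(Ord.intOrd, 1, 2);
Set<Integer> s2 = Set.set(Ord.intOrd, 1, 2, 3);

boolean subset = s1.subsetOf(s2); // true - in containsAll terms, s2 "contains" s1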
FunctionalJava uses the functional programming (FP) approach - data is immutable. That makes the Collection interface inappropriate for implementing. Note all the mutation and updating of objects in the Collection interface in methods like add, addAll, remove, etc.
There should be better support for converting to and from java.util.Set, but to convert a fj.data.Set to a java.util class, call set.toStream().toCollection() or set.toList().toJavaList().

Is creating a HashMap alongside an ArrayList just for constant-time contains() a valid strategy?

I've got an ArrayList that can be anywhere from 0 to 5000 items long (pretty big objects, too).
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
Is creating a HashMap alongside this ArrayList, to achieve constant-time lookup, a valid strategy here, in order to reduce the complexity to O(n)? Or is the overhead of another data structure simply not worth it? I believe it would take up no additional space (besides the references).
(I know, I'm sure 'it depends on what I'm doing', but I'm seriously wondering if there's any drawback that makes it pointless, or if it's actually a common strategy to use. And yes, I'm aware of the quote about prematurely optimizing. I'm just curious from a theoretical standpoint).
First of all, a short side note:
And yes, I'm aware of the quote about prematurely optimizing.
What you are asking about here is not "premature optimization"!
You are not talking about replacing a multiplication with some odd bitwise operations "because they are faster (on a 90's PC, in a C-program)". You are thinking about the right data structure for your application pattern. You are considering the application cases (though you did not tell us many details about them). And you are considering the implications that the choice of a certain data structure will have on the asymptotic running time of your algorithms. This is planning, or maybe engineering, but not "premature optimization".
That being said, and to tell you what you already know: It depends.
To elaborate a bit: it depends on the actual operations (methods) that you perform on these collections, how frequently you perform them, how time-critical they are, and how memory-sensitive the application is.
(For 5000 elements, the latter should not be a problem, as only references are stored - see the discussion in the comments)
In general, I'd also be hesitant to really store the Set alongside the List, if they are always supposed to contain the same elements. This wording is intentional: You should always be aware of the differences between both collections. Primarily: A Set can contain each element only once, whereas a List may contain the same element multiple times.
For all hints, recommendations and considerations, this should be kept in mind.
But even if it is given for granted that the lists will always contain elements only once in your case, then you still have to make sure that both collections are maintained properly. If you really just stored them, you could easily cause subtle bugs:
private Set<T> set = new HashSet<T>();
private List<T> list = new ArrayList<T>();

// Fine
void add(T element)
{
    set.add(element);
    list.add(element);
}

// Fine
void remove(T element)
{
    set.remove(element);
    list.remove(element); // May be expensive, but ... well
}

// Added later, 100 lines below the other methods:
void removeAll(Collection<T> elements)
{
    set.removeAll(elements);
    // Ooops - something's missing here...
}
To avoid this, one could even consider creating a dedicated collection class - something like a FastContainsList that combines a Set and a List, and forwards the contains call to the Set. But you'll quickly notice that it will be hard (or maybe impossible) not to violate the contracts of the Collection and List interfaces with such a collection, unless the clause that "you may not add elements twice" becomes part of the contract...
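A minimal sketch of what such a container could look like (the name FastContainsList comes from the paragraph above; everything else is illustrative, and it deliberately avoids implementing java.util.List for the contract reasons just mentioned):
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class FastContainsList<T> implements Iterable<T> {
    private final Set<T> set = new HashSet<T>();
    private final List<T> list = new ArrayList<T>();

    // Rejects duplicates, so the set and the list can never diverge
    boolean add(T element) {
        if (!set.add(element)) {
            return false;
        }
        list.add(element);
        return true;
    }

    boolean contains(T element) {
        return set.contains(element); // O(1) instead of O(n)
    }

    T get(int index) {
        return list.get(index); // indexed access is preserved
    }

    @Override
    public Iterator<T> iterator() {
        return list.iterator();
    }
}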
So again, all this depends on what you want to do with these methods, and which interface you really need. If you don't need the indexed access of List, then it's easy. Otherwise, referring to your example:
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
You can avoid this by creating the sets locally:
static <T> List<T> computeIntersection(List<T> list0, List<T> list1)
{
    Set<T> set0 = new LinkedHashSet<T>(list0);
    Set<T> set1 = new LinkedHashSet<T>(list1);
    set0.retainAll(set1);
    return new ArrayList<T>(set0);
}
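For example (inputs invented for this sketch; LinkedHashSet preserves the order of list0):
import java.util.Arrays;

List<String> a = Arrays.asList("x", "y", "z");
List<String> b = Arrays.asList("y", "z", "w");
List<String> common = computeIntersection(a, b); // [y, z]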
This will have a running time of O(n). Of course, if you do this frequently but rarely change the contents of the lists, there may be options to avoid the copies, but for the reason mentioned above, maintaining the required data structures may become tricky.

Is Map.containsKey() useful in a Map that has no null values?

In the following piece of code:
if (map.containsKey(key)) {
    map.remove(key);
}
Looking at performance, is it useful to first do a Map.containsKey() check before trying to remove the value from the map?
Same question goes for retrieving values, is it useful to first do the contains check if you know that the map contains no null values?
if (map.containsKey(key)) {
    Object value = map.get(key);
}
remove returns null if there's no mapping for the key; no exception will be thrown:
public V remove(Object key)
I don't see any reason to perform that if before trying to remove a key, except perhaps if you want to count how many items were removed from the map.
In the second example, you'll get null if the key doesn't exist. Whether to check or not, depends on your logic.
Try not to waste your time thinking about performance here; containsKey on a HashMap has O(1) time complexity:
This implementation provides constant-time performance for the basic operations (get and put)
is it useful to first do a Map.containsKey() check before trying to remove the value from the map?
No, it is counterproductive:
- In the case when the item is not there, you would see no difference.
- In the case when the item is there, you would end up with two look-ups.
If you want to remove the item unconditionally, simply call map.remove(key).
Same question goes for retrieving values
Same logic applies here. Of course you need to check the result for null, so in this case if stays there.
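In code, the two single-lookup idioms look like this (a sketch; map and key are assumed from the question):
// Removal: no pre-check needed, remove() is a no-op for absent keys
map.remove(key);

// Retrieval: one lookup instead of containsKey() followed by get()
Object value = map.get(key);
if (value != null) {
    // use value - safe because the map holds no null values
}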
Note that this cleanup exercise is about readability first, and only then about performance. Accessing a map is a fast operation, so accessing it twice is unlikely to cause major performance issues except for some rather extreme cases. However, removing an extra conditional will make your code more readable, which is very important.
The Java documentation on remove() states that it removes the element only if the map contains such an element. So the contains() check before remove() is redundant.
This is subjective (and entirely a case of style), but for the case where you're retrieving a value, I prefer the contains(key) call to the null check. Boolean comparisons just feel better than null comparisons. I'd probably feel differently if Map<K,V>.get(key) returned Optional<V>.
Also, it's worth noting that the "contains no null values" assertion can be fairly hard to prove, depending on the type of the Map (which you might not even know). In general I think the redundant check on retrieval is (or maybe just feels) safer, just in case there's a mistake somewhere else (knocks on wood, checks for black cats, and avoids a ladder on the way out).
For the removal operation you're spot on. The check is useless.

if(listStr.size() == 0){ versus if(listStr.isEmpty()){

List<String> listStr = new ArrayList<String>();
if(listStr.size() == 0){
}
versus
if(listStr.isEmpty()){
}
In my view, one of the benefits of using listStr.isEmpty() is that it doesn't compute the size of the list and then compare it to zero; it just checks whether the list is empty. Are there any other advantages? I often see if(listStr.size() == 0) instead of if(listStr.isEmpty()) in codebases - is there a reason it's checked this way that I'm not aware of?
The answers to this question could give you the answer. Basically, in some list implementations the isEmpty() method just checks whether the size is zero (so from a performance point of view they are practically equivalent). In other collection types, however, counting the items can require more time than checking whether the collection is empty.
For this reason it is always preferable to use isEmpty() to check whether a list is empty. The method is available in all types of lists because ArrayList, Vector and LinkedList implement the same List interface, which declares isEmpty(); each specific type of list then provides its own implementation.
No, there's no reason. isEmpty() expresses the intent more clearly, and should be preferred. PMD even has a rule for that. It doesn't matter much, though.
.size() can be O(1) or O(N), depending on the data structure; .isEmpty() is never O(N).
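A concrete case of the O(N) branch is java.util.concurrent.ConcurrentLinkedQueue, whose javadoc explicitly warns that size() is not a constant-time operation (a small sketch):
import java.util.concurrent.ConcurrentLinkedQueue;

ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<String>();
// size() traverses the whole queue: O(n)
// isEmpty() only inspects the head: O(1)
if (queue.isEmpty()) {
    // preferred over queue.size() == 0
}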
While isEmpty() may express the intent better, as JB Nizet suggests, if you are an old-school programmer your style may lean towards expressions like .size() > 0. So if the answer can be an opinion on intent, the answer can also be: do what your muscle memory tells you.
In many cases size() and isEmpty() are virtually the same, as others have said. And if you are going to write software, you don't want to optimize prematurely. I think the curiosity in the question is justified, but to be effective and efficient, code the way that's natural for you when optimal performance isn't critical. Dwelling on these subtle behaviors can actually waste your development time.

What would be a good way to implement a set collection with weak references, compares by reference, and is also sortable in Java?

I want to have an object that allows other objects of a specific type to register themselves with it. Ideally it would store the references to them in some sort of set collection and have .equals() compare by reference rather than value. It shouldn't have to maintain a sort at all times, but it should be able to be sorted before the collection is iterated over.
Looking through the Java Collection Library, I've seen the various features I'm looking for on different collection types, but I am not sure about how I should go about using them to build the kind of collection I'm looking for.
This is Java in the context of Android if that is significant.
Java's built-in tree-based collections won't work.
To illustrate, consider a tree containing weak references to nodes 'B', 'C', and 'D':
  C
 / \
B   D
Now let the weak reference 'C' get collected, leaving null behind:
  -
 / \
B   D
Now insert an element into the tree. The TreeMap/TreeSet doesn't have sufficient information to select the left or right subtree. If your comparator says null is a small value, then it will be incorrect when inserting 'A'. If it says null is a large value, it will be incorrect when inserting 'E'.
Sort on demand is a good choice.
A more robust solution is to use an ArrayList<WeakReference<T>> and to implement a Comparator<WeakReference<T>> that delegates to a Comparator<T>. Then call Collections.sort() prior to iteration.
Android's Collections.sort uses TimSort behind-the-scenes and so it runs quite efficiently if the input is already partially sorted.
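A minimal sketch of the sort-on-demand step, as a variant that snapshots the live referents rather than comparing WeakReferences directly (method and variable names are invented; snapshotting sidesteps references being cleared mid-sort):
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

static <T> List<T> sortedLive(List<WeakReference<T>> refs, Comparator<T> comparator) {
    List<T> live = new ArrayList<T>();
    for (WeakReference<T> ref : refs) {
        T t = ref.get(); // strong reference keeps the object alive during the sort
        if (t != null) {
            live.add(t);
        }
    }
    Collections.sort(live, comparator); // sort on demand, just before iteration
    return live;
}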
Perhaps the collections classes are a level of abstraction below what you're looking for? It sounds like the end product you want is a cache with the ability to iterate in a user-defined sort order. If so, perhaps the cache interface in the Google Guava library is close enough to what you want:
http://code.google.com/p/guava-libraries/source/browse/trunk/guava/src/com/google/common/cache/Cache.java
At a glance, it looks like CacheBuilder in that package doesn't allow you to build an implementation with user-defined iteration order. However, it does provide a Map view that might be good enough for your needs:
List<Thing> cachedThings = Lists.newArrayList(cache.asMap().values());
Collections.sort(cachedThings, YOUR_THING_COMPARATOR);
for (Thing thing : cachedThings) { ... }
Even if this isn't exactly what you want, the classes in that package might give you some useful insights re: using References with Collections.
DISCLAIMER: This was a comment but it got kinda big, sorry if it doesn't solve your problem:
References in Java
Just to clarify what I mean when I say reference, since it isn't really a term commonly used in Java: Java does not really use references or pointers. It uses a kind of pseudo-reference that can be (and is by default) assigned to the special null instance. That's one way to explain it anyway. In Java, these pseudo-references are the only way that an Object can be handled. When I say reference, I mean these pseudo-references.
Sets
Any Set implementation will not allow two references to the same object to be included in it, since that would violate the mathematical concept of a set; Java sets simply ignore any attempt to add a duplicate. Note, though, that the standard implementations perform this check with equals() (or a Comparator), not with reference identity, so distinct-but-equal objects are treated as duplicates too.
You mention a Map in your comment though... Could you clarify what kind of collection you are after? And why you need that kind of equality checking within it? Are you thinking in C++ terms? I'll try to edit my answer to be more helpful then :)
EDIT: I thought that might have been your goal ;) So a TreeSet should do the trick then! I would not get concerned about performance until there is a performance issue. Simplicity is fantastic for readability, maintenance and preventing bugs. If performance does become a problem, ideally you should profile your code and only optimize the areas that are proven to be the problem.
