ArrayList - add "same" objects (same => equals, hashCode), Threads - java

I've got one question: what happens when I try to add the "same" object twice to an ArrayList? By "the same" I mean an object of a custom class that is identified as the same by equals() and hashCode(). It has different values for most of its member variables and may have been created from a different thread, but for equals() and hashCode() it's the "same".
Does the second object then replace the first object?
Also, what happens if two threads try to add those objects exactly at the same time to the ArrayList? Is this even possible? If yes, what happens?
Thank you! :-)
[EDIT] Thanks for all the answers! Should I use synchronizedList then, rather than "synchronized(list){}"? --> I read the docs: even with synchronizedList, synchronized(list) must be used when iterating.
[EDIT2]
Can a synchronizedList be declared as a member variable? I tried, but it didn't work.

No, ArrayList doesn't attempt to detect duplicates at all - you can have an ArrayList with the exact same reference appearing multiple times. If you want a collection to avoid duplicates, you need a Set implementation - and if you also want to preserve insertion order, you probably want LinkedHashSet.
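To make the difference concrete, here is a minimal sketch. The Item class is hypothetical (not from the question); it is assumed to compare equal on id only, while the payload differs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DuplicateDemo {
    // Adds two "equal" items to an ArrayList and returns the resulting size.
    static int listSizeAfterTwoEqualAdds() {
        List<Item> list = new ArrayList<>();
        list.add(new Item(1, "first"));
        list.add(new Item(1, "second")); // equal to the first one, still appended
        return list.size();
    }

    // Adds the same two items to a LinkedHashSet and returns the resulting size.
    static int setSizeAfterTwoEqualAdds() {
        Set<Item> set = new LinkedHashSet<>();
        set.add(new Item(1, "first"));
        set.add(new Item(1, "second")); // rejected: add() returns false, first item stays
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(listSizeAfterTwoEqualAdds()); // prints 2
        System.out.println(setSizeAfterTwoEqualAdds());  // prints 1
    }
}

// Hypothetical class: equality is based on id only, not on payload.
final class Item {
    final int id;
    final String payload;

    Item(int id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Item && ((Item) other).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}
```

Note that the set keeps the first object and drops the second; nothing is replaced in either collection.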
Note, however, that without locking ArrayList should not be mutated from multiple threads in the first place - it's simply not meant to be a thread-safe collection in that way. Several threads can read from an ArrayList without synchronization, but not mutate it. From the docs:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list
If you want to mutate a collection from multiple threads without locking, I suggest you look at the collections in java.util.concurrent.
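As one possible illustration, CopyOnWriteArrayList from java.util.concurrent accepts concurrent add() calls without external locking. This is only a sketch: the class and method names are made up for the example, and CopyOnWriteArrayList copies its backing array on every write, so it suits read-heavy lists rather than write-heavy ones.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ConcurrentAddDemo {
    // Two threads each add perThread elements; returns the final list size.
    static int addFromTwoThreads(int perThread) {
        List<Integer> list = new CopyOnWriteArrayList<>();
        Runnable writer = () -> {
            for (int i = 0; i < perThread; i++) {
                list.add(i); // thread-safe without a synchronized block
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(addFromTwoThreads(1000)); // prints 2000
    }
}
```

With a plain ArrayList in place of the CopyOnWriteArrayList, the same code could lose elements or throw, which is the undefined behaviour the docs warn about.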

Does the second object then replace the first object?
No. Most developers do an explicit check:
if (!list.contains(foo)) {
    list.add(foo);
}
Also, what happens if two threads try to add those objects exactly at the same time to the ArrayList? Is this even possible? If yes, what happens?
Yes, this is possible. If multiple threads write to or read from the same ArrayList, synchronize on the list whenever you access it:
public List<Foo> getFoos() {
    synchronized (list) {
        // Return a copy: handing out the list itself would let callers bypass the lock.
        return new ArrayList<>(list);
    }
}
public void addFoo(Foo foo) {
    synchronized (list) {
        list.add(foo);
    }
}
EDIT
As someone pointed out, I suppose checking whether or not the ArrayList contains the object to be added is quite expensive. If you want to ensure that the object is only added once, I'd follow the recommendation made below of using a LinkedHashSet. According to the API, when attempting to add to this data structure it
Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

It will simply be added. Insertion into a List has nothing to do with hashCode() or equals(); it doesn't check for duplicates.
ArrayList isn't thread-safe, so you might not get the desired result. You can get a synchronized list from Collections.synchronizedList().

An ArrayList can contain multiple references to the same exact object (identity equivalence). It doesn't check equals() or hashCode() when adding objects.
You will just end up with two references in your ArrayList.
ArrayList is NOT thread-safe, so the behaviour if two threads try to add at the same time is undefined. Maybe try wrapping it with Collections.synchronizedList() if you want to do something like that.
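Regarding the two edits in the question, a synchronized list can be an ordinary field, and iteration still needs a manual synchronized block on the wrapper. The following is a sketch only; FooRegistry and its methods are hypothetical names, and String stands in for the asker's own class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FooRegistry {
    // The synchronized wrapper is created once and stored in a final field.
    private final List<String> foos = Collections.synchronizedList(new ArrayList<>());

    // Single calls such as add() are synchronized by the wrapper itself.
    void addFoo(String foo) {
        foos.add(foo);
    }

    // Iteration spans many calls, so the caller must hold the wrapper's lock.
    int totalLength() {
        int total = 0;
        synchronized (foos) {
            for (String foo : foos) {
                total += foo.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        FooRegistry registry = new FooRegistry();
        registry.addFoo("ab");
        registry.addFoo("cde");
        System.out.println(registry.totalLength()); // prints 5
    }
}
```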

If you try to add the same object twice, it will work, and if you try to add two objects with everything the same, it will still work. It is not best practice to do that, because it's harder to maintain the list.
Overall: you shouldn't do it.

Related

Immutable list in .NET: Why are you allowed to add and remove items?

In C#, I just got the need of having an immutable list, meaning that the list can not be changed.
Much like in Java's immutable list: https://www.geeksforgeeks.org/immutable-list-in-java/
From there:
If any attempt is made to add null element in List,
UnsupportedOperationException is thrown.
Now, with .NET (at least with Core 2.2) there is also an immutable list, documented here.
They say (emphasis mine):
When you add or remove items from an immutable list, a copy of the
original list is made with the items added or removed, and the
original list is unchanged.
So this implementation basically allows changing the list (by handing back a modified copy each time), as opposed to the Java understanding, and what's more, the copies will mostly go unnoticed while clogging memory.
What's the point in having an immutable list that supports add and remove methods in the first place?
The problem for me here is that users of my code would get a list, presumably immutable, but out of negligence would happily add items, which will never make it to the original "repository". This will cause confusion.
I guess the (only) way to go here, to forbid manipulation entirely and make that clear to the code user, would be to use the IEnumerable interface?
What's the point in having an immutable list that supports add and remove methods in the first place?
None, other than conforming to the List contract: even an immutable implementation exposes every List method.
That leaves two ways to cope with these modification methods: throwing an exception, or guaranteeing immutability by creating and returning a new List on each modification.
About:
I guess the (only) way to go here, to forbid manipulation entirely, would be to use the IEnumerable interface?
Indeed, in Java you use Iterable (that is close enough) when you want to be able to manipulate a collection of things without a way to change it.
As an alternative you can also use an array, although its elements can still be overwritten.
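The two strategies mentioned in this answer can be sketched in Java roughly as follows. This is an illustration under assumptions, not how the .NET ImmutableList is implemented: withAdded is a hypothetical helper, and it copies the whole list where a real persistent collection would share structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ImmutableListDemo {
    // Strategy 1: the modification method exists but throws.
    static boolean addIsRejected() {
        List<String> readOnly =
                Collections.unmodifiableList(new ArrayList<>(Arrays.asList("a", "b")));
        try {
            readOnly.add("c");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Strategy 2: "adding" returns a new read-only list; the original is untouched.
    static List<String> withAdded(List<String> original, String item) {
        List<String> copy = new ArrayList<>(original);
        copy.add(item);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<String> original = Collections.unmodifiableList(Arrays.asList("a", "b"));
        List<String> extended = withAdded(original, "c");
        System.out.println(addIsRejected());   // prints true
        System.out.println(original.size());   // prints 2
        System.out.println(extended.size());   // prints 3
    }
}
```

With the second strategy the caller has to use the returned list; ignoring the return value silently discards the change, which is the confusion the question describes.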
As you said: "a copy of the original list is made with the items added or removed, and the original list is unchanged.".
So you can add/remove elements and a new list is made with the changes. The original list is unchanged.
What's the point in having an immutable list that supports add and remove methods in the first place?
First think of this: what is the point of an immutable list that doesn't support adding or removing items in any way? There is nothing particularly useful about that. You can use an array for that.
Now back to your question. The list is immutable, so consumers can't change the instance itself which was provided through some other method or class. The backing storage can't be altered by consumers! But the producer of the immutable list can 'alter' the backing store by creating a new immutable list and assigning that to the original variable. Isn't that useful!

Is a readonly EnumSet iterator thread safe?

I have an EnumSet which is final and immutable i.e. initialized once in the constructor.
Is the contains() method on this EnumSet thread-safe? It internally uses an iterator to make the contains check. Hence, if two threads call contains() simultaneously, can the iterator position in one call affect the other one? Or do the two calls use different iterator instances?
The contents of an EnumSet can be changed despite the reference to it being final. No EnumSet is immutable. You can, however, wrap your EnumSet via Collections.unmodifiableSet(). If you also avoid retaining any reference to the original EnumSet then the unmodifiable wrapper object is functionally immutable.
Mutability notwithstanding, two iterators operating over the same Set at the same time presents no problem as long as the Set is not modified. This isn't really any different from the case of just one iterator.
In any case, the contains() method of an EnumSet likely doesn't create or use an iterator. The class implements membership via a bit vector, so it uses bit operations to perform the contains() test.
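A minimal sketch of that setup, with a hypothetical Color enum and set contents: the EnumSet is wrapped once, no reference to the mutable original is kept, and two threads call contains() concurrently.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class ReadOnlyEnumSetDemo {
    enum Color { RED, GREEN, BLUE }

    // Wrapped at creation; the mutable EnumSet is not reachable anywhere else.
    private static final Set<Color> SELECTED =
            Collections.unmodifiableSet(EnumSet.of(Color.RED, Color.BLUE));

    // Two threads each call contains() perThread times; returns the number of hits.
    static int concurrentHits(int perThread) {
        AtomicInteger hits = new AtomicInteger();
        Runnable reader = () -> {
            for (int i = 0; i < perThread; i++) {
                if (SELECTED.contains(Color.RED)) {
                    hits.incrementAndGet();
                }
            }
        };
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(concurrentHits(1000)); // prints 2000
    }
}
```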
No, if two threads call contains() at the same time, that will call iterator() twice which will create two separate iterators.
If you were trying to share an iterator between two threads, that would not be a good idea.
Note that if you modify the set in one thread while iterating over it (e.g. via contains) in another, then this bit of the docs comes into play:
The returned iterator is weakly consistent: it will never throw ConcurrentModificationException and it may or may not show the effects of any modifications to the set that occur while the iteration is in progress.

What does it mean to say that a list is locked internally?

This code is from the book Effective Java
Object[] snapshot = list.toArray(); // Locks list internally
I am mainly interested in the comment here. Does it make the list unmodifiable? What does it mean to say that a list is locked internally? How long is this lock kept? Is there a better alternative to convert a List to an array?
I would imagine that it means the list doesn't maintain a reference to the returned array, meaning that the array can be modified without affecting the original list from where it came. Likewise, any modifications to the list won't be reflected in the array.
This is important in terms of thread safety, because it means you can iterate over the contents of the list in a thread-safe way, without worrying about another thread altering the state of the list in the meantime. In this sense the state of the list is "locked" in the returned array, no matter what changes are made to the list afterwards - you can see it as taking a snapshot.
toArray(); doesn't alter the state of the list - so it doesn't make it unmodifiable or anything like that.
Like the others said, I think that is about concurrency:
Text from the javadoc of java.util.List:
The returned array will be "safe" in that no references to it are
maintained by this list. (In other words, this method must
allocate a new array even if this list is backed by an array).
The caller is thus free to modify the returned array.
It's about thread safety - i.e. conversion of the list to an array will be thread-safe.
Edit:
Put simply: while one thread is converting the List to an array, no other thread is allowed to alter the list until the conversion has completed (this holds for a synchronized list, not for a plain ArrayList).
For those wondering where the "internal locking" takes place:
Please note that J. Bloch writes as an introduction to the given code: "For example, suppose you have a synchronized list (of the sort returned by Collections.synchronizedList) (...)"
In that case toArray() really "locks internally", because the implementation of the synchronized list will do just that (with a mutex), preventing any modification by other threads while the decoupled array is created.
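A small sketch of the snapshot idea, assuming a list created with Collections.synchronizedList (class and method names are made up for the example). The lock is only held for the duration of the toArray() call; afterwards the array and the list are independent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SnapshotDemo {
    // Takes a snapshot, then modifies the list; returns the snapshot's length.
    static int snapshotLengthAfterLaterAdd() {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        list.add("a");
        list.add("b");

        // The wrapper synchronizes this single call, so the copy is consistent.
        Object[] snapshot = list.toArray();

        // Later changes to the list do not show up in the snapshot.
        list.add("c");

        // Iterating the array needs no lock: no other thread can reach it.
        int count = 0;
        for (Object element : snapshot) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(snapshotLengthAfterLaterAdd()); // prints 2
    }
}
```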

Java: Efficiently keep track of used objects

I have a program that collects objects over time. Those objects are often, but not always duplicates of objects the program has already received. The number of unique objects can sometimes be up in the tens of thousands. As my lists grow, it takes more time to identify whether an object has appeared or not before.
My current method is to store everything in an ArrayList, al; use Collections.sort(al); and use Collections.binarySearch(al, key) to determine whether I've used an object. Every time I come across a new object, however, I have to insert and sort.
I'm wondering if there's just a better way to do this. Contains tends to slow up too quickly. I'm looking for something as close to O(1) as possible.
Thanks much.
This is java. For the purpose of understanding what I'm talking about, I basically need a method that does this:
public boolean objectAlreadyUsed(Object o) {
    return // Have we seen this object already?
}
Instead of using an ArrayList, why wouldn't you use a Set implementation (likely a HashSet)? You'll get constant-time lookup, no sorting needed.
N.B. your objects will need to correctly override hashCode() and equals().
This begs the question - why not use a data structure that doesn't allow duplicates (e.g. Set)? If you attempt to add a duplicate item, the method will return false and the data structure will remain unchanged.
Make sure the objects have correct equals() and hashCode() methods, and store them in a HashSet. Lookup then becomes constant time.
If retaining unwanted objects becomes an issue, by the way, you could consider using one of the many WeakHashSet implementations available on the Internet -- it will hold the objects but still allow them to be garbage collected if necessary.
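The HashSet suggestion from the answers above might look like this. It is a sketch with one assumption that goes beyond the question: the method also records the object on first sight, relying on Set.add() returning false when the element is already present.

```java
import java.util.HashSet;
import java.util.Set;

class UsedObjectTracker {
    private final Set<Object> seen = new HashSet<>();

    // Returns true if o was seen before; otherwise records it and returns false.
    // Expected constant time, provided o has consistent equals() and hashCode().
    boolean objectAlreadyUsed(Object o) {
        return !seen.add(o);
    }

    public static void main(String[] args) {
        UsedObjectTracker tracker = new UsedObjectTracker();
        System.out.println(tracker.objectAlreadyUsed("x")); // prints false
        System.out.println(tracker.objectAlreadyUsed("x")); // prints true
    }
}
```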

Java best practices, add to collection before or after object has been modified?

Say you are adding x objects to a collection, and either before or after adding them you modify the objects' attributes. Should you add the element to the collection before or after the object has been modified?
Option A)
public static void addToCollection(List<MyObject> objects) {
    MyObject newObject = new MyObject();
    objects.add(newObject);
    newObject.setMyAttr("ok");
}
Option B)
public static void addToCollection(List<MyObject> objects) {
    MyObject newObject = new MyObject();
    newObject.setMyAttr("ok");
    objects.add(newObject);
}
To be on the safe side, you should modify before adding, unless there is a specific reason you cannot do this, and you know the collection can handle the modification. The example can reasonably be assumed to be safe, since the general List contract does not depend upon object attributes - but that says nothing about specific implementations, which may have additional behavior that depends upon the object's value.
TreeSet, and Maps in general, do not tolerate modifying objects after they have been inserted, because the structure of the collection depends on the attributes of the object. For trees, any attributes used by the comparator cannot be changed once the item has been added. For maps, it's the key's hashCode that must remain constant.
So, in general, modify first, and then add. This becomes even more important with concurrent collections, since adding first can lead to other collection users seeing an object before it has been assigned its final state.
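To illustrate the hashCode point, here is a sketch with a hypothetical mutable Key class whose hash code depends on a field that is changed after insertion into a HashSet. On the standard HashSet implementation the lookup then misses, although the object is still stored in the set.

```java
import java.util.HashSet;
import java.util.Set;

class MutateAfterAddDemo {
    // Hypothetical mutable class: equals() and hashCode() depend on name.
    static final class Key {
        String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Adds a key, mutates it, then asks the set whether it still contains it.
    static boolean foundAfterMutation() {
        Set<Key> set = new HashSet<>();
        Key key = new Key("a");
        set.add(key);
        key.name = "b"; // hash code changes while the object sits in the set
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // prints false
    }
}
```

Had the attribute been set before the add() call, as in Option B, the lookup would succeed.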
The example you provided won't have any issues because you're using a List collection which doesn't care about the Object contents.
If you were using something like TreeMap, which internally sorts the keys it stores, modifying a key after insertion could put the collection into an unexpected state. Again, this depends on whether the comparison (compareTo() or the comparator) uses the attribute you're changing.
The safest way is to modify the object before placing it into the collection.
One of the good design rules to follow, is not to expose half-constructed object to a 3rd party subsystem.
So, according to this rule, initialize your object to the best of your abilities and then add it to the list.
If objects is an ArrayList then the net result is probably the same. However, imagine that objects is a special flavor of List that fires some kind of notification event every time a new object is added to it; then the order will matter greatly.
In my opinion it depends on the attribute being set and the type of collection. If the collection is a Set and the attribute influences equals() or hashCode(), then I would definitely set the property before adding; the same applies to sorted lists and the like. In other cases it is irrelevant. But for this example, where the object is newly created, I would set the attributes first and then add it to the collection, because the code is better organized.
I think either way it's the same, personally I like B, :)
It really does boil down to what the situation requires. Functionally there's no difference.
One thing you should be careful with, is being sure you have the correct handle to the object you want to modify.
Certainly in this instance, modifying the object is part of the "create the object" thought, and so should be grouped with the constructor as such. After you "create the object" you "add it to the collection". Thus, I would do B, and maybe even add a blank line after the modification to give more emphasis on the two separate thoughts.
