Java sets: why there is no T get(Object o)? [duplicate]


I understand that only one instance of any object according to .equals() is allowed in a Set and that you shouldn't "need to" get an object from the Set if you already have an equivalent object, but I would still like to have a .get() method that returns the actual instance of the object in the Set (or null) given an equivalent object as a parameter.
Any ideas/theories as to why it was designed like this?
I usually have to hack around this by using a Map and making the key and the value same, or something like that.
EDIT: I don't think people understand my question so far. I want the exact object instance that is already in the set, not a possibly different object instance where .equals() returns true.
As to why I would want this behavior, typically .equals() does not take into account all the properties of the object. I want to provide some dummy lookup object and get back the actual object instance in the Set.

While the purity argument does make the method get(Object) suspect, the underlying intent is not moot.
There are various class and interface families that slightly redefine equals(Object). One need look no further than the collections interfaces. For example, an ArrayList and a LinkedList can be equal; their respective contents merely need to be the same and in the same order.
Consequently, there are very good reasons for finding the matching element in a set. Perhaps a clearer way of indicating intent is to have a method like
public interface Collection<E> extends ... {
...
public E findMatch(Object o) throws UnsupportedOperationException;
...
}
Note that this API has value beyond Set.
As to the question itself, I don't have any theory as to why such an operation was omitted. I will say that the minimal spanning set argument does not hold, because many operations defined in the collections APIs are motivated by convenience and efficiency.

The problem is: Set is not for "getting" objects; it is for adding them and testing for presence.
I understand what you are looking for; I had a similar situation and ended up using a map with the same object as key and value.
EDIT: Just to clarify: http://en.wikipedia.org/wiki/Set_(abstract_data_type)

I had the same question in java forum years ago. They told me that the Set interface is defined. It cannot be changed because it will break the current implementations of Set interface. Then, they started to claim bullshit, like you see here: "Set does not need the get method" and started to drill me that Map must always be used to get elements from a set.
If you use the set only for mathematical operations, like intersection or union, then maybe contains() is sufficient. However, Set is defined in collections to store data. I explained the need for get() in Set using the relational data model.
In what follows, an SQL table is like a class. The columns define attributes (known as fields in Java) and records represent instances of the class. So that an object is a vector of fields. Some of the fields are primary keys. They define uniqueness of the object. This is what you do for contains() in Java:
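class Element {
    public int hashCode() { return sumOfKeyFields(); } // hash of the key fields only
    public boolean equals(Object o) { // compares the key fields only
        return o instanceof Element && keyField1.equals(((Element) o).keyField1) && keyField2.equals(((Element) o).keyField2); // && ...
    }
}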
I'm not aware of DB internals. But you specify the key fields only once, when you define a table. You just annotate the key fields with @primary. You do not specify the keys a second time when you add a record to the table. You do not separate keys from data, as you do in mapping. SQL tables are sets. They are not maps. Yet, they provide get() in addition to maintaining uniqueness and the contains() check.
In "Art of Computer Programming", introducing the search, D. Knuth says the same:
Most of this chapter is devoted to the study of a very simple search
problem: how to find the data that has been stored with a given
identification.
You see, data is stored with identification. Not identification pointing to data but data with identification. He continues:
For example, in a numerical application we might want
to find f(x), given x and a table of the values of f; in a
nonnumerical application, we might want to find the English
translation of a given Russian word.
It looks like he starts to speak about mapping. However,
In general, we shall suppose that a set of N records has been stored,
and the problem is to locate the appropriate one. We generally require
the N keys to be distinct, so that each key uniquely identifies its
record. The collection of all records is called a table or file,
where the word "table" is usually used to indicate a small file, and
"file" is usually used to indicate a large table. A large file or a
group of files is frequently called a database.
Algorithms for searching are presented with a so-called argument, K,
and the problem is to find which record has K as its key. Although the
goal of searching is to find the information stored in the record
associated with K, the algorithms in this chapter generally ignore
everything but the keys themselves. In practice we can find the
associated data once we have located K; for example, if K appears in
location TABLE + i, the associated data (or a pointer to it) might be
in location TABLE + i + 1
That is, the search locates the key field of the record; it does not "map" the key to the data. Both are located in the same record, as fields of a Java object. That is, the search algorithm examines the key fields of the record, as it does in a set, rather than some remote key, as it does in a map.
We are given N items to be sorted; we shall call them records, and
the entire collection of N records will be called a file. Each
record Rj has a key Kj, which governs the sorting process. Additional
data, besides the key, is usually also present; this extra "satellite
information" has no effect on sorting except that it must be carried
along as part of each record.
Nor do I see any need to duplicate the keys in an extra "key set" in his discussion of sorting.
... ["The Art of Computer Programming", Chapter 6, Introduction]
entity set is collection or set all entities of a particular entity type
[http://wiki.answers.com/Q/What_is_entity_and_entity_set_in_dbms]
The objects of single class share their class attributes. Similarly, do records in DB. They share column attributes.
A special case of a collection is a class extent, which is the
collection of all objects belonging to the class. Class extents allow
classes to be treated like relations
... ["Database System Concepts", 6th Edition]
Basically, class describes the attributes common to all its instances. A table in relational DB does the same. "The easiest mapping you will ever have is a property mapping of a single attribute to a single column." This is the case I'm talking about.
I'm so verbose on proving the analogy (isomorphism) between objects and DB records because there are stupid people who do not accept it (to prove that their Set must not have the get method)
You see in the replies how people who do not understand this say that a Set with get would be redundant? It is because their abused map, which they impose in place of the set, introduces the redundancy. Their call to put(obj.getKey(), obj) stores two keys: the original key as part of the object and a copy of it in the key set of the map. The duplication is the redundancy. It also involves more bloat in the code and wastes memory at runtime. I do not know about DB internals, but principles of good design and database normalization say that such duplication is a bad idea: there must be only one source of truth. Redundancy means that inconsistency may happen: the key maps to an object that has a different key. Inconsistency is a manifestation of redundancy. Edgar F. Codd proposed DB normalization just to get rid of redundancies and their inferred inconsistencies. The teachers are explicit on the normalization: Normalization will never generate two tables with a one-to-one relationship between them. There is no theoretical reason to separate a single entity like this with some fields in a single record of one table and others in a single record of another table
So, we have 4 arguments, why using a map for implementing get in set is bad:
the map is unnecessary when we have a set of unique objects
map introduces redundancy in Runtime storage
map introduces code bloat in the DB (in the Collections)
using map contradicts the data storage normalization
Even if you are not aware of the record set idea and data normalization, playing with collections, you may discover this data structure and algorithm yourself, as we, org.eclipse.KeyedHashSet and C++ STL designers did.
I was banned from Sun forum for pointing out these ideas. The bigotry is the only argument against the reason and this world is dominated by bigots. They do not want to see concepts and how things can be different/improved. They see only actual world and cannot imagine that design of Java Collections may have deficiencies and could be improved. It is dangerous to remind rationale things to such people. They teach you their blindness and punish if you do not obey.
Added Dec 2013: SICP also says that DB is a set with keyed records rather than a map:
A typical data-management system spends a large amount of time
accessing or modifying the data in the records and therefore requires
an efficient method for accessing records. This is done by identifying
a part of each record to serve as an identifying key. Now we represent
the data base as a set of records.

Well, if you've already "got" the thing from the set, you don't need to get() it, do you? ;-)
Your approach of using a Map is The Right Thing, I think. It sounds like you're trying to "canonicalize" objects via their equals() method, which I've always accomplished using a Map as you suggest.

I'm not sure if you're looking for an explanation of why Sets behave this way, or for a simple solution to the problem it poses. Other answers dealt with the former, so here's a suggestion for the latter.
You can iterate over the Set's elements and test each one of them for equality using the equals() method. It's easy to implement and hardly error-prone. Obviously if you're not sure if the element is in the set or not, check with the contains() method beforehand.
This isn't efficient compared to, for example, HashSet's contains() method, which does "find" the stored element, but won't return it. If your sets may contain many elements it might even be a reason to use a "heavier" workaround like the map implementation you mentioned. However, if it's that important for you (and I do see the benefit of having this ability), it's probably worth it.

So I understand that you may have two equal objects but they are not the same instance.
Such as
Integer a = new Integer(3);
Integer b = new Integer(3);
In which case a.equals(b) because they refer to the same intrinsic value but a != b because they are two different objects.
There are other implementations of Set, such as IdentitySet, which do a different comparison between items.
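As a rough illustration of the difference between equality and identity, here is a minimal sketch. The standard library has no class named IdentitySet that I know of, so the sketch assumes an identity-based set built from IdentityHashMap via Collections.newSetFromMap; the class name IdentityVsEquality and the helper newIdentitySet are made up for this example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

final class IdentityVsEquality {
    // Hypothetical helper: builds a set that compares elements with == instead of equals().
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        String a = new String("three");
        String b = new String("three");
        System.out.println(a.equals(b)); // true: same value
        System.out.println(a == b);      // false: two different objects

        Set<String> byEquals = new HashSet<>();
        byEquals.add(a);
        byEquals.add(b);
        System.out.println(byEquals.size()); // 1: b is rejected as a duplicate of a

        Set<String> byIdentity = newIdentitySet();
        byIdentity.add(a);
        byIdentity.add(b);
        System.out.println(byIdentity.size()); // 2: both instances are kept
    }
}
```

Which of the two sets you want depends on whether equal objects should really be interchangeable in your model.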
However, I think that you are trying to apply a different philosophy to Java. If your objects are equal (a.equals(b)) although a and b have a different state or meaning, there is something wrong here. You may want to split that class into two or more semantic classes which implement a common interface - or maybe reconsider .equals and .hashCode.
If you have Joshua Bloch's Effective Java, have a look at the chapters called "Obey the general contract when overriding equals" and "Minimize mutability".

Just use the Map solution... a TreeSet and a HashSet also do it since they are backed by a TreeMap and a HashMap, so there is no penalty in doing so (actually it should be a minimal gain).
You may also extend your favorite Set to add the get() method.
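One way this could look, as a sketch only: instead of subclassing HashSet directly, the example below assumes a small AbstractSet subclass backed by a HashMap that maps each element to itself. The name GettableSet and its get() method are hypothetical, and null elements are rejected to keep the sketch simple.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;

// Hypothetical set with a get() method, backed by a map from each element to itself.
final class GettableSet<E> extends AbstractSet<E> {
    private final Map<E, E> elements = new HashMap<>();

    @Override
    public boolean add(E e) {
        Objects.requireNonNull(e, "null elements are not supported in this sketch");
        return elements.putIfAbsent(e, e) == null; // true only if nothing equal was stored
    }

    @Override
    public boolean contains(Object o) {
        return elements.containsKey(o);
    }

    @Override
    public boolean remove(Object o) {
        return elements.remove(o) != null;
    }

    @Override
    public Iterator<E> iterator() {
        return elements.keySet().iterator();
    }

    @Override
    public int size() {
        return elements.size();
    }

    // Returns the stored instance that equals o, or null if there is none.
    public E get(Object o) {
        return elements.get(o);
    }
}
```

Because it extends AbstractSet, the class can be passed anywhere a Set is expected, while callers that know the concrete type can also use get().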

I think your only solution, given some Set implementation, is to iterate over its elements to find one that is equals() -- then you have the actual object in the Set that matched.
K target = ...;
Set<K> set = ...;
for (K element : set) {
if (target.equals(element)) {
return element;
}
}

If you think about it as a mathematical set, you can derive a way to find the object.
Intersect the set with a collection of object containing only the object you want to find. If the intersection is not empty, the only item left in the set is the one you were looking for.
public <T> T findInSet(T findMe, Set<T> inHere){
inHere.retainAll(Arrays.asList(findMe)); // note: this removes every other element from inHere
if(!inHere.isEmpty()){
return inHere.iterator().next();
}
return null;
}
It's not the most efficient use of memory, but it's functionally and mathematically correct. Be aware that retainAll modifies the set that is passed in.
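If the caller's set must stay intact, the same intersection idea can be applied to a copy. This is only a sketch under that assumption; the class name IntersectionLookup is made up, and the copy costs O(n) time and memory per lookup.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class IntersectionLookup {
    // Returns the instance in inHere that equals findMe, or null; inHere is left untouched.
    static <T> T findInSet(T findMe, Set<T> inHere) {
        Set<T> copy = new HashSet<>(inHere);           // work on a copy of the references
        copy.retainAll(Collections.singleton(findMe)); // keep only elements equal to findMe
        return copy.isEmpty() ? null : copy.iterator().next();
    }
}
```

The element that survives the intersection is the one that was stored in the original set, not the probe object.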

"I want the exact object instance that is already in the set, not a possibly different object instance where .equals() returns true."
This doesn't make sense. Say you do:
Set<Foo> s = new HashSet<Foo>();
s.add(new Foo(...));
...
Foo newFoo = ...;
You now do:
s.contains(newFoo)
If you want that to only be true if an object in the set is == newFoo, implement Foo's equals and hashCode with object identity. Or, if you're trying to map multiple equal objects to a canonical original, then a Map may be the right choice.

I think the expectation is that equals truly represents some equality, not simply that the two objects have the same primary key, for example. And if equals represented two really equal objects, then a get would be redundant. The use case you want suggests a Map, and perhaps a different value for the key, something that represents a primary key rather than the whole object, and then properly implementing equals and hashCode accordingly.

Functional Java has an implementation of a persistent Set (backed by a red/black tree) that incidentally includes a split method that seems to do kind of what you want. It returns a triplet of:
The set of all elements that appear before the found object.
An object of type Option that is either empty or contains the found object if it exists in the set.
The set of all elements that appear after the found object.
You would do something like this:
MyElementType found = hayStack.split(needle)._2().orSome(hay);

Object fromSet = set.tailSet(obj).first();
if (! obj.equals(fromSet)) fromSet = null;
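does what you are looking for, provided the set is a SortedSet such as TreeSet and tailSet(obj) is not empty (first() throws NoSuchElementException on an empty set). I don't know why Java hides it.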

Say, I have a User POJO with ID and name.
ID keeps the contract between equals and hashcode.
name is not part of object equality.
I want to update the name of the user based on the input from somewhere say, UI.
As java set doesn't provide get method, I need to iterate over the set in my code and update the name when I find the equal object (i.e. when ID matches).
If you had get method, this code could have been shortened.
Java now comes with all kind of stupid things like javadb and enhanced for loop, I don't understand why in this particular case they are being purist.
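For comparison, here is a sketch of how the update could look if the users are kept in a map keyed by ID instead of a Set. The User and UserDirectory classes and their method names are assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical POJO from the description above: equality depends on the ID only.
final class User {
    private final int id;
    private String name;

    User(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof User && ((User) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

// Hypothetical holder that indexes users by ID so no iteration is needed.
final class UserDirectory {
    private final Map<Integer, User> usersById = new HashMap<>();

    void add(User user) {
        usersById.putIfAbsent(user.getId(), user); // keeps the first user per ID
    }

    User get(int id) {
        return usersById.get(id);
    }

    // Updates the name of the stored user with this ID; returns false if there is none.
    boolean rename(int id, String newName) {
        User stored = usersById.get(id);
        if (stored == null) {
            return false;
        }
        stored.setName(newName);
        return true;
    }
}
```

Since name is not part of equals or hashCode here, changing it while the object sits in the map is safe.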

I had the same problem. I fixed it by converting my set to a Map, and then getting them from the map. I used this method:
public Map<MyObject, MyObject> convertSetToMap(Set<MyObject> set)
{
Map<MyObject, MyObject> myObjectMap = new HashMap<MyObject, MyObject>();
for(MyObject myObject: set){
myObjectMap.put(myObject, myObject);
}
return myObjectMap;
}
Now you can get items from your set by calling this method like this:
convertSetToMap(myset).get(myobject);
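Note that calling the conversion for every lookup rebuilds the whole map each time. A possible variation, sketched below with made-up names (SetIndex, index), builds the map once with streams and reuses it; it assumes the set contains no null elements, because Collectors.toMap rejects null values.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class SetIndex {
    // Builds a map from each element to itself so that get() returns the stored instance.
    // Assumes the set has no null elements.
    static <T> Map<T, T> index(Set<T> set) {
        return set.stream()
                  .collect(Collectors.toMap(Function.identity(), Function.identity()));
    }
}
```

You would call index(myset) once, keep the resulting map, and then use its get() for every lookup, as long as the set does not change in between.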
You can override the equals in your class to let it check on only a certain properties like Id or name.

If you have made a request for this in the Java bug parade, list it here and we can vote it up. I think at least the convenience class java.util.Collections could provide a method that just takes a set and an object
and is implemented something like
static Object searchSet(Set ss, Object searchFor){
Iterator it = ss.iterator();
while(it.hasNext()){
Object s = it.next();
if(s != null && s.equals(searchFor)){
return s; // the instance stored in the set
}
}
return null; // no equal element found
}

This is obviously a shortcoming of the Set API.
Simply, I want to lookup an object in my Set and update its property.
And I HAVE TO loop through my (Hash)Set to get to my object... Sigh...

I agree that I'd like to see Set implementations provide a get() method.
As one option, in the case where your Objects implement (or can implement) java.lang.Comparable, you can use a TreeSet. Then the get() type function can be obtained by calling ceiling() or floor(), followed by a check for the result being non-null and equal to the comparison Object, such as:
TreeSet<MyObject> myTreeSet = new TreeSet<>();
:
:
// Equivalent of a get() and a null-check, except for the incorrect value sitting in
// returnedMyObject in the not-equal case.
MyObject returnedMyObject = myTreeSet.ceiling(comparisonMyObject);
if ((null != returnedMyObject) && returnedMyObject.equals(comparisonMyObject)) {
:
:
}
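The same idea can be wrapped in a small helper so that the not-equal case never leaks out. This is a sketch only; the class name TreeSetLookup is made up, and it assumes the set's ordering is consistent with equals().

```java
import java.util.NavigableSet;
import java.util.TreeSet;

final class TreeSetLookup {
    // Returns the stored element equal to probe, or null if the set has none.
    static <E> E get(NavigableSet<E> set, E probe) {
        E candidate = set.ceiling(probe); // least element >= probe, or null
        return (candidate != null && candidate.equals(probe)) ? candidate : null;
    }

    public static void main(String[] args) {
        NavigableSet<String> names = new TreeSet<>();
        String stored = new String("bob");
        names.add(stored);
        names.add("dave");

        System.out.println(get(names, new String("bob")) == stored); // true
        System.out.println(get(names, "carol"));                     // null
    }
}
```

The lookup costs O(log n) in a TreeSet, so it avoids the linear scan needed with a plain HashSet.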

The reason why there is no get is simple:
If you need to get the object X from the set, it is because you need something from X and you don't have the object.
If you do not have the object, then you need some means (a key) to locate it: its name, a number, whatever. That's what maps are for, right?
map.get( "key" ) -> X!
Sets do not have keys; you need to traverse them to get the objects.
So, why not add a handy get( X ) -> X
That makes no sense, right, because you have X already, a purist will say.
But now look at it as a non-purist, and see if you really want this:
Say I make an object Y which matches the equals of X, so that set.get(Y)->X. Voila, then I can access the data of X that I didn't have. Say, for example, X has a method called flag() and I want the result of that.
Now look at this code.
Y
X = map.get( Y );
So Y.equals( X ) is true!
but..
Y.flag() == X.flag() is false. (Weren't they equal?)
So, you see, if Set allowed you to get objects like that, it would surely break the basic semantics of equals. Later you would have to live with little clones of X, all claiming that they are the same when they are not.
You need a map, to store stuff and use a key to retrieve it.

I understand that only one instance of any object according to .equals() is allowed in a Set and that you shouldn't "need to" get an object from the Set if you already have an equivalent object, but I would still like to have a .get() method that returns the actual instance of the object in the Set (or null) given an equivalent object as a parameter.

The simple interface/API gives more freedom during implementation. For example if Set interface would be reduced just to single contains() method we get a set definition typical for functional programming - it is just a predicate, no objects are actually stored. It is also true for java.util.EnumSet - it contains only a bitmap for each possible value.

It's just an opinion. I believe we need to understand that we have several Java classes without fields/properties, i.e. only methods. In that case equality cannot be measured by a comparing function; one such example is request handlers. See the example of a JAX-RS application below. In this context a Set makes more sense than any other data structure.
@ApplicationPath("/")
public class GlobalEventCollectorApplication extends Application {
@Override
public Set<Class<?>> getClasses() {
Set<Class<?>> classes = new HashSet<Class<?>>();
classes.add(EventReceiverService.class);
classes.add(VirtualNetworkEventSerializer.class);
return classes;
}
}
To answer your question, if you have an shallow-employee object ( i.e. only EMPID, which is used in equals method to determine uniqueness ) , and if you want to get a deep-object by doing a lookup in set, SET is not the data-structure , as its purpose is different.

List is ordered data structure. So it follows the insertion order. Hence the data you put will be available at exact position the time you inserted.
List<Integer> list = new ArrayList<>();
list.add(1);
list.add(2);
list.add(3);
list.get(0); // will return value 1
Remember this as simple array.
Set is an unordered data structure. So it follows no order. The data you insert at a certain position may end up at any position.
Set<Integer> set = new HashSet<>();
set.add(1);
set.add(2);
set.add(3);
//assume it has get method
set.get(0); // what are you expecting this to return. 1?..
But it will return something else. Hence it does not make any sense to create get method in Set.
Note: For the explanation I used the int type; the same applies to Object types.

I think you've answered your own question: it is redundant.
Set provides Set#contains (Object o) which provides the equivalent identity test of your desired Set#get(Object o) and returns a boolean, as would be expected.

Related

Find a specific instance, what is the best approach

So imagine I have two instances of a class:
public class MyClass {
public void sayHello() {
System.out.println("Hello");
}
}
MyClass a = new MyClass();
MyClass b = new MyClass();
Now I add those to another object, such as:
public class OtherClass {
private ArrayList<MyClass> myClsList = new ArrayList<>();
public void add(MyClass obj) {
myClsList.add(obj);
}
public void remove(MyClass obj) {
// ????
}
}
OtherClass c = new OtherClass();
c.add(a);
c.add(b);
Now I want to remove one specific instance e.g
c.remove(a);
Could I just iterate over them and test for equality, I mean this should theoretically work, since the two instances have distinct "internal pointers"?
I guess using a HashMap based approach would be more efficient, but what can I use as an key there (suppose I can't add unique instance ids or something).
EDIT: There is some confusion as to what exactly I'd like to know.
The key here is that I'd like to know if there is any way of removing that specific instance from c's ArrayList or whatever Aggregator Object I might use, just by providing the respective object reference.
I imagine this could be done by just keeping the ArrayList and testing for equality (although I'm not a 100% sure) but it would be cleaner if it was possible without iterating through the whole list.
I'd just like to know if anything of the like is possible in Java. (I know how to work around it by using additional information, but the point is to have just the respective object reference for filtering/retrieving purposes.)

You can use a.toString(), according to the Java doc,
The toString method for class Object returns a string consisting of
the name of the class of which the object is an instance, the at-sign
character `@', and the unsigned hexadecimal representation of the hash
code of the object.
This should usually give you a distinct identifier for your class instance, so you can use it as a hash key without storing or creating any extra identifiers.
NB: Be careful with this practice. Two objects can share a hash code, so the string is not guaranteed to be unique, and you should not rely on the value returned by Object.toString() as being related to the actual object address; see detailed explanation here.

While your question is one that many beginners have (including myself), I believe that your concern is not justified in this case. The features you are asking for are already built into the Java language at the specification level.
First of all, let's look at Object.equals(). On the one hand, the Language Specification states that
The method equals defines a notion of object equality, which is based on value, not reference, comparison.
However, the documentation for Object.equals() clearly states that
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).
This means that you can safely redirect OtherClass.remove to ArrayList.remove(). Whatever Object.equals is comparing works exactly like a unique ID. In fact, in many (but not all) implementations, it compares the memory addresses to the objects, which are a form of unique ID.
Quite understandably, you do not wish to use linear iteration every time. As it happens, the machinery of Object is perfectly suited for use with something like a HashSet, which, by the way is the solution I recommend you use in this case.
If you are not dealing with some huge data set, we do not need to discuss the optimization of Object.hashCode(). You just need to know that it will implement whatever contract is necessary to work correctly with Object.equals to make HashSet.remove work correctly.
The spec itself only states that
The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.
This does not really say much, so we turn to the API reference. The two relevant point are:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
Simply put, the hashCode of equal objects must be the same, but an equal hashCode does not necessarily mean equal objects. Object implements this contract, so you can use it with a HashSet, which is backed by a HashMap.
The one piece of information that is missing to make this a formal argument in favor of not doing any additional work, is why I keep citing the API reference as if it was the language specification. As it happens:
As noted above, this specification often refers to classes of the Java SE platform API. In particular, some classes have a special relationship with the Java programming language. Examples include classes such as Object, Class, ClassLoader, String, Thread, and the classes and interfaces in package java.lang.reflect, among others. This specification constrains the behavior of such classes and interfaces, but does not provide a complete specification for them. The reader is referred to the Java SE platform API documentation.
[emphasis mine], but you get the idea. The Java SE API reference is the language spec as far as the behavior of the methods of Object is concerned.
As an aside, you will probably want to stay away from something like TreeSet, because that will require you to add a bunch of machinery to your implementation. As a minimum, MyClass instances will have to be orderable, either by implementing Comparable, or by assigning a custom Comparator to the Set.
TL;DR
The language specification states that you have at least the following two options available to you with no additional effort on your part:
Make myClsList an ArrayList and use the appropriate add()/remove() methods as you see fit.
Make myClsList a HashSet and use the appropriate add()/remove() methods.
I recommend the second option. In fact, instead of containment, you may consider extending HashSet so you don't have to bother implementing your own add/remove methods.
Final Note
All this works as long as MyClass overrides neither Object.equals nor Object.hashCode. The moment you do that, you put the burden of satisfying contractual requirements entirely on yourself.
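A minimal sketch of the second option, assuming MyClass keeps the equals and hashCode it inherits from Object. The field name myClsSet and the size() method are additions for this example.

```java
import java.util.HashSet;
import java.util.Set;

class MyClass {
    public void sayHello() {
        System.out.println("Hello");
    }
}

class OtherClass {
    // MyClass inherits equals/hashCode from Object, so membership is by identity.
    private final Set<MyClass> myClsSet = new HashSet<>();

    public boolean add(MyClass obj) {
        return myClsSet.add(obj);
    }

    // Removes exactly the given instance; returns false if it was not stored.
    public boolean remove(MyClass obj) {
        return myClsSet.remove(obj);
    }

    public int size() {
        return myClsSet.size();
    }
}
```

With this in place, c.remove(a) removes a and leaves b alone, without iterating over the whole collection.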

Getting the reference to a duplicate in a Set

I have a Set object and I use this set to ensure that when I add an element to it that already exists in the set, it's not added. This is the easy part, just use Set.add(); But after this is done I need the reference to the object in the Set.
What I essentially mean is having a .add() that doesn't return a boolean, but the actual object you tried to add (if it wasn't added, the one in the set). Is there already a Set implementation that does this, or do I have to write my own?
At the moment I used a Set.add() and if it returns false I use an iterator to look for the one in the set. Although this works, I find it ugly. Especially when using the HashSet implementation which should be able to find the object a lot faster using hashcodes. Any ideas?
EDIT: Wow, lots of answers in a relatively short time, thanks. Ok, so what I'm trying to do is create a certain datastructure that loads data from some place and creates objects from it. This data might contain duplicates, and this wouldn't be a problem if I used a set and just needed this one set, but the datastructure needs to add references to these unique objects to other objects in the datastructure, therefore I need the references to the (unique) objects in the set. Also, I can't just not load the data that is already contained in the set, because there is more (unique) data linked to it, which is also added, together with a reference to that data that was already contained in the set. For illustration purposes (because the above explanation is far from clear) I'll give an example here:
Data:
foo bar
1 3
1 4
2 5
Datastructure:
Set<Foo> totalFooSet
Set<Bar> totalBarSet
Foo:
sometype data
Set<Bar> barSet
Bar:
sometype data
Set<Foo> fooSet
This is sort of like a many-to-many relation.
I'm not sure if there is some major design flaw here, I've looked it over with some other people and we can't figure out how to do this differently. I like the idea of using the HashMap, so I'll create a subclass and add an addAndReturn() function to it.

(As @AlexR says, I'm assuming that you want a reference to the previous object equal to the one you are trying to add now)
Instead of using a Set, try using a HashMap with the same object as a key and a value. Then you can do the following:
Foo objectToAdd = //obtained the normal way
Map<Foo,Foo> pseudoSet = //this is stored somewhere
Foo result = pseudoSet.get(objectToAdd);
if (result == null) {
pseudoSet.put(objectToAdd, objectToAdd);
result = objectToAdd;
}
return result;

Similar to Sean's answer (which I upvoted), but possibly more reusable.
public class HashMapBackedSet<T> extends HashMap<T,T>{
public T add( T toAdd ){
T existing = get( toAdd );
if( existing != null ){
return existing;
}
put( toAdd, toAdd );
return toAdd;
}
}

If I understand you correctly, if the element you just tried to add is already contained in the set, you want the instance which is already in the set (which is equal to the one added, but not necessarily identical)?
This behavior is provided by the interners of the Google Guava library:
Interner<Object> interner = Interners.newStrongInterner();
Object objectInSet = interner.intern(otherObject);
Unfortunately, interners do not provide any other methods like iterating over their contained values, so using them as a set replacement may not be possible for you.
Another option would be a HashMap<T, T> where you store a mapping from each object to itself. Then you can get the reference to the already contained object easily by calling get(). If you don't mind that the object is always overridden, just call put() which returns exactly the object you want (the previously stored object).

Set cannot contain duplicate entries. The purpose of set is not to do this.
As far as I understand you want to get reference to the previous object identical to one that you are trying to add now.
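You do not have to iterate the set to find this object. Note that java.util.Set itself has no get() method; if you keep the elements in a Map from each object to itself, you can just use oldObject = map.get(newObject).
With a HashMap this lookup takes constant expected time, which is comparable to getting an array element by index.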

Wrap your set in a class that returns the object when you call add?
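A sketch of what such a wrapper might look like, assuming it is backed by a HashMap from each element to itself. The class name CanonicalSet is made up; addAndReturn is the method name the question itself suggests.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical wrapper: add returns the instance held by the set instead of a boolean.
final class CanonicalSet<E> {
    private final Map<E, E> elements = new HashMap<>();

    // Adds e if no equal element is present; returns the instance now held by the set.
    E addAndReturn(E e) {
        Objects.requireNonNull(e, "null elements are not supported in this sketch");
        E existing = elements.putIfAbsent(e, e);
        return existing != null ? existing : e;
    }

    int size() {
        return elements.size();
    }

    public static void main(String[] args) {
        CanonicalSet<String> set = new CanonicalSet<>();
        String first = new String("foo");
        String duplicate = new String("foo");

        System.out.println(set.addAndReturn(first) == first);     // true: newly added
        System.out.println(set.addAndReturn(duplicate) == first); // true: existing instance returned
        System.out.println(set.size());                           // 1
    }
}
```

The loader can then link other objects to whatever addAndReturn hands back, regardless of whether the element was new or already present.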

Java best practices, add to collection before or after object has been modified?

Say you are adding x number of objects to a collection, and after or before adding them to a collection you are modifying the objects attributes. When would you add the element to the collection before or after the object has been modified.
Option A)
public static void addToCollection(List<MyObject> objects) {
MyObject newObject = new MyObject();
objects.add(newObject);
newObject.setMyAttr("ok");
}
Option B)
public static void addToCollection(List<MyObject> objects) {
MyObject newObject = new MyObject();
newObject.setMyAttr("ok");
objects.add(newObject);
}

To be on the safe side, you should modify before adding, unless there is a specific reason you cannot do this, and you know the collection can handle the modification. The example can reasonably be assumed to be safe, since the general List contract does not depend upon object attributes - but that says nothing about specific implementations, which may have additional behavior that depends upon the object's value.
TreeSet, and Maps in general, do not tolerate modifying objects after they have been inserted, because the structure of the collection depends upon the attributes of the object. For trees, any attributes used by the comparator cannot be changed once the item has been added. For hash-based maps, it's the hashCode of the key that must remain constant.
So, in general, modify first, and then add. This becomes even more important with concurrent collections, since adding first can lead to other collection users seeing an object before it has been assigned its final state.
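A small sketch of what can go wrong when the order is reversed with a hash-based collection. MutableKey and MutateAfterAddDemo are hypothetical names used only for this illustration; the point is that the element is filed under its old hash and cannot be found under the new one.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable class whose equals()/hashCode() depend on a settable field.
final class MutableKey {
    private String name;

    MutableKey(String name) {
        this.name = name;
    }

    void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

final class MutateAfterAddDemo {
    // Adds first and mutates afterwards; reports whether the set still finds the element.
    static boolean foundAfterLateMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey("a");
        set.add(key);
        key.setName("b"); // the hash changes while the element is still filed under the hash of "a"
        return set.contains(key);
    }

    // Mutates first and adds afterwards; reports whether the set finds the element.
    static boolean foundAfterEarlyMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey("a");
        key.setName("b");
        set.add(key);
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterLateMutation());  // prints false
        System.out.println(foundAfterEarlyMutation()); // prints true
    }
}
```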

The example you provided won't have any issues because you're using a List collection which doesn't care about the Object contents.
If you were using something like a TreeMap, which internally sorts the keys it stores, modifying a key could cause the collection to get into an unexpected state. Again, this depends on whether the comparison (compareTo or the Comparator) uses the attribute you're changing.
The safest way is to modify the object before placing it into the collection.

One good design rule to follow is not to expose a half-constructed object to a third-party subsystem.
So, according to this rule, initialize your object to the best of your abilities and then add it to the list.
If objects is an ArrayList then the net result is probably the same. However, imagine that objects is a special flavor of List that fires some kind of notification event every time a new object is added to it; then the order will matter greatly.
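To make the notification scenario concrete, here is a minimal sketch. NotifyingList, Item and NotifyingListDemo are hypothetical names (Item stands in for MyObject from the question), and a real event-firing list would likely be more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical list that notifies a listener whenever add(E) is called.
final class NotifyingList<E> extends ArrayList<E> {
    private final Consumer<E> listener;

    NotifyingList(Consumer<E> listener) {
        this.listener = listener;
    }

    @Override
    public boolean add(E element) {
        boolean added = super.add(element);
        listener.accept(element); // the listener sees the element in its current state
        return added;
    }
}

// Hypothetical stand-in for MyObject from the question.
final class Item {
    private String attr;

    void setAttr(String attr) {
        this.attr = attr;
    }

    String getAttr() {
        return attr;
    }
}

final class NotifyingListDemo {
    // Returns the attribute value the listener observed at the moment of insertion.
    static String observedAttr(boolean setBeforeAdd) {
        List<String> seen = new ArrayList<>();
        List<Item> items = new NotifyingList<Item>(item -> seen.add(item.getAttr()));
        Item item = new Item();
        if (setBeforeAdd) {
            item.setAttr("ok"); // option B: modify, then add
            items.add(item);
        } else {
            items.add(item);    // option A: add, then modify
            item.setAttr("ok");
        }
        return seen.get(0);
    }
}
```

With option B the listener observes "ok"; with option A it observes null, because the attribute has not been set yet when the event fires.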

In my opinion it depends on the attribute being set and on the type of collection. If the collection is a Set and the attribute influences equals() or hashCode(), then I would definitely set this property before adding; the same applies to sorted lists and the like. In other cases it is irrelevant. But for this example, where the object is newly created, I would first set the attributes and then add it to the collection, because the code is better organized that way.

I think either way it's the same, personally I like B, :)

It really does boil down to what the situation requires. Functionally there's no difference.
One thing you should be careful with, is being sure you have the correct handle to the object you want to modify.

Certainly in this instance, modifying the object is part of the "create the object" thought, and so should be grouped with the constructor as such. After you "create the object" you "add it to the collection". Thus, I would do B, and maybe even add a blank line after the modification to give more emphasis on the two separate thoughts.

Duplicate values in the Set collection?

Is it possible to allow duplicate values in the Set collection?
Is there any way to make the elements unique and have some copies of them?
Is there any functions for Set collection for having duplicate values in it?

Ever considered using a java.util.List instead?
Otherwise I would recommend a Multiset from Google Guava (the successor to Google Collections, which this answer originally recommended -ed.).

The very definition of a Set disallows duplicates. I think perhaps you want to use another data structure, like a List, which will allow dups.
Is there any way to make the elements unique and have some copies of them?
If for some reason you really do need to store duplicates in a set, you'll either need to wrap them in some kind of holder object, or else override equals() and hashCode() of your model objects so that they do not evaluate as equivalent (and even that will fail if you are trying to store references to the same physical object multiple times).
I think you need to re-evaluate what you are trying to accomplish here, or at least explain it more clearly to us.

From the javadocs:
"sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element"
So if your objects were to override .equals() so that it would return false for whatever objects you intend on storing, then you could store them separately in a Set (you should override hashCode() as well).
However, the very definition of a Set in Java is,
"A collection that contains no duplicate elements."
So you're really better off using a List or something else here. Perhaps a Map, if you'd like to store duplicate values based on different keys.

Sun's view on "bags" (AKA multisets):
We are extremely sympathetic to the desire for type-safe collections. Rather than adding a "band-aid" to the framework that enforces type-safety in an ad hoc fashion, the framework has been designed to mesh with all of the parameterized-types proposals currently being discussed. In the event that parameterized types are added to the language, the entire collections framework will support compile-time type-safe usage, with no need for explicit casts. Unfortunately, this won't happen in the the 1.2 release. In the meantime, people who desire runtime type safety can implement their own gating functions in "wrapper" collections surrounding JDK collections.
(source; note it is old and possibly obsolete -ed.)
Apart from Google's collections API, you can use Apache Commons Collections.
Apache Commons Collections:
http://commons.apache.org/collections/
Javadoc for Bag

I don't believe that you can have duplicate values within a set. A set is defined as a collection of unique values. You may be better off using an ArrayList.

These sound like interview questions, so I'll answer them like interview questions...
Is it possible to allow duplicate values in the Set collection?
Yes, but it requires that the person implementing the Set violate the design contract upon which Set is built. Basically, I could write a class that implements Set and doesn't enforce Set's promises.
In addition, other violations are possible. I could use a Set implementation that relies upon Java's hashCode() contract. Then if I provided an Object that violates Java's hashCode() contract, I might be able to place two objects into the set which are equal, but yield different hash codes (because they might not be checked for equality against each other due to being in different hash bucket chains).
Is there any way to make the elements unique and have some copies of them?
It basically depends on how you define uniqueness. If an object's uniqueness is determined by its value, then one can have multiple copies of the same unique object; however, if the object's uniqueness is determined by its instance, then by definition it would not be possible to have multiple copies of the same object. You could however have multiple references to them.
Is there any functions for Set collection for having duplicate values in it?
The Set interface doesn't have any functions for detecting or reporting duplicates. However, it extends the Collection interface, which List extends as well, so it is possible to pass duplicates into a Set (for example, via addAll). A properly implemented Set will just ignore the duplicates and present one copy of every element determined to be unique.
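A short sketch of that behavior with HashSet; DuplicateDemo is a hypothetical name used only for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateDemo {
    // Copies a list that may contain duplicates into a set, which keeps one copy of each.
    static Set<String> distinct(List<String> input) {
        return new HashSet<>(input);
    }

    public static void main(String[] args) {
        Set<String> set = distinct(Arrays.asList("a", "b", "a"));
        System.out.println(set.size());   // prints 2
        System.out.println(set.add("a")); // prints false: the element was already present
    }
}
```

The boolean returned by add() is the closest thing the interface offers to duplicate detection: false means an equal element was already in the set.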

I don't think so. The only way would be to use a List. You could also play tricks with equals(), hashCode() or compareTo(), but it is going to be awkward.

No chance: you cannot have duplicate values in a Set.
If you want duplicates, you can try an ArrayList.

As mentioned, choose the right collection for the task; a List will likely be what you need. Messing with equals(), hashCode() or compareTo() to break identity, simply to wedge an instance into the wrong collection, is generally a bad idea. Worse yet, it may break code in other areas of the application that depend on these methods producing valid comparison results, and such errors can be very difficult to debug or track down.

I was also asked this question in an interview. I think the answer is that, of course, a Set will not allow duplicate elements, and an ArrayList or another collection should be used instead. However, overriding equals() for the type of object being stored in the set lets you manipulate the comparison logic, and hence you may be able to store duplicate elements in the Set. It's more of a hack, which would allow non-unique elements in the Set, and of course it is not recommended in production-level code.

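You can do so by overriding hashCode() as given below:
import java.util.HashSet;
import java.util.Set;

public class Test
{
    static int a = 0;

    // Deliberately broken: returns a new value on every call, violating the hashCode() contract.
    @Override
    public int hashCode()
    {
        a++;
        return a;
    }

    public static void main(String[] args)
    {
        Set<Test> s = new HashSet<Test>();
        Test t1 = new Test();
        Test t2 = t1; // second reference to the same instance
        s.add(t1);
        s.add(t2); // hashes differently this time, so the same object is stored again
        System.out.println(s);
        System.out.println("--Done--");
    }
}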

Well, in this case we are trying to defeat the purpose of a specific collection. If we want to allow duplicate records, simply use a List or a Multimap.

A Set stores unique values; if you want to store duplicate values, use a List. If you still want duplicate values in a Set, create a Set of ArrayLists, so that each list can hold duplicate elements.
Set<ArrayList<String>> s = new HashSet<ArrayList<String>>();
ArrayList<String> arr = new ArrayList<String>();
arr.add("First");
arr.add("Second");
arr.add("Third");
arr.add("Fourth");
arr.add("First");
s.add(arr);

You can use a TreeMap instead:
The key is the element you wish to store,
and the value is the frequency of that element.
Insertion and removal require custom handling.
Insertion: check whether the map already contains the element; if so, increment its frequency, otherwise insert it with frequency 1. O(log N)
Removal: if the element's frequency is 1, remove it; otherwise decrease the frequency by 1. O(log N)
More details can be found in the Javadoc of TreeMap.
The overall time complexity remains the same as for a TreeSet, O(log N), but is worse than the expected O(1) of a HashSet.
firstEntry() -> provides smallest element entry, Time Complexity : O(Log N)
lastEntry() -> provides greatest element entry, Time Complexity : O(Log N)
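The insertion and removal rules above could be sketched like this. TreeMultiset and its method names are hypothetical, not part of the JDK; the sketch assumes elements with a natural ordering.

```java
import java.util.TreeMap;

// Hypothetical sorted multiset: keys are the elements, values are occurrence counts.
final class TreeMultiset<E extends Comparable<E>> {
    private final TreeMap<E, Integer> counts = new TreeMap<>();

    // Insertion: increment the frequency, starting at 1 for a new element.
    void add(E element) {
        counts.merge(element, 1, Integer::sum);
    }

    // Removal: drop the entry when the frequency is 1, otherwise decrement it.
    boolean remove(E element) {
        Integer count = counts.get(element);
        if (count == null) {
            return false;
        }
        if (count == 1) {
            counts.remove(element);
        } else {
            counts.put(element, count - 1);
        }
        return true;
    }

    // Number of copies currently stored.
    int count(E element) {
        return counts.getOrDefault(element, 0);
    }

    // Smallest and greatest elements, or null when the multiset is empty.
    E first() {
        return counts.isEmpty() ? null : counts.firstKey();
    }

    E last() {
        return counts.isEmpty() ? null : counts.lastKey();
    }
}
```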

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class SET {
    public static void main(String[] args) {
        // AB does not override equals()/hashCode(), so all three instances are kept.
        Set<AB> set = new HashSet<AB>();
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        Iterator<AB> it = set.iterator();
        while (it.hasNext()) {
            AB o = it.next();
            System.out.println(o);
        }
    }
}

class AB {
    int id;
    String email;

    public AB() {
        System.out.println("DC");
    }

    AB(int id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public String toString() {
        return "" + id + "\t" + email;
    }
}

Use a Java hash map even when there is no "mapping"?

I want to store some objects and then be able to retrieve them later as efficiently as possible. I will also remove some of them under certain conditions. It seems a hash map would be the right choice.
But, from what I've seen, hash maps always associate a value with another? For example, "john" and "555-5555", his phone number.
Now, my situation. Suppose I have a bunch of people, and each person is connected to other people. So, I need each person to store its contacts.
What I'm doing is have each person have a hashmap, and then I'd add to the hash otherPerson, otherPerson. Basically, the key is the value. Am I doing it wrong?
EDIT I don't think the HashSet would solve my problem because I have to retrieve the value to update it and there is no get method. Remove returns a boolean, so I can't even remove it to put it back again, which would probably be a bad idea anyway.

If all you need is to check whether A is one of B's contacts, then a Set is the right choice. It has contains() for that purpose.
Otherwise, the most suitable structure might be a Map, as you need an efficient retrieval operation. You said you currently use the same object as key and value, but I'm not sure how you get the key in the first place. Say you'd like to get contact A from B's contacts, and you use something like 'B.contacts.get(A)': where do you get A from? If you already have A, why get it from the map again? (Maybe there are multiple instances of the same person?)
Unless there are multiple instances of the same person, I'd say define a unique, ID-like attribute for each Person and use that as the key for the contacts map. Also, do you define equals()/hashCode() for the Person class? Map and Set use hashCode() and equals() for finding the match. Depending on your usage, you might need to consider rewriting them for efficiency.
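A minimal sketch of the ID-keyed contacts map suggested above. The Person fields and method names are hypothetical, chosen only to show retrieving a contact by ID and updating it in place.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Person with a unique id; contacts are looked up by that id.
final class Person {
    private final int id;
    private String phone;
    private final Map<Integer, Person> contacts = new HashMap<>();

    Person(int id, String phone) {
        this.id = id;
        this.phone = phone;
    }

    int getId() {
        return id;
    }

    String getPhone() {
        return phone;
    }

    void setPhone(String phone) {
        this.phone = phone;
    }

    void addContact(Person other) {
        contacts.put(other.getId(), other);
    }

    // Returns the contact with the given id, or null if there is none.
    Person getContact(int contactId) {
        return contacts.get(contactId);
    }

    // Returns true if a contact with the given id was present and has been removed.
    boolean removeContact(int contactId) {
        return contacts.remove(contactId) != null;
    }
}
```

Because the map stores references, an update made through getContact(id) is visible to every other holder of that same Person instance.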

I don't think the HashSet would solve my problem because I have to retrieve the value to update it and there is no get method.
This is a puzzling statement. Why would you want to retrieve a value using a get method to update it? Surely, if you know which object you need to retrieve from the set/map, you don't need to retrieve it.
For example:
HashSet<Person> relations = ...
Person p = ...
if (relations.remove(p)) {
// we removed an object such that p.equals(obj) is true.
}
Now if you are worried that the object that was removed was equal to, but not identical to p, it seems to me that something is wrong with your design. Either:
you should not be creating multiple Person instances that are equal, or
you should not be caring that Person instances are not identical, or
you should not have overridden equals(Object).
In short, the problem is that you are not managing object identity properly.

Well, the data structure you'd be looking for here, would be a HashSet (or some other kind of set), I think (if your framework/library offers it). A set just says "I have the following items" instead of "I have the following items mapped to the following values". Which would be what you're modeling here.
As for HashSet vs. other implementations (if present): That all depends on what you're doing. If you need fast lookup, i. e. "is this element in the set?" questions, then hashing is a good thing. Other underlying data structures are perhaps better optimized for other set operations, such as union, intersection, etc.

A hash table/map simply requires that you have a way to get the values you're interested in looking up later; that's what the key is for.
However, in your specific case, it sounds like you're looking for a way to store relationships between people, and what you're keeping track of is whether or not person A has a relationship with person B. A better representation for that sort of thing is an adjacency list.
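One possible shape for such an adjacency list, sketched with people identified by name. ContactGraph and its methods are hypothetical, and the sketch assumes relationships are symmetric.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical adjacency list: each person maps to the set of people they are connected to.
final class ContactGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Records a symmetric relationship between a and b.
    void connect(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Answers "does a have a relationship with b?" in expected constant time.
    boolean areConnected(String a, String b) {
        return adjacency.getOrDefault(a, Collections.emptySet()).contains(b);
    }

    // Read-only view of a's contacts; empty if a is unknown.
    Set<String> contactsOf(String a) {
        return Collections.unmodifiableSet(adjacency.getOrDefault(a, Collections.emptySet()));
    }
}
```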

Am I missing something or don't you simply need an ArrayList<Person>?

I would just store the contacts in a List<Person>. E.g.
public class Person {
private List<Person> contacts;
}
With regard to editing an individual contact, it is really not the parent Person's responsibility to do that. At most, it should add or remove contacts. You can do that perfectly well with contacts.add(otherPerson) or contacts.remove(otherPerson).
When you want to edit an individual Person, which may be one of the contacts, just get a handle to it independently, e.g. personDAO.find(personId), and then update it accordingly. It's actually also the Person's own responsibility to edit its own details. With a good ORM under the hood, the changes will be reflected in the contact lists of other Persons.

If you need to iterate through the people, or require them to have ordering, consider TreeMap or TreeSet instead of hashing.
