Defining java collections in basic English [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Can someone give a very brief definition of what a collection is in Java, as if you were describing it to someone with little or no programming experience?
Could you also define some types of collections, like sets, lists and maps?
Thanks

A bag, with different restrictions.
List = Allows duplicates. Ordered; items have a position (index).
Set = Does not allow duplicates.
Map = Each item has a key which you can use to fetch it from the bag
easily instead of having to look through the bag.
If you want more information just Google it, it is not that hard.

I would describe collections as objects that can hold multiple other objects. Collections can be ordered and unordered. They may or may not allow duplicates. The differences between collections are as follows:
Lists hold objects with an order; duplicates are allowed.
Hash Sets hold objects without an order; duplicates are prohibited.
Linked hash sets hold objects with a predictable iteration order; duplicates are prohibited.
Tree sets hold objects with user-specified ordering; duplicates are prohibited.
Hash maps keep pairs of key and value objects, with no ordering and no duplicates on keys.
Linked hash maps keep pairs of key and value objects, with predictable ordering and no duplicates on keys.
Tree maps keep pairs of key and value objects, with user-specified ordering among keys and no duplicates on keys.
Maps allow access to a value given its corresponding key.
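To see these ordering rules in practice, here is a minimal sketch. The class name SetOrderingDemo, its helper methods and the sample words are made up for illustration; only the java.util classes are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; not part of any library.
class SetOrderingDemo {
    // LinkedHashSet: duplicates dropped, first-insertion order kept.
    static List<String> insertionOrdered(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // TreeSet: duplicates dropped, elements sorted (natural String order here).
    static List<String> sorted(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(words);                       // [pear, apple, pear, fig]
        System.out.println(insertionOrdered(words));     // [pear, apple, fig]
        System.out.println(sorted(words));               // [apple, fig, pear]
        // HashSet also drops the duplicate, but its iteration order is unspecified.
        System.out.println(new HashSet<>(words).size()); // 3
    }
}
```

The list keeps both copies of "pear"; every set keeps only one, and the sets differ only in the order in which they hand the elements back.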

Collection
A Collection is a bunch of related elements. These items may or may not be ordered. There may or may not be duplicate elements.
This is the most abstract data structure. Any time you have a bunch of related elements in one place, it's a Collection.
List
A List is an ordered Collection. A List can contain duplicates.
Set
A Set is an unordered Collection. A Set only contains distinct elements, meaning it cannot contain duplicates.
Map
A Map is an unordered group of key/value pairs, where one element (the key) is used to access another element (the value). In Java, Map does not actually extend the Collection interface. Think of a math expression like y = x. That would yield a line where every value on the y axis is mapped to a value on the x axis. A Map can't contain duplicate keys. In that example, the values on the y axis would be the keys.

It's going to be difficult to define collections without using the word in its own definition, but I'll give it a try.
In programming, variables refer to one thing at a time (or nothing). For example,
int x = 5;
char letter = 'g';
Even with objects, variables still point to only one object at a time (or null):
Object o = new Object();
When you need to represent multiple "things", you can use a collection, which holds multiple things in it.
Collection<Integer> numbers = new ArrayList<Integer>();
numbers.add(2);
numbers.add(3);
numbers now refers to 2 and 3.
There are different types of collections built in to the Java libraries: lists, sets and maps.
A List is a collection which holds items in the order in which they are added. You can reference an item, or element, in the list by its position which corresponds to where you inserted it.
A Set is a collection which holds unique items in no particular order. There are types of sets which are sorted (called SortedSet, imagine that!) by some criteria, like sorting numbers numerically. Some sets have no predictable order (like a HashSet).
A Map is a collection which stores key/value pairs, like "x = 5" and "y = 10". You can retrieve an element by its key which was used when inserting it. Maps are also generally unsorted, except for certain types. Keys must be unique. For example:
Map<String, Integer> mapping = new HashMap<String, Integer>();
mapping.put("first key", 20);
mapping.put("second key", 80);
mapping.put("whoa this is a key too!", 8);
System.out.println(mapping.get("whoa this is a key too!")); // prints 8
For more details, see the java docs and look for Collection:
http://docs.oracle.com/javase/6/docs/api/

A Java Collection is just a "collection" (guess what) of objects of a certain type. You can add, remove and search for items in the collection, and so on. Additionally, a collection has some properties, which vary among the different interfaces:
List
A list is an ordered collection, where each element in the collection has a number associated, 0 (included) to n (excluded), where n is the number of elements in the collection.
That number is called the "key", because you can get an element by specifying that key.
This is the "basic" type of collection, holding a row of elements together.
Map
A map is just like a list, but it's not ordered and the elements can have any object as key (for example, a String), not just an integer. This is great if the key has a meaning for you.
Set
A set is like a list, but unordered and without keys (you can't ask for "element number 3"). Additionally, it does not allow for the same element to be included twice, just like a set in Maths.

Related

Why linkedlist required when hash collision occur and Hashmap does not allow duplicate elements?

Why is a linkedlist required when hash collision occurs and HashMap does not allow duplicate elements? I was trying to understand following points in HashMap:
HashMap does not guarantee any order of elements. But with the following elements I am getting insertion order, so how is LinkedHashMap different from HashMap?
Map<String, Integer> ht2=new HashMap<String, Integer>();
ht2.put("A", 20);
ht2.put("B", 10);
ht2.put("C", 30);
ht2.put("D", 50);
ht2.put("E", 40);
ht2.put("F", 60);
ht2.put("G", 70);
for (Map.Entry<String, Integer> e : ht2.entrySet())
{
    System.out.println(e.getKey() + "<<<key HashMap value>>>" + e.getValue());
}
HashMap does not allow duplicate keys, and yes, I get the expected output. When we store an object as a key, we have to override the equals method based on its attributes, so the same object (or an object with the same information) will not be duplicated. So every bucket should have only one entry: if an entry is the same as a previous one, it overwrites it. I don't understand how multiple entries end up in the same bucket if, when a collision occurs, the previous value is overwritten. Why is a linked list required when duplicates are not allowed? Please look at the example below.
HashMap<Employee, Integer> hashMap = new HashMap<Employee, Integer>(4);
for (int i = 0; i < 100; i++) {
Employee myKey = new Employee(i,"XYZ",new Date());
hashMap.put(myKey, i);
}
System.out.println("myKey Size ::"+hashMap.size());
Here I am creating 100 Employee objects, so 100 buckets are created. When I print the hashCode values, I see different values. So where do linked lists come in, and how do multiple entries go into the same bucket?
There is a difference between the number of buckets and the number of entries in the HashMap.
Two keys of the HashMap may have the same hashCode, even if they are not equal to each other, which means both of them will be stored in the same bucket. Therefore the linked list (or some other structure that can hold multiple entries) is required.
Even two keys having different hashCode may be stored in the same bucket, since the number of buckets is much smaller than the number of possible hashCode values. For example, if the HashMap has 16 buckets, keys with hashCode 0 and 16 will be mapped to the same bucket. Therefore the bucket must be able to hold multiple entries.
The first part of your question is not clear. If you meant to ask why you see different iteration order in HashMap vs. LinkedHashMap, the reason is HashMap doesn't maintain insertion order, and LinkedHashMap does maintain insertion order. If for some input you are seeing an iteration order matching the insertion order in HashMap, that's just coincidence (depending on the buckets that the inserted keys happen to be mapped to).
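The point about different hash codes sharing a bucket can be checked with a little arithmetic. This is a sketch only: it assumes the hash-spreading and masking steps used by HashMap in OpenJDK 8 and later (other versions may compute the index differently), and the class and method names are hypothetical.

```java
// Hypothetical demo class; mirrors an implementation detail, not a public API.
class BucketIndexDemo {
    // Assumption: OpenJDK 8+ mixes the high 16 bits into the low 16 bits.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // The table length is a power of two, so masking selects the low bits.
    static int bucketIndex(int hashCode, int tableLength) {
        return spread(hashCode) & (tableLength - 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(0, 16));  // 0
        System.out.println(bucketIndex(16, 16)); // 0
        System.out.println(bucketIndex(17, 16)); // 1
    }
}
```

With 16 buckets, hash codes 0 and 16 both land in bucket 0 even though they are different, so that bucket has to hold two entries.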
When a HashMap collision occurs, as you said in your question, .equals() is involved. The linked list is used like this:
1. If a collision occurs and .equals() returns true, the old value is replaced by the new one (if the references are not already identical, of course).
2. If .equals() returns false against the existing key and only one entry is in the current bucket, the HashMap inserts the new entry into the bucket's linked list. Note that in Java's standard HashMap implementation, the entries in this linked list are entirely internal; that is, you wouldn't even be able to access the list under normal circumstances.
3. If there is more than one entry in the current bucket, it continues down the list until it finds an entry whose key makes .equals() return true, and replaces that entry's value, or it reaches the end of the list/bucket, in which case step 2 occurs.
So you technically don't have to worry about the list; just make sure that your hashCode() minimizes the number of collisions.
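The difference between "same bucket" and "same key" can be demonstrated with a key type that collides on purpose. This is a minimal sketch; BadKey and CollisionDemo are hypothetical names invented for the example, and the constant hashCode is deliberately bad.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of any library.
class CollisionDemo {
    // Every BadKey reports the same hash code, so all instances share a bucket.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately constant to force collisions
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).name.equals(name);
        }
    }

    static Map<BadKey, Integer> build() {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2); // same hash, not equal: kept next to "a"
        map.put(new BadKey("a"), 3); // same hash, equal: value 1 is replaced by 3
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build();
        System.out.println(map.size());               // 2
        System.out.println(map.get(new BadKey("a"))); // 3
        System.out.println(map.get(new BadKey("b"))); // 2
    }
}
```

Both keys live in one bucket, yet the map still holds two entries; only the put with an equal key overwrites.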

Which Java Collection should I use?

In the question "How can I efficiently select a Standard Library container in C++11?" there is a handy flow chart to use when choosing C++ collections.
I thought that this was a useful resource for people who are not sure which collection they should be using so I tried to find a similar flow chart for Java and was not able to do so.
What resources and "cheat sheets" are available to help people choose the right Collection to use when programming in Java? How do people know what List, Set and Map implementations they should use?
Since I couldn't find a similar flowchart I decided to make one myself.
This flow chart does not try and cover things like synchronized access, thread safety etc or the legacy collections, but it does cover the 3 standard Sets, 3 standard Maps and 2 standard Lists.
This image was created for this answer and is licensed under a Creative Commons Attribution 4.0 International License. The simplest attribution is by linking to either this question or this answer.
Other resources
Probably the most useful other reference is the following page from the Oracle documentation which describes each Collection.
HashSet vs TreeSet
There is a detailed discussion of when to use HashSet or TreeSet here:
Hashset vs Treeset
ArrayList vs LinkedList
Detailed discussion: When to use LinkedList over ArrayList?
Summary of the major non-concurrent, non-synchronized collections
Collection: An interface representing an unordered "bag" of items, called "elements". The "next" element is undefined (random).
Set: An interface representing a Collection with no duplicates.
HashSet: A Set backed by a hash table (internally, a HashMap). Fastest and smallest memory usage, when ordering is unimportant.
LinkedHashSet: A HashSet with the addition of a linked list to associate elements in insertion order. The "next" element is the next-most-recently inserted element.
TreeSet: A Set where elements are ordered by a Comparator (typically natural ordering). Slowest and largest memory usage, but necessary for comparator-based ordering.
EnumSet: An extremely fast and efficient Set customized for a single enum type.
List: An interface representing a Collection whose elements are ordered and each have a numeric index representing its position, where zero is the first element, and (length - 1) is the last.
ArrayList: A List backed by an array, where the array has a length (called "capacity") that is at least as large as the number of elements (the list's "size"). When size exceeds capacity (when the (capacity + 1)-th element is added), the array is recreated with a larger capacity (roughly 1.5 times the old capacity in current OpenJDK implementations); this recreation is fast, since it uses System.arraycopy(). Deleting and inserting/adding elements requires all neighboring elements (to the right) to be shifted into or out of that space. Accessing any element is fast, as it only requires the calculation (element-zero-address + desired-index * element-size) to find its location. In most situations, an ArrayList is preferred over a LinkedList.
LinkedList: A List backed by a set of objects, each linked to its "previous" and "next" neighbors. A LinkedList is also a Queue and Deque. Accessing elements is done starting at the first or last element, and traversing until the desired index is reached. Insertion and deletion, once the desired index is reached via traversal is a trivial matter of re-mapping only the immediate-neighbor links to point to the new element or bypass the now-deleted element.
Map: An interface representing a group of elements where each element has an identifying "key"--each element is a key-value pair. (Map does not extend the Collection interface.)
HashMap: A Map where keys are unordered, backed by a hash table.
LinkedHashMap: A HashMap whose keys are ordered by insertion order.
TreeMap: A Map where keys are ordered by a Comparator (typically natural ordering).
Queue: An interface that represents a Collection where elements are, typically, added to one end, and removed from the other (FIFO: first-in, first-out).
Stack: A class (a legacy one that extends Vector) that represents a Collection where elements are, typically, both added (pushed) and removed (popped) from the same end (LIFO: last-in, first-out).
Deque: Short for "double ended queue", usually pronounced "deck". An interface for a queue that is typically only added to and read from either end (not the middle); LinkedList and ArrayDeque implement it.
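The FIFO and LIFO behaviours described for Queue, Stack and Deque can be sketched with a single Deque. Using ArrayDeque here is an assumption made for the example (the summary above does not list it), and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class; not part of any library.
class QueueStackDemo {
    // FIFO: add at the tail, remove from the head.
    static String drainAsQueue(String... items) {
        Deque<String> queue = new ArrayDeque<>();
        for (String item : items) {
            queue.addLast(item);
        }
        StringBuilder out = new StringBuilder();
        while (!queue.isEmpty()) {
            out.append(queue.pollFirst());
        }
        return out.toString();
    }

    // LIFO: push and pop both work on the head.
    static String drainAsStack(String... items) {
        Deque<String> stack = new ArrayDeque<>();
        for (String item : items) {
            stack.push(item);
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(drainAsQueue("a", "b", "c")); // abc
        System.out.println(drainAsStack("a", "b", "c")); // cba
    }
}
```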
Basic collection diagrams:
Comparing the insertion of an element with an ArrayList and LinkedList:
Even simpler picture is here. Intentionally simplified!
Collection is anything holding data called "elements" (of the same type). Nothing more specific is assumed.
List is an indexed collection of data where each element has an index. Something like the array, but more flexible.
Data in the list keep the order of insertion.
Typical operation: get the n-th element.
Set is a bag of elements, each element just once (the elements are distinguished using their equals() method).
Data in the set are stored mostly just to know what data are there.
Typical operation: tell if an element is present in the set.
Map is something like the List, but instead of accessing the elements by their integer index, you access them by their key, which is any object. Like the array in PHP :)
Data in Map are searchable by their key.
Typical operation: get an element by its ID (where ID is of any type, not only int as in case of List).
The differences
Set vs. Map: in Set you search data by themselves, whilst in Map by their key.
N.B. The standard library Sets are indeed implemented exactly like this: a map where the keys are the Set elements themselves, and with a dummy value.
List vs. Map: in List you access elements by their int index (position in List), whilst in Map by their key, which is of any type (typically: ID).
List vs. Set: in List the elements are bound by their position and can be duplicate, whilst in Set the elements are just "present" (or not present) and are unique (in the meaning of equals(), or compareTo() for SortedSet)
It is simple: if you need to store values with keys mapped to them go for the Map interface, otherwise use List for values which may be duplicated and finally use the Set interface if you don’t want duplicated values in your collection.
Here is the complete explanation http://javatutorial.net/choose-the-right-java-collection , including flowchart etc
Map
If choosing a Map, I made this table summarizing the features of each of the ten implementations bundled with Java 11.

How to detect duplicate Lists in Map<String,List<String>>

I have a Map of the form Map<String,List<String>>. The key is a document number, the List a list of terms that match some criteria and were found in the document.
In order to detect duplicate documents I would like to know if any two of the List<String> have exactly the same elements (this includes duplicate values).
The List<String> is sorted so I can loop over the map and first check List.size(). For any two lists
that are same size I would then have to compare the two lists with List.equals().
The Map and associated lists will never be very large, so even though this brute force approach will not scale well it
will suffice. But I was wondering if there is a better way. A way that does not involve so much
explicit looping and a way that will not produce a combinatorial explosion if the Map and/or Lists get a lot larger.
In the end all I need is a yes/no answer to the question: are any of the lists identical?
You can add the lists to a set data structure one by one. Happily the add method will tell you if an equal list is already present in the set:
HashSet<List<String>> set = new HashSet<List<String>>();
for (List<String> list : yourMap.values()) {
if (!set.add(list)) {
System.out.println("Found a duplicate!");
break;
}
}
This algorithm will find if there is a duplicate list in O(N) time, where N is the total number of characters in the lists of strings. This is quite a bit better than comparing every pair of lists, as for n lists there are n(n-1)/2 pairs to compare.
Use Map.containsValue(). Won't be more efficient than what you describe, but code will be cleaner. Link -> http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#containsValue%28java.lang.Object%29
Also, depending on WHY exactly you're doing this, might be worth looking into this interface -> http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/BiMap.html
Not sure if it's a better way, but a cleaner way would be to create an object that implements Comparable and which holds one of your Lists. You could implement hashCode() and equals() as you describe above and change your map to contain instances of this class instead of the Lists directly.
You could then use HashSet to efficiently discover which lists are equal. Or you can add the values collection of the map to the HashSet and compare the size of the hashset to the size of the Map.
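The size-comparison idea can be written in one line. This is a minimal sketch; the class name, method name and sample document keys are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of any library.
class DuplicateListCheck {
    // A HashSet keeps one copy of each equal list, so it ends up smaller
    // than the map exactly when at least two values are equal lists.
    static boolean hasDuplicateLists(Map<String, List<String>> termsByDocument) {
        return new HashSet<>(termsByDocument.values()).size() < termsByDocument.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new HashMap<>();
        docs.put("doc-1", Arrays.asList("alpha", "beta"));
        docs.put("doc-2", Arrays.asList("alpha", "gamma"));
        System.out.println(hasDuplicateLists(docs)); // false

        docs.put("doc-3", Arrays.asList("alpha", "beta"));
        System.out.println(hasDuplicateLists(docs)); // true
    }
}
```

Unlike the loop with an early break, this always hashes every list, so it gives only the yes/no answer and does not stop at the first duplicate.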
From the JavaDoc of 'List.equals(Object o)':
Compares the specified object with this list for equality. Returns
true if and only if the specified object is also a list, both lists
have the same size, and all corresponding pairs of elements in the two
lists are equal. (Two elements e1 and e2 are equal if (e1==null ?
e2==null : e1.equals(e2)).) In other words, two lists are defined to
be equal if they contain the same elements in the same order. This
definition ensures that the equals method works properly across
different implementations of the List interface.
This leads me to believe that it is doing the same thing you are proposing: Check to make sure both sides are a List, then compare the sizes, then check each pair. I wouldn't re-invent the wheel there.
You could use hashCode() instead, but the JavaDoc there seems to indicate it's looping as well:
Returns the hash code value for this list. The hash code of a list is
defined to be the result of the following calculation:
int hashCode = 1;
Iterator<E> i = list.iterator();
while (i.hasNext()) {
E obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
So, I don't think you are saving any time. You could, however, write a custom List that calculates the hash as items are put in. Then you negate the cost of doing looping.
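One way to sketch the "calculate the hash as items are put in" idea is an append-only wrapper rather than a full List implementation. HashedTermList is a hypothetical class invented here; it applies the same formula as the List.hashCode() contract quoted above, one element at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical append-only wrapper; a sketch, not a complete List.
class HashedTermList {
    private final List<String> terms = new ArrayList<>();
    private int hash = 1; // same starting value as the List.hashCode() formula

    // Updates the running hash as each term is appended.
    void add(String term) {
        terms.add(term);
        hash = 31 * hash + (term == null ? 0 : term.hashCode());
    }

    List<String> terms() {
        return Collections.unmodifiableList(terms);
    }

    @Override
    public int hashCode() {
        return hash; // no loop needed at lookup time
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof HashedTermList)) {
            return false;
        }
        HashedTermList that = (HashedTermList) other;
        // Cheap hash comparison first, full element comparison only if needed.
        return hash == that.hash && terms.equals(that.terms);
    }

    public static void main(String[] args) {
        HashedTermList list = new HashedTermList();
        list.add("alpha");
        list.add("beta");
        // Prints true: the running hash matches the one List computes by looping.
        System.out.println(list.hashCode() == Arrays.asList("alpha", "beta").hashCode());
    }
}
```

Because the wrapper only supports appending, the cached hash can never go stale; supporting removal or replacement would need a recomputation.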

How list differ from map?

In Java, List and Map are both used as collections. But I can't understand in which situations we should use a List and when we should use a Map. What is the major difference between them?
Now would be a good time to read the Java collections tutorial - but fundamentally, a list is an ordered sequence of elements which you can access by index, and a map is a usually unordered mapping from keys to values. (Some maps preserve insertion order, but that's implementation-specific.)
It's usually fairly obvious when you want a key/value mapping and when you just want a collection of elements. It becomes less clear if the key is part of the value, but you want to be able to get at an item by that key efficiently. That's still a good use case for a map, even though in some senses you don't have a separate collection of keys.
There's also Set, which is a (usually unordered) collection of distinct elements.
Map is for key:value pair kind of data, for instance if you want to map student roll numbers to their names.
List is for a simple ordered collection of elements which allows duplicates,
for instance to represent a list of student names.
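The student example might look like this in code. It is only a sketch: the roll numbers, names and the StudentRecords class are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class with made-up sample data.
class StudentRecords {
    // Map: each roll number (key) maps to exactly one name (value).
    static Map<Integer, String> namesByRollNumber() {
        Map<Integer, String> names = new HashMap<>();
        names.put(101, "Asha");
        names.put(102, "Ben");
        names.put(101, "Asha K."); // same key again: replaces "Asha"
        return names;
    }

    // List: just the names, in order, duplicates allowed.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        names.add("Asha");
        names.add("Ben");
        names.add("Asha"); // a second student with the same name is fine
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesByRollNumber().get(101)); // Asha K.
        System.out.println(namesByRollNumber().size());   // 2
        System.out.println(names());                      // [Asha, Ben, Asha]
    }
}
```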
Map Interface
A Map cares about unique identifiers. You map a unique key (the ID) to a specific
value, where both the key and the value are, of course, objects.
The Map implementations let you do things like search for a
value based on the key, ask for a collection of just the values, or ask for a collection
of just the keys. Like Sets, Maps rely on the equals() method to determine whether
two keys are the same or different.
List Interface
A List cares about the index. The one thing that List has that non-lists don't have
is a set of methods related to the index. Those key methods include things like
get(int index), indexOf(Object o), add(int index, Object obj), and so
on. All three List implementations are ordered by index position—a position that
you determine either by setting an object at a specific index or by adding it without
specifying position, in which case the object is added to the end.
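The index-related methods named above can be tried out in a few lines. A minimal sketch; ListIndexDemo and its sample letters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of any library.
class ListIndexDemo {
    static List<String> build() {
        List<String> letters = new ArrayList<>();
        letters.add("a");    // no position given: appended at index 0
        letters.add("c");    // appended at index 1
        letters.add(1, "b"); // inserted at index 1; "c" shifts to index 2
        return letters;
    }

    public static void main(String[] args) {
        List<String> letters = build();
        System.out.println(letters);              // [a, b, c]
        System.out.println(letters.get(2));       // c
        System.out.println(letters.indexOf("b")); // 1
        System.out.println(letters.indexOf("z")); // -1 (not present)
    }
}
```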
A list (here, a linked list) is a structure where every object is connected to the next one via pointers. The time it takes to insert a new object into the list is O(1), but the rest of the operations on it take longer.
The good thing about it is that it allocates memory only for the elements it holds, with no unused spare capacity (though each node carries some pointer overhead).
Maps are a data structure built on an array, where the position of each entry is calculated with a hashFunction(key) that computes the location from the key. Almost every operation on a Map takes O(1) (except when keys collide in the same slot), but the space complexity is fairly large.
For more reading, try Wikipedia's articles on hash maps and linked lists.
HashList is a data structure storing objects in a hash table and a list. It is a combination of a hash map and a doubly linked list; access will be faster. HashMap is the hash table implementation of the Map interface; it is the same as Hashtable except that it is unsynchronized and allows null keys and values. List is an ordered collection; it allows nulls and duplicates, and positional access is possible. Set is a collection that doesn't allow duplicates; it may contain at most one null element, the same as a mathematical set.
List is just an ordered collection (a sequence). Check the List documentation. You can access elements by their integer index (position in the list), and search for elements in the list.
Also lists allow duplicate elements and multiple NULL elements.
Map is an object that maps the values to the keys. Check this map documentation. A map cannot contain duplicate keys; each key can map to at most one value.
List - This datastructure is used to contain list of elements.
In case you need list of elements and the list may contain duplicate values,
then you have to use List.
Map - It contains data as key-value pairs. When you have to store data
as key-value pairs, so that later you can retrieve data using the key,
you have to use the Map data structure.
List implementation - ArrayList, LinkedList
Map implementation - HashMap, TreeMap
In comparison HashMap to ArrayList -
A hash map is the fastest data structure if you want to get all nodes for a page. The list of nodes can be fetched in constant time (O(1)) while with lists the time is O(n) (n=number of pages, faster on sorted lists but never getting near O(1))

When to use HashMap over LinkedList or ArrayList and vice-versa

What is the reason why we cannot always use a HashMap, even though it is much more efficient than ArrayList or LinkedList in add,remove operations, also irrespective of the number of the elements.
I googled it and found some reasons, but there was always a workaround for using HashMap, with advantages still alive.
Lists represent a sequential ordering of elements.
Maps are used to represent a collection of key / value pairs.
While you could use a map as a list, there are some definite downsides of doing so.
Maintaining order:
A list by definition is ordered. You add items and then you are able to iterate back through the list in the order that you inserted the items. When you add items to a HashMap, you are not guaranteed to retrieve the items in the same order you put them in. There are subclasses of HashMap like LinkedHashMap that will maintain the order, but in general order is not guaranteed with a Map.
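The ordering difference between map implementations can be observed directly. This is a minimal sketch; MapOrderDemo and the sample keys are hypothetical, and TreeMap is included only for comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of any library.
class MapOrderDemo {
    // Inserts the same three keys and reports the order the map iterates them in.
    static List<String> keysAfterInserting(Map<String, Integer> map) {
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Insertion order is preserved.
        System.out.println(keysAfterInserting(new LinkedHashMap<String, Integer>())); // [zebra, apple, mango]
        // Keys are sorted.
        System.out.println(keysAfterInserting(new TreeMap<String, Integer>()));       // [apple, mango, zebra]
        // No guarantee: the order depends on hash codes and table size.
        System.out.println(keysAfterInserting(new HashMap<String, Integer>()));
    }
}
```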
Key/Value semantics:
The purpose of a map is to store items based on a key that can be used to retrieve the item at a later point. Similar functionality can only be achieved with a list in the limited case where the key happens to be the position in the list.
Code readability
Consider the following examples.
// Adding to a List
list.add(myObject); // adds to the end of the list
map.put(myKey, myObject); // sure, you can do this, but what is myKey?
map.put("1", myObject); // you could use the position as a key but why?
// Iterating through the items
for (Object o : myList) // nice and easy
for (Object o : myMap.values()) // more code and the order is not guaranteed
Collection functionality
Some great utility functions are available for lists via the Collections class. For example ...
// Randomize the list
Collections.shuffle(myList);
// Sort the list
Collections.sort(myList, myComparator);
Lists and Maps are different data structures. Maps are used for when you want to associate a key with a value and Lists are an ordered collection.
Map is an interface in the Java Collection Framework and a HashMap is one implementation of the Map interface. HashMap are efficient for locating a value based on a key and inserting and deleting values based on a key. The entries of a HashMap are not ordered.
ArrayList and LinkedList are implementations of the List interface. LinkedList provides sequential access and is generally more efficient at inserting and deleting elements in the list; however, it is less efficient at accessing elements in a list. ArrayList provides random access and is more efficient at accessing elements but is generally slower at inserting and deleting elements.
I will put here some real case examples and scenarios when to use one or another, it might be of help for somebody else:
HashMap
When you have to use cache in your application. Redis and membase are some type of extended HashMap. (Doesn't matter the order of the elements, you need quick ( O(1) ) read access (a value), using a key).
LinkedList
When the order is important (they are ordered as they were added to the LinkedList), the number of elements is unknown (don't waste memory allocation) and you require quick insertion time ( O(1) ). A list of to-do items that can be listed sequentially as they are added is a good example.
The downfall of ArrayList and LinkedList is that when iterating through them, depending on the search algorithm, the time it takes to find an item grows with the size of the list.
The beauty of hashing is that although you sacrifice some extra time searching for the element, the time taken does not grow with the size of the map. This is because the HashMap finds information by converting the element you are searching for, directly into the index, so it can make the jump.
Long story short...
LinkedList: Consumes a little more memory than ArrayList, low cost for insertions(add & remove)
ArrayList: Consumes little memory but, like LinkedList, takes extra time to search when large.
HashMap: Can perform a jump to the value, making the search time constant for large maps. Consumes more memory and takes longer to find the value than small lists.
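The search-cost difference can be made visible by counting the work a linear scan does. This is a rough sketch with made-up data; LookupDemo and comparisonsToFind are hypothetical names, and the map lookup is shown for contrast rather than measured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of any library.
class LookupDemo {
    // Linear scan: inspects elements one by one, so the work grows with the list.
    // Returns how many elements were inspected, or -1 if the target is absent.
    static int comparisonsToFind(List<String> items, String target) {
        int comparisons = 0;
        for (String item : items) {
            comparisons++;
            if (item.equals(target)) {
                return comparisons;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        Map<String, Integer> positionByItem = new HashMap<>();
        for (int i = 0; i < 10000; i++) {
            items.add("item-" + i);
            positionByItem.put("item-" + i, i);
        }
        // The list has to walk all 10000 elements to reach the last one.
        System.out.println(comparisonsToFind(items, "item-9999")); // 10000
        // The map hashes the key and jumps to its bucket instead of scanning.
        System.out.println(positionByItem.get("item-9999"));       // 9999
    }
}
```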
