I want to know if Java has any collection that can help me with a minheap and maxheap implementation.
I know I can use the PriorityQueue data structure to implement a maxheap.
Can we use the same for a minheap? If yes, how?
Thanks,
Manan
I think you have it backwards: a heap is a way of implementing a priority queue. As for the min / max part, simply write the appropriate Comparator classes.
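As a minimal sketch of the Comparator approach: java.util.PriorityQueue with natural ordering keeps the smallest element at the head, and passing a reversed comparator keeps the largest there instead. The class and method names below (HeapOrderDemo, minHeap, maxHeap) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class, for illustration only.
class HeapOrderDemo {
    // Natural ordering: the smallest element sits at the head (min-heap behaviour).
    static PriorityQueue<Integer> minHeap(List<Integer> values) {
        return new PriorityQueue<>(values);
    }

    // Reversed comparator: the largest element sits at the head (max-heap behaviour).
    static PriorityQueue<Integer> maxHeap(List<Integer> values) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(Comparator.reverseOrder());
        pq.addAll(values);
        return pq;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 1, 9, 3);
        System.out.println(minHeap(values).peek()); // head of the min-heap
        System.out.println(maxHeap(values).peek()); // head of the max-heap
    }
}
```

For element types without a natural ordering, a custom Comparator can be passed to the same constructor.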
You have your levels of abstraction a bit backwards on this.
A heap is a tree-like data structure with a very specific relation between each node and its children. (Note that it doesn't actually have to be stored as a tree; it only needs a method of relating points of data to their "children". See below for more.)
A priority queue, in contrast, is a more abstract idea. It is a queue (a list-like data structure with a FIFO data access model), but rather than being strictly FIFO it returns the object of greatest priority first.
Just as there are different ways of implementing the underlying structure of a heap, a priority queue can be built in many different ways. The benefit of using a heap as the underlying structure of the priority queue is that you don't have to do any more work: just heap the values based on their priority, and when someone requests a value, return the head.
The major difference is that a heap is defined by its tree-like property and the heap constraint, whereas a priority queue is defined only by how it interacts with others.
// A note on creating heaps
// Given that A is an array of n values indexed from 1 to n,
// we can model a tree-like structure by stating that for any
// index i in [1, n] its children are at 2 * i and 2 * i + 1
// (whenever those indices are at most n).
// So with an array you can easily model a heap in this manner.
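The index arithmetic from the note can be sketched as a small array-backed min-heap. This is only an illustration under stated assumptions: ArrayMinHeap is a hypothetical class, the capacity is fixed, and there is no bounds or emptiness checking.

```java
// Hypothetical class, for illustration only: fixed capacity, no bounds checks.
class ArrayMinHeap {
    private final int[] a; // slot 0 is unused so that the root lives at index 1
    private int size;

    ArrayMinHeap(int capacity) { a = new int[capacity + 1]; }

    void add(int value) {
        a[++size] = value;
        // Sift up: swap with the parent (i / 2) while the parent is larger.
        for (int i = size; i > 1 && a[i / 2] > a[i]; i /= 2) swap(i, i / 2);
    }

    // Assumes the heap is not empty.
    int removeMin() {
        int min = a[1];
        a[1] = a[size--];
        // Sift down: swap with the smaller child (2 * i or 2 * i + 1).
        int i = 1;
        while (2 * i <= size) {
            int child = 2 * i;
            if (child < size && a[child + 1] < a[child]) child++;
            if (a[i] <= a[child]) break;
            swap(i, child);
            i = child;
        }
        return min;
    }

    boolean isEmpty() { return size == 0; }

    private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

Flipping the two comparisons in add and removeMin would turn this into a max-heap.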
Recently, in an interview I was asked: what exactly is a bucket in a HashMap? Is it an array, an ArrayList, or something else?
I got confused. I know HashMaps are backed by arrays. So can I say that a bucket is an array with an initial capacity of 16 that stores hashcodes, and to which linked lists have their start pointer?
I know how a hashmap internally works, just wanted to know what exactly is a bucket in terms of data structures.
No, a bucket is each element in the array you are referring to. In earlier Java versions, each bucket contained a linked list of Map entries. In new Java versions, each bucket contains either a tree structure of entries or a linked list of entries.
From the implementation notes in Java 8:
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
...
I hope this may help you to understand the implementation of hash map well.
The buckets together form an array of Nodes, so a single bucket is an instance of the class java.util.HashMap.Node. Each Node is the head of a structure similar to a LinkedList, or (since Java 8) of a tree similar to a TreeMap. HashMap decides for itself which is better for performance: it keeps a bucket as a linked list until many entries land in that single bucket, which typically only happens with a poorly designed hashCode() function, and then converts it to a tree.
See what the buckets look like in HashMap:
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
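To make the role of the table concrete, here is a rough sketch of how a key could be mapped to a slot in such an array. It is modelled on the bit-spreading and masking used in the Java 8 HashMap source, but BucketIndexDemo, spread and bucketIndex are names invented for this example, not part of the JDK API.

```java
// Hypothetical helper class, for illustration only.
class BucketIndexDemo {
    // Mix the higher bits of hashCode() into the lower ones.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // With a power-of-two table length, masking acts as a non-negative modulo.
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // "a" (hash 97) and Integer 17 share a bucket in a 16-slot table.
        System.out.println(bucketIndex("a", 16));
        System.out.println(bucketIndex(17, 16));
    }
}
```

Two keys that land on the same index end up in the same bucket, which is where the linked list (or tree) of entries comes in.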
A HashMap bucket is a slot that can hold multiple nodes. The HashMap stores each entry in the bucket selected by an index calculation, and the nodes within a bucket are connected as a linked list.
To put it in layman's terms, buckets are a data structure like the one used in the paging algorithm of an operating system.
The objects whose hashcodes map to a particular bucket are stored in that bucket. (Basically, you can think of the header of the linked list as the hashcode value that the bucket represents.)
The references to the objects are stored in the linked list whose header represents that hashcode value.
The HashMap creates the buckets itself; their number depends on the map's capacity (16 by default, growing as the map fills up), not on the memory allocated by the JVM.
I am using Java's LinkedList in my project. I have to build a delete function that removes an element with a specified unique id (id is a field in my class) from the LinkedList. As per the official Java documentation, were I to use LinkedList.remove, the runtime would be O(n), as the process happens in two steps: a linear search with a runtime of O(n), followed by the actual delete, which takes O(1).
In an attempt to speed things up, I wanted to use a binary tree for lookup, where each node in the tree is (id, reference to the node in the linkedlist). I am not exactly sure how to implement this in Java. In C/C++, one could just store a pointer as reference to the node in the linkedlist.
==
If you are wondering why I have to use LinkedList, it's because I am building an order-matching engine for exchanges. LinkedList offers superior runtime as far as insert is concerned. I am also using insertion sort to keep prices in the orderbook sorted. Priority queue does not suit my needs because I have to show the sorted order book in real time.
Have you seen the video of Stroustrup's conference talk where he showed that you should use std::vector unless you have measured a performance benefit of not using std::vector? He showed that std::vector is almost always the correct collection to use, and showed that it is faster than linked list even when inserting and deleting in the middle.
Now translate that to Java: use ArrayList unless you have measured better performance with something else.
Why is that? With modern processor architectures, there is a locality benefit: elements that you compare together, elements that you process together, are all stored next to each other in memory and are likely to be in the CPU's cache at the same time. This allows them to be fetched and written to much faster than when they're in main memory. This is not the case with a linked list, where elements are allocated individually and spread all over the place. (This locality benefit is much more pronounced in C++ where you have the actual objects next to each other, but it's still valid to a smaller extent in Java, where you have the references next to each other, albeit not the actual objects.)
Now with ArrayList, you can keep the orders sorted by price, and use binary search to insert an order in the right place.
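A minimal sketch of that idea, assuming prices are plain long values and ascending order is wanted. SortedPriceList is a hypothetical class; the only library call doing the work is Collections.binarySearch, which returns -(insertionPoint) - 1 when the key is absent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, for illustration only.
class SortedPriceList {
    private final List<Long> prices = new ArrayList<>();

    // Insert while keeping the list sorted in ascending order.
    void insert(long price) {
        Long key = price;
        int pos = Collections.binarySearch(prices, key);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        if (pos < 0) pos = -pos - 1;
        prices.add(pos, key);
    }

    // Read-only view of the sorted prices.
    List<Long> view() { return Collections.unmodifiableList(prices); }
}
```

Finding the position is O(log n); the add itself still shifts the elements to the right, which is the part worth measuring against a linked list.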
If your performance measurement shows that LinkedList is preferable, then unfortunately Java doesn't give you access to the internal representation – the actual nodes – of the LinkedList, so you'll have to homebrew your own list.
Why are you using a List?
If you have a unique id for each object, why not put it in a Map with the id as the key? If you choose HashMap as the implementation, removal is O(1) on average. If you use LinkedHashMap, you preserve insertion order as well.
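A small sketch of the Map-based approach, assuming integer ids and using a String as a stand-in for the real order object. OrderBookById is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, for illustration only.
class OrderBookById {
    // Keys are order ids; iteration follows insertion order.
    private final Map<Integer, String> orders = new LinkedHashMap<>();

    void add(int id, String order) { orders.put(id, order); }

    // Removal by key needs no linear search; returns null when the id is unknown.
    String remove(int id) { return orders.remove(id); }

    List<String> inInsertionOrder() { return new ArrayList<>(orders.values()); }
}
```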
LinkedList insertion is superior to....what?
HashMap get/put complexity
You can easily solve this by having a small change.
First have an object that has your value and id as fields
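class MyElement implements Comparable<MyElement> {
    int id, value;
    // Sort by value; break ties by id so that distinct elements never compare as equal
    @Override public int compareTo(MyElement o) {
        return value != o.value ? Integer.compare(value, o.value) : Integer.compare(id, o.id);
    }
    // Two elements are equal when their ids match
    @Override public boolean equals(Object o) { return o instanceof MyElement && ((MyElement) o).id == id; }
    // hashCode() must agree with equals(), so it returns the id
    @Override public int hashCode() { return id; }
}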
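Now use a TreeSet to store these objects.
In this data structure the incoming objects are kept sorted, and both insertion and deletion run in O(log n) time. Note that TreeSet locates elements with compareTo() rather than equals(), so compareTo() should return 0 only for elements that really are the same.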
To preserve order by id and to get good performance use TreeMap. Put, remove and get operations will be O(log n).
EDIT:
For preserving order of insertion of elements for each id you can use TreeMap < Integer, ArrayList < T > >, i.e. for each id you can save elements with particular id in list in order of insertion.
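One way this could look, as a sketch only: GroupedById is a hypothetical wrapper, and the value type is a List rather than a concrete ArrayList so that the lambda stays short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical class, for illustration only.
class GroupedById<T> {
    private final TreeMap<Integer, List<T>> byId = new TreeMap<>();

    // Append to the list for this id, creating the list on first use.
    void add(int id, T element) {
        byId.computeIfAbsent(id, k -> new ArrayList<>()).add(element);
    }

    // Remove everything stored under the id; returns null when the id is unknown.
    List<T> removeAll(int id) { return byId.remove(id); }

    // Ids come back in ascending order, elements within an id in insertion order.
    List<T> flatten() {
        List<T> out = new ArrayList<>();
        for (List<T> group : byId.values()) out.addAll(group);
        return out;
    }
}
```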
This is straight from the Java Docs:
This class and its iterator implement all of the optional methods of the Collection and Iterator interfaces. The Iterator provided in method iterator() is not guaranteed to traverse the elements of the priority queue in any particular order. If you need ordered traversal, consider using Arrays.sort(pq.toArray()).
So basically, my PriorityQueue works fine, but printing it out to the screen using its own built in toString() method caused me to see this anomaly in action, and was wondering if someone could explain why it is that the iterator provided (and used internally) does not traverse the PriorityQueue in its natural order?
Because the underlying data structure doesn't support it. A binary heap is only partially ordered, with the smallest element at the root. When you remove that, the heap is reordered so that the next smallest element is at the root. There is no efficient ordered traversal algorithm so none is provided in Java.
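If an ordered view is needed anyway, one option besides the Arrays.sort(pq.toArray()) hint from the Javadoc is to drain a copy of the queue with poll(). The sketch below assumes that copying the queue is acceptable; OrderedDrain and sortedCopy are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class, for illustration only.
class OrderedDrain {
    // Repeated poll() yields elements in priority order; the copy keeps the caller's queue intact.
    static <E> List<E> sortedCopy(PriorityQueue<E> pq) {
        PriorityQueue<E> copy = new PriorityQueue<>(pq);
        List<E> out = new ArrayList<>(copy.size());
        while (!copy.isEmpty()) out.add(copy.poll());
        return out;
    }
}
```

Each poll() costs O(log n), so the whole drain is O(n log n), the same order as sorting the array.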
PriorityQueues are implemented using a binary heap.
A heap is not a sorted structure; it is only partially ordered. Each element has a "priority" associated with it. When a heap is used to implement a priority queue, the element of highest priority is always in the root node of the heap, so an element with high priority is served before an element with low priority. If two elements have the same priority, Java's PriorityQueue breaks the tie arbitrarily. The heap is updated after each removal of an element to maintain the heap property.
At first guess, it's probably traversing the data in the order in which it's stored. To minimize the time to insert an item in the queue, it doesn't normally store all the items in sorted order.
Well, as the Javadoc says, that's how it's been implemented. The priority queue probably uses a binary heap as the underlying data structure. When you remove items, the heap is reordered to preserve the heap property.
Secondly, it's unwise to tie in a specific implementation (forcing a sorted order). With the current implementation, you are free to traverse it in any order and use any implementation.
Binary heaps are an efficient way of implementing priority queues. The only guarantee about order that a heap makes is that the item at the top has the highest priority (maybe it is the "biggest" or "smallest" according to some order).
A heap is a binary tree that has the properties:
Shape property: the tree fills up from top to bottom, left to right.
Order property: the element at any node is at least as big as (or at least as small as, if smallest has highest priority) its two child nodes.
When the iterator visits all the elements it probably does so in a level-order traversal, i.e. it visits each node in each level in turn before going on to the next level. Since the only guarantee made about order is that a node has a higher priority than its children, the nodes in each level will be in no particular order.
My question is about what are the fundamental/concrete data structure (like array) used in implementing abstract data structure implementations like variations maps/trees?
I'm looking for what's used really in java collection, not theoretical answers.
Based on quick code review of Sun/Oracle JDK. You can easily find the details yourself.
Lists/queues
ArrayList
Growing Object[] elementData field. Can hold 10 elements by default, grows by around 50% when cannot hold more objects, copying the old array to a bigger new one. Does not shrink when removing items.
LinkedList
Reference to Entry which in turns hold reference to actual element, previous and next element (if any).
ArrayDeque
Similar to ArrayList, but also holding two pointers into the internal E[] elements array: head and tail. Both adding and removing elements on either side is just a matter of moving these pointers. The array doubles in size when it is too small.
Maps
HashMap
Growing Entry[] table field holding so-called buckets. Each bucket contains a linked list of the entries whose key hash, modulo the table size, is the same.
TreeMap
Entry<K,V> root reference holding the root of the red-black balanced tree.
ConcurrentHashMap
Similar to HashMap, but the table is split into groups of buckets (called segments), and access to each segment is synchronized by an independent lock.
Sets
TreeSet
Uses TreeMap underneath (!)
HashSet
Uses HashMap underneath (!)
BitSet
Uses long[] words field to be as memory efficient as possible. Up to 64 bits can be stored in one element.
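The 64-bits-per-word layout can be observed from the outside with a short sketch. BitSetDemo and seen are hypothetical names; BitSet.set, get, cardinality and toLongArray are the standard java.util.BitSet methods.

```java
import java.util.BitSet;

// Hypothetical helper class, for illustration only.
class BitSetDemo {
    // Mark which of the given non-negative ids have been seen.
    static BitSet seen(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) bits.set(id);
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = seen(1, 64, 130);
        // Bits 1, 64 and 130 fall into three different 64-bit words.
        System.out.println(bits.toLongArray().length);
    }
}
```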
There is of course one answer for each implementation. Look at the javadocs, they often describe these things. http://docs.oracle.com/javase/7/docs/api/
What is the need for the Collections framework in Java, since all the data operations (sorting/adding/deleting) are possible with arrays, and moreover arrays use less memory and perform better than Collections?
Can anyone point me to a real-time, data-oriented example which shows the difference between these two implementations (arrays/Collections)?
Arrays are not resizable.
Java Collections Framework provides lots of different useful data types, such as linked lists (allow insertion anywhere in constant time once you hold an iterator at that position), resizeable array lists (like Vector but cooler), red-black trees, hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
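As a small illustration of that abstraction, the method below is written against List only, so the caller is free to choose the backing structure. InterfaceDemo and sum are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class InterfaceDemo {
    // Works with any List implementation; the caller picks the backing structure.
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) total += n;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new ArrayList<>(Arrays.asList(1, 2, 3))));  // array-backed
        System.out.println(sum(new LinkedList<>(Arrays.asList(1, 2, 3)))); // node-backed
    }
}
```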
Several reasons:
Java's collection classes provide a higher-level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing complicated data structures (e.g., hash tables) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementations you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized, etc.
Finally, arrays allow covariance: setting an element of an array is not guaranteed to succeed, due to typing errors that are detectable only at run time. Generic collections prevent this problem, because the mismatch is rejected at compile time.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1) on average.
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
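The Set comparison can be sketched side by side. Both methods answer the same question; only the amount of work differs. MembershipDemo and its method names are invented for this example.

```java
import java.util.Set;

// Hypothetical helper class, for illustration only.
class MembershipDemo {
    // Array-backed lookup: every element may have to be inspected, O(N).
    static boolean containsLinear(String[] items, String target) {
        for (String item : items) {
            if (item.equals(target)) return true;
        }
        return false;
    }

    // Hash-backed lookup: expected O(1), assuming a well-distributed hashCode().
    static boolean containsHashed(Set<String> items, String target) {
        return items.contains(target);
    }
}
```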
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it will be based off an array - whereas the Hash* collections should just work out of the box. As always, pick the right tool for the job.
Well the basic premise is "wrong" since Java included the Dictionary class since before interfaces existed in the language...
collections offer Lists which are somewhat similar to arrays, but they offer many more things that are not. I'll assume you were just talking about List (and even Set) and leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array, however there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developer's learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
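To show the kind of bookkeeping a hand-rolled growing array involves, here is a rough sketch. GrowableIntArray is a hypothetical class, the starting capacity of 4 and the 50% growth factor are arbitrary choices, and java.util.ArrayList already does all of this for object elements.

```java
import java.util.Arrays;

// Hypothetical class, for illustration only.
class GrowableIntArray {
    private int[] data = new int[4];
    private int size; // slots actually in use; data.length is only the capacity

    void add(int value) {
        // Grow by roughly 50% when full, copying the old contents across.
        if (size == data.length) data = Arrays.copyOf(data, data.length + (data.length >> 1));
        data[size++] = value;
    }

    int get(int index) {
        if (index >= size) throw new IndexOutOfBoundsException("index " + index);
        return data[index];
    }

    int size() { return size; }
}
```

Note that the element count has to be tracked separately from the array's .length, which is exactly the mismatch described above.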
Arrays are not always efficient. What if you need something like a LinkedList? It looks like you need to learn about some data structures: http://en.wikipedia.org/wiki/List_of_data_structures
The Java Collections Framework offers functionality, usability and convenience that arrays lack.
When an application has to work on a group of objects, arrays alone cannot help us, or rather they lead to cumbersome operations.
One important difference is usability and convenience, especially given that Collections automatically expand in size when needed.
Collections come with methods that simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Used- Most cases where you just need to store a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
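The "which items have I already processed" use of a Set might look like the sketch below. VisitedDemo and firstVisits are hypothetical names; the one library detail it relies on is that Set.add returns false when the element was already present.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class VisitedDemo {
    // Return the URLs in first-seen order, skipping any that were already processed.
    static List<String> firstVisits(List<String> urls) {
        Set<String> visited = new HashSet<>();
        List<String> fresh = new ArrayList<>();
        for (String url : urls) {
            // add() returns false when the element was already in the set.
            if (visited.add(url)) fresh.add(url);
        }
        return fresh;
    }
}
```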
Collections is a framework in Java, and a ready-made framework is much easier to use than implementing the data structures yourself and then using them. As for why we don't simply use arrays: arrays have drawbacks. An array is static, so you have to define its size up front, and if you allocate a large array that stays mostly empty, a lot of memory is wasted.
So you can prefer ArrayList instead, which is part of the collection hierarchy.
Complexity is another issue. If you want to insert into an array, you have to walk to the target index and shift elements yourself; with LinkedList all of these operations are already implemented and you only need to call them, which makes your code less complex. The collection hierarchy has various other advantages you can read about.
The Collections framework is much higher level than arrays. It provides important interfaces and classes with which we can manage groups of objects in a more sophisticated way, using the many methods each collection already provides.
For example:
ArrayList - It's like a dynamic array, i.e. we don't need to declare its size; it grows as we add elements to it at run time. (Its size() decreases as we remove elements, although the backing array is not shrunk automatically.)
LinkedList - It can be used to depict a Queue(FIFO) or even a Stack(LIFO).
HashSet - It stores its element by a process called hashing. The order of elements in HashSet is not guaranteed.
TreeSet - TreeSet is a good candidate when one needs to store a large number of elements in sorted order and access them quickly.
ArrayDeque - It can also be used to implement a first-in, first-out(FIFO) queue or a last-in, first-out(LIFO) queue.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
TreeMap - TreeMap stores key-value pairs sorted in ascending key order, and retrieval of an element from a TreeMap takes O(log n) time.
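As a short sketch of the FIFO and LIFO uses mentioned in the list, the same ArrayDeque class can serve as either. DequeDemo and its two methods are hypothetical names; addLast, pollFirst, push and pop are standard java.util.Deque methods.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper class, for illustration only.
class DequeDemo {
    // FIFO: add at the tail, remove from the head.
    static String fifo(String... items) {
        Deque<String> queue = new ArrayDeque<>();
        for (String s : items) queue.addLast(s);
        StringBuilder sb = new StringBuilder();
        while (!queue.isEmpty()) sb.append(queue.pollFirst());
        return sb.toString();
    }

    // LIFO: push and pop both work on the head.
    static String lifo(String... items) {
        Deque<String> stack = new ArrayDeque<>();
        for (String s : items) stack.push(s);
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) sb.append(stack.pop());
        return sb.toString();
    }
}
```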