Direction for Implementing EntityCollection task [closed] - java

I have the following task:
Guidelines: You may implement your solution in Java/C#.
You are asked to implement the EntityCollection interface which is specified in the attached Java file.
Your implementation should support the following operations:
a. Add - adds the entity which is given as input to the collection.
b. Remove Max Value - removes the entity with the maximal value from the collection and returns it.
You should provide 3 implementations for the following use-cases (A-C), according to the frequencies of performing Add & Remove Max Value in these use-cases:
A. Add: high frequency, Remove Max Value: low frequency
B. Add: medium frequency, Remove Max Value: medium frequency
C. Add: low frequency, Remove Max Value: high frequency
Each use-case implementation should be optimized in terms of its worst-case (WC) time complexity:
If one operation is more frequent than the other (e.g. high vs. low), then the frequent operation should have the lowest possible complexity, whereas the other operation may have higher complexity, though still optimized as much as possible.
If both operations are equally frequent (e.g. medium vs. medium), then both should have similar complexity, as low as possible in each operation while also taking into account the need for the same complexity in the other operation.
The given Java code:

public interface Entity {
    public int getValue(); // unique
}

public interface EntityCollection {
    public void add(Entity entity);
    public Entity removeMaxValue();
}
Notes: You may use any existing collections/data structures in your solution.
My question: Do you think this assignment is clear enough? I feel a bit foggy about how to approach it.
I think they asked me to write some collection, but I can't see what the use cases/operations mean.
Any direction, hints, or code examples would be appreciated.

The importance of this assignment is in your understanding of data structures and algorithms.
Doing a lot of adding and not a lot of removing the max value? Use a linked list. A linked list is O(1) for adding a new value, so use that, and use an easy-to-implement linear scan for your second operation, since it isn't used much.
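Here is a minimal sketch of that idea against the interfaces from the question (the class name AddHeavyCollection and the null-on-empty behaviour are my own assumptions, since the task doesn't specify them):

import java.util.LinkedList;

public class AddHeavyCollection implements EntityCollection {
    private final LinkedList<Entity> entities = new LinkedList<>();

    public void add(Entity entity) {
        entities.addFirst(entity); // O(1): insert at the head
    }

    public Entity removeMaxValue() {
        if (entities.isEmpty()) {
            return null; // behaviour on an empty collection is unspecified in the task
        }
        Entity max = entities.getFirst();
        for (Entity e : entities) { // linear scan, O(n): fine for a rare operation
            if (e.getValue() > max.getValue()) {
                max = e;
            }
        }
        entities.remove(max); // second O(n) pass; overall still O(n)
        return max;
    }
}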
For the second use case, you need to balance the speed of both operations, so choose a data structure with decent speed for both. Maybe a balanced binary search tree, which gives O(log n) for both operations in the worst case.
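A hedged sketch of that balanced option, using Java's TreeSet (a red-black tree); the class name is invented, and it relies on getValue() being unique, as the interface states:

import java.util.Comparator;
import java.util.TreeSet;

public class BalancedCollection implements EntityCollection {
    // Red-black tree ordered by value; values are unique per the interface.
    private final TreeSet<Entity> entities =
            new TreeSet<>(Comparator.comparingInt(Entity::getValue));

    public void add(Entity entity) {
        entities.add(entity); // O(log n)
    }

    public Entity removeMaxValue() {
        return entities.pollLast(); // O(log n); null if the set is empty
    }
}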
And so on for the final case.
Here is a nice link outlining data structures and their speeds: Cheat Sheet.
You could choose a hash table for some of these, but note that despite a hash table's speed, it consumes a large amount of memory to achieve it. However, that is only a concern if memory is a problem or you are working with large data sets.

IMO you need to look into data structures that efficiently support the add or remove operation required by each use case, and then internally use the corresponding built-in data structure in the language of your choice; if you want more flexibility, implement that data structure yourself.
For example, in use case A the add frequency is high and the remove-max-value frequency is low, so you may use a data structure that supports addition in O(1) (constant) time. For remove, you may use something with O(n) (linear) time complexity; anything below O(n^2) is good enough for a low-frequency operation.
So for use case A you can use a linked list, as addition is O(1); for remove-max you can scan for the maximum and remove it, which is O(n). (Sorting first and then removing the max would also work, but costs O(n log n).)
For use case C you can go with a priority queue: Java's PriorityQueue is internally a binary min-heap, so with a reversed comparator it acts as a max-heap, giving O(log n) for both add and remove-max (peeking at the max is O(1)).
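A minimal sketch under those assumptions (the class name is invented; the reversed comparator turns Java's min-heap into a max-heap):

import java.util.Comparator;
import java.util.PriorityQueue;

public class RemoveHeavyCollection implements EntityCollection {
    // Reversed comparator makes the heap's head the maximum value.
    private final PriorityQueue<Entity> heap = new PriorityQueue<>(
            Comparator.comparingInt(Entity::getValue).reversed());

    public void add(Entity entity) {
        heap.offer(entity); // O(log n) sift-up
    }

    public Entity removeMaxValue() {
        return heap.poll(); // O(log n) sift-down; null if empty
    }
}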

Related

Arrays unique element and Mobile apps data structure

Recently, the questions below were asked in an interview.
You are given an array of integers with all elements repeated twice except one element, which occurs only once; you need to find the unique element with O(n log n) time complexity. Suppose the array is {2,47,2,36,3,47,36}; the output should be 3. I said we could perform a merge sort (as it takes O(n log n)) and then compare adjacent elements, but he said it would take O(n log n) + O(n). I also suggested using a HashMap to keep counts of the elements, but again he said no, as we would have to iterate over the HashMap again to get the result. After some research, I learned that using the XOR operation gives the output in O(n). Is there any better solution, other than sorting, that can give the answer in O(n log n) time?
As we use smartphones, we can open many apps at a time. When we look at which apps are currently open, we see a list where the most recently opened app is at the front, and we can remove or close an app from anywhere in the list. Is there some Collection available in Java that can perform all these tasks efficiently? I suggested LinkedList or LinkedHashMap, but he was not convinced. What would be the best Collection to use?
Firstly, if the interviewer used big-O notation and expected an O(n log n) solution, there's nothing wrong with your answer. We know that O(x + y) = O(max(x, y)); therefore, although your algorithm is O(n log n + n), it's fine to just call it O(n log n). However, finding the element that appears once in a sorted array can be done in O(log n) using binary search; as a hint, exploit odd and even indices while searching. Also, if the interviewer expected an O(n log n) solution, the objection to a final traversal is absurd. The HashMap solution is already O(n), and if there's a problem with it, it's the extra space requirement. For this reason, the best option is to use XOR, as you mentioned. There are some more O(n) solutions, but they're not better than the XOR solution.
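For illustration, a minimal sketch of the XOR approach mentioned above: pairs cancel out, leaving the unique element in O(n) time and O(1) space (class and method names are mine).

public class UniqueElement {
    static int findUnique(int[] a) {
        int x = 0;
        for (int v : a) {
            x ^= v; // duplicates cancel out: v ^ v == 0, and 0 ^ u == u
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(findUnique(new int[] {2, 47, 2, 36, 3, 47, 36})); // prints 3
    }
}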
To me, a LinkedList is appropriate for the second task as well. We want to remove from any location and also want some stack operations (push, pop, peek), and a custom stack can be built on top of a LinkedList.
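A rough sketch of that suggestion (class and method names are mine); note that closing an app from the middle is still an O(n) scan with a LinkedList:

import java.util.LinkedList;

public class RecentApps {
    private final LinkedList<String> apps = new LinkedList<>();

    void open(String app) { apps.addFirst(app); }          // most recent goes to the front
    String closeMostRecent() { return apps.pollFirst(); }  // pop from the front
    boolean close(String app) { return apps.remove(app); } // removal from anywhere: O(n) scan

    public static void main(String[] args) {
        RecentApps r = new RecentApps();
        r.open("Maps"); r.open("Mail"); r.open("Camera");
        r.close("Mail");
        System.out.println(r.apps); // [Camera, Maps]
    }
}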

Is hash table proper for implementing the address book feature for a cellphone [closed]

I was reading and found this question, and I do not know the right answer to it. Here is the question:
Suppose you are implementing the address book feature for a cellphone. The address book needs to be kept sorted by person's last name and support fast access when queried by last name. Which of the following data structures would be a good choice to use for storing the address book? Explain why. Which would be the bad choice and why?
(a) unsorted linked list
(b) sorted linked list
(c) binary search tree
(d) hash table
My answer is to use a hash table because it has keys and values. Is my answer correct?
Thanks
Based on the requirements, the correct structure to use is a binary search tree.
Why?
(a) unsorted linked list: not sorted, O(n) search
(b) sorted linked list: sorted, O(n) search; note that you cannot apply binary search on a linked list, whether singly or doubly linked
(c) binary search tree: sorted, O(log n) search
(d) hash table: not sorted, O(1) search
(a) and (d) are excluded due to the sorting requirement; then (c) is faster than (b) when searching.
*Just noticed that you also ask for "the bad choice", which is apparently (a), the unsorted linked list.
No, generally a hash table is not the best choice for a phone book.
Hash tables have excellent O(1) lookup performance, but they gain that by giving up support for walking the book in order. Nearly every scenario I can think of for searching a phone book involves not knowing the exact key in the first place.
For example, if I want to look up all people whose name starts with "George", I would not want to have to know in advance that I needed to query "George Jestson", just because that exact string is the key stored in my hash table.
Use an ordered data structure: any kind of tree will let you prune values that are too high or too low, as will an ordered list or an ordered array. However, avoid an ordered linked list, as you still have to traverse the list, O(n), to find an entry. You can do better, O(log n), by binary-searching a sorted array, or by using a balanced tree of some sort.
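To illustrate, a small sketch of prefix lookup on a sorted map (Java's TreeMap is a red-black tree); the class, names, and numbers here are invented for the example:

import java.util.SortedMap;
import java.util.TreeMap;

public class AddressBook {
    private final TreeMap<String, String> byLastName = new TreeMap<>();

    void add(String lastName, String phone) {
        byLastName.put(lastName, phone); // O(log n); the map stays sorted
    }

    SortedMap<String, String> byPrefix(String prefix) {
        // All names in the half-open range [prefix, prefix + '\uffff')
        return byLastName.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        AddressBook book = new AddressBook();
        book.add("Jetson", "555-0100");
        book.add("Jones", "555-0101");
        System.out.println(book.byPrefix("J")); // {Jetson=555-0100, Jones=555-0101}
    }
}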
(c) binary search tree
Because it is both sorted and supports fast access.

Data structure in Java that supports quick search and remove in array with duplicates

More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want a data structure that supports searching for and removing the first occurrence of some value faster than O(n); say the value is 4, then the array becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from the head in the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data (say it's an int[]) in your data structure; I just need the later search-and-remove to be faster than O(n).
"Search and remove" is considered ONE operation, as shown above.
If I had to build it myself, I would use a LinkedList to store the data and a HashMap mapping every value to the list of its occurrences' nodes, together with their previous and next nodes.
Is this the right approach? Are there better choices already available in Java?
The data structure you describe, essentially a hybrid of a linked list and a map, is I think the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the occurrences of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list-of-main-list-nodes) pairs.
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that remove being O(n) on a plain LinkedList is causing problems, stick with the LinkedList until you have.
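A rough sketch of that hybrid, assuming int values as in the question (class and method names are mine): a hand-rolled doubly linked list, plus a HashMap from each value to the queue of its nodes in list order, making search-and-remove of the first occurrence O(1) on average.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class IndexedList {
    private static final class Node {
        final int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    private Node head, tail;
    private final Map<Integer, Deque<Node>> byValue = new HashMap<>();

    public IndexedList(int[] data) { // O(n) one-time build, as allowed
        for (int v : data) {
            Node n = new Node(v);
            if (head == null) { head = tail = n; }
            else { tail.next = n; n.prev = tail; tail = n; }
            byValue.computeIfAbsent(v, k -> new ArrayDeque<>()).addLast(n);
        }
    }

    // Removes the first occurrence of v; O(1) average (one hash lookup + unlink).
    public boolean searchAndRemove(int v) {
        Deque<Node> nodes = byValue.get(v);
        if (nodes == null || nodes.isEmpty()) return false;
        Node n = nodes.pollFirst(); // first occurrence, since nodes were appended in order
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        return true;
    }

    // Iteration from the head in the original order, as required.
    public void printAll() {
        for (Node n = head; n != null; n = n.next) System.out.print(n.value + " ");
        System.out.println();
    }

    public static void main(String[] args) {
        IndexedList list = new IndexedList(new int[] {3, 2, 3, 4, 2, 2, 1, 4});
        list.searchAndRemove(4);
        list.printAll(); // 3 2 3 2 2 1 4
    }
}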
Since your requirement is that the first occurrence of the element should be removed and the remaining occurrences retained, there would be no way to do it faster than O(n), as you would definitely have to move through to the end of the list to find out whether there is another occurrence. There is no standard API from Oracle in the java packages that does this.

Fastest Java HashSet<Integer> library [closed]

In addition to this quite old post, I need something that will use primitives and give a speedup for an application that contains lots of HashSets of Integers:
Set<Integer> set = new HashSet<Integer>();
People mention libraries like Guava, Javolution, and Trove, but there is no thorough comparison of them in terms of benchmarks and performance results, or at least a good answer coming from real experience. From what I see, many recommend Trove's TIntHashSet, but others say it is not that good; some say Guava is super cool and manageable, but I do not need beauty and maintainability, only execution time, so Python-style Guava goes home :) Javolution? I've visited the website; it seems too old to me and thus wacky.
The library should provide the best achievable time; memory does not matter.
Looking at "Thinking in Java", there is an idea of creating a custom HashMap with int[] as keys. So I would like to see something similar with a HashSet, or simply download and use an amazing library.
EDIT (in response to the comments below)
So in my project I start from about 50 HashSet<Integer> collections, then I call a function about 1000 times that internally creates up to 10 HashSet<Integer> collections each time. If I change the initial parameters, these numbers may grow exponentially. I only use the add(), contains() and clear() methods on those collections, which is why they were chosen.
Now I'm looking for a library that implements HashSet or something similar, but faster: avoiding the Integer autoboxing overhead and maybe something else I do not know about. In fact my data comes in as ints, and I store them in those HashSets.
Trove is an excellent choice.
The reason why it is much faster than generic collections is memory use.
A java.util.HashSet<Integer> uses a java.util.HashMap<Integer, Integer> internally. In a HashMap, each element is held in an Entry<Integer, Integer>. These objects take an estimated 24 bytes for the Entry, plus 16 bytes for the Integer object, plus 4 bytes in the actual hash table. That yields 44 bytes, as opposed to 4 bytes in Trove: up to an 11x memory overhead (note that unoccupied entries in the main table will yield a smaller difference in practice).
See also these experiments:
http://www.takipiblog.com/2014/01/23/java-scala-guava-and-trove-collections-how-much-can-they-hold/
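For illustration, a minimal usage sketch of Trove's primitive set, assuming Trove 3.x is on the classpath:

import gnu.trove.set.hash.TIntHashSet;

public class TroveDemo {
    public static void main(String[] args) {
        TIntHashSet set = new TIntHashSet(); // stores primitive ints, no boxing
        set.add(42);
        System.out.println(set.contains(42)); // true
        set.clear();
        System.out.println(set.size()); // 0
    }
}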
Take a look at the High Performance Primitive Collections for Java (HPPC). It is an alternative to Trove, mature and carefully designed for efficiency. See the Javadoc for IntOpenHashSet.
Have you tried tuning the initial capacity and load factor parameters when creating your HashSet?
HashSet doc
The initial capacity, as you might think, refers to how big the empty hash set will be when created, and the load factor is a threshold that determines when to grow the hash table. Normally you would like to keep the ratio between used buckets and total buckets below two thirds, which is regarded as the best ratio to achieve good, stable performance in a hash table.
Dynamic resizing of a hash table
So basically, try to set an initial capacity that fits your needs (to avoid re-creating and rehashing the table as it grows), and fiddle with the load factor until you find a sweet spot.
It might be that for your particular data distribution and access pattern a lower load factor will help (a higher one hardly will, but your mileage may vary).
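A small illustration of presizing, assuming you expect around 10,000 ints (the numbers here are made up):

import java.util.HashSet;
import java.util.Set;

public class PresizedSet {
    public static void main(String[] args) {
        int expected = 10_000;    // hypothetical expected element count
        float loadFactor = 0.75f; // the JDK default
        // Capacity chosen so the table never has to grow while filling up.
        Set<Integer> set = new HashSet<>((int) (expected / loadFactor) + 1, loadFactor);
        for (int i = 0; i < expected; i++) {
            set.add(i);
        }
        System.out.println(set.size()); // 10000
    }
}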

What is the general purpose of using hashtables as a collection? [duplicate]

Possible Duplicate:
What exactly are hashtables?
I understand the purpose of using hash functions to securely store passwords. I have used arrays and ArrayLists for class projects for sorting and searching data. What I am having trouble understanding is the practical value of hash tables for tasks like sorting and searching.
I got a lecture on hash tables, but we never had to use them in school, so it hasn't clicked. Can someone give me a practical example of a task a hash table is useful for that couldn't be done with a numerical array or an ArrayList? Also, a very simple, low-level example of a hash function would be helpful.
There are all sorts of collections out there. Collections are used for storing and retrieving things, so one of the most important properties of a collection is how fast these operations are. To estimate "fastness", computer scientists use big-O notation, which roughly says how many individual operations you must perform to invoke a certain method (be it get or set, for example). For example, to get an element of an ArrayList by index you need exactly one operation, which is O(1); if you have a LinkedList of length n and you need to get something from the middle, you'll have to traverse from the start of the list to the middle, taking n/2 operations, so get has complexity O(n). The same goes for key-value stores such as hash tables. There are implementations that give you O(log n) complexity to get a value by its key, whereas a hash table does it in O(1). Basically this means that getting a value from a hash table by its key is really cheap.
Basically, hash tables have performance characteristics similar to arrays with numerical indices (cheap lookup; cheap insertion, partly because hash tables are unordered), but are much more flexible in terms of what the key may be. Given a contiguous chunk of memory and a fixed size per item, you can get the address of the nth item very easily and cheaply. That's thanks to the indices being integers; you can't do that with, say, strings, at least not directly. A hash function allows reducing any object (that implements it) to a number, and then you're back to arrays. You still need to check for hash collisions and resolve them (which incurs mostly a memory overhead, since you need to store the original key), but with a halfway decent implementation this is not much of an issue.
So you can now associate any (hashable) object with any (really any) value. This has countless uses (although I have to admit I can't think of one that's applicable to sorting or searching). You can build caches with small overhead (because checking whether the cache can help in a given case is O(1)), implement a relatively performant object system (several dynamic languages do this), go through a list of (id, value) pairs and accumulate the values for identical ids in any way you like, and many other things.
Very simple: hash tables are often called "associative arrays." Arrays let you access your data by index. Hash tables let you access your data by any other identifier, e.g. a name. For example:
"one" is associated with 1
"two" is associated with 2
So, when you have the word "one", you can find its value 1 using a hash table where the key is "one" and the value is 1. An array only allows the opposite mapping, from number to value.
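The "one"/"two" example above, as a minimal runnable sketch with java.util.HashMap (class and variable names are mine):

import java.util.HashMap;
import java.util.Map;

public class WordToNumber {
    public static void main(String[] args) {
        Map<String, Integer> byWord = new HashMap<>();
        byWord.put("one", 1);
        byWord.put("two", 2);
        System.out.println(byWord.get("one")); // 1, found by key in O(1) on average
    }
}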
For n data elements:
Hash tables allow O(k) searches (where k usually depends only on the hashing function). This is better than the O(log n) of binary search (which requires an O(n log n) sort first; if the data is not sorted, you are worse off).
On the flip side, however, hash tables tend to take roughly 3n space.
