Why no SortedMultiset in Google Collections? - java

Google Collections contains the Multiset interface and the TreeMultiset class, but I was surprised to find that there is no corresponding SortedMultiset interface.
Something like that would be very useful for modelling discrete probability distributions.
Before I attempt to implement it myself, I would like to know if there is a specific reason for leaving it out, e.g. likely violation of Multiset or Collection invariants, or inherent performance problems etc.
Edit: I didn't realise it originally but this is actually 3 separate requests:
A change to the return type of one method (TreeMultiset.entrySet)
A new interface to match the existing functionality of TreeMultiset
A new pair of methods to sum the counts in branches of the tree

I think it's just that no one's ever needed it yet, so we haven't written it yet. It's something I'd consider.

TreeMultiset.elementSet() returns a SortedSet, which might provide some of the functionality you want.
ETA: finnw, the SortedMultiset methods you're requesting wouldn't provide a significantly faster answer to the question "how many elements in my Multiset are less than 42?" The TreeMultiset implementation would still have to iterate across the multiset entries and sum the counts of the relevant elements.
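To make the point about summing counts concrete, here is a minimal sketch using only the JDK. It is not Guava's TreeMultiset; the class SortedCounts and its methods are hypothetical names, with a TreeMap from element to count standing in for a sorted multiset. Answering "how many elements are less than 42?" still means walking every distinct entry below the bound and adding up the counts.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted multiset: element -> occurrence count, kept in key order.
final class SortedCounts {
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    // Records `occurrences` more copies of `element`.
    void add(int element, int occurrences) {
        counts.merge(element, occurrences, Integer::sum);
    }

    // Sums the counts of all elements strictly below `bound`.
    // The cost is linear in the number of distinct elements below the bound.
    int countLessThan(int bound) {
        int total = 0;
        for (Map.Entry<Integer, Integer> entry : counts.headMap(bound, false).entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        SortedCounts m = new SortedCounts();
        m.add(7, 3);
        m.add(41, 2);
        m.add(42, 5);
        System.out.println(m.countLessThan(42)); // prints 5
    }
}
```

A tree that cached subtree totals in each node could presumably answer the same query without visiting every entry, which is what the third request in the question amounts to.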

Related

Highest and lowest elements in trove TIntSet?

TIntSet is a "sorted set" in spirit, i.e. its elements have a natural order.
Unfortunately, I can't find any methods similar to first() and last().
Is it possible to overcome this lack somehow?
fastutil is in all ways better than Trove. It has an IntSortedSet interface with firstInt() and lastInt() methods.
TIntSet is not sorted (it's a hash set) so in order to find the min or max you would need to iterate all values.
Why do you think this set is sorted? It looks like it's not. It doesn't implement SortedSet or NavigableSet interfaces from JDK collections framework - there are no such methods there.
first() and last() methods are actually inherited from interface SortedSet https://docs.oracle.com/javase/7/docs/api/java/util/SortedSet.html#first%28%29
If you want first/last/ceiling/floor style operations, a tree-based set is required, and Trove does not have anything like that. If you want to extend TIntSet to support them, there are two simple options:
One option is to maintain the sorted values in a parallel int array and use that to serve first/last requests, but this requires extra memory.
Another option is to use just one array for the values, keep it sorted, and use binary search for lookups. This can be a little slower for get/put than Trove, but you can save memory because Trove uses a 0.5 load factor.
So you can decide based on the trade-off you want.
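The second option (a single sorted array with binary search) could look roughly like the sketch below. It uses only the JDK; SortedIntArraySet is a hypothetical class, not part of Trove or fastutil, and it supports only add, contains, first and last.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical sketch: a set of ints stored in one sorted array.
final class SortedIntArraySet {
    private int[] values = new int[8];
    private int size = 0;

    // Inserts v at its sorted position; returns false if it was already present.
    boolean add(int v) {
        int idx = Arrays.binarySearch(values, 0, size, v);
        if (idx >= 0) {
            return false;
        }
        int insertAt = -idx - 1; // binarySearch encodes the insertion point as -(point) - 1
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail one slot to the right to make room (O(n) per insert).
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        values[insertAt] = v;
        size++;
        return true;
    }

    // O(log n) membership test.
    boolean contains(int v) {
        return Arrays.binarySearch(values, 0, size, v) >= 0;
    }

    // Smallest element.
    int first() {
        if (size == 0) throw new NoSuchElementException();
        return values[0];
    }

    // Largest element.
    int last() {
        if (size == 0) throw new NoSuchElementException();
        return values[size - 1];
    }

    public static void main(String[] args) {
        SortedIntArraySet set = new SortedIntArraySet();
        set.add(5);
        set.add(1);
        set.add(9);
        System.out.println(set.first() + " " + set.last()); // prints 1 9
    }
}
```

The trade-off is as described above: lookups are logarithmic and first/last are constant time, but each insert may shift a large part of the array.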

Java: Vector sort vs Collection sort

I have never had the pleasure of working with the code below, and I've now stumbled upon an assignment where I'm supposed to argue which of these is used the most, and why.
We've got two examples,
public Vector<Integer> sort(Vector<Integer> integers) { ... }
public Collection<Integer> sort(Collection<Integer> integers) {}
Essentially, we're to argue which of these two examples is the best solution for sorting things. Which of the two is used the most, and why?
Lots of comments, but I don't really see an answer. So: the second one is used more nowadays. The first one is an old approach from the early Java days. You can still use it, but nowadays people mostly use the second one because it imposes fewer requirements on the caller (in particular, the collection you're sorting doesn't need to be a Vector).
There are advantages and disadvantages to each. I think you're being asked to choose (perhaps deliberately) between two poor choices.
As many have already said, Vector is obsolete. It should not be used.
However… there's a reason the Collections.sort method takes a List and not a Collection: How does one sort an unordered Set? How would one sort a collection which already has a required order, like a TreeSet?
Since the notion of sorting a Collection is so poorly defined, the method which takes a Vector is probably the better one to use, even though Vector itself shouldn't be used.
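One way around the ambiguity is to leave the input untouched and return a sorted List. The sketch below is an illustration under that assumption, not a signature from the assignment; SortDemo and sortedCopy are hypothetical names. It accepts any Collection, copies it into an ArrayList, and hands the copy to Collections.sort, which requires a List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SortDemo {
    // Returns a new sorted List; the input collection (ordered or not) is not modified.
    static <T extends Comparable<? super T>> List<T> sortedCopy(Collection<? extends T> input) {
        List<T> copy = new ArrayList<>(input);
        Collections.sort(copy); // Collections.sort only accepts a List
        return copy;
    }

    public static void main(String[] args) {
        Set<Integer> unordered = new HashSet<>(Arrays.asList(3, 1, 2));
        System.out.println(sortedCopy(unordered)); // prints [1, 2, 3]
    }
}
```

Returning a List makes the ordering guarantee visible in the type, which a Collection return type cannot express.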

Why is there no direct implementation of Bag in java collection framework?

I can't figure out why JCF (Java Collection Framework) doesn't have a Bag implementation (to allow duplicates and not maintain order).
Bag performance would be much better than current Collection implementations in JCF.
I know how to implement Bag in Java.
I know Bag is available in Apache commons.
I know there are other implementations that can be used as a Bag but there is so much work to do in other implementations compared to a Bag.
Why has the Java Collections framework not provided direct implementations like this?
Posting my comment as an answer since it answers this question best.
From the bug report filed here:
There isn't a lot of enthusiasm among the maintainers of the
Collection framework to design and implement these interfaces/classes.
I personally can't recall having needed one. It would be more likely
that a popular package developed outside the JDK would be imported
into the JDK after having proved its worth in the real world.
The need for having support for Bags is valid today.
Guava has support for it. Also GS-Collections.
Currently, bag violates the collections contract. Many methods are in conflict with the current collections rules.
"Bag is a Collection that counts the number of times an object appears in the collection. Suppose you have a Bag that contains {a, a, b, c}. Calling getCount(Object) on a would return 2, while calling uniqueSet() would return {a, b, c}.
Note that this interface violates the Collection contract. The behavior specified in many of these methods is not the same as the behavior specified by Collection. The noncompliant methods are clearly marked with "(Violation)" in their summary line. A future version of this class will specify the same behavior as Collection, which unfortunately will break backwards compatibility with this version."
boolean add(java.lang.Object o)
(Violation) Add the given object to the bag and keep a count.
boolean removeAll(java.util.Collection c)
(Violation) Remove all elements represented in the given collection, respecting cardinality.
Please see the link for more information: HERE
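To illustrate the counting behaviour described in the quoted documentation, here is a minimal sketch built on a HashMap. CountingBag is a hypothetical class, not the Apache Commons Bag; only the method names getCount and uniqueSet are borrowed from the quote, and the class deliberately does not implement Collection, so the contract question does not arise.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a bag: an unordered container that counts occurrences.
final class CountingBag<E> {
    private final Map<E, Integer> counts = new HashMap<>();

    // Adds one occurrence of the element.
    void add(E element) {
        counts.merge(element, 1, Integer::sum);
    }

    // Number of times the element has been added; 0 if it is absent.
    int getCount(Object element) {
        Integer count = counts.get(element);
        return count == null ? 0 : count;
    }

    // Read-only view of the distinct elements.
    Set<E> uniqueSet() {
        return Collections.unmodifiableSet(counts.keySet());
    }

    // Total number of occurrences across all elements.
    int size() {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        CountingBag<String> bag = new CountingBag<>();
        bag.add("a");
        bag.add("a");
        bag.add("b");
        bag.add("c");
        System.out.println(bag.getCount("a")); // prints 2
        System.out.println(bag.size());        // prints 4
    }
}
```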
The JDK tries to give you implementations of common data structures and lets you implement anything else if the common structures won't serve your purpose. The maintainers may have thought that a bag is not a common data structure. Practically, it is not possible for them to implement every data structure out there or satisfy everybody's requirements. What you think is common may not be common for the majority.

List without duplicates?

I just read Is there a no-duplicate List implementation out there? answer about a List, NOT Set, implementation that doesn't allow duplicates. The accepted answer recommended the Collections15 SetUniqueList. Is there some equivalent -- in Guava perhaps? (I searched the docs and couldn't find) -- in other libs or is there some other currently popular solution?
I like to require the correct Java semantics when I have a Collection that should not have any duplicates, and that means using the Set interface. That said, in many cases you want to preserve the insertion order as a List would, and maintaining two parallel data structures seems wasteful and complicated to keep in sync. That is why I use something like this: InsertionOrderSet.java. It is a special implementation of SortedSet that uses a wrapper object to maintain an index that a comparator can sort on, but hides that implementation detail from external consumers, so it just looks like a regular type-safe SortedSet.
The stuff in the answer you linked seems quite adequate. I'd go with one of those solutions. Alternatively, you could simply extend class ArrayList and override some of the methods that do a check against a HashSet prior to calling super.method for that same method. Simply add the Set as an instance field, use it for checking dupes and add/remove whatever items get added to/removed from the list.
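A rough sketch of the ArrayList-plus-HashSet idea follows. UniqueArrayList is a hypothetical name, and the sketch is intentionally incomplete: it covers add, addAll, remove and clear, but not set, the indexed add and addAll, removeAll, retainAll, removeIf, or removal through iterators, all of which would also need to keep the set in sync. Returning false from add for a duplicate also departs from the List.add contract.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: an ArrayList that silently rejects duplicates.
class UniqueArrayList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    // Mirrors the list contents for O(1) duplicate checks.
    private final Set<E> seen = new HashSet<>();

    @Override
    public boolean add(E element) {
        if (!seen.add(element)) {
            return false; // duplicate: list is left unchanged
        }
        return super.add(element);
    }

    @Override
    public boolean addAll(Collection<? extends E> elements) {
        boolean changed = false;
        for (E element : elements) {
            changed |= add(element);
        }
        return changed;
    }

    @Override
    public boolean remove(Object element) {
        seen.remove(element);
        return super.remove(element);
    }

    @Override
    public E remove(int index) {
        E removed = super.remove(index);
        seen.remove(removed);
        return removed;
    }

    @Override
    public void clear() {
        seen.clear();
        super.clear();
    }

    public static void main(String[] args) {
        UniqueArrayList<String> list = new UniqueArrayList<>();
        list.add("a");
        list.add("b");
        list.add("a"); // ignored
        System.out.println(list); // prints [a, b]
    }
}
```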

When to use the various generic containers in Java

Does anyone know of any resources or books I can read to help me understand the various Java collection classes?
For example: when would one use a Collection<T> instead of a List<T>,
and when would you use a Map<T, V> instead of a List<V>, where V has a member getId as shown below, allowing you to search the list for the element matching a given key:
class V {
T getId();
}
thanks
You use a Map if you want a map-like behaviour. You use a List if you want a list-like behaviour. You use a Collection if you don't care.
Generics have nothing to do with this.
See the Collections tutorial.
You can take a look at sun tutorial. It explains everything in detail.
http://java.sun.com/docs/books/tutorial/collections/index.html (Implementation section explain the difference between them)
This book is very good and covers both the collection framework and generics.
You can check the documentation of the java collection API.
Anyway a basic rule is : be as generic as possible for the type of your parameters. Be as generic as possible for the return type of your interfaces. Be as specific as possible for the return type of your final class.
A good place to start would be the Java API. Each collection has a good description associated with it. After that, you can search for any variety of articles and books on Java Collections on Google.
The decision depends on your data and your needs to use the data.
You should use a map if you have data where you can identify each element with a specific key and want to access or find it by with this key.
You take a List if you don't have a key but you're interested in the order of the elements. like a bunch of Strings you want to store in the order the user entered it.
You take a Set if you don't want to store the same element twice.
Another factor in your decision is whether you're working in a multithreaded environment. If many threads are accessing the same list at the same time, you would rather take a Vector instead of an ArrayList.
By the way, for some collections it is useful if your data class implements an interface like Comparable, or at least overrides the equals method.
here you will find more information.
Most Java books will have a good explanation of the Collections Framework. I find that Object-Oriented-Software-Development-Using has a good chapter that explains the reasons why one Collection is selected over another.
Head First Java also has a good introduction but may not tackle the problem of which to select.
The answer to your question is how are you going to be using the data structure? And to get a better idea of the possibilities, it is good to look at the whole collections interfaces hierarchy. For simplicity sake, I am restricting this discussion only to the classic interfaces, and am ignoring all of the concurrent interfaces.
Collection
+- List
+- Set
   +- SortedSet
Map
+- SortedMap
So, we can see from the above, a Map and a Collection are different things.
A Collection is somewhat analogous to a bag, it contains a number of values, but makes no further guarantees about them. A list is simply an ordered set of values, where the order is defined externally, not implicitly from the values themselves. A Set on the other hand is a group of values, no two of which are the same, however they are not ordered, neither explicitly, nor implicitly. A SortedSet is a set of unique values that are implicitly sorted, that is, if you iterate over the values, then they will always be returned in the same order.
A Map is mapping from a Set of keys to values. A SortedMap is a mapping from a SortedSet of keys to values.
As to why you would want to use a Map instead of a List? This depends largely on how you need to look up your data. If you need to do (effectively) random lookups using a key, then you should be using a Map, since its implementations give you either O(1) or O(log n) lookups. Searching through a list is O(n). If, however, you are performing some kind of "batch" process, that is, you are processing each and every item, then a List, or a Set if you need the uniqueness constraint, is more appropriate.
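The difference between the two lookup styles can be sketched as follows. Item, findInList and indexById are hypothetical names introduced for illustration; Item plays the role of the V class with getId from the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LookupDemo {
    // Hypothetical value type with an identifying key, like V in the question.
    static final class Item {
        private final String id;
        private final String name;

        Item(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() { return id; }
        String getName() { return name; }
    }

    // O(n): scans the list until an element with the given id is found; null if none matches.
    static Item findInList(List<Item> items, String id) {
        for (Item item : items) {
            if (item.getId().equals(id)) {
                return item;
            }
        }
        return null;
    }

    // Builds the index once; each later get(id) on the HashMap is O(1) on average.
    static Map<String, Item> indexById(List<Item> items) {
        Map<String, Item> byId = new HashMap<>();
        for (Item item : items) {
            byId.put(item.getId(), item);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(new Item("a", "alpha"), new Item("b", "beta"));
        System.out.println(findInList(items, "b").getName());     // prints beta
        System.out.println(indexById(items).get("b").getName());  // prints beta
    }
}
```

If the data is only ever processed in full, the index buys nothing and the plain List is enough.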
The other answers already covered an overview of what the collections are, so I'd add one rule of thumb that applies to how you might use collections in your programming:
Be strict in what you send, but generous in what you receive
This is a little controversial (some engineers believe that you should always be as strict as possible) but it's a rule of thumb that, when applied to collections, teaches us to pick the collection that limits your users the least when taking arguments but gives as much information as possible when returning results.
In other words a method signature like:
LinkedList< A > doSomething(Collection< A > col);
Might be preferred to:
Collection< A > doSomething(LinkedList< A > list);
In version 1, your user doesn't have to massage their data to use your method. They can pass you an ArrayList< A >, a LinkedHashSet< A > or a Collection< A > and you will deal with it. On receiving the data from the method, they have a lot more information about what they can do with it (list-specific iterators, for example) than they would in version 2.
