Java: Vector sort vs Collection sort - java

I have never had the pleasure to work with the code below, and I've now stumbled upon an assignment where I'm suppose to argument which of these is most used, and why.
We've got two examples,
public Vector<Integer> sort(Vector<Integer> integers) { ... }
public Collection<Integer> sort(Collection<Integer> integers) {}
Essentially, we're to argument which of these two examples is the best solution to sorting things. Which of these two are used the most, and why?

Essentially, we're to argument which of these two examples is the best solution to sorting things. Which of these two are used the most, and why?
Lots of comments but I don't see an answer really. So: the second one is more used nowadays. The first one is an old method, back from the early Java days. You can still use it, but nowadays people use mostly the second one because it obviously imposes less requirements on the person using it (in particular the collection which you're sorting doesn't need to be a Vector).

There are advantages and disadvantages to each. I think you're being asked to choose (perhaps deliberately) between two poor choices.
As many have already said, Vector is obsolete. It should not be used.
However… there's a reason the Collections.sort method takes a List and not a Collection: How does one sort an unordered Set? How would one sort a collection which already has a required order, like a TreeSet?
Since the notion of sorting a Collection is so poorly defined, the method which takes a Vector is probably the better one to use, even though Vector itself shouldn't be used.

Related

Sorting a List in parallel without creating a temporary array in Java 8

Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort for sorting lists.
I can use toArray, sort that array, and store the result back in my list, but that will temporarily increase memory usage, which if I'm using parallel sorting is already high because parallel sorting only pays off for huge lists. Instead of twice the memory (the list plus parallelSort's working memory), I'm using thrice (the list, the temporary array and parallelSort's working memory). (Arrays.parallelSort documentation says "The algorithm requires a working space no greater than the size of the original array".)
Memory usage aside, Collections.parallelSort would also be more convenient for what seems like a reasonably common operation. (I tend not to use arrays directly, so I'd certainly use it more often than Arrays.parallelSort.)
The library can test for RandomAccess to avoid trying to e.g. quicksort a linked list, so that can't a reason for a deliberate omission.
How can I sort a List in parallel without creating a temporary array?
There doesn't appear to be any straightforward way to sort a List in parallel in Java 8. I don't think this is fundamentally difficult; it looks more like an oversight to me.
The difficulty with a hypothetical Collections.parallelSort(list, cmp) is that the Collections implementation knows nothing about the list's implementation or its internal organization. This can be seen by examining the Java 7 implementation of Collections.sort(list, cmp). As you observed, it has to copy the list elements out to an array, sort them, and then copy them back into the list.
This is the big advantage of the List.sort(cmp) extension method over Collections.sort(list, cmp). It might seem that this is merely a small syntactic advantage being able to write myList.sort(cmp) instead of Collections.sort(myList, cmp). The difference is that myList.sort(cmp), being an interface extension method, can be overridden by the specific List implementation. For example, ArrayList.sort(cmp) sorts the list in-place using Arrays.sort() whereas the default implementation implements the old copyout-sort-copyback technique.
It should be possible to add a parallelSort extension method to the List interface that has similar semantics to List.sort but does the sorting in parallel. This would allow ArrayList to do a straightforward in-place sort using Arrays.parallelSort. (It's not entirely clear to me what the default implementation should do. It might still be worth it to do copyout-parallelSort-copyback.) Since this would be an API change, it can't happen until the next major release of Java SE.
As for a Java 8 solution, there are a couple workarounds, none very pretty (as is typical of workarounds). You could create your own array-based List implementation and override sort() to sort in parallel. Or you could subclass ArrayList, override sort(), grab the elementData array via reflection and call parallelSort() on it. Of course you could just write your own List implementation and provide a parallelSort() method, but the advantage of overriding List.sort() is that this works on the plain List interface and you don't have to modify all the code in your code base to use a different List subclass.
I think you are doomed to use a custom List implementation augmented with your own parallelSort or else change all your other code to store the big data in Array types.
This is the inherent problem with layers of abstract data types. They're meant to isolate the programmer from details of implementation. But when the details of implementation matter - as in the case of underlying storage model for sort - the otherwise splendid isolation leaves the programmer helpless.
The standard List sort documents provide an example. After the explanation that mergesort is used, they say
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.)
In other words, "since we don't know the underlying storage model for a List and couldn't touch it if we did, we make a copy organized in a known way." The parenthesized expression is based on the fact that the List "i'th element accessor" on a linked list is Omega(n), so the normal array mergesort implemented with it would be a disaster. In fact it's easy to implement mergesort efficiently on linked lists. The List implementer is just prevented from doing it.
A parallel sort on List has the same problem. The standard sequential sort fixes it with custom sorts in the concrete List implementations. The Java folks just haven't chosen to go there yet. Maybe in Java 9.
Use the following:
yourCollection.parallelStream().sorted().collect(Collectors.toList());
This will be parallel when sorting, because of parallelStream(). I believe this is what you mean by parallel sort?
Just speculating here, but I see several good reasons for generic sort algorithms preferring to work on arrays instead of List instances:
Element access is performed via method calls. Despite all the optimizations JIT can apply, even for a list that implements RandomAccess, this probably means a lot of overhead compared to plain array accesses which can be optimized very well.
Many algorithms require copying some fragments of the array to temporary structures. There are efficient methods for copying arrays or their fragments. An arbitrary List instance on the other hand, can't be easily copied. New lists would have to be allocated which poses two problems. First, this means allocating some new objects which is likely more costly than allocating arrays. Second, the algorithm would have to choose what implementation of List should be allocated for this temporary structure. There are two obvious solutions, both bad: either just choose some hard-coded implementation, e.g. ArrayList, but then it could just allocate simple arrays as well (and if we're generating arrays then it's much easier if the soiurce is also an array). Or, let the user provide some list factory object, which makes the code much more complicated.
Related to the previous issue: there is no obvious way of copying a list into another due to how the API is designed. The best the List interface offers is addAll() method, but this is probably not efficient for most cases (think of pre-allocating the new list to its target size vs adding elements one by one which many implementations do).
Most lists that need to be sorted will be small enough for another copy to not be an issue.
So probably the designers thought of CPU efficiency and code simplicity most of all, and this is easily achieved when the API accepts arrays. Some languages, e.g. Scala, have sort methods that work directly on lists, but this comes at a cost and probably is less efficient than sorting arrays in many cases (or sometimes there will probably just be a conversion to and from array performed behind the scenes).
By combining the existing answers I came up with this code.
This works if you are not interested in creating a custom List class and if you don't bother to create a temporary array (Collections.sort is doing it anyway).
This uses the initial list and does not create a new one as in the parallelStream solution.
// Convert List to Array so we can use Arrays.parallelSort rather than Collections.sort.
// Note that Collections.sort begins with this very same conversion, so we're not adding overhead
// in comparaison with Collections.sort.
Foo[] fooArr = fooLst.toArray(new Foo[0]);
// Multithread the TimSort. Automatically fallback to mono-thread when size is less than 8192.
Arrays.parallelSort(fooArr, Comparator.comparingStuff(Foo::yourmethod));
// Refill the List using the sorted Array, the same way Collections.sort does it.
ListIterator<Foo> i = fooLst.listIterator();
for (Foo e : fooArr) {
i.next();
i.set(e);
}

What would be a good way to implement a set collection with weak references, compares by reference, and is also sortable in Java?

I want to have an object that allows other objects of a specific type to register themselves with it. Ideally it would store the references to them in some sort of set collection and have .equals() compare by reference rather than value. It shouldn't have to maintain a sort at all times, but it should be able to be sorted before the collection is iterated over.
Looking through the Java Collection Library, I've seen the various features I'm looking for on different collection types, but I am not sure about how I should go about using them to build the kind of collection I'm looking for.
This is Java in the context of Android if that is significant.
Java's built-in tree-based collections won't work.
To illustrate, consider a tree containing weak references to nodes 'B', 'C', and 'D':
C
B D
Now let the weak reference 'C' get collected, leaving null behind:
-
B D
Now insert an element into the tree. The TreeMap/TreeSet doesn't have sufficient information to select the left or right subtree. If your comparator says null is a small value, then it will be incorrect when inserting 'A'. If it says null is a large value, it will be incorrect when inserting 'E'.
Sort on demand is a good choice.
A more robust solution is to use an ArrayList<WeakReference<T>> and to implement a Comparator<WeakReference<T>> that delegates to a Comparator<T>. Then call Collections.sort() prior to iteration.
Android's Collections.sort uses TimSort behind-the-scenes and so it runs quite efficiently if the input is already partially sorted.
Perhaps the collections classes are a level of abstraction below what you're looking for? It sounds like the end product you want is a cache with the ability to iterate in a user-defined sort order. If so, perhaps the cache interface in the Google Guava library is close enough to what you want:
http://code.google.com/p/guava-libraries/source/browse/trunk/guava/src/com/google/common/cache/Cache.java
At a glance, it looks like CacheBuilder in that package doesn't allow you to build an implementation with user-defined iteration order. However, it does provide a Map view that might be good enough for your needs:
List<Thing> cachedThings = Lists.newArrayList(cache.asMap().values());
Collections.sort(cachedThings, YOUR_THING_COMPARATOR);
for (Thing thing : cachedThings) { ... }
Even if this isn't exactly what you want, the classes in that package might give you some useful insights re: using References with Collections.
DISCLAIMER: This was a comment but it got kinda big, sorry if it doesn't solve your problem:
References in Java
Just to clarify what I mean when I say reference, since it isn't really a term commonly used in Java: Java does not really use references or pointers. It uses a kind of pseudo-reference that can be (and is by default) assigned to the special null instance. That's one way to explain it anyway. In Java, these pseudo-references are the only way that an Object can be handled. When I say reference, I mean these pseudo-references.
Sets
Any Set implementation will not allow two references to the same object to be included in it since it uses identity equality for this check. That violates the mathematical concept of a set. The Java Sets ignore any attempt to add duplicate references.
You mention a Map in your comment though... Could you clarify what kind of collection you are after? And why you need that kind of equality checking within it? Are you thinking in C++ terms? I'll try to edit my answer to be more helpful then :)
EDIT: I thought that might have been your goal ;) So a TreeSet should do the trick then! I would not get concerned about performance until there is a performance issue. Simplicity is fantastic for readability, maintenance and preventing bugs. If performance does become a problem, ideally you should profile your code and only optimize the areas that are proven to be the problem.

How can I hash composite classes?

Let Abstract be an abstract class, and A1,A2,...,An concrete classes that inherit from Abstact. Each one of Ai has a list of Abstract and a pre-defined, known at compile time, set of primitive types, let's assume we have a hush function for them, and there are no 'loops' in the structure of each concrete element.
Two elements e1 and e2 are identical if they have the same values for the predefined primitives, and if for each Abstract in e1, there exists an Abstract in e2 such that e1 and e2 are identical. (in other words, order is not important).
I am looking for a good hash heuristic for this kind of problem. It shouldn't (and as far as I know, can't be) a perfect hash function, but it should be good and easy to compute at run time.
I'll be glad if someone can give me some guidelines how to implement such a function, or direct me to an article that addresses this problem.
PS I am writing in Java, and I assume (correct me if I am wrong) the built in hash() won't be good enough for this problem.
EDIT :
the lists and primitives are fixed after construction, but are unknown at compile time.
If these lists can change after they are constructed, it would be a bad idea to base the hash function on them. Imagine if you stuck your object into a HashMap, and then changed part of it. You would no longer be able to locate it in the HashMap because its hashCode would be different.
You should only base the result of hashCode on immutable values. If you don't have any immutable values in your object, your best bet would probably be to simply use the basic Object.hashCode(), although you'll lose out on equality testing.
If these objects are immutable, however, then I recommend choosing some kind of sort order for your elements. Then you can compute a hash code across your lists, knowing that it will be the same even if the lists are in different orders, because you are sorting the values before hashing.
Use Google Guava's utilities... Objects.hashCode() is great. Also, the source is available, and they have solved the problem you state, so you can take a look at their solution.

What is the need of collection framework in java?

What is the need of Collection framework in Java since all the data operations(sorting/adding/deleting) are possible with Arrays and moreover array is suitable for memory consumption and performance is also better compared with Collections.
Can anyone point me a real time data oriented example which shows the difference in both(array/Collections) of these implementations.
Arrays are not resizable.
Java Collections Framework provides lots of different useful data types, such as linked lists (allows insertion anywhere in constant time), resizeable array lists (like Vector but cooler), red-black trees, hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
Several reasons:
Java's collection classes provides a higher level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing a complicated data structures (e.g., hash tables) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementation you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized, etc.
Finally, arrays allow covariance: setting an element of an array is not guaranteed to succeed due to typing errors that are detectable only at run time. Generics prevent this problem in arrays.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1).
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it will be based off an array - whereas the Hash* collections should just work out of the box. As always, pick the right tool for the job.
Well the basic premise is "wrong" since Java included the Dictionary class since before interfaces existed in the language...
collections offer Lists which are somewhat similar to arrays, but they offer many more things that are not. I'll assume you were just talking about List (and even Set) and leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array, however there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developers learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
Arrays are not efficient always. What if you need something like LinkedList? Looks like you need to learn some data structure : http://en.wikipedia.org/wiki/List_of_data_structures
Java Collections came up with different functionality,usability and convenience.
When in an application we want to work on group of Objects, Only ARRAY can not help us,Or rather they might leads to do things with some cumbersome operations.
One important difference, is one of usability and convenience, especially given that Collections automatically expand in size when needed:
Collections came up with methods to simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Used- Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
Here you find use of each collection as per scenario:
Collection is the framework in Java and you know that framework is very easy to use rather than implementing and then use it and your concern is that why we don't use the array there are drawbacks of array like it is static you have to define the size of row at least in beginning, so if your array is large then it would result primarily in wastage of large memory.
So you can prefer ArrayList over it which is inside the collection hierarchy.
Complexity is other issue like you want to insert in array then you have to trace it upto define index so over it you can use LinkedList all functions are implemented only you need to use and became your code less complex and you can read there are various advantages of collection hierarchy.
Collection framework are much higher level compared to Arrays and provides important interfaces and classes that by using them we can manage groups of objects with a much sophisticated way with many methods already given by the specific collection.
For example:
ArrayList - It's like a dynamic array i.e. we don't need to declare its size, it grows as we add elements to it and it shrinks as we remove elements from it, during the runtime of the program.
LinkedList - It can be used to depict a Queue(FIFO) or even a Stack(LIFO).
HashSet - It stores its element by a process called hashing. The order of elements in HashSet is not guaranteed.
TreeSet - TreeSet is the best candidate when one needs to store a large number of sorted elements and their fast access.
ArrayDeque - It can also be used to implement a first-in, first-out(FIFO) queue or a last-in, first-out(LIFO) queue.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
Treemap - TreeMap stores key-value pairs in a sorted ascending order and retrieval speed of an element out of a TreeMap is quite fast.
To learn more about Java collections, check out this article.

Why no SortedMultiset in Google Collections?

Google Collections contains the Multiset interface and the TreeMultiset class, but I was surprised to find that there is no corresponding SortedMultiset interface.
Something like that would be very useful for modelling discrete probability distributions.
Before I attempt to implement it myself, I would like to know if there is a specific reason for leaving it out, e.g. likely violation of Multiset or Collection invariants, or inherent performance problems etc.
Edit: I didn't realise it originally but this is actually 3 separate requests:
A change to the return type of one method (TreeMultiset.entrySet)
An new interface to match the existing functionality of TreeMultiset
A new pair of methods to sum the counts in branches of the tree
I think it's just that no one's ever needed it yet, so we haven't written it yet. It's something I'd consider.
TreeMultiset.elementSet() returns a SortedSet, which might provide some of the functionality you want.
ETA: finnw, the SortedMultiset methods you're requesting wouldn't provide a significantly faster answer to the question "how many elements in my Multiset are less than 42?" The TreeMultiset implementation would still have to iterate across the multiset entries and sum the counts of the relevant elements.

Categories

Resources