Is switching between Collections worth it? - java

Java offers us Collections, where every option is best used in a certain scenario.
But what would be a good solution for the combination of the following tasks:
Quickly iterate through every element in the list (order does not matter)
Check if the list contains (a) certain element(s)
Some options that were considered, which may or may not be good practice:
For example, first use a LinkedList while the number of elements is still unknown, and then convert it to a HashSet once the collection is complete (assuming duplicates will not be present)
Pick a solution for one of the two tasks and use the same implementation for the other (if switching to another implementation is not worth it)
Perhaps some implementation exists that does both (I failed to find one)
Is there a 'best' solution to this, and if so, what is it?
EDIT: For potential future visitors, this page contains many implementations with big O runtimes.

A HashSet can be iterated through quickly and provides efficient lookups.
import java.util.HashSet;
import java.util.Set;

Set<Object> set = new HashSet<>();
set.add("Hello");

// iteration visits every element; order is unspecified for a HashSet
for (Object obj : set) {
    System.out.println(obj);
}

// contains runs in O(1) on average
if (set.contains("Hello")) {
    System.out.println("Found");
}

Quickly iterate through every element in the list (order does not matter)
If the order does not matter, almost any Collection implementation will do: they all implement Iterable, and since you have to visit each element at least once, iteration cannot be better than O(n). Practically, of course, one implementation may be more suitable than another, since you usually have several considerations to weigh at once.
Check if the list contains (a) certain element(s)
This is typically the use case for a Set; you will get much better time complexity for contains operations. One thing to note here is that a Set does not guarantee a predefined order when iterating over elements. It can change between implementations, and it is risky to make assumptions about it.
Now to your question:
From my perspective, if you are free to choose the data structure of a class yourself, go with the most natural one for that use case. If you can imagine that you have to call contains a lot, then a Set might be suited for your use case. You could also keep a List and, each time you need to call contains repeatedly, build a Set from all elements of the List first. Of course, if you do this often, creating the Set on every invocation becomes expensive, and you might as well use a Set in the first place.
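For illustration, a minimal sketch of that build-a-Set-first pattern (the names here are made up):
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

List<String> names = Arrays.asList("Alice", "Bob", "Carol");
Set<String> lookup = new HashSet<>(names);   // built once, O(n)
boolean present = lookup.contains("Alice");  // each check is O(1) on average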
Your comment stated that you have a world of players and want to check whether a player belongs to a certain world object. Since the world owns the players, it should also hold a Collection of some kind to store them. In this case I would recommend a Map, with a common identifier of the player as key and the player itself as value.
public class World {
    private final Map<String, Player> players = new HashMap<>();

    public Collection<Player> getPlayers() {
        return players.values();
    }

    public Optional<Player> getPlayer(String nickname) {
        return Optional.ofNullable(players.get(nickname));
    }
    // ...
}

Related

Should I implement List interface or extend ArrayList class

I am developing an application where, in the background, I need to monitor user activity on particular objects; later, when they are visualized, they need to be sorted by the order in which the user used them (the last used object must appear on the first row of a grid, for example).
So if I have an ArrayList where I store the objects the user is dealing with, in order to add the last used object I need to check whether it is already in the list and then move it to the first position. If the object is not there, I simply add it at the first position of the list.
So instead of doing all these steps I want to make my own list where the logic explained above is built in.
My question is which scenario is better:
Implement the List interface
Extend the ArrayList class and override the add method
Create a class that contains an ArrayList and handles any additional functionality.
I.e. prefer composition over inheritance (and, in this case, over implementing the interface from scratch). It's also possible to have that class implement List where relevant and simply delegate those operations to the ArrayList inside; see the sketch after the next note.
Also note that LinkedHashMap supports insertion order (default) and access order for iteration, if you don't need a List (or if you can suitably replace it with a Map).
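For illustration, a minimal sketch of the composition approach (class and method names are made up for this example):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RecentlyUsedList<T> {

    private final List<T> items = new ArrayList<>();

    // Moves the element to the front, adding it if it is not present yet.
    public void use(T element) {
        items.remove(element);  // no-op if the element is absent
        items.add(0, element);  // most recently used goes first
    }

    public List<T> asList() {
        return Collections.unmodifiableList(items);
    }
}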
So instead of doing all these steps I want to make my own list where the logic explained above will be available.
I would try to refactor your design (if you can) so that you are able to use the existing Java Collections Framework classes (perhaps a linked collection type). As part of the Collections Framework, these have been optimized and maintained for years (so efficiency is likely already near-optimal), and you won't have to worry about maintaining them yourself.
Of the two options you give, it is possible that neither is the easiest or best.
It doesn't sound like you'll be able to extend AbstractList (as a way of implementing List) so you'll have a lot of wheel reinvention to do.
The ArrayList class is not final, but not expressly designed and documented for inheritance. This can result in some code fragility as inheritance breaks encapsulation (discussed in Effective Java, 2nd Ed. by J. Bloch). This solution may not be the best way to go.
Of the options, if you can't refactor your design to allow use of the Collection classes directly, then write a class that encapsulates a List (or other Collection) as an instance field and add instrumentation to it. Favor composition over inheritance. In this way, your solution will be more robust and easier to maintain than a solution based on inheritance.
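To see the kind of fragility meant here, a small illustration (the class name is made up; the outcome depends on ArrayList internals, which is exactly the problem):
import java.util.ArrayList;
import java.util.Arrays;

// Overriding add() does not intercept every insertion path.
class TrackingList<T> extends ArrayList<T> {
    int addCount = 0;

    @Override
    public boolean add(T element) {
        addCount++;
        return super.add(element);
    }
}

// ArrayList.addAll does not route through add(), so the count silently drifts:
TrackingList<String> list = new TrackingList<>();
list.addAll(Arrays.asList("a", "b", "c"));
System.out.println(list.addCount); // prints 0, not 3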
I think LinkedHashMap already does what you need - it keeps the elements in the order they were inserted or last accessed (this is determined by the parameter accessOrder in one of the constructors).
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html
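A small sketch of the access-order mode (the three-argument constructor is part of the JDK; the values are just for demonstration):
import java.util.LinkedHashMap;
import java.util.Map;

// accessOrder = true switches iteration from insertion order to access order,
// running from least to most recently accessed.
Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
map.put("a", 1);
map.put("b", 2);
map.put("c", 3);
map.get("a");                     // touching "a" moves it to the end
System.out.println(map.keySet()); // prints [b, c, a]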
EDIT
I don't have enough reputation to comment, so I'm putting it here: You don't actually need a map, so Venkatesh's LinkedHashSet suggestion is better.
You can do something like this:
<T> void update(Set<T> set, T value) {
    set.remove(value); // remove first ...
    set.add(value);    // ... so that re-adding moves the value to the end of the insertion order
}
and then
LinkedHashSet<String> set = new LinkedHashSet<>();
update(set, "a");
update(set, "b");
update(set, "c");
update(set, "a");
Iterator<String> it = new LinkedList<String>(set).descendingIterator();
while (it.hasNext()) {
    System.out.println(it.next());
}
Output:
a
c
b
You might try using a HashMap<Integer, TrackedObject>, where TrackedObject is the class of the objects you're keeping track of.
When your user uses an object, do
void trackObject(TrackedObject object)
{
    int x = hashMap.size();
    hashMap.put(Integer.valueOf(x), object); // HashMap has put(), not add()
}
then when you want to read out the tracked objects in order of use:
TrackedObject[] getOrderedArray()
{
    TrackedObject[] array = new TrackedObject[hashMap.size()];
    for (int i = 0; i < hashMap.size(); i++)
    {
        array[i] = hashMap.get(Integer.valueOf(i));
    }
    return array;
}
A LinkedHashSet can also be helpful in your case. You can keep adding elements to it; it will keep them in insertion order and will also hold only unique values.

Is creating a HashMap alongside an ArrayList just for constant-time contains() a valid strategy?

I've got an ArrayList that can be anywhere from 0 to 5000 items long (pretty big objects, too).
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
Is creating a HashMap alongside this ArrayList, to achieve constant-time lookup, a valid strategy here, in order to reduce the complexity to O(n)? Or is the overhead of another data structure simply not worth it? I believe it would take up no additional space (besides the references).
(I know, I'm sure 'it depends on what I'm doing', but I'm seriously wondering if there's any drawback that makes it pointless, or if it's actually a common strategy to use. And yes, I'm aware of the quote about prematurely optimizing. I'm just curious from a theoretical standpoint).
First of all, a short side note:
And yes, I'm aware of the quote about prematurely optimizing.
What you are asking about here is not "premature optimization"!
You are not talking about replacing a multiplication with some odd bitwise operations "because they are faster (on a 90's PC, in a C-program)". You are thinking about the right data structure for your application pattern. You are considering the application cases (though you did not tell us many details about them). And you are considering the implications that the choice of a certain data structure will have on the asymptotic running time of your algorithms. This is planning, or maybe engineering, but not "premature optimization".
That being said, and to tell you what you already know: It depends.
To elaborate this a bit: it depends on the actual operations (methods) that you perform on these collections, how frequently you perform them, how time-critical they are, and how memory-sensitive the application is.
(For 5000 elements, the latter should not be a problem, as only references are stored - see the discussion in the comments)
In general, I'd also be hesitant to really store the Set alongside the List, if they are always supposed to contain the same elements. This wording is intentional: You should always be aware of the differences between both collections. Primarily: A Set can contain each element only once, whereas a List may contain the same element multiple times.
For all hints, recommendations and considerations, this should be kept in mind.
But even if it is taken for granted that the lists will always contain each element only once in your case, you still have to make sure that both collections are maintained properly. If you really just stored both, you could easily cause subtle bugs:
private Set<T> set = new HashSet<T>();
private List<T> list = new ArrayList<T>();

// Fine
void add(T element)
{
    set.add(element);
    list.add(element);
}

// Fine
void remove(T element)
{
    set.remove(element);
    list.remove(element); // May be expensive, but ... well
}

// Added later, 100 lines below the other methods:
void removeAll(Collection<T> elements)
{
    set.removeAll(elements);
    // Ooops - something's missing here...
}
To avoid this, one could even consider creating a dedicated collection class - something like a FastContainsList that combines a Set and a List and forwards the contains call to the Set. But you'll quickly notice that it will be hard (or maybe impossible) not to violate the contracts of the Collection and List interfaces with such a collection, unless the clause that "you may not add elements twice" becomes part of the contract...
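Purely as an illustration, a minimal sketch of that idea (not a real java.util.List implementation, and the no-duplicates restriction is exactly the contract problem just described):
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FastContainsList<T> {

    private final List<T> list = new ArrayList<>();
    private final Set<T> set = new HashSet<>();

    // Rejects duplicates so the Set and the List cannot drift apart.
    public boolean add(T element) {
        if (!set.add(element)) {
            return false;
        }
        return list.add(element);
    }

    public boolean contains(T element) {
        return set.contains(element); // O(1) on average instead of O(n)
    }

    public T get(int index) {
        return list.get(index); // indexed access is preserved
    }
}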
So again, all this depends on what you want to do with these methods, and which interface you really need. If you don't need the indexed access of List, then it's easy. Otherwise, referring to your example:
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
You can avoid this by creating the sets locally:
static <T> List<T> computeIntersection(List<T> list0, List<T> list1)
{
Set<T> set0 = new LinkedHashSet<T>(list0);
Set<T> set1 = new LinkedHashSet<T>(list1);
set0.retainAll(set1);
return new ArrayList<T>(set0);
}
This will have a running time of O(n). Of course, if you do this frequently but rarely change the contents of the lists, there may be options to avoid the copies, but for the reason mentioned above, maintaining the required data structures may become tricky.
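For example (the values are arbitrary):
import java.util.Arrays;

List<Integer> a = Arrays.asList(1, 2, 3, 4);
List<Integer> b = Arrays.asList(3, 4, 5);
System.out.println(computeIntersection(a, b)); // prints [3, 4]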

Is there any advantage of using Value Object over Map in java

I have to store a product and its corresponding price.
Which of the below methods would be better:
Method 1:
public class Product {
    String productName;
    Integer productCost;
}
Method 2:
Map<String, Integer> product.
I assumed that using a Map would make things easier, but I was still advised to use a value object.
Is there any advantage of Method 1 over Method 2 in terms of memory and performance? In what situations should each method be used?
You should consider using a Map if you need to quickly look up the price of an item based on its name. The difference here is that this lookup operation would be O(n) if you choose the first method and O(1) if you choose the second (since with the first you have to go through your entire collection of Products to find the one with the right name).
If you don't have too many products you want to store (e.g. ~10, as you mentioned), then the performance difference will probably be negligible, and you might be better off choosing whichever approach is easier to understand/manage. As the number of products you want to store increases substantially, then the difference can become more apparent.
Of course, if you don't need this quick look up feature then it doesn't make much sense to use a Map at all.
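To make the difference concrete, a sketch of both lookups (this assumes Product gains a matching constructor; all values are made up):
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Method 2: price lookup by name is O(1) on average
Map<String, Integer> prices = new HashMap<>();
prices.put("apple", 3);
prices.put("bread", 2);
Integer applePrice = prices.get("apple");

// Method 1: finding a product by name means scanning the whole list, O(n)
List<Product> products = Arrays.asList(new Product("apple", 3), new Product("bread", 2));
Integer found = null;
for (Product p : products) {
    if (p.productName.equals("apple")) {
        found = p.productCost;
        break;
    }
}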
A value object is the recommended way if you want to keep the software quality high. With a Map you have to carry it around wherever the value of the object is needed, and consequently you end up creating a lot of copies of it. Getting the value out of a Map has an impact on performance as well, and the Map holds references to all the objects stored in it, preventing the garbage collector from sweeping them when they are no longer needed.

Efficient algorithm for filtering

I'm working on an application which is a service. I receive a request object and need to pass this object through a set of filters and return the response. There are about 10 filters I need to pass the object through.
Currently the application does a sequential scan in every filter, as follows:
public List<Element> FilterA(Request request) {
    List<Element> matches = new ArrayList<>(); // collect elements that pass
    for (Element element : items) {
        // compare element to the request object's fields;
        // different fields are checked per filter
    }
    return matches;
}
So there are FilterB, FilterC, etc.; they are all done in a similar fashion: within for loops, different fields are compared.
Can this be done via a HashSet? Or binary search?
Or is there a more efficient algorithm? Essentially I'd like to improve the O(n) to something less.
If you have a list of n elements and f filters, there are basically only two approaches: iterate through the list and apply each filter to each individual element (keep it if it passes all of them, remove it otherwise); or do what you're doing now and let each filter iterate over the entire list. Both have a worst-case complexity of O(n*f), assuming O(1) element removal (I recommend using a LinkedList to achieve this; copy the contents into one if necessary).
You can really only improve upon this complexity by exploiting properties of your input. Maybe you can combine multiple filters into one (when they're range checks, for instance), or maybe removing one element from the list will also result in the removal of others. Also, if you can guess which filters will probably remove more elements, it will pay off to run those first.
So yeah, it really depends on what kind of stuff you're filtering and what your filters look like. In the most general case you can't win much (as long as you're already using lists from which you can remove elements in O(1) time), but you might gain something if you take knowledge of your input into account.
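For reference, a sketch of the first approach (representing the filters as Predicates is my assumption; the original filters compare request fields):
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// One pass over the list; each element is tested against every filter.
// Iterator.remove is O(1) on a LinkedList, so the whole pass stays O(n*f).
static <T> void applyFilters(List<T> items, List<Predicate<T>> filters) {
    Iterator<T> it = items.iterator();
    while (it.hasNext()) {
        T element = it.next();
        for (Predicate<T> filter : filters) {
            if (!filter.test(element)) {
                it.remove(); // drop the element on the first failing filter
                break;
            }
        }
    }
}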

Optimal way of creating a SortedSet from a number of HashMap objects

I have a number of HashMap data structures containing hundreds of Comparable objects (say, of type MyClass) and need to put all the values (not the keys) in a single data structure, and then sort it.
Due to the volume and the arrival rate of MyClass objects, this procedure (performed at least once per millisecond) needs to be as efficient as possible.
An approach would be to use SortedSet, roughly as follows:
HashMap<String, MyClass>[] allMaps = ... // All the HashMaps
SortedSet<MyClass> set = new TreeSet<MyClass>();
Collection<MyClass> c;
for (HashMap<String, MyClass> m : allMaps)
{
    c = m.values();
    set.addAll(c);
}
It might be faster to pass an already-sorted collection to set.addAll(), instead of letting the TreeSet re-sort itself on every insertion (or after every few insertions). However, to do so, a List needs to be passed to Collections.sort(), which means a conversion from Collection to List has to take place, i.e. another performance hit has to be sustained.
Also, there may be another, more efficient way of achieving the same goal.
Comments?
I think the answer kinda depends on how the MyClass data tends to change. For example, if only a couple of new values come in per timeframe, you might want to consider keeping hold of the last returned sorted set, plus a copy of the previous keys, so that on the next run you can just apply the delta (i.e. find the new keys in the maps and manually insert their values into the sorted set you returned last time).
This algorithm varies a bit if MyClass objects can be removed from the maps. But the general thought is: to make it faster, you have to find a way to perform incremental changes instead of reprocessing the whole set every time.
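A rough sketch of the incremental idea (this assumes keys are unique across maps and that values are only ever added, never removed; all names are illustrative):
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

SortedSet<MyClass> sorted = new TreeSet<>();
Set<String> seenKeys = new HashSet<>();

void update(HashMap<String, MyClass>[] allMaps) {
    for (HashMap<String, MyClass> m : allMaps) {
        for (Map.Entry<String, MyClass> e : m.entrySet()) {
            if (seenKeys.add(e.getKey())) { // true only for keys not seen before
                sorted.add(e.getValue());   // O(log n) per new element
            }
        }
    }
}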
