Given a simple Set<T>, what is a good way (fast, few lines of code) to get any value from the Set?
With a List, it's easy:
List<T> things = ...;
return things.get(0);
But, with a Set, there is no .get(...) method because Sets are not ordered.
A Set<T> is an Iterable<T>, so iterating to the first element works:
Set<T> things = ...;
return things.iterator().next();
Guava has a method to do this, though the above snippet is likely better.
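If Guava is already on the classpath, the helper looks like this (a minimal sketch; Iterables.getFirst takes a default value for the empty case):
import com.google.common.collect.Iterables;

// returns the first element, or null if the set is empty
T any = Iterables.getFirst(things, null);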
Since Java 8 introduced streams, you can do it that way too, but you have to go through the class java.util.Optional. Optional is a wrapper class for either an element or explicitly no element (avoiding the NullPointerException).
// findAny() returns an Optional<T>
Optional<T> optT = set.stream().findAny();
// Optional.isPresent() yields false if the set was empty, avoiding a NullPointerException
if (optT.isPresent()) {
    // Optional.get() returns the actual element
    return optT.get();
}
Edit:
As I use Optional quite often myself: there is a way to access the element, or get a default in case it's not present:
optT.orElse(other) returns either the element or, if it is not present, other. Note that other may be null.
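For example, the whole lookup collapses into one line (using null as the fallback, purely for illustration):
T t = set.stream().findAny().orElse(null);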
Getting an arbitrary element from a Set or Collection may seem like an uncommon demand, but it is quite common when, for example, you need to calculate statistics over the key or value objects of a Map and must initialise min/max variables. An arbitrary element from the Set/Collection (returned by Map.keySet() or Map.values()) is used for this initialisation before updating min/max over each element.
So, what options does one have when faced with this problem while trying to keep memory and execution time small and the code clear?
Often you get the usual: "convert the Set to an ArrayList and get the first element". Great! Another array of millions of items and extra processing cycles to retrieve the objects from the Set, allocate the array and populate it:
HashMap<K, V> map;
List<K> list = new ArrayList<>(map.keySet()); // min/max of keys
min = max = list.get(0).some_property(); // initialisation step
for (int i = list.size(); i-- > 1; ) {
    if (min > list.get(i).some_property()) { ... }
    ...
}
Or one may loop with an Iterator, using a flag to denote that min/max still need initialising and a conditional check of that flag on every iteration of the loop. This implies a lot of conditional checking.
boolean flag = true;
Iterator<K> it = map.keySet().iterator();
while (it.hasNext()) {
    K akey = it.next();
    if (flag) {
        // initialisation step
        min = max = akey.some_property();
        flag = false;
    } else {
        if (min > akey.some_property()) { min = akey.some_property(); }
        ...
    }
}
Or do the initialisation outside the loop:
HashMap<K, V> map;
Iterator<K> it = map.keySet().iterator();
K akey;
if (it.hasNext()) {
    // initialisation step:
    akey = it.next();
    min = max = akey.some_property();
    while (it.hasNext()) {
        akey = it.next();
        if (min > akey.some_property()) { min = akey.some_property(); }
        ...
    }
}
But is it really worth all this manoeuvring on behalf of the programmer (and the Iterator set-up on behalf of the JVM) whenever min/max is needed?
The suggestion from a javally-correct ol' sport could well be: "wrap your Map in a class which keeps track of min and max values on put and delete!".
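For completeness, a minimal sketch of that suggestion (the names and the int-valued property are my assumptions; deletion is the hard part, since removing the current minimum forces a rescan, and is omitted here):
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper that tracks min/max of a numeric property on put.
class MinMaxMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private int min = Integer.MAX_VALUE;
    private int max = Integer.MIN_VALUE;

    void put(K key, V value, int property) {
        delegate.put(key, value);
        if (property < min) min = property;
        if (property > max) max = property;
    }

    int min() { return min; }
    int max() { return max; }
}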
There is another situation in which, in my experience, the need for just any item from a Map arises: when the map contains objects sharing a common property, the same for all of them in that map, and you need to read that property. For example, suppose there is a Map holding bins of the same histogram, all of which have the same number of dimensions. Given such a Map, you may need to know the number of dimensions of just any Histobin in it in order to, say, create another Histobin of the same dimensions. Do I need to set up an iterator again and dispose of it after calling next() just once? I will skip the javally-correct person's suggestion for this situation.
And even if all the trouble of getting the any element costs only an insignificant increase in memory and CPU cycles, what about all the code one has to write just to get the hard-to-get any element?
We need the any element. Give it to us!
I would like to discuss a bit of performance of a particular collection, LinkedHashMap, for a particular requirement and how Java 8 or 9 new features could help on that.
Let's suppose I have the following LinkedHashMap:
private Map<Product, Item> items = new LinkedHashMap<>();
Using the default constructor means this Map follows the insertion-order when it is iterated.
--EDITED--
Just to be clear here: I understand that Maps are not the right data structure to be accessed by index. It happens that this class actually needs two remove methods: one by Product (the right way, since it is the key), and the other by position, or index, which is not common, hence my concern about performance. BTW, it's not MY requirement.
I have to implement a removeItem() method by index. For those who don't know: a LinkedHashMap doesn't have anything like a map.get(index) method available.
So I will list a couple of solutions:
Solution 1:
public boolean removeItem(int position) {
    List<Product> orderedList = new ArrayList<>(items.keySet());
    Product key = orderedList.get(position);
    return items.remove(key) != null;
}
Solution 2:
public boolean removeItem(int position) {
    int counter = 0;
    Product key = null; // assuming there are no null keys
    for (Map.Entry<Product, Item> entry : items.entrySet()) {
        if (counter == position) {
            key = entry.getKey();
            break;
        }
        counter++;
    }
    return items.remove(key) != null;
}
Considerations about these 2 solutions.
S1: I understand that ArrayLists have fast iteration and access, so I believe the problem here is that a whole new collection is being created, so the memory would be compromised if I had a huge collection.
S2: I understand that LinkedHashMap iteration is faster than a HashMap but not as fast as an ArrayList, so I believe the time of iteration here would be compromised if we had a huge collection, but not the memory.
Considering all of that, and that my considerations are correct, can I say that both solutions have O(n) complexity?
Is there a better solution for this case in terms of performance, using the latest features of Java 8 or 9?
Cheers!
As said by Stephen C, the time complexity is the same: in either case you have a linear iteration. But the efficiency still differs, as the second variant only iterates up to the specified element instead of creating a complete copy.
You could optimize this even further by not performing an additional lookup after finding the entry. To use the pointer to the actual location within the Map, you have to make the use of its Iterator explicit:
public boolean removeItem(int position) {
    if (position >= items.size()) return false;
    Iterator<?> it = items.values().iterator();
    for (int counter = 0; counter < position; counter++) it.next();
    boolean result = it.next() != null;
    it.remove();
    return result;
}
This follows the logic of your original code to return false if the key was mapped to null. If you never have null values in the map, you could simplify the logic:
public boolean removeItem(int position) {
    if (position >= items.size()) return false;
    Iterator<?> it = items.entrySet().iterator();
    for (int counter = 0; counter <= position; counter++) it.next();
    it.remove();
    return true;
}
You may retrieve a particular element using the Stream API, but the subsequent remove operation requires a lookup, which for most implementations makes it less efficient than calling remove on an iterator that already has a reference to the position in the map.
public boolean removeItem(int position) {
    if (position >= items.size() || position < 0)
        return false;
    Product key = items.keySet().stream()
        .skip(position)
        .findFirst()
        .get();
    items.remove(key);
    return true;
}
Considering all of that, and that my considerations are correct, can I say that both solutions have O(n) complexity?
Yes. The average complexity is the same.
In the first solution, the new ArrayList<>(keySet) step is O(N).
In the second solution the loop is O(N).
There is difference in the best case complexity though. In the first solution you always copy the entire list. In the second solution, you only iterate as far as you need to. So the best case is that it can stop iterating at the first element.
But while the average complexity is O(N) in both cases, my gut feeling is that the second solution will be fastest. (If it matters to you, benchmark it ...)
Is there a better solution for this case in terms of performance, using the latest features of Java 8 or 9?
Java 8 and Java 9 don't offer any performance improvements.
If you want better that O(N) average complexity, you will need a different data structure.
The other thing to note is that indexing the Map's entry sets is generally not a useful thing to do. Whenever an entry is removed from the set, the index values for some of the other entries change ....
Mimicking this "unstable" indexing behavior efficiently is difficult. If you want stable behavior, then you can augment your primary HashMap<K,V> / LinkedHashMap<K,V> with a HashMap<Integer,K> which you use for positional lookup / insertion / retrieval. But even that is a bit awkward ... considering what happens if you need to insert a new entry between entries at positions i and i + 1.
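A minimal sketch of that augmentation (append-only; names are mine; positions stay stable because removal leaves a gap in the index instead of shifting the remaining entries):
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class IndexedMap<K, V> {
    private final Map<K, V> map = new LinkedHashMap<>();
    private final Map<Integer, K> index = new HashMap<>();
    private int nextPos = 0;

    void put(K key, V value) {
        if (!map.containsKey(key)) {
            index.put(nextPos++, key); // assign a stable position on first insert
        }
        map.put(key, value);
    }

    boolean removeByPosition(int pos) {
        K key = index.remove(pos);
        return key != null && map.remove(key) != null;
    }
}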
Below is a simple for loop I am using to try to find the repeated IDs in an ArrayList. The problem is that it only checks one index to the right, so quite clearly, if the same ID is two, three or even four indexes across, it will miss it and not report it as a repeated ID.
Obviously the goal of this code is to move through each index of the ArrayList, get the ID, and check if there are any other identical IDs.
Note: in the code below, arraylist is just that, an ArrayList; the getId method simply returns the user ID for that array object.
for (int i = 0; i < arraylist.size() - 1; i++) {
    if (arraylist.get(i).getId() == arraylist.get(i + 1).getId()) {
        System.out.println(arraylist.get(i).getId());
    }
}
What I've tried, and keep coming back to, is two nested for loops: one for iterating through the ArrayList and one for iterating through an array of user IDs. The plan was to check whether the current ArrayList ID matched any ID in the array of 'pure' IDs, and if it didn't, add it to the array of pure IDs. It would look something like this in pseudocode:
for i <- 0; i < arraylist size - 1; i++
    for j <- 0; j < pureArray size; j++
        if arraylist.getId(i) != pureArray[j] then
            increment pureArray size by one
            add arraylist.getId(i) to pureArray
In practice, perhaps due to my poor coding, this did not work.
So, any opinions on how I can iterate completely through my ArrayList, then check and report which of the IDs have multiple entries?
Thank you.
Looking at leifg's answer on this similar question, you can use two sets, one for duplicates and one for everything else, and use Set#add(E), which "returns true if this set did not already contain the specified element", to determine whether or not the element is a duplicate. All you have to do is change the sets' generics and what you add to them:
public Set<Integer> findDuplicates(List<MyObject> listContainingDuplicates)
{
    // Assuming your ID is of type int
    final Set<Integer> setToReturn = new HashSet<>();
    final Set<Integer> set1 = new HashSet<>();

    for (MyObject object : listContainingDuplicates)
    {
        if (!set1.add(object.getID()))
        {
            setToReturn.add(object.getID());
        }
    }
    return setToReturn;
}
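Hypothetical usage (MyObject and getID() as in your question):
List<MyObject> listContainingDuplicates = ...; // your input list
Set<Integer> duplicateIds = findDuplicates(listContainingDuplicates);
System.out.println("Duplicate IDs: " + duplicateIds);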
For the purpose of finding duplicates, a nested for loop should do the job; see the code below. One more thing to consider is what exactly you expect this nested loop to do.
Regarding your pseudocode:
for i <- 0; i < arraylist size; i++
    for j <- i+1; j < arraylist size; j++
        if arraylist.getId(i) != arraylist.getId(j) then
            add arraylist.getId(i) to pureArray
1) Regarding j <- i+1: you do not want to compare the same pair many times. With this set-up you compare the first element with the others, then move to the second and compare it to the rest (not including the first, because you already did that comparison), and so on.
2) Growing your array by one on every single iteration is highly impractical, as you will need to allocate a new array and copy the old one over each time. Rather make sure the array is big enough initially, or use another data structure, such as a second ArrayList, or just a String.
Here is a small demo of what I mean; just a quick test, far from perfect.
import java.util.ArrayList;

public class Main {
    public static void main(String[] args) {
        // create a test list with ID strings
        ArrayList<String> test = new ArrayList<>();
        test.add("123");
        test.add("234");
        test.add("123");
        test.add("123");

        String duplicates = "";
        for (int i = 0; i < test.size(); i++) {
            for (int j = i + 1; j < test.size(); j++) {
                // if the values are equal AND the current value is not already
                // part of the duplicates string, then add it to the duplicates string
                if (test.get(i).equals(test.get(j)) && !duplicates.contains(test.get(j))) {
                    duplicates += " " + test.get(j);
                }
            }
        }
        System.out.println(duplicates);
    }
}
Purely for the purpose of finding duplicates, you can also create a HashSet and iteratively add the objects (IDs in your case) to it using the add(e) method.
The trick with HashSet is that it does not allow duplicate values, and add(e) will return false if the same value is already present.
But be careful what values (objects) you give to the add() method, since it uses hashCode() and equals() to compare whatever you feed it. It works out of the box if you pass Strings as values.
But if you give it your own objects, make sure you override equals() and hashCode() in that object's class definition, because that's what the add() method uses to compare objects.
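For instance, a hypothetical element class whose equals() and hashCode() are both based on the ID, so that HashSet#add can detect duplicate users:
class User {
    final int id;

    User(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof User && ((User) o).id == this.id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}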
I'm iterating through a huge file, reading a key and a value from every line. I need to obtain a specific number (say 100k) of elements with the highest values. To store them, I figured I need a collection that lets me check the minimum value in O(1) or O(log(n)) and, if the currently read value is higher, remove the element with the minimum value and insert the new one. What collection enables me to do that? Values are not unique, so a BiMap is probably not adequate here.
EDIT:
The ultimate goal is to obtain the best [key, value] pairs, which will be used later. Say my file looks like the one below (first column - key, second - value):
3 6
5 9
2 7
1 6
4 5
Let's assume I'm looking for the best two elements and an algorithm to achieve that. I figured I'd use a key-based collection to store the best elements. The first two elements (<3, 6>, <5, 9>) will obviously be added to the collection, as its capacity is 2. But when I get to the third line, I need to check whether <2, 7> is eligible to be added to the collection (so I need to be able to check whether 7 is higher than the minimum value in the collection, which is 6).
It sounds like you don't actually need a structure because you are simply looking for the largest N values with their corresponding keys, and the keys are not actually used for sorting or retrieval for the purpose of this problem.
I would use the PriorityQueue, with the minimum value at the root. This allows you to retrieve the smallest element in constant time, and if your next value is larger, removal and insertion in O(log N) time.
import java.util.Comparator;

class V {
    int key;
    int value;
}

class ComparatorV implements Comparator<V> {
    @Override
    public int compare(V a, V b) {
        return Integer.compare(a.value, b.value);
    }
}
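A minimal sketch of how the queue would be used under these assumptions (V and ComparatorV from above; the TopN wrapper and its names are mine):
import java.util.PriorityQueue;

class TopN {
    final int n; // number of best elements to keep, e.g. 100_000
    final PriorityQueue<V> heap = new PriorityQueue<>(new ComparatorV());

    TopN(int n) { this.n = n; }

    // call once per (key, value) line read from the file
    void offer(V next) {
        if (heap.size() < n) {
            heap.add(next);                         // still filling: O(log n)
        } else if (next.value > heap.peek().value) {
            heap.poll();                            // evict current minimum: O(log n)
            heap.add(next);
        }
    }
}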
For your specific situation, you can use a TreeSet, and to get around the uniqueness of elements in a set you can store pairs which are comparable but never appear equal when compared. This lets you work around the Set contract, which specifies that a Set must not contain equal values.
The documentation for TreeSet contains:
The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface
So using the TreeSet with the Comparable inconsistent with equals should be fine in this situation. If you ever need to compare your chess pairs for a different reason (perhaps some other algorithm you are also running in this app) where the comparison should be consistent with equals, then provide a Comparator for the other use. Notice that TreeSet has a constructor which takes a Comparator, so you can use that instead of having ChessPair implement Comparable.
Notice: A TreeSet provides more flexibility than a PriorityQueue in general because of all of its utility methods, but by violating the "comparable consistent with equals" contract of Set, some of the functionality of the TreeSet is lost. For example, you can still remove the first element of the set using TreeSet.pollFirst, but you cannot remove an arbitrary element using remove, since that relies on elements being equal.
Per your "n or at worst log(n)" requirement, the documentation also states:
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
Also, I provide an optimization below which reduces the minimum-value query to O(1).
Example
Set<ChessPair> s = new TreeSet<>();
and
public class ChessPair implements Comparable<ChessPair>
{
    final int location;
    final int value;

    public ChessPair(final int location, final int value)
    {
        this.location = location;
        this.value = value;
    }

    @Override
    public int compareTo(ChessPair o)
    {
        if (value < o.value) return -1;
        return 1;
    }
}
Now you have an ordered set containing your pairs of numbers, ordered by value; you can have duplicate values, and you can get the associated locations. You can also easily grab the first element (set.first()), the last (set.last()), get a sub-set (set.subSet(a, b)), or iterate over the first (or last, using descendingSet) n elements. This provides everything you asked for.
Example Use
You specified wanting to keep the 100 000 best elements. So I would use one algorithm for the first 100 000 possibilities which simply adds every time.
for (int i = 0; i < 100000 && dataSource.hasNext(); i += 1)
{
    ChessPair p = dataSource.next(); // or whatever you do to get the next line
    set.add(p);
}
and then a different one after that
while (dataSource.hasNext())
{
    ChessPair p = dataSource.next();
    if (p.value > set.first().value)
    {
        set.pollFirst(); // removes the current minimum
        set.add(p);
    }
}
Optimization
In your case, you can insert an optimization into the algorithm where you compare against the lowest value. The above, simple version performs an O(log(n)) operation every time it compares against minimum-value since set.first() is O(log(n)). Instead, you can store the minimum value in a local variable.
This optimization works well for scaling this algorithm because the impact is negligible - no gain, no loss - when n is close to the total data set size (ie: you want best 100 values out of 110), but when the total data set is vastly larger than n (ie: best 100 000 out of 100 000 000 000) the query for the minimum value is going to be your most common operation and will now be constant.
So now we have (after loading the initial n values)...
int minimum = set.first().value;
while (dataSource.hasNext())
{
    ChessPair p = dataSource.next();
    if (p.value > minimum)
    {
        set.pollFirst(); // removes the current minimum
        set.add(p);
        minimum = set.first().value;
    }
}
Now your most common operation, querying the minimum value, is constant time (O(1)); your second most common operation, add, is worst case O(log(n)); and your least common operation, remove, is worst case O(log(n)).
For arbitrarily large data sets, almost every input is now processed in constant O(1) time, since the vast majority of values fail the minimum check and never touch the tree.
See java.util.TreeSet
Previous answer (now obsolete)
Based on question edits and discussion in the question's comments, I no longer believe my original answer to be correct. I am leaving it below for reference.
If you want a Map collection which allows fast access to elements based on order, then you want an ordered Map, for which there is a sub-interface SortedMap. Fortunately for you, Java has a great implementation of SortedMap: it's TreeMap, a Map which is backed by a "red-black" tree structure which is an ordered tree.
Red-black-trees are nice since they rotate branches in order to keep the tree balanced. That is, you will not end up with a tree that branches n times in one direction, yielding n layers, just because your data may already have been sorted. You are guaranteed to have approximately log(n) layers in the tree, so it is always fast and guarantees log(n) query even for worst-case.
For your situation, try out the java.util.TreeMap. On the page linked in the previous sentence, there are links also to Map and SortedMap. You should check out the one for SortedMap too, so you can see where TreeMap gets some of the specific functionality that you are looking for. It allows you to get the first key, the last key, and a sub-map that fetches a range from within this map.
For your situation though, it is probably sufficient to just grab an iterator from the TreeMap and iterate over the first n pairs, where n is the number of lowest (or highest) values that you want.
Use a TreeSet, which offers O(log n) insertion and O(1) retrieval of either the highest or lowest scored item.
Your item class must:
Implement Comparable
Not implement equals()
To keep the top 100K items only, use this code:
Item item; // to add

if (treeSet.size() == 100_000) {
    if (treeSet.first().compareTo(item) < 0) {
        treeSet.remove(treeSet.first());
        treeSet.add(item);
    }
} else {
    treeSet.add(item);
}
If you want a collection ordered by values, you can use a TreeSet which stores tuples of your keys and values. A TreeSet has O(log(n)) access times.
class KeyValuePair<Key, Value extends Comparable<Value>>
        implements Comparable<KeyValuePair<Key, Value>> {

    Key key;
    Value value;

    KeyValuePair(Key key, Value value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public int compareTo(KeyValuePair<Key, Value> other) {
        return this.value.compareTo(other.value);
    }
}
or instead of implementing Comparable, you can pass a Comparator to the set at creation time.
You can then retrieve the first value using treeSet.first().value.
Something like this?
An entry for your data structure that can be sorted based on the value:
class Entry implements Comparable<Entry> {
    public final String key;
    public final long value;

    public Entry(String key, long value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public int compareTo(Entry other) {
        return Long.compare(this.value, other.value);
    }

    @Override
    public int hashCode() {
        // hashcode based on the same values on which equals works
        return java.util.Objects.hash(key, value);
    }
}
And the actual code that works with a PriorityQueue. The sorting is based on the value, not on the key as it would be with a TreeMap, because of the compareTo method defined in Entry. If the queue grows beyond 100,000 entries, the lowest entry (the one with the lowest value) is removed.
public class ProcessData {
    private final int maxSize;
    private final PriorityQueue<Entry> largestEntries;

    public ProcessData(int maxSize) {
        this.maxSize = maxSize;
        // initialise the queue here, after maxSize is known
        this.largestEntries = new PriorityQueue<>(maxSize);
    }

    public void addKeyValue(String key, long value) {
        largestEntries.add(new Entry(key, value));
        if (largestEntries.size() > maxSize) {
            largestEntries.poll(); // evicts the entry with the smallest value
        }
    }
}
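Hypothetical usage, feeding the example lines from the question (keeping the best two):
ProcessData pd = new ProcessData(2);
pd.addKeyValue("3", 6);
pd.addKeyValue("5", 9);
pd.addKeyValue("2", 7); // queue exceeds 2, so ("3", 6), the smallest, is evicted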
I have a scenario in which I have a method that returns results as an ArrayList, in a form like the one shown in the picture below.
So, as a brief explanation of the picture: I will get Result 1 as the first chunk of objects, then Result 2, which actually contains Result 1 plus a new set of objects, and so on.
Note: all these chunks of objects will contain duplicates, so I will have to filter those out.
My aim is to create one single list out of these chunks without any duplicates, and with only one object from each family (one special characteristic of these objects).
Please find below the current code snippet, used in the synchronized method I call whenever I get a chunk of results, with which I am implementing this.
On every result update, this method is called with the result ArrayList.
private synchronized void processRequestResult(QueryResult result)
{
    ArrayList currArrayList = result.getResultsList();
    ArrayList tempArrayList = result.getResultsList();

    /**
     * Remove all elements in prevArrayList from currArrayList.
     *
     * As per the javadocs, this takes each record of currArrayList and compares
     * it with each record of prevArrayList; if it finds both equal, it removes
     * the record from currArrayList.
     *
     * The problem is that it's easily of n-squared complexity.
     */
    currArrayList.removeAll(prevArrayList);

    // Clone and keep the current list for dealing with the next list
    prevArrayList = (ArrayList) tempArrayList.clone();

    for (int i = 0; i < currArrayList.size(); i++)
    {
        Object resultObject = currArrayList.get(i);

        // Check whether we reached the max number of items to be displayed in the list.
        if (hashMap.size() >= MAX_RESULT_LIMIT)
        {
            // Stop my requests
            // Launch message
            break;
        }

        // Check whether it is of the same family or a duplicate
        if (resultObject instanceof X)
        {
            final Integer key = Integer.valueOf(((X) resultObject).familyID);
            hashMap.put(key, (X) resultObject);
        }
        else if (resultObject instanceof Y)
        {
            final Integer key = Integer.valueOf(((Y) resultObject).familyID);
            hashMap.put(key, (Y) resultObject);
        }
    }

    // Convert the HashMap values to an ArrayList
    allResultsList = new ArrayList(hashMap.values());

    // Update the change to the screen
}
In theory, I should only have to parse the delta objects in each result I receive. So I went for the removeAll method of ArrayList, and then check for duplicates and same-family objects using a HashMap.
Please see my inline comments in the code; because of those, I would like some pointers on improving the performance of this process.
Update:
The special characteristic of these objects is that a set of objects can belong to the same family (an ID), so only one object from each family should be present in the final list.
That was the reason why I used a HashMap and made the familyID the key.
I don't understand the diagram or the code, but I'm assuming the requirement is to create a List of elements that are unique.
Firstly, a Set is really what you need:
Set<MyClass> set = new HashSet<MyClass>();
Every time you get a new list of results:
set.addAll(list);
If you really need a List:
List<MyClass> list = new ArrayList<MyClass>(set);
Let's say I have a List. There is no problem modifying a list's items in a for loop:
for (int i = 0; i < list.size(); i++) {
    list.get(i).setId(i);
}
But I have a SortedSet instead of a list. How can I do the same with it?
Thank you
First of all, Set assumes that its elements are immutable (actually, mutable elements are permitted, but they must adhere to a very specific contract, which I doubt your class does).
This means that generally you can't modify a set element in-place like you're doing with the list.
The two basic operations that a Set supports are the addition and removal of elements. A modification can be thought of as a removal of the old element followed by the addition of the new one:
You can take care of the removals while you're iterating, by using Iterator.remove();
You could accumulate the additions in a separate container and call Set.addAll() at the end.
You cannot modify a set element's key, because doing so causes rehashing/reordering of the set, so it is undefined how the iteration will proceed.
You can remove elements using iterator.remove(), but you cannot add elements during iteration; the usual better solution is to accumulate them in a new collection and addAll them after the iteration.
Set<Object> mySet = ...;
List<Object> newElems = new ArrayList<>();

for (Iterator<Object> it = mySet.iterator(); it.hasNext(); )
{
    Object elem = it.next();
    if (...)
        newElems.add(...);
    else if (...)
        it.remove();
    ...
}
mySet.addAll(newElems);
Since Java 1.6, you're able to use a NavigableSet.
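A quick sketch of what NavigableSet adds on top of SortedSet (the methods are from the JDK; the values here are arbitrary):
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

NavigableSet<Integer> ns = new TreeSet<>(Arrays.asList(3, 1, 2));
Integer lowest = ns.pollFirst(); // removes and returns 1
Integer atLeast2 = ns.ceiling(2); // smallest element >= 2, here 2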
You should use an Iterator or better still the enhanced for-loop syntax (which depends on the class implementing the Iterable interface), irrespective of the Collection you're using. This abstracts away the mechanism used to traverse the collection and allows a new implementation to be substituted in without affecting the iteration routine.
For example:
Set<Foo> set = ...

// Enhanced for-loop syntax
for (Foo foo : set) {
    // ...
}

// Iterator approach
Iterator<Foo> it = set.iterator();
while (it.hasNext()) {
    Foo foo = it.next();
}
EDIT
Kan makes a good point about modifying the item's key. Assuming that your class's equals() and hashCode() methods are based solely on the "id" attribute (which you're changing), the safest approach is to explicitly remove each element from the Set as you iterate and add it to an "output" Set; e.g.
SortedSet<Foo> input = ...
SortedSet<Foo> output = new TreeSet<Foo>();

Iterator<Foo> it = input.iterator();
while (it.hasNext()) {
    Foo foo = it.next();
    it.remove(); // Remove from input set before updating the ID.
    foo.setId(1);
    output.add(foo); // Add to output set.
}
You cannot do that reliably. You may try; maybe you'll succeed, maybe you'll get a ConcurrentModificationException. It's very important to remember that modifying elements while iterating may have unexpected results. Instead, collect those elements in some collection and modify them one by one after the iteration.
This will only work if id is not used for equals or by the comparator you used for the sorted set:
int counter = 0;
for (ElementFoo e : set) {
    e.setId(counter);
    counter++;
}