I have a program that collects objects over time. Those objects are often, but not always, duplicates of objects the program has already received. The number of unique objects can sometimes reach the tens of thousands. As my lists grow, it takes longer to identify whether an object has appeared before.
My current method is to store everything in an ArrayList, al; sort it with Collections.sort(al); and use Collections.binarySearch(al, key) to determine whether I've already seen an object. However, every time I come across a new object, I have to insert it and sort again.
I'm wondering if there's a better way to do this. Using contains() slows down too quickly. I'm looking for something as close to O(1) as possible.
Thanks much.
This is Java. To illustrate what I mean, I basically need a method that does this:
public boolean objectAlreadyUsed(Object o) {
    return ...; // Have we seen this object already?
}
Instead of using an ArrayList, why wouldn't you use a Set implementation (likely a HashSet)? You'll get constant-time lookup, no sorting needed.
N.B. your objects will need to correctly override hashCode() and equals().
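As a rough illustration of what "correctly override" means, here is a minimal sketch assuming a hypothetical value class Point whose identity is just its two coordinates. The key rule is that equal objects must return equal hash codes; otherwise the HashSet looks in the wrong bucket.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: two Points are "the same" when x and y match.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Consistent with equals: equal points always produce equal hash codes.
        return Objects.hash(x, y);
    }
}

class PointDemo {
    static boolean demo() {
        Set<Point> seen = new HashSet<>();
        seen.add(new Point(1, 2));
        // A different instance with the same coordinates is found.
        return seen.contains(new Point(1, 2));
    }
}
```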
This raises the question: why not use a data structure that doesn't allow duplicates (e.g. a Set)? If you attempt to add a duplicate item, the add method returns false and the data structure remains unchanged.
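Because add() already reports whether the element was new, the question's objectAlreadyUsed method can be a one-liner over a HashSet. A minimal sketch, assuming the objects implement equals() and hashCode() properly; the class name SeenTracker is hypothetical, and note that this version also records the object the first time it is seen.

```java
import java.util.HashSet;
import java.util.Set;

class SeenTracker {
    // Everything received so far; relies on the objects' equals()/hashCode().
    private final Set<Object> seen = new HashSet<>();

    /** Returns true if an equal object was already recorded; records it otherwise. */
    boolean objectAlreadyUsed(Object o) {
        // Set.add returns false (and leaves the set unchanged) for a duplicate.
        return !seen.add(o);
    }
}
```

Each call does a single hash lookup, so the cost stays roughly constant as the set grows instead of requiring a re-sort on every insertion.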
Make sure the objects have correct equals() and hashCode() methods, and store them in a HashSet. Lookup then becomes constant time.
If retaining unwanted objects becomes an issue, by the way, you could consider using one of the many WeakHashSet implementations available on the Internet -- it will hold the objects but still allow them to be garbage collected if necessary.
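The JDK itself has no WeakHashSet class, but a weak set can be built from WeakHashMap with Collections.newSetFromMap. This is a swapped-in technique rather than one of the third-party implementations mentioned above; the helper name WeakSeen is hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

class WeakSeen {
    // A set view backed by WeakHashMap: an element can be garbage collected
    // once nothing else references it, and it then disappears from the set.
    static <T> Set<T> newWeakSet() {
        return Collections.newSetFromMap(new WeakHashMap<T, Boolean>());
    }
}
```

One consequence worth keeping in mind: once an element has been collected, a later object equal to it will be treated as new. Like HashSet, this set is not synchronized.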
I have a large list (about 12,000 objects) of custom objects in which I have to search for a specific object a number of times. At the moment I am mostly using brute force to find the object, but it becomes extremely slow as the list grows larger. This is how I search now:
List<MyObject> objectsToSearch; // List containing about 12,000 objects
MyObject objectToCompare = new MyObject("this is a parameter"); // Object to compare with the list
for (MyObject compareFrom : objectsToSearch) {
    if (compareFrom.equals(objectToCompare)) {
        System.out.println("Object found");
    }
}
Surely there must be a better way to achieve this. Performance is especially important because I need to perform this operation many times.
Despite my research, I haven't found a detailed tutorial. How do I achieve this?
Make your class MyObject implement Comparable (or write a Comparator for it), sort the list once using Collections.sort(), and then you can apply Collections.binarySearch().
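A minimal sketch of the sort-once, search-many approach, assuming a hypothetical MyObject that is identified by a single String field. The list is sorted once up front; each later lookup is then O(log n).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical MyObject: ordered and compared by its single String field.
final class MyObject implements Comparable<MyObject> {
    private final String name;

    MyObject(String name) {
        this.name = name;
    }

    @Override
    public int compareTo(MyObject other) {
        return name.compareTo(other.name);
    }

    // Keep equals/hashCode consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        return o instanceof MyObject && name.equals(((MyObject) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class SortedSearch {
    // Precondition: the list was sorted with Collections.sort() and has not
    // been modified since; otherwise binarySearch's result is undefined.
    static boolean isPresent(List<MyObject> sortedObjects, MyObject key) {
        // A non-negative index means the key was found.
        return Collections.binarySearch(sortedObjects, key) >= 0;
    }

    static boolean demo() {
        List<MyObject> objects = new ArrayList<>();
        objects.add(new MyObject("pear"));
        objects.add(new MyObject("apple"));
        objects.add(new MyObject("fig"));
        Collections.sort(objects); // sort once, O(n log n)
        return isPresent(objects, new MyObject("fig"))
                && !isPresent(objects, new MyObject("kiwi"));
    }
}
```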
Are you sure that you really need a List? It seems that you are just checking the object for presence. If you don't need to keep the order of the objects and if it's sufficient to keep each object only once, consider using a Set instead, most probably HashSet.
Do not forget to implement equals() and hashCode() for the objects you store otherwise HashSet will not work.
If you have implemented the equals method for your custom class correctly, you can use the contains(Object o) method. Note that on a List this is still a linear scan; it just saves you writing the loop.
if (objectsToSearch.contains(objectToCompare)) {
    System.out.println("Object found");
}
I've got one question. What happens when I try to add the "same" object twice to an ArrayList? By "the same" I mean an object of an individual class which equals() and hashCode() consider to be the same. It has different values for most of the member variables and may have been created by different threads, but for equals() and hashCode() it's the "same".
Does the second object then replace the first object?
Also, what happens if two threads try to add those objects exactly at the same time to the ArrayList? Is this even possible? If yes, what happens?
Thank you! :-)
[EDIT] Thanks for all the answers! Should I use synchronizedList then, rather than using "synchronized(list){}"? I read the docs: even with synchronizedList, synchronized(list) must be used when iterating.
[EDIT2]
Can a synchronizedList be declared as a member variable? I tried, but it didn't work.
No, ArrayList doesn't attempt to detect duplicates at all - you can have an ArrayList with the exact same reference appearing multiple times. If you want a collection to avoid duplicates, you need a Set implementation - and if you also want to preserve insertion order, you probably want LinkedHashSet.
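A small sketch of the difference: an ArrayList stores the same reference twice without complaint, while a LinkedHashSet rejects the equal element and keeps the original insertion order. The class name DuplicateDemo is just for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DuplicateDemo {
    static List<String> listWithDuplicates() {
        List<String> list = new ArrayList<>();
        String s = "foo";
        list.add(s);
        list.add(s); // no duplicate check: the same reference is stored twice
        return list;
    }

    static Set<String> setWithoutDuplicates() {
        Set<String> set = new LinkedHashSet<>();
        set.add("foo");
        set.add("bar");
        set.add("foo"); // returns false; the set is left unchanged
        return set;     // iteration order is insertion order: foo, bar
    }
}
```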
Note, however, that without locking ArrayList should not be mutated from multiple threads in the first place - it's simply not meant to be a thread-safe collection in that way. Several threads can read from an ArrayList without synchronization, but not mutate it. From the docs:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list
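Applied to a member variable, the wrap-at-creation advice could look like the sketch below (the class Registry and its methods are hypothetical). Single calls such as add() are synchronized by the wrapper, but iteration involves many calls, so it still needs an explicit synchronized block on the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Registry {
    // Wrapped at creation time; the raw ArrayList is never reachable directly.
    // The field type is List, since synchronizedList returns a List.
    private final List<String> items = Collections.synchronizedList(new ArrayList<>());

    void add(String item) {
        items.add(item); // a single call is synchronized by the wrapper
    }

    int totalLength() {
        int total = 0;
        // Iteration spans many calls, so hold the list's own lock throughout.
        synchronized (items) {
            for (String item : items) {
                total += item.length();
            }
        }
        return total;
    }
}
```

If a synchronizedList field "doesn't work", one plausible cause (a guess, since the failing code isn't shown) is declaring the field as ArrayList: Collections.synchronizedList returns a List, so assigning it to an ArrayList-typed field does not compile.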
If you want to mutate a collection from multiple threads without locking, I suggest you look at the collections in java.util.concurrent.
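One example from java.util.concurrent that also covers the duplicate question is CopyOnWriteArrayList, whose addIfAbsent method performs the check and the add atomically. A minimal sketch (the wrapper class name is hypothetical); since every write copies the backing array, this suits lists that are read much more often than they are modified.

```java
import java.util.concurrent.CopyOnWriteArrayList;

class ConcurrentNoDuplicates {
    // Thread-safe without external locking; each write copies the array.
    private final CopyOnWriteArrayList<String> items = new CopyOnWriteArrayList<>();

    boolean addIfNew(String item) {
        // Atomic check-then-add: returns false if an equal element is present.
        return items.addIfAbsent(item);
    }

    int size() {
        return items.size();
    }
}
```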
Does the second object then replace the first object?
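No. ArrayList doesn't check for duplicates, so the second object is simply added as well. To avoid that, most developers check explicitly before adding:
if (!list.contains(foo)) {
    list.add(foo);
}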
Also, what happens if two threads try to add those objects exactly at the same time to the ArrayList? Is this even possible? If yes, what happens?
Yes, this is possible. If multiple threads write to or read from the same ArrayList, use the synchronized keyword whenever you access the list:
public List<Foo> getFoos() {
    synchronized (list) {
        // Return a copy: returning the list itself would let callers
        // read or modify it outside the lock.
        return new ArrayList<>(list);
    }
}
public void addFoo(Foo foo) {
    synchronized (list) {
        list.add(foo);
    }
}
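When the duplicate check and the locking are combined, the contains() call and the add() must happen under the same lock; otherwise two threads can both see "absent" and both add. A minimal sketch with a hypothetical generic FooStore class:

```java
import java.util.ArrayList;
import java.util.List;

class FooStore<T> {
    private final List<T> list = new ArrayList<>();

    // Check and add under one lock so the pair behaves atomically.
    boolean addIfAbsent(T item) {
        synchronized (list) {
            if (list.contains(item)) {
                return false;
            }
            return list.add(item);
        }
    }

    int size() {
        synchronized (list) {
            return list.size();
        }
    }
}
```

The contains() call is still a linear scan, which is why a Set is usually the better fit when uniqueness is the goal.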
EDIT
As someone pointed out, checking whether the ArrayList already contains the object to be added is quite expensive, because contains() scans the whole list. If you want to ensure that an object is only added once, I'd follow the recommendation made below of using a LinkedHashSet. According to the API documentation, its add method
Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.
It will simply add it. A List does not consult hashCode() or equals() on insertion, so it doesn't care about duplicates.
ArrayList isn't thread-safe, so you might not get the desired result. You can get a synchronized list from the Collections class via Collections.synchronizedList().
An ArrayList can contain multiple references to the same exact object (identity equivalence). It doesn't check equals() or hashCode() when adding objects.
You will just end up with two references in your ArrayList.
ArrayList is NOT thread-safe, so the behaviour if two threads add at the same time is undefined. Maybe try using Collections.synchronizedList if you want to do something like that.
If you try to add the same object twice, it will work; if you try to add two objects with everything the same, it will still work. It is not best practice, though, because it makes the list harder to maintain.
Overall: you shouldn't do it.
In my implementation, I have a class A which overrides equals(Object) and hashCode(). But I have a small doubt: when I add an instance of A to a HashSet/HashMap, its hashCode() is x; some time later, the hashCode() of the same object changes to y. Will this affect anything?
The hash code mustn't change after it's been added to a map / set. It's okay for it to change before that, although it generally makes the type easier to work with if it doesn't change.
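If the hash code changes, the key generally won't be found in the map / set. The lookup searches the bucket for the new hash code, and even if that happens to be the same bucket, HashMap compares each entry's stored hash code before calling equals(), and the stored value is the old one.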
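A small demonstration with a hypothetical MutableKey whose hashCode() and equals() depend on a mutable field. With OpenJDK's HashSet (which is backed by HashMap), the entry is still stored, but neither the same reference nor an object equal to the old value can find it after the mutation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose hashCode()/equals() depend on a mutable field.
final class MutableKey {
    int id;

    MutableKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class LostKeyDemo {
    static boolean[] run() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        key.id = 2; // the hash code changes while the key is inside the set

        boolean foundBySameRef = set.contains(key);                // false
        boolean foundByOldValue = set.contains(new MutableKey(1)); // false
        boolean stillStored = set.size() == 1;                     // true
        return new boolean[] {foundBySameRef, foundByOldValue, stillStored};
    }
}
```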
When the return value of hashCode() or equals() changes while the object is contained in HashMap/HashSet etc., the behavior is undefined (you could get all kinds of strange behavior). So one must avoid such mutation of keys while the object is contained in such collections etc.
It is considered best to use only immutable objects as keys (or as elements of a HashSet etc.). Python, for example, does not allow its mutable built-in types, such as lists and dicts, to be used as dictionary keys. Using mutable objects as keys is permitted and common in Java, but in that case it is advisable to make such objects "effectively immutable", i.e. not to change their state at all after instantiation.
To give an example, using a list as a key in a Map is usually considered okay, but you should avoid mutating such lists at any point of your application to avoid getting bitten by nasty bugs.
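One way to make a list key effectively immutable is to store a defensive, unmodifiable copy, so that later changes to the caller's list cannot reach the key inside the map. This is a technique added here for illustration, not part of the answer; the class and method names are hypothetical (on Java 10+, List.copyOf does the same job).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListKeys {
    // Store an unmodifiable copy so the key's hashCode() can never change
    // while it sits in the map.
    static <V> void putWithListKey(Map<List<String>, V> map, List<String> key, V value) {
        map.put(Collections.unmodifiableList(new ArrayList<>(key)), value);
    }

    static boolean demo() {
        Map<List<String>, Integer> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        putWithListKey(map, key, 42);
        key.add("b"); // mutating the original does not affect the stored key

        List<String> probe = new ArrayList<>();
        probe.add("a");
        Integer found = map.get(probe);
        return found != null && found == 42;
    }
}
```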
As long as you don't change the return value of hashCode() and equals() while the objects are in the container, you should be ok on paper. But one could easily introduce nasty, hard to find bugs by mistake so it's better to avoid the situation altogether.
Yes, the hash code of an object must not change during its lifetime. If it does, you need to notify the container (if that's possible); otherwise you can get wrong results.
Edit: As pointed out, it depends on the container. Obviously, if the container never uses your hashCode or equals methods, nothing will go wrong. But as soon as it tries to compare things for equality (all maps and sets), you'll get yourself in trouble.
Yes. Many people have answered the question here; I just want to offer an analogy. A hash code is something like an address in a hash-based collection:
Imagine you check into a hotel under the name "Mike", and afterwards change your name to "GreatMike" on the registration form. When someone then looks for you under the name "Mike", they can no longer find you.
I'm making a game in Java. I need a solution for the runtime allocation caused by my ArrayList. Every minute or so (sometimes every 30 seconds) the garbage collector runs, which I believe is because I call the draw and update methods through this collection.
How can I avoid runtime allocation here?
Thanks in advance. In case it helps, here is the code from my Manager class, which contains the ArrayList of objects:
Some code:
@Override
public void draw(GL10 gl) {
    final int size = objects.size();
    for (int x = 0; x < size; x++) {
        Object object = objects.get(x);
        object.draw(gl);
    }
}
public void add(Object parent) {
    objects.add(parent);
}
// Get the collection; later we call the draw function on these objects
public ArrayList<Object> getObjects() {
    return objects;
}
public int getNumberOfObjects() {
    return objects.size();
}
More explanation: I'm looking into this because (1) I see that the ArrayList implementation is slow and causes lags, and (2) I want to merge the objects/components together. When my Thread class fires an update call, it goes through my collection and sends things down the tree/graph using the Manager's update function.
When looking at an open-source project, Replica Island, I found that the author used an alternative class, FixedSizeArray, that he wrote himself. Since I'm not that good at Java, I wanted to make things easier, so now I'm looking for another solution. Finally, he explained WHY he made the special class:
FixedSizeArray is an alternative to a standard Java collection like ArrayList. It is designed to provide a contiguous array of fixed length which can be accessed, sorted, and searched without requiring any runtime allocation. This implementation makes a distinction between the "capacity" of an array (the maximum number of objects it can contain) and the "count" of an array (the current number of objects inserted into the array). Operations such as set() and remove() can only operate on objects that have been explicitly add()-ed to the array; that is, indexes larger than getCount() but smaller than getCapacity() can't be used on their own.
I see that the ArrayList implementation is slow and causing lags ...
If you see that, you are misinterpreting the evidence and jumping to unjustifiable conclusions. ArrayList is NOT slow, and it does NOT cause lags ... unless you use the class in a particularly suboptimal way.
The only times that an array list allocates memory are when you create the list, add more elements, copy the list, or call iterator().
When you create the array list, two Java objects are created: one for the ArrayList and one for its backing array. If you use the initialCapacity argument and give an appropriate value, you can arrange that subsequent updates will not allocate memory.
When you add or insert an element, the array list may allocate one new object. But this only happens when the backing array is too small to hold all of the elements, and when it does happen the new backing array is larger by a constant factor (about 1.5 times the old size in OpenJDK). So inserting N elements results in only O(log N) allocations. Besides, if you create the array list with an appropriate initialCapacity, you can guarantee that there are zero allocations on add or insert.
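To put a number on "only a logarithmic number of allocations", here is a small simulation. It assumes OpenJDK's growth policy for a list created with the no-argument constructor (capacity 10 on the first add, then newCapacity = old + old/2); other JVMs may differ, and the class name is hypothetical.

```java
class GrowthSteps {
    // Counts how many times the backing array would be reallocated (beyond
    // the initial allocation) while adding n elements, under the assumed
    // OpenJDK policy: start at 10, grow by half the current capacity.
    static int reallocations(int n) {
        if (n == 0) return 0;
        int capacity = 10; // allocated on the first add
        int grows = 0;
        while (capacity < n) {
            capacity += capacity >> 1;
            grows++;
        }
        return grows;
    }
}
```

Under these assumptions, adding 1,000 elements triggers 12 reallocations (10, 15, 22, 33, ..., 823, 1234), and each one happens only while the list is still growing.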
When you copy a list to another list or array (using toArray or a copy constructor) you will get 1 or 2 allocations.
The iterator() method creates a new object each time you call it. But you can avoid this by iterating using an explicit index variable, List.size() and List.get(int). (Be aware that for (E e : someList) { ... } implicitly calls List.iterator().)
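Both points (presize once, then iterate by index) can be combined as in the sketch below. The Drawable interface, the Scene class and the MAX_OBJECTS bound are hypothetical stand-ins for the question's Manager and GL10-based draw method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical game-object type standing in for the question's objects.
interface Drawable {
    void draw();
}

class Scene {
    // Assumption: a known upper bound on the number of objects. Presizing
    // means adds up to this count never reallocate the backing array.
    static final int MAX_OBJECTS = 1024;
    private final List<Drawable> objects = new ArrayList<>(MAX_OBJECTS);

    void add(Drawable d) {
        objects.add(d);
    }

    void drawAll() {
        // Indexed loop: no Iterator object is created on each frame.
        final int size = objects.size();
        for (int i = 0; i < size; i++) {
            objects.get(i).draw();
        }
    }
}
```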
(External operations like Collections.sort do entail extra allocations, but that is not the fault of the array list. It will happen with any list type.)
In short, the only way you can get lots of allocations using an array list is if you create lots of array lists, or use them unintelligently.
The FixedSizeArray class you have found sounds like a waste of time. It sounds like it is equivalent to creating an ArrayList with an initial capacity ... with the restriction that it will break if you get the initial capacity wrong. Whoever wrote it probably doesn't understand Java collections very well.
It's not quite clear what you are asking, but:
If you know at compile time what objects should be in the collection, make it an array not an ArrayList and set the contents in an initialisation block.
Object[] objects = new Object[]{obj1,obj2,obj3};
What makes you think you know what the GC is reclaiming? Have you profiled your application?
What do you mean by "non-runtime allocation"? I'm really not even sure what you mean by "allocation" in this context... allocation of memory? That's done at runtime, obviously. You clearly aren't referring to any kind of fixed pool of objects that are known at compile time either, since your code allows adding objects to your list several different ways (not that you'd be able to allocate anything for them at compile time even if you were).
Beyond that, nothing in the code you've posted is going to cause garbage collection by itself. Objects can only be garbage collected when nothing in the program has a strong reference to them, and your posted code only allows adding objects to the ArrayList (though they can be removed by calling getObjects() and removing from that, of course). As long as you aren't removing objects from the objects list, you aren't reassigning objects to point to a different list, and the object containing it isn't itself becoming eligible for garbage collection, none of the objects it contains will ever be available for garbage collection either.
So basically, there isn't any specific problem with the code you've posted and your question doesn't make sense as asked. Perhaps there are more details you can provide or there's a better explanation of what exactly your issue is and what you want. If so, please try to add that to your question.
Edit:
From the description of FixedSizeArray and the code I looked at in it, it seems largely equivalent to an ArrayList that is initialized with a specific array capacity (using the constructor that takes an int initialCapacity), except that it will fail at runtime if something tries to add to it when its array is full, whereas ArrayList will expand itself to hold more and continue working just fine. To be honest, it seems like a pointless class, possibly written because the author didn't actually understand ArrayList.
Note also that its statement about "not requiring any runtime allocation" is a bit misleading... it does of course have to allocate an array when it is created, but it just refuses to allocate a new array if its initial array fills up. You can achieve the same thing using ArrayList by simply giving it an initialCapacity that is at least large enough to hold the maximum number of objects you will ever add to it. If you do so, and you do in fact ensure you never add more than that number of objects to it, it will never allocate a new array after it is created.
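To make the equivalence concrete, here is a hypothetical BoundedList that reproduces the behaviour described above on top of a presized ArrayList: one backing-array allocation at construction, and a hard failure instead of growth when the capacity is exceeded. The getCount name is borrowed from the quoted FixedSizeArray description; everything else is an illustration, not that class's actual code.

```java
import java.util.ArrayList;

class BoundedList<E> {
    private final int capacity;
    private final ArrayList<E> items;

    BoundedList(int capacity) {
        this.capacity = capacity;
        this.items = new ArrayList<>(capacity); // backing array sized once, up front
    }

    void add(E e) {
        // Refuse to grow: this is the only behaviour a plain presized
        // ArrayList does not already give you.
        if (items.size() >= capacity) {
            throw new IllegalStateException("capacity " + capacity + " exceeded");
        }
        items.add(e);
    }

    E get(int index) {
        return items.get(index);
    }

    int getCount() {
        return items.size();
    }
}
```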
However, none of this relates in any way to your stated issue about garbage collection, and your code still doesn't show anything that would cause huge numbers of objects to be garbage collected. If there is any issue at all, it may relate to the code that is actually calling the add and getObjects methods and what it's doing.
Say you are adding a number of objects to a collection, and you modify each object's attributes either before or after adding it. When should you add the element to the collection: before or after the object has been modified?
Option A)
public static void addToCollection(List<MyObject> objects) {
    MyObject newObject = new MyObject();
    objects.add(newObject);
    newObject.setMyAttr("ok");
}
Option B)
public static void addToCollection(List<MyObject> objects) {
    MyObject newObject = new MyObject();
    newObject.setMyAttr("ok");
    objects.add(newObject);
}
To be on the safe side, you should modify before adding, unless there is a specific reason you cannot do this, and you know the collection can handle the modification. The example can reasonably be assumed to be safe, since the general List contract does not depend upon object attributes - but that says nothing about specific implementations, which may have additional behavior that depends upon the object's value.
TreeSet, and maps in general, do not tolerate modifying objects (keys, in the case of maps) after they have been inserted, because the structure of the collection depends on the attributes of the object. For trees, any attributes used by the comparator cannot be changed once the item has been added. For hash-based maps and sets, it's the hashCode that must remain constant.
So, in general, modify first, and then add. This becomes even more important with concurrent collections, since adding first can let other users of the collection see an object before it has been assigned its final state.
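A small demonstration of the tree case, using a hypothetical Item ordered by a mutable name field. With OpenJDK's TreeSet, after the mutation the element is still in the tree at its old position, so a search for it follows the wrong branch and contains() returns false for the very object that was added.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical element ordered by a mutable field.
final class Item {
    String name;

    Item(String name) {
        this.name = name;
    }
}

class TreeSetMutationDemo {
    static boolean run() {
        TreeSet<Item> set = new TreeSet<>(Comparator.comparing((Item i) -> i.name));
        Item a = new Item("a");
        set.add(a);
        set.add(new Item("m"));
        set.add(new Item("z"));
        a.name = "y"; // changes the attribute the comparator uses
        // The node for 'a' still sits where "a" belonged, so the search
        // for "y" walks toward the other end of the tree and misses it.
        return set.contains(a);
    }
}
```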
The example you provided won't have any issues because you're using a List collection which doesn't care about the Object contents.
If you were using something like TreeMap, which keeps its keys sorted internally, changing a key's attribute could put the collection into an unexpected state. Again, this depends on whether the comparison (compareTo() or the map's Comparator) uses the attribute you're changing.
The safest way is to modify the object before placing it into the collection.
One good design rule to follow is not to expose a half-constructed object to a third-party subsystem.
So, according to this rule, initialize your object to the best of your abilities and then add it to the list.
If objects is an ArrayList, then the net result is probably the same. However, imagine that objects is a special flavor of List that fires some kind of notification event every time a new object is added to it; then the order will matter greatly.
In my opinion, it depends on the attribute being set and the type of collection. If the collection is a Set and the attribute influences equals() or hashCode(), then I would definitely set the property before adding; the same applies to sorted lists etc. In other cases it is irrelevant. But for this example, where the object is created right there, I would set the attributes first and then add it to the collection, because the code is better organized.
I think either way it's the same; personally, I like B. :)
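A sketch of that scenario with a hypothetical NotifyingList that calls a listener on every add (not a JDK class), and a minimal stand-in for the question's MyObject. With Option A the listener sees the attribute still unset; with Option B it sees "ok".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical list that notifies a listener every time an element is added.
class NotifyingList<E> {
    private final List<E> items = new ArrayList<>();
    private final Consumer<E> onAdd;

    NotifyingList(Consumer<E> onAdd) {
        this.onAdd = onAdd;
    }

    void add(E e) {
        items.add(e);
        onAdd.accept(e); // the listener sees the object in its current state
    }
}

// Minimal stand-in for the question's MyObject.
class MyObject {
    private String myAttr;

    void setMyAttr(String myAttr) {
        this.myAttr = myAttr;
    }

    String getMyAttr() {
        return myAttr;
    }
}

class NotifyDemo {
    // Records the attribute value each listener call observed.
    static List<String> run() {
        List<String> observed = new ArrayList<>();
        NotifyingList<MyObject> list = new NotifyingList<>(o -> observed.add(o.getMyAttr()));

        MyObject a = new MyObject();
        list.add(a);          // Option A: add first...
        a.setMyAttr("ok");    // ...modify afterwards

        MyObject b = new MyObject();
        b.setMyAttr("ok");    // Option B: modify first...
        list.add(b);          // ...then add
        return observed;
    }
}
```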
It really does boil down to what the situation requires. Functionally there's no difference.
One thing you should be careful with, is being sure you have the correct handle to the object you want to modify.
Certainly in this instance, modifying the object is part of the "create the object" thought, and so should be grouped with the constructor as such. After you "create the object" you "add it to the collection". Thus, I would do B, and maybe even add a blank line after the modification to give more emphasis on the two separate thoughts.