I need to delete some objects from an ArrayList if they meet a condition and I'm wondering which way could be more efficient.
Here's the situation: I have a class that contains an ArrayList containing some other objects. I have to iterate over this ArrayList and delete all elements meeting a certain condition.
As far as I know, those would be my options to delete:
Create a new ArrayList and add the elements that don't meet the condition. After the iteration, swap from the old ArrayList to the new one without the elements.
Create a new ArrayList and add the elements that meet the condition. After the iteration, use the removeAll() method passing the ArrayList with the objects to be deleted.
Is there a more efficient way to delete objects from an ArrayList?
You could iterate backwards and remove as you go through the ArrayList. This has the advantage that removals only shift elements you have already visited, so your loop index stays valid, and it is easier to program than compensating while moving forwards.
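For example, a minimal sketch of the backwards approach (shouldDelete is a hypothetical stand-in for your condition):

for (int i = list.size() - 1; i >= 0; i--) {
    if (shouldDelete(list.get(i))) { // shouldDelete: whatever your condition is
        list.remove(i); // shifts only elements at higher indices, which were already visited
    }
}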
Another way: the Iterator has an optional remove() method, which is implemented for ArrayList. You can use it while iterating.
I don't know which variant is the most performant, though; you should measure it.
starblue commented that the complexity isn't good, and that's true (for removeAll() too), because ArrayList has to copy all subsequent elements whenever an element is added or removed in the middle. For those cases a LinkedList should work better. But since none of us knows your real use cases, the best approach is to measure all variants and pick the best solution.
The most performant approach would, I guess, be to use the listIterator method and iterate in reverse:
for (ListIterator<E> iter = list.listIterator(list.size()); iter.hasPrevious(); ) {
    if (weWantToDelete(iter.previous())) iter.remove();
}
Edit: Much later, one might also want to add the Java 8 way of removing elements from a list (or any collection!) using a lambda or method reference. An in-place filter for collections, if you like:
list.removeIf(e -> e.isBad() && e.shouldGoAway());
This is probably the best way to clean up a collection. Since it uses internal iteration, the collection implementation could take shortcuts to make it as fast as possible (for ArrayLists, it could minimize the amount of copying needed).
Obviously, of the two methods you mention, number 1 is more efficient, since it only needs to go through the list once, while with method number 2 the list has to be traversed twice (first to find the elements to remove, and then to remove them).
Actually, removing a list of elements from another list is likely an algorithm that's worse than O(n) so method 2 is even worse.
The iterator method:
List data = ...;
for (Iterator i = data.iterator(); i.hasNext(); ) {
    Object element = i.next();
    if (!(...)) {
        i.remove();
    }
}
First, I'd make sure that this really is a performance bottleneck, otherwise I'd go with the solution that is cleanest and most expressive.
If it IS a performance bottleneck, just try the different strategies and see what's quickest. My bet is on creating a new ArrayList and putting the desired objects in that one, discarding the old ArrayList.
Unless you're positive that the issue you're facing is indeed a bottleneck, I would go for the readable:
private ArrayList<Thing> pileOfThings; // assumed to be populated elsewhere

public ArrayList<Thing> filterThings() {
    ArrayList<Thing> filteredPileOfThings = new ArrayList<Thing>();
    for (Thing thingy : pileOfThings) {
        if (thingy.property != 1) {
            filteredPileOfThings.add(thingy);
        }
    }
    return filteredPileOfThings;
}
There is a hidden cost in removing elements from an ArrayList. Each time you delete an element, you need to move the elements to fill the "hole". On average, this will take N / 2 assignments for a list with N elements.
So removing M elements from an N element ArrayList is O(M * N) on average. An O(N) solution involves creating a new list; for example:
List data = ...;
List newData = new ArrayList(data.size());
for (Iterator i = data.iterator(); i.hasNext(); ) {
    Object element = i.next();
    if ((...)) {
        newData.add(element);
    }
}
If N is large, my guess is that this approach will be faster than the remove approach for values of M as small as 3 or 4.
But it is important to create newData large enough to hold all elements in data, to avoid copying the backing array when it is expanded.
int sizepuede = listaoptionVO.size();
for (int i = 0; i < sizepuede; i++) {
    if (listaoptionVO.get(i).getDescripcionRuc() == null) {
        listaoptionVO.remove(i); // remove by index; removing by object would rescan the list
        i--;
        sizepuede--;
    }
}
Maybe Iterator’s remove() method? The JDK’s default collection classes should all create iterators that support this method.
I have found an alternative, faster solution:

int j = 0;
for (Iterator i = list.listIterator(); i.hasNext(); ) {
    Campo campo = (Campo) i.next(); // assuming the list holds Campo objects
    if (campo.getNome().equals(key)) {
        i.remove();
        i = list.listIterator(j); // re-create the iterator at the current position
    } else {
        j++;
    }
}
With an iterator you always work with the element at the iterator's current position, not with a specified index, so you don't run into the index-shifting problem described above.
Iterator itr = list.iterator();
String strElement = "";
while (itr.hasNext()) {
    strElement = (String) itr.next();
    if (strElement.equals("2")) {
        itr.remove();
    }
}
Whilst this is counter-intuitive, this is the way that I sped up this operation by a huge amount.
Exactly what I was doing:

ArrayList<HashMap<String, String>> results; // This has been filled with a whole bunch of results
ArrayList<HashMap<String, String>> discard = findResultsToDiscard(results);
results.removeAll(discard);

However, the removeAll method was taking upwards of 6 seconds (NOT including the method to get the discard results) to remove approximately 800 results from an array of 2000 (ish).
I tried the iterator method suggested by gustafc and others on this post.
This did speed up the operation slightly (down to about 4 seconds); however, this was still not good enough. So I tried something risky...
ArrayList<HashMap<String, String>> results;
List<Integer> noIndex = getTheDiscardedIndexs(results);
// indexes are in ascending order, so removing from the back never shifts an index we still need
for (int j = noIndex.size() - 1; j >= 0; j--) {
    results.remove(noIndex.get(j).intValue());
}
getTheDiscardedIndexs saves a list of indexes rather than a list of HashMaps. It turns out this removes objects much more quickly (about 0.1 seconds now), and it is more memory efficient, since we don't need to create a large list of results to remove.
Hope this helps someone.
I'm good with Mnementh's recommendation.
Just one caveat though:
ConcurrentModificationException
Mind that you don't modify the list while iterating it by any means other than the iterator's own remove() method. This exception also appears if more than one thread mutates the list while another iterates it and the threads are not properly synchronized.
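For illustration, a minimal single-threaded snippet that triggers it (list is a hypothetical List<String>): modifying the list directly inside a for-each loop makes the iterator's next call to next() throw, which is why the iterator's remove() is the safe route.

for (String s : list) {
    if (s.isEmpty()) {
        list.remove(s); // throws ConcurrentModificationException on the next iteration step
    }
}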
Related
I have an ArrayList with 30 elements. I'd like to create many sublists of 15 elements from this list. What's the efficient way of doing so?
Right now I clone the ArrayList and use remove(random) to do it, but I am sure this is too clumsy. What should I do instead?
Does Java have a "sample" function like in R?
Clarification: by sampling with no replacement I mean take at random 15 unique elements from the 30 available in the original list. Moreover I want to be able to do this repeatedly.
Use the Collections#shuffle method to shuffle your original list, and return a list with the first 15 elements.
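A minimal sketch of that, assuming the original list has at least 15 elements (shuffling a copy keeps the original intact):

public static <T> List<T> sample15(List<T> original) {
    List<T> copy = new ArrayList<T>(original);
    Collections.shuffle(copy);
    return new ArrayList<T>(copy.subList(0, 15)); // copy the subList view so it is independent
}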
Consider creating a new list and adding random elements from the current list, instead of copying all elements and then removing some.
Another way to do this is to create some kind of view on top of the current list:
implement the Iterator interface so that next() picks a random index and retrieves the element at that index from the current list.
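A hedged sketch of that view idea, using a shuffled list of indices so that no element is picked twice (original is a stand-in for your 30-element list):

List<Integer> indices = new ArrayList<>();
for (int i = 0; i < original.size(); i++) {
    indices.add(i);
}
Collections.shuffle(indices);
for (int idx : indices.subList(0, 15)) {
    // original.get(idx) is the next randomly chosen element; only the small
    // index list is shuffled, the elements themselves are never copied
}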
No, Java does not have a sample function like in R. However, it is possible to write such a function:
// Samples n elements from original, and returns that list
public static <T> List<T> sample(List<T> original, int n) {
    List<T> result = new ArrayList<T>(n);
    for (int i = 0; i < original.size(); i++) {
        if (result.size() == n)
            return result;
        if ((n - result.size()) >= (original.size() - i)) {
            // we must copy all remaining elements to reach n
            result.add(original.get(i));
        } else if (Math.random() * (original.size() - i) < (n - result.size())) {
            // select with probability (slots left) / (elements left), keeping the sample uniform
            result.add(original.get(i));
        }
    }
    return result;
}
This function iterates through original and copies the current element to result with probability (slots left to fill) / (elements left to scan), unless we are near enough to the end of original to require copying all the remaining elements (the second if statement in the loop).
This is a basic combinatorics problem. You have 30 elements in your list, and you want to choose 15. If the order matters, you want a permutation, if it doesn't matter, you want a combination.
There are various Java combinatorics samples on the web, and they typically use combinadics. I don't know of any ready-made Java libraries, but Apache Commons Math has binomial coefficient support to help you implement combinadics if you go that route. Once you have a sequence of 15 indices from 0 to 29, I'd suggest creating a read-only iterator that you can read the elements from. That way you won't have to create any new lists or copy any references.
Let's say I have a nested list which contains n^2 elements, and I also have a HashMap which contains n^2 keys. I want to get the common elements between the list and the hashmap using the retainAll() method. What is the time complexity of retainAll() in this case?
List<List<Integer>> list = new ArrayList<>();
Map<List<Integer>, Integer> hashmap = new HashMap<List<Integer>, Integer>();
list.retainAll(hashmap.keySet()); // retainAll expects a Collection, so pass the key set
To get the common elements, I'm using this loop, but it has n^2 complexity.
List<List<Integer>> commons = new ArrayList<>();
for (int i = 0; i < list.size(); i++) {
    if (hashmap.get(list.get(i)) != null) {
        commons.add(list.get(i));
    }
}
Also, if you have an algorithm with better time complexity for getting their intersection, I need it.
EDIT:
Actually the problem is, I have a list of integers of size n, but I divided that list into sublists, and the sublist count became n^2. Since I want to find out whether the hashmap contains every sublist as a key, I used n^2 in the question. But that is too big for my whole project, and I'm looking for ways to decrease the complexity.
Well, if you have n^2 items in the list then the complexity is O(n^2), although this is a really weird way to put it. Usually n is the size of the collection, so the time complexity of retainAll is considered O(n).
It's impossible for an algorithm to have a better time complexity for this, as far as I can think of: you have to iterate the list at least once.
What you can do is switch your data structure. Read more: Data structures for fast intersection operations?
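For what it's worth, a sketch of the retainAll variant from the question, fixed to pass the key set: each contains() check against a HashMap's key set is an expected O(1) hash lookup, so the whole intersection is linear in the list length (which is already n^2 in your setup).

List<List<Integer>> commons = new ArrayList<>(list);
commons.retainAll(hashmap.keySet()); // one pass over the list, O(1) expected per lookup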
I am developing an agent-based model in Java. I have used a profiler to reduce any inefficiencies down to the point that the only thing holding it back is Java's Collections.shuffle().
The agents (they're animals) in my model need to be processed in a random order so that no agent is consistently processed before the others.
I am looking for: Either a faster way to shuffle than Java's Collections.shuffle() or an alternative method of processing the elements in an ArrayList in a randomized order that is significantly faster. If you know of a data structure that would be faster than an ArrayList, by all means please answer. I have considered LinkedList and ArrayDeque, but they aren't making much of a difference.
Currently, I have over 1,000,000 elements in the list I am trying to shuffle. Over time, this amount increases and it is becoming increasingly inefficient to shuffle it.
Is there an alternative data structure or way of randomizing the processing of elements that is faster?
I only need to be able to store elements and process them in a randomized order. I do not use contains or anything more complex than storage and iterating over them.
Here is some sample code to better explain what I am trying to achieve:
UPDATE: Sorry for the ConcurrentModificationException, I didn't realize I had done that and I didn't intend to confuse anyone. Fixed it in the code below.
ArrayList<Agent> list = new ArrayList<>();

void process()
{
    list.add(new Agent("Zebra"));
    Random r = new Random();
    for (int i = 0; i < 100000; i++)
    {
        ArrayList<Agent> newlist = new ArrayList<>();
        Collections.shuffle(list); // Something that will allow the order to be random (random quality does not matter to me), yet faster than a shuffle
        for (Agent agent : list)
        {
            newlist.add(agent);
            if (r.nextDouble() > 0.99) // 1% chance of adding another agent to the list
            {
                newlist.add(new Agent("Lion"));
            }
        }
        list = newlist;
    }
}
ANOTHER UPDATE
I thought about doing list.remove(rando.nextInt(list.size())), but since remove for ArrayLists is O(n), that would be even worse than shuffling for such a large list size.
I would use a simple ArrayList and not shuffle it at all. Instead select random list indices to process. To avoid processing a list element twice, I'd remove the processed elements from the list.
Now if the list is very large, removing a random entry itself would be the bottleneck. This can however be avoided easily by removing the last entry instead and moving it into the place the selected entry occupied before:
public String pullRandomElement(List<String> list, Random random) {
    // select a random list index
    int size = list.size();
    int index = random.nextInt(size);
    String result = list.get(index);
    // move the last entry to the selected index
    list.set(index, list.remove(size - 1));
    return result;
}
Needless to say, you should choose a list implementation where get(index) and remove(lastIndex) are fast O(1), such as ArrayList. You may also want to add edge-case handling (such as when the list is empty).
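Hypothetical usage, draining the list in random order (process stands in for your per-element work):

Random random = new Random();
while (!list.isEmpty()) {
    String element = pullRandomElement(list, random);
    process(element); // process: whatever you do with each element
}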
You could use this: if you already have the list of items, generate a random index bounded by its size with nextInt.
ArrayList<String> list = new ArrayList<>(); // assume this has been populated
int sizeOfCollection = list.size();
Random randomGenerator = new Random();
int randomId = randomGenerator.nextInt(sizeOfCollection);
String x = list.get(randomId);
list.remove(randomId);
Since your code doesn't actually depend on the order of the list, it's enough to shuffle it once at the end of the processing.
void process() {
    Random r = new Random();
    for (int i = 0; i < 100000; i++) {
        for (String str : list) {
            if (r.nextDouble() > 0.9) {
                list.add(str + str);
            }
        }
    }
    Collections.shuffle(list);
}
Though this would still throw a ConcurrentModificationException, like the original code.
Collections.shuffle() uses the modern variant of Fisher-Yates Algorithm:
From https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
To shuffle an array a of n elements (indices 0..n-1):
for i from n − 1 downto 1 do
    j ← random integer such that 0 ≤ j ≤ i
    exchange a[j] and a[i]
Collections.shuffle converts the list to an array, does the shuffle using just random.nextInt(), and then copies everything back (see http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Collections.java#Collections.shuffle%28java.util.List%29).
You can only make this faster by avoiding the overhead of copying the array and writing it back:
either write your own implementation of ArrayList where you can directly access the backing array, or access the field "elementData" of your ArrayList via reflection.
Now use the same algorithm as Collections.shuffle on that array, using the correct size().
This speeds things up because it avoids copying the whole array, as Collections.shuffle() does.
The access via reflection itself costs a bit of time, so this solution is faster only for larger numbers of elements.
I would not recommend this solution unless you want to win the race for the fastest shuffle in terms of execution time.
And as always when comparing speeds, make sure you warm up the VM by running the algorithm to be measured 1000 times before starting the measurement.
According to the documentation, Collections.shuffle() runs in O(N) time.
This method runs in linear time. If the specified list does not implement the RandomAccess interface and is large, this implementation dumps the specified list into an array before shuffling it, and dumps the shuffled array back into the list. This avoids the quadratic behavior that would result from shuffling a "sequential access" list in place.
I recommend you use the public static void shuffle(List<?> list, Random rnd) overload, although the performance benefit will probably be negligible.
Improving the performance will be difficult unless you allow some bias, such as with partial shuffling (only a segment of the list gets re-shuffled each time) or under-shuffling. Under-shuffling means writing your own Fisher-Yates routine and skipping certain list indices during the reverse traversal; for example, you could skip all odd indices. However the end of your list would receive less shuffling than the front which is another form of bias.
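A hedged sketch of such an under-shuffle: a Fisher-Yates pass that only visits every second index, deliberately trading uniformity for roughly half the work. This illustrates the bias trade-off described above; it is not a drop-in replacement for Collections.shuffle().

static <T> void underShuffle(List<T> list, Random rnd) {
    // standard Fisher-Yates step, but decrementing by 2 so only half the swaps happen
    for (int i = list.size() - 1; i > 0; i -= 2) {
        int j = rnd.nextInt(i + 1);
        Collections.swap(list, i, j);
    }
}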
If you had a fixed list size M, you might consider caching some large number N of different fixed index permutations (0 to M-1 in random order) in memory at application startup. Then you could just randomly select one of these pre-orderings whenever you iterate the collection and just iterate according to that particular previously defined permutation. If N were large (say 1000 or more), the overall bias would be small (and also relatively uniform) and would be very fast. However you noted your list slowly grows, so this approach wouldn't be viable.
I have 2 Lists and want to add them element by element, like this: result[index] = list1[index] + list2[index].
Is there an easier, and probably much better performing, way than using a for loop to iterate over the first list and add the corresponding elements to the result list?
I appreciate your answer!
Depends on what kind of list and what kind of for loop.
Iterating over the elements (rather than indices) would almost certainly be plenty fast enough.
On the other hand, iterating over indices and repeatedly getting the element by index could work rather poorly for certain types of lists (e.g. a linked list).
My understanding is that you have List1 and List2 and that you want to find the best-performing way to compute result[index] = List1[index] + List2[index].
My main suggestion is that before you start optimising for performance, you should measure whether you need to optimise at all. You can iterate through the lists as you said, something like:
for (int i = 0; i < listSize; i++)
{
    result[i] = List1[i] + List2[i];
}
In most cases this is fine. See NPE's answer for a description of where this might be expensive, i.e. a linked list. Also see this answer and note that each step of the for loop is doing a get - on an array it is done in 1 step, but in a linked list it is done in as many steps as it takes to iterate to the element in the list.
Assuming a standard array, this is O(n) and (depending on array size) will be done so quickly that it will hardly result in a blip on your performance profiling.
As a twist, since the operations are completely independent, that is result[0] = List1[0] + List2[0] is independent of result[1] = List1[1] + List2[1], etc, you can run these operations in parallel. E.g. you could run the first half of the calculations (<= List.Size / 2) on one thread and the other half (> List.Size / 2) on another thread and expect the elapsed time to roughly halve (assuming at least 2 free CPUs). Now, the best number of threads to use depends on the size of your data, the number of CPUs, other operations happening at the same time and is normally best decided by testing and modeling under different conditions. All this adds complexity to your program, so my main recommendation is to start simple, then measure and then decide whether you need to optimise.
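For instance, a minimal sketch of that split using java.util.stream parallel streams (IntStream, Collectors), which lets the runtime pick the thread partitioning; list1 and list2 are assumed to be List<Integer> of equal length:

List<Integer> result = IntStream.range(0, list1.size())
        .parallel()                                 // fork-join splits the index range across threads
        .mapToObj(i -> list1.get(i) + list2.get(i)) // each element-wise sum is independent
        .collect(Collectors.toList());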
Looping is inevitable unless you have a matrix API (e.g. OpenGL). You could implement a List<Integer> which is backed by the original lists:
public class CalcList extends AbstractList<Integer> { // AbstractList needs only get() and size()
    private final List<Integer> l1, l2;

    public CalcList(List<Integer> l1, List<Integer> l2) { this.l1 = l1; this.l2 = l2; }

    @Override
    public Integer get(int index) {
        return l1.get(index) + l2.get(index);
    }

    @Override
    public int size() { return Math.min(l1.size(), l2.size()); }
}
This avoids copy operations and moves the calculations to the end of your stack:
List<Integer> results1 = new CalcList(list, list1);
List<Integer> results2 = new CalcList(results1, list3);
// no calculation or memory allocated until now
for (int result : results2) {
    // here happens the calculation, still without unnecessary memory
}
This could give an advantage if the compiler is able to translate it into:
for (int i = 0; i < list1.size; i++) {
    int result = list1[i] + list2[i] + list3[i] + …;
}
But I doubt that. You have to run a benchmark for your specific use case to find out if this implementation has an advantage.
Java doesn't come with a map-style function, so the way to do this kind of operation is with a for loop.
Even if you use some other construct, the looping will be done anyway. An alternative is using the GPU for computations but this is not a default Java feature.
Also using arrays should be faster than operating with linked lists.
I am trying to make a remove method that works on an array implementation of a list.
Can I set the duplicate element to null to remove it? Assuming that the list is in order.
ArrayList a = new ArrayList[];
public void removeduplicates(){
for(a[i].equals(a[i+1]){
a[i+1] = null;
}
a[i+1] = a[i];
}
No, you can't remove an element from an array, as in making it shorter. Java arrays are fixed-size. You need to use an ArrayList for that.
If you set an element to null, the array will still have the same size, but with a null reference at that point.
// Let's say a = [0,1,2,3,4] (Integer[])
a[2] = null;
// Now a = [0,1,null,3,4]
Yes, you can set elements in an array to null, but code like a[i].equals(a[i+1]) will fail with a NullPointerException if the array contains nulls, so you just have to be more careful if you know that your array may contain nulls. It also doesn't change the size of the array so you will be wasting memory if you remove large numbers of elements. Fixed size arrays are generally not a good way to store data if you are often adding and removing elements - as you can guess from their name.
Can I set the duplicate element to null to remove it?
You can set an element of the array to null, but this doesn't remove the element from the array... it just sets the element to null (I feel like I'm repeating the first sentence).
You should return a cleaned copy of the array instead. One way to do this would be to use an intermediary java.util.Set:
String[] data = {"A", "C", "B", "D", "A", "B", "E", "D", "B", "C"};
// Convert to a list to create a Set object
List<String> list = Arrays.asList(data);
Set<String> set = new HashSet<String>(list);
// Create an array to convert the Set back to array.
String[] result = new String[set.size()];
set.toArray(result);
Or maybe just use a java.util.Set :)
Is this a homework question?
Your problem is analogous to the stream processing program uniq: Preserve -- by way of copying -- any element that doesn't match the one before it. It only removes all duplicates if the sequence is sorted. Otherwise, it only removes contiguous duplicates. That means you need to buffer at most one element (even if by reference) to use as a comparison predicate when deciding whether to keep an element occurring later in the sequence.
The only special case is the first element. As it should never match any preceding element, you can try to initialize your buffered "previous" element to some value that's out of the domain of the sequence type, or you can special-case your iteration with a "first element" flag or explicitly copy the first element outside the iteration -- minding the case where the sequence is empty, too.
Note that I did not propose you provide this operation as a destructive in-place algorithm. That would only be appropriate with a structure like a linked list with constant-time overhead for removing an element. As others note here, removing an element from an array or vector involves shuffling down successor elements to "fill the hole", which is of time complexity n in the number of successors.
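A non-destructive sketch of that uniq-style copy: it buffers one previous element, special-cases the first element with a flag, and (like uniq) only collapses adjacent duplicates unless the input is sorted.

static <T> List<T> uniq(List<T> input) {
    List<T> out = new ArrayList<T>();
    T prev = null;
    boolean first = true; // "first element" flag; also handles the empty sequence
    for (T e : input) {
        if (first || !e.equals(prev)) {
            out.add(e);
        }
        prev = e;
        first = false;
    }
    return out;
}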
The straightforward answer to your question is that setting an array or ArrayList element to null gives you a null entry in the array or ArrayList. This is not the same thing as removing the element. It just means that a[i] or a.get(i) will return null rather than the original element.
The code in the question is garbled. If you are going to use an ArrayList, the simplistic solution would be something like this:
ArrayList a = new ArrayList();

public void removeduplicates() {
    for (int i = 0; i < a.size() - 1; ) {
        if (a.get(i).equals(a.get(i + 1))) {
            a.remove(i);
        } else {
            i++;
        }
    }
}
but in the worst case, that is O(N**2) because each call to remove copies all elements at indexes greater than the current value of i.
If you want to improve the performance, do something like this:
public ArrayList removeduplicates() {
    ArrayList res = new ArrayList(a.size());
    if (a.size() == 0) {
        return res;
    }
    res.add(a.get(0));
    for (int i = 1; i < a.size(); i++) {
        if (!a.get(i - 1).equals(a.get(i))) {
            res.add(a.get(i));
        }
    }
    return res;
}
(This is a quick hack. I'm sure it could be tidied up.)
Your code example was quite confusing. With ArrayList[] you showed an array of ArrayList objects.
Assuming that you're talking about just the java.util.ArrayList, then the easiest way to get rid of duplicates is to use a java.util.Set instead, as mentioned by others. If you really want to have, start with, or end up with a List for some reason, then do:
List withDuplicates = new ArrayList() {{ add("foo"); add("bar"); add("waa"); add("foo"); add("bar"); }}; // Would rather have used Arrays#asList() here, but OK.
List withoutDuplicates = new ArrayList(new LinkedHashSet(withDuplicates));
System.out.println(withoutDuplicates); // [foo, bar, waa]
The LinkedHashSet is chosen here because it maintains the ordering. If you don't worry about the ordering, a HashSet is faster. But if you actually want to have it sorted, a TreeSet may be more of value.
On the other hand, if you're talking about a real array and you want to filter duplicates out of this without help of the (great) Collections framework, then you'd need to create another array and add items one by one to it while you check if the array doesn't already contain the to-be-added item. Here's a basic example (without help of Arrays.sort() and Arrays.binarySearch() which would have eased the task more, but then you would end up with a sorted array):
String[] array1 = new String[] {"foo", "bar", "foo", "waa", "bar"};
String[] array2 = new String[0];

loop: for (String array1item : array1) {
    for (String array2item : array2) {
        if (array1item.equals(array2item)) {
            continue loop;
        }
    }
    int length = array2.length;
    String[] temp = new String[length + 1];
    System.arraycopy(array2, 0, temp, 0, length);
    array2 = temp;
    array2[length] = array1item;
}

System.out.println(Arrays.toString(array2)); // [foo, bar, waa]
System.out.println(Arrays.toString(array2)); // [foo, bar, waa]
Hope this gives new insights.
If you are implementing your own list, you may have decided to use a basic primitive storage mechanism, so an array (rather than an ArrayList) could be where you start.
For a simple implementation, your strategy should consider the following.
Decide how to expand your list. You could instantiate data blocks of 200 cells at a time. You would only use 199 because you might want to use the last cell to store the next allocation block.
Such linked lists are horrible, so you might decide to use a master block to store references to all the data blocks. You instantiate a master block of size 100. You start with one data block of 200 and store its reference in master[0]. As the list grows, you progressively store the reference of each new data block in master[1] ... master[99], and then you might have to recreate the master list to store 200 references.
For efficiency, when you delete a cell you should not actually exterminate it immediately. You let it hang around until enough deletions have occurred for you to recreate the block.
You need to somehow flag a cell has been deleted. So the answer is obvious, of course you can set it to null because you are the king, the emperor, the dictator who decides how a cell is flagged as deleted. Using a null is a great and usual way. If you use null, then you have to disallow nulls from being inserted as data into your list class. You would have to throw an exception when such an attempt is made.
You have to design and write a garbage collection routine and strategy to compact the list by recreating blocks to remove nullified cells en masse. The JVM would not know that those are "deleted" data.
You need a register to count the number of deletions, and if a threshold is crossed, garbage collection kicks in. Or you let the programmer decide by invoking a compact() method, because if deletions are sparse and distributed across various data blocks, you might as well leave the null/deleted cells hanging around. You could only merge adjacent blocks, and only if the sum of holes in both blocks adds up to 200, obviously.
Perhaps, when data is appended to the list, you deliberately leave null holes in between the data. It's like driving down a street where the house numbers increment by ten, because the city has decided to let people build new houses in between existing ones. That way you don't have to recreate and split a block every time an insertion occurs.
Therefore, the answer is obvious to yourself, of course you can write null to signify a cell is deleted, because it is your strategy in managing the list class.
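A tiny sketch of the flagging idea under these assumptions (cells, deletions, THRESHOLD, and compact are all hypothetical names for the pieces described above):

private Object[] cells;    // backing storage for one data block
private int deletions = 0; // register counting nullified cells

void delete(int i) {
    cells[i] = null; // null marks the cell as deleted; nulls as data must be disallowed
    if (++deletions > THRESHOLD) {
        compact(); // your garbage-collection routine recreates the blocks
    }
}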
No, an array element containing a null is still there, it just doesn't contain any useful value.
You could try moving every element from further down in the list up by 1 element to fill the gap, then you have a gap at the end of the array - the array will not shrink from doing this!
If you're doing this a lot, you can use System.arraycopy() to do this packing operation quickly.
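A hedged sketch of that packing step, assuming a is the backing array, size is the logical element count, and removed is the index being vacated:

static void packOut(Object[] a, int size, int removed) {
    int moved = size - removed - 1;
    if (moved > 0) {
        System.arraycopy(a, removed + 1, a, removed, moved); // shift successors left by one
    }
    a[size - 1] = null; // the array keeps its length; clear the now-stale last slot
}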
Use ArrayList.remove(int index):

if (a.get(i).equals(foo())) {
    a.remove(i);
}

But be careful when using for-loops and removing objects from lists.
http://java.sun.com/j2se/1.3/docs/api/java/util/ArrayList.html